18–21 May 2026
Europe/Warsaw timezone

Integrative Prediction Models for Multi-Omics Data with Missing Modalities

20 May 2026, 11:21
18m
Room 14

Oral presentation · High dimensional data 2

Speaker

Marina Bleskina (Institute of Medical Biometry and Statistics, University of Lübeck, University Hospital of Schleswig-Holstein, Campus Lübeck)

Description

Personalized medicine aims to improve the treatment of complex diseases by tailoring therapies to the individual molecular characteristics of patients. This is made possible by multi-omics data, which combine different molecular modalities measured in the same individuals. Integrating these modalities allows more comprehensive and powerful modeling. However, the distinct characteristics of each modality make integration challenging, particularly when specific modalities are missing for some individuals.

Many existing approaches require complete datasets, excluding individuals with incomplete modalities. This substantially reduces the sample size and may impair predictive performance.

A promising approach suitable for such missingness involves training modality-specific prediction models separately. Their predictions are used as input for a meta-model that delivers the final predictions and handles missing values. However, it is currently unclear which meta-learners are optimal for a specific research question and combination of partially missing modalities.
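As an illustration of how a meta-model might cope with a missing modality, the following minimal sketch combines modality-specific predicted probabilities by a weighted average and renormalizes the weights over the modalities that were actually measured. It is not the authors' implementation: the function name `weighted_average_meta`, the weights, and the example probabilities are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def weighted_average_meta(
    probs: Sequence[Optional[float]],
    weights: Sequence[float],
) -> Optional[float]:
    """Combine modality-specific predicted probabilities for one individual.

    Entries of `probs` are None where the modality was not measured; the
    weights of the remaining modalities are renormalized to sum to one.
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    # Keep only the modalities with an available prediction.
    available = [(p, w) for p, w in zip(probs, weights) if p is not None]
    total = sum(w for _, w in available)
    if not available or total <= 0:
        return None  # no usable modality for this individual
    return sum(p * w for p, w in available) / total


# Hypothetical weights, e.g. a validation score of each modality-specific model
# (order: methylation, gene expression, protein abundance).
weights = [0.70, 0.80, 0.60]

complete = weighted_average_meta([0.9, 0.6, 0.3], weights)
no_protein = weighted_average_meta([0.9, 0.6, None], weights)
print(round(complete, 4), round(no_protein, 4))
```

Learned meta-models such as logistic regression or random forest need a different strategy for missing inputs; this sketch covers only the simplest averaging case.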

We systematically evaluated the prediction performance of different meta-learners for multi-omics data with missing modalities in a simulation study. We simulated a binary outcome together with methylation values, gene expression, and protein abundance using the R package InterSIM, based on parameters derived from the TCGA Breast Invasive Carcinoma data set, thereby preserving realistic correlations within and across modalities. We considered scenarios varying in the number and combination of modalities exhibiting an effect, which could be independent or dependent across modalities. Effect sizes in each modality ranged from absent to weak, moderate, or strong. For each modality, individuals with missing data were randomly selected. In total, we generated 30 simulation settings, each repeated 100 times.

The evaluated meta-learners included weighted average, best modality-specific learner, logistic regression, least absolute shrinkage and selection operator (LASSO), the combined regression alternative (COBRA), and random forest. Their performance was assessed using the Brier score, F1 score, and area under the ROC curve (AUC), based on predicted probabilities. Our results show that more complex meta-learners (logistic regression, LASSO, random forest, and COBRA) consistently outperform simpler approaches (weighted average and best modality-specific learner), particularly in settings with stronger effects.
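For readers unfamiliar with the three performance measures, the sketch below computes them from predicted probabilities on a small toy example. The data are invented, and the classification threshold of 0.5 used for the F1 score is an assumption; the abstract does not state which threshold was applied.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of case/control pairs ranked correctly; ties count one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """F1 score after dichotomizing probabilities (threshold is an assumption)."""
    pred = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for yi, pr in zip(y, pred) if yi == 1 and pr == 1)
    fp = sum(1 for yi, pr in zip(y, pred) if yi == 0 and pr == 1)
    fn = sum(1 for yi, pr in zip(y, pred) if yi == 1 and pr == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: four individuals with invented outcomes and predictions.
y_true = [1, 0, 1, 0]
y_prob = [0.8, 0.3, 0.4, 0.6]

print(brier_score(y_true, y_prob), auc(y_true, y_prob), f1_score(y_true, y_prob))
```

Lower Brier scores indicate better-calibrated and more accurate probabilities, whereas higher AUC and F1 values indicate better discrimination and classification.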

Author

Marina Bleskina (Institute of Medical Biometry and Statistics, University of Lübeck, University Hospital of Schleswig-Holstein, Campus Lübeck)

Co-authors

Fouodo Cesaire (Leibniz Institute for Prevention Research and Epidemiology – BIPS)
Silke Szymczak (Institute of Medical Biometry and Statistics, University of Lübeck, University Hospital of Schleswig-Holstein, Campus Lübeck)
