Speaker
Description
Personalized medicine aims to improve the treatment of complex diseases by tailoring therapies to the individual molecular characteristics of patients. Multi-omics data, which combine different molecular modalities measured in the same individuals, make this possible. Integrating these modalities allows more comprehensive and powerful modeling. However, the distinct characteristics of each modality make integration challenging, particularly when specific modalities are missing for some individuals.
Many existing approaches require complete datasets, excluding individuals with incomplete modalities. This substantially reduces the sample size and may impair predictive performance.
A promising approach that accommodates such missingness is to train modality-specific prediction models separately. Their predictions then serve as input to a meta-model (meta-learner) that handles missing values and delivers the final predictions. However, it is currently unclear which meta-learners are optimal for a specific research question and combination of partially missing modalities.
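The two-stage idea can be illustrated with a minimal sketch. It assumes that each modality-specific model has already produced a predicted probability per individual, that a missing modality is encoded as `None`, and that the meta-model is a plain weighted average renormalized over the available modalities. The function name `weighted_average_meta` and the weights are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from typing import Optional, Sequence


def weighted_average_meta(
    preds: Sequence[Optional[float]],
    weights: Sequence[float],
) -> Optional[float]:
    """Combine modality-specific probabilities, skipping missing modalities.

    Weights of the available modalities are renormalized so they sum to 1.
    Returns None if no modality is available for the individual.
    """
    available = [(p, w) for p, w in zip(preds, weights) if p is not None]
    total = sum(w for _, w in available)
    if total == 0:
        return None  # no usable modality for this individual
    return sum(p * w for p, w in available) / total


# Hypothetical weights for methylation, gene expression, protein abundance.
weights = [0.5, 0.3, 0.2]

# Individual with all three modalities observed.
complete = weighted_average_meta([0.8, 0.6, 0.4], weights)

# Individual without gene expression: the remaining weights are rescaled.
no_expression = weighted_average_meta([0.8, None, 0.4], weights)
```

Because the weights are rescaled per individual, no one has to be excluded for lacking a modality. Trainable meta-learners need their own strategy for the missing inputs, which this sketch does not cover.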
We systematically evaluated the prediction performance of different meta-learners for multi-omics data with missing modalities in a simulation study. We simulated a binary outcome together with methylation values, gene expression, and protein abundance using the R package InterSIM, based on parameters derived from the TCGA Breast Invasive Carcinoma data set, thereby preserving realistic correlations within and across modalities. The scenarios varied in the number and combination of modalities exhibiting an effect, and effects could be independent or dependent across modalities. Effect sizes in each modality were absent, weak, moderate, or strong. For each modality, individuals with missing data were randomly selected. In total, we generated 30 simulation settings, each repeated 100 times.
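The random selection of individuals with missing data can be sketched as follows. The sketch assumes that missingness is drawn independently per modality and with the same fraction in each. The 20% fraction, the sample size of 100, and the function name `draw_missing` are hypothetical, since the abstract does not state these details.

```python
import random


def draw_missing(n_individuals, modalities, missing_fraction, seed=0):
    """Randomly pick, per modality, the individuals whose data are missing.

    Returns a dict mapping each modality to a set of individual indices.
    Draws are independent across modalities (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_missing = round(missing_fraction * n_individuals)
    return {
        m: set(rng.sample(range(n_individuals), n_missing))
        for m in modalities
    }


modalities = ["methylation", "expression", "protein"]
missing = draw_missing(100, modalities, missing_fraction=0.2, seed=1)

# Individuals that a complete-case analysis would retain.
complete_cases = [
    i for i in range(100)
    if not any(i in dropped for dropped in missing.values())
]
```

With independent draws, the set of complete cases shrinks faster than the per-modality missing fraction suggests, which is the sample-size loss that complete-data methods incur.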
The evaluated meta-learners included weighted average, best modality-specific learner, logistic regression, least absolute shrinkage and selection operator (LASSO), the combined regression alternative (COBRA), and random forest. Their performance was assessed using the Brier score, F1 score, and AUC based on predicted probabilities. Our results show that complex meta-learners (logistic regression, LASSO, random forest, and COBRA) consistently outperform simpler approaches (weighted average and best modality-specific learner), particularly in settings with stronger effects.
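For reference, the three performance measures can be computed from predicted probabilities as in the following minimal sketch. It uses the standard definitions. The 0.5 classification threshold for the F1 score and the toy data are assumptions made for illustration, not values from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def f1_score(y_true, y_prob, threshold=0.5):
    """F1 score after thresholding probabilities (threshold is an assumption)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, y_prob):
    """Area under the ROC curve as the share of correctly ranked
    positive/negative pairs, counting ties as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: four individuals with binary outcomes and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.35, 0.2]

print(brier_score(y_true, y_prob))  # lower is better
print(f1_score(y_true, y_prob))     # higher is better
print(auc(y_true, y_prob))          # higher is better
```

The Brier score and AUC use the probabilities directly, whereas the F1 score depends on the chosen threshold.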