18–21 May 2026
Europe/Warsaw timezone

Handling Missing Data in Life Science: A Comparative Study of Imputation Methods for Medical Data.

21 May 2026, 14:39
18m
Room 13 A

oral presentation Missing data 1

Speaker

Maria Thurow (TU Dortmund University)

Description

Handling missing data is a crucial step when preparing data sets for further analysis in many research areas. Previous studies have shown that the choice of imputation method can strongly influence subsequent analyses, especially in medical research, where missing values often occur due to study design or data collection challenges.
We conduct a comparative simulation study of commonly used imputation methods. The study includes missRanger and mixgb (tree-based imputation methods built on random forests and gradient boosting, respectively), MICE (Multiple Imputation by Chained Equations), and naive imputation (based on the arithmetic mean and mode) as a benchmark.
We use both simulated and real-world data sets from medical research. Based on these data sets, we show how to assess the imputation methods in terms of their imputation accuracy. Since there is no unique definition of the accuracy of an imputation method, we focus on different goals that researchers might have when imputing missing values. We assess the predictive accuracy (reconstruction of the actual values) using the normalized root mean squared error (NRMSE) and the proportion of false classification/imputation (PFC). To assess how well the original distribution is reconstructed, we use distribution distance measures such as uni- and multivariate Kolmogorov-Smirnov statistics.
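As a rough illustration of the accuracy measures named above, the following Python sketch computes NRMSE, PFC, and a univariate two-sample Kolmogorov-Smirnov statistic on toy values. The abstract does not state which normalization the study uses for NRMSE; dividing by the variance of the true values is one common convention and is an assumption here. The function names and the toy data are hypothetical, and the multivariate Kolmogorov-Smirnov variant is not covered.

```python
import math
from bisect import bisect_right


def nrmse(true_vals, imputed_vals):
    """RMSE over the imputed cells, normalized by the standard deviation
    of the true values (assumed convention, not taken from the study)."""
    n = len(true_vals)
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n
    mean_true = sum(true_vals) / n
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / n
    return math.sqrt(mse / var_true)


def pfc(true_labels, imputed_labels):
    """Proportion of imputed categorical cells that differ from the truth."""
    wrong = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return wrong / len(true_labels)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Toy example: the true values of four masked cells versus naive imputation.
true_numeric = [1.0, 2.0, 3.0, 4.0]
mean_imputed = [2.5, 2.5, 2.5, 2.5]      # arithmetic mean of the true values
true_category = ["a", "b", "a", "a"]
mode_imputed = ["a", "a", "a", "a"]      # most frequent category

print(nrmse(true_numeric, mean_imputed))
print(pfc(true_category, mode_imputed))
print(ks_statistic(true_numeric, mean_imputed))
```

With this normalization, imputing the true mean everywhere yields an NRMSE of 1, which gives a natural reference point for the naive benchmark. The Kolmogorov-Smirnov statistic for the same toy data shows that a constant imputation can distort the distribution even when its mean is exact.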
While previous studies often find tree-based methods to perform "best", our results demonstrate that no single method consistently outperforms the others. The optimal choice depends on the analysis goal and the evaluation criteria.
This study can serve as a guide for researchers selecting imputation methods aligned with their research goals, with particular relevance for medical research and beyond.

Author

Maria Thurow (TU Dortmund University)

Co-authors

Florian Dumpert (Federal Statistical Office of Germany)
Inken Veips (TU Dortmund University)
Markus Pauly (TU Dortmund University)
