18–21 May 2026
Europe/Warsaw timezone

missKnockoffs: A Robust Approach to Variable Selection in Incomplete Omics Data under False Discovery Control

20 May 2026, 14:00
15m
Room 1 B

oral presentation YSS2 (DR & PLR)

Speaker

Dominik Nowakowski (Medical University of Bialystok, Department of Biostatistics and Medical Informatics)

Description

Over the past two decades, selecting relevant variables in high-dimensional data has become a central problem in both statistics and machine learning. Despite substantial advances in modeling techniques and numerous algorithmic proposals, most existing approaches overlook missing observations, which are ubiquitous in real-world datasets, especially those derived from omics studies.

To bridge this methodological gap, we propose missKnockoffs, an extension of the Model-X knockoff framework to incomplete data. The procedure has two stages: first, missing values are imputed using selected imputation strategies; then, knockoff variables are generated to enable control of the False Discovery Rate (FDR). To reduce the impact of randomness in knockoff generation, we draw multiple knockoff replications and combine them with appropriate aggregation techniques.
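The two-stage procedure with replicated knockoffs can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the missKnockoffs implementation: mean imputation stands in for the imputation step, knockoffs come from the standard equicorrelated Gaussian Model-X construction (which needs more samples than variables), the statistic is a marginal-correlation difference, and replications are combined by a simple majority vote that does not by itself guarantee FDR control. All function names are hypothetical.

```python
import numpy as np

def mean_impute(X):
    """Stage 1 (assumed strategy): replace each NaN with its column mean."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def gaussian_knockoffs(X_std, rng):
    """Stage 2: equicorrelated Gaussian Model-X knockoffs for standardized X."""
    n, p = X_std.shape
    Sigma = np.corrcoef(X_std, rowvar=False)
    lam_min = np.linalg.eigvalsh(Sigma).min()
    if lam_min <= 1e-8:
        raise ValueError("correlation matrix is singular; this sketch needs n > p")
    s = 0.99 * min(1.0, 2.0 * lam_min)            # D = s * I
    Sigma_inv = np.linalg.inv(Sigma)
    cond_mean = X_std - s * (X_std @ Sigma_inv)   # X - X Sigma^{-1} D
    cond_cov = 2.0 * s * np.eye(p) - s**2 * Sigma_inv  # 2D - D Sigma^{-1} D
    L = np.linalg.cholesky(cond_cov + 1e-8 * np.eye(p))
    return cond_mean + rng.standard_normal((n, p)) @ L.T

def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q, else inf."""
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return t
    return np.inf

def miss_knockoffs_sketch(X, y, q=0.1, n_rep=25, seed=0):
    """Impute once, draw n_rep knockoff copies, aggregate by selection frequency."""
    rng = np.random.default_rng(seed)
    X_imp = mean_impute(X)
    X_std = (X_imp - X_imp.mean(axis=0)) / X_imp.std(axis=0)
    y_c = y - y.mean()
    counts = np.zeros(X.shape[1])
    for _ in range(n_rep):
        X_ko = gaussian_knockoffs(X_std, rng)
        # Marginal-correlation difference statistic (assumed choice).
        W = np.abs(X_std.T @ y_c) - np.abs(X_ko.T @ y_c)
        counts += W >= knockoff_plus_threshold(W, q)
    freq = counts / n_rep
    # Majority vote across replications (assumed aggregation rule).
    return np.flatnonzero(freq >= 0.5), freq
```

Calling `miss_knockoffs_sketch(X, y, q=0.1)` on a matrix `X` with `NaN` entries returns the indices selected in at least half of the replications together with each variable's selection frequency. In practice, a model-based imputation method and a lasso-based statistic would be natural replacements for the simple choices made here.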

Furthermore, we propose a novel approach to aggregating knockoff statistics that has desirable and well-justified theoretical properties. The effectiveness of missKnockoffs is validated through extensive simulation experiments, which primarily evaluate statistical power and FDR control.

In a comparative analysis, missKnockoffs is benchmarked against several state-of-the-art variable selection algorithms, including SLOBE and ABLAS. The results show that the proposed method achieves competitive, and in many cases superior, performance in balancing power against false discovery control. The method's practical utility is further confirmed through an application to real omics datasets.

Author

Dominik Nowakowski (Medical University of Bialystok, Department of Biostatistics and Medical Informatics)
