Speaker
Description
Over the past two decades, the problem of selecting relevant variables in high-dimensional data analysis has gained particular importance in both statistics and machine learning. Despite substantial advances in modeling techniques and numerous algorithmic proposals, most existing approaches overlook the issue of missing observations, a phenomenon ubiquitous in real-world datasets, especially those derived from omics studies.
To bridge this methodological gap, we propose missKnockoffs, an extension of the Model-X knockoff framework to the setting of incomplete data. The procedure proceeds in two stages: first, missing values are imputed using selected imputation strategies; subsequently, so-called knockoff variables are generated to enable control of the False Discovery Rate (FDR). To reduce the impact of randomness in the knockoff generation process, we introduce a mechanism of multiple knockoff replications combined with appropriate aggregation techniques.
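The two-stage procedure with repeated knockoff draws can be illustrated with a minimal Python sketch. It is not the missKnockoffs implementation: every function name here is hypothetical, and several simplifying assumptions are swapped in, namely column-mean imputation, approximate Gaussian equicorrelated knockoffs built from the sample correlation matrix, a marginal-correlation knockoff statistic, the standard knockoff+ threshold, and a selection-frequency vote as the aggregation rule. The vote is a heuristic chosen for brevity and is not claimed to preserve FDR control.

```python
import numpy as np

def mean_impute(X):
    """Stage 1 (assumed strategy): replace each NaN by its column mean."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def gaussian_knockoffs(X, rng):
    """Stage 2: approximate Gaussian Model-X knockoffs (equicorrelated).

    X is assumed standardized; its sample correlation stands in for the
    unknown covariance, so the knockoffs are only approximately valid.
    """
    n, p = X.shape
    Sigma = np.corrcoef(X, rowvar=False)
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    s = 0.99 * min(2.0 * lam_min, 1.0)        # shrunk to keep cov positive definite
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)
    mean = X - X @ Sigma_inv_D                 # conditional mean of knockoffs
    cov = 2.0 * D - D @ Sigma_inv_D            # conditional covariance
    cov = (cov + cov.T) / 2.0                  # remove numerical asymmetry
    L = np.linalg.cholesky(cov)
    return mean + rng.standard_normal((n, p)) @ L.T

def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf                              # no threshold found: select nothing

def miss_knockoff_sketch(X_missing, y, q=0.1, n_reps=20, seed=0):
    """Impute once, draw knockoffs n_reps times, aggregate by majority vote."""
    rng = np.random.default_rng(seed)
    X = mean_impute(X_missing)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    yc = y - y.mean()
    freq = np.zeros(X.shape[1])
    for _ in range(n_reps):
        Xk = gaussian_knockoffs(X, rng)
        # Marginal-correlation statistic: positive W_j favors the original variable.
        W = np.abs(X.T @ yc) - np.abs(Xk.T @ yc)
        t = knockoff_plus_threshold(W, q)
        freq += (W >= t)
    freq /= n_reps
    return np.flatnonzero(freq >= 0.5), freq   # heuristic aggregation rule
```

Calling `miss_knockoff_sketch(X_missing, y, q=0.1)` on a data matrix containing `NaN` entries returns the indices selected in at least half of the replications, together with the per-variable selection frequencies. Averaging over replications reduces the dependence of the result on any single random knockoff draw.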
Furthermore, we propose a novel approach to aggregating knockoff statistics, which exhibits desirable and well-justified theoretical properties. The effectiveness of the missKnockoffs method is validated through an extensive suite of simulation experiments, primarily evaluating test power and FDR control capability.
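The two evaluation criteria named above can be made concrete with a short sketch. In a simulation where the true support is known, the false discovery proportion (FDP) and the power of a single run are computed as below; the FDR is then estimated by averaging the FDP over many simulated datasets. The function name and argument layout are hypothetical and chosen only for illustration.

```python
import numpy as np

def fdp_and_power(selected, true_support, p):
    """Return (FDP, power) for one simulation run with p candidate variables."""
    sel = np.zeros(p, dtype=bool)
    sel[np.asarray(selected, dtype=int)] = True
    truth = np.zeros(p, dtype=bool)
    truth[np.asarray(true_support, dtype=int)] = True
    false_discoveries = np.sum(sel & ~truth)
    true_discoveries = np.sum(sel & truth)
    fdp = false_discoveries / max(1, sel.sum())    # defined as 0 when nothing is selected
    power = true_discoveries / max(1, truth.sum())
    return fdp, power
```

For example, selecting variables 0, 1 and 5 when the true support is {0, 1, 2, 3} gives one false discovery out of three selections and two of the four true signals recovered.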
In a comparative analysis, missKnockoffs is benchmarked against several state-of-the-art variable selection algorithms, including SLOBE and ABLAS. Experimental results demonstrate that the proposed method achieves competitive, and in many cases superior, performance in terms of balancing test power and false discovery control. Additionally, the method’s practical utility is confirmed through an application to real omics datasets.