18–21 May 2026
Europe/Warsaw timezone

A Nested Cross-Validation Framework for Leakage-Free Calibration of Adaptive Elastic-Net Regression in High-Dimensional Data

19 May 2026, 11:57
18m
Room 14

oral presentation High dimensional data 1

Speaker

Gul Inan (Koc University)

Description

Penalized regression models such as Lasso, Elastic-Net and their adaptive extensions are widely used for simultaneous variable selection and prediction in high-dimensional data analysis. However, conventional implementations of adaptive Elastic-Net (AdaENet) regression often estimate the adaptive weights of the Elastic-Net penalty term on the entire dataset before it is split into training and test sets. This practice inadvertently introduces data leakage: information from the test set influences model training, which leads to optimistically biased performance estimates.

To address this common issue, we propose a Nested Cross-Validation (Nested CV) framework for calibrating the AdaENet model. In our approach, all data-dependent operations, including adaptive weight estimation and hyper-parameter tuning, are performed strictly within the training folds of the inner CV loop. This ensures complete separation between training and testing data, yielding unbiased estimates of predictive performance and variable selection stability. The method jointly identifies the optimal model settings while rigorously assessing generalization accuracy on unseen data.
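The nesting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `AdaptiveENet`, the function `nested_cv_mse`, the pilot-fit choice and the parameter names `gamma` and `eps` are all hypothetical. The adaptive penalty is approximated with the usual column-rescaling trick on top of scikit-learn's `ElasticNet`, which also reweights the ridge term, so it is not an exact AdaENet solver. The point of the sketch is only that the adaptive weights are computed inside `fit`, so every CV split recomputes them from its own training rows.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score


class AdaptiveENet(RegressorMixin, BaseEstimator):
    """Hypothetical adaptive Elastic-Net approximation (illustrative only)."""

    def __init__(self, alpha=1.0, l1_ratio=0.5, gamma=1.0, eps=1e-3):
        self.alpha = alpha
        self.l1_ratio = l1_ratio
        self.gamma = gamma
        self.eps = eps

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Pilot Elastic-Net fit on the data passed to fit() only, so the
        # adaptive weights never see rows held out by an enclosing CV loop.
        pilot = ElasticNet(alpha=self.alpha, l1_ratio=self.l1_ratio,
                           max_iter=10000).fit(X, y)
        self.weights_ = 1.0 / (np.abs(pilot.coef_) + self.eps) ** self.gamma
        # Assumption: dividing column j by w_j and rescaling the coefficient
        # afterwards stands in for a weighted L1 penalty.
        scaled = ElasticNet(alpha=self.alpha, l1_ratio=self.l1_ratio,
                            max_iter=10000).fit(X / self.weights_, y)
        self.coef_ = scaled.coef_ / self.weights_
        self.intercept_ = scaled.intercept_
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_


def nested_cv_mse(X, y, param_grid, outer_splits=5, inner_splits=3, seed=0):
    """Return one test MSE per outer fold; tuning happens in the inner loop."""
    inner = KFold(n_splits=inner_splits, shuffle=True, random_state=seed)
    outer = KFold(n_splits=outer_splits, shuffle=True, random_state=seed + 1)
    search = GridSearchCV(AdaptiveENet(), param_grid, cv=inner,
                          scoring="neg_mean_squared_error")
    # cross_val_score refits the whole search on each outer training fold,
    # so the outer test fold plays no part in weights or tuning.
    scores = cross_val_score(search, X, y, cv=outer,
                             scoring="neg_mean_squared_error")
    return -scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 20))
    beta = np.zeros(20)
    beta[:3] = [2.0, -1.5, 1.0]  # synthetic sparse signal, for illustration
    y = X @ beta + rng.normal(scale=0.5, size=60)
    grid = {"alpha": [0.05, 0.2], "l1_ratio": [0.5, 0.9]}
    print(nested_cv_mse(X, y, grid, outer_splits=4))
```

The leaky variant the abstract warns against would compute `weights_` once from all rows and reuse it in every fold; wrapping the weight estimation in the estimator's own `fit` is one simple way to rule that out.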

We systematically present the mathematical formulations of key penalized regression techniques (Lasso, Elastic-Net, Adaptive Lasso and AdaENet) and compare the standard K-fold CV procedure, which is susceptible to data leakage, with our proposed leakage-free Nested CV framework. Using comprehensive simulations across varying sample sizes, feature correlations and signal strengths, we evaluate and contrast model performance in terms of signed support accuracy, false positive rate and test mean squared error.
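For orientation, the four estimators named above are commonly written as follows. These are the standard textbook forms, given here as an assumption; the notation used in the talk may differ. Here $y \in \mathbb{R}^n$ is the response, $X \in \mathbb{R}^{n \times p}$ the design matrix, $\lambda, \lambda_1, \lambda_2 \ge 0$ tuning parameters, $\gamma > 0$ the adaptive exponent, and $\tilde{\beta}$ an initial estimate. Some authors additionally rescale the Elastic-Net solutions by a factor depending on $\lambda_2$, which is omitted here.

```latex
\begin{align*}
\hat{\beta}^{\text{Lasso}}
  &= \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| \\
\hat{\beta}^{\text{ENet}}
  &= \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2
     + \lambda_1 \sum_{j=1}^{p} |\beta_j| \\
\hat{\beta}^{\text{AdaLasso}}
  &= \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} \hat{w}_j |\beta_j| \\
\hat{\beta}^{\text{AdaENet}}
  &= \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2
     + \lambda_1 \sum_{j=1}^{p} \hat{w}_j |\beta_j|,
  \qquad \hat{w}_j = |\tilde{\beta}_j|^{-\gamma}
\end{align*}
```

The weights $\hat{w}_j$ depend on the data through $\tilde{\beta}$, which is why computing them on the full dataset before splitting lets test observations influence the fitted penalty.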

Results demonstrate that conventional adaptive Elastic-Net approaches substantially underestimate prediction error and overstate variable selection accuracy, while the proposed Nested CV framework achieves superior predictive reliability and robustness across diverse experimental conditions. This advancement provides a generalizable methodology for calibrating adaptive penalized regression models without information leakage. Particularly in complex, high-dimensional domains such as genetic studies, where data exhibit numerous interrelated predictors, our framework ensures reproducible, trustworthy and unbiased model evaluation.

Author

Gul Inan (Koc University)

Co-authors

Abdussamed Cakir (Istanbul Technical University) Izzet Goksel (Istanbul Technical University)
