18–21 May 2026
Europe/Warsaw timezone

A comparison of variable selection approaches for spline regression

20 May 2026, 11:03
18m
Room 13 A

oral presentation Statistical modelling 2

Speaker

Franziska Kappenberg (University of Bonn, Medical Faculty, Institute for Medical Biometry, Informatics and Epidemiology)

Description

Multivariable regression models are a powerful statistical tool with countless applications in explanatory and predictive settings. One key challenge is variable selection, that is, deciding which variables to include or exclude, particularly when dealing with large numbers of candidate predictors. In biomedical data, non-linear relationships between the candidate predictors and the outcome of interest may be present. Generalized additive models (GAMs), which represent non-linear predictor effects using spline functions, are a popular method for modeling non-linear relationships (Perperoglou et al. 2019). In GAMs, the challenge of choosing an appropriate set of variables is further complicated by the simultaneous estimation of appropriate functional forms for the selected variables.
In this talk, we present an overview of selection algorithms in the context of GAMs (Marra, Wood 2011; Kovács 2024; Breheny, Huang 2015). The investigated selection methods can be grouped into stepwise (forward selection, backward elimination) and shrinkage approaches. The latter include the group LASSO, the group smoothly clipped absolute deviation (group SCAD), the non-negative garrote, and the double penalty approach in which both the range space and the null space of the penalty are shrunken.
To the best of our knowledge, many of these have not been used before in the context of GAMs and have not been formally compared. We compare the different approaches in a controlled simulation study, using a plasmode framework based on an open-source dataset to retain potentially complex correlation structures. The focus of the simulation study is the performance of the different resulting models, particularly with respect to calibration, discrimination, selection rates, model size and integrated squared loss (Ullmann et al. 2025). The simulation results provide valuable insights into the properties of the different selection approaches, marking an essential step toward building the evidence base for guidance regarding their use in explanatory and predictive analyses.
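As a rough illustration of how a shrinkage approach such as the group LASSO can select variables in a spline model, the sketch below applies group soft-thresholding to the spline coefficients of each predictor. This is the standard proximal step for a group LASSO penalty, not the implementation used in the study. The predictor names, coefficient values and the threshold `lam` are hypothetical, and the usual weighting by group size is omitted for brevity.

```python
import math


def group_soft_threshold(coefs, threshold):
    """Shrink one group of coefficients towards zero as a block.

    If the Euclidean norm of the group is at most `threshold`, the whole
    group is set to zero, which removes that predictor from the model.
    """
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= threshold:
        return [0.0] * len(coefs)
    scale = 1.0 - threshold / norm
    return [scale * c for c in coefs]


# Hypothetical spline coefficients, one group per candidate predictor.
groups = {
    "x1": [3.0, 4.0],  # strong effect, group norm 5.0
    "x2": [0.3, 0.4],  # weak effect, group norm 0.5
}
lam = 1.0  # hypothetical penalty threshold

shrunk = {name: group_soft_threshold(b, lam) for name, b in groups.items()}

# A predictor counts as selected if any of its coefficients is non-zero.
selected = [name for name, b in shrunk.items() if any(c != 0.0 for c in b)]
```

Because all basis coefficients of a predictor are shrunk together, the predictor is either retained with a shrunken functional form or dropped entirely, which is what makes group penalties a natural fit for spline terms.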

Breheny, P., Huang, J. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput 25 (2015).
Kovács, L. Feature selection algorithms in generalized additive models under concurvity. Comput Stat 39 (2024).
Marra, G., Wood, S.N. Practical variable selection for generalized additive models. Comput Stat Data Anal 55 (2011).
Perperoglou, A., Sauerbrei, W., Abrahamowicz, M. et al. A review of spline function procedures in R. BMC Med Res Methodol 19 (2019).
Ullmann, T., Heinze, G., Abrahamowicz, M. et al. A systematic categorization of performance measures for estimated non-linear associations between an outcome and continuous predictors. WIREs Comput Stat 17 (2025).

Author

Thomas Prince (University of Bonn, Medical Faculty, Institute for Medical Biometry, Informatics and Epidemiology)

Co-authors

Daniela Dunkler (Medical University of Vienna, Institute of Clinical Biometrics, Center for Medical Data Science)
Franziska Kappenberg (University of Bonn, Medical Faculty, Institute for Medical Biometry, Informatics and Epidemiology)
Georg Heinze (Medical University of Vienna, Institute of Clinical Biometrics, Center for Medical Data Science)
Matthias Schmid (University of Bonn, Medical Faculty, Institute for Medical Biometry, Informatics and Epidemiology)
Wil Sauerbrei (University of Freiburg, Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center)

Presentation materials

There are no materials yet.