Description
Multivariable regression models are a powerful statistical tool with countless applications in explanatory and predictive settings. One key challenge is variable selection: deciding which variables to include or exclude, particularly when the number of candidate predictors is large. In biomedical data, the relationships between candidate predictors and the outcome of interest may be non-linear. Generalized additive models (GAMs), which represent non-linear predictor effects using spline functions, are a popular method for modeling such relationships (Perperoglou et al. 2019). In GAMs, choosing an appropriate set of variables is further complicated by the need to simultaneously estimate appropriate functional forms for the selected variables.
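For orientation, a GAM with p candidate predictors can be written in the following generic form. The notation (link function g, smooth functions f_j, spline basis functions B_{jk}, coefficients \beta_{jk}) is assumed here for illustration and is not taken from the talk itself.

```latex
g\bigl(\mathbb{E}[Y \mid x_1, \dots, x_p]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j),
\qquad
f_j(x_j) = \sum_{k=1}^{K_j} \beta_{jk} \, B_{jk}(x_j)
```

In this notation, excluding predictor j corresponds to f_j being identically zero, that is, to setting all K_j coefficients of its basis expansion to zero together, while the non-zero coefficients determine the functional form of the predictors that remain.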
In this talk, we present an overview of selection algorithms in the context of GAMs (Marra, Wood 2011; Kovács 2024; Breheny, Huang 2015). The investigated selection methods can be grouped into stepwise approaches (forward selection, backward elimination) and shrinkage approaches. The latter include the group LASSO, the group smoothly clipped absolute deviation (group SCAD), the non-negative garrote, and the double penalty approach, in which function components in both the range space and the null space of the smoothing penalty are shrunk.
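To illustrate why group penalties suit variable selection in GAMs, the following minimal sketch shows the group soft-thresholding step that underlies the group LASSO: all spline coefficients of one predictor are shrunk together and can be set to zero as a block. The function name, the coefficient values, and the sqrt(group size) weighting are assumptions chosen for illustration; this is a single proximal step, not a full fitting algorithm and not the implementation used in the study.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the penalty lam * sum_g sqrt(p_g) * ||beta_g||_2.

    beta   : coefficient vector (all spline basis coefficients stacked)
    groups : integer label per coefficient; one label per predictor
    lam    : penalty strength (hypothetical tuning parameter)
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        threshold = lam * np.sqrt(idx.sum())  # weight by group size (assumed convention)
        if norm > threshold:
            # Shrink the whole group toward zero, keeping its direction.
            out[idx] = (1.0 - threshold / norm) * beta[idx]
        # Otherwise the group stays at zero: the predictor is deselected.
    return out

# Hypothetical example: two predictors, each expanded into three spline coefficients.
beta = np.array([0.9, -1.2, 0.4, 0.05, -0.02, 0.03])
groups = np.array([0, 0, 0, 1, 1, 1])
shrunk = group_soft_threshold(beta, groups, lam=0.1)
```

With these illustrative values, the first predictor keeps a shrunken version of its coefficients, while the second predictor's coefficients are all set to zero because their joint norm falls below the threshold.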
To the best of our knowledge, many of these methods have not previously been used in the context of GAMs, and they have not been formally compared. We compare the approaches in a controlled simulation study, using a plasmode framework based on an open-source dataset to retain potentially complex correlation structures. The simulation study focuses on the performance of the resulting models, particularly with respect to calibration, discrimination, selection rates, model size, and integrated squared loss (Ullmann et al. 2025). The simulation results provide valuable insights into the properties of the different selection approaches, marking an essential step toward building the evidence base for guidance regarding their use in explanatory and predictive analyses.
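The general idea of a plasmode simulation can be sketched as follows: covariate rows are resampled from an observed dataset, so their correlation structure is retained, and outcomes are generated from a known model, so the truth is available for evaluation. Everything specific below is an assumption made for illustration: the stand-in covariate matrix, the continuous outcome, the chosen true effect, and the trapezoidal approximation of an integrated squared error. None of it reproduces the design of the study.

```python
import numpy as np

def plasmode_datasets(X, true_effect, n_sim, n_obs, noise_sd, seed=0):
    """Yield (X_b, y_b): resampled observed covariates with simulated outcomes."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    for _ in range(n_sim):
        rows = rng.integers(0, X.shape[0], size=n_obs)  # rows drawn with replacement
        X_b = X[rows]                                    # whole rows keep correlations
        y_b = true_effect(X_b) + rng.normal(0.0, noise_sd, size=n_obs)
        yield X_b, y_b

def integrated_squared_error(f_hat, f_true, grid):
    """Trapezoidal approximation of the integral of (f_hat - f_true)^2 over grid."""
    d = (f_hat(grid) - f_true(grid)) ** 2
    return float(np.sum((d[1:] + d[:-1]) / 2.0 * np.diff(grid)))

# Stand-in for a real dataset (hypothetical): 200 rows, 3 correlated columns.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
X_obs = rng.multivariate_normal(np.zeros(3), cov, size=200)

# Assumed truth: non-linear effect of column 0, linear effect of column 1,
# and no effect of column 2 (a noise variable that selection should drop).
def true_effect(X):
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]

sims = list(plasmode_datasets(X_obs, true_effect, n_sim=5, n_obs=150, noise_sd=1.0))

# Error of a deliberately wrong linear guess for the sine-shaped effect.
grid = np.linspace(-2.0, 2.0, 201)
ise_linear = integrated_squared_error(lambda x: x, np.sin, grid)
```

In a full study, each simulated dataset would be passed to every selection method, and measures such as selection rates and the error of the estimated functions would be averaged over the simulated datasets.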
Breheny, P., Huang, J. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput 25 (2015).
Kovács, L. Feature selection algorithms in generalized additive models under concurvity. Comput Stat 39 (2024).
Marra, G., Wood, S.N. Practical variable selection for generalized additive models. Comput Stat Data Anal 55 (2011).
Perperoglou, A., Sauerbrei, W., Abrahamowicz, M. et al. A review of spline function procedures in R. BMC Med Res Methodol 19 (2019).
Ullmann, T., Heinze, G., Abrahamowicz, M. et al. A Systematic Categorization of Performance Measures for Estimated Non-Linear Associations Between an Outcome and Continuous Predictors. WIREs Comp Stats 17 (2025).