Speaker
Description
Background: Risk prediction models are increasingly used in clinical practice to predict health outcomes. These models are often developed using data from multiple centres (clustered data), where patient outcomes within a centre are likely to be correlated. The dataset used to develop a risk model must be of an appropriate size to avoid overfitting and poor predictions in new data. Wynants et al. recommended at least 10 events per variable (including the random parameter) to minimise bias in the regression coefficients and obtain acceptable C-statistic values when applying a random-effects model to clustered data. This approach focused only on ‘median predictions’, in which the random effect is ignored. More recently, Riley et al. (2020) and Pavlou et al. (2024) proposed sample size calculation methods for independent data that directly target the predictive performance of models; however, these methods may not be appropriate for clustered data.
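As an illustration of the events-per-variable arithmetic described above, the following is a minimal sketch, not the authors' implementation. It assumes the random parameter counts as one extra variable alongside the predictors and that the required number of events is converted to a total sample size by dividing by the outcome prevalence. The function name and arguments are hypothetical.

```python
import math

def epv_sample_size(n_predictors, prevalence, epv=10, n_random_params=1):
    """Rule-of-thumb sample size from an events-per-variable (EPV) target.

    Assumption: the random parameter is counted as one additional variable,
    and total sample size = required events / outcome prevalence.
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    n_variables = n_predictors + n_random_params
    events_needed = epv * n_variables
    total_n = math.ceil(events_needed / prevalence)
    return events_needed, total_n

# Example: 9 predictors plus one random parameter, 25% outcome prevalence.
events, total = epv_sample_size(n_predictors=9, prevalence=0.25)
```

The rule depends only on the number of variables and the prevalence, so it takes no account of the degree of clustering or the number of clusters.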
Methods: We conducted full-factorial simulations to assess whether the Wynants method provides sufficient sample sizes for developing prediction models with good predictive performance. We also evaluated whether the sample size methods proposed by Riley and Pavlou are applicable to clustered data. Simulation scenarios varied by degree of clustering, number of clusters, number of predictors, model strength, and outcome prevalence. Model performance was assessed using mean absolute prediction error (MAPE), calibration slope (CS), and the C-statistic. Cluster-specific performance measures were applied, and acceptable target values were prespecified. In addition, we propose two new sample size calculation methods for clustered data: a meta-model-based method and a method that adapts the Riley and Pavlou approaches through the application of shrinkage. Both approaches directly target model performance measures based on cluster-specific predictions.
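The three performance measures named above can be sketched as follows. This is a minimal illustration using common textbook definitions, not the authors' code: MAPE as the mean absolute difference between predicted and true probabilities, the C-statistic as the proportion of concordant event/non-event pairs, and the calibration slope as the coefficient from a logistic regression of the outcome on the logit of the predicted probability. The abstract does not give the exact cluster-specific definitions, so the functions below operate on whatever set of observations is passed in, and all function names are hypothetical.

```python
import numpy as np

def mape(p_hat, p_true):
    """Mean absolute prediction error between predicted and true risks."""
    return float(np.mean(np.abs(np.asarray(p_hat) - np.asarray(p_true))))

def c_statistic(y, p_hat):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 0.5."""
    y = np.asarray(y)
    p_hat = np.asarray(p_hat, dtype=float)
    pos = p_hat[y == 1]
    neg = p_hat[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def calibration_slope(y, p_hat, n_iter=25):
    """Slope from a logistic regression of y on logit(p_hat), via Newton-Raphson."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p_hat, dtype=float), 1e-8, 1 - 1e-8)
    lp = np.log(p / (1 - p))                      # linear predictor
    X = np.column_stack([np.ones_like(lp), lp])   # intercept + slope
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return float(beta[1])

# Simulated check: predictions equal to the true risks should give a slope
# near 1, while predictions that are too extreme give a slope below 1.
rng = np.random.default_rng(1)
true_lp = rng.normal(-1.0, 1.0, size=20000)
p_true = 1.0 / (1.0 + np.exp(-true_lp))
y_sim = rng.binomial(1, p_true)
p_overfit = 1.0 / (1.0 + np.exp(-2.0 * true_lp))  # linear predictor doubled

slope_true = calibration_slope(y_sim, p_true)
slope_overfit = calibration_slope(y_sim, p_overfit)
```

A calibration slope below 1 indicates predictions that are too extreme, the usual signature of overfitting, which is why a target such as CS ≥0.9 serves as a sample size criterion.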
Results: None of the existing methods achieved the target MAPE values. The Wynants and Riley methods failed to attain a CS of at least 0.9 when outcome prevalence was ≥15%. All methods generally yielded C-statistics within 0.02 of their true values. The new methods consistently achieved the target MAPE values and produced CS ≥0.9, with C-statistics within 0.02 of their true values when prevalence was ≤25%.
Conclusions: Current sample size calculation methods for developing binary risk models often failed to ensure adequate predictive performance and may therefore be unsuitable for clustered data. We propose new sample size calculation approaches that consistently achieve strong predictive performance across a wide range of clustered data scenarios.