Speaker
Description
Machine learning (ML) models have emerged as a powerful alternative to traditional statistical methods due to their flexibility and ability to leverage large-scale, high-dimensional datasets. However, in sensitive application areas such as clinical and prognostic modeling, deploying ML models requires interpretability in order to reveal underlying model behavior, identify influential risk factors and detect potential biases. Although interpretable machine learning (IML) techniques are increasingly used to illuminate these “black box” models, the systematic evaluation of interpretability in non-standard prediction settings remains limited.
In this study, we examine IML for risk prediction modeling using longitudinal features such as diagnosis (ICD) and medication (ATC) codes, where high dimensionality and sparsity present major methodological challenges. Our focus is on prediction tasks with binary and time-to-event outcomes. We conduct a simulation study to evaluate the effectiveness of different IML techniques in explaining ML-based prediction models under increasing data complexity, varying the sparsity, the dimensionality, and the outcome type.
We fit several prediction models with an emphasis on deep neural network architectures tailored for longitudinal data, and apply a set of model-agnostic and model-specific IML techniques. We assess the accuracy with which these methods recover known data-generating relationships and the alignment of interpretability with predictive accuracy. Finally, we apply the evaluation framework to real-world health insurance data to assess generalizability. This study is the first to systematically evaluate and compare IML techniques for longitudinal prediction modeling. It offers practical guidance for method selection and advances understanding of IML’s role in risk prediction and clinical decision support within healthcare and biomedical contexts.
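To make the idea of "recovering known data-generating relationships" concrete, the following is a minimal sketch under stated assumptions, not the study's actual pipeline. It simulates sparse binary code indicators with a known set of informative features, fits a plain logistic regression (a stand-in for the deep architectures mentioned above), and ranks features by permutation importance, used here as one example of a model-agnostic IML technique. All function names, parameter values, and effect sizes are hypothetical and chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_sparse_codes(n=2000, p=30, prevalence=0.05, n_informative=3, seed=0):
    """Simulate sparse binary code indicators and a binary outcome.

    Assumption: only the first `n_informative` features affect the outcome,
    each with the same (hypothetical) log-odds effect of 3.0.
    """
    rng = np.random.default_rng(seed)
    X = (rng.random((n, p)) < prevalence).astype(float)  # sparse 0/1 matrix
    beta = np.zeros(p)
    beta[:n_informative] = 3.0
    y = (rng.random(n) < sigmoid(-2.0 + X @ beta)).astype(float)
    return X, y, beta


def fit_logistic(X, y, lr=1.0, n_iter=3000):
    """Fit logistic regression by full-batch gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        resid = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ resid) / n
        b -= lr * resid.mean()
    return w, b


def log_loss(X, y, w, b):
    prob = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))


def permutation_importance(X, y, w, b, n_repeats=5, seed=1):
    """Mean increase in log-loss when a single feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = log_loss(X, y, w, b)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X[:, j])
            scores[j] += log_loss(X_perm, y, w, b) - base
    return scores / n_repeats


def recovered_features(scores, k):
    """Indices of the k features with the highest importance scores."""
    return set(np.argsort(scores)[::-1][:k].tolist())
```

Comparing `recovered_features(scores, k)` with the indices of the non-zero entries of `beta` gives a simple recovery check. Lowering `prevalence` or raising `p` in this sketch mimics the increasing sparsity and dimensionality described above.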