Speaker
Description
Longitudinal or clustered data often arise in clinical research, potentially violating the assumption that observations are independent and identically distributed (i.i.d.). In regression, (generalized) linear mixed-effects models are frequently used to account for the correlation structure of the data, but these come with restrictions such as the linearity assumption and the need to pre-specify predictors and their interactions. In contrast, machine-learning algorithms, such as regression trees, random forests and neural networks, offer greater flexibility. Among other things, they can automatically select the most relevant variables and capture non-linear relationships and interactions between variables. Nevertheless, these models usually assume i.i.d. observations, at least implicitly. For machine learning models to be applied to clustered or longitudinal data, they need to account for such correlations.
To address this limitation, several extensions to machine learning algorithms have been proposed that incorporate random effects, drawing on ideas from (generalized) linear mixed-effects models (Sela and Simonoff 2012; Hajjem et al. 2017; Ngufor et al. 2019). Alternative approaches include multi-task learning and recurrent models for longitudinal data (Cascarano et al. 2023), and domain adaptation and domain generalization techniques for clustered data (Nguyen et al. 2023). However, a comprehensive review of supervised machine learning algorithms capable of handling correlated data in both longitudinal and clustered formats is currently lacking.
This work aims to fill this gap by systematically identifying and comparing modelling strategies and methods for the analysis of longitudinal and clustered data with supervised machine learning algorithms. A secondary objective is to assess their use in biomedical research by examining applications in appropriate journals. To achieve this, we employ a scoping review methodology, which involves a comprehensive literature search to map the current knowledge base.
In this talk, we will present the findings of the scoping review, providing an overview of the available modelling strategies and comparing their strengths and limitations in the context of biomedical research. We will also provide insights into the current use of these modelling strategies, shedding light on their adoption in the field.