Description
Mixed-effects models (MEMs) are widely used in epidemiology to analyze data that are not independent and identically distributed (i.i.d.), such as longitudinal data. However, MEMs rely on parametric assumptions and require interactions among predictors to be specified in advance. In contrast, machine learning (ML) methods such as random forests (RF) assume i.i.d. data but are more flexible in capturing nonlinear relationships and interactions. Mixed-effects ML methods, which combine MEMs with ML, have shown improved prediction performance by leveraging the strengths and mitigating the limitations of both approaches [1]. However, these mixed-effects ML methods remain underexplored in multi-cohort settings, where multiple cohorts spanning different life stages are combined to extend the observation period and improve individual predictions. We propose multi-cohort mixed-effects random forests (MERFmulti-cohort), a new approach that combines MEMs and RF to improve predictions of individual-level health trajectories beyond each individual's actual measurement period, based on multi-cohort data.
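To make the contrast between the two model classes concrete, the standard formulations can be written as follows. This is a minimal sketch with notation assumed here rather than taken from the cited works: for individual $i$ with $n_i$ measurements, $y_i$ is the response vector, $X_i$ the predictor matrix, $Z_i$ the random-effects design (e.g. an intercept and age), $b_i$ the individual random effects and $\varepsilon_i$ the residuals.

```latex
\begin{aligned}
\text{MEM:}\quad & y_i = X_i\beta + Z_i b_i + \varepsilon_i,\\
\text{mixed-effects ML:}\quad & y_i = f(X_i) + Z_i b_i + \varepsilon_i,\\
& b_i \sim \mathcal{N}(0, D), \qquad \varepsilon_i \sim \mathcal{N}(0, R_i), \qquad
\operatorname{Cov}(y_i \mid X_i) = Z_i D Z_i^{\top} + R_i .
\end{aligned}
```

In the MEM, the fixed part $X_i\beta$ is linear, and any nonlinearity or interaction must be coded into $X_i$ by the analyst; in the mixed-effects ML model, $f$ is learned from the data (e.g. by an RF). In the usual MERF formulation, assumed here, the residual covariance is simplified to $R_i = \sigma^2 I_{n_i}$.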
Mixed-effects random forests (MERF) [2] iteratively estimate fixed and random effects: the fixed effects are captured by the RF, while dependencies in the data are accounted for through suitable correlation structures in the MEM. However, previous MERF implementations oversimplify the covariance structures of the random effects and residuals and fail to accommodate the complexity of multi-cohort data with hierarchical structures. Our MERFmulti-cohort builds on the out-of-the-box (OOB) implementation of the mixed-effects machine learning framework (mixedML) by Kilian et al. [3] and extends MERF to accommodate complex covariance structures in multi-cohort data. We illustrate and evaluate our approach by predicting individual-level BMI z-score trajectories using harmonized data from two child cohorts.
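The iterative estimation described above can be sketched as an EM-style loop. This is a minimal illustration, not the mixedML API and not the authors' MERFmulti-cohort: it assumes scikit-learn's `RandomForestRegressor` as the fixed-effects learner, a random intercept and a random slope on age, and the simplified residual covariance $\sigma^2 I$ used by earlier MERF implementations. The function names `fit_merf` and `predict_merf` are hypothetical, and the data are simulated, not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_merf(X, Z, y, groups, n_iter=10, n_estimators=100, random_state=0):
    """EM-style MERF sketch for y_i = f(X_i) + Z_i b_i + eps_i.

    Assumes b_i ~ N(0, D) and eps_i ~ N(0, sigma2 * I), i.e. the simple
    residual structure of earlier MERF implementations (hypothetical helper).
    """
    X, Z, y, groups = (np.asarray(a) for a in (X, Z, y, groups))
    y = y.astype(float)
    clusters = np.unique(groups)
    idx = {g: np.flatnonzero(groups == g) for g in clusters}
    q = Z.shape[1]
    b = {g: np.zeros(q) for g in clusters}  # random effects per individual
    D = np.eye(q)                            # random-effects covariance
    sigma2 = 1.0                             # residual variance
    rf = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)

    for _ in range(n_iter):
        # Step 1: fixed effects. Fit the RF to y with current random effects removed.
        y_star = y.copy()
        for g in clusters:
            y_star[idx[g]] -= Z[idx[g]] @ b[g]
        rf.fit(X, y_star)
        f_hat = rf.predict(X)

        sigma2_acc, D_acc = 0.0, np.zeros((q, q))
        for g in clusters:
            i = idx[g]
            Zi, ri = Z[i], y[i] - f_hat[i]
            # V_i = Z_i D Z_i^T + sigma2 I: a richer covariance would enter here.
            Vi_inv = np.linalg.inv(Zi @ D @ Zi.T + sigma2 * np.eye(len(i)))
            # Step 2: random effects. BLUP given f_hat and the current D, sigma2.
            b[g] = D @ Zi.T @ Vi_inv @ ri
            eps = ri - Zi @ b[g]
            # Step 3: accumulate EM updates for the variance components (old D, sigma2).
            sigma2_acc += eps @ eps + sigma2 * (len(i) - sigma2 * np.trace(Vi_inv))
            D_acc += np.outer(b[g], b[g]) + D - D @ Zi.T @ Vi_inv @ Zi @ D
        sigma2 = sigma2_acc / len(y)
        D = D_acc / len(clusters)

    return rf, b, D, sigma2


def predict_merf(rf, b, X, Z, groups):
    """Known individuals get f(X) + Z b_i; unseen individuals get f(X) only."""
    X, Z, groups = (np.asarray(a) for a in (X, Z, groups))
    pred = rf.predict(X)
    for k, g in enumerate(groups):
        if g in b:
            pred[k] += Z[k] @ b[g]
    return pred


# Usage on simulated data (not study data): 60 children, 6 annual visits,
# random intercept and random slope on age.
rng = np.random.default_rng(0)
n_children, n_visits = 60, 6
groups = np.repeat(np.arange(n_children), n_visits)
age = np.tile(np.linspace(2.0, 7.0, n_visits), n_children)
sex = np.repeat(rng.integers(0, 2, n_children), n_visits)
b_true = rng.normal(0.0, [0.5, 0.1], size=(n_children, 2))
y = (0.3 * np.sin(age) + 0.2 * sex
     + b_true[groups, 0] + b_true[groups, 1] * age
     + rng.normal(0.0, 0.1, age.size))
X = np.column_stack([age, sex])
Z = np.column_stack([np.ones_like(age), age])

train = age < 6.0  # fit on visits up to age 5, forecast ages 6 and 7
rf, b, D, sigma2 = fit_merf(X[train], Z[train], y[train], groups[train])
pred = predict_merf(rf, b, X[~train], Z[~train], groups[~train])
rmse = float(np.sqrt(np.mean((pred - y[~train]) ** 2)))
print(f"forecast RMSE: {rmse:.3f}, sigma2: {sigma2:.4f}")
```

Because an RF cannot extrapolate beyond the ages seen in training, forecasts at later ages in this sketch lean on each child's estimated random slope. The line that builds $V_i$ is where a more complex covariance structure would plug in; how MERFmulti-cohort specifies that structure is not detailed here.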
We compared our MERFmulti-cohort with previous MERF approaches [4] and three conventional methods: (a) RF, (b) linear regression (LM), and (c) MEM. Prediction accuracy was evaluated in four scenarios forecasting children's BMI z-scores based on either single- or multi-cohort data.
We show that MERFmulti-cohort outperforms RF, LM, and previous MERF approaches when predicting BMI z-scores in both single- and multi-cohort settings. Compared with MEM, the improvements are limited to certain multi-cohort scenarios. We provide guidance on the prediction scenarios in which MERFmulti-cohort outperforms the other methods. We further discuss the benefits of using multi-cohort rather than single-cohort data to improve the accuracy of individual-level predictions, particularly when single cohorts cover a limited age range.