18–21 May 2026
Europe/Warsaw timezone

Advancing mixed-effects random forests to predict BMI development in children and adolescents based on multi-cohort data

20 May 2026, 11:39
18m
Room 12

oral presentation Methods in epidemiology 1

Speaker

Jiumeng Zhang (Leibniz-Institute for Prevention Research and Epidemiology - BIPS)

Description

Mixed-effects models (MEMs) are widely used in epidemiology to analyze data that are not independent and identically distributed (i.i.d.), such as longitudinal data. However, MEMs rely on parametric assumptions and require interactions among predictors to be specified in advance. In contrast, machine learning (ML) methods such as random forests (RF) assume i.i.d. data but are more flexible in capturing nonlinear relationships and interactions. Mixed-effects ML methods, which combine MEMs with ML, have shown improved prediction performance by leveraging the strengths of both approaches while mitigating their limitations [1]. However, these methods remain underexplored in multi-cohort settings, where multiple cohorts spanning different life stages are combined to extend the observation period and improve individual predictions. We propose multi-cohort mixed-effects random forests (MERFmulti-cohort), a new approach combining MEMs and RF to improve the prediction of individual-level health trajectories beyond an individual's actual measurement period based on multi-cohort data.
Mixed-effects random forests (MERF) [2] iteratively estimate fixed and random effects: the fixed effects are captured by the RF, while the dependencies in the data are accounted for through suitable correlation structures in the MEM. However, previous MERF implementations oversimplify the covariance structures of random effects and residuals and fail to accommodate the complexities of multi-cohort data with hierarchical structures. Our MERFmulti-cohort builds on the out-of-the-box (OOB) implementation of the mixed-effects machine learning framework (mixedML) by Kilian et al. [3] and extends MERF to accommodate complex covariance structures in multi-cohort data. We illustrate and evaluate our approach by predicting individual-level BMI z-score trajectories using harmonized data from two cohorts of children.
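The alternating estimation described above can be illustrated with a minimal sketch of a generic MERF-style loop. It is not the MERFmulti-cohort method and does not use the mixedML API. It assumes the simplest possible structure, a single random intercept per child with independent residuals, i.e. y_ij = f(x_ij) + b_i + e_ij, and it assumes scikit-learn's `RandomForestRegressor` as the forest. The function name `fit_merf_random_intercept`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_merf_random_intercept(X, y, groups, n_iter=10, seed=0):
    """Alternate between an RF fit (fixed part) and random-intercept updates.

    Assumed model (illustration only): y_ij = f(x_ij) + b_i + e_ij.
    """
    ids, idx = np.unique(groups, return_inverse=True)
    n_i = np.bincount(idx).astype(float)   # observations per individual
    b = np.zeros(len(ids))                 # random intercepts, start at zero
    sigma_b2, sigma_e2 = 1.0, 1.0          # variance components, start values

    for _ in range(n_iter):
        # Step 1: fit the forest to the response with random effects removed.
        rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                                   oob_score=True, random_state=seed)
        rf.fit(X, y - b[idx])
        f_hat = rf.oob_prediction_         # out-of-bag estimate of the fixed part

        # Step 2: update the random intercepts (shrunken mean residual).
        resid = y - f_hat
        mean_resid = np.bincount(idx, weights=resid) / n_i
        b = sigma_b2 * n_i / (n_i * sigma_b2 + sigma_e2) * mean_resid

        # Step 3: EM-type update of the variance components, using the
        # conditional variance of b_i under the previous variance estimates.
        cond_var = sigma_b2 * sigma_e2 / (n_i * sigma_b2 + sigma_e2)
        eps = resid - b[idx]
        sigma_e2 = (np.sum(eps ** 2) + np.sum(n_i * cond_var)) / len(y)
        sigma_b2 = np.mean(b ** 2 + cond_var)

    return rf, dict(zip(ids, b)), sigma_b2, sigma_e2


# Simulated toy data (hypothetical): 60 children, 6 visits each.
rng = np.random.default_rng(0)
n_children, n_visits = 60, 6
groups = np.repeat(np.arange(n_children), n_visits)
age = rng.uniform(2, 16, size=groups.size)
X = age.reshape(-1, 1)
b_true = rng.normal(0, 1.0, size=n_children)
y = np.sin(age / 3.0) + b_true[groups] + rng.normal(0, 0.3, size=groups.size)

rf, b_hat, sigma_b2, sigma_e2 = fit_merf_random_intercept(X, y, groups)

# Prediction for a known child at age 10: forest output plus that child's intercept.
pred = rf.predict(np.array([[10.0]]))[0] + b_hat[0]
```

A random intercept is the kind of simplified covariance structure the abstract refers to. Richer structures (for example random slopes, correlated residuals, or cohort-level effects) would change Steps 2 and 3, which here rely on closed-form updates valid only for the random-intercept case.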
We compared MERFmulti-cohort with previous MERF approaches [4] and three conventional methods: (a) RF, (b) linear regression (LM), and (c) MEM. Prediction accuracy was evaluated in four scenarios forecasting BMI z-scores in children based on either single-cohort or multi-cohort data.
We show that MERFmulti-cohort outperforms RF, LM and previous MERF approaches when predicting BMI z-scores in both single-cohort and multi-cohort settings. Compared with MEM, the improvements are limited to certain multi-cohort scenarios. We provide guidance on the prediction scenarios in which MERFmulti-cohort outperforms other methods. We further discuss the benefits of using multi-cohort rather than single-cohort data to enhance the accuracy of individual-level predictions, particularly where single cohorts cover only a limited age range.

Author

Jiumeng Zhang (Leibniz-Institute for Prevention Research and Epidemiology - BIPS)

Co-authors

Claudia Börnhorst (Leibniz-Institute for Prevention Research and Epidemiology - BIPS)
Iris Pigeot (Leibniz-Institute for Prevention Research and Epidemiology - BIPS)
Jordan Behrendt (University of Bremen)
Siyu Zhou (Amsterdam UMC)
Tanja Schultz (University of Bremen)
Tanja Vrijkotte (Amsterdam UMC)
