18–21 May 2026
Europe/Warsaw timezone

Regularized Multi-Omics Regression Modelling for Transcriptomic–Proteomic Integration in Mice with Induced Liver Damage

19 May 2026, 11:21
18m
Room 14

oral presentation High dimensional data 1

Speaker

Ngoune Darwin (TU Dortmund University)

Description

Toxicological compounds exert complex effects on tissues and organisms, which can be investigated using genomic, transcriptomic, and proteomic data. A central challenge lies in understanding the relationship between RNA and protein levels. While these are expected to be positively correlated, transcriptomic variation often explains only a small fraction of the variation in protein abundance. This discrepancy motivates the use of advanced regression approaches to improve prediction. Previous work analysed transcriptomic and proteomic data from CCl₄-exposed mice, using regression and DiPa plots to group genes and enhance predictive accuracy. Although mRNA–protein correlations are frequently studied on a genome-wide scale, individual mRNA–protein relationships are rarely explored in a regression modeling context that incorporates additional omics data.

Building on earlier analyses, this study aims to reapply and expand an established regression pipeline to new mouse datasets. Specifically, we seek to predict protein intensities on a large, proteome-wide scale under different treatments, evaluate cross-dataset comparability, and assess the benefits of combining datasets to increase sample size and improve predictive performance.

The new dataset comprises 18 mice divided into three groups (control, BDL treatment, and BDL ABSTi remedy), with approximately 1,436 matched gene–protein pairs. A stringent normalization pipeline was applied to minimize technical artifacts while preserving biological structure. Given the high-dimensional setting (p >> n), we followed the established pipeline and applied a random forest preselection step before LASSO regression: the top 10 covariates were selected by variable importance and then passed to LASSO and post-LASSO regression models. Cross-dataset validation was performed to assess model generalizability.
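The three modeling steps (random forest preselection, LASSO, post-LASSO refit) can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy and uses synthetic data in place of the study data. The function name `rf_lasso_postlasso`, the forest size, the cross-validation setting, and the fallback for an empty LASSO selection are assumptions made for illustration and are not taken from the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV, LinearRegression


def rf_lasso_postlasso(X, y, n_keep=10, seed=0):
    """Random forest preselection, then LASSO, then an OLS refit (post-LASSO)."""
    # Step 1: rank covariates by random forest importance and keep the top n_keep.
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    top = np.argsort(forest.feature_importances_)[::-1][:n_keep]

    # Step 2: LASSO on the preselected covariates; the penalty is chosen by
    # cross-validation (an assumption, the abstract does not state how it was tuned).
    lasso = LassoCV(cv=3, random_state=seed).fit(X[:, top], y)
    active = top[lasso.coef_ != 0]

    # Step 3: post-LASSO, i.e. an unpenalized least-squares refit on the active set.
    if active.size == 0:
        # Fallback for this sketch only: keep the single most important covariate.
        active = top[:1]
    ols = LinearRegression().fit(X[:, active], y)
    return active, ols


# Synthetic stand-in data with p >> n (18 samples, 200 candidate covariates).
rng = np.random.default_rng(0)
X = rng.normal(size=(18, 200))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=18)

active, model = rf_lasso_postlasso(X, y)
predictions = model.predict(X[:, active])
```

In this sketch one such model is fitted per target protein; the refit in step 3 removes the shrinkage bias that the LASSO penalty puts on the retained coefficients.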

Using protein covariates, the models achieved an average correlation of ~0.7 between predicted and true protein levels in out-of-sample validation. This performance demonstrates the utility of random forest preselection combined with LASSO regression for high-dimensional omics data. The approach successfully reduced noise, improved predictive accuracy, and highlighted biologically meaningful gene–protein relationships.

These results indicate that protein intensities of new samples can be predicted with high precision, provided that covariate data are available. In practice, this framework can be applied to impute missing values in proteomic datasets or to detect implausible outliers. Furthermore, combining datasets to increase sample size enhances predictive performance and provides deeper biological insight into toxicological responses. Future work will focus on refining variable selection strategies to fully exploit the potential of integrative omics modeling.

Author

Ngoune Darwin (TU Dortmund University)

Co-authors

Andreas Groll (TU Dortmund University)
Heiner Jonas (TU Dortmund University)
Jan Hengstler (Leibniz Research Centre for Working Environment and Human Factors at the Technical University of Dortmund (IfADo), Dortmund)