18–21 May 2026
Europe/Warsaw timezone

Statistical basis for precision screening with infrared molecular fingerprints: functional data decomposition and lung cancer signals

20 May 2026, 11:03
18m
Room 14

Oral presentation
High dimensional data 2

Speaker

Lea Gigou (Department of Statistics, Ludwig Maximilian University of Munich; Max Planck Institute of Quantum Optics; Center for Molecular Fingerprinting)

Description

Early detection of lethal diseases such as lung cancer requires resolving faint signals amid biological heterogeneity. Precision screening aims to sensitively detect meaningful departures from an individual’s baseline by considering individual-level rather than population-level variability. This work investigates whether infrared molecular fingerprinting (IMF), i.e. mid-infrared vibrational spectroscopy of blood plasma that captures broad molecular information, can support such precision screening. We find that IMFs exhibit strong variability stemming from several sources, including health status as well as population-level and individual-level variation. We show that functional principal component analysis (FPCA) can disentangle these components, allowing subsequent investigation of various IMF characteristics and application scenarios.
Two clinical datasets are analyzed: a cross‑sectional lung‑cancer case‑control study (511 cases, 993 controls) and a longitudinal healthy cohort (5830 participants with five visits). Each observation comprises an IMF spectrum linked to demographic, lifestyle and clinical covariates. The spectra are treated as functional data over wavenumbers. We estimate between‑ and within‑person covariance components and perform FPCA separately on each to extract the dominant modes of variation and reduce data dimensionality. Analogously, lung cancer functional principal components (FPCs) are estimated from the diseased cohort. Assuming that IMFs follow a Gaussian process, the Karhunen-Loève expansion is used to simulate spectra for different variability scenarios, from which confidence bands are extracted via the modified band depth. Disease detectability is assessed with logistic regression and functional anomaly‑detection methods across representations (full spectra, leading FPCA scores, residuals after projection onto a healthy FPC subspace) and summarized by ROC‑AUC. The disease effect in the case‑control study, i.e. the average treatment effect (ATE) of cancer on IMF profiles, is approximated via propensity‑score‑matched cohorts. The ATE estimate is then contrasted with the variance components and confidence bands to gauge detectability in precision versus cross‑sectional settings.
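The variance decomposition, Karhunen-Loève simulation and band construction described above can be sketched roughly as follows. This is an illustration only, not the authors' implementation: all function names are hypothetical, the method-of-moments split of between- and within-person covariance is an assumed estimator, and spectra are assumed to be sampled on a common, equally spaced wavenumber grid, so that eigenvectors of the discretized covariance stand in for eigenfunctions up to scaling.

```python
import numpy as np

def top_eigen(cov, k):
    """Leading k eigenpairs of a symmetric matrix (negative eigenvalues clipped to 0)."""
    vals, vecs = np.linalg.eigh((cov + cov.T) / 2)
    order = np.argsort(vals)[::-1][:k]
    return np.clip(vals[order], 0.0, None), vecs[:, order]

def between_within_fpca(X, subject, n_components=4):
    """X: (n_obs, n_grid) spectra; subject: (n_obs,) person labels.
    Returns the mean curve and (eigenvalues, eigenvectors) for each level."""
    X = np.asarray(X, dtype=float)
    _, inverse = np.unique(np.asarray(subject).ravel(), return_inverse=True)
    counts = np.bincount(inverse)
    n_subj = len(counts)
    sums = np.zeros((n_subj, X.shape[1]))
    np.add.at(sums, inverse, X)                 # per-person sums
    subj_means = sums / counts[:, None]
    mean = subj_means.mean(axis=0)
    # Within-person: deviations of each visit from the person's own mean.
    W = X - subj_means[inverse]
    cov_within = W.T @ W / (len(X) - n_subj)
    # Between-person: covariance of person means, minus the share of
    # within-person variance that leaks into those means.
    B = subj_means - mean
    cov_means = B.T @ B / (n_subj - 1)
    cov_between = cov_means - cov_within * np.mean(1.0 / counts)
    return (mean,
            top_eigen(cov_between, n_components),
            top_eigen(cov_within, n_components))

def simulate_kl(mean, vals, vecs, n_sim, rng):
    """Truncated Karhunen-Loeve draws under a Gaussian-process assumption:
    independent normal scores with variances given by the eigenvalues."""
    scores = rng.standard_normal((n_sim, len(vals))) * np.sqrt(vals)
    return mean + scores @ vecs.T

def modified_band_depth(Y):
    """Modified band depth (bands formed by pairs of curves), via pointwise ranks."""
    n = Y.shape[0]
    ranks = Y.argsort(axis=0).argsort(axis=0) + 1   # 1-based rank at each grid point
    inside = (ranks - 1) * (n - ranks) + (n - 1)    # pairs whose band contains the curve
    return inside.mean(axis=1) / (n * (n - 1) / 2)

def central_band(Y, coverage=0.95):
    """Pointwise envelope of the deepest `coverage` fraction of curves."""
    depth = modified_band_depth(Y)
    n_keep = max(1, int(np.ceil(coverage * len(Y))))
    keep = np.argsort(depth)[::-1][:n_keep]
    return Y[keep].min(axis=0), Y[keep].max(axis=0)
```

In this sketch, simulating from the within-person eigenpairs alone would mimic a precision-screening scenario, whereas combining between- and within-person components would mimic a cross-sectional one; how the scenarios are actually defined in the study is not specified here.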
The first four FPCs explain 93.1% of spectral variance in healthy cohorts, with reproducible eigenfunction patterns across healthy datasets. Between‑person variability substantially exceeds within‑person variability, and within‑person variation concentrates in specific spectral subregions. The dominant variation modes of lung cancer patients show patterns that are clearly distinguishable from those of healthy controls, indicating a clear disease signal. Binary classification achieves ROC‑AUCs of up to 0.89 and anomaly detection of up to 0.78. The ATE is comparable in magnitude to the overall variability yet clearly exceeds the average within‑person variability. Limitations of this work include covariate availability and site heterogeneity. These findings provide initial evidence that IMF-based precision screening has the potential to detect subtler perturbations under the observed variance structure and merits further evaluation.

Author

Lea Gigou (Department of Statistics, Ludwig Maximilian University of Munich; Max Planck Institute of Quantum Optics; Center for Molecular Fingerprinting)

Co-authors

Göran Kauermann (Department of Statistics, Ludwig Maximilian University of Munich)
Kosmas Kepesidis (Chair of Experimental Physics - Laser Physics, Ludwig Maximilian University of Munich; Max Planck Institute of Quantum Optics; Center for Molecular Fingerprinting)