Speaker
Description
Population-scale genomic biobanks provide unique opportunities for data-driven drug target discovery. However, these resources often lack detailed data on clinical phenotypes, whereas clinical trials offer rich phenotypic information but are limited in omics coverage and mostly lack genotyping. This imbalance creates gaps in the mechanistic interpretation of clinical findings.
To address this, we explore a recently proposed back-translation framework that links clinical trial data with genomic biobank data, leveraging the complementary strengths of both sources. Because biobank data can be considered representative of the general population, important disease- or trial-specific genetic signals may be diluted. To mitigate this, we apply population matching strategies to obtain a biobank subpopulation comparable to the clinical trial cohort, based on demographic and disease-related baseline markers.
This framework applies propensity score-based methods, commonly used for external data integration in clinical research, to biobank settings, with a focus on disease-relevant genetic information. We investigate propensity score matching under different matching specifications (e.g. caliper width, a predefined maximum acceptable difference between matched units) to balance two competing goals: maximizing similarity between the matched populations and maintaining sufficient power for genome-wide association studies. We perform a simulation study to evaluate how different matching designs affect efficiency and power to detect true genetic associations between patients’ genotypes and quantitative phenotypes or disease onset labels.
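To make the caliper trade-off concrete, the following is a minimal sketch of greedy 1:1 nearest-neighbour matching without replacement. It is an illustration under stated assumptions, not the matching procedure used in this work: the propensity scores are assumed to have been estimated beforehand (for example, by a logistic regression of trial membership on baseline markers), distances are taken on the logit scale, and the function name `caliper_match` and all input values are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def caliper_match(trial_ps, biobank_ps, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    trial_ps, biobank_ps: pre-estimated propensity scores (assumed inputs).
    caliper: maximum allowed absolute difference on the logit scale.
    Returns a list of (trial_index, biobank_index) pairs.
    """
    trial_logit = [logit(p) for p in trial_ps]
    biobank_logit = [logit(p) for p in biobank_ps]
    available = set(range(len(biobank_logit)))
    pairs = []
    for i, t in enumerate(trial_logit):
        if not available:
            break
        # Closest unused biobank participant; ties are broken by index.
        j = min(available, key=lambda k: (abs(biobank_logit[k] - t), k))
        # Accept the match only if it falls within the caliper.
        if abs(biobank_logit[j] - t) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical scores, for illustration only.
trial_ps = [0.80, 0.60, 0.30]
biobank_ps = [0.78, 0.55, 0.10, 0.62, 0.05]
for width in (0.1, 0.2, 1.5):
    matched = caliper_match(trial_ps, biobank_ps, width)
    print(width, len(matched), matched)
```

In this toy example a narrower caliper keeps only the closest pairs, so the matched populations are more similar but fewer participants remain, while a wider caliper retains more participants at the cost of looser matches. That is the tension between similarity and power for association testing described above.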
The matched biobank–trial integration enables the identification of disease-relevant genetic signals that would otherwise remain hidden in heterogeneous populations. Such information can support downstream efforts in target validation, patient stratification, mechanistic studies, and precision medicine development.