Speaker
Description
Genome-wide association studies (GWAS) often identify genomic regions containing hundreds or thousands of genetic variants with comparable statistical evidence. Extensive linkage disequilibrium (LD) and the sparsity of causal variants blur association signals, making it difficult to pinpoint the true causal variants underlying complex traits. Fine-mapping methods aim to distinguish causal variants from the non-causal variants that are closely correlated with them. Most GWAS and fine-mapping studies to date have focused on individuals of European ancestry, yet cross-population fine-mapping can improve causal resolution and discovery power by leveraging the broader genetic diversity, and hence the differing LD patterns, across ancestries. We introduce a machine learning–based Bayesian framework that integrates GWAS z-scores and LD matrices from multiple populations. By modelling causal configurations shared across ancestries, the model efficiently estimates posterior inclusion probabilities (PIPs) and identifies credible sets for multiple causal variants. We comprehensively evaluated the model through simulations, comparing it with single-population fine-mapping followed by post hoc aggregation and with the state-of-the-art method SuSiEx, under varying numbers of causal variants, cross-population genetic correlations, and noise levels. We further applied the model to summary statistics from the UK Biobank and the China Kadoorie Biobank, representing European and East Asian ancestries, to identify causal variants associated with different breast cancer subtypes. Compared with the baseline methods, our model generally achieves higher power at comparable coverage, assigns higher PIPs to putative causal variants, and identifies variants that other approaches miss because of infinitesimal effects from non-causal signals. This deep learning–driven Bayesian inference framework enables scalable fine-mapping across diverse biobanks, offering new opportunities for biological discovery.
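To make the quantities named above concrete (z-scores from several populations, a shared causal variant, PIPs, credible sets), here is a minimal Python sketch of generic single-effect Bayesian fine-mapping across populations. It is not the speaker's model. It assumes exactly one causal variant in the region, shared by all populations, independent GWAS samples per population, and a Gaussian prior on the standardized causal effects with a cross-population covariance `prior_cov`. That parameter is hypothetical, and its off-diagonal entries stand in for the genetic correlation. Under these assumptions, and with full-rank LD matrices, the LD terms cancel from the Bayes factors, so only the z-scores are needed. LD matrices become essential once multiple causal variants are modelled, as in SuSiE-style methods such as SuSiEx and in the framework described above. All function names are illustrative.

```python
import numpy as np


def log_bf_shared(z, prior_cov):
    """Per-variant log Bayes factors for one causal variant shared across populations.

    z:         (P, M) array of GWAS z-scores, P populations by M variants.
    prior_cov: (P, P) prior covariance of the standardized causal effects
               (hypothetical parameterization; off-diagonals encode the
               cross-population correlation of effects).
    If variant j is causal, z[:, j] ~ N(0, I + prior_cov); under the null,
    z[:, j] ~ N(0, I). With one causal variant and full-rank LD, the LD
    matrices cancel from this ratio, so they are not needed here.
    """
    n_pop = z.shape[0]
    alt_cov = np.eye(n_pop) + prior_cov
    _, logdet = np.linalg.slogdet(alt_cov)
    # log N(z_j; 0, A) - log N(z_j; 0, I) = -0.5 log|A| + 0.5 z_j^T (I - A^-1) z_j
    contrast = np.eye(n_pop) - np.linalg.inv(alt_cov)
    quad = np.einsum("pm,pq,qm->m", z, contrast, z)
    return -0.5 * logdet + 0.5 * quad


def pips(log_bf, prior=None):
    """Posterior inclusion probabilities under a single-causal-variant model."""
    n_var = log_bf.shape[0]
    if prior is None:
        prior = np.full(n_var, 1.0 / n_var)  # uniform prior over variants
    w = log_bf + np.log(prior)
    w = w - w.max()  # subtract the max before exponentiating, for stability
    p = np.exp(w)
    return p / p.sum()


def credible_set(pip, coverage=0.95):
    """Smallest set of top-ranked variants whose PIPs sum to >= coverage."""
    order = np.argsort(pip)[::-1]
    cumulative = np.cumsum(pip[order])
    k = int(np.searchsorted(cumulative, coverage)) + 1
    return order[:k]


if __name__ == "__main__":
    # Toy data: 2 populations, 4 variants (made-up numbers for illustration).
    z = np.array([[1.0, 5.0, 4.0, 0.5],
                  [0.8, 4.5, 1.0, 0.2]])
    prior_cov = 10.0 * np.array([[1.0, 0.8],
                                 [0.8, 1.0]])
    p = pips(log_bf_shared(z, prior_cov))
    print("PIPs:", np.round(p, 4))
    print("95% credible set:", credible_set(p))
```

With a uniform prior, each PIP is simply the variant's Bayes factor normalized over the region, and the 95% credible set is the smallest group of top-ranked variants whose PIPs sum to at least 0.95. When `prior_cov` is diagonal (uncorrelated effects), the joint log Bayes factor reduces to the sum of the per-population ones. This is one way to see what modelling a shared causal configuration adds over aggregating single-population results after the fact.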
21429412105