ResearchHub | Open Science Community

A new method for multi-ancestry polygenic prediction improves performance across diverse populations

Haoyu Zhang et al. Oct 24, 2023

Polygenic risk scores (PRS) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRS using ancestry-specific GWAS summary statistics from multi-ancestry training samples, integrating clumping and thresholding, empirical Bayes, and super learning. We evaluate CT-SLEB and nine alternative methods with large-scale simulated GWAS (∼19 million common variants) and datasets from 23andMe Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across thirteen complex traits. Results demonstrate that CT-SLEB significantly improves PRS performance in non-European populations compared to simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offer insights into sample size requirements and SNP density effects on multi-ancestry risk prediction.
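For readers unfamiliar with the clumping-and-thresholding step named in this abstract, the following is a minimal single-ancestry sketch of the generic technique, not the CT-SLEB implementation (which also adds empirical Bayes and super learning steps that are not shown). The toy summary statistics, the `ld_r2` lookup table, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy summary statistic: (snp_id, position_bp, effect_size, p_value).
SumStat = Tuple[str, int, float, float]


def clump_and_threshold(
    stats: List[SumStat],
    p_threshold: float,
    window_bp: int,
    ld_r2: Dict[FrozenSet[str], float],
    r2_cutoff: float,
) -> List[SumStat]:
    """Keep SNPs passing the p-value threshold, then greedily clump them.

    SNPs are visited from most to least significant; a SNP is dropped if it
    lies within `window_bp` of an already kept SNP and their pairwise LD
    (looked up in `ld_r2`, assumed 0 when absent) is at least `r2_cutoff`.
    """
    candidates = sorted(
        (s for s in stats if s[3] <= p_threshold), key=lambda s: s[3]
    )
    kept: List[SumStat] = []
    for snp in candidates:
        in_ld_with_kept = any(
            abs(snp[1] - k[1]) <= window_bp
            and ld_r2.get(frozenset((snp[0], k[0])), 0.0) >= r2_cutoff
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(snp)
    return kept


def polygenic_score(genotype: Dict[str, int], kept: List[SumStat]) -> float:
    """Weighted sum of allele counts (0, 1 or 2) over the retained SNPs."""
    return sum(beta * genotype.get(snp_id, 0) for snp_id, _, beta, _ in kept)


# Toy example: rs2 is in strong LD with the more significant rs1,
# and rs4 fails the p-value threshold.
stats = [
    ("rs1", 1_000, 0.20, 1e-8),
    ("rs2", 1_500, 0.18, 1e-6),
    ("rs3", 90_000, -0.10, 1e-4),
    ("rs4", 200_000, 0.05, 0.2),
]
ld_r2 = {frozenset(("rs1", "rs2")): 0.9}

kept = clump_and_threshold(
    stats, p_threshold=1e-3, window_bp=50_000, ld_r2=ld_r2, r2_cutoff=0.1
)
score = polygenic_score({"rs1": 2, "rs2": 1, "rs3": 1}, kept)
print([s[0] for s in kept], round(score, 3))
```

In practice the p-value threshold, window size, and LD cutoff are tuning parameters, and the LD values come from a reference panel matched to the target ancestry rather than from a hand-written table.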

Biobank

Bayes' Theorem

Genome-wide Association Study

A Saturated Map of Common Genetic Variants Associated with Human Height from 5.4 Million Individuals of Diverse Ancestries

Loïc Yengo et al. Jan 12, 2022

Common SNPs are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes. Here we show, using GWAS data from 5.4 million individuals of diverse ancestries, that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a median size of ~90 kb, covering ~21% of the genome. The density of independent associations varies across the genome and the regions of elevated density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs account for 40% of phenotypic variance in European ancestry populations but only ~10%-20% in other ancestries. Effect sizes, associated regions, and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely explained by linkage disequilibrium and allele frequency differences within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than needed to implicate causal genes and variants. Overall, this study, the largest GWAS to date, provides an unprecedented saturated map of specific genomic regions containing the vast majority of common height-associated variants.

Single-nucleotide Polymorphism

Linkage Disequilibrium

Genome-wide Association Study

An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction

Jingning Zhang et al. Sep 17, 2023

Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
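To illustrate how an L1 (lasso) and an L2 (ridge) penalty act together on a single effect size, here is a minimal sketch of the textbook elastic-net coordinate update. It is a generic illustration under simplifying assumptions (one standardized predictor, one population) and should not be read as PROSPER's actual objective, whose cross-population penalty structure is not described in this abstract. The function names and parameter values are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Lasso shrinkage: move z toward zero by lam, clipping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def elastic_net_update(z: float, lam1: float, lam2: float) -> float:
    """Minimizer of 0.5*(b - z)**2 + lam1*abs(b) + 0.5*lam2*b**2 over b.

    `z` plays the role of the unpenalized (marginal) effect estimate.
    The L1 term sets small effects exactly to zero; the L2 term shrinks
    whatever survives by a factor of 1 / (1 + lam2).
    """
    return soft_threshold(z, lam1) / (1.0 + lam2)


# Toy marginal estimates and hypothetical penalty parameters.
for z in (0.5, 0.1, -0.5):
    print(z, elastic_net_update(z, lam1=0.2, lam2=0.5))
```

An ensemble step of the kind the abstract mentions would then combine scores built under several such penalty settings, for example by regressing the trait on them in a tuning sample.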

Lasso (Programming Language)

Regression

Genome-wide Association Study

Multi-ancestry GWAS of major depression aids locus discovery, fine-mapping, gene prioritisation, and causal inference

Xiangrui Meng et al. Oct 24, 2023

Most genome-wide association studies (GWAS) of major depression (MD) have been conducted in samples of European ancestry. Here we report a multi-ancestry GWAS of MD, adding data from 21 studies with 88,316 MD cases and 902,757 controls to previously reported data from individuals of European ancestry. This includes samples of African (36% of effective sample size), East Asian (26%) and South Asian (6%) ancestry and Hispanic/Latinx participants (32%). The multi-ancestry GWAS identified 190 significantly associated loci, 53 of them novel. For previously reported loci from GWAS in European ancestry the power-adjusted transferability ratio was 0.6 in the Hispanic/Latinx group and 0.3 in each of the other groups. Fine-mapping benefited from additional sample diversity: the number of credible sets with ≤5 variants increased from 3 to 12. A transcriptome-wide association study identified 354 significantly associated genes, 205 of them novel. Mendelian Randomisation showed a bidirectional relationship with BMI exclusively in samples of European ancestry. This first multi-ancestry GWAS of MD demonstrates the importance of large diverse samples for the identification of target genes and putative mechanisms.

Genome-wide Association Study

Genetic Genealogy

Genetic Association

Genome-wide association studies of coffee intake in UK/US participants of European ancestry uncover cohort-specific genetic associations

Hayley Thorpe et al. Sep 11, 2024

Genome-wide Association Study

Obesity

MUSSEL: Enhanced Bayesian Polygenic Risk Prediction Leveraging Information across Multiple Ancestry Groups

Jin Jin et al. Sep 22, 2023

Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using super learner to combine information across different tuning parameter settings and ancestry groups. In our simulation studies and data analyses of 16 traits across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. The method, for example, has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, underlying trait architecture, and the choice of reference samples for LD estimation, and thus ultimately, a combination of methods may be needed to generate the most robust PRS across diverse populations.

Genome-wide Association Study

Trait

Multivariate Statistics

Genetic predisposition to mosaic Y chromosome loss in blood is associated with genomic instability in other tissues and susceptibility to non-haematological cancers

Deborah Thompson et al. May 6, 2020

Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism, yet our knowledge of the causes and consequences of this is limited. Using a newly developed approach, we estimate that 20% of the UK Biobank male population (N=205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes involved in cell-cycle regulation, cancer susceptibility, somatic drivers of tumour growth and cancer therapy targets. Genetic susceptibility to LOY is associated with non-haematological health outcomes in both men and women, supporting the hypothesis that clonal haematopoiesis is a biomarker of genome instability in other tissues. Single-cell RNA sequencing identifies dysregulated autosomal gene expression in leukocytes with LOY, providing insights into how LOY may confer cellular growth advantage. Collectively, these data highlight the utility of studying clonal mosaicism to uncover fundamental mechanisms underlying cancer and other ageing-related diseases.

Biology

Genome Instability

Genetics

Genetic predisposition to mosaic Y chromosome loss in blood is associated with genomic instability in other tissues and susceptibility to non-haematological cancers

D. Nunn et al. Oct 24, 2023

This research has been conducted using the UK Biobank Resource under applications 9905 and 19808. This work was supported by the Medical Research Council [Unit Programme number MC_UU_12015/2]. Full study-specific and individual acknowledgements can be found in the supplementary information.

Biobank

Mosaic

Biology
