ResearchHub | Open Science Community

Analysis of protein-coding genetic variation in 60,706 humans

Olle Melander et al., Aug 1, 2016

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human ‘knockout’ variants in protein-coding genes. Exome sequencing data from 60,706 people of diverse geographic ancestry is presented, providing insight into genetic variation across populations, and illuminating the relationship between DNA variants and human disease. As part of the Exome Aggregation Consortium (ExAC) project, Daniel MacArthur and colleagues report on the generation and analysis of high-quality exome sequencing data from 60,706 individuals of diverse ancestry. This provides the most comprehensive catalogue of human protein-coding genetic variation to date, yielding unprecedented resolution for the analysis of very rare variants across multiple human populations. The catalogue is freely accessible and provides a critical reference panel for the clinical interpretation of genetic variants and the discovery of disease-related genes.

Genetics

Biology


FinnGen provides genetic insights from a well-phenotyped isolated population

Mitja Kurki et al., Jan 18, 2023

Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored [1,2]. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high-impact variants.
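The phenome-wide significance threshold quoted in this abstract is numerically consistent with dividing a genome-wide threshold by the number of diseases tested. The sketch below only checks that arithmetic. The genome-wide threshold of 5 × 10⁻⁸ is the conventional GWAS value and is an assumption here, as is the Bonferroni-style division; the abstract itself does not state how the threshold was derived, and the function name is hypothetical.

```python
# Assumption: conventional genome-wide significance threshold (not stated in the abstract).
GENOME_WIDE_ALPHA = 5e-8
# From the abstract: number of diseases (end points) analysed in the GWAS.
N_ENDPOINTS = 1932


def phenome_wide_threshold(alpha: float, n_tests: int) -> float:
    """Bonferroni-style correction: divide the per-test threshold by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


threshold = phenome_wide_threshold(GENOME_WIDE_ALPHA, N_ENDPOINTS)
print(f"{threshold:.1e}")
```

Under these assumptions the result rounds to the 2.6 × 10⁻¹¹ reported in the abstract.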

Genetics

Biology


A cross-population atlas of genetic associations for 220 human phenotypes

Saori Sakaue et al., Sep 30, 2021

Current genome-wide association studies do not yet capture sufficient diversity in populations and scope of phenotypes. To expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype genome-wide association studies (diseases, biomarkers and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining of electronic medical records. Meta-analyses with the UK Biobank and FinnGen (n_total = 628,000) identified ~5,000 new loci, which improved the resolution of the genomic map of human traits. This atlas elucidated the landscape of pleiotropy as represented by the major histocompatibility complex locus, where we conducted HLA fine-mapping. Finally, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified latent genetic components, which pinpointed responsible variants and biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (for example, allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human diseases through genetics. Genome-wide analyses in BioBank Japan, UK Biobank and FinnGen identify ~5,000 new loci associated with 220 human traits. Statistical decomposition of matrices of phenome-wide summary statistics further highlights variants underpinning diseases across populations.

Genetics

Molecular Biology


Rare loss-of-function variants in SETD1A are associated with schizophrenia and developmental disorders

Tarjinder Singh et al., Mar 14, 2016

The authors analyzed the whole-exome sequences of over 16,000 individuals and found that very rare variants predicted to disrupt the SETD1A gene confer substantial risk for schizophrenia. Damaging variants in SETD1A were also associated with diverse, severe developmental disorders, providing an important genetic link between schizophrenia and other neurodevelopmental disorders. By analyzing the whole-exome sequences of 4,264 schizophrenia cases, 9,343 controls and 1,077 trios, we identified a genome-wide significant association between rare loss-of-function (LoF) variants in SETD1A and risk for schizophrenia (P = 3.3 × 10⁻⁹). We found only two heterozygous LoF variants in 45,376 exomes from individuals without a neuropsychiatric diagnosis, indicating that SETD1A is substantially depleted of LoF variants in the general population. Seven of the ten individuals with schizophrenia carrying SETD1A LoF variants also had learning difficulties. We further identified four SETD1A LoF carriers among 4,281 children with severe developmental disorders and two more carriers in an independent sample of 5,720 Finnish exomes, both with notable neuropsychiatric phenotypes. Together, our observations indicate that LoF variants in SETD1A cause a range of neurodevelopmental disorders, including schizophrenia. Combining these data with previous common variant evidence, we suggest that epigenetic dysregulation, specifically in the histone H3K4 methylation pathway, is an important mechanism in the pathogenesis of schizophrenia.

Genetics

Biology


Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers

Nina Mars et al., Apr 1, 2020

Polygenic risk scores (PRSs) have shown promise in predicting susceptibility to common diseases [1-3]. We estimated their added value in clinical risk prediction of five common diseases, using large-scale biobank data (FinnGen; n = 135,300) and the FINRISK study with clinical risk factors to test genome-wide PRSs for coronary heart disease, type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer. We evaluated the lifetime risk at different PRS levels, and the impact on disease onset and on prediction together with clinical risk scores. Compared to having an average PRS, having a high PRS contributed 21% to 38% higher lifetime risk, and 4 to 9 years earlier disease onset. PRSs improved model discrimination over age and sex in type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer, and over clinical risk in type 2 diabetes, breast cancer and prostate cancer. In all diseases, PRSs improved reclassification over clinical thresholds, with the largest net reclassification improvements for early-onset coronary heart disease, atrial fibrillation and prostate cancer. This study provides evidence for the additional value of PRSs in clinical disease prediction. The practical applications of polygenic risk information for stratified screening or for guiding lifestyle and medical interventions in the clinical setting remain to be defined in further studies.

Genetics

Oncology


An efficient and accurate frailty model approach for genome-wide survival association analysis controlling for population structure and relatedness in large-scale biobanks

Rounak Dey et al., Nov 1, 2020

With decades of electronic health records linked to genetic data, large biobanks provide unprecedented opportunities for systematically understanding the genetics of the natural history of complex diseases. Genome-wide survival association analysis can identify genetic variants associated with ages of onset, disease progression and lifespan. We developed an efficient and accurate frailty (random effects) model approach for genome-wide survival association analysis of censored time-to-event (TTE) phenotypes in large biobanks by accounting for both population structure and relatedness. Our method utilizes state-of-the-art optimization strategies to reduce the computational cost. The saddlepoint approximation is used to allow for analysis of heavily censored phenotypes (>90%) and low-frequency variants (down to minor allele count 20). We demonstrated the performance of our method through extensive simulation studies and analysis of five TTE phenotypes, including lifespan, with heavy censoring rates (90.9% to 99.8%) on ~400,000 UK Biobank participants with white British ancestry and ~180,000 samples in FinnGen, respectively. We further performed genome-wide association analysis for 871 TTE phenotypes in UK Biobank and presented the genome-wide scale phenome-wide association (PheWAS) results with the PheWeb browser.

Genetics

Molecular Biology


The impact of non-additive genetic associations on age-related complex diseases

Marta Guindo-Martínez et al., May 14, 2020

Genome-wide association studies (GWAS) are not fully comprehensive as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implemented an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels, includes the analysis of the X chromosome and non-additive models to test for association. We applied this methodology to 62,281 subjects across 22 age-related diseases and identified 94 genome-wide associated loci, including 26 previously unreported. We observed that 27.6% of the 94 loci would be missed if we only used standard imputation strategies and only tested the additive model. Among the new findings, we identified three novel low-frequency recessive variants with odds ratios larger than 4, which would need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases.
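To make the distinction between additive and non-additive models concrete, here is a minimal sketch of how a biallelic genotype is commonly coded under each inheritance model before being entered into an association test. This is a generic textbook encoding, not GUIDANCE's implementation; the function name and model labels are hypothetical.

```python
def encode_genotype(alt_allele_count: int, model: str) -> int:
    """Code a biallelic genotype (0, 1 or 2 copies of the alternate allele)
    under a given inheritance model."""
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("alt_allele_count must be 0, 1 or 2")
    if model == "additive":
        # Effect scales with the number of alternate alleles.
        return alt_allele_count
    if model == "dominant":
        # One copy is enough to carry the full effect.
        return int(alt_allele_count >= 1)
    if model == "recessive":
        # Only homozygous carriers of the alternate allele are affected.
        return int(alt_allele_count == 2)
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive"):
    print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```

Under the recessive coding, heterozygotes are grouped with non-carriers, which is why a purely recessive effect at a low-frequency variant can be diluted when only the additive coding is tested.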

Genetics

Molecular Biology


Genetic analysis of obstructive sleep apnoea discovers a strong association with cardiometabolic health

Satu Strausz et al., Aug 4, 2020

There is currently only limited understanding of the genetic aetiology of obstructive sleep apnoea (OSA). The aim of our study is to identify genetic loci associated with OSA risk and to test if OSA and its comorbidities share a common genetic background. We conducted the first large-scale genome-wide association study of OSA using the FinnGen Study (217,955 individuals) with 16,761 OSA patients identified using nationwide health registries. We estimated 8.3% [0.06-0.11] heritability and identified five loci associated with OSA (P < 5.0 × 10⁻⁸): rs4837016 near GTPase activating protein and VPS9 domains 1 (GAPVD1), rs10928560 near C-X-C motif chemokine receptor 4 (CXCR4), rs185932673 near Calcium/calmodulin-dependent protein kinase ID (CAMK1D) and rs9937053 near Fat mass and obesity-associated protein (FTO), a variant previously associated with body mass index (BMI). In a BMI-adjusted analysis, an association was observed for rs10507084 near Rhabdomyosarcoma 2 associated transcript (RMST)/NEDD1 gamma-tubulin ring complex targeting factor (NEDD1). We found genetic correlations between OSA and BMI (rg = 0.72 [0.62-0.83]) and with comorbidities including hypertension, type 2 diabetes (T2D), coronary heart disease (CHD), stroke, depression, hypothyroidism, asthma and inflammatory rheumatic diseases (IRD) (rg > 0.30). A polygenic risk score (PRS) for BMI showed 1.98-fold increased OSA risk between the highest and the lowest quintile, and Mendelian randomization supported a causal relationship between BMI and OSA. Our findings support the causal link between obesity and OSA and a joint genetic basis between OSA and comorbidities.

Genetics

Internal Medicine


Haplotype sharing provides insights into fine-scale population history and disease in Finland

Alicia Martin et al., Oct 13, 2017

Finland provides unique opportunities to investigate population and medical genomics because of its adoption of unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assemble a comprehensive view of recent population history (≤100 generations), the timespan during which most rare disease-causing alleles arose, by comparing pairwise haplotype sharing from 43,254 Finns to geographically and linguistically adjacent countries with different population histories, including 16,060 Swedes, Estonians, Russians, and Hungarians. We find much more extensive sharing in Finns, with at least one ≥5 cM tract on average between pairs of unrelated individuals. By coupling haplotype sharing with fine-scale birth records from over 25,000 individuals, we find that while haplotype sharing broadly decays with geographical distance, there are pockets of excess haplotype sharing; individuals from northeast Finland share several-fold more of their genome in identity-by-descent (IBD) segments than individuals from southwest regions containing the major cities of Helsinki and Turku. We estimate recent effective population size changes over time across regions of Finland and find significant differences between the Early and Late Settlement Regions as expected; however, our results indicate more continuous gene flow than previously indicated as Finns migrated towards the northernmost Lapland region. Lastly, we show that haplotype sharing is locally enriched among pairs of individuals sharing rare alleles by an order of magnitude, especially among pairs sharing rare disease-causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and gain insight into the evolutionary origins of rare variants contributing to disease.

Genetics

History


Contribution of rare and common variants to intellectual disability in a high-risk population sub-isolate of Northern Finland

Mitja Kurki et al., May 28, 2018

The contribution of de novo and ultra-rare genetic variants in severe and moderate intellectual disability (ID) has been extensively studied, whereas the genetic architecture of mild ID has been less well characterized. To elucidate the genetic background of milder ID we studied a regional cohort of 442 ID patients enriched for mild ID (>50%) from a population isolate of Finland. We analyzed rare variants using exome sequencing and CNV genotyping and common variants using common variant polygenic risk scores. As controls we used a Finnish collection of exome-sequenced (n = 11,311) and GWAS chip genotyped (n = 11,699) individuals. We show that rare damaging variants in genes known to be associated with cognitive defects are observed more often in severe (27%) than in mild ID (13%) patients (p-value: 7.0e-4). We further observed a significant enrichment of protein-truncating variants in loss-of-function intolerant genes, as well as damaging missense variants in genes not yet associated with cognitive defects (OR: 2.1, p-value: 3e-8). For the first time to our knowledge, we show that a common variant polygenic load significantly contributes to all severity forms of ID. The heritability explained was the highest for educational attainment (EDU) in mild ID, explaining 2.2% of the heritability on the liability scale. For more severe ID it was lower at 0.6%. Finally, we identified a homozygous variant in the CRADD gene to be a cause of a specific syndrome with ID and pachygyria. The frequency of this variant is 50x higher in the Finnish population than in non-Finnish Europeans, demonstrating the benefits of utilizing population isolates in rare variant analysis of diseases under negative selection.

Genetics

Biology
