ResearchHub | Open Science Community

Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study

Alanna Morrison et al., Dec 22, 2016

Unleashing the power of precision medicine
Precision medicine promises the ability to identify risks and treat patients on the basis of pathogenic genetic variation. Two studies combined exome sequencing results for over 50,000 people with their electronic health records. Dewey et al. found that ∼3.5% of individuals in their cohort had clinically actionable genetic variants. Many of these variants affected blood lipid levels that could influence cardiovascular health. Abul-Husn et al. extended these findings to investigate the genetics and treatment of familial hypercholesterolemia, a risk factor for cardiovascular disease, within their patient pool. Genetic screening helped identify at-risk patients who could benefit from increased treatment. Science, this issue p. 10.1126/science.aaf6814, p. 10.1126/science.aaf7000

Genetics

Cancer Research

Exome sequencing and characterization of 49,960 individuals in the UK Biobank

Cristopher Hout et al., Oct 21, 2020

Abstract: The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world [1]. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.

Genetics

Biology

Genotyping, sequencing and analysis of 140,000 adults from the Mexico City Prospective Study

Andrey Ziyatdinov et al., Jun 29, 2022

Abstract: The Mexico City Prospective Study (MCPS) is a prospective cohort of over 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City. We generated genotype and exome sequencing data for all individuals, and whole genome sequencing for 10,000 selected individuals. We uncovered high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Native American, European and African ancestry, with extensive admixture from indigenous groups in Central, Southern and South Eastern Mexico. Native Mexican segments of the genome had lower levels of coding variation, but an excess of homozygous loss of function variants compared with segments of African and European origin. We estimated population specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Native Mexico at exome variants, all available via a public browser. Using whole genome sequencing, we developed an imputation reference panel which outperforms existing panels at common variants in individuals with high proportions of Central, South and South Eastern Native Mexican ancestry. Our work illustrates the value of genetic studies in populations with diverse ancestry and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States where the Hispanic/Latino population is predominantly of Mexican descent.

Genetics

Ecology

A deep catalog of protein-coding variation in 985,830 individuals

Kathie Sun et al., May 10, 2023

Abstract: Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.

Genetics

Cancer Research

Mutation spectrum of NOD2 reveals recessive inheritance as a main driver of Early Onset Crohn’s Disease

Julie Horowitz et al., Jan 6, 2017

Abstract: Inflammatory bowel disease (IBD), clinically defined as Crohn’s disease (CD), ulcerative colitis (UC), or IBD-unclassified, results in chronic inflammation of the gastrointestinal tract in genetically susceptible hosts. Pediatric onset IBD represents ≥25% of all IBD diagnoses and often presents with intestinal stricturing, perianal disease, and failed response to conventional treatments. NOD2 was the first and is the most replicated locus associated with adult IBD, to date. To determine the role of NOD2 and other genes in pediatric IBD, we performed whole-exome sequencing on a cohort of 1,183 patients with pediatric onset IBD (ages 0-18.5 years). We identified 92 probands who were homozygous or compound heterozygous for rare and low frequency NOD2 variants accounting for approximately 8% of our cohort, suggesting a Mendelian recessive inheritance pattern of disease. Additionally, we investigated the contribution of recessive inheritance of NOD2 alleles in adult IBD patients from the Regeneron Genetics Center (RGC)-Geisinger Health System DiscovEHR study, which links whole exome sequences to longitudinal electronic health records (EHRs) from 51,289 participants. We found that ~7% of cases in this adult IBD cohort, including ~10% of CD cases, can be attributed to recessive inheritance of NOD2 variants, confirming the observations from our pediatric IBD cohort. Exploration of EHR data showed that 14% of these adult IBD patients obtained their initial IBD diagnosis before 18 years of age, consistent with early onset disease. Collectively, our findings show that recessive inheritance of rare and low frequency deleterious NOD2 variants account for 7-10% of CD cases and implicate NOD2 as a Mendelian disease gene for early onset Crohn’s Disease.
Author Summary: Pediatric onset inflammatory bowel disease (IBD) represents ≥25% of IBD diagnoses; yet the genetic architecture of early onset IBD remains largely uncharacterized. To investigate this, we performed whole-exome sequencing and rare variant analysis on a cohort of 1,183 pediatric onset IBD patients. We found that 8% of patients in our cohort were homozygous or compound heterozygous for rare or low frequency deleterious variants in the nucleotide binding and oligomerization domain containing 2 (NOD2) gene. Further investigation of whole-exome sequencing of a large clinical cohort of adult IBD patients uncovered recessive inheritance of rare and low frequency NOD2 variants in 7% of cases and that the relative risk for NOD2 variant homozygosity has likely been underestimated. While it has been reported that having >1 NOD2 risk alleles is associated with increased susceptibility to Crohn’s Disease (CD), our data formally demonstrate what has long been suspected: recessive inheritance of NOD2 alleles is a mechanistic driver of early onset IBD, specifically CD, likely due to loss of NOD2 protein function. Our data suggest that a subset of IBD-CD patients with early disease onset is characterized by recessive inheritance of NOD2 alleles, which has important implications for the screening, diagnosis, and treatment of IBD.

Genetics

Internal Medicine

Rare and Common Genetic Variation Underlying Atrial Fibrillation Risk

Oliver Vad et al., Jun 26, 2024

Importance: Atrial fibrillation (AF) has a substantial genetic component. The importance of polygenic risk is well established, while the contribution of rare variants to disease risk warrants characterization in large cohorts.
Objective: To identify rare predicted loss-of-function (pLOF) variants associated with AF and elucidate their role in risk of AF, cardiomyopathy (CM), and heart failure (HF) in combination with a polygenic risk score (PRS).
Design, Setting, and Participants: This was a genetic association and nested case-control study. The impact of rare pLOF variants was evaluated on the risk of incident AF. HF and CM were assessed in cause-specific Cox regressions. End of follow-up was July 1, 2022. Data were analyzed from January to October 2023. The UK Biobank enrolled 502 480 individuals aged 40 to 69 years at inclusion in the United Kingdom between March 13, 2006, and October 1, 2010. UK residents of European ancestry were included. Individuals with prior diagnosis of AF were excluded from analyses of incident AF.
Exposures: Rare pLOF variants and an AF PRS.
Main Outcomes and Measures: Risk of AF and incident HF or CM prior to and subsequent to AF diagnosis.
Results: A total of 403 990 individuals (218 489 [54.1%] female) with a median (IQR) age of 58 (51-63) years were included; 24 447 were diagnosed with incident AF over a median (IQR) follow-up period of 13.3 (12.4-14.0) years. Rare pLOF variants in 6 genes (TTN, RPL3L, PKP2, CTNNA3, KDM5B, and C10orf71) were associated with AF. Of these, TTN, RPL3L, PKP2, CTNNA3, and KDM5B replicated in an external cohort. Combined with high PRS, rare pLOF variants conferred an odds ratio of 7.08 (95% CI, 6.03-8.28) for AF. Carriers with high PRS also had a substantial 10-year risk of AF (16% in female individuals and 24% in male individuals older than 60 years). Rare pLOF variants were associated with increased risk of CM both prior to AF (hazard ratio [HR], 3.13; 95% CI, 2.24-4.36) and subsequent to AF (HR, 2.98; 95% CI, 1.89-4.69).
Conclusions and Relevance: Rare and common genetic variation were associated with an increased risk of AF. The findings provide insights into the genetic underpinnings of AF and may aid in future genetic risk stratification.

Internal Medicine

Cardiology And Cardiovascular Medicine

KaryoScan: abnormal karyotype detection from whole-exome sequence

Evan Maxwell et al., Oct 17, 2017

Abstract
Motivation: Detection of abnormal karyotypes from whole-exome sequencing has significant clinical potential, enabling a primary screen for chromosomal anomalies among samples undergoing short-read sequencing for nucleotide resolution genomic characterization.
Results: We present KaryoScan, a high-throughput method for detecting chromosomal anomalies within large cohort exome sequencing studies. We detect and validate autosomal and sex chromosomal aneuploidies in a large exome sequencing cohort, and demonstrate detection of smaller and complex events (partial chromosome, mosaic, copy neutral, and complex rearrangements), representing the range of anomalies that can be uncovered from the exome.
Availability: https://github.com/rgcgithub/karyoscan

Genetics

Cancer Research

Profiling copy number variation and disease associations from 50,726 DiscovEHR Study exomes

Evan Maxwell et al., Mar 22, 2017

Copy number variants (CNVs) are a substantial source of genomic variation and contribute to a wide range of human disorders. Gene-disrupting exonic CNVs have important clinical implications as they can underlie variability in disease presentation and susceptibility. The relationship between exonic CNVs and clinical traits has not been broadly explored at the population level, primarily due to technical challenges. We surveyed common and rare CNVs in the exome sequences of 50,726 adult DiscovEHR study participants with linked electronic health records (EHRs). We evaluated the diagnostic yield and clinical expressivity of known pathogenic CNVs, and performed tests of association with EHR-derived serum lipids, thereby evaluating the relationship between CNVs and complex traits and phenotypes in an unbiased, real-world clinical context. We identified CNVs from megabase to exon-level resolution, demonstrating reliable, high-throughput detection of clinically relevant exonic CNVs. In doing so, we created a catalog of high-confidence common and rare CNVs and refined population frequency estimates of known and novel gene-disrupting CNVs. Our survey among an unselected clinical population provides further evidence that neuropathy-associated duplications and deletions in 17p12 have similar population prevalence but are clinically under-diagnosed. Similarly, adults who harbor 22q11.2 deletions frequently had EHR documentation of neurodevelopmental/neuropsychiatric disorders and congenital anomalies, but not a formal genetic diagnosis (i.e., deletion). In an exome-wide association study of lipid levels, we identified a novel five-exon duplication within LDLR segregating in a large kindred with features of familial hypercholesterolemia. Exonic CNVs provide new opportunities to understand and diagnose human disease.

Genetics

Molecular Biology

Validating gene-phenotype associations using relationships in the UMLS

Andrew Blumenfeld et al., Jul 30, 2020

Abstract
Objective: Large scale next-generation sequencing of population cohorts paired with patients’ electronic health records (EHR) provides an excellent resource for the study of gene-disease associations. To validate those associations, researchers often consult databases that identify relationships between genes of interest and relevant disease phenotypes, which we refer to as simply “phenotypes”. However, most of these databases contain phenotypes that are not suited for automated analysis of EHR data, which often capture these phenotypes in the form of International Classification of Diseases (ICD) codes. There is a need for a resource that comprehensively provides gene-phenotype mappings in a format that can be used to evaluate phenotypes from EHR.
Methods: We built a directed graph database of genes, medical concepts and ICD codes based on a subset of the National Library of Medicine’s Unified Medical Language System (UMLS) and other resources. To obtain associations between genes and ICD codes, we traversed the defined relationships from gene, variant and disease concepts to ICD codes, resulting in a set of mappings that link specific genes and variants to these ICD codes.
Results: Our method created 249,764 mappings between genes and ICD codes, including 27,226 “disease” phenotypes and 222,538 “symptom” phenotypes, and provided mappings for 4,456 unique genes. Paths were validated by manual review of a diverse sample of paths. In a cohort of 92,455 samples, we used these mappings to validate gene-phenotype associations in 32,786 samples where a person had a potentially disease-causing genetic mutation and at least one corresponding diagnosis in their EHR.
Conclusion: The concepts and relationships in the UMLS can be used to generate gene-ICD phenotype mappings that are not explicit in the source vocabularies. We were able to use these mappings to validate gene-disease associations in a large cohort of sequenced exomes paired with EHR.
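Illustrative sketch (not from the paper): the traversal described in the Methods, walking a directed graph from a gene node through variant and disease concepts until ICD-code nodes are reached, might look roughly like the breadth-first search below. The graph contents, the node-naming scheme (`gene:`, `variant:`, `concept:`, `icd:` prefixes), the function name `icd_codes_for`, and the `max_depth` cutoff are all assumptions made for this example; the authors' actual graph database and relationship types are not described in enough detail here to reproduce.

```python
from collections import deque

# Hypothetical toy graph: each node maps to the nodes it points to.
# Every identifier below is invented for illustration only.
EDGES = {
    "gene:GENE_A": ["variant:GENE_A_v1", "concept:disease_X"],
    "variant:GENE_A_v1": ["concept:disease_Y"],
    "concept:disease_X": ["icd:CODE_1"],
    "concept:disease_Y": ["icd:CODE_2", "concept:disease_X"],
    "gene:GENE_B": ["concept:disease_Y"],
}


def icd_codes_for(gene, edges, max_depth=4):
    """Breadth-first walk from a gene node, collecting reachable ICD nodes."""
    seen = {gene}
    queue = deque([(gene, 0)])
    found = set()
    while queue:
        node, depth = queue.popleft()
        if node.startswith("icd:"):
            found.add(node)  # ICD nodes are endpoints; do not expand them
            continue
        if depth == max_depth:
            continue  # assumed cutoff to keep paths short
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return sorted(found)


# One gene -> ICD mapping list per gene node in the toy graph.
mappings = {g: icd_codes_for(g, EDGES) for g in EDGES if g.startswith("gene:")}
```

In this toy graph both genes reach both ICD nodes, one of them only indirectly through a second disease concept, which is the kind of mapping the abstract describes as not explicit in the source vocabularies.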

Genetics

Molecular Biology

Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank

Cristopher Hout et al., Mar 9, 2019

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data includes 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.

Genetics

Biology
