ResearchHub | Open Science Community

Improving genetic diagnosis in Mendelian disease with transcriptome sequencing

Beryl Cummings et al., Sep 8, 2016

Abstract: Exome and whole-genome sequencing are becoming increasingly routine approaches in Mendelian disease diagnosis. Despite their success, the current diagnostic rate for genomic analyses across a variety of rare diseases is approximately 25-50%. Here, we explore the utility of transcriptome sequencing (RNA-seq) as a complementary diagnostic tool in a cohort of 50 patients with genetically undiagnosed rare muscle disorders. We describe an integrated approach to analyze patient muscle RNA-seq, leveraging an analysis framework focused on the detection of transcript-level changes that are unique to the patient compared to over 180 control skeletal muscle samples. We demonstrate the power of RNA-seq to validate candidate splice-disrupting mutations and to identify splice-altering variants in both exonic and deep intronic regions, yielding an overall diagnosis rate of 35%. We also report the discovery of a highly recurrent de novo intronic mutation in COL6A1 that results in a dominantly acting splice-gain event, disrupting the critical glycine repeat motif of the triple helical domain. We identify this pathogenic variant in a total of 27 genetically unsolved patients in an external collagen VI-like dystrophy cohort, thus explaining approximately 25% of patients clinically suggestive of collagen VI dystrophy in whom prior genetic analysis is negative. Overall, this study represents a large systematic application of transcriptome sequencing to rare disease diagnosis and highlights its utility for the detection and interpretation of variants missed by current standard diagnostic approaches.

One Sentence Summary: Transcriptome sequencing improves the diagnostic rate for Mendelian disease in patients for whom genetic analysis has not returned a diagnosis.

Genetics

Molecular Biology

Nuclear genetic control of mtDNA copy number and heteroplasmy in humans

Rahul Gupta et al., Aug 16, 2023

Abstract: Mitochondrial DNA (mtDNA) is a maternally inherited, high-copy-number genome required for oxidative phosphorylation (1). Heteroplasmy refers to the presence of a mixture of mtDNA alleles in an individual and has been associated with disease and ageing. Mechanisms underlying common variation in human heteroplasmy, and the influence of the nuclear genome on this variation, remain insufficiently explored. Here we quantify mtDNA copy number (mtCN) and heteroplasmy using blood-derived whole-genome sequences from 274,832 individuals and perform genome-wide association studies to identify associated nuclear loci. Following blood cell composition correction, we find that mtCN declines linearly with age and is associated with variants at 92 nuclear loci. We observe that nearly everyone harbours heteroplasmic mtDNA variants obeying two principles: (1) heteroplasmic single nucleotide variants tend to arise somatically and accumulate sharply after the age of 70 years, whereas (2) heteroplasmic indels are maternally inherited as mixtures with relative levels associated with 42 nuclear loci involved in mtDNA replication, maintenance and novel pathways. These loci may act by conferring a replicative advantage to certain mtDNA alleles. As an illustrative example, we identify a length variant carried by more than 50% of humans at position chrM:302 within a G-quadruplex previously proposed to mediate mtDNA transcription/replication switching (2,3). We find that this variant exerts cis-acting genetic control over mtDNA abundance and is itself associated in trans with nuclear loci encoding machinery for this regulatory switch. Our study suggests that common variation in the nuclear genome can shape variation in mtCN and heteroplasmy dynamics across the human population.

Genetics

Molecular Biology

Transcript expression-aware annotation improves rare variant discovery and interpretation

Beryl Cummings et al., Feb 19, 2019

Abstract: The acceleration of DNA sequencing in patients and population samples has resulted in unprecedented catalogues of human genetic variation, but the interpretation of rare genetic variants discovered using such technologies remains extremely challenging. A striking example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Through manual curation of putative loss of function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) (1), we show that one explanation for this paradox involves alternative mRNA splicing, which allows exons of a gene to be expressed at varying levels across cell types. Currently, no existing annotation tool systematically incorporates this exon expression information into variant interpretation. Here, we develop a transcript-level annotation metric, the proportion expressed across transcripts (pext), which summarizes isoform quantifications for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project (2) and show that it clearly differentiates between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder (ASD) and developmental disorders and intellectual disability (DD/ID) to show that pLoF variants in weakly expressed regions have effect sizes similar to those of synonymous variants, while pLoF variants in highly expressed exons are most strongly enriched among cases versus controls. Our annotation is fast, flexible, and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for rare disease diagnosis, rare variant burden analyses in complex disorders, and curation and prioritization of variants in recall-by-genotype studies.
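As a rough illustration of what a proportion-expressed score could look like for one variant in one tissue, the sketch below divides the summed expression of the transcripts on which the variant carries a given annotation by the summed expression of all of the gene's transcripts. This is a simplification under stated assumptions, not the authors' implementation: the function name `pext_single_tissue`, the transcript IDs, and the TPM values are hypothetical, and the abstract does not say how values are normalized or combined across tissues.

```python
from typing import Dict, Iterable


def pext_single_tissue(transcript_tpm: Dict[str, float],
                       annotated_transcripts: Iterable[str]) -> float:
    """Fraction of a gene's expression carried by the annotated transcripts.

    transcript_tpm: expression (e.g. TPM) of every transcript of the gene
        in one tissue (hypothetical input layout).
    annotated_transcripts: transcripts on which the variant has the
        annotation of interest (for example, a pLoF consequence).
    """
    total = sum(transcript_tpm.values())
    if total == 0:
        # Gene not expressed in this tissue: the ratio is undefined.
        return float("nan")
    # Use a set so a transcript listed twice is not counted twice.
    hit = sum(transcript_tpm.get(t, 0.0) for t in set(annotated_transcripts))
    return hit / total


# Hypothetical gene with three transcripts; the variant is pLoF only on tx_B.
tpm = {"tx_A": 30.0, "tx_B": 10.0, "tx_C": 0.0}
print(pext_single_tissue(tpm, ["tx_B"]))  # 0.25
```

A low value indicates that the variant falls mainly on weakly expressed isoforms, which is the situation the abstract describes for falsely annotated pLoF variants.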

Genetics

Molecular Biology

GATK-gCNV: A Rare Copy Number Variant Discovery Algorithm and Its Application to Exome Sequencing in the UK Biobank

Mehrtash Babadi et al., Aug 26, 2022

SUMMARY: Copy number variants (CNVs) are major contributors to genetic diversity and disease. To date, exome sequencing (ES) has been generated for millions of individuals in international biobanks, human disease studies, and clinical diagnostic screening. While standardized methods exist for detecting short variants (single nucleotide and insertion/deletion variants) using tools such as the Genome Analysis ToolKit (GATK), technical challenges have confounded similarly uniform large-scale CNV analyses from ES data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, the lack of widely adopted and robustly benchmarked rare CNV discovery tools has presented a barrier to routine exome-wide assessment of this critical class of variation. Here, we introduce GATK-gCNV, a flexible algorithm to discover rare CNVs from genome sequencing read-depth information, which we distribute as an open-source tool packaged in GATK. GATK-gCNV uses a probabilistic model and inference framework that accounts for technical biases while simultaneously predicting CNVs, which enables self-consistency between technical read-depth normalization and variant calling. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data. These analyses demonstrated 97% recall of rare (≤1% site frequency) coding CNVs detected by microarrays and 95% recall of rare coding CNVs discovered by genome sequencing at a resolution of more than two exons. We applied GATK-gCNV to generate a reference catalog of rare coding CNVs in 197,306 individuals with ES from the UK Biobank. We observed strong correlations between CNV rates per gene and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in ES, which can easily be applied across trait association and clinical screening.

Genetics

Molecular Biology

Characterising the loss-of-function impact of 5’ untranslated region variants in whole genome sequence data from 15,708 individuals

Leif Groop et al., Feb 7, 2019

Abstract: Upstream open reading frames (uORFs) are important tissue-specific cis-regulators of protein translation. Although isolated case reports have shown that variants that create or disrupt uORFs can cause disease, genetic sequencing approaches typically focus on protein-coding regions and ignore these variants. Here, we describe a systematic genome-wide study of variants that create and disrupt human uORFs, and explore their role in human disease using 15,708 whole genome sequences collected by the Genome Aggregation Database (gnomAD) project. We show that 14,897 variants that create new start codons upstream of the canonical coding sequence (CDS), and 2,406 variants disrupting the stop site of existing uORFs, are under strong negative selection. Furthermore, variants creating uORFs that overlap the CDS show signals of selection equivalent to coding loss-of-function variants, and uORF-perturbing variants are under strong selection when arising upstream of known disease genes and genes intolerant to loss-of-function variants. Finally, we identify specific genes where perturbation of uORFs is likely to represent an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in families with neurofibromatosis. Our results highlight uORF-perturbing variants as an important and under-recognised functional class that can contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data to study the deleteriousness of specific classes of non-coding variants.

Genetics

Oncology

Tractor: A framework allowing for improved inclusion of admixed individuals in large-scale association studies

Elizabeth Atkinson et al., May 19, 2020

Abstract: Admixed populations are routinely excluded from medical genomic studies due to concerns over population structure. Here, we present a statistical framework and software package, Tractor, to facilitate the inclusion of admixed individuals in association studies by leveraging local ancestry. We test Tractor with simulations and empirical data focused on admixed African-European individuals. Tractor generates ancestry-specific effect size estimates, can boost GWAS power, and improves the resolution of association signals. Using a local ancestry-aware regression model, we replicate known hits for blood lipids in admixed populations, discover novel hits missed by standard GWAS procedures, and localize signals closer to putative causal variants.

Genetics

Software

Biological insights from the whole genome analysis of human embryonic stem cells

Florian Merkle et al., Oct 26, 2020

ABSTRACT: There has not yet been a systematic analysis of hESC whole genomes at a single nucleotide resolution. We therefore performed whole genome sequencing (WGS) of 143 hESC lines and annotated their single nucleotide and structural genetic variants. We found that while a substantial fraction of hESC lines contained large deleterious structural variants, finer scale structural and single nucleotide variants (SNVs) that are ascertainable only through WGS analyses were present in hESC genomes and human blood-derived genomes at similar frequencies. However, WGS did identify SNVs associated with cancer or other diseases that will likely alter cellular phenotypes and may compromise the safety of hESC-derived cellular products transplanted into humans. As a resource to enable reproducible hESC research and safer translation, we provide a user-friendly WGS data portal and a data-driven scheme for cell line maintenance and selection.

IN BRIEF: Merkle and Ghosh et al. describe insights from the whole genome sequences of commonly used human embryonic stem cell (hESC) lines. Analyses of these sequences show that while hESC genomes had more large structural variants than humans do from genetic inheritance, hESCs did not have an observable excess of finer-scale variants. However, many hESC lines contained rare loss-of-function variants and combinations of common variants that may profoundly shape their biological phenotypes. Thus, genome sequencing data can be valuable to those selecting cell lines for a given biological or clinical application, and the sequences and analysis reported here should facilitate such choices.

HIGHLIGHTS: One third of hESCs we analysed are siblings, and almost all are of European ancestry. Large structural variants are common in hESCs, but finer-scale variation is similar to that in human populations. Many strong-effect loss-of-function mutations and cancer-associated mutations are present in specific hESC lines. We provide user-friendly resources for rational hESC line selection based on genome sequence.

Genetics

Molecular Biology

Using high-resolution variant frequencies to empower clinical genome interpretation

Nicola Whiffin et al., Sep 2, 2016

ABSTRACT: Whole exome and genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognised as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants. Here we present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets. Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, and identifies 43 variants previously reported as pathogenic that can now be reclassified. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
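To make the ingredients listed in the abstract concrete, here is a minimal sketch of how a frequency cutoff of this kind might be computed for a dominant disorder. The abstract does not give the formula, so the expression used here (prevalence times genetic and allelic contribution, divided by penetrance and by two chromosomes per person) is an assumed formulation, and the Poisson step is one assumed way to allow for sampling variance. The function names and all input numbers are hypothetical illustrations, not values taken from the paper.

```python
import math


def max_credible_af_dominant(prevalence: float,
                             allelic_contribution: float,
                             penetrance: float,
                             genetic_contribution: float = 1.0) -> float:
    """Assumed upper bound on the population allele frequency of a single
    causal variant for a dominant disorder."""
    return (prevalence * genetic_contribution * allelic_contribution
            / penetrance / 2.0)


def poisson_upper_count(mean: float, confidence: float = 0.95) -> int:
    """Smallest allele count k whose Poisson CDF reaches `confidence`.

    Counts above k in the reference sample would be hard to reconcile
    with a true allele frequency at or below the cutoff.
    """
    k = 0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    while cdf < confidence:
        k += 1
        term *= mean / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return k


# Hypothetical inputs: prevalence 1 in 500, no single variant explaining
# more than 2% of cases, penetrance 50%, 120,000 reference chromosomes.
cutoff_af = max_credible_af_dominant(1 / 500, 0.02, 0.5)
max_count = poisson_upper_count(cutoff_af * 120_000)
print(cutoff_af, max_count)
```

A candidate variant observed more often than `max_count` in the reference dataset would be deprioritized under these assumptions; lower penetrance or higher prevalence loosens the cutoff.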

Genetics

Molecular Biology

Haplotype sharing provides insights into fine-scale population history and disease in Finland

Alicia Martin et al., Oct 13, 2017

Abstract: Finland provides unique opportunities to investigate population and medical genomics because of its adoption of unified national electronic health records, detailed historical and birth records, and serial population bottlenecks. We assemble a comprehensive view of recent population history (≤100 generations), the timespan during which most rare disease-causing alleles arose, by comparing pairwise haplotype sharing from 43,254 Finns to geographically and linguistically adjacent countries with different population histories, including 16,060 Swedes, Estonians, Russians, and Hungarians. We find much more extensive sharing in Finns, with at least one ≥5 cM tract on average between pairs of unrelated individuals. By coupling haplotype sharing with fine-scale birth records from over 25,000 individuals, we find that while haplotype sharing broadly decays with geographical distance, there are pockets of excess haplotype sharing; individuals from northeast Finland share several-fold more of their genome in identity-by-descent (IBD) segments than individuals from southwest regions containing the major cities of Helsinki and Turku. We estimate recent effective population size changes over time across regions of Finland and find significant differences between the Early and Late Settlement Regions as expected; however, our results indicate more continuous gene flow than previously indicated as Finns migrated towards the northernmost Lapland region. Lastly, we show that haplotype sharing is locally enriched among pairs of individuals sharing rare alleles by an order of magnitude, especially among pairs sharing rare disease-causing variants. Our work provides a general framework for using haplotype sharing to reconstruct an integrative view of recent population history and gain insight into the evolutionary origins of rare variants contributing to disease.

Genetics

History

Human genetic analyses of organelles highlight the nucleus in age-related trait heritability

Rahul Gupta et al., Jan 22, 2021

Abstract: Most age-related human diseases are accompanied by a decline in cellular organelle integrity, including impaired lysosomal proteostasis and defective mitochondrial oxidative phosphorylation. An open question, however, is the degree to which inherited variation in or near genes encoding each organelle contributes to age-related disease pathogenesis. Here, we evaluate if genetic loci encoding organelle proteomes confer greater-than-expected age-related disease risk. As mitochondrial dysfunction is a “hallmark” of aging, we begin by assessing nuclear and mitochondrial DNA loci near genes encoding the mitochondrial proteome and surprisingly observe a lack of enrichment across 24 age-related traits. Within nine other organelles, we find no enrichment with one exception: the nucleus, where enrichment emanates from nuclear transcription factors. In agreement, we find that genes encoding several organelles tend to be “haplosufficient,” while we observe strong purifying selection against heterozygous protein-truncating variants impacting the nucleus. Our work identifies common variation near transcription factors as having outsize influence on age-related trait risk, motivating future efforts to determine if and how this inherited variation then contributes to observed age-related organelle deterioration.

Genetics

Epidemiology
