ResearchHub | Open Science Community

Inherited Causes of Clonal Hematopoiesis of Indeterminate Potential in TOPMed Whole Genomes

Alexander Bick et al., Sep 27, 2019

ABSTRACT Age is the dominant risk factor for most chronic human diseases, yet the mechanisms by which aging confers this risk are largely unknown.[1] Recently, the age-related acquisition of somatic mutations in regenerating hematopoietic stem cell populations was associated with both hematologic cancer incidence[2–4] and coronary heart disease prevalence.[5] Somatic mutations with leukemogenic potential may confer selective cellular advantages that lead to clonal expansion, a phenomenon termed ‘Clonal Hematopoiesis of Indeterminate Potential’ (CHIP).[6] Simultaneous germline and somatic whole genome sequence analysis now provides the opportunity to identify root causes of CHIP. Here, we analyze high-coverage whole genome sequences from 97,691 participants of diverse ancestries in the NHLBI TOPMed program and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid, and inflammatory traits that are specific to different CHIP genes. Association testing of a genome-wide set of germline genetic variants identified three genetic loci associated with CHIP status, including one locus at TET2 that was specific to African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus identified a causal variant that disrupts a TET2 distal enhancer. Aggregates of rare germline loss-of-function variants in CHEK2, a DNA damage repair gene, predisposed to CHIP acquisition. Overall, we observe that germline genetic variation altering hematopoietic stem cell function and the fidelity of DNA damage repair increases the likelihood of somatic mutations leading to CHIP.
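CHIP status is conventionally assigned when a somatic mutation in a recognized leukemia-associated driver gene is detected at a variant allele fraction (VAF) of at least 2%. The abstract does not state the exact calling criteria used in TOPMed, so the following is only a minimal sketch under that conventional definition. The driver-gene set is an illustrative subset, and the name `has_chip` is hypothetical.

```python
# Minimal sketch: flag an individual as a CHIP carrier from somatic variant calls.
# Assumptions (not taken from the paper): the conventional VAF >= 2% threshold and
# an illustrative subset of CHIP driver genes. Real pipelines also filter on
# variant type, read depth, and sequencing-artifact or germline status.

CHIP_DRIVER_GENES = {"DNMT3A", "TET2", "ASXL1", "JAK2", "PPM1D", "TP53", "SF3B1", "SRSF2"}
MIN_VAF = 0.02


def has_chip(somatic_calls: list[tuple[str, float]]) -> bool:
    """Return True if any (gene, vaf) call hits a driver gene at or above MIN_VAF."""
    return any(gene in CHIP_DRIVER_GENES and vaf >= MIN_VAF for gene, vaf in somatic_calls)


print(has_chip([("TET2", 0.05)]))   # driver gene, VAF above threshold
print(has_chip([("TET2", 0.01)]))   # driver gene, VAF below threshold
print(has_chip([("BRCA2", 0.30)]))  # gene not in the illustrative driver set
```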

Genetics

Molecular Biology

A Saturated Map of Common Genetic Variants Associated with Human Height from 5.4 Million Individuals of Diverse Ancestries

Loïc Yengo et al., Jan 10, 2022

ABSTRACT Common SNPs are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes. Here we show, using GWAS data from 5.4 million individuals of diverse ancestries, that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a median size of ~90 kb, covering ~21% of the genome. The density of independent associations varies across the genome and the regions of elevated density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs account for 40% of phenotypic variance in European ancestry populations but only ~10%-20% in other ancestries. Effect sizes, associated regions, and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely explained by linkage disequilibrium and allele frequency differences within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than needed to implicate causal genes and variants. Overall, this study, the largest GWAS to date, provides an unprecedented saturated map of specific genomic regions containing the vast majority of common height-associated variants.
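The abstract reports out-of-sample "variance explained" by the 12,111 SNPs. One standard way to measure this is the squared correlation (R²) between a polygenic score and the phenotype in held-out individuals. The following is a minimal sketch of that calculation, assuming the score has already been computed; it is not necessarily the estimator used in the study (many studies report incremental R² after adjusting for covariates such as age, sex, and principal components).

```python
# Minimal sketch: out-of-sample variance explained by a polygenic score,
# computed as the squared Pearson correlation between score and phenotype
# in held-out individuals. Assumes no covariate adjustment is needed.
import math


def variance_explained(score: list[float], phenotype: list[float]) -> float:
    n = len(score)
    if n != len(phenotype) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    mean_s = sum(score) / n
    mean_p = sum(phenotype) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(score, phenotype))
    var_s = sum((s - mean_s) ** 2 for s in score)
    var_p = sum((p - mean_p) ** 2 for p in phenotype)
    r = cov / math.sqrt(var_s * var_p)
    return r * r


# Toy held-out data (hypothetical): correlation 0.8, so R^2 is about 0.64.
print(variance_explained([1, 2, 3, 4], [1, 3, 2, 4]))
```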

Genetics

Biology

Clonal hematopoiesis is driven by aberrant activation of TCL1A

Joshua Weinstock et al., Dec 13, 2021

Abstract A diverse set of driver genes, such as regulators of DNA methylation, RNA splicing, and chromatin remodeling, has been associated with pre-malignant clonal expansion of hematopoietic stem cells (HSCs). The factors mediating expansion of these mutant clones remain largely unknown, partly due to a paucity of large cohorts with longitudinal blood sampling. To circumvent this limitation, we developed and validated PACER (passenger-approximated clonal expansion rate), a method to infer clonal expansion rate from single-timepoint data. Applying PACER to 5,071 persons with clonal hematopoiesis accurately recapitulated the known fitness effects of different driver mutations. A genome-wide association study of PACER revealed that a common inherited polymorphism in the TCL1A promoter was associated with slower clonal expansion. Those carrying two copies of this protective allele had up to 80% reduced odds of having driver mutations in TET2, ASXL1, SF3B1, SRSF2, and JAK2, but not DNMT3A. TCL1A was not expressed in normal or DNMT3A-mutated HSCs, but introducing mutations in TET2 or ASXL1 by CRISPR editing led to aberrant expression of TCL1A and expansion of HSCs in vitro. These effects were abrogated in HSCs from donors carrying the protective TCL1A allele. Our results indicate that the fitness advantage of multiple common driver genes in clonal hematopoiesis is mediated through TCL1A activation. PACER is an approach that can be widely applied to uncover genetic and environmental determinants of pre-malignant clonal expansion in blood and other tissues.
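"Up to 80% reduced odds" corresponds to an odds ratio (OR) of about 0.2. The sketch below computes an OR and a Woolf (log-scale) 95% confidence interval from a 2×2 table. The counts are hypothetical, and the study's estimates would come from regression models with covariates rather than a raw table.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Odds ratio with a Woolf 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - 1.96 * se_log)
    hi = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lo, hi


# Hypothetical counts: "exposed" = two copies of the protective allele,
# "outcome" = carrying a driver mutation in one of the affected genes.
or_, lo, hi = odds_ratio(a=20, b=980, c=100, d=980)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}); odds reduced by {1 - or_:.0%}")
```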

Genetics

Hematology

Characterising the loss-of-function impact of 5’ untranslated region variants in whole genome sequence data from 15,708 individuals

Leif Groop et al., Feb 7, 2019

Abstract Upstream open reading frames (uORFs) are important tissue-specific cis-regulators of protein translation. Although isolated case reports have shown that variants that create or disrupt uORFs can cause disease, genetic sequencing approaches typically focus on protein-coding regions and ignore these variants. Here, we describe a systematic genome-wide study of variants that create and disrupt human uORFs, and explore their role in human disease using 15,708 whole genome sequences collected by the Genome Aggregation Database (gnomAD) project. We show that 14,897 variants that create new start codons upstream of the canonical coding sequence (CDS), and 2,406 variants disrupting the stop site of existing uORFs, are under strong negative selection. Furthermore, variants creating uORFs that overlap the CDS show signals of selection equivalent to coding loss-of-function variants, and uORF-perturbing variants are under strong selection when arising upstream of known disease genes and genes intolerant to loss-of-function variants. Finally, we identify specific genes where perturbation of uORFs is likely to represent an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in families with neurofibromatosis. Our results highlight uORF-perturbing variants as an important and under-recognised functional class that can contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data to study the deleteriousness of specific classes of non-coding variants.
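The abstract distinguishes upstream start codons whose open reading frame stops inside the 5′ UTR from those whose ORF overlaps the CDS. A minimal sketch of that classification follows, assuming the input is the spliced transcript (5′ UTR followed by CDS) and a 0-based CDS start index. The function name and the labels `uORF`, `outframe_oORF`, and `inframe_oORF` are hypothetical, and only ATG starts with the three standard stop codons are considered.

```python
# Minimal sketch: classify upstream ATGs in a transcript by where their ORF ends.
# Assumptions: `transcript` is the spliced mRNA (5' UTR then CDS) as DNA letters,
# `cds_start` is the 0-based index of the canonical start codon, and only ATG
# starts with standard stop codons are considered.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_upstream_atgs(transcript: str, cds_start: int) -> list[tuple[int, str]]:
    seq = transcript.upper()
    results = []
    for i in range(cds_start):
        if seq[i:i + 3] != "ATG":
            continue
        label = "outframe_oORF"  # default: reads past the CDS start without stopping
        for j in range(i, len(seq) - 2, 3):
            if j == cds_start:
                # In frame with the CDS and no earlier stop: an N-terminal extension.
                label = "inframe_oORF"
                break
            if seq[j:j + 3] in STOP_CODONS:
                # Stop entirely within the 5' UTR -> uORF; otherwise it overlaps the CDS.
                label = "uORF" if j + 3 <= cds_start else "outframe_oORF"
                break
        results.append((i, label))
    return results


cds = "ATGAAACCCTGA"
ref_utr = "CCACGGCCTAAGG"  # hypothetical 5' UTR with no upstream ATG
alt_utr = "CCATGGCCTAAGG"  # same UTR after a hypothetical C>T change at index 3
print(classify_upstream_atgs(ref_utr + cds, len(ref_utr)))  # []
print(classify_upstream_atgs(alt_utr + cds, len(alt_utr)))  # [(2, 'uORF')]
```

Comparing the reference and alternate sequences in this way flags a variant as start-creating when it introduces an upstream ATG that was absent from the reference.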

Genetics

Oncology

Validation of human telomere length multi-ancestry meta-analysis association signals identifies POP5 and KBTBD6 as human telomere length regulation genes

Rebecca Keener et al., May 24, 2024

Abstract Genome-wide association studies (GWAS) have become well-powered to detect loci associated with telomere length. However, no prior work has validated genes nominated by GWAS to examine their role in telomere length regulation. We conducted a multi-ancestry meta-analysis of 211,369 individuals and identified five novel association signals. Enrichment analyses of chromatin state and cell-type heritability suggested that blood/immune cells are the most relevant cell type in which to examine telomere length association signals. We validated specific GWAS associations by overexpressing KBTBD6 or POP5 and demonstrated that overexpression of either gene lengthened telomeres. CRISPR/Cas9 deletion of the predicted causal regions in K562 blood cells reduced expression of these genes, demonstrating that these loci are related to transcriptional regulation of KBTBD6 and POP5. Our results demonstrate the utility of telomere length GWAS in the identification of telomere length regulation mechanisms and validate KBTBD6 and POP5 as genes affecting telomere length regulation.
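One common way to combine per-ancestry GWAS estimates is a fixed-effect, inverse-variance-weighted (IVW) meta-analysis, in which each study's effect is weighted by 1/SE². The abstract does not state which meta-analysis method was used, so this is only a minimal sketch of the IVW approach; the function name and input values are hypothetical.

```python
# Minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis.
# Assumption: IVW is used here for illustration; the abstract does not name
# the meta-analysis method.
import math


def ivw_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Combine per-study effect sizes; returns (beta, se, two-sided p)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


# Hypothetical per-ancestry estimates for a single SNP.
beta, se, p = ivw_meta([0.10, 0.30], [0.10, 0.10])
print(f"beta = {beta:.3f}, se = {se:.4f}, p = {p:.3g}")
```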

Genetics

Physiology

Whole genome association testing in 333,100 individuals across three biobanks identifies rare non-coding single variant and genomic aggregate associations with height

Gareth Hawkes et al., Jan 1, 2023

The role of rare non-coding variation in complex human phenotypes is still largely unknown. To elucidate the impact of rare variants in regulatory elements, we performed a whole-genome sequencing association analysis for height using 333,100 individuals from three datasets: UK Biobank (N=200,003), TOPMed (N=87,652) and All of Us (N=45,445). We performed single-variant and aggregate testing of rare (minor allele frequency < 0.1%) non-coding variants in regulatory regions defined by proximal, intergenic, and deep-intronic annotations. We observed 29 independent variants associated with height at P < 6 × 10⁻¹⁰ after conditioning on previously reported variants, with effect sizes ranging from −7 cm to +4.7 cm. We also identified and replicated two non-coding aggregate-based associations: one proximal to HMGA1, containing variants associated with 5 cm greater height, and one of highly conserved variants in MIR497HG on chromosome 17. We have developed a novel approach for identifying, from whole-genome sequencing data, rare non-coding variants in regulatory regions that have large effects on complex traits.
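A simple way to picture aggregate testing is a burden test: count each person's rare alleles (MAF < 0.1%) across a region's variants, then regress height on that count. The abstract does not say which aggregate test was applied, so this is only a minimal sketch under stated assumptions. Genotypes are 0/1/2 alternate-allele counts, MAFs are precomputed, no covariates are used, and all names and data are hypothetical. Real analyses adjust for ancestry and other covariates.

```python
# Minimal sketch of a simple burden test for one region (illustrative only).
# Assumptions: genotypes are 0/1/2 alt-allele counts, MAF is precomputed,
# and no covariates are included.
import math

RARE_MAF = 0.001  # 0.1% rare-variant threshold from the abstract


def burden_scores(genotypes: list[list[int]], mafs: list[float]) -> list[int]:
    """Per-person count of rare alleles across the region's variants."""
    rare = [k for k, maf in enumerate(mafs) if maf < RARE_MAF]
    return [sum(person[k] for k in rare) for person in genotypes]


def ols_slope(x: list[float], y: list[float]) -> tuple[float, float]:
    """Slope and its standard error for y ~ intercept + x (needs variation in x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


# Hypothetical data: 4 people, 3 variants (the second is common, so excluded).
genotypes = [[0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
mafs = [0.0005, 0.2, 0.0008]
heights_cm = [175, 177, 170, 172]
burden = burden_scores(genotypes, mafs)
print(burden, ols_slope(burden, heights_cm))  # slope in cm per rare allele
```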

Genetics

Molecular Biology

Atrial Fibrillation Genetic Risk Differentiates Cardioembolic Stroke from other Stroke Subtypes

Eric Boerwinkle et al., Dec 24, 2017

Atrial fibrillation is a prevalent arrhythmia associated with a fivefold increased risk of ischemic stroke, and specifically the cardioembolic stroke subtype. Genome-wide association studies of these traits have yielded overlapping risk loci, but genome-wide investigation of genetic susceptibility shared between stroke and atrial fibrillation is lacking. Comparing the genetic architectures of the two diseases could inform whether cardioembolic strokes are driven by inherited atrial fibrillation susceptibility, and may help elucidate ischemic stroke mechanisms. Here, we analyze genome-wide genotyping data and estimate SNP-based heritability in atrial fibrillation and cardioembolic stroke to be nearly identical (20.0% and 19.5%, respectively). Further, we find that the traits are genetically correlated (r = 0.77 for SNPs with p < 4.4 × 10⁻⁴ in a previous atrial fibrillation meta-analysis). Clinical studies are warranted to assess whether genetic susceptibility to atrial fibrillation can be leveraged to improve the diagnosis and care of ischemic stroke patients.

Genetics

Internal Medicine

Robust, flexible, and scalable tests for Hardy-Weinberg Equilibrium across diverse ancestries

Alan Kwong et al., Jun 24, 2020

ABSTRACT Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in datasets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and we evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence datasets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false positive rates by many orders of magnitude. Our evaluations reveal different trade-offs between false positives and statistical power across the methods, with RUTH consistently among the best performers across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.
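The traditional 1-df χ² HWE test compares observed genotype counts with those expected from the estimated allele frequency. The sketch below implements that classic test (not RUTH) and uses two hypothetical populations to illustrate why pooling ancestries can violate HWE without any genotyping error: each population is exactly in equilibrium, but the pooled sample shows a strong heterozygote deficit (the Wahlund effect).

```python
# Minimal sketch of the classic 1-df chi-square HWE test from genotype counts,
# plus a toy illustration of why pooling ancestries breaks it (Wahlund effect).
# This is the traditional test the abstract contrasts with RUTH, not RUTH itself.
import math


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for HWE at a biallelic site."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Two hypothetical populations, each exactly in HWE (allele A at 0.9 and 0.1).
pop1 = (810, 180, 10)
pop2 = (10, 180, 810)
pooled = tuple(a + b for a, b in zip(pop1, pop2))
for name, counts in [("pop1", pop1), ("pop2", pop2), ("pooled", pooled)]:
    chi2, pval = hwe_chisq(*counts)
    print(f"{name}: chi2 = {chi2:.1f}, p = {pval:.3g}")
```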

Genetics

Ecology

Epigenetic and proteomic signatures associate with clonal hematopoiesis expansion rate

Taralynn Mack et al., Jun 4, 2024

Clonal hematopoiesis of indeterminate potential (CHIP), whereby somatic mutations in hematopoietic stem cells confer a selective advantage and drive clonal expansion, not only correlates with age but also confers increased risk of morbidity and mortality. Here, we leverage genetically predicted traits to identify factors that determine CHIP clonal expansion rate. We used the passenger-approximated clonal expansion rate (PACER) method to quantify the clonal expansion rate for 4,370 individuals in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) cohort and calculated polygenic risk scores for DNA methylation aging, inflammation-related measures, and circulating protein levels. Clonal expansion rate was significantly associated with both genetically predicted and measured epigenetic clocks. Overall, no associations were identified between CHIP expansion rate and inflammation-related laboratory values or diseases. A proteome-wide search identified predicted circulating levels of myeloid zinc finger 1 and anti-Müllerian hormone as associated with an increased CHIP clonal expansion rate, and tissue inhibitor of metalloproteinase 1 and glycine N-methyltransferase as associated with a decreased CHIP clonal expansion rate. Together, our findings identify epigenetic and proteomic patterns associated with the rate of hematopoietic clonal expansion. Exploring the clonal expansion of somatically mutated hematopoietic stem cells with aging, Mack, Raddatz et al. quantify rates of clonal expansion in 4,370 individuals in the Trans-Omics for Precision Medicine cohort, observing epigenetic and proteomic patterns associated with clonal hematopoiesis of indeterminate potential.
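A polygenic risk score is, at its core, a weighted sum of a person's effect-allele dosages, with weights taken from a GWAS of the trait being predicted. This is a minimal sketch under that assumption. The variant IDs, weights, and the choice to skip missing variants are hypothetical; real pipelines typically impute missing dosages and apply variant-selection or shrinkage steps not shown here.

```python
# Minimal sketch of a polygenic score: a weighted sum of allele dosages.
# Assumptions: weights come from an external GWAS of the trait, dosages are
# 0-2 counts of the effect allele, and variants missing for a person are
# skipped (real pipelines often impute them instead). IDs are hypothetical.


def polygenic_score(weights: dict[str, float], dosages: dict[str, float]) -> float:
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


weights = {"var1": 0.20, "var2": -0.10, "var3": 0.05}
person = {"var1": 2, "var2": 1}  # var3 not genotyped for this person
print(polygenic_score(weights, person))
```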

Genetics

Immunology

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Daniel Taliun et al., Mar 6, 2019

The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.
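The summary statistics quoted above (the share of variants with frequency below 1% and the share that are singletons) can be computed directly from per-site allele counts. A minimal sketch, assuming biallelic, polymorphic sites given as (alternate allele count, total allele number) pairs; the function name and example sites are hypothetical.

```python
# Minimal sketch: fraction of variants with minor allele frequency < 1% and
# fraction that are singletons (minor allele observed exactly once).
# Assumes biallelic, polymorphic sites as (alt allele count, allele number).


def frequency_summary(sites: list[tuple[int, int]]) -> tuple[float, float]:
    rare = singletons = 0
    for ac, an in sites:
        mac = min(ac, an - ac)  # minor allele count
        if mac / an < 0.01:
            rare += 1
        if mac == 1:
            singletons += 1
    return rare / len(sites), singletons / len(sites)


# Hypothetical sites from 500 diploid samples (allele number 1,000).
sites = [(1, 1000), (5, 1000), (50, 1000), (999, 1000)]
print(frequency_summary(sites))  # (0.75, 0.5)
```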

Genetics

Pathology And Forensic Medicine
