ResearchHub | Open Science Community

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Daniel Taliun et al., Feb 10, 2021

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes) [1]. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Genetics

Molecular Biology

Human cholesterol 7α-hydroxylase (CYP7A1) deficiency has a hypercholesterolemic phenotype

Clive Pullinger et al., Jul 1, 2002

Bile acid synthesis plays a critical role in the maintenance of mammalian cholesterol homeostasis. The CYP7A1 gene encodes the enzyme cholesterol 7α-hydroxylase, which catalyzes the initial step in cholesterol catabolism and bile acid synthesis. We report here a new metabolic disorder presenting with hyperlipidemia caused by a homozygous deletion mutation in CYP7A1. The mutation leads to a frameshift (L413fsX414) that results in loss of the active site and enzyme function. High levels of LDL cholesterol were seen in three homozygous subjects. Analysis of a liver biopsy and stool from one of these subjects revealed double the normal hepatic cholesterol content, a markedly deficient rate of bile acid excretion, and evidence for upregulation of the alternative bile acid pathway. Two male subjects studied had hypertriglyceridemia and premature gallstone disease, and their LDL cholesterol levels were noticeably resistant to 3-hydroxy-3-methylglutaryl-coenzyme A reductase inhibitors. One subject also had premature coronary and peripheral vascular disease. Study of the kindred, which is of English and Celtic background, revealed that individuals heterozygous for the mutation are also hyperlipidemic, indicating that this is a codominant disorder.

Biochemistry

Oncology

The genetics of Mexico recapitulates Native American substructure and affects biomedical traits

Andrés Moreno-Estrada et al., Jun 12, 2014

The population structure of Native Mexicans: The genetics of indigenous Mexicans exhibits substantial geographical structure, with some groups as divergent from each other as are existing populations of Europeans and Asians. By performing genome-wide analyses on Native Mexicans from differing populations, Moreno-Estrada et al. successfully recapitulated the pre-Columbian substructure of Mexico. This ancestral structure is evident among cosmopolitan Mexicans and is correlated with subcontinental origins and medically relevant aspects of lung function. These findings exemplify the importance of understanding the genetic contributions of admixed individuals. Science, this issue p. 1280

Genetics

Ecology

Reconstructing the Population Genetic History of the Caribbean

Andrés Moreno-Estrada et al., Nov 14, 2013

The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse—which today is reflected by shorter, older ancestry tracts—consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse—reflected by longer, younger tracts—is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. 
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.

Genetics

Ecology

Assembly of a pan-genome from deep sequencing of 910 humans of African descent

Rachel Sherman et al., Nov 13, 2018

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.

Genetics

Molecular Biology

Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas

Joshua Galanter et al., Mar 8, 2012

Most individuals throughout the Americas are admixed descendants of Native American, European, and African ancestors. Complex historical factors have resulted in varying proportions of ancestral contributions between individuals within and among ethnic groups. We developed a panel of 446 ancestry informative markers (AIMs) optimized to estimate ancestral proportions in individuals and populations throughout Latin America. We used genome-wide data from 953 individuals from diverse African, European, and Native American populations to select AIMs optimized for each of the three main continental populations that form the basis of modern Latin American populations. We selected markers on the basis of locus-specific branch length to be informative, well distributed throughout the genome, capable of being genotyped on widely available commercial platforms, and applicable throughout the Americas by minimizing within-continent heterogeneity. We then validated the panel in samples from four admixed populations by comparing ancestry estimates based on the AIMs panel to estimates based on genome-wide association study (GWAS) data. The panel provided balanced discriminatory power among the three ancestral populations and accurate estimates of individual ancestry proportions (R² > 0.9 for ancestral components with significant between-subject variance). Finally, we genotyped samples from 18 populations from Latin America using the AIMs panel and estimated variability in ancestry within and between these populations. This panel and its reference genotype information will be useful resources to explore population history of admixture in Latin America and to correct for the potential effects of population stratification in admixed samples in the region.

Genetics

Philosophy

Dissecting childhood asthma with nasal transcriptomics distinguishes subphenotypes of disease

Alex Poole et al., Feb 3, 2014

Genetics

Immunology

Fast and accurate inference of local ancestry in Latino populations

Yael Baran et al., Apr 11, 2012

Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas).

Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos.

Availability: http://lamp.icsi.berkeley.edu/lamp/lampld/

Contact: bpasaniu@hsph.harvard.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Genetics

Artificial Intelligence

Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies

Elior Rahmani et al., Mar 28, 2016

Genetics

Artificial Intelligence
