ResearchHub | Open Science Community

A reference panel of 64,976 haplotypes for genotype imputation

Shane McCarthy et al., Aug 22, 2016

Jonathan Marchini, Gonçalo Abecasis, Richard Durbin and colleagues describe the construction of a reference panel of human haplotypes from whole-genome sequencing data. They are able to use this to accurately impute genotypes at low minor allele frequency and present remote server resources for use by the community. We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.

Genetics

Molecular Biology

Reference-based phasing using the Haplotype Reference Consortium panel

Po‐Ru Loh et al., Oct 3, 2016

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.

Genetics

Demography

The GenomeAsia 100K Project enables genetic discoveries across Asia

Jeffrey Wall et al., Dec 4, 2019

The underrepresentation of non-Europeans in human genetic studies so far has limited the diversity of individuals in genomic datasets and led to reduced medical relevance for a large proportion of the world’s population. Population-specific reference genome datasets as well as genome-wide association studies in diverse populations are needed to address this issue. Here we describe the pilot phase of the GenomeAsia 100K Project. This includes a whole-genome sequencing reference dataset from 1,739 individuals of 219 population groups and 64 countries across Asia. We catalogue genetic variation, population structure, disease associations and founder effects. We also explore the use of this dataset in imputation, to facilitate genetic studies in populations across Asia and worldwide.

Genetics

Molecular Biology

South Asian Patient Population Genetics Reveal Strong Founder Effects and High Rates of Homozygosity – New Resources for Precision Medicine

Jeffrey Wall et al., Oct 2, 2020

Population-scale genetic studies can identify drug targets and allow disease risk to be predicted with resulting benefit for management of individual health risks and system-wide allocation of health care delivery resources. Although population-scale projects are underway in many parts of the world, genetic variation between population groups means that additional projects are warranted. South Asia has a population whose genetics is the least characterized of any of the world’s major populations. Here we describe GenomeAsia studies that characterize population structure in South Asia and that create tools for economical and accurate genotyping at population-scale. Prior work on population structure characterized isolated population groups, the relevance of which to large-scale studies of disease genetics is unclear. For our studies we used whole genome sequence information from 4,807 individuals recruited in the health care delivery systems of Pakistan, India and Bangladesh to ensure relevance to population-scale studies of disease genetics. We combined this with WGS data from 927 individuals from isolated South Asian population groups, and developed a custom SNP array (called SARGAM) that is optimized for future human genetic studies in South Asia. We find evidence for high rates of reproductive isolation, endogamy and consanguinity that vary across the subcontinent and that lead to levels of homozygosity that approach 100 times that seen in outbred populations. We describe founder effects that increase the power to associate functional variants with disease processes and that make South Asia a uniquely powerful place for population-scale genetic studies.

Genetics

Demography

Haplocheck: Phylogeny-based Contamination Detection in Mitochondrial and Whole-Genome Sequencing Studies

Hansi Weißensteiner et al., May 8, 2020

Within-species contamination is a major issue in sequencing studies, especially for mitochondrial studies. Contamination can be detected by analysing the nuclear genome or by inspecting the heteroplasmic sites in the mitochondrial genome. Existing methods using the nuclear genome are computationally expensive, and no suitable tool for detecting contamination in large-scale mitochondrial datasets is available. Here we present haplocheck, a tool that requires only the mitochondrial genome to detect contamination in both mitochondrial and whole-genome sequencing studies. Haplocheck is able to distinguish between contaminated and real heteroplasmic sites using the mitochondrial phylogeny. By applying haplocheck to the 1000 Genomes Project data, we (1) show high concordance in contamination estimates between mitochondrial and nuclear DNA and (2) quantify the impact of mitochondrial copy numbers on the mitochondrial-based contamination results. Haplocheck complements leading nuclear DNA-based contamination tools, and can therefore be used as a proxy tool in nuclear genome studies. Haplocheck is available both as a command-line tool at https://github.com/genepi/haplocheck and as a cloud web service producing interactive reports that facilitate navigation through the phylogeny of contaminated samples.

Genetics

Ecology

LBP-32: The Natural History of Ferroportin Disease – First Results of the International, Multicenter EASL non-HFE Registry

Benedikt Schaefer et al., Apr 1, 2019

Molecular Biology

Internal Medicine

Loss-of-function genomic variants with impact on liver-related blood traits highlight potential therapeutic targets for cardiovascular disease

Jonas Nielsen et al., Apr 2, 2019

Cardiovascular diseases (CVD), and in particular cerebrovascular and ischemic heart diseases, are leading causes of death globally. Lowering circulating lipids is an important treatment strategy to reduce risk. However, some pharmaceutical mechanisms of reducing CVD may increase risk of fatty liver disease or other metabolic disorders. To identify potential novel therapeutic targets, which may reduce risk of CVD without increasing risk of metabolic disease, we focused on the simultaneous evaluation of quantitative traits related to liver function and CVD. Using a combination of low-coverage (5x) whole-genome sequencing and targeted genotyping, deep genotype imputation based on the TOPMed reference panel, and genome-wide association study (GWAS) meta-analysis, we analyzed 12 liver-related blood traits (including liver enzymes, blood lipids, and markers of iron metabolism) in up to 203,476 people from three population-based cohorts of different ancestries. We identified 88 likely causal protein-altering variants that were associated with one or more liver-related blood traits. We identified several loss-of-function (LoF) variants reducing low-density lipoprotein cholesterol (LDL-C) or risk of CVD without increased risk of liver disease or diabetes, including variants in known lipid genes (e.g. APOB, LPL). A novel LoF variant, ZNF529:p.K405X, was associated with decreased levels of LDL-C (P = 1.3 × 10⁻⁸) but demonstrated no association with liver enzymes or non-fasting blood glucose levels. Silencing of ZNF529 in human hepatocytes resulted in upregulation of LDL receptor (LDLR) and increased LDL-C uptake in the cells, suggesting that inhibition of ZNF529 or its gene product could be used for treating hypercholesterolemia and hence reduce the risk of CVD. Taken together, we demonstrate that simultaneous consideration of multiple phenotypes and a focus on rare protein-altering variants may identify promising therapeutic targets.

Genetics

Epidemiology

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Daniel Taliun et al., Mar 6, 2019

The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.

Genetics

Pathology And Forensic Medicine

The Natural History of Ferroportin Disease – First Results of the International, Multicenter EASL non-HFE Registry

Maria Troppmair et al., May 1, 2023

Background: Ferroportin disease is caused by heterozygous mutations in SLC40A1, characterized by high serum ferritin and hepatic iron overload. Prognosis and management of patients with SLC40A1 mutations has been inferred from HFE associated hemochromatosis, despite different phenotypic presentation in patients with ferroportin disease. The aim of the present study was to define the clinical and biochemical characteristics and management of patients with SLC40A1 mutations.

Genetics

Internal Medicine

Reference-based phasing using the Haplotype Reference Consortium panel

Po‐Ru Loh et al., May 10, 2016

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing within a genotyped cohort, an approach that can attain high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here, we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium, HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ≈20x speedup and ≈10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2x the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.

Genetics

Machine Learning
