ResearchHub | Open Science Community

Genotyping common, large structural variations in 5,202 genomes using pangenomes, the Giraffe mapper, and the vg toolkit

Jouni Sirén et al. Dec 6, 2020

ABSTRACT We introduce Giraffe, a pangenome short read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe, part of the variation graph toolkit (vg) [1], maps reads to thousands of human genomes at around the same speed BWA-MEM [2] maps reads to a single reference genome, while maintaining comparable accuracy to VG-MAP, vg’s original mapper. We have developed efficient genotyping pipelines using Giraffe. We demonstrate improvements in genotyping for single-nucleotide variants (SNVs), small insertions and deletions (indels) and structural variations (SVs) genome-wide. We use Giraffe to genotype about 167 thousand structural variants ascertained from long read studies in 5,202 human genomes sequenced with short reads, including the complete 1000 Genomes Project dataset, at an average cost of $1.50 per sample. We determine the frequency of these variations in diverse human populations, characterize their complex allelic variations and identify thousands of expression quantitative trait loci (eQTLs) driven by these variations.

Genetics

Molecular Biology

124

Paper

A missense variant in Mitochondrial Amidoxime Reducing Component 1 gene and protection against liver disease

Connor Emdin et al. Mar 31, 2019

Analyzing 5,770 all-cause cirrhosis cases and 572,850 controls from seven cohorts, we identify a missense variant in the Mitochondrial Amidoxime Reducing Component 1 gene (MARC1 p.A165T) that associates with protection from all-cause cirrhosis (OR 0.88, p = 2.1 × 10⁻⁸). This same variant also associates with lower levels of hepatic fat on computed tomographic imaging and lower odds of physician-diagnosed fatty liver as well as lower blood levels of alanine transaminase (−0.012 SD, 1.4 × 10⁻⁸), alkaline phosphatase (−0.019 SD, 6.6 × 10⁻⁹), total cholesterol (−0.037 SD, p = 1 × 10⁻¹⁸) and LDL cholesterol (−0.035 SD, p = 7.3 × 10⁻¹⁶). Carriers of rare protein-truncating variants in MARC1 had lower liver enzyme levels, cholesterol levels, and reduced odds of liver disease (OR 0.19, p = 0.04), suggesting that deficiency of the MARC1 enzyme protects against cirrhosis.

Genetics

Biochemistry

0

Paper

Characterising the loss-of-function impact of 5’ untranslated region variants in whole genome sequence data from 15,708 individuals

Leif Groop et al. Feb 7, 2019

Abstract Upstream open reading frames (uORFs) are important tissue-specific cis-regulators of protein translation. Although isolated case reports have shown that variants that create or disrupt uORFs can cause disease, genetic sequencing approaches typically focus on protein-coding regions and ignore these variants. Here, we describe a systematic genome-wide study of variants that create and disrupt human uORFs, and explore their role in human disease using 15,708 whole genome sequences collected by the Genome Aggregation Database (gnomAD) project. We show that 14,897 variants that create new start codons upstream of the canonical coding sequence (CDS), and 2,406 variants disrupting the stop site of existing uORFs, are under strong negative selection. Furthermore, variants creating uORFs that overlap the CDS show signals of selection equivalent to coding loss-of-function variants, and uORF-perturbing variants are under strong selection when arising upstream of known disease genes and genes intolerant to loss-of-function variants. Finally, we identify specific genes where perturbation of uORFs is likely to represent an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in families with neurofibromatosis. Our results highlight uORF-perturbing variants as an important and under-recognised functional class that can contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data to study the deleteriousness of specific classes of non-coding variants.

Genetics

Oncology

0

Paper

Multiset correlation and factor analysis enables exploration of multi-omic data

Brielin Brown et al. Jul 20, 2022

Abstract Multi-omics datasets are becoming more common, necessitating better integration methods to realize their revolutionary potential. Here, we introduce Multi-set Correlation and Factor Analysis, an unsupervised integration method that enables fast inference of shared and private factors in multi-modal data. Applied to 614 ancestry-diverse participant samples across five ‘omics types, MCFA infers a shared space that captures clinically relevant molecular processes.

Genetics

Artificial Intelligence

28

Paper

Gene expression in African Americans and Latinos reveals ancestry-specific patterns of genetic architecture

Linda Kachuri et al. Aug 19, 2021

ABSTRACT We analyzed whole genome and RNA sequencing data from 2,733 African American and Hispanic/Latino children to explore ancestry- and heterozygosity-related differences in the genetic architecture of whole blood gene expression. We found that heritability of gene expression significantly increases with greater proportion of African genetic ancestry and decreases with higher levels of Indigenous American ancestry, consistent with a relationship between heterozygosity and genetic variance. Among heritable protein-coding genes, the prevalence of statistically significant ancestry-specific expression quantitative trait loci (anc-eQTLs) was 30% in African ancestry and 8% for Indigenous American ancestry segments. Most of the anc-eQTLs (89%) were driven by population differences in allele frequency, demonstrating the importance of measuring gene expression across multiple populations. Transcriptome-wide association analyses of multi-ancestry summary statistics for 28 traits identified 79% more gene-trait pairs using models trained in our admixed population than models trained in GTEx. Our study highlights the importance of large and ancestrally diverse genomic studies for enabling new discoveries of complex trait architecture and reducing disparities.

Genetics

Molecular Biology

1

Paper

Rare coding variants in 35 genes associate with circulating lipid levels – a multi-ancestry analysis of 170,000 exomes

George Hindy et al. Dec 23, 2020

Abstract Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels. Ten of these (ALB, SRSF2, JAK2, CREB3L3, TMEM136, VARS, NR1H3, PLA2G12A, PPARG and STAB1) have not been implicated for lipid levels using rare coding variation in population-based samples. We prioritize 32 genes identified in array-based genome-wide association study (GWAS) loci based on gene-based associations, of which three (EVI5, SH2B3, and PLIN1) had no prior evidence of rare coding variant associations. Most of the associated genes showed evidence of association in multiple ancestries. Also, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes, and for genes closest to GWAS index single nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.
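
As a quick sanity check on the sample sizes quoted in this abstract, the per-ancestry counts can be summed and compared with the ">170,000" figure. This is only an illustrative sketch: the dictionary name `cohort_sizes` is made up here, and the numbers are copied directly from the abstract text.

```python
# Per-ancestry sample counts as quoted in the abstract.
# The variable name is illustrative, not taken from the paper.
cohort_sizes = {
    "European": 97_493,
    "South Asian": 30_025,
    "African": 16_507,
    "Hispanic/Latino": 16_440,
    "East Asian": 10_420,
    "Samoan": 1_182,
}

total = sum(cohort_sizes.values())
print(f"Total individuals: {total:,}")  # Total individuals: 172,067

# Share of each ancestry group in the combined sample.
for ancestry, n in cohort_sizes.items():
    print(f"{ancestry}: {n / total:.1%}")
```

The total of 172,067 is consistent with the stated ">170,000 individuals".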

Genetics

Surgery

1

Paper

The functional impact of rare variation across the regulatory cascade

Taibo Li et al. Sep 9, 2022

Abstract Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how use of personal multi-omics can enable identification of impactful rare variants by using the Multi-Ethnic Study of Atherosclerosis (MESA) which included several hundred individuals with whole genome sequencing, transcriptomes, methylomes, and proteomes collected across two time points, ten years apart. We evaluated each multi-omic phenotype’s ability to separately and jointly inform functional rare variation. By combining expression and protein data, we observed rare stop variants 62x and rare frameshift variants 216x as frequently as controls, compared to 13x to 27x for expression or protein effects alone. We developed a Bayesian hierarchical model to prioritize specific rare variants underlying multi-omic signals across the regulatory cascade. With this approach, we identified rare variants that exhibited large effect sizes on multiple complex traits including height, schizophrenia, and Alzheimer’s disease.

Genetics

Molecular Biology

1

Paper

Interaction molecular QTL mapping discovers cellular and environmental modifiers of genetic regulatory effects

Silva Kasela et al. Jun 27, 2023

Abstract Bulk tissue molecular quantitative trait loci (QTLs) have been the starting point for interpreting disease-associated variants, while context-specific QTLs show particular relevance for disease. Here, we present the results of mapping interaction QTLs (iQTLs) for cell type, age, and other phenotypic variables in multi-omic, longitudinal data from blood of individuals of diverse ancestries. By modeling the interaction between genotype and estimated cell type proportions, we demonstrate that cell type iQTLs could be considered as proxies for cell type-specific QTL effects. The interpretation of age iQTLs, however, warrants caution as the moderation effect of age on the genotype and molecular phenotype association may be mediated by changes in cell type composition. Finally, we show that cell type iQTLs contribute to cell type-specific enrichment of diseases that, in combination with additional functional data, may guide future functional studies. Overall, this study highlights iQTLs to gain insights into the context-specificity of regulatory effects.

Genetics

Paleontology

43

Paper

An open resource of structural variation for medical and population genetics

Ryan Collins et al. Mar 14, 2019

Structural variants (SVs) rearrange large segments of the genome and can have profound consequences for evolution and human diseases. As national biobanks, disease association studies, and clinical genetic testing grow increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral for interpreting genetic variation. To date, no large-scale reference maps of SVs exist from high-coverage sequencing comparable to those available for point mutations in protein-coding genes. Here, we constructed a reference atlas of SVs across 14,891 genomes from diverse global populations (54% non-European) as a component of gnomAD. We discovered a rich landscape of 433,371 distinct SVs, including 5,295 multi-breakpoint complex SVs across 11 mutational subclasses, and examples of localized chromosome shattering, as in chromothripsis. The average individual harbored 7,439 SVs, which accounted for 25-29% of all rare protein-truncating events per genome. We found strong correlations between constraint against damaging point mutations and rare SVs that both disrupt and duplicate protein-coding sequence, suggesting intolerance to reciprocal dosage alterations for a subset of tightly regulated genes. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than any effect on noncoding SVs. Finally, we benchmarked carrier rates for medically relevant SVs, finding very large (≥1 Mb) rare SVs in 3.8% of genomes (~1:26 individuals) and clinically reportable incidental SVs in 0.18% of genomes (~1:556 individuals). These data have been integrated directly into the gnomAD browser ( ) and will have broad utility for population genetics, disease association, and diagnostic screening.
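
The "1 in N" carrier figures in this abstract follow directly from the quoted percentages. A minimal sketch of the conversion, assuming simple rounding to the nearest integer (the helper name `one_in_n` is hypothetical and not from the paper):

```python
def one_in_n(percent: float) -> int:
    """Convert a carrier rate given in percent to the nearest '1 in N' figure."""
    return round(100.0 / percent)


# Carrier rates quoted in the abstract.
very_large_rare_sv = one_in_n(3.8)    # very large (>= 1 Mb) rare SVs
reportable_incidental = one_in_n(0.18)  # clinically reportable incidental SVs

print(f"~1:{very_large_rare_sv}")     # ~1:26
print(f"~1:{reportable_incidental}")  # ~1:556
```

Both results match the ratios stated in the text.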

Genetics

Cancer Research

0

Paper

Proteome-Wide Association Studies for Blood Lipids and Comparison with Transcriptome-Wide Association Studies

Daiwei Zhang et al. Aug 21, 2023

Blood lipid traits are treatable and heritable risk factors for heart disease, a leading cause of mortality worldwide. Although genome-wide association studies (GWAS) have discovered hundreds of variants associated with lipids in humans, most of the causal mechanisms of lipids remain unknown. To better understand the biological processes underlying lipid metabolism, we investigated the associations of plasma protein levels with total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) in blood. We trained protein prediction models based on samples in the Multi-Ethnic Study of Atherosclerosis (MESA) and applied them to conduct proteome-wide association studies (PWAS) for lipids using the Global Lipids Genetics Consortium (GLGC) data. Of the 749 proteins tested, 42 were significantly associated with at least one lipid trait. Furthermore, we performed transcriptome-wide association studies (TWAS) for lipids using 9,714 gene expression prediction models trained on samples from peripheral blood mononuclear cells (PBMCs) in MESA and 49 tissues in the Genotype-Tissue Expression (GTEx) project. We found that although PWAS and TWAS can show different directions of associations in an individual gene, 40 out of 49 tissues showed a positive correlation between PWAS and TWAS signed p-values across all the genes, which suggests a high-level consistency between proteome-lipid associations and transcriptome-lipid associations.
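
The abstract does not define "signed p-values". One common convention, assumed here purely for illustration, is to take −log10(p) and give it the sign of the estimated effect, then correlate the PWAS and TWAS values across genes. The sketch below uses invented toy numbers and hypothetical helper names (`signed_log_p`, `pearson`); it is not the authors' pipeline.

```python
import math


def signed_log_p(p_value: float, effect: float) -> float:
    """-log10(p) carrying the sign of the effect estimate (assumed convention)."""
    return math.copysign(-math.log10(p_value), effect)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (p-value, effect) pairs for the same four genes in each analysis.
pwas = [(1e-6, 0.4), (0.03, -0.2), (0.5, 0.1), (1e-3, -0.5)]
twas = [(1e-4, 0.3), (0.2, 0.1), (0.6, 0.05), (1e-2, -0.4)]

pwas_signed = [signed_log_p(p, b) for p, b in pwas]
twas_signed = [signed_log_p(p, b) for p, b in twas]

r = pearson(pwas_signed, twas_signed)
print(f"Correlation of signed p-values: {r:.2f}")
```

In this toy data the second gene has opposite directions in the two analyses, yet the overall correlation is still positive, which mirrors the abstract's point that individual genes can disagree while the gene-wide trend is consistent.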

Genetics

Epidemiology

1

Paper
