ResearchHub | Open Science Community

Genetic analyses of diverse populations improves discovery for complex traits

Genevieve Wojcik et al., Jun 1, 2019

Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1–3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific [4–10]. Additionally, effect sizes estimated in one population, and the risk prediction scores derived from them, may not accurately extrapolate to other populations [11,12]. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, and replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions [13], the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
Genetic analyses of ancestrally diverse populations show evidence of heterogeneity across ancestries and provide insights into clinical implications, highlighting the importance of including ancestrally diverse populations to maximize genetic discovery and reduce health disparities.

Genetics

Anthropology

Comprehensive evaluation of methods for differential expression analysis of metatranscriptomics data

Hunyong Cho et al., Jul 14, 2021

Understanding the function of the human microbiome is important; however, the development of statistical methods specifically for microbial gene expression (i.e., metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of ten differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate performance (i.e., model fit, type I error, false discovery rate, and sensitivity) of the methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal-Wallis, and two-part Kruskal-Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of early childhood caries (ECC), whereas validations were sought in two additional datasets from an ECC study and an inflammatory bowel disease (IBD) study. The LB test showed the highest sensitivity in both small and large samples and reasonably controlled type I error. In contrast, MAST was hampered by inflated type I error. Upon application of the LN and LB tests in the ECC study, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental diseases. This comprehensive model evaluation offers practical guidance for selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selection of an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones.
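The performance criteria named in this abstract (type I error, false discovery rate, sensitivity) can be illustrated with a minimal sketch. It assumes a simulation in which the true status of each gene is known; the function name `evaluate_calls` and the toy data are hypothetical and are not taken from the paper. The false discovery rate is the expectation of the false discovery proportion computed here, so in practice it would be averaged over many simulation replicates.

```python
def evaluate_calls(is_true_signal, is_rejected):
    """Summarize one simulation replicate.

    is_true_signal: list of bools, True if the gene truly differs between groups.
    is_rejected:    list of bools, True if the method declared the gene significant.
    """
    pairs = list(zip(is_true_signal, is_rejected))
    true_pos = sum(1 for truth, rej in pairs if truth and rej)
    false_pos = sum(1 for truth, rej in pairs if not truth and rej)
    n_null = sum(1 for truth, _ in pairs if not truth)
    n_signal = sum(1 for truth, _ in pairs if truth)
    n_rejected = true_pos + false_pos
    return {
        # Share of null genes that were wrongly declared significant.
        "type_i_error": false_pos / n_null if n_null else 0.0,
        # Share of declared genes that are actually null (one replicate of FDR).
        "false_discovery_proportion": false_pos / n_rejected if n_rejected else 0.0,
        # Share of true signals that were detected.
        "sensitivity": true_pos / n_signal if n_signal else 0.0,
    }


# Hypothetical toy replicate: 10 genes, the first 4 carry a true signal.
truth = [True] * 4 + [False] * 6
rejected = [True, True, True, False, True, False, False, False, False, False]
metrics = evaluate_calls(truth, rejected)
print(metrics)
```

A method with inflated type I error, as reported for MAST, would show a `type_i_error` well above the nominal significance level when this summary is averaged across replicates.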

Molecular Biology

Periodontics

Improved Metabolite Prediction Using Microbiome Data-Based Elastic Net Models

Jialiu Xie et al., Jul 1, 2021

Microbiome data are becoming increasingly available in large health cohorts, yet metabolomics data are still scant. While many studies generate microbiome data, they lack matched metabolomics data or have considerable missing proportions of metabolites. Since metabolomics is key to understanding microbial and general biological activities, the possibility of imputing individual metabolites or inferring metabolomics pathways from microbial taxonomy or metagenomics is intriguing. Importantly, current metabolomics profiling methods such as the HMP Unified Metabolic Analysis Network (HUMAnN) have unknown accuracy and are limited in their ability to predict individual metabolites. To address this gap, we developed a novel metabolite prediction method, and we present its application and evaluation in an oral microbiome study. We developed ENVIM based on the Elastic Net Model (ENM) to predict metabolites using microbiome data. ENVIM introduces an extra step to ENM to consider variable importance scores and thus achieve better prediction power. We investigate the metabolite prediction performance of ENVIM using metagenomic and metatranscriptomic data in a supragingival biofilm multi-omics dataset of 297 children aged 3-5 years who participated in a community-based study of early childhood oral health (ZOE 2.0) in North Carolina, United States. We further validate ENVIM in two additional publicly available multi-omics datasets generated from studies of gut health and vaginal health. We select gene-family sets based on variable importance scores and modify the existing ENM strategy used in the MelonnPan prediction software to accommodate the unique features of microbiome and metabolome data. We evaluate metagenomic and metatranscriptomic predictors and compare the prediction performance of ENVIM to the standard ENM employed in MelonnPan.
The newly developed ENVIM method showed higher metabolite prediction accuracy than MelonnPan using metatranscriptomics data only, metagenomics data only, or both. Both methods predict the samples' corresponding metabolites better from gut or vaginal microbiome data than from oral microbiome data. The top predictable compounds are reported for all three datasets, which come from three different body sites. Enrichment of some species contributing to prediction has been detected.

Genetics

Artificial Intelligence

BZINB model-based pathway analysis and module identification facilitates integration of microbiome and metabolome data

Bridget Lin et al., Feb 1, 2023

Integration of multi-omics data is a challenging but necessary step to advance our understanding of the biology underlying human health and disease processes. To date, investigations seeking to integrate multi-omics (e.g., microbiome and metabolome) employ simple correlation-based network analyses; however, these methods are not always well-suited for microbiome analyses because they do not accommodate the excess zeros typically present in these data. In this paper, we introduce a bivariate zero-inflated negative binomial (BZINB) model-based network and module analysis method that addresses this limitation and improves microbiome-metabolome correlation-based model fitting by accommodating excess zeros. We use real and simulated data based on a multi-omics study of childhood oral health (ZOE 2.0; investigating early childhood dental disease, ECC) and find that the BZINB model-based correlation method is more accurate than Spearman's rank and Pearson correlations in approximating the underlying relationships between microbial taxa and metabolites. The new method, BZINB-iMMPath, facilitates the construction of metabolite-species and species-species correlation networks using BZINB and identifies modules (i.e., groups of correlated species) by combining BZINB and similarity-based clustering. Perturbations in correlation networks and modules can be efficiently tested between groups (e.g., healthy and diseased study participants). Upon application of the new method to the ZOE 2.0 study microbiome-metabolome data, we find that several biologically relevant correlations of ECC-associated microbial taxa with carbohydrate metabolites differ between healthy and dental caries-affected participants.
In sum, we find that the BZINB model is a useful alternative to Spearman or Pearson correlations for estimating the underlying correlation of zero-inflated bivariate count data and thus is suitable for integrative analyses of multi-omics data such as those encountered in microbiome and metabolome studies.
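To make "excess zeros" concrete, the following minimal sketch evaluates a univariate zero-inflated negative binomial (ZINB) probability mass function. It is a simplification for illustration only: the paper's model is bivariate, and the function names, the parameterization (size `r`, success probability `p`, zero-inflation weight `pi_zero`) and the example values below are assumptions, not the authors' implementation.

```python
import math


def nb_pmf(y, r, p):
    """Negative binomial pmf: Gamma(y + r) / (Gamma(r) * y!) * p**r * (1 - p)**y.

    Requires r > 0 and 0 < p < 1; y is a non-negative integer count.
    """
    log_coef = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    return math.exp(log_coef + r * math.log(p) + y * math.log1p(-p))


def zinb_pmf(y, r, p, pi_zero):
    """Mixture of a point mass at zero (weight pi_zero) and a negative binomial."""
    base = (1.0 - pi_zero) * nb_pmf(y, r, p)
    return pi_zero + base if y == 0 else base


# Illustrative (assumed) parameters: 40% structural zeros on top of the NB zeros.
r, p, pi_zero = 2.0, 0.3, 0.4
prob_zero_nb = nb_pmf(0, r, p)
prob_zero_zinb = zinb_pmf(0, r, p, pi_zero)
total_mass = sum(zinb_pmf(y, r, p, pi_zero) for y in range(500))
print(prob_zero_nb, prob_zero_zinb, total_mass)
```

With these values the plain negative binomial puts probability 0.09 on a zero count, while the zero-inflated version puts 0.454 there. Correlation measures that ignore this point mass treat the structural zeros as ordinary low counts, which is the limitation the abstract describes.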

Ecology

Artificial Intelligence

X-chromosome and kidney function: evidence from a multi-trait genetic analysis of 908,697 individuals reveals sex-specific and sex-differential findings in genes regulated by androgen response elements

Markus Scholz et al., Jan 18, 2024

Genetics

Molecular Biology

Spatial Immunophenotyping from Whole-Slide Multiplexed Tissue Imaging Using Convolutional Neural Networks

Mohammad Yosofvand et al., Aug 19, 2024

The multiplexed immunofluorescence (mIF) platform enables biomarker discovery through the simultaneous detection of multiple markers on a single tissue slide, offering detailed insights into intratumor heterogeneity and the tumor-immune microenvironment at spatially resolved single-cell resolution. However, current mIF image analyses are labor-intensive and require specialized pathology expertise, which limits their scalability and clinical application. To address this challenge, we developed CellGate, a deep-learning (DL) computational pipeline that provides streamlined, end-to-end whole-slide mIF image analysis including nuclei detection, cell segmentation, cell classification, and combined immuno-phenotyping across stacked images. The model was trained on over 750,000 single-cell images from 34 melanomas in a retrospective cohort of patients using whole tissue sections stained for CD3, CD8, CD68, CK-SOX10, PD-1, PD-L1, and FOXP3 with manual gating and extensive pathology review. When tested on new whole mIF slides, the model demonstrated high precision-recall AUC. Further validation on whole-slide mIF images of 9 primary melanomas from an independent cohort confirmed that CellGate can reproduce expert pathology analysis with high accuracy. We show that spatial immuno-phenotyping results using CellGate provide deep insights into the immune cell topography and differences in T cell functional states and interactions with tumor cells in patients with distinct histopathology and clinical characteristics. This pipeline offers a fully automated and parallelizable computing process with substantially improved consistency for cell type classification across images, potentially enabling high-throughput whole-slide mIF tissue image analysis for large-scale clinical and research applications.

Genetics

Artificial Intelligence

The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of Complex Traits

Genevieve Wojcik et al., Sep 15, 2017

Genome-wide association studies (GWAS) have laid the foundation for many downstream investigations, including the biology of complex traits, drug development, and clinical guidelines. However, the dominance of European-ancestry populations in GWAS creates a biased view of human variation and hinders the translation of genetic associations into clinical and public health applications. To demonstrate the benefit of studying underrepresented populations, the Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioral phenotypes in 49,839 non-European individuals. Using novel strategies for multi-ethnic analysis of admixed populations, we confirm 574 GWAS catalog variants across these traits, and find 28 novel loci and 42 residual signals in known loci. Our data show strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, which substantially restricts genetically-guided precision medicine. We advocate for new, large genome-wide efforts in diverse populations to reduce health disparities.

Genetics

Molecular Biology
