ResearchHub | Open Science Community

NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses

Thiago Leal et al., Oct 23, 2021

Abstract: Genetic and omics analyses frequently require independent observations, which is not guaranteed in real datasets. When relatedness cannot be accounted for, the usual solution is to remove related individuals (or observations), which reduces the amount of available data. We developed a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships. It uses the node degree centrality metric to identify highly connected nodes (individuals) and implements heuristics that approximate the minimal reduction of a dataset, so that it can be applied to large datasets. NAToRA outperformed two popular methods (implemented in the software PLINK and KING), showing the best combination of effective relatedness pruning (removing all relatives) and retention of the largest possible number of individuals in all datasets tested, with a similar or smaller reduction in genetic diversity. NAToRA is freely available both as a standalone tool that can easily be incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also present the genealogy simulator software used for the different tests performed in the manuscript.
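The pruning idea described above (treat relatives as edges in a network and remove highly connected individuals first) can be illustrated with a generic greedy heuristic. This is a minimal sketch, not NAToRA's code: it assumes the input is a list of already-thresholded related pairs (for example, pairs whose kinship exceeds a chosen cutoff), and the function name `greedy_degree_prune` and the tie-breaking rule (smallest ID first) are hypothetical choices for illustration.

```python
from collections import defaultdict


def greedy_degree_prune(pairs):
    """Greedily remove the most-connected individual until no related pairs remain.

    Hypothetical helper for illustration (not NAToRA's implementation).
    pairs: iterable of (id_a, id_b) tuples for individuals considered related.
    Individuals without relatives do not appear here and are simply kept.
    Returns (kept, removed): kept lists the remaining individuals that had
    at least one relative; removed lists pruned IDs in removal order.
    """
    graph = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-pairs
            graph[a].add(b)
            graph[b].add(a)

    removed = []
    while any(graph[node] for node in graph):
        # Highest remaining degree first; ties broken by the smallest ID (as a string)
        target = min(graph, key=lambda node: (-len(graph[node]), str(node)))
        for neighbour in graph[target]:
            graph[neighbour].discard(target)
        del graph[target]
        removed.append(target)

    kept = sorted(graph, key=str)
    return kept, removed


# Usage: a parent related to three children, plus a separate half-sib pair
kept, removed = greedy_degree_prune([("P", "C1"), ("P", "C2"), ("P", "C3"), ("H1", "H2")])
print("removed:", removed)
print("kept:", kept)
```

In the usage example, removing the parent alone breaks all three parent-child relationships, so three children are kept; removing one arbitrary member of each pair could instead discard all three children. Greedy max-degree removal only approximates the smallest possible removal set, which is why heuristics of this kind are needed to scale to large datasets.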

Artificial Intelligence

Molecular Biology


Unappreciated Subcontinental Admixture in Europeans and European Americans: Implications for Genetic Epidemiology Studies

Mateus Gouveia et al., Nov 29, 2022

Abstract: European-ancestry populations are recognized as stratified but not as admixed, implying that residual confounding by locus-specific ancestry can affect studies of association, polygenic adaptation, and polygenic risk scores. We integrated individual-level genome-wide data from ~19,000 European-ancestry individuals across 79 European populations and five European American cohorts. We generated a new reference panel that captures ancestral diversity missed by both the 1000 Genomes and Human Genome Diversity Projects. Both Europeans and European Americans are admixed at the subcontinental level, with admixture dates differing among subgroups of European Americans. After adjustment for both genome-wide and locus-specific ancestry, associations between a highly differentiated variant in LCT (rs4988235) and height or LDL-cholesterol were confirmed to be false positives, whereas the association between LCT and body mass index was genuine. We provide formal evidence of subcontinental admixture in individuals with European ancestry, which, if not properly accounted for, can produce spurious results in genetic epidemiology studies.

Genetics

Epidemiology

Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong female-specific effect on body mass- and fat mass-indexes

Marília Scliar et al., Nov 14, 2019

Admixed populations are a resource for studying the global genetic architecture of complex phenotypes, which is critical given that non-European populations are severely under-represented in genomic studies. Leveraging admixture in Brazilians, whose chromosomes are mosaics of fragments of Native American, European and African origin, we used genome-wide data to perform admixture mapping and fine-mapping of Body Mass Index (BMI) in three population-based cohorts from the Northeast (Salvador), Southeast (Bambuí) and South (Pelotas) of the country. We found significant associations with African-associated alleles in children from Salvador (PALD1 and ZMIZ1 genes) and in young adults from Pelotas (NOD2 and MTUS2 genes). More importantly, in Pelotas, rs114066381, mapped to a potential regulatory region, is significantly associated only in females (p = 2.76 × 10⁻⁶). This variant is very rare in Europeans but reaches frequencies of ~3% in West Africa, and it has a strong female-specific effect (95% CI: 2.32 to 5.65 kg/m² per A allele). We confirmed this sex-specific association and replicated its strong effect for an adjusted fat-mass index in the same Pelotas cohort, and for BMI in another Brazilian cohort from São Paulo (Southeast Brazil). A meta-analysis confirmed the significant association. Remarkably, while the frequency of the rs114066381-A allele ranges from 0.8% to 2.1% in the studied populations, it reaches ~9% among morbidly obese women from Pelotas, São Paulo, and Bambuí. The effect size of rs114066381 is at least five times that of the FTO SNPs rs9939609 and rs1558902, which are already emblematic for their large effects and whose associations we replicated in Pelotas. We demonstrate how, after a decade of GWAS performed mostly in European-ancestry populations, non-European and admixed populations are a source of new, relevant phenotype-associated genetic variants.

Genetics

Internal Medicine

Origins, admixture dynamics and homogenization of the African gene pool in the Americas

Mateus Gouveia et al., May 28, 2019

The Transatlantic Slave Trade transported more than 9 million Africans to the Americas between the early 16th and the mid-19th centuries. We performed a genome-wide analysis of 6,267 individuals from 22 populations and observed an enrichment of West African ancestry in the northern latitudes of the Americas, whereas South/East African ancestry is more prevalent in southern South America. This pattern results from distinct geographic and geopolitical factors leading to population differentiation. However, we observed a 68% decrease in the between-population diversity of the African gene pool within the Americas compared with the regions of origin in Africa, underscoring the importance of historical factors favoring admixture between individuals of different African origins in the New World. This is consistent with the excess of West-Central African ancestry (the most prevalent in the Americas) in the US and Southeast Brazil relative to historical-demography expectations. Also, in most of the Americas, admixture intensification occurred between 1750 and 1850, which correlates strongly with the peak of arrivals from Africa. This study contributes a population genetics perspective to the ongoing social, cultural and political debate regarding ancestry, race, and admixture in the Americas.

Genetics

History

Random forest classifiers trained on simulated data enable accurate short read-based genotyping of structural variants in the alpha globin region at Chr16p13.3

Nancy Hansen et al., Jan 1, 2023

In regions where reads don't align well to a reference, it is generally difficult to characterize structural variation using short-read sequencing. Here, we use machine learning classifiers and short sequence reads to genotype structural variants in the alpha globin locus on chromosome 16, a medically relevant region that is challenging to genotype in individuals. Using models trained only on simulated data, we accurately genotype two hard-to-distinguish deletions in two separate human cohorts. Furthermore, population allele frequencies produced by our methods across a wide set of ancestries agree more closely with previously determined frequencies than those obtained using currently available genotyping software.
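As a rough illustration of training a classifier only on simulated data and then applying it to new samples, the sketch below simulates normalized read depth for a hypothetical deleted segment and fits a scikit-learn random forest. The two features (depth inside the segment and depth in a flanking region), the noise level, and the genotype coding (0, 1 or 2 deleted copies) are all illustrative assumptions, not the features or simulation used in the study; real reads from this locus are harder to separate than this toy setup suggests, since they do not align well to the reference.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def simulate_depths(n_per_class, rng, noise_sd=0.1):
    """Simulate normalized read depth for a hypothetical deletion (toy model).

    Label = number of deleted copies (0, 1 or 2). Expected depth inside the
    segment is (2 - deleted) / 2 relative to a diploid baseline; the flanking
    region is assumed undeleted (expected depth 1.0). Depths are clipped at 0.
    """
    features, labels = [], []
    for deleted in (0, 1, 2):
        inside = np.clip(rng.normal((2 - deleted) / 2, noise_sd, n_per_class), 0, None)
        flank = np.clip(rng.normal(1.0, noise_sd, n_per_class), 0, None)
        features.append(np.column_stack([inside, flank]))
        labels.append(np.full(n_per_class, deleted))
    return np.vstack(features), np.concatenate(labels)


rng = np.random.default_rng(0)
X_train, y_train = simulate_depths(200, rng)  # training set: simulated only
X_test, y_test = simulate_depths(100, rng)    # held-out simulated samples

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = float((clf.predict(X_test) == y_test).mean())
print(f"held-out accuracy on simulated data: {accuracy:.3f}")
```

Because the training labels come from the simulator, no manually genotyped samples are needed; the trade-off is that performance on real data depends on how faithfully the simulation reproduces real read behavior in the region.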

Genetics

Artificial Intelligence

GWAS in Africans identifies novel lipids loci and demonstrates heterogenous association within Africa

Amy Bentley et al., Oct 29, 2020

Abstract: Background: Serum lipids are biomarkers of cardiometabolic disease risk, and understanding the genomic factors contributing to their distribution has been of considerable interest. Large genome-wide association studies (GWAS) have identified over 150 lipids loci; however, GWAS of Africans (AF) are rare. Given the genomic diversity among people of African ancestry, a GWAS in Africans is expected to identify novel lipids loci. While GWAS have been conducted in African Americans (AA), such studies are not proxies for studies in continental Africans because of the drastically different environmental context. Therefore, we conducted a GWAS of 4,317 Africans enrolled in the Africa America Diabetes Mellitus study. Methods and Results: We used linear mixed models of the inverse normal transformations of covariate-adjusted residuals of high-density lipoprotein cholesterol (HDLC), low-density lipoprotein cholesterol (LDLC), total cholesterol (CHOL), triglycerides (TG), and TG/HDLC, with adjustment for three principal components and the random effect of relatedness. Replication of loci associated at p < 5×10⁻⁸ was attempted in 9,542 AA. A meta-analysis of AF and AA was also conducted, as were analyses that excluded the relatively small number of East Africans. We evaluated known lipids loci in Africans using both exact replication and “local” replication, which accounts for interethnic differences in linkage disequilibrium. In our main analysis, we identified 23 novel associations in Africans. Of the 14 that could be tested in AA, two associations replicated (GPNMB-TG and ENPP1-TG). Two additional novel loci were discovered upon meta-analysis with AA (rs138282551-TG and TLL2-CHOL). Analyses considering only individuals with predominantly West African ancestry (Nigeria, Ghana, and AA) yielded new insights: ORC5-LDLC and chr20:60973327-CHOL.
Conclusions: While functional work will be useful to confirm and understand the biological mechanisms underlying these associations, this study demonstrates the utility of conducting large-scale genomic analyses in Africans for discovering novel loci. The functional significance of some of these loci in relation to lipids remains to be elucidated, yet some have known connections to lipids pathways. For instance, rs147706369 (intronic, TLL2) alters a regulatory motif for sterol regulatory element-binding proteins (SREBPs), a family of transcription factors that control the expression of a range of enzymes involved in cholesterol, fatty acid, and triglyceride synthesis.
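The phrase "inverse normal transformation of covariate-adjusted residuals" can be unpacked into two steps: regress the trait on covariates, then rank the residuals and map the ranks to standard normal quantiles. The sketch below assumes a single covariate fitted by ordinary least squares and uses Blom's offset (c = 3/8); both are illustrative assumptions, since the abstract does not name the covariates or the rank offset, and the sketch omits the linear mixed model and principal components used in the study. The helper names are hypothetical.

```python
from statistics import NormalDist, mean


def residuals_simple_ols(y, x):
    """Residuals of y after least-squares regression on a single covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transform with offset c (Blom: c = 3/8).

    Each value is replaced by the standard normal quantile of
    (rank - c) / (n - 2c + 1); tied values receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a block of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Toy values for illustration only (not study data)
tg = [1.2, 0.9, 2.5, 1.7, 1.1]
age = [45, 38, 60, 52, 41]
z = rank_inverse_normal(residuals_simple_ols(tg, age))
print([round(v, 3) for v in z])
```

The transform forces the adjusted trait toward a standard normal shape regardless of skew (triglycerides, for example, are typically right-skewed), which keeps extreme values from dominating the association tests; the cost is that effect sizes are then expressed on the transformed scale rather than in the original units.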

Genetics

Paleontology
