ResearchHub | Open Science Community

Genomewide Association Study of Leprosy

Furen Zhang et al. · Dec 18, 2009

The narrow host range of Mycobacterium leprae and the fact that it is refractory to growth in culture have limited research on, and the biologic understanding of, leprosy. Host genetic factors are thought to influence susceptibility to infection as well as disease progression.

Genetics

Epidemiology

Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus

Yu Xiong et al. · Oct 18, 2009

Genetics

Immunology

Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

Jie Huang et al. · Sep 14, 2015

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.

Genetics

Machine Learning

Genetic Structure of the Han Chinese Population Revealed by Genome-wide SNP Variation

Jieming Chen et al. · Nov 28, 2009

Genetics

Demography

UK BioCoin: Swift Trait-Specific Summary Statistics Regression for UK Biobank

Jing-Cheng He et al. · Apr 15, 2024

Abstract: Summary statistics derived from large-scale biobanks facilitate the sharing of genetic discoveries while minimizing the risk of compromising individual-level data privacy. However, these summary statistics, such as those from the UK Biobank (UKB) provided by Neale's lab, are often adjusted for a fixed set of covariates across all traits (12 covariates, including 10 PCs, sex, and age), which prevents the exploration of trait-specific summary statistics. In this study, we present a novel computational device, UK BioCoin (UKC), designed to provide an efficient framework for trait-specific covariate adjustment. Without requiring access to individual-level data from UKB, UKC leverages a summary statistics regression technique and resources from UKB (289 GB covering 199 phenotypes and 10 million SNPs) to generate GWAS summary statistics adjusted for user-specified covariates. Through a comprehensive analysis of height under trait-specific adjustments, we demonstrate that the GWAS summary statistics generated by UKC closely mirror those generated from individual-level UKB GWAS (ρ ≥ 0.99 for effect sizes and ρ ≥ 0.99 for p-values). Furthermore, we demonstrate results for GWAS, SNP-heritability estimation, polygenic scores, and Mendelian randomization after various trait-specific covariate adjustments allowed by UKC, indicating that UKC is a platform that enables in-depth exploration for researchers lacking access to UKB. The whole UKC framework is portable to other biobanks, as demonstrated with the Westlake Biobank, which can equivalently be converted to a "UKC-like" platform to promote data sharing. UKC's computational engine is fully optimized, and its computational efficiency is about 70 times that of UKB. We package UKC as a 20 GB Docker image (https://github.com/Ttttt47/UKBioCoin), which can be easily deployed on an average computer (e.g., a laptop).

One-sentence summary: We develop UK BioCoin (UKC), which allows fine-tuning of covariates for each UK Biobank trait but does not rely on UK Biobank individual-level data. It will change the current landscape of GWAS and reshape its downstream analyses.
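The idea behind regression on summary statistics can be illustrated with a minimal sketch. It is not the UKC implementation: the simulated data, the variable names, and the helper `ols_from_covariance` are all assumptions made for illustration. It relies only on the standard identity that OLS slopes equal the inverse predictor covariance times the predictor-outcome covariance, so covariates can be chosen per trait without touching individual-level records.

```python
import numpy as np

def ols_from_covariance(cov, predictor_idx, outcome_idx):
    """OLS slopes of the outcome on the chosen predictors, from second moments only.

    Hypothetical helper: solves S_xx * beta = S_xy using entries of `cov`.
    """
    s_xx = cov[np.ix_(predictor_idx, predictor_idx)]
    s_xy = cov[predictor_idx, outcome_idx]
    return np.linalg.solve(s_xx, s_xy)

# Simulated individual-level data (illustrative values, not from UKB).
rng = np.random.default_rng(1)
n = 5000
snp = rng.binomial(2, 0.3, n).astype(float)
age = rng.normal(50, 10, n)
sex = rng.binomial(1, 0.5, n).astype(float)
height = 0.5 * snp + 0.02 * age + 6.0 * sex + rng.normal(0, 3, n)

# The only quantity that would need to be shared: the covariance matrix.
data = np.column_stack([snp, age, sex, height])
cov = np.cov(data, rowvar=False)

# SNP effect adjusted for age and sex, computed from the summary alone.
beta_summary = ols_from_covariance(cov, [0, 1, 2], 3)

# A different, user-chosen adjustment (sex only) from the same summary.
beta_sex_only = ols_from_covariance(cov, [0, 2], 3)

# Reference: ordinary least squares on individual-level data with an intercept.
design = np.column_stack([np.ones(n), snp, age, sex])
beta_individual = np.linalg.lstsq(design, height, rcond=None)[0][1:]
```

In this sketch `beta_summary` matches `beta_individual` up to floating-point error, which is the property that makes covariate-specific re-adjustment possible from shared second moments.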

Genetics

Oncology

Searching across-cohort relatives in 54,092 GWAS samples via encrypted genotype regression

Xin Huang et al. · Oct 21, 2022

Abstract: Explicitly sharing individual-level data in genomics studies has many merits compared to sharing summary statistics, including stricter QC, common statistical analyses, relative identification, and improved statistical power in GWAS, but it is hampered by privacy or ethical constraints. In this study, we developed encG-reg, a regression approach that can detect relatives of various degrees based on encrypted genomic data, which is immune to ethical constraints. The encryption properties of encG-reg are based on random matrix theory: the original genotype matrix is masked without sacrificing the precision of individual-level genotype data. We established a connection between the dimension of the random matrix that masks the genotype matrices and the precision a study requires of the encrypted genotype data. encG-reg has false positive and false negative rates equivalent to those obtained by sharing original individual-level data, and is computationally efficient when searching for relatives. We split the UK Biobank into its respective centers and then encrypted the genotype data. We observed that the relatives estimated using encG-reg were as accurate as those estimated by KING, which is widely used software but requires original genotype data. In a more complex application, we launched a finely devised multi-center collaboration across 5 research institutes in China, covering 9 cohorts of 54,092 GWAS samples. encG-reg again identified true relatives across the cohorts, even with different ethnic backgrounds and genotypic qualities. Our study clearly demonstrates that encrypted genomic data can be used for data sharing without loss of information or data-sharing barriers.

Author Summary: Estimating pairwise genetic relatedness within a single cohort is straightforward. However, in practice, related samples are often distributed across different cohorts, making it challenging to estimate inter-cohort relatedness. In this study, we propose a method called encrypted genotype regression (encG-reg), which provides an unbiased estimation of inter-cohort relatedness using encrypted genotypes. The genotype matrix of each cohort is masked by a random matrix, which acts similarly to a private key in a cryptographic scheme. This masking process produces encrypted genotypes, which are a projection of the original genotype matrix. We derive the expectation and, in particular, the sampling variance for encG-reg, the latter involving the calculation of eighth-order moments. encG-reg allows us to accurately identify relatedness across cohorts, even for large-scale biobank data. To demonstrate the efficacy of encG-reg, we verified it in a multi-ethnicity UK Biobank dataset comprising 485,158 samples. For this case, we successfully detected relatedness down to the 1st degree (such as full sibs and parent-offspring). Furthermore, we used encG-reg in a collaboration involving 9 Chinese cohorts, encompassing a total of 54,092 samples from 5 genomic centers. It is worth noting that, if the number of effective markers is sufficient, encG-reg has the potential to detect even more distant degrees of relatedness beyond what we demonstrated.
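The masking step described above can be sketched roughly as follows. This is an illustrative toy, not the published encG-reg procedure: the simulated genotypes, the matrix sizes, the Gaussian choice for the random matrix, and the function names are all assumptions. Each cohort multiplies its standardized genotype matrix by the same random matrix S, and because the expectation of S Sᵀ is the identity when the entries of S have variance 1/k, the product of the two masked matrices is an unbiased estimate of the cross-cohort relatedness matrix.

```python
import numpy as np

def standardize(genotypes, freqs):
    """Scale 0/1/2 genotypes to mean 0, variance 1 using shared allele frequencies."""
    return (genotypes - 2.0 * freqs) / np.sqrt(2.0 * freqs * (1.0 - freqs))

def mask_genotypes(standardized, random_matrix):
    """Project standardized genotypes (n x m) onto k random directions (m x k)."""
    return standardized @ random_matrix

def estimate_relatedness(masked_a, masked_b, n_snps):
    """Cross-cohort relatedness estimated from masked genotypes only."""
    return masked_a @ masked_b.T / n_snps

rng = np.random.default_rng(0)
n_snps, k = 2000, 2000  # assumed sizes, chosen only to keep the example small

# Two simulated cohorts of 50 unrelated individuals each.
freqs = rng.uniform(0.1, 0.9, n_snps)
cohort_a = rng.binomial(2, freqs, size=(50, n_snps)).astype(float)
cohort_b = rng.binomial(2, freqs, size=(50, n_snps)).astype(float)
cohort_b[0] = cohort_a[0]  # plant the same individual in both cohorts

z_a = standardize(cohort_a, freqs)
z_b = standardize(cohort_b, freqs)

# Shared random matrix with entry variance 1/k, so that E[S @ S.T] = I.
S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(n_snps, k))

masked_estimate = estimate_relatedness(
    mask_genotypes(z_a, S), mask_genotypes(z_b, S), n_snps
)
plain_estimate = z_a @ z_b.T / n_snps  # what unmasked sharing would give
```

Here `masked_estimate[0, 0]` is close to 1 for the planted duplicate and the other entries scatter around 0. Increasing k reduces the extra noise introduced by masking, which mirrors the link the abstract draws between random-matrix dimension and precision.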

Genetics

Cancer Research

Early life lipid overload in Native American myopathy is phenocopied by stac3 knockout in zebrafish

Rajashekar Donaka et al. · Jul 29, 2023

Understanding the early stages of human congenital myopathies is critical for proposing strategies to improve skeletal muscle performance through the functional integrity of the cytoskeleton. SH3 and cysteine-rich domain 3 (Stac3) is a protein involved in nutrient sensing and is an essential component of the excitation-contraction (EC) coupling machinery for Ca2+ release. A mutation in STAC3 causes debilitating Native American myopathy (NAM) in humans, and loss of this gene in mice and zebrafish resulted in death in early life. Previously, NAM patients were shown to have increased lipids in skeletal muscle biopsies. However, whether elevated neutral lipids alter muscle function in NAM via the EC coupling apparatus during early development remains unknown. Here, using a CRISPR/Cas9-induced stac3 knockout (KO) zebrafish model, we determined that loss of stac3 led to muscle weakness, as evidenced by delayed larval hatching. We observed decreased whole-body Ca2+ levels at 5 days post-fertilization (dpf) and defects in the skeletal muscle cytoskeleton, i.e., F-actin and slow muscle fibers, at 5 and 7 dpf. Homozygous larvae exhibited elevated neutral lipid levels at 5 dpf, which persisted beyond 7 dpf. Myogenesis regulators such as myoD and myf5 were significantly altered in stac3-/- larvae at 5 dpf, and the KO larvae died progressively by 11 dpf. In summary, the presented findings suggest that stac3-/- zebrafish can serve as a non-mammalian model to identify lipid-lowering molecules for refining muscle function in NAM patients.

Genetics

Biochemistry

Early life lipid overload in Native American Myopathy is phenocopied by stac3 knockout in zebrafish

Rajashekar Donaka et al. · Nov 1, 2024

Genetics

Molecular Biology

Transcriptome Sequencing Reveals Widespread Gene-Gene and Gene-Environment Interactions

Alfonso Buil et al. · Oct 19, 2014

Understanding the genetic architecture of gene expression is an intermediate step toward understanding the genetic architecture of complex diseases. RNA-seq technologies have improved the quantification of gene expression and allow the measurement of allele-specific expression (ASE) [1-3]. ASE is hypothesized to result from the direct effect of cis regulatory variants, but a proper estimation of the causes of ASE has not been performed to date. In this study we take advantage of a sample of twins to measure the relative contribution of genetic and environmental effects on ASE, and we find substantial effects of gene x gene (GxG) and gene x environment (GxE) interactions. We propose a model where ASE requires genetic variability in cis, that is, a difference in the sequence of the two alleles, but the magnitude of the ASE effect depends on trans genetic and environmental factors that interact with the cis genetic variants. We uncover large GxG and GxE effects on gene expression, and likely on complex phenotypes, that currently remain elusive.

Genetics

Molecular Biology

Genomic analyses of 10,376 individuals provides comprehensive map of genetic variations, structure and reference haplotypes for Chinese population

Peikuan Cong et al. · Feb 8, 2021

Here, we initiated the Westlake BioBank for Chinese (WBBC) pilot project with 4,535 whole-genome sequencing individuals and 5,481 high-density genotyping individuals. We identified 80.99 million SNPs and INDELs, of which 38.6% are novel. The genetic evidence of Chinese population structure supported the corresponding geographical boundaries of the Qinling-Huaihe Line and the Nanling Mountains. The genetic architecture within North Han was more homogeneous than that within South Han, and the history of the effective population size of Lingnan began to deviate from that of the other three regions 6 thousand years ago. In addition, we identified a novel locus (SNX29) under selection pressure and confirmed several loci associated with alcohol metabolism and histocompatibility systems. We observed significant selection of genes involved in epidermal cell differentiation and skin development only in southern Chinese. Finally, we provided an online imputation server (https://wbbc.westlake.edu.cn/), which could yield higher imputation accuracy than existing panels, especially for lower-frequency variants.

Genetics

Demography
