ResearchHub | Open Science Community

Automated assembly of high-quality diploid human reference genomes

Erich Jarvis et al., Mar 6, 2022

Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has greatly benefited society [1, 2]. However, it still has many gaps and errors, and does not represent a biological human genome since it is a blend of multiple individuals [3, 4]. Recently, a high-quality telomere-to-telomere reference genome, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a duplicate genome, and is thus nearly homozygous [5]. To address these limitations, the Human Pangenome Reference Consortium (HPRC) recently formed with the goal of creating a collection of high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and automated assembly approaches yields the most complete, accurate, and cost-effective diploid genome assemblies with minimal manual curation. Approaches that used highly accurate long reads and parent-child data to sort haplotypes during assembly outperformed those that did not. Developing a combination of all the top-performing methods, we generated our first high-quality diploid reference assembly, containing only ∼4 gaps (range 0-12) per chromosome, most within ±1% of CHM13’s length. Nearly 1/4th of protein-coding genes have synonymous amino acid changes between haplotypes, and centromeric regions showed the highest density of variation. Our findings serve as a foundation for assembling near-complete diploid human genomes at the scale required for constructing a human pangenome reference that captures all genetic variation from single nucleotides to large structural rearrangements.

Genetics

Molecular Biology

Genome-wide prediction and integrative functional characterization of Alzheimer’s disease-associated genes

Cuixiang Lin et al., Feb 10, 2021

Abstract: The mechanism of Alzheimer’s disease (AD) remains elusive, partly due to the incomplete identification of risk genes. We developed an approach to predict AD-associated genes by learning the functional pattern of curated AD-associated genes from brain gene networks. We created a pipeline to evaluate disease-gene association by interrogating heterogeneous biological networks at different molecular levels. Our analysis showed that top-ranked genes were functionally related to AD. We identified gene modules associated with AD pathways, and found that top-ranked genes were correlated with both neuropathological and clinical phenotypes of AD on independent datasets. We also identified potential causal variants for genes such as FYN and PRKAR1A by integrating brain eQTL and ATAC-seq data. Lastly, we created the ALZLINK web interface, enabling users to exploit the functional relevance of predicted genes to AD. The predictions and pipeline could become a valuable resource to advance the identification of therapeutic targets for AD.

Genetics

Philosophy

Genome-wide Detection of Cytosine Methylations in Plant from Nanopore sequencing data using Deep Learning

Peng Ni et al., Feb 8, 2021

Abstract: Methylation states of DNA bases can be detected directly from native Nanopore reads. At present, many computational methods can accurately detect 5mCs in CpG contexts from Nanopore sequencing. However, methods to detect 5mCs in non-CpG contexts are currently lacking. In this study, we propose a computational pipeline that can detect 5mC sites in both CpG and non-CpG contexts of plant genomes using Nanopore sequencing. We sequenced two model plants, Arabidopsis thaliana (A. thaliana) and Oryza sativa (O. sativa), using Nanopore sequencing and bisulfite sequencing. The results of our proposed pipeline in the two plants achieved high correlations with bisulfite sequencing: above 0.98, 0.96, and 0.85 for CpG, CHG, and CHH (H indicates A, C, or T) motifs, respectively. Our proposed pipeline also achieved high performance on Brassica nigra (B. nigra). Experiments also showed that the pipeline can achieve high performance even with low read coverage. Moreover, by using Nanopore sequencing, the pipeline is capable of profiling methylation of more cytosines than bisulfite sequencing.

Genetics

Molecular Biology
