ResearchHub | Open Science Community

An atlas of human long non-coding RNAs with accurate 5′ ends

Chung-Chau Hon et al., Feb 28, 2017

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5′ ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

A catalogue of human long non-coding RNA genes and their expression profiles across samples from major human primary cell types, tissues and cell lines. Alistair Forrest, Piero Carninci and colleagues of the FANTOM Consortium provide a catalogue of human long non-coding RNA (lncRNA) genes and their expression profiles across samples from human primary cell types, tissues and cell lines. They used combined analyses of multiple data sets to identify 27,919 lncRNA genes with high-confidence 5′ ends, as well as a subset of 19,175 potentially functional lncRNA loci. The lncRNA catalogue and annotations are available through an open web resource.

Genetics

Molecular Biology

An integrated expression atlas of miRNAs and their promoters in human and mouse

Derek Rie et al., Aug 21, 2017

An atlas of microRNA expression patterns and regulators is produced by deep sequencing of short RNAs in human and mouse cells.

MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.

Genetics

Molecular Biology

Phylogeny-Based Evolutionary, Demographical, and Geographical Dissection of North American Type 2 Porcine Reproductive and Respiratory Syndrome Viruses

Mǎng Shī et al., Jun 17, 2010

Type 2 (or North American-like) porcine reproductive and respiratory syndrome virus (PRRSV) was first recorded in 1987 in the United States and now occurs in most commercial swine industries throughout the world. In this study, we investigated the epidemiological and evolutionary behaviors of type 2 PRRSV. Based on phylogenetic analyses of 8,624 ORF5 sequences, we described a comprehensive picture of the diversity of type 2 PRRSVs and systematically classified all available sequences into lineages and sublineages, including a number of previously undescribed lineages. With the rapid growth of sequence deposition into the databases, it would be technically difficult for veterinary researchers to genotype their sequences by reanalyzing all sequences in the databases. To this end, a set of reference sequences was established based on our classification system, which represents the principal diversity of all available sequences and can readily be used for further genotyping studies. In addition, we further investigated the demographic histories of these lineages and sublineages by using Bayesian coalescence analyses, providing evolutionary insights into several important epidemiological events of type 2 PRRSV. Moreover, by using a phylogeographic approach, we were able to estimate the transmission frequencies between the pig-producing states in the United States and identified several states as the major sources of viral spread, i.e., “transmission centers.” In summary, this study represents the most extensive phylogenetic analyses of type 2 PRRSV to date, providing a basis for future genotyping studies and dissecting the epidemiology of type 2 PRRSV from phylogenetic perspectives.

Genetics

Plant Science

Analysis of the Genome and Transcriptome of Cryptococcus neoformans var. grubii Reveals Complex RNA Expression and Microevolution Leading to Virulence Attenuation

Guilhem Janbon et al., Apr 17, 2014

Cryptococcus neoformans is a pathogenic basidiomycetous yeast responsible for more than 600,000 deaths each year. It occurs as two serotypes (A and D) representing two varieties (i.e. grubii and neoformans, respectively). Here, we sequenced the genome and performed an RNA-Seq-based analysis of the C. neoformans var. grubii transcriptome structure. We determined the chromosomal locations, analyzed the sequence/structural features of the centromeres, and identified origins of replication. The genome was annotated based on automated and manual curation. More than 40,000 introns populating more than 99% of the expressed genes were identified. Although most of these introns are located in the coding DNA sequences (CDS), over 2,000 introns in the untranslated regions (UTRs) were also identified. Poly(A)-containing reads were employed to locate the polyadenylation sites of more than 80% of the genes. Examination of the sequences around these sites revealed a new poly(A)-site-associated motif (AUGHAH). In addition, 1,197 miscRNAs were identified. These miscRNAs can be spliced and/or polyadenylated, but do not appear to have obvious coding capacities. Finally, this genome sequence enabled a comparative analysis of strain H99 variants obtained after laboratory passage. The spectrum of mutations identified provides insights into the genetics underlying the micro-evolution of a laboratory strain, and identifies mutations involved in stress responses, mating efficiency, and virulence.

Genetics

Epidemiology

Update of the FANTOM web resource: expansion to provide additional transcriptome atlases

Marina Lizio et al., Oct 20, 2018

The FANTOM web resource (http://fantom.gsc.riken.jp/) was developed to provide easy access to the data produced by the FANTOM project. It contains the most complete and comprehensive sets of actively transcribed enhancers and promoters in the human and mouse genomes. We determined the transcription activities of these regulatory elements by CAGE (Cap Analysis of Gene Expression) for both steady and dynamic cellular states in all major and some rare cell types, consecutive stages of differentiation and responses to stimuli. We have expanded the resource by employing different assays, such as RNA-seq, short RNA-seq and a paired-end protocol for CAGE (CAGEscan), to provide new angles to study the transcriptome. That yielded additional atlases of long noncoding RNAs, miRNAs and their promoters. We have also expanded the CAGE analysis to cover rat, dog, chicken, and macaque species for a limited number of cell types. The CAGE data obtained from human and mouse were reprocessed to make them available on the latest genome assemblies. Here, we report the recent updates of both data and interfaces in the FANTOM web resource.

Genetics

Molecular Biology

Systematic identification of cis-interacting lncRNAs and their targets

Saumya Agrawal et al., Jan 14, 2021

The human genome is pervasively transcribed and produces a wide variety of long non-coding RNAs (lncRNAs), constituting the majority of transcripts across human cell types. Studying lncRNAs is challenging due to their low expression level, cell type-specific occurrence, poor sequence conservation between orthologs, and lack of information about RNA domains. LncRNAs direct regulatory factors to locations that are in cis to their transcription sites. We designed a model to predict whether an lncRNA acts in cis based on its features and trained it using RNA-chromatin interaction data. The trained model is cell type-independent and does not require RNA-chromatin data. Combining RNA-chromatin and Hi-C data, we showed that lncRNA-chromatin binding sites are determined by chromosome conformation. For each lncRNA, the spatially proximal genes were identified as its potential targets by combining Hi-C and Cap Analysis Gene Expression (CAGE) data in 18 human cell types. RNA-protein and RNA-chromatin interaction data suggested that lncRNAs act as scaffolds to recruit regulatory proteins to target promoters and enhancers. We provide the data through an interactive visualization web portal at https://fantom.gsc.riken.jp/zenbu/reports/#F6_3D_lncRNA.

Genetics

Philosophy

Recombination of repeat elements generates somatic complexity in human genomes

Giovanni Pascarella et al., Jul 2, 2020

Millions of Alu and L1 copies in our genomes contribute to evolution and genetic disorders via non-allelic homologous recombination, but the somatic extent of these rearrangements has not been systematically investigated. Here we combine short- and long-read DNA sequencing of repeat elements with a new bioinformatic pipeline to show that somatic recombination of Alu and L1 elements is common in human genomes. We report new tissue-specific recombination hallmarks, and show that retroelements acting as recombination hotspots are enriched in centromeres and cancer genes. We compare recombination profiles in human induced pluripotent stem cells and differentiated neurons and show that neuron-specific recombination of repeat elements accompanies chromatin changes during cell-fate determination. Finally, we find that somatic recombination profiles are altered in Parkinson’s and Alzheimer’s disease, indicating a link between retroelement recombination and genomic instability in neurodegeneration. This work shows that somatic recombination of repeat elements contributes massively to genomic diversity in health and disease.

Genetics

Molecular Biology

Profiling of transcribed cis-regulatory elements in single cells

Jonathan Moody et al., Apr 4, 2021

Profiling of cis-regulatory elements (CREs, mostly promoters and enhancers) in single cells allows the interrogation of the cell-type and cell-state-specific contexts of gene regulation and genetic predisposition to diseases. Here we demonstrate that single-cell RNA-5′end-sequencing (sc-end5-seq) methods can detect transcribed CREs (tCREs), enabling simultaneous quantification of gene expression and enhancer activities in a single assay at no extra cost. We showed that enhancer RNAs can be detected using sc-end5-seq methods with either random or oligo(dT) priming. To analyze tCREs in single cells, we developed SCAFE (Single Cell Analysis of Five-prime Ends) to identify genuine tCREs and analyze their activities (https://github.com/chung-lab/scafe). Compared with accessible CREs (aCREs, based on chromatin accessibility), tCREs are more accurate in predicting CRE interactions by co-activity, more sensitive in detecting shifts in alternative promoter usage and more enriched in disease heritability. Our results highlight additional dimensions within sc-end5-seq data which can be used for interrogating gene regulation and disease heritability.

Genetics

Immunology

Single-cell transcriptomics, scRNA-Seq and C1 CAGE discovered distinct phases of pluripotency during naïve-to-primed conversion in mice

Michael Böttcher et al., Sep 25, 2020

Background: Two types of mammalian pluripotent stem cells (PSC), i.e. naïve and primed, possess distinct cellular characteristics. It is largely unknown how these differences are generated during the naïve-to-primed transition process. We have established a robust in vitro transition system using a Wnt inhibitor for the first time and analyzed dynamic changes in cellular status via single-cell RNA-sequencing and C1 CAGE analyses.

Results: Analysis of known marker genes suggested that the cell transition process progresses as expected. However, cluster analyses revealed a sudden increase in expression profile diversities three and four days after induction of the transition. These expression diversities can be reconciled by the presence of two subpopulations with distinct transcription profiles emerging at these time points. One of the subpopulations appears transiently, and surprisingly these cells showed a global downregulation of gene expression. Moreover, initiation of random X chromosome inactivation (XCI) coincides with the appearance of these transient cells. The other subpopulation can be maintained as a stem cell line and possesses expression profiles more similar to those of primed epiblast stem cells (EpiSC) than embryonic stem cells (ESC). However, there are important differences in gene expression related to epithelial-mesenchymal transition (EMT), suggesting that this subpopulation may represent a novel pluripotent state that has an intermediate cellular phenotype between ESC and EpiSC.

Conclusions: These findings should contribute to our understanding of the establishment and maintenance of distinct differentiation statuses of mammalian PSCs and provide new insights into the pluripotency spectrum in general.

Genetics

Molecular Biology

Single-cell analysis of human diversity in circulating immune cells

Kian Kock et al., Jul 1, 2024

Lack of diversity and proportionate representation in genomics datasets and databases contributes to inequity in healthcare outcomes globally. The relationships of human diversity with biological and biomedical phenotypes are pervasive, yet remain understudied, particularly in a single-cell genomics context. Here we present the Asian Immune Diversity Atlas (AIDA), a multi-national single-cell RNA-sequencing (scRNA-seq) healthy reference atlas of human immune cells. AIDA comprises 1,265,624 circulating immune cells from 619 healthy donors and 6 controls, spanning 7 population groups across 5 countries. AIDA is one of the largest healthy blood datasets in terms of number of cells, and also the most diverse in terms of number of population groups. Though population groups are frequently compared at the continental level, we identified a pervasive impact of sub-continental diversity on cellular and molecular properties of immune cells. These included cell populations and genes implicated in disease risk and pathogenesis as well as those relevant for diagnostics. We detected single-cell signatures of human diversity not apparent at the level of cell types, as well as modulation of the effects of age and sex by self-reported ethnicity. We discovered functional genetic variants influencing cell type-specific gene expression, including context-dependent effects, which were under-represented in analyses of non-Asian population groups, and which helped contextualise disease-associated variants. We validated our findings using multiple independent datasets and cohorts. AIDA provides fundamental insights into the relationships of human diversity with immune cell phenotypes, enables analyses of multi-ancestry disease datasets, and facilitates the development of precision medicine efforts in Asia and beyond.

Genetics

Immunology
