ResearchHub | Open Science Community

CNSA: a data repository for archiving omics data

Xueqin Guo et al., Jan 1, 2020

Abstract: With the application and development of high-throughput sequencing technology in the life and health sciences, massive multi-omics data pose the problem of efficient management and utilization. Database development and biocuration are prerequisites for the reuse of these big data. Here, relying on the China National GeneBank (CNGB), we present the CNGB Sequence Archive (CNSA) for archiving omics data, including raw sequencing data and further analyzed results, which are currently organized into six objects: Project, Sample, Experiment, Run, Assembly and Variation. Moreover, CNSA has created a correlation model of living samples, sample information and analytical data for some projects. Both living samples and analytical data are directly correlated with the sample information. From any one of the three, information or data for the other two can be obtained, so that all data can be traced throughout the life cycle from the living sample to the sample information to the analytical data. Complying with the data standards commonly used in the life sciences, CNSA is committed to building a comprehensive and curated data repository for storing, managing and sharing omics data. We will continue to improve the data standards and provide free access to open-data resources for worldwide scientific communities to support academic research and the bio-industry. Database URL: https://db.cngb.org/cnsa/

Molecular Biology

Computer Science

STOmicsDB: a database of Spatial Transcriptomic data

Zhicheng Xu et al., Mar 14, 2022

Abstract: Recent technological developments in spatial transcriptomics allow researchers to measure the gene expression of cells together with their spatial locations at near single-cell resolution, which yields detailed insight into biological processes. However, specialized spatial transcriptomics databases are rare. Here, we present the Spatial TranscriptOmics DataBase (STOmicsDB), a user-friendly database with multiple functions, including search of relevant publications and tools, public dataset visualization, customized specialized databases, new data archiving, and online analysis. The current version of STOmicsDB consists of 141 curated spatial transcriptomic datasets covering 12 species, and includes 5,618 spatial multi-omics publications and 674 tools. STOmicsDB is freely accessible at https://db.cngb.org/stomics/

Biochemistry

Immunology

VirusDIP: Virus Data Integration Platform

Lina Wang et al., Jun 9, 2020

Abstract

Motivation: The Coronavirus Disease 2019 (COVID-19) pandemic poses a huge threat to human public health. Viral sequence data play an important role in the scientific prevention and control of epidemics. A comprehensive virus database would be vitally useful for virus data retrieval and in-depth analysis. To promote the sharing of virus data, several virus databases and related analysis tools have been created.

Results: To facilitate virus research and promote the global sharing of virus data, we present VirusDIP, a one-stop service platform for the archiving, integration, access and analysis of virus data. It accepts submissions of viral sequence data from all over the world and currently integrates data resources from the National GeneBank Database (CNGBdb), the Global Initiative on Sharing All Influenza Data (GISAID), and the National Center for Biotechnology Information (NCBI). Moreover, based on these comprehensive data resources, the BLAST sequence alignment tool and multi-party security computing tools are deployed for multi-sequence alignment, phylogenetic tree building and global trusted sharing. VirusDIP is gradually establishing cooperation with more databases and paving the way for the analysis of virus origin and evolution. All public data in VirusDIP are freely available to researchers worldwide.

Availability: https://db.cngb.org/virus/

Contact: weixiaofeng@cngb.org

Ecology

Epidemiology

CNSA: a data repository for archiving omics data

Xueqin Guo et al., Apr 9, 2020

Abstract: With the application and development of high-throughput sequencing technology in the life and health sciences, massive multi-dimensional biological data pose the problem of efficient management and utilization. Database development and biocuration are prerequisites for the reuse of these big data. Here, relying on the China National GeneBank (CNGB), we present the CNGB Sequence Archive (CNSA) for archiving omics data, including raw sequencing data, its analytical data and related metadata, which are currently organized into six objects: Project, Sample, Experiment, Run, Assembly, and Variation. Moreover, CNSA has created a correlation model of living samples, sample information, and analytical data for some projects, so that all data can be traced throughout the life cycle from the living sample to the sample information to the analytical data. Complying with the data standards commonly used in the life sciences, CNSA is committed to building a comprehensive and curated data repository for the storage, management and sharing of omics data, improving the data standards, and providing free access to open data resources for worldwide scientific communities to support academic research and the bio-industry. Database URL: https://db.cngb.org/cnsa/

Genetics

Molecular Biology

CDCP: a visualization and analyzing platform for single-cell datasets

Yuejiao Li et al., Aug 25, 2021

Abstract: Advances in single-cell sequencing technology provide a unique approach to characterizing heterogeneity and distinctive functional states at single-cell resolution, leading to the rapid accumulation of large-scale single-cell datasets. A major challenge facing the research community, especially bench scientists, is how to simplify the retrieval, processing and analysis of this huge number of datasets. To this end, we developed the Cell-omics Data Coordinate Platform (CDCP), a platform that aims to share and integrate comprehensive single-cell datasets and to provide a network analysis toolkit for personalized analysis. CDCP contains single-cell RNA-seq and ATAC-seq datasets of 474,572 cells from 6,459 samples in species covering humans, non-human primate models and other animals. It allows querying and visualization of datasets of interest and of the expression profiles of distinct genes in different cell clusters and cell types. In addition, the platform provides an analysis pipeline that lets experimental scientists without bioinformatics training address questions not pursued by the submitters of the datasets. In summary, CDCP provides a user-friendly interface for researchers to explore, visualize, analyze, download and submit published single-cell datasets, and it will be a valuable resource for investigators exploring global transcriptome profiles at the single-cell level.

Genetics

Molecular Biology

Cell transcriptomic atlas of the non-human primate Macaca fascicularis

Lei Han et al., Dec 13, 2021

Studying tissue composition and function in non-human primates (NHP) is crucial to understanding the nature of our own species. Here, we present a large-scale single-cell and single-nucleus transcriptomic atlas encompassing over one million cells from 43 tissues of the adult NHP Macaca fascicularis. This dataset provides a vast, carefully annotated resource for studying a species phylogenetically close to humans. As proof of principle, we have reconstructed the cell-cell interaction networks driving Wnt signalling across the body, mapped the distribution of receptors and co-receptors for viruses causing human infectious diseases, and intersected our data with human genetic disease orthologous coordinates to identify both expected and unexpected associations. Our Macaca fascicularis cell atlas constitutes an essential reference for future single-cell studies in humans and NHP.

Genetics

Immunology

Screening of cell-virus, cell-cell, gene-gene cross-talks among kingdoms of life at single cell resolution

Dongsheng Chen et al., Aug 13, 2021

Abstract: The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a significant and urgent threat to global health. The exact animal origin of SARS-CoV-2 remains obscure, and understanding its host range is vital for preventing interspecies transmission. Previously, we assessed the target cell profiles of SARS-CoV-2 in pets, livestock, poultry and wild animals. Herein, we expand this investigation to a wider range of animal species and viruses to provide a comprehensive resource for large-scale screening of potential virus hosts. Single-cell atlases for several mammalian species (alpaca, hamster, hedgehog, chinchilla etc.), as well as comparative atlases of lung, brain and peripheral blood mononuclear cells (PBMC) for various animal lineages, were constructed, from which we systematically analyzed the virus entry factors for 113 viruses across more than 20 species of mammals, birds, reptiles, amphibians and invertebrates. Conserved cellular connectomes and regulomes were also identified, revealing the fundamental cell-cell and gene-gene cross-talks between these species. Overall, our study could help identify the potential host range and tissue tropism of SARS-CoV-2 and a diverse set of other viruses, and reveal the footprints of host-virus co-evolution.

Genetics

Molecular Biology

Prioritizing drug targets in systemic lupus erythematosus from a genetic perspective: a druggable genome-wide Mendelian randomization study

Yuan Gao et al., Jul 13, 2024

Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease with an unsatisfactory state of treatment. We aim to explore novel targets for SLE from a genetic standpoint.

Genetics

Immunology

Stereopy: modeling comparative and spatiotemporal cellular heterogeneity via multi-sample spatial transcriptomics

Shuangsang Fang et al., Jan 1, 2023

Tracing cellular dynamic changes across conditions, time, and space is crucial for understanding the molecular mechanisms underlying complex biological systems. However, integrating multi-sample data in a unified and flexible way to explore cellular heterogeneity remains a major challenge. Here, we present Stereopy, a flexible and versatile framework for modeling and dissecting comparative and spatiotemporal patterns in multi-sample spatial transcriptomics with interactive data visualization. To optimize this flexible framework, we have developed three key components: a multi-sample tailored data container, a scope controller, and an analysis transformer. Furthermore, Stereopy showcases three transformative applications supported by pivotal algorithms. Firstly, the multi-sample cell community detection (CCD) algorithm introduces an innovative capability to detect specific cell communities and identify genes responsible for pathological changes in comparable datasets. Secondly, the spatially resolved temporal gene pattern inference (TGPI) algorithm represents a notable advancement in detecting important spatiotemporal gene patterns while concurrently considering spatial and temporal features, which enhances the identification of important genes, domains and regulatory factors closely associated with temporal datasets. Finally, the 3D niche-based regulation inference tool, named NicheReg3D, reconstructs the 3D cell niches to enable the inference of cell-gene interaction network within the spatial texture, thus bridging intercellular communications and intracellular regulations to unravel the intricate regulatory mechanisms that govern cellular behavior. Overall, Stereopy serves as both a bioinformatics toolbox and an extensible framework that provides researchers with enhanced data interpretation abilities and new perspectives for mining multi-sample spatial transcriptomics data.

Artificial Intelligence

Biochemistry
