ResearchHub | Open Science Community

Library size confounds biology in spatial transcriptomics data

Dharmesh Bhuva et al., Mar 15, 2023

Abstract Spatial molecular technologies have revolutionised the study of disease microenvironments by providing spatial context to tissue heterogeneity. Recent spatial technologies are increasing the throughput and spatial resolution of measurements, resulting in larger datasets. The added spatial dimension and volume of measurements pose an analytics challenge that has, in the short term, been addressed by adopting methods designed for the analysis of single-cell RNA-seq data. Though these methods work well in some cases, not all of them translate appropriately to spatial technologies. A common assumption is that total sequencing depth, also known as library size, represents technical variation in single-cell RNA-seq technologies, and this is often normalised out during analysis. Through analysis of several different spatial datasets, we noted that this assumption does not necessarily hold in spatial molecular data. To formally assess this, we explore the relationship between library size and independently annotated spatial regions, across 23 samples from 4 different spatial technologies with varying throughput and spatial resolution. We found that library size confounded biology across all technologies, regardless of the tissue being investigated. Statistical modelling of binned total transcripts shows that tissue region is strongly associated with library size across all technologies, even after accounting for cell density of the bins. Through a benchmarking experiment, we show that normalising out library size leads to sub-optimal spatial domain identification using common graph-based clustering algorithms. On average, better clustering was achieved when library size effects were not normalised out explicitly, especially with data from the newer sub-cellular localised technologies. Taking these results into consideration, we recommend that spatial data should not be specifically corrected for library size prior to analysis unless strongly motivated. We also emphasise that spatial data are different to single-cell RNA-seq and care should be taken when adopting algorithms designed for single-cell data.
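To make the library-size argument concrete, the following is a minimal sketch, not the authors' analysis. It assumes a toy spots-by-genes count matrix with made-up values and hypothetical region labels, and uses simple proportion scaling as a stand-in for library-size normalisation. The helper names are illustrative only.

```python
from statistics import mean

def library_sizes(counts):
    """Total transcripts per spot (row sums of a spots x genes count matrix)."""
    return [sum(row) for row in counts]

def normalise_by_library_size(counts, scale=1.0):
    """Divide each spot's counts by its library size, then multiply by `scale`."""
    normalised = []
    for row in counts:
        total = sum(row)
        normalised.append([scale * c / total if total else 0.0 for c in row])
    return normalised

def mean_by_region(values, regions):
    """Average a per-spot value within each annotated region."""
    grouped = {}
    for value, region in zip(values, regions):
        grouped.setdefault(region, []).append(value)
    return {region: mean(vals) for region, vals in grouped.items()}

# Toy data (invented for illustration): 4 spots x 2 genes, two regions.
counts = [[80, 20], [120, 30], [8, 2], [12, 3]]
regions = ["tumour", "tumour", "stroma", "stroma"]

lib = library_sizes(counts)
lib_by_region = mean_by_region(lib, regions)

# Compare the first gene between regions before and after normalisation.
norm = normalise_by_library_size(counts)
gene0_raw = mean_by_region([row[0] for row in counts], regions)
gene0_norm = mean_by_region([row[0] for row in norm], regions)

print("mean library size per region:", lib_by_region)
print("gene 0, raw counts:", gene0_raw)
print("gene 0, after normalisation:", gene0_norm)
```

In this toy case library size differs tenfold between regions, so dividing it out makes the first gene look identical in both regions. That is the kind of region-associated signal the abstract cautions against removing by default.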

Artificial Intelligence

Paleontology

vissE: A versatile tool to identify and visualise higher-order molecular phenotypes from functional enrichment analysis

Dharmesh Bhuva et al., Mar 7, 2022

Abstract Functional analysis of high throughput experiments using pathway analysis is now ubiquitous. Though powerful, these methods often produce thousands of redundant results owing to knowledgebase redundancies upstream. This scale of results hinders extensive exploration by biologists and often leads to investigator biases due to previous knowledge and expectations. To address this issue, we present vissE, a flexible network-based analysis method that summarises redundancies into biological themes and provides various analytical modules to characterise and visualise them with respect to the underlying data, thus providing a comprehensive view of the biological system. We demonstrate vissE’s versatility by applying it to three different technologies: bulk, single-cell and spatial transcriptomics. Applying vissE to a factor analysis of a breast cancer spatial transcriptomic dataset, we identified stromal phenotypes that support tumour dissemination. Its adaptability allows vissE to enhance all existing gene-set enrichment and pathway analysis workflows, removing investigator bias from molecular discovery.
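The idea of collapsing redundant gene sets into themes through a network can be sketched as follows. This is an illustrative simplification and not vissE's implementation or API. It assumes Jaccard similarity as the overlap measure, a hypothetical threshold of 0.5, and connected components as the grouping step. The gene-set and gene names are placeholders.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two collections of gene identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_gene_sets(gene_sets, threshold=0.5):
    """Link gene sets whose overlap meets `threshold`, then return the
    connected components of that network as candidate themes."""
    names = sorted(gene_sets)
    neighbours = {name: set() for name in names}
    for x, y in combinations(names, 2):
        if jaccard(gene_sets[x], gene_sets[y]) >= threshold:
            neighbours[x].add(y)
            neighbours[y].add(x)

    seen, groups = set(), []
    for name in names:
        if name in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [name], []
        seen.add(name)
        while stack:
            current = stack.pop()
            component.append(current)
            for nb in neighbours[current]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(component))
    return groups

# Placeholder gene sets (invented): two redundant pairs.
gene_sets = {
    "SET_A": {"g1", "g2", "g3", "g4"},
    "SET_B": {"g1", "g2", "g3", "g5"},
    "SET_C": {"g7", "g8"},
    "SET_D": {"g7", "g8", "g9"},
}
themes = group_gene_sets(gene_sets)
print(themes)
```

Connected components are the simplest possible grouping. A graph clustering algorithm would split loosely connected groups more finely, at the cost of extra parameters.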

Ecology

Molecular Biology

CLARA: A web portal for interactive exploration of the cardiovascular cellular landscape in health and disease

Malathi Dona et al., Jul 19, 2021

Abstract Mammalian cardiovascular tissues comprise complex and diverse collections of cells. Recent advances in single-cell profiling technologies have accelerated our understanding of tissue cellularity and the molecular networks that orchestrate cardiovascular development, maintain homeostasis, and are disrupted in pathological states. Despite the rapid development and application of these technologies, many cardiac single-cell functional genomics datasets remain inaccessible to most cardiovascular biologists. Access to custom visual representations of the data, including querying changes in cellular phenotypes and interactions in diverse contexts, remains unavailable in publicly accessible data portals. Visualizing data is also challenging for scientists without expertise in processing single-cell genomic data. Here we present CLARA (CardiovascuLAR Atlas), a web portal facilitating exploration of the cardiovascular cellular landscape. Using mouse and human single-cell transcriptomic datasets, CLARA enables scientists unfamiliar with single-cell omics data analysis approaches to examine gene expression patterns and the cell population dynamics of cardiac cells in a range of contexts. The web application also enables investigation of intercellular interactions that form the cardiac cellular niche. CLARA is designed for ease of use and we anticipate that the portal will aid deeper exploration of cardiovascular cellular landscapes in the context of development, homeostasis and disease. CLARA is freely available at https://clara.baker.edu.au.

Genetics

Biophysics

hoodscanR: profiling single-cell neighborhoods in spatial transcriptomics data

Ning Liu et al., Mar 29, 2024

Abstract Understanding complex cellular niches and neighborhoods is giving us new insights into tissue biology. Accurate neighborhood identification is crucial, yet existing methodologies often struggle to detect mixed neighborhoods and generate cell-specific neighborhood profiles. To address these limitations, we introduce hoodscanR, a Bioconductor package designed for neighborhood identification and downstream analyses using spatial data. Applying hoodscanR to breast and lung cancer datasets, we showcase its efficacy in conducting detailed neighborhood analyses and identify subtle transcriptional changes in tumor cells from different neighborhoods. Such analyses can help researchers gain valuable insights into disease mechanisms and potential therapeutic targets.
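A cell-specific neighborhood profile can be illustrated with a small sketch. This is a simplified stand-in and not hoodscanR's algorithm or API (hoodscanR is an R/Bioconductor package). It assumes that a profile is the fraction of each cell type among a cell's k nearest neighbors. The coordinates, labels and function name are invented for illustration.

```python
from collections import Counter
from math import dist

def neighborhood_profiles(coords, cell_types, k=2):
    """For each cell, return the fraction of each cell type among its
    k nearest neighbors (Euclidean distance, the cell itself excluded)."""
    if len(coords) < 2:
        raise ValueError("need at least two cells to define neighbors")
    all_types = sorted(set(cell_types))
    profiles = []
    for i, point in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        # sorted() is stable, so ties keep their original index order.
        nearest = sorted(others, key=lambda j: dist(point, coords[j]))[:k]
        counts = Counter(cell_types[j] for j in nearest)
        profiles.append({t: counts[t] / len(nearest) for t in all_types})
    return profiles

# Toy tissue (invented): five cells on a line, tumor on the left, stroma on the right.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
cell_types = ["tumor", "tumor", "tumor", "stroma", "stroma"]

profiles = neighborhood_profiles(coords, cell_types, k=2)
for cell_type, profile in zip(cell_types, profiles):
    print(cell_type, profile)
```

The tumor cell at the boundary gets a half tumor, half stroma profile, while the leftmost tumor cell gets a pure tumor profile. Keeping profiles as fractions rather than a single label is what lets mixed neighborhoods show up.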

Molecular Biology

Computer Science

Identification of cell types, states and programs by learning gene set representations

Soroor Hediyeh‐Zadeh et al., Jan 1, 2023

As single-cell molecular data expand, there is an increasing need for algorithms that efficiently query and prioritize gene programs, cell types and states in single-cell sequencing data, particularly in cell atlases. Here we present scDECAF, a statistical learning algorithm to identify cell types, states and programs in single-cell gene expression data using vector representation of gene sets, which improves biological interpretation by selecting a subset of the most biologically relevant programs. We applied scDECAF to scRNA-seq data from PBMC, Lung, Pancreas and Brain, and to Slide-tags snRNA-seq data of human prefrontal cortex, for automatic cell type annotation. We demonstrate that scDECAF can recover perturbed gene programs in Lupus PBMC cells stimulated with IFN-beta and in TGF-beta-induced cells undergoing epithelial-to-mesenchymal transition. scDECAF delineates patient-specific heterogeneity in cellular programs in Ovarian Cancer data. Using a healthy PBMC reference, we apply scDECAF to a mapped query PBMC COVID-19 case-control dataset and identify multicellular programs associated with severe COVID-19. scDECAF can improve biological interpretation and complement reference mapping analysis, and provides a method for gene set and pathway analysis in single-cell gene expression data.

Genetics

Biophysics
