ResearchHub | Open Science Community

Cross-tissue immune cell analysis reveals tissue-specific features in humans

Cecilia Conde et al., May 12, 2022

Despite their crucial role in health and disease, our knowledge of immune cells within human tissues remains limited. We surveyed the immune compartment of 16 tissues from 12 adult donors by single-cell RNA sequencing and VDJ sequencing, generating a dataset of ~360,000 cells. To systematically resolve immune cell heterogeneity across tissues, we developed CellTypist, a machine learning tool for rapid and precise cell type annotation. Using this approach, combined with detailed curation, we determined the tissue distribution of finely phenotyped immune cell types, revealing hitherto unappreciated tissue-specific features and clonal architecture of T and B cells. Our multitissue approach lays the foundation for identifying highly resolved immune cell types by leveraging a common reference dataset, tissue-integrated expression analysis, and antigen receptor sequencing.

Genetics

Immunology

Accurate estimation of cell composition in bulk expression through robust integration of single-cell information

Brandon Jew et al., Apr 24, 2020

We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) or single-nucleus RNA-seq (snRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is substantial technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and snRNA-seq data, Bisque replicates previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. We further propose an additional mode of operation that merely requires a set of known marker genes.
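As a rough illustration of what reference-based decomposition means, the sketch below estimates the proportion of one cell type in a bulk sample from two reference expression profiles by constrained least squares. This is not Bisque's algorithm (it omits the gene-specific bulk transformations the abstract describes); the function name, the two-cell-type restriction, and all numbers are hypothetical and chosen only to keep the example self-contained.

```python
def estimate_two_type_proportion(bulk, ref_a, ref_b):
    """Least-squares estimate of the fraction of cell type A in a bulk sample.

    Assumed toy model: bulk[g] ~ p * ref_a[g] + (1 - p) * ref_b[g] for each gene g.
    The closed-form minimizer is clipped to [0, 1] so it is a valid proportion.
    """
    num = sum((x - b) * (a - b) for x, a, b in zip(bulk, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical; proportion not identifiable")
    return min(1.0, max(0.0, num / den))


# Hypothetical reference profiles (mean expression per gene) for two cell types.
ref_a = [10.0, 2.0, 0.5, 8.0]
ref_b = [1.0, 9.0, 6.0, 8.0]

# Simulate a noiseless bulk sample that is 30% type A and 70% type B.
true_p = 0.3
bulk = [true_p * a + (1 - true_p) * b for a, b in zip(ref_a, ref_b)]

p_hat = estimate_two_type_proportion(bulk, ref_a, ref_b)
print(round(p_hat, 3))
```

With real data the bulk and reference measurements come from different technologies, which is the mismatch the learned transformations in the abstract are meant to address; a plain least-squares fit like this one ignores it.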

Genetics

Epidemiology

Accurate estimation of cell composition in bulk expression through robust integration of single-cell information

Brandon Jew et al., Jun 15, 2019

We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is substantial technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and single-nucleus RNA-seq (snRNA-seq) data, Bisque was able to replicate previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. Bisque requires a single-cell reference dataset that reflects physiological cell type composition and can further leverage datasets that include both bulk and single-cell measurements over the same samples for improved accuracy. We further propose an additional mode of operation that merely requires a set of known marker genes. Bisque is available as an R package at: https://github.com/cozygene/bisque.

Genetics

Artificial Intelligence

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

Marcus Alvarez et al., Sep 30, 2019

Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distribution of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.

Genetics

Molecular Biology

Phenotypic subtyping via contrastive learning

Aditya Gorla et al., Jan 6, 2023

Defining and accounting for subphenotypic structure has the potential to increase statistical power and provide a deeper understanding of the heterogeneity in the molecular basis of complex disease. Existing phenotype subtyping methods primarily rely on clinically observed heterogeneity or metadata clustering. However, they generally tend to capture the dominant sources of variation in the data, which often originate from variation that is not descriptive of the mechanistic heterogeneity of the phenotype of interest; in fact, such dominant sources of variation, such as population structure or technical variation, are, in general, expected to be independent of subphenotypic structure. We instead aim to find a subspace with signal that is unique to a group of samples for which we believe that subphenotypic variation exists (e.g., cases of a disease). To that end, we introduce Phenotype Aware Components Analysis (PACA), a contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of subphenotypic variation. In the context of disease, PACA learns a gradient of variation unique to cases in a given dataset, while leveraging control samples to account for variation and imbalances of biological and technical confounders between cases and controls. We evaluated PACA using an extensive simulation study, as well as on various subtyping tasks using genotypes, transcriptomics, and DNA methylation data. Our results provide multiple lines of strong evidence that PACA allows us to robustly capture weak unknown variation of interest while being calibrated and well-powered, far surpassing the performance of alternative methods. This renders PACA a state-of-the-art tool for defining de novo subtypes that are more likely to reflect molecular heterogeneity, especially in challenging cases where the phenotypic heterogeneity may be masked by a myriad of strong unrelated effects in the data.

Genetics

Paleontology

Identifying systematic variation at the single-cell level by leveraging low-resolution population-level data

Elior Rahmani et al., Jan 28, 2022

A major limitation in single-cell genomics is a lack of ability to conduct cost-effective population-level studies. As a result, much of the current research in single-cell genomics focuses on biological processes that are broadly conserved across individuals, such as cellular organization and tissue development. This limitation prevents us from studying the etiology of experimental or clinical conditions that may be inconsistent across individuals owing to molecular variation and a wide range of effects in the population. In order to address this gap, we developed “kernel of integrated single cells” (Keris), a novel model-based framework to inform the analysis of single-cell gene expression data with population-level effects of a condition of interest. By inferring cell-type-specific moments and their variation across conditions using large tissue-level bulk data representing a population, Keris allows us to generate testable hypotheses at the single-cell level that would otherwise require collecting single-cell data from a large number of donors. Within the Keris framework, we show how the combination of low-resolution, large bulk data with small but high-resolution single-cell data enables the identification of changes in cell-subtype compositions and the characterization of subpopulations of cells that are affected by a condition of interest. Using Keris we estimate linear and non-linear age-associated changes in cell-type expression in large bulk peripheral blood mononuclear cells (PBMC) data. Combining with three independent single-cell PBMC datasets, we demonstrate that Keris can identify changes in cell-subtype composition with age and capture cell-type-specific subpopulations of senescent cells. This demonstrates the promise of enhancing single-cell data with population-level information to study compositional changes and to profile condition-affected subpopulations of cells, and provides a potential resource of targets for future clinical interventions.

Genetics

Immunology

A unified model for cell-type resolution genomics from heterogeneous omics data

Zeyuan Chen et al., Jan 30, 2024

The vast majority of population-scale genomic datasets collected to date consist of “bulk” samples obtained from heterogeneous tissues, reflecting mixtures of different cell types. In order to facilitate discovery at the cell-type level, there is a pressing need for computational deconvolution methods capable of leveraging the multitude of underutilized bulk profiles already collected across various organisms, tissues, and conditions. Here, we introduce Unico, a unified cross-omics method designed to deconvolve standard 2-dimensional bulk matrices of samples by features into 3-dimensional tensors representing samples by features by cell types. Unico stands out as the first principled model-based deconvolution method that is theoretically justified for any heterogeneous genomic data. Through the deconvolution of bulk gene expression and DNA methylation datasets, we demonstrate that the transferability of Unico across different data modalities translates into superior performance compared to existing approaches. This advancement enhances our capability to conduct powerful large-scale genomic studies at cell-type resolution without the need for cell sorting or single-cell biology. An R implementation of Unico is available on CRAN.
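The relationship between the 3-dimensional tensor and the 2-dimensional bulk matrix can be pictured with a small forward example. The sketch below assumes the common mixture view in which each bulk value is a proportion-weighted sum of cell-type-specific levels; it only shows the shapes involved and the direction deconvolution has to invert, and it is not Unico's model or estimation procedure. The function name and all values are hypothetical.

```python
def mix_to_bulk(tensor, proportions):
    """Collapse a samples x features x cell-types tensor into a samples x features matrix.

    Assumed mixture relation: bulk[i][j] = sum_k proportions[i][k] * tensor[i][j][k].
    """
    bulk = []
    for sample_levels, weights in zip(tensor, proportions):
        row = [
            sum(w * level for w, level in zip(weights, feature_levels))
            for feature_levels in sample_levels
        ]
        bulk.append(row)
    return bulk


# Hypothetical data: 2 samples, 3 features, 2 cell types.
tensor = [
    [[4.0, 0.0], [1.0, 3.0], [2.0, 2.0]],
    [[6.0, 2.0], [0.0, 4.0], [2.0, 2.0]],
]
# Cell-type proportions per sample (each row sums to 1).
proportions = [[0.5, 0.5], [0.25, 0.75]]

bulk = mix_to_bulk(tensor, proportions)
print(bulk)
```

Deconvolution runs the other way: only the 2-dimensional matrix is observed, and a method has to recover a 3-dimensional tensor consistent with it, which is why additional modeling assumptions are needed.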

Genetics

Molecular Biology

Genome wide association study and genomic risk prediction of age related macular degeneration in Israel

Michelle Grunin et al., Jun 6, 2024

The risk of developing age-related macular degeneration (AMD) is influenced by genetic background. In 2016, the International AMD Genomics Consortium (IAMDGC) identified 52 risk variants in 34 loci, and a polygenic risk score (PRS) from these variants was associated with AMD. The Israeli population has a unique genetic composition: Ashkenazi Jewish (AJ), Jewish non-Ashkenazi, and Arab sub-populations. We aimed to perform a genome-wide association study (GWAS) for AMD in Israel, and to evaluate PRSs for AMD. Our discovery set recruited 403 AMD patients and 256 controls at Hadassah Medical Center. We genotyped individuals via a custom exome chip. We imputed non-typed variants using cosmopolitan and AJ reference panels. We recruited an additional 155 cases and 69 controls for validation. To evaluate the predictive power of PRSs for AMD, we used IAMDGC summary statistics excluding our study and developed PRSs via clumping/thresholding or LDpred2. In our discovery set, 31/34 loci reported by IAMDGC were AMD-associated (P < 0.05). Of those, all effects were directionally consistent with IAMDGC and 11 loci had a P-value below the Bonferroni-corrected threshold (0.05/34 = 0.0015). At a 5 × 10⁻⁵ threshold, we discovered four suggestive associations in FAM189A1, IGDCC4, C7orf50, and CNTNAP4. Only the FAM189A1 variant was AMD-associated in the replication cohort after Bonferroni correction. A prediction model including LDpred2-based PRS + covariates had an AUC of 0.82 (95% CI 0.79–0.85) and performed better than a covariates-only model (P = 5.1 × 10⁻⁹). Therefore, previously reported AMD-associated loci were nominally associated with AMD in Israel. A PRS developed based on a large international study is predictive in Israeli populations.
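The arithmetic behind the Bonferroni threshold, and the general shape of a thresholded risk score, can be sketched as follows. This is a simplified illustration under stated assumptions: the score is a plain weighted sum of allele dosages over variants passing a p-value cutoff, with no LD clumping and no LDpred2 modeling, and every variant, weight, and p-value in the toy example is hypothetical rather than taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls family-wise error at alpha."""
    return alpha / n_tests


def polygenic_score(dosages, weights, p_values, p_cutoff):
    """Sum effect-size-weighted allele dosages over variants with p <= p_cutoff.

    Simplification: real clumping/thresholding also prunes correlated (LD) variants.
    """
    return sum(
        d * w for d, w, p in zip(dosages, weights, p_values) if p <= p_cutoff
    )


# Threshold for 34 loci at alpha = 0.05, as quoted in the abstract.
threshold = bonferroni_threshold(0.05, 34)
print(f"{threshold:.4f}")  # 0.0015

# Hypothetical individual: risk-allele counts at four variants.
dosages = [2, 1, 0, 1]
weights = [0.5, -0.25, 1.0, 0.125]   # hypothetical per-allele effect sizes
p_values = [1e-8, 0.5, 1e-6, 1e-4]   # hypothetical association p-values

score = polygenic_score(dosages, weights, p_values, p_cutoff=1e-3)
print(score)  # 1.125
```

The second variant is excluded because its p-value exceeds the cutoff, so only three variants contribute to the score.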

Genetics

Oncology

Multimodal profiling reveals tissue-directed signatures of human immune cells altered with age

Steven Wells et al., Jan 3, 2024

The immune system comprises multiple cell lineages and heterogeneous subsets found in blood and tissues throughout the body. While human immune responses differ between sites and over age, the underlying sources of variation remain unclear as most studies are limited to peripheral blood. Here, we took a systems approach to comprehensively profile RNA and surface protein expression of over 1.25 million immune cells isolated from blood, lymphoid organs, and mucosal tissues of 24 organ donors aged 20-75 years. We applied a multimodal classifier to annotate the major immune cell lineages (T cells, B cells, innate lymphoid cells, and myeloid cells) and their corresponding subsets across the body, leveraging probabilistic modeling to define bases for immune variations across donors, tissue, and age. We identified dominant tissue-specific effects on immune cell composition and function across lineages for lymphoid sites, intestines, and blood-rich tissues. Age-associated effects were intrinsic to both lineage and site as manifested by macrophages in mucosal sites, B cells in lymphoid organs, and T and NK cells in blood-rich sites. Our results reveal tissue-specific signatures of immune homeostasis throughout the body and across different ages. This information provides a basis for defining the transcriptional underpinnings of immune variation and potential associations with disease-associated immune pathologies across the human lifespan.

Immunology

Molecular Biology

Highly parameterized polygenic scores tend to overfit to population stratification via random effects

Alan Aw et al., Jan 29, 2024

Polygenic scores (PGSs), increasingly used in clinical settings, frequently include many genetic variants, with performance typically peaking at thousands of variants. Such highly parameterized PGSs often include variants that do not pass a genome-wide significance threshold. We propose a mathematical perspective that renders the effects of many of these nonsignificant variants random rather than causal, with the randomness capturing population structure. We devise methods to assess variant effect randomness and population stratification bias. Applying these methods to 141 traits from the UK Biobank, we find that, for many PGSs, the effects of nonsignificant variants are considerably random, with the extent of randomness associated with the degree of overfitting to population structure of the discovery cohort. Our findings explain why highly parameterized PGSs simultaneously have superior cohort-specific performance and limited generalizability, suggesting the critical need for variant randomness tests in PGS evaluation. Supporting code and a dashboard are available at https://github.com/songlab-cal/StratPGS.

Genetics

Artificial Intelligence
