ResearchHub | Open Science Community

Integrated Identification of Disease Specific Pathways Using Multi-omics data

Yingzhou Lu et al., Jun 11, 2019

Abstract. Motivation: Identification of biological pathways plays a central role in understanding both human health and disease. Although much work has been done to explore biological pathways using single-omics data, little effort has been reported using multi-omics data integration, mainly due to methodological and technological limitations. Compared to single-omics data, multi-omics data will help identify disease-specific functional pathways with both higher sensitivity and specificity, thus providing more comprehensive insight into the molecular architecture of disease processes. Results: In this paper, we propose two computational approaches that integrate multi-omics data and identify disease-specific biological pathways with high sensitivity and specificity. Applying our methods to an experimental multi-omics dataset on muscular dystrophy subtypes, we identified disease-specific pathways of high biological plausibility. The developed methodology will likely have a broad impact on improving the molecular characterization of many common diseases. Contact: yuewang@vt.edu. Supplementary information: Supplementary information attached.

Genetics

Molecular Biology


Cell group analysis reveals changes in upper-layer neurons associated with schizophrenia

Chao Chen et al., Oct 23, 2020

Abstract. Genome-wide association studies (GWAS) of schizophrenia (SCZ) have revealed over 100 risk loci. We investigated whether these SCZ-associated variants regulate gene expression by cell type. Using a fully unsupervised deconvolution method, we calculated gene expression by clusters of estimated cell types (cell-groups, CGs). Five CGs emerged in the dorsolateral prefrontal cortices (DLPFC) of 341 donors with and without SCZ. By mapping expression quantitative trait loci (eQTL) per CG, we partitioned the heritability of SCZ risk in GWAS by CGs. CG-specific expression and eQTLs were replicated both in a bulk tissue data set deconvoluted with a different method and in sorted-cell expression data. Further, we characterized CG-specific differential gene expression and cell proportion changes in SCZ brains. We found upper-layer neurons in the DLPFC to be associated with SCZ based on enrichment of SCZ heritability in eQTLs, disease-related transcriptional signatures, and decreased cell proportion. Our study suggests that neurons and related anomalous circuits in the upper layers of the DLPFC may contribute substantially to SCZ risk.

Genetics

Molecular Biology


COT: an efficient Python tool for detecting marker genes among many subtypes

Yingzhou Lu et al., Jan 11, 2021

Abstract. We develop an accurate and efficient method to detect marker genes among many subtypes using subtype-enriched expression profiles. We implement the Cosine based One-sample Test (COT) as Python software that is easy to use and applicable to multi-omics data. We demonstrate the performance and utility of COT on gene expression and proteomics data acquired from tissue or cell subtypes. Because COT is formulated as a one-sample test with a cosine similarity test statistic in scatter space, the detected de novo marker genes will allow biologists to perform a more comprehensive and unbiased molecular characterization, deconvolution and classification of complex tissue or cell subtypes.
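To make the cosine similarity statistic concrete, here is a minimal sketch, not the COT package itself. It assumes each gene is summarized by one non-negative mean expression value per subtype and that an ideal marker of subtype k points along the k-th coordinate axis. The function name cosine_marker_scores and the toy values are hypothetical, and the significance test against a null distribution is omitted.

```python
import math

def cosine_marker_scores(expr):
    """Score each gene by cosine similarity to its closest subtype axis.

    expr maps a gene name to its list of subtype-mean expression values
    (assumed non-negative). The cosine between a vector v and the unit
    axis e_k is v[k] / ||v||, so the best-matching axis is simply the
    subtype with the largest value.
    """
    scores = {}
    for gene, values in expr.items():
        norm = math.sqrt(sum(v * v for v in values))
        if norm == 0:
            continue  # an all-zero gene carries no direction; skip it
        k = max(range(len(values)), key=lambda i: values[i])
        scores[gene] = (k, values[k] / norm)
    return scores

# Hypothetical toy data: three subtypes, two genes.
toy = {
    "geneA": [10.0, 0.5, 0.2],  # concentrated in subtype 0
    "geneB": [3.0, 3.0, 3.0],   # equally expressed everywhere
}
scores = cosine_marker_scores(toy)
```

A score near 1 indicates expression concentrated in a single subtype, while a uniformly expressed gene across K subtypes scores 1/sqrt(K).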

Genetics

Artificial Intelligence


Determining molecular archetype composition and expression from bulk tissues with unsupervised deconvolution

Wu Chao et al., Jul 13, 2021

Complex tissues are composite ecological systems whose components interact with each other to create a unique physiological or pathophysiological state distinct from that found in other tissue microenvironments. To explore this ground yet dynamic state, molecular profiling of bulk tissues and mathematical deconvolution can be jointly used to characterize heterogeneity as an aggregate of molecularly distinct tissue or cell subtypes. We first introduce an efficient and fully unsupervised deconvolution method, namely the Convex Analysis of Mixtures (CAM3.0), that may help biologists confirm existing or generate novel scientific hypotheses about complex tissues in many biomedical contexts. We then evaluate the CAM3.0 functional pipelines using both simulations and benchmark data. We also report diverse case studies on bulk tissues with unknown number, proportion and expression patterns of the molecular archetypes. Importantly, these preliminary results support the concept that expression patterns of molecular archetypes often reflect the interactive rather than individual contributions of many known or novel cell types, and that unsupervised deconvolution would be more powerful in uncovering novel multicellular or subcellular archetypes.
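The geometric idea that convex-analysis deconvolution builds on can be illustrated with a toy example. This is a sketch under stated assumptions, not the CAM3.0 pipeline: it assumes the linear mixing model X = A S (samples by genes), two subtypes, and noise-free data, and it uses a farthest-pair search as a stand-in for real vertex detection. All names and numbers are hypothetical.

```python
from itertools import combinations

def mix(A, S):
    """Bulk expression X[i][j] = sum_k A[i][k] * S[k][j] (samples x genes)."""
    return [[sum(A[i][k] * S[k][j] for k in range(len(S)))
             for j in range(len(S[0]))] for i in range(len(A))]

def project(X):
    """Scale each gene's cross-sample vector to sum to 1.

    Assumes every gene has a positive total across samples.
    """
    points = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        total = sum(col)
        points.append([v / total for v in col])
    return points

def farthest_pair(points):
    """Indices of the two most distant points (toy vertex finder, K = 2)."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return max(combinations(range(len(points)), 2),
               key=lambda ij: dist2(points[ij[0]], points[ij[1]]))

# Hypothetical proportions (3 samples x 2 subtypes) and expressions
# (2 subtypes x 4 genes); genes 0 and 1 are pure markers.
A = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
S = [[10.0, 0.0, 4.0, 6.0],
     [0.0, 8.0, 4.0, 2.0]]
points = project(mix(A, S))
vertices = farthest_pair(points)
```

After projection, each marker gene coincides with a normalized column of A, and every other gene is a convex combination of those columns, so in this toy case the markers are the extreme points of the scatter.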

Genetics

Artificial Intelligence


swCAM: estimation of subtype-specific expressions in individual samples with unsupervised sample-wise deconvolution

Lulu Chen et al., Jan 5, 2021

Abstract. Motivation: Complex biological tissues are often a heterogeneous mixture of several molecularly distinct cell or tissue subtypes. Both subtype compositions and expressions in individual samples can vary across different biological states or conditions. Computational deconvolution aims to dissect patterns of bulk gene expression data into subtype compositions and subtype-specific expressions. Typically, existing deconvolution methods can only estimate averaged subtype-specific expressions in a population, while detecting differential expressions or co-expression networks in particular subtypes requires unique subtype expression estimates in individual samples. Unlike population-level deconvolution, however, individual-level deconvolution is mathematically an underdetermined problem because there are more variables than observations. Results: We report a sample-wise Convex Analysis of Mixtures (swCAM) method that can estimate subtype proportions and subtype-specific expressions in individual samples from bulk tissue transcriptomes. We extend our previous CAM framework to include a new term accounting for between-sample variations and formulate swCAM as a nuclear-norm and ℓ2,1-norm regularized matrix factorization problem. We determine hyperparameter values using a cross-validation scheme with random entry exclusion and obtain a swCAM solution using an efficient alternating direction method of multipliers. swCAM is implemented in open-source R scripts. Experimental results on realistic simulation data show that swCAM can accurately estimate subtype-specific expressions in individual samples and successfully extract co-expression networks in particular subtypes that are otherwise unobtainable using bulk expression data. Application of swCAM to bulk-tissue data of 320 samples from bipolar disorder patients and controls identified changes in cell proportions, expression and co-expression modules in patient neurons. Mitochondria-related genes showed significant changes, suggesting an important role of energy dysregulation in bipolar disorder. Availability and implementation: The R scripts of swCAM are freely available at https://github.com/Lululuella/swCAM . A user’s guide and a vignette are provided. Contact: yuewang@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
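As a small illustration of one regularizer named in the abstract, the sketch below computes an ℓ2,1 norm and its proximal operator, the kind of closed-form step that alternating direction methods typically rely on. It is not taken from the swCAM scripts: grouping by rows is an assumption (some authors group by columns), and the function names and toy matrix are hypothetical. The nuclear-norm term, which needs a singular value decomposition, is left out.

```python
import math

def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M (row grouping assumed)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)

def prox_l21(M, tau):
    """Row-wise group soft-thresholding.

    Solves argmin_Z 0.5 * ||Z - M||_F^2 + tau * ||Z||_{2,1}: each row is
    shrunk toward zero by tau in norm, and rows with norm <= tau vanish.
    """
    out = []
    for row in M:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out

# Hypothetical toy matrix: one strong row, one weak row.
M = [[3.0, 4.0],
     [0.3, 0.4]]
penalty = l21_norm(M)
shrunk = prox_l21(M, 1.0)
```

The weak row is zeroed out entirely while the strong row is only rescaled, which is why this penalty encourages deviations that are sparse at the level of whole rows.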

Artificial Intelligence

Molecular Biology


Alignment of LC-MS Profiles by Neighbor-wise Compound-specific Graphical Time Warping with Misalignment Detection

Chiung-Ting Wu et al., Jul 26, 2019

Motivation: Liquid chromatography-mass spectrometry (LC-MS) is a standard method for proteomics and metabolomics analysis of biological samples. Unfortunately, it suffers from small changes in the retention times (RT) of the same compound in different samples, and these must be subsequently corrected (aligned) during data processing. Classic alignment methods, such as those in the popular XCMS package, often assume a single time-warping function for each sample. Thus, the potentially varying RT drift for compounds with different masses in a sample is neglected in these methods. Moreover, the systematic change in RT drift across run order is often not considered by alignment algorithms. Therefore, these methods cannot completely correct misalignments. For a large-scale experiment involving many samples, misalignment becomes inevitable and concerning. Results: Here we describe an integrated reference-free profile alignment method, neighbor-wise compound-specific Graphical Time Warping (ncGTW), that can detect misaligned features and align profiles by leveraging expected RT drift structures and compound-specific warping functions. Specifically, ncGTW uses individualized warping functions for different compounds and assigns constraint edges on warping functions of neighboring samples. Validated with both realistic synthetic data and internal quality control samples, ncGTW applied to two large-scale metabolomics LC-MS datasets identifies many misaligned features and successfully realigns them. These features would otherwise be discarded or left uncorrected by existing methods. The ncGTW software tool is currently developed as a plug-in to the XCMS package.
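For readers unfamiliar with time warping, the following is a minimal sketch of classic pairwise dynamic time warping on two intensity profiles. It is a swapped-in illustration of what a warping function is, not ncGTW: it aligns only one pair of profiles, uses no reference-free graph formulation, and imposes no constraints between neighboring samples. The function name and the toy profiles are hypothetical.

```python
def dtw(a, b):
    """Classic dynamic time warping between two 1-D profiles.

    Returns the total alignment cost and the warping path as a list of
    (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

# Hypothetical peak profiles: the same peak, eluting one scan later.
reference = [0, 0, 1, 5, 1, 0, 0]
drifted = [0, 0, 0, 1, 5, 1, 0]
distance, path = dtw(reference, drifted)
```

The path pairs the peak apex at index 3 of the reference with index 4 of the drifted profile, which is the retention time shift an alignment step would correct.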

Artificial Intelligence

Molecular Biology


Radial Displacement Measurement Method for Magnetic Bearing based on FPC Coils

Jinghua Hu et al., Jan 1, 2024

Mechanical Engineering

Aerospace Engineering
