ResearchHub | Open Science Community

So you think you can PLS-DA?

Daniel Ruiz-Perez et al., Dec 1, 2020

Abstract. Background: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to that of its close relative from which it was originally derived, namely Principal Component Analysis (PCA). Results: We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper, as well as the supplementary ones, can be viewed interactively at http://biorg.cs.fiu.edu/plsda. Conclusions: Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.

Artificial Intelligence

Molecular Biology

So you think you can PLS-DA?

Daniel Ruiz-Perez et al., Oct 21, 2017

Abstract. Background: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. In an effort to understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to that of its close relative from which it was originally derived, namely Principal Component Analysis (PCA). Results: We demonstrate that even though PCA ignores the information regarding the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is made aware of the class labels in its input. Our experiments range from looking at the signal-to-noise ratio in the feature selection task to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set from 396 vaginal microbiome samples where the ground truth for the feature selection was available. All the 3D figures shown in this paper, as well as the supplementary ones, can be viewed interactively at http://biorg.cs.fiu.edu/plsda. Conclusions: Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.

Artificial Intelligence

Molecular Biology

Microbiome Maps: Hilbert Curve Visualizations of Metagenomic Profiles

Camilo Valdes et al., Mar 23, 2021

Abstract. Motivation: Abundance profiles from metagenomic sequencing data synthesize information from billions of sequenced reads coming from thousands of microbial genomes. Analyzing and understanding these profiles can be a challenge since the data they represent are complex. Particularly challenging is their visualization, as existing techniques are inadequate when the number of taxa is in the thousands. We present a technique, and accompanying software, for the visualization of metagenomic abundance profiles using a space-filling curve that transforms a profile into an interactive 2D image. Results: We created Jasper, an easy-to-use tool for the visualization and exploration of metagenomic profiles from DNA sequencing data. It orders taxa using a space-filling Hilbert curve and creates a “Microbiome Map”, where each position in the image represents the abundance of a single taxon from a reference collection. Jasper can order taxa in multiple ways, and the resulting microbiome maps can highlight “hot spots” of microbes that are dominant in taxonomic clades or biological conditions. We use Jasper to visualize samples from a variety of microbiome studies, and discuss ways in which microbiome maps can be an invaluable tool to visualize spatial, temporal, disease, and differential profiles. Our approach can create detailed microbiome maps involving hundreds of thousands of microbial reference genomes, with the potential to unravel latent relationships (taxonomic, spatio-temporal, functional, and other) that could remain hidden using traditional visualization techniques. The maps can also be converted into animated movies that bring to life the dynamicity of microbiomes. Availability: Jasper is freely available at microbiomemaps.org and via biorg.cs.fiu.edu/jasper. Contact: cvaldes2@unl.edu; giri@fiu.edu. Supplementary information: Supplementary materials are available at microbiomemaps.org.

Genetics

Ecology
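
The abstract above describes placing each taxon at a position along a space-filling Hilbert curve so that the abundance profile becomes a 2D image. As a rough illustration only, the sketch below uses the standard textbook index-to-coordinate conversion for a Hilbert curve; it is not Jasper's code, and the function names `hilbert_d2xy` and `microbiome_map`, the grid-sizing rule, and the sample abundances are all assumptions made for this example.

```python
def hilbert_d2xy(side, d):
    """Convert position d along a Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two. This is the standard iterative conversion,
    not code taken from the Jasper tool.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive cells stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def microbiome_map(abundances):
    """Lay out an already-ordered abundance list on a square grid (hypothetical helper).

    The grid side is the smallest power of two whose square holds every taxon;
    unused cells stay at 0.0.
    """
    side = 1
    while side * side < len(abundances):
        side *= 2
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Four made-up relative abundances, already in the chosen taxon order.
    for row in microbiome_map([0.4, 0.3, 0.2, 0.1]):
        print(row)
```

Because neighbouring positions along the curve are also neighbours in the grid, taxa that are close in the chosen ordering end up close in the image, which is what lets clusters of abundant taxa show up as "hot spots".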

Unfolding and De-confounding: Biologically meaningful causal inference from longitudinal multi-omic networks using METALICA

Daniel Ruiz-Perez et al., Dec 13, 2023

ABSTRACT A key challenge in the analysis of microbiome data is the integration of multi-omic datasets and the discovery of interactions between microbial taxa, their expressed genes, and the metabolites they consume and/or produce. In an effort to improve the state of the art in inferring biologically meaningful multi-omic interactions, we sought to address some of the most fundamental issues in causal inference from longitudinal multi-omics microbiome data sets. We developed METALICA, a suite of tools and techniques that can infer interactions between microbiome entities. METALICA introduces novel unrolling and de-confounding techniques used to uncover multi-omic entities that are believed to act as confounders for some of the relationships that may be inferred using standard causal inferencing tools. The results lend support to predictions about biological models and processes by which microbial taxa interact with each other in a microbiome. The unrolling process helps to identify putative intermediaries (genes and/or metabolites) to explain the interactions between microbes; the de-confounding process identifies putative common causes that may lead to spurious relationships being inferred. METALICA was applied to the networks inferred by existing causal discovery and network inference algorithms applied to a multi-omics data set resulting from a longitudinal study of IBD microbiomes. The most significant unrollings and de-confoundings were manually validated using the existing literature and databases. IMPORTANCE We have developed a suite of tools and techniques capable of inferring interactions between microbiome entities. METALICA introduces novel techniques called unrolling and de-confounding that are employed to uncover multi-omic entities considered to be confounders for some of the relationships that may be inferred using standard causal inferencing tools.
To evaluate our method, we conducted tests on the Inflammatory Bowel Disease (IBD) dataset from the iHMP longitudinal study, which we pre-processed in accordance with our previous work.

Artificial Intelligence

Molecular Biology

Unfolding and de-confounding: biologically meaningful causal inference from longitudinal multi-omic networks using METALICA

Daniel Ruiz-Perez et al., Sep 6, 2024

ABSTRACT A key challenge in the analysis of microbiome data is the integration of multi-omic datasets and the discovery of interactions between microbial taxa, their expressed genes, and the metabolites they consume and/or produce. In an effort to improve the state of the art in inferring biologically meaningful multi-omic interactions, we sought to address some of the most fundamental issues in causal inference from longitudinal multi-omics microbiome data sets. We developed METALICA, a suite of tools and techniques that can infer interactions between microbiome entities. METALICA introduces novel unrolling and de-confounding techniques used to uncover multi-omic entities that are believed to act as confounders for some of the relationships that may be inferred using standard causal inferencing tools. The results lend support to predictions about biological models and processes by which microbial taxa interact with each other in a microbiome. The unrolling process helps identify putative intermediaries (genes and/or metabolites) to explain the interactions between microbes; the de-confounding process identifies putative common causes that may lead to spurious relationships being inferred. METALICA was applied to the networks inferred by existing causal discovery and network inference algorithms from a multi-omics data set resulting from a longitudinal study of IBD microbiomes. The most significant unrollings and de-confoundings were manually validated using the existing literature and databases. IMPORTANCE We have developed a suite of tools and techniques capable of inferring interactions between microbiome entities. METALICA introduces novel techniques called unrolling and de-confounding that are employed to uncover multi-omic entities considered to be confounders for some of the relationships that may be inferred using standard causal inferencing tools.
To evaluate our method, we conducted tests on the inflammatory bowel disease (IBD) dataset from the iHMP longitudinal study, which we pre-processed in accordance with our previous work. From this dataset, we generated various subsets, encompassing different combinations of metagenomics, metabolomics, and metatranscriptomics datasets. Using these multi-omics datasets, we demonstrate how the unrolling process aids in the identification of putative intermediaries (genes and/or metabolites) to explain the interactions between microbes. Additionally, the de-confounding process identifies potential common causes that may cause spurious relationships to be inferred. The most significant unrollings and de-confoundings were manually validated using the existing literature and databases.

Artificial Intelligence

Molecular Biology

Dynamic interaction network inference from longitudinal microbiome data

Jose Lugo-Martinez et al., Sep 29, 2018

Background: Several studies have focused on the microbiota living in environmental niches, including human body sites. In many of these studies, researchers collect longitudinal data with the goal of understanding not just the composition of the microbiome but also the interactions between the different taxa. However, analysis of such data is challenging, and very few methods have been developed to reconstruct dynamic models from time series microbiome data. Results: Here we present a computational pipeline that enables the integration of data across individuals for the reconstruction of such models. Our pipeline starts by aligning the data collected for all individuals. The aligned profiles are then used to learn a dynamic Bayesian network, which represents causal relationships between taxa and clinical variables. Testing our methods on three longitudinal microbiome data sets, we show that our pipeline improves upon prior methods developed for this task. We also discuss the biological insights provided by the models, which include several known and novel interactions. Conclusions: We propose a computational pipeline for analyzing longitudinal microbiome data. Our results provide evidence that microbiome alignments coupled with dynamic Bayesian networks improve predictive performance over previous methods and enhance our ability to infer biological relationships within the microbiome and between taxa and clinical factors.

Artificial Intelligence

Microbiology

Inferring directional relationships in microbial communities using signed Bayesian networks

Musfiqur Sazal et al., Feb 20, 2020

Background: Microbe-microbe and host-microbe interactions in a microbiome play a vital role in both health and disease. However, the structure of the microbial community and the colonization patterns are highly complex to infer even under controlled wet laboratory conditions. In this study, we investigate what information, if any, can be provided by a Bayesian Network (BN) about a microbial community. Unlike the previously proposed Co-occurrence Networks (CoNs), BNs are based on conditional dependencies and can help in revealing complex associations. Results: In this paper, we propose a way of combining a BN and a CoN to construct a signed Bayesian Network (sBN). We report a surprising association between directed edges in signed BNs and known colonization orders. Conclusions: BNs are powerful tools for community analysis and extracting influences and colonization patterns, even though the analysis only uses an abundance matrix with no temporal information. We conclude that directed edges in sBNs when combined with negative correlations are consistent with and strongly suggestive of colonization order. Keywords: Bayesian Networks; Conditional Dependence; Microbiome; Colonization Order; PC-stable

Ecology

Artificial Intelligence
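
The abstract above speaks of combining a Bayesian network with a co-occurrence network to obtain a signed Bayesian network. The abstract does not spell out the combination rule, so the sketch below shows only one plausible reading: each directed edge (taken here as given input, with structure learning out of scope) is labelled with the sign of the Pearson correlation between the two taxa's abundances. The helper names `pearson` and `sign_edges`, the use of Pearson correlation, and the toy abundances are assumptions for illustration, not the authors' procedure.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length abundance vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def sign_edges(directed_edges, abundance):
    """Attach '+', '-' or '0' to each (parent, child) edge from the correlation sign.

    Hypothetical helper: `directed_edges` would come from a Bayesian network
    learner, and `abundance` maps each taxon name to its values across samples.
    """
    signed = []
    for parent, child in directed_edges:
        r = pearson(abundance[parent], abundance[child])
        sign = "+" if r > 0 else "-" if r < 0 else "0"
        signed.append((parent, child, sign))
    return signed


if __name__ == "__main__":
    # Made-up abundances for three placeholder taxa across four samples.
    abundance = {
        "taxonA": [1, 2, 3, 4],
        "taxonB": [8, 6, 4, 2],
        "taxonC": [2, 4, 6, 9],
    }
    edges = [("taxonA", "taxonB"), ("taxonA", "taxonC")]
    print(sign_edges(edges, abundance))
```

Under this reading, the negatively signed directed edges are the ones the abstract relates to colonization order.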

Dynamic Bayesian networks for integrating multi-omics time-series microbiome data

Daniel Ruiz-Perez et al., Nov 8, 2019

A key challenge in the analysis of longitudinal microbiome data is to go beyond computing compositional profiles and infer the complex web of interactions between the various microbial taxa, their genes, and the metabolites they consume and produce. To address this challenge, we developed a computational pipeline that first aligns multi-omics data and then uses dynamic Bayesian networks (DBNs) to integrate them into a unified model. We discuss how our approach handles the different sampling and progression rates between individuals, how we reduce the large number of different entities and parameters in the DBNs, and the construction and use of a validation set to model edges. Applying our method to data collected from Inflammatory Bowel Disease (IBD) patients, we show that it can accurately identify known and novel interactions between various entities and can improve on current methods for learning such interactions. Experimental validations support several predictions about novel metabolite-taxa interactions. The source code is freely available under the MIT Open Source license agreement and can be downloaded from https://github.com/DaniRuizPerez/longitudinal_multiomic_analysis_public.

Genetics

Ecology
