ResearchHub | Open Science Community

Bioinformatic Genome Comparisons for Taxonomic and Phylogenetic Assignments Using Aeromonas as a Test Case

Sophie Colston et al., Nov 19, 2014

Prokaryotic taxonomy is the underpinning of microbiology, as it provides a framework for the proper identification and naming of organisms. The "gold standard" of bacterial species delineation is overall genome similarity as determined by DNA-DNA hybridization (DDH), a technically rigorous yet sometimes variable method that may produce inconsistent results. Improvements in next-generation sequencing have led to an upsurge in bacterial genome sequences and in bioinformatic tools that compare genomic data, such as average nucleotide identity (ANI), correlation of tetranucleotide frequencies, and the genome-to-genome distance calculator, or in silico DDH (isDDH). Here, we evaluate ANI and isDDH in combination with phylogenetic studies, using as a test case Aeromonas, a taxonomically challenging genus with many described species and several strains that have been reassigned to different species. We generated improved, high-quality draft genome sequences for 33 Aeromonas strains and combined them with 23 publicly available genomes. ANI and isDDH values were determined and compared to phylogenies from multilocus sequence analysis of housekeeping genes, ribosomal proteins, and expanded core genes. The expanded core phylogenetic analysis suggested relationships between distant Aeromonas clades that were inconsistent with studies using fewer genes. ANI values of ≥96% and isDDH values of ≥70% consistently grouped genomes originating from strains of the same species together. Our study confirmed known misidentifications, validated recent revisions in the nomenclature, and revealed that a number of genomes deposited in GenBank are misnamed. In addition, two strains were identified that may represent novel Aeromonas species.

Improvements in DNA sequencing technologies have made it possible to generate large numbers of high-quality draft genomes and have led to a dramatic increase in the number of publicly available genomes. This has allowed researchers to characterize microorganisms using genome data. Advantages of genome sequence-based classification include data and computing programs that can be readily shared, which facilitates standardization of taxonomic methodology and helps resolve conflicting identifications by providing greater uniformity in the overall analysis. Using Aeromonas as a test case, we compared and validated different approaches. Based on our analyses, we recommend cutoff values of distance measures for identifying species. Accurate species classification is critical not only to avoid perpetuating errors in public databases but also to ensure the validity of inferences about relationships among species within a genus and the proper identification of organisms in clinical and veterinary diagnostic laboratories.
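
The species cutoffs reported for Aeromonas (ANI ≥96% and isDDH ≥70%) can be applied mechanically once pairwise values are available. The Python sketch below is a hypothetical helper, not the authors' pipeline: it assumes symmetric pairwise ANI and isDDH percentages keyed by genome pair (tools that report asymmetric query/reference ANI would need those values combined first), links two genomes only when both cutoffs are met, and then groups genomes by single linkage. The genome names and values are invented for illustration.

```python
from itertools import combinations


def species_clusters(genomes, ani, ddh, ani_cut=96.0, ddh_cut=70.0):
    """Group genomes into putative species by single linkage.

    ani and ddh map frozenset({a, b}) -> percent ANI / isDDH value.
    Two genomes are linked only if BOTH values meet their cutoffs;
    missing pairs are treated as unlinked.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        pair = frozenset((a, b))
        if ani.get(pair, 0.0) >= ani_cut and ddh.get(pair, 0.0) >= ddh_cut:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise values for four genomes.
ani = {frozenset(("g1", "g2")): 97.5, frozenset(("g2", "g3")): 96.5,
       frozenset(("g3", "g4")): 98.0}
ddh = {frozenset(("g1", "g2")): 78.0, frozenset(("g2", "g3")): 65.0,
       frozenset(("g3", "g4")): 85.0}
clusters = species_clusters(["g1", "g2", "g3", "g4"], ani, ddh)
print(clusters)  # [['g1', 'g2'], ['g3', 'g4']]
```

Here g2 and g3 pass the ANI cutoff but fail the isDDH cutoff, so they stay in separate groups. Single linkage can also chain genomes that are not all pairwise above the cutoffs, so any resulting cluster is worth checking against the full pairwise matrix before a name is assigned.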

Genetics

Immunology

Systematic Detection of Large-Scale Multi-Gene Horizontal Transfer in Prokaryotes

Lina Kloub et al., Aug 27, 2020

Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the “scale” of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: how often is more than one gene horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multi-gene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can easily be used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale dataset of over 22,000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals relationships between gene function, phylogenetic distance, and frequency of multi-gene transfer. Among other insights, we find that (i) the relative frequency of HMGT increases with divergence between genomes, (ii) HMGTs often have conserved gene functions, and (iii) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.
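
The idea of combining single-gene transfer inferences with gene order can be illustrated with a much simpler rule. The Python sketch below is not HoMer's algorithm: it assumes that each gene in one extant genome has already been assigned at most one inferred (donor, recipient) pair by reconciliation (the `transfers` mapping, gene names, and branch labels are hypothetical), and it reports runs of adjacent genes sharing the same pair as candidate multi-gene transfers. It performs none of HoMer's error handling, ancestral inference, or statistical testing.

```python
from itertools import groupby


def candidate_hmgts(gene_order, transfers, min_size=2):
    """Return runs of adjacent genes that share one inferred transfer.

    gene_order: gene IDs in chromosomal order for one extant genome.
    transfers:  gene ID -> (donor, recipient) inferred for that gene
                (hypothetical input; genes without an inferred HGT are absent).
    """
    labelled = [(gene, transfers.get(gene)) for gene in gene_order]
    blocks = []
    # groupby merges only consecutive genes carrying the same (donor, recipient).
    for event, run in groupby(labelled, key=lambda item: item[1]):
        genes = [gene for gene, _ in run]
        if event is not None and len(genes) >= min_size:
            blocks.append((event, genes))
    return blocks


order = ["a", "b", "c", "d", "e", "f"]
hgt = {"b": ("X", "Y"), "c": ("X", "Y"), "d": ("X", "Y"), "f": ("Z", "Y")}
print(candidate_hmgts(order, hgt))  # [(('X', 'Y'), ['b', 'c', 'd'])]
```

Because `itertools.groupby` only merges consecutive items, a single unrelated gene between two transferred genes splits the run; tolerating such gaps, or treating a chromosome as circular, would require extra logic.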

Genetics

Ecology

The Patchy Distribution of Restriction-Modification System Genes and the Conservation of Orphan Methyltransferases in Halobacteria

Matthew Fullmer et al., Feb 15, 2019

Restriction-modification (RM) systems in Bacteria are implicated in multiple biological roles, ranging from defense against parasitic genetic elements, to selfish addiction cassettes, to barriers to gene transfer and lineage homogenization. In Bacteria, DNA methylation without cognate restriction also plays important roles in DNA replication, mismatch repair, protein expression, and biasing DNA uptake. Little is known about archaeal RM systems and DNA methylation. To better understand the role of RM systems and DNA methylation in Archaea, we surveyed the presence of RM system genes and related genes, including orphan DNA methylases, in the halophilic archaeal class Halobacteria. Our results reveal that some orphan DNA methyltransferase genes were highly conserved among lineages, indicating an important functional constraint, whereas RM systems showed patchy patterns of presence and absence. This irregular distribution is due to frequent horizontal gene transfer and gene loss, a finding suggesting that the evolution and life cycle of RM systems may be best described as those of a selfish genetic element. A putative target motif (CTAG) of one of the orphan methylases was underrepresented in all of the analyzed genomes, whereas another motif (GATC) was overrepresented in most of the haloarchaeal genomes, particularly in those that encoded the cognate orphan methylase.
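
Over- and underrepresentation of a motif such as CTAG or GATC is usually judged by comparing its observed count with the count expected from the genome's composition. The Python sketch below assumes the simplest background, a zero-order model in which bases occur independently at their sequence-wide frequencies; the study itself may have used a different (for example, higher-order Markov) expectation. Because CTAG and GATC are their own reverse complements, counting one strand is sufficient for these two motifs. The input sequence is a toy example.

```python
from collections import Counter


def motif_obs_exp(seq, motif):
    """Observed vs. expected occurrences of a motif on one strand.

    Expected counts assume a zero-order background: bases are independent
    and occur at the sequence's own A/C/G/T frequencies.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    windows = len(seq) - k + 1
    # Overlapping occurrences, scanning every start position.
    observed = sum(1 for i in range(windows) if seq[i:i + k] == motif)
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")  # ambiguous bases ignored here
    prob = 1.0
    for base in motif:
        prob *= counts[base] / total
    expected = windows * prob
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio


seq = "GATC" * 10  # toy sequence, not real genome data
print(motif_obs_exp(seq, "GATC"))
print(motif_obs_exp(seq, "CTAG"))
```

A ratio well below 1 suggests underrepresentation and one well above 1 suggests overrepresentation; deciding whether a deviation is significant needs a null distribution that this sketch does not provide.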

Genetics

Molecular Biology

Expanding the utility of sequence comparisons using data from whole genomes

Sophia Gosselin et al., Jan 16, 2020

Whole-genome comparisons based on average nucleotide identity (ANI) and the Genome-to-Genome Distance Calculator have risen to prominence for rapidly classifying taxa using whole-genome sequences. Some implementations have even been proposed as a new standard in species classification and have become a common technique in papers describing newly sequenced genomes. However, attempts to apply whole-genome divergence data to the delineation of higher taxonomic units and to phylogenetic inference have had difficulty matching the results of more complex phylogenetic methods. We present a novel method for generating reliable and statistically supported phylogenies using established ANI techniques. For the test cases to which we applied this approach, we obtained accurate results up to at least the family level. The method uses non-parametric bootstrapping to gauge the reliability of inferred groups. It offers the opportunity to make use of whole-genome comparison data that are already being generated to quickly produce accurate phylogenies. Additionally, the ANI methodology developed here can assist the classification of higher-order taxonomic groups.
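
One way to picture ANI-based phylogenetics with bootstrap support is to turn pairwise ANI into distances, cluster them, and resample the per-fragment identities from which ANI is averaged. The Python sketch below rests on several assumptions that are not taken from the paper: distance is defined as 1 - ANI/100, trees are built with UPGMA, and each genome pair's fragment identities are resampled independently (a standard bootstrap would instead resample one shared set of characters across all taxa). It reports, for each clade of the full-data tree, the fraction of replicates that recover it. All genome names and identity values are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def upgma_clades(names, dist):
    """UPGMA on leaf distances dist[frozenset({a, b})]; returns the
    non-trivial clades (frozensets of leaf names) of the resulting tree."""
    size = {frozenset([n]): 1 for n in names}
    d = {frozenset([frozenset([a]), frozenset([b])]): dist[frozenset((a, b))]
         for a, b in combinations(names, 2)}
    clades = set()
    while len(size) > 1:
        x, y = tuple(min(d, key=d.get))  # closest pair of clusters
        merged = x | y
        clades.add(merged)
        sx, sy = size.pop(x), size.pop(y)
        # Size-weighted average distance from the merged cluster to the rest.
        new = {frozenset([merged, z]):
               (d[frozenset([x, z])] * sx + d[frozenset([y, z])] * sy) / (sx + sy)
               for z in size}
        d = {k: v for k, v in d.items() if x not in k and y not in k}
        d.update(new)
        size[merged] = sx + sy
    return clades - {frozenset(names)}


def ani_distances(fragments, rng=None):
    """fragments[frozenset({a, b})] = per-fragment identities (%).
    Distance = 1 - ANI/100; with an rng, fragments are resampled first."""
    out = {}
    for pair, values in fragments.items():
        sample = values if rng is None else [rng.choice(values) for _ in values]
        out[pair] = 1.0 - mean(sample) / 100.0
    return out


def bootstrap_support(names, fragments, reps=100, seed=1):
    rng = random.Random(seed)
    reference = upgma_clades(names, ani_distances(fragments))
    hits = dict.fromkeys(reference, 0)
    for _ in range(reps):
        replicate = upgma_clades(names, ani_distances(fragments, rng))
        for clade in reference:
            hits[clade] += int(clade in replicate)
    return {clade: hits[clade] / reps for clade in reference}


# Hypothetical per-fragment identities for four genomes.
names = ["A", "B", "C", "D"]
fragments = {frozenset(p): v for p, v in [
    (("A", "B"), [99.0, 98.0, 99.5]), (("C", "D"), [97.0, 97.5, 98.0]),
    (("A", "C"), [80.0, 82.0, 81.0]), (("A", "D"), [80.5, 81.0, 79.5]),
    (("B", "C"), [81.0, 80.0, 82.0]), (("B", "D"), [80.0, 81.5, 80.5]),
]}
support = bootstrap_support(names, fragments, reps=50)
for clade, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), value)
```

UPGMA assumes a roughly constant rate of divergence across lineages; where that assumption is doubtful, a neighbor-joining step could be swapped in without changing the bootstrap loop.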

Genetics

Molecular Biology

Tertiary-interaction characters enable fast, model-based structural phylogenetics beyond the twilight zone

Caroline Puente-Lelièvre et al., Jan 1, 2023

Protein structure is more conserved than protein sequence and may therefore be useful for phylogenetic inference beyond the "twilight zone", where sequence similarity has largely decayed. Until recently, structural phylogenetics was constrained by the lack of solved structures for most proteins and by reliance on phylogenetic distance methods, which made it difficult to treat inference and uncertainty statistically. AlphaFold has largely overcome the first problem by making structural predictions readily available. We address the second problem by repurposing a structural alphabet recently developed for Foldseek, a highly efficient deep homology search program. For each residue in a structure, Foldseek identifies its closest tertiary-interaction neighbor residue in the structure and classifies that interaction into one of twenty "3Di" states. We test the hypothesis that 3Di states can be used as standard phylogenetic characters, using a dataset of 53 structures from the ferritin-like superfamily. We performed 60 IQ-TREE maximum-likelihood runs to compare structure-free, PDB, and AlphaFold analyses, and default versus custom model sets that include a 3Di-specific rate matrix. Analyses that combine amino acids, 3Di characters, partitioning, and custom models produce the closest match to the structural-distance tree of Malik et al. (2020), avoiding the long-branch attraction errors of structure-free analyses. Analyses include standard ultrafast bootstrap confidence measures and take minutes, rather than weeks, to run on desktop computers. These results suggest that structural phylogenetics could soon become routine practice in protein phylogenetics, allowing many fundamental phylogenetic problems to be re-explored.
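
A partitioned maximum-likelihood analysis that combines amino acids with 3Di characters starts from a concatenated alignment and a record of where each partition begins and ends. The Python sketch below assumes per-taxon amino-acid and 3Di alignments are already available as aligned strings (3Di states are written with the same 20 letters as amino acids, which is why they need their own partition and model) and produces a NEXUS `sets` block of the kind IQ-TREE accepts as a partition file. The taxon names, sequences, and the charset names `aa` and `tdi` are hypothetical, and model assignment, including any 3Di-specific rate matrix, is deliberately left out.

```python
def concat_partitions(aa_aln, tdi_aln):
    """Concatenate amino-acid and 3Di alignments taxon by taxon.

    aa_aln, tdi_aln: taxon -> aligned string ('-' for gaps). Returns the
    combined sequences and a NEXUS sets block marking each partition.
    """
    if set(aa_aln) != set(tdi_aln):
        raise ValueError("both alignments must contain the same taxa")
    aa_lengths = {len(s) for s in aa_aln.values()}
    tdi_lengths = {len(s) for s in tdi_aln.values()}
    if len(aa_lengths) != 1 or len(tdi_lengths) != 1:
        raise ValueError("rows within each alignment must have equal length")
    n_aa, n_tdi = aa_lengths.pop(), tdi_lengths.pop()
    combined = {taxon: aa_aln[taxon] + tdi_aln[taxon] for taxon in sorted(aa_aln)}
    # Columns 1..n_aa are amino acids; the following n_tdi columns are 3Di states.
    sets_block = "\n".join([
        "#nexus",
        "begin sets;",
        f"    charset aa = 1-{n_aa};",
        f"    charset tdi = {n_aa + 1}-{n_aa + n_tdi};",
        "end;",
    ])
    return combined, sets_block


aa = {"taxon1": "MK-LV", "taxon2": "MKALV"}   # hypothetical amino-acid rows
tdi = {"taxon1": "DV-PD", "taxon2": "DVLPD"}  # hypothetical 3Di rows
combined, sets_block = concat_partitions(aa, tdi)
print(combined)
print(sets_block)
```

The combined sequences would be written to a single alignment file, with a substitution model then assigned to each charset (for example, a standard protein model for `aa` and a custom matrix for `tdi`).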

Genetics

Artificial Intelligence

Interaction range of common goods shapes Black Queen dynamics beyond the cheater-cooperator narrative

Matthew Fullmer et al., Jul 19, 2024

Dependencies among microorganisms often appear mutualistic in the lab, as microbes grow faster together than alone. However, according to the Black Queen (BQ) hypothesis, these dependencies are underpinned by the evolutionary benefits of loss-of-function mutations when others in the community can supply the necessary common goods. BQ dynamics often describe a cheater-cooperator scenario, in which some ecotypes, the "cheaters", produce no common goods and rely on others, the "cooperators", for survival. We have previously proposed that in systems with multiple common goods, an alternative endpoint for BQ dynamics can emerge. This endpoint describes an ecosystem of interdependent ecotypes engaging in "mutual cheating", i.e., one in which common-good production is distributed. However, even with multiple goods, common-good production can be centralized, i.e., with one ecotype providing all common goods for the ecosystem. Here, we present an eco-evolutionary model that reveals that BQ dynamics can result in either distributed or centralized common-good production. The interaction range, i.e., the number of beneficiaries a producer can support, distinguishes between these two endpoints. While many ecosystems evolve to be maximally distributed or maximally centralized, we also find intermediate ecosystems in which apparently redundant ecotypes coexist for long periods of time. Because of the limited interaction range, the presence of non-producing types prevents these redundant ecotypes from fully distributing the production of common goods. Although non-producers thus stall the division of labor, we observe that sudden structural shifts can occur that purge the non-producers from the ecosystem. Overall, our findings broaden the understanding of BQ dynamics, unveiling complex interactions beyond the simple cheater-cooperator narrative.

Demography

Aerospace Engineering
