ResearchHub | Open Science Community

Insurgence and worldwide diffusion of genomic variants in SARS-CoV-2 genomes

Francesco Comandatore et al., Apr 30, 2020

Abstract: The SARS-CoV-2 pandemic we are currently experiencing is exacting a massive toll, both in human lives and in economic impact. One of the challenges we face is understanding whether and how different variants of the virus emerge and change in frequency over time. Such information can be extremely valuable, as it may indicate shifts in aggressiveness and could help trace the spread of the virus in the population. In this work, we identified 7 amino acid variants that are present at high frequency in Italy and Europe but were absent, or present at very low frequencies, during the first stages of the epidemic in China and in the initial reports from Europe, and we traced them over time. The analysis of these variants helps define 6 phylogenetic clades that are currently spreading throughout the world, with changes in frequency that are sometimes very fast and dramatic. In the absence of conclusive data at the time of writing, we discuss whether the spread of these variants may be due to a prominent founder effect or indicates an adaptive advantage.
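Tracing a variant "over time" boils down to computing, for each time window, the fraction of sampled genomes that carry it. The sketch below is a minimal illustration of that bookkeeping, assuming monthly windows and genomes already annotated with their amino acid variants; the function name, the input layout and the "S:D614G" label are hypothetical placeholders, not the authors' pipeline or data.

```python
from collections import defaultdict


def variant_frequency_by_month(samples, variant):
    """Fraction of genomes carrying `variant` in each calendar month.

    `samples` is an iterable of (collection_date, variants) pairs, where
    collection_date is an ISO "YYYY-MM-DD" string and variants is a set of
    amino acid variant labels found in that genome (hypothetical layout).
    """
    totals = defaultdict(int)    # genomes sampled per month
    carriers = defaultdict(int)  # genomes carrying the variant per month
    for date, variants in samples:
        month = date[:7]  # "YYYY-MM"
        totals[month] += 1
        if variant in variants:
            carriers[month] += 1
    # Sorting ISO month strings also sorts them chronologically.
    return {month: carriers[month] / totals[month] for month in sorted(totals)}


# Hypothetical toy data, not taken from the paper.
samples = [
    ("2020-01-20", set()),
    ("2020-01-28", set()),
    ("2020-03-02", {"S:D614G"}),
    ("2020-03-15", {"S:D614G"}),
    ("2020-03-20", set()),
]
print(variant_frequency_by_month(samples, "S:D614G"))
# {'2020-01': 0.0, '2020-03': 0.6666666666666666}
```

Fixed calendar months are only one choice of window; with uneven sampling, wider or sliding windows may give more stable frequency estimates.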

Genetics

Philosophy

31

Paper

The origin and evolution of mitochondrial tropism in Midichloria bacteria

Anna Floriano et al., May 16, 2022

Abstract: Midichloria are intracellular bacterial symbionts of ticks. Some representatives of this genus have the unique capability to colonize mitochondria in the cells of their hosts. Hypotheses on the nature of this interaction have proven difficult to test, partly due to a lack of data. Indeed, until now, mitochondrial tropism information and genomes were available only for symbionts of three and two tick host species, respectively. Here we analyzed the mitochondrial tropism of three additional Midichloria and sequenced nine novel genomes, showing that the tropism is non-monophyletic, due either to losses of the trait or to multiple parallel acquisitions. Comparative genome analyses support the first hypothesis, as the genomes of non-mitochondrial symbionts appear to be reduced subsets of those capable of colonizing the organelles. We detect genomic signatures of mitochondrial tropism, identifying a set of candidate genes characteristic of the strains capable of mitochondrial colonization. These include the type IV secretion system and the flagellum, which could allow the secretion of unique effectors and direct interaction with, or invasion of, the mitochondria. Other genes, including putative adhesion molecules, proteins possibly involved in actin polymerization, and cell wall and outer membrane proteins, are present only in mitochondrial symbionts. The bacteria could use these to manipulate host structures, including mitochondrial membranes, in order to fuse with the organelles or manipulate the mitochondrial network.

Genetics

Biochemistry

10

Paper

P-DOR, an easy-to-use pipeline to reconstruct outbreaks using pathogen genomics

Gherard Biffignandi et al., Jun 1, 2023

Summary: Bacterial healthcare-associated infections (HAIs) are a major threat worldwide, which can be counteracted by effective infection control measures guided by constant surveillance and timely epidemiological investigations. Genomics is crucial in modern epidemiology but lacks standard methods and user-friendly software accessible to users without strong bioinformatics proficiency. To overcome these issues, we developed P-DOR, a novel tool for rapid bacterial outbreak characterization. P-DOR accepts genome assemblies as input, automatically selects a background of publicly available genomes using k-mer distances, and adds it to the analysis dataset before inferring a SNP-based phylogeny. Epidemiological clusters are identified from the phylogenetic tree topology and SNP distances. By analyzing the SNP-distance distribution, the user can gauge the appropriate threshold. Patient metadata can also be provided, to give a spatio-temporal representation of the outbreak. The entire pipeline is fast and scalable and can also be run on low-end computers.

Availability and implementation: P-DOR is implemented in Python3 and R and can be installed using conda environments. It is available from GitHub at https://github.com/SteMIDIfactory/P-DOR under the GPL-3.0 license.
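The clustering step can be illustrated with threshold-based grouping on pairwise SNP distances. The sketch below is a simplified stand-in, not P-DOR's code: P-DOR also uses the tree topology, whereas this sketch assumes single-linkage grouping on SNP distances alone and a core-genome alignment given as a plain dictionary. The function names are hypothetical. `pairwise_distances` returns the distance distribution a user might inspect when choosing the threshold.

```python
def snp_distance(seq_a, seq_b):
    """Number of aligned positions where two sequences differ, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x not in "-N" and y not in "-N"
    )


def pairwise_distances(alignment):
    """Sorted list of all pairwise SNP distances (the distribution used to pick a threshold)."""
    names = sorted(alignment)
    return sorted(
        snp_distance(alignment[a], alignment[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )


def snp_clusters(alignment, threshold):
    """Single-linkage clusters: two genomes are linked if their SNP distance <= threshold.

    `alignment` maps genome names to aligned core-genome sequences.
    """
    names = sorted(alignment)
    parent = {name: name for name in names}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if snp_distance(alignment[a], alignment[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:  # names are sorted, so each group comes out sorted
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical toy alignment: three outbreak isolates and one background genome.
aln = {"p1": "ACGTACGT", "p2": "ACGTACGA", "p3": "ACGTTCGA", "bg1": "TTTTACGT"}
print(pairwise_distances(aln))       # [1, 1, 2, 3, 4, 5]
print(snp_clusters(aln, threshold=1))  # [['bg1'], ['p1', 'p2', 'p3']]
```

Single linkage means p1 and p3 end up together (distance 2) through p2, even with a threshold of 1; that chaining is a design choice worth keeping in mind when interpreting such clusters.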

Genetics

Artificial Intelligence

2

Paper

Comparative genomics reveals the emergence of an outbreak-associated Cryptosporidium parvum population in Europe and its spread to the USA

Greta Bellinzona et al., Jan 1, 2023

The zoonotic parasite Cryptosporidium parvum is a global cause of gastrointestinal disease in humans and ruminants. Sequence analysis of the highly polymorphic gp60 gene has enabled the classification of C. parvum isolates into multiple groups (e.g. IIa, IIc, IId) and a large number of subtypes. In Europe, subtype IIaA15G2R1 is largely predominant and has been associated with many water- and food-borne outbreaks. In this study, we generated new whole genome sequence (WGS) data from 123 human- and ruminant-derived isolates collected in 13 European countries and included other available WGS data from Europe, Egypt, China and the USA (n=72) in the largest comparative genomics study to date. We applied rigorous filters to exclude mixed infections and analysed a dataset of 141 isolates from the zoonotic groups IIa (n=119) and IId (n=22). Based on 28,047 high-quality, biallelic genomic SNPs, we identified three distinct and strongly supported populations: isolates from China (IId) and Egypt (IIa and IId) formed population 1, a minority of European isolates (IIa and IId) formed population 2, while the majority of European isolates (IIa, including all IIaA15G2R1 isolates) and all isolates from the USA (IIa) clustered in population 3. Based on analyses of population structure, population genetics and recombination, we show that population 3 recently emerged and expanded throughout Europe and then, possibly from the UK, reached the USA, where it also expanded. In addition, genetic exchanges between different populations led to the formation of mosaic genomes. The reason(s) for the successful spread of population 3 remain elusive, although genes under selective pressure uniquely in this population were identified.

Genetics

Microbiology

0

Paper

Genetic barriers more than ecological adaptations shaped Serratia marcescens diversity

Lodovico Sterzi et al., Jan 1, 2023

Bacterial species often comprise well-separated lineages, which likely emerged and are maintained through genetic isolation and/or ecological divergence. How these two evolutionary forces interact in shaping bacterial population structure is currently not fully understood. In this study, we investigated the genetic and ecological drivers underlying the evolution of Serratia marcescens, an opportunistic pathogen with high genomic flexibility that is able to colonise diverse environments. Comparative genomic analyses revealed a population structure composed of five deeply demarcated genetic clusters with an open pan-genome but limited inter-cluster gene flow, partially explained by incompatibility between restriction-modification (R-M) systems. Furthermore, a large-scale search of hundreds of thousands of metagenomic datasets revealed only a partial ecological separation of the clusters. Overall, only two clusters showed a distinctive gene composition and evident ecological adaptations. These results suggest that genetic isolation preceded ecological adaptation in shaping the species' diversity, an evolutionary scenario that may apply to several bacterial species.

Genetics

Ecology

3

Paper

Optimising machine learning prediction of minimum inhibitory concentrations in Klebsiella pneumoniae

Gherard Biffignandi et al., Nov 21, 2023

Abstract: Minimum inhibitory concentrations (MICs) are the gold standard for quantitatively measuring antibiotic resistance. However, lab-based MIC determination can be time-consuming and suffers from low reproducibility, and interpretation as sensitive or resistant relies on guidelines that change over time. Genome sequencing and machine learning promise in-silico MIC prediction as an alternative approach that overcomes some of these difficulties, although MIC interpretation is still required. Nevertheless, precisely how MIC data should be handled in predictive models remains unclear, since MICs are measured semi-quantitatively, with varying resolution, and are typically also left- and right-censored within varying ranges. We therefore investigated genome-based prediction of MICs in the pathogen Klebsiella pneumoniae using 4367 genomes, with both simulated semi-quantitative traits and real MICs. As we focused on clinical interpretation, we used interpretable rather than black-box machine learning models, namely Elastic Net, Random Forests, and linear mixed models. Simulated traits were generated accounting for oligogenic, polygenic, and homoplastic genetic effects with different levels of heritability. We then assessed how model prediction accuracy was affected when MICs were framed as regression or classification. Our results showed that treating MICs differently depending on the number of antibiotic concentration levels available was the most promising learning strategy. Specifically, to optimise both prediction accuracy and inference of the correct causal variants, we recommend treating MICs as continuous and framing the learning problem as regression when the number of observed antibiotic concentration levels is large, whereas with a smaller number of concentration levels MICs should be treated as a categorical variable and the learning problem framed as classification. Our findings also underline how predictive models can be improved when prior biological knowledge is taken into account, given the varying genetic architecture of each antibiotic resistance trait. Finally, we emphasise that expanding the population database is pivotal for the future clinical implementation of these models to support routine machine-learning-based diagnostics.

Data Summary: The scripts used to run and fit the models can be found at https://github.com/gbatbiff/Kpneu_MIC_prediction. The Illumina sequences from Thorpe et al. are available from the European Nucleotide Archive under accession PRJEB27342. All other genomes are available from the https://www.bv-brc.org/ database.

Impact statement: Klebsiella pneumoniae is a leading cause of hospital- and community-acquired infections worldwide, contributing substantially to the global burden of antimicrobial resistance (AMR). Conventional methods to assess antibiotic resistance are not always satisfactory and can be costly and slow, so robust methods able to accurately predict AMR are increasingly needed. Genome-based prediction of MICs through machine learning methods is a promising tool to assist clinical diagnosis that can also offset phenotypic MIC discordance between different culture-based assays. However, benchmarking predictive models against phenotypic data is problematic due to inconsistencies in the way these data are generated, and how they should be handled remains unclear. In this work, we focused on genome-based prediction of MICs and evaluated the performance of interpretable machine learning models across different genetic architectures and data encodings. Our workflow highlighted how MICs need to be treated as different types of data depending on the method used to measure them, in particular considering each antibiotic separately. Our findings shed further light on the factors affecting model performance, paving the way for future improvements in antibiotic resistance prediction.
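The recommendation above (regression when many concentration levels are observed, classification when few are) can be made concrete with a small encoding step. This is a minimal sketch under several assumptions that are not from the paper: MICs arrive as strings such as "<=0.25", "2" or ">32"; left-censored values are kept at their boundary; right-censored values are shifted one doubling dilution up (one simple convention); and the cut-off of 5 levels is a hypothetical default. The function names are illustrative and do not come from the authors' repository.

```python
import math


def parse_mic(mic):
    """Split a MIC string into (value, censoring).

    Assumed formats: "<=0.25" (left-censored), ">32" (right-censored), "2" (exact).
    """
    mic = mic.strip()
    if mic.startswith("<="):
        return float(mic[2:]), "left"
    if mic.startswith(">"):
        return float(mic[1:]), "right"
    return float(mic), None


def encode_mics(mics, max_levels_for_classification=5):
    """Pick a regression or classification encoding from the number of MIC levels.

    Left-censored values are kept at their boundary; right-censored values are
    moved one doubling dilution above the boundary (a convention assumed here).
    """
    log2_values = []
    for mic in mics:
        value, censoring = parse_mic(mic)
        if censoring == "right":
            value *= 2
        log2_values.append(math.log2(value))
    levels = sorted(set(log2_values))
    if len(levels) > max_levels_for_classification:
        # Many concentration levels: treat log2(MIC) as a continuous target.
        return "regression", log2_values
    # Few levels: treat each concentration level as an ordered class label.
    index = {level: i for i, level in enumerate(levels)}
    return "classification", [index[v] for v in log2_values]


mics = ["<=0.25", "0.5", "1", "2", ">32"]
print(encode_mics(mics))
# ('classification', [0, 1, 2, 3, 4])
print(encode_mics(mics, max_levels_for_classification=3))
# ('regression', [-2.0, -1.0, 0.0, 1.0, 6.0])
```

Working on the log2 scale matches the doubling-dilution design of MIC panels, so adjacent concentrations are one unit apart whichever encoding is chosen.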

Genetics

Artificial Intelligence

0

Paper

How to measure bacterial genome plasticity? A novel time-integrated index helps gather insights on pathogens

Greta Bellinzona et al., Jan 23, 2024

Abstract: Genome plasticity can be defined as the capacity of a bacterial population to swiftly gain or lose genes. Time plays a fundamental role in the evolutionary success of microbes, particularly for pathogens and their tendency to acquire antimicrobial resistance factors under the pressure of extensive antibiotic use. Multiple metrics have been proposed to provide insights into the gene content repertoire, yet they overlook the temporal component, which plays a critical role in determining the adaptation and survival of a bacterial strain. In this study, we introduce a novel index that incorporates the time dimension to assess the rate at which bacteria exchange genes, thus fitting the definition of plasticity. Unlike available indices, our method also accounts for the possibility of contiguous genes being transferred together in a single event. We applied our novel index to measure plasticity in three widely studied bacterial species: Klebsiella pneumoniae, Staphylococcus aureus, and Escherichia coli. Our results highlight distinctive plasticity patterns in specific sequence types and clusters, suggesting a possible correlation between heightened genome plasticity and globally recognized high-risk clones. Our approach holds promise as an index for predicting the emergence of strains of potential clinical concern, possibly allowing for timely and more effective interventions.
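The two ideas that distinguish this index (a rate per unit of time, and contiguous genes counted as one transfer event) can be illustrated with a toy calculation. This is not the authors' index, whose formula is not given in the abstract. The sketch assumes two genomes described by 0/1 presence vectors over the same ordered list of genes (a hypothetical reference gene order) and a known divergence time, e.g. from a dated phylogeny; the function names are hypothetical.

```python
def transfer_events(presence_a, presence_b):
    """Count gene gain/loss events between two genomes.

    presence_a, presence_b: 0/1 lists over the same ordered set of genes.
    A maximal run of adjacent genes present in one genome but absent in the
    other (in the same direction) is counted as ONE event.
    """
    if len(presence_a) != len(presence_b):
        raise ValueError("presence vectors must cover the same genes")
    events = 0
    previous = 0  # 0 = no difference, +1 = only in A, -1 = only in B
    for a, b in zip(presence_a, presence_b):
        state = a - b
        if state != 0 and state != previous:
            events += 1  # a new run of differences starts here
        previous = state
    return events


def plasticity_rate(presence_a, presence_b, divergence_years):
    """Toy time-integrated index: gain/loss events per year of divergence."""
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    return transfer_events(presence_a, presence_b) / divergence_years


a = [1, 1, 1, 0, 0, 1, 1]
b = [1, 0, 0, 0, 1, 1, 0]
print(transfer_events(a, b))       # 3 (genes 2-3 as one event, gene 5, gene 7)
print(plasticity_rate(a, b, 1.5))  # 2.0
```

A per-gene count would report 4 differing genes for the same pair; merging the adjacent pair into one event is what keeps a single block transfer from inflating the rate.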

Genetics

Ecology

0

Paper
