ResearchHub | Open Science Community

Population structure, biogeography and transmissibility of Mycobacterium tuberculosis

Luca Freschi et al., Sep 29, 2020

Abstract Mycobacterium tuberculosis is a clonal pathogen proposed to have co-evolved with its human host for millennia, yet our understanding of its genomic diversity and biogeography remains incomplete. Here we use a combination of phylogenetics and dimensionality reduction to reevaluate the population structure of M. tuberculosis, providing the first in-depth analysis of the ancient East African Indian Lineage 1 and the modern Central Asian Lineage 3 and expanding our understanding of Lineages 2 and 4. We assess sub-lineages using genomic sequences from 4,939 pan-susceptible strains and find 30 new genetically distinct clades, which we validate in a dataset of 4,645 independent isolates. We characterize the geographic distributions of sub-lineages and demonstrate a consistent pattern of geographically restricted or unrestricted distribution for 20 groups, including three groups of Lineage 1. We assess the transmissibility of the four major lineages by examining the distribution of terminal branch lengths across the M. tuberculosis phylogeny and find evidence supporting higher transmissibility of Lineages 2 and 4 than of Lineages 3 and 1 on a global scale. We define a robust expanded barcode of 95 single nucleotide substitutions (SNS) that allows rapid identification of 69 Mtb sub-lineages and 26 additional internal groups. Our results paint a higher-resolution picture of the Mtb phylogeny and biogeography.
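A barcode of this kind can be applied by checking which defining substitutions a sample carries and reporting the most specific match. The sketch below assumes a hypothetical `BARCODE` dictionary: its positions, bases and labels are placeholders, not the published 95-SNS barcode, and `assign_sublineage` is an illustrative name.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical barcode: each label is defined by one single nucleotide
# substitution (reference position, alternate base). Placeholder values only,
# NOT the published 95-SNS barcode.
BARCODE: Dict[str, Tuple[int, str]] = {
    "lineage4": (100, "T"),
    "lineage4.1": (2_000, "G"),
    "lineage4.1.2": (35_000, "A"),
    "lineage2": (500, "C"),
}


def assign_sublineage(variants: Set[Tuple[int, str]]) -> Optional[str]:
    """Return the most specific barcode label whose defining SNS is present."""
    hits = [label for label, sns in BARCODE.items() if sns in variants]
    if not hits:
        return None
    # Deeper sub-lineages have more dot-separated levels in their name.
    return max(hits, key=lambda label: label.count("."))


# A sample carrying the three nested lineage-4 markers plus an unrelated SNS.
sample = {(100, "T"), (2_000, "G"), (35_000, "A"), (7_777, "C")}
print(assign_sublineage(sample))  # lineage4.1.2
```

Because sub-lineage membership is nested, a fuller implementation would presumably also check that a sample carries the markers of every ancestral group before accepting a deep assignment.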

Genetics

Ecology

Genomic sequence characteristics and the empiric accuracy of short-read sequencing

Maximillian Marin et al., Apr 11, 2021

Abstract Background: Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias reduce the performance of variant calling using short-read alignment, but the resulting loss in recall and specificity has not been adequately characterized. For the clonal pathogen Mycobacterium tuberculosis (Mtb), researchers frequently exclude 10.7% of the genome believed to be repetitive and prone to erroneous variant calls. To benchmark short-read variant calling, we used 36 diverse clinical Mtb isolates dually sequenced with Illumina short reads and PacBio long reads. We systematically study short-read variant calling accuracy and the influence of sequence uniqueness, reference bias, and GC content. Results: Reference-based Illumina variant calling had a maximum recall of 89.0% and a minimum precision of 98.5% across the parameters evaluated. The best balance between precision and recall was achieved by tuning the mapping quality (MQ) threshold, i.e., the confidence of the read mapping (recall 85.8%, precision 99.1% at MQ ≥ 40). Masking repetitive sequence content is an alternative, more conservative approach to variant calling that maintains high precision at a cost in recall (recall 70.2%, precision 99.6% at MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS, including 52 of the 168 PE/PPE genes (34.5%). We present a refined list of low-confidence regions and examine the largest sources of variant calling error. Conclusions: Our improved approach to variant calling has broad implications for the use of WGS in the study of Mtb biology, for inference of transmission in public health surveillance systems, and more generally for WGS applications in other organisms.
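The precision/recall trade-off from an MQ threshold can be made concrete with a small calculation. This is a sketch assuming hypothetical toy calls already labelled against a long-read truth set; `VariantCall` and `precision_recall` are illustrative names, not part of the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VariantCall:
    position: int
    mapping_quality: float  # mapping quality (MQ) of the reads supporting the call
    is_true: bool           # True if the call matches the long-read truth set


def precision_recall(calls: List[VariantCall], n_truth: int,
                     min_mq: float) -> Tuple[float, float]:
    """Precision and recall after keeping only calls with MQ >= min_mq.

    Assumes each truth variant is matched by at most one call.
    """
    kept = [c for c in calls if c.mapping_quality >= min_mq]
    tp = sum(c.is_true for c in kept)            # calls confirmed by the truth set
    precision = tp / len(kept) if kept else 0.0  # TP / (TP + FP)
    recall = tp / n_truth if n_truth else 0.0    # TP / (TP + FN)
    return precision, recall


# Hypothetical toy calls for one isolate whose truth set has 5 variants.
calls = [
    VariantCall(1_000, 60, True),
    VariantCall(2_000, 60, True),
    VariantCall(3_000, 20, False),
    VariantCall(4_000, 25, True),
    VariantCall(5_000, 45, False),
]
for threshold in (0, 40):
    p, r = precision_recall(calls, n_truth=5, min_mq=threshold)
    print(f"MQ >= {threshold}: precision={p:.2f}, recall={r:.2f}")
```

On these toy calls, raising the threshold from 0 to 40 raises precision from 0.60 to 0.67 and lowers recall from 0.60 to 0.40, the same direction of trade-off the abstract describes.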

Genetics

Artificial Intelligence

The role of epistasis in amikacin, kanamycin, bedaquiline, and clofazimine resistance in Mycobacterium tuberculosis complex

Roger Vargas et al., May 8, 2021

ABSTRACT Antibiotic resistance among bacterial pathogens poses a major global health threat. The M. tuberculosis complex (MTBC) is estimated to have the highest resistance rates of any pathogen globally. Given its slow growth rate and the need for a biosafety level 3 laboratory, the only realistic avenue to scale up drug-susceptibility testing (DST) for this pathogen is to rely on genotypic techniques. This raises the fundamental question of whether a mutation is a reliable surrogate for phenotypic resistance, or whether the presence of a second mutation can completely counteract its effect, resulting in major diagnostic errors (i.e., systematic false resistance results). To date, such epistatic interactions have only been reported for streptomycin, which is now rarely used. By analyzing more than 31,000 MTBC genomes, we demonstrated that the eis C-14T promoter mutation, which is interrogated by several genotypic DST assays endorsed by the World Health Organization, cannot confer resistance to amikacin and kanamycin if it coincides with loss-of-function (LoF) mutations in the coding region of eis. To our knowledge, this represents the first definitive example of reversion of antibiotic resistance in MTBC. Moreover, we raise the possibility that mmpR (Rv0678) mutations are not valid markers of resistance to bedaquiline and clofazimine if they coincide with LoF mutations in the efflux pump encoded by mmpS5 (Rv0677c) and mmpL5 (Rv0676c).
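The epistatic rule described above can be expressed as a guard in a genotypic interpretation step. A minimal sketch, assuming hypothetical mutation labels (`eis:c-14t`, `eis:lof:...`); real pipelines use their own nomenclature, and this function considers only the eis marker, not other amikacin/kanamycin resistance mechanisms.

```python
from typing import Iterable

# Hypothetical mutation labels, for illustration only.
EIS_PROMOTER = "eis:c-14t"
EIS_LOF_PREFIX = "eis:lof"  # e.g. "eis:lof:frameshift_412" (illustrative)


def interpret_eis_marker(mutations: Iterable[str]) -> str:
    """Interpret the eis C-14T marker for amikacin/kanamycin, allowing for epistasis.

    Only the eis-based marker is considered; other resistance mechanisms
    (e.g. rrs mutations) are outside the scope of this sketch.
    """
    muts = set(mutations)
    if EIS_PROMOTER not in muts:
        return "no eis promoter marker"
    if any(m.startswith(EIS_LOF_PREFIX) for m in muts):
        # Without a functional eis gene, promoter up-regulation has nothing to over-express.
        return "marker present but masked by eis LoF"
    return "resistance predicted"


print(interpret_eis_marker(["eis:c-14t"]))  # resistance predicted
print(interpret_eis_marker(["eis:c-14t", "eis:lof:frameshift_412"]))  # marker present but masked by eis LoF
```

The same pattern would presumably extend to the suggested mmpR/mmpS5/mmpL5 interaction, with LoF in the efflux pump genes masking an mmpR marker.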

Genetics

Epidemiology

Phase variation as a major mechanism of adaptation in Mycobacterium tuberculosis complex

Roger Vargas et al., Jun 10, 2022

ABSTRACT Phase variation induced by insertions and deletions (INDELs) in genomic homopolymeric tracts (HTs) can silence and regulate genes in pathogenic bacteria, but this process has not been characterized in the adaptation of the Mycobacterium tuberculosis complex (MTBC). We leverage 31,428 diverse clinical isolates to identify genomic regions, including phase variants, under positive selection. Of 87,651 INDEL events that emerge repeatedly across the phylogeny, 12.4% are phase variants within HTs, which make up only 0.02% of the genome by length. We estimated the in vitro frameshift rate in a neutral HT at 1.1 × 10⁻⁵ frameshifts/HT/year, 100 times the neutral substitution rate. Using neutral evolution simulations, we identified 4,098 substitutions and 45 phase variants as putatively adaptive in MTBC (P < 0.002). We experimentally confirm that a putatively adaptive phase variant alters the expression of espA, a critical mediator of ESX-1-dependent virulence. Our evidence supports a new hypothesis that phase variation in the ESX-1 system of MTBC can act as a toggle between antigenicity and survival in the host.
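Locating homopolymeric tracts is the first step before any INDEL can be classified as a phase variant. A minimal sketch, assuming a hypothetical minimum tract length of 7 bp (the abstract does not state the cut-off) and the simplification that every tract shares the neutral frameshift rate quoted above; the function names are illustrative.

```python
from itertools import groupby
from typing import List, Tuple

FRAMESHIFT_RATE = 1.1e-5  # frameshifts per HT per year (neutral HT estimate from the abstract)


def find_homopolymer_tracts(seq: str, min_len: int = 7) -> List[Tuple[int, str, int]]:
    """Return (0-based start, base, length) for every run of >= min_len identical bases."""
    tracts = []
    pos = 0
    for base, run in groupby(seq.upper()):
        length = sum(1 for _ in run)
        if length >= min_len:
            tracts.append((pos, base, length))
        pos += length
    return tracts


def expected_frameshifts(n_tracts: int, years: float, rate: float = FRAMESHIFT_RATE) -> float:
    """Expected frameshift count across n_tracts over `years`, assuming one shared rate."""
    return rate * n_tracts * years


seq = "AC" + "G" * 7 + "TA" + "C" * 8 + "A"
print(find_homopolymer_tracts(seq))  # [(2, 'G', 7), (11, 'C', 8)]
print(expected_frameshifts(n_tracts=100, years=10))
```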

Genetics

Epidemiology

GenTB: A user-friendly genome-based predictor for tuberculosis resistance powered by machine learning

Matthias Gröschel et al., Mar 29, 2021

ABSTRACT Introduction: Multidrug-resistant Mycobacterium tuberculosis (Mtb) is a significant global public health threat. Genotypic resistance prediction from Mtb DNA sequences offers an alternative to laboratory-based drug-susceptibility testing. User-friendly and accurate resistance prediction tools are needed to enable public health and clinical practitioners to rapidly diagnose resistance and inform treatment regimens. Methods: We present the Translational Genomics platform for Tuberculosis (GenTB), a web-based application to predict antibiotic resistance from next-generation sequence data. The user can choose between two predictors, a Random Forest (RF) classifier and a Wide and Deep Neural Network (WDNN), which predict phenotypic resistance to 13 and 10 anti-tuberculosis drugs, respectively. We benchmark GenTB's predictive performance against leading TB resistance prediction tools (Mykrobe and TB-Profiler) using a ground-truth dataset of 20,408 isolates with laboratory-based drug susceptibility data. Results: All four tools reliably predicted resistance to first-line tuberculosis drugs but had varying performance for second-line drugs. The mean sensitivities of GenTB-RF and GenTB-WDNN across the nine shared drugs were 77.6% (95% CI 76.6 to 78.5%) and 75.4% (95% CI 74.5 to 76.4%), respectively, marginally higher than the sensitivities of TB-Profiler at 74.4% (95% CI 73.4 to 75.3%) and Mykrobe at 71.9% (95% CI 70.9 to 72.9%). The higher sensitivities came at the expense of up to 1.5 percentage points lower specificity: Mykrobe 97.6% (95% CI 97.5 to 97.7%), TB-Profiler 96.9% (95% CI 96.7 to 97.0%), GenTB-WDNN 96.2% (95% CI 96.0 to 96.4%), and GenTB-RF 96.1% (95% CI 96.0 to 96.3%). Genotypic resistance sensitivity was 11% and 9% lower for isoniazid and rifampicin, respectively, on isolates sequenced at low depth (<10x across 95% of the genome), emphasizing the need to quality-control input sequence data before prediction. We discuss differences between tools in reporting results to the user, including the variants underlying the resistance calls and any novel or indeterminate variants. Conclusion: GenTB is an easy-to-use online tool to rapidly and accurately predict resistance to anti-tuberculosis drugs. GenTB can be accessed online at https://gentb.hms.harvard.edu, and the source code is available at https://github.com/farhat-lab/gentb-site.
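Sensitivity and specificity with 95% confidence intervals, as reported above, follow from a 2×2 comparison against phenotypic DST. The abstract does not say how its intervals were computed; this sketch uses the Wilson score interval as one common choice, and the counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for one drug, with phenotypic DST as the reference.
tp, fn, tn, fp = 780, 220, 9_610, 390
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_lo, sens_hi = wilson_interval(tp, tp + fn)
spec_lo, spec_hi = wilson_interval(tn, tn + fp)
print(f"sensitivity {sens:.1%} (95% CI {sens_lo:.1%} to {sens_hi:.1%})")
print(f"specificity {spec:.1%} (95% CI {spec_lo:.1%} to {spec_hi:.1%})")
```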

Genetics

Ecology

The discovery of genome-wide mutational dependence in naturally evolving populations

Anna Green et al., Jun 28, 2022

Abstract Background: Evolutionary pressures on bacterial pathogens can result in phenotypic change, including increased virulence, drug resistance, and transmissibility. Understanding how these phenotypes evolve in nature, and the multiple genetic changes they require, has historically been difficult due to sparse and contemporaneous sampling. A complete picture of the evolutionary routes frequently travelled by pathogens would allow us to better understand bacterial biology and potentially forecast pathogen population shifts. Methods: In this work, we develop a phylogeny-based method to assess evolutionary dependency between mutations. We apply our method to a dataset of 31,428 genomes of the Mycobacterium tuberculosis complex (MTBC), a globally prevalent bacterial pathogen with increasing levels of antibiotic resistance. Results: We find evolutionary dependency within both simultaneously and sequentially acquired variation, and find that genes with dependent sites are enriched for antibiotic resistance and antigenic function. We discover 20 mutations that potentiate the development of antibiotic resistance and 1,003 dependencies that evolve as a consequence of antibiotic resistance. Depending on the antibiotic, between 9% and 80% of resistant strains harbor a dependent mutation acquired after a resistance-conferring variant. We demonstrate that mutational dependence can not only improve prediction of phenotype (e.g., antibiotic resistance) but can also detect sequential environmental pressures on the pathogen (e.g., the pressures imposed by sequential antibiotic exposure during standard multi-antibiotic treatment). Taken together, our results demonstrate the feasibility and utility of detecting dependent events in the evolution of natural populations. Data and code available at: https://github.com/farhat-lab/DependentMutations
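The distinction between simultaneously and sequentially acquired variation rests on where mutations fall on the phylogeny. This sketch shows only that counting step, not the statistical test the authors use to call dependency; the tree, the branch-to-mutation map and the labels `rpoB_S450L` and `rpoC_comp` (a placeholder for a compensatory mutation) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """All strict ancestors of node, nearest first (the root's parent is None)."""
    out = []
    p = parent.get(node)
    while p is not None:
        out.append(p)
        p = parent.get(p)
    return out


def count_co_acquisitions(parent: Dict[str, Optional[str]],
                          branch_mutations: Dict[str, Set[str]],
                          first: str, second: str) -> Tuple[int, int]:
    """Count branches where `second` arises together with, or after, `first`.

    Each branch is identified by its child node; branch_mutations maps that node
    to the mutations inferred to have arisen on the branch.
    Returns (simultaneous, sequential).
    """
    simultaneous = sequential = 0
    for node, muts in branch_mutations.items():
        if second not in muts:
            continue
        if first in muts:
            simultaneous += 1
        elif any(first in branch_mutations.get(a, set()) for a in ancestors(node, parent)):
            sequential += 1
    return simultaneous, sequential


# Hypothetical tree: root -> A, B; A -> A1, A2; B -> B1.
parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A", "B1": "B"}
branch_mutations = {
    "A": {"rpoB_S450L"},
    "A1": {"rpoC_comp"},
    "A2": {"rpoC_comp"},
    "B": {"rpoB_S450L", "rpoC_comp"},
    "B1": set(),
}
print(count_co_acquisitions(parent, branch_mutations, "rpoB_S450L", "rpoC_comp"))  # (1, 2)
```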

Genetics

Molecular Biology

Differential rates of Mycobacterium tuberculosis transmission associate with host–pathogen sympatry

Matthias Gröschel et al., Aug 1, 2024

Several human-adapted Mycobacterium tuberculosis complex (Mtbc) lineages exhibit a restricted geographical distribution globally. These lineages are hypothesized to transmit more effectively among sympatric hosts, that is, those that share the same geographical area, though this has yet to be confirmed while controlling for exposure, social networks and disease risk after exposure. Using pathogen genomic and contact tracing data from 2,279 tuberculosis cases linked to 12,749 contacts from three low-incidence cities, we show that geographically restricted Mtbc lineages were less transmissible than lineages with a widespread global distribution. Allopatric host–pathogen exposure, in which the restricted pathogen and the host are from non-overlapping areas, was associated with 38% lower odds of infection among contacts than sympatric exposure. We measured tenfold lower uptake of geographically restricted lineage 6 strains compared with widespread lineage 4 strains in allopatric macrophage infections. We conclude that long-term coexistence between Mtbc strains and their human hosts has resulted in differential transmissibility of Mtbc lineages, and that this differs by human population. Epidemiological analysis of Mycobacterium tuberculosis genomes and public health data shows that transmission of each lineage varies with the degree of host and pathogen geographical coincidence and reveals signals of a biological effect of host–pathogen coexistence.
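A 38% decrease in the odds of infection corresponds to an odds ratio of about 0.62. The study controlled for exposure and other factors, so its estimate presumably came from an adjusted model; this unadjusted 2×2 calculation with a Woolf (log) interval and hypothetical counts only illustrates the arithmetic, and assumes no zero cells.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log) confidence interval.

    a, b: infected / uninfected contacts with allopatric exposure
    c, d: infected / uninfected contacts with sympatric exposure (reference)
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical counts chosen so that the odds ratio is 0.62.
or_, lo, hi = odds_ratio_ci(62, 100, 100, 100)
print(f"OR = {or_:.2f} ({or_ - 1:+.0%} change in odds)")  # OR = 0.62 (-38% change in odds)
print(f"95% CI {lo:.2f} to {hi:.2f}")
```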

Genetics

Ecology

Common gene signature model discovery and systematic validation for TB prognosis and response to treatment

Roger Vargas et al., Nov 30, 2022

ABSTRACT While blood gene signatures have shown promise in tuberculosis (TB) diagnosis and treatment monitoring, most signatures derived from a single cohort may be insufficient to capture TB heterogeneity in populations and individuals. Here we report a new generalized approach that combines a network-based meta-analysis with machine-learning modeling to leverage the heterogeneity among studies. Transcriptome datasets from 57 studies (37 on TB and 20 on viral infections) spanning demographics and TB disease states were used for gene signature discovery and for model training and validation. The network-based meta-analysis identified a common 45-gene signature specific to active TB disease across studies. Two optimized random forest regression models, using the full or partial 45-gene signature, were then established to model the continuum from Mycobacterium tuberculosis infection to disease and treatment response. In validation on pooled multi-cohort datasets that mimic the real-world setting, the model provided robust predictive performance for progression from incipient to active TB over a 2.5-year period, with an AUROC of 0.85, 74.2% sensitivity, and 78.3% specificity, approximating the minimum criteria (>75% sensitivity and >75% specificity) of the WHO target product profile for prediction of progression to TB. Moreover, the model strongly discriminates active TB from viral infection (AUROC 0.93, 95% CI 0.91 to 0.94). For treatment monitoring, the TB scores generated by the model correlate significantly with treatment response over time and were predictive of standard treatment clinical outcomes even before treatment initiation. We demonstrate an end-to-end gene signature model development scheme that accounts for heterogeneity in TB risk estimation and treatment monitoring.
AUTHOR SUMMARY: Early diagnosis of incipient TB is one of the key approaches to reducing global TB deaths and incidence, particularly in low- and middle-income countries. However, TB is heterogeneous at the population and individual levels owing to differences in pathogenesis, host genetics, demographics, comorbidities and technical variation in sample collection and gene profiling, and the responses of molecular gene signatures have been shown to be associated with these diverse factors. In this work, we develop a new computational approach that combines a network-based meta-analysis with machine-learning modeling to address the challenge of predicting incipient TB early despite this heterogeneity. With this approach, we harness the heterogeneity of TB in diverse populations and individuals during model construction by including a large number of datasets (57 studies in total). This allows us not only to account for the confounding variables specific to each cohort while identifying the common gene set and building the predictive model, but also to validate the model systematically by pooling the datasets to mimic the real-world setting. This generalized predictive model provides robust long-term TB risk estimation (>30 months to TB disease). The model also shows utility in monitoring TB treatment and the elimination of Mycobacterium tuberculosis.
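The reported AUROC and the WHO target product profile check can both be reproduced from model scores and operating-point metrics. A sketch, assuming hypothetical scores; `auroc` uses the pairwise (Mann–Whitney) definition rather than any particular library, and `meets_who_tpp` simply encodes the >75%/>75% criteria quoted above.

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUROC as the probability a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def meets_who_tpp(sensitivity: float, specificity: float,
                  min_sens: float = 0.75, min_spec: float = 0.75) -> bool:
    """Check the minimum criteria quoted in the abstract (>75% sensitivity and specificity)."""
    return sensitivity > min_sens and specificity > min_spec


# The reported operating point falls just short on sensitivity.
print(meets_who_tpp(0.742, 0.783))  # False
# Hypothetical TB scores for progressors (positives) and non-progressors (negatives).
print(auroc([0.9, 0.8, 0.4], [0.3, 0.5]))
```

The pairwise loop is O(n·m), which is fine for illustration; for large cohorts a rank-based computation gives the same value faster.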

Genetics

Oncology

Genotypic clustering does not imply recent tuberculosis transmission in a high prevalence setting: A genomic epidemiology study in Lima, Peru

Avika Dixit et al., Sep 16, 2018

Background: Whole genome sequencing (WGS) can elucidate Mycobacterium tuberculosis (Mtb) transmission patterns, but more data are needed to guide its use in high-burden settings. In a household-based transmissibility study of 4,000 TB patients in Lima, Peru, we identified a large MIRU-VNTR Mtb cluster with a range of resistance phenotypes and studied host and bacterial factors contributing to its spread. Methods: WGS was performed on 61 of the 148 isolates in the cluster. We compared transmission link inference using epidemiological or genomic data, with and without the inclusion of controversial variants, and estimated the dates of emergence of the cluster and of antimicrobial drug resistance acquisition events by generating a time-calibrated phylogeny. We validated our findings in genomic data from an outbreak of 325 TB cases in London. Using a larger set of 12,032 public Mtb genomes, we determined the bacterial factors that characterize this cluster and are under positive selection in other Mtb lineages. Findings: Four isolates were distantly related, and the remaining 57 isolates diverged ca. 1968 (95% HPD: 1945-1985). Isoniazid resistance arose once, whereas rifampicin resistance subsequently emerged at least three times. Amplification of other drug resistance occurred as recently as within the last year of sampling. High-quality PE/PPE variants and indels added information for transmission inference. We identified five cluster-defining SNPs, including esxV S23L, as potentially contributing to transmissibility. Interpretation: Clusters defined by MIRU-VNTR typing could be circulating for decades in a high-burden setting. WGS allows for an improved understanding of transmission, as well as of bacterial resistance and fitness factors. Funding: The study was funded by the National Institutes of Health (Peru Epi study U19-AI076217 and K01-ES026835 to MRF). The funding sources had no role in any aspect of the study, manuscript or decision to submit it for publication.
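Genomic transmission link inference often starts from pairwise SNP distances, and comparing results with and without a set of contested sites shows how much those sites matter. This is a simplified sketch, not the study's method (which also used a time-calibrated phylogeny); the toy alignment, the masked positions and the SNP thresholds are hypothetical, and ambiguous bases are not handled.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def snp_distance(a: str, b: str, masked: FrozenSet[int] = frozenset()) -> int:
    """Count differing sites between two aligned sequences, skipping masked positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if x != y and i not in masked)


def putative_links(seqs: Dict[str, str], threshold: int,
                   masked: FrozenSet[int] = frozenset()) -> List[Tuple[str, str]]:
    """Pairs of isolates whose SNP distance is at or below the threshold."""
    return [
        (i, j)
        for i, j in combinations(sorted(seqs), 2)
        if snp_distance(seqs[i], seqs[j], masked) <= threshold
    ]


# Hypothetical aligned variable sites for three patients.
seqs = {"p1": "AAAAAAAAAA", "p2": "AAAAAAAAAT", "p3": "TTTTTAAAAA"}
print(putative_links(seqs, threshold=1))  # [('p1', 'p2')]
# Masking position 9 (e.g. a site considered unreliable) removes the only p1/p2 difference.
print(putative_links(seqs, threshold=0, masked=frozenset({9})))  # [('p1', 'p2')]
```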

Genetics

Epidemiology

Single nucleotide variation catalogue from clinical isolates mapped on tertiary and quaternary structures of ESX-1 related proteins reveals critical regions as putative Mtb therapeutic targets

Oren Tzfadia et al., Jun 23, 2023

Abstract Proteins encoded by the ESX-1 genes of interest are essential for full virulence in all Mycobacterium tuberculosis complex (MTBc) lineages, the pathogens with the highest mortality worldwide. Identifying critical regions in these ESX-1-related proteins could provide preventive or therapeutic targets for MTB infection, the game changer needed for tuberculosis control. We analysed a compendium of whole genome sequences of clinical MTB isolates from all lineages from >32,000 patients and identified single nucleotide variations (SNVs). When mutations corresponding to all nonsynonymous SNPs were mapped onto the surface of known and AlphaFold-predicted tertiary protein structures, fully conserved regions emerged. Some of these could be assigned to known quaternary structures, whereas others were predicted to be involved in yet-to-be-discovered interactions. Some mutants had clonally expanded (found in >1% of the isolates); these were mostly located at the surface of globular domains, remote from known intra- and inter-molecular protein–protein interactions. We also found fully conserved intrinsically disordered regions (IDRs), suggesting that these are crucial for the pathogenicity of the MTBc. Altogether, our findings provide an evolutionary structural perspective on MTB virulence and highlight fully conserved protein regions as attractive vaccine antigens and drug targets. Extending this approach to other pathogens could provide a novel critical resource for the development of innovative tools for pathogen control.
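Fully conserved regions can be identified as stretches of residues where no nonsynonymous SNV was observed across isolates, before mapping them onto a structure. A minimal sketch, assuming a hypothetical minimum region length of 10 residues and toy variant positions; `conserved_regions` is an illustrative name.

```python
from typing import List, Set, Tuple


def conserved_regions(protein_length: int, variant_residues: Set[int],
                      min_len: int = 10) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) runs of residues with no observed nonsynonymous SNV."""
    regions = []
    start = None  # first residue of the current variant-free run
    for res in range(1, protein_length + 1):
        if res in variant_residues:
            # A variant ends the current run; keep it if it is long enough.
            if start is not None and res - start >= min_len:
                regions.append((start, res - 1))
            start = None
        elif start is None:
            start = res
    # Close a run that extends to the C-terminus.
    if start is not None and protein_length - start + 1 >= min_len:
        regions.append((start, protein_length))
    return regions


# Hypothetical 30-residue protein with nonsynonymous SNVs at residues 5 and 20.
print(conserved_regions(30, {5, 20}))  # [(6, 19), (21, 30)]
```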

Genetics

Epidemiology
