ResearchHub | Open Science Community

Comparison of R9.4.1/Kit10 and R10/Kit12 Oxford Nanopore flowcells and chemistries in bacterial genome reconstruction

Nicholas Sanderson et al., Oct 24, 2023

Abstract

Complete, accurate, cost-effective, and high-throughput reconstruction of bacterial genomes for large-scale genomic epidemiological studies is currently only possible with hybrid assembly, combining long- (typically using nanopore sequencing) and short-read (Illumina) datasets. Being able to utilise nanopore-only data would be a significant advance. Oxford Nanopore Technologies (ONT) have recently released a new flowcell (R10.4) and chemistry (Kit12), which reportedly generate per-read accuracies rivalling those of Illumina data. To evaluate this, we sequenced DNA extracts from four commonly studied bacterial pathogens, namely Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus, using Illumina and ONT’s R9.4.1/Kit10, R10.3/Kit12, R10.4/Kit12 flowcells/chemistries. We compared raw read accuracy and assembly accuracy for each modality, considering the impact of different nanopore basecalling models, commonly used assemblers, sequencing depth, and the use of duplex versus simplex reads. “Super accuracy” (sup) basecalled R10.4 reads - in particular duplex reads - have high per-read accuracies and could be used to robustly reconstruct bacterial genomes without the use of Illumina data. However, the per-run yield of duplex reads generated in our hands with standard sequencing protocols was low (typically <10%), with substantial implications for cost and throughput if relying on nanopore data only to enable bacterial genome reconstruction. In addition, recovery of small plasmids with the best-performing long-read assembler (Flye) was inconsistent. R10.4/Kit12 combined with sup basecalling holds promise as a singular sequencing technology in the reconstruction of commonly studied bacterial genomes, but hybrid assembly (Illumina+R9.4.1 hac) currently remains the highest throughput, most robust, and cost-effective approach to fully reconstruct these bacterial genomes.

Impact statement

Our understanding of microbes has been greatly enhanced by the capacity to evaluate their genetic make-up using a technology known as whole genome sequencing. Sequencers represent microbial genomes as stretches of shorter sequence known as ‘reads’, which are then assembled using computational algorithms. Different types of sequencing approach have advantages and disadvantages with respect to the accuracy and length of the reads they generate; this in turn affects how reliably genomes can be assembled. Currently, to completely reconstruct bacterial genomes in a high-throughput and cost-effective manner, researchers tend to use two different types of sequencing data, namely Illumina (short-read) and nanopore (long-read) data. Illumina data are highly accurate; nanopore data are much longer, and this combination facilitates accurate and complete bacterial genomes in a so-called “hybrid assembly”. However, new developments in nanopore sequencing have reportedly greatly improved the accuracy of nanopore data, hinting at the possibility of requiring only a single sequencing approach for bacterial genomics. Here we evaluate these improvements in nanopore sequencing in the reconstruction of four bacterial reference strains, where the true sequence is already known. We show that although these improvements are extremely promising, for high-throughput, low-cost complete reconstruction of bacterial genomes hybrid assembly currently remains the optimal approach.

Data summary

The authors confirm all supporting data, code and protocols have been provided within the article, through supplementary data files, or in publicly accessible repositories. Nanopore fast5 and fastq data are available in the ENA under project accession: PRJEB51164. Assemblies have been made available at: https://figshare.com/articles/online_resource/q20_comparison_genome_assemblies/19683867. Code and analysis outputs are available at: https://gitlab.com/ModernisingMedicalMicrobiology/assembly_comparison_analysis/-/tree/main (tagged version v0.5.5).

Nanopore Sequencing

MinION

Nanopore


Niche and local geography shape the pangenome of wastewater- and livestock-associated Enterobacteriaceae

Liam Shaw et al., Oct 24, 2023

Escherichia coli and other Enterobacteriaceae are highly diverse species with ‘open’ pangenomes [1,2], where genes move intra- and inter-species via horizontal gene transfer [3]. These species can cause clinical infections [4,5] as well as persist environmentally [6,7]. Environmental populations have been suggested as important reservoirs of antimicrobial resistance (AMR) genes. However, as most analyses focus on clinical isolates [8,9], the pangenome dynamics of natural populations remain understudied, particularly the role of plasmids. Here, we reconstructed near-complete genomes for 828 Enterobacteriaceae, including 553 Escherichia spp. and 275 non-Escherichia species with 2,293 circularised plasmids in total, collected from nineteen locations (livestock farms and wastewater treatment works in the United Kingdom) within a 30 km radius at three timepoints over the course of a year. We find different dynamics for the chromosomal and plasmid-borne components of the pangenome, showing that plasmids have a higher burden of both AMR genes and insertion sequences, and AMR plasmids show evidence of being under stronger selective pressure. Focusing on E. coli, we observe that plasmid dynamics are more strongly dominated by niche and local geography, rather than phylogeny or season. Our results highlight the diversity of the AMR reservoir in these species and niches, and the importance of local strategies for controlling the emergence and spread of AMR.

Plasmid

Escherichia Coli

Ecological Niche


Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny

Martin Hunt et al., May 28, 2024

The SARS-CoV-2 genome occupies a unique place in infection biology: it is the most highly sequenced genome on earth (making up over 20% of public sequencing datasets), with fine-scale information on sampling date and geography, and has been subject to unprecedentedly intense analysis. As a result, these phylogenetic data are an incredibly valuable resource for science and public health. However, the vast majority of the data was sequenced by tiling amplicons across the full genome, with amplicon schemes that changed over the pandemic as mutations in the viral genome interacted with primer binding sites. In combination with the disparate set of genome assembly workflows and the lack of consistent quality control (QC) processes, the current genomes have many systematic errors that have evolved with the virus and amplicon schemes. These errors have significant impacts on the phylogeny, and therefore over the last few years many thousands of hours of researchers' time have been spent "eyeballing" trees, looking for artefacts, and then patching the tree. Given the huge value of this dataset, we therefore set out to reprocess the complete set of public raw sequence data in a rigorous amplicon-aware manner, and build a cleaner phylogeny. Here we provide a global tree of 3,960,704 samples, built from a consistently assembled set of high quality consensus sequences from all available public data as of March 2023, viewable at https://viridian.taxonium.org. Each genome was constructed using a novel assembly tool called Viridian (https://github.com/iqbal-lab-org/viridian), developed specifically to process amplicon sequence data, eliminating artefactual errors and masking the genome at low quality positions. We provide simulation and empirical validation of the methodology, and quantify the improvement in the phylogeny. Phase 2 of our project will address the fact that the data in the public archives is heavily geographically biased towards the Global North. We therefore have contributed new raw data to ENA/SRA from many countries including Ghana, Thailand, Laos, Sri Lanka, India, Argentina and Singapore. We will incorporate these, along with all public raw data submitted between March 2023 and the current day, into an updated set of assemblies and an updated phylogeny. We hope the tree, consensus sequences and Viridian will be a valuable resource for researchers.

Genome

Amplicon

Phylogenetic Tree


Multi-omic surveillance of Escherichia coli and Klebsiella spp. in hospital sink drains and patients

Bede Constantinides et al., May 7, 2020

Escherichia coli and Klebsiella spp. are important human pathogens that cause a wide spectrum of clinical disease. In healthcare settings, sinks and other wastewater sites have been shown to be reservoirs of antimicrobial-resistant E. coli and Klebsiella spp., particularly in the context of outbreaks of resistant strains amongst patients. Without focusing exclusively on resistance markers or a clinical outbreak, we demonstrate that many hospital sink drains are abundantly and persistently colonised with diverse populations of E. coli, Klebsiella pneumoniae and Klebsiella oxytoca, including both antimicrobial-resistant and susceptible strains. Using whole genome sequencing (WGS) of 439 isolates, we show that environmental bacterial populations are largely structured by ward and sink, with only a handful of lineages, such as E. coli ST635, being widely distributed, suggesting different prevailing ecologies which may vary as a result of different inputs and selection pressures. WGS of 46 contemporaneous patient isolates identified one (2%; 95% CI 0.05-11%) E. coli urine infection-associated isolate with high similarity to a prior sink isolate, suggesting that sinks may contribute to up to 10% of infections caused by these organisms in patients on the ward over the same timeframe. Using metagenomics from 20 sink-timepoints, we show that sinks also harbour many clinically relevant antimicrobial resistance genes including blaCTX-M, blaSHV and mcr, and may act as niches for the exchange and amplification of these genes. Our study reinforces the potential role of sinks in contributing to Enterobacterales infection and antimicrobial resistance in hospital patients, something that could be amenable to intervention.
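The interval quoted for the single matching patient isolate (1 of 46) can be approximately reproduced with an exact binomial calculation. The abstract does not say which interval method was used, so the sketch below assumes a two-sided Clopper-Pearson interval; the function names are illustrative only and are not taken from the study's code.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve(f, lo=0.0, hi=1.0, iters=200):
    """Bisection for a root of f on [lo, hi], assuming f changes sign once."""
    sign_lo = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    low, high = clopper_pearson(1, 46)
    print(f"1/46 = {1 / 46:.1%}, 95% CI {low:.2%} to {high:.1%}")
```

Under this assumption the bounds land close to the reported 0.05-11%, which is how the abstract arrives at "up to 10%" as a plausible upper limit on the sink-associated share of infections.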

Klebsiella Oxytoca

Klebsiella Pneumoniae

Escherichia Coli


DNA Thermo-Protection Facilitates Whole Genome Sequencing of Mycobacteria Direct from Clinical Samples by the Nanopore Platform

Sophie George et al., May 7, 2020

Mycobacterium tuberculosis (MTB) is the leading cause of death from bacterial infection. Improved rapid diagnosis and antimicrobial resistance determination, such as by whole genome sequencing, are required. Our aim was to develop a simple, low-cost method of preparing DNA for Oxford Nanopore Technologies (ONT) sequencing direct from MTB-positive clinical samples (without culture). Simultaneous sputum liquefaction, bacterial heat-inactivation (99 °C/30 min) and enrichment for Mycobacteria DNA were achieved using an equal volume of thermo-protection buffer (4 M KCl, 0.05 M HEPES buffer pH 7.5, 0.1% DTT). The buffer emulated intracellular conditions found in hyperthermophiles, thus protecting DNA from rapid thermo-degradation, which renders it a poor template for sequencing. Initial validation employed Mycobacteria DNA (extracted or intracellular). Next, mock clinical samples (infection-negative human sputum spiked with 0-10^5 BCG cells/ml) underwent liquefaction in thermo-protection buffer and heat-inactivation. DNA was extracted and sequenced. Human DNA degraded faster than Mycobacteria DNA, resulting in target enrichment. Four replicate experiments each demonstrated detection at 10^1 BCG cells/ml, with 31-59 MTB complex reads. Maximal genome coverage (>97% at 5x depth) was achieved at 10^4 BCG cells/ml; >91% coverage (1x depth) at 10^3 BCG cells/ml. Final validation, using MTB-positive clinical samples (n=20), revealed that initial sample volumes ≥1 ml typically yielded a higher mean depth of MTB genome coverage (overall range 0.55-81.02). A mean depth of 3 gave >96% one-fold TB genome coverage (in 15/20 clinical samples). A mean depth of 15 achieved >99% five-fold genome coverage (in 9/20 clinical samples). In summary, direct-from-sample sequencing of MTB genomes was facilitated by a low-cost thermo-protection buffer.
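As a rough sanity check on the relationship between mean depth and breadth of coverage reported above, one can use the idealised Lander-Waterman assumption that per-position depth is Poisson-distributed around the mean. This model is an assumption introduced here for illustration, not the method used in the study, and real direct-from-sample data are usually less uniform.

```python
from math import exp, factorial


def expected_breadth(mean_depth, min_depth=1):
    """Expected fraction of positions covered at >= min_depth reads,
    assuming per-position depth ~ Poisson(mean_depth)."""
    below = sum(
        exp(-mean_depth) * mean_depth**k / factorial(k) for k in range(min_depth)
    )
    return 1.0 - below


if __name__ == "__main__":
    # The two depth thresholds mentioned in the abstract.
    print(f"mean depth 3,  >=1x: {expected_breadth(3, 1):.1%}")
    print(f"mean depth 15, >=5x: {expected_breadth(15, 5):.1%}")
```

The idealised values (about 95% at one-fold for a mean depth of 3, and above 99% at five-fold for a mean depth of 15) are in the same range as the observed figures, so the reported thresholds are plausible for reasonably even coverage.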

Hyperthermophile

DNA

Sputum


A workflow for the detection of antibiotic residues, measurement of water chemistry and preservation of hospital sink drain samples for metagenomic sequencing

Gillian Rodger et al., Sep 23, 2023

Structured summary

Background: Hospital sinks are environmental reservoirs that harbour healthcare-associated (HCA) pathogens. Selective pressures in sink environments, such as antibiotic residues, nutrient waste and hardness ions, may promote antibiotic resistance gene (ARG) exchange between bacteria. However, cheap and accurate sampling methods to characterise these factors are lacking.

Aim: To validate a workflow to detect antibiotic residues and evaluate water chemistry using dipsticks. Secondarily, to validate boric acid to preserve the taxonomic and ARG ("resistome") composition of sink trap samples for metagenomic sequencing.

Methods: Antibiotic residue dipsticks were validated against serial dilutions of ampicillin, doxycycline, sulfamethoxazole and ciprofloxacin, and water chemistry dipsticks against serial dilutions of chemical calibration standards. Sink trap aspirates were used for a "real-world" pilot evaluation of dipsticks. To assess boric acid as a preservative of microbial diversity, the impact of incubation with and without boric acid at ~22 °C on metagenomic sequencing outputs was evaluated at Day 2 and Day 5 compared with baseline (Day 0).

Findings: The limits of detection for each antibiotic were: 3 μg/L (ampicillin), 10 μg/L (doxycycline), 20 μg/L (sulfamethoxazole) and 8 μg/L (ciprofloxacin). The best performing water chemistry dipstick correctly characterised 34/40 (85%) standards in a concentration-dependent manner. One trap sample tested positive for the presence of tetracyclines and sulfonamides. Taxonomic and resistome composition were largely maintained after storage with boric acid at ~22 °C for up to five days.

Conclusions: Dipsticks can be used to detect antibiotic residues and characterise water chemistry in sink trap samples. Boric acid was an effective preservative of trap sample composition, representing a low-cost alternative to cold-chain transport.

Metagenomics

Boric Acid

Ciprofloxacin


Genomic Sequencing from Sputum for Tuberculosis Disease Diagnosis, Lineage Determination and Drug Susceptibility Prediction

Kayzad Nilgiriwala et al., Oct 24, 2023


Abstract

Background: Universal access to drug susceptibility testing for newly diagnosed tuberculosis patients is recommended. Access to culture-based diagnostics remains limited and targeted molecular assays are vulnerable to emerging resistance-conferring mutations. Improved sample preparation protocols for direct-from-sputum sequencing of Mycobacterium tuberculosis would accelerate access to comprehensive drug susceptibility testing and molecular typing.

Methods: We assessed a thermo-protection buffer-based direct-from-sample M. tuberculosis whole-genome sequencing protocol. We prospectively processed and analyzed 60 acid-fast bacilli smear-positive sputum samples from tuberculosis patients in India and Madagascar. A diversity of semi-quantitative smear positivity level samples were included. Sequencing was performed using Illumina and MinION (monoplex and multiplex) technologies. We measured the impact of bacterial inoculum and sequencing platforms on M. tuberculosis genomic mean read depth, drug susceptibility prediction performance and typing accuracy.

Results: M. tuberculosis was identified from 88% (Illumina), 89% (MinION-monoplex) and 83% (MinION-multiplex) of samples for which sufficient DNA could be extracted. The fraction of M. tuberculosis reads from MinION sequencing was lower than from Illumina, but monoplexing grade 3+ sputum samples on MinION produced higher read depth than Illumina (p<0.05) and MinION multiplex (p<0.01). No significant difference in overall sensitivity and specificity of drug susceptibility predictions was seen across these sequencing modalities or within each sequencing technology when stratified by smear grade. Lineage typing agreement percentages between direct and culture-based sequencing were 85% (MinION-monoplex), 88% (Illumina) and 100% (MinION-multiplex).

Conclusions: M. tuberculosis direct-from-sample whole-genome sequencing remains challenging. Improved and affordable sample treatment protocols are needed prior to clinical deployment.

MinION

Tuberculosis

Multiplex


Comparison of direct cDNA and PCR-cDNA Nanopore sequencing of Escherichia coli isolates

Gillian Rodger et al., Jan 24, 2024

Abstract

Whole-transcriptome (long-read) RNA sequencing (Oxford Nanopore Technologies, ONT) holds promise for agnostic analysis of differential gene expression (DGE) in pathogenic bacteria, including for antimicrobial resistance genes (ARGs). However, direct cDNA ONT sequencing requires large concentrations of polyadenylated mRNA, and amplification protocols may introduce technical bias. Here we evaluated the impact of direct cDNA and cDNA PCR-based ONT sequencing on transcriptomic analysis of clinical Escherichia coli. Four E. coli bloodstream infection-associated isolates (n=2 biological replicates/isolate) were sequenced using the ONT Direct cDNA Sequencing SQK-DCS109 and PCR-cDNA Barcoding SQK-PCB111.24 kits. Biological and technical replicates were distributed over 8 flow cells using 16 barcodes to minimise batch/barcoding bias. Reads were mapped to a transcript reference and transcript abundance quantified after in silico depletion of low abundance and rRNA genes. We found strong correlations between read counts using both kits, including when restricting the analysis to ARGs only. Correlations were weaker for genes with a higher GC content. Read lengths were longer for the direct cDNA kit compared to the PCR-cDNA kit, whereas total yield was higher for the PCR-cDNA kit. In this small but methodologically rigorous evaluation of biological and technical replicates of isolates sequenced with the direct cDNA and PCR-cDNA ONT sequencing kits, we demonstrated that PCR-based amplification substantially improves yield with largely unbiased assessment of core gene and ARG expression. However, users of PCR-based kits should be aware of a small risk of technical bias which appears greater for genes with an unusually high (>52%) or low (<44%) GC content.

Impact statement

RNA sequencing allows quantification of RNA within a biological sample, providing information on the expression of genes at a particular time. This helps in understanding the expression of antimicrobial resistance genes (ARGs). In RNA-Seq experimental workflows, extra steps of reverse transcription may be needed to generate more stable cDNA to allow for amplification by PCR if the starting RNA input was low. Two current methods of long-read RNA sequencing are direct cDNA and PCR-cDNA based sequencing (Oxford Nanopore Technologies, ONT). However, few studies have compared these two methods of RNA sequencing using clinical bacterial isolates. We therefore undertook a study to compare both kits using a methodologically balanced design of biological and technical replicates of E. coli. Our study showed that direct cDNA and PCR-cDNA sequencing are highly reproducible between biological and technical E. coli replicates, with very small differences in gene expression signatures generated between kits. The PCR-cDNA kit generates increased sequencing yield, but with a smaller proportion of mappable reads, shorter reads of lower quality and some PCR-associated bias. PCR-based amplification greatly increased sequencing yield of core genes and ARGs; however, there may be a small risk of PCR bias in genes that have a higher GC content.

Data summary

The transcript reads of the four sequenced Escherichia coli strains have been deposited in Figshare, DOI: 10.6084/m9.figshare.25044051. The authors confirm all supporting data (available in Figshare), code (available at: https://github.com/samlipworth/rna_methods) and protocols have been provided within the article or through supplementary data files.
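The GC-content caveat above can be turned into a simple screening step. The sketch below computes GC content per gene and flags genes outside the 44-52% window quoted in the abstract; the function names and example sequences are hypothetical and are not taken from the study's repository.

```python
# Thresholds quoted in the abstract (fractions rather than percentages).
LOW_GC = 0.44
HIGH_GC = 0.52


def gc_content(seq):
    """Fraction of G/C bases among the unambiguous A/C/G/T bases in seq."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / total


def gc_flag(seq):
    """Label a gene by whether its GC content falls outside the 44-52% window."""
    value = gc_content(seq)
    if value > HIGH_GC:
        return "high"
    if value < LOW_GC:
        return "low"
    return "within range"


if __name__ == "__main__":
    # Toy sequences for illustration only.
    examples = {"geneA": "ATGCATGC", "geneB": "GGGCCCAT", "geneC": "ATATATGC"}
    for name, seq in examples.items():
        print(name, f"{gc_content(seq):.0%}", gc_flag(seq))
```

Genes flagged as "high" or "low" are the ones where, per the abstract, expression estimates from the PCR-cDNA kit may deserve a closer look.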

Complementary DNA

Biology

Rapid Amplification Of cDNA Ends
