ResearchHub | Open Science Community

Duplications drive diversity in Bordetella pertussis on an underestimated scale.

Jonathan Abrahams et al., Feb 7, 2020

Bacterial genetic diversity is often described solely in terms of base pair changes, even though a wide variety of other mutation types are likely to be major contributors. Tandem duplications of genomic loci are thought to be widespread among bacteria, but because they are often intractably large and unstable, comprehensive studies of the range and genome dynamics of these mutations are rare. We define a methodology to investigate duplications in bacterial genomes that uses read depth of genome sequence data as a proxy for copy number. We demonstrate the approach with Bordetella pertussis, whose insertion sequence element-rich genome provides extensive scope for duplications to occur. Analysis of genome sequence data for 2430 B. pertussis isolates identified 272 putative duplications, of which 94% were located at 11 hotspot loci. We demonstrate limited phylogenetic connection for the occurrence of duplications, suggesting unstable and sporadic characteristics. Genome instability was further described in vitro using long-read sequencing on the Nanopore platform. Clonally derived laboratory cultures produced heterogeneous populations containing multiple structural variants. Short-read data were used to predict the 272 duplications, whilst long reads generated on the Nanopore platform enabled in-depth study of the genome dynamics of tandem duplications in B. pertussis. Our work reveals the unrecognised and dynamic genetic diversity of B. pertussis and, as the complexity of the B. pertussis genome is not unique, highlights the need for a holistic and fundamental understanding of bacterial genetics.
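The idea of using read depth as a proxy for copy number can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 1,000 bp window, the 1.5x depth-ratio threshold and the two-window minimum are all assumptions chosen for illustration, and the input is assumed to be a plain list of per-base depths (for example, exported from a read-mapping tool).

```python
from statistics import median


def windowed_depth(depths, window):
    """Mean depth per non-overlapping window; a trailing partial window is dropped."""
    return [
        sum(depths[i:i + window]) / window
        for i in range(0, len(depths) - window + 1, window)
    ]


def call_duplications(depths, window=1000, min_ratio=1.5, min_windows=2):
    """Return (start, end, estimated_copies) for runs of elevated depth.

    Coordinates are 0-based and half-open. The genome-wide median window
    depth is taken as the single-copy baseline (an assumption that holds
    only if most of the genome is present in one copy).
    """
    means = windowed_depth(depths, window)
    if not means:
        return []
    baseline = median(means)
    if baseline == 0:
        return []

    calls = []
    run_start = None
    # A trailing 0.0 sentinel closes any run that reaches the last window.
    for idx, mean_depth in enumerate(means + [0.0]):
        elevated = mean_depth / baseline >= min_ratio
        if elevated and run_start is None:
            run_start = idx
        elif not elevated and run_start is not None:
            if idx - run_start >= min_windows:
                run = means[run_start:idx]
                copies = round(sum(run) / len(run) / baseline)
                calls.append((run_start * window, idx * window, copies))
            run_start = None
    return calls


# Toy data: 50x coverage with a 4 kbp region at 100x (two copies).
toy = [50] * 10000 + [100] * 4000 + [50] * 10000
print(call_duplications(toy))  # → [(10000, 14000, 2)]
```

Real data would also need correction for GC bias and for the depth gradient from the origin of replication, both of which this sketch ignores.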

Resolving the complex Bordetella pertussis genome using barcoded nanopore sequencing

Natalie Ring et al., Jul 31, 2018

The genome of Bordetella pertussis is complex, with high GC content and many repeats, each longer than 1,000 bp. Short-read DNA sequencing is unable to resolve the structure of the genome; however, long-read sequencing offers the opportunity to produce single-contig B. pertussis assemblies using sequencing reads that are longer than the repetitive sections. We used an R9.4 MinION flow cell and barcoding to sequence five B. pertussis strains in a single sequencing run. We then trialled combinations of the many long-read analysis tools built by the nanopore user community to establish the current optimal assembly pipeline for B. pertussis genome sequences. Our best long-read-only assemblies were produced by Canu read correction followed by assembly with Flye and polishing with Nanopolish, whilst the best hybrids (using nanopore and Illumina reads together) were produced by Canu correction followed by Unicycler. This pipeline produced closed genome sequences for four strains, revealing inter-strain genomic rearrangement. However, read mapping to the Tohama I reference genome suggests that the remaining strain contains an ultra-long duplicated region (over 100 kbp), which was not resolved by our pipeline. We have therefore demonstrated the ability to resolve the structure of several B. pertussis strains on a single barcoded nanopore flow cell, but the genomes with the highest complexity (e.g. very large duplicated regions) remain only partially resolved using the standard library preparation and will require an alternative library preparation method. For full strain characterisation, we recommend hybrid assembly of long and short reads together; for comparison of genome arrangement, assembly using long reads alone is sufficient.
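Whether a read set can resolve a repeat depends on how many reads are longer than that repeat. The sketch below, which is not part of the authors' pipeline, computes the read-length N50 and the fraction of reads long enough to span a repeat of a given size. The function names and the 500 bp flank required on each side of the repeat are assumptions for illustration.

```python
def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # reached only for an empty input


def spanning_fraction(lengths, repeat_len, flank=500):
    """Fraction of reads able to cover a repeat plus `flank` bp on each side.

    `flank` is a hypothetical anchoring requirement: a read must extend into
    unique sequence on both sides to place the repeat unambiguously.
    """
    if not lengths:
        return 0.0
    needed = repeat_len + 2 * flank
    return sum(1 for length in lengths if length >= needed) / len(lengths)


# Toy read lengths in bp (invented for illustration, not from the study).
toy_lengths = [500, 1500, 3000, 8000, 120000]
print(read_n50(toy_lengths))                    # → 120000
print(spanning_fraction(toy_lengths, 1000))     # → 0.6
print(spanning_fraction(toy_lengths, 100000))   # → 0.2
```

With these toy lengths, most reads span a 1,000 bp repeat but only one could span a 100 kbp duplication, which matches the abstract's point that very large duplicated regions are the hard case for a standard library preparation.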