ResearchHub | Open Science Community

Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

Mette Larsen et al., Jan 12, 2012

Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenge is how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
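The last two steps of this abstract (keeping the best-matching allele per locus, then mapping the allele combination to a sequence type) can be sketched roughly as follows. This is a minimal illustration and not the service's implementation: the `Hit` record, the ranking by coverage and then identity, the locus names, and the profile table are all hypothetical, since the abstract only states that the ranking is BLAST-based.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Hit(NamedTuple):
    """One candidate allele match (hypothetical record, e.g. parsed from BLAST output)."""
    locus: str
    allele: int
    identity: float  # percent identity of the alignment
    coverage: float  # fraction of the allele length that aligned


def best_alleles(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep the top-ranked hit per locus (assumed rank: coverage first, then identity)."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.locus)
        if current is None or (hit.coverage, hit.identity) > (current.coverage, current.identity):
            best[hit.locus] = hit
    return best


def sequence_type(
    best: Dict[str, Hit],
    loci: Sequence[str],
    profiles: Dict[Tuple[int, ...], int],
) -> Optional[int]:
    """Look up the sequence type for the allele combination; None if unknown or incomplete."""
    if any(locus not in best for locus in loci):
        return None
    key = tuple(best[locus].allele for locus in loci)
    return profiles.get(key)


# Made-up three-locus scheme and profile table, for illustration only.
LOCI = ("locA", "locB", "locC")
PROFILES = {(1, 2, 3): 7, (1, 2, 4): 8}

hits = [
    Hit("locA", 1, 100.0, 1.0),
    Hit("locA", 5, 98.0, 1.0),
    Hit("locB", 2, 100.0, 1.0),
    Hit("locC", 3, 99.5, 0.9),
    Hit("locC", 4, 100.0, 1.0),
]
st = sequence_type(best_alleles(hits), LOCI, PROFILES)
print(st)
```

A real pipeline would also need to decide how to report imperfect matches (for example, a best hit with less than full coverage); that policy is left out of this sketch.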

Genetics

Microbiology

The minimum information about a genome sequence (MIGS) specification

Dawn Field et al., May 1, 2008

With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.

Genetics

Ecology

Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88

Herman Pel et al., Jan 28, 2007

The filamentous fungus Aspergillus niger is widely exploited by the fermentation industry for the production of enzymes and organic acids, particularly citric acid. We sequenced the 33.9-megabase genome of A. niger CBS 513.88, the ancestor of currently used enzyme production strains. A high level of synteny was observed with other aspergilli sequenced. Strong function predictions were made for 6,506 of the 14,165 open reading frames identified. A detailed description of the components of the protein secretion pathway was made and striking differences in the hydrolytic enzyme spectra of aspergilli were observed. A reconstructed metabolic network comprising 1,069 unique reactions illustrates the versatile metabolism of A. niger. Noteworthy is the large number of major facilitator superfamily transporters and fungal zinc binuclear cluster transcription factors, and the presence of putative gene clusters for fumonisin and ochratoxin A synthesis.

Genetics

Biochemistry

Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes

Henrik Nielsen et al., Jul 6, 2014

The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium

Carsten Kröger et al., Apr 25, 2012

More than 50 years of research have provided great insight into the physiology, metabolism, and molecular biology of Salmonella enterica serovar Typhimurium (S. Typhimurium), but important gaps in our knowledge remain. It is clear that a precise choreography of gene expression is required for Salmonella infection, but basic genetic information such as the global locations of transcription start sites (TSSs) has been lacking. We combined three RNA-sequencing techniques and two sequencing platforms to generate a robust picture of transcription in S. Typhimurium. Differential RNA sequencing identified 1,873 TSSs on the chromosome of S. Typhimurium SL1344, and 13% of these TSSs initiated antisense transcripts. Unique findings include the TSSs of the virulence regulators phoP, slyA, and invF. Chromatin immunoprecipitation revealed that RNA polymerase was bound to 70% of the TSSs, and two-thirds of these TSSs were associated with σ70 (including phoP, slyA, and invF), from which we identified the -10 and -35 motifs of σ70-dependent S. Typhimurium gene promoters. Overall, we corrected the location of important genes and discovered 18 times more promoters than identified previously. S. Typhimurium expresses 140 small regulatory RNAs (sRNAs) at early stationary phase, including 60 newly identified sRNAs. Almost half of the experimentally verified sRNAs were found to be unique to the Salmonella genus, and fewer than 20% were found throughout the Enterobacteriaceae. This description of the transcriptional map of SL1344 advances our understanding of S. Typhimurium, arguably the most important bacterial infection model.

Genetics

Ecology

What can we learn from over 100,000 Escherichia coli genomes?

Kaleb Abram et al., Jul 19, 2019

The explosion of microbial genome sequences in public databases allows for large-scale population genomic studies of bacterial species, such as Escherichia coli. In this study, we examine and classify more than one hundred thousand E. coli and Shigella genomes. After removing outliers, a semi-automated Mash-based analysis of 10,667 assembled genomes reveals 14 distinct phylogroups. A representative genome, or medoid, identified for each phylogroup serves as a proxy to classify more than 95,000 unassembled genomes. This analysis shows that most sequenced E. coli genomes belong to 4 phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups described is supported by pangenomic and phylogenetic analyses, which show differences in gene preservation between phylogroups. A phylogenetic tree constructed with 2,613 single-copy core genes, along with a matrix of phylogenetic profiles, is used to confirm that the 14 phylogroups differ in their rates of gene gain, loss, and duplication. The methodology used in this work is able to identify previously uncharacterized phylogroups in E. coli species. Some of these new phylogroups harbor clonal strains that have undergone a process of genomic adaptation to the acquisition of new genomic elements related to virulence or antibiotic resistance. This is, to our knowledge, the largest E. coli genome dataset analyzed to date and provides valuable insights into the population structure of the species.
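The medoid-as-proxy idea in this abstract can be illustrated with a small sketch. It assumes a precomputed pairwise distance table (for example, Mash distances) held in a nested dictionary; the genome names, group labels, and distance values below are made up, and the nearest-medoid rule is only one plausible way to use the representatives.

```python
from typing import Dict, List

Distances = Dict[str, Dict[str, float]]


def medoid(members: List[str], dist: Distances) -> str:
    """Return the member with the smallest total distance to all members of its group."""
    return min(members, key=lambda a: sum(dist[a][b] for b in members))


def classify(query_to_medoid: Dict[str, float]) -> str:
    """Assign a query genome to the group whose medoid is closest to it."""
    return min(query_to_medoid, key=query_to_medoid.get)


# Hypothetical symmetric distances within one group of three genomes.
dist: Distances = {
    "g1": {"g1": 0.0, "g2": 0.01, "g3": 0.02},
    "g2": {"g1": 0.01, "g2": 0.0, "g3": 0.04},
    "g3": {"g1": 0.02, "g2": 0.04, "g3": 0.0},
}
representative = medoid(["g1", "g2", "g3"], dist)

# Hypothetical distances from one new genome to the medoids of two groups.
assigned = classify({"groupX": 0.03, "groupY": 0.12})
print(representative, assigned)
```

Comparing a new genome against one medoid per group, rather than against every genome, keeps the number of distance computations proportional to the number of groups.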

Genetics

Molecular Biology

dBBQs : dataBase of Bacterial Quality scores

Visanu Wanchai et al., Sep 12, 2017

Background: It is well known that genome sequencing technologies are becoming significantly cheaper and faster. As a result, the exponential growth of sequencing data in public databases allows us to explore ever-growing collections of genome sequences. However, it is less well known that the majority of sequenced genomes available in public databases are not complete but are drafts of varying quality. We have calculated quality scores for around 100,000 bacterial genomes from all major genome repositories and put them in a fast and easy-to-use database. Results: Prokaryotic genomic data from all sources were collected and combined to make a non-redundant set of bacterial genomes. The genome quality score for each was calculated from four measurements: assembly quality, the number of rRNA genes, the number of tRNA genes, and the occurrence of conserved functional domains. The dataBase of Bacterial Quality scores (dBBQs) was designed to store and retrieve quality scores. It offers fast searching and download features, and the results can be used for further analysis. In addition, the search results are shown in an interactive JavaScript chart framework using DC.js. The analysis of quality scores across major public genome databases finds that around 68% of the genomes are of acceptable quality for many uses. Conclusions: dBBQs (available at http://arc-gem.uams.edu/dbbqs) provides genome quality scores for all available prokaryotic genome sequences with a user-friendly Web interface. These scores can be used as cut-offs to get a high-quality set of genomes for testing bioinformatics tools or improving the analysis. Moreover, all data from the four measurements that were combined to make the quality score for each genome are provided and can potentially be used for further analysis. dBBQs will be updated regularly and is free to use for non-commercial purposes.

Genetics

Philosophy

Decoding the Epitranscriptional Landscape from Native RNA Sequences

Thidathip Wongsurawat et al., Dec 17, 2018

Sequencing of native RNA and corresponding cDNA was performed using Oxford Nanopore Technology. The % Error of Specific Bases (%ESB) was higher for native RNA than for cDNA, which enabled detection of ribonucleotide modification sites. Based on %ESB differences between the two templates, a bioinformatic tool, ELIGOS, was developed and applied to rRNAs of E. coli, yeast and human cells. ELIGOS captured 91%, 95%, and ~75%, respectively, of the known variety of RNA methylation sites in these rRNAs. Yeast transcriptomes from different growth conditions were also compared, which identified an association between metabolic adaptation and inferred RNA modifications. ELIGOS was further applied to human transcriptome datasets, which identified the well-known DRACH motif containing N6-methyladenine located close to the 3′ untranslated regions of mRNA. Moreover, the RNA G-quadruplex motif was uncovered by ELIGOS. In summary, we have developed an experimental method coupled with bioinformatic software to uncover native RNA modifications and secondary structures within transcripts.
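The comparison of per-site error between native RNA and cDNA can be illustrated with a toy calculation. This sketch is not ELIGOS: it assumes, for illustration only, that the error at a site is simply the percentage of reads that disagree with the reference base, and the `min_diff` threshold, function names, and read counts are all hypothetical.

```python
from typing import Dict, List, Tuple

# position -> (reads matching the reference base, total reads covering the position)
Pileup = Dict[int, Tuple[int, int]]


def percent_error(matching: int, total: int) -> float:
    """Percentage of reads at a site that do not match the reference base."""
    if total <= 0:
        raise ValueError("site has no coverage")
    return 100.0 * (total - matching) / total


def candidate_sites(native: Pileup, cdna: Pileup, min_diff: float = 20.0) -> List[int]:
    """Positions where native RNA error exceeds cDNA error by at least min_diff points."""
    sites = []
    for pos in sorted(native.keys() & cdna.keys()):
        diff = percent_error(*native[pos]) - percent_error(*cdna[pos])
        if diff >= min_diff:
            sites.append(pos)
    return sites


# Made-up counts: position 10 is error-prone only in native RNA, position 11 is not.
native_rna: Pileup = {10: (50, 100), 11: (95, 100)}
cdna_copy: Pileup = {10: (97, 100), 11: (96, 100)}
flagged = candidate_sites(native_rna, cdna_copy)
print(flagged)
```

The cDNA sample acts as a control here: errors that appear in both templates are more likely to reflect sequence context than a modified base.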

Genetics

Molecular Biology

Genomic Surveillance of SARS-CoV-2 Using Long-Range PCR Primers

Sangam Kandel et al., Jul 11, 2023

Whole Genome Sequencing (WGS) of the SARS-CoV-2 virus is crucial in the surveillance of the COVID-19 pandemic. Several primer schemes have been developed to sequence the ~30,000 nucleotide SARS-CoV-2 genome that use a multiplex PCR approach to amplify cDNA copies of the viral genomic RNA. ARTIC V4.1 primers and Midnight primers are the most popular primer schemes that can amplify segments of SARS-CoV-2 (400 bp and 1200 bp, respectively) tiled across the viral RNA genome. Mutations within primer binding sites and primer-primer interactions can result in amplicon dropouts and coverage bias, yielding low-quality genomes with 'Ns' inserted in the missing amplicon regions, causing inaccurate lineage assignments, and making it challenging to monitor lineage-specific mutations in Variants of Concern (VoCs). This study uses seven long-range PCR primers with an amplicon size of ~4500 bp to tile across the complete SARS-CoV-2 genome. One of these regions includes the full-length S-gene by using a set of flanking primers. Using a small set of long-range primers to sequence SARS-CoV-2 genomes reduces the possibility of amplicon dropout and coverage bias.
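The tiling arithmetic behind these figures can be checked with a short sketch. It uses only the rounded numbers from the abstract (a ~30,000 nt genome, seven amplicons of ~4500 bp) and assumes evenly spaced amplicons; the actual primer coordinates are not given here, so the positions below are illustrative and the `tile` helper is hypothetical.

```python
from typing import List, Tuple


def tile(genome_len: int, amplicon_len: int, n: int) -> List[Tuple[int, int]]:
    """Evenly space n amplicons (start, end) so that they cover [0, genome_len)."""
    if n < 2:
        raise ValueError("need at least two amplicons")
    if n * amplicon_len < genome_len:
        raise ValueError("amplicons are too short to cover the genome")
    step = (genome_len - amplicon_len) / (n - 1)
    return [(round(i * step), round(i * step) + amplicon_len) for i in range(n)]


tiles = tile(genome_len=30_000, amplicon_len=4_500, n=7)

# Overlap between each pair of neighbouring amplicons.
overlaps = [prev_end - next_start for (_, prev_end), (next_start, _) in zip(tiles, tiles[1:])]
print(tiles)
print(overlaps)
```

With these rounded inputs, seven amplicons span 31,500 bp in total, which leaves 1,500 bp of slack shared across six junctions. Fewer, longer amplicons also mean fewer primer binding sites where a mutation could cause a dropout.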

Genetics

Organic Chemistry

Rapid Sequencing of Multiple RNA Viruses in their Native Form

Thidathip Wongsurawat et al., Nov 29, 2018

Long-read nanopore sequencing by a MinION device offers the unique possibility to directly sequence native RNA. We combined an enzymatic poly(A)-tailing reaction with native RNA sequencing to (i) sequence a complex population of single-stranded (ss)RNA viruses in parallel, (ii) detect genomes, subgenomic mRNAs, and mRNAs simultaneously, (iii) detect a complex transcriptomic architecture without the need for assembly, and (iv) enable real-time detection. Using this protocol, viruses with positive-sense or negative-sense ssRNA genomes, with or without a poly(A) tail, and with segmented or non-segmented genomes were mixed and sequenced in parallel. Mapping of the generated sequences to the reference genomes showed 100% length recovery with up to 97% identity. This work provides a proof of principle for this strategy and supports its validity, opening up a wide range of applications to study RNA viruses.

Genetics

Ecology
