ResearchHub | Open Science Community

Genomic analysis of globally diverse Mycobacterium tuberculosis strains provides insights into the emergence and spread of multidrug resistance

Abigail Manson et al., Jan 16, 2017

Genetics

Epidemiology

Alpaca: a kmer-based approach for investigating mosaic structures in microbial genomes

Alex Salazar et al., Feb 15, 2019

Summary: Microbial genomes are often mosaic: different regions can have different evolutionary origins due to genetic recombination. The recent feasibility of assembling microbial genomes completely, together with the availability of sequencing data for complete microbial populations, means that researchers can now investigate the potentially rich evolutionary history of a microbe at a much higher resolution. Here, we present Alpaca: a method to investigate mosaicism in microbial genomes based on kmer similarity of large sequencing datasets. Alpaca partitions a given assembly into various sub-regions and compares their similarity across a population of genomes. The result is a high-resolution map of the entire genome, annotated with the highest-scoring clades across the given population. Availability: https://github.com/AbeelLab/Alpaca Contact: t.abeel@tudelft.nl
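The partition-and-compare idea in this summary can be illustrated with a minimal sketch. This is not Alpaca's implementation: the fixed-size binning, the containment score (fraction of a bin's k-mers found in a reference genome), and all function and variable names are assumptions chosen for illustration, since the abstract does not specify the similarity measure.

```python
def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, reference):
    """Fraction of query k-mers also present in reference (0.0 if query is empty)."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


def best_match_per_bin(assembly, population, bin_size, k):
    """For each fixed-size bin of the assembly, report the most similar genome.

    Returns a list of (bin_start, best_name, score) tuples.
    The binning and scoring scheme are illustrative assumptions.
    """
    pop_kmers = {name: kmer_set(seq, k) for name, seq in population.items()}
    results = []
    for start in range(0, len(assembly), bin_size):
        bin_kmers = kmer_set(assembly[start:start + bin_size], k)
        if not bin_kmers:  # trailing bin shorter than k has no k-mers
            continue
        scores = {name: containment(bin_kmers, ks) for name, ks in pop_kmers.items()}
        best = max(scores, key=scores.get)
        results.append((start, best, scores[best]))
    return results


# Toy data: a "mosaic" assembly built from two hypothetical clade sequences.
population = {"cladeA": "ACGTACGTAC", "cladeB": "TTGGCCAATT"}
assembly = "ACGTACGTAC" + "TTGGCCAATT"
mosaic_map = best_match_per_bin(assembly, population, bin_size=10, k=4)
for start, name, score in mosaic_map:
    print(start, name, score)
```

On this toy input the first bin is assigned to cladeA and the second to cladeB, which is the kind of per-region map the summary describes. Real data would need much larger k, canonical k-mers, and handling of sequencing reads rather than finished sequences.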

Genetics

Ecology

Nanopore sequencing and comparative genome analysis confirm lager-brewing yeasts originated from a single hybridization

Alex Salazar et al., Apr 9, 2019

Background: The lager brewing yeast, S. pastorianus, is a hybrid between S. cerevisiae and S. eubayanus with extensive chromosome aneuploidy. S. pastorianus is subdivided into Group 1 and Group 2 strains, where Group 2 strains have higher copy number and a larger degree of heterozygosity for S. cerevisiae chromosomes. As a result, Group 2 strains were hypothesized to have emerged from a hybridization event distinct from Group 1 strains. Current genome assemblies of S. pastorianus strains are incomplete and highly fragmented, limiting our ability to investigate their evolutionary history. Results: To fill this gap, we generated a chromosome-level genome assembly of the S. pastorianus strain CBS 1483 using MinION sequencing and analysed the newly assembled subtelomeric regions and chromosome heterozygosity. To analyse the evolutionary history of S. pastorianus strains, we developed Alpaca: a method to compute sequence similarity between genomes without assuming linear evolution. Alpaca revealed high similarities between the S. cerevisiae subgenomes of Group 1 and 2 strains, and marked differences from sequenced S. cerevisiae strains. Conclusions: Our findings suggest that Group 1 and Group 2 strains originated from a single hybridization involving a heterozygous S. cerevisiae strain, followed by different evolutionary trajectories. The clear differences between both groups may originate from a severe population bottleneck caused by the isolation of the first pure cultures. Alpaca provides a computationally inexpensive method to analyse evolutionary relationships while considering non-linear evolution such as horizontal gene transfer and sexual reproduction, providing a complementary viewpoint beyond traditional phylogenetic approaches.

Genetics

Molecular Biology

Characterising tandem repeat complexities across long-read sequencing platforms with TREAT

Niccólo Tesi et al., Mar 17, 2024

Tandem repeats (TRs) play important roles in genomic variation and disease risk in humans. Long-read sequencing allows for the characterisation of TRs; however, the underlying bioinformatics remains challenging. We evaluated potential biases when genotyping >864k TRs using diverse Oxford Nanopore Technology (ONT) and PacBio long-read sequencing technologies. We showed that, in rare cases, long-read sequencing suffers from coverage drops in TRs, such as the disease-associated TRs in the ABCA7 and RFC1 genes. Such coverage drops can lead to TR mis-genotyping, hampering accurate assessments of TR alleles and highlighting the need for bioinformatic tools to characterise TRs across different technologies and data types. For this reason, we have developed otter and TREAT: otter is a fast targeted local assembler, cross-compatible across different sequencing platforms. It is integrated in TREAT, an end-to-end workflow for TR characterisation, visualisation and analysis across multiple genomes. Together, these tools enabled accurate characterisation of >864k TRs in long-read sequencing data from ONT and PacBio technologies, with error rates ranging from 0.6% to 1.1% and with limited computational resources. This performance extends across diverse genomes: applied to clinically relevant TRs, TREAT significantly detected diseased individuals with extreme expansions (p=4.3×10^-7 and p=1.4×10^-5, TR expansions in the RFC1 gene). Importantly, in a case-control setting, we significantly replicated previously reported TR associations with Alzheimer's Disease, including those near or within the APOC1 (p=2.63×10^-9), SPI1 (p=6.5×10^-3) and ABCA7 (p=0.04) genes. Our tools overcome common limitations regarding cross-sequencing platform compatibility and allow end-to-end analysis and comparisons of tandem repeats in human genomes, with broad applications in research and clinical fields.

Genetics

Molecular Biology

MotifScope: a multi-sample motif discovery and visualization tool for tandem repeats

Yaran Zhang et al., Mar 11, 2024

Tandem repeats (TRs) constitute a significant portion of the human genome, exhibiting high levels of polymorphism due to variations in size and motif composition. These variations have been associated with various neuropathological disorders, underscoring the clinical importance of TRs. Furthermore, the motif structure of these repeats can offer valuable insights into evolutionary dynamics and population structure. However, analysis of TRs has been hampered by the limitations of short-read sequencing technology, which lacks the ability to fully capture the complexity of these sequences. With long-read data becoming more accessible, there is now also a need for tools to explore and characterize these TRs. In this study, we introduce MotifScope, a novel algorithm for visualization of TRs in their population context based on a de novo k-mer approach for motif discovery. Comparative analysis against three established tools, uTR, TRF, and vamos, reveals that MotifScope can identify a greater number of motifs and more accurately represent the actual repeat sequence. Additionally, MotifScope enables comparison of sequencing reads within an individual and assemblies across different individuals, showing its applicability in diverse genomic contexts. We demonstrate potential applications of MotifScope in diverse fields, including population genetics, clinical settings, and forensic analyses.
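As a rough illustration of what a k-mer approach to motif discovery starts from, the sketch below counts k-mers in a repeat sequence and reports the most frequent one per k. It is not MotifScope's algorithm, which the abstract does not detail; the function names and the toy CAG repeat are assumptions made for this example.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def candidate_motifs(seq, k_min, k_max, top=1):
    """For each k in [k_min, k_max], list the `top` most frequent k-mers.

    Frequent k-mers are only candidates: rotations of the same motif
    (CAG, AGC, GCA) all score highly and would need to be merged later.
    """
    return {k: count_kmers(seq, k).most_common(top) for k in range(k_min, k_max + 1)}


# Toy tandem repeat: four perfect copies of the CAG motif.
repeat = "CAGCAGCAGCAG"
print(candidate_motifs(repeat, 3, 3))
```

Here the 3-mer CAG is the most frequent, with its rotations AGC and GCA close behind, which shows why a practical tool needs more than raw counts to settle on a motif representation.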

Genetics

Molecular Biology

Laboratory evolution of a Saccharomyces cerevisiae x S. eubayanus hybrid under simulated lager-brewing conditions: genetic diversity and phenotypic convergence

Arthur Vries et al., Nov 22, 2018

Saccharomyces pastorianus lager-brewing yeasts are domesticated hybrids of S. cerevisiae x S. eubayanus that display extensive inter-strain chromosome copy number variation and chromosomal recombinations. It is unclear to what extent such genome rearrangements are intrinsic to the domestication of hybrid brewing yeasts and whether they contribute to their industrial performance. Here, an allodiploid laboratory hybrid of S. cerevisiae and S. eubayanus was evolved for up to 418 generations on wort under simulated lager-brewing conditions in six independent sequential batch bioreactors. Characterization of 55 single-cell isolates from the evolved cultures showed large phenotypic diversity and whole-genome sequencing revealed a large array of mutations. Frequent loss of heterozygosity involved diverse, strain-specific chromosomal translocations, which differed from those observed in domesticated, aneuploid S. pastorianus brewing strains. In contrast to the extensive aneuploidy of domesticated S. pastorianus strains, the evolved isolates only showed limited (segmental) aneuploidy. Specific mutations could be linked to calcium-dependent flocculation, loss of maltotriose utilisation and loss of mitochondrial activity, three industrially relevant traits that also occur in domesticated S. pastorianus strains. This study indicates that fast acquisition of extensive aneuploidy is not required for genetic adaptation of S. cerevisiae x S. eubayanus hybrids to brewing environments. In addition, this work demonstrates that, consistent with the diversity of brewing strains for maltotriose utilization, domestication under brewing conditions can result in loss of this industrially relevant trait. These observations have important implications for the design of strategies to improve industrial performance of novel laboratory-made hybrids.

Genetics

Molecular Biology

Multisample motif discovery and visualization for tandem repeats

Yaran Zhang et al., Nov 13, 2024

Tandem repeats (TRs) occupy a significant portion of the human genome and are a source of polymorphism due to variations in size and motif composition. Some of these variations have been associated with various neuropathological disorders, highlighting the clinical importance of assessing the motif structure of TRs. Moreover, assessing TR motif variation can offer valuable insights into evolutionary dynamics and population structure. Previously, characterizations of TRs have been limited by short-read sequencing technology, which lacks the ability to accurately capture the full TR sequences. As long-read sequencing becomes more accessible and can capture the full complexity of TRs, there is now also a need for tools to characterize and analyze TRs using long-read data across multiple samples. In this study, we present MotifScope, a novel algorithm for characterization and visualization of TRs based on a de novo k-mer approach for motif discovery. Comparative analysis against established tools reveals that MotifScope can identify a greater number of motifs and more accurately represent the underlying repeat sequence. Moreover, MotifScope has been specifically designed to enable motif composition comparisons across assemblies of different individuals, as well as across long-read sequencing reads within an individual, through combined motif discovery and sequence alignment. We showcase potential applications of MotifScope in diverse fields, including population genetics, clinical settings, and forensic analyses.

Genetics

Molecular Biology

Nanopore sequencing enables near-complete de novo assembly of Saccharomyces cerevisiae reference strain CEN.PK113-7D

Alex Salazar et al., Aug 14, 2017

The haploid Saccharomyces cerevisiae strain CEN.PK113-7D is a popular model system for metabolic engineering and systems biology research. Current genome assemblies are based on short-read sequencing data scaffolded based on homology to strain S288C. However, these assemblies contain large sequence gaps, particularly in subtelomeric regions, and the assumption of perfect homology to S288C for scaffolding introduces bias. In this study, we obtained a near-complete genome assembly of CEN.PK113-7D using only Oxford Nanopore Technology's MinION sequencing platform. 15 of the 16 chromosomes, the mitochondrial genome, and the 2-micron plasmid are assembled in single contigs and all but one chromosome starts or ends in a telomere cap. This improved genome assembly contains 770 Kbp of added sequence containing 248 gene annotations in comparison to the previous assembly of CEN.PK113-7D. Many of these genes encode functions determining fitness in specific growth conditions and are therefore highly relevant for various industrial applications. Furthermore, we discovered a translocation between chromosomes III and VIII which caused misidentification of a MAL locus in the previous CEN.PK113-7D assembly. This study demonstrates the power of long-read sequencing by providing a high-quality reference assembly and annotation of CEN.PK113-7D and places a caveat on assumed genome stability of microorganisms.

Genetics

Molecular Biology
