ResearchHub | Open Science Community

Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome

Wouter Coster et al. Oct 3, 2018

Abstract: We sequenced the Yoruban NA19240 genome on the Oxford Nanopore PromethION long-read sequencing platform to benchmark and evaluate recently published aligners and structural variant calling tools. In this work, we determine precision and recall, present high-confidence and high-sensitivity call sets of variants, and discuss optimal parameters. The aligner Minimap2 and the structural variant caller Sniffles are both the most accurate and the most computationally efficient tools in our study. We describe our scalable workflow for identification, annotation, and characterization of tens of thousands of structural variants from long-read genome sequencing of an individual or population. By discussing the results for this genome, we provide an approximation of what can be expected in future long-read sequencing studies aiming for structural variant identification.
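For readers less familiar with the two benchmarking metrics named in this abstract, the following is a minimal sketch of how precision and recall are computed from counts of matched and unmatched calls. The function name and the counts are hypothetical and are not taken from the paper; how a call is matched to a truth-set variant is left out entirely.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a call set compared against a truth set.

    true_pos:  calls that match a truth-set variant
    false_pos: calls with no match in the truth set
    false_neg: truth-set variants that were not called
    """
    called = true_pos + false_pos
    expected = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / expected if expected else 0.0
    return precision, recall


# Hypothetical counts, for illustration only (not results from the study).
precision, recall = precision_recall(true_pos=80, false_pos=20, false_neg=120)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A high-confidence call set trades recall for precision, whereas a high-sensitivity call set does the opposite.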

Genetics

Artificial Intelligence

0

Paper

Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION

Arne Roeck et al. Oct 9, 2018

Abstract: Tandem repeats (TRs) can cause disease through their length, sequence motif interruptions, and nucleotide modifications. For many TRs, however, these features are very difficult - if not impossible - to assess, requiring low-throughput and labor-intensive assays. One example is a VNTR in ABCA7 for which we recently discovered that expanded alleles strongly increase risk of Alzheimer’s disease. Here, we investigated the potential of long-read whole genome sequencing to surmount these challenges, using the high-throughput PromethION platform from Oxford Nanopore Technologies. To overcome the limitations of conventional base calling and alignment, we developed an algorithm to study the TR size and sequence directly on raw PromethION current data. We report the long-read sequencing of multiple human genomes (n = 11) using only a single sequencing run and flow cell per individual. With the use of fresh DNA extractions, DNA shearing to approximately 20 kb and size selection, we obtained an average output of 70 gigabases (Gb) per flow cell, corresponding to a 21x genome coverage, and a maximum yield of 98 Gb (30x genome coverage). All ABCA7 VNTR alleles, including expansions up to 10,000 bases, were spanned by long sequencing reads, validated by Southern blotting. Classical approaches of TR length estimation suffered from low accuracy, low precision, DNA strand effects and/or inability to call pathogenic repeat expansions. In contrast, our novel NanoSatellite algorithm, which circumvents base calling by using dynamic time warping on raw PromethION current data, achieved more than 90% accuracy and high precision (5.6% relative standard deviation) of TR length estimation, and detected all clinically relevant repeat expansions. In addition, we identified alternative TR sequence motifs with high consistency, allowing determination of TR sequence and distinction of VNTR alleles with homozygous length.
In conclusion, we validated the robustness of single-experiment whole genome long-read sequencing on PromethION, a prerequisite for application of long-read sequencing in the clinic. In addition, we outperformed Southern blotting, enabling improved characterization of the role of expanded ABCA7 VNTR alleles in Alzheimer’s disease, and opening new opportunities for TR research.
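The abstract names dynamic time warping (DTW) as the technique applied to raw current data. The sketch below is a textbook DTW distance between two numeric series, included only to illustrate the idea; it is not the NanoSatellite implementation, and the function name and the "current level" values are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance with absolute-difference cost.

    Runs in O(len(a) * len(b)) time and memory.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical current levels for one repeat unit (illustration only).
unit = [80.0, 95.0, 70.0]
# Same levels, but each held for a different number of samples.
stretched = [80.0, 80.0, 95.0, 95.0, 95.0, 70.0]
# Same shape, but every level offset by 10.
shifted = [90.0, 105.0, 80.0]

print(dtw_distance(unit, stretched))
print(dtw_distance(unit, shifted))
```

Because DTW lets one point align with several consecutive points in the other series, a signal that is merely stretched in time stays at distance zero from its template, which is the property that makes the technique attractive for signals with variable dwell times.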

Genetics

Molecular Biology

0

Paper

What are the reference strains of Acinetobacter baumannii referring to?

Chantal Philippe et al. Feb 27, 2022

Abstract: We assembled the whole genome sequence (WGS) of a collection of 43 non-redundant modern clinical isolates and four broadly used reference strains of Acinetobacter baumannii. Comparison of these isolates and their WGS confirmed the high heterogeneity in capsule loci, sequence types, and the presence of virulence and antibiotic resistance genes. However, a significant portion of clinical isolates differ strongly from several reference strains in colony morphology, cellular density, capsule production, natural transformability and in vivo virulence. These genetic and phenotypic differences between currently circulating strains of A. baumannii and established reference strains could hamper the study of A. baumannii as an entity. The broadly used reference strains led to the current state of the art of the A. baumannii field; however, we propose that established reference strains should be used with care because of this high genetic and phenotypic heterogeneity. In this study, we generated a collection of high-quality nucleotide sequences of 43 modern clinical isolates with the corresponding multi-level phenotypic characterizations. Besides contributing novel fundamental observations, the phenotypic and genetic data, along with the bacterial strains themselves, will be accessible through the first open-access online platform, called “Acinetobase”. This will allow a rational choice of the modern strains that suit the needs of specific biological questions.

Genetics

Microbiology

14

Paper

The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics

Ann Cartney et al. Sep 17, 2024

A genomic database of all Earth's eukaryotic species could contribute to many scientific discoveries; however, only a tiny fraction of species have genomic information available. In 2018, scientists across the world united under the Earth BioGenome Project (EBP), aiming to produce a database of high-quality reference genomes containing all ~1.5 million recognized eukaryotic species. As the European node of the EBP, the European Reference Genome Atlas (ERGA) sought to implement a new decentralised, equitable and inclusive model for producing reference genomes. For this, ERGA launched a Pilot Project establishing the first distributed reference genome production infrastructure and testing it on 98 eukaryotic species from 33 European countries. Here we outline the infrastructure and explore its effectiveness for scaling high-quality reference genome production, whilst considering equity and inclusion. The outcomes and lessons learned provide a solid foundation for ERGA while offering key learnings to other transnational, national genomic resource projects and the EBP.

Genetics

Ecology

0

Paper

Scywalker: scalable end-to-end data analysis workflow for nanopore single-cell transcriptome sequencing

Peter Rijk et al. Feb 24, 2024

Abstract: We introduce scywalker, an innovative and scalable package developed to comprehensively analyze long-read nanopore sequencing data of full-length single-cell or single-nuclei cDNA. Existing nanopore single-cell data analysis tools showed severe limitations in handling current data sizes. We developed novel scalable methods for cell barcode demultiplexing and single-cell isoform calling and quantification and incorporated these in an easily deployable package. Scywalker streamlines the entire analysis process, from sequenced fragments in FASTQ format to demultiplexed pseudobulk isoform counts, into a single command suitable for execution on either a server or a cluster. Scywalker includes data quality control, cell type identification, and an interactive report. Assessment of datasets from the human brain, Arabidopsis leaves, and previously benchmarked data from mixed cell lines demonstrates excellent correlation with short-read analyses at both the cell-barcoding and gene quantification levels. At the isoform level, we show that scywalker facilitates the direct identification of cell-type-specific expression of novel isoforms.

Genetics

Molecular Biology

0

Paper

Methylmap: visualization of modified nucleotides for large cohort sizes

Elise Coopman et al. Nov 30, 2022

Summary: Methylmap is a tool developed for visualization of modified nucleotide frequencies per position, especially for large numbers of samples. Various input possibilities are supported, including the standardized BAM/CRAM files containing MM and ML tags. Availability and implementation: Methylmap is written in Python3 and available through PyPI and bioconda. The source code is released under the MIT license and can be found at https://github.com/EliseCoopman/methylmap

Finance

Biochemistry

6

Paper

Investigating the Role of Chromatin Remodeler FOXA1 in Ferroptotic Cell Death

Emilie Logie et al. Oct 14, 2021

Ferroptosis is a lipid peroxidation-dependent mechanism of regulated cell death known to suppress tumor proliferation and progression. Although several genetic and protein hallmarks have been identified in ferroptotic cell death, it remains challenging to fully characterize ferroptosis signaling pathways and to find suitable biomarkers. Moreover, changes taking place in the epigenome of ferroptotic cells remain poorly studied. In this context, we aimed to investigate the role of chromatin remodeler forkhead box protein A1 (FOXA1) in RSL3-treated multiple myeloma cells because, similar to ferroptosis, this transcription factor has been associated with changes in the lipid metabolism, DNA damage, and epithelial-to-mesenchymal transition (EMT). RNA sequencing and Western blot analysis revealed that FOXA1 expression is consistently upregulated upon ferroptosis induction in different in vitro and in vivo disease models. In silico motif analysis and transcription factor enrichment analysis further suggested that ferroptosis-mediated FOXA1 expression is orchestrated by specificity protein 1 (Sp1), a transcription factor known to be influenced by lipid peroxidation. Remarkably, FOXA1 upregulation in ferroptotic myeloma cells did not alter hormone signaling or EMT, two key downstream signaling pathways of FOXA1. CUT&RUN genome-wide transcriptional binding site profiling showed that GPX4-inhibition by RSL3 triggered loss of binding of FOXA1 to pericentromeric regions in multiple myeloma cells, suggesting that this transcription factor is possibly involved in genomic instability, DNA damage, or cellular senescence under ferroptotic conditions.

Genetics

Cell Biology

1

Paper

Methplotlib: analysis of modified nucleotides from nanopore sequencing

Wouter Coster et al. Nov 7, 2019

Summary: Modified nucleotides play a crucial role in gene expression regulation. Here we describe methplotlib, a tool developed for the visualization of modified nucleotides detected from Oxford Nanopore Technologies sequencing platforms, together with additional scripts for statistical analysis of allele specific modification within subjects and differential modification frequency across subjects. Availability and implementation: The methplotlib command-line tool is written in Python3, is compatible with Linux, Mac OS and the MS Windows 10 Subsystem for Linux and released under the MIT license. The source code can be found at https://github.com/wdecoster/methplotlib and can be installed from PyPI and bioconda. Our repository includes test data and the tool is continuously tested at travis-ci.com.

Genetics

Molecular Biology

0

Paper

opentsv prevents the corruption of scientific data by Excel

Peter Rijk et al. Dec 16, 2018

Microsoft Excel is widely used by researchers to edit tab- or comma-separated data files. However, Excel often corrupts the data when opening these files, most notably by changing some gene names to a date. Although this problem was cautioned against earlier, we show that every year hundreds of published papers still come with supplementary data files containing these errors. Opentsv was developed to effectively circumvent this problem at the root by providing an easy and transparent way to open delimited data files in Excel without these conversions. Opentsv is freely available at .
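To make the corruption described above concrete, here is a minimal sketch that flags values in a gene-name column that look like Excel date conversions (for example, the symbol SEPT2 ending up as "2-Sep"). This is not how opentsv works; it only illustrates detecting the symptom after the fact. The pattern, the function name, and the sample column are assumptions made for illustration, and the pattern covers only the "day-month" display form.

```python
import re

# Month abbreviations as Excel displays them in its default "d-mmm" date format.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{MONTHS})$", re.IGNORECASE)


def find_date_like(values):
    """Return the values that look like a day-month date rather than a gene symbol."""
    return [value for value in values if DATE_LIKE.match(value.strip())]


# Hypothetical gene-name column from a supplementary table.
column = ["BRCA1", "2-Sep", "TP53", "1-Mar", "ABCA7"]
suspicious = find_date_like(column)
print(suspicious)
```

A check like this can only report that damage has occurred; the original symbol cannot always be recovered from the date, which is why preventing the conversion when the file is opened is the more robust approach.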

Software

Information Systems

0

Paper

Critical length in long read resequencing

Wouter De Coster et al. Apr 29, 2019

Long-read sequencing has a substantial advantage over short-read technologies for structural variant discovery and phasing of variants, but the required and optimal read length has not been assessed. In this work, we used simulated long reads and evaluated structural variant discovery and variant phasing using current best-practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of at least 15 kbp. Haplotyping of entire genes reaches its optimum only with reads of 100 kbp. These findings are important for the design of future long-read sequencing projects.

Genetics

Molecular Biology

0

Paper
