ResearchHub | Open Science Community

Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly

Ou Wang et al., Apr 2, 2019

Here, we describe single-tube long fragment read (stLFR), a technology that enables sequencing of data from long DNA molecules using economical second-generation sequencing technology. It is based on adding the same barcode sequence to subfragments of the original long DNA molecule (DNA cobarcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically nonredundant cobarcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique cobarcoding of more than 8 million 20- to 300-kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phase block lengths up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole-genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
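The claim that 3.6 billion possible barcodes make cobarcoding "practically nonredundant" at 50 million barcodes per sample can be sanity-checked with a birthday-problem estimate. The sketch below is not taken from the paper. It assumes that each of the 50 million beads draws its barcode uniformly and independently from the full pool, and the function name `shared_barcode_fraction` is hypothetical.

```python
import math


def shared_barcode_fraction(n_beads: int, n_barcodes: int) -> float:
    """Expected fraction of beads whose barcode is also carried by another bead.

    Assumption (not stated in the abstract): every bead draws its barcode
    uniformly and independently from n_barcodes possibilities.
    """
    # P(no other bead shares my barcode) = (1 - 1/N)^(n - 1).
    # log1p/expm1 keep precision when 1/N is very small.
    log_p_unique = (n_beads - 1) * math.log1p(-1.0 / n_barcodes)
    return -math.expm1(log_p_unique)


if __name__ == "__main__":
    frac = shared_barcode_fraction(50_000_000, 3_600_000_000)
    print(f"beads sharing a barcode with another bead: {frac:.2%}")
```

Under these assumptions, roughly 1.4% of beads would share a barcode with some other bead. A non-uniform barcode distribution on real beads would raise that figure.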

Genetics

Ecology


Advanced Whole Genome Sequencing Using an Entirely PCR-free Massively Parallel Sequencing Workflow

Hanjie Shen et al., Dec 23, 2019

Background: Systematic errors can be introduced by DNA amplification during massively parallel sequencing (MPS) library preparation and sequencing array formation. Polymerase chain reaction (PCR)-free genomic library preparation methods were previously shown to improve whole genome sequencing (WGS) quality on the Illumina platform, especially in calling insertions and deletions (InDels). We hypothesized that substantial InDel errors continue to be introduced by the remaining PCR step of DNA cluster generation. In addition to library preparation and sequencing, data analysis methods are also important for the accuracy of the output data. In recent years, several machine learning variant calling pipelines have emerged that can correct the systematic errors from MPS and improve variant calling performance.
Results: Here, PCR-free libraries were sequenced on the PCR-free DNBSEQ™ arrays from MGI Tech Co., Ltd. (referred to as MGI) to accomplish the first true PCR-free WGS, in which the whole process is PCR-free during both library preparation and sequencing. We demonstrated that PCR-based WGS libraries have significantly (about 5 times) more InDel errors than PCR-free libraries. Furthermore, PCR-free WGS libraries sequenced on the PCR-free DNBSEQ™ platform have up to 55% fewer InDel errors than those sequenced on the NovaSeq platform, confirming that DNA clusters contain PCR-generated errors. In addition, low coverage bias and a read duplication rate below 1% were reproducibly obtained in DNBSEQ™ PCR-free WGS using either ultrasonic or enzymatic DNA fragmentation MGI kits combined with MGISEQ-2000. Meanwhile, variant calling performance (single-nucleotide polymorphism (SNP) F-score > 99.94%, InDel F-score > 99.6%) exceeded widely accepted standards using machine learning (ML) methods (DeepVariant or DNAscope).
Conclusions: Enabled by the new PCR-free library preparation kits, ultra-high-throughput PCR-free sequencers, and ML-based variant calling, true PCR-free DNBSEQ™ WGS provides a powerful solution for improving WGS accuracy while reducing cost and analysis time, thus facilitating future precision medicine, cohort studies, and large population genome projects.
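The abstract reports variant calling performance as F-scores without saying which variant of the metric is used. As an illustration, the sketch below assumes the common F1 definition, the harmonic mean of precision and recall. The function name and the counts in the usage line are hypothetical and are not results from the paper.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive and false-negative counts.

    Assumes the F-score in question is F1 = 2PR / (P + R); the abstract
    does not specify the exact definition.
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of called variants that are real
    recall = tp / (tp + fn)     # fraction of real variants that were called
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(f"F1 = {f1_from_counts(tp=90, fp=10, fn=10):.4f}")
```

Because F1 is a harmonic mean, it stays high only when false positives and false negatives are both rare.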

Genetics

Molecular Biology


A new massively parallel nanoball sequencing platform for whole exome research

Yu Xu et al., Oct 30, 2018

Background: Whole exome sequencing (WES) has been widely used in human genetics research. BGISEQ-500 is a recently established next-generation sequencing platform. However, the performance of BGISEQ-500 on WES is not well studied. In this study, we evaluated the performance of BGISEQ-500 on WES by side-by-side comparison with Hiseq4000 on the well-characterized human sample NA12878. Results: BGISEQ demonstrated reproducibility for variation detection as high as that of Hiseq. Also, the SNPs from BGISEQ data are highly consistent with Hiseq results (concordance 96.5%~97%). Variation detection accuracy was subsequently evaluated with data from the Genome in a Bottle project as the benchmark. Both platforms showed similar sensitivity and precision in SNP detection, while in indel detection BGISEQ showed slightly higher sensitivity and lower precision. The impact of sequence depth and read length on variation detection accuracy was further analyzed; variation detection sensitivity was still increasing when the sequence depth exceeded 100x, and the impact of read length was minor when using 100x data. Conclusions: This study suggests that BGISEQ-500 is a qualified sequencing platform for WES.

Genetics

Molecular Biology


Reusable and Sensitive Detection Method for Exonuclease III Activity by DNB Nanoarrays Based on cPAS Sequencing Technology

Ying Chen et al., Mar 25, 2020

In this article, we designed a sensitive and reusable DNB (DNA nanoball) nanoarray sequencing structure, based on the BGISEQ-500RS sequencer, for monitoring Exo III activity. In the absence of Exo III, the effective DNBs are captured by the optical system because of their fluorescent signal. In contrast, in the presence of Exo III, some DNBs disappear from the fields of the optical system through fluorescence extinction, or are discarded because of uncleaned fluorescence. As a result, the effective number of DNBs in this strategy is related to the concentration of Exo III. For Exo III, our strategy showed a highly sensitive linear response in the low detection range of 0.01 U/mL to 0.5 U/mL, with detection limits below 0.01 U/mL. Compared with other fluorescent sensors, the DNB nanoarray approach showed superior sensitivity, selectivity, and reusability, together with low cost and a simple setup.

Philosophy

Biochemistry


Efficient long single molecule sequencing for cost effective and accurate sequencing, haplotyping, and de novo assembly

Ou Wang et al., May 17, 2018

Single tube long fragment read (stLFR) technology enables efficient WGS, haplotyping, and contig scaffolding. It is based on adding the same barcode sequence to sub-fragments of the original DNA molecule (DNA co-barcoding). To achieve this, stLFR uses the surface of microbeads to create millions of miniaturized compartments in a single tube. Using a combinatorial process, over 1.8 billion unique barcode sequences were generated on beads, enabling practically non-redundant co-barcoding in reactions with 50 million barcodes. Using stLFR, we demonstrate efficient unique co-barcoding of over 8 million 20-300 kb genomic DNA fragments, with near-perfect variant calling and phasing of the genome of NA12878 into contigs up to N50 23.4 Mb. stLFR represents a low-cost single-library solution that can enable long sequence data.

Genetics

Molecular Biology


DNB-Based On-Chip Motif Finding (DocMF): a High-Throughput Method to Profile Different Types of Protein-DNA Interactions

Zhuokun Li et al., Nov 1, 2019

Here we report a highly sensitive DNB-based on-chip Motif Finding (DocMF) system that utilizes high-throughput next-generation sequencing (NGS) chips to profile protein binding or cleaving activity. Using DocMF, we successfully identified a variety of endonuclease recognition sites and the protospacer-adjacent-motif (PAM) sequences of different CRISPR systems. Our DocMF platform can simultaneously screen both 5’ and 3’ PAM regions with high coverage using the same NGS library/chip. For the well-studied SpCas9, our DocMF platform identified a small proportion of noncanonical 5’-NAG-3’ (∼5%) and 5’-NGA-3’ (∼1.6%) PAMs, in addition to its common PAM, 5’-NGG-3’ (∼89.9%). We also used DocMF to assay two uncharacterized Cas endonucleases, VeCas9 and BvCpf1. VeCas9 PAMs were not detected by the conventional PAM depletion method. However, DocMF discovered that both VeCas9 and BvCpf1 required broader and more complicated PAM sequences for target recognition. VeCas9 preferred R-rich motifs, whereas BvCpf1 used T-rich PAMs. Moreover, after slightly changing the experimental protocol, we observed that dCas9, a DNA-binding protein lacking endonuclease activity, preferentially bound to the previously reported PAM 5’-NGG-3’. In summary, our studies demonstrate that DocMF is the first tool with the capacity to exhaustively assay both the binding and the cutting properties of different DNA-binding proteins.
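The PAM motifs above are written with IUPAC ambiguity codes, where N stands for any base and R for a purine (A or G). The sketch below shows one way to sort observed 3-base PAM sequences into the three SpCas9 classes named in the abstract. It only illustrates the notation and is not the DocMF analysis pipeline. The function names are hypothetical, and only a subset of IUPAC codes is included.

```python
# Subset of IUPAC nucleotide codes: N = any base, R = purine, Y = pyrimidine.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}


def matches_motif(seq: str, motif: str) -> bool:
    """True if seq is an instance of the IUPAC motif (same length, 5' to 3')."""
    if len(seq) != len(motif):
        return False
    return all(base in IUPAC[code] for base, code in zip(seq.upper(), motif))


def classify_spcas9_pam(seq: str) -> str:
    """Assign a 3-base PAM to NGG, NAG, NGA, or 'other', checked in that order."""
    for motif in ("NGG", "NAG", "NGA"):
        if matches_motif(seq, motif):
            return motif
    return "other"


if __name__ == "__main__":
    for pam in ("TGG", "CAG", "AGA", "TTT"):
        print(pam, "->", classify_spcas9_pam(pam))
```

The three motifs are mutually exclusive for 3-base sequences, so the order of the checks does not change the result here.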

Genetics

Molecular Biology


Semi-rational evolution of a recombinant DNA polymerase for modified nucleotide incorporation efficiency

Lili Zhai et al., Mar 22, 2023

Engineering improved B-family DNA polymerases to incorporate 3′-O-modified nucleotide reversible terminators is limited by an insufficient understanding of the structural determinants that define polymerization efficiency. To explore the key mechanism for unnatural nucleotide incorporation, we engineered a B-family DNA polymerase from Thermococcus kodakarensis (KOD pol) using semi-rational design strategies. We first scanned the active pocket of KOD pol through site-directed saturation mutagenesis and combinatorial mutations and, using a high-throughput microwell-based screening method, identified a variant, Mut_C2, containing five mutation sites (D141A, E143A, L408I, Y409A, A485E). Mut_C2 demonstrated high catalytic efficiency in incorporating 3'-O-azidomethyl-dATP labeled with a Cy3 dye, whereas the wild-type KOD pol failed to incorporate it. Computational simulations were then conducted on the DNA-binding region of KOD pol to predict additional mutations with enhanced catalytic activity, which were subsequently verified experimentally. Through a stepwise combinatorial mutagenesis approach, we obtained an eleven-mutation variant, named Mut_E10, by introducing additional mutations into the Mut_C2 variant. Mut_E10, which carried six specific mutations (S383T, Y384F, V389I, V589H, T676K, and V680M) within the DNA-binding region, demonstrated a more than 20-fold improvement in kinetic efficiency compared with Mut_C2. In addition, Mut_E10 demonstrated satisfactory performance on two different sequencing platforms (BGISEQ-500 and MGISEQ-2000), indicating its potential for commercialization. Our study demonstrates that catalytic efficiency towards modified nucleotides can be enhanced effectively through combinatorial mutagenesis of residues in the active site and DNA-binding region of a DNA polymerase.
These findings contribute to a comprehensive understanding of the mechanisms that underlie the incorporation of modified nucleotides by DNA polymerase. The beneficial mutation sites, as well as the nucleotide incorporation mechanism identified in this study, can provide valuable guidance for the engineering of other B-family DNA polymerases.

Genetics

Molecular Biology


Rational evolution of a recombinant DNA polymerase for efficient incorporation of unnatural nucleotides by dual-site boosting

Ruyin Cao et al., Mar 2, 2022

Machine learning models that assist function-oriented enzyme engineering are normally built on a predefined protein sequence space. However, efficiently defining the determinant amino acid positions on which the combinatorial mutation library is constructed is still a challenge in protein science. Herein, we present a comprehensive investigation of modifying a recombinant DNA polymerase to efficiently incorporate one unnatural nucleotide, including the identification of key sites/regions, machine learning-assisted mutant screening, and the underlying mechanism of the kinetics boost. Using hundreds of training points and only dozens of testing samples, we found that the catalytic efficiency of one highly engineered enzyme can be further improved by one order of magnitude through specific mutations at two sites, 485I and 451L. Whereas position 485 is known to dominate the local conformation of B-family DNA polymerases, 451 is a newly identified active site discovered by our approach. A novel allosteric regulation mechanism underlies the apparent synergy of 485I and 451L in boosting the kinetics. As a result, a “half-closed” conformation of the binding pocket and cooperative binding of both primer and template DNA strands to the protein accelerated the processes of substrate incorporation, molecular recognition, and release of incorrect nucleotides. These findings have implications for guiding the function-tuning of DNA polymerases for a broad range of biotechnological applications.

Genetics

Biochemistry


CoolMPS™: Advanced massively parallel sequencing using antibodies specific to each natural nucleobase

Snezana Drmanac et al., Feb 20, 2020

Massively parallel sequencing (MPS) on DNA nanoarrays provides billions of reads at relatively low cost and enables a multitude of genomic applications. Further improvements in read length and sequence quality, together with cost reduction, will enable more affordable and accurate comprehensive health monitoring tests. Currently, the most efficient MPS uses dye-labeled reversibly terminated nucleotides (RTs) that are expensive to make and challenging to incorporate. Furthermore, a part of the dye-linker (scar) remains on the nucleobase after cleavage and interferes with subsequent sequencing cycles. We describe here the development of a novel MPS chemistry (CoolMPS™) utilizing unlabeled RTs and four natural nucleobase-specific fluorescently labeled antibodies with fast (30 sec) binding. We implemented CoolMPS™ on MGI's PCR-free DNBSEQ MPS platform using arrays of 200 nm DNA nanoballs (DNBs) generated by rolling circle replication and demonstrate a 3-fold improvement in signal intensity and elimination of scar interference. Single-end 100-400 base and paired-end 2x150 base reads of high quality were readily generated with low out-of-phase incorporation. Furthermore, DNBs with fewer than 50 template copies were successfully sequenced by strong-signal CoolMPS™ with 3 times higher accuracy than in standard MPS. CoolMPS™ chemistry based on natural nucleobases has the potential to provide longer, more accurate, and less expensive MPS reads, including highly accurate "4-color sequencing" on the most efficient dye-crosstalk-free 2-color imagers, with an estimated sequencing error rate of 0.00058% (one error in 170,000 base calls) in a proof-of-concept demonstration.
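The quoted error rate can be related to the Phred quality scale commonly used for base calls, where Q = -10 log10(p) for an error probability p. The sketch below performs that conversion for "one error in 170,000 base calls". The Phred framing is an addition for illustration and is not used in the abstract, and the function names are hypothetical.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred-scaled quality Q = -10 * log10(p) for a per-base error probability p."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error_probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def one_in_n_to_percent(n_bases_per_error: float) -> float:
    """Convert 'one error in n bases' into an error rate in percent."""
    return 100.0 / n_bases_per_error


if __name__ == "__main__":
    n = 170_000  # figure quoted in the abstract
    print(f"error rate: {one_in_n_to_percent(n):.5f}%")
    print(f"Phred quality: Q{phred_quality(1.0 / n):.1f}")
```

One error in 170,000 bases works out to about 0.00059%, consistent with the 0.00058% quoted in the abstract, and corresponds to a Phred score a little above Q52.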

Genetics

Biochemistry
