Summary
Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high-fidelity (HiFi) sequencing, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads: each circularized DNA molecule is sequenced multiple times and the passes are combined into a consensus sequence. The current accuracy and quality value estimation are more than sufficient for genome assembly and germline variant calling, but the estimated quality scores are not accurate enough for confident somatic variant calling on single reads. Here we introduce TopoQual, a tool that uses partial order alignments (POA), topologically parallel bases, and deep learning to polish consensus sequences and predict base qualities more accurately. We correct ~31.9% of errors in PacBio consensus sequences and validate base qualities up to q59, which is one error in 0.9 million bases, enabling accurate somatic variant calling with HiFi data.

Availability and implementation
The source code, installation instructions, and the validation dataset used are freely available at https://github.com/lorewar2/TopoQual
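
For readers less familiar with the quality scale used in the summary, the following is a minimal sketch of the standard Phred relation, Q = -10 log10(p), where p is the per-base error probability. The function names are illustrative assumptions and are not part of TopoQual. Under the exact formula, q59 works out to roughly one error in 0.8 million bases, the same order of magnitude as the figure quoted in the summary.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def bases_per_error(q: float) -> float:
    """Expected number of bases per single error at Phred quality q."""
    return 10 ** (q / 10)


if __name__ == "__main__":
    # Illustrative quality values only; not taken from the TopoQual results.
    for q in (20, 30, 40, 59, 60):
        print(f"q{q}: p = {phred_to_error_prob(q):.3e}, "
              f"about 1 error in {bases_per_error(q):,.0f} bases")
```

Each additional 10 Phred units corresponds to a tenfold drop in error probability, which is why validating scores in the q50 to q60 range requires millions of bases of ground truth.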