Abstract Massive sequencing of SARS-CoV-2 genomes has led to a great demand for adding new samples to a reference phylogeny instead of building the tree from scratch. To address such challenge, we proposed an algorithm ‘TIPars’ by integrating parsimony analysis with pre-computed ancestral sequences. Compared to four state-of-the-art methods on four benchmark datasets (SARS-CoV-2, Influenza virus, Newcastle disease virus and 16S rRNA genes), TIPars achieved the best performance in most tests. It took only 21 seconds to insert 100 SARS-CoV-2 genomes to a 100k-taxa reference tree using near 1.4 gigabytes of memory. Its efficient and accurate phylogenetic placements and incrementation for phylogenies with highly similar and divergent sequences suggest that it will be useful in a wide range of studies including pathogen molecular epidemiology, microbiome diversity and systematics.
This paper's license is marked as closed access or non-commercial and cannot be viewed on ResearchHub. Visit the paper's external site.