1. Abstract

Background: Sequencing of amplified genetic markers, such as the 16S rRNA gene, has been extensively used to characterize microbial community composition. Recent studies have suggested that Amplicon Sequence Variants (ASVs) should replace Operational Taxonomic Units (OTUs), because the sequence identity thresholds used to define OTUs are arbitrary. Alignment-free methods are an interesting alternative for the taxonomic classification of ASVs, since they avoid the biases introduced by sequence identity thresholds.

Results: Here we present TAG.ME, a novel alignment-independent and amplicon-specific method for taxonomic assignment based on genetic markers. TAG.ME uses a multilevel supervised learning approach to create predictive models based on user-defined genetic markers. The predictive method can assign taxonomy to sequenced amplicons efficiently and effectively. We applied our method to the classification of gut and soil samples, where it outperformed alternative approaches and identified a substantially larger proportion of species. Benchmark tests performed using the RDP database and mock communities reinforced the precise classification into deep taxonomic levels.

Conclusion: TAG.ME presents a new approach to accurately assign taxonomy to amplicon sequences. Our classification model, trained with amplicon-specific sequences, can address resolution issues not solved by other methods and approaches that use the whole 16S rRNA gene sequence. TAG.ME is implemented as an R package and is freely available at http://gabrielrfernandes.github.io/tagme/