Background: Genome contamination is a well-known issue in (meta)genomics. Although it has received considerable attention, and an increasing number of detection tools have been made available over the years, no comparison of these tools exists in the literature.

Results: Here, we report a benchmark of six of the most popular tools using a simulation framework. Our simulations were conducted at six taxonomic ranks, from phylum to species. The analysis of the estimated contamination levels indicates that the tools have poor precision, often because of large overdetection but also because of underdetection, especially at the genus and species ranks. Furthermore, our results show that only redundant contamination is accurately estimated.

Conclusion: Our results indicate that a combination of tools, including Kraken2, is necessary to estimate the contamination level accurately. We also provide CRACOT, a freely available contamination simulation framework that may be useful for estimating the accuracy of future algorithms.