Abstract

The non-recombinant region of the Y chromosome (NRY) contains a great number of polymorphic markers that allow accurate reconstruction of pedigree relationships and retrieval of ancestral information from study samples. NRY analysis is commonly used in anthropological, medical, and forensic studies. High-throughput sequencing (HTS) has greatly increased the number of genetic markers identified in the NRY genealogy and has prompted the development of automated NRY haplogroup classification tools. Here, we present a benchmarking study of five command-line tools for NRY haplogroup classification. The evaluation used empirical short-read HTS data from 50 unrelated donors, with paired data from whole-genome sequencing (WGS) and whole-exome sequencing (WES) experiments. In addition, we evaluate the performance of the top-ranked tool in classifying third-generation HTS data obtained from a subset of donors. Our findings demonstrate that WES can be an efficient approach to inferring the NRY haplogroup, although it generally provides a lower level of genealogical resolution than WGS. Among the tools evaluated, YLeaf offers the best performance for both WGS and WES applications. Finally, we demonstrate that YLeaf correctly classifies all nanopore-sequenced samples from long, noisy reads.