Abstract
Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivity assessment. Recent advances in RNA-seq technology have made it possible to impute HLA types from high-throughput sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability of clinical and biomedical researchers to make informed choices about which tools to use. Here, we rigorously compare the performance of 9 HLA callers on 652 RNA-seq samples across 6 datasets with a molecularly defined gold standard. We find that OptiType achieves the highest accuracy at both low and high resolution (above 99%), followed by arcasHLA and seq2HLA with accuracies above 96%. Despite its high accuracy, OptiType is only capable of class I predictions, which limits its application to clinical procedures such as transplantation that require class II predictions. Furthermore, our findings reveal significant variation in accuracy across HLA loci, with HLA-A exhibiting the highest accuracy and HLA-DRB1 the lowest. We also find that class II genes are generally more challenging to impute than class I genes: most typing algorithms make class I predictions with >97% accuracy, whereas the best class II tool predicts with 94.2% accuracy. Moreover, we identify notable differences in the computational resources necessary to run each tool. The most computationally expensive tools are OptiType and HLA-HD, which require 10⁵ and 10² times greater RAM and CPU, respectively, than the least computationally expensive tools, seq2HLA and RNA2HLA. Finally, all tools show decreased accuracy for African samples relative to European samples at four-digit resolution.
We conclude that RNA-seq HLA callers are capable of returning high-quality results, but tools that offer a good balance between accuracy, consistency, and computational cost are yet to be developed.