Abstract
Estimating the cell type composition of blood and tissue samples is a biological challenge relevant to both laboratory studies and clinical care. In recent years, a number of computational tools have been developed to estimate cell type abundance from gene expression data. Although these tools use a variety of approaches, they all leverage expression profiles from purified cell types to evaluate the cell type composition of samples. In this study, we compare ten deconvolution tools and evaluate the performance of each when paired with each of eleven separate reference profiles. Specifically, we ran the deconvolution tools on over 4,000 samples with known cell type proportions, spanning both immune and stromal cell types. Twelve of these are in vitro synthetic mixtures and 300 are in silico synthetic mixtures prepared using single-cell data. The remaining 3,728 are clinical samples from the Framingham Cohort, for which cell populations were quantified using electrical impedance cell counting. On the Framingham dataset, EPIC produces the highest correlation while GEDIT produces the lowest error. The best tool varies across the other datasets, but CIBERSORT and GEDIT most consistently produce accurate results. In terms of reference choice, we find that the Human Primary Cell Atlas (HPCA) and the references published by the EPIC authors produce accurate results for the largest number of tools and datasets. For blood samples, the leukocyte reference matrix LM22 is also a suitable choice, usually (but not always) outperforming HPCA and EPIC. Running time varies substantially across tools. For as many as 5,052 samples, SaVanT and dtangle reliably finish in under one minute, while slower tools may require up to two hours. When using custom references, however, CIBERSORT can run very slowly, taking over 24 hours to complete for large datasets.
We conclude that combining the best tools with optimal reference datasets can provide significant gains in accuracy when carrying out deconvolution tasks.