Abstract

Single-cell CRISPR screens have emerged as a critical method for linking genetic perturbations to phenotypic changes in individual cells. The most fundamental task in single-cell CRISPR screen data analysis is to test for association between a CRISPR perturbation and a univariate count outcome, such as the expression of a gene or protein. We conducted the first comprehensive benchmarking study of association testing methods for low multiplicity-of-infection single-cell CRISPR screens, applying six leading methods to six diverse datasets. We found that existing methods exhibit varying degrees of miscalibration, suggesting that results obtained using these methods may contain excess false positives. Next, we conducted an extensive empirical investigation to understand why existing methods are miscalibrated. We identified three core analysis challenges: sparsity, confounding, and model misspecification. Finally, we developed a new association testing method based on the novel and statistically principled technique of permuting negative binomial score statistics, and added this method to our SCEPTRE software package (katsevich-lab.github.io/sceptre). This methodology addresses the core analysis challenges both in theory and in practice, demonstrating markedly improved calibration and power across datasets.