The human genome contains ~70 million possible protein-altering variants, the vast majority of which are of uncertain clinical significance. Closing this gap is essential for accurate diagnosis of disease-causing variants and understanding their mechanisms of action. Towards this goal, we developed a pooled perturbation approach combining saturation mutagenesis with single-cell RNA sequencing to map the effects of every single-nucleotide variant in a gene. We sequenced ~440,000 cells expressing variants in CDKN2A (p16INK4a), TP53, and SOD1, observing almost all possible protein-coding variants, with a mean of 61 cells per variant. Using single-cell gene expression signatures, we show that each gene may contain multiple types of pathogenic variants that affect distinct downstream pathways. We demonstrate that single-cell expression signatures outperform existing bulk experimental assays and computational models for predicting pathogenicity, and summarize both the utility and potential limitations of single-cell sequencing as a general variant interpretation assay.