Abstract Aneuploidy, an abnormal number of chromosomes within a cell, is considered a hallmark of cancer. Patterns of aneuploidy differ across cancers, yet are similar in cancers affecting closely-related tissues. The selection pressures underlying aneuploidy patterns are not fully understood, hindering our understanding of cancer development and progression. Here, we applied interpretable machine learning (ML) methods to study tissue-selective aneuploidy patterns. We defined 20 types of features of normal and cancer tissues, and used them to model gains and losses of chromosome-arms in 24 cancer types. In order to reveal the factors that shape the tissue-specific cancer aneuploidy landscapes, we interpreted the ML models by estimating the relative contribution of each feature to the models. While confirming known drivers of positive selection, our quantitative analysis highlighted the importance of negative selection for shaping the aneuploidy landscapes of human cancer. Tumor-suppressor gene density was a better predictor of gain patterns than oncogene density, and vice-versa for loss patterns. We identified the contribution of tissue-selective features and demonstrated them experimentally for chr13q gain in colon cancer. In line with an important role for negative selection in shaping the aneuploidy landscapes, we found compensation by paralogs to be a top predictor of chromosome-arm loss prevalence, and demonstrated this relationship for one such paralog interaction. Similar factors were found to shape aneuploidy patterns in human cancer cell lines, demonstrating their relevance for aneuploidy research. Overall, our quantitative, interpretable ML models improve the understanding of the genomic properties that shape cancer aneuploidy landscapes.
This paper's license is marked as closed access or non-commercial and cannot be viewed on ResearchHub. Visit the paper's external site.