Abstract

Cells are the fundamental structural and functional units of life. Studying the definition and composition of different cell types can help us understand the complex mechanisms underlying biological diversity and function. The growing volume of single-cell omics data makes it possible to characterise cell types in detail. Recently, there has been a rise in deep learning-based approaches that generate cell type labels solely by mapping query data to reference data. However, these approaches lack multi-scale descriptions and interpretations of the cell types they identify. Here, we propose Cell Decoder, a model informed by biological prior knowledge that achieves a multi-scale representation of cells. We use automated machine learning and post-hoc analysis techniques to decode cell identity. We show that Cell Decoder compares favourably with existing methods, offering multi-view interpretability for decoding cell identity and data integration. Furthermore, we demonstrate its applicability in uncovering novel cell types and states in both human bone and mouse embryonic contexts, thereby revealing the multi-scale heterogeneity inherent in cell identities.