With the advancement of high-throughput RNA sequencing technologies, the use of chemical-induced transcriptional profiling has greatly increased in biomedical research. However, the usefulness of transcriptomics data is limited by inherent random noise and technical artefacts that may cause systematic biases. These limitations make it challenging to identify the true perturbation signal and to extract knowledge from the data. In this study, we propose a deep generative model called Transcriptional Signatures Generator (TranSiGen), which aims to denoise and reconstruct transcriptional profiles through self-supervised representation learning. TranSiGen uses basal gene expression of the cell and a representation of the compound's molecular structure to infer the chemical-induced transcriptional profile. Results demonstrate the effectiveness of TranSiGen in learning and predicting differentially expressed genes. The representation derived from TranSiGen can also serve as alternative phenotypic information, with applications in ligand-based virtual screening, drug response prediction, and phenotype-based drug repurposing. We envisage that integrating TranSiGen into drug discovery and mechanism research pipelines will promote the development of biomedicine.