Abstract

To control gene expression, regulatory DNA variants are commonly designed using random synthetic approaches with mutagenesis and screening. This, however, limits the size of the designed DNA to merely a part of a single regulatory region, whereas the whole gene regulatory structure, including the coding and adjacent non-coding regions, is involved in controlling gene expression. Here, we prototype a deep neural network strategy that models whole gene regulatory structures and generates de novo functional regulatory DNA with prespecified expression levels. By learning directly from natural genomic data, without the need for large synthetic DNA libraries, our ExpressionGAN can traverse the whole sequence-expression landscape to produce sequence variants with target mRNA levels as well as natural-like properties, including over 30% dissimilarity to any natural sequence. We experimentally demonstrate that this generative strategy is more efficient than a mutational one when using purely natural genomic data, as 57% of the newly generated, highly expressed sequences surpass the expression levels of natural controls. We foresee this as a lucrative strategy to expand our knowledge of gene expression regulation and to increase expression control in any desired organism for synthetic biology and metabolic engineering applications.