Inexpensive DNA sequencing has led to a rapidly increasing number of whole genome sequences in the public domain. Natural sequence variation can now be assessed across a large number of sequenced strains of a bacterial species, resulting in the definition of the wild-type alleleome (the collection of alleles for every gene found in the species). Concurrently, laboratory evolution has emerged as a new approach to address biological questions and to develop new phenotypic traits, and a large number of laboratory-acquired mutations can be found in databases. The availability of this large-scale sequence variation data now allows for a detailed comparison of mutations fixed in natural versus laboratory evolution. Such a comparison shows that laboratory-acquired mutations are rarely found in the wild-type alleleome of Escherichia coli. The E. coli alleleome is highly conserved: most of the sequence variation is concentrated in about 2% of the coding region. We find that the variable positions typically encode two alternative amino acids, and switches between the two are found across the data sets. Finally, we find that adaptive laboratory mutations, unlike wild-type mutations, do not utilize the redundancy built into the genetic code: they are less likely to be synonymous, and they rely on changing a single nucleotide within a codon. However, the uniqueness of mutations fixed in laboratory evolution experiments bodes well for synthetic biology by revealing novel, exploitable sequence space untouched by natural evolution.
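The two properties contrasted above (whether a mutation is synonymous, and how many nucleotides of a codon it changes) can be computed directly from a reference/alternate codon pair. The following is a minimal sketch, not the authors' pipeline: it assumes the codon pairs have already been extracted from aligned coding sequences, uses the standard genetic code (NCBI table 11, used for bacteria, assigns the same amino acids and differs only in start codons), and the function names `classify_substitution` and `summarize` are hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks stop codons.
# Assumption: NCBI table 11 (bacteria) has identical amino acid assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(ref_codon: str, alt_codon: str) -> dict:
    """Hypothetical helper: compare a reference codon with an alternate codon."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be 3-letter strings over A, C, G, T")
    # Number of positions within the codon that differ (1, 2 or 3 for a real change).
    n_changes = sum(a != b for a, b in zip(ref, alt))
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    return {
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "nucleotide_changes": n_changes,
        # Synonymous: the codon changes but the encoded amino acid does not.
        "synonymous": ref_aa == alt_aa,
    }


def summarize(pairs: list[tuple[str, str]]) -> dict:
    """Hypothetical helper: fractions of synonymous and single-nucleotide codon changes."""
    results = [classify_substitution(ref, alt) for ref, alt in pairs]
    n = len(results)
    return {
        "fraction_synonymous": sum(r["synonymous"] for r in results) / n,
        "fraction_single_nt": sum(r["nucleotide_changes"] == 1 for r in results) / n,
    }


if __name__ == "__main__":
    # Illustrative codon pairs only, not data from the study.
    example = [("CTG", "CTA"), ("CTG", "ATG"), ("TTA", "CTG")]
    for ref, alt in example:
        print(ref, "->", alt, classify_substitution(ref, alt))
    print(summarize(example))
```

In this toy input, CTG to CTA and TTA to CTG both keep leucine (synonymous), but only the first is a single-nucleotide change; CTG to ATG changes one nucleotide and swaps leucine for methionine. Applying such a summary separately to wild-type and laboratory-acquired variants is one way to express the contrast described in the abstract.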