Article
1 June 2010
Open Access

Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo

Gong-Hong Wei1, Gwenael Badis2, Michael F Berger3,4,5,6, Teemu Kivioja1,7, Kimmo Palin7, Martin Enge8, Martin Bonke1, Arttu Jolma1, Markku Varjosalo1, Andrew R Gehrke3,4,5,6, Jian Yan1, Shaheynoor Talukder2, Mikko Turunen1, Mikko Taipale1, Hendrik G Stunnenberg9, Esko Ukkonen7, Timothy R Hughes2, Martha L Bulyk3,4,5,6 and Jussi Taipale*,1,8

1 Public Health Genomics Unit, National Institute for Health and Welfare (THL) and Genome-Scale Biology Program, Institute of Biomedicine and High Throughput Center, University of Helsinki, Biomedicum, Helsinki, Finland
2 Department of Molecular Genetics and Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
3 Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
4 Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
5 Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
6 Harvard-MIT Division of Health Sciences and Technology (HST), Harvard Medical School, Boston, MA, USA
7 Department of Computer Science, University of Helsinki, Helsinki, Finland
8 Department of Biosciences and Medical Nutrition, Karolinska Institutet, Sweden
9 Department of Molecular Biology, Radboud University Nijmegen, Nijmegen, The Netherlands

*Corresponding author. Department of Biosciences and Medical Nutrition, Karolinska Institutet, Sweden. Tel.: +46 858 583 833; E-mail: [email protected]

The EMBO Journal (2010) 29: 2147-2160
https://doi.org/10.1038/emboj.2010.106

Members of the large ETS family of transcription factors (TFs) have highly similar DNA-binding domains (DBDs), yet they have diverse functions and activities in physiology and oncogenesis. Some differences in DNA-binding preferences within this family have been described, but they have not been analysed systematically, and their contributions to targeting remain largely uncharacterized. We report here the DNA-binding profiles for all human and mouse ETS factors, which we generated using two different methods: a high-throughput microwell-based TF DNA-binding specificity assay, and protein-binding microarrays (PBMs). Both approaches reveal that the ETS-binding profiles cluster into four distinct classes, and that all ETS factors linked to cancer, ERG, ETV1, ETV4 and FLI1, fall into just one of these classes. We identify amino-acid residues that are critical for the differences in specificity between all the classes, and confirm the specificities in vivo using chromatin immunoprecipitation followed by sequencing (ChIP-seq) for a member of each class. The results indicate that even relatively small differences in in vitro binding specificity of a TF contribute to site selectivity in vivo.

Introduction

We currently know very little about the molecular mechanisms that control tissue-specific gene expression, and about the variations in gene expression that underlie many pathological states, including cancer. This is in part due to the lack of information about the ‘second genetic code’: the binding specificities of transcription factors (TFs). Deciphering this regulatory code will allow us to explain observations (i.e. ChIP, expression profiling) based on biochemical principles. The ultimate aim is to read the genetic code of gene expression, that is, to understand the expression of genes based on DNA sequence. To begin to address these questions, we have in this work concentrated on the study of the large ETS family of TFs, whose members have diverse functions and activities in physiology and oncogenesis (Bartel et al, 2000; Sharrocks, 2001; Kumar-Sinha et al, 2008). The first ETS factor identified was ETS1, which was discovered as a homolog of the avian leukaemia virus E26 oncogene in 1983 (Leprince et al, 1983; Nunn et al, 1983). Subsequent analyses have identified a total of 27 and 26 ETS-family members in human and mouse genomes, respectively (Bult et al, 2008). ETS factors have both developmental functions (Schober et al, 2005), and functions in differentiated tissues and cells (Bartel et al, 2000). They are critical for vasculogenesis/angiogenesis, hematopoiesis and neuronal development (Bartel et al, 2000; Vrieseling and Arber, 2006). Cellular responses to activated ETS factors include cell proliferation, differentiation and migration (Sharrocks, 2001; Schober et al, 2005), depending on the type and state of the responding cell. Many ETS proteins, including ETV4 in mammals and Yan in Drosophila, are transcriptional targets of signalling pathways (Schober et al, 2005; Vrieseling and Arber, 2006). 
Activity of ETS proteins can also be modulated directly by phosphorylation; members of the ETS and ELK subgroups of ETS factors mediate transcriptional responses to Ras/MAPK signalling pathways in species ranging from Caenorhabditis elegans to humans (Brunner et al, 1994; O'Neill et al, 1994; Beitel et al, 1995; Sharrocks, 2001). The mechanism of activation of the ELK factors in response to activation of Ras also appears to be conserved between species (Wasylyk et al, 1997). Translocations altering the activity of several members of the ETS family are associated with multiple types of human cancer. In translocations observed in some cancer types, the ETS DNA-binding domain (DBD) is lost, and the ETS partner contributes a regulatory domain to another class of DBD (e.g. ETV6-RUNX1; Golub et al, 1995; Mavrothalassitis and Ghysdael, 2000). More commonly, the cancer-associated translocations result in fusion of a strong transcriptional activator domain to the ETS DBD (e.g. EWS fused to FLI1 or ERG in Ewing's sarcoma; Delattre et al, 1992; Sorensen et al, 1994) and/or overexpression of an ETS-family member due to introduction of a strong cis-regulatory element upstream of it (Tomlins et al, 2005, 2007). In fact, the most common known cancer-associated translocation is the TMPRSS2-ERG fusion, which introduces a strong androgen receptor (AR)-dependent regulatory element upstream of the ERG gene (Tomlins et al, 2005). Together with other translocations involving ETV1 and ETV4, over 50% of all prostate cancer cases display hyperactivity of ETS proteins (Kumar-Sinha et al, 2008). All ETS factors share a conserved winged helix-turn-helix DBD of ∼85 amino acids, and all analysed members of this family bind to a consensus DNA sequence containing a core 5′-GGA(A/T)-3′ motif (Karim et al, 1990; Nye et al, 1992). On the basis of phylogenetic analysis of the DBDs, the ETS family has been subdivided into 12 different subgroups (Laudet et al, 1999; Hollenhorst et al, 2007). 
Thus, although all ETS DBDs are relatively highly conserved, different ETS proteins might prefer different flanking sequences, bind differentially to specific DNA sites, and thus regulate distinct biological processes. However, there exists no systematic and uniform analysis of ETS-binding specificities, and it is not known whether differences in binding specificity (if any) relate to targeting in vivo. An earlier analysis showed that there were differences between published motifs for different ETS-family members, but these differences did not reflect amino-acid features, and might be due to differences in the experimental methods used in different studies (Kielbasa et al, 2005). In this work, we describe the first comprehensive genome-wide analysis of binding specificities of the ETS TF family. We find that the ETS-family DNA-binding specificities fall into four distinct classes, and confirm this finding by identifying the key DNA-contact amino acids that contribute to class specificity. We further perform ChIP-seq analyses for representative ETS factors to map the ETS-binding sites in vivo in Ewing's sarcoma, leukaemia and prostate cancer cells. These analyses provide a systematic genome-wide map of ETS DNA-binding specificities in vitro and in vivo. Remarkably, the genome-wide data reveal that even small differences in ETS DNA-binding preferences can contribute to in vivo targeting specificities.

Results

Systematic determination of ETS-binding specificities

To determine the binding specificities of the ETS factors, we first cloned all human and mouse ETS DBDs and human ETS full-length cDNAs (Figure 1; Supplementary Table S1). Two parallel methods were used to independently determine relative DNA sequence-specific binding affinities: a high-throughput microwell-based TF DNA-binding specificity assay (Hallikas and Taipale, 2006; Hallikas et al, 2006) and protein-binding microarrays (PBMs; Berger et al, 2006). 
As these two strategies are based on different principles, they act to complement and cross-validate each other.

Figure 1. Structural organizations and binding specificities of mammalian ETS transcription factors. (Left) Schematic representation of the domain structures of the respective full-length proteins. ETS domain is in blue, pointed domain is in green, Proline-rich domain is in grey, and the Nuc_orp_HMR_rcpt and A/T hook domains are in dark yellow and black, respectively. HUGO gene names are from ENSEMBL and protein domains are from Pfam. The second and third columns, respectively, show human and mouse ETS-binding profiles determined using microwell-based transcription factor-DNA-binding assays. The right column shows mouse ETS-binding profiles determined using protein-binding microarrays. The logos are drawn using enoLOGOS (Workman et al, 2005), and the height of a letter at a particular position is directly proportional to the effect of that nucleotide on the binding affinity. Coordinates for the bases are also indicated above each column (see also Supplementary Figures S1 and S9; Supplementary Tables S1, S2 and S6; Supplementary data file S1).

In the microwell-based assay, human and mouse ETS DBDs were expressed as fusion proteins to a Renilla luciferase enzyme. The TF-Renilla luciferase fusion proteins were incubated with biotinylated double-stranded oligonucleotide containing a sequence with high affinity to all known ETS factors in the presence of an excess of mismatched competitor oligos. The binding data were then analysed to produce a position weight matrix (PWM) of the TF-binding site. Independent analysis of the mouse ETS family was carried out using PBMs, which allow determination of TF-binding specificities through sequence-specific binding of individual TFs directly to double-stranded DNA microarrays containing all possible 10-mer binding sites (Berger et al, 2006). 
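
To make the notion of a PWM concrete, the following is a minimal sketch in Python, not the analysis pipeline used in this study. It assumes that, for each position of the site, a relative binding signal is available for each of the four bases; the input values, the function names (`pwm_from_relative_binding`, `log_odds_score`) and the pseudocount are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def pwm_from_relative_binding(rel_binding, pseudo=0.01):
    """Normalise per-position relative binding signals into a PWM.

    rel_binding: list with one dict per position, mapping each base to a
    non-negative relative binding signal (hypothetical input format).
    A small pseudocount keeps every probability above zero.
    """
    pwm = []
    for column in rel_binding:
        total = sum(column[b] + pseudo for b in BASES)
        pwm.append({b: (column[b] + pseudo) / total for b in BASES})
    return pwm

def log_odds_score(pwm, site, background=0.25):
    """Sum of per-position log2 odds of the site against a uniform background."""
    return sum(math.log2(pwm[i][base] / background) for i, base in enumerate(site))

# Toy values for a 4-bp GGA(A/T)-like core; illustrative only, not measured data.
toy_signal = [
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.7, "C": 0.0, "G": 0.0, "T": 0.3},
]
pwm = pwm_from_relative_binding(toy_signal)

for site in ("GGAA", "GGAT", "GGCA"):
    print(site, round(log_odds_score(pwm, site), 2))
```

In this toy matrix, GGAA scores higher than GGAT, and a site that breaks the invariant core (GGCA) scores far lower than either, which mirrors how a PWM separates strong sites from weaker ones.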
Both methods generated similar binding profiles (Figure 1; Supplementary Table S2), with all of the ETS factors binding to the previously described core GGA(A/T) motif. Of the 27 factors we studied, 13 had been previously analysed using different methods to yield a partial binding specificity. Our results were similar, but not identical, to these earlier studies, as described in the following sections.

Analysis of the divergence of ETS TF-binding profiles

We next analysed the differences in the obtained profiles to determine which ETS factors have similar binding specificities. For this purpose, we developed a computational method that allows determination of similarity between TF motifs using the minimum Kullback–Leibler divergence between all translations and reverse complementations of the multinomial distributions defined by the motifs. This analysis revealed that all ETS profiles were relatively similar to each other, and clearly divergent from publicly available non-ETS TF-binding profiles (Figure 2). The ETS-binding profiles fell into four distinct classes (Figure 2), containing 15, 8, 3 and 1 member(s), respectively. These classes were robustly identified using results either only from the microwell-based method (Figure 2), from only the PBM method (Figure 3A and B) or from the combination of the two (Supplementary Figure S1A). The classes were named according to their respective sizes, with class I being the largest group, containing the cancer-associated ETS factors ERG, ETV1, ETV4 and FLI1. Consistent with earlier results (Kielbasa et al, 2005), clustering analysis of ETS factors available from current databases and literature did not yield a clear classification of sites (Figure 3C; Supplementary Figure S3). However, the classes we obtained do show a clear relationship to groupings based on amino-acid features (see below and Discussion).

Figure 2. ETS-binding specificity. 
Clustering analysis of binding profiles of human (h) and mouse (m) ETS transcription factors (microwell method) and publicly available non-ETS-family transcription factor matrices from Jaspar2 (Bryne et al, 2008; http://jaspar.genereg.net). The four different classes of ETS factors are indicated by colour: class I, red; class II, blue; class III, green; class IV, brown. Coloured dots indicate the main branches defining the classes. ETS matrices indicated as ‘class’ are the representative matrices for the different ETS classes identified using affinity propagation clustering (see Materials and methods for details). Representative logos, drawn using enoLOGOS (Workman et al, 2005), are also shown. Bases displaying the most prominent changes are boxed (see also Supplementary Figures S2 and S6; Supplementary data file S1).

Figure 3. Identification of four ETS classes is independent of clustering and binding model derivation methods used. (A) Heat-map correlation analysis of the protein-binding microarray-derived ETS-binding models. The same four ETS classes are detected when protein-binding microarray results are clustered using the top 100 TF-binding sites for each ETS-family member (analysis as in Berger et al, 2008). In all, 20 known and 2 predicted mouse ETS-family members are included in the analysis. (B) Kullback–Leibler divergence-based clustering analysis of mouse ETS-binding profiles derived from protein-binding microarrays. Note that this analysis also reveals the same four different classes of ETS factors, which are indicated in the same colours as in Figures 1 and 2. Coloured dots indicate the main branches defining the classes. 
(C) Clustering of existing ETS family binding profiles from JASPAR2 (JA), TRANSFAC (TR) professional and literature (Nye et al, 1992; Treisman et al, 1992; Woods et al, 1992; Dalton and Treisman, 1992; Virbasius et al, 1993; Ray-Gallet et al, 1995; Shore and Sharrocks, 1995; Matys et al, 2006; Choi and Sinha, 2006; Bryne et al, 2008). Note that the four separate ETS classes are not identified using the earlier data (see also Supplementary Figures S3 and S6).

The main differences between our motifs are concentrated on the core +4 position and 5′ flanking base pairs. Although the consensus sequences of the ETS factors are relatively similar, many somewhat weaker sites are much more class specific or exclude one or more classes of ETS DBDs (Supplementary Figure S2). Only the difference between other ETS-family members analysed and the lone class IV factor SPDEF has been identified earlier (Oettgen et al, 2000). In general, the class definitions derived using hierarchical clustering seemed to be largely sufficient to explain the differences between the ETS-family members. However, ETV6 and ETV7 appeared to have subtly different binding specificity at +4 compared with the other members of class II (Figure 1), and in this way resembled more the class III factors. We therefore propose subclassification of class II into class IIa containing the ELF-family factors, and class IIb comprising ETV6 and ETV7.

Molecular basis of ETS-class specificity

To analyse the molecular basis of the differences in ETS-binding specificities, we investigated the amino acid-DNA contacts in published crystal structures of ETS1, GABPA, ELK1, ELF3, SPI1 and SPDEF–DNA complexes (Kodandapani et al, 1996; Batchelor et al, 1998; Mo et al, 1998, 2000; Garvie et al, 2001; Verger and Duterque-Coquillaud, 2002; Pufall et al, 2005; Wang et al, 2005; Lamber et al, 2008; Agarkar et al, 2010). 
The invariant GGA core bases of the ETS-family profiles are consistent with the absolute conservation of two key DNA-contacting arginines in Helix3 (Figure 4A, black). Most of the differences in DNA-binding specificity at particular bases, in turn, correlate with corresponding changes in residues contacting DNA at or near these bases (Figure 4B). The preference of the lone class IV factor, SPDEF, for T at +4 correlates with the presence of serine and glutamine at DNA-contact residues 9 and 11, respectively. Recent crystal structure analysis of the SPDEF–DNA complex suggested that the combination of these residues is responsible for the preference for T at +4 (Wang et al, 2005). We confirmed the importance of these two residues by mutagenesis followed by microwell-based DNA-binding specificity assay (Figure 4C).

Figure 4. Molecular basis of ETS-class specificity. (A) ETS-domain secondary structure indicating key amino acids contacting nucleotide bases (black lines), DNA-backbone (brown lines) and water (blue line) based on the published crystal structures of ETS-domain DNA complexes (Kodandapani et al, 1996; Batchelor et al, 1998; Mo et al, 1998, 2000; Garvie et al, 2001; Pufall et al, 2005; Wang et al, 2005; Lamber et al, 2008; Agarkar et al, 2010). Amino-acid residues contacting DNA are numbered from 1 to 15. Two invariant arginines that bind to the core GGA sequence are in black typeface. Bases contributing to DNA-binding specificity are numbered from −3 to +7. (B) (Left) Sequence logos showing amino-acid conservation in DNA-contacting regions of the different ETS classes. Amino acids that are specific for a given class are indicated by colours, and the two invariant arginines are in black. (Right) Representative PWMs for the ETS classes. Bases that distinguish each class from the others are boxed, and residues that contact bases, water or DNA backbone are indicated in black, blue or yellow lines, respectively. 
(C) Identification of amino-acid residues that are required for class IV DNA-binding specificity. Mutating key DNA-contact residues in ETS class IV factor SPDEF (top) to the corresponding residues in ETS class IIa (bottom left) results in a change in DNA-binding specificity of SPDEF towards class IIa (bottom right; data from microwell assay). (D) Identification of amino-acid residues that can confer class I, IIb or III DNA-binding specificity to a class IIa ETS factor. Indicated residues in class IIa ETS DNA-binding domain from mouse ELF4 were mutated to the corresponding residues in the other ETS classes. The resulting DNA-binding profile from microwell assay is shown on the right. Bases displaying the most prominent changes are boxed. Note that o