Article
20 January 2009
Open Access

An integrated workflow for charting the human interaction proteome: insights into the PP2A system

Timo Glatter1,2,‡, Alexander Wepf1,2,‡, Ruedi Aebersold1,2,3,4 and Matthias Gstaiger*,1,2

1 Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
2 Competence Center for Systems Physiology and Metabolic Diseases, ETH Zurich, Zurich, Switzerland
3 Faculty of Science, University of Zurich, Zurich, Switzerland
4 Institute for Systems Biology, Seattle, WA, USA
‡ These authors contributed equally to this work
* Corresponding author. Institute of Molecular Systems Biology, ETH, Wolfgang Pauli Strasse 16, Zürich 8093, Switzerland. Tel.: +41 44 633 71 49; Fax: +41 44 633 10 51; E-mail: [email protected]

Molecular Systems Biology (2009) 5: 237
https://doi.org/10.1038/msb.2008.75

Protein complexes represent major functional units for the execution of biological processes. Systematic affinity purification coupled with mass spectrometry (AP-MS) has yielded a wealth of information on the compendium of protein complexes expressed in Saccharomyces cerevisiae. However, global AP-MS analysis of human protein complexes is hampered by the low throughput, sensitivity and data robustness of existing procedures, which limit its application for systems biology research. Here, we address these limitations with a novel integrated method, which we applied and benchmarked for the human protein phosphatase 2A system.
We identified a total of 197 protein interactions with high reproducibility, showing the coexistence of distinct classes of phosphatase complexes that are linked to proteins implicated in mitosis, cell signalling, DNA damage control and more. These results show that the presented analytical process will substantially advance throughput and reproducibility in future systematic AP-MS studies on human protein complexes.

Synopsis

The majority of proteins function in the context of larger protein complexes. Affinity purification coupled with mass spectrometry (AP-MS) has become the method of choice for systematic and direct experimental analysis of protein complexes under near-physiological conditions. Although considerable progress has been made on the systematic AP-MS analysis of the yeast compendium of protein complexes (Gavin et al, 2006; Krogan et al, 2006), relatively little progress has been reported on the corresponding organization of the human interaction proteome. Despite recent improvements in mass spectrometry instrumentation, the size of the human proteome and the estimated 225 000 protein interactions (Hart et al, 2006) challenge existing experimental AP-MS workflows with respect to throughput, sensitivity and data robustness.

In this study, we have developed and evaluated an integrated experimental workflow to facilitate system-wide analysis of human protein complexes. We benchmarked the overall performance of the presented workflow using the human PP2A phosphatase system and show how it can be used to increase data robustness and throughput in future AP-MS studies on the human interaction proteome. The presented workflow builds on the increasing availability of gateway-compatible orfeome resources and FRT-mediated recombination for high-throughput generation of isogenic bait-expressing cell lines within 2 weeks.
Expression in these cell lines can be controlled by a tetracycline-inducible promoter to maintain homogeneous expression at close to physiological levels throughout the cell population. We have replaced the widely used classical 21 kDa TAP tag with a novel small double-affinity tag to increase sample processing speed and achieve purification yields of up to 40%. Samples purified by this procedure can be analysed readily by a direct liquid chromatography tandem mass spectrometry (LC-MS/MS) approach without the need for the further SDS–PAGE fractionation commonly used in previous workflows. Direct LC-MS/MS analysis reduces the number of experimental steps and contributes to the overall reproducibility of the approach, which we benchmarked for the human PP2A phosphatase system.

The evolutionarily conserved serine/threonine phosphatase PP2A has been linked to a wide range of cellular processes including transcription, apoptosis, cell growth and cellular transformation (Virshup, 2000; Janssens et al, 2005; Westermarck and Hahn, 2008). The human genome encodes two catalytic subunits (PPP2CA, PPP2CB), two scaffolding subunits (PPP2R1A, PPP2R1B) and at least 15 known regulatory B subunits, which, by combinatorial assembly, can potentially form a multitude of different trimeric PP2A complexes (Janssens and Goris, 2001; Lechward et al, 2001). It is believed that the versatile nature of this combinatorial subunit arrangement provides substrate specificity as well as temporal and spatial control of phosphatase activity. So far, no systematic study has been performed to characterize the set of PP2A complexes that coexist in human cells and to understand how these complexes are connected to specific cellular processes at the level of protein–protein interactions.
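
The size of this combinatorial space can be made concrete with a little arithmetic. The sketch below is only an upper bound: it assumes that every catalytic, scaffolding and regulatory subunit can pair freely and that there are exactly 15 regulatory B subunits (the text says "at least 15"). The labels `B01` to `B15` are hypothetical placeholders, not gene names from this study.

```python
from itertools import product

# Catalytic and scaffolding subunit names are taken from the text.
catalytic = ["PPP2CA", "PPP2CB"]
scaffold = ["PPP2R1A", "PPP2R1B"]

# Hypothetical placeholder labels standing in for 15 regulatory B subunits.
regulatory = [f"B{i:02d}" for i in range(1, 16)]

# Enumerate every catalytic/scaffold/regulatory combination. Free pairing is
# an assumption, so this is an upper bound, not a measured number of complexes.
trimers = list(product(catalytic, scaffold, regulatory))
n_trimers = len(trimers)

print(n_trimers)  # 2 * 2 * 15 = 60
```

Which of these theoretical combinations actually form in cells cannot be derived from such a count; that is the question the interaction data are meant to address.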
We have analysed 11 bait proteins selected from the human protein phosphatase 2A (PP2A) system and identified 197 protein interactions with a reproducibility rate of 85% between two biological replicate experiments (Figure 4A). This is among the highest rates reported so far for systematic AP-MS/MS workflows. For further validation, we compared the data to information from the literature and public databases. About two-thirds of the 197 interactions either have been reported previously in the literature or were related to interactions known between human paralogous or yeast orthologous proteins.

On the basis of interaction information alone, it is difficult to infer the presence and composition of protein complexes. However, in the case of human PP2A, a significant amount of published structural and biochemical data provides valuable information on the composition of several distinct groups of phosphatase complexes (Lechward et al, 2001; Chao et al, 2006; Leulliot et al, 2006; Xu et al, 2006; Xing et al, 2008). We used this information to assign the 150 paralogous interactions identified in our network to five groups of known phosphatase complexes, here referred to as modules (Figure 6). These include the group of trimeric PP2A complexes described above, which represent the majority of PP2A complexes we found, as well as PP2A complexes containing the proteins IGBP1/TAP42 or the protein phosphatase methylesterase (PPME1), in addition to PPP4C-containing phosphatase complexes. We estimate that, overall, more than 30 distinct phosphatase complexes coexist in human embryonic kidney cells. On the basis of their interactions with other cellular proteins, these complexes may have specific functions in transcription, cell signalling, DNA damage control and the regulation of mitosis.
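
How a reproducibility rate between two biological replicates might be computed is sketched below. The exact definition behind the 85% figure quoted above is not given in this section, so the overlap-over-union measure, the function name `reproducibility_rate` and the toy bait-prey pairs are all assumptions made for illustration.

```python
def reproducibility_rate(rep1, rep2):
    """Fraction of all observed bait-prey pairs that were seen in both replicates.

    Assumed definition (overlap / union); the study may use a different one.
    """
    rep1, rep2 = set(rep1), set(rep2)
    union = rep1 | rep2
    if not union:
        raise ValueError("no interactions in either replicate")
    return len(rep1 & rep2) / len(union)


# Toy (bait, prey) pairs; illustrative only, not results from the study.
replicate_1 = {("PPP2R2B", "PPP2R1A"), ("PPP2R2B", "PPP2CA"), ("PPP2R2B", "CCT2")}
replicate_2 = {("PPP2R2B", "PPP2R1A"), ("PPP2R2B", "PPP2CA"), ("PPP2R2B", "HSPA8")}

rate = reproducibility_rate(replicate_1, replicate_2)
print(f"{rate:.2f}")
```

Other conventions are equally plausible, for example dividing the overlap by the size of one replicate rather than by the union, and they give different numbers for the same data.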
The presented results thus confirmed and significantly extended our knowledge of combinatorial complex assembly as a molecular principle for the functional diversification within the human PP2A phosphatase system. When we compared our interaction data with interaction data available for the corresponding yeast orthologous proteins, we found that the interactions, particularly within the modules mentioned above, are highly conserved. Furthermore, the comparison suggested that functional diversification within the human phosphatase system primarily involved an expansion of regulatory phosphatase subunits and their protein interactions, as the number of PP2A catalytic subunits is the same in humans and yeast.

Large-scale AP-MS represents the method of choice to retrieve high-quality information on the global organization of the human proteome into protein complexes, which in most cases represent the actual functional units of biochemical systems. A comprehensive representation of the human interaction proteome will require a collective effort by the research community using improved analytical workflows with increased throughput, sensitivity and reliability. We believe that the advances collectively achieved by the integrated workflow presented here mark a significant step forward towards these goals.

Introduction

The majority of proteins exert an effect in the context of macromolecular assemblies that are part of dynamic networks of enormous complexity. Cellular processes, such as cell signalling, proliferation, apoptosis and cell growth, emerge to a large extent from the properties of such networks. Hence, understanding and modelling of cellular processes in healthy and pathological conditions depend on comprehensive and robust information on the topology and the dynamic properties of the engaged protein networks.
Initially, large-scale protein interaction studies were performed with the yeast two-hybrid technology, which provided insights into global patterns of binary protein interactions of model organism proteomes (Uetz et al, 2000; Walhout et al, 2000; Ito et al, 2001). More recently, affinity purification coupled with mass spectrometry (AP-MS) has become the method of choice for the analysis of protein complexes under near-physiological conditions (Gingras et al, 2007; Kocher and Superti-Furga, 2007). Large-scale AP-MS studies performed in yeast provided the first comprehensive set of high-density interaction data, which became an invaluable source of information for yeast systems biology (Gavin et al, 2002, 2006; Ho et al, 2002; Krogan et al, 2006). The success in yeast can be mainly attributed to the high efficiency of homologous recombination that allowed genome-wide tagging of yeast ORFs as a valuable resource for large-scale AP-MS studies. However, no such genetic system exists for multicellular eukaryotes. Given the various cell types and cellular states, each characterized by specific protein–protein interaction networks, the complexity of the human proteome and the limited genetic methods available to generate cell lines expressing affinity-tagged proteins, global analysis of protein complexes and protein interaction networks in human cells is a daunting task. Progress towards this goal will strongly depend on efficient and robust AP-MS workflows for human cells that provide comprehensive as well as high-confidence protein complex information to populate public databases. The robustness and reproducibility of such methods are key because it can be anticipated that data from different studies and research groups must be combined to achieve saturation coverage of the human interaction proteome. However, false discovery and reproducibility rates are not known for existing methods, which makes the combination of AP-MS data from different studies difficult.
In addition, present AP-MS strategies are limited by the labour-intensive generation of large collections of human cell lines for expression of epitope-tagged bait proteins, the low yield of protein complex isolation from such cell lines and the limited sensitivity of MS-based protein identification. To overcome some of these major limitations, we have developed an integrated experimental workflow. Besides optimizing each experimental step, we focused on the compatibility of the steps with each other to generate a process with improved performance. As a result, the proposed procedure significantly enhanced the throughput of generating bait-expressing cell lines, increased the protein complex purification yields by a novel double-affinity strategy and allowed analysis of protein complexes and interaction networks with high sensitivity and reproducibility.

We applied this procedure to study a network of human protein phosphatase 2A (PP2A) complexes. PP2A is a heterotrimeric, evolutionarily conserved serine/threonine phosphatase with regulatory functions in a wide range of cellular processes, including transcription, apoptosis, cell growth and cellular transformation (Virshup, 2000; Lechward et al, 2001). The human genome encodes two catalytic subunits (PPP2CA, PPP2CB), two scaffolding subunits (PPP2R1A, PPP2R1B) and at least 15 known regulatory B subunits that, by combinatorial assembly, can potentially form a multitude of different trimeric PP2A complexes (Janssens and Goris, 2001; Lechward et al, 2001). It is believed that the versatile nature of this combinatorial subunit arrangement provides substrate specificity as well as temporal and spatial control of phosphatase activity. However, no systematic study has yet been performed to determine which PP2A complex forms actually coexist in human cells and how these complexes are connected to specific cellular processes through protein–protein interactions.
Using the method described in this work, we identified 197 specific protein–protein interactions at a reproducibility rate of at least 85%. The discovered interactions constitute a network of different classes of concurrently present phosphatase complexes that in turn are linked to proteins with specific functions in cell signalling, mitosis, DNA repair and more. On the basis of these results, we believe that the proposed analytical procedure will significantly improve the scope and reproducibility of future AP-MS studies on the human interaction proteome.

Results

An integrated workflow for systematic AP-MS studies on human protein complexes

Affinity purification coupled with mass spectrometry analysis of protein complexes can be grouped into three sequential steps: production of cell lines expressing epitope-tagged bait proteins, protein complex purification and MS-based analysis of the isolated samples. Each step contributes to the overall performance of the process. To generate a robust and reproducible workflow for the characterization of human protein complexes, we have optimized each step and integrated them into an efficient process. In doing so, we paid particular attention to the compatibility of the steps with one another. The system builds on (i) gateway-compatible orfeome collections and the Flippase (Flp) recombination system to rapidly generate large collections of human cell lines by homologous recombination for isogenic and tetracycline (tet)-controlled expression of tagged bait proteins, (ii) the development of a novel double-affinity purification strategy to significantly increase sample recovery and reproducibility and (iii) direct liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis of purified complexes to improve the sensitivity of protein identification.
In what follows, we describe the individual steps of the workflow and document its performance for systematic protein complex analysis as shown for the human PP2A protein interaction network.

Rapid generation of cell lines for inducible expression of affinity-tagged bait proteins

A major bottleneck in large-scale AP-MS analyses in species other than Saccharomyces cerevisiae has been the resource-intensive generation of cell lines expressing epitope-tagged bait proteins, preferably at controlled levels, required for protein complex purification. Here, we combined recombinational cloning of expression constructs from human orfeome libraries with homologous recombination using Flp recombinase in human cells to significantly increase the production rate for such human cell lines (Figure 1A). We used a gateway-compatible orfeome collection containing 12 212 ORFs, representing 10 214 non-redundant protein-coding genes (Lamesch et al, 2007), as a resource to generate expression constructs by LR recombination with a destination vector suitable for tetracycline-controlled expression of affinity-tagged bait proteins. The presence of a Flp recombination target site (FRT) in the resulting expression constructs supported the rapid generation of bait-expressing cell lines by Flp-mediated recombination with a single FRT site present in the HEK293 host cell line (O'Gorman et al, 1991). The system was evaluated with respect to the following properties. (i) Efficiency and reliability. We routinely obtained isogenic human HEK293 cell pools within 2 weeks after transfection with a success rate of about 85% (n>200 transfected ORF clones, data not shown) without further need for subcloning. (ii) Uniformity. The FRT recombination system ensures uniform expression of the transgene in the respective cell populations as demonstrated by indirect immunofluorescence microscopy of HEK293 cell lines expressing different epitope-tagged proteins from the human PP2A system (Figure 1B).
(iii) Inducible bait expression. The ability to control bait protein expression levels is crucial in cases in which growth inhibitory or pro-apoptotic bait proteins are expressed. Expression levels of bait proteins could be induced and adjusted by tetracycline (Figure 1C) and were comparable with corresponding endogenous protein levels (Figure 1D). In conclusion, orfeome-based generation of human cell lines using the FRT system is an efficient and reliable method for the generation of large collections of isogenic cell pools for tetracycline-controlled expression of affinity-tagged bait proteins at close to physiological levels.

Figure 1. Rapid generation of cell line collections for isogenic and inducible bait expression using Flp-recombinase-mediated recombination. (A) Schematic overview of the generation of cell line collections. Starting from human Gateway orfeome collections, cDNAs of interest are recombined into an expression construct for tetracycline (tet)-inducible expression of strep-hemagglutinin double-tagged (SH) bait proteins. Isogenic cell lines are generated using Flp-recombinase-mediated recombination through single FRT sites present in the expression construct and the genome of Flp-In HEK293 cells stably expressing the tet repressor. After transfection, cell lines are selected on hygromycin for 2 weeks, tested for tetracycline-inducible expression and used for subsequent affinity purification. (B) Isogenic bait protein expression in HEK293 Flp-In cells. The expression of the indicated recombinant proteins in the absence or presence of 1 μg/ml tetracycline for 24 h was visualized by indirect fluorescence microscopy with an anti-HA antibody. Nuclei were stained with DAPI. (C) Tet-inducible bait expression. Increasing amounts of tetracycline were added to HEK293 cells expressing SH–eGFP for 24 h. Bait expression was monitored by immunoblotting using anti-HA antibodies.
(D) Comparison of protein expression levels of SH-tagged bait proteins with endogenous protein levels. HEK293 cell lines expressing SH-tagged bait proteins were analysed by immunoblotting using the indicated antibodies following induction with tetracycline (1 μg/ml) for 24 h. HEK293 cells that do not express the corresponding bait proteins were used as controls. Note that anti-PPP2C and anti-PPP2R1 antibodies do not distinguish between the highly related endogenous proteins PPP2CA and PPP2CB or PPP2R1A and PPP2R1B, respectively.

Increased yields in protein complex preparations by SH-double affinity purification

So far, tandem affinity purification (TAP) has been the most widely used procedure in systematic protein complex purification (Rigaut et al, 1999; Bouwmeester et al, 2004; Gingras et al, 2005; Gavin et al, 2006). However, these studies required large amounts of cellular starting material due to the low purification yields obtained by the TAP procedure (Al-Hakim et al, 2005; Gregan et al, 2007). This imposes significant economic and logistical challenges, especially for large-scale studies. To improve the yield, we integrated a novel double-affinity purification protocol into our analytical workflow. Protein complexes are isolated through a small double-affinity tag (SH-tag) consisting of a streptavidin-binding peptide and a hemagglutinin (HA) epitope tag. In addition, we optimized the purification protocols for efficient double-affinity purification from low amounts of starting material. Following induction of isogenic bait expression using tetracycline, SH-tagged and associated proteins are first bound to an affinity column containing a modified version of streptavidin (Junttila et al, 2005) and specifically eluted with biotin onto an anti-HA antibody column. The protein complexes are eluted from this column at low pH (Figure 2A).
Western blotting showed that SH-PPP2R2B from HEK293 cell extracts could be bound near quantitatively to the streptavidin column (Figure 2B, SNS). Elution from the streptavidin column with biotin was highly efficient, with almost no detectable bait protein left on the streptavidin beads following elution with SDS Laemmli buffer (not shown). Overall, the first purification step recovered more than 90% of bait protein (Figure 2B, ES). From the western blot signal intensity of the final eluate (Figure 2B, EH), we estimate the overall yield of the double purification at about 30–40% of bait protein present in the cell lysate, which is among the highest reported for double-affinity purification protocols. Importantly, the high yields were achieved independent of the bait protein, as SH-purifications of 11 different bait proteins showed comparable yields (Figure 2C).

Figure 2. Monitoring SH-double-affinity purification efficiency. (A) Schematic overview of the purification procedure. HEK293 cells expressing SH-tagged proteins are lysed and first purified from total protein extracts using streptavidin sepharose (Strep-Tactin beads). After several wash steps, purified proteins are released in the presence of 2 mM biotin for subsequent immunoaffinity purification using anti-HA agarose. Finally, protein complexes are eluted with 0.2 M glycine, pH 2.5, and processed for mass spectrometric analysis. (B) Western blot analysis of SH-purification yields. An SH-PPP2R2B-expressing HEK293 cell line was generated as described in Figure 1 and the purification was performed on 3 × 10⁷ cells. The purification procedure was monitored by immunoblotting using anti-HA antibodies. L: lysate; SNS: supernatant after streptavidin purification; ES: elution from the streptavidin sepharose; SNH: supernatant after anti-HA purification; EH: elution from the anti-HA agarose (final eluate). Information on the percentage of input is given to compare the relative amount of sample loaded on each gel lane.
(C) Reproducibility of SH-purification yields. Eleven cell lines inducibly expressing SH-tagged bait proteins related to the PP2A phosphatase system were generated as described. Lysate (L) and final eluate (EH) from all 11 bait-specific SH-purifications were immunoblotted using anti-HA antibodies, and the percentage of loaded sample amount is indicated.

Monitoring specificity and sensitivity of the SH-purification

Protein complex preparations typically contain significant amounts of co-purifying contaminant proteins that increase sample complexity, reduce the sensitivity of detection of true interactors and complicate the interpretation of the results. We first monitored the sample complexity during the SH-purification step of our workflow by silver staining after biotin (ES) and final elution (EH) using the PP2A regulatory B subunit PPP2R2B as a bait (Figure 3A). Biotin eluates contained high molecular weight contaminants, which may interfere with the identification of less-abundant interaction partners by direct LC-MS/MS. These contaminants were efficiently removed by applying the second purification step as shown by silver staining and direct LC-MS/MS (Figure 3A, Supplementary Table I).

Figure 3. Analysis of SH-double-affinity purification specificity. (A) Silver stain monitoring of sample complexity. SH-double-affinity purification was performed on 3 × 10⁷ HEK293 cells expressing SH-PPP2R2B, and aliquots of the first (ES) and second (EH) purification step were separated by SDS–PAGE. (B) Quantitative MS analysis of the specificity increase by the second purification step. HEK293 SH-PPP2R2B lysates were split in half. Single and double-affinity purifications were performed as described before. Peptides derived from single (ES) and double-affinity purification (EH) were mixed in a three-step dilution.
Following MS analysis, MS spectra were aligned and MS1 signal intensities were used for relative quantification to generate protein abundance profiles across the three samples. All presented protein profiles were generated from aligned MS1 features that correspond to at least five unmodified, fully tryptic peptides. Note that not all observed protein profiles are included here, as some did not pass the indicated filtering criteria (e.g. PPP2CA). Protein abundance profiles were normalized to the bait profile. Yellow lines correspond to specific binding partners that match the bait profile. Orange lines refer to unspecifically co-purifying proteins identified also in SH–eGFP control samples. Profiles in red represent proteins unspecifically co-purified with the streptavidin sepharose beads that are successfully removed after the second purification step. Error bars indicate s.e.m. of MS1 feature ratios of the indicated proteins.

In addition, we used the recently developed MasterMap concept to illustrate the specificity increase achieved by the second purification step by monitoring the relative abundance profiles of co-purifying proteins (Rinner et al, 2007). This approach allows label-free protein quantification from aligned MS1 spectra obtained on a high mass accuracy instrument. MS1 quantification of the proteins identified with at least five fully tryptic unmodified peptides revealed three major groups of co-purifying proteins. (i) One set of proteins (Figure 3B, red lines) was efficiently removed by the second purification. These proteins were also identified in SH–eGFP control samples (Supplementary Table II) and include a group of abundant carboxylases (e.g. MCCC1, PCCA, ACACA) that are known to be biotinylated in vivo (Gravel and Narang, 2005), and hence most likely interact with the streptavidin column independent of the bait protein.
(ii) A second group of proteins, including HSP70 chaperones (HSPA5, HSPA6 and HSPA8), remained in the sample even after the second purification step (Figure 3B, orange lines). These proteins most likely represent unspecific interactors that bind independently of the bait protein, as they were also identified in eGFP control experiments (Supplementary Table II). (iii) Finally, a group of specific interactors followed the profile of the bait protein but was absent in SH–eGFP control purifications. This group contained well-established interactors of the bait protein PPP2R2B, including PPP2R1A and PPP2R1B (Figure 3B, yellow lines).

Previous systematic studies were hampered by the large amount of cellular starting material required for AP-MS analysis. We performed SH-purifications from as few as 4 × 10⁶ HEK293 cells. Direct LC-MS/MS analysis of 25% of the tryptic digest was still sufficient to identify the PPP2R2B-interacting proteins PPP2CA, PPP2R1A and most of the associated subunits of the CCT complex (Supplementary Table III). As PP2A is regarded as an abundant phosphatase, we recommend using 3 × 10⁷ cells for standard SH-purification of protein complexes. In conclusion, the SH-double-affinity purification step is a central part of our workflow and results in protein complex preparations of high purity and, when combined with direct LC-MS/MS, it significantly reduces the amounts of cellu