Defining protein complexes is critical to virtually all aspects of cell biology. Two recent affinity purification/mass spectrometry studies in Saccharomyces cerevisiae have vastly increased the available protein interaction data. The practical utility of such high throughput interaction sets, however, is substantially decreased by the presence of false positives. Here we created a novel probabilistic metric that takes advantage of the high density of these data, including both the presence and absence of individual associations, to provide a measure of the relative confidence of each potential protein-protein interaction. This analysis largely overcomes the noise inherent in high throughput immunoprecipitation experiments. For example, of the 12,122 binary interactions in the general repository of interaction data (BioGRID) derived from these two studies, we marked 7504 as being of substantially lower confidence. Additionally, applying our metric and a stringent cutoff we identified a set of 9074 interactions (including 4456 that were not among the 12,122 interactions) with accuracy comparable to that of conventional small scale methodologies. Finally we organized proteins into coherent multisubunit complexes using hierarchical clustering. This work thus provides a highly accurate physical interaction map of yeast in a format that is readily accessible to the biological community.

Because most cellular functions are mediated by groups of physically associated proteins or complexes that work in a coherent fashion, it is of great interest to systematically map protein-protein interactions (PPIs). (The abbreviations used are: PPI, protein-protein interaction; TAP, tandem affinity purification; PE, purification enrichment; ROC, receiver operating characteristic; MIPS, Munich Information Center for Protein Sequences; SGD, Saccharomyces Genome Database; GO, Gene Ontology.) In Saccharomyces cerevisiae, these physical connections have been defined in large scale experiments using the yeast two-hybrid method (1: Ito et al., Proc. Natl. Acad. Sci. U. S. A. 2001; 98: 4569-4574; 2: Uetz et al., Nature 2000; 403: 623-627) as well as direct purification of complexes using affinity tags followed by mass spectrometry analyses. In 2002, two initial studies utilized the latter strategy on subsets of the proteome (3: Gavin et al., Nature 2002; 415: 141-147; 4: Ho et al., Nature 2002; 415: 180-183). Ho et al. (4) used an overexpression strategy combined with a single affinity purification step, whereas Gavin et al. (3) used a tandem affinity purification (TAP) system in which epitope-tagged proteins were expressed under normal physiological conditions. The use of an overexpression system may facilitate detection of weaker or more transitory associations between proteins or protein complexes but might be less optimal for accurate definition of stoichiometric interactions. Indeed the purification of proteins expressed under normal physiological conditions followed by mass spectrometry provided the best coverage and accuracy for detection of stable protein complexes (5: von Mering et al., Nature 2002; 417: 399-403).
Based on these considerations, two separate groups interrogated the physical interactome of S. cerevisiae using this strategy (6: Gavin et al., Nature 2006; 440: 631-636; 7: Krogan et al., Nature 2006; 440: 637-643). Although a similar approach was used for protein purification and identification, the resulting datasets were subjected to different analytical methods to define PPIs and protein complexes. Gavin et al. (6) exploited a “socio-affinity” scoring system that measures the log-ratio of the number of times two proteins are observed together relative to what would be expected from their frequency in the dataset. Importantly this approach takes advantage of not only direct bait-prey connections but also indirect prey-prey relationships where two proteins are each identified as preys in a purification in which a third protein is used as bait. Krogan et al. (7), on the other hand, used a synthesis of machine learning techniques including Bayesian networks and C4.5-based and boosted stump decision trees to define confidence scores for potential interactions based on direct bait-prey observations. The two groups also used different clustering algorithms to define protein complexes from their PPI datasets. For example, Krogan et al. (7) used a Markov clustering algorithm (8: Enright et al., Nucleic Acids Res. 2002; 30: 1575-1584) for definition of protein complexes, whereas Gavin et al. (6) utilized a different clustering approach to define complexes, each consisting of groups of proteins termed “core,” “module,” or “attachment.” Modules were intended to represent subcomplexes that are components of several distinct complexes, and attachments were factors less stably associated with stable core complexes.

Although both of these individual datasets are of high quality, it is not obvious how discrepancies between them should be resolved, and each still contains a substantial number of false positive interactions that can compromise the utility of these data for guiding more focused studies. In this study, we merged these two datasets into a single reliable collection of experimentally based PPIs by analyzing the primary affinity purification data using a novel purification enrichment (PE) scoring system.
Using a well defined reference set of manually curated PPIs, we demonstrated that our consolidated dataset is of greater accuracy than the individual sets and is comparable to PPIs defined using more conventional small scale methodologies.

Although algorithms designed to detect multiprotein complexes can be highly effective for extracting additional information from noisy and incomplete datasets, attempting to strictly define protein complexes may not be the optimal way to analyze such a high confidence dataset. In particular, any partitioning analysis must either group together distinct complexes that share one or more subunits or fail to correctly identify all of the components of such complexes. Additionally weak interactions between proteins or protein complexes may be lost. In this work, we subjected the entire high confidence PPI dataset to a relatively unbiased hierarchical clustering from which one can more easily identify shared components of distinct complexes as well as weak associations between complexes. We argue that this representation provides a convenient tool for biologists to gather information about a protein of interest rapidly. Finally this depiction potentially mimics the in vivo environment: a continuum of weak associations between stable protein complexes.

PE scores were modeled after a discriminant function for a Bayes classifier (9: Duda, Hart, and Stork, Pattern Classification, 2nd Ed., John Wiley and Sons, New York, 2001, pp. 20-63) as a measure of the likelihood of observed experimental results given the hypothesis that an interaction is genuine relative to the likelihood of the same results if the interaction is not real. These scores incorporate ideas from the socio-affinity scoring system reported by Gavin et al. (6) but differ in several significant ways. First, these scores take into account not only positive evidence for an interaction contained in the identification of two proteins in the same purification but also negative evidence against interactions wherein one protein fails to be identified as a prey when another is used as a bait. This negative evidence has typically not been used in previous interaction scoring techniques, and it can be particularly useful for distinguishing non-interacting pairs of proteins that share many interaction partners from pairs that do exist in stable complexes. Second, PE scores more powerfully exploit situations in which a particular bait protein was used in multiple separate purifications. Third, the PE scoring strategy uses a different model for the likelihood of observing a pair of proteins in the same purification if these proteins do not interact.

PE scores were motivated by the probabilistic framework of a (naïve) Bayes classifier. In a Bayes classifier, an estimate of the probability of one hypothesis (here that an interaction is real) relative to the probability of a second hypothesis (here that the interaction is not real), given a set of observations, is calculated to determine which hypothesis is more likely. Both of these probabilities are calculated using Bayes’ theorem, and a discriminant function f is calculated as the log-ratio of these probabilities. An interaction is classified as real if f > 0 and false if f < 0 (9).
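The decision rule just described can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `discriminant` and `is_real`, the toy likelihood values, and the two example priors are all hypothetical, and the observations are assumed to be conditionally independent (the naïve Bayes assumption), so that the log-ratio becomes a sum over observations.

```python
import math


def discriminant(lik_true, lik_false, prior_true):
    """Return log10 of the posterior odds that an interaction is real.

    lik_true[i]  : P(observation_i | true_PPI)   (hypothetical inputs)
    lik_false[i] : P(observation_i | false_PPI)
    prior_true   : P(true_PPI); P(false_PPI) is taken as 1 - prior_true.
    Observations are treated as conditionally independent (naive Bayes).
    """
    if len(lik_true) != len(lik_false):
        raise ValueError("need one likelihood pair per observation")
    if not 0.0 < prior_true < 1.0:
        raise ValueError("prior_true must lie strictly between 0 and 1")
    # Prior odds contribute a constant offset to the score.
    f = math.log10(prior_true / (1.0 - prior_true))
    # Each observation adds its own log-likelihood ratio.
    for p_true, p_false in zip(lik_true, lik_false):
        f += math.log10(p_true / p_false)
    return f


def is_real(f):
    """Classify an interaction as real when the discriminant is positive."""
    return f > 0.0


# Toy numbers, chosen only for illustration.
f_even = discriminant([0.9, 0.8], [0.1, 0.2], prior_true=0.5)
f_rare = discriminant([0.9, 0.8], [0.1, 0.2], prior_true=0.001)
print(f_even, is_real(f_even))
print(f_rare, is_real(f_rare))
```

In this sketch the same two observations yield a positive score under an even prior and a negative one under a small prior. The two scores differ only by the constant prior term, so changing the prior shifts every candidate pair equally.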
The function f is defined as

$$f(\text{all observations}) = \log_{10} \frac{P(\text{all observations} \mid \text{true\_PPI}) \times P(\text{true\_PPI})}{P(\text{all observations} \mid \text{false\_PPI}) \times P(\text{false\_PPI})} \tag{Eq. 1}$$

where P(true_PPI) and P(false_PPI) represent prior expectations for the fraction of all protein pairs that do and do not interact physically. Treating the individual observations as conditionally independent (the naïve assumption), the above equation can be rewritten as follows.

$$f(\text{all observations}) = \log_{10} \frac{P(\text{true\_PPI})}{P(\text{false\_PPI})} + \sum_{i=1}^{\text{num observations}} \log_{10} \frac{P(\text{observation}_i \mid \text{true\_PPI})}{P(\text{observation}_i \mid \text{false\_PPI})} \tag{Eq. 2}$$

Although the accuracy of a Bayes classifier will rely on an appropriate value for P(true_PPI) and the correct value is not obvious, an incorrect choice of this value will not affect the ordering of scores for putative interactions because the prior enters Equation 2 only as an additive constant. We therefore computed PE scores as a sum of the evidence supporting or disaffirming each potential interaction over all relevant purifications in the dataset. For a particular observation, this evidence was computed as an estimate of the corresponding term in the above sum.

$$\text{Evidence}_{\text{observation}} = \log_{10} \frac{P(\text{observation} \mid \text{true\_PPI})}{P(\text{observation} \mid \text{false\_PPI})} \tag{Eq. 3}$$

A PE score of 0 then indicates that no evidence for or against the validity of a particular interaction was collected (and in theory the probability that such an interaction is true should be equal to the prior estimate of P(true_PPI)).

In particular, we considered two types of observations in the construction of PE scores: bait-prey observations when one of the proteins of interest was used as a bait and prey-prey observations when the two proteins of interest both appeared as preys in the purification of a third protein. As a result, similar to socio-affinity scores (6), PE scores can be written as a sum of direct bait-prey components (S) and an indirect prey-prey component (M). Thus, for a potential interaction between proteins i and j,

$$PE_{ij} = S_{ij} + S_{ji} + M_{ij} \tag{Eq. 4}$$

where $S_{ij}$ measures evidence from purifications where protein i was used as bait, $S_{ji}$ measures evidence from purifications where protein j was used as bait, and $M_{ij}$ measures indirect evidence due to co-occurrence of proteins i and j as preys in the same purifications. Below we give detailed equations used to compute the S and M components,

$$S_{ij} = \sum_{k} s_{ijk} \tag{Eq. 5}$$

where each value of k indicates a distinct purification in which protein i was used as bait and $s_{ijk}$ represents the corresponding evidence computed using Equation 3. The probabilities P(observation | true_PPI) and P(observation | false_PPI) used to define $s_{ijk}$ were calculated based on estimates of two underlying probabilities: r, representing the probability that a true association will be preserved and detected in a purification experiment, and $p_{ijk}$, representing the probability that a bait-prey pair will be observed for nonspecific reasons. Using these quantities, we calculate

$$s_{ijk} = \log_{10} \frac{r + (1 - r) \times p_{ijk}}{p_{ijk}} \tag{Eq. 6}$$

if protein j appeared as a prey in purification k using bait i and

$$s_{ijk} = \log_{10} \frac{(1 - r) \times (1 - p_{ijk})}{(1 - p_{ijk})} = \log_{10}(1 - r) \tag{Eq. 7}$$

otherwise. Values for r and $p_{ijk}$ could in principle be estimated in a number of ways. Here we estimated r using the observed frequency of successful purification over a very high confidence set of interactions (the intersection of MIPS complexes and MIPS small scale experiments). For the Krogan et al. (7), Gavin et al. (6), and Ho et al. (4) data, this gave values of 0.51, 0.62, and 0.265, respectively. For $p_{ijk}$ we used an estimate of the probability that a given bait-prey pair would be observed for nonspecific reasons at least once in the dataset, calculated using the Poisson distribution as

$$p_{ijk} = 1 - \exp\left(-f_j \, n_{ik}^{\text{prey}} \, n_{i}^{\text{bait}}\right) \tag{Eq. 8}$$

where $n_{ik}^{\text{prey}}$ is the number of preys identified in purification k with bait i, $n_{i}^{\text{bait}}$ is the number of times protein i was used as bait, and $f_j$ is an estimate of the nonspecific frequency of occurrence of prey j in the dataset. The relative values of the $f_j$ are estimates of relative rates at which different preys occur nonspecifically (and can be considered measures of relative promiscuity), and the sum of the $f_j$ can be considered to be the fraction of all prey identifications that are nonspecific. Although alternate strategies could be used, for simplicity we allowed the sum of the $f_j$ to be 1, and we computed $f_j$ as Bayesian posterior estimates based on the observed frequency of occurrence of preys in the dataset and the prior hypothesis that all preys occur nonspecifically with equal frequency,

$$f_j = \frac{n_{j}^{\text{prey\_obs}} + n_{\text{pseudo}}}{n_{\text{tot}}^{\text{prey\_obs}} + (n_{\text{distinct\_preys}} \times n_{\text{pseudo}})} \tag{Eq. 9}$$

where $n_{j}^{\text{prey\_obs}}$ is the total number of observations of protein j as a prey, $n_{\text{tot}}^{\text{prey\_obs}}$ is the total number of observations of all preys, $n_{\text{distinct\_preys}}$ is the number of distinct preys observed, and $n_{\text{pseudo}}$ is a number of pseudocounts added for each prey that determines the weight given to the prior hypothesis. Values of 20, 10, and 5 were used for $n_{\text{pseudo}}$ for the Krogan et al. (7), Gavin et al. (6), and Ho et al. (4) datasets, respectively. The value of $n_{\text{pseudo}}$ was the only parameter adjusted to optimize the PE scoring system. Adjustments were done using the MIPS complexes as a reference, and for this reason results of all comparisons made using a reference set based on the MIPS complexes were duplicated using an independent reference set generated from the SGD complexes.

The M component was calculated as

$$M_{ij} = \sum_{k} m_{ijk} \tag{Eq. 10}$$

where each value of k indicates one purification in which proteins i and j were simultaneously observed as preys.
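The bait-prey component described above (Equations 5 through 9) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it handles a single dataset, represents each purification as a hypothetical `(bait, set_of_preys)` tuple, and uses made-up function names, toy data, and toy values for r and the pseudocount.

```python
import math
from collections import Counter


def nonspecific_frequencies(prey_observations, n_pseudo):
    """Eq. 9: pseudocount-smoothed nonspecific frequency f_j for each prey."""
    counts = Counter(prey_observations)
    denom = sum(counts.values()) + len(counts) * n_pseudo
    return {prey: (n + n_pseudo) / denom for prey, n in counts.items()}


def p_nonspecific(f_j, n_preys_in_purification, n_times_bait):
    """Eq. 8: Poisson estimate of a nonspecific bait-prey observation."""
    return 1.0 - math.exp(-f_j * n_preys_in_purification * n_times_bait)


def bait_prey_evidence(observed, r, p):
    """Eq. 6 when the prey was observed, Eq. 7 when it was not."""
    if observed:
        return math.log10((r + (1.0 - r) * p) / p)
    return math.log10(1.0 - r)


def s_component(purifications, bait, prey, r, freqs):
    """Eq. 5: sum the evidence over every purification that used `bait`."""
    bait_runs = [preys for b, preys in purifications if b == bait]
    total = 0.0
    for preys in bait_runs:
        if prey in preys:
            p = p_nonspecific(freqs[prey], len(preys), len(bait_runs))
            total += bait_prey_evidence(True, r, p)
        else:
            # Eq. 7 does not depend on p, so any placeholder value works.
            total += bait_prey_evidence(False, r, 0.0)
    return total


# Toy dataset (hypothetical): bait A purified twice, bait B once.
purifications = [("A", {"B", "C"}), ("A", {"B"}), ("B", {"A", "C"})]
prey_obs = [prey for _, preys in purifications for prey in preys]
freqs = nonspecific_frequencies(prey_obs, n_pseudo=1)
s_ab = s_component(purifications, "A", "B", r=0.5, freqs=freqs)
s_ac = s_component(purifications, "A", "C", r=0.5, freqs=freqs)
print(s_ab, s_ac)
```

In the toy data, prey B is recovered in both purifications of bait A and receives a positive score, whereas prey C is missed once and the negative term from Equation 7 outweighs its single positive observation. A full PE score would add the reciprocal component and the prey-prey component, which this sketch omits.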
For the M component, our approach differs slightly from the full Bayesian classifier approach, which would either sum over all purifications or sum over all purifications in which at least one of the two proteins was identified as a prey. We did not use a sum over all purifications because it would require an enormous number of calculations and because estimation of all of the relevant probabilities is itself a very difficult problem. We instead created an approximate implementation of Equation 3 for $m_{ijk}$ calculated only for observations where both preys were observed in the same purification. Significantly we did not include a negative term for the case in which only one of the two proteins was observed as a prey in a purification. This was because two proteins can interact yet also be components of alternate complexes. Our implementation was again based on estimates for two underlying probabilities. Here we used r to represent the probability that a true association between proteins i and j will be preserved and detected during a purification experiment and $p_{ijk}$ to represent the probability that proteins i and j will appear as preys in the same purification for nonspecific reasons.

$$m_{ijk} = \log_{10} \frac{r + (1 - r) \times p_{ijk}}{p_{ijk}} \tag{Eq. 11}$$

We used the same estimate for r as calculated above, and for $p_{ijk}$ we used an estimate of the probability that proteins i and j will occur nonspecifically as preys in the same purification at least once in the dataset. This value for $p_{ijk}$ is calculated using the Poisson distribution as

$$p_{ijk} = 1 - \exp\left(-f_i \, f_j \, n_{\text{tot}}^{\text{prey-prey}}\right) \tag{Eq. 12}$$

where $f_i$ and $f_j$ are computed as described above, and $n_{\text{tot}}^{\text{prey-prey}}$ is the total number of prey-prey pairs observed in the dataset. The Krogan et al. (7) and Gavin et al. (6Gavin A.C. Aloy P. Grandi P. Krause R. Boesche M. Marzioch M. Rau C. Jensen L.J. Bastuck S. Dumpelfeld B. Edelmann A. Heurtier M.A. Hoffman V. Hoefert C. Klein K. Hudak M. Michon A.M. Schelder M. Schirle M. Remor M. Rudi T. Hooper S. Bauer A. Bouwmeester T. Casari