Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. 
This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown.
The completion of various genome-sequencing projects and large-scale genomic studies has led to a wealth of available biological data. It is anticipated that this information will revolutionize our insight into the molecular basis of most common diseases by making it easier and quicker to identify genes with variants that predispose to disease (i.e., disease genes). At the moment, we are faced with many disease susceptibility loci, resulting from linkage or cytogenetic analyses, that cover extensive genomic regions. Usually, when the genes in these loci are assessed, positional candidate genes become apparent that can be linked to the phenotype being studied on the basis of their biological function. However, the most obvious functional candidate gene from a disease locus does not always prove to be involved in the disease (e.g., Jacobi et al. 2000; Seri et al. 1999; Simard et al. 1993; Tumer et al. 2002; Walpole et al. 1999). Often, genes that would not have been predicted to be disease causing prove to be the true disease gene; an example is the BRCA1 gene in early-onset breast cancer (Miki et al. 1994). Moreover, although these disease genes might have been assigned biological functions, it is not always evident how these functions relate to disease. Finally, genes with unknown functions are often overlooked, as attention is paid only to well-studied genes for which functions and interactions have been identified or implicated, some of which can be related to the disease pathogenesis. For example, in Fanconi anemia, at least 10 disease genes were identified (Joenje and Patel 2001), but only a few had a known function.
However, follow-up research (D'Andrea and Grompe 2003; de Winter et al. 2000; Yamashita et al. 1998) revealed that five of those genes function in the same protein complex. Another example is limb-girdle muscular dystrophy, in which many of the disease genes encode proteins that are part of the dystrophin complex (Zatz et al. 2003). This emphasizes the importance of taking an unbiased approach to assessing positional candidate genes.
Faced with the absence of complete functional information for the majority of genes in susceptibility loci, it is difficult to prioritize the positional candidate genes correctly for further sequence or association analysis. However, high-throughput genomic work has now yielded relatively unbiased genomewide data sets (Alfarano et al. 2005; Peri et al. 2004; Kanehisa et al. 2004; Joshi-Tope et al. 2005) that comprise known metabolic, regulatory, functional, and physical interactions. There is, however, little integration of these diverse data sets into a coherent view of possible gene and protein interactions that can be used to investigate relationships between genes in different genetic loci. We have tried to address this problem by developing a functional human gene network that comprises known interactions derived from the Biomolecular Interaction Network Database (BIND) (Alfarano et al. 2005), the Human Protein Reference Database (HPRD) (Peri et al. 2004), Reactome (Joshi-Tope et al. 2005), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al. 2004). Since these data sets contain a limited number of known interactions, we implemented a Bayesian framework to complement these relationships with a large number of predicted interactions by relying on evidence for putative gene relationships based on biological process and molecular function annotations from the Gene Ontology database (GO) (Harris et al. 2004). We further incorporated experimental data, namely coexpression data derived from ∼450 microarray hybridizations from the Stanford Microarray Database (SMD) (Ball et al. 2005) and the NCBI Gene Expression Omnibus (GEO) (Barrett et al. 2005), along with human yeast two-hybrid (Y2H) interactions (Stelzl et al. 2005) and interactions based on orthologous high-throughput protein-protein interactions from lower eukaryotes (Lehner and Fraser 2004).
Our interaction network was then used to test whether we could rank the best positional candidates in susceptibility loci on the basis of their interactions, assuming that the causative genes for any one disorder will be involved in only a few different biological pathways. This would be apparent in our network as a clustering of genes from different susceptibility loci, resulting in shorter gene-gene connections between disease genes than one would expect by chance (fig. 1). Our method (called "Prioritizer") analyzes susceptibility loci and investigates whether genes from different loci can be linked to each other directly (Turner et al. 2003) or indirectly (Brunner and van Driel 2004). When we constructed artificial loci of varying size around susceptibility loci from 96 different genetic disorders (each containing at least three loci) and used Prioritizer in our most comprehensive gene network to rank the positional candidate genes for each locus, we were able to significantly increase the chance of detecting disease genes.
As a basis for the gene network, we used annotations from Ensembl (Birney et al. 2004), version 32.35, resulting in 20,334 known genes that physically map within the autosomes or chromosome X or Y. This yielded 206,725,611 potential gene-gene interactions, that is, every unordered pair of these genes. On the basis of this set of genes, a comprehensive "gold standard" set of validated direct gene-gene relationships (true positives) was determined using both BIND (September 15, 2005) and HPRD (September 15, 2005) to extract human, curated protein-protein interactions, the proteins of which were mapped to Ensembl gene identifiers. In addition, all human pathways from Reactome (September 15, 2005) and KEGG (September 15, 2005) were used to derive direct interactions that were of transcriptional, physical, or metabolic origin, since pathways are usually composed of genes and proteins that interact with each other in various ways. We chose to allow interactions of physical, metabolic, and regulatory origin to be included within our network, because, for instance, mutations in either one of two genes encoding proteins in the same metabolic pathway or protein complex could lead to the same disease phenotype.
Because the true-positive gold standard only describes a limited number of relationships between a limited number of genes, we also used data from GO, coexpression data derived from microarray experiments, conserved protein-protein high-throughput data, and human Y2H interaction data to predict interactions of the remaining gene pairs. We used a Bayesian classifier, because these four types of data were of varying reliability and each covered only a subset of the gene pairs. The classifier allows for combining dissimilar data sets, can deal with missing data, and uses conditional probabilities that can be well interpreted and that control for the varying reliability of the data sets (Beaumont and Rannala 2004; Egmont-Petersen et al. 2005; Jansen et al. 2003; I. Lee et al. 2004; Xia et al. 2004; Friedman et al. 1997). For the prediction of interactions, we used a Bayesian classifier type that assumed all data sets had been binned.
Binning was performed for each gene pair: for each data set, we determined the bin to which the pair belongs. Because the number of bins per data set was limited, each bin contained many gene pairs. Subsequently, for each bin, we determined the likelihood ratio, that is, the proportion of gene pairs known to interact that fall in the bin divided by the proportion of gene pairs known not to interact that fall in the bin. This measure indicates whether truly interacting gene pairs are overrepresented (ratio above 1) or underrepresented (ratio below 1) in the bin, and it specifies the conditional probability estimates of the Bayesian classifier; thus, training of the classifier is straightforward. However, to be able to train the classifier by determining likelihood ratios of sets of gene pairs, it was crucial that the gold standard, containing the aforementioned well-defined set of curated true-positive gene pairs, be complemented with a set of gene pairs for which there is strong evidence that they, or the proteins they encode, do not functionally interact (true negatives). As has been discussed by others (Jansen and Gerstein 2004), the construction of this true-negative reference set is problematic, because it is impossible to be certain that two genes (i.e., their protein products) do not interact. However, by assuming that genes encoding proteins localized within different cellular compartments are, in general, unrelated, it is possible to make a list of gene pairs that are unlikely to interact. The GO Cellular Component annotations were used to yield groups of gene pairs that have mutually exclusive cellular component annotations.
To overcome a strong selection bias in the classifier toward well-annotated genes (details provided in appendix A [online only]), only the 5,105 genes that were part of a true-positive gene pair at least three times were allowed to form true-negative gene pairs. We chose combinations of cellular organelles that were highly underrepresented (χ² = 2,490; P < 10⁻¹⁰) within the true-positive set, which resulted in gene pairs for the following combinations: nucleus and extracellular matrix, protein complex and Golgi apparatus, protein complex and Golgi stack, non–membrane-bound organelle and Golgi stack, non–membrane-bound organelle and extracellular space, non–membrane-bound organelle and Golgi apparatus, extracellular region and organelle membrane, mitochondrion and extracellular matrix, extracellular space and organelle membrane, extracellular space and Golgi stack, organelle membrane and extracellular matrix, extracellular matrix and Golgi stack, extracellular matrix and ubiquitin ligase complex, and ubiquitin ligase complex and Golgi stack.
To allow for Bayesian integration, the GO data, microarray coexpression data, and orthologous and human protein-protein interactions data were preprocessed and binned. Biological Process and Molecular Function GO annotations were derived from Ensembl, and two measures of relatedness for each of the two data sets were determined, resulting in a total of four different GO measures of relatedness. First, we determined, for each Biological Process GO term, how many of the genes had been assigned this term. Then, for every gene pair, we determined which Biological Process GO terms were shared between the two genes of the pair. Of these shared terms, we selected the one annotated to the fewest genes, and its frequency of occurrence was used as the measure.
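As an illustration, the first GO relatedness measure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the gene names and the toy annotation table are hypothetical, the function names are invented for this example, and ties between equally rare terms are broken alphabetically, which the text does not specify. The two "unknown" terms are excluded, as described in the text.

```python
from collections import Counter

# GO terms that the text says were discarded as uninformative.
UNKNOWN_TERMS = {"GO:0000004", "GO:0005554"}


def term_frequencies(annotations):
    """Count, for each GO term, how many genes have been assigned it."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms) - UNKNOWN_TERMS)
    return counts


def rarest_shared_term(gene_a, gene_b, annotations, counts):
    """Return (term, gene_count) for the least frequent shared term, or None."""
    shared = (set(annotations[gene_a]) & set(annotations[gene_b])) - UNKNOWN_TERMS
    if not shared:
        return None
    # Assumption: ties are broken alphabetically by term identifier.
    term = min(shared, key=lambda t: (counts[t], t))
    return term, counts[term]


# Hypothetical toy annotation table (gene -> set of GO terms).
annotations = {
    "geneA": {"GO:0006281", "GO:0008150", "GO:0000004"},
    "geneB": {"GO:0006281", "GO:0008150"},
    "geneC": {"GO:0008150", "GO:0000004"},
    "geneD": {"GO:0000004"},
}
counts = term_frequencies(annotations)
print(rarest_shared_term("geneA", "geneB", annotations, counts))
```

A lower gene count for the rarest shared term means the two genes share a more specific annotation; the resulting counts would then be binned before being passed to the classifier.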
GO terms GO:0000004 (biological process unknown) and GO:0005554 (molecular function unknown) were discarded, since genes that shared either of these highly unspecific terms should not be related to each other on the basis of this information. The same procedure was performed to generate the first measure of Molecular Function GO relatedness. The second measure determined the maximal hierarchical depth at which a gene pair shared a Biological Process GO term. This hierarchical depth was defined as the smallest number of branches necessary to go from one Biological Process GO term back to the GO root. The same method was used to generate the maximum hierarchical depth of the Molecular Function GO sharing measure.
Coexpression between genes was determined in microarray data sets from GEO and SMD. Each individual data set comprised an experiment that contained at least 10 hybridizations. To ensure that the quality of the intensity measurements was reliable, various filtering steps were performed to exclude spots with low signal-to-noise ratios (H. K. Lee et al. 2004). Within the SMD data sets, intensity spots that were either missing or contaminated were filtered out, and the mean intensity of spots had to be at least 2.5 times higher than the average background signal of the microarray. Since GEO contains both ratiometric and Affymetrix single-spot intensity microarray data sets, we used different filtering strategies. The 5% of genes with the lowest maximal intensity were removed from the Affymetrix data sets. For both SMD and GEO, expression ratios were log₂ transformed. Microarray features missing ≥25% of expression measurements in a data set after filtering were excluded.
All features were assigned Ensembl gene identifiers by comparing their sequences to Ensembl transcripts with the use of SSAHA (Ning et al. 2001). To determine which gene pairs showed coexpression, the mutual information was calculated between all the genes represented within each data set (Basso et al. 2005) if there were at least 10 nonmissing data points. As a preprocessing step, expression levels were ranked; this invertible reparameterization did not affect the mutual information. Next, for each pair of genes, the joint distribution of expression levels was estimated by calculating a histogram with overlapping windows. The range was divided into six windows, where each window extends to the center of the next window. The number of windows was chosen by optimizing the error rate for the mutual information derived from analytical probability densities (Basso et al. 2005). In this way, each data point contributes to two windows, except at the extremities. Finally, on the basis of the resulting distribution, the mutual information (MI) between each pair of genes was calculated as MI(A,B) = H(A) + H(B) - H(A,B), where H(X) is the information-theoretic Shannon entropy (Shannon 1948). For each microarray data set, the MI score was binned.
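The MI calculation described above can be sketched in Python as follows. This is one possible reading of the overlapping-window histogram, not the authors' code: it assumes that ranks are scaled to the range [0, 1], that this range is cut into seven equal segments so that window k spans segments k and k + 1, and that the marginal distributions are taken from the normalized joint histogram. The function names are hypothetical, and the sketch expects non-empty inputs of equal length (the text requires at least 10 nonmissing data points).

```python
import math
from itertools import product

N_WINDOWS = 6  # six overlapping windows, as stated in the text


def rank_transform(values):
    """Replace values by their ranks scaled into [0, 1] (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scale = max(len(values) - 1, 1)
    ranks = [0.0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank / scale
    return ranks


def windows_for(x, n_windows=N_WINDOWS):
    """Windows containing x: the range is cut into n_windows + 1 segments,
    and window k spans segments k and k + 1 (assumed layout)."""
    segment = min(int(x * (n_windows + 1)), n_windows)
    return [k for k in (segment - 1, segment) if 0 <= k < n_windows]


def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def mutual_information(a, b):
    """MI(A,B) = H(A) + H(B) - H(A,B) from an overlapping-window histogram."""
    joint = [[0.0] * N_WINDOWS for _ in range(N_WINDOWS)]
    for x, y in zip(rank_transform(a), rank_transform(b)):
        # Each point adds weight to every window pair that contains it.
        for i, j in product(windows_for(x), windows_for(y)):
            joint[i][j] += 1.0
    total = sum(map(sum, joint))
    joint = [[count / total for count in row] for row in joint]
    p_a = [sum(row) for row in joint]
    p_b = [sum(column) for column in zip(*joint)]
    h_ab = entropy(p for row in joint for p in row)
    return entropy(p_a) + entropy(p_b) - h_ab
```

With this layout, a point in the first or last segment falls in a single window and every other point falls in two, which matches the statement that each data point contributes to two windows except at the extremities.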
Binning the MI scores allowed the subsequent Bayesian classifier to determine the likelihood ratio, indicating whether gene pairs within each bin contained an overrepresentation of truly interacting gene pairs. Once the likelihood ratios had been determined for each data set, a receiver operating characteristic (ROC) curve was constructed, and the area under the curve (AUC) was calculated. Data sets with an AUC of at least 0.59 were combined in a naive way: for each gene pair, the likelihood ratios were multiplied by each other, resulting in a final microarray coexpression likelihood ratio for each gene pair.
Two orthologous protein-protein interaction data sets from Lehner and Fraser (2004) were used to supplement the GO and microarray coexpression data. One data set contained computationally predicted human protein interactions that had been physically mapped within Ensembl genes. The second data set contained a subset of these protein pairs, to which Lehner and Fraser had assigned a higher confidence. Three bins were constructed: one containing the higher-confidence gene pairs, one containing the remaining lower-confidence pairs, and a third containing all the other unobserved gene pairs. A human Y2H protein-protein interaction data set from Stelzl et al. (2005) was integrated by mapping the HUGO identifiers to Ensembl genes. Two bins were constructed: one containing the gene pairs for which a Y2H interaction was reported, and one containing all the other unobserved gene pairs.
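The per-bin likelihood ratios and their naive combination can be illustrated with a short Python sketch. It is a simplified illustration rather than the study's implementation: the function names are hypothetical, gene pairs are assumed to be stored as tuples with a consistent gene order, and bins that lack either true-positive or true-negative pairs are simply left without a ratio, a case the text does not discuss.

```python
from collections import Counter
from math import prod


def bin_likelihood_ratios(pair_bins, positives, negatives):
    """Likelihood ratio per bin: the fraction of true-positive pairs in the bin
    divided by the fraction of true-negative pairs in the bin."""
    pos_counts = Counter(pair_bins[p] for p in positives if p in pair_bins)
    neg_counts = Counter(pair_bins[p] for p in negatives if p in pair_bins)
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    ratios = {}
    for bin_label in set(pos_counts) | set(neg_counts):
        # Assumption: bins with no positives or no negatives get no ratio.
        if pos_counts[bin_label] and neg_counts[bin_label]:
            ratios[bin_label] = (pos_counts[bin_label] / n_pos) / (
                neg_counts[bin_label] / n_neg
            )
    return ratios


def combined_ratio(pair, datasets):
    """Naive combination: multiply the ratios from every data set that
    assigns the pair to a bin with a known ratio."""
    factors = []
    for pair_bins, ratios in datasets:
        bin_label = pair_bins.get(pair)
        if bin_label in ratios:
            factors.append(ratios[bin_label])
    return prod(factors)  # an empty product is 1: no evidence either way
```

Multiplying the ratios treats the data sets as conditionally independent given the interaction status, which is what "naive" refers to here. Skipping data sets that do not score a pair is one simple way to handle missing data.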
The Bayesian classifier was employed to integrate the various binned types of data. We chose not to learn the Bayesian network structure from the data but to use a predefined Bayesian network structure, for which the conditional probabilities were determined by benchmarking the various data sets against the gold standard (fig. 2) (details provided in appendix A). We subsequently generated four gene networks. One network contained evidence for interaction based on the GO data (GO network). Another network contained evidence for interaction derived from integrating the microarray coexpression and predicted protein-protein interaction data in a naive way (MA+PPI network). A third network combined, in a naive way, the GO and MA+PPI networks (GO+MA+PPI network), and this was complemented with all known true-positive interactions in a final network (GO+MA+PPI+TP network). To relate interacting genes directly or indirectly, an all-pairs shortest path was calculated for each gene network (Floyd 1962). This measure of the minimal path length between pairs of genes was used in the subsequent method to associate disease genes with each other.
Prioritizer assesses whether genes residing within different susceptibility loci are close together within the gene network. In principle, this means that the method could also be applied to diseases for which only two loci have been identified. However, in such a case, there is a considerable probability that two genes, each residing in a different locus, would interact by chance. We therefore restricted the analysis to diseases for which at least three contributing disease genes had been identified.
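The all-pairs shortest-path step mentioned above can be sketched with Floyd's algorithm in Python. This is a minimal sketch on a hypothetical toy network: the gene names are invented, and the edge lengths are set to 1 purely for illustration, since the text does not state here how edge lengths were derived from the interaction evidence.

```python
from math import inf


def all_pairs_shortest_paths(genes, edges):
    """Floyd's algorithm on an undirected network.

    genes: list of gene identifiers.
    edges: iterable of (gene_a, gene_b, length) tuples.
    Returns dist, where dist[a][b] is the minimal path length (inf if unreachable).
    """
    dist = {a: {b: (0 if a == b else inf) for b in genes} for a in genes}
    for a, b, length in edges:
        dist[a][b] = min(dist[a][b], length)
        dist[b][a] = min(dist[b][a], length)
    for k in genes:  # allow paths that pass through gene k
        for i in genes:
            for j in genes:
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical toy network: g1 - g2 - g3, with g4 unconnected.
genes = ["g1", "g2", "g3", "g4"]
edges = [("g1", "g2", 1.0), ("g2", "g3", 1.0)]
dist = all_pairs_shortest_paths(genes, edges)
print(dist["g1"]["g3"])
```

The algorithm runs in time cubic in the number of genes, so for a network of roughly 20,000 genes a practical implementation would need an optimized or sparse variant; the sketch only shows the logic.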
The diseases and their disease genes were derived from the Online Mendelian Inheritance in Man (OMIM) database (Hamosh et al. 2005) by text mining the first paragraphs of all OMIM disease entries as of March 1, 2005, and extracting the OMIM gene numbers contained within these paragraphs (table A1 in appendix A). The HUGO gene name was later extracted from these OMIM entries and was mapped to an Ensembl gene name. If, for any one disease, two disease genes were situated on the same chromosome and were fewer than 200 genes apart, one of the two genes was randomly removed to ensure that no loci would overlap. The diseases for which at least three disease genes remained after filtering were analyzed by artificially generating su