Genetic risk prediction has several potential applications in medical research and clinical practice and could be used, for example, to stratify a heterogeneous population of patients by their predicted genetic risk. However, for polygenic traits, such as psychiatric disorders, the accuracy of risk prediction is low. Here we use a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction for genetic risk prediction. This method exploits correlations between disorders and simultaneously evaluates individual risk for each disorder. We show that the multivariate approach significantly increases the prediction accuracy for schizophrenia, bipolar disorder, and major depressive disorder in the discovery as well as in independent validation datasets. By grouping SNPs based on genome annotation and fitting multiple random effects, we show that the prediction accuracy could be further improved. The gain in prediction accuracy of the multivariate approach is equivalent to an increase in sample size of 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorders using single trait models. Because our approach can be readily applied to any number of GWAS datasets of correlated traits, it is a flexible and powerful tool to maximize prediction accuracy. With current sample size, risk predictors are not useful in a clinical setting but already are a valuable research tool, for example in experimental designs comparing cases with high and low polygenic risk.

Genome-wide association studies (GWASs) have been highly successful in identifying variants associated with a wide range of complex human diseases [1: Visscher P.M., Brown M.A., McCarthy M.I., Yang J. Five years of GWAS discovery. Am. J. Hum. Genet. 2012; 90: 7-24] [2: Hindorff L.A., Sethupathy P., Junkins H.A., Ramos E.M., Mehta J.P., Collins F.S., Manolio T.A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA. 2009; 106: 9362-9367]. However, most common diseases are highly polygenic and each variant explains only a tiny proportion of the genetic variation. Even when associated SNPs are considered jointly in polygenic approaches such as polygenic risk scores [3: Purcell S.M., Moran J.L., Fromer M., Ruderfer D., Solovieff N., Roussos P., O’Dushlaine C., Chambert K., Bergen S.E., Kähler A., et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature. 2014; 506: 185-190] or genomic best linear unbiased prediction (GBLUP) [4: Zhou X., Carbonetto P., Stephens M. Polygenic modeling with bayesian sparse linear mixed models. PLoS Genet. 2013; 9: e1003264] [5: Speed D., Balding D.J. MultiBLUP: improved SNP-based prediction for complex traits. Genome Res. 2014; 24: 1550-1557. https://doi.org/10.1101/gr.169375.113], the accuracy of risk prediction is low. The use of more advanced methods [4, 5] [6: Lee S.H., van der Werf J.H., Hayes B.J., Goddard M.E., Visscher P.M. Predicting unobserved phenotypes for complex traits from whole-genome SNP data. PLoS Genet. 2008; 4: e1000231] [7: Erbe M., Hayes B.J., Matukumalli L.K., Goswami S., Bowman P.J., Reich C.M., Mason B.A., Goddard M.E. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J. Dairy Sci. 2012; 95: 4114-4129] [8: Wei Z., Wang K., Qu H.-Q., Zhang H., Bradfield J., Kim C., Frackleton E., Hou C., Glessner J.T., Chiavacci R., et al. From disease association to risk assessment: an optimistic view from genome-wide association studies on type 1 diabetes. PLoS Genet. 2009; 5: e1000678] improved prediction accuracy for traits where a small number of relatively strong associations have been identified, such as type 1 diabetes, ankylosing spondylitis, and rheumatoid arthritis, but not for other traits characterized by small effect size variants, including psychiatric disorders [4, 5] [9: Li C., Yang C., Gelernter J., Zhao H. Improving genetic risk prediction by leveraging pleiotropy. Hum. Genet. 2014; 133: 639-650].

A major factor determining how well a polygenic model can predict a trait value in an independent sample is the sample size of the discovery data [10: Daetwyler H.D., Villanueva B., Woolliams J.A. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE. 2008; 3: e3395] [11: Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013; 9: e1003348]. Using more individuals will provide more information and hence increase the accuracy of the estimated effect size of a specific SNP. Sample size can also be effectively increased through datasets measured for correlated traits. Recently, we estimated the genetic relationships among five psychiatric disorders from the Psychiatric Genomics Consortium (PGC) by using a bivariate linear mixed model demonstrating that there are significant shared genetic risk factors across the disorders and that measurement of one trait provides information on other genetically correlated traits [12: Lee S.H., Ripke S., Neale B.M., Faraone S.V., Purcell S.M., Perlis R.H., Mowry B.J., Thapar A., Goddard M.E., Witte J.S., et al.; Cross-Disorder Group of the Psychiatric Genomics Consortium; International Inflammatory Bowel Disease Genetics Consortium (IIBDGC). Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 2013; 45: 984-994]. Here we extend our bivariate approach to a multivariate linear mixed model and apply multi-trait genomic best linear unbiased prediction (MTGBLUP) [13: Henderson C.R., Quass R.L. Multiple trait evaluation using relatives’ records. J. Anim. Sci. 1976; 43: 1188-1197] [14: Guo G., Zhao F., Wang Y., Zhang Y., Du L., Su G. Comparison of single-trait and multiple-trait genomic prediction models. BMC Genet. 2014; 15: 30] for genetic risk prediction of disease. MTGBLUP is expected to be more powerful because it uses correlations between disorders and jointly evaluates individual risk across disorders. To date, the information from other correlated traits has been little exploited in the context of risk prediction although recently Li et al. [9] applied bivariate ridge regression to two genetically correlated diseases to improve risk prediction. An important advantage of the MTGBLUP approach is that it does not require multiple phenotypes to be measured on the same individuals and therefore can be readily applied to any number of existing datasets of genetically related traits. This is particularly beneficial for disease studies that are limited to a single phenotype but typically aim for large sample sizes.
Moreover, it is not necessary for the datasets to be genotyped with the same SNP array because SNPs can be imputed to a common set of SNPs, such as those available from the HapMap or 1000 Genomes reference panels [15: Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A.; 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012; 491: 56-65] [16: Altshuler D.M., Gibbs R.A., Peltonen L., Altshuler D.M., Gibbs R.A., Peltonen L., Dermitzakis E., Schaffner S.F., Yu F., Peltonen L., et al.; International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature. 2010; 467: 52-58]. Prediction accuracy can be expected to improve as more data from phenotypes with shared etiology are utilized. In this report, we apply the MTGBLUP approach to the cross-disorder PGC GWAS data and show a significant increase in risk prediction accuracy in independent cohorts of schizophrenia, bipolar disorder, and major depressive disorder. MTGBLUP increased the discriminant power between the top and bottom 10% of individuals ranked on their risk predictor, implying that this approach might be useful for stratified medicine in a research setting, to develop tailored interventions or treatments for individuals having different risks [17: Kapur S., Phillips A.G., Insel T.R. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol. Psychiatry. 2012; 17: 1174-1179] [18: Trusheim M.R., Berndt E.R., Douglas F.L. Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat. Rev. Drug Discov. 2007; 6: 287-293] [19: Insel T., Cuthbert B., Garvey M., Heinssen R., Pine D.S., Quinn K., Sanislow C., Wang P. Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. Am. J. Psychiatry. 2010; 167: 748-751]. We further demonstrate a relationship between functionally annotated SNPs and increased prediction accuracy of schizophrenia and bipolar disorder.

As the main method, we use a multivariate linear mixed model for the analyses of GWAS data that estimates the total genetic values of individuals directly by utilizing genomic relationships based on SNP information. In the model, a vector of phenotypic observations for each trait is written as a linear function of fixed effects, random genetic effects, and residuals. For simplicity, we constrain the description to a single component for the random genetic effects, but the model can be readily extended to multiple components of random genetic effects:

$$
\begin{aligned}
y_1 &= X_1 b_1 + Z_1 g_1 + e_1 && \text{for trait } 1 \\
y_2 &= X_2 b_2 + Z_2 g_2 + e_2 && \text{for trait } 2 \\
&\;\;\vdots \\
y_n &= X_n b_n + Z_n g_n + e_n && \text{for trait } n
\end{aligned}
$$

where y is a vector of trait phenotypes, b is a vector of fixed effects, g is a vector of total genetic value for each individual, and e are residuals. The random effects (g and e) are assumed to be normally distributed with mean zero. X and Z are incidence matrices for the effects b and g, respectively. Subscripts 1, …, n represent trait 1 to trait n. The variance covariance matrix is defined as

$$
V = \begin{bmatrix}
Z A \sigma_{g_1}^2 Z' + I \sigma_{e_1}^2 & \cdots & Z A \sigma_{g_{1n}} Z' + I \sigma_{e_{1n}} \\
\vdots & \ddots & \vdots \\
Z A \sigma_{g_{n1}} Z' + I \sigma_{e_{n1}} & \cdots & Z A \sigma_{g_n}^2 Z' + I \sigma_{e_n}^2
\end{bmatrix}
$$

where A is the genomic similarity matrix based on SNP information and I is an identity matrix. The terms $\sigma_{g_i}^2$ and $\sigma_{e_i}^2$ denote the genetic and residual variance of trait i, respectively, and $\sigma_{g_{ij}}$ and $\sigma_{e_{ij}}$ the genetic and residual covariance between traits i and j. Multi-trait genomic residual maximum likelihood (MTGREML) estimates (see Appendix A) are obtained with the average information algorithm [20: Lee S.H., van der Werf J.H.J. An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genet. Sel. Evol. 2006; 38: 25-43] [21: Lee S.H., Yang J., Goddard M.E., Visscher P.M., Wray N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012; 28: 2540-2542] [22: Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011; 88: 76-82].

Next we show that SNP risk predictors can be easily derived from individual risk predictors with a simplified BLUP model that uses individual risk predictors as the dependent variable and fits a covariance structure without residual variance (i.e., heritability is 1). Individual risk predictors are the best linear unbiased predictors (BLUPs) of total genetic value of individual subjects contributed by genome-wide SNPs, i.e., g in the previous section. Analogously, SNP risk predictors are defined as the BLUPs of SNP effects estimated jointly with a linear mixed model that intrinsically accounts for linkage disequilibrium between SNPs. The SNP-BLUP model is computationally more demanding for a large number of SNPs. Therefore, it is desirable to estimate genetic values (GBLUP) for efficiency and to transform them to SNP-BLUP. The SNP-BLUP can be projected to predict genetic risk for an independent validation sample without the need to have access to the training individuals. The SNP-BLUP estimates can be applied to independent datasets as the SNP weights used to create a risk profile score, for example using the PLINK --score command.
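As a rough illustration of the two steps described above (multi-trait GBLUP, then conversion of individual predictors into SNP weights), the following NumPy sketch uses simulated data for two traits measured on different individuals. Everything in it is an assumption made for illustration and not the authors' software: the Gaussian stand-in for standardized genotypes (chosen so that A is invertible), the variance components, the zero residual covariance between traits (no individual has both phenotypes), and the helper name `mt_gblup`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N individuals, M SNPs. W stands in for standardized genotypes;
# Gaussian draws keep A invertible in this illustration.
N, M = 60, 400
W = rng.standard_normal((N, M))
A = W @ W.T / M                       # genomic similarity matrix

# Trait 1 is observed on the first 35 individuals, trait 2 on the other 25.
idx1, idx2 = np.arange(0, 35), np.arange(35, N)
Z1, Z2 = np.eye(N)[idx1], np.eye(N)[idx2]   # incidence matrices
Z = np.block([[Z1, np.zeros_like(Z1)],
              [np.zeros_like(Z2), Z2]])      # block-diagonal, shape (N, 2N)


def mt_gblup(y, Sigma_g, sigma_e):
    """Stacked BLUPs [g1; g2] for all N individuals (hypothetical helper).

    y        phenotypes already adjusted for fixed effects, trait 1 then trait 2
    Sigma_g  2 x 2 genetic (co)variance matrix
    sigma_e  residual variances of the two traits
    """
    G = np.kron(Sigma_g, A)                  # genetic covariance, 2N x 2N
    R = np.diag(np.concatenate([np.full(len(idx1), sigma_e[0]),
                                np.full(len(idx2), sigma_e[1])]))
    V = Z @ G @ Z.T + R
    return G @ Z.T @ np.linalg.solve(V, y)


y = rng.standard_normal(N)                   # stands in for y - Xb
Sigma_g = np.array([[0.25, 0.15],
                    [0.15, 0.25]])           # assumed values
sigma_e = (0.75, 0.75)

g_hat = mt_gblup(y, Sigma_g, sigma_e)
g1_hat = g_hat[:N]          # trait-1 predictor for every individual,
                            # including those phenotyped only for trait 2

# Convert individual BLUPs to SNP weights for trait 1: u = W' A^{-1} g / M.
u1 = W.T @ np.linalg.solve(A, g1_hat) / M
assert np.allclose(W @ u1, g1_hat)           # SNP weights reproduce g

# Score new genotypes without access to the training individuals.
W_new = rng.standard_normal((5, M))
scores_new = W_new @ u1
```

The final line plays the role of a risk profile score: only the SNP weights `u1` travel to the validation data. With real, column-centered genotypes A is singular, so a practical implementation would need a regularized or generalized inverse; that detail is left out here.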
The individual BLUP model is

$$
\begin{bmatrix} g_1 \\ \vdots \\ g_n \end{bmatrix}
=
\begin{bmatrix}
\sigma_{g_1}^2 & \cdots & \sigma_{g_{1n}} \\
\vdots & \ddots & \vdots \\
\sigma_{g_{n1}} & \cdots & \sigma_{g_n}^2
\end{bmatrix}
\otimes A \cdot
\begin{bmatrix}
Z_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & Z_n
\end{bmatrix}'
\cdot V^{-1}
\begin{bmatrix} y_1 - X_1 b_1 \\ \vdots \\ y_n - X_n b_n \end{bmatrix}.
\tag{Equation 1}
$$

The SNP-BLUP model is

$$
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}
=
\begin{bmatrix}
\sigma_{u_1}^2 & \cdots & \sigma_{u_{1n}} \\
\vdots & \ddots & \vdots \\
\sigma_{u_{n1}} & \cdots & \sigma_{u_n}^2
\end{bmatrix}
\otimes I \cdot
\begin{bmatrix}
W_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & W_n
\end{bmatrix}'
\cdot \Omega^{-1}
\begin{bmatrix} y_1 - X_1 b_1 \\ \vdots \\ y_n - X_n b_n \end{bmatrix}
$$

where $W_i$ is an N × M matrix of standardized SNP coefficients with N being the number of individuals and M the number of SNPs, ⊗ is the Kronecker product function, and the variance covariance matrix for the SNP-BLUP model is defined as

$$
\Omega = \begin{bmatrix}
W I \sigma_{u_1}^2 W' + I \sigma_{e_1}^2 & \cdots & W I \sigma_{u_{1n}} W' + I \sigma_{e_{1n}} \\
\vdots & \ddots & \vdots \\
W I \sigma_{u_{n1}} W' + I \sigma_{e_{n1}} & \cdots & W I \sigma_{u_n}^2 W' + I \sigma_{e_n}^2
\end{bmatrix}.
$$

Replacing y with g (individual BLUP) and setting residual (co)variances to zero (because individual BLUP is already adjusted for residuals), the variance covariance matrix can be simplified as

$$
\Omega =
\begin{bmatrix}
\sigma_{u_1}^2 & \cdots & \sigma_{u_{1n}} \\
\vdots & \ddots & \vdots \\
\sigma_{u_{n1}} & \cdots & \sigma_{u_n}^2
\end{bmatrix}
\otimes W W'
=
\begin{bmatrix}
\sigma_{u_1}^2 & \cdots & \sigma_{u_{1n}} \\
\vdots & \ddots & \vdots \\
\sigma_{u_{n1}} & \cdots & \sigma_{u_n}^2
\end{bmatrix}
\otimes A \cdot M.
$$

Therefore, SNP-BLUP can be written as

$$
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}
=
\begin{bmatrix}
W_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & W_n
\end{bmatrix}'
\otimes A^{-1}
\begin{bmatrix} g_1 \\ \vdots \\ g_n \end{bmatrix}
\cdot M^{-1},
\tag{Equation 2}
$$

and this can be rewritten as

$$
\begin{bmatrix}
W_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & W_n
\end{bmatrix}
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}
=
\begin{bmatrix} g_1 \\ \vdots \\ g_n \end{bmatrix}.
$$

This agrees with Hayes et al. [23: Hayes B.J., Visscher P.M., Goddard M.E. Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. 2009; 91: 47-60] and Yang et al. [22] when it reduces to a univariate model. Equation 2, after replacing $[g_1, \ldots, g_n]'$ with the right-hand side of Equation 1, can be rewritten as

$$
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}
=
\begin{bmatrix}
W_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & W_n
\end{bmatrix}'
\cdot
\begin{bmatrix}
\sigma_{g_1}^2 & \cdots & \sigma_{g_{1n}} \\
\vdots & \ddots & \vdots \\
\sigma_{g_{n1}} & \cdots & \sigma_{g_n}^2
\end{bmatrix}
\otimes I \cdot
\begin{bmatrix}
Z_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & Z_n
\end{bmatrix}'
\cdot V^{-1}
\begin{bmatrix} y_1 - X_1 b_1 \\ \vdots \\ y_n - X_n b_n \end{bmatrix}
M^{-1}.
\tag{Equation 3}
$$

This agrees with the results of VanRaden [24: VanRaden P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008; 91: 4414-4423] and Strandén and Garrick [25: Strandén I., Garrick D.J. Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J. Dairy Sci. 2009; 92: 2971-2975], derived from a matrix inversion theory, when it reduces to a univariate model.

We extended our approach to genomic partitions according to gene annotation. An enrichment analysis based on gene annotation categories has shown that SNPs located within genes identified as being differentially expressed in the central nervous system (CNS) explain a significantly larger proportion of phenotypic variance than expected by chance for schizophrenia and bipolar disorder [12] [26: Raychaudhuri S., Korn J.M., McCarroll S.A., Altshuler D., Sklar P., Purcell S., Daly M.J.; International Schizophrenia Consortium. Accurately assessing the risk of schizophrenia conferred by rare copy-number variation affecting genes with brain function. PLoS Genet. 2010; 6: e1001097]. It is of interest to determine whether the gene/functional annotation information can further increase the prediction accuracy. In the annotation analysis, we grouped together SNPs that were located within ±50 kb of the 5′ and 3′ UTRs of 2,725 genes differentially expressed in the CNS [26] [27: Lee S.H., DeCandia T.R., Ripke S., Yang J., Sullivan P.F., Goddard M.E., Keller M.C., Visscher P.M., Wray N.R., et al.; Schizophrenia Psychiatric Genome-Wide Association Study Consortium (PGC-SCZ); International Schizophrenia Consortium (ISC); Molecular Genetics of Schizophrenia Collaboration (MGS). Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat. Genet. 2012; 44: 247-250]; 21% of the SNPs belonged to this category. We then estimated SNP effects from a two-component model fitting relationship matrices of SNPs in CNS genes and SNPs localized elsewhere. The model is

$$
\begin{aligned}
y_1 &= X_1 b_1 + Z_1 g_{1,\mathrm{CNS}} + Z_1 g_{1,\text{non-CNS}} + e_1 && \text{for trait } 1 \\
&\;\;\vdots \\
y_n &= X_n b_n + Z_n g_{n,\mathrm{CNS}} + Z_n g_{n,\text{non-CNS}} + e_n && \text{for trait } n
\end{aligned}
$$

where $g_{\mathrm{CNS}}$ is a vector of random genetic effects due to the CNS genes and $g_{\text{non-CNS}}$ is a vector of random genetic effects resulting from the non-CNS region.

We also tested another gene set that included candidate gene sets for schizophrenia, autism, and intellectual disability (SAI) [3]. We matched these candidate genes with UCSC Genome Browser human genome version 18 (on which the discovery dataset was built) and retained 4,133 autosomal genes. Note that we excluded 479 genes flanking GWAS SNPs identified in the Swedish sample [28: Ripke S., O’Dushlaine C., Chambert K., Moran J.L., Kähler A.K., Akterin S., Bergen S.E., Collins A.L., Crowley J.J., Fromer M., et al.; Multicenter Genetic Studies of Schizophrenia Consortium; Psychosis Endophenotypes International Consortium; Wellcome Trust Case Control Consortium 2. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 2013; 45: 1150-1159] to avoid artifactual inflation in prediction accuracy.
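To make the two-component model above concrete, here is a minimal single-trait NumPy sketch. The annotation mask, the variance components, the simulated genotypes, and the helper names `grm` and `two_component_blup` are all hypothetical; the sketch only shows how one similarity matrix per SNP group enters the same mixed model.

```python
import numpy as np

rng = np.random.default_rng(1)

N, M = 50, 300
W = rng.standard_normal((N, M))     # stand-in for standardized genotypes
in_cns = rng.random(M) < 0.21       # hypothetical annotation mask (about 21% of SNPs)


def grm(W_part):
    """Genomic similarity matrix built from one group of SNPs."""
    return W_part @ W_part.T / W_part.shape[1]


A_cns = grm(W[:, in_cns])           # SNPs in annotated genes
A_non = grm(W[:, ~in_cns])          # all remaining SNPs


def two_component_blup(y, var_cns, var_non, var_e):
    """BLUPs of the two genetic components for one trait.

    y is assumed to be already adjusted for fixed effects.
    """
    V = var_cns * A_cns + var_non * A_non + var_e * np.eye(N)
    Vinv_y = np.linalg.solve(V, y)
    return var_cns * (A_cns @ Vinv_y), var_non * (A_non @ Vinv_y)


y = rng.standard_normal(N)          # stands in for y - Xb
g_cns, g_non = two_component_blup(y, var_cns=0.10, var_non=0.15, var_e=0.75)
g_total = g_cns + g_non             # total genetic value used for prediction
```

When both groups are given the same variance per SNP, the sum of the two components collapses to ordinary single-component GBLUP, so the partition only changes predictions when the annotated SNPs carry a different share of variance than their number would suggest.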
We annotated SNPs within the SAI genes (28% of the SNPs) and fitted genomic similarity matrices of the annotated SNPs and the rest of SNPs in the two-component model.

We had access to the PGC Cross-Disorder data and three independent validation datasets. The details of the PGC Cross-Disorder data with additionally available ADHD samples are described elsewhere [12]. The datasets stored in the PGC central server follow strict guidelines with local ethics committee approval. Genotype data from each study cohort were processed through the stringent PGC pipeline and imputation of autosomal SNPs was carried out with the HapMap3 reference sample [29: Cross-Disorder Group of the Psychiatric Genomics Consortium. Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet. 2013; 381: 1371-1379]. In each imputation cohort, we retained only SNPs with MAF >0.01 and imputation R² >0.6. The number of SNPs used in this study was 745,705. We excluded certain individuals to ensure that all samples from the five disorders were completely unrelated in the conventional sense, so that no pair of individuals had a genome-wide similarity relationship greater than 0.05. The numbers of case and control subjects used in this study are shown in Table 1. All phenotypes were controlled for cohort, sex, and the first 20 principal components estimated from genome-wide SNPs.
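The relatedness filter described above can be sketched as follows. The text does not say how individuals were chosen for exclusion, so the greedy rule here (repeatedly drop the individual involved in the most relationships above the cutoff) is an assumption, and `prune_related` is a hypothetical helper name.

```python
import numpy as np


def prune_related(A, threshold=0.05):
    """Return indices of individuals kept so that no remaining pair
    has a genomic similarity above `threshold` (greedy heuristic)."""
    keep = np.ones(A.shape[0], dtype=bool)
    related = A > threshold
    np.fill_diagonal(related, False)         # ignore self-similarity
    while True:
        # Count, for each kept individual, its related partners still kept.
        counts = (related & keep[:, None] & keep[None, :]).sum(axis=1)
        if counts.max() == 0:
            return np.flatnonzero(keep)
        keep[counts.argmax()] = False        # drop the most connected one


# Small worked example: individual 1 is related to both 0 and 2.
A_toy = np.array([[1.00, 0.30, 0.01, 0.02],
                  [0.30, 1.00, 0.10, 0.00],
                  [0.01, 0.10, 1.00, 0.03],
                  [0.02, 0.00, 0.03, 1.00]])
kept = prune_related(A_toy)                  # individuals 0, 2 and 3 remain
```

Dropping the most connected individual first tends to retain more samples than removing one member of each related pair in arbitrary order, which matters when related pairs span several disorders.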
Adjustments were performed for each trait.

Table 1. Estimates of SNP Heritability and Genetic Correlations from Multivariate Analysis of Five Psychiatric Disorders

| Disorders | Cases | Controls | SNP-h² on the Liability Scale | SE |
|---|---|---|---|---|
| SCZ | 8,826 | 6,106 | 0.235 | 0.011 |
| BIP | 5,867 | 3,328 | 0.218 | 0.017 |
| MDD | 8,770 | 6,506 | 0.286 | 0.023 |
| ASD | 3,086 | 3,163 | 0.130 | 0.024 |
| ADHD | 3,997 | 8,479 | 0.281 | 0.022 |

| Disorders | Cases | Controls | Genetic Correlation | SE |
|---|---|---|---|---|
| BIP/SCZ | 5,867/8,826 | 3,328/6,106 | 0.590 | 0.048 |
| MDD/SCZ | 8,770/8,826 | 6,506/6,106 | 0.365 | 0.047 |
| MDD/BIP | 8,770/5,867 | 6,506/3,328 | 0.371 | 0.060 |
| ASD/SCZ | 3,086/8,826 | 3,163/6,106 | 0.194 | 0.071 |
| ASD/BIP | 3,086/5,867 | 3,163/3,328 | 0.084 | 0.089 |
| ASD/MDD | 3,086/8,770 | 3,163/6,506 | 0.054 | 0.089 |
| ADHD/SCZ | 3,997/8,826 | 8,479/6,106 | 0.055 | 0.046 |
| ADHD/BIP | 3,997/5,867 | 8,479/3,328 | 0.160 | 0.059 |
| ADHD/MDD | 3,997/8,770 | 8,479/6,506 | 0.242 | 0.059 |
| ADHD/ASD | 3,997/3,086 | 8,479/3,163 | −0.044 | 0.088 |

Abbreviations are as follows: SE, standard error; SCZ, schizophrenia; BIP, bipolar disorder; MDD, major depressive disorder; ASD, autism spectrum disorder; ADHD, attention-deficit/hyperactivity disorder.

In preliminary analysis, using the multivariate linear mixed model, we estimated genetic variances and genetic correlations between the five psychiatric disorders (Table 1). The estimates agreed with those reported in the previous study [12] (Figure S1) but were slightly less accurate (larger standard errors) because of the smaller sample size due to excluding genetically related samples across all five disorders rather than across only two traits in the bivariate analyses.

To evaluate the risk prediction performance of MTGBLUP, we performed within-study cross-validation of the PGC data, i.e., internal validation. We randomly split the data for each disease into a training sample containing ∼80% of individuals and a validation sample containing the remaining ∼20% [30: Roche O., Schneider P., Zuegge J., Guba W., Kansy M., Alanine A., Bleicher K., Danel F., Gutknecht E.-M., Rogers-Evans M., et al. Development of a virtual screening method for identification of “frequent hitters” in compound libraries. J. Med. Chem. 2002; 45: 137-142] and repeated this five times. For assessing predictive performance in the internal validation, we calculated the correlation coefficient between the observed disease status and the predicted genomic risk score of the validation individuals. We also regressed observed disease status on risk scores. If the risk scores are unbiased estimates of genetic risk then the regression coefficient is expected to be 1, i.e., the covariance between true and estimated risks equals the variance of estimated risks. Deviations from 1 reflect the degree of bias of the risk scores. We averaged the correlation and regression coefficients and estimated empirical standard errors over five replicates. Using the empirical standard error estimates, a t test was performed to assess differences in prediction accuracy between methods. In the within-study cross-validation, MTGBLUP outperformed single-trait genomic best linear unbiased prediction (STGBLUP) for all disorders: the gain in prediction accuracy was significant for schizophrenia (p < 6.0 × 10⁻⁸) and bipolar disorder (p < 6.6 × 10⁻¹¹) (Figure S2).
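The two validation measures described above (correlation as accuracy, regression slope as a calibration check) can be sketched as below. The simulated scores and disease status are toy stand-ins, the helper names are hypothetical, and taking the empirical standard error as the standard deviation across replicates divided by the square root of their number is an assumption, since the text does not spell out the formula.

```python
import numpy as np


def accuracy_and_slope(status, score):
    """Correlation between status and score, and the slope from
    regressing status on score (cov(status, score) / var(score))."""
    r = np.corrcoef(status, score)[0, 1]
    slope = np.cov(status, score)[0, 1] / np.var(score, ddof=1)
    return r, slope


def summarize(values):
    """Mean and empirical standard error over replicates (assumed: SD / sqrt(k))."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))


# Toy stand-in for five cross-validation replicates.
rng = np.random.default_rng(2)
r_values, slopes = [], []
for _ in range(5):
    score = 0.3 * rng.standard_normal(500)           # predicted risk score
    liability = score + rng.standard_normal(500)     # score plus noise
    status = (liability > 1.0).astype(float)         # observed case/control status
    r, b = accuracy_and_slope(status, score)
    r_values.append(r)
    slopes.append(b)

mean_r, se_r = summarize(r_values)
mean_slope, se_slope = summarize(slopes)
```

In this toy setup the slope is not expected to be 1, because the simulated score is on a liability-like scale while status is binary; the example only shows how the two statistics and their replicate-based standard errors are computed.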
The slope from the regression of disease status on predicted risk score ranged from 0.88 to 1.14 (Table S1), indicating that the risk scores are well calibrated.

Results obtained from a within-study validation might not reflect the true performance when SNP effects estimated from the training data are spuriously associated with the diseases. To better assess the true prediction potential of MTGBLUP, risk scores derived from the complete PGC data were validated in independent samples for schizophrenia, bipolar disorder, and major depressive disorder. As independent validation sets, we used Swedish schizophrenia [28] and bipolar GWAS data [31: Bergen S.E., O’Dushlaine C.T., Ripke S., Lee P.H., Ruderfer D.M., Akterin S., Moran J.L., Chambert K.D., Handsaker R.E., Backlund L., et al. Genome-wide association study in a Swedish population yields support for greater CNV and MHC involvement in schizophrenia compared with bipolar disorder. Mol. Psychiatry. 2012; 17: 880-886] and the GENRED2 major depressive disorder dataset collected by the same methods as reported for the GENRED1 dataset [32: Shi J., Potash J.B., Knowles J.A., Weissman M.M., Coryell W., Scheftner W.A., Lawson W.B., DePaulo Jr., J.R., Gejman P.V., Sanders A.R., et al. Genome-wide association study of recurrent early-onset major depressive disorder. Mol. Psychiatry. 2011; 16: 193-201]. SNPs in the validation data were processed through the same stringent quality control as the discovery data.
The Swedish schizophrenia data were imputed with HapMap3 as reference. The bipolar disorder data and major depressive disorder data were imputed with the 1000 Genomes Project data as reference. Post-imputation quality control was applied to exclude poorly imputed SNPs from the validation sets. Finally, we selected SNPs that matched those in the discovery set. The number of SNPs in each validation set is shown in Table 2. Individuals were removed from the validation datasets if they had relatedness >0.05 to any one of the individuals in the discovery set. Table 2 gives the numbers of case and control subjects in the independent validation datasets before and after excluding related individuals. In the discovery set, we obtained SNP solutions by applying SNP-BLUP (Equation 3) and then projected the SNP solution to the genotypes of the validation individuals (Equation 2). For assessing predictive performance in the independent validation, the correlation and regression coefficients were used as measures of prediction accuracy and biasedness, respectively, similar to the internal validation. A likelihood ratio test (LRT) was used to test for differences in prediction accuracy between methods comparing the likelihood of a logistic regres