With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans. In addition to observed genotypes, our approach allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited. We estimate missing genotypes probabilistically using the Lander-Green or Elston-Stewart algorithms and combine high-resolution SNP genotypes for a subset of individuals in each pedigree with sparser marker data for the remaining individuals. We show that power is increased whenever phenotype information for ungenotyped individuals is included in analyses and that high-density genotyping of just three carefully selected individuals in a nuclear family can recover >90% of the information available if every individual were genotyped, for a fraction of the cost and experimental effort. To aid in study design, we evaluate the power of strategies that genotype different subsets of individuals in each pedigree and make recommendations about which individuals should be genotyped at a high density. To illustrate our method, we performed genomewide association analysis for 27 gene-expression phenotypes in 3-generation families (Centre d'Etude du Polymorphisme Humain pedigrees), in which genotypes for ∼860,000 SNPs in 90 grandparents and parents are complemented by genotypes for ∼6,700 SNPs in a total of 168 individuals. 
In addition to increasing the evidence of association at 15 previously identified cis-acting associated alleles, our genotype-inference algorithm allowed us to identify associated alleles at 4 cis-acting loci that were missed when analysis was restricted to individuals with the high-density SNP data. Our genotype-inference algorithm and the proposed association tests are implemented in software that is available for free.

Rapid advances in genotyping technology and the availability of very large inventories of SNPs are making new strategies for genetic mapping possible [1. The International HapMap Consortium. The International HapMap Project. Nature. 2005; 437: 1299-1320; 2. Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005; 6: 95-108; 3. Abecasis GR, Ghosh D, Nichols TE. Linkage disequilibrium: ancient history drives the new genetics. Hum Hered. 2005; 59: 118-124]. It is now practical to examine hundreds of thousands of SNPs, representing a large fraction of the common variants in the human genome [4. Barrett JC, Cardon LR. Evaluating coverage of genome-wide association studies. Nat Genet. 2006; 38: 659-662; 5. Pe'er I, de Bakker PI, Maller J, Yelensky R, Altshuler D, Daly MJ. Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat Genet. 2006; 38: 663-667], in very large numbers of individuals.
Genetic association studies, which traditionally focused on relatively small numbers of SNPs within candidate genes or regions, can now be performed on a genomic scale. These technological advances, which are revolutionizing human genetics, will greatly impact analytical strategies for family-based association studies. For example, some of the most popular techniques for association analysis of family data are the transmission/disequilibrium test and its extensions [6. Spielman RS, Ewens WJ. A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am J Hum Genet. 1998; 62: 450-458; 7. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993; 52: 506-516; 8. Rabinowitz D. A transmission disequilibrium test for quantitative trait loci. Hum Hered. 1997; 47: 342-350; 9. Abecasis GR, Cardon LR, Cookson WOC. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000; 66: 279-292; 10. Martin ER, Kaplan NL, Weir BS. Tests for linkage and association in nuclear families. Am J Hum Genet. 1997; 61: 439-448], which focus on the transmission of alleles from heterozygous parents to their offspring. The strategy results in association tests that are robust to population stratification, even when a single marker is examined, at the cost of a substantial loss in power on a per-genotype basis [11. Cardon LR, Palmer LJ. Population stratification and spurious allelic association. Lancet. 2003; 361: 598-604; 12. Fulker DW, Cherny SS, Sham PC, Hewitt JK. Combined linkage and association analysis for quantitative traits. Am J Hum Genet. 1999; 64: 259-267]. Loss of power occurs because these methods rely on a single marker to simultaneously provide evidence of association and guard against population stratification. When genotype data are available on a genomic scale, methods that use multiple markers to evaluate the effects of population structure, such as genomic control [13. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999; 55: 997-1004] or structured association mapping [14. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet. 2000; 67: 170-181], are likely to provide a more cost-effective way to guard against population stratification. Thus, as association studies performed on a genomic scale become the norm, we expect that association tests that focus on allelic transmission from heterozygous parents will be replaced by tests that use genomic data to control for stratification.

Another feature that we expect will become important in association tests in the future is the ability to incorporate phenotypes of relatives that are not directly measured for the marker of interest when evidence of association is evaluated [15. Burdick JT, Chen WM, Abecasis GR, Cheung VG. In silico method for inferring genotypes in pedigrees. Nat Genet. 2006; 38: 1002-1004; 16. Li M, Boehnke M, Abecasis GR. Efficient study designs for test of genetic association using sibship data and unrelated cases and controls. Am J Hum Genet. 2006; 78: 778-792; 17. Visscher PM, Duffy DL. The value of relatives with phenotypes but missing genotypes in association studies for quantitative traits. Genet Epidemiol. 2006; 30: 30-36]. Since related individuals share a large fraction of their genetic material, genotypes for one or more individuals in a family can be used to estimate genotypes of their relatives. If flanking-marker data are available, missing genotypes often can be imputed with very high accuracy, and the imputed genotypes provide substantial gains in power [15]. However, even without flanking-marker data, genotypes of relatives can be estimated and used to increase the power of genetic association studies [17]. Unfortunately, most of the currently available family-based association tests consider only the phenotypes of individuals for whom genotype data are available.

Here, we describe two efficient approaches to testing for association between a genetic marker and a quantitative trait that incorporate phenotype information for relatives and that readily allow genomic data to be used to control for stratification. In one approach, evidence of association is evaluated within a computationally demanding maximum-likelihood framework. In another approach, evidence of association is evaluated using a rapid score test that substantially reduces computational time at the expense of a slight loss of power.
When evidence of association at a genetic marker is evaluated, both approaches not only examine individuals for whom genotype and phenotype data are available, but also examine the phenotypes of their relatives, if available. In addition, both approaches can use genotype data at flanking markers to improve estimates of unobserved genotypes and to further increase power. The proposed approaches do not focus on alleles transmitted from heterozygous parents. Instead, to control for stratification in admixed samples, they rely on estimates of the ancestry of each individual to be provided as covariates. These estimates can be computed from genomic data [14; 18. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006; 38: 904-909]. Our approaches can accommodate many distinct pedigree configurations (each with potentially different subsets of genotyped and phenotyped individuals), and, in the “Results” section, we illustrate some of the possibilities through the analysis of simulated and real data sets.

We consider a phenotype of interest, measured in a set of pedigrees, each including one or more related individuals. We let $Y_{ij}$ and $x_{ij}$ denote the observed trait and covariates, respectively, for individual $j$ in family $i$. Similarly, we let $G_{ijm}$ denote the observed genotype at marker $m$ for individual $j$ in family $i$. Different amounts of data may be available or missing for each individual. For example, for some individuals, both phenotype and genotype data may be available; for others, only phenotype data or only genotype data may be available; and, for yet others, neither may be available.
Further note that, in each individual for whom genotype data are available, genotypes may be available for only a subset of markers. For each of the genotyped SNP markers, we are interested in testing whether observed genotypes and phenotypes are associated. For the SNP being tested, we label the two alleles “A” and “a” and define a genotype score, $g_{ijm}$, as 0, 1, or 2, depending on whether $G_{ijm}$ is a/a, A/a, or A/A, respectively. To avoid unnecessarily cumbersome notation, and because we evaluate the evidence of association one SNP at a time, we drop the index $m$ in our presentation below. We consider the model

$$E(Y_{ij}) = \mu + \beta_g g_{ij} + \beta_x x_{ij} . \tag{1}$$

Here, $\mu$ is the population mean, $\beta_g$ is the additive effect for each SNP, and $\beta_x$ is a vector of covariate effects. Recall that the additive genetic effect corresponds to the average change in the phenotype when an allele of type a is replaced with an allele of type A (for details, see the work of Boerwinkle et al. [19. Boerwinkle E, Chakraborty R, Sing CF. The use of measured genotype information in the analysis of quantitative phenotypes in man. I. Models and analytical methods. Ann Hum Genet. 1986; 50: 181-194]). To allow for correlation between different observed phenotypes within each family, we define the variance-covariance matrix $\Omega_i$ for family $i$ as

$$\Omega_{ijk} = \begin{cases} \sigma_a^2 + \sigma_g^2 + \sigma_e^2 & \text{if } j = k \\ \pi_{ijk}\,\sigma_a^2 + 2\varphi_{ijk}\,\sigma_g^2 & \text{if } j \neq k . \end{cases} \tag{2}$$

Here, the parameters $\sigma_a^2$, $\sigma_g^2$, and $\sigma_e^2$ are variance components [20. Hopper JL, Mathews JD. Extensions of multivariate normal models for pedigree analysis. Ann Hum Genet. 1982; 46: 373-383; 21. Lange K, Boehnke M. Extensions to pedigree analysis. IV. Covariance components models for multivariate traits. Am J Med Genet. 1983; 14: 513-524; 22. Amos CI. Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet. 1994; 54: 535-543] defined to account for linked major gene effects, background polygenic effects, and environmental effects, respectively. As usual, $\pi_{ijk}$ denotes identical-by-descent (IBD) sharing between individuals $j$ and $k$ at the location of the SNP being tested, and $\varphi_{ijk}$ denotes the kinship coefficient between the same two individuals. The model defined in equations (1) and (2) or very similar models form the basis of many family-based association tests [9, 12]. These tests perform well when SNP genotypes are available for all (or nearly all) phenotyped individuals, and, below, we extend two of these tests to accommodate individuals for whom genotypes at the SNP being tested are missing. First, we show how estimates of unobserved genotypes can be obtained. Then, we show how these estimates can be incorporated into variance-components-based likelihood-ratio and score tests.

High-throughput SNP genotyping data can be costly and time consuming to generate. When data of this type are generated only for a subset of individuals in each family, it is desirable to estimate genotypes for other individuals in the family, so as to incorporate all available phenotype information in tests of association. One way to accomplish this is to estimate a conditional distribution of the missing genotypes for every individual in the family. In addition to the observed genotypes, this conditional distribution will depend on a vector of intermarker recombination fractions, $\theta$, and a vector of allele frequencies for each marker, $F$.
The intermarker recombination fractions $\theta$ can be obtained from one of the publicly available genetic maps [23. Matise TC, Sachidanandam R, Clark AG, Kruglyak L, Wijsman E, Kakol J, Buyske S, Chui B, Cohen P, de Toma C, et al. A 3.9-centimorgan-resolution human single-nucleotide polymorphism linkage map and screening set. Am J Hum Genet. 2003; 73: 271-284; 24. Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. A high-resolution recombination map of the human genome. Nat Genet. 2002; 31: 241-247] or can be estimated from physical maps, by use of the approximation 1 cM ≈ 1 Mb [23, 24]. Our software implementation can rapidly calculate maximum-likelihood allele-frequency estimates for each locus in most small pedigrees [25. Abecasis GR, Wigginton JE. Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet. 2005; 77: 754-767]. Consider the situation in which $G_{ijm}$ (the genotype at marker $m$ for individual $j$ in family $i$) is unobserved, and let $G_i$ denote all the observed genotype data for family $i$.
Let $\Pr(G_i \mid \theta, F)$ be a function that provides the probability of the observed genotypes $G_i$ conditional on a specific vector of intermarker recombination fractions $\theta$ and allele frequencies $F$. This function can be calculated using the Elston-Stewart [26. Elston RC, Stewart J. A general model for the genetic analysis of pedigree data. Hum Hered. 1971; 21: 523-542] or Lander-Green [27. Lander ES, Green P. Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci USA. 1987; 84: 2363-2367] algorithms, or it can be approximated using Monte-Carlo methods [28. Cannings C, Thompson EA, Skolnick MH. Probability functions on complex pedigrees. Adv Appl Probab. 1978; 10: 26-61; 29. Sobel E, Lange K. Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics. Am J Hum Genet. 1996; 58: 1323-1337]. Then, note that

$$\Pr(G_{ijm} = \text{A/A} \mid G_i, \theta, F) = \frac{\Pr(G_i, G_{ijm} = \text{A/A} \mid \theta, F)}{\Pr(G_i \mid \theta, F)} ,$$

$$\Pr(G_{ijm} = \text{A/a} \mid G_i, \theta, F) = \frac{\Pr(G_i, G_{ijm} = \text{A/a} \mid \theta, F)}{\Pr(G_i \mid \theta, F)} , \text{ and}$$

$$\Pr(G_{ijm} = \text{a/a} \mid G_i, \theta, F) = \frac{\Pr(G_i, G_{ijm} = \text{a/a} \mid \theta, F)}{\Pr(G_i \mid \theta, F)} .$$

One approach [15] for dealing with unobserved genotypes is to check whether any of these conditional probabilities exceeds a predefined threshold (say, 0.99) and then to impute the corresponding genotype. Although this approach would work well in some settings, it could still result in the discarding of useful information. Instead of imputing the most likely genotype, we impute the expected genotype score, $\hat{g}_{ijm}$, which we define as

$$\hat{g}_{ijm} = E(g_{ijm} \mid G_i, \theta, F) = 2\Pr(G_{ijm} = \text{A/A} \mid G_i, \theta, F) + \Pr(G_{ijm} = \text{A/a} \mid G_i, \theta, F) .$$

As detailed below, whenever a genotype is not observed, this expected genotype score $\hat{g}_{ijm}$ can be used in place of the observed genotype $g_{ijm}$.
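As a small illustration of the expected genotype score, the following Python sketch computes the score from a set of conditional genotype probabilities and, for one simple case, obtains the answer by brute-force enumeration rather than with the Lander-Green or Elston-Stewart algorithms. The function names are hypothetical. The enumeration assumes Hardy-Weinberg proportions for two ungenotyped parents, a single SNP with no flanking-marker data, and one genotyped sibling with genotype A/A.

```python
from itertools import product


def expected_genotype_score(p_AA, p_Aa, p_aa):
    """Expected number of A alleles given conditional genotype probabilities."""
    if abs(p_AA + p_Aa + p_aa - 1.0) > 1e-8:
        raise ValueError("genotype probabilities must sum to 1")
    return 2.0 * p_AA + p_Aa


def sibling_expected_score(p):
    """Expected score for an ungenotyped sibling of an A/A individual.

    Assumption (illustration only): parents are ungenotyped, their alleles
    are drawn independently with frequency p for allele A, and no
    flanking-marker information is used.
    """
    joint = 0.0     # sum of Pr(configuration, sib 1 = A/A) * score of sib 2
    marginal = 0.0  # Pr(sib 1 = A/A)
    # Parental alleles: 1 codes allele A, 0 codes allele a.
    for alleles in product((0, 1), repeat=4):
        father, mother = alleles[:2], alleles[2:]
        prior = 1.0
        for allele in alleles:
            prior *= p if allele else (1.0 - p)
        # i, j pick the alleles transmitted to sib 1; k, l those for sib 2.
        for i, j, k, l in product((0, 1), repeat=4):
            weight = prior / 16.0
            if father[i] + mother[j] == 2:  # sib 1 observed as A/A
                marginal += weight
                joint += weight * (father[k] + mother[l])
    return joint / marginal
```

Under these assumptions, `sibling_expected_score(p)` equals 1 + p (for example, 1.3 when p = 0.3), which is the score expected for an ungenotyped sibling of an A/A individual when no other marker data are available.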
Whatever approach is used to calculate the likelihood of the different genotype configurations, note that all genotype configurations whose likelihoods are evaluated differ by only one or two genotypes; thus, many portions of the likelihood calculation can be reused. By use of our implementation of the Lander-Green algorithm [25; 30. Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002; 30: 97-101], these expected genotype scores can be calculated very rapidly in most small pedigrees (typically, only a few seconds are required to calculate expected genotype scores for ∼500,000 markers in a small sibship). The Lander-Green algorithm updates the likelihood calculation one marker at a time, and its complexity increases exponentially with pedigree size. For larger pedigrees (e.g., those with >15 individuals), we have implemented an Elston-Stewart version of the approach, complete with genotype elimination [31. Lange K, Goradia TM. An algorithm for automatic genotype elimination. Am J Hum Genet. 1987; 40: 250-256]. The Elston-Stewart algorithm is designed for pedigrees with no inbreeding and factors the likelihood calculation by individual. Its complexity increases exponentially with the number of markers being analyzed, so that only a subset of the available flanking markers can be used to estimate each unobserved genotype (typically, 5–10 flanking markers can be used, depending on the pattern of missing data in the pedigree). Both implementations are available with source code from our Web sites (Ghost and Merlin).
Figure 1 provides an example of how the expected genotype scores are coded. In figure 1A, only the first sibling is genotyped, and no genotype information is available for the other three siblings. Thus, the first sibling is assigned a genotype score of 2 (corresponding to two copies of allele A), whereas the other siblings are assigned identical genotype scores of $1+p$ (where $p$ is the population frequency of allele A). In figure 1B, information at flanking markers is available for all individuals, specifying IBD sharing patterns in the family and resulting in distinct expected genotype scores for each of the siblings (note that, in this case, genotypes could only be inferred for the fourth sibling). In figure 1C, genotype information at the candidate marker is available for one additional sibling, and all genotype scores become integers. In the situation depicted in figure 1C, it would actually be possible to impute genotypes for the third and fourth siblings as A/a and A/A.

To accommodate individuals with missing genotype data, we extend our model by replacing equation (1) with

$$E(Y_{ij}) = \mu + \beta_g \hat{g}_{ij} + \beta_x x_{ij} .$$

In this setting, although the above equality holds, the variance-components model given in equation (2) is only approximate (because the variance of each $Y_{ij}$ around $E(Y_{ij})$ will be slightly smaller when the genotype score is known and the marker being tested is associated with the trait than when the genotype score is estimated). However, we note that (i) simulations suggest that our method performs correctly and (ii) since most genotypes will have no impact or only a small impact on the trait, the differences between our approximation and more-accurate but cumbersome approaches should be slight.

One natural way to test association is to consider the multivariate normal likelihood

$$L = \prod_i (2\pi)^{-n_i/2} \, |\Omega_i|^{-1/2} \, \exp\left\{-\tfrac{1}{2}\,[y_i - E(y_i)]' \, \Omega_i^{-1} \, [y_i - E(y_i)]\right\} .$$

Here, $n_i$ is the number of phenotyped individuals in family $i$ and $|\Omega_i|$ is the determinant of matrix $\Omega_i$.
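The contribution of a single family to the log of this likelihood can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Merlin implementation: the function names are hypothetical, the IBD-sharing matrix `pi_hat` and kinship matrix `phi` are assumed to be supplied by pedigree-analysis software, and a plain Cholesky factorization stands in for whatever linear algebra a production program would use.

```python
import math


def build_omega(pi_hat, phi, var_a, var_g, var_e):
    """Variance-covariance matrix for one family, following equation (2)."""
    n = len(phi)
    return [[var_a + var_g + var_e if j == k
             else pi_hat[j][k] * var_a + 2.0 * phi[j][k] * var_g
             for k in range(n)] for j in range(n)]


def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = matrix[r][c] - sum(low[r][t] * low[c][t] for t in range(c))
            if r == c:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[r][c] = math.sqrt(s)
            else:
                low[r][c] = s / low[c][c]
    return low


def family_log_likelihood(y, mean, omega):
    """Multivariate normal log-likelihood for one family's phenotypes."""
    n = len(y)
    low = cholesky(omega)
    resid = [yi - mi for yi, mi in zip(y, mean)]
    # Solve low * z = resid; the quadratic form resid' Omega^-1 resid is z'z.
    z = []
    for r in range(n):
        z.append((resid[r] - sum(low[r][t] * z[t] for t in range(r))) / low[r][r])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[r][r]) for r in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

Summing `family_log_likelihood` over families gives ln L for a given set of parameter values; a numerical optimizer (not shown) would then search over the mean parameters and variance components.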
The likelihood can be maximized numerically with respect to the parameter $\mu$ and the coefficients $\beta_g$ and $\beta_x$ (which together define the expected phenotype vector for family $i$, $E(y_i)$) and the variance components $\sigma_a^2$, $\sigma_g^2$, and $\sigma_e^2$ (which together define the variance-covariance matrix for family $i$, $\Omega_i$). To test for association, we first maximize the likelihood under the null hypothesis with the constraint that $\beta_g = 0$ and denote the resulting likelihood as $L_0$. We then repeat the procedure without constraints on the parameters, to obtain $L_1$. Then, a likelihood-ratio test (LRT) statistic that is asymptotically distributed as $\chi^2$ with 1 df can be used to evaluate the evidence of association:

$$T_{\mathrm{LRT}} = 2\ln L_1 - 2\ln L_0 .$$

The LRT statistic above requires that $L_0$ and $L_1$ be maximized numerically for each SNP, a procedure that can become computationally prohibitive on a genomewide scale. Maximization of $L_0$ is required because estimates of $\sigma_a^2$ depend on the observed patterns of IBD sharing at each location. When available computing time is limited, an alternative approach is to first fit a simple variance-components model to the data (with parameters $\mu$, $\beta_x$, $\sigma_g^2$, and $\sigma_e^2$ but without parameters $\beta_g$ and $\sigma_a^2$). This model provides a vector of fitted values for each family, which we denote $E(y_i)^{(\mathrm{base})}$, and an estimate of the variance-covariance matrix for each family, which we denote $\Omega_i^{(\mathrm{base})}$. Using these two quantities, we define the score statistic

$$T_{\mathrm{SCORE}} = \frac{\left\{\sum_i [\hat{g}_i - E(\hat{g}_i)]' \, [\Omega_i^{(\mathrm{base})}]^{-1} \, [y_i - E(y_i)^{(\mathrm{base})}]\right\}^2}{\sum_i [\hat{g}_i - E(\hat{g}_i)]' \, [\Omega_i^{(\mathrm{base})}]^{-1} \, [\hat{g}_i - E(\hat{g}_i)]} ,$$

where $\hat{g}_i$ is a vector with expected genotype scores for each individual in the $i$th family, calculated conditional on the available marker data, and $E(\hat{g}_i)$ is a vector with identical elements that give the unconditional expectation of each genotype score. This expectation is $2p$, or twice the frequency of allele A at the SNP being tested.
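The score statistic defined above can be sketched in Python as follows. This is a minimal illustration rather than the released software: the function names and the per-family tuple layout are hypothetical, the base-model quantities are assumed to have been fitted already, and the P value uses the χ² approximation with 1 df through the identity Pr(χ²₁ > t) = erfc(√(t/2)).

```python
import math


def solve_spd(matrix, rhs):
    """Solve matrix * x = rhs for a symmetric positive-definite matrix."""
    n = len(rhs)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):  # Cholesky factorization: matrix = low * low'
        for c in range(r + 1):
            s = matrix[r][c] - sum(low[r][t] * low[c][t] for t in range(c))
            if r == c:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[r][c] = math.sqrt(s)
            else:
                low[r][c] = s / low[c][c]
    z = [0.0] * n
    for r in range(n):  # forward substitution: low * z = rhs
        z[r] = (rhs[r] - sum(low[r][t] * z[t] for t in range(r))) / low[r][r]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution: low' * x = z
        x[r] = (z[r] - sum(low[t][r] * x[t] for t in range(r + 1, n))) / low[r][r]
    return x


def score_statistic(families, p):
    """Score statistic from per-family base-model quantities.

    families: iterable of (g_hat, omega_base, y, fitted_base) tuples, one per
    family (a hypothetical layout chosen for this sketch).
    p: frequency of allele A, so each expected genotype score is centered at 2p.
    """
    numerator = 0.0
    denominator = 0.0
    for g_hat, omega_base, y, fitted_base in families:
        g_centered = [g - 2.0 * p for g in g_hat]
        resid = [yi - fi for yi, fi in zip(y, fitted_base)]
        weighted = solve_spd(omega_base, g_centered)  # Omega^-1 (g_hat - 2p)
        numerator += sum(w * r for w, r in zip(weighted, resid))
        denominator += sum(w * g for w, g in zip(weighted, g_centered))
    if denominator <= 0.0:
        raise ValueError("no genotype variation: statistic is undefined")
    return numerator * numerator / denominator


def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))
```

Because the base-model quantities do not depend on the SNP being tested, a genomewide scan only needs to recompute the centered genotype scores and the two sums for each marker.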
The value $2p$ arises from the assumption of Hardy-Weinberg equilibrium in the population; before conditioning on genotypes of related individuals, we have probability $p^2$ of observing genotype A/A and probability $2p(1-p)$ of observing genotype A/a. Thus, for any $i$ and $j$, we have $E(\hat{g}_{ij}) = E(g_{ij}) = 2\Pr(G_{ij} = \text{A/A}) + \Pr(G_{ij} = \text{A/a}) = 2p^2 + 2p(1-p) = 2p$. $T_{\mathrm{SCORE}}$ is approximately distributed as $\chi^2$ with 1 df. In contrast to the $T_{\mathrm{LRT}}$ statistic, which requires one round of numerical maximization for each marker, the $T_{\mathrm{SCORE}}$ statistic requires only a single round of numerical optimization to estimate $\Omega_i^{(\mathrm{base})}$ and $E(y_i)^{(\mathrm{base})}$. Thus, the $T_{\mathrm{SCORE}}$ statistic should provide a useful and computationally efficient screening tool for genomewide studies. In our preliminary analyses, it allows genomewide association scans in data sets that include thousands of individuals in modest-sized pedigrees (≤15 individuals) to be completed within a few hours. It is important to note that the distribution of $T_{\mathrm{SCORE}}$ will deviate from $\chi^2$ when $\sigma_a^2$ is large. In practice, $T_{\mathrm{SCORE}}$ should be used for an initial screening phase in genomewide studies, and promising findings should be reevaluated with the $T_{\mathrm{LRT}}$ statistic to avoid an excess of false-positive results in regions of strong linkage. The number of promising statistics that can be reevaluated with $T_{\mathrm{LRT}}$ will depend on the available computational resources. We recommend that at least those statistics selected for further follow-up should be evaluated with $T_{\mathrm{LRT}}$.

To evaluate the performance of our approach, we simulated different types of pedigrees and patterns of missing genotype data at the SNP being tested for association. Unless otherwise specified, we simulated a SNP with a minor-allele frequency (MAF) of 0.30 that explained 5% of the trait variance and simulated background polygenic effects that accounted for a further 35% of the trait variability.
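One way to translate these simulation settings into an additive effect size is sketched below in Python. The function name is hypothetical, and the calculation assumes Hardy-Weinberg equilibrium, a purely additive SNP, and a trait standardized to unit variance; the text does not state how the simulation engine parameterizes the effect, so this is an illustration only.

```python
import math


def additive_effect(maf, fraction_explained, total_variance=1.0):
    """Additive effect per allele for a SNP explaining a given variance share.

    Under Hardy-Weinberg equilibrium the genotype score has variance
    2 * maf * (1 - maf), so an additive SNP with effect beta contributes
    beta**2 * 2 * maf * (1 - maf) to the trait variance.
    """
    genotype_variance = 2.0 * maf * (1.0 - maf)
    return math.sqrt(fraction_explained * total_variance / genotype_variance)


# Default settings described above: MAF 0.30, 5% of the variance from the
# SNP, 35% polygenic, and the remaining 60% residual.
beta_g = additive_effect(0.30, 0.05)
residual_fraction = 1.0 - 0.05 - 0.35
```

With these settings the additive effect is roughly 0.345 trait standard deviations per copy of the allele.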
In addition, we simulated genotype data for a 0.3-cM grid of 50 equally spaced flanking SNPs, each with two equally frequent alleles. This should be approximately analogous to using 10,000 SNP markers across the genome to genotype individuals not selected for high-density scanning. We implemented our simulation engine within Merlin [25, 30], allowing others to easily reproduce our results and simulations. To summarize analyses of simulated data, we report expected LOD scores (ELODs), which were calculated as the average of the LOD scores estimated after analysis of each replicate. As usual, LOD scores were defined as $\chi^2 / (2\ln 10)$.

To examine the performance of our method in a real data set, we reanalyzed the data of Cheung et al. [32. Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005; 437: 1365-1369]. The original analysis of Cheung et al.32Cheung VG Spielman RS Ewens KG Weber TM Morley M Burdick JT Mapping determinants of human gene expression by regional and genome-wide association.Nature. 2005; 437: 1365-1369