Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million–SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ⩾0.8.

The size and scale of genetic-variation data sets for both disease and population studies have increased enormously.
A large number of SNPs have been identified (current databases show 9 million of the posited 10–13 million common SNPs in the human genome [International HapMap Consortium 2005]); genotyping technology has advanced at a dramatic pace, so that 500,000 SNP assays can be undertaken in a single experiment; and patterns of correlations among SNPs (linkage disequilibrium [LD]) have been catalogued in multiple populations, yielding efficient marker panels for genomewide investigations (see the International HapMap Project Web site). These genetic advances coincide with recognition of the need for large case-control samples to robustly identify genetic variants for complex traits. As a result, genomewide association studies are now being undertaken, and much effort is being made to develop efficient statistical techniques for analyzing the resulting data, to uncover the location of disease genes. In addition, the advances allow much more detailed analysis of candidate genes identified by more traditional linkage-analysis methods.

Many methods of mapping disease genes assume that haplotypes from case and control individuals are available in the region of interest. Such approaches have been successful in localizing many monogenic disorders (Lazzeroni 2001), and there is increasing evidence, of both a practical and theoretical nature, that the use of haplotypes can be more powerful than individual markers in the search for more-complex traits (Puffenberger et al. 1994; Akey et al. 2001; Hugot et al. 2001; Rioux et al. 2001). Similarly, haplotypes are required for many population-genetics analyses, including some methods for inferring selection (Sabeti et al. 2002), and for studying recombination (Fearnhead and Donnelly 2001; Myers and Griffiths 2003) and historical migration (Beerli and Felsenstein 2001; De Iorio and Griffiths 2004).

It is possible to determine haplotypes by use of experimental techniques, but such approaches are considerably more expensive and time-consuming than modern high-throughput genotyping. The statistical determination of haplotype phase from genotype data is thus potentially very valuable if the estimation can be done accurately. This problem has received an increasing amount of attention over recent years, and several computational and statistical approaches have been developed in the literature (see Salem et al. [2005] for a recent literature review). Existing methods include parsimony approaches (Clark 1990; Gusfield 2000, 2001), maximum-likelihood methods (Excoffier and Slakin 1995; Hawley and Kidd 1995; Long et al. 1995; Fallin and Schork 2000; Qin et al. 2002), Bayesian approaches based on conjugate priors (Lin et al. 2002, 2004b; Niu et al. 2002) and on priors from population genetics (Stephens et al. 2001; Stephens and Donnelly 2003; Stephens and Scheet 2005), and (im)perfect phylogeny approaches (Eskin et al. 2003; Gusfield 2003). Up to now, no comprehensive comparison of many of these approaches has been conducted.

The forthcoming era of genomewide studies presents two new challenges to the endeavor of haplotype-phase inference. First, the size of data sets that experimenters will want to phase is about to increase dramatically, in terms of both numbers of loci and numbers of individuals.
For example, we might expect data sets consisting of 500,000 SNPs genotyped in 2,000 individuals in some genomewide studies. Second, to date, most approaches have focused on inferring haplotypes from samples of unrelated individuals, but estimation of haplotypes from samples of related individuals is likely to become important.

When inferring haplotypes within families, substantially more information is available than for samples of unrelated individuals. For example, consider the situation in which a father-mother-child trio has been genotyped at a given SNP locus. With no missing data, phase can be determined precisely, unless all three individuals are heterozygous at the locus in question. Of loci with a minor-allele frequency of 20%, for example, just 5.1% will be phase unknown in trios, but this rises to 32% in unrelated individuals. With missing data, other combinations of genotypes can also fail to uniquely determine phase.

In this study, we describe the extension of several existing algorithms for dealing with trio data. We then describe a comprehensive evaluation of the performance of these algorithms for both trios and unrelated individuals. The evaluation uses both simulated and real data sets of a larger size (in terms of numbers of SNPs) than has previously been considered. We draw the encouraging conclusion that all methods provide a very good level of accuracy on trio data sets. Overall, the PHASE (v2.1) algorithm provided the most accurate estimation on all the data sets considered. For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap CEPH trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this study had comparable but slightly worse error rates.
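The 5.1% and 32% figures quoted above for a minor-allele frequency of 20% can be reproduced with a short calculation. The sketch below is illustrative only: it assumes Hardy-Weinberg proportions in the parents and Mendelian transmission to the child (assumptions supplied here, not stated in the text), and the function names are hypothetical. A locus is counted as phase unknown when an unrelated individual is heterozygous, or when all three members of a trio are heterozygous.

```python
from itertools import product


def genotype_probs(maf):
    """Hardy-Weinberg genotype probabilities, keyed by minor-allele count 0, 1, 2."""
    p, q = maf, 1.0 - maf
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def transmit_probs(g):
    """Probability that a parent with genotype g transmits 0 or 1 minor alleles."""
    return {0: 1.0 - g / 2.0, 1: g / 2.0}


def ambiguous_unrelated(maf):
    """An unrelated individual is phase unknown at a locus when heterozygous."""
    return genotype_probs(maf)[1]


def ambiguous_trio(maf):
    """A trio is phase unknown only when father, mother, and child are all heterozygous."""
    gp = genotype_probs(maf)
    total = 0.0
    # Enumerate parental genotypes, then the allele each parent transmits.
    for gf, gm in product(gp, gp):
        tf, tm = transmit_probs(gf), transmit_probs(gm)
        for a, b in product(tf, tm):
            gc = a + b  # child genotype implied by the two transmitted alleles
            if gf == gm == gc == 1:
                total += gp[gf] * gp[gm] * tf[a] * tm[b]
    return total


print(f"unrelated: {ambiguous_unrelated(0.2):.3f}")
print(f"trio:      {ambiguous_trio(0.2):.4f}")
```

Under these assumptions the heterozygote frequency is 2 × 0.2 × 0.8 = 0.32, and the all-heterozygous trio probability is 0.32 × 0.32 × 0.5 = 0.0512, matching the values in the text.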
The error rates for trios are comparable to expected levels of genotyping error and missing data and highlight the level of accuracy that the best phasing algorithms can provide on a useful scale. We also observed substantial variation in the speed of the algorithms we considered. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million–SNP HapMap data set (International HapMap Consortium 2005). In addition, the data sets used in this comparison will be made available, to form a benchmark set to aid the future development and assessment of phasing algorithms. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs. The most accurate method for estimating r² was to first use PHASE to infer the haplotypes across the region and then to estimate r² between the pair of SNPs as if the haplotypes were known. All methods estimated r² well when the estimated value was ⩾0.8.

In this section, we describe the algorithms implemented in this study. Since most of these algorithms have been described elsewhere, we give only a brief overview of each method, together with some details concerning how each method was extended to cope with father-mother-child trios. Following a description of our notation and the assumptions made by each method, there is one subsection for each new method. Individuals who contributed to the development of the trio version of each method are shown in parentheses as part of the subsection title. In each subsection, expressed opinions are those of the contributing authors of that subsection and not of the combined set of authors as a group. We conclude with a concise overview that relates the different methods according to the assumptions they make about the most-plausible haplotype reconstructions.
We consider $m$ linked SNPs on a chromosomal region of $n$ trio families, where each trio consists of a mother, a father, and one offspring. We use the following notation throughout. Let $G=(G_1,\ldots,G_n)$ denote all the observed genotypes, in which $G_i=(G_{M_i},G_{F_i},G_{C_i})$ denotes the $i$th trio. $G_{F_i}$, $G_{M_i}$, and $G_{C_i}$ denote the observed genotype data for the father, mother, and child, respectively, and each is a vector of length $m$; that is, $G_{F_i}=(G_{F_i 1},\ldots,G_{F_i m})$, with $G_{F_i k}=0$, 1, or 2 representing homozygous wild-type, heterozygous, or homozygous mutant genotypes, respectively, at SNP marker $k$. Similarly, let $H=(H_1,H_2,\ldots,H_n)$ denote the unobserved haplotype configurations compatible with $G$, in which $H_i=(H_{M_i},H_{F_i})$, where $H_{M_i}=(H_{M_i 1},H_{M_i 2})$ and $H_{F_i}=(H_{F_i 1},H_{F_i 2})$ denote the haplotype pairs of the mother and father, respectively. We use the notation $H_{F_i 1}\oplus H_{F_i 2}=G_{F_i}$ to indicate that the two haplotypes are compatible with the genotype $G_{F_i}$. Also, we let $\Theta=(\theta_1,\ldots,\theta_s)$ be a vector of unknown population haplotype frequencies of the $s$ possible haplotypes that are consistent with the sample. All of the following algorithms make the assumption that all the parents are sampled independently from the population and that no recombination occurs in the transmission of haplotypes from the parents to children.

The PHASE algorithm (Stephens et al. 2001; Stephens and Donnelly 2003; Stephens and Scheet 2005) is a Bayesian approach to haplotype inference that uses ideas from population genetics (in particular, coalescent-based models) to improve accuracy of haplotype estimates for unrelated individuals sampled from a population. The algorithm attempts to capture the fact that, over short genomic regions, sampled chromosomes tend to cluster together into groups of similar haplotypes. With the explicit incorporation of recombination in the most recent version of the algorithm (Stephens and Scheet 2005), this clustering of haplotypes may change as one moves along a chromosome. The method uses a flexible model for the decay of LD with distance that can handle both “blocklike” and “nonblocklike” patterns of LD.

We extended the algorithm described by Stephens and Scheet (2005) to allow for data from trios (two parents and one offspring). We treat the parents as a random sample from the population and aim to estimate their haplotypes, taking into account both the genotypes of the parents and the genotype of the child. More specifically, we aim to sample from the distribution $\Pr(H_F,H_M\mid G_F,G_M,G_C)$ (compared with sampling from $\Pr(H_F,H_M\mid G_F,G_M)$, as shown in the work by Stephens and Scheet [2005]). To do this, we use a Markov chain–Monte Carlo (MCMC) algorithm very similar to that of Stephens and Scheet (2005), but, instead of updating one individual at a time, we update pairs of parents simultaneously. Note that the observed genotypes may include missing data at some loci, in which case the inferred haplotype pairs will include estimates of the unobserved alleles. When updating the parents in trio $i$, this involves computing, for each possible pair of haplotype combinations $(H_{F_i}=\{h_f,h_f'\};\ H_{M_i}=\{h_m,h_m'\})$ in the two parents, the probability

$$\Pr\left(H_{F_i}=\{h_f,h_f'\},\,H_{M_i}=\{h_m,h_m'\}\mid G_{F_i},G_{M_i},G_{C_i},H_{F_{-i}},H_{M_{-i}},\rho\right)\propto\alpha_i\beta_i\gamma_i ,$$

where

$$\alpha_i=(2-\delta_{h_f h_f'})\,\pi(h_f\mid H_{F_{-i}},H_{M_{-i}},\rho,\mu)\,\pi(h_f'\mid H_{F_{-i}},H_{M_{-i}},\rho,\mu) ,$$

$$\beta_i=(2-\delta_{h_m h_m'})\,\pi(h_m\mid H_{F_{-i}},H_{M_{-i}},\rho,\mu)\,\pi(h_m'\mid H_{F_{-i}},H_{M_{-i}},\rho,\mu) ,$$

and

$$\gamma_i=\Pr\left[G_{C_i}\mid H_{F_i}=(h_f,h_f'),\,H_{M_i}=(h_m,h_m')\right] ,$$

and where $\delta_{hh'}$ is 1 if $h=h'$ and is 0 otherwise; $H_{F_{-i}}$ and $H_{M_{-i}}$ are the sets $H_F$ and $H_M$ with $H_{F_i}$ and $H_{M_i}$ removed, respectively; $\pi$ is a modification of the conditional distribution of Fearnhead and Donnelly (2001); $\rho$ is an estimate of the population-scaled recombination rate, which is allowed to vary along the region being considered; and $\mu$ is a parameter that controls the mutation rate (see Stephens and Scheet [2005] for more details).
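The compatibility relation ⊕ and the child term γ in the expression above are simple enough to sketch directly. The code below is a minimal illustration, not the PHASE implementation: haplotypes are represented as tuples of 0/1 alleles, the function names are hypothetical, and a missing child genotype is encoded as `None` and treated as compatible with any allele pair (an assumption made here for illustration). It relies on the stated assumptions of no recombination in transmission and no genotyping error, so each parent passes on one of its two haplotypes intact with probability 1/2.

```python
from itertools import product


def compatible(h1, h2, genotype):
    """True when h1 (+) h2 = genotype; None in the genotype marks a missing call."""
    return all(g is None or a + b == g for a, b, g in zip(h1, h2, genotype))


def delta_weight(h1, h2):
    """The factor (2 - delta): 1 for two identical haplotypes, 2 otherwise."""
    return 1 if h1 == h2 else 2


def gamma(father_pair, mother_pair, child_genotype):
    """Pr(child genotype | parental haplotype pairs) with no recombination.

    Each of the four (paternal, maternal) haplotype combinations has
    probability 1/4; gamma is the share that matches the child's genotype.
    """
    hits = sum(
        compatible(hf, hm, child_genotype)
        for hf, hm in product(father_pair, mother_pair)
    )
    return hits / 4.0


father = ((0, 1), (1, 0))
mother = ((0, 0), (1, 1))
print(gamma(father, mother, (0, 1)))     # one of four transmissions matches
print(gamma(father, mother, (1, 1)))     # no transmission matches
print(gamma(father, mother, (None, 1)))  # first SNP missing in the child
```

The second call illustrates why the child's genotype prunes the search: a parental configuration that cannot produce the observed child has γ equal to 0 and need not be considered further.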
The probability $\Pr[G_{C_i}\mid H_{F_i}=(h_f,h_f'),\,H_{M_i}=(h_m,h_m')]$ is calculated assuming no recombination from parents to offspring and is therefore trivial to compute. We also assume no genotyping error. As a result, this probability is typically equal to 0 for a large number of parental diplotype configurations consistent with the parental genotypes, so the children's genotype data substantially reduces the number of diplotype configurations that must be considered. As in the work of Stephens and Scheet (2005), we use Partition Ligation (Niu et al. 2002) to further reduce the number of diplotype configurations considered when estimating haplotypes over many markers. This approach is not the most efficient, but it involved few changes to the existing algorithm.

The model underlying wphase was developed on the basis of ideas proposed by Fearnhead and Donnelly (2001) that introduced a simple approximate model for haplotypes sampled from a population. The algorithm differs from the PHASE algorithm above in three ways:

1. PHASE uses MCMC to sample configurations, whereas wphase performs a discrete hill climb. wphase computes a pseudolikelihood function or score for a putative haplotype reconstruction, $H$, of the form $S(H)=\prod_{i=1}^{n}\alpha_i\beta_i\gamma_i$, where $\alpha_i$, $\beta_i$, and $\gamma_i$ are defined as in the description of PHASE above. The method attempts to maximize the score by iteratively applying a set of “moves” that make small changes to the reconstruction.
2. PHASE and wphase differ in the precise form of the conditional distributions, $\pi$, used to calculate the factors $\alpha_i$ and $\beta_i$. As explained above, PHASE uses a modification of the conditional distribution of Fearnhead and Donnelly (2001), whereas wphase uses the conditional distributions introduced by Li and Stephens (2003).
3. PHASE internally re-estimates a variable recombination rate across the region, whereas wphase uses an externally input constant recombination rate across the region. Specifically, wphase uses $\rho=0.05$ and $\theta=0.02$.

In our opinion, the second and third differences are more important than the first. Although use of an MCMC offers some theoretical advantages, particularly the possibility of inference with use of multiple imputation of haplotypes, this is rarely used in practice (see David Clayton's SNPHAP algorithm for a notable exception [Clayton Web site]). If only one haplotype reconstruction is to be used (e.g., in HapMap), then maximizing a pseudolikelihood function is likely to produce a good solution. Testing in simulation has shown that wphase nearly always returns a score that is as good as or better than the value of the true haplotypes. This suggests that the quality of the reconstruction can be improved only by refining the score, not by altering the details of the hill climb.
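The general shape of a discrete hill climb over haplotype reconstructions can be sketched as follows. This is not the wphase algorithm: the only move used here is flipping the phase at a single heterozygous site, and the score is a deliberately simple stand-in (fewer distinct haplotypes is better) rather than the pseudolikelihood S(H). All names are hypothetical, and individuals are treated as unrelated to keep the example short.

```python
def flip(pair, k):
    """Swap the two alleles at locus k: same genotype, different phasing."""
    a, b = list(pair[0]), list(pair[1])
    a[k], b[k] = b[k], a[k]
    return (tuple(a), tuple(b))


def toy_score(recon):
    """Stand-in score (NOT the wphase pseudolikelihood): fewer distinct haplotypes scores higher."""
    return -len({h for pair in recon for h in pair})


def hill_climb(recon, score=toy_score):
    """Greedy search: apply any single-site flip that strictly improves the score."""
    recon = list(recon)
    best = score(recon)
    improved = True
    while improved:
        improved = False
        for i, pair in enumerate(recon):
            for k in range(len(pair[0])):
                if pair[0][k] == pair[1][k]:
                    continue  # homozygous site: flipping changes nothing
                candidate = recon[:i] + [flip(pair, k)] + recon[i + 1:]
                s = score(candidate)
                if s > best:
                    recon, best, improved = candidate, s, True
                    break
            if improved:
                break  # restart the scan from the improved reconstruction
    return recon, best


# Two homozygous individuals and one double heterozygote with an arbitrary
# starting phase.
start = [((0, 0), (0, 0)), ((1, 1), (1, 1)), ((0, 1), (1, 0))]
final, final_score = hill_climb(start)
print(final[2], final_score)
```

Because every accepted move strictly increases a bounded score, the loop terminates, but only at a local optimum. The observation in the text that wphase usually reaches a score at least as good as that of the true haplotypes is what suggests that the search itself is not the limiting factor.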
The difference in the form of the conditional distributions described above may lead to improved reconstructions (Stephens and Scheet 2005). In the special case of the resolution of singleton SNPs that occur in the same individual, the conditional distributions used with PHASE will result in a more plausible solution than those used with wphase. The effect this difference has for nonsingleton SNPs remains unclear. In addition, internally estimating a variable recombination rate is important, and its absence is a major weakness of the current version of wphase. True recombination rates vary greatly across the genome (McVean et al. 2004; Myers et al. 2005) and between various simulated regions in our test set. Initial comparisons with PHASE version 1 (Stephens et al. 2001) at the time of development showed wphase to have very similar performance but not enough improvement to make it important to publish quickly.
Since then, wphase has hardly improved, the main change being support for trio data, but PHASE underwent a major revision, with significant performance enhancements (Stephens and Donnelly 2003; Stephens and Scheet 2005).

Haplotype and missing data inference was performed with HAP2, the details of which have been published elsewhere (Lin et al. 2004b). In short, HAP2 takes a Bayesian approach to haplotype reconstruction, set forth by Stephens et al. (2001), of dynamically updating an individual's haplotypes to resemble other haplotypes in the s