Studies of copy-number variation and linkage disequilibrium (LD) have typically excluded complex regions of the genome that are rich in duplications and prone to rearrangement. In an attempt to assess the heritability and LD of copy-number polymorphisms (CNPs) in duplication-rich regions of the genome, we profiled copy-number variation in 130 putative “rearrangement hotspot regions” among 269 individuals of European, Yoruba, Chinese, and Japanese ancestry analyzed by the International HapMap Consortium. Eighty-four hotspot regions, corresponding to 257 bacterial artificial chromosome (BAC) probes, showed evidence of copy-number differences. Despite a predisposing genetic architecture, no polymorphism was ever observed in the remaining 46 “rearrangement hotspots,” and we suggest these represent excellent candidate sites for pathogenic rearrangements. We used a combination of BAC-based and high-density customized oligonucleotide arrays to resolve the molecular basis of structural rearrangements. For common variants (frequency >10%), we observed a distinct bias against copy-number losses, suggesting that deletions are subject to purifying selection. Heritability estimates did not differ significantly from 1.0 among the majority (30 of 34) of loci analyzed, consistent with normal Mendelian inheritance. Some of the CNPs in duplication-rich regions showed strong LD with nearby single-nucleotide polymorphisms (SNPs) and were observed to segregate on ancestral SNP haplotypes. However, LD with the best available SNP markers was weaker than has been reported for deletion polymorphisms in less complex regions of the genome. These observations may be accounted for by a low density of SNP data in duplicated regions, challenges in mapping and typing the CNPs, and the possibility that CNPs in these regions have rearranged on multiple haplotype backgrounds. Our results underscore the need for complete maps of genetic variation in duplication-rich regions of the genome. 
Variation in the human genome occurs on multiple levels, from the SNP to larger events involving contiguous blocks of DNA sequence that vary in copy number between individuals. Although the technological development of SNP detection and genotyping methods has progressed significantly in the past decade, the ability to detect copy-number variants (CNVs) on a genomewide scale has emerged only recently. Current array-based methods typically detect CNVs ≥40 kb in size, and variation at this level of resolution has been shown to occur frequently in the human population [1: Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004;36:949-951] [2: Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H, Walker M, Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC, Trask B, Patterson N, Zetterberg A, Wigler M. Large-scale copy number polymorphism in the human genome. Science. 2004;305:525-528] [3: Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM, Clark RA, Schwartz S, Segraves R, Oseroff VV, Albertson DG, Pinkel D, Eichler EE. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005;77:78-88]. On the basis of a report published elsewhere, it has been estimated that any two individuals differ by >11 CNVs that are >100 kb [2]. At a finer level of resolution, a recent analysis comparing a single individual with the reference human genome identified 297 intermediate-sized structural variants (ISVs) in the 8–200-kb range (77 events >40 kb) [4: Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE. Fine-scale structural variation of the human genome. Nat Genet. 2005;37:727-732]. Structural variation is therefore an important subject for study, not only to understand the full spectrum of human genetic variation, but also to assess the significance of such variation in disease-association studies.

Several consistent themes have emerged from recently published studies of copy-number polymorphisms (CNPs), defined as CNVs with a frequency >1%. Of primary importance to understanding the relationship between genotype and phenotype is the fact that CNPs are frequently found in genic regions. This association is exemplified by studies of toxin sensitivity and variation in the copy number of members of the glutathione S-transferase gene family GSTT1 and GSTM1 [5: Buckland PR. Polymorphically duplicated genes: their relevance to phenotypic variation in humans. Ann Med. 2003;35:308-315]. Also, CNPs and ISVs have been found, by multiple genomewide approaches [1-4], to be enriched in regions of intrachromosomal segmental duplication, and many deletions have been shown to be flanked by pairs of paralogous sequences in a direct orientation [6: McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S, Gabriel SB, Lee C, Daly MJ, Altshuler DM, The International HapMap Consortium. Common deletion polymorphisms in the human genome. Nat Genet. 2006;38:86-92]. These findings indicate that genes found in regions of segmental duplication are more likely to vary in copy number in the human population.

The majority of CNP studies to date, however, have used a panel of unrelated individuals, and questions about the heritability of CNPs have been left unaddressed. More importantly, it is unknown whether CNPs are in linkage disequilibrium (LD) with nearby single-nucleotide variation. Previous studies of unique regions of the genome suggest that nearby SNPs may serve as markers for deletion polymorphisms [6] [7: Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet. 2006;38:82-85], but the association of CNPs with SNPs in duplication-rich regions, which are more likely to have undergone multiple rearrangements, has not been addressed. Furthermore, the LD properties of structural polymorphisms involving gains of copies are almost completely unknown.

In this study, we present an analysis of CNPs within the sample populations used by the International HapMap Project [8: International HapMap Consortium. The International HapMap Project. Nature. 2003;426:789-796] and assess somatic variation among diverse tissue sources. By using array-based comparative genomic hybridization (array CGH) targeted to regions of segmental duplication, we focus our efforts on assessing variation in complex regions that are prone to rearrangement [3]. A combination of both BAC-based and high-density oligonucleotide arrays allowed for an extremely detailed view and illuminated the molecular basis of a subset of CNPs. We assessed copy-number variation in all four HapMap population samples, a total of 269 individuals. Using DNA samples and available SNP data from the International HapMap Project, we analyzed patterns of SNP density, heritability, and LD at sites of CNP in duplication-rich regions of the genome.

The samples profiled in this study were those used in the International HapMap Project [8] [9: International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299-1320]. Hybridizations were performed on DNA from all 269 individuals sampled by the HapMap Consortium; these consisted of 90 individuals (30 trios) of European ancestry, sampled in Utah (CEU); 90 individuals (30 trios) of Yoruba ancestry, sampled in Ibadan, Nigeria (YRI); 45 unrelated individuals of Han Chinese ancestry, sampled in Beijing (CHB); and 44 unrelated individuals of Japanese ancestry, sampled in Tokyo (JPT). DNA samples were obtained from Coriell Cell Repositories. A few profiles showed chromosomal aneuploidy, suggestive of cell-line artifacts; therefore, the data for those chromosomes were not considered. The following samples were affected: JPT NA18996, YRI NA19208, YRI NA19193, CHB NA18540, CEU NA12236, and CEU NA12875 (fig. 1). The reference DNA used for all hybridizations was from a single male of Czechoslovakian descent, Coriell ID GM15724, which is a well-characterized sample used in a previous array CGH study [3].

In the present study, a CNV was classified as a CNP if altered copy number was observed in >1% of the 269 individuals sampled. We refer to “altered copy-number frequency” (ACNF) instead of “minor-allele frequency,” because measurements of copy number are on diploid samples, and, in most cases, the actual allele structure of the variant has not been resolved at the molecular level.
Array hybridizations were performed as described by Snijders et al. [10: Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, Hamilton G, Hindle AK, Huey B, Kimura K, Law S, Myambo K, Palmer J, Ylstra B, Yue JP, Gray JW, Jain AN, Pinkel D, Albertson DG. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet. 2001;29:263-264], with use of the segmental duplication array [3] [10]. The segmental duplication array consists of 2,007 BACs, spotted in triplicate, that were targeted to 130 complex regions of the genome that are flanked by intrachromosomal segmental duplications. All 269 individuals were hybridized, with dye-swap replicate experiments, to the segmental duplication array with use of a single reference individual for comparison [3]. A locus was considered a CNV if the log ratio of fluorescence measurements for the individuals assayed exceeded twice the SD of the autosomal clones in both dye-swapped experiments.
To account for asymmetry in some hybridization data, presumably due to differences in labeling efficiency between DNAs obtained from outside sources and our reference DNA extracted in-house, we developed a statistical method to identify variants in an asymmetric distribution. In brief, the total distribution of autosomal log2 ratios was divided into two groups, with the average autosomal log2 ratio as the division point. The SD was then determined for the above-average and below-average groups, after mirroring the data to simulate a symmetric distribution within each subgroup. The variant threshold for gains was then calculated as 2 SDs of the above-average group added to the mean, and the threshold for losses became 2 SDs of the below-average group subtracted from the mean. Comparison of the results of this method with those that did not account for asymmetry showed that the asymmetric method reduced the number of false-positive results, when compared with the oligonucleotide array data used for validation purposes (data not shown). Generally, hybridizations were considered good quality if they had an SD <0.2 for autosomal clones; otherwise, they were repeated. In a small subset of cases, repeated hybridizations also resulted in higher SDs, likely because of starting-DNA quality. Of the 538 hybridization profiles used in this analysis, which comprise 269 dye-swap pairs, only 6 profiles (samples CHB NA18633, CEU NA10847, CEU NA10851, CEU NA12707, CEU NA12740, and CEU NA12864) exceed an autosomal SD of 0.2. For each locus, the reported ACNF represents the percentage of unrelated individuals assayed (i.e., with exclusion of offspring from the CEU and YRI trios) who were scored as possessing a copy-number variation at that locus. 
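The asymmetric thresholding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names (`mirrored_sd`, `asymmetric_thresholds`, `call_clone`) are hypothetical, the SD of each mirrored half is taken as the population SD about the autosomal mean, and values exactly equal to the mean are left out of both halves.

```python
import math
from statistics import fmean


def mirrored_sd(values, center):
    """SD of `values` after reflecting them about `center`.

    Reflecting each x to 2*center - x yields a symmetric sample whose mean
    is `center`, so its population SD equals the root mean squared
    deviation of the original values from `center` (assumed estimator).
    """
    if not values:
        raise ValueError("no values on this side of the center")
    return math.sqrt(fmean([(v - center) ** 2 for v in values]))


def asymmetric_thresholds(log2_ratios, n_sd=2.0):
    """Return (loss_threshold, gain_threshold) for autosomal log2 ratios."""
    center = fmean(log2_ratios)
    # Split at the autosomal mean; ties with the mean are dropped (assumption).
    above = [v for v in log2_ratios if v > center]
    below = [v for v in log2_ratios if v < center]
    gain = center + n_sd * mirrored_sd(above, center)
    loss = center - n_sd * mirrored_sd(below, center)
    return loss, gain


def call_clone(log2_ratio, loss, gain):
    """Score one clone in one hybridization as 'gain', 'loss', or 'none'."""
    if log2_ratio > gain:
        return "gain"
    if log2_ratio < loss:
        return "loss"
    return "none"
```

Under the scoring rule stated earlier, a clone would count as variant only if it passed the relevant threshold in both dye-swap hybridizations, so a function like `call_clone` would be applied to each replicate (assuming the swapped ratio has been sign-corrected) and the two calls compared.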
Because our standard reference individual is male, only male-male hybridizations were used to score variants on the sex chromosomes; this avoids the difficulty of identifying variant clones on the X and Y chromosomes in sex-mismatched hybridizations. A complete list of all BACs present on the array and the frequency of copy-number variation of each within the populations tested is shown in the tab-delimited ASCII files of data set 1 (online only).

A total of 30 self-versus-self hybridizations were performed on a panel of tissues from four individuals obtained from the Cooperative Human Tissue Network (CHTN), with use of the identical protocol that was used for the HapMap population sample hybridizations. Seven or eight tissues were profiled from each individual, with splenic genomic DNA as the reference DNA, since it was abundantly available, of high quality, and obtainable from all donors.

A custom oligonucleotide array (NimbleGen Systems) was designed that consisted of 385,000 isothermal probes (45–70 bp) that covered the identical regions represented on the segmental duplication array, with an overall mean probe density of one probe per 733 bp. Probes were selected in regions devoid of high-copy repeats but within the unique and duplicated sequences that comprise the BACs on the segmental duplication array. DNA from nine individuals (YRI NA18517, YRI NA18507, YRI NA18502, YRI NA19240, YRI NA19129, CHB NA18555, JPT NA18992, CEU NA12156, and CEU NA12878), representing each of the four HapMap population samples, was then hybridized to the oligonucleotide array. The variants detected by BAC-array analysis were then directly compared with the oligonucleotide array profiles by converting the results from all oligonucleotide probes overlapping a BAC into a single statistic.
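The single per-BAC statistic follows the rule given in the next paragraph: per-probe gain and loss calls are made at 2 SDs from the mean autosomal log2 ratio, and the BAC score is the number of gain probes minus the number of loss probes, divided by the number of probes overlapping the BAC, with 0.1 and −0.1 as cutoffs. A minimal Python sketch of that rule follows; the function names are hypothetical, and the sample SD is an assumption, since the text does not say which SD estimator was used.

```python
from statistics import fmean, stdev


def oligo_thresholds(autosomal_log2, n_sd=2.0):
    """Return (loss_threshold, gain_threshold) for one hybridization.

    Thresholds sit n_sd sample SDs (assumed estimator) on either side of
    the mean log2 ratio of all autosomal oligonucleotide probes.
    """
    center = fmean(autosomal_log2)
    spread = stdev(autosomal_log2)
    return center - n_sd * spread, center + n_sd * spread


def bac_score(probe_log2, loss_thr, gain_thr):
    """(gain probes - loss probes) / all probes overlapping the BAC."""
    if not probe_log2:
        raise ValueError("no oligonucleotide probes overlap this BAC")
    gains = sum(1 for v in probe_log2 if v > gain_thr)
    losses = sum(1 for v in probe_log2 if v < loss_thr)
    return (gains - losses) / len(probe_log2)


def classify_bac(score, cutoff=0.1):
    """Ratios above +cutoff are gains; ratios below -cutoff are losses."""
    if score > cutoff:
        return "gain"
    if score < -cutoff:
        return "loss"
    return "no change"
```

Because gains and losses cancel in the numerator, a BAC with scattered noisy probes in both directions scores near zero, whereas a consistent shift across even a modest fraction of its probes crosses the cutoff.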
The oligonucleotide array data were scored such that the duplication and deletion thresholds were computed as 2 SDs beyond the mean log2 ratio for all autosomal oligonucleotides reporting in that hybridization. For each BAC, the number of oligonucleotide probes that reported a loss was subtracted from the number of oligonucleotide probes that reported a gain, and the difference was divided by the total number of probes overlapping the original BAC probe, which resulted in a simple scoring ratio. Ratios >0.1 or <−0.1 were scored as gains or losses, respectively. To assess the sensitivity and specificity of these criteria, we examined X chromosome loci in sex-mismatched hybridizations; this analysis indicated a false-negative rate of 5% and a false-positive rate of <0.2%, which suggests that the scoring ratio is a sensitive and specific metric for confirming copy-number changes.

CNVs were classified as discrete or continuously variable by visual inspection of a plot of the log2 ratios from replicate dye-swap hybridizations. For discrete CNVs (defined as those in which the underlying signal intensity ratio could be visually clustered into two, three, or four well-separated copy-number classes by inspection of scatter plots of the replicate log2 hybridization values), we treated each of these clusters as a genotype, omitting genotype calls for any samples for which assignment was ambiguous or for which the dye-swap replicates were not concordant (SD >0.2) (see the tab-delimited ASCII file of data set 2 [online only]). We assessed whether the resulting genotypes were in Hardy-Weinberg equilibrium (HWE) and analyzed all trios for deviations from Mendelian inheritance.

“Narrow-sense” heritability estimates (h2) [11: Fisher R. The genetical theory of natural selection. Oxford, United Kingdom: Clarendon Press; 1930], obtained by estimating the regression coefficient (slope) of offspring values against midparental values (the mean value for both parents within a trio), are shown in table 1.
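The midparent-offspring regression can be illustrated with ordinary least squares. The sketch below is an illustration under stated assumptions rather than the analysis code used in the study: the function name `midparent_heritability` and the input layout (one `(father, mother, child)` tuple of copy-number measurements per trio) are hypothetical, and the SE is the usual least-squares standard error of the slope.

```python
import math
from statistics import fmean


def midparent_heritability(trios):
    """Estimate h2 as the slope of child value on midparent value.

    `trios` is a sequence of (father, mother, child) measurements
    (hypothetical layout). Returns (slope, standard_error_of_slope).
    """
    trios = list(trios)
    if len(trios) < 3:
        raise ValueError("need at least three trios to estimate an SE")
    mid = [(f + m) / 2.0 for f, m, _ in trios]
    kid = [c for _, _, c in trios]
    n = len(trios)
    mean_mid, mean_kid = fmean(mid), fmean(kid)
    sxx = sum((x - mean_mid) ** 2 for x in mid)
    if sxx == 0:
        raise ValueError("midparent values do not vary")
    sxy = sum((x - mean_mid) * (y - mean_kid) for x, y in zip(mid, kid))
    slope = sxy / sxx
    intercept = mean_kid - slope * mean_mid
    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(mid, kid))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se
```

With the 30 trios of one population sample as input, the returned slope plays the role of h2 and the second value the role of its SE; one plausible reading is that a slope within about 2 SEs of 1.0, and more than about 2 SEs from 0, is consistent with stable inheritance.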
Measurements of h2 close to 1.0 suggest that the copy number is stably inherited, independent of measurement noise or precision.

Table 1. Heritability of CNPs with a Continuous Distribution

Clone             Chromosome and hg16 Coordinates (a)   Dye-Swap R2   ACNF   Narrow-Sense Heritability (h2 ± SE)
YRI:
CTD-2046J21       1: 103532647–103647985                .90           .138   1.06 ± .21
RP11-585N15       1: 16304321–16391174                  .66           .249   .97 ± .18
CTD-2589H19 (b)   5: 662684–864137                      .87           .424   .86 ± .27
RP11-837K1 (b)    5: 693297–873247                      .75           .416   .65 ± .25
RP11-812N8 (b)    5: 779850–879258                      .69           .313   .90 ± .38
RP11-262L1        7: 45058286–45214464                  .77           .191   .49 ± .55 (c)
RP11-384C2        7: 142717297–142869087                .62           .141   .83 ± .16
RP11-45N9         7: 143297685–143451563                .87           .481   .92 ± .24
CTD-2142K23       8: 7238603–7341931                    .86           .238   .69 ± .19
RP11-774P7        8: 7917017–8067760                    .92           .328   .73 ± .23
RP11-138C5        15: 19199775–19364096                 .51           .488   1.11 ± .23
RP11-117M14       15: 19804700–19971720                 .71           .401   1.20 ± .21
RP11-351D6        17: 34930509–35010273                 .69           .267   1.16 ± .19
RP11-142H6        19: 8669454–8825625                   .87           .463   .61 ± .20
RP11-775G6        22: 17102889–17244196                 .58           .417   1.06 ± .27
RP11-379N11       22: 19757625–19940794                 .80           .402   1.20 ± .19
CTD-2506I16       22: 20014749–20220783                 .61           .246   .83 ± .26
CEU:
CTD-2046J21       1: 103532647–103647985                .69           .138   .92 ± .19
RP11-1112O10      3: 196744968–196880879                .72           .186   1.20 ± .22
CTD-2108J17       3: 196950243–197121995                .70           .183   .84 ± .25
CTD-2589H19 (b)   5: 662684–864137                      .62           .424   .32 ± .21 (c)
RP11-837K1 (b)    5: 693297–873247                      .57           .587   .71 ± .21
RP11-812N8 (b)    5: 779850–879258                      .50           .691   .15 ± .35 (c)
RP11-24O14        5: 69417315–69562055                  .73           .199   .80 ± .19
RP11-188C21       7: 101763594–101920490                .59           .117   .50 ± .27 (c)
CTD-3088N11 (b)   8: 7767399–7916838                    .83           .282   1.08 ± .16
RP11-774P7 (b)    8: 7917017–8067760                    .83           .328   .99 ± .19
RP11-110H22       8: 86762305–86913434                  .72           .111   .95 ± .15
CTD-2387G7        10: 48395333–48482422                 .79           .062   1.02 ± .14
RP11-138C5        15: 19199775–19364096                 .63           .488   .58 ± .19
RP11-142H6        19: 8669454–8825625                   .82           .463   1.05 ± .21
CTD-3048O14       22: 16933331–17071291                 .51           .172   .90 ± .23
RP11-775G6        22: 17102889–17244196                 .75           .417   .78 ± .23
RP11-379N11       22: 19757625–19940794                 .73           .404   1.10 ± .18

Note: A subset of CNPs with continuously distributed copy-number measurements was tested for narrow-sense heritability, estimated by the slope of the regression line fitting offspring copy-number measurements to midparental (mean of the parents) copy-number measurements. Of the 34 analyzed CNPs, 30 (88%) demonstrated significant heritability in the CEU and YRI subpopulations. ACNF indicates the frequency at which this variant was found among all HapMap sample populations. The coefficient of determination (R2) was calculated from the dye-swap replicate data points of the BAC array hybridization data and is an indicator of reproducibility. Sites with R2<0.5 were removed from further analysis. Three further loci were also analyzed and showed narrow-sense heritability values <0 (data not shown). All three corresponded to the IGH and IGL gene clusters, which are known sites of somatic variation [2].
(a) Based on the hg16 reference sequence.
(b) Overlapping BAC clones were analyzed independently.
(c) BAC does not show significant heritability.

To assess the LD of discretely varying CGH measurements with SNPs, we used an approach used elsewhere to analyze discretely varying copy-number measurements obtained by quantitative PCR [6]. In brief, we recoded the discrete CNP genotype as a SNP genotype (“+/+” = AA, “+/−” = AT, and “−/−” = TT) and combined this with SNP genotype data from Phase I HapMap [9]. We used SNP genotype data from all SNPs in a region extending 200 kb beyond the edges of the BAC probe, with probe coordinates based on the hg16 reference sequence. We used the Haploview program [12: Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263-265] to phase CNP and SNP genotypes and to calculate R2.

To assess the correlation of continuously varying CGH measurements with SNPs, we obtained SNP genotype data from all SNPs from Phase I HapMap that spanned a region 200 kb from both edges of the CGH probe, transformed these SNP genotypes into integers (e.g., AA=0, AC=1, and CC=2), and calculated the coefficient of determination (R2) for each of these SNP genotypes with the CNP measurements. To assess the significance of these correlation measurements, we performed a permutation test in which the CGH measurements were permuted across the trios in a population sample (maintaining the relationships within each trio) and again compared with the same SNP genotypes in that region. We considered a correlation significant (P<.05) if it was observed in <5% of these simulations. All genome physical coordinates referred to in this work are on the hg16 (build 34) coordinate system.

Using a BAC array targeted to regions of intrachromosomal segmental duplication [3], we analyzed 269 DNA samples, corresponding to 209 unrelated individuals and 60 parents-child trios [9], by array CGH against a single reference individual. Of the samples, 263 passed quality assessment criteria (see the “Methods” section). From this set, we identified 384 CNV BACs, of which 127 (∼33%) were observed only once, and 257 (∼67%) were observed in more than one individual (fig. 2 and data set 1 [online only]). Of these variants, 103 have not been reported elsewhere (table 2). When adjacent clones, mapping within 250 kb of each other, are merged, these variant BACs represent 222 CNV regions. The average multi-BAC CNV region spanned 436 kb, with a range of 145 kb to 1.4 Mb. Several multi-BAC CNV regions showed evidence of heterogeneity, suggestive of additional genomic complexity. For example, a variant region from chromosome 22 consisted of four BAC clones (RP11-105A23, RP11-157B2, RP11-1143M16, and RP11-229C18) within a span of 605 kb. The four clones in this region were observed as a copy-number loss in the CEU subpopulation, three of the four were observed as a loss in the CHB population, and two of the four were observed as a loss in the JPT and YRI populations.

Table 2. Summary of Autosomal Variant BACs by Array CGH
Column headings: No. of Variant BACs; No. of Variants; HapMap Subgroup; No. of Samples; Total; With Gain; With Loss; With Gain and Loss; Corroborated; Novel (a)
(a) Novel within the subgroup but not necessarily novel with respect to all subgroups, except in the nonre