Genomic analysis of archival tissues fixed in formalin is of fundamental importance in biomedical research, and numerous studies have used such material. Although the possibility of polymerase chain reaction (PCR)-introduced artifacts is known, the use of direct sequencing has been thought to overcome such problems. Here we report the results from a controlled study, performed in parallel on frozen and formalin-fixed material, where a high frequency of nonreproducible sequence alterations was detected with the use of formalin-fixed tissues. Defined numbers of well-characterized tumor cells were amplified and analyzed by direct DNA sequencing. No nonreproducible sequence alterations were found in frozen tissues. In formalin-fixed material up to one mutation artifact per 500 bases was recorded. The chance of such artificial mutations in formalin-fixed material was inversely correlated with the number of cells used in the PCR: the fewer cells, the more artifacts. A total of 28 artificial mutations were recorded, of which 27 were C-T or G-A transitions. Through confirmational sequencing of independent amplification products artifacts can be distinguished from true mutations. However, because this problem was not acknowledged earlier, the presence of artifacts may have profoundly influenced previously reported mutations in formalin-fixed material, including those inserted into mutation databases.

Analysis of nucleic acids from paraffin-embedded tissue blocks is crucial in today's clinical research. It is known that the formalin fixation procedure lowers the success of polymerase chain reaction (PCR) amplification [1: Ben-Ezra J, Johnson DA, Rossi J, Cook N, Wu AJ. Effect of fixation on the amplification of nucleic acids from paraffin-embedded material by the polymerase chain reaction. J Histochem Cytochem. 1991; 39: 351-354] because of cross-linking between protein and DNA [2: Chalkley R, Hunter C. Histone-histone propinquity by aldehyde fixation of chromatin. Proc Natl Acad Sci USA. 1975; 72: 1304-1308]. Nevertheless, a great number of reports based on formalin-fixed paraffin-embedded tissues used for amplification and subsequent analysis have been published, and the results have been incorporated into databases.
Use of the PCR has permitted the analysis of decreasing amounts of template, allowing genetic analysis of single cells in tissue sections [3: Pontén F, Williams C, Ling G, Ahmadian A, Nistér M, Lundeberg J, Pontén J, Uhlén M. Genomic analysis of single cells from human basal cell cancer using laser-assisted capture microscopy. Mutat Res. 1997; 382: 45-55]. The use of amplification techniques makes the analysis vulnerable for several reasons. Randomly scattered nucleotide substitutions due to misincorporation by the Taq DNA polymerase [4: Eckert KA, Kunkel TA. DNA polymerase fidelity and the polymerase chain reaction. PCR Methods Appl. 1991; 1: 17-24] are observed after cloning of PCR products and are well documented [4; 5: Hultman T, Bergh S, Moks T, Uhlén M. Bidirectional solid-phase sequencing of in vitro-amplified plasmid DNA. Biotechniques. 1991; 10: 84-93]. Direct sequencing of the amplified PCR product theoretically overcomes this problem, because the effect of such randomly distributed mutations should be masked by the consensus sequence. Mutations detected by direct sequencing are therefore generally considered as true, especially when nonambiguous, and the need for independent confirmation (starting from new amplification of the original sample lysate) may be overlooked. Nevertheless, we have previously noted a disturbing occurrence of nonreproducible mutations in studies involving amplification and direct DNA sequence analysis of the p53 gene in formalin-fixed samples of lung, breast, bladder, and skin cancer (data not published).
To determine the exact presence and frequency of these artifacts we compared PCR amplification and direct sequencing analysis of frozen and formalin-fixed parallel tumor tissue, from one well-characterized tumor, under controlled conditions. Clinical samples from a basal cell cancer containing known mutations [6: Pontén F, Berg C, Ahmadian A, Ren ZP, Nistér M, Lundeberg J, Uhlén M, Pontén J. Molecular pathology in basal cell cancer with p53 as a genetic marker. Oncogene. 1997; 15: 1059-1067] were used in this study. Biopsies were sliced immediately after excision; one part was snap-frozen and cryosectioned, and the other part was fixed in formalin and paraffin embedded. The 12–16-μm-thick sections were microdissected with a small scalpel (Alcon Ophthalmic knife 15°). The number of microdissected cells was estimated at a minimum of 1500 for the frozen sample and 2000 for the formalin-fixed sample. The number of microdissected cells available per PCR is based on this first estimation.

The samples were transferred to tubes containing 50 μl PCR buffer (10 mM Tris-HCl (pH 8.3), 50 mM KCl). Cells were lysed by the addition of 2 μl freshly prepared proteinase K solution (25 mg/ml, dissolved in redistilled water) at 56°C for 1 hour, incubated with 0.5 volume Chelex slurry (1:1 w/v Chelex 100 resin/redistilled water) for 10 minutes at room temperature, followed by heat inactivation (95°C for 5 minutes). The mixture was centrifuged (5000 rpm for 5 minutes), and the supernatant was carefully transferred by aspiration to a clean microcentrifuge tube. Dilution series were made to correspond to 300 to 10 cells per 2 μl. Aliquots of the different dilutions were amplified into six shorter fragments in an outer multiplex PCR (covering 900 bp of exons 4–9 of the p53 gene), followed by inner specific PCRs for each exon.
This technique [7: Berg C, Hedrum A, Holmberg A, Pontén F, Uhlén M, Lundeberg J. Direct solid-phase sequence analysis of the human p53 gene by use of multiplex polymerase chain reaction and α-thiotriphosphate nucleotides. Clin Chem. 1995; 41: 1461-1466] has been developed especially to facilitate analysis of small samples, down to a single microdissected cell [3]. The outer amplification was performed for 35 cycles, using AmpliTaq and Stoffel Fragment AmpliTaq polymerases (Perkin-Elmer, Norwalk, CT). After dilution (25-fold for exons 4, 5, and 7–9 and 100-fold for exon 6), inner region-specific amplifications for exons 4–9 were performed (35 cycles). One of the inner primers for each fragment was labeled with biotin to permit solid-phase sequencing of PCR templates. Several PCR amplifications were made for each dilution. Solid-phase direct DNA sequencing was performed essentially according to the methods described in [7, 8], with the use of streptavidin-coated combs (AutoLoad Solid Phase Sequencing Kit; Amersham Pharmacia Biotech, Uppsala, Sweden) and automated laser fluorescent analysis (ALFExpress; Amersham Pharmacia Biotech). A total of 3600 bases (four repeats of 900 bases) per dilution were analyzed, except for the highest dilutions of formalin-fixed samples, where 4300 bases covering exons 5–9 were analyzed (because exon 4 did not amplify).
The reference basal cell cancer was known to harbor two mutations (codons 130 and 285) in all of its parts [6]. These were here considered elements of the “correct” prototype nucleotide sequence. The sequences recorded (from dilutions of frozen and formalin-fixed parts of the tumor) were compared to the prototype sequence for the detection of additional alterations. All dilutions originated from the same microdissected frozen or formalin sample, and each dilution was amplified in several different outer PCRs. Additional alterations were considered artifacts when they did not appear in amplicons of different outer PCRs. All artifacts could be “confirmed,” however, by repeated analyses of amplicons of the same outer PCR product (by a new inner PCR and sequencing). The ratios between the total number of confirmed artifacts and number of bases sequenced for the respective dilutions were tabulated (Table 1). The detection limit for mutations in this assay requires at least 20% of the amplified product to harbor the alteration.

Table 1. Amplification Efficiency and Frequency of Artificial Mutations in Relation to Tissue Material and Number of Cells per Analysis

Tissue           No. of cells   Amplification   Bases sequenced   No. of artifacts   Frequency of artifacts (%)
Frozen           200            OK              3600              0                  -
Frozen           64             OK              3600              0                  -
Frozen           20             OK              3600              0                  -
Frozen           10             OK              3600              0                  -
Formalin fixed   300            OK              3600              0                  -
Formalin fixed   150            OK              3600              1                  0.03
Formalin fixed   80             OK              3600              3                  0.08
Formalin fixed   40             OK              4100              6                  0.14
Formalin fixed   20             Not exon 4      4300              9                  0.2
Formalin fixed   10             Not exon 4      4300              9                  0.2

All dilutions of the frozen cells (200, 64, 20, and 10 cells per PCR) were amplified successfully for all exons (Table 1). For formalin-fixed cells the higher dilutions (10 and 20 cells per PCR) did not amplify exon 4 (which is the longest fragment, 350 bp), whereas dilutions corresponding to 40, 80, 150, and 300 cells amplified all exons (Table 1).
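The frequencies in Table 1 follow directly from the artifact counts and the number of bases sequenced. As a rough check, the short Python sketch below recomputes them for the formalin-fixed rows. The dictionary and function names are illustrative only and are not part of the study; the recomputed percentages may differ from the printed values in the last digit, depending on how the table was rounded.

```python
# Counts transcribed from Table 1 (formalin-fixed rows only).
# Key: cells per PCR; value: (bases sequenced, number of artifacts).
# The names below are hypothetical helpers, not taken from the paper.
TABLE1_FORMALIN = {
    300: (3600, 0),
    150: (3600, 1),
    80: (3600, 3),
    40: (4100, 6),
    20: (4300, 9),
    10: (4300, 9),
}


def artifact_frequency_percent(bases, artifacts):
    """Artifacts per base sequenced, expressed as a percentage."""
    return 100.0 * artifacts / bases


def bases_per_artifact(bases, artifacts):
    """Average number of bases sequenced per artifact (None if no artifacts)."""
    return bases / artifacts if artifacts else None


def total_artifacts(table):
    """Sum of artifacts over all dilutions in the table."""
    return sum(artifacts for _, artifacts in table.values())


if __name__ == "__main__":
    for cells in sorted(TABLE1_FORMALIN, reverse=True):
        bases, artifacts = TABLE1_FORMALIN[cells]
        freq = artifact_frequency_percent(bases, artifacts)
        per = bases_per_artifact(bases, artifacts)
        per_text = f"one per {per:.0f} bases" if per else "none detected"
        print(f"{cells:>3} cells: {freq:.2f}% ({per_text})")
    print("total artifacts:", total_artifacts(TABLE1_FORMALIN))
```

Under these counts, the 10- and 20-cell dilutions come out at a little under 500 bases per artifact, which is consistent with the "one per 500 bases" figure quoted in the text, and the artifacts sum to the 28 reported.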
To control the accuracy of cell concentrations in the dilutions, additional amplifications of microdissected samples with the exact number of cells known were performed. These experiments confirmed the results above. Among the frozen samples, no sequence alterations other than the two known mutations were detected in any of the dilutions (from 200 cells to 10 cells per PCR). Among formalin-fixed samples, a number of nonreproducible sequence alterations (ie, artificial mutations) appeared. The higher dilutions, 10 and 20 cells per PCR, showed one nonreproducible mutation for every 500 bases, whereas 40, 80, and 150 cells showed a lower but still important error rate. The results are shown in Table 1. The known mutations in this tumor were always present, confirming the origin of template. Additional sequence alterations, however, were found only in the formalin-fixed material and could not in any case be confirmed by repeated analysis (starting from the original sample lysate dilution), as exemplified in Figure 1. Independent amplifications from the same pool of formalin-fixed cells could show several different artificial mutations. In total, 28 artificial mutations were recorded in the formalin-fixed part of the tumor; 27 (96%) occurred at guanosine or cytosine positions and resulted in C-T or G-A transitions, and the remaining one was an A-T transversion. Eight artificial mutations (28%) were silent or were intron alterations, and 20 (72%) coded for missense or nonsense alterations.

This study shows that as much as one artificial mutation per 500 bases may be recorded in the analysis of formalin-fixed material. Approximately one-third of the artificial mutations coded for a silent amino acid change. Such a mutation spectrum is expected if the mutations are distributed randomly, without biological selection.
Silent mutations have not been observed by us before, in this or in previous studies of frozen skin tumors [6; 9: Ren ZP, Hedrum A, Pontén F, Nistér M, Ahmadian A, Lundeberg J, Uhlén M, Pontén J. Human epidermal cancer and accompanying precursors have identical p53 mutations different from p53 mutations in adjacent areas of clonally expanded non-neoplastic keratinocytes. Oncogene. 1996; 12: 765-773; 10: Ren ZP, Ahmadian A, Pontén F, Nistér M, Berg C, Lundeberg J, Uhlén M, Pontén J. Benign clonal keratinocyte patches with p53 mutations show no genetic link to synchronous squamous cell precancer or cancer in human skin. Am J Pathol. 1997; 150: 1791-1803], including an extensive analysis of a xeroderma pigmentosum patient, in which we recorded 29 different mutations in various lesions [11: Williams C, Pontén F, Ahmadian A, Ren ZP, Gao L, Rollman O, Ljung A, Jaspers NGJ, Uhlén M, Lundeberg J, Pontén J. Clones of normal keratinocytes and a variety of simultaneously present epidermal neoplastic lesions contain a multitude of p53 gene mutations in a xeroderma pigmentosum patient. Cancer Res. 1998; 58: 2449-2455].

Although the artificial mutations could never be confirmed by repeated analysis starting from the original sample lysate dilution, a repeated inner PCR performed on the same outer PCR showed the same artifact when sequenced. This suggests that the artifacts occur in or before the outer PCR and thus are not related to the sequencing procedure. For an error to show up as a detectable sequence alteration (in direct sequence analysis) it is required to occur in the first cycle of outer amplification, in the presence of very few templates.
When only one template is present (ie, only one strand of DNA) and an error occurs in the first cycle, the theoretical amount of mutated fragments should not be more than 50% of the final amplification product, assuming that the original template is amplified correctly in the second cycle. When one cell, which contains four templates of DNA (two strands on two alleles), is subjected to amplification the fragments containing artifacts should not make up more than 12.5%. This would place them below or, at best, at the limit of detection in our sequencing method. In this study, where 10 cells per PCR was the lowest number of cells used, an error should not be detectable. Nevertheless, 28 artificial mutations were detected, and, as exemplified in Figure 1, the fragments containing the artifact often made up approximately 50% of the sequence (which corresponds to half of the final product of amplified DNA). In addition, a few amplifications contained only the error sequence (as determined by a 100% mutant DNA sequencing signal; data not shown). Our conclusion is that only one or a few of the theoretical templates were truly available for amplification.

The exact mechanism for modification of DNA in formalin-fixed samples is not known. DNase activity is not believed to be the cause [12: Yagi N, Satonaka K, Horio M, Shimogaki H, Tokuda Y, Maeda S. The role of DNase and EDTA on DNA degradation in formaldehyde fixed tissues. Biotech Histochem. 1996; 71: 123-129]. The rate of errors detected in the formalin-fixed material is much higher than the reported Taq DNA polymerase error frequency (2/10⁵ to 1/(9 × 10³)) [13: Lundberg KS, Shoemaker DD, Adams MW, Short JM, Sorge JA, Mathur EJ. High-fidelity amplification using a thermostable DNA polymerase isolated from Pyrococcus furiosus. Gene. 1991; 108: 1-6; 14: Tindall KR, Kunkel TA. Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochemistry. 1988; 27: 6008-6013]. Artifacts could be the consequence of formalin damaging or cross-linking cytosine nucleotides, on either strand, so that the Taq DNA polymerase would not recognize them and instead of a guanosine incorporate an adenosine (because of the so-called A-rule). Thereby an artificial C-T or G-A mutation would be created. In addition, damaged DNA has been described to promote jumping between templates during enzymatic amplification [15: Pääbo S, Irwin DM, Wilson AC. DNA damage promotes jumping between templates during enzymatic amplification. J Biol Chem. 1990; 265: 4718-4721]. According to that theory, Taq DNA polymerase may insert an adenosine residue when it encounters the end of a template molecule (the same A-rule as above), then jump to another template and continue the extension. As a result, an artificial mutation may be produced and amplified.

The actual frequency of errors would correspond, in addition to the Taq DNA polymerase's normal error frequency, to the degree of damage and/or cross-linking of DNA. The detected frequency of artificial mutations, however, would also depend on the degree of “dilution” by correctly amplified fragments and, thereby, on the number of target templates in the first round of amplification. This corresponds well to the increase in artifacts we observed when fewer cells were used in the outer PCR. At a higher number of target cells (>300 in our study) there were enough nondamaged templates to dominate the amplification process (Figure 2A). For smaller amounts of cells only fragmented DNA may be present, requiring a few PCR cycles to achieve an in vitro repaired template that would yield an exponential amplification. The artifact mutations may then represent errors in the early repair process (Figure 2B), by, for example, the non-template-dependent addition of an A residue.
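The template-dilution argument above can be made concrete with a small calculation. The Python sketch below is a simplified model, assuming that a single strand is miscopied in the first cycle, that all available strands then amplify equally, and that each cell contributes four strands (two strands on two alleles), as stated in the text. The function names are hypothetical; the 20% detection limit is the one given for this assay.

```python
# Simplified model of how much of the final PCR product can carry an
# artifact introduced in the first cycle. Function names are illustrative.

DETECTION_LIMIT = 0.20  # fraction of product that must carry the alteration


def max_artifact_fraction(template_strands):
    """Upper bound on the mutant fraction when one of `template_strands`
    available strands is miscopied once in the first cycle and the
    original strand is copied correctly afterwards."""
    if template_strands < 1:
        raise ValueError("at least one template strand is required")
    return 0.5 / template_strands


def strands_from_cells(cells, strands_per_cell=4):
    """Theoretical number of template strands (two strands on two alleles)."""
    return cells * strands_per_cell


def is_detectable(template_strands, limit=DETECTION_LIMIT):
    """True if a first-cycle artifact could reach the detection limit."""
    return max_artifact_fraction(template_strands) >= limit


def implied_available_strands(observed_fraction):
    """Invert the bound: the largest number of amplifiable strands that is
    compatible with an observed mutant signal fraction."""
    return 0.5 / observed_fraction


if __name__ == "__main__":
    for cells in (1, 10, 300):
        strands = strands_from_cells(cells)
        print(cells, "cells:", strands, "strands, max artifact fraction",
              max_artifact_fraction(strands), "detectable:",
              is_detectable(strands))
    print("strands implied by a 50% signal:", implied_available_strands(0.5))
```

In this model a single strand gives the 50% bound and one intact cell gives 12.5%, matching the figures in the text; an observed signal near 50% is only compatible with about one amplifiable strand, which is the basis for concluding that few of the theoretical templates were actually available.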
This interpretation is supported by the finding of mutation signals on the order of 50–100% peak signals, indicating amplification of a single DNA copy.

In a former study [16: Yngveson A, Williams C, Hjerpe A, Lundeberg J, Söderkvist P, Pershagen G. p53 mutations in lung cancer associated with residential radon exposure. Cancer Epidemiol Biomarkers Prev. 1999; 8: 433-438], formalin-fixed lung cancers were analyzed for p53 mutations by both direct sequencing and single-strand conformation polymorphism (SSCP). In 50 tumors, 13 true mutations and 47 artificial mutations were found (a frequency of one artifact per 606 bases). When direct sequencing was used, the artifacts were easily distinguished from true mutations by confirmatory sequencing of independent PCR products. With the use of SSCP, many samples with artificial mutations also showed shifts in the confirmatory analysis. DNA sequencing of both shifts was needed to rule out artifacts (where the two shifts exhibited different nucleotide changes). The normal procedure is to consider the mutation confirmed if a shift appears in two separate runs. However, with an error rate of one artifact per 500 bases, a sample may have artifacts in two separate PCRs, although different ones. Hence artificial mutations may pose a problem in direct DNA-sequencing strategies and in many other molecular techniques (eg, SSCP, denaturant gradient gel electrophoresis) that are based on PCR amplification.

In addition, we have noted a high frequency of artificial mutations in other studies of formalin-fixed cancers of different origins (data not published). When the Taq DNA polymerase was used, the frequency of artifacts was one per 683 bases in a study of endocrine samples and one per 821 bases in a study of breast tumor samples. Furthermore, one study of basal cell cancer samples was performed with the Pfu DNA polymerase, where, in amplified samples, the error frequency was one per 2050 bases (data not published).
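The point made above about SSCP confirmation, that two separate PCRs can each carry a (different) artifact, can be illustrated numerically. The Python sketch below is only a back-of-the-envelope model and not an analysis from the study: it assumes that artifacts arise independently along the sequence at a uniform rate (a Poisson approximation) and that the two PCRs are independent. The function names and the choice of fragment length in the example are assumptions made for illustration.

```python
import math


def p_at_least_one_artifact(bases, bases_per_artifact):
    """Probability that a fragment of `bases` nucleotides shows at least one
    artifact, assuming a Poisson model with one artifact per
    `bases_per_artifact` bases on average (an illustrative assumption)."""
    expected = bases / bases_per_artifact
    return 1.0 - math.exp(-expected)


def p_artifact_in_both_runs(bases, bases_per_artifact):
    """Probability that two independent PCRs of the same fragment each show
    at least one artifact (not necessarily the same one)."""
    return p_at_least_one_artifact(bases, bases_per_artifact) ** 2


if __name__ == "__main__":
    # Example: a 350-bp fragment (the length given for exon 4) at the
    # reported worst-case rate of one artifact per 500 bases.
    single = p_at_least_one_artifact(350, 500)
    both = p_artifact_in_both_runs(350, 500)
    print(f"one run: {single:.2f}, both runs: {both:.2f}")
```

Under these assumptions a band shift in two separate runs is not a rare event at low cell numbers, which is why sequencing both shifts, rather than counting shifts alone, is needed to separate artifacts from true mutations.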
Because samples were collected from different pathology laboratories and different preparations of DNA template were used (with and without extraction, chelating agents, and proteinase K treatment), the artifacts do not seem to be dependent on any specific routine or treatment of the samples (other than the use of formalin fixation). As a result, an unknown number of incorrect mutations may have been reported in various studies and inserted into various databases when DNA from tissues fixed in formalin was analyzed. A significant part of the data in mutation databases is based on analysis of formalin-fixed material. An example of this is the IARC Database of somatic p53 mutations (http://www.iarc.fr/p53/homepage.htm), where 38% of reported somatic mutations (with information on origin) are from formalin-fixed tumors (Dr. T. Hernandez-Boussard, personal communication).

In conclusion, this study has highlighted concerns that need to be dealt with when formalin-fixed archival specimens are used. Although PCR amplification and subsequent analysis appear successful, artificial mutations can be present at a high frequency. Thus our results emphasize the importance of confirmation from the biological source, which resolves the problem with artificial mutations.

We are grateful to Dr. Jacob Odeberg for valuable comments on the manuscript.