Article, 1 November 2002, free access

Human L1 element target-primed reverse transcription in vitro

Gregory J. Cost1,2, Qinghua Feng1, Alain Jacquier2 and Jef D. Boeke1,*

1Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, 725 N. Wolfe Street, 617 Hunterian, Baltimore, MD 21205, USA
2Génétique des Interactions Macromoléculaires, CNRS URA2171, Institut Pasteur, 25–28 rue Docteur Roux, 75724 Paris, Cedex 15, France

*Corresponding author. E-mail: [email protected]

The EMBO Journal (2002) 21: 5899-5910 https://doi.org/10.1093/emboj/cdf592

L1 elements are ubiquitous human transposons that replicate via an RNA intermediate. We have reconstituted the initial stages of L1 element transposition in vitro. The reaction requires only the L1 ORF2 protein, L1 3′ RNA, a target DNA and appropriate buffer components. We detect branched molecules consisting of junctions between transposon 3′ end cDNA and the target DNA, resulting from priming at a nick in the target DNA. 5′ junctions of transposon cDNA and target DNA are also observed. The nicking and reverse transcription steps in the reaction can be uncoupled, as priming at pre-existing nicks and even double-strand breaks can occur. We find evidence for specific positioning of the L1 RNA with the ORF2 protein, probably mediated in part by the polyadenosine portion of L1 RNA. Polyguanosine, similar to a conserved region of the L1 3′ UTR, potently inhibits L1 endonuclease (L1 EN) activity. L1 EN activity is also repressed in the context of the full-length ORF2 protein, but it and a second cryptic nuclease activity are released by ORF2p proteolysis.
Additionally, heterologous RNA species such as Alu element RNA and L1 transcripts with 3′ extensions are substrates for the reaction.

Introduction

The completion of the human genome sequence has revealed the sheer abundance, diversity and importance of our transposons (Lander, 2001). Transposition is an ongoing process, actively changing the genome, occasionally for the worse (Kazazian and Moran, 1998; Ostertag and Kazazian, 2001a; Gilbert et al., 2002; Symer et al., 2002). L1 transposition has recently been suggested as a mechanism for exon shuffling (Moran et al., 1999). More passively, transposon sequences have been co-opted by the cell for a wide variety of functions including use as gene regulatory sequences and centromeric heterochromatin (Howard et al., 1995; Boeke and Stoye, 1997; Laurent et al., 1997). Due to their high copy number, these sequences are often substrates for homologous recombination and rearrangement (Meuth, 1989). Transposons are therefore a source of plasticity for the genome.

The element responsible for the vast majority of transposition in humans is the L1 retrotransposon (Figure 1A). The majority of L1s in the genome are 5′ truncated (Boissinot et al., 2001; Szak et al., 2002). As most truncated (and non-truncated) L1s are flanked by variable-length target site duplications, this process is typically thought to be due to a premature termination of reverse transcription rather than recombinational 5′ deletion. Elements that are full length contain both 5′ and 3′ UTRs and two non-overlapping open reading frames (ORFs). ORF1 has been shown to code for an RNA-binding protein specifically associated with L1 RNA (Hohjoh and Singer, 1997) and to form ribonucleoprotein particles with L1 RNA in vivo (Hohjoh and Singer, 1996).
Recently, the murine L1 ORF1 protein was shown to have nucleic acid chaperone activity: ORF1 encouraged annealing of complementary sequences and promoted the formation of the most stable nucleic acid hybrid possible (Martin and Bushman, 2001). ORF2 encodes an endonuclease (L1 EN) that is required for retrotransposition (Feng et al., 1996; Moran et al., 1996). The nicking specificity of L1 EN mirrors the sequence at the sites of L1 insertion in vivo, and the biochemical requirements of its nucleic acid recognition have been investigated (Feng et al., 1996; Cost and Boeke, 1998). Briefly, the L1 EN is specific for DNA within a range of structural and sequence parameters, with minor groove width being of particular importance. The DNA sequence that best correlates with these requirements is TnAn, with nicking occurring mainly at the TpA and flanking phosphodiesters. A hotspot for L1 EN nicking occurs between the bla gene and the origin of replication on pBluescript, as this region contains many TnAn sequences (Feng et al., 1996). Nicking at such sequences is generally inhibited by chromatinization; interestingly however, cleavage of some non-consensus sites is enhanced (Cost et al., 2001). The ORF2-encoded L1 reverse transcriptase (RT) contains seven conserved domains and is significantly similar to the telomerase RT (Xiong and Eickbush, 1990; Eickbush, 1997). Following the RT domain is a cysteine-rich domain of unknown function. Interestingly, the proteins encoded by L1 elements work preferentially in cis, that is, preferentially on the RNA from which they were translated (Boeke, 1997; Esnault et al., 2000; Wei et al., 2001). Despite this preference, members of the Alu class of retroelements are believed to misappropriate L1 proteins in order to proliferate (Smit, 1996; Boeke, 1997; Esnault et al., 2000). Figure 1.(A) The human L1 retrotransposon. 
EN, endonuclease domain; RT, reverse transcriptase domain; ZN, cysteine-rich domain; vTSD, variable target site duplication. The 5′ UTR contains an internal promoter (arrow); the 3′ UTR, a polyG and polyA sequence. (B) Protein purification. L1 ORF2p purification was analyzed by electrophoresis and western blotting and silver staining (right panel). T, total lysate; S, supernatant; P, pellet; F, column flow-through; W, wash; 1–9, 0.5 ml GSH elution fractions. (C) Reaction and detection scheme. Incubation of the reaction components results in formation of branched TPRT products. Branched molecules are detected by PCR with primers JB1179 and 1180, followed by Southern blotting with the JB2296 probe. (D) TPRT by L1 ORF2p. Lane 1, full reaction; lanes 2–7, full reaction less the indicated omission; lane 8, to ensure that the products observed in lane 1 were not the result of PCR-mediated target DNA–cDNA recombination, reactions 3 (containing cDNA but no target DNA) and 4 (containing target DNA but no cDNA) were mixed before PCR; lane 9, a full reaction, but with a large excess of AMV RT substituted for L1 ORF2p. The sizing standard used here and throughout is a MspI digest of pBR322, consisting of fragments of the following number of base pairs: 622, 527, 404, 307, 242, 238, 217, 201, 190, 180, 160, 147 and smaller fragments.

A model of the first steps of retrotransposition of the R2Bm element has been derived from the biochemical work of Luan and Eickbush (Luan et al., 1993). In the R2Bm model, called target-primed reverse transcription (TPRT), an element-encoded endonuclease nicks the target DNA, generating an exposed 3′ hydroxyl that serves as a primer for reverse transcription of the element's RNA. The mechanism of second-strand synthesis and nick repair is unknown. The R2Bm and L1 elements are both non-LTR polyA transposons, but otherwise share little structural similarity (Malik et al., 1999).
In contrast to the semi-specific apurinic/apyrimidinic (AP)-endonuclease-related L1 EN, the R2Bm endonuclease is a type IIs restriction-like enzyme with a CCHC motif, and is specifically targeted to a sequence in the insect rDNA (Yang et al., 1999). The R2Bm enzyme is C-terminal, located after the RT; the L1 EN is at the N-terminus, before the RT. In place of the endonuclease at the N-terminus of R2Bm is a consensus zinc-finger motif proposed to be involved in DNA binding (Yang et al., 1999). Additionally, the R2Bm element completely lacks an ORF1 protein. Given these substantial differences, our results were surprising: we found that the basic mechanism of transposition initiation is conserved, as the L1 ORF2 protein (ORF2p) can carry out a TPRT reaction. The reaction faithfully recapitulates many aspects of in vivo L1 transposition, and displays several mechanistically and evolutionarily interesting behaviors.

Results

An in vitro TPRT assay

To investigate the mechanism of L1 retrotransposition, we have reconstituted several steps of this reaction in vitro. The ORF2 protein of the highly active L1.3 retrotransposon was purified by affinity chromatography (Figure 1B), and assayed according to the scheme depicted in Figure 1C for its ability to initiate the transposition reaction. Definition of the TPRT model coupled with the discovery of an EN domain at the N-terminus of L1 ORF2p (Feng et al., 1996) suggested that an early step in the process of L1 transposition might be the synthesis of L1 cDNA utilizing a nick in the target plasmid to prime reverse transcription. When L1 ORF2 protein was incubated with L1.3 3′ end RNA and a suitable DNA target, L1 ORF2p produced a distribution of branched molecules as detected by PCR amplification (Figure 1D). Success of this reaction depended upon the presence of the L1 RNA and the target DNA (Figure 1D), as well as free deoxynucleotides and Mg2+ (data not shown).
The size distribution of the branched molecules formed was dependent upon two variables: the position of the site of nicking on the target DNA and the length of the polyA tail on the transposed RNA (see Materials and methods). The minimum product length expected from the reaction was 174 bp, corresponding to an insertion in the target DNA exactly at the 3′ end of the JB1180 PCR primer. While the large majority of products exceeded this length, a small amount of shorter products were formed due to internal initiation of L1 ORF2p RT or by utilization of truncated RNAs. Most amplified TPRT products were 275–400 bp long, corresponding to cDNA insertion into the hotspot region of the plasmid. Although L1 ORF2p can produce a minimal amount of cDNA in the absence of a DNA target (data not shown), the material seen in lane 1 was not the result of artifactual cDNA/target DNA recombination during PCR, as it was not produced when cDNA and target DNA were mixed after the TPRT reaction but before PCR (Figure 1D, lane 8). Additionally, AMV RT was unable to substitute for L1 ORF2p in the TPRT reaction, even at high concentrations (Figure 1D, lane 9).

TPRT products resemble in vivo L1 insertions

Targeting of L1 transposition in vivo is not random. While multiple factors, including the accessibility of chromatin, may influence transposon insertion on a global scale, targeting of insertion at the nucleotide level is dictated by the specificity of the ORF2p EN domain (Feng et al., 1996; Jurka, 1997; Cost and Boeke, 1998; Cost et al., 2001). If the branched molecules detected in Figure 1 were authentic intermediates in L1 transposition, then the distribution of in vitro insertion sites should reflect the specificity of the L1 EN domain. PCR products generated in Figure 1D were cloned and sequenced. The sites of L1 cDNA insertion into the target DNA are mapped in Figure 2A, along with the major and minor sites of L1 EN nicking on this DNA sequence.
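To make the TnAn nicking consensus described in the Introduction concrete, the following is a minimal sketch of how candidate sites might be located on one strand of a sequence. The function name, the run-length thresholds (`min_t`, `min_a`) and the demo sequence are all assumptions made for illustration; the definition of an L1 target site actually used for Figure 2 is given in the paper's Materials and methods and is not reproduced here.

```python
import re


def find_tnan_sites(seq, min_t=2, min_a=2):
    """Return TnAn-like motifs found on the given strand.

    min_t and min_a are hypothetical thresholds chosen for illustration;
    they are not the target-site definition used in the paper.
    """
    pattern = re.compile(r"(T{%d,})(A{%d,})" % (min_t, min_a))
    sites = []
    for match in pattern.finditer(seq.upper()):
        sites.append({
            "start": match.start(),        # 0-based index of the first T
            "motif": match.group(0),       # the T-run followed by the A-run
            "tpa_junction": match.end(1),  # index of the first A (the TpA step)
        })
    return sites


if __name__ == "__main__":
    demo = "GCGTTTAAAAGCGTAGC"  # made-up sequence, not taken from the plasmid
    for site in find_tnan_sites(demo):
        print(site)
```

The sketch reports only the central TpA step of each motif; the text notes that nicking also occurs at flanking phosphodiesters, which this simple scan does not attempt to model.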
In vitro TPRT exhibited extremely non-random targeting, with a large majority of the recovered cDNAs correctly targeted (Figure 2A and D).

Figure 2. (A) In vitro transposition insertions. L1 EN nicking sites, white arrowheads; observed L1 insertion sites, black arrowheads; JB1180 PCR primer, shaded nucleotides; ambiguity in the exact site of insertion due to microhomology between the polyT of the L1 cDNA and the target DNA, horizontal lines. (B) Untemplated nucleotides are sometimes found at transposon insertion sites. (C) Transposition activity of wild-type and mutant ORF2 proteins. Diamonds, wild-type ORF2p; squares, EN mutant ORF2p; triangles, RT mutant ORF2p. (D) Targeting of transposition. (A) contains 291 nt, 38 of which are defined as L1 target sites (see Materials and methods). Random insertion into this sequence would therefore yield an apparent targeting frequency of 13%. When wild-type L1 ORF2p was used, 25/36 L1 insertions were targeted, whereas only 9/36 insertions with EN mutant protein were. The complete set of sequenced L1 insertions exists as Supplementary information for this paper and is available from J.D.B. at http://www.bs.jhmi.edu/MBG/boekelab/boeke_lab_homepage.

In addition to retaining insertion site specificity, our TPRT assay also partially recapitulated another aspect of L1 biology. In vivo and in vitro, most L1 insertions contain L1 cDNA with a variable length polyT tail directly joined to the target DNA. However, short stretches of nucleotides, often simple repeats of high A–T content, were found between genomic L1 polyT tails and the target site duplication at a frequency of 13% (2092/15921) overall and 12% (56/479) for TA subset L1s (Szak et al., 2002). We found similar (presumably untemplated) nucleotides at the junction of the L1 cDNA and the target DNA in 28% (11/38) of in vitro insertions (Figure 2B).
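The targeting comparison in the Figure 2D caption can be re-derived from the counts it quotes. The sketch below recomputes the 13% random expectation and the observed targeted fractions, and adds an exact binomial upper-tail probability as a rough gauge of how unlikely each count would be under random insertion. The binomial calculation is an illustrative addition rather than an analysis reported in the paper, and it assumes that insertions are independent and that every position in the 291 nt window is equally likely under the random model.

```python
from math import comb

TARGET_NT = 38    # positions defined as L1 target sites (Figure 2D caption)
WINDOW_NT = 291   # length of the sequence shown in Figure 2A


def percent(hits, total):
    """Express hits/total as a percentage."""
    return 100.0 * hits / total


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


random_rate = TARGET_NT / WINDOW_NT
observed = {
    "wild-type ORF2p": (25, 36),
    "EN mutant ORF2p": (9, 36),
}

if __name__ == "__main__":
    print(f"random expectation: {percent(TARGET_NT, WINDOW_NT):.0f}%")
    for label, (hits, total) in observed.items():
        tail = binomial_upper_tail(hits, total, random_rate)
        print(f"{label}: {percent(hits, total):.0f}% targeted, "
              f"P(at least {hits} of {total} by chance) = {tail:.2e}")
```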
The extra nucleotides observed in vivo were mainly of the structure [TAAA(A)n]n; several of the extra-nucleotide sequences seen in our assay are similar to this type, but most are of apparently random sequence. While it is possible that some of these nucleotides came from aberrant extension of the RNA by T7 polymerase, we also saw such nucleotides in cDNAs from non-polyadenylated transcripts (data not shown). Additionally, extra-nucleotide addition by L1 RT has been observed with Ty1/L1 hybrid elements (Dombroski et al., 1994; Teng et al., 1996), with the R2Bm element (Luan and Eickbush, 1995) and with the Mauriceville plasmid RT (Chiang et al., 1994).

TPRT can occur at pre-formed nicks and breaks

L1 transposition in vivo is dependent upon the activities of both the EN and RT domains (Feng et al., 1996; Moran et al., 1996). When in vitro TPRT was attempted with protein containing an active site mutation in the RT domain sufficient to abolish all detectable RT activity, TPRT activity was undetectable (Figure 2C). Surprisingly, a similarly deleterious mutation of the EN domain resulted in the apparent retention of appreciable TPRT activity. However, rather than the targeted insertion seen with wild-type ORF2 protein, branched molecules recovered from reactions using EN mutant protein were much more randomly scattered across the target DNA (Figure 2D). This observation suggested that such molecules resulted from usage of spurious nicks generated by a low-level nicking activity found to be present in the reaction. Indeed, the use of pre-existing cellular nicks has been postulated to account for the existence of genomic L1 elements found in a sequence context inconsistent with L1 EN activity (Hutchison et al., 1989). We directly tested this hypothesis by assaying TPRT activity on DNA molecules pre-nicked at specific locations.
When pGC89 plasmid previously nicked by various restriction enzymes was used as the target DNA (Figure 3A), TPRT activity was observed at the site of the nicks, both in the hotspot region (data not shown) and in a normally transposition-incompetent region (Figure 3B).

Figure 3. L1 TPRT can utilize pre-existing 3′ hydroxyls for transposition. (A) Pre-nicking reaction scheme. pGC89 DNA was pre-nicked with various restriction enzymes outside of the nicking and transposition hotspot region of the plasmid, then used in the TPRT reaction. (B) TPRT products are produced at the pre-nicked sites. Predicted product sizes are 252, 268 and 285, for DraI (D1 on the figure), HindIII (H3) and HincII (H2), respectively. HindIII ‘star’ nicking activity results in a band at ∼250 nt. As the pGC89 substrate used in this experiment contains four DraI sites, only one quarter of the nicks in the plasmid are at the assayed DraI site, reducing the intensity of the DraI band four-fold relative to the other enzymes; the other sites are unique. (C) Pre-digestion of target pBluescript KS– DNA into linear fragments. (D) Transposition products are produced via utilization of either a blunt-end DSB (DraI, D1, TTT/AAA), or a four base overhang (5′ overhang, BspHI, B1, T/CATGA, all BspHI sites are also NlaIII sites; 3′ overhang, NlaIII, N3, CATG/). The region of the plasmid assayed in this experiment is within the L1 ENp nicking and L1 ORF2p TPRT hotspot region of the plasmid. Predicted molecular weights of TPRT-derived PCR products: DraI, 303 bp; NlaIII, 361 bp; BspHI, 357 bp. A band from primer–L1 cDNA fusion is seen near the bottom of all lanes in this panel. The products of the normal TPRT reaction appear lighter in this gel only because the efficiency of transposition using DSBs necessitated a shorter exposure.
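The end types named in the Figure 3D caption follow directly from where each enzyme cuts within its palindromic recognition site. As a minimal sketch, the code below uses only the three recognition sequences and cut positions quoted in the caption (DraI TTT/AAA, BspHI T/CATGA, NlaIII CATG/), classifies the resulting ends, and checks that every BspHI site contains an NlaIII site. The function names and the demo sequence are hypothetical and are not derived from pBluescript.

```python
# Recognition site and top-strand cut offset, as written in the Figure 3D caption.
ENZYMES = {
    "DraI": ("TTTAAA", 3),    # TTT/AAA
    "BspHI": ("TCATGA", 1),   # T/CATGA
    "NlaIII": ("CATG", 4),    # CATG/
}


def end_type(site, cut):
    """Classify the ends left by a palindromic site cut `cut` nt from its 5' end."""
    bottom_cut = len(site) - cut  # position of the symmetric cut on the other strand
    if cut == bottom_cut:
        return ("blunt", 0)
    if cut < bottom_cut:
        return ("5' overhang", bottom_cut - cut)
    return ("3' overhang", cut - bottom_cut)


def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) occurrence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


if __name__ == "__main__":
    demo = "GGTTTAAACCTCATGAGGCATGCC"  # made-up sequence for illustration only
    for name, (site, cut) in ENZYMES.items():
        kind, length = end_type(site, cut)
        print(name, find_sites(demo, site), kind, length)
```

Because TCATGA contains CATG starting one nucleotide in, any BspHI position p implies an NlaIII position at p + 1, which is the relationship the caption notes.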
When the standard plasmid target DNA was digested with restriction enzymes (Figure 3C) and then used in the TPRT assay, a substantial amount of the total TPRT activity was directed to the ends of the DNA fragments (Figure 3D). Blunt-ended fragments were much better substrates than either 5′ or 3′ overhang fragments. Utilization of blunt-ended fragments as DNA targets allowed for TPRT in the absence of EN activity. As the ORF2 protein is capable of using 3′ hydroxyls found at nicks and double-strand breaks (DSBs) generated in trans, we conclude that the nicking and reverse transcription phases of the TPRT reaction can be uncoupled.

RNA requirements for reverse transcription

We observed that L1 ORF2p had precocious template switching tendencies in a conventional primer–template RT assay (Mathias et al., 1991), as there was a substantial size difference between the template RNA and the cDNA produced (Figure 4A). Such activity was also detected with reverse transcription of an A20 ribo-oligonucleotide template, with which the products of reverse transcription reached lengths of several hundred nucleotides (data not shown). Evidence of template switching was also obtained from the sequences of in vitro L1 transposon insertions (Figure 2B, lanes 5 and 6), in which the joining of at least two heterologous cDNAs to the L1 cDNA was observed.

Figure 4. (A) In a homopolymer RT assay (polyA RNA, oligo-dT primer), L1 ORF2p produces RT products far in excess of the molecular weight of the template, indicative of template switching activity; AMV RT does not. (B) TPRT of various RNA species. The end-points of the L1 and Alu RNAs are indicated with the dotted line (not to scale). RNA number 4 has 38 nt of vector RNA after the polyA tail. URA3, fragment of S.cerevisiae URA3 RNA with and without a polyA tail. (C) Distribution of cDNA initiation points.
Shown below each histogram is a full-length cDNA (from 3′ to 5′), with the positions of the reverse-transcribed polyG and polyA tracts indicated by ‘Cn’ and ‘TTTTTT’, respectively. Position zero marks the end of L1 sequence and the beginning of the polyA tail. The actual positions of the cDNA initiation are plotted in the histogram grouped into 10 nt bins. For the first cDNA (generated from RNA number 1 in B), many cDNAs end beyond the designed position at nucleotide 14 due to the addition of extra A residues to the RNA by T7 polymerase (see Materials and methods). Transposition of RNA number 4 (the second cDNA from the top) yields two distinct populations of cDNA initiation points. Mutation of either the polyG or the polyA sequence (histograms three and four, respectively) removes the bias towards internal initiation of cDNA synthesis, although only the polyA mutation results in a statistically significant difference. Student's two-tailed t-test comparison of wild type versus mutant, bins −20 to 30 with bins 30–60; polyG p-value = 0.17; polyA p-value = 0.05. For all three cDNAs from 3′ extended transcripts, all initiation points in the 50–60 bin occurred at nucleotide 53, the end of the transcript. The following number of highly truncated (endpoint <−30) cDNAs were excluded from the statistical analysis: wt 3′ extended RNA, 3; polyG mutant, 4; polyA mutant, 3.

We carried out a limited deletion analysis of the 366 nt L1 3′ RNA in an effort to define any cis requirements for efficient TPRT (Figure 4B). Simple deletion of the polyA tail of the RNA (lane 2) had a modest effect on the efficiency of TPRT, whereas a more substantial 3′ truncation (lane 1) reduced the amount of TPRT products. Interestingly, a hybrid RNA containing L1 3′ RNA joined to 38 nt of vector sequence could be used as a substrate for TPRT at low levels (lane 4).
We detected efficient TPRT of the Alu element RNA (lanes 5 and 6), a transposon long thought to utilize L1-encoded proteins for its mobilization. TPRT products from an irrelevant RNA were also formed at reasonable efficiency, however (lanes 7 and 8), suggesting that the molecular basis for the RNA selectivity of L1 is not likely to reside in the ORF2 protein (Esnault et al., 2000; Wei et al., 2001). Generally, a 3′ terminal polyA tail modestly stimulated TPRT activity of the transcripts. In addition to assaying bulk TPRT activity with the different RNAs, analysis of discrete cDNAs produced by TPRT proved informative as well. Sequencing of the insertion events from the chimeric RNA number 4 (which has a 3′ extension of 38 nt after the polyA tail) revealed a bimodal distribution of L1 cDNA initiation points (Figure 4C). Excluding the three highly truncated insertions, about half (13/24) of the inserted cDNAs ended at or near the 3′ end of the RNA and half (11/24) in the vicinity of the polyT region. While L1 ORF2p RT can reverse transcribe the full length of 3′ extended transcripts, we infer from these results that reverse transcription of extended RNAs is often guided to begin internally within a window set by a cis-acting RNA sequence. One possible candidate for this sequence is the evolutionarily conserved polyguanosine stretch (25/34 nt are Gs) in the L1 3′ UTR (Furano, 2000). We found a polyG RNA sequence to be a potent inhibitor of L1 EN activity (rG20 IC50 ≈ 50 nM), in contrast to a polyA sequence (rA20 IC50 ≈ 50 μM; Figure 5B). Under physiological conditions, the polyG stretch in the L1 3′ UTR can adopt a ‘G-quartet’ secondary structure (Howell and Usdin, 1997). Complete disruption of a similar structure in a polyI homopolymer by conversion from the sodium to the lithium salt (Figure 5C, see Materials and methods) had no effect on the inhibitory activity of this sequence (Figure 5D), suggesting that a quartet structure is not required for L1 EN inhibition. 
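The gap between the two IC50 values quoted above is roughly a thousand-fold, which is easier to appreciate when laid against the ten-fold dilution series used in Figure 5B. The sketch below computes the fold difference and a predicted residual nicking activity at each concentration. The single-site hyperbolic inhibition curve (fractional activity = 1 / (1 + [I]/IC50)) is an assumption adopted purely for illustration; the paper reports only approximate IC50 values and does not state a dose-response model.

```python
# Approximate IC50 values from the text, expressed in nM.
IC50_NM = {"rG20": 50, "rA20": 50_000}


def fractional_activity(inhibitor_nm, ic50_nm):
    """Residual activity under an assumed simple hyperbolic inhibition model."""
    return 1.0 / (1.0 + inhibitor_nm / ic50_nm)


def dilution_series(top_nm, bottom_nm, factor=10):
    """Serial dilutions from top_nm down to bottom_nm, inclusive."""
    series = []
    conc = top_nm
    while conc >= bottom_nm:
        series.append(conc)
        conc //= factor
    return series


fold_difference = IC50_NM["rA20"] / IC50_NM["rG20"]

if __name__ == "__main__":
    print(f"rA20 IC50 / rG20 IC50 = {fold_difference:.0f}-fold")
    # 100 uM to 100 nM for rA20; the G20 series extends to 10 nM (Figure 5B).
    ranges = {"rA20": (100_000, 100), "rG20": (100_000, 10)}
    for oligo, (top, bottom) in ranges.items():
        for conc in dilution_series(top, bottom):
            activity = fractional_activity(conc, IC50_NM[oligo])
            print(f"{oligo} {conc:>7} nM -> predicted activity {activity:.3f}")
```

Under this assumed model, rG20 would already suppress most nicking at concentrations where rA20 has almost no effect, consistent with the qualitative contrast the text describes.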
Figure 5. L1 EN is inhibited by RNA. (A) L1 EN nicking activity was assayed by following conversion of the quickly migrating supercoiled KS– plasmid to the slowly migrating open-circular form. (B) L1 EN nicking of a supercoiled plasmid was challenged with 10-fold dilutions (100 μM–100 nM; 100 μM–10 nM for G20) of the indicated RNA oligo. Unlike the related CCR4 nuclease (Chen et al., 2002), L1 EN has neither polyA-specific RNA exonucleolytic activity, nor any detectable nucleolytic activity on RNA (data not shown). (C) Quartet structures (as assayed by differential light absorbance and thermal melting) are disrupted when polyI is converted to a lithium salt. (D) Quartet structures are not required for inhibition of L1 EN nicking. Ten-fold dilutions (100 ng/μl–10 pg/ml); no difference was seen even at two-fold dilutions (data not shown). sc., supercoiled; oc., open circular.

Furthermore, we investigated whether the polyG or polyA tract was required for the bimodal cDNA endpoint distribution seen in Figure 4C. Despite its ability to markedly inhibit L1 EN activity, mutation of the endogenous polyG sequence within the L1 3′ UTR RNA only partially affected where cDNA was begun on 3′ extended transcripts (Figure 4C). In contrast, substitution of non-polyA sequence for the L1 polyA stretch in a 3′ extended RNA significantly altered the position of cDNA initiation such that the large majority (24/30) began near the 3′ end of the RNA (Figure 4C). We conclude that L1 ORF2p RT can recognize and initiate reverse transcription at internal polyA RNA sequences.

Second-strand L1 DNA synthesis

Completion of the transposition reaction requires a second round of TPRT. Molecules resulting from complete L1 insertions will therefore have a junction of the 5′ end of second-strand DNA to the target DNA at the site of the second nick (Figure 6A).
A PCR assay (analogous to the one used to detect transposon 3′ end insertion in Figure 1C and D) revealed the existence of a population of such molecules in our in vitro reaction (Figure 6B). Like 3′ TPRT, the formation of second-strand L1 DNA was dependent upon the activity of the L1 RT and (to a lesser extent) EN domains (Figure 6B). Cloning and sequencing of these PCR products allowed for the site of 5′ end insertion to be determined for six independent 5′ end insertions (Figure 6C). Of the six, three contained a single extra thymidine, one was missing the 5′-most nucleotide of the L1 cDNA, one missing the first seven cDNA nucleotides, and one 5′ junction was neither missing nor had gained nucleotides. As L1 retrotransposition does not result in fixed spacing between the 5′ and 3′ TPRT reactions, the exact position of the 3′ end TPRT for these insertions is unknown.

Figure 6. Creation of second-strand L1 cDNA. (A) The full L1 transposition reaction requires the utilization of two nicks in the target DNA. Arrows indicate the position of the primers (JB1180 and 2NP) used for PCR. (B) Five prime end insertion. Lane 1, wild-type ORF2 protein; lane 2, EN mutant ORF2p; lane 3, RT mutant ORF2p. (C) Black arrowheads indicate the position of target DNA–L1 5′ cDNA junctions.

L1 EN activity is repressed in the full-length ORF2 protein

In contrast to the robust RT activity present in full-length ORF2p, initial assays for L1 EN nicking revealed little EN activity (Figure 7B, lane 1; data not shown). The observation (in Figure 3) that pre-nicking or pre-breaking of the DNA target could stimulate TPRT with wild-type and EN mutant protein implied that nicking activity was in fact rate-limiting in the reaction. One possibility is that the conformation of the EN domain in the context of the full-length ORF2 protein renders it unable to efficiently nick DNA.
We tested this hypothesis by examining the EN activity of proteolytic fragments of L1 ORF2p. Treatment of ORF2p with Factor Xa protease [a procedure originally intended to specifically remove the glutathione S-transferase (GST) affinity tag] resulted in the unexpected scission of the EN domain from the ORF2p, without appreciable release of the GST domain (Figure 7A). When ORF2 protein treated with the highest concentration of Factor Xa used in Figure 7A (Figure 7B, lanes 2, 5 and 8), or an excess of Factor Xa (complete EN proteolytic release, lanes 3, 6, 9 and 12) was assayed for nicking ability, activity was detected in a proteolysis-dependent manner. Similar results were obtained with a wide variety of less specific proteases (data not shown). The nicking activity released by EN cleavage had a DNA sequence specificity identical to that of the purified EN domain, as it cleaved at and near the TpA bond of a TnAn oligonucleotide (Cost and Boeke, 1998). Bona fide L1 EN activity is stimulated by the addition of DMSO to the reaction (Cost and Boeke, 1998; Figure 7C); the nicking activity released by Factor Xa proteolysis reacted similarly (lanes 11 and 12). When EN mutant ORF2 protein was