The substrate specificities of the papain-like cysteine proteases (clan CA, family C1) papain, bromelain, and human cathepsins L, V, K, S, F, and B, and of five proteases of parasitic origin, were studied using a completely diversified positional scanning synthetic combinatorial library. A bifunctional coumarin fluorophore was used that facilitated synthesis of the library and of individual peptide substrates. The library contains a total of 160,000 tetrapeptide substrate sequences, with each of the P1, P2, P3, and P4 positions completely randomized over 20 amino acids. A microtiter plate assay format permitted rapid determination of the specificity profile of each enzyme. Individual peptide substrates were then synthesized and tested to determine the specificity of the human cathepsins quantitatively. Despite the conserved three-dimensional structure and similar substrate specificity of the enzymes studied, distinct amino acid preferences that differentiate each enzyme were identified. The specificities of cathepsins K and S partially match the cleavage site sequences in their physiological substrates. Capitalizing on the unique preference of cathepsin K for proline and glycine at the P2 and P3 positions, respectively, selective substrates and a substrate-based inhibitor were developed for this enzyme. A cluster analysis of the proteases based on the complete specificity profiles provided a functional characterization distinct from standard sequence analysis. This approach provides useful information for developing selective chemical probes to study protease-related pathologies and physiologies.

Proteases hydrolyze amide bonds in proteins and peptides and represent one of the largest and most important protein families known. Protease genes account for over 2% of the human genome, and proteases play diverse physiological roles (merops.sanger.ac.uk) (1: Rawlings N.D., Tolle D.P., Barrett A.J. Nucleic Acids Res. 2004, 32, D160-D164). The substrate specificity of a protease enables the enzyme to preferentially cleave its substrates in the presence of other peptides or proteins. Therefore, specificity information can provide clues about the biological function of a protease and aid in the design of efficient substrates and potent, selective inhibitors.
Various biological and chemical approaches to studying protease specificity have been developed and were recently reviewed (2: Marnett A.B., Craik C.S. Trends Biotechnol. 2005, 23, 59-64). Positional scanning synthetic combinatorial libraries (PS-SCLs)² of fluorogenic substrates have emerged as useful reagents for the rapid and exhaustive determination of protease specificity (3: Thornberry N.A., Rano T.A., Peterson E.P., Rasper D.M., Timkey T., Garcia-Calvo M., Houtzager V.M., Nordstrom P.A., Roy S., Vaillancourt J.P., Chapman K.T., Nicholson D.W. J. Biol. Chem. 1997, 272, 17907-17911). A peptide-based PS-SCL is composed of sublibraries in which one peptide position is fixed with a single amino acid, whereas the remaining positions contain an equimolar mixture of amino acids. Assaying a protease with these sublibraries rapidly establishes its amino acid preferences at the fixed position. Initially, the substrate specificities of caspases and granzyme B were profiled using PS-SCLs with the P1 position fixed as aspartic acid. The limitations of these original P1-fixed libraries were overcome through the development of a modified coumarin fluorogenic leaving group, 7-amino-4-carbamoylmethylcoumarin (ACC). The bifunctional nature of ACC enables straightforward solid-phase synthesis of libraries containing any amino acid at the P1 position.

² The abbreviations used are: PS-SCL, positional scanning synthetic combinatorial library; ACC, 7-amino-4-carbamoylmethylcoumarin; DMF, N,N-dimethylformamide; DICI, diisopropylcarbodiimide; HOBt, 1-hydroxybenzotriazole; Ac, acetyl; Z, benzyloxycarbonyl; AMC, 7-amino-4-methylcoumarin; HPLC, high pressure liquid chromatography; AOMK, acyloxymethyl ketone; Fmoc, N-(9-fluorenyl)methoxycarbonyl.
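To illustrate why fixing one position while mixing the others reports a per-position preference, the Python sketch below simulates a 20-residue, four-position PS-SCL. It assumes, purely for illustration, that subsites act independently, so that a sequence's cleavage rate is the product of hypothetical per-position weights. The alphabet string, the lowercase `n` used for norleucine, the weights, and the function names are illustrative choices, not part of the published method.

```python
import itertools
import math

# Library alphabet: 19 proteinogenic residues (cysteine omitted) plus norleucine,
# written here as lowercase "n" (an illustrative convention).
ALPHABET = "ADEFGHIKLMNPQRSTVWYn"
POSITIONS = ("P4", "P3", "P2", "P1")


def sublibrary_signals(weights):
    """Total cleavage signal of every (position, fixed residue) sublibrary.

    Assumes independent subsites: a tetrapeptide's rate is the product of the
    hypothetical per-position weights weights[position][residue].
    """
    signals = {pos: dict.fromkeys(ALPHABET, 0.0) for pos in POSITIONS}
    for seq in itertools.product(ALPHABET, repeat=len(POSITIONS)):
        rate = math.prod(weights[pos][aa] for pos, aa in zip(POSITIONS, seq))
        # Every sequence belongs to exactly one sublibrary of each positional library.
        for pos, aa in zip(POSITIONS, seq):
            signals[pos][aa] += rate
    return signals


def flat_weights():
    """No preference: every residue weighted 1.0 at every position."""
    return {pos: dict.fromkeys(ALPHABET, 1.0) for pos in POSITIONS}


# Hypothetical papain-like profile: Arg/Lys at P1, Leu/Val at P2, Pro disfavored at P2.
weights = flat_weights()
weights["P1"].update({"R": 5.0, "K": 3.0})
weights["P2"].update({"L": 4.0, "V": 2.0, "P": 0.1})

signals = sublibrary_signals(weights)
for pos in ("P2", "P1"):
    ranked = sorted(signals[pos], key=signals[pos].get, reverse=True)
    print(pos, ranked[:3])
```

Under the independence assumption, each sublibrary signal equals the fixed residue's weight multiplied by the summed weights of the other positions. The ranking within each positional library therefore reproduces the underlying preferences. Real subsites need not act independently, so this is an idealization.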
Early applications combined a P1-diverse PS-SCL with several P1-fixed PS-SCLs to study P1 and P2-P3-P4 specificity, respectively (4: Harris J.L., Backes B.J., Leonetti F., Mahrus S., Ellman J.A., Craik C.S. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 7754-7759; 5: Takeuchi T., Harris J.L., Huang W., Yan K.W., Coughlin S.R., Craik C.S. J. Biol. Chem. 2000, 275, 26333-26342; 6: Harris J.L., Niles A., Burdick K., Maffitt M., Backes B.J., Ellman J.A., Kuntz I., Haak-Frendscho M., Craik C.S. J. Biol. Chem. 2001, 276, 34941-34947; 7: Salter J.P., Choe Y., Albrecht H., Franklin C., Lim K.C., Craik C.S., McKerrow J.H. J. Biol. Chem. 2002, 277, 24618-24624). We report here the preparation of a completely diversified PS-SCL of ACC-based substrates. This library permits determination of the P1-P2-P3-P4 specificity of proteases regardless of their P1 specificity. Using this complete diverse library, numerous proteases from various sources, including humans, parasites, bacteria, and viruses, have been profiled. As a representative family, we present a study of the substrate specificity of papain-like cysteine proteases. The papain-like cysteine proteases, which include the plant enzymes papain and bromelain, the human cysteine cathepsins (B, H, L, S, C, K, O, F, V, X, W), and the parasite proteases cruzain and falcipains, have been characterized as key enzymes in many biological and pathological events (8: Lecaille F., Kaleta J., Brömme D. Chem. Rev. 2002, 102, 4459-4488; 9: Brömme D., Kaleta J. Curr. Pharm. Des. 2002, 8, 1639-1658; 10: Sajid M., McKerrow J.H. Mol. Biochem. Parasitol. 2002, 120, 1-21; 11: Rosenthal P.J. Int. J. Parasitol. 2004, 34, 1489-1499). As a result, many of them are attractive drug targets. Of particular interest to these studies is cathepsin K, a cysteine protease implicated in osteoporosis and other diseases (12: Brömme D., Okamoto K., Wang B.B., Biroc S. J. Biol. Chem. 1996, 271, 2126-2132). The substrate-binding region of these proteases can be divided into seven subsites, S4 to S3′, that interact with the P4 to P3′ residues of substrates (13: Berger A., Schechter I. Philos. Trans. R. Soc. Lond. B Biol. Sci. 1970, 257, 249-264). Hydrolysis occurs at the scissile bond between P1 and P1′. Among these subsites, S3 and S2′ interact with substrates only through side chain contacts, and these interactions spread over a relatively wide area. In contrast, the S2, S1, and S1′ subsites involve both main chain and side chain contacts. These recognition properties, combined with the well conserved structure of the family, result in broad and similar specificities among the papain-like proteases (14: Turk D., Guncar G., Podobnik M., Turk B. Biol. Chem. 1998, 379, 137-147). In this study, however, the complete diverse PS-SCL and cluster analysis of the resulting specificity information revealed distinctive differences between members of this class. The utility of this library and of the specificity information it yields is exemplified by the development of selective ACC-based substrates and an acyloxymethyl ketone inhibitor for cathepsin K.

Materials—Chemicals were obtained from commercial suppliers and used without further purification, unless otherwise stated. Rink amide AM resin and Fmoc-amino acids were purchased from Novabiochem. Anhydrous, low-amine-content N,N-dimethylformamide (DMF) was from EM Science.
O-(7-Azabenzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate was from PerSeptive Biosystems. Diisopropylcarbodiimide (DICI), 1-hydroxybenzotriazole (HOBt), trifluoroacetic acid, and triisopropylsilane were from Aldrich. The synthetic substrates Z-FR-AMC (AMC, 7-amino-4-methylcoumarin) and Z-LR-AMC were purchased from Bachem. Papain and pineapple stem bromelain were purchased from Sigma. Human cathepsin B was purchased from Cortex Biochem. Heterologous expression, purification, and active site titration were performed as described previously for human cathepsins F (15: Brömme D., McGrath M.E. Protein Sci. 1996, 5, 789-791), K (12), L (16: Smith S.M., Gottesman M.M. J. Biol. Chem. 1989, 264, 20487-20495), S (15), and V (17: Brömme D., Li Z., Barnes M., Mehler E. Biochemistry 1999, 38, 2377-2385). Five papain-like cysteine proteases of parasite origin were kind gifts from Drs. J. H. McKerrow, M. Sajid, and C. R. Caffrey. Rhodesain from Trypanosoma brucei rhodesiense, cruzain from Trypanosoma cruzi, a cathepsin L-like protease from Leishmania mexicana, and cathepsin B-like proteases 1 and 2 from Schistosoma mansoni were heterologously expressed and purified as described previously (18: Caffrey C.R., Hansell E., Lucas K.D., Brinen L.S., Alvarez Hernandez A., Cheng J., Gwaltney S.L. II, Roush W.R., Stierhof Y.D., Bogyo M., Steverding D., McKerrow J.H. Mol. Biochem. Parasitol. 2001, 118, 61-73; 19: Eakin A.E., Mills A.A., Harth G., McKerrow J.H., Craik C.S. J. Biol. Chem. 1992, 267, 7411-7420).
Synthesis of the Complete Diverse Tetrapeptide-ACC PS-SCL—ACC and P1-substituted ACC were prepared as described previously using an Argonaut Quest 210 organic synthesizer (4; 20: Backes B.J., Harris J.L., Leonetti F., Craik C.S., Ellman J.A. Nat. Biotechnol. 2000, 18, 187-193; 21: Maly D.J., Leonetti F., Backes B.J., Dauber D.S., Harris J.L., Craik C.S., Ellman J.A. J. Org. Chem. 2002, 67, 910-915). The substitution level of the resin (0.63 mmol/g) was determined by a spectrophotometric Fmoc quantitation assay (22: Bunin B.A. The Combinatorial Index. Academic Press, San Diego, CA, 1998, p. 6). The library was synthesized using a MultiChem 96-well synthesis apparatus (Robbins Scientific). To prepare the P1 library, each of the 20 Fmoc-amino acids (omitting cysteine and including norleucine) bound to ACC-resin (0.1 mmol) was added to a separate well of the reaction apparatus. Norleucine was included in the amino acid pool to increase the amount of information provided by the substrate specificity screen: it contains the same number of carbons as leucine and isoleucine and has an unbranched side chain like that of lysine, so it provides additional information when probing the extended substrate specificity of proteases. For the P2, P3, and P4 libraries, an isokinetic mixture of the 20 Fmoc-P1 amino acids bound to ACC-resin (2 mmol per library) was prepared by shaking the slurry in DMF for 2 h (4; 23: Ostresh J.M., Winkle J.H., Hamashin V.T., Houghten R.A. Biopolymers 1994, 34, 1681-1689). After filtration, the resin was dried and then divided among the wells of the reaction apparatus (0.1 mmol/well, 20 wells per library). DMF was added to each well to solvate the Fmoc-P1 amino acid-ACC-resin, followed by gentle agitation for 30 min. To remove the Fmoc protecting group, the DMF was drained, and a solution of 20% piperidine in DMF (4 ml/well) was added to the resin and agitated for 30 min. The piperidine solution was then removed by filtration, and the resin was thoroughly washed with DMF. To install the P2 residues of the P2 library, 20 individual Fmoc-amino acids (10 eq, 1 mmol) were preactivated in separate vials with HOBt (10 eq, 1 mmol) and DICI (10 eq, 1 mmol) in DMF and added to the corresponding wells of the P2 library. To couple P2 residues in the P1, P3, and P4 libraries, an isokinetic mixture of the 20 Fmoc-amino acids (20 mmol per library, 10 eq/well) was preactivated with HOBt (20 mmol) and DICI (20 mmol) in DMF. This solution was added to each well of the P1, P3, and P4 libraries, followed by agitation for 3 h for coupling. The solution was then drained, and the resin was thoroughly washed with DMF. The P3 and P4 positions were installed in the same manner, except that 20 individual preactivated Fmoc-amino acids were used for the P3 position of the P3 library and for the P4 position of the P4 library, whereas a preactivated isokinetic mixture was used for the remaining positions. After synthesis of the peptide portion was complete, the Fmoc protecting group of the P4 amino acids was removed, and the resin in each well, after washing with DMF, was treated with a capping solution consisting of AcOH (80 mmol), HOBt (80 mmol), and DICI (80 mmol) in DMF. After agitation for 4 h, the resin was washed with DMF and then with CH2Cl2.
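The coupling quantities above follow from simple equivalents arithmetic. An isokinetic mixture then divides the activated amino acid among residues so that each couples to a similar extent. The Python sketch below assumes that, at early conversion, each residue's incorporation rate is proportional to its relative coupling rate constant times its amount, so equal incorporation calls for amounts inversely proportional to those constants. The rate constants and function names are hypothetical placeholders. The ratios actually used follow the cited procedures (4, 23) and are not reproduced here.

```python
def total_activated_mmol(wells, resin_mmol_per_well, equivalents):
    """Activated amino acid (or capping reagent) needed to supply `equivalents`
    relative to the resin loading of every well."""
    return wells * resin_mmol_per_well * equivalents


def isokinetic_amounts(relative_rates, total_mmol):
    """Split total_mmol so that amount_i * k_i is the same for every residue.

    Assumes early-conversion kinetics in which incorporation of residue i is
    proportional to k_i * amount_i; equal incorporation then needs amount_i ~ 1/k_i.
    """
    inverse = {aa: 1.0 / k for aa, k in relative_rates.items()}
    norm = sum(inverse.values())
    return {aa: total_mmol * inv / norm for aa, inv in inverse.items()}


# One positional library: 20 wells x 0.1 mmol resin x 10 eq = 20 mmol, as in the text.
per_library = total_activated_mmol(wells=20, resin_mmol_per_well=0.1, equivalents=10)

# 4 libraries x 20 wells = 80 wells; 10 eq over all of them gives 80 mmol, which
# matches the capping reagent amounts if that solution was shared (an interpretation).
capping = total_activated_mmol(wells=80, resin_mmol_per_well=0.1, equivalents=10)

# Hypothetical relative coupling rate constants for a five-residue subset (not measured).
rates = {"G": 2.0, "A": 1.5, "L": 1.0, "V": 0.4, "I": 0.4}
amounts = isokinetic_amounts(rates, total_mmol=1.0)  # mole fractions of the mixture
for aa, frac in amounts.items():
    print(f"{aa}: fraction {frac:.3f}, relative incorporation {frac * rates[aa]:.3f}")
```

Slower-coupling residues (here the hypothetical V and I) receive larger fractions, so the product of fraction and rate constant is identical for every residue.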
The substrates were cleaved from the resin by treatment for 1 h with trifluoroacetic acid:triisopropylsilane:H2O (95:2.5:2.5, 3 ml/well), and the collected material was lyophilized. The final products were dissolved in Me2SO to a concentration of 25 mM and stored at -20 °C until use.

Synthesis of Individual Substrates and an Irreversible Inhibitor—Individual peptide substrates were synthesized by the same method used for the complete diverse PS-SCL, up to the trifluoroacetic acid cleavage step. The peptide-ACC substrates cleaved from the resin were precipitated with t-butyl methyl ether. After residual ether had evaporated, the products were purified by reverse phase preparative HPLC (Rabbit HPLC with a Vydac C18 column, 0-95% CH3CN gradient with 0.01% trifluoroacetic acid). The molecular weights of the purified substrates were confirmed by matrix-assisted laser desorption/ionization mass spectrometry (Voyager, Applied Biosystems). The purified substrates were lyophilized, dissolved in Me2SO, and stored at -20 °C until use. An irreversible inhibitor, Ac-HGPR-acyloxymethyl ketone (AOMK), was also designed based on the library assay results for cathepsin K and the other cathepsins. It was synthesized under conditions similar to those described elsewhere (24: Wagner B.M., Smith R.A., Coles P.J., Copp L.J., Ernest M.J., Krantz A. J. Med. Chem. 1994, 37, 1833-1840; 25: Brömme D., Smith R.A., Coles P.J., Kirschke H., Storer A.C., Krantz A. Biol. Chem. Hoppe-Seyler 1994, 375, 343-347). The inhibitor was purified, and its molecular weight was confirmed as described for the individual substrates.

PS-SCL Assay—The cysteine proteases were assayed at 25 °C in a buffer containing 100 mM sodium acetate (pH 5.5), 100 mM NaCl, 10 mM dithiothreitol, 1 mM EDTA, 0.01% Brij-35, and 1% Me2SO (from the substrates).
Aliquots of 25 nmol in 1 μl from each of the 20 sublibraries of the P1, P2, P3, and P4 libraries were added to the wells of a 96-well Microfluor-1 U-bottom plate (Dynex Technologies). The final concentration of each of the 8,000 compounds per well was 31.25 nM in a 100-μl final reaction volume. Assays were initiated by the addition of preactivated enzyme and monitored fluorometrically with a Spectra-Max Gemini fluorescence spectrometer (Molecular Devices) with excitation at 380 nm, emission at 460 nm, and a cutoff at 435 nm (4, 23). The excitation and emission maxima of the peptide-conjugated ACC substrates are 325 and 400 nm, respectively. Cleavage of a substrate by a protease releases free ACC, shifting the excitation and emission maxima to 350 and 450 nm, respectively. Excitation at 380 nm and emission at 460 nm are used to maximize the signal of free ACC over the background signal of uncleaved substrate. In addition, ACC has an ∼2.8-fold higher fluorescence yield than AMC at these excitation and emission wavelengths, which allows more sensitive detection of protease activity.

Cluster Analysis of Specificity Data from the Complete Diverse PS-SCL Assays—To compare the specificity information with amino acid sequence information, the results from the library assays were clustered. First, the activity rates from each library (P1 to P4) were converted to values ranging from -1 to 1: the amino acid showing the strongest activity in each library was assigned a value of 1, and amino acids showing no activity were assigned a value of -1.
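The scaling step just described, followed by hierarchical clustering of the concatenated per-enzyme profiles, can be sketched in Python as follows. Two assumptions are labeled in the code. First, intermediate rates are mapped linearly onto [-1, 1]. Second, the clustering uses average linkage with a 1 - Pearson correlation distance, similar in spirit to the correlation-based average-linkage clustering associated with Eisen's CLUSTER program; the exact metric and settings used in this study are not stated. Enzyme names and rates are hypothetical.

```python
import math


def normalize_library(rates):
    """Scale one positional library onto [-1, 1].

    The most active residue maps to 1 and zero activity maps to -1; the linear
    mapping in between is an assumption of this sketch.
    """
    top = max(rates.values())
    return {aa: 2.0 * r / top - 1.0 for aa, r in rates.items()}


def profile(libraries):
    """Concatenate normalized positional libraries (in sorted order) into one vector."""
    vec = []
    for pos in sorted(libraries):
        norm = normalize_library(libraries[pos])
        vec.extend(norm[aa] for aa in sorted(norm))
    return vec


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(vectors):
    """Agglomerative clustering with distance 1 - Pearson r (average linkage).

    Returns the tree as nested tuples of enzyme names.
    """
    members = {name: [name] for name in vectors}  # cluster label -> enzymes
    trees = {name: name for name in vectors}      # cluster label -> subtree

    def distance(a, b):
        pairs = [(p, q) for p in members[a] for q in members[b]]
        return sum(1.0 - pearson(vectors[p], vectors[q]) for p, q in pairs) / len(pairs)

    while len(members) > 1:
        labels = sorted(members)
        a, b = min(
            ((p, q) for i, p in enumerate(labels) for q in labels[i + 1:]),
            key=lambda pair: distance(*pair),
        )
        merged = f"({a},{b})"
        members[merged] = members.pop(a) + members.pop(b)
        trees[merged] = (trees.pop(a), trees.pop(b))
    return next(iter(trees.values()))


# Hypothetical raw rates for three enzymes over a reduced set of residues.
raw = {
    "enzyme_A": {"P2": {"F": 10, "L": 6, "P": 0, "R": 0}, "P1": {"R": 10, "K": 8, "A": 1, "D": 0}},
    "enzyme_B": {"P2": {"F": 9, "L": 7, "P": 0, "R": 1}, "P1": {"R": 10, "K": 7, "A": 1, "D": 0}},
    "enzyme_C": {"P2": {"F": 1, "L": 4, "P": 10, "R": 0}, "P1": {"R": 10, "K": 9, "A": 0, "D": 0}},
}
tree = average_linkage({name: profile(libs) for name, libs in raw.items()})
print(tree)
```

Because every library is scaled to its own maximum, the clustering compares the shape of each specificity profile rather than absolute turnover, which lets enzymes assayed at different concentrations be compared.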
The results were analyzed with the program CLUSTER and displayed as a tree diagram using TreeView (26: Eisen M.B., Spellman P.T., Brown P.O., Botstein D. Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 14863-14868). The P1, P2, P3, and P4 specificities were clustered together for comparison with the sequence alignment, and also separately for more detailed comparison. The structure-based amino acid sequence alignment of the active protease domains was performed using the CLUSTAL_W program (MacVector, Accelrys Inc.). The primary sequences were taken from the SWISS-PROT or GenBank™ databases.

Kinetic Analysis of Individual Peptide-ACC Substrates and an Irreversible Inhibitor—Michaelis-Menten steady state kinetic analysis was used to determine the kinetic constants for each substrate and protease pair. The final substrate concentrations ranged from 0.25 μM to 1 mM, and the concentration of Me2SO in the assays was less than 2% (v/v). The concentrations of cathepsins K, L, and B were 20, 1.37, and 1 nM, respectively. All kinetic assays were performed at 25 °C in triplicate. Hydrolysis of the ACC substrates was monitored fluorometrically under the conditions described for the complete diverse library assay. The results were analyzed, and kcat, Km, and kcat/Km were calculated, with the Kaleidagraph program (Synergy Software) using Equation 1, where v0 is the initial velocity, [E]0 is the enzyme concentration, and [S]0 is the substrate concentration.

$$v_0 = \frac{k_{\text{cat}}[E]_0}{1 + K_m/[S]_0} \qquad \text{(Eq. 1)}$$

Kinetic characterization of the inhibitor Ac-HGPR-AOMK was performed as described previously (18). Briefly, the residual activities of human cathepsin K (20 nM), cathepsin L (1.37 nM), and cathepsin B (1 nM) were assayed with the synthetic substrate Z-FR-AMC (50 μM final) under the complete diverse library assay conditions.
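The kinetic models of this section can be checked numerically. The Python sketch below evaluates Equation 1 and the inactivation model of Equations 2 and 3 (given immediately below), then recovers the parameters from noise-free synthetic data. It deliberately swaps in linearized least-squares fits in place of the non-linear regression performed in Kaleidagraph: a Hanes-Woolf plot for Equation 1 and a double-reciprocal plot of kobs against [I]0 for Equation 3. All parameter values are hypothetical except the cathepsin K and Z-FR-AMC concentrations quoted above.

```python
import math


def v0_michaelis_menten(s0, kcat, km, e0):
    """Equation 1: initial velocity at substrate concentration s0."""
    return kcat * e0 / (1.0 + km / s0)


def product_progress(t, v0, kobs):
    """Equation 2: product formed by time t while the enzyme is being inactivated."""
    return v0 / kobs * (1.0 - math.exp(-kobs * t))


def kobs_inactivation(i0, kinact, ki, s0, km):
    """Equation 3: pseudo-first-order inactivation constant at inhibitor i0."""
    return kinact * i0 / (i0 + ki * (1.0 + s0 / km))


def linear_fit(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def fit_hanes_woolf(s_values, v_values, e0):
    """[S]/v = [S]/Vmax + Km/Vmax, so slope = 1/Vmax and intercept = Km/Vmax."""
    slope, intercept = linear_fit(s_values, [s / v for s, v in zip(s_values, v_values)])
    vmax = 1.0 / slope
    return vmax / e0, intercept * vmax  # (kcat, Km)


def fit_inactivation(i_values, kobs_values, s0, km):
    """1/kobs = 1/kinact + Ki(1 + s0/Km) / (kinact [I]0); returns (kinact, Ki)."""
    slope, intercept = linear_fit([1.0 / i for i in i_values], [1.0 / k for k in kobs_values])
    kinact = 1.0 / intercept
    return kinact, slope * kinact / (1.0 + s0 / km)


E0 = 0.020              # uM; 20 nM cathepsin K, as in the text
KCAT, KM = 5.0, 50.0    # hypothetical substrate constants (s^-1, uM)
s_grid = [0.25, 1.0, 5.0, 25.0, 100.0, 500.0, 1000.0]  # within the 0.25 uM to 1 mM range
v_grid = [v0_michaelis_menten(s, KCAT, KM, E0) for s in s_grid]
kcat_fit, km_fit = fit_hanes_woolf(s_grid, v_grid, E0)
print(f"kcat = {kcat_fit:.3f} s^-1, Km = {km_fit:.2f} uM, kcat/Km = {kcat_fit / km_fit:.4f} uM^-1 s^-1")

S0_ZFR, KM_ZFR = 50.0, 10.0   # 50 uM Z-FR-AMC (text); its Km here is hypothetical
KINACT, KI = 0.01, 2.0        # hypothetical (s^-1, uM)
i_grid = [0.5, 1.0, 2.0, 5.0, 10.0]  # all at least 10 x [E]0
kobs_grid = [kobs_inactivation(i, KINACT, KI, S0_ZFR, KM_ZFR) for i in i_grid]
kinact_fit, ki_fit = fit_inactivation(i_grid, kobs_grid, S0_ZFR, KM_ZFR)
print(f"kinact = {kinact_fit:.4f} s^-1, Ki = {ki_fit:.3f} uM")
```

Linearized fits are exact on noise-free data but weight experimental errors unevenly, which is why non-linear regression, as used in the study, is generally preferred for real measurements.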
Values of the pseudo-first-order rate constant kobs at each inhibitor concentration [I]0 were computed for individual progress curves by fitting the data to Equation 2, with [I]0 ≥ 10 times the enzyme concentration [E]0, where [P] is the concentration of product formed over time t and v0 is the initial velocity of the reaction.

$$[P] = \frac{v_0}{k_{\text{obs}}}\left(1 - e^{-k_{\text{obs}} t}\right) \qquad \text{(Eq. 2)}$$

The inactivation constant kinact and the inhibition constant Ki were determined by non-linear regression with the Kaleidagraph program using Equation 3, where [S]0 is the substrate concentration.

$$k_{\text{obs}} = \frac{k_{\text{inact}}[I]_0}{[I]_0 + K_i\left(1 + [S]_0/K_m\right)} \qquad \text{(Eq. 3)}$$

Competition Labeling Assay to Assess the Selectivity of the Inhibitor—To compare the inhibitory activity and selectivity of Ac-HGPR-AOMK against human cathepsins L, B, and K, a competition labeling experiment was carried out as described previously (27: Greenbaum D., Medzihradszky K.F., Burlingame A., Bogyo M. Chem. Biol. 2000, 7, 569-581). Band intensity is inversely related to the binding efficiency of Ac-HGPR-AOMK for each protease and was quantified by densitometry for comparison.

Synthesis of the Complete Diverse PS-SCL—A completely diversified PS-SCL with the general structure acetyl-P4-P3-P2-P1-ACC was synthesized using ACC, a bifunctional fluorogenic leaving group with two reactive handles, one for peptide synthesis and one for attachment to the solid support. The library consists of P1, P2, P3, and P4 libraries in which the corresponding P1, P2, P3, or P4 position is fixed with one of 20 amino acids (omitting cysteine and including norleucine), whereas the remaining three positions contain an equimolar mixture of these amino acids (Fig. 1). As a result, each of the P1-P4 libraries has 20 sublibraries, each containing a mixture of 8,000 (= 20³) tetrapeptide fluorogenic substrates. As a whole, the complete diverse library contains 160,000 unique tetrapeptide substrates.
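The library-size figures above, and the 31.25 nM per-compound concentration quoted for the PS-SCL assay, follow from short arithmetic that the Python sketch below reproduces. Encoding residues as the integers 0-19 is an illustrative choice. The enumeration confirms that fixing one position partitions the 160,000 sequences into 20 sublibraries of 8,000 each.

```python
from collections import Counter
from itertools import product

N_RESIDUES = 20   # 19 proteinogenic amino acids (no Cys) plus norleucine
N_POSITIONS = 4   # P4-P3-P2-P1

per_sublibrary = N_RESIDUES ** (N_POSITIONS - 1)  # one position fixed: 20^3
unique_sequences = N_RESIDUES ** N_POSITIONS       # 20^4


def per_compound_nM(total_nmol, n_compounds, volume_ul):
    """Concentration of each member of an equimolar mixture, in nM.

    Equivalent to (total_nmol / n_compounds) / (volume_ul * 1e-6 L).
    """
    return total_nmol * 1e6 / (n_compounds * volume_ul)


# Enumerate every tetrapeptide and count how many fall into each sublibrary
# when the P2 position (tuple index 2 in P4-P3-P2-P1 order) is fixed.
sequences = product(range(N_RESIDUES), repeat=N_POSITIONS)
p2_sizes = Counter(seq[2] for seq in sequences)

# 1 ul of a 25 mM sublibrary stock holds 25 nmol, assayed in a 100-ul well.
conc = per_compound_nM(total_nmol=25, n_compounds=per_sublibrary, volume_ul=100)
print(per_sublibrary, unique_sequences, sorted(set(p2_sizes.values())), conc)
```

Each of the four positional libraries contains the same 160,000 sequences, partitioned differently, which is why each sublibrary signal reports the preference at only its fixed position.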
The library was functionally validated using trypsin, papain, and legumain, enzymes whose substrate specificities are well known (Supplemental Data 1). The P1-diverse PS-SCL and various P1-fixed PS-SCLs were also used to profile these enzymes and the human cysteine cathepsins (data not shown), and the results were compared with those obtained from the complete diverse PS-SCL assays. The two sets of results agreed well, providing confidence that the complete diverse library was not biased.

Specificity of Papain-like Cysteine Proteases—The specificities of papain, bromelain, and human cysteine cathepsins L, V, S, K, F, and B were determined using the complete diverse PS-SCL (Fig. 2). The specificities of cruzain, rhodesain, the cathepsin L-like protease from L. mexicana, and cathepsin B-like proteases 1 and 2 from S. mansoni were determined in the same way. Nearly all of these proteases preferred hydrophobic amino acids at the P2 position; the exception was bromelain, which strongly favored basic amino acids such as arginine. All six human cathepsins generally matched papain in specificity, preferring arginine and lysine at the P1 position and strictly hydrophobic amino acids at the P2 position, and showing broader specificity at the P3 and P4 positions. However, the complete diverse PS-SCL assay revealed differences in the chemical character of the favored P2 amino acids, as well as more subtle differences in P3 specificity. At the P2 position, cathepsin L prefers aromatic residues (phenylalanine, tryptophan, tyrosine) over aliphatic residues (valine, leucine), which distinguishes it from cathepsins K and S (Fig. 2, a, c, and d). Cathepsins K and S have been reported to prefer branched hydrophobic residues at the P2 position, whereas cathepsin L has been shown to favor aromatic amino acids (12; 28: McGrath M.E., Palmer J.T., Brömme D., Somoza J.R. Protein Sci. 1998, 7, 1294-1302). These earlier findings agree well with the complete diverse PS-SCL assay results, which showed that cathepsins K and S exclusively favored aliphatic amino acids (leucine, isoleucine, valine, methionine) at the P2 position. Cathepsin V, the closest to cathepsin L in sequence identity, showed a similar preference, favoring aromatic amino acids (tryptophan, tyrosine) over aliphatic amino acids (leucine, valine) (Fig. 2b). Cathepsin V accepted phenylalanine and leucine equally well at the P2 position, which is also consistent with previously published studies (17; 29: Puzer L., Cotrin S.S., Alves M.F., Egborge T., Araujo M.S., Juliano M.A., Juliano L., Brömme D., Carmona A.K. Arch. Biochem. Biophys. 2004, 430, 274-283). The P2 specificity of cathepsin F was similar to that of cathepsin K, except for the proline preference of cathepsin K (Fig. 2e). Notably, cathepsin F accepted aspartic acid at the P2 and P3 positions, whereas none of the other cathepsins tolerated this acidic residue at either position. The complete diverse library assay confirmed that cathepsin B has much broader P2 specificity. Cathepsin B also showed stronger activity with the P1 library than with the P2 library, in marked contrast to the cathepsin L group proteases, cathepsins L, V, S, and K (Fig. 2f). The library assay also confirmed that cathepsin B accepts arginine well at the P2 position, whereas the other cathepsins showed no noticeable activity with this amino acid, in agreement with previous studies (30: Cotrin S.S., Puzer L., de Souza Judice W.A., Juliano L., Carmona A.K., Juliano M.A. Anal. Biochem. 2004, 335, 244-252). At the P3 position, cathepsins L and S showed similarly broad specificity but also displayed a noticeable preference for basic amino acids (lysine, arginine) and some aliphatic amino acids (norleucine, leucine, methionine, isoleucine), whereas cathepsin V favored proline and norleucine. The library assay also indicated that cathepsin B has a narrower P3 specificity (norleucine, leucine, methionine, lysine, arginine) than previously believed.

Specificity of Papain-like Proteases of Parasitic Origin—All the parasite proteases tested showed P1 specificity very similar to that of the human cathepsins. Again, interactions in the S2 subsite appear to be the predominant specificity-defining factor. Cruzain, rhodesain, and the cathepsin L-like protease from L. mexicana showed specificity similar to that of human cathepsins L and V, whereas cathepsin B-like proteases 1 and 2 from S. mansoni showed much broader P2 specificity, accepting more amino acids than the other parasite proteases (Supplemental Data 2). Notably, the P2 and P3 specificities of cathepsin B-like protease 1 are less similar to those of human cathepsin B than are those of cathepsin B-like protease 2, even though protease 1 has higher sequence identity to human cathepsin B. This shows that simple sequence comparison is not sufficient to deduce the specificity of a homologous protease.

Unique Substrate Specificity of Human Cathepsin K—With the complete diverse PS-SCL, cathepsin K displayed the most distinctive substrate specificity among the human cathepsins tested. It exclusively favored aliphatic amino acids (leucine, isoleucine) at the P2 position, unlike cathepsins L and V, which accepted both aromatic and aliphatic amino acids.
Most distinctively, cathepsin K favored proline and glycine at the P2 and P3 positions, respectively, whereas neither of these amino acids, especially proline, was preferred by the other human cathepsins (in Fig. 2c, the proline preference is designated with a black bar). This agrees with a previous study showing that Z-GPR-AMC is partially selective for cathepsin K (31: Xia L., Kilb J., Wex H., Li Z., Lipyansky A., Breuil V., Stein L., Palmer J.T., Dempster D.W., Brömme D. Biol. Chem. 1999, 380, 679-687).

Comparison of Substrate Specificity by PS-SCL and Physiological Substrate Specificity of Cathepsins K and S—Cathepsins B, H, and L are ubiquitous, making it difficult to clearly identify their c