Genome sequencing projects in eukaryotes are revealing thousands of new genes of unknown function, many of which fall into gene families. We discovered one such family while systematically screening predicted Arabidopsis proteins for those likely to be targeted to mitochondria or chloroplasts. This large gene family (almost 200 genes in the 70% of the Arabidopsis genome sequenced so far) is characterized by the presence of tandem arrays of a degenerate 35-amino-acid repeat (Fig. 1). The same family has been identified independently on the basis of other criteria by Aubourg et al. [1] (Aubourg S. et al. In Arabidopsis thaliana, 1% of the genome is coding for a novel protein family unique to plants. Plant Mol. Biol. 2000; in press). Two-thirds of these Arabidopsis proteins are predicted to be targeted to either mitochondria or chloroplasts (N. Peeters and I.D. Small, unpublished).

To our knowledge, none of these proteins has been characterized in any way, but a few related sequences in other organisms have been studied. The maize gene crp1 [2] (Fisk D.G. et al. Molecular cloning of the maize gene crp1 reveals similarity between regulators of mitochondrial and chloroplast gene expression. EMBO J. 1999; 18: 2621-2630) is a member of the same family, and similar repeats are found in PET309 from Saccharomyces cerevisiae [3] (Manthey G.M. and McEwen J.E. The product of the nuclear gene PET309 is required for translation of mature mRNA and stability or production of intron-containing RNAs derived from the mitochondrial COXI locus of Saccharomyces cerevisiae. EMBO J. 1995; 14: 4031-4043) and cya-5 from Neurospora crassa [4] (Coffin J.W. et al. The Neurospora crassa cya-5 nuclear gene encodes a protein with a region of homology to the Saccharomyces cerevisiae PET309 protein and is required in a post-transcriptional step for the expression of the mitochondrially encoded COXI protein. Curr. Genet. 1997; 32: 273-280). All three genes encode proteins involved in some way in processing or translation, or both, of particular organellar mRNAs [2]. None of these three proteins shows obvious sequence similarity to the others or to the Arabidopsis proteins outside the zone of repeats.

The repeat structure in these proteins appears to have been initially overlooked, although Fisk et al. [2] state in a note added in proof that these proteins contain TPR (tetratricopeptide) motifs. In fact, although the 35-amino-acid repeats do resemble TPR motifs, they have significant and characteristic differences. To distinguish them from TPR motifs, we propose to call them PPR (pentatricopeptide) motifs.

Using BLAST [5] (Altschul S.F. et al. Basic local alignment search tool. J. Mol. Biol. 1990; 215: 403-410) searches on the non-redundant GenBank peptide database, we built up a list of more than 100 sequences likely to contain PPR motifs. We then ran the MEME program [6] (Bailey T.L. and Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28–36, AAAI Press, 1994) on this set of sequences to define a profile corresponding to the PPR motif (Fig. 1).
This profile was then used by MOTIFSEARCH [part of the Genetics Computer Group (GCG) Wisconsin Package, Version 10.0, Madison, WI, USA] to screen the SwissProt/TrEMBL, PIR and GenPept databases. A total of 213 sequences were found with a combined probability score of less than 10⁻⁴ and containing at least one pair of motifs in tandem (the latter criterion being a very stringent requirement). The number of motifs per peptide ranges from 2 to 26, with a mean of 9.1. CRP1, Pet309p and CYA-5 were found in this search, along with a few other human or yeast proteins of unknown function (Table 1) and a salt-inducible protein from tobacco [8] (Chang P-F. Alterations in cell membrane structure and expression of a membrane-associated protein after adaptation to osmotic stress. Physiol. Plant. 1996; 98: 505-516). The vast majority of the other sequences are from Arabidopsis. There are numerous expressed sequence tags (ESTs) from related genes from several plant species, including rice, so this gene family is likely to be widespread in higher plants (EST-encoded peptides were not included in the set searched with MOTIFSEARCH). No prokaryotic proteins were found to possess tandem copies of this motif, and none of the known TPR-containing proteins were revealed by this search.

Table 1. Non-plant proteins containing putative PPR motifs

| Species (a) | Accession no. | Name | Minimum no. of PPR motifs (b) | Subcellular localization | Function |
|---|---|---|---|---|---|
| H. sapiens | AAA67550 | GP130 | 10 (3) | Not mitochondrial (c) | ? |
| S. cerevisiae | P32522 | PET309 | 6 (3) | Mitochondrial | RNA processing/translation |
| S. pombe | TrEMBL:O14275 | C8C9.06C | 6 (2) | Mitochondrial (c) | ? |
| H. sapiens | TrEMBL:O75127 | KIAA0632 | 5 (2) | ? (partial sequence) | ? |
| N. crassa | TrEMBL:P87224 | CYA5 | 5 (2) | Mitochondrial | RNA processing/translation |
| S. pombe | TrEMBL:O42955 | C19G7.07C | 4 (2) | Mitochondrial (c) | ? |
| S. pombe | TrEMBL:O94368 | C1604.02C | 3 (2) | Mitochondrial (c) | ? |
| S. cerevisiae | P48237 | YGL150c | 3 (1) | Mitochondrial (c) | ? |
| S. cerevisiae | S52526 | YPL005w | 3 (1) | Mitochondrial (c) | ? |

(a) Abbreviations: H. sapiens, Homo sapiens; N. crassa, Neurospora crassa; S. cerevisiae, Saccharomyces cerevisiae; S. pombe, Schizosaccharomyces pombe.
(b) PPR motifs were detected using the MEME profile. The proteins could contain other, less conserved copies of the PPR motif. The longest stretch of strictly tandem repeats is indicated in parentheses.
(c) Subcellular localizations are MitoProt predictions [7] (Claros M.G. and Vincens P. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur. J. Biochem. 1996; 241: 770-786).
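
The two screening criteria applied to the MOTIFSEARCH hits (a combined probability score below 10⁻⁴ and at least one pair of motifs in tandem), together with the "longest stretch of strictly tandem repeats" reported in Table 1, can be illustrated with a minimal Python sketch. This is not the MOTIFSEARCH implementation: the function names, the input format (a list of motif start positions per protein) and the rule that two motifs are "strictly tandem" when their starts are exactly 35 residues apart are all assumptions made for illustration, and the positions in the usage lines are invented.

```python
from typing import List

PPR_LENGTH = 35  # motif length in residues, as stated in the text


def longest_tandem_run(starts: List[int], motif_len: int = PPR_LENGTH) -> int:
    """Return the number of motifs in the longest strictly tandem stretch.

    `starts` holds the start position of each motif hit in one protein.
    Assumption of this sketch: two hits are strictly tandem when the
    second begins exactly `motif_len` residues after the first.
    """
    if not starts:
        return 0
    ordered = sorted(set(starts))
    best = run = 1
    for prev, curr in zip(ordered, ordered[1:]):
        # Extend the current run if the motifs abut, otherwise restart it.
        run = run + 1 if curr - prev == motif_len else 1
        best = max(best, run)
    return best


def passes_screen(starts: List[int], combined_p: float,
                  p_cutoff: float = 1e-4) -> bool:
    """Apply both criteria: score below the cutoff and one tandem pair."""
    return combined_p < p_cutoff and longest_tandem_run(starts) >= 2


# Hypothetical hit positions, for illustration only.
hits = [10, 45, 80, 200]
print(len(hits), longest_tandem_run(hits))   # 4 motifs, longest tandem run 3
print(passes_screen(hits, combined_p=1e-6))  # True
print(passes_screen([10, 60], combined_p=1e-6))  # False: no tandem pair
```

In the notation of Table 1, the hypothetical protein above would be reported as "4 (3)". A real scan would probably need to tolerate small gaps or overlaps between adjacent hits, which this sketch deliberately does not.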