ResearchHub | Open Science Community

T1SEstacker: A tri-layer stacking model effectively predicts bacterial type 1 secreted proteins based on C-terminal non-RTX-motif sequence features

Zewei Chen et al. Nov 12, 2021

Abstract: Proteins secreted through type 1 secretion systems often play important roles in the pathogenicity of various gram-negative bacteria. However, the type 1 secretion mechanism remains unknown. In this research, we examined the sequence features of RTX proteins, a major class of type 1 secreted substrates. We found striking non-RTX-motif amino acid composition patterns at the C-termini, most typically exemplified by the enrichment of ‘[FLI][VAI]’ at the two most C-terminal positions. Machine-learning models, including deep-learning models, were trained on these sequence-based non-RTX-motif features and then combined into a tri-layer stacking model, T1SEstacker, which predicted RTX proteins accurately, with a 5-fold cross-validated sensitivity of ~0.89 at a specificity of ~0.94. Besides substrates with RTX motifs, T1SEstacker can also distinguish non-RTX-motif type 1 secreted proteins well, further suggesting the potential existence of common secretion signals. In summary, we performed a comprehensive sequence analysis of the type 1 secreted RTX proteins, identified common sequence-based features at the C-termini, and developed a stacking model that can predict type 1 secreted proteins accurately.
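The C-terminal dipeptide pattern named in the abstract can be illustrated with a small check. This is only a sketch of the single ‘[FLI][VAI]’ feature, not the T1SEstacker model or its feature set; the function names `has_c_terminal_motif` and `motif_fraction` are hypothetical, and the handling of a trailing `*` stop symbol is an assumption about the input format.

```python
import re

# Enriched C-terminal dipeptide from the abstract:
# second-to-last residue in {F, L, I}, last residue in {V, A, I}.
C_TERMINAL_MOTIF = re.compile(r"[FLI][VAI]$")


def has_c_terminal_motif(sequence: str) -> bool:
    """Return True if the last two residues match [FLI][VAI].

    Assumption: sequences are one-letter amino acid strings that may
    carry surrounding whitespace or a trailing '*' stop symbol.
    """
    cleaned = sequence.strip().upper().rstrip("*")
    return bool(C_TERMINAL_MOTIF.search(cleaned))


def motif_fraction(sequences) -> float:
    """Fraction of sequences whose C-terminus matches the motif."""
    sequences = list(sequences)
    if not sequences:
        return 0.0
    hits = sum(has_c_terminal_motif(s) for s in sequences)
    return hits / len(sequences)
```

Comparing `motif_fraction` between a set of known substrates and a background set would show whether the dipeptide is enriched in a given dataset. On its own the pattern is a weak signal, which is presumably why the authors combine many sequence features in a stacked model.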

Identification of new bacterial type III secreted effectors with a recursive Hidden Markov Model profile-alignment strategy

Xi Cheng et al. Oct 16, 2016

Identifying new bacterial type III secreted effectors is a major computational challenge. At least a dozen machine learning algorithms have been developed, but so far they have achieved only limited success. Sequence similarity appears important to biologists but is frequently neglected by developers of effector-prediction algorithms, although this strategy achieved considerable success in the field a decade ago. In this study, we propose a recursive sequence alignment strategy with Hidden Markov Models to comprehensively find homologs of known YopJ/P full-length proteins, effector domains and N-terminal signal sequences. Using this method, we identified 155 different YopJ/P-family effectors and 59 proteins with YopJ/P N-terminal signal sequences from 27 genera and more than 70 species. Among these genera, we also identified, for the first time, one type III secretion system (T3SS) from Uliginosibacterium and two T3SSs from Rhizobacter. Higher conservation of effector domains, N-terminal fusion of signal sequences to other effectors, and the exchange of N-terminal signal sequences between different effector proteins were frequently observed for YopJ/P-family proteins. These observations made it feasible to identify new effectors by separately screening for similarity to the N-terminal signal peptides and the effector domains of known effectors. This method can also be applied to search for homologs of other known T3SS effectors.
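The recursive idea can be sketched as a loop that searches again with every newly found homolog until no new sequences turn up. The abstract does not give the authors' procedure in detail, so this is an assumption-laden outline: `recursive_homolog_search` is a hypothetical name, and `search` is a caller-supplied placeholder for whatever profile-building and database-search step is used (for example, a wrapper around an HMM toolkit), not a real library call.

```python
def recursive_homolog_search(seeds, search, max_rounds=10):
    """Expand a seed set of sequence IDs until no new homologs appear.

    seeds      -- iterable of known sequence identifiers
    search     -- hypothetical callable: takes the current set of known
                  IDs (from which a profile would be built) and returns
                  an iterable of hit IDs
    max_rounds -- safety cap on the number of search rounds
    """
    known = set(seeds)
    for _ in range(max_rounds):
        # Keep only hits that were not already in the known set.
        new_hits = set(search(known)) - known
        if not new_hits:
            break  # converged: this round found nothing new
        known |= new_hits
    return known
```

With a toy similarity graph standing in for the database search, seeding with `"a"` in a chain `a -> b -> c` returns all three identifiers, while unrelated entries are never reached. In practice the same loop could be run separately for full-length proteins, effector domains and N-terminal signal sequences, as the abstract describes.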