The authors devise an algorithm that can cluster T cell receptor (TCR) sequences sharing the same specificity, predict the HLA restriction of these TCR clusters on the basis of subjects’ genotypes and help to identify specific peptide–major histocompatibility complex ligands.

T cells have a critical role in the immune system, using receptors to recognize and respond to specific antigens. T cell receptors are assembled with unique sequences during T cell development, resulting in a huge diversity of sequences. Mark Davis and colleagues address the challenging question of how T cell receptor sequences relate to antigen specificity. They start with T cells of defined specificity and a structural knowledge of the interaction of different T cell receptors with major histocompatibility complex (MHC) molecules in complex with their cognate peptides. They use this to devise an algorithm that predicts the human leukocyte antigen (HLA) restriction of the T cell receptor targets and helps to identify specific peptide–MHC ligands. Elsewhere in this issue, Paul Thomas and colleagues use molecular genetic tools to analyse the diversity of epitope-specific T cell repertoires and extract features that enable the prediction of T cell epitope specificity from sequence analysis.

T cell receptor (TCR) sequences are very diverse, with many more possible sequence combinations than T cells in any one individual (refs 1–4). Here we define the minimal requirements for TCR antigen specificity, through an analysis of TCR sequences using structural data and a panel of cells sorted with peptide–major histocompatibility complex (pMHC) tetramers. From this analysis we developed an algorithm that we term GLIPH (grouping of lymphocyte interactions by paratope hotspots) to cluster TCRs with a high probability of sharing specificity owing to both conserved motifs and global similarity of complementarity-determining region 3 (CDR3) sequences.
We show that GLIPH can reliably group TCRs of common specificity from different donors, and that the conserved CDR3 motifs that help to define the TCR clusters are often contact points with the antigenic peptides. As an independent validation, we analysed 5,711 TCRβ chain sequences from reactive CD4 T cells from 22 individuals with latent Mycobacterium tuberculosis infection. We found 141 TCR specificity groups, including 16 distinct groups containing TCRs from multiple individuals. These TCR groups typically shared HLA alleles, allowing prediction of the likely HLA restriction, and a large number of M. tuberculosis T cell epitopes enabled us to identify pMHC ligands for all five of the groups tested. Mutagenesis and de novo TCR design confirmed that the GLIPH-identified motifs were critical and sufficient for shared-antigen recognition. Thus, the GLIPH algorithm can analyse large numbers of TCR sequences and define TCR specificity groups shared by TCRs and individuals, which should greatly accelerate the analysis of T cell responses and expedite the identification of specific ligands.
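
The two grouping criteria named above (conserved motifs and global CDR3 similarity), together with the use of shared HLA alleles to suggest a restriction, can be illustrated with a minimal Python sketch. This is not the GLIPH implementation. It assumes that global similarity means a Hamming distance of at most one residue between CDR3s of equal length, that local similarity means sharing a 4-mer from a motif set supplied by the caller, and that a candidate restriction is any allele carried by every donor in a group. The function names, thresholds, sequences and genotypes are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str):
    """Hamming distance for equal-length strings, otherwise None."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def kmers(seq: str, k: int) -> set[str]:
    """All contiguous k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_cdr3(cdr3s, motifs, max_mismatch=1, k=4):
    """Group CDR3s linked by global or local similarity.

    Two sequences are linked if they have equal length and differ at no more
    than max_mismatch positions (assumed global criterion), or if they share
    a k-mer listed in motifs (assumed local criterion). The groups returned
    are the connected components of that graph, found with union-find.
    """
    seqs = list(dict.fromkeys(cdr3s))  # drop duplicates, keep order
    parent = {s: s for s in seqs}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        d = hamming(a, b)
        is_global = d is not None and d <= max_mismatch
        is_local = bool(kmers(a, k) & kmers(b, k) & set(motifs))
        if is_global or is_local:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for s in seqs:
        groups[find(s)].add(s)
    return list(groups.values())


def shared_hla(donors, genotypes):
    """Alleles carried by every donor in a group (candidate restriction)."""
    allele_sets = [set(genotypes[d]) for d in donors]
    return set.intersection(*allele_sets) if allele_sets else set()


# Invented toy data, for illustration only.
toy_cdr3s = [
    "CASSLGTDTQYF", "CASSLGADTQYF",    # one mismatch apart
    "CASSPRGQETQYF", "CAWSPRGQNEQFF",  # share the 4-mer PRGQ
    "CASRDNYGYTF",                     # unrelated
]
toy_groups = cluster_cdr3(toy_cdr3s, motifs={"PRGQ"})
toy_genotypes = {
    "donor1": {"DRB1*15:01", "DRB1*04:01"},
    "donor2": {"DRB1*15:01", "DRB1*07:01"},
}
toy_restriction = shared_hla(["donor1", "donor2"], toy_genotypes)
```

The sketch merges groups transitively, so a sequence joins a group if it is linked to any member. How motifs are chosen is left outside the sketch, which takes them as given.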