Motivation: Increasingly comprehensive characterisation of cancer-associated genetic alterations has paved the way for the development of highly specific therapeutic vaccines. Precisely predicting the binding and presentation of peptides by MHC alleles is an important step towards such therapies. Recent data suggest that presentation of both class I and class II epitopes is critical for inducing a sustained, effective immune response. However, prediction performance for MHC class II has been limited compared to class I.

Results: We present a transformer neural network model that leverages self-supervised pretraining on a large corpus of protein sequences. We also propose a multiple instance learning (MIL) framework to deconvolve mass spectrometry data in which any of multiple potential MHC alleles may have presented each peptide. We show that pretraining boosts performance on these tasks. Combining pretraining with the novel MIL approach, our model outperforms state-of-the-art models for both binding and mass spectrometry presentation prediction.

Availability: Our model is available at https://github.com/s6juncheng/BERTMHC

Contact: jun.cheng@neclab.eu, brandon.malone@neclab.eu