Abstract

Drug-induced liver injury (DILI) is an adverse hepatic drug reaction that can lead to life-threatening liver failure. Published studies on DILI have provided valuable insights into hepatotoxicity and drug development. However, manually searching this literature in PubMed is laborious. Natural language processing (NLP) techniques interpret human language by extracting useful information from unstructured text. Combined with artificial intelligence (AI) and machine learning (ML), NLP could allow the DILI literature to be processed automatically, but effective methods have yet to be demonstrated. To address this challenge, we developed an integrated NLP/ML classification model that identifies DILI-related publications using only their titles and abstracts. Using 14,203 publications provided by the Critical Assessment of Massive Data Analysis (CAMDA) challenge, we coupled NLP word-vectorization techniques with ML classifiers. For internal validation, models were trained on two-thirds of the data and tested on the remaining third. The best-performing model was a linear support vector machine (SVM) using combined vectors derived from term frequency-inverse document frequency (TF-IDF) and Word2Vec, which achieved an accuracy of 95.0% and an F1-score of 95.0%. The final SVM model, trained on all 14,203 publications, achieved accuracies of 92.5%, 96.3%, and 98.3% and F1-scores of 93.5%, 86.1%, and 75.6% on three independent test sets (T1-T3), respectively. On four external validation sets (V1-V4), it achieved accuracies of 92.0%, 96.2%, 98.3%, and 93.1% and F1-scores of 92.4%, 82.9%, 75.0%, and 93.3%, respectively.
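The abstract does not say exactly how the TF-IDF and Word2Vec vectors were combined. One common option, assumed here, is to concatenate each document's TF-IDF vector with the average of its word embeddings and train a linear SVM on the result. The following is a minimal standard-library sketch under several labeled assumptions: `EMBEDDINGS` is a hypothetical hand-written table standing in for trained Word2Vec vectors, the SVM is fit with Pegasos-style stochastic sub-gradient descent (no intercept) rather than whatever solver the authors used, and the titles are invented toy examples, not CAMDA data. The IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1.

```python
import math
import random
from collections import Counter

# HYPOTHETICAL stand-in for Word2Vec: a real pipeline would look these vectors up
# in a Word2Vec model trained on the corpus. Two toy dimensions are enough here.
DIM = 2
EMBEDDINGS = {
    "hepatotoxicity": [1.0, 0.0], "liver": [1.0, 0.0],
    "jaundice": [1.0, 0.0], "hepatocellular": [1.0, 0.0],
    "cardiac": [0.0, 1.0], "arrhythmia": [0.0, 1.0],
    "renal": [0.0, 1.0], "tumor": [0.0, 1.0],
}

def tokenize(text):
    """Lowercase whitespace tokenizer (real text would also need punctuation handling)."""
    return text.lower().split()

def fit_idf(token_docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(token_docs)
    df = Counter()
    for toks in token_docs:
        df.update(set(toks))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf_vector(toks, vocab, idf):
    """Term counts weighted by IDF, then L2-normalized; out-of-vocabulary tokens are ignored."""
    counts = Counter(toks)
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def mean_embedding(toks):
    """Average embedding of the tokens found in EMBEDDINGS (zero vector if none)."""
    hits = [EMBEDDINGS[t] for t in toks if t in EMBEDDINGS]
    if not hits:
        return [0.0] * DIM
    return [sum(col) / len(hits) for col in zip(*hits)]

def featurize(text, vocab, idf):
    """Concatenate the TF-IDF vector with the averaged embedding vector."""
    toks = tokenize(text)
    return tfidf_vector(toks, vocab, idf) + mean_embedding(toks)

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss, no intercept; labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization step: w <- (1 - eta * lam) * w, i.e. (1 - 1/t) * w.
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if margin < 1.0:  # hinge loss is active: step along y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1

# Invented toy titles (NOT CAMDA data): +1 = DILI-related, -1 = unrelated.
train_docs = [
    "acetaminophen hepatotoxicity liver failure",
    "drug induced liver injury hepatocellular",
    "hepatotoxicity cholestatic jaundice",
    "cardiac arrhythmia qt prolongation",
    "renal clearance pharmacokinetics",
    "tumor imaging radiology",
]
train_labels = [1, 1, 1, -1, -1, -1]
test_docs = ["liver enzyme hepatotoxicity", "cardiac biomarkers", "renal tumor imaging"]
test_labels = [1, -1, -1]

vocab, idf = fit_idf([tokenize(d) for d in train_docs])
X_train = [featurize(d, vocab, idf) for d in train_docs]
w = train_linear_svm(X_train, train_labels)
predictions = [predict(w, featurize(d, vocab, idf)) for d in test_docs]
accuracy = sum(p == gold for p, gold in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)  # on this toy data: [1, -1, -1] 1.0
```

In this toy setup the two classes share no words and have orthogonal embeddings, so the held-out titles are separated cleanly. Real titles and abstracts would need proper tokenization, a trained Word2Vec model, an intercept term, and tuning of `lam` and `epochs`. Concatenation keeps the sparse lexical signal from TF-IDF alongside the dense semantic signal from the embeddings, which is one plausible reason such a combination could outperform either representation alone.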