Abstract This study aimed to develop a deep learning system for detecting three-rooted mandibular first molars (MFMs) on panoramic radiographs and to assess its diagnostic performance. Panoramic radiographs and cone beam computed tomographic (CBCT) images of the same subjects were retrospectively collected from 730 patients, comprising 1444 MFMs (367 three-rooted and 1077 two-rooted). Five convolutional neural network (CNN) models (ResNet-101, ResNet-50, DenseNet-201, MobileNet-v3, and Inception-v3) were used to classify three- and two-rooted MFMs on the panoramic radiographs. Diagnostic performance was evaluated using accuracy, sensitivity, specificity, precision, negative predictive value, and F1 score. Receiver operating characteristic (ROC) curve analyses were performed, with the CBCT examination taken as the gold standard. Among the five CNN models, ResNet-101 showed the best diagnostic performance, with an area under the ROC curve (AUC) of 0.907, significantly higher than that of all other models (all P < 0.01); its accuracy, sensitivity, and specificity were 87.5%, 83.6%, and 88.9%, respectively. DenseNet-201 showed the lowest diagnostic performance among the five models (all P < 0.01), with an AUC of 0.701 and an accuracy of 73.2%. Overall, the performance of the CNNs diminished when the image patches contained only the distal half of the MFMs, with AUC values ranging from 0.680 to 0.800. In contrast, the two clinicians showed poorer diagnostic performance, with AUC values of only 0.680 and 0.632, respectively. In conclusion, the CNN-based deep learning system detected three-rooted MFMs on panoramic radiographs with a high level of accuracy.
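For readers less familiar with the metrics named above, the following is a minimal sketch of how they are conventionally computed for a binary task, treating three-rooted as the positive class and the CBCT finding as the reference label. The function names (`diagnostic_metrics`, `auc_from_scores`), the confusion-matrix counts, and the scores are hypothetical and chosen only for illustration; they are not the study's data or code. The AUC is computed here with the rank-based (Mann-Whitney) formulation, which may differ from the software used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    npv: float
    f1: float


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> DiagnosticMetrics:
    """Compute standard binary diagnostic metrics from confusion-matrix counts.

    Positive class = three-rooted MFM (assumption made for this sketch).
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one case is required")
    return DiagnosticMetrics(
        accuracy=_ratio(tp + tn, total),
        sensitivity=_ratio(tp, tp + fn),   # true positive rate (recall)
        specificity=_ratio(tn, tn + fp),   # true negative rate
        precision=_ratio(tp, tp + fp),     # positive predictive value
        npv=_ratio(tn, tn + fn),           # negative predictive value
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


def auc_from_scores(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case, with ties
    counted as one half."""
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
example = diagnostic_metrics(tp=40, fp=10, tn=90, fn=10)
example_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

With these hypothetical counts, sensitivity is 40/50 = 0.80 and specificity is 90/100 = 0.90, and the example scores give an AUC of 8/9, since eight of the nine positive-negative pairs are ranked correctly.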