Investigating protein-ligand binding sites is the key step in engineering protein/enzyme activity and selectivity. In this study, we developed a 3D convolutional neural network DUnet that derived from DenseNet and UNet for predicting the protein-ligand binding sites. To train DUnet, the features of protein 3D structure were extracted by describing the atomic physical characters, and the ligand binding sites were used as training labels. DUnet was trained using three dataset, the scPDB dataset (collecting of protein-ligand complexes from Protein Data Bank), scPDB and SC6K (collecting of protein-ligand complexes deposited after January 1st, 2018 from Protein Data Bank) datasets, and scPDB and its derived dataset by rotating the samples in the dataset. DUnet displayed better performance than the current state-of-art methods during the benchmark test using independent validation sets, and enlarging the training set contributed to better accuracy. We developed a small dataset contains commonly used industrial enzymes for testing DUnet and found that it was also accurate in predicting the substrate binding sites. We experimentally characterized the substrate binding sites of microbial transglutaminase according to the prediction and showed the significance of these sites. Finally, DUnet was used to predict the ligand binding sites of Swiss-Prot annotated proteins.
Support the authors with ResearchCoin
Support the authors with ResearchCoin