Abstract Protein language models have exhibited remarkable representational capabilities in various downstream tasks, notably the prediction of protein function. Despite their success, these models typically lack explicit protein structure information, which is pivotal for elucidating the relationship between protein sequence and function. To address this gap, we introduce DeProt, a Transformer-based protein language model designed to incorporate both protein sequences and structures. It was pre-trained on millions of protein structures from diverse natural protein clusters. DeProt first serializes protein structures into residue-level local-structure sequences and uses a graph-neural-network-based auto-encoder to vectorize the local structures. These vectors are then quantized into discrete structure tokens using a pre-trained codebook. DeProt then uses disentangled attention mechanisms to integrate residue sequences with structure token sequences. Despite having fewer parameters and less training data, DeProt significantly outperforms other state-of-the-art (SOTA) protein language models, including structure-aware and evolution-based ones, particularly in zero-shot mutant effect prediction across 217 deep mutational scanning assays. Furthermore, DeProt exhibits robust representational capabilities across a spectrum of supervised downstream tasks. Our comprehensive benchmarks underscore the innovative nature of DeProt’s framework and its superior performance, suggesting wide applicability in protein deep learning. The code, model weights, and all associated datasets are available at: https://github.com/ginnm/DeProt.
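
The quantization step described above, which maps each residue's local-structure vector to a discrete structure token, can be illustrated with a nearest-neighbor codebook lookup. The following is a minimal sketch under stated assumptions, not DeProt's actual implementation: the function name `quantize_to_tokens`, the squared Euclidean distance, and the toy 2-dimensional codebook are all hypothetical choices made for illustration. In DeProt the vectors come from the auto-encoder and the codebook is pre-trained.

```python
def quantize_to_tokens(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry.

    Assumption: nearest means smallest squared Euclidean distance;
    the abstract does not specify the metric DeProt uses.
    """
    tokens = []
    for vec in vectors:
        # Squared distance from this vector to every codebook entry.
        dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
        # The structure token is the index of the closest entry.
        tokens.append(dists.index(min(dists)))
    return tokens


# Hypothetical toy codebook with three 2-dimensional entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]

# Hypothetical per-residue local-structure vectors (one per residue).
local_structure_vectors = [[0.1, -0.1], [0.9, 1.2], [-0.8, 1.9], [0.6, 0.6]]

structure_tokens = quantize_to_tokens(local_structure_vectors, codebook)
print(structure_tokens)
```

The resulting token sequence has one entry per residue, so it aligns position by position with the residue sequence. That alignment is what allows an attention mechanism to combine the two sequences.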