Power grids are susceptible to cascading failure, which can have severe consequences for modern society. Remedial actions such as proactive islanding, generator tripping, and load shedding offer viable ways to mitigate cascading failure in power grids. Their success depends on choosing appropriate actions and applying them in time while the failure propagates rapidly. In this paper, we introduce an intelligent method that leverages deep reinforcement learning to generate appropriate remedial actions in real time. We first present a simulation model of cascading failure that combines power flow distribution with the probabilistic failure mechanisms of components to accurately describe the dynamic cascading failure process. Based on this model, we formulate a Markov decision process for deciding on remedial actions as the failure propagates. The Proximal Policy Optimization algorithm is then adapted to train the underlying policies. Experiments are conducted on representative power test cases. The results show that the trained policy outperforms the benchmarks in both power preservation and decision time, thereby verifying its advantages in mitigating cascading failure in power grids.
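
For readers unfamiliar with Proximal Policy Optimization, the following is a minimal sketch of its standard clipped surrogate objective. It is not the adapted variant used in this paper, whose details the abstract does not give. The function name `ppo_clip_objective`, its parameters, and the default clipping range of 0.2 are illustrative assumptions.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates for the same samples
    epsilon:    clipping range (0.2 is an assumed, commonly used default)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the new policy far from the one that collected the data, which is one reason the algorithm is often chosen when each training sample is expensive to simulate.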