Over-the-air federated edge learning (Air-FEEL) is a promising distributed machine learning paradigm for edge devices. By exploiting the superposition property of a multiple access channel (MAC), Air-FEEL achieves low communication latency during training and enhances the data privacy of edge devices, albeit at the cost of degraded learning performance. Recent studies suggest that the convergence speed of Air-FEEL can be optimized by controlling the transmit power of edge devices while guaranteeing their differential privacy (DP). In this paper, we go a step further by incorporating device sampling into Air-FEEL (Air-FEEL-DS) to improve privacy and reduce device energy consumption: each edge device decides randomly and independently whether to participate in each training round. First, we theoretically characterize both the DP guarantee and the convergence performance of Air-FEEL-DS. Then, we formulate a power control problem that optimizes the convergence speed subject to a specified DP guarantee. Although this problem is non-convex, we propose an efficient algorithm by relating it to a variant problem, transforming the variant into a convex problem, and showing that the convex problem admits an efficient water-filling-like algorithm. Finally, simulation results show that the proposed power control scheme converges much faster for Air-FEEL-DS than the channel inversion method, and achieves comparable convergence performance at significantly lower energy consumption than Air-FEEL with optimized power control but without device sampling.
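To make the device sampling and superposition ideas concrete, the following is a minimal illustrative sketch of one aggregation round, not the paper's algorithm. It assumes scalar model updates, an idealized unit-gain channel with additive Gaussian noise, and normalization by the expected number of participants. The function name `air_feel_ds_round` and its parameters `sample_prob` and `noise_std` are hypothetical, and the optimized power control and DP accounting are deliberately omitted.

```python
import random


def air_feel_ds_round(local_updates, sample_prob, noise_std, rng):
    """Simulate one hypothetical Air-FEEL-DS aggregation round (scalar updates).

    Assumptions (not from the paper): unit channel gains, unit transmit power,
    and normalization by the expected number of participating devices.
    """
    # Device sampling: each device decides independently, with probability
    # sample_prob, whether to transmit in this round.
    participating = [rng.random() < sample_prob for _ in local_updates]

    # Superposition: the channel adds the simultaneously transmitted signals,
    # so the server observes only their sum, never an individual update.
    received = sum(u for u, p in zip(local_updates, participating) if p)

    # Additive Gaussian channel noise perturbs the aggregate.
    received += rng.gauss(0.0, noise_std)

    # Scale by the expected number of participants to estimate the mean update.
    expected_participants = sample_prob * len(local_updates)
    return received / expected_participants, participating


if __name__ == "__main__":
    rng = random.Random(0)
    estimate, mask = air_feel_ds_round([0.5, -0.2, 0.9, 0.1], 0.5, 0.05, rng)
    print("participating:", mask)
    print("noisy mean estimate:", estimate)
```

In this toy setting, a smaller `sample_prob` means fewer transmissions per round (lower energy use) and more uncertainty about which devices contributed, at the price of a noisier estimate of the mean update. This mirrors, under the stated assumptions only, the trade-off that the power control design in the paper is meant to balance.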