Well placement optimization is a key method for resolving planar conflicts in reservoir development. Its main goal is to determine the well locations and drilling sequence that maximize the economic benefit of development. Current well placement optimization methods, however, suffer from high-dimensional discretization of the optimization variables and a lack of effective incentives for policy exploration. Under a limited number of numerical simulation runs, these problems make it difficult to improve both global optimization ability (the ability to escape locally optimal solutions in time and keep searching for better solutions throughout the optimization process) and real-time adjustability. In this paper, we propose a new sequential well placement optimization method based on the Discrete Soft Actor-Critic (DSAC) algorithm, which incorporates the maximum entropy mechanism to formulate well placement and drilling sequence schemes more efficiently and to maximize the net present value (NPV) over the entire life cycle of reservoir development. Specifically, the method models well placement optimization as a Markov Decision Process (MDP) and trains a Deep Reinforcement Learning (DRL) agent that maps reservoir states to a stochastic policy over the well placement variables and also evaluates the value function of the current policy. Based on the reservoir state at different times during development, the DRL agent determines the optimal infill well location in real time, and the resulting series of decisions yields the optimal drilling sequence. The proposed method has two innovations. First, the large-scale discrete action space of the well placement variables is reconstructed into multi-discrete action spaces, which, together with the maximum entropy mechanism, encourages policy exploration and improves global optimization capability. Second, the trained policy can swiftly adapt the subsequent well placement scheme to a specific state of the target reservoir without retraining from scratch, which enables offline application of the trained policy and gives better real-time adjustability. To verify its effectiveness, the proposed method is tested on 2D and 3D reservoir models. The results show that DSAC not only outperforms a gradient-based optimization method, classical evolutionary algorithms, and the existing reinforcement learning method proximal policy optimization (PPO) in global optimization ability, but also shows better real-time adjustability of the trained policy when applied offline.
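
The idea of replacing one large discrete action space with multi-discrete action spaces, combined with an entropy term, can be illustrated with a minimal sketch. The abstract does not state how the action space is factored, so the code below assumes one plausible reading: the well location on a 2D grid is split into a row choice and a column choice, each drawn from its own categorical distribution. The grid size (`NX`, `NY`), the function names, and the uniform logits are hypothetical and are not taken from the paper.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def sample_well_location(row_logits, col_logits, rng):
    """Sample a (row, col) grid cell from two independent categorical heads.

    Assumption: the heads are independent, so the entropy of the joint
    policy is the sum of the entropies of the two heads.
    """
    row_probs = softmax(row_logits)
    col_probs = softmax(col_logits)
    row = rng.choices(range(len(row_probs)), weights=row_probs)[0]
    col = rng.choices(range(len(col_probs)), weights=col_probs)[0]
    policy_entropy = entropy(row_probs) + entropy(col_probs)
    return (row, col), policy_entropy

# Hypothetical 60 x 60 grid of candidate well locations.
NX, NY = 60, 60
flat_outputs = NX * NY      # one output per grid cell: 3600
factored_outputs = NX + NY  # one output per row plus one per column: 120

rng = random.Random(0)
# Uniform logits stand in for the output of an untrained policy network.
location, policy_entropy = sample_well_location([0.0] * NY, [0.0] * NX, rng)
print(flat_outputs, factored_outputs, location, policy_entropy)
```

Under this assumption, the policy needs 120 outputs instead of 3600 for a 60 x 60 grid. In a soft actor-critic style objective, an entropy value such as `policy_entropy` would be added to the reward with a temperature weight, so that policies spreading probability over many candidate locations are favored during exploration.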