This paper addresses power consumption in massive multiple-input multiple-output (mMIMO) base stations (BSs), where continuous operation of all antennas generates significant heat within a limited physical area. We propose a power control scheme that improves energy efficiency and mitigates thermal impact. Our approach introduces a time-slotted model that incorporates time-varying user quality of service (QoS) requirements. We examine energy efficiency under various conditions and present discrete and analog power allocation methods for both hybrid and fully digital precoding, taking hardware impairments (HWI) into account. We formulate the energy efficiency optimization as dynamic Markov decision process (MDP) problems constrained by total power, per-antenna power, and dynamic QoS requirements. The randomized ensembled double Q-learning (REDQ) algorithm is used together with an action coding scheme to reduce computational complexity. We compare against existing reinforcement learning algorithms and evaluate the proposed power allocation schemes across diverse scenarios; the simulations show that our approach improves energy efficiency under varying operational conditions, indicating its potential as a robust solution for adaptive resource allocation in mMIMO systems.
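To make the REDQ idea concrete, the following is a minimal sketch of a REDQ-style update for a tabular, discrete-action setting. It is not the formulation used in this paper: the state and action encoding, the reward, and all names (`redq_target`, `redq_update`, `q_tables`, `subset_size`) are assumptions chosen for illustration, and the action coding scheme mentioned above is not modeled. The sketch shows only the two ingredients that characterize REDQ: a target built from the minimum over a randomly sampled subset of an ensemble of Q-functions, and an update of every ensemble member toward that shared target.

```python
import random


def redq_target(q_tables, next_state, reward, gamma=0.99, subset_size=2, rng=random):
    """REDQ-style bootstrap target for a tabular, discrete-action problem.

    q_tables[i][s][a] is the i-th ensemble member's value estimate.
    All names and the tabular setting are illustrative assumptions.
    """
    # Sample a random subset of ensemble members (in-target minimization).
    subset = rng.sample(range(len(q_tables)), subset_size)
    n_actions = len(q_tables[0][next_state])
    # For each action, keep the most pessimistic estimate within the subset.
    min_q = [
        min(q_tables[i][next_state][a] for i in subset)
        for a in range(n_actions)
    ]
    # Bootstrap greedily from the pessimistic estimates.
    return reward + gamma * max(min_q)


def redq_update(q_tables, state, action, reward, next_state,
                lr=0.1, gamma=0.99, subset_size=2, rng=random):
    """Move every ensemble member toward the same REDQ-style target."""
    y = redq_target(q_tables, next_state, reward, gamma, subset_size, rng)
    for q in q_tables:
        q[state][action] += lr * (y - q[state][action])
    return y


if __name__ == "__main__":
    # Hypothetical toy problem: 3 states, 2 actions, ensemble of 4 tables.
    ensemble = [[[0.0, 0.0] for _ in range(3)] for _ in range(4)]
    target = redq_update(ensemble, state=0, action=1, reward=1.0, next_state=2)
    print(target, ensemble[0][0][1])
```

In a power allocation setting, an action index would stand for one candidate power configuration and the reward would reflect energy efficiency subject to the power and QoS constraints; both mappings are left out here because they depend on the system model.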