The rapid expansion of Internet of Things (IoT) networks underscores the demand for efficient and secure machine learning methods suited to geographically dispersed, resource-constrained, and heterogeneous IoT devices. Federated Reinforcement Learning (FedRL) enables models to be trained on IoT devices without centralizing data, thereby addressing privacy, security, and bandwidth concerns. However, existing FedRL methods struggle in dynamic, heterogeneous IoT environments: they allocate resources inefficiently and fail to accommodate diverse device capabilities. We introduce a Three-Stage Asynchronous Federated Reinforcement Learning (TAS-FedRL) framework that optimizes resource allocation and local training epochs in heterogeneous IoT environments. We propose a novel Affinity-Based Spectral Client Grouping (ASCG) method for dynamic device allocation and employ Deep Q-Networks (DQNs) for adaptive epoch adjustment and resource allocation. We further introduce a Dynamic Temporal-Weighted Reward Aggregation (DTW-RA) technique for updating edge hosts' models, which reduces network strain and allows flexible device participation, while the central server applies a Dynamic Hierarchical-Weighted Reward Aggregation (DH-WRA) mechanism to coordinate the entire network. Experimental results demonstrate that the TAS-FedRL framework improves system performance, uses resources efficiently, and mitigates the issues arising from IoT heterogeneity.
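To illustrate the temporal-weighting idea behind DTW-RA, the minimal Python sketch below aggregates asynchronous client contributions with a staleness discount, so fresher updates carry more weight. The function name, the exponential decay schedule, and the `decay` parameter are illustrative assumptions for exposition, not the paper's exact aggregation rule.

```python
import numpy as np

def temporal_weighted_aggregate(client_updates, current_round, decay=0.5):
    """Staleness-discounted aggregation of asynchronous client updates.

    client_updates: list of (params, round_submitted) pairs, where params
    is a flat numpy array of model parameters (or accumulated rewards).
    Weights decay exponentially with staleness; this is an illustrative
    stand-in for the paper's DTW-RA rule, not its exact formula.
    """
    weights = np.array([
        decay ** (current_round - round_submitted)      # fresher => larger weight
        for _, round_submitted in client_updates
    ])
    weights /= weights.sum()                            # convex combination
    return sum(w * params for w, (params, _) in zip(weights, client_updates))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three clients submitted updates at rounds 10, 9, and 7 (increasing staleness).
    updates = [(rng.normal(size=4), r) for r in (10, 9, 7)]
    print(temporal_weighted_aggregate(updates, current_round=10))
```

Under this (assumed) exponential schedule, an update one round stale contributes half as much as a fresh one, which captures the abstract's claim that stale devices can still participate without dominating the edge host's model.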