In the field of motor learning, few studies have addressed the learning of non-instructed movement sequences, as such studies require long periods of training and data acquisition and are complex to interpret. In contrast, such problems are readily addressed in machine learning, using artificial agents in simulated environments. To understand the mechanisms that drive the learning behavior of two macaque monkeys in a free-moving multi-target reaching task, we created two Reinforcement Learning (RL) models with different penalty criteria: "Time", reflecting the time spent to perform a trial, and "Power", integrating the energy cost. The initial phase of the learning process is characterized by a rapid improvement in motor performance for both monkeys and both models, with hand trajectories becoming shorter and smoother while velocity gradually increases across trials and sessions. This improvement in motor performance with training is associated with a simplification of the movement trajectories used to achieve the task goal. The monkeys and models show a convergent evolution towards an optimal circular motor path, almost exclusively in the counter-clockwise direction, together with a persistent inter-trial variability. All these elements support interpreting the monkeys' learning in terms of a progressive updating of action-selection patterns, following a classic value iteration scheme as in reinforcement learning. However, in contrast with our models, the monkeys also show a specific variability in the choice of the motor sequences to carry out across trials. This variability reflects a form of "path selection" that is absent in the models. Furthermore, comparing models and behavioral data also reveals sub-optimality in the way monkeys manage the trade-off between optimizing movement duration ("Time") and minimizing its metabolic cost ("Power"), with a tendency to overemphasize one criterion to the detriment of the other.
Overall, this study reveals the subtle interplay between cognitive factors, biomechanical constraints, task achievement and motor efficacy management in motor learning, and highlights the relevance of modeling approaches for disentangling the respective contributions of the different elements at play.