Abstract

In the canonical interpretation of phasic activation of dopaminergic neurons during Pavlovian conditioning, cell firing is initially triggered by unexpected rewards. Upon learning, activation instead follows the reward-predictive conditioned stimulus. When expected rewards are withheld, firing is inhibited. Here, we recorded optogenetically identified dopaminergic neurons in the ventral tegmental area (VTA) of mice being trained on successive operant sensory discrimination tasks. A delay was imposed between nose-poke choices and trial outcome signals (reward or punishment). While animals were still performing at sub-criterion levels in the task, firing increased after correct choices, but prior to trial outcome signals. Thus, the neurons predicted whether choices would be rewarded, despite the animals’ poor behavioral performance. Surprisingly, these neurons also fired after reward delivery, as if the rewards had been unexpected, but the cells were inhibited after punishment signals, as if the reward had been expected after all. These inconsistencies suggest an extension of theoretical formulations of dopaminergic neuronal activity, in which this activity would embody multiple roles in temporal difference learning and actor-critic models. Furthermore, while performing at chance and sub-criterion levels during task training, the mice employed other task strategies (e.g., alternation and spatial persistence) that did not reliably elicit rewards, again while these neurons predicted the correct choice. The reward prediction activity of these neurons could serve as a critic signal for the preceding choice. These findings are consistent with the notion that multiple Bayesian belief representations must be reconciled prior to reaching criterion performance levels.
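
The canonical interpretation summarized above is commonly formalized as a temporal difference (TD) prediction error, delta = r + gamma * V(s') - V(s). The following is a minimal sketch of that textbook formula only, not the analysis used in this study; the function name `td_error`, the discount factor, and the state values are illustrative assumptions.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """Textbook TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


# All state values below are assumed for illustration, not fitted to data.

# Before learning: no state predicts reward, so a delivered reward is unexpected.
naive_reward = td_error(reward=1.0, value_next=0.0, value_current=0.0)

# After learning: the cue state is assumed to predict one unit of reward.
# Cue onset: transition from an unpredictive state (V=0) into the cue state (V=1).
learned_cue = td_error(reward=0.0, value_next=1.0, value_current=0.0)
# Reward delivery: the reward is fully predicted by the cue state.
learned_reward = td_error(reward=1.0, value_next=0.0, value_current=1.0)
# Reward omission: the predicted reward does not arrive.
learned_omission = td_error(reward=0.0, value_next=0.0, value_current=1.0)

print(naive_reward, learned_cue, learned_reward, learned_omission)
# prints: 1.0 1.0 0.0 -1.0
```

Under this scheme a fully predicted reward yields zero error, whereas an omitted one yields a negative error. The pattern reported here, firing after reward delivery together with inhibition after punishment signals, does not fit a single set of such values, which is the inconsistency the abstract points to.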