Decisions take place in dynamic environments, so the nervous system must continually learn which actions best obtain rewards. In the theoretical framework of optimal control and reinforcement learning, policies (the probability of performing an action given a state of the environment) are updated by feedback arising from errors in predicting reward. Whereas these reward prediction errors have been mapped to dopaminergic neurons in the midbrain, how the decision variables that generate policies are themselves represented and modulated remains unclear. We trained mice on a dynamic foraging task, in which they freely chose between two alternatives that delivered reward with changing probabilities. We found that corticostriatal neurons in the medial prefrontal cortex (mPFC) maintained persistent changes in firing rate that represented relative and total action values over long timescales. These changes are consistent with control signals used to drive flexible behavior. We next recorded from serotonergic neurons in the dorsal raphe during the same task, to determine whether their firing rates tracked ongoing variables that could modulate the decision variables in mPFC. We found that serotonergic neurons represented reward rate over long timescales (tens of seconds to minutes). These signals are consistent with modulatory signals used to regulate the robustness of ongoing decision variables.
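The reinforcement-learning quantities named above can be made concrete with a minimal simulation. The sketch below is purely illustrative and is not the authors' behavioral model: it runs a simple Q-learning agent on a two-alternative task with block-wise reversing reward probabilities, updates action values from reward prediction errors, reads out the relative and total action values (the decision variables ascribed to mPFC), and tracks a slow leaky-integrator reward rate (the long-timescale variable ascribed to serotonergic neurons). All parameter values (`alpha`, `beta`, `tau`, block length, reward probabilities) are arbitrary choices for the sketch.

```python
import math
import random

def softmax_choice(q, beta, rng):
    """Sample an action (0 or 1) from a softmax policy over action values q."""
    p_left = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
    return 0 if rng.random() < p_left else 1

def run_task(n_trials=1000, alpha=0.1, beta=3.0, tau=0.02, seed=1):
    """Q-learning on a dynamic two-armed bandit (illustrative parameters).

    Returns the relative action value, total action value, and a slowly
    integrated reward rate at the end of the session.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]          # action values for the two alternatives
    p_reward = [0.4, 0.1]   # reward probabilities; reversed every block
    reward_rate = 0.0       # leaky integrator of reward (long timescale)
    for t in range(n_trials):
        if t > 0 and t % 200 == 0:
            p_reward.reverse()              # block transition
        a = softmax_choice(q, beta, rng)    # policy: choose an action
        r = 1.0 if rng.random() < p_reward[a] else 0.0
        rpe = r - q[a]                      # reward prediction error
        q[a] += alpha * rpe                 # value update from the RPE
        reward_rate += tau * (r - reward_rate)  # slow reward-rate estimate
    relative_value = q[0] - q[1]   # decision variable driving choice
    total_value = q[0] + q[1]      # overall value of the environment
    return relative_value, total_value, reward_rate
```

Because rewards are 0 or 1 and values start at 0, each action value stays within [0, 1], so the relative value lies in [-1, 1] and the total value in [0, 2]; the reward-rate integrator likewise remains between 0 and 1.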