Niv, Y., & Chan, S. (2011).
On the value of information and other rewards.
Nature Neuroscience, 14(9), 1095–1097.
Abstract: Knowledge is not just power. Even if advance information cannot influence an upcoming event, people (and animals) prefer to know ahead of time what the outcome will be. According to the firing patterns of neurons in the lateral habenula, from the brain's perspective, knowledge is also water—or at least its equivalent in terms of reward.
Takahashi, Y. K., Roesch, M. R., Wilson, R. C., Toreson, K., O'Donnell, P., Niv, Y., & Schoenbaum, G. (2011).
Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex.
Nature Neuroscience, 14(12), 1590–1597.
Abstract: The orbitofrontal cortex has been hypothesized to carry information regarding the value of expected rewards. Such information is essential for associative learning, which relies on comparisons between expected and obtained reward for generating instructive error signals. These error signals are thought to be conveyed by dopamine neurons. To test whether orbitofrontal cortex contributes to these error signals, we recorded from dopamine neurons in orbitofrontal-lesioned rats performing a reward learning task. Lesions caused marked changes in dopaminergic error signaling. However, the effect of lesions was not consistent with a simple loss of information regarding expected value. Instead, without orbitofrontal input, dopaminergic error signals failed to reflect internal information about the impending response that distinguished externally similar states leading to differently valued future rewards. These results are consistent with current conceptualizations of orbitofrontal cortex as supporting model-based behavior and suggest an unexpected role for this information in dopaminergic error signaling.
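The error signal at issue can be made concrete with a minimal sketch of a tabular temporal-difference learner. The state names, discount factor, and learning rate below are illustrative assumptions, not the paper's task or model.

```python
# Illustrative tabular temporal-difference (TD) learner; state names,
# discount, and learning rate are assumptions, not the paper's model.
GAMMA = 0.9   # discount factor
ALPHA = 0.1   # learning rate

values = {"cue": 0.0, "delay": 0.0, "reward_port": 0.0, "end": 0.0}

def td_error(state, next_state, reward):
    """Prediction error: obtained (r + gamma * V(s')) minus expected (V(s))."""
    return reward + GAMMA * values[next_state] - values[state]

def td_update(state, next_state, reward):
    delta = td_error(state, next_state, reward)
    values[state] += ALPHA * delta
    return delta

# Learning over a three-step trial: cue -> delay -> reward port -> end.
# If two externally similar states leading to differently valued rewards
# are collapsed into one (the abstract's account of the lesioned rats),
# this same delta computation yields a corrupted error signal.
for _ in range(100):
    td_update("cue", "delay", 0.0)
    td_update("delay", "reward_port", 0.0)
    td_update("reward_port", "end", 1.0)

print(values)  # value propagates back from the rewarded state to the cue
```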
Eldar, E., Morris, G., & Niv, Y. (2011).
The effects of motivation on response rate: A hidden semi-Markov model analysis of behavioral dynamics.
Journal of Neuroscience Methods, 201(1), 251–261.
Abstract: A central goal of neuroscience is to understand how neural dynamics bring about the dynamics of behavior. However, neural and behavioral measures are noisy, requiring averaging over trials and subjects. Unfortunately, averaging can obscure the very dynamics that we are interested in, masking abrupt changes and artificially creating gradual processes. We develop a hidden semi-Markov model for precisely characterizing dynamic processes and their alteration due to experimental manipulations. This method takes advantage of multiple trials and subjects without compromising the information available in individual events within a trial. We apply our model to studying the effects of motivation on response rates, analyzing data from hungry and sated rats trained to press a lever to obtain food rewards on a free-operant schedule. Our method can accurately account for punctate changes in the rate of responding and for sequential dependencies between responses. It is ideal for inferring the statistics of underlying response rates and the probability of switching from one response rate to another. Using the model, we show that hungry rats have more distinct behavioral states characterized by high rates of responding, and that they spend more time in these high-press-rate states. Moreover, hungry rats spend less time in, and have fewer of, the distinct states characterized by a lack of responding (Waiting/Eating states). These results demonstrate the utility of our analysis method and provide a precise quantification of the effects of motivation on response rates.
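To illustrate the class of model the abstract describes, here is a minimal generative hidden semi-Markov sketch: hidden states with explicit dwell-time distributions, each emitting lever presses at its own rate. The state names, rates, and distributions are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative HSMM of lever pressing; all parameters are assumptions.
press_rate = {"high_press": 2.0, "low_press": 0.5, "waiting": 0.02}  # presses/s
dwell_mean = {"high_press": 20.0, "low_press": 15.0, "waiting": 30.0}  # seconds
transitions = {  # semi-Markov: consulted only when a dwell ends; no self-loops
    "high_press": {"low_press": 0.5, "waiting": 0.5},
    "low_press":  {"high_press": 0.6, "waiting": 0.4},
    "waiting":    {"high_press": 0.7, "low_press": 0.3},
}

def simulate(total_time=300.0, state="waiting"):
    """Generate lever-press times from the HSMM over total_time seconds."""
    t, presses = 0.0, []
    while t < total_time:
        # Explicit (gamma-distributed) dwell time: the feature that
        # distinguishes a semi-Markov model from an ordinary HMM's
        # geometric dwell times.
        dwell = rng.gamma(shape=2.0, scale=dwell_mean[state] / 2.0)
        # Within a state, presses form a Poisson process at that state's rate.
        n = rng.poisson(press_rate[state] * dwell)
        presses.extend(np.sort(t + rng.uniform(0.0, dwell, size=n)))
        t += dwell
        nxt = list(transitions[state])
        state = rng.choice(nxt, p=[transitions[state][s] for s in nxt])
    return [p for p in presses if p <= total_time]

print(f"{len(simulate())} presses in 300 s")
```

Fitting such a model, rather than simulating from it, would recover the dwell and rate parameters from observed press times; the abstract's comparisons between hungry and sated rats rest on that fitting step.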
Ribas-Fernandes, J. J. F., Solway, A., Diuk, C., McGuire, J. T., Barto, A. G., Niv, Y., & Botvinick, M. M. (2011).
A neural signature of hierarchical reinforcement learning.
Neuron, 71(2), 370–379.
Abstract: Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. These results support the relevance of HRL to the neural processes underlying hierarchical behavior.
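The abstract's key prediction, that errors are computed with respect to subgoals as well as the overall goal, can be stated in a few lines. The courier-style scenario and the distances below are hypothetical illustrations, not the experiments' actual design.

```python
# Hypothetical illustration of the HRL prediction: prediction errors are
# computed against subgoal prospects as well as overall-goal prospects.
# The distances and courier-style framing are assumptions, not the task.

def goal_pe(old_dist_to_goal, new_dist_to_goal):
    """Ordinary RPE: responds only to changed prospects for the overall goal."""
    return old_dist_to_goal - new_dist_to_goal

def subgoal_pe(old_dist_to_subgoal, new_dist_to_subgoal):
    """HRL pseudo-reward PE: responds to changed prospects for the subgoal."""
    return old_dist_to_subgoal - new_dist_to_subgoal

# An event that moves the subgoal farther away while leaving the total
# path to the overall goal unchanged: ordinary RL predicts no error,
# while HRL predicts a negative subgoal-related error.
print(goal_pe(10, 10))    # 0  -> no ordinary reward prediction error
print(subgoal_pe(4, 7))   # -3 -> subgoal-related prediction error
```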
McDannald, M. A., Lucantonio, F., Burke, K. A., Niv, Y., & Schoenbaum, G. (2011).
Ventral Striatum and Orbitofrontal Cortex Are Both Required for Model-Based, But Not Model-Free, Reinforcement Learning.
Journal of Neuroscience, 31(7), 2700–2705.
Abstract: In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
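One way to formalize the abstract's value-versus-identity contrast (an assumption about notation, not the paper's model) is to compute errors over a vector of outcome features rather than a single scalar value:

```python
import numpy as np

# Dimensions index reward identities (food A, food B); entries are amounts.
# The encoding is illustrative, not taken from the paper.
expected = np.array([1.0, 0.0])   # expect one unit of food A
received = np.array([0.0, 1.0])   # receive one unit of food B instead
value    = np.array([1.0, 1.0])   # the two foods are equally valued

# Scalar value prediction error, as in model-free TDRL: zero here, so it
# predicts no learning when only reward identity changes.
value_pe = value @ received - value @ expected

# Identity (feature-level) prediction error: nonzero, so it can drive the
# model-based, identity-sensitive learning the unblocking probes revealed.
identity_pe = received - expected

print(value_pe)      # 0.0
print(identity_pe)   # [-1.  1.]
```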