Both the orbitofrontal cortex (OFC) and ventral striatum (VS) have been implicated in signalling reward expectancies. However, exactly what roles these two disparate structures play, and how they differ, remains an open question. Recent results from the Schoenbaum lab (Takahashi et al., this meeting), describing in detail the effect of OFC lesions on putative reward prediction error signalling by midbrain dopaminergic neurons in rats, point to one possible delineation. Here we describe a reinforcement learning (RL) model of the Takahashi et al. results that suggests related, but subtly different, roles for the OFC and VS in signalling reward expectancies. We present an actor/critic model with one actor (putatively the dorsal striatum) and two critics (OFC and VS). We hypothesise that the VS critic learns state values relatively slowly and in a model-free way, while the OFC learns state values faster and in a model-based way, using one-step look-ahead. Both areas contribute to a single prediction error signal, computed in the ventral tegmental area (VTA), that is used to teach both critics and the actor. Because they receive the same teaching signal, the two critics, OFC and VS, essentially compete for the value of each state. Our model makes a number of predictions regarding the effects of OFC and VS lesions on the response properties of dopaminergic (putatively prediction-error-encoding) neurons in the VTA. The model predicts that lesions to either the VS or the OFC result in persistent prediction errors to predictable rewards and diminished prediction errors on the omission of predictable rewards. At the time of a reward-predicting cue, the model predicts that these lesions diminish both positive and negative prediction errors. When the animal is free to choose between a high- and a low-valued option, we predict a difference between the effects of OFC and VS lesions.
Because of the proposed look-ahead abilities of the OFC, the "unlesioned" model predicts differential signals at the time the decision is made, corresponding to whether the high- or the low-valued option has been chosen. When the model-OFC is "lesioned", however, these differential signals disappear, as the model is no longer aware of the decision that will be made. This is not the case when the model-VS is "lesioned", in which case the difference between high- and low-valued options persists. These predictions regarding OFC lesions are borne out in Takahashi et al.'s experiments on rats.
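The shared-teaching-signal idea can be sketched in a few lines. The following is a minimal tabular illustration, not the authors' implementation: the task is a fixed cue-then-reward chain (so the OFC's one-step look-ahead plays no role here), and the learning rates, which make the OFC the faster learner, are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of two critics taught by one shared prediction error.
# States form a chain: cue (0) -> reward port (1) -> terminal (2).
ALPHA_VS, ALPHA_OFC, GAMMA = 0.05, 0.3, 1.0  # assumed: VS slow, OFC fast
v_vs = np.zeros(3)    # model-free ventral-striatum critic
v_ofc = np.zeros(3)   # OFC critic (look-ahead is moot in this choice-free chain)

def td_step(s, r, s_next, vs_lesion=False, ofc_lesion=False):
    """One VTA prediction error, teaching whichever critics are intact."""
    def V(state):
        v = 0.0 if vs_lesion else v_vs[state]
        return v if ofc_lesion else v + v_ofc[state]
    delta = r + GAMMA * V(s_next) - V(s)  # single shared error signal
    if not vs_lesion:
        v_vs[s] += ALPHA_VS * delta
    if not ofc_lesion:
        v_ofc[s] += ALPHA_OFC * delta
    return delta

for _ in range(200):          # repeated cue -> reward trials
    td_step(0, 0.0, 1)        # cue presentation, no reward yet
    td_step(1, 1.0, 2)        # reward delivery

d_trained = td_step(1, 1.0, 2)                     # reward now predicted
d_lesioned = td_step(1, 1.0, 2, ofc_lesion=True)   # OFC's share of value lost
print(d_trained, d_lesioned)
```

Because both critics are trained by the same prediction error, the learned value of each state splits between them in proportion to their learning rates; silencing one critic therefore removes its share and reinstates a positive prediction error to a fully predicted reward, in line with the lesion predictions above.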
A. Redish et al. (2007) proposed a reinforcement learning model of context-dependent learning and extinction in conditioning experiments, using the idea of "state classification" to categorize new observations into states. In the current article, the authors propose an interpretation of this idea in terms of normative statistical inference. They focus on renewal and latent inhibition, two conditioning paradigms in which contextual manipulations have been studied extensively, and show that online Bayesian inference within a model that assumes an unbounded number of latent causes can characterize a diverse set of behavioral results from such manipulations, some of which pose problems for the model of Redish et al. Moreover, in both paradigms, context dependence is absent in younger animals, or if hippocampal lesions are made prior to training; the authors suggest an explanation in terms of a restricted capacity to infer new causes.
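Priors over an unbounded number of latent causes are typically formalised with a Chinese restaurant process (CRP). A minimal sketch of that prior, with an illustrative concentration parameter rather than any fitted value:

```python
import numpy as np

# Chinese restaurant process (CRP) prior over cause assignments: each
# existing cause is favoured in proportion to how often it has been
# inferred, while a brand-new cause is always available with probability
# proportional to a concentration parameter alpha.
def crp_prior(counts, alpha=1.0):
    counts = np.asarray(counts, dtype=float)
    probs = np.append(counts, alpha) / (counts.sum() + alpha)
    return probs  # final entry: prior probability of a NEW cause

# After 9 trials attributed to a single acquisition cause, a surprising
# observation (e.g. the first extinction trial) can still be routed to a
# fresh cause with prior probability alpha / (9 + alpha):
p = crp_prior([9], alpha=1.0)
print(p)  # [0.9 0.1]

# A restricted capacity to infer new causes (younger animals, hippocampal
# lesions) can be caricatured as a very small alpha:
p_restricted = crp_prior([9], alpha=0.01)
```

In full inference this prior is combined with the likelihood of the current observation under each candidate cause, and the resulting posterior over causes determines which "state" learning is attributed to.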
Reinforcement learning (RL) algorithms provide powerful explanations for simple learning and decision-making behaviors and the functions of their underlying neural substrates. Unfortunately, in real-world situations that involve many stimuli and actions, these algorithms learn pitifully slowly, exposing their inferiority in comparison to animal and human learning. Here we suggest that one reason for this discrepancy is that humans and animals take advantage of structure that is inherent in real-world tasks to simplify the learning problem. We survey an emerging literature on 'structure learning' (using experience to infer the structure of a task) and how this can be of service to RL, with an emphasis on structure in perception and action.
How is reinforcement learning possible in a high-dimensional world? Without making any assumptions about the structure of the state space, the amount of data required to effectively learn a value function grows exponentially with the state space’s dimensionality. However, humans learn to solve high-dimensional problems much more rapidly than would be expected under this scenario. This suggests that humans employ inductive biases to guide (and accelerate) their learning. Here we propose one particular bias—sparsity—that ameliorates the computational challenges posed by high-dimensional state spaces, and present experimental evidence that humans can exploit sparsity information when it is available.
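One way to cash out a sparsity bias computationally is L1 regularisation when learning a linear value function: if reward depends on only one of many state dimensions, a sparse learner can identify it from far fewer samples than the dimensionality alone would suggest. A hypothetical numpy sketch, where the task, penalty, and step sizes are our assumptions and not the experimental design:

```python
import numpy as np

# Sparsity as an inductive bias: rewards in a 20-dimensional state space
# actually depend on a single feature, and an L1 penalty (proximal
# gradient descent with soft-thresholding) recovers that structure from
# only 30 samples. All parameter values below are illustrative.
rng = np.random.default_rng(0)
n_samples, n_dims = 30, 20
X = rng.normal(size=(n_samples, n_dims))     # random state features
w_true = np.zeros(n_dims)
w_true[3] = 1.0                              # only one dimension carries value
y = X @ w_true + 0.05 * rng.normal(size=n_samples)

w = np.zeros(n_dims)
lr, lam = 0.1, 0.05
for _ in range(2000):
    grad = X.T @ (X @ w - y) / n_samples     # squared-error gradient
    w -= lr * grad
    # proximal step for the L1 penalty: shrink and zero small weights
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

idx = np.flatnonzero(np.abs(w) > 0.1)        # dimensions judged relevant
print(idx, w[3])
```

The same data fit by unregularised least squares would spread weight across all 20 dimensions; the L1 penalty instead concentrates it on the single truly relevant one, which is the sense in which sparsity reduces the effective size of the learning problem.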
This page contains links to original data from experiments run at the Princeton Neuroscience Institute. These data are available to others for educational purposes. If they are used in publications, please cite the source of the data by indicating the published reference and the address of this website.