The field of computational reinforcement learning (RL) has proved extremely useful in research on human and animal behavior and brain function. However, the simple forms of RL considered in most empirical research do not scale well, making their relevance to complex, real-world behavior unclear. In computational RL, one strategy for addressing the scaling problem is to introduce hierarchical structure, an approach that has intriguing parallels with human behavior. We have begun to investigate the potential relevance of hierarchical RL (HRL) to human and animal behavior and brain function. In the present chapter, we first review two results that show the existence of neural correlates of key predictions from HRL. Then, we focus on one aspect of this work, which deals with the question of how action hierarchies are initially established. Work in HRL suggests that hierarchy learning is accomplished by identifying useful subgoal states, and that this might in turn be accomplished through a structural analysis of the given task domain. We review results from a set of behavioral and neuroimaging experiments, in which we have investigated the relevance of these ideas to human learning and decision making.
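As a rough illustration of how a structural analysis of a task domain might single out a subgoal state, the sketch below scores each state of a small graph by how many pairs of other states have a shortest path running through it. This is a simplified stand-in for graph-theoretic bottleneck measures, not the method used in the chapter; the two-room layout, the state numbering, and the function names are all hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in steps) from source to every reachable state."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def bottleneck_scores(graph):
    """For each state, count the unordered pairs of other states that have
    at least one shortest path passing through it.

    Assumes a connected, undirected graph. This is a simplified score,
    not exact betweenness centrality.
    """
    dist = {s: bfs_distances(graph, s) for s in graph}
    nodes = sorted(graph)
    scores = {v: 0 for v in nodes}
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            for v in nodes:
                if v in (s, t):
                    continue
                if dist[s][v] + dist[v][t] == dist[s][t]:
                    scores[v] += 1
    return scores


def two_room_graph():
    """Hypothetical task: two fully connected 'rooms' joined by doorway state 4."""
    room_a, room_b = [0, 1, 2, 3], [5, 6, 7, 8]
    graph = {n: set() for n in range(9)}
    for room in (room_a, room_b):
        for u in room:
            for v in room:
                if u != v:
                    graph[u].add(v)
    for u, v in ((3, 4), (4, 5)):  # the doorway links the two rooms
        graph[u].add(v)
        graph[v].add(u)
    return graph


scores = bottleneck_scores(two_room_graph())
subgoal = max(scores, key=scores.get)
print("candidate subgoal state:", subgoal)
```

In this toy layout the doorway state receives the highest score, which is the sense in which a purely structural measure can nominate a subgoal before any reward has been experienced.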
Attention is commonly thought to be manifest through local variations in neural gain. However, what would be the effects of brain-wide changes in gain? We hypothesized that global fluctuations in gain modulate the breadth of attention and the degree to which processing is focused on aspects of the environment to which one is predisposed to attend. We found that measures of pupil diameter, which are thought to track levels of locus coeruleus norepinephrine activity and neural gain, were correlated with the degree to which learning was focused on stimulus dimensions that individual human participants were more predisposed to process. In support of our interpretation of this effect in terms of global changes in gain, we found that the measured pupillary and behavioral variables were strongly correlated with global changes in the strength and clustering of functional connectivity, as brain-wide fluctuations of gain would predict.
Fear memories are notoriously difficult to erase, often recovering over time. The longstanding explanation for this finding is that, in extinction training, a new memory is formed that competes with the old one for expression but does not otherwise modify it. This explanation is at odds with traditional models of learning such as Rescorla-Wagner and reinforcement learning. A possible reconciliation that was recently suggested is that extinction training leads to the inference of a new state that is different from the state that was in effect in the original training. This solution, however, raises a new question: under what conditions are new states, or new memories, formed? Theoretical accounts implicate persistent large prediction errors in this process. As a test of this idea, we reasoned that careful design of the reinforcement schedule during extinction training could reduce these prediction errors enough to prevent the formation of a new memory, while still decreasing reinforcement sufficiently to drive modification of the old fear memory. In two Pavlovian fear-conditioning experiments, we show that gradually reducing the frequency of aversive stimuli, rather than eliminating them abruptly, prevents the recovery of fear. This finding has important implications for theories of state discovery in reinforcement learning.
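The reasoning about prediction errors can be made concrete with a minimal Rescorla-Wagner sketch that compares the errors produced by abrupt and gradual extinction. It is an illustration under stated assumptions, not the schedule or model used in the experiments: the learning rate, trial counts, and linear ramp are hypothetical, and each trial's outcome is replaced by its expected value (the reinforcement probability) so that the simulation is deterministic. The sketch covers only the size of the prediction errors; it does not model the inference of new states.

```python
def rescorla_wagner(outcomes, value=0.0, learning_rate=0.3):
    """Run the Rescorla-Wagner update V <- V + alpha * (outcome - V).

    Returns the final associative strength and the list of
    prediction errors (outcome - V), one per trial.
    """
    errors = []
    for outcome in outcomes:
        delta = outcome - value          # prediction error on this trial
        value += learning_rate * delta   # update associative strength
        errors.append(delta)
    return value, errors


N_ACQUISITION = 20   # hypothetical number of acquisition trials
N_EXTINCTION = 20    # hypothetical number of extinction trials

# Acquisition: the aversive stimulus follows the cue on every trial.
v_acquired, _ = rescorla_wagner([1.0] * N_ACQUISITION)

# Abrupt extinction: reinforcement stops entirely from the first trial.
abrupt = [0.0] * N_EXTINCTION

# Gradual extinction: expected reinforcement ramps linearly from 1 to 0.
gradual = [1.0 - t / (N_EXTINCTION - 1) for t in range(N_EXTINCTION)]

v_abrupt, pe_abrupt = rescorla_wagner(abrupt, value=v_acquired)
v_gradual, pe_gradual = rescorla_wagner(gradual, value=v_acquired)

max_pe_abrupt = max(abs(d) for d in pe_abrupt)
max_pe_gradual = max(abs(d) for d in pe_gradual)

print("largest |prediction error|, abrupt: ", round(max_pe_abrupt, 3))
print("largest |prediction error|, gradual:", round(max_pe_gradual, 3))
```

Under these assumptions the gradual schedule keeps every prediction error small while still driving the associative strength down, whereas the abrupt schedule begins with an error nearly as large as the learned value itself.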
Studies suggest that dopaminergic neurons report a unitary, global reward prediction error signal. However, learning in complex real-life tasks, in particular tasks that show hierarchical structure, requires multiple prediction errors that may coincide in time. We used functional neuroimaging to measure prediction error signals in humans performing such a hierarchical task involving simultaneous, uncorrelated prediction errors. Analysis of signals in a priori anatomical regions of interest in the ventral striatum and the ventral tegmental area indeed evidenced two simultaneous, but separable, prediction error signals corresponding to the two levels of hierarchy in the task. This result suggests that suitably designed tasks may reveal a more intricate pattern of firing in dopaminergic neurons. Moreover, the need for downstream separation of these signals implies possible limitations on the number of different task levels that we can learn about simultaneously.
Recognizing when the world changes is fundamental for normal learning. In this issue of Neuron, Bradfield et al. (2013) show that cholinergic interneurons in dorsomedial striatum are critical to the process whereby new states of the world are appropriately registered and retrieved during associative learning.
We thought we had figured out dopamine, a neuromodulator involved in everything from learning to addiction. But the finding that dopamine levels ramp up as rats navigate to a reward may overthrow current theories. See Letter p.575
Theoretical models of unsupervised category learning postulate that humans “invent” categories to accommodate new patterns, but tend to group stimuli into a small number of categories. This “Occam's razor” principle is motivated by normative rules of statistical inference. If categories influence perception, then one should find effects of category invention on simple perceptual estimation. In a series of experiments, we tested this prediction by asking participants to estimate the number of colored circles on a computer screen, with the number of circles drawn from a color-specific distribution. When the distributions associated with each color overlapped substantially, participants' estimates were biased toward values intermediate between the two means, indicating that subjects ignored the color of the circles and grouped different-colored stimuli into one perceptual category. These data suggest that humans favor simpler explanations of sensory inputs. In contrast, when the distributions associated with each color overlapped minimally, the bias was reduced (i.e., the estimates for each color were closer to the true means), indicating that sensory evidence for more complex explanations can override the simplicity bias. We present a rational analysis of our task, showing how these qualitative patterns can arise from Bayesian computations.
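The qualitative pattern can be sketched with a small Bayesian model comparison. This is not the rational analysis presented in the paper: it swaps in a fixed comparison between a one-category and a two-category hypothesis, with Gaussian observations of known variance and a Gaussian prior on each category mean. The prior, the noise variance, the sample offsets, and all function names are hypothetical, and the two hypotheses are given equal prior probability.

```python
import math


def log_marginal(xs, prior_mean, prior_var, noise_var):
    """Log marginal likelihood of xs under x_i ~ N(mu, noise_var),
    with mu ~ N(prior_mean, prior_var) integrated out."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # within-sample sum of squares
    total_var = noise_var + n * prior_var
    return (-0.5 * n * math.log(2 * math.pi * noise_var)
            + 0.5 * math.log(noise_var / total_var)
            - ss / (2 * noise_var)
            - n * (xbar - prior_mean) ** 2 / (2 * total_var))


def posterior_mean(xs, prior_mean, prior_var, noise_var):
    """Posterior mean of mu given xs (conjugate Gaussian update)."""
    precision = 1 / prior_var + len(xs) / noise_var
    return (prior_mean / prior_var + sum(xs) / noise_var) / precision


def estimate_first_color(mean_a, mean_b, prior_mean=50.0,
                         prior_var=400.0, noise_var=100.0):
    """Return (P(one category | data), model-averaged estimate for color A)."""
    offsets = [-10, -5, 0, 5, 10]            # hypothetical fixed samples
    a = [mean_a + o for o in offsets]
    b = [mean_b + o for o in offsets]
    args = (prior_mean, prior_var, noise_var)
    log_one = log_marginal(a + b, *args)     # both colors share one mean
    log_two = log_marginal(a, *args) + log_marginal(b, *args)
    p_one = 1 / (1 + math.exp(log_two - log_one))
    estimate = (p_one * posterior_mean(a + b, *args)
                + (1 - p_one) * posterior_mean(a, *args))
    return p_one, estimate


p_overlap, est_overlap = estimate_first_color(48.0, 52.0)    # heavy overlap
p_separate, est_separate = estimate_first_color(30.0, 70.0)  # little overlap
print("overlapping: P(one category) =", round(p_overlap, 3),
      "estimate for color A =", round(est_overlap, 2))
print("separated:   P(one category) =", round(p_separate, 3),
      "estimate for color A =", round(est_separate, 2))
```

With heavily overlapping distributions the simpler one-category hypothesis is favored and the estimate for the first color is pulled toward the midpoint of the two means. With well-separated distributions the two-category hypothesis dominates and the estimate stays close to that color's own mean.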
This page contains links to original data from experiments run at the Princeton Neuroscience Institute. These data are available to others for educational purposes. If they are used in publications, please cite the source of the data by indicating the published reference and the address of this website.