Arguably, the most difficult part of learning is deciding what to learn about. Should I associate the positive outcome of safely completing a street-crossing with the situation ‘the car approaching the crosswalk was red' or with ‘the approaching car was slowing down'? In this Perspective, we summarize our recent research into the computational and neural underpinnings of ‘representation learning'—how humans (and other animals) construct task representations that allow efficient learning and decision-making. We first discuss the problem of learning what to ignore when confronted with too much information, so that experience can properly generalize across situations. We then turn to the problem of augmenting perceptual information with inferred latent causes that embody unobservable task-relevant information, such as contextual knowledge. Finally, we discuss recent findings regarding the neural substrates of task representations that suggest the orbitofrontal cortex represents ‘task states', deploying them for decision-making and learning elsewhere in the brain.
Sequential neural activity patterns related to spatial experiences are “replayed” in the hippocampus of rodents during rest. We investigated whether replay of nonspatial sequences can be detected noninvasively in the human hippocampus. Participants underwent functional magnetic resonance imaging (fMRI) while resting after performing a decision-making task with sequential structure. Hippocampal fMRI patterns recorded at rest reflected sequentiality of previously experienced task states, with consecutive patterns corresponding to nearby states. Hippocampal sequentiality correlated with the fidelity of task representations recorded in the orbitofrontal cortex during decision-making, which were themselves related to better task performance. Our findings suggest that hippocampal replay may be important for building representations of complex, abstract tasks elsewhere in the brain and establish feasibility of investigating fast replay signals with fMRI.
Rationale. Depression is a disorder characterized by sustained negative affect and blunted positive affect, suggesting potential abnormalities in reward learning and its interaction with episodic memory. Objectives. This study investigated how reward prediction errors experienced during learning modulate memory for rewarding events in individuals with depressive and non-depressive symptoms. Methods. Across three experiments, participants learned the average values of two scene categories in two learning contexts. Each learning context had either high or low outcome variance, allowing us to test the effects of small and large prediction errors on learning and memory. Participants were later tested for their memory of trial-unique scenes that appeared alongside outcomes. We compared learning and memory performance of individuals with self-reported depressive symptoms (N = 101) to those without (N = 184). Results. Although there were no overall differences in reward learning between the depressive and non-depressive groups, depression severity within the depressive group predicted greater error in estimating the values of the scene categories. Similarly, there were no overall differences in memory performance. However, negative prediction errors enhanced episodic memory more than positive prediction errors did in depressive participants, whereas non-depressive participants showed the opposite pattern, with a larger effect of positive prediction errors on memory. These results reflected differences in memory both within and across groups. Conclusions. Individuals with self-reported depressive symptoms showed relatively intact reinforcement learning, but demonstrated a bias for encoding events that accompanied surprising negative outcomes versus surprising positive ones. We discuss a potential neural mechanism supporting these effects, which may underlie or contribute to the excessive negative affect observed in depression.
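The design logic of the variance manipulation can be illustrated with a simple delta-rule learner: the same learning process experiences larger unsigned prediction errors in a high-variance context. This is a generic sketch; the means, variances, and learning rate below are made up for illustration, not the study's actual values.

```python
import numpy as np

# Illustrative sketch: a delta-rule learner estimating a category's average
# value experiences larger unsigned prediction errors under high outcome
# variance. All parameter values here are hypothetical.

rng = np.random.default_rng(0)

def run_context(mean, sd, alpha=0.2, n=80):
    v, pes = 0.0, []
    for _ in range(n):
        outcome = rng.normal(mean, sd)
        pe = outcome - v                 # trial-wise prediction error
        pes.append(pe)
        v += alpha * pe                  # incremental update of the estimate
    return np.mean(np.abs(pes))

low = run_context(mean=50, sd=2)         # low outcome variance context
high = run_context(mean=50, sd=15)       # high outcome variance context
print(low < high)                        # high variance yields larger |PE|s
```

The study's question is then how these small versus large (and signed) errors modulate encoding of the trial-unique scenes presented alongside the outcomes.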
Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or excess value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies in which we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.
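The model-free error term and the cached value it produces can be sketched with a minimal Rescorla–Wagner-style update (a generic illustration of the theory under test, not the paper's model; the learning rate is an assumed value):

```python
# Minimal sketch of a model-free prediction-error update (Rescorla-Wagner
# style). Parameter values are illustrative assumptions.

def update_value(v, reward, alpha=0.1):
    """Return the prediction error and the updated cached value of a cue."""
    delta = reward - v          # reward prediction error: actual minus expected
    return delta, v + alpha * delta

v = 0.0
for _ in range(100):            # repeated pairings of a cue with reward = 1
    delta, v = update_value(v, reward=1.0)
print(round(v, 3))              # cached value converges toward 1.0
```

On this account, a dopamine transient paired with a cue should endow the cue with cached value `v`; the experiments test exactly this prediction and find the cues instead enter value-independent associations.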
Compact representations of the environment allow humans to behave efficiently in a complex world. Reinforcement learning models capture many behavioral and neural effects but do not explain recent findings showing that structure in the environment influences learning. In parallel, Bayesian cognitive models predict how humans learn structured knowledge but do not have a clear neurobiological implementation. We propose an integration of these two model classes in which structured knowledge learned via approximate Bayesian inference acts as a source of selective attention. In turn, selective attention biases reinforcement learning towards relevant dimensions of the environment. An understanding of structure learning will help to resolve the fundamental challenge in decision science: explaining why people make the decisions they do.
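The proposed integration, in which structured knowledge directs selective attention and attention in turn biases reinforcement learning, can be sketched as a feature-based learner whose updates are scaled by per-dimension attention weights. This is an illustrative toy, not the authors' model: the softmax attention rule and all parameter values are assumptions.

```python
import numpy as np

# Sketch of attention-weighted feature RL. The attention rule (softmax over
# absolute learned weights) is one common choice, assumed here for illustration.

rng = np.random.default_rng(0)
n_dims = 3
w = np.zeros(n_dims)                      # learned value of each feature dimension
alpha, beta = 0.3, 5.0                    # learning rate, attention sharpness

def attention(w):
    a = np.exp(beta * np.abs(w))
    return a / a.sum()                    # attention concentrates on predictive dims

for _ in range(500):
    x = rng.integers(0, 2, n_dims).astype(float)   # binary features of a stimulus
    r = x[0]                              # only dimension 0 predicts reward
    phi = attention(w)
    v = np.dot(phi * w, x)                # attention-weighted value estimate
    delta = r - v
    w += alpha * delta * phi * x          # learning biased toward attended dims

print(np.argmax(np.abs(w)))               # the relevant dimension dominates
```

The key design choice is the loop between the two systems: what has been learned determines where attention goes, and attention determines what is learned next.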
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive to not only whether the choice itself was suboptimal but also whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors versus when execution was successful, but reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction errors in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
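The attenuation of negative prediction errors after execution failures can be captured by gating the error term during credit assignment. The sketch below is hypothetical: the gating parameter and all values are illustrative, not the paper's fitted model.

```python
# Hypothetical sketch: negative prediction errors are down-weighted when a
# nonreward is attributable to a motor execution error rather than the choice.

def gated_update(v, reward, execution_error, alpha=0.2, gate=0.3):
    delta = reward - v
    if delta < 0 and execution_error:
        delta *= gate            # attenuate: the choice itself is less to blame
    return v + alpha * delta

v_exec = v_choice = 0.8          # same expectation in both scenarios
v_exec = gated_update(v_exec, reward=0.0, execution_error=True)
v_choice = gated_update(v_choice, reward=0.0, execution_error=False)
print(v_exec, v_choice)          # value drops less after an execution failure
```

Behaviorally, this predicts exactly the observed tolerance for non-rewarded outcomes that follow execution errors, and neurally a smaller striatal negative-prediction-error signal on those trials.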
The orbitofrontal cortex (OFC) has long been implicated in signaling information about expected outcomes to facilitate adaptive or flexible behavior. Current proposals focus on signaling of expected value versus the representation of a value-agnostic cognitive map of the task. While often suggested as mutually exclusive, these alternatives may represent extreme ends of a continuum determined by task complexity and experience. As learning proceeds, an initial, detailed cognitive map might be acquired, based largely on external information. With more experience, this hypothesized map can then be tailored to include relevant abstract hidden cognitive constructs. The map would default to an expected value in situations where other attributes are largely irrelevant, but, in richer tasks, a more detailed structure might continue to be represented, at least where relevant to behavior. Here, we examined this by recording single-unit activity from the OFC in rats navigating an odor sequence task analogous to a spatial maze. The odor sequences provided a mappable state space, with 24 unique “positions” defined by sensory information, likelihood of reward, or both. Consistent with the hypothesis that the OFC represents a cognitive map tailored to the subjects' intentions or plans, we found a close correspondence between how subjects were using the sequences and the neural representations of the sequences in OFC ensembles. Multiplexed with this value-invariant representation of the task, we also found a representation of the expected value at each location. Thus, the value and task structure co-existed as dissociable components of the neural code in OFC.
Rationale. Pairing rewarding outcomes with audiovisual cues in simulated gambling games increases risky choice in both humans and rats. However, the cognitive mechanism through which this sensory enhancement biases decision-making is unknown. Objectives. To assess the computational mechanisms that promote risky choice during gambling, we applied a series of reinforcement learning models to a large dataset of choices acquired from rats as they each performed one of two variants of a rat gambling task (rGT), in which rewards on “win” trials were delivered either with or without salient audiovisual cues. Methods. We used a sampling technique based on Markov chain Monte Carlo to obtain posterior estimates of model parameters for a series of RL models of increasing complexity, in order to assess the relative contribution of learning about positive and negative outcomes to the latent valuation of each choice option on the cued and uncued rGT. Results. Rats that develop a preference for the risky options on the rGT substantially down-weight the equivalent cost of the time-out punishments during these tasks. For each model tested, the reduction in learning from the negative time-outs correlated with the degree of risk preference in individual rats. We found no apparent relationship between risk preference and the parameters that govern learning from the positive rewards. Conclusions. The emergence of risk-preferring choice on the rGT derives from a relative insensitivity to the cost of the time-out punishments, as opposed to a relative hypersensitivity to rewards. This hyposensitivity to punishment is more likely to be induced in individual rats by the addition of salient audiovisual cues to rewards delivered on win trials.
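The asymmetry the models test for can be sketched with separate learning rates for rewards and time-out punishments. This toy omits the hierarchical MCMC fitting the study actually used, and its parameter values are illustrative, not posterior estimates.

```python
# Toy sketch of asymmetric learning from rewards vs. time-out punishments.
# Parameter names and values are illustrative assumptions.

def q_update(q, outcome, alpha_gain=0.2, alpha_loss=0.05):
    """A low alpha_loss under-weights punishments, as in risk-preferring rats."""
    delta = outcome - q
    alpha = alpha_gain if delta >= 0 else alpha_loss
    return q + alpha * delta

q = 0.0
outcomes = [1.0, 1.0, -1.0, 1.0, -1.0]   # wins (+1) and time-outs (-1)
for o in outcomes:
    q = q_update(q, o)
print(round(q, 3))   # stays positive: punishments barely dent the valuation
```

With `alpha_loss` well below `alpha_gain`, a risky option retains a positive latent value despite frequent time-outs, which is the pattern the fitted models attribute to risk-preferring, cue-exposed rats.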
The activity of neural populations in the brains of humans and animals can exhibit vastly different spatial patterns when faced with different tasks or environmental stimuli. The degrees of similarity between these neural activity patterns in response to different events are used to characterize the representational structure of cognitive states in a neural population. The dominant methods of investigating this similarity structure first estimate neural activity patterns from noisy neural imaging data using linear regression, and then examine the similarity between the estimated patterns. Here, we show that this approach introduces spurious bias structure in the resulting similarity matrix, in particular when applied to fMRI data. This problem is especially severe when the signal-to-noise ratio is low and in cases where experimental conditions cannot be fully randomized in a task. We propose Bayesian Representational Similarity Analysis (BRSA), an alternative method for computing representational similarity, in which we treat the covariance structure of neural activity patterns as a hyper-parameter in a generative model of the neural data. By marginalizing over the unknown activity patterns, we can directly estimate this covariance structure from imaging data. This method offers significant reductions in bias and allows estimation of neural representational similarity with previously unattained levels of precision at low signal-to-noise ratio, without losing the possibility of deriving an interpretable distance measure from the estimated similarity. The method is closely related to the Pattern Component Model (PCM), but instead of modeling the estimated neural patterns as in PCM, BRSA models the imaging data directly and is suited for analyzing data in which the order of task conditions is not fully counterbalanced. The probabilistic framework allows for jointly analyzing data from a group of participants.
The method can also simultaneously estimate a signal-to-noise ratio map that shows where the learned representational structure is supported more strongly. Both this map and the learned covariance matrix can be used as a structured prior for maximum a posteriori estimation of neural activity patterns, which can be further used for fMRI decoding. Our method therefore paves the way towards a more unified and principled analysis of neural representations underlying fMRI signals. We make our tool freely available in Brain Imaging Analysis Kit (BrainIAK).
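The bias in the naive two-step approach can be demonstrated with a small simulation: when regressors are correlated in time and the data contain no signal at all, the similarity matrix of the regression-estimated patterns is still structured, because the estimates inherit the covariance of the design. This is an illustrative simulation, not BRSA itself; all sizes and values are arbitrary.

```python
import numpy as np

# Sketch of the bias described above: with zero true signal, the correlation
# structure of regression-estimated patterns is not flat but inherits
# structure from the design matrix. Illustrative simulation only.

rng = np.random.default_rng(1)
n_t, n_cond, n_vox = 200, 4, 50
X = rng.normal(size=(n_t, n_cond))            # design matrix, one column per condition
X[:, 1] = 0.7 * X[:, 0] + 0.3 * X[:, 1]       # conditions 0 and 1 overlap in time
Y = rng.normal(size=(n_t, n_vox))             # pure noise: no neural signal at all
B = np.linalg.lstsq(X, Y, rcond=None)[0]      # estimated "activity patterns"
sim = np.corrcoef(B)                          # naive representational similarity
print(round(sim[0, 1], 2))                    # spuriously nonzero off-diagonal
```

Because the estimated patterns have covariance proportional to the inverse of the design's Gram matrix, temporally correlated conditions yield spuriously anticorrelated pattern estimates; BRSA avoids this by marginalizing over the unknown patterns rather than point-estimating them.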
Reinforcement learning theory provides a powerful set of computational ideas for modeling human learning and decision making. Reinforcement learning algorithms rely on state representations that enable efficient behavior by focusing only on aspects relevant to the task at hand. Forming such representations often requires selective attention to the sensory environment, and recalling memories of relevant past experiences. A striking range of psychiatric disorders, including bipolar disorder and schizophrenia, involve changes in these cognitive processes. We review and discuss evidence that these changes can be cast as altered state representation, with the goal of providing a useful transdiagnostic dimension along which mental disorders can be understood and compared.
Translating advances in neuroscience into benefits for patients with mental illness presents enormous challenges because it involves both the most complex organ, the brain, and its interaction with a similarly complex environment. Dealing with such complexities demands powerful techniques. Computational psychiatry combines multiple levels and types of computation with multiple types of data in an effort to improve understanding, prediction and treatment of mental illness. Computational psychiatry, broadly defined, encompasses two complementary approaches: data driven and theory driven. Data-driven approaches apply machine-learning methods to high-dimensional data to improve classification of disease, predict treatment outcomes or improve treatment selection. These approaches are generally agnostic as to the underlying mechanisms. Theory-driven approaches, in contrast, use models that instantiate prior knowledge of, or explicit hypotheses about, such mechanisms, possibly at multiple levels of analysis and abstraction. We review recent advances in both approaches, with an emphasis on clinical applications, and highlight the utility of combining them.
We review the abstract concept of a ‘state' – an internal representation posited by reinforcement learning theories to be used by an agent, whether animal, human or artificial, to summarize the features of the external and internal environment that are relevant for future behavior on a particular task. Armed with this summary representation, an agent can make decisions and perform actions to interact effectively with the world. Here, we review recent findings from the neurobiological and behavioral literature to ask: ‘what is a state?' with respect to the internal representations that organize learning and decision making across a range of tasks. We find that state representations include information beyond a straightforward summary of the immediate cues in the environment, providing timing or contextual information from the recent or more distant past, which allows these additional factors to influence decision making and other goal-directed behaviors in complex and perhaps unexpected ways.
This page contains links to original data from experiments run at the Princeton Neuroscience Institute. These data are available to others for educational purposes. If they are used in publications, please cite the source of the data by indicating the published reference and the address of this website.