Niv, Y. (2019). Learning task-state representations. Nature Neuroscience, 22(10), 1544–1553.
Arguably, the most difficult part of learning is deciding what to learn about. Should I associate the positive outcome of safely completing a street-crossing with the situation ‘the car approaching the crosswalk was red’ or with ‘the approaching car was slowing down’? In this Perspective, we summarize our recent research into the computational and neural underpinnings of ‘representation learning’: how humans (and other animals) construct task representations that allow efficient learning and decision-making. We first discuss the problem of learning what to ignore when confronted with too much information, so that experience can properly generalize across situations. We then turn to the problem of augmenting perceptual information with inferred latent causes that embody unobservable task-relevant information, such as contextual knowledge. Finally, we discuss recent findings regarding the neural substrates of task representations that suggest the orbitofrontal cortex represents ‘task states’, deploying them for decision-making and learning elsewhere in the brain.
Schuck, N. W., & Niv, Y. (2019). Sequential replay of nonspatial task states in the human hippocampus. Science.
Sequential neural activity patterns related to spatial experiences are “replayed” in the hippocampus of rodents during rest. We investigated whether replay of nonspatial sequences can be detected noninvasively in the human hippocampus. Participants underwent functional magnetic resonance imaging (fMRI) while resting after performing a decision-making task with sequential structure. Hippocampal fMRI patterns recorded at rest reflected sequentiality of previously experienced task states, with consecutive patterns corresponding to nearby states. Hippocampal sequentiality correlated with the fidelity of task representations recorded in the orbitofrontal cortex during decision-making, which were themselves related to better task performance. Our findings suggest that hippocampal replay may be important for building representations of complex, abstract tasks elsewhere in the brain and establish feasibility of investigating fast replay signals with fMRI.
Rouhani, N., & Niv, Y. (2019). Depressive symptoms bias the prediction-error enhancement of memory towards negative events in reinforcement learning. Psychopharmacology, 236(8), 2425–2435.
Rationale. Depression is a disorder characterized by sustained negative affect and blunted positive affect, suggesting potential abnormalities in reward learning and its interaction with episodic memory. Objectives. This study investigated how reward prediction errors experienced during learning modulate memory for rewarding events in individuals with depressive and non-depressive symptoms. Methods. Across three experiments, participants learned the average values of two scene categories in two learning contexts. Each learning context had either high or low outcome variance, allowing us to test the effects of small and large prediction errors on learning and memory. Participants were later tested for their memory of trial-unique scenes that appeared alongside outcomes. We compared learning and memory performance of individuals with self-reported depressive symptoms (N = 101) to those without (N = 184). Results. Although there were no overall differences in reward learning between the depressive and non-depressive groups, depression severity within the depressive group predicted greater error in estimating the values of the scene categories. Similarly, there were no overall differences in memory performance. However, in depressive participants, negative prediction errors enhanced episodic memory more than positive prediction errors did, and vice versa for non-depressive participants, who showed a larger effect of positive prediction errors on memory. These results reflected differences in memory both within group and across groups. Conclusions. Individuals with self-reported depressive symptoms showed relatively intact reinforcement learning, but demonstrated a bias for encoding events that accompanied surprising negative outcomes versus surprising positive ones. We discuss a potential neural mechanism supporting these effects, which may underlie or contribute to the excessive negative affect observed in depression.
Sharpe, M. J., Batchelor, H. M., Mueller, L. E., Chang, C. Y., Maes, E. J. P., Niv, Y., & Schoenbaum, G. (2019). Dopamine transients delivered in learning contexts do not act as model-free prediction errors. bioRxiv.
Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or excess value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.
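The model-free account tested in these experiments can be stated in a few lines. The sketch below is a minimal Rescorla–Wagner-style illustration, assuming a single cue, a single learning rate `alpha`, and scalar rewards; the function name `cache_value` is hypothetical and this is not the model used in the study. The point it shows is that the cue ends up holding only a number, with no record of which reward produced it.

```python
# Minimal sketch of model-free "cached value" (assumption: one cue, one
# learning rate, scalar rewards; not the model fitted in the study).

def cache_value(rewards, alpha=0.2, v0=0.0):
    """Return the cue's cached value after each trial.

    On every trial the prediction error delta = r - V is added (scaled by
    alpha) to the cue's value, so the cue becomes valuable in its own right,
    without storing which rewarding event produced the error.
    """
    v = v0
    history = []
    for r in rewards:
        delta = r - v          # reward prediction error
        v += alpha * delta     # cached, outcome-agnostic value
        history.append(v)
    return history

values = cache_value([1.0] * 30)
print(round(values[-1], 3))
```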
Radulescu, A., Niv, Y., & Ballard, I. (2019). Holistic Reinforcement Learning: The Role of Structure and Attention. Trends in Cognitive Sciences.
Compact representations of the environment allow humans to behave efficiently in a complex world. Reinforcement learning models capture many behavioral and neural effects but do not explain recent findings showing that structure in the environment influences learning. In parallel, Bayesian cognitive models predict how humans learn structured knowledge but do not have a clear neurobiological implementation. We propose an integration of these two model classes in which structured knowledge learned via approximate Bayesian inference acts as a source of selective attention. In turn, selective attention biases reinforcement learning towards relevant dimensions of the environment. An understanding of structure learning will help to resolve the fundamental challenge in decision science: explaining why people make the decisions they do.
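One common way to let attention bias reinforcement learning is to weight each stimulus dimension's contribution to value, and to its update, by an attention weight. The sketch below assumes a hypothetical task with two dimensions of two features each, fixed attention weights `attention`, and reward that depends only on dimension 0; in the proposal above attention would itself come from Bayesian structure learning, which is not modeled here.

```python
# Sketch: selective attention biasing reinforcement learning toward relevant
# stimulus dimensions. Assumptions (hypothetical): two dimensions with two
# features each, value = attention-weighted sum of feature weights, and
# attention held fixed rather than inferred.

def value(weights, attention, stimulus):
    """Attention-weighted value: V = sum_d phi_d * W[d][feature_d]."""
    return sum(a * w[f] for a, w, f in zip(attention, weights, stimulus))

def learn(trials, attention, alpha=0.1):
    weights = [[0.0, 0.0], [0.0, 0.0]]
    for stimulus, reward in trials:
        delta = reward - value(weights, attention, stimulus)
        # each feature's update is scaled by the attention to its dimension
        for d, f in enumerate(stimulus):
            weights[d][f] += alpha * attention[d] * delta
    return weights

# Reward depends only on dimension 0 (its feature 0 is rewarded).
stimuli = [(0, 0), (0, 1), (1, 0), (1, 1)]
trials = [(s, 1.0 if s[0] == 0 else 0.0) for s in stimuli] * 400
attention = [0.9, 0.1]   # attention concentrated on the relevant dimension
w = learn(trials, attention)
for s in stimuli:
    print(s, round(value(w, attention, s), 2))
```

Because the update is scaled by attention, almost all of the learning lands on the attended dimension's weights.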
McDougle, S. D., Butcher, P. A., Parvin, D. E., Mushtaq, F., Niv, Y., Ivry, R. B., & Taylor, J. A. (2019). Neural Signatures of Prediction Errors in a Decision-Making Task Are Modulated by Action Execution Failures. Current Biology.
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive to not only whether the choice itself was suboptimal but also whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors than when execution was successful but reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction errors in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
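The behavioral result can be illustrated with a value update in which negative prediction errors are partly discounted when the outcome is attributed to an execution failure. This is a hypothetical sketch: the gating parameter `kappa` and the function `update` are assumptions for illustration, not the parameterization used in the study.

```python
# Hypothetical sketch: negative prediction errors attenuated after execution
# failures. `kappa` (0 = no attenuation, 1 = full) is an illustrative
# assumption, not the model fitted in the study.

def update(q, reward, execution_error, alpha=0.3, kappa=0.8):
    delta = reward - q                     # reward prediction error
    if execution_error and delta < 0:
        delta *= (1 - kappa)               # credit the miss to the motor slip, not the choice
    return q + alpha * delta

q0 = 0.8
q_after_miss = update(q0, reward=0.0, execution_error=True)       # execution failed
q_after_withheld = update(q0, reward=0.0, execution_error=False)  # executed, but no reward
print(q_after_miss, q_after_withheld)
```

With these settings the chosen option loses much less value after a miss than after a successfully executed but unrewarded choice.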
Zhou, J., Gardner, M. P. H., Stalnaker, T. A., Ramus, S. J., Wikenheiser, A. M., Niv, Y., & Schoenbaum, G. (2019). Rat Orbitofrontal Ensemble Activity Contains Multiplexed but Dissociable Representations of Value and Task Structure in an Odor Sequence Task. Current Biology, 29(6), 897–907.e3.
The orbitofrontal cortex (OFC) has long been implicated in signaling information about expected outcomes to facilitate adaptive or flexible behavior. Current proposals focus on signaling of expected value versus the representation of a value-agnostic cognitive map of the task. While often suggested as mutually exclusive, these alternatives may represent extreme ends of a continuum determined by task complexity and experience. As learning proceeds, an initial, detailed cognitive map might be acquired, based largely on external information. With more experience, this hypothesized map can then be tailored to include relevant abstract hidden cognitive constructs. The map would default to an expected value in situations where other attributes are largely irrelevant, but, in richer tasks, a more detailed structure might continue to be represented, at least where relevant to behavior. Here, we examined this by recording single-unit activity from the OFC in rats navigating an odor sequence task analogous to a spatial maze. The odor sequences provided a mappable state space, with 24 unique “positions” defined by sensory information, likelihood of reward, or both. Consistent with the hypothesis that the OFC represents a cognitive map tailored to the subjects' intentions or plans, we found a close correspondence between how subjects were using the sequences and the neural representations of the sequences in OFC ensembles. Multiplexed with this value-invariant representation of the task, we also found a representation of the expected value at each location. Thus, the value and task structure co-existed as dissociable components of the neural code in OFC.
Langdon, A. J., Hathaway, B. A., Zorowitz, S., Harris, C. B. W., & Winstanley, C. A. (2019). Relative insensitivity to time-out punishments induced by win-paired cues in a rat gambling task. Psychopharmacology, 236(8), 2543–2556.
Rationale. Pairing rewarding outcomes with audiovisual cues in simulated gambling games increases risky choice in both humans and rats. However, the cognitive mechanism through which this sensory enhancement biases decision-making is unknown. Objectives. To assess the computational mechanisms that promote risky choice during gambling, we applied a series of reinforcement learning models to a large dataset of choices acquired from rats as they each performed one of two variants of a rat gambling task (rGT), in which rewards on “win” trials were delivered either with or without salient audiovisual cues. Methods. We used a sampling technique based on Markov chain Monte Carlo to obtain posterior estimates of model parameters for a series of RL models of increasing complexity, in order to assess the relative contribution of learning about positive and negative outcomes to the latent valuation of each choice option on the cued and uncued rGT. Results. Rats that develop a preference for the risky options on the rGT substantially down-weight the equivalent cost of the time-out punishments during these tasks. For each model tested, the reduction in learning from the negative time-outs correlated with the degree of risk preference in individual rats. We found no apparent relationship between risk preference and the parameters that govern learning from the positive rewards. Conclusions. The emergence of risk-preferring choice on the rGT derives from a relative insensitivity to the cost of the time-out punishments, as opposed to a relative hypersensitivity to rewards. This hyposensitivity to punishment is more likely to be induced in individual rats by the addition of salient audiovisual cues to rewards delivered on win trials.
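Why does reduced learning from time-outs produce risk preference? A simple delta-rule model with separate learning rates for wins and time-outs shows the mechanism. The outcome magnitudes (+4 for a win, −6 for a time-out), the alternating outcome sequence, and the names `eta_win` and `eta_timeout` are hypothetical; this is not one of the models fitted in the paper.

```python
# Sketch: down-weighting learning from time-out punishments inflates the
# learned value of a risky option. Assumptions: hypothetical outcome
# magnitudes and a delta-rule model with separate learning rates.

def learn_value(outcomes, eta_win, eta_timeout, q0=0.0):
    q = q0
    for r in outcomes:
        eta = eta_win if r > 0 else eta_timeout   # pick rate by outcome type
        q += eta * (r - q)                        # delta-rule update
    return q

risky = [4.0, -6.0] * 100   # alternating wins and time-outs: mean -1 per trial

balanced = learn_value(risky, eta_win=0.3, eta_timeout=0.3)
insensitive = learn_value(risky, eta_win=0.3, eta_timeout=0.05)
print(round(balanced, 2), round(insensitive, 2))
```

With equal learning rates the option's value settles below zero, matching its negative mean outcome. With a small time-out learning rate the same option is valued positively, even though learning from rewards is unchanged.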
Cai, M. B., Schuck, N. W., Pillow, J. W., & Niv, Y. (2019). Representational structure or task structure? Bias in neural representational similarity analysis and a Bayesian method for reducing bias. PLoS Computational Biology.
The activity of neural populations in the brains of humans and animals can exhibit vastly different spatial patterns when faced with different tasks or environmental stimuli. The degrees of similarity between these neural activity patterns in response to different events are used to characterize the representational structure of cognitive states in a neural population. The dominant methods of investigating this similarity structure first estimate neural activity patterns from noisy neural imaging data using linear regression, and then examine the similarity between the estimated patterns. Here, we show that this approach introduces spurious bias structure in the resulting similarity matrix, in particular when applied to fMRI data. This problem is especially severe when the signal-to-noise ratio is low and in cases where experimental conditions cannot be fully randomized in a task. We propose Bayesian Representational Similarity Analysis (BRSA), an alternative method for computing representational similarity, in which we treat the covariance structure of neural activity patterns as a hyper-parameter in a generative model of the neural data. By marginalizing over the unknown activity patterns, we can directly estimate this covariance structure from imaging data. This method offers significant reductions in bias and allows estimation of neural representational similarity with previously unattained levels of precision at low signal-to-noise ratio, without losing the possibility of deriving an interpretable distance measure from the estimated similarity. The method is closely related to Pattern Component Model (PCM), but instead of modeling the estimated neural patterns as in PCM, BRSA models the imaging data directly and is suited for analyzing data in which the order of task conditions is not fully counterbalanced. The probabilistic framework allows for jointly analyzing data from a group of participants. 
The method can also simultaneously estimate a signal-to-noise ratio map that shows where the learned representational structure is supported more strongly. Both this map and the learned covariance matrix can be used as a structured prior for maximum a posteriori estimation of neural activity patterns, which can be further used for fMRI decoding. Our method therefore paves the way towards a more unified and principled analysis of neural representations underlying fMRI signals. We make our tool freely available in Brain Imaging Analysis Kit (BrainIAK).
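The bias can be reproduced with a small null simulation. With ordinary least squares, the noise covariance of the estimated patterns is proportional to (XᵀX)⁻¹. When two condition regressors are correlated, their estimated patterns are therefore correlated even when the data contain no signal. The sketch assumes a hypothetical two-regressor design with correlation near 0.8 and pure-noise voxels. It illustrates the problem BRSA addresses, not BRSA itself.

```python
# Null simulation of spurious "representational similarity" from OLS pattern
# estimates. Assumptions: two correlated condition regressors (conditions not
# fully randomized) and voxel data that are pure noise. Not BRSA itself.
import random

random.seed(0)
n_time, n_voxels = 200, 1000

x1 = [random.gauss(0, 1) for _ in range(n_time)]
x2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x1]   # corr(x1, x2) ~ 0.8

# Entries of X^T X for the 2-regressor design
a = sum(v * v for v in x1)
b = sum(u * v for u, v in zip(x1, x2))
c = sum(v * v for v in x2)
det = a * c - b * b

beta1, beta2 = [], []
for _ in range(n_voxels):
    y = [random.gauss(0, 1) for _ in range(n_time)]     # noise only, no signal
    s1 = sum(u * v for u, v in zip(x1, y))
    s2 = sum(u * v for u, v in zip(x2, y))
    beta1.append((c * s1 - b * s2) / det)                # OLS via (X^T X)^-1 X^T y
    beta2.append((-b * s1 + a * s2) / det)

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = sum((p - mu) ** 2 for p in u) ** 0.5
    sv = sum((q - mv) ** 2 for q in v) ** 0.5
    return cov / (su * sv)

# Similarity between the two estimated patterns across voxels
similarity = pearson(beta1, beta2)
print(round(similarity, 2))
```

The estimated similarity comes out strongly negative and close to −b/√(ac), the value implied by (XᵀX)⁻¹. It reflects the task design, not any neural representation.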
Radulescu, A., & Niv, Y. (2019). State representation in mental illness. Current Opinion in Neurobiology.
Reinforcement learning theory provides a powerful set of computational ideas for modeling human learning and decision making. Reinforcement learning algorithms rely on state representations that enable efficient behavior by focusing only on aspects relevant to the task at hand. Forming such representations often requires selective attention to the sensory environment, and recalling memories of relevant past experiences. A striking range of psychiatric disorders, including bipolar disorder and schizophrenia, involve changes in these cognitive processes. We review and discuss evidence that these changes can be cast as altered state representation, with the goal of providing a useful transdiagnostic dimension along which mental disorders can be understood and compared.
Bennett, D., Silverstein, S. M., & Niv, Y. (2019). The two cultures of computational psychiatry. JAMA Psychiatry.
Translating advances in neuroscience into benefits for patients with mental illness presents enormous challenges because it involves both the most complex organ, the brain, and its interaction with a similarly complex environment. Dealing with such complexities demands powerful techniques. Computational psychiatry combines multiple levels and types of computation with multiple types of data in an effort to improve understanding, prediction and treatment of mental illness. Computational psychiatry, broadly defined, encompasses two complementary approaches: data driven and theory driven. Data-driven approaches apply machine-learning methods to high-dimensional data to improve classification of disease, predict treatment outcomes or improve treatment selection. These approaches are generally agnostic as to the underlying mechanisms. Theory-driven approaches, in contrast, use models that instantiate prior knowledge of, or explicit hypotheses about, such mechanisms, possibly at multiple levels of analysis and abstraction. We review recent advances in both approaches, with an emphasis on clinical applications, and highlight the utility of combining them.
Langdon, A. J., Song, M., & Niv, Y. (2019). Uncovering the ‘state’: Tracing the hidden state representations that structure learning and decision-making. Behavioural Processes, 167, 103891.
We review the abstract concept of a ‘state’ – an internal representation posited by reinforcement learning theories to be used by an agent, whether animal, human or artificial, to summarize the features of the external and internal environment that are relevant for future behavior on a particular task. Armed with this summary representation, an agent can make decisions and perform actions to interact effectively with the world. Here, we review recent findings from the neurobiological and behavioral literature to ask: ‘what is a state?’ with respect to the internal representations that organize learning and decision making across a range of tasks. We find that state representations include information beyond a straightforward summary of the immediate cues in the environment, providing timing or contextual information from the recent or more distant past, which allows these additional factors to influence decision making and other goal-directed behaviors in complex and perhaps unexpected ways.
Niv, Y. (2018). Deep down, you are a scientist. In Think tank: Forty neuroscientists explore the biological roots of human experience.
You may not know it, but deep down you are a scientist. To be precise, your brain is a scientist, and a good one, too: the kind of scientist that makes clear hypotheses, gathers data from several sources, and then reaches a well-founded conclusion. Although we are not aware of the scientific experimentation occurring in our brain on a momentary basis, the scientific process is fundamental to how our brain works. This scientific process involves three key components. First: hypotheses. Our brain makes hypotheses, or predictions, all the time. The second component of good scientific work is gathering data, testing the hypothesis by comparing it to evidence. Neuroscientists gather data to test theories about how the brain works from several sources, for example behavior, invasive recordings of the activity of single cells in the brain, and noninvasive imaging of overall activity in large areas of the brain. Finally, after making precise, well-founded predictions and gathering data from all available sources, a scientist must interpret the empirical observations. It is important to realize that perceived reality is subjective (it is interpreted) rather than an objective image of the world out there. And in some cases this interpretation can break down. For instance, in schizophrenia, meaningless events and distractors can take on outsized meaning in subjective interpretation, leading to hallucinations, delusions, and paranoia. Our memories are similarly a reflection of our own interpretations rather than a true record of events.
Rouhani, N., Norman, K. A., & Niv, Y. (2018). Dissociable effects of surprising rewards on learning and memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(9), 1430–1443.
The extent to which rewards deviate from learned expectations is tracked by a signal known as a reward prediction error, but it is unclear how this signal interacts with episodic memory. Here, we investigated whether learning in a high-risk environment, with frequent large prediction errors, gives rise to higher fidelity memory traces than learning in a low-risk environment. In Experiment 1, we showed that higher magnitude prediction errors, positive or negative, improved recognition memory for trial-unique items. Participants also increased their learning rate after large prediction errors. In addition, there was an overall higher learning rate in the low-risk environment. Although unsigned prediction errors enhanced memory and increased learning rate, we did not find a relationship between learning rate and memory, suggesting that these two effects were due to separate underlying mechanisms. In Experiment 2, we replicated these results with a longer task that posed stronger memory demands and allowed for more learning. We also showed improved source and sequence memory for high-risk items. In Experiment 3, we controlled for the difficulty of learning in the two risk environments, again replicating the previous results. Moreover, equating the range of prediction errors in the two risk environments revealed that learning in a high-risk context enhanced episodic memory above and beyond the effect of prediction errors on individual items. In summary, our results across three studies showed that (absolute) prediction error magnitude boosted both episodic memory and incremental learning, but the two effects were not correlated, suggesting distinct underlying systems.
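A learning rate that rises after large prediction errors is often modeled with a Pearce–Hall-style hybrid rule. In that rule, the learning rate (associability) tracks a running average of recent unsigned prediction errors. The sketch below assumes outcomes scaled to [0, 1] and uses hypothetical parameter names (`alpha0`, `eta`). It is one standard way to express the effect, not necessarily the model used in the paper.

```python
# Hypothetical Pearce-Hall-style hybrid: the learning rate tracks recent
# unsigned prediction errors. Assumes outcomes scaled to [0, 1].

def pearce_hall(outcomes, v0=0.5, alpha0=0.3, eta=0.3):
    v, alpha = v0, alpha0
    values, alphas = [], []
    for r in outcomes:
        delta = r - v
        alphas.append(alpha)                          # learning rate used on this trial
        v += alpha * delta                            # value update
        values.append(v)
        alpha = eta * abs(delta) + (1 - eta) * alpha  # large |delta| -> larger next alpha
    return values, alphas

# 20 fully predicted outcomes, then one surprising outcome.
values, alphas = pearce_hall([0.5] * 20 + [1.0, 0.5])
print(round(alphas[20], 4), round(alphas[21], 4))
```

While outcomes are fully predicted, the learning rate decays toward zero. The single surprising outcome then raises it sharply for the next trial.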
Sharpe, M. J., Chang, C. Y., Liu, M. A., Batchelor, H. M., Mueller, L. E., Jones, J. L., Niv, Y., et al. (2018). Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nature Neuroscience, 21(10), 1493.
Learning to predict reward is thought to be driven by dopaminergic prediction errors, which reflect discrepancies between actual and expected value. Here the authors show that learning to predict neutral events is also driven by prediction errors and that such value-neutral associative learning is also likely mediated by dopaminergic error signals.
Sharpe, M. J., Stalnaker, T. A., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2018). An Integrated Model of Action Selection: Distinct Modes of Cortical Control of Striatal Decision Making. Annual Review of Psychology.
Making decisions in environments with few choice options is easy. We select the action that results in the most valued outcome. Making decisions in more complex environments, where the same action can produce different outcomes in different conditions, is much harder. In such circumstances, we propose that accurate action selection relies on top-down control from the prelimbic and orbitofrontal cortices over striatal activity through distinct thalamostriatal circuits. We suggest that the prelimbic cortex exerts direct influence over medium spiny neurons in the dorsomedial striatum to represent the state space relevant to the current environment. Conversely, the orbitofrontal cortex is argued to track a subject's position within that state space, likely through modulation of cholinergic interneurons.
Langdon, A. J., Sharpe, M. J., Schoenbaum, G., & Niv, Y. (2018). Model-based predictions for dopamine. Current Opinion in Neurobiology, 49, 1–7.
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning.
Hermsdorff, G. B., Pereira, T., & Niv, Y. (2018). Quantifying Humans' Priors Over Graphical Representations of Tasks. In Springer Proceedings in Complexity (pp. 281–290).
Some new tasks are trivial to learn while others are almost impossible; what determines how easy it is to learn an arbitrary task? Similar to how our prior beliefs about new visual scenes color our perception of new stimuli, our priors about the structure of new tasks shape our learning and generalization abilities [2]. While quantifying visual priors has led to major insights on how our visual system works [5,10,11], quantifying priors over tasks remains a formidable goal, as it is not even clear how to define a task [4]. Here, we focus on tasks that have a natural mapping to graphs. We develop a method to quantify humans' priors over these “task graphs”, combining new modeling approaches with Markov chain Monte Carlo with people, MCMCP (a process whereby an agent learns from data generated by another agent, recursively [9]). We show that our method recovers priors more accurately than a standard MCMC sampling approach. Additionally, we propose a novel low-dimensional “smooth” parametrization of probability distributions over graphs (smooth in the sense that graphs that differ by fewer edges are given similar probabilities) that allows for more accurate recovery of the prior and better generalization. We have also created an online experiment platform that gamifies our MCMCP algorithm and allows subjects to interactively draw the task graphs. We use this platform to collect human data on several navigation and social interaction tasks. We show that priors over these tasks have non-trivial structure, deviating significantly from null models that are insensitive to the graphical information. The priors also differ notably between the navigation and social domains, with fewer differences between cover stories within the same domain. Finally, we extend our framework to the more general case of quantifying priors over exchangeable random structures.
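The logic behind sampling a prior with people can be simulated. A "participant" repeatedly chooses between the current graph and a proposed one. If those choices follow the Luce/Barker rule p(new) / (p(new) + p(current)), and the proposals are symmetric, the chain's stationary distribution is the participant's prior. The sketch assumes a hypothetical prior over undirected graphs on three nodes that favors sparse graphs (p ∝ exp(−number of edges)) and a proposal that flips one edge. It illustrates the sampling principle, not the paper's parametrization or platform.

```python
# Sketch of the MCMC-with-people idea with a simulated participant.
# Assumptions: hypothetical prior p(graph) ∝ exp(-#edges) over undirected
# graphs on 3 nodes, a symmetric one-edge-flip proposal, and choices that
# follow the Barker/Luce rule. The chain's stationary distribution is then
# the prior.
import itertools
import math
import random
from collections import Counter

EDGES = list(itertools.combinations(range(3), 2))   # the 3 possible edges

def prior_weight(graph):
    return math.exp(-len(graph))                    # favors sparse graphs

def mcmcp(n_steps, seed=1):
    rng = random.Random(seed)
    current = frozenset()
    counts = Counter()
    for _ in range(n_steps):
        proposal = current ^ {rng.choice(EDGES)}    # flip one edge (symmetric)
        p_new, p_cur = prior_weight(proposal), prior_weight(current)
        if rng.random() < p_new / (p_new + p_cur):  # Barker/Luce choice
            current = proposal
        counts[current] += 1
    return counts

n_steps = 200_000
counts = mcmcp(n_steps)
all_graphs = [frozenset(g) for k in range(4) for g in itertools.combinations(EDGES, k)]
z = sum(prior_weight(g) for g in all_graphs)        # normalizing constant

empirical = counts[frozenset()] / n_steps           # time spent on the empty graph
target = prior_weight(frozenset()) / z              # its prior probability
print(round(empirical, 3), round(target, 3))
```

The empirical visit frequencies converge to the prior probabilities. This is why the choices alone can reveal the prior, without the participant ever reporting it.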
Schuck, N. W., Wilson, R. C., & Niv, Y. (2018). A State Representation for Reinforcement Learning and Decision-Making in the Orbitofrontal Cortex. In Goal-Directed Decision Making.
Despite decades of research, the exact ways in which the orbitofrontal cortex (OFC) influences cognitive function have remained mysterious. Anatomically, the OFC is characterized by remarkably broad connectivity to sensory, limbic and subcortical areas, and functional studies have implicated the OFC in a plethora of functions ranging from facial processing to value-guided choice. Notwithstanding such diversity of findings, much research suggests that one important function of the OFC is to support decision making and reinforcement learning. Here, we describe a novel theory that posits that OFC's specific role in decision-making is to provide an up-to-date representation of task-related information, called a state representation. This representation reflects a mapping between distinct task states and sensory as well as unobservable information. We summarize evidence supporting the existence of such state representations in rodent and human OFC and argue that forming these state representations provides a crucial scaffold that allows animals to efficiently perform decision making and reinforcement learning in high-dimensional and partially observable environments. Finally, we argue that our theory offers an integrating framework for linking the diversity of functions ascribed to OFC and is in line with its wide-ranging connectivity.
Cohen, J. D., Daw, N. D., Engelhardt, B., Hasson, U., Li, K., Niv, Y., Norman, K. A., et al. (2017). Computational approaches to fMRI analysis. Nature Neuroscience, 20(3), 304–313.