Publications

Starting in 2022, the lab posts here only the archival, open-access version of our publications. This is part of a movement to emphasize quality and content over the impact factor or prestige of the journal in which a paper is published. Full citations (for referencing papers in your own work) can be found on PubMed and/or within the archival version, which will be updated once a paper is accepted for publication after peer review.

Asterisk (*) denotes equal contribution

2018

Niv, Y. (2018). Deep down, you are a scientist. In D. J. Linden (Ed.), Think tank: Forty neuroscientists explore the biological roots of human experience. Yale University Press. PDF: Deep down, you are a scientist
You may not know it, but deep down you are a scientist. To be precise, your brain is a scientist—and a good one, too: the kind of scientist that makes clear hypotheses, gathers data from several sources, and then reaches a well-founded conclusion. Although we are not aware of the scientific experimentation occurring in our brain from moment to moment, the scientific process is fundamental to how our brain works. This scientific process involves three key components. First: hypotheses. Our brain makes hypotheses, or predictions, all the time. The second component of good scientific work is gathering data—testing the hypothesis by comparing it to evidence. Neuroscientists gather data to test theories about how the brain works from several sources—for example, behavior, invasive recordings of the activity of single cells in the brain, and noninvasive imaging of overall activity in large areas of the brain. Finally, after making precise, well-founded predictions and gathering data from all available sources, a scientist must interpret the empirical observations. It is important to realize that perceived reality is subjective—it is interpreted—rather than an objective image of the world out there. And in some cases this interpretation can break down. For instance, in schizophrenia, meaningless events and distractors can take on outsized meaning in subjective interpretation, leading to hallucinations, delusions, and paranoia. Our memories are similarly a reflection of our own interpretations rather than a true record of events.
Sharpe, M., Stalnaker, T., Schuck, N., Killcross, S., Schoenbaum, G., & Niv, Y. (2018). An Integrated Model of Action Selection: Distinct Modes of Cortical Control of Striatal Decision Making. Annual Review of Psychology. https://doi.org/10.1146/annurev-psych-010418-102824
Making decisions in environments with few choice options is easy. We select the action that results in the most valued outcome. Making decisions in more complex environments, where the same action can produce different outcomes in different conditions, is much harder. In such circumstances, we propose that accurate action selection relies on top-down control from the prelimbic and orbitofrontal cortices over striatal activity through distinct thalamostriatal circuits. We suggest that the prelimbic cortex exerts direct influence over medium spiny neurons in the dorsomedial striatum to represent the state space relevant to the current environment. Conversely, the orbitofrontal cortex is argued to track a subject's position within that state space, likely through modulation of cholinergic interneurons.
Schuck, N., Wilson, R., & Niv, Y. (2018). A State Representation for Reinforcement Learning and Decision-Making in the Orbitofrontal Cortex. In Goal-Directed Decision Making. https://doi.org/10.1016/b978-0-12-812098-9.00012-7
Despite decades of research, the exact ways in which the orbitofrontal cortex (OFC) influences cognitive function have remained mysterious. Anatomically, the OFC is characterized by remarkably broad connectivity to sensory, limbic and subcortical areas, and functional studies have implicated the OFC in a plethora of functions ranging from facial processing to value-guided choice. Notwithstanding such diversity of findings, much research suggests that one important function of the OFC is to support decision making and reinforcement learning. Here, we describe a novel theory that posits that OFC's specific role in decision-making is to provide an up-to-date representation of task-related information, called a state representation. This representation reflects a mapping between distinct task states and sensory as well as unobservable information. We summarize evidence supporting the existence of such state representations in rodent and human OFC and argue that forming these state representations provides a crucial scaffold that allows animals to efficiently perform decision making and reinforcement learning in high-dimensional and partially observable environments. Finally, we argue that our theory offers an integrating framework for linking the diversity of functions ascribed to OFC and is in line with its wide ranging connectivity.
Rouhani, N., Norman, K., & Niv, Y. (2018). Dissociable effects of surprising rewards on learning and memory. Journal of Experimental Psychology: Learning Memory and Cognition, 44(9), 1430–1443. https://doi.org/10.1037/xlm0000518

Reward-prediction errors track the extent to which rewards deviate from expectations, and aid in learning. How do such errors in prediction interact with memory for the rewarding episode? Existing findings point to both cooperative and competitive interactions between learning and memory mechanisms. Here, we investigated whether learning about rewards in a high-risk context, with frequent, large prediction errors, would give rise to higher fidelity memory traces for rewarding events than learning in a low-risk context. Experiment 1 showed that recognition was better for items associated with larger absolute prediction errors during reward learning. Larger prediction errors also led to higher rates of learning about rewards. Interestingly we did not find a relationship between learning rate for reward and recognition-memory accuracy for items, suggesting that these two effects of prediction errors were caused by separate underlying mechanisms. In Experiment 2, we replicated these results with a longer task that posed stronger memory demands and allowed for more learning. We also showed improved source and sequence memory for items within the high-risk context. In Experiment 3, we controlled for the difficulty of reward learning in the risk environments, again replicating the previous results. Moreover, this control revealed that the high-risk context enhanced item-recognition memory beyond the effect of prediction errors. In summary, our results show that prediction errors boost both episodic item memory and incremental reward learning, but the two effects are likely mediated by distinct underlying systems.
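To make the learning mechanism concrete, here is a minimal sketch (our illustration, not the authors' code) of prediction-error-driven reward learning in which each item is tagged with the absolute prediction error it evoked, the quantity the experiments link to recognition memory:

```python
import numpy as np

def reward_learning(rewards, alpha=0.1, v0=0.5):
    """Minimal prediction-error learner: the value estimate V is nudged
    toward each observed reward; |PE| is recorded per item/trial."""
    V = v0
    abs_pe = []
    for r in rewards:
        delta = r - V            # reward prediction error
        abs_pe.append(abs(delta))
        V += alpha * delta       # incremental value update
    return np.array(abs_pe)

rng = np.random.default_rng(0)
high_risk = reward_learning(rng.choice([0.0, 1.0], size=20))  # frequent, large PEs
low_risk = reward_learning(rng.normal(0.5, 0.05, size=20))    # small PEs
print(high_risk.mean(), low_risk.mean())  # high-risk items carry larger |PE|
```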

Sharpe, M., Chang, C. Y., Liu, M., Batchelor, H. M., Mueller, L., Jones, J., Niv, Y., & Schoenbaum, G. (2018). Dopamine transients are sufficient and necessary for acquisition of model-based associations. Nature Neuroscience, 21(10), 1493. https://doi.org/10.1038/s41593-018-0202-5
Learning to predict reward is thought to be driven by dopaminergic prediction errors, which reflect discrepancies between actual and expected value. Here the authors show that learning to predict neutral events is also driven by prediction errors and that such value-neutral associative learning is also likely mediated by dopaminergic error signals.
Hermsdorff, G. B., Pereira, T., & Niv, Y. (2018). Quantifying Humans’ Priors Over Graphical Representations of Tasks. Springer Proceedings in Complexity, 281–290. https://doi.org/10.1007/978-3-319-96661-8_30
Some new tasks are trivial to learn while others are almost impossible; what determines how easy it is to learn an arbitrary task? Similar to how our prior beliefs about new visual scenes color our perception of new stimuli, our priors about the structure of new tasks shape our learning and generalization abilities [2]. While quantifying visual priors has led to major insights on how our visual system works [5,10,11], quantifying priors over tasks remains a formidable goal, as it is not even clear how to define a task [4]. Here, we focus on tasks that have a natural mapping to graphs. We develop a method to quantify humans' priors over these "task graphs", combining new modeling approaches with Markov chain Monte Carlo with people, MCMCP (a process whereby an agent learns from data generated by another agent, recursively [9]). We show that our method recovers priors more accurately than a standard MCMC sampling approach. Additionally, we propose a novel low-dimensional "smooth" parametrization of probability distributions over graphs (smooth in the sense that graphs that differ by fewer edges are given similar probabilities) that allows for more accurate recovery of the prior and better generalization. We have also created an online experiment platform that gamifies our MCMCP algorithm and allows subjects to interactively draw the task graphs. We use this platform to collect human data on several navigation and social interaction tasks. We show that priors over these tasks have non-trivial structure, deviating significantly from null models that are insensitive to the graphical information. The priors also notably differ between the navigation and social domains, showing fewer differences between cover stories within the same domain. Finally, we extend our framework to the more general case of quantifying priors over exchangeable random structures.

2017

Auchter, A., Cormack, L., Niv, Y., Gonzalez-Lima, F., & Monfils, M.-H. (2017). Reconsolidation-Extinction Interactions in Fear Memory Attenuation: The Role of Inter-Trial Interval Variability. Frontiers in Behavioral Neuroscience, 11. https://doi.org/10.3389/fnbeh.2017.00002
Gershman, S., Monfils, M.-H., Norman, K., & Niv, Y. (2017). The computational nature of memory modification. eLife, 6. https://doi.org/10.7554/eLife.23763
Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature.
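A toy sketch of the structure-learning step described above, under our own simplifying assumptions (Gaussian observations, a Chinese-restaurant-process-style prior): an observation either modifies an existing memory trace (old latent cause) or spawns a new one.

```python
import numpy as np
from scipy.stats import norm

def latent_cause_posterior(obs, cause_means, cause_counts, alpha=1.0, sd=1.0):
    """Posterior over latent causes for one observation. Old causes are
    favored in proportion to past use; 'alpha' is the prior mass on a
    brand-new cause (i.e., on forming a new memory)."""
    n = sum(cause_counts)
    prior = np.array(cause_counts + [alpha]) / (n + alpha)
    like = np.array([norm.pdf(obs, m, sd) for m in cause_means] +
                    [norm.pdf(obs, 0.0, 10 * sd)])  # vague likelihood for a new cause
    post = prior * like
    return post / post.sum()

# An observation similar to training favors the old cause (that memory is
# modified); a very different observation favors a new cause (new memory).
print(latent_cause_posterior(0.2, cause_means=[0.0], cause_counts=[10]))
print(latent_cause_posterior(5.0, cause_means=[0.0], cause_counts=[10]))
```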
Cohen, J., Daw, N., Engelhardt, B., Hasson, U., Li, K., Niv, Y., Norman, K., Pillow, J., Ramadge, P., Turk-Browne, N., & Willke, T. (2017). Computational approaches to fMRI analysis. Nature Neuroscience, 20(3), 304–313. https://doi.org/10.1038/nn.4499
DuBrow, S., Rouhani, N., Niv, Y., & Norman, K. (2017). Does mental context drift or shift? Current Opinion in Behavioral Sciences, 17, 141–146. https://doi.org/10.1016/j.cobeha.2017.08.003
Theories of episodic memory have proposed that individual memory traces are linked together by a representation of context that drifts slowly over time. Recent data challenge the notion that contextual drift is always slow and passive. In particular, changes in one's external environment or internal model induce discontinuities in memory that are reflected in sudden changes in neural activity, suggesting that context can shift abruptly. Furthermore, context change effects are sensitive to top-down goals, suggesting that contextual drift may be an active process. These findings call for revising models of the role of context in memory, in order to account for abrupt contextual shifts and the controllable nature of context change.
Leong*, Y. C., Radulescu*, A., Daniel, R., DeWoskin, V., & Niv, Y. (2017). Dynamic Interaction between Reinforcement Learning and Attention in Multidimensional Environments. Neuron, 93(2), 451–463. https://doi.org/10.1016/j.neuron.2016.12.040
Little is known about the relationship between attention and learning during decision making. Using eye tracking and multivariate pattern analysis of fMRI data, we measured participants' dimensional attention as they performed a trial-and-error learning task in which only one of three stimulus dimensions was relevant for reward at any given time. Analysis of participants' choices revealed that attention biased both value computation during choice and value update during learning. Value signals in the ventromedial prefrontal cortex and prediction errors in the striatum were similarly biased by attention. In turn, participants' focus of attention was dynamically modulated by ongoing learning. Attentional switches across dimensions correlated with activity in a frontoparietal attention network, which showed enhanced connectivity with the ventromedial prefrontal cortex between switches. Our results suggest a bidirectional interaction between attention and learning: attention constrains learning to relevant dimensions of the environment, while we learn what to attend to via trial and error.
Sharpe, M., Marchant, N., Whitaker, L., Richie, C., Zhang, Y., Campbell, E., Koivula, P., Necarsulmer, J., Mejias-Aponte, C., Morales, M., Pickel, J., Smith, J., Niv, Y., Shaham, Y., Harvey, B., & Schoenbaum, G. (2017). Lateral Hypothalamic GABAergic Neurons Encode Reward Predictions that Are Relayed to the Ventral Tegmental Area to Regulate Learning. Current Biology, 27(14), 2089–2100.e5. https://doi.org/10.1016/j.cub.2017.06.024
Eating is a learned process. Our desires for specific foods arise through experience. Both electrical stimulation and optogenetic studies have shown that increased activity in the lateral hypothalamus (LH) promotes feeding. Current dogma is that these effects reflect a role for LH neurons in the control of the core motivation to feed, and their activity comes under control of forebrain regions to elicit learned food-motivated behaviors. However, these effects could also reflect the storage of associative information about the cues leading to food in LH itself. Here, we present data from several studies that are consistent with a role for LH in learning. In the first experiment, we use a novel GAD-Cre rat to show that optogenetic inhibition of LH γ-aminobutyric acid (GABA) neurons restricted to cue presentation disrupts the rats' ability to learn that a cue predicts food without affecting subsequent food consumption. In the second experiment, we show that this manipulation also disrupts the ability of a cue to promote food seeking after learning. Finally, we show that inhibition of the terminals of the LH GABA neurons in the ventral tegmental area (VTA) facilitates learning about reward-paired cues. These results suggest that the LH GABA neurons are critical for storing and later disseminating information about reward-predictive cues.

2016

Eldar, E., Niv, Y., & Cohen, J. (2016). Do You See the Forest or the Tree? Neural Gain and Breadth Versus Focus in Perceptual Processing. Psychological Science, 27(12), 1632–1643. https://doi.org/10.1177/0956797616665578
When perceiving rich sensory information, some people may integrate its various aspects, whereas other people may selectively focus on its most salient aspects. We propose that neural gain modulates the trade-off between breadth and selectivity, such that high gain focuses perception on those aspects of the information that have the strongest, most immediate influence, whereas low gain allows broader integration of different aspects. We illustrate our hypothesis using a neural-network model of ambiguous-letter perception. We then report an experiment demonstrating that, as predicted by the model, pupil-diameter indices of higher gain are associated with letter perception that is more selectively focused on the letter's shape or, if primed, its semantic content. Finally, we report a recognition-memory experiment showing that the relationship between gain and selective processing also applies when the influence of different stimulus features is voluntarily modulated by task demands.
Eldar*, E., Rutledge*, R., Dolan, R., & Niv, Y. (2016). Mood as Representation of Momentum. Trends in Cognitive Sciences, 20(1), 15–24. https://doi.org/10.1016/j.tics.2015.07.010
Experiences affect mood, which in turn affects subsequent experiences. Recent studies suggest two specific principles. First, mood depends on how recent reward outcomes differ from expectations. Second, mood biases the way we perceive outcomes (e.g., rewards), and this bias affects learning about those outcomes. We propose that this two-way interaction serves to mitigate inefficiencies in the application of reinforcement learning to real-world problems. Specifically, we propose that mood represents the overall momentum of recent outcomes, and its biasing influence on the perception of outcomes 'corrects' learning to account for environmental dependencies. We describe potential dysfunctions of this adaptive mechanism that might contribute to the symptoms of mood disorders.
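A minimal sketch of the two-way interaction proposed here (our simplified formalization, with hypothetical parameter values): mood tracks the momentum of recent prediction errors, and in turn biases how new outcomes are perceived before they are learned from.

```python
def mood_learner(rewards, alpha=0.3, eta=0.2, f=0.5):
    """Toy model: V is the learned expectation, 'mood' is a running average
    of recent prediction errors (momentum), and f scales how strongly mood
    biases the perception of each new outcome."""
    V = mood = 0.0
    for r in rewards:
        perceived = r + f * mood      # good mood inflates perceived outcomes
        delta = perceived - V         # prediction error on the biased percept
        V += alpha * delta            # learning uses the biased error
        mood += eta * (delta - mood)  # mood tracks recent PE momentum
    return V, mood
```

With a large f this loop is self-reinforcing, which is the destabilizing positive feedback the abstract links to mood disorders.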
Kurth-Nelson, Z., O’Doherty, J. P., Barch, D. M., Denève, S., Durstewitz, D., Frank, M. J., Gordon, J. A., Mathew, S. J., Niv, Y., Ressler, K., & Tost, H. (2016). Computational Approaches for Studying Mechanisms of Psychiatric Disorders. In Computational Psychiatry. The MIT Press. https://doi.org/10.7551/mitpress/9780262035422.003.0005
Vast spectra of biological and psychological processes are potentially involved in the mechanisms of psychiatric illness. Computational neuroscience brings a diverse toolkit to bear on understanding these processes. This chapter begins by organizing the many ways in which computational neuroscience may provide insight to the mechanisms of psychiatric illness. It then contextualizes the quest for deep mechanistic understanding through the perspective that even partial or nonmechanistic understanding can be applied productively. Finally, it questions the standards by which these approaches...
Niv, Y., & Langdon, A. (2016). Reinforcement learning with Marr. Current Opinion in Behavioral Sciences, 11, 67–73. https://doi.org/10.1016/j.cobeha.2016.04.005
To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning – a computational theory of reward optimization, which readily prescribes algorithmic solutions that evidence striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.
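For readers new to the framework, the algorithmic level referred to here is typically the temporal-difference update (textbook form, not specific to this review), whose error term δ is the signal famously resembling dopaminergic firing:

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t
```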
Radulescu, A., Daniel, R., & Niv, Y. (2016). The effects of aging on the interaction between reinforcement learning and attention. Psychology and Aging, 31(7), 747–757. https://doi.org/10.1037/pag0000112
Schuck, N., Cai, M. B., Wilson, R., & Niv, Y. (2016). Human Orbitofrontal Cortex Represents a Cognitive Map of State Space. Neuron, 91(6), 1402–1412. https://doi.org/10.1016/j.neuron.2016.08.019
Although the orbitofrontal cortex (OFC) has been studied intensely for decades, its precise functions have remained elusive. We recently hypothesized that the OFC contains a “cognitive map” of task space in which the current state of the task is represented, and this representation is especially critical for behavior when states are unobservable from sensory input. To test this idea, we apply pattern-classification techniques to neuroimaging data from humans performing a decision-making task with 16 states. We show that unobservable task states can be decoded from activity in OFC, and decoding accuracy is related to task performance and the occurrence of individual behavioral errors. Moreover, similarity between the neural representations of consecutive states correlates with behavioral accuracy in corresponding state transitions. These results support the idea that OFC represents a cognitive map of task space and establish the feasibility of decoding state representations in humans using non-invasive neuroimaging.
Takahashi*, Y., Langdon*, A., Niv, Y., & Schoenbaum, G. (2016). Temporal Specificity of Reward Prediction Errors Signaled by Putative Dopamine Neurons in Rat VTA Depends on Ventral Striatum. Neuron, 91(1), 182–193. https://doi.org/10.1016/j.neuron.2016.05.015
Dopamine neurons signal reward prediction errors. This requires accurate reward predictions. It has been suggested that the ventral striatum provides these predictions. Here we tested this hypothesis by recording from putative dopamine neurons in the VTA of rats performing a task in which prediction errors were induced by shifting reward timing or number. In controls, the neurons exhibited error signals in response to both manipulations. However, dopamine neurons in rats with ipsilateral ventral striatal lesions exhibited errors only to changes in number and failed to respond to changes in timing of reward. These results, supported by computational modeling, indicate that predictions about the temporal specificity and the number of expected reward are dissociable and that dopaminergic prediction-error signals rely on the ventral striatum for the former but not the latter.
Arkadir, D., Radulescu, A., Raymond, D., Lubarr, N., Bressman, S., Mazzoni, P., & Niv, Y. (2016). DYT1 dystonia increases risk taking in humans. eLife, 5. https://doi.org/10.7554/eLife.14155
It has been difficult to link synaptic modification to overt behavioral changes. Rodent models of DYT1 dystonia, a motor disorder caused by a single gene mutation, demonstrate increased long-term potentiation and decreased long-term depression in corticostriatal synapses. Computationally, such asymmetric learning predicts risk taking in probabilistic tasks. Here we demonstrate abnormal risk taking in DYT1 dystonia patients, which is correlated with disease severity, thereby supporting striatal plasticity in shaping choice behavior in humans.
Cai, M. B., Schuck, N., Pillow, J., & Niv, Y. (2016). A Bayesian method for reducing bias in neural representational similarity analysis. In Advances in Neural Information Processing Systems 29 (pp. 4952–4960). Curran Associates, Inc. PDF: A Bayesian method for reducing bias in neural representational similarity analysis
In neuroscience, the similarity matrix of neural activity patterns in response to different sensory stimuli or under different cognitive states reflects the structure of neural representational space. Existing methods derive point estimations of neural activity patterns from noisy neural imaging data, and the similarity is calculated from these point estimations. We show that this approach translates structured noise from estimated patterns into spurious bias structure in the resulting similarity matrix, which is especially severe when signal-to-noise ratio is low and experimental conditions cannot be fully randomized in a cognitive task. We propose an alternative Bayesian framework for computing representational similarity in which we treat the covariance structure of neural activity patterns as a hyperparameter in a generative model of the neural data, and directly estimate this covariance structure from imaging data while marginalizing over the unknown activity patterns. Converting the estimated covariance structure into a correlation matrix offers a much less biased estimate of neural representational similarity. Our method can also simultaneously estimate a signal-to-noise map that informs where the learned representational structure is supported more strongly, and the learned covariance matrix can be used as a structured prior to constrain Bayesian estimation of neural activity patterns. Our code is freely available in Brain Imaging Analysis Kit (Brainiak) (https://github.com/IntelPNI/brainiak).
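One facet of the bias is easy to demonstrate in a few lines (our toy simulation, not the paper's analysis): correlating noisy point estimates of two identical activity patterns systematically underestimates their true similarity, and the distortion worsens as signal-to-noise drops.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 100
true_pattern = rng.normal(size=n_voxels)  # two conditions share this exact pattern

for noise_sd in (0.5, 2.0):
    est_a = true_pattern + rng.normal(scale=noise_sd, size=n_voxels)
    est_b = true_pattern + rng.normal(scale=noise_sd, size=n_voxels)
    # Point-estimate RSA: estimated similarity falls well below the true value of 1
    print(noise_sd, np.corrcoef(est_a, est_b)[0, 1])
```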
Chan, S., Niv*, Y., & Norman*, K. (2016). A probability distribution over latent causes, in the orbitofrontal cortex. Journal of Neuroscience, 36(30), 7817–7828. https://doi.org/10.1523/JNEUROSCI.0659-16.2016
The orbitofrontal cortex (OFC) has been implicated in both the representation of "state," in studies of reinforcement learning and decision making, and also in the representation of "schemas," in studies of episodic memory. Both of these cognitive constructs require a similar inference about the underlying situation or "latent cause" that generates our observations at any given time. The statistically optimal solution to this inference problem is to use Bayes' rule to compute a posterior probability distribution over latent causes. To test whether such a posterior probability distribution is represented in the OFC, we tasked human participants with inferring a probability distribution over four possible latent causes, based on their observations. Using fMRI pattern similarity analyses, we found that BOLD activity in the OFC is best explained as representing the (log-transformed) posterior distribution over latent causes. Furthermore, this pattern explained OFC activity better than other task-relevant alternatives, such as the most probable latent cause, the most recent observation, or the uncertainty over latent causes.
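The statistically optimal computation the study probes is ordinary Bayesian filtering (standard form): the posterior over latent causes z after observation o_t combines the likelihood of the observation with the prior carried over from previous trials,

```latex
P(z \mid o_{1:t}) \;\propto\; P(o_t \mid z)\, P(z \mid o_{1:t-1})
```

and it is (the log of) this full distribution, rather than just its maximum, that best explained OFC activity patterns.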
Eldar, E., Cohen, J., & Niv, Y. (2016). Amplified selectivity in cognitive processing implements the neural gain model of norepinephrine function. The Behavioral and Brain Sciences, 39, e206. https://doi.org/10.1017/S0140525X15001776
Previous work has suggested that an interaction between local selective (e.g., glutamatergic) excitation and global gain modulation (via norepinephrine) amplifies selectivity in information processing. Mather et al. extend this existing theory by suggesting that localized gain modulation may further mediate this effect – an interesting prospect that invites new theoretical and experimental work.

2015

Wilson, R., & Niv, Y. (2015). Is Model Fitting Necessary for Model-Based fMRI? PLoS Computational Biology, 11(6), e1004237. https://doi.org/10.1371/journal.pcbi.1004237
Model-based analysis of fMRI data is an important tool for investigating the computational role of different brain regions. With this method, theoretical models of behavior can be leveraged to find the brain structures underlying variables from specific algorithms, such as prediction errors in reinforcement learning. One potential weakness with this approach is that models often have free parameters and thus the results of the analysis may depend on how these free parameters are set. In this work we asked whether this hypothetical weakness is a problem in practice. We first developed general closed-form expressions for the relationship between results of fMRI analyses using different regressors, e.g., one corresponding to the true process underlying the measured data and one a model-derived approximation of the true generative regressor. Then, as a specific test case, we examined the sensitivity of model-based fMRI to the learning rate parameter in reinforcement learning, both in theory and in two previously-published datasets. We found that even gross errors in the learning rate lead to only minute changes in the neural results. Our findings thus suggest that precise model fitting is not always necessary for model-based fMRI. They also highlight the difficulty in using fMRI data for arbitrating between different models or model parameters. While these specific results pertain only to the effect of learning rate in simple reinforcement learning models, we provide a template for testing for effects of different parameters in other models.
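The key intuition is that regressors generated under quite different parameter settings can be nearly collinear. A toy demonstration (ours, with arbitrary settings):

```python
import numpy as np

def rpe_regressor(rewards, alpha, v0=0.5):
    """Trial-by-trial reward prediction errors from a simple learner, as
    would be entered as a parametric regressor in model-based fMRI."""
    V, out = v0, []
    for r in rewards:
        delta = r - V
        out.append(delta)
        V += alpha * delta
    return np.array(out)

rng = np.random.default_rng(1)
rewards = rng.choice([0.0, 1.0], size=200, p=[0.3, 0.7])
r_slow = rpe_regressor(rewards, alpha=0.1)
r_fast = rpe_regressor(rewards, alpha=0.5)
# Even a fivefold change in learning rate leaves the regressors highly
# correlated, so the neural results barely change.
print(np.corrcoef(r_slow, r_fast)[0, 1])
```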
Daniel, R., Schuck, N., & Niv, Y. (2015). How to divide and conquer the world, one step at a time. Proceedings of the National Academy of Sciences, 112(10), 2929–2930. https://doi.org/10.1073/pnas.1500975112
Dunsmoor, J., Niv, Y., Daw, N., & Phelps, E. (2015). Rethinking Extinction. Neuron, 88(1), 47–63. https://doi.org/10.1016/j.neuron.2015.09.028
Extinction serves as the leading theoretical framework and experimental model to describe how learned behaviors diminish through absence of anticipated reinforcement. In the past decade, extinction has moved beyond the realm of associative learning theory and behavioral experimentation in animals and has become a topic of considerable interest in the neuroscience of learning, memory, and emotion. Here, we review research and theories of extinction, both as a learning process and as a behavioral technique, and consider whether traditional understandings warrant a re-examination. We discuss the neurobiology, cognitive factors, and major computational theories, and revisit the predominant view that extinction results in new learning that interferes with expression of the original memory. Additionally, we reconsider the limitations of extinction as a technique to prevent the relapse of maladaptive behavior and discuss novel approaches, informed by contemporary theoretical advances, that augment traditional extinction methods to target and potentially alter maladaptive memories.
Eldar, E., & Niv, Y. (2015). Interaction between emotional state and learning underlies mood instability. Nature Communications, 6(1), 6149. https://doi.org/10.1038/ncomms7149
Intuitively, good and bad outcomes affect our emotional state, but whether the emotional state feeds back onto the perception of outcomes remains unknown. Here, we use behaviour and functional neuroimaging of human participants to investigate this bidirectional interaction, by comparing the evaluation of slot machines played before and after an emotion-impacting wheel-of-fortune draw. Results indicate that self-reported mood instability is associated with a positive-feedback effect of emotional state on the perception of outcomes. We then use theoretical simulations to demonstrate that such positive feedback would result in mood destabilization. Taken together, our results suggest that the interaction between emotional state and learning may play a significant role in the emergence of mood instability.
Gershman, S., & Niv, Y. (2015). Novelty and Inductive Generalization in Human Reinforcement Learning. Topics in Cognitive Science, 7(3), 391–415. https://doi.org/10.1111/tops.12138
In reinforcement learning (RL), a decision maker searching for the most rewarding option is often faced with the question: What is the value of an option that has never been tried before? One way to frame this question is as an inductive problem: How can I generalize my previous experience with one set of options to a novel option? We show how hierarchical Bayesian inference can be used to solve this problem, and we describe an equivalence between the Bayesian model and temporal difference learning algorithms that have been proposed as models of RL in humans and animals. According to our view, the search for the best option is guided by abstract knowledge about the relationships between different options in an environment, resulting in greater search efficiency compared to traditional RL algorithms previously applied to human cognition. In two behavioral experiments, we test several predictions of our model, providing evidence that humans learn and exploit structured inductive knowledge to make predictions about novel options. In light of this model, we suggest a new interpretation of dopaminergic responses to novelty.
Gershman, S., Norman, K., & Niv, Y. (2015). Discovering latent causes in reinforcement learning. Current Opinion in Behavioral Sciences, 5, 43–50. https://doi.org/10.1016/j.cobeha.2015.07.007
Effective reinforcement learning hinges on having an appropriate state representation. But where does this representation come from? We argue that the brain discovers state representations by trying to infer the latent causal structure of the task at hand, and assigning each latent cause to a separate state. In this paper, we review several implications of this latent cause framework, with a focus on Pavlovian conditioning. The framework suggests that conditioning is not the acquisition of associations between cues and outcomes, but rather the acquisition of associations between latent causes and observable stimuli. A latent cause interpretation of conditioning enables us to begin answering questions that have frustrated classical theories: Why do extinguished responses sometimes return? Why do stimuli presented in compound sometimes summate and sometimes do not? Beyond conditioning, the principles of latent causal inference may provide a general theory of structure learning across cognitive domains.
Niv, Y., Daniel, R., Geana, A., Gershman, S., Leong, Y. C., Radulescu, A., & Wilson, R. (2015). Reinforcement Learning in Multidimensional Environments Relies on Attention Mechanisms. Journal of Neuroscience, 35(21), 8145–8157. https://doi.org/10.1523/JNEUROSCI.2978-14.2015
In recent years, ideas from the computational field of reinforcement learning have revolutionized the study of learning in the brain, famously providing new, precise theories of how dopamine affects learning in the basal ganglia. However, reinforcement learning algorithms are notorious for not scaling well to multidimensional environments, as is required for real-world learning. We hypothesized that the brain naturally reduces the dimensionality of real-world problems to only those dimensions that are relevant to predicting reward, and conducted an experiment to assess by what algorithms and with what neural mechanisms this "representation learning" process is realized in humans. Our results suggest that a bilateral attentional control network comprising the intraparietal sulcus, precuneus, and dorsolateral prefrontal cortex is involved in selecting what dimensions are relevant to the task at hand, effectively updating the task representation through trial and error. In this way, cortical attention mechanisms interact with learning in the basal ganglia to solve the "curse of dimensionality" in reinforcement learning.
Niv, Y., Langdon, A., & Radulescu, A. (2015). A free-choice premium in the basal ganglia. Trends in Cognitive Sciences, 19(1), 4–5. https://doi.org/10.1016/j.tics.2014.09.005
Apparently, the act of free choice confers value: when selecting between an item that you had previously chosen and an identical item that you had been forced to take, the former is often preferred. What could be the neural underpinnings of this free-choice bias in decision making? An elegant study recently published in Neuron suggests that enhanced reward learning in the basal ganglia may be the culprit.
Sharpe, M., Wikenheiser, A., Niv, Y., & Schoenbaum, G. (2015). The State of the Orbitofrontal Cortex. Neuron, 88(6), 1075–1077. https://doi.org/10.1016/j.neuron.2015.12.004
State representation is fundamental to behavior. However, identifying the true state of the world is challenging when explicit cues are ambiguous. Here, Bradfield and colleagues show that the medial OFC is critical for using associative information to discriminate ambiguous states.

2014

Gershman, S., Radulescu, A., Norman, K., & Niv, Y. (2014). Statistical Computations Underlying the Dynamics of Memory Updating. PLoS Computational Biology, 10(11), e1003939. https://doi.org/10.1371/journal.pcbi.1003939
Psychophysical and neurophysiological studies have suggested that memory is not simply a carbon copy of our experience: Memories are modified or new memories are formed depending on the dynamic structure of our experience, and specifically, on how gradually or abruptly the world changes. We present a statistical theory of memory formation in a dynamic environment, based on a nonparametric generalization of the switching Kalman filter. We show that this theory can qualitatively account for several psychophysical and neural phenomena, and present results of a new visual memory experiment aimed at testing the theory directly. Our experimental findings suggest that humans can use temporal discontinuities in the structure of the environment to determine when to form new memory traces. The statistical perspective we offer provides a coherent account of the conditions under which new experience is integrated into an old memory versus forming a new memory, and shows that memory formation depends on inferences about the underlying structure of our experience.
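A toy sketch in the spirit of the model (our simplification of the switching Kalman filter idea, with hypothetical parameters): each memory trace is a (mean, variance) estimate; gradual change updates the current trace, while an observation that is too surprising triggers a new one.

```python
def memory_updating(observations, q=0.01, r=0.1, threshold=2.5):
    """Toy switching-filter memory: update the latest trace by the Kalman
    gain unless the standardized surprise exceeds a threshold, in which
    case a new trace (new memory) is formed."""
    traces = [(observations[0], 1.0)]  # (mean, variance) of each trace
    for x in observations[1:]:
        mu, var = traces[-1]
        var += q                                   # uncertainty grows between observations
        surprise = abs(x - mu) / (var + r) ** 0.5  # standardized prediction error
        if surprise > threshold:
            traces.append((x, 1.0))                # abrupt change -> new memory trace
        else:
            k = var / (var + r)                    # Kalman gain
            traces[-1] = (mu + k * (x - mu), (1 - k) * var)
    return traces

# A sudden jump in the environment spawns a second trace; slow drift does not.
print(len(memory_updating([0.0, 0.1, 0.05, 3.0, 3.1])))  # -> 2
```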
Solway*, A., Diuk*, C., Córdova, N., Yee, D., Barto, A., Niv, Y., & Botvinick, M. (2014). Optimal Behavioral Hierarchy. PLoS Computational Biology, 10(8), e1003779. https://doi.org/10.1371/journal.pcbi.1003779
Human behavior has long been recognized to display hierarchical structure: actions fit together into subtasks, which cohere into extended goal-directed activities. Arranging actions hierarchically has well established benefits, allowing behaviors to be represented efficiently by the brain, and allowing solutions to new tasks to be discovered easily. However, these payoffs depend on the particular way in which actions are organized into a hierarchy, the specific way in which tasks are carved up into subtasks. We provide a mathematical account for what makes some hierarchies better than others, an account that allows an optimal hierarchy to be identified for any set of tasks. We then present results from four behavioral experiments, suggesting that human learners spontaneously discover optimal action hierarchies.
Soto, F., Gershman, S., & Niv, Y. (2014). Explaining compound generalization in associative and causal learning through rational principles of dimensional generalization. Psychological Review, 121(3), 526–558. https://doi.org/10.1037/a0037018
How do we apply learning from one situation to a similar, but not identical, situation? The principles governing the extent to which animals and humans generalize what they have learned about certain stimuli to novel compounds containing those stimuli vary depending on a number of factors. Perhaps the best studied among these factors is the type of stimuli used to generate compounds. One prominent hypothesis is that different generalization principles apply depending on whether the stimuli in a compound are similar or dissimilar to each other. However, the results of many experiments cannot be explained by this hypothesis. Here, we propose a rational Bayesian theory of compound generalization that uses the notion of consequential regions, first developed in the context of rational theories of multidimensional generalization, to explain the effects of stimulus factors on compound generalization. The model explains a large number of results from the compound generalization literature, including the influence of stimulus modality and spatial contiguity on the summation effect, the lack of influence of stimulus factors on summation with a recovered inhibitor, the effect of spatial position of stimuli on the blocking effect, the asymmetrical generalization decrement in overshadowing and external inhibition, and the conditions leading to a reliable external inhibition effect. By integrating rational theories of compound and dimensional generalization, our model provides the first comprehensive computational account of the effects of stimulus factors on compound generalization, including spatial and temporal contiguity between components, which have posed long-standing problems for rational theories of associative and causal learning.
Wilson, R., Takahashi, Y., Schoenbaum, G., & Niv, Y. (2014). Orbitofrontal Cortex as a Cognitive Map of Task Space. Neuron, 81(2), 267–279. https://doi.org/10.1016/j.neuron.2013.11.005
Orbitofrontal cortex (OFC) has long been known to play an important role in decision making. However, the exact nature of that role has remained elusive. Here, we propose a unifying theory of OFC function. We hypothesize that OFC provides an abstraction of currently available information in the form of a labeling of the current task state, which is used for reinforcement learning (RL) elsewhere in the brain. This function is especially critical when task states include unobservable information, for instance, from working memory. We use this framework to explain classic findings in reversal learning, delayed alternation, extinction, and devaluation as well as more recent findings showing the effect of OFC lesions on the firing of dopaminergic neurons in ventral tegmental area (VTA) in rodents performing an RL task. In addition, we generate a number of testable experimental predictions that can distinguish our theory from other accounts of OFC function.
Geana, A., & Niv, Y. (2014). Causal model comparison shows that human representation learning is not Bayesian. Cold Spring Harbor Symposia on Quantitative Biology, 79, 161–168. https://doi.org/10.1101/sqb.2014.79.024851
How do we learn what features of our multidimensional environment are relevant in a given task? To study the computational process underlying this type of "representation learning," we propose a novel method of causal model comparison. Participants played a probabilistic learning task that required them to identify one relevant feature among several irrelevant ones. To compare between two models of this learning process, we ran each model alongside the participant during task performance, making predictions regarding the values underlying the participant's choices in real time. To test the validity of each model's predictions, we used the predicted values to try to perturb the participant's learning process: We crafted stimuli to either facilitate or hinder comparison between the most highly valued features. A model whose predictions coincide with the learned values in the participant's mind is expected to be effective in perturbing learning in this way, whereas a model whose predictions stray from the true learning process should not. Indeed, we show that in our task a reinforcement-learning model could help or hurt participants' learning, whereas a Bayesian ideal observer model could not. Beyond informing us about the notably suboptimal (but computationally more tractable) substrates of human representation learning, our manipulation suggests a sensitive method for model comparison, which allows us to change the course of people's learning in real time.

2013

Diuk, C., Schapiro, A., Córdova, N., Ribas-Fernandes, J., Niv, Y., & Botvinick, M. (2013). Divide and conquer: Hierarchical reinforcement learning and task decomposition in humans. In Computational and Robotic Models of the Hierarchical Organization of Behavior (pp. 271–291). https://doi.org/10.1007/978-3-642-39875-9_12
The field of computational reinforcement learning (RL) has proved extremely useful in research on human and animal behavior and brain function. However, the simple forms of RL considered in most empirical research do not scale well, making their relevance to complex, real-world behavior unclear. In computational RL, one strategy for addressing the scaling problem is to introduce hierarchical structure, an approach that has intriguing parallels with human behavior. We have begun to investigate the potential relevance of hierarchical RL (HRL) to human and animal behavior and brain function. In the present chapter, we first review two results that show the existence of neural correlates to key predictions from HRL. Then, we focus on one aspect of this work, which deals with the question of how action hierarchies are initially established. Work in HRL suggests that hierarchy learning is accomplished by identifying useful subgoal states, and that this might in turn be accomplished through a structural analysis of the given task domain. We review results from a set of behavioral and neuroimaging experiments, in which we have investigated the relevance of these ideas to human learning and decision making.
Eldar, E., Cohen, J., & Niv, Y. (2013). The effects of neural gain on attention and learning. Nature Neuroscience, 16(8), 1146–1153. https://doi.org/10.1038/nn.3428
Attention is commonly thought to be manifest through local variations in neural gain. However, what would be the effects of brain-wide changes in gain? We hypothesized that global fluctuations in gain modulate the breadth of attention and the degree to which processing is focused on aspects of the environment to which one is predisposed to attend. We found that measures of pupil diameter, which are thought to track levels of locus coeruleus norepinephrine activity and neural gain, were correlated with the degree to which learning was focused on stimulus dimensions that individual human participants were more predisposed to process. In support of our interpretation of this effect in terms of global changes in gain, we found that the measured pupillary and behavioral variables were strongly correlated with global changes in the strength and clustering of functional connectivity, as brain-wide fluctuations of gain would predict.
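In such models, "gain" is the slope of each unit's activation function; raising it globally makes the strongest inputs dominate processing. A minimal sketch (ours, not the paper's network model):

```python
import numpy as np

def activation(x, gain):
    """Sigmoid unit; 'gain' scales the slope of the activation function."""
    return 1.0 / (1.0 + np.exp(-gain * x))

evidence = np.array([-0.5, 0.0, 1.0])  # weak, neutral, and strong (predisposed) inputs
for g in (1.0, 5.0):
    resp = activation(evidence, g)
    print(g, resp / resp.sum())  # high gain concentrates processing on the strongest input
```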
Diuk, C., Tsai, K., Wallis, J., Botvinick, M., & Niv, Y. (2013). Hierarchical Learning Induces Two Simultaneous, But Separable, Prediction Errors in Human Basal Ganglia. Journal of Neuroscience, 33(13), 5797–5805. https://doi.org/10.1523/JNEUROSCI.5445-12.2013
Studies suggest that dopaminergic neurons report a unitary, global reward prediction error signal. However, learning in complex real-life tasks, in particular tasks that show hierarchical structure, requires multiple prediction errors that may coincide in time. We used functional neuroimaging to measure prediction error signals in humans performing such a hierarchical task involving simultaneous, uncorrelated prediction errors. Analysis of signals in a priori anatomical regions of interest in the ventral striatum and the ventral tegmental area indeed evidenced two simultaneous, but separable, prediction error signals corresponding to the two levels of hierarchy in the task. This result suggests that suitably designed tasks may reveal a more intricate pattern of firing in dopaminergic neurons. Moreover, the need for downstream separation of these signals implies possible limitations on the number of different task levels that we can learn about simultaneously.
Gershman, S., Jones, C., Norman, K., Monfils, M.-H., & Niv, Y. (2013). Gradual extinction prevents the return of fear: implications for the discovery of state. Front Behav Neurosci, 7, 164. https://doi.org/10.3389/fnbeh.2013.00164

Fear memories are notoriously difficult to erase, often recovering over time. The longstanding explanation for this finding is that, in extinction training, a new memory is formed that competes with the old one for expression but does not otherwise modify it. This explanation is at odds with traditional models of learning such as Rescorla-Wagner and reinforcement learning. A possible reconciliation that was recently suggested is that extinction training leads to the inference of a new state that is different from the state that was in effect in the original training. This solution, however, raises a new question: under what conditions are new states, or new memories formed? Theoretical accounts implicate persistent large prediction errors in this process. As a test of this idea, we reasoned that careful design of the reinforcement schedule during extinction training could reduce these prediction errors enough to prevent the formation of a new memory, while still decreasing reinforcement sufficiently to drive modification of the old fear memory. In two Pavlovian fear-conditioning experiments, we show that gradually reducing the frequency of aversive stimuli, rather than eliminating them abruptly, prevents the recovery of fear. This finding has important implications for theories of state discovery in reinforcement learning.

Gershman, S., & Niv, Y. (2013). Perceptual estimation obeys Occam’s razor. Frontiers in Psychology, 4, 623. https://doi.org/10.3389/fpsyg.2013.00623
Theoretical models of unsupervised category learning postulate that humans “invent” categories to accommodate new patterns, but tend to group stimuli into a small number of categories. This “Occam's razor” principle is motivated by normative rules of statistical inference. If categories influence perception, then one should find effects of category invention on simple perceptual estimation. In a series of experiments, we tested this prediction by asking participants to estimate the number of colored circles on a computer screen, with the number of circles drawn from a color-specific distribution. When the distributions associated with each color overlapped substantially, participants' estimates were biased toward values intermediate between the two means, indicating that subjects ignored the color of the circles and grouped different-colored stimuli into one perceptual category. These data suggest that humans favor simpler explanations of sensory inputs. In contrast, when the distributions associated with each color overlapped minimally, the bias was reduced (i.e., the estimates for each color were closer to the true means), indicating that sensory evidence for more complex explanations can override the simplicity bias. We present a rational analysis of our task, showing how these qualitative patterns can arise from Bayesian computations.
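The reported bias toward intermediate values is the classic posterior-mean shrinkage of Gaussian inference (textbook form, not the paper's full model): if both colors are attributed to one category with mean mu and variance sigma_c^2, the estimate for a display of n circles observed with sensory noise sigma^2 is pulled toward mu:

```latex
\hat{n} \;=\; \frac{\sigma_c^{-2}\,\mu + \sigma^{-2}\,n}{\sigma_c^{-2} + \sigma^{-2}}
```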
Niv, Y. (2013). Neuroscience: Dopamine ramps up. Nature, 500(7464), 533–535. https://doi.org/10.1038/500533a
We thought we had figured out dopamine, a neuromodulator involved in everything from learning to addiction. But the finding that dopamine levels ramp up as rats navigate to a reward may overthrow current theories. See Letter p.575
Schoenbaum, G., Stalnaker, T., & Niv, Y. (2013). How Did the Chicken Cross the Road? With Her Striatal Cholinergic Interneurons, Of Course. Neuron, 79(1), 3–6. https://doi.org/10.1016/j.neuron.2013.06.033
Recognizing when the world changes is fundamental for normal learning. In this issue of Neuron, Bradfield et al. (2013) show that cholinergic interneurons in dorsomedial striatum are critical to the process whereby new states of the world are appropriately registered and retrieved during associative learning.

2012

Gershman, S., & Niv, Y. (2012). Exploring a latent cause theory of classical conditioning. Learning & Behavior, 40(3), 255–268. https://doi.org/10.3758/s13420-012-0080-8
We frame behavior in classical conditioning experiments as the product of normative statistical inference. According to this theory, animals learn an internal model of their environment from experience. The basic building blocks of this internal model are latent causes-explanatory constructs inferred by the animal that partition observations into coherent clusters. Generalization of conditioned responding from one cue to another arises from the animal's inference that the cues were generated by the same latent cause. Through a wide range of simulations, we demonstrate where the theory succeeds and where it fails as a general account of classical conditioning.
Lucantonio, F., Stalnaker, T., Shaham, Y., Niv, Y., & Schoenbaum, G. (2012). The impact of orbitofrontal dysfunction on cocaine addiction. Nature Neuroscience, 15(3), 358–366. https://doi.org/10.1038/nn.3014
Cocaine addiction is characterized by poor judgment and maladaptive decision-making. Here we review evidence implicating the orbitofrontal cortex in such behavior. This evidence suggests that cocaine-induced changes in orbitofrontal cortex disrupt the representation of states and transition functions that form the basis of flexible and adaptive 'model-based' behavioral control. By impairing this function, cocaine exposure leads to an overemphasis on less flexible, maladaptive 'model-free' control systems. We propose that such an effect accounts for the complex pattern of maladaptive behaviors associated with cocaine addiction.
Niv, Y., Edlund, J., Dayan, P., & O’Doherty, J. (2012). Neural Prediction Errors Reveal a Risk-Sensitive Reinforcement-Learning Process in the Human Brain. Journal of Neuroscience, 32(2), 551–562. https://doi.org/10.1523/JNEUROSCI.5498-10.2012
Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
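One common formalization of such risk-sensitive learning, in the spirit of the model tested here (our sketch, with arbitrary parameter values), weights positive and negative prediction errors by different learning rates, so high-variance options end up with depressed values:

```python
import random

def risk_sensitive_update(V, reward, alpha_pos=0.1, alpha_neg=0.3):
    """Asymmetric TD update: negative prediction errors are weighted more
    heavily, producing risk-averse value estimates."""
    delta = reward - V
    alpha = alpha_pos if delta > 0 else alpha_neg
    return V + alpha * delta

random.seed(0)
V = 0.5
for _ in range(2000):  # a 50/50 gamble paying 0 or 1
    V = risk_sensitive_update(V, random.choice([0.0, 1.0]))
print(V)  # hovers around 0.25, below the gamble's mean of 0.5
```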
Wilson, R., & Niv, Y. (2012). Inferring Relevance in a Changing World. Frontiers in Human Neuroscience, 5(JANUARY 2012), 189. https://doi.org/10.3389/fnhum.2011.00189
Reinforcement learning models of human and animal learning usually concentrate on how we learn the relationship between different stimuli or actions and rewards. However, in real-world situations "stimuli" are ill-defined. On the one hand, our immediate environment is extremely multidimensional. On the other hand, in every decision making scenario only a few aspects of the environment are relevant for obtaining reward, while most are irrelevant. Thus a key question is how do we learn these relevant dimensions, that is, how do we learn what to learn about? We investigated this process of "representation learning" experimentally, using a task in which one stimulus dimension was relevant for determining reward at each point in time. As in real life situations, in our task the relevant dimension can change without warning, adding ever-present uncertainty engendered by a constantly changing environment. We show that human performance on this task is better described by a suboptimal strategy based on selective attention and serial-hypothesis-testing rather than a normative strategy based on probabilistic inference. From this, we conjecture that the problem of inferring relevance in general scenarios is too computationally demanding for the brain to solve optimally. As a result the brain utilizes approximations, employing these even in simplified scenarios in which optimal representation learning is tractable, such as the one in our experiment.
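A sketch of the serial-hypothesis-testing strategy the data favored (our simplified version, with hypothetical parameter values): attend to one candidate dimension at a time, and abandon it only when recent rewards suggest it is wrong, rather than tracking a full posterior over all dimensions.

```python
import random

def serial_hypothesis_testing(trials, n_dims=3, window=6, keep_threshold=0.5):
    """Attend to one dimension; if the reward rate over the last 'window'
    trials falls below threshold, discard the hypothesis and test another."""
    hyp = random.randrange(n_dims)
    recent, chosen = [], []
    for rewards_per_dim in trials:  # per-trial reward for attending each dimension
        chosen.append(hyp)
        recent.append(rewards_per_dim[hyp])
        if len(recent) >= window and sum(recent[-window:]) / window < keep_threshold:
            hyp = random.randrange(n_dims)  # serial switch to a new hypothesis
            recent = []
    return chosen

# Dimension 2 is relevant (rewarded 75% when attended; 30% otherwise).
random.seed(1)
trials = [[int(random.random() < (0.75 if d == 2 else 0.30)) for d in range(3)]
          for _ in range(100)]
print(serial_hypothesis_testing(trials)[-10:])  # typically settles on dimension 2
```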

2011

Niv, Y., & Chan, S. (2011). On the value of information and other rewards. Nature Neuroscience, 14(9), 1095–1097. https://doi.org/10.1038/nn.2918
Knowledge is not just power. Even if advance information cannot influence an upcoming event, people (and animals) prefer to know ahead of time what the outcome will be. According to the firing patterns of neurons in the lateral habenula, from the brain's perspective, knowledge is also water—or at least its equivalent in terms of reward.
Takahashi, Y., Roesch, M., Wilson, R., Toreson, K., O’Donnell, P., Niv, Y., & Schoenbaum, G. (2011). Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex. Nature Neuroscience, 14(12), 1590–1597. https://doi.org/10.1038/nn.2957
The orbitofrontal cortex has been hypothesized to carry information regarding the value of expected rewards. Such information is essential for associative learning, which relies on comparisons between expected and obtained reward for generating instructive error signals. These error signals are thought to be conveyed by dopamine neurons. To test whether orbitofrontal cortex contributes to these error signals, we recorded from dopamine neurons in orbitofrontal-lesioned rats performing a reward learning task. Lesions caused marked changes in dopaminergic error signaling. However, the effect of lesions was not consistent with a simple loss of information regarding expected value. Instead, without orbitofrontal input, dopaminergic error signals failed to reflect internal information about the impending response that distinguished externally similar states leading to differently valued future rewards. These results are consistent with current conceptualizations of orbitofrontal cortex as supporting model-based behavior and suggest an unexpected role for this information in dopaminergic error signaling.