Publications

Starting in 2022, the lab has decided to post here only the archival, open-access version of our publications. This is part of the movement to emphasize quality and content over the impact factor or prestige of the journal a paper is published in. Full citations (for referencing papers in your own work) can be found on pubmed and/or within the archival version, which will be updated once a paper is accepted for publication after peer review.

Asterisk (*) denotes equal contribution

Advanced Filters

Working Papers

Bein, O., & Niv, Y. Schemas, reinforcement learning, and the medial prefrontal cortex. https://doi.org/10.31234/osf.io/spxq9 (Original work published 2023)

Schemas are rich and complex knowledge structures about the typical unfolding of events in a context. For example, a schema of a lovely dinner at a restaurant. Schemas are central in psychology and neuroscience. Here, we suggest that reinforcement learning (RL), a computational theory of learning the structure of the world and relevant goal-oriented behavior, underlies schema learning. We synthesize literature about schemas and RL to offer that three RL principles might govern the learning of schemas: learning via prediction errors, constructing hierarchical knowledge using hierarchical RL, and dimensionality reduction through learning a simplified and abstract representation of the world. We then suggest that the orbito-medial prefrontal cortex is involved in both schemas and RL due to its involvement in dimensionality reduction and in guiding memory reactivation through interactions with posterior brain regions. Finally, we hypothesize that the amount of dimensionality reduction might underlie gradients of involvement along the ventral-dorsal and posterior-anterior axes of the orbito-medial prefrontal cortex. More specific and detailed representations might engage the ventral and posterior parts, while abstraction might shift representations toward the dorsal and anterior parts of the medial prefrontal cortex.

PDF

2023

Amir, N., Niv, Y., & Langdon, A. (2023). States as goal-directed concepts: an epistemic approach to state-representation learning. Conference on Neural Information Processing Systems (NeurIPS). https://doi.org/10.48550/arXiv.2312.02367 (Original work published 2023)

Our goals fundamentally shape how we experience the world. For example, when we are hungry, we tend to view objects in our environment according to whether or not they are edible (or tasty). Alternatively, when we are cold, we may view the very same objects according to their ability to produce heat. Computational theories of learning in cognitive systems, such as reinforcement learning, use the notion of "state-representation" to describe how agents decide which features of their environment are behaviorally-relevant and which can be ignored. However, these approaches typically assume "ground-truth" state representations that are known by the agent, and reward functions that need to be learned. Here we suggest an alternative approach in which state-representations are not assumed veridical, or even pre-defined, but rather emerge from the agent's goals through interaction with its environment. We illustrate this novel perspective by inferring the goals driving rat behavior in an odor-guided choice task and discuss its implications for developing, from first principles, an information-theoretic account of goal-directed state representation learning and behavior.

Pisupati, S., Berwian, I., Chiu, J., Ren, Y., & Niv, Y. (2023). Human inductive biases for aversive continual learning — a hierarchical Bayesian nonparametric model. Proceedings of Machine Learning Research.

Humans and animals often display remarkable continual learning abilities, adapting quickly to changing environments while retaining, reusing, and accumulating old knowledge over a lifetime. Unfortunately, in environments with adverse outcomes, the inductive biases supporting such forms of learning can turn maladaptive, yielding persistent negative beliefs that are hard to extinguish, such as those prevalent in anxiety disorders. Here, we present and model human behavioral data from a fear-conditioning task with changing latent contexts, in which participants had to predict whether visual stimuli would be followed by an aversive scream. We show that participants’ learning in our task spans three different regimes — with old knowledge either being updated, discarded (forgotten) or retained and reused in new contexts (remembered) by different participants. The latter regime corresponds to (maladaptive) spontaneous recovery of fear. We demonstrate using simulations that these behavioral regimes can be captured by varying inductive biases in Bayesian non-parametric models of contextual learning. In particular, we show that the “remembering” regime can be produced by “persistent” variants of hierarchical Dirichlet process priors over contexts and negatively biased “deterministic” beta distribution priors over outcomes. Such inductive biases correspond well to widely observed “core beliefs” that may have adaptive value in some lifelong-learning environments, at the cost of being maladaptive in other environments and tasks such as ours. Our work offers a tractable window into human inductive biases for continual learning algorithms, and could potentially help identify individual differences in learning strategies relevant for response to psychotherapy. 

 

PDF
Rouhani, N., Niv, Y., Frank, M. J., & Schwabe, L. (2023). Multiple routes to enhanced memory for emotionally relevant events. https://doi.org/10.1016/j.tics.2023.06.006 (Original work published 2023)

Events associated with aversive or rewarding outcomes are prioritized in memory. This memory boost is commonly attributed to the elicited affective response, closely linked to noradrenergic and dopaminergic modulation of hippocampal plasticity. Here, we review and compare this ‘affect’ mechanism to an additional, recently discovered, ‘prediction’ mechanism whereby memories are strengthened by the extent to which outcomes deviate from expectations, that is, by prediction errors. The mnemonic impact of prediction errors is separate from the affective outcome itself and has a distinct neural signature. While both routes enhance memory, these mechanisms are linked to different, and sometimes opposing, predictions for memory integration. We discuss new findings that highlight mechanisms by which emotional events strengthen, integrate, and segment memory.

PDF
Zorowitz, S., Solis, J., Niv, Y., & Bennett, D. (2023). Inattentive responding can induce spurious associations between task behaviour and symptom measures. https://doi.org/10.1038/s41562-023-01640-7 (Original work published 2023)

Although online samples have many advantages for psychiatric research, some potential pitfalls of this approach are not widely understood. Here we detail circumstances in which spurious correlations may arise between task behaviour and symptom scores. The problem arises because many psychiatric symptom surveys have asymmetric score distributions in the general population, meaning that careless responders on these surveys will show apparently elevated symptom levels. If these participants are similarly careless in their task performance, this may result in a spurious association between symptom scores and task behaviour. We demonstrate this pattern of results in two samples of participants recruited online (total N = 779) who performed one of two common cognitive tasks. False-positive rates for these spurious correlations increase with sample size, contrary to common assumptions. Excluding participants flagged for careless responding on surveys abolished the spurious correlations, but exclusion based on task performance alone was less effective.

PDF
Zorowitz, S., & Niv, Y. (2023). Improving the reliability of cognitive task measures: A narrative review. https://doi.org/10.1016/j.bpsc.2023.02.004 (Original work published 2023)

Cognitive tasks are capable of providing researchers with crucial insights into the relationship between cognitive processing and psychiatric phenomena. However, many recent studies have found that task measures exhibit poor reliability, which hampers their usefulness for individual-differences research. Here we provide a narrative review of approaches to improve the reliability of cognitive task measures. Specifically, we introduce a taxonomy of experiment design and analysis strategies for improving task reliability. Where appropriate, we highlight studies that are exemplary for improving the reliability of specific task measures. We hope that this article can serve as a helpful guide for experimenters who wish to design a new task, or improve an existing one, to achieve sufficient reliability for use in individual-differences research.

PDF
Berwian, I. M., Pisupati, S., & Niv, Y. (2023). A Reinforcement Learning Framework to Illuminate Change Mechanisms Underlying Specific Psychotherapy Interventions. https://doi.org/10.1016/j.biopsych.2023.02.151 (Original work published 2023)
PDF
Bedder, R. L., Pisupati, S., & Niv, Y. (2023). Modelling Rumination as a State-Inference Process. Cognitive Science Conference Proceedings 2023. Referenced from psyarxiv.com: Modelling Rumination as a State-Inference Process (Original work published 2023)

Rumination is a kind of repetitive negative thinking that involves prolonged sampling of negative episodes from one’s past, typically prompted by a present negative experience. We model rumination as an attempt at hidden-state inference, formalized as a partially-observable Markov decision process (POMDP). Using this allegorical model, we demonstrate conditions under which continuous, prolonged collection of samples from memory is the optimal policy. Consistent with phenomenological observations from clinical and experimental work, we show that prolonged sampling (i.e., chronic rumination), formalized as needing to sample more evidence before selecting an action, is required when possible negative outcomes increase in magnitude, when states of the world with negative outcomes are a priori more likely, and when samples are more variable than expected. By demonstrating that prolonged sampling may allow for optimal action selection under certain environmental conditions, we show how rumination may be adaptive for solving particular problems.

PDF

2022

Barbosa, J., Stein, H., Zorowitz, S., Niv, Y., Summerfield, C., Soto-Faraco, S., & Hyafil, A. (2022). A practical guide for studying human behavior in the lab. https://doi.org/10.3758/s13428-022-01793-9 (Original work published 2022)

In the last few decades, the field of neuroscience has witnessed major technological advances that have allowed researchers to measure and control neural activity with great detail. Yet, behavioral experiments in humans remain an essential approach to investigate the mysteries of the mind. Their relatively modest technological and economic requisites make behavioral research an attractive and accessible experimental avenue for neuroscientists with very diverse backgrounds. However, like any experimental enterprise, it has its own inherent challenges that may pose practical hurdles, especially to less experienced behavioral researchers. Here, we aim at providing a practical guide for a steady walk through the workflow of a typical behavioral experiment with human subjects. This primer concerns the design of an experimental protocol, research ethics, and subject care, as well as best practices for data collection, analysis, and sharing. The goal is to provide clear instructions for both beginners and experienced researchers from diverse backgrounds in planning behavioral experiments.

PDF

Senior faculty are incredibly powerful. In a two-page tenure letter, they can make or break a career. This power has an outsized impact on scholars with marginalized identities, such as Black academics, who are promoted with tenure at lower rates than their White colleagues. We suggest that this difference in tenure rates is due to an implicit, overly narrow definition of academic excellence that does not recognize all contributions that Black scholars make to their departments, institutions and academia in general, as well as the many invisible extra burdens of mentoring and representation that these scholars bear. Our goal is to empower letter-writers to counteract these factors and promote the academic culture we all want to support. Towards this end, and inspired by Tema Okun’s (2021) antidotes to “White supremacy culture” in academia, we propose to faculty with majority privilege a set of practical steps for writing inclusive, anti-racist tenure letters. Our recommendations address what to do before writing the letter, what to include (and not include) in the letter itself, and what to do after writing the letter to further support our excellent colleagues. Written from the perspective of USA-based, mostly non-Black, academics and non-academics in STEM fields who are learning about and working toward Black liberation in academia, we hope these recommendations, and their future refinement, can support widespread ongoing work toward an inclusive academia that appreciates and rewards diverse ways of doing, learning and knowing.

Realistic and complex decision tasks often allow for many possible solutions. How do we find the correct one? Introspection suggests a process of trying out solutions one after the other until success. However, such methodical serial testing may be too slow, especially in environments with noisy feedback. Alternatively, the underlying learning process may involve implicit reinforcement learning that learns about many possibilities in parallel. Here we designed a multi-dimensional probabilistic active-learning task tailored to study how people learn to solve such complex problems. Participants configured three-dimensional stimuli by selecting features for each dimension and received probabilistic reward feedback. We manipulated task complexity by changing how many feature dimensions were relevant to maximizing reward, as well as whether this information was provided to the participants. To investigate how participants learn the task, we examined models of serial hypothesis testing, feature-based reinforcement learning, and combinations of the two strategies. Model comparison revealed evidence for hypothesis testing that relies on reinforcement-learning when selecting what hypothesis to test. The extent to which participants engaged in hypothesis testing depended on the instructed task complexity: people tended to serially test hypotheses when instructed that there were fewer relevant dimensions, and relied more on gradual and parallel learning of feature values when the task was more complex. This demonstrates a strategic use of task information to balance the costs and benefits of the two methods of learning.

Cognitive tasks are capable of providing researchers with crucial insights into the re- lationship between cognitive processing and psychiatric phenomena across individuals. However, many recent studies have found that task measures exhibit poor reliability, which hampers their utility for individual-differences research. Here we provide a nar- rative review of approaches to improve the reliability of cognitive task measures. First, we review methods of calculating reliability and discuss some nuances that are specific to cognitive tasks. Then, we introduce a taxonomy of approaches for improving task reliability. Where appropriate, we highlight studies that are exemplary for improving the reliability of specific task measures. We hope that this article can serve as a helpful guide for experimenters who wish to design a new task, or improve an existing one, to achieve sufficient reliability for use in individual-differences research.

Pisupati, S., & Niv, Y. (2022). The challenges of lifelong learning in biological and artificial systems. Trends in Cognitive Sciences.

How do biological systems learn continuously throughout their lifespans, adapting to change while retaining old knowledge, and how can these principles be applied to artificial learning systems? In this Forum article we outline challenges and strategies of ‘lifelong learning’ in biological and artificial systems, and argue that a collaborative study of each system’s failure modes can benefit both.

PDF
Song, M., Jones, C. E., Monfils, M.-H., & Niv, Y. (2022). Explaining the effectiveness of fear extinction through latent-cause inference. Neurons, Behavior, Data Analysis, and Theory. PDF: Explaining the effectiveness of fear extinction through latent-cause inference
Acquiring fear responses to predictors of aversive outcomes is crucial for survival. At the same time, it is important to be able to modify such associations when they are maladaptive, for instance in treating anxiety and trauma-related disorders. Standard extinction procedures can reduce fear temporarily, but with sufficient delay or with reminders of the aversive experience, fear often returns. The latent-cause inference framework explains the return of fear by presuming that animals learn a rich model of the environment, in which the standard extinction procedure triggers the inference of a new latent cause, preventing the unlearning of the original aversive associations. This computational framework had previously inspired an alternative extinction paradigm – gradual extinction – which indeed was shown to be more effective in reducing the return of fear. However, the original framework was not sufficient to explain the pattern of results seen in the experiments. Here, we propose a formal model to explain the effectiveness of gradual extinction in reducing spontaneous recovery and reinstatement effects, in contrast to the ineffectiveness of standard extinction and a gradual reverse control procedure. We demonstrate through quantitative simulation that our model can explain qualitative behavioral differences across different extinction procedures as seen in the empirical study. We verify the necessity of several key assumptions added to the latent-cause framework, which suggest potential general principles of animal learning and provide novel predictions for future experiments.
Song, M., Takahashi, Y., Burton, A., Roesch, M., Schoenbaum, G., Niv, Y., & Langdon, A. (2022). Minimal cross-trial generalization in learning the representation of an odor-guided choice task. PLOS Computational Biology, 18(3). PDF: Minimal cross-trial generalization in learning the representation of an odor-guided choice task
There is no single way to represent a task. Indeed, despite experiencing the same task events and contingencies, different subjects may form distinct task representations. As experimenters, we often assume that subjects represent the task as we envision it. However, such a representation cannot be taken for granted, especially in animal experiments where we cannot deliver explicit instruction regarding the structure of the task. Here, we tested how rats represent an odor-guided choice task in which two odor cues indicated which of two responses would lead to reward, whereas a third odor indicated free choice among the two responses. A parsimonious task representation would allow animals to learn from the forced trials what is the better option to choose in the free-choice trials. However, animals may not necessarily generalize across odors in this way. We fit reinforcement-learning models that use different task representations to trial-by-trial choice behavior of individual rats performing this task, and quantified the degree to which each animal used the more parsimonious representation, generalizing across trial types. Model comparison revealed that most rats did not acquire this representation despite extensive experience. Our results demonstrate the importance of formally testing possible task representations that can afford the observed behavior, rather than assuming that animals’ task representations abide by the generative task structure that governs the experimental design.
Langdon, A., Botvinick, M., Nakahara, H., Tanaka, K., Matsumoto, M., & Kanai, R. (2022). Meta-learning, social cognition and consciousness in brains and machines. Neural Networks, 145, 80-89. PDF: Meta-learning, social cognition and consciousness in brains and machines
The intersection between neuroscience and artificial intelligence (AI) research has created synergistic effects in both fields. While neuroscientific discoveries have inspired the development of AI architectures, new ideas and algorithms from AI research have produced new ways to study brain mechanisms. A well-known example is the case of reinforcement learning (RL), which has stimulated neuroscience research on how animals learn to adjust their behavior to maximize reward. In this review article, we cover recent collaborative work between the two fields in the context of meta-learning and its extension to social cognition and consciousness. Meta-learning refers to the ability to learn how to learn, such as learning to adjust hyperparameters of existing learning algorithms and how to use existing models and knowledge to efficiently solve new tasks. This meta-learning capability is important for making existing AI systems more adaptive and flexible to efficiently solve new tasks. Since this is one of the areas where there is a gap between human performance and current AI systems, successful collaboration should produce new ideas and progress. Starting from the role of RL algorithms in driving neuroscience, we discuss recent developments in deep RL applied to modeling prefrontal cortex functions. Even from a broader perspective, we discuss the similarities and differences between social cognition and meta-learning, and finally conclude with speculations on the potential links between intelligence as endowed by model-based RL and consciousness. For future work we highlight data efficiency, autonomy and intrinsic motivation as key research areas for advancing both fields.

2021

Niv, ., Hitchcock, ., Berwian, I., & Schoen, . (2021). Toward Precision Cognitive Behavioral Therapy via Reinforcement Learning Theory (Ch. 12). In LM Williams and LM Hacks (Eds). Precision Psychiatry. American Psychiatric Association. Publisher’s Version: Toward Precision Cognitive Behavioral Therapy via Reinforcement Learning Theory (Ch. 12)
The demonstration that human decision-making can systematically violate the laws of rationality has had a wide impact on behavioural sciences. In this study, we use a pupillary index to adjudicate between two existing hypotheses about how irrational biases emerge: the hypothesis that biases result from fast, effortless processing and the hypothesis that biases result from more extensive integration. While effortless processing is associated with smaller pupillary responses, more extensive integration is associated with larger pupillary responses. Thus, we tested the relationship between pupil response and choice behaviour on six different foundational decision-making tasks that are classically used to demonstrate irrational biases. Participants demonstrated the expected systematic biases and their pupillary measurements satisfied pre-specified quality checks. Planned analyses returned inconclusive results, but exploratory examination of the data revealed an association between high pupillary responses and biased decisions. The findings provide preliminary support for the hypothesis that biases arise from gradual information integration.

Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of ‘value’ in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.

How do we evaluate a group of people after a few negative experiences with some members but mostly positive experiences otherwise? How do rare experiences influence our overall impression? We show that rare events may be overweighted due to normative inference of the hidden causes that are believed to generate the observed events. We propose a Bayesian inference model that organizes environmental statistics by combining similar events and separating outlying observations. Relying on the model’s inferred latent causes for group evaluation overweights rare or variable events. We tested the model’s predictions in eight experiments where participants observed a sequence of social or non-social behaviours and estimated their average. As predicted, estimates were biased toward sparse events when estimating after seeing all observations, but not when tracking a summary value as observations accrued. Our results suggest that biases in evaluation may arise from inferring the hidden causes of group members’ behaviours.
Hitchcock, P., Forman, E., Rothstein, N., Zhang, F., Kounios, J., Niv, Y., & Sims, C. (2021). Rumination derails reinforcement learning with possible implications for ineffective behavior. Clinical Psychological Science. PDF: Rumination derails reinforcement learning with possible implications for ineffective behavior

How does rumination affect reinforcement learning — the ubiquitous process by which we adjust behavior after error in order to behave more effectively in the future? In a within-subject design (n=49), we tested whether experimentally induced rumination disrupts reinforcement learning in a multidimensional learning task previously shown to rely on selective attention. Rumination impaired performance, yet unexpectedly this impairment could not be attributed to decreased attentional breadth (quantified using a “decay” parameter in a computational model). Instead, trait rumination (between subjects) was associated with higher decay rates (implying narrower attention), yet not with impaired performance. Our task-performance results accord with the possibility that state rumination promotes stress-generating behavior in part by disrupting reinforcement learning. The trait-rumination finding accords with the predictions of a prominent model of trait rumination (the attentional-scope model). More work is needed to understand the specific mechanisms by which state rumination disrupts reinforcement learning.

Memory helps guide behavior, but which experiences from the past are prioritized? Classic models of learning posit that events associated with unpredictable outcomes as well as, paradoxically, predictable outcomes, deploy more attention and learning for those events. Here, we test reinforcement learning and subsequent memory for those events, and treat signed and unsigned reward prediction errors (RPEs), experienced at the reward-predictive cue or reward outcome, as drivers of these two seemingly contradictory signals. By fitting reinforcement learning models to behavior, we find that both RPEs contribute to learning by modulating a dynamically changing learning rate. We further characterize the effects of these RPE signals on memory, and show that both signed and unsigned RPEs enhance memory, in line with midbrain dopamine and locus-coeruleus modulation of hippocampal plasticity, thereby reconciling separate findings in the literature.
Radulescu, A., Shin, Y. S., & Niv, Y. (2021). Human representation learning. Annual Reviews in Neuroscience. PDF: Human representation learning

The central theme of this review is the dynamic interaction between infor- mation selection and learning. We pose a fundamental question about this interaction: How do we learn what features of our experiences are worth learning about? In humans, this process depends on attention and memory, two cognitive functions that together constrain representations of the world to features that are relevant for goal attainment. Recent evidence suggests that the representations shaped by attention and memory are themselves in- ferred from experience with each task. We review this evidence and place it in the context of work that has explicitly characterized representation learning as statistical inference. We discuss how inference can be scaled to real-world decisions by approximating beliefs based on a small number of experiences. Finally, we highlight some implications of this inference process for human decision-making in social environments.

Understanding the brain requires us to answer both what the brain does, and how it does it. Using a series of examples, I make the case that behavior is often more useful than neuroscientific measurements for answering the first question. Moreover, I show that even for “how” questions that pertain to neural mechanism, a well-crafted behavioral paradigm can offer deeper insight and stronger constraints on computational and mechanistic models than do many highly challenging (and very expensive) neural studies. I conclude that behavioral, rather than neuroscientific research, is essential for understanding the brain, contrary to the opinion of prominent funding bodies and scientific journals, who erroneously place neural data on a pedestal and consider behavior to be subsidiary.

Much of traditional neuroeconomics proceeds from the hypothesis that value is reified in the brain, that is, that there are neurons or brain regions whose responses serve the discrete purpose of encoding value. This hypothesis is supported by the finding that the activity of many neurons covaries with subjective value as estimated in specific tasks and has led to the idea that the primary function of the orbitofrontal cortex is to compute and signal economic value. Here we consider an alternative: that economic value, in the cardinal, common-currency sense, is not represented in the brain and used for choice by default. This idea is motivated by consideration of the economic concept of value, which places important epistemic constraints on our ability to identify its neural basis. It is also motivated by the behavioral economics literature, especially work on heuristics, which proposes value-free process models for much if not all of choice. Finally, it is buoyed by recent neural and behavioral findings regarding how animals and humans learn to choose between options. In light of our hypothesis, we critically reevaluate putative neural evidence for the representation of value and explore an alternative: direct learning of action policies. We delineate how this alternative can provide a robust account of behavior that concords with existing empirical data.

Chan, S. C., Schuck, N. W., Lopatina, N., Schoenbaum, G., & Niv, Y. (2021). Orbitofrontal cortex and learning predictions of state transitions. Behavioral Neuroscience. PDF: Orbitofrontal cortex and learning predictions of state transitions
Learning the transition structure of the environment – the probabilities of transitioning from one environmental state to another – is a key prerequisite for goal-directed planning and model-based decision making. To investigate the role of the orbitofrontal cortex (OFC) in goal-directed planning and decision making, we used fMRI to assess univariate and multivariate activity in the OFC while humans experienced state transitions that varied in degree of surprise. In convergence with recent evidence, we found that OFC activity was related to greater learning about transition structure, both across subjects and on a trial-by-trial basis. However, this relationship was inconsistent with a straightforward interpretation of OFC activity as representing a state prediction error that would facilitate learning of transitions via error-correcting mechanisms. The state prediction error hypothesis predicts that OFC activity at the time of observing an outcome should increase expectation of that observed outcome on subsequent trials. Instead, our results showed that OFC activity was associated with increased expectation of the more probable outcome; that is, with more optimal predictions. Our findings add to the evidence of OFC involvement in learning state-to-state transition structure, while providing new constraints for algorithmic hypotheses regarding how these transitions are learned.
Bennett, D., Davidson, G., & Niv, Y. (2021). A model of mood as integrated advantage. Psychological Review. PubMed Central: A model of mood as integrated advantage
Mood is an integrative and diffuse affective state that is thought to exert a pervasive effect on cognition and behavior. At the same time, mood itself is thought to fluctuate slowly as a product of feedback from interactions with the environment. Here we present a new computational theory of the valence of mood—the Integrated Advantage model—that seeks to account for this bidirectional interaction. Adopting theoretical formalisms from reinforcement learning, we propose to conceptualize the valence of mood as a leaky integral of an agent’s appraisals of the Advantage of its actions. This model generalizes and extends previous models of mood wherein affective valence was conceptualized as a moving average of reward prediction errors. We give a full theoretical derivation of the Integrated Advantage model and provide a functional explanation of how an integrated-Advantage variable could be deployed adaptively by a biological agent to accelerate learning in complex and/or stochastic environments. Specifically, drawing on stochastic optimization theory, we propose that an agent can utilize our hypothesized form of mood to approximate a momentum-based update to its behavioral policy, thereby facilitating rapid learning of optimal actions. We then show how this model of mood provides a principled and parsimonious explanation for a number of contextual effects on mood from the affective science literature, including expectation- and surprise-related effects, counterfactual effects from information about foregone alternatives, action-typicality effects, and action/inaction asymmetry.

2020

Daniel, R., Radulescu, A., & Niv, Y. (2020). Intact reinforcement learning but impaired attentional control during multidimensional probabilistic learning in older adults. Journal of Neuroscience, 40(5), 1084-1096. https://doi.org/10.1523/JNEUROSCI.0254-19.2019

To efficiently learn optimal behavior in complex environments, humans rely on an interplay of learning and attention. Healthy aging has been shown to independently affect both of these functions. Here, we investigate how reinforcement learning and selective attention interact during learning from trial and error across age groups. We acquired behavioral and fMRI data from older and younger adults performing two probabilistic learning tasks with varying attention demands. While learning in the unidimensional task did not dier across age groups, older adults performed worse than younger adults in the multidimensional task, which required high levels of selective attention. Computational modeling showed that choices of older adults are better predicted by reinforcement learning than Bayesian inference, and that older adults rely more on reinforcement learning based predictions than younger adults. Conversely, a higher proportion of younger adults' choices was predicted by a computationally demanding Bayesian approach. In line with the behavioral findings, we observed no group differences in reinforcement learning related fMRI activation. Specifically, prediction-error activation in the nucleus accumbens was similar across age groups, and numerically higher in older adults. However, activation in the default mode was less suppressed in older adults for higher

attentional task demands, and the level of suppression correlated with behavioral performance. Our results indicate that healthy aging does not signicantly impair simple reinforcement learning. However, in complex environments, older adults rely more heavily on suboptimal reinforcement-learning strategies supported by the ventral striatum, whereas younger adults utilize attention processes supported by cortical networks.

Free will is anything but free. With it comes the onus of choice: not only what to do, but which inner voice to listen to — our ‘automatic’ response system, which some consider ‘impulsive’ or ‘irrational’, or our supposedly more rational deliberative one. Rather than a devil and angel sitting on our shoulders, research suggests that we have two decision-making systems residing in the brain, in our basal ganglia. Neither system is the devil and neither is irrational. They both have our best interests at heart and aim to suggest the best course of action calculated through rational algorithms. However, the algorithms they use are qualitatively different and do not always agree on which action is optimal. The rivalry between habitual, fast action and deliberative, purposeful action is an ongoing one.
We remember when things change. Particularly salient are experiences where there is a change in rewards, eliciting reward prediction errors (RPEs). How do RPEs influence our memory of those experiences? One idea is that this signal directly enhances the encoding of memory. Another, not mutually exclusive, idea is that the RPE signals a deeper change in the environment, leading to the mnemonic separation of subsequent experiences from what came before, thereby creating a new latent context and a more separate memory trace. We tested this in four experiments where participants learned to predict rewards associated with a series of trial-unique images. High-magnitude RPEs indicated a change in the underlying distribution of rewards. To test whether these large RPEs created a new latent context, we first assessed recognition priming for sequential pairs that included a high-RPE event or not (Exp. 1: n = 27 & Exp. 2: n = 83). We found evidence of recognition priming for the high-RPE event, indicating that the high-RPE event is bound to its predecessor in memory. Given that high-RPE events are themselves preferentially remembered (Rouhani, Norman, & Niv, 2018), we next tested whether there was an event boundary across a high-RPE event (i.e., excluding the high-RPE event itself; Exp. 3: n = 85). Here, sequential pairs across a high RPE no longer showed recognition priming whereas pairs within the same latent reward state did, providing initial evidence for an RPE-modulated event boundary. We then investigated whether RPE event boundaries disrupt temporal memory by asking participants to order and estimate the distance between two events that had either included a high-RPE event between them or not (Exp. 4). We found (n = 49) and replicated (n = 77) worse sequence memory for events across a high RPE. In line with our recognition priming results, we did not find sequence memory to be impaired between the high-RPE event and its predecessor, but instead found worse sequence memory for pairs across a high-RPE event. Moreover, greater distance between events at encoding led to better sequence memory for events across a low-RPE event, but not a high-RPE event, suggesting separate mechanisms for the temporal ordering of events within versus across a latent reward context. Altogether, these findings demonstrate that high-RPE events are both more strongly encoded, show intact links with their predecessor, and act as event boundaries that interrupt the sequential integration of events. We captured these effects in a variant of the Context Maintenance and Retrieval model (CMR; Polyn, Norman, & Kahana, 2009), modified to incorporate RPEs into the encoding process.
With the wide adoption of functional magnetic resonance imaging (fMRI) by cognitive neuroscience researchers, large volumes of brain imaging data have been accumulated in recent years. Aggregating these data to derive scientific insights often faces the challenge that fMRI data are high-dimensional, heterogeneous across people, and noisy. These challenges demand the development of computational tools that are tailored both for the neuroscience questions and for the properties of the data. We review a few recently developed algorithms in various domains of fMRI research: fMRI in naturalistic tasks, analyzing full-brain functional connectivity, pattern classification, inferring representational similarity and modeling structured residuals. These algorithms all tackle the challenges in fMRI similarly: they start by making clear statements of assumptions about neural data and existing domain knowledge, incorporate those assumptions and domain knowledge into probabilistic graphical models, and use those models to estimate properties of interest or latent structures in the data. Such approaches can avoid erroneous findings, reduce the impact of noise, better utilize known properties of the data, and better aggregate data across groups of subjects. With these successful cases, we advocate wider adoption of explicit model construction in cognitive neuroscience. Although we focus on fMRI, the principle illustrated here is generally applicable to brain data of other modalities.
Langdon, A., & Daw, N. (2020). Beyond the Average View of Dopamine. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2020.04.006
Dopamine (DA) responses are synonymous with the ‘reward prediction error’ of reinforcement learning (RL), and are thought to update neural estimates of expected value. A recent study by Dabney et al. enriches this picture, demonstrating that DA neurons track variability in rewards, providing a readout of risk in the brain.
Sharpe, M. J., Batchelor, H. M., Mueller, L. E., Chang, C. Y., Maes, E. J., Niv, Y., & Schoenbaum, G. (2020). Dopamine transients do not act as model-free prediction errors during associative learning. Nature Communications, 11(1), 106. https://doi.org/10.1038/s41467-019-13953-1 (Original work published 2020)
Dopamine neurons are proposed to signal the reward prediction error in model-free reinforcement learning algorithms. This term represents the unpredicted or ‘excess’ value of the rewarding event, value that is then added to the intrinsic value of any antecedent cues, contexts or events. To support this proposal, proponents cite evidence that artificially-induced dopamine transients cause lasting changes in behavior. Yet these studies do not generally assess learning under conditions where an endogenous prediction error would occur. Here, to address this, we conducted three experiments where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into associations with the later events, whether valueless cues or valued rewards. These results show that in learning situations appropriate for the appearance of a prediction error, dopamine transients support associative, rather than model-free, learning.

2019

Bravo-Hermsdorff, G., Felso, V., Ray, E., Gunderson, L. M., Helander, M. E., Maria, J., & Niv, Y. (2019). Gender and collaboration patterns in a temporal scientific authorship network. Applied Network Science, 4(1), 112. https://doi.org/10.1007/s41109-019-0214-4 (Original work published 2019)
One can point to a variety of historical milestones for gender equality in STEM (science, technology, engineering, and mathematics), however, practical effects are incremental and ongoing. It is important to quantify gender differences in subdomains of scientific work in order to detect potential biases and monitor progress. In this work, we study the relevance of gender in scientific collaboration patterns in the Institute for Operations Research and the Management Sciences (INFORMS), a professional society with sixteen peer-reviewed journals. Using their publication data from 1952 to 2016, we constructed a large temporal bipartite network between authors and publications, and augmented the author nodes with gender labels. We characterized differences in several basic statistics of this network over time, highlighting how they have changed with respect to relevant historical events. We find a steady increase in participation by women (e.g., fraction of authorships by women and of new women authors) starting around 1980. However, women still comprise less than 25% of the INFORMS society and an even smaller fraction of authors with many publications. Moreover, we describe a methodology for quantifying the structural role of an authorship with respect to the overall connectivity of the network, using it to measure subtle differences between authorships by women and by men. Specifically, as measures of structural importance of an authorship, we use effective resistance and contraction importance, two measures related to diffusion throughout a network. As a null model, we propose a degree-preserving temporal and geometric network model with emergent communities. Our results suggest the presence of systematic differences between the collaboration patterns of men and women that cannot be explained by only local statistics.
Niv, Y. (2019). Learning task-state representations. Nature Neuroscience, 22(10), 1544–1553. https://doi.org/10.1038/s41593-019-0470-8
Arguably, the most difficult part of learning is deciding what to learn about. Should I associate the positive outcome of safely completing a street-crossing with the situation ‘the car approaching the crosswalk was red' or with ‘the approaching car was slowing down'? In this Perspective, we summarize our recent research into the computational and neural underpinnings of ‘representation learning'—how humans (and other animals) construct task representations that allow efficient learning and decision-making. We first discuss the problem of learning what to ignore when confronted with too much information, so that experience can properly generalize across situations. We then turn to the problem of augmenting perceptual information with inferred latent causes that embody unobservable task-relevant information, such as contextual knowledge. Finally, we discuss recent findings regarding the neural substrates of task representations that suggest the orbitofrontal cortex represents ‘task states', deploying them for decision-making and learning elsewhere in the brain.
Schuck, N., & Niv, Y. (2019). Sequential replay of nonspatial task states in the human hippocampus. Science. https://doi.org/10.1126/science.aaw5181
Sequential neural activity patterns related to spatial experiences are “replayed” in the hippocampus of rodents during rest. We investigated whether replay of nonspatial sequences can be detected noninvasively in the human hippocampus. Participants underwent functional magnetic resonance imaging (fMRI) while resting after performing a decision-making task with sequential structure. Hippocampal fMRI patterns recorded at rest reflected sequentiality of previously experienced task states, with consecutive patterns corresponding to nearby states. Hippocampal sequentiality correlated with the fidelity of task representations recorded in the orbitofrontal cortex during decision-making, which were themselves related to better task performance. Our findings suggest that hippocampal replay may be important for building representations of complex, abstract tasks elsewhere in the brain and establish feasibility of investigating fast replay signals with fMRI.
McDougle, S., Butcher, P., Parvin, D., Mushtaq, F., Niv, Y., Ivry, R., & Taylor, J. (2019). Neural Signatures of Prediction Errors in a Decision-Making Task Are Modulated by Action Execution Failures. Current Biology. https://doi.org/10.1016/j.cub.2019.04.011
Decisions must be implemented through actions, and actions are prone to error. As such, when an expected outcome is not obtained, an individual should be sensitive to not only whether the choice itself was suboptimal but also whether the action required to indicate that choice was executed successfully. The intelligent assignment of credit to action execution versus action selection has clear ecological utility for the learner. To explore this, we used a modified version of a classic reinforcement learning task in which feedback indicated whether negative prediction errors were, or were not, associated with execution errors. Using fMRI, we asked if prediction error computations in the human striatum, a key substrate in reinforcement learning and decision making, are modulated when a failure in action execution results in the negative outcome. Participants were more tolerant of non-rewarded outcomes when these resulted from execution errors versus when execution was successful, but reward was withheld. Consistent with this behavior, a model-driven analysis of neural activity revealed an attenuation of the signal associated with negative reward prediction errors in the striatum following execution failures. These results converge with other lines of evidence suggesting that prediction errors in the mesostriatal dopamine system integrate high-level information during the evaluation of instantaneous reward outcomes.
Sharpe, M., Batchelor, H. M., Mueller, L., Chang, C. Y., Maes, E., Niv, Y., & Schoenbaum, G. (2019). Dopamine transients delivered in learning contexts do not act as model-free prediction errors. BioRxiv. https://doi.org/10.1101/574541
Dopamine neurons fire transiently in response to unexpected rewards. These neural correlates are proposed to signal the reward prediction error described in model-free reinforcement learning algorithms. This error term represents the unpredicted or excess value of the rewarding event. In model-free reinforcement learning, this value is then stored as part of the learned value of any antecedent cues, contexts or events, making them intrinsically valuable, independent of the specific rewarding event that caused the prediction error. In support of equivalence between dopamine transients and this model-free error term, proponents cite causal optogenetic studies showing that artificially induced dopamine transients cause lasting changes in behavior. Yet none of these studies directly demonstrate the presence of cached value under conditions appropriate for associative learning. To address this gap in our knowledge, we conducted three studies where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquired value and instead entered into value-independent associative relationships with the other cues or rewards. These results show that dopamine transients, constrained within appropriate learning situations, support valueless associative learning.
Zhou, J., Gardner, M. P. H., Stalnaker, T., Ramus, S., Wikenheiser, A., Niv, Y., & Schoenbaum, G. (2019). Rat Orbitofrontal Ensemble Activity Contains Multiplexed but Dissociable Representations of Value and Task Structure in an Odor Sequence Task. Current Biology, 29(6), 897–907.e3. https://doi.org/10.1016/j.cub.2019.01.048 (Original work published 2019)
The orbitofrontal cortex (OFC) has long been implicated in signaling information about expected outcomes to facilitate adaptive or flexible behavior. Current proposals focus on signaling of expected value versus the representation of a value-agnostic cognitive map of the task. While often suggested as mutually exclusive, these alternatives may represent extreme ends of a continuum determined by task complexity and experience. As learning proceeds, an initial, detailed cognitive map might be acquired, based largely on external information. With more experience, this hypothesized map can then be tailored to include relevant abstract hidden cognitive constructs. The map would default to an expected value in situations where other attributes are largely irrelevant, but, in richer tasks, a more detailed structure might continue to be represented, at least where relevant to behavior. Here, we examined this by recording single-unit activity from the OFC in rats navigating an odor sequence task analogous to a spatial maze. The odor sequences provided a mappable state space, with 24 unique “positions” defined by sensory information, likelihood of reward, or both. Consistent with the hypothesis that the OFC represents a cognitive map tailored to the subjects' intentions or plans, we found a close correspondence between how subjects were using the sequences and the neural representations of the sequences in OFC ensembles. Multiplexed with this value-invariant representation of the task, we also found a representation of the expected value at each location. Thus, the value and task structure co-existed as dissociable components of the neural code in OFC.
Bennett, D., Silverstein, S., & Niv, Y. (2019). The two cultures of computational psychiatry. In JAMA Psychiatry. https://doi.org/10.1001/jamapsychiatry.2019.0231
Translating advances in neuroscience into benefits for patients with mental illness presents enormous challenges because it involves both the most complex organ, the brain, and its interaction with a similarly complex environment. Dealing with such complexities demands powerful techniques. Computational psychiatry combines multiple levels and types of computation with multiple types of data in an effort to improve understanding, prediction and treatment of mental illness. Computational psychiatry, broadly defined, encompasses two complementary approaches: data driven and theory driven. Data-driven approaches apply machine-learning methods to high-dimensional data to improve classification of disease, predict treatment outcomes or improve treatment selection. These approaches are generally agnostic as to the underlying mechanisms. Theory-driven approaches, in contrast, use models that instantiate prior knowledge of, or explicit hypotheses about, such mechanisms, possibly at multiple levels of analysis and abstraction. We review recent advances in both approaches, with an emphasis on clinical applications, and highlight the utility of combining them.
Cai, M. B., Schuck, N., Pillow, J., & Niv, Y. (2019). Representational structure or task structure? Bias in neural representational similarity analysis and a Bayesian method for reducing bias. PLoS Computational Biology. https://doi.org/10.1371/journal.pcbi.1006299
The activity of neural populations in the brains of humans and animals can exhibit vastly different spatial patterns when faced with different tasks or environmental stimuli. The degrees of similarity between these neural activity patterns in response to different events are used to characterize the representational structure of cognitive states in a neural population. The dominant methods of investigating this similarity structure first estimate neural activity patterns from noisy neural imaging data using linear regression, and then examine the similarity between the estimated patterns. Here, we show that this approach introduces spurious bias structure in the resulting similarity matrix, in particular when applied to fMRI data. This problem is especially severe when the signal-to-noise ratio is low and in cases where experimental conditions cannot be fully randomized in a task. We propose Bayesian Representational Similarity Analysis (BRSA), an alternative method for computing representational similarity, in which we treat the covariance structure of neural activity patterns as a hyper-parameter in a generative model of the neural data. By marginalizing over the unknown activity patterns, we can directly estimate this covariance structure from imaging data. This method offers significant reductions in bias and allows estimation of neural representational similarity with previously unattained levels of precision at low signal-to-noise ratio, without losing the possibility of deriving an interpretable distance measure from the estimated similarity. The method is closely related to Pattern Component Model (PCM), but instead of modeling the estimated neural patterns as in PCM, BRSA models the imaging data directly and is suited for analyzing data in which the order of task conditions is not fully counterbalanced. The probabilistic framework allows for jointly analyzing data from a group of participants. The method can also simultaneously estimate a signal-to-noise ratio map that shows where the learned representational structure is supported more strongly. Both this map and the learned covariance matrix can be used as a structured prior for maximum a posteriori estimation of neural activity patterns, which can be further used for fMRI decoding. Our method therefore paves the way towards a more unified and principled analysis of neural representations underlying fMRI signals. We make our tool freely available in Brain Imaging Analysis Kit (BrainIAK).
Radulescu, A., Niv, Y., & Ballard, I. (2019). Holistic Reinforcement Learning: The Role of Structure and Attention. In Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2019.01.010
Compact representations of the environment allow humans to behave efficiently in a complex world. Reinforcement learning models capture many behavioral and neural effects but do not explain recent findings showing that structure in the environment influences learning. In parallel, Bayesian cognitive models predict how humans learn structured knowledge but do not have a clear neurobiological implementation. We propose an integration of these two model classes in which structured knowledge learned via approximate Bayesian inference acts as a source of selective attention. In turn, selective attention biases reinforcement learning towards relevant dimensions of the environment. An understanding of structure learning will help to resolve the fundamental challenge in decision science: explaining why people make the decisions they do.
Radulescu, A., & Niv, Y. (2019). State representation in mental illness. In Current Opinion in Neurobiology. https://doi.org/10.1016/j.conb.2019.03.011
Reinforcement learning theory provides a powerful set of computational ideas for modeling human learning and decision making. Reinforcement learning algorithms rely on state representations that enable efficient behavior by focusing only on aspects relevant to the task at hand. Forming such representations often requires selective attention to the sensory environment, and recalling memories of relevant past experiences. A striking range of psychiatric disorders, including bipolar disorder and schizophrenia, involve changes in these cognitive processes. We review and discuss evidence that these changes can be cast as altered state representation, with the goal of providing a useful transdiagnostic dimension along which mental disorders can be understood and compared.
Rouhani, N., & Niv, Y. (2019). Depressive symptoms bias the prediction-error enhancement of memory towards negative events in reinforcement learning. Psychopharmacology, 236(8), 2425–2435. https://doi.org/10.1007/s00213-019-05322-z (Original work published 2019)

Rationale. Depression is a disorder characterized by sustained negative affect and blunted positive affect, suggesting potential abnormalities in reward learning and its interaction with episodic memory. Objectives. This study investigated how reward prediction errors experienced during learning modulate memory for rewarding events in individuals with depressive and non-depressive symptoms.

Methods. Across three experiments, participants learned the average values of two scene categories in two learning contexts. Each learning context had either high or low outcome variance, allowing us to test the effects of small and large prediction errors on learning and memory. Participants were later tested for their memory of trial-unique scenes that appeared alongside outcomes. We compared learning and memory performance of individuals with self-reported depressive symptoms (N = 101) to those without (N = 184).

Results. Although there were no overall differences in reward learning between the depressive and non-depressive group, depression severity within the depressive group predicted greater error in estimating the values of the scene categories. Similarly, there were no overall differences in memory performance. However, in depressive participants, negative prediction errors enhanced episodic memory more so than did positive prediction errors, and vice versa for non-depressive participants who showed a larger effect of positive prediction errors on memory. These results reflected differences in memory both within group and across groups.

Conclusions. Individuals with self-reported depressive symptoms showed relatively intact reinforcement learning, but demonstrated a bias for encoding events that accompanied surprising negative outcomes versus surprising positive ones. We discuss a potential neural mechanism supporting these effects, which may underlie or contribute to the excessive negative affect observed in depression.

Langdon, A., Song, M., & Niv, Y. (2019). Uncovering the ‘state’: Tracing the hidden state representations that structure learning and decision-making. Behavioural Processes, 167, 103891. https://doi.org/10.1016/j.beproc.2019.103891 (Original work published 2019)
We review the abstract concept of a ‘state' – an internal representation posited by reinforcement learning theories to be used by an agent, whether animal, human or artificial, to summarize the features of the external and internal environment that are relevant for future behavior on a particular task. Armed with this summary representation, an agent can make decisions and perform actions to interact effectively with the world. Here, we review recent findings from the neurobiological and behavioral literature to ask: ‘what is a state?' with respect to the internal representations that organize learning and decision making across a range of tasks. We find that state representations include information beyond a straightforward summary of the immediate cues in the environment, providing timing or contextual information from the recent or more distant past, which allows these additional factors to influence decision making and other goal-directed behaviors in complex and perhaps unexpected ways.
Langdon, A. J., Hathaway, B. A., Zorowitz, S., Harris, C. B. W., & Winstanley, C. A. (2019). Relative insensitivity to time-out punishments induced by win-paired cues in a rat gambling task. Psychopharmacology, 236(8), 2543–2556. https://doi.org/10.1007/s00213-019-05308-x (Original work published 2019)
Rationale. Pairing rewarding outcomes with audiovisual cues in simulated gambling games increases risky choice in both humans and rats. However, the cognitive mechanism through which this sensory enhancement biases decision-making is unknown. Objectives. To assess the computational mechanisms that promote risky choice during gambling, we applied a series of reinforcement learning models to a large dataset of choices acquired from rats as they each performed one of two variants of a rat gambling task (rGT), in which rewards on “win” trials were delivered either with or without salient audiovisual cues. Methods. We used a sampling technique based on Markov chain Monte Carlo to obtain posterior estimates of model parameters for a series of RL models of increasing complexity, in order to assess the relative contribution of learning about positive and negative outcomes to the latent valuation of each choice option on the cued and uncued rGT. Results. Rats which develop a preference for the risky options on the rGT substantially down-weight the equivalent cost of the time-out punishments during these tasks. For each model tested, the reduction in learning from the negative time-outs correlated with the degree of risk preference in individual rats. We found no apparent relationship between risk preference and the parameters that govern learning from the positive rewards. Conclusions. The emergence of risk-preferring choice on the rGT derives from a relative insensitivity to the cost of the time-out punishments, as opposed to a relative hypersensitivity to rewards. This hyposensitivity to punishment is more likely to be induced in individual rats by the addition of salient audiovisual cues to rewards delivered on win trials.

2018

Hermsdorff, G. B., Pereira, T., & Niv, Y. (2018). Quantifying Humans’ Priors Over Graphical Representations of Tasks. Springer Proceedings in Complexity, 281–290. https://doi.org/10.1007/978-3-319-96661-8_30
Some new tasks are trivial to learn while others are almost impossible; what determines how easy it is to learn an arbitrary task? Similar to how our prior beliefs about new visual scenes colors our per- ception of new stimuli, our priors about the structure of new tasks shapes our learning and generalization abilities [2]. While quantifying visual pri- ors has led to major insights on how our visual system works [5,10,11], quantifying priors over tasks remains a formidable goal, as it is not even clear how to define a task [4]. Here, we focus on tasks that have a natural mapping to graphs.We develop a method to quantify humans' priors over these “task graphs”, combining new modeling approaches with Markov chain Monte Carlo with people, MCMCP (a process whereby an agent learns from data generated by another agent, recursively [9]). We show that our method recovers priors more accurately than a standard MCMC sampling approach. Additionally, we propose a novel low-dimensional “smooth” (In the sense that graphs that differ by fewer edges are given similar probabilities.) parametrization of probability distributions over graphs that allows for more accurate recovery of the prior and better generalization.We have also created an online experiment platform that gamifies ourMCMCPalgorithm and allows subjects to interactively draw the task graphs. We use this platform to collect human data on sev- eral navigation and social interactions tasks. We show that priors over these tasks have non-trivial structure, deviating significantly from null models that are insensitive to the graphical information. The priors also notably differ between the navigation and social domains, showing fewer differences between cover stories within the same domain. Finally, we extend our framework to the more general case of quantifying priors over exchangeable random structures.