The demonstration that human decision-making can systematically violate the laws of rationality has had a wide impact on behavioural sciences. In this study, we use a pupillary index to adjudicate between two existing hypotheses about how irrational biases emerge: the hypothesis that biases result from fast, effortless processing and the hypothesis that biases result from more extensive integration. While effortless processing is associated with smaller pupillary responses, more extensive integration is associated with larger pupillary responses. Thus, we tested the relationship between pupil response and choice behaviour on six different foundational decision-making tasks that are classically used to demonstrate irrational biases. Participants demonstrated the expected systematic biases and their pupillary measurements satisfied pre-specified quality checks. Planned analyses returned inconclusive results, but exploratory examination of the data revealed an association between high pupillary responses and biased decisions. The findings provide preliminary support for the hypothesis that biases arise from gradual information integration.
Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of ‘value’ in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.
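The contrast between the two model families can be illustrated with a minimal sketch on a two-armed bandit. This is not the models reviewed in the article: the function names, the deterministic reward vector, and the parameter values (`alpha`, `beta`, `n_trials`) are all assumptions chosen for illustration. The value-based learner stores a value per action and chooses by comparing them; the policy-gradient learner adjusts action preferences directly with a REINFORCE-style update and never stores a value.

```python
import math
import random


def softmax(prefs, beta=1.0):
    """Convert preferences (or values) into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(beta * (p - m)) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_value_based(rewards, n_trials=500, alpha=0.1, beta=5.0, seed=0):
    """Learn one value per action, then choose by comparing values."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(n_trials):
        probs = softmax(q, beta)
        a = rng.choices(range(len(q)), weights=probs)[0]
        r = rewards[a]  # assumption: deterministic reward per arm
        q[a] += alpha * (r - q[a])  # delta-rule update of the chosen value
    return q, softmax(q, beta)


def run_policy_gradient(rewards, n_trials=500, alpha=0.1, seed=0):
    """Adjust action preferences directly; no value is stored."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(n_trials):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]
        r = rewards[a]
        for i in range(len(theta)):
            # Gradient of log softmax probability of the chosen action
            grad = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += alpha * r * grad
    return theta, softmax(theta)


if __name__ == "__main__":
    arm_rewards = [1.0, 0.0]  # hypothetical task: arm 0 always pays, arm 1 never
    print(run_value_based(arm_rewards))
    print(run_policy_gradient(arm_rewards))
```

Both learners come to prefer the rewarded arm, but only the first carries quantities that can be read as learned values. The preferences in the second are defined only relative to one another.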
How do we evaluate a group of people after a few negative experiences with some members but mostly positive experiences otherwise? How do rare experiences influence our overall impression? We show that rare events may be overweighted due to normative inference of the hidden causes that are believed to generate the observed events. We propose a Bayesian inference model that organizes environmental statistics by combining similar events and separating outlying observations. Relying on the model’s inferred latent causes for group evaluation overweights rare or variable events. We tested the model’s predictions in eight experiments where participants observed a sequence of social or non-social behaviours and estimated their average. As predicted, estimates were biased toward sparse events when estimating after seeing all observations, but not when tracking a summary value as observations accrued. Our results suggest that biases in evaluation may arise from inferring the hidden causes of group members’ behaviours.
How does rumination affect reinforcement learning — the ubiquitous process by which we adjust behavior after error in order to behave more effectively in the future? In a within-subject design (n=49), we tested whether experimentally induced rumination disrupts reinforcement learning in a multidimensional learning task previously shown to rely on selective attention. Rumination impaired performance, yet unexpectedly this impairment could not be attributed to decreased attentional breadth (quantified using a “decay” parameter in a computational model). Instead, trait rumination (between subjects) was associated with higher decay rates (implying narrower attention), yet not with impaired performance. Our task-performance results accord with the possibility that state rumination promotes stress-generating behavior in part by disrupting reinforcement learning. The trait-rumination finding accords with the predictions of a prominent model of trait rumination (the attentional-scope model). More work is needed to understand the specific mechanisms by which state rumination disrupts reinforcement learning.
Memory helps guide behavior, but which experiences from the past are prioritized? Classic models of learning posit that events associated with unpredictable outcomes, as well as, paradoxically, events associated with predictable outcomes, attract more attention and learning. Here, we test reinforcement learning and subsequent memory for those events, and treat signed and unsigned reward prediction errors (RPEs), experienced at the reward-predictive cue or reward outcome, as drivers of these two seemingly contradictory signals. By fitting reinforcement learning models to behavior, we find that both RPEs contribute to learning by modulating a dynamically changing learning rate. We further characterize the effects of these RPE signals on memory, and show that both signed and unsigned RPEs enhance memory, in line with midbrain dopamine and locus-coeruleus modulation of hippocampal plasticity, thereby reconciling separate findings in the literature.
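The idea of a learning rate modulated by both signed and unsigned RPEs can be sketched as follows. The linear form of the modulation, the weights `w_signed` and `w_unsigned`, and the function names are hypothetical and are not taken from the fitted models in the study. The sketch only shows how a single prediction error could scale its own learning rate in two ways at once.

```python
def clip(x, lo, hi):
    """Keep x within [lo, hi]."""
    return max(lo, min(hi, x))


def update_value(value, reward, base_rate=0.2, w_signed=0.1, w_unsigned=0.3):
    """One value update with an RPE-dependent learning rate.

    Assumption: the learning rate is a clipped linear function of the
    signed RPE (delta) and the unsigned RPE (|delta|).
    """
    delta = reward - value  # signed reward prediction error
    rate = clip(base_rate + w_signed * delta + w_unsigned * abs(delta), 0.0, 1.0)
    return value + rate * delta, delta, rate


if __name__ == "__main__":
    value = 0.0
    for reward in [1.0, 1.0, 0.0, 1.0]:  # hypothetical outcome sequence
        value, delta, rate = update_value(value, reward)
        print(f"reward={reward:.1f} delta={delta:+.3f} rate={rate:.3f} value={value:.3f}")
```

With positive `w_signed`, a better-than-expected outcome raises the learning rate more than an equally surprising worse-than-expected outcome does. The `w_unsigned` term raises it for any surprise, regardless of sign.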
The central theme of this review is the dynamic interaction between information selection and learning. We pose a fundamental question about this interaction: How do we learn what features of our experiences are worth learning about? In humans, this process depends on attention and memory, two cognitive functions that together constrain representations of the world to features that are relevant for goal attainment. Recent evidence suggests that the representations shaped by attention and memory are themselves inferred from experience with each task. We review this evidence and place it in the context of work that has explicitly characterized representation learning as statistical inference. We discuss how inference can be scaled to real-world decisions by approximating beliefs based on a small number of experiences. Finally, we highlight some implications of this inference process for human decision-making in social environments.
Understanding the brain requires us to answer both what the brain does, and how it does it. Using a series of examples, I make the case that behavior is often more useful than neuroscientific measurements for answering the first question. Moreover, I show that even for “how” questions that pertain to neural mechanism, a well-crafted behavioral paradigm can offer deeper insight and stronger constraints on computational and mechanistic models than do many highly challenging (and very expensive) neural studies. I conclude that behavioral, rather than neuroscientific, research is essential for understanding the brain, contrary to the opinion of prominent funding bodies and scientific journals, which erroneously place neural data on a pedestal and consider behavior to be subsidiary.
Much of traditional neuroeconomics proceeds from the hypothesis that value is reified in the brain, that is, that there are neurons or brain regions whose responses serve the discrete purpose of encoding value. This hypothesis is supported by the finding that the activity of many neurons covaries with subjective value as estimated in specific tasks and has led to the idea that the primary function of the orbitofrontal cortex is to compute and signal economic value. Here we consider an alternative: that economic value, in the cardinal, common-currency sense, is not represented in the brain and used for choice by default. This idea is motivated by consideration of the economic concept of value, which places important epistemic constraints on our ability to identify its neural basis. It is also motivated by the behavioral economics literature, especially work on heuristics, which proposes value-free process models for much if not all of choice. Finally, it is buoyed by recent neural and behavioral findings regarding how animals and humans learn to choose between options. In light of our hypothesis, we critically reevaluate putative neural evidence for the representation of value and explore an alternative: direct learning of action policies. We delineate how this alternative can provide a robust account of behavior that concords with existing empirical data.
The accepted manuscript version of this article will be publicly available on 05/31/2022.
Learning the transition structure of the environment – the probabilities of transitioning from one environmental state to another – is a key prerequisite for goal-directed planning and model-based decision making. To investigate the role of the orbitofrontal cortex (OFC) in goal-directed planning and decision making, we used fMRI to assess univariate and multivariate activity in the OFC while humans experienced state transitions that varied in degree of surprise. In convergence with recent evidence, we found that OFC activity was related to greater learning about transition structure, both across subjects and on a trial-by-trial basis. However, this relationship was inconsistent with a straightforward interpretation of OFC activity as representing a state prediction error that would facilitate learning of transitions via error-correcting mechanisms. The state prediction error hypothesis predicts that OFC activity at the time of observing an outcome should increase expectation of that observed outcome on subsequent trials. Instead, our results showed that OFC activity was associated with increased expectation of the more probable outcome; that is, with more optimal predictions. Our findings add to the evidence of OFC involvement in learning state-to-state transition structure, while providing new constraints for algorithmic hypotheses regarding how these transitions are learned.
Mood is an integrative and diffuse affective state that is thought to exert a pervasive effect on cognition and behavior. At the same time, mood itself is thought to fluctuate slowly as a product of feedback from interactions with the environment. Here we present a new computational theory of the valence of mood—the Integrated Advantage model—that seeks to account for this bidirectional interaction. Adopting theoretical formalisms from reinforcement learning, we propose to conceptualize the valence of mood as a leaky integral of an agent’s appraisals of the Advantage of its actions. This model generalizes and extends previous models of mood wherein affective valence was conceptualized as a moving average of reward prediction errors. We give a full theoretical derivation of the Integrated Advantage model and provide a functional explanation of how an integrated-Advantage variable could be deployed adaptively by a biological agent to accelerate learning in complex and/or stochastic environments. Specifically, drawing on stochastic optimization theory, we propose that an agent can utilize our hypothesized form of mood to approximate a momentum-based update to its behavioral policy, thereby facilitating rapid learning of optimal actions. We then show how this model of mood provides a principled and parsimonious explanation for a number of contextual effects on mood from the affective science literature, including expectation- and surprise-related effects, counterfactual effects from information about foregone alternatives, action-typicality effects, and action/inaction asymmetry.
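The notion of mood valence as a leaky integral can be sketched in a few lines. The discrete-time form, the `decay` parameter, and the function name are assumptions made for illustration. The article's own derivation is not reproduced here, and the Advantage values are supplied as plain numbers rather than computed from a learned policy.

```python
def leaky_integral(advantages, decay=0.9, initial=0.0):
    """Accumulate a sequence of Advantage appraisals with exponential leak.

    Assumed update: mood <- decay * mood + advantage, with 0 <= decay < 1.
    """
    mood = initial
    trajectory = []
    for adv in advantages:
        mood = decay * mood + adv
        trajectory.append(mood)
    return trajectory


if __name__ == "__main__":
    # Hypothetical appraisals: one action that turned out better than the
    # agent's usual alternatives, followed by neutral outcomes.
    print(leaky_integral([1.0, 0.0, 0.0], decay=0.5))
```

The same recursion, with a gradient in place of the Advantage term, is the velocity update used in momentum-based stochastic optimization. This is one way to read the proposed link between mood and accelerated policy learning.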
The accepted manuscript version of this article will be publicly available on 09/13/2022.
This page contains links to original data from experiments run at the Princeton Neuroscience Institute. These data are available to others for educational purposes. If they are used in publications, please cite the source of the data by indicating the published reference and the address of this website.