Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of ‘value’ in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.
How do we evaluate a group of people after a few negative experiences with some members but mostly positive experiences otherwise? How do rare experiences influence our overall impression? We show that rare events may be overweighted due to normative inference of the hidden causes that are believed to generate the observed events. We propose a Bayesian inference model that organizes environmental statistics by combining similar events and separating outlying observations. Relying on the model’s inferred latent causes for group evaluation overweights rare or variable events. We tested the model’s predictions in eight experiments where participants observed a sequence of social or non-social behaviours and estimated their average. As predicted, estimates were biased toward sparse events when estimating after seeing all observations, but not when tracking a summary value as observations accrued. Our results suggest that biases in evaluation may arise from inferring the hidden causes of group members’ behaviours.
Memory helps guide behavior, but which experiences from the past are prioritized? Classic models of learning posit that events associated with unpredictable outcomes as well as, paradoxically, predictable outcomes, deploy more attention and learning for those events. Here, we test reinforcement learning and subsequent memory for those events, and treat signed and unsigned reward prediction errors (RPEs), experienced at the reward-predictive cue or reward outcome, as drivers of these two seemingly contradictory signals. By fitting reinforcement learning models to behavior, we find that both RPEs contribute to learning by modulating a dynamically changing learning rate. We further characterize the effects of these RPE signals on memory, and show that both signed and unsigned RPEs enhance memory, in line with midbrain dopamine and locus-coeruleus modulation of hippocampal plasticity, thereby reconciling separate findings in the literature.
The demonstration that human decision-making can systematically violate the laws of rationality has had a wide impact on behavioural sciences. In this study, we use a pupillary index to adjudicate between two existing hypotheses about how irrational biases emerge: the hypothesis that biases result from fast, effortless processing and the hypothesis that biases result from more extensive integration. While effortless processing is associated with smaller pupillary responses, more extensive integration is associated with larger pupillary responses. Thus, we tested the relationship between pupil response and choice behaviour on six different foundational decision-making tasks that are classically used to demonstrate irrational biases. Participants demonstrated the expected systematic biases and their pupillary measurements satisfied pre-specified quality checks. Planned analyses returned inconclusive results, but exploratory examination of the data revealed an association between high pupillary responses and biased decisions. The findings provide preliminary support for the hypothesis that biases arise from gradual information integration.
This page contains links to original data from experiments run at the Princeton Neuroscience Institute. These data are available to others for educational purposes. If they are used in publications, please cite the source of the data by indicating the published reference and the address of this website