Daniel, R., Radulescu, A., & Niv, Y. (2020). Intact reinforcement learning but impaired attentional control during multidimensional probabilistic learning in older adults. Journal of Neuroscience, 1084-1096.
To efficiently learn optimal behavior in complex environments, humans rely on an interplay of learning and attention. Healthy aging has been shown to independently affect both of these functions. Here, we investigate how reinforcement learning and selective attention interact during learning from trial and error across age groups. We acquired behavioral and fMRI data from older and younger adults performing two probabilistic learning tasks with varying attention demands. While learning in the unidimensional task did not differ across age groups, older adults performed worse than younger adults in the multidimensional task, which required high levels of selective attention. Computational modeling showed that choices of older adults are better predicted by reinforcement learning than Bayesian inference, and that older adults rely more on reinforcement learning based predictions than younger adults. Conversely, a higher proportion of younger adults' choices was predicted by a computationally demanding Bayesian approach. In line with the behavioral findings, we observed no group differences in reinforcement learning related fMRI activation. Specifically, prediction-error activation in the nucleus accumbens was similar across age groups, and numerically higher in older adults. However, activation in the default mode was less suppressed in older adults for higher attentional task demands, and the level of suppression correlated with behavioral performance. Our results indicate that healthy aging does not significantly impair simple reinforcement learning. However, in complex environments, older adults rely more heavily on suboptimal reinforcement-learning strategies supported by the ventral striatum, whereas younger adults utilize attention processes supported by cortical networks.
Drummond, N., & Niv, Y. (2020). Model-based decision making and model-free learning. Current Biology, (15), 860-865.
Free will is anything but free. With it comes the onus of choice: not only what to do, but which inner voice to listen to — our ‘automatic’ response system, which some consider ‘impulsive’ or ‘irrational’, or our supposedly more rational deliberative one. Rather than a devil and angel sitting on our shoulders, research suggests that we have two decision-making systems residing in the brain, in our basal ganglia. Neither system is the devil and neither is irrational. They both have our best interests at heart and aim to suggest the best course of action calculated through rational algorithms. However, the algorithms they use are qualitatively different and do not always agree on which action is optimal. The rivalry between habitual, fast action and deliberative, purposeful action is an ongoing one.
Rouhani, N., Norman, K. A., Niv, Y., & Bornstein, A. M. (2020). Reward prediction errors create event boundaries in memory.
We remember when things change. Particularly salient are experiences where there is a change in rewards, eliciting reward prediction errors (RPEs). How do RPEs influence our memory of those experiences? One idea is that this signal directly enhances the encoding of memory. Another, not mutually exclusive, idea is that the RPE signals a deeper change in the environment, leading to the mnemonic separation of subsequent experiences from what came before, thereby creating a new latent context and a more separate memory trace. We tested this in four experiments where participants learned to predict rewards associated with a series of trial-unique images. High-magnitude RPEs indicated a change in the underlying distribution of rewards. To test whether these large RPEs created a new latent context, we first assessed recognition priming for sequential pairs that included a high-RPE event or not (Exp. 1: n = 27 & Exp. 2: n = 83). We found evidence of recognition priming for the high-RPE event, indicating that the high-RPE event is bound to its predecessor in memory. Given that high-RPE events are themselves preferentially remembered (Rouhani, Norman, & Niv, 2018), we next tested whether there was an event boundary across a high-RPE event (i.e., excluding the high-RPE event itself; Exp. 3: n = 85). Here, sequential pairs across a high RPE no longer showed recognition priming whereas pairs within the same latent reward state did, providing initial evidence for an RPE-modulated event boundary. We then investigated whether RPE event boundaries disrupt temporal memory by asking participants to order and estimate the distance between two events that had either included a high-RPE event between them or not (Exp. 4). We found (n = 49) and replicated (n = 77) worse sequence memory for events across a high RPE. 
In line with our recognition priming results, we did not find sequence memory to be impaired between the high-RPE event and its predecessor, but instead found worse sequence memory for pairs across a high-RPE event. Moreover, greater distance between events at encoding led to better sequence memory for events across a low-RPE event, but not a high-RPE event, suggesting separate mechanisms for the temporal ordering of events within versus across a latent reward context. Altogether, these findings demonstrate that high-RPE events are more strongly encoded, show intact links with their predecessor, and act as event boundaries that interrupt the sequential integration of events. We captured these effects in a variant of the Context Maintenance and Retrieval model (CMR; Polyn, Norman, & Kahana, 2009), modified to incorporate RPEs into the encoding process.
Cai, M. B., Shvartsman, M., Wu, A., Zhang, H., & Ju, X. (2020). Incorporating structured assumptions with probabilistic graphical models in fMRI data analysis.
With the wide adoption of functional magnetic resonance imaging (fMRI) by cognitive neuroscience researchers, large volumes of brain imaging data have been accumulated in recent years. Aggregating these data to derive scientific insights often faces the challenge that fMRI data are high-dimensional, heterogeneous across people, and noisy. These challenges demand the development of computational tools that are tailored both for the neuroscience questions and for the properties of the data. We review a few recently developed algorithms in various domains of fMRI research: fMRI in naturalistic tasks, analyzing full-brain functional connectivity, pattern classification, inferring representational similarity and modeling structured residuals. These algorithms all tackle the challenges in fMRI similarly: they start by making clear statements of assumptions about neural data and existing domain knowledge, incorporate those assumptions and domain knowledge into probabilistic graphical models, and use those models to estimate properties of interest or latent structures in the data. Such approaches can avoid erroneous findings, reduce the impact of noise, better utilize known properties of the data, and better aggregate data across groups of subjects. With these successful cases, we advocate wider adoption of explicit model construction in cognitive neuroscience. Although we focus on fMRI, the principle illustrated here is generally applicable to brain data of other modalities.
Langdon, A. J., & Daw, N. (2020). Beyond the Average View of Dopamine. Trends in Cognitive Sciences.
Dopamine (DA) responses are synonymous with the ‘reward prediction error’ of reinforcement learning (RL), and are thought to update neural estimates of expected value. A recent study by Dabney et al. enriches this picture, demonstrating that DA neurons track variability in rewards, providing a readout of risk in the brain.
Sharpe, M. J., Batchelor, H. M., Mueller, L. E., Chang, C. Y., Maes, E. J. P., Niv, Y., & Schoenbaum, G. (2020). Dopamine transients do not act as model-free prediction errors during associative learning. Nature Communications, (1), 106.
Dopamine neurons are proposed to signal the reward prediction error in model-free reinforcement learning algorithms. This term represents the unpredicted or ‘excess’ value of the rewarding event, value that is then added to the intrinsic value of any antecedent cues, contexts or events. To support this proposal, proponents cite evidence that artificially-induced dopamine transients cause lasting changes in behavior. Yet these studies do not generally assess learning under conditions where an endogenous prediction error would occur. Here, to address this, we conducted three experiments where we optogenetically activated dopamine neurons while rats were learning associative relationships, both with and without reward. In each experiment, the antecedent cues failed to acquire value and instead entered into associations with the later events, whether valueless cues or valued rewards. These results show that in learning situations appropriate for the appearance of a prediction error, dopamine transients support associative, rather than model-free, learning.