Gershman, S. J., & Niv, Y. (2012).
Exploring a latent cause theory of classical conditioning. Learning & Behavior,
40(3), 255–268.
Abstract: We frame behavior in classical conditioning experiments as the product of normative statistical inference. According to this theory, animals learn an internal model of their environment from experience. The basic building blocks of this internal model are latent causes: explanatory constructs inferred by the animal that partition observations into coherent clusters. Generalization of conditioned responding from one cue to another arises from the animal's inference that the cues were generated by the same latent cause. Through a wide range of simulations, we demonstrate where the theory succeeds and where it fails as a general account of classical conditioning.
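To make the clustering idea concrete, here is a minimal sketch of latent-cause assignment, assuming a Chinese-restaurant-process prior over causes with a Gaussian cue likelihood; this is a simplified stand-in for the paper's full model, and every function name and parameter below is illustrative.

```python
import numpy as np

def crp_latent_cause_posterior(history, x, alpha=1.0, noise=1.0):
    """Posterior over which latent cause generated cue vector x.

    history: list of (cause_index, cue_vector) pairs from past trials,
             with cue_vector a numpy array.
    alpha:   CRP concentration; larger values favor positing a new cause.
    noise:   variance of the (illustrative) Gaussian cue likelihood.
    """
    counts, sums = {}, {}
    for k, v in history:
        counts[k] = counts.get(k, 0) + 1
        sums[k] = sums.get(k, np.zeros_like(v)) + v

    scores = {}
    for k in counts:
        mean = sums[k] / counts[k]                  # mean cue under cause k
        lik = np.exp(-np.sum((x - mean) ** 2) / (2 * noise))
        scores[k] = counts[k] * lik                 # CRP prior times likelihood
    new_k = max(counts, default=-1) + 1             # option of a brand-new cause
    scores[new_k] = alpha * np.exp(-np.sum(x ** 2) / (2 * noise))
    z = sum(scores.values())
    return {k: s / z for k, s in scores.items()}

# After several trials in which cause 0 produced cues near [1, 0], a novel
# but similar cue is assigned mostly to cause 0, so responding generalizes.
hist = [(0, np.array([1.0, 0.0]))] * 3
print(crp_latent_cause_posterior(hist, np.array([1.0, 0.1])))
```

Under this scheme, two cues assigned to the same cluster share a latent cause, so conditioned responding learned to one generalizes to the other, as the abstract describes.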
Lucantonio, F., Stalnaker, T. A., Shaham, Y., Niv, Y., & Schoenbaum, G. (2012).
The impact of orbitofrontal dysfunction on cocaine addiction.
Nature Neuroscience,
15(3), 358–366.
Abstract: Cocaine addiction is characterized by poor judgment and maladaptive decision-making. Here we review evidence implicating the orbitofrontal cortex in such behavior. This evidence suggests that cocaine-induced changes in orbitofrontal cortex disrupt the representation of states and transition functions that form the basis of flexible and adaptive 'model-based' behavioral control. By impairing this function, cocaine exposure leads to an overemphasis on less flexible, maladaptive 'model-free' control systems. We propose that such an effect accounts for the complex pattern of maladaptive behaviors associated with cocaine addiction.
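As a toy illustration of the 'model-based'/'model-free' distinction the review turns on (a hedged sketch, not anything from the paper): a model-free controller caches values and updates them incrementally, whereas a model-based controller recomputes values from learned state-transition and reward representations, so it adapts as soon as the model changes.

```python
def model_free_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Model-free control: cached value, nudged by a TD-style update.
    Re-adapts only slowly when the world changes. Q is a dict of dicts."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def model_based_value(s, a, T, R, gamma=0.9, depth=3):
    """Model-based control: value recomputed on the fly from a transition
    model T[s][a] -> {s_next: prob} and reward model R[s][a], so changing
    the model changes behavior immediately. Toy finite-depth lookahead."""
    if depth == 0:
        return 0.0
    v = R[s][a]
    for s_next, p in T[s][a].items():
        v += gamma * p * max(model_based_value(s_next, a2, T, R, gamma, depth - 1)
                             for a2 in T[s_next])
    return v
```

On the review's account, cocaine-induced orbitofrontal dysfunction degrades the state representations the second controller needs, tilting behavior toward the first.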
Wilson, R. C., & Niv, Y. (2012).
Inferring Relevance in a Changing World.
Frontiers in Human Neuroscience,
5, 189.
Abstract: Reinforcement learning models of human and animal learning usually concentrate on how we learn the relationship between different stimuli or actions and rewards. However, in real-world situations "stimuli" are ill-defined. On the one hand, our immediate environment is extremely multidimensional; on the other hand, in every decision-making scenario only a few aspects of the environment are relevant for obtaining reward, while most are irrelevant. Thus a key question is how we learn these relevant dimensions, that is, how we learn what to learn about. We investigated this process of "representation learning" experimentally, using a task in which one stimulus dimension was relevant for determining reward at each point in time. As in real life, in our task the relevant dimension could change without warning, adding the ever-present uncertainty engendered by a constantly changing environment. We show that human performance on this task is better described by a suboptimal strategy based on selective attention and serial hypothesis testing than by a normative strategy based on probabilistic inference. From this we conjecture that the problem of inferring relevance in general scenarios is too computationally demanding for the brain to solve optimally. As a result, the brain utilizes approximations, employing them even in simplified scenarios in which optimal representation learning is tractable, such as the one in our experiment.
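A minimal sketch of the serial-hypothesis-testing strategy the abstract contrasts with full probabilistic inference; the specific test-and-switch rule below is illustrative, not the model fitted in the paper.

```python
import random

def serial_hypothesis_tester(n_dims, trial_stream, threshold=0.4, window=8):
    """Attend to one candidate dimension at a time; abandon it when its recent
    reward rate drops below `threshold`, then pick a new hypothesis at random.

    trial_stream yields (features, reward_fn) pairs, where features is a list
    with one value per dimension and reward_fn(chosen_feature) returns 0 or 1.
    All names and parameter values here are illustrative.
    """
    hypothesis = random.randrange(n_dims)    # currently attended dimension
    recent = []                              # sliding window of outcomes
    for features, reward_fn in trial_stream:
        r = reward_fn(features[hypothesis])  # act on the attended dimension only
        recent.append(r)
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and sum(recent) / window < threshold:
            hypothesis = random.randrange(n_dims)  # switch hypotheses serially
            recent = []
        yield hypothesis, r
```

The design point the abstract makes is that this kind of one-hypothesis-at-a-time rule is far cheaper than tracking a full posterior over all dimensions, which is why behavior resembling it shows up even when exact inference would be tractable.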
Niv, Y., Edlund, J. A., Dayan, P., & O'Doherty, J. P. (2012).
Neural Prediction Errors Reveal a Risk-Sensitive Reinforcement-Learning Process in the Human Brain.
Journal of Neuroscience,
32(2), 551–562.
Abstract: Humans and animals are exquisitely, though idiosyncratically, sensitive to risk or variance in the outcomes of their actions. Economic, psychological, and neural aspects of this are well studied when information about risk is provided explicitly. However, we must normally learn about outcomes from experience, through trial and error. Traditional models of such reinforcement learning focus on learning about the mean reward value of cues and ignore higher order moments such as variance. We used fMRI to test whether the neural correlates of human reinforcement learning are sensitive to experienced risk. Our analysis focused on anatomically delineated regions of a priori interest in the nucleus accumbens, where blood oxygenation level-dependent (BOLD) signals have been suggested as correlating with quantities derived from reinforcement learning. We first provide unbiased evidence that the raw BOLD signal in these regions corresponds closely to a reward prediction error. We then derive from this signal the learned values of cues that predict rewards of equal mean but different variance and show that these values are indeed modulated by experienced risk. Moreover, a close neurometric-psychometric coupling exists between the fluctuations of the experience-based evaluations of risky options that we measured neurally and the fluctuations in behavioral risk aversion. This suggests that risk sensitivity is integral to human learning, illuminating economic models of choice, neuroscientific models of affective learning, and the workings of the underlying neural mechanisms.
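One standard way to make a reinforcement learner risk sensitive, in the spirit of the risk-sensitive TD rule this literature builds on, is to weight positive and negative prediction errors by different learning rates, so that cues of equal mean but different variance acquire different values; the sketch below assumes that form, with illustrative parameters.

```python
import random

def risk_sensitive_update(value, reward, alpha_pos=0.1, alpha_neg=0.2):
    """One risk-sensitive update: positive and negative reward prediction
    errors get different learning rates. With alpha_neg > alpha_pos the
    learner is risk averse. All parameter values are illustrative."""
    delta = reward - value                         # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return value + alpha * delta

# Equal-mean cues: 'safe' always pays 0.5; 'risky' pays 0 or 1 with equal odds.
v_safe = v_risky = 0.0
for _ in range(5000):
    v_safe = risk_sensitive_update(v_safe, 0.5)
    v_risky = risk_sensitive_update(v_risky, random.choice([0.0, 1.0]))
print(v_safe, v_risky)  # v_risky settles below v_safe: learned risk aversion
```

With these parameters the risky cue's value converges near 1/3 while the safe cue's converges to 0.5, mirroring the abstract's finding that learned values of equal-mean cues are modulated by experienced variance.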