Niv, Y., & Montague, R. P.
(2009). Theoretical and Empirical Studies of Learning
, 331–351 . Elsevier. PDFAbstract
This chapter introduces the reinforcement learning framework and gives a brief background to the origins and history of reinforcement learning models of decision-making. Reinforcement learning provides a normative framework, within which conditioning can be analyzed. That is, this suggests a means by which optimal prediction and action selection can be achieved, and exposes explicitly the computations that must be realized in the service of these. In contrast to descriptive models that describe behavior as it is, normative models study behavior from the point of view of its hypothesized function-that is, they study behavior, as it should be if it were to accomplish specific goals in an optimal way. The appeal of normative models derives from several sources. Historically, the core ideas in reinforcement learning arose from two separate and parallel lines of research. One axis is mainly associated with Richard Sutton, formerly an undergraduate psychology major, and his PhD advisor, Andrew Barto, a computer scientist. Interested in artificial intelligence and agent-based learning, Sutton and Barto developed algorithms for reinforcement learning that were inspired by the psychological literature on Pavlovian and instrumental conditioning. ©2009 Elsevier Inc. All rights reserved.
Botvinick, M. M., Niv, Y., & Barto, A. G.
(2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective
(3), 262–280. PDFAbstract
Research on human and animal behavior has long emphasized its hierarchical structure-the divisibility of ongoing behavior into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behavior has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this paper, we reexamine behavioral hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behavior. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behavior.
Todd, M. T., Niv, Y., & Cohen, J. D.
(2009). Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Ed.)
, Advances in Neural Information Processing Systems 21
(pp. 1689–1696). PDFAbstract
Working memory is a central topic of cognitive neuroscience because it is critical for solving real-world problems in which information from multiple temporally distant sources must be combined to generate appropriate behavior. However, an often neglected fact is that learning to use working memory effectively is itself a difficult problem. The Gating framework is a collection of psychological models that show how dopamine can train the basal ganglia and prefrontal cortex to form useful working memory representations in certain types of problems. We unite Gating with machine learning theory concerning the general problem of memory-based optimal control. We present a normative model that learns, by online temporal difference methods, to use working memory to maximize discounted future reward in partially observable settings. The model successfully solves a benchmark working memory problem, and exhibits limitations similar to those observed in humans. Our purpose is to introduce a concise, normative definition of high level cognitive concepts such as working memory and cognitive control in terms of maximizing discounted future rewards. 1
(2009). Reinforcement learning in the brain
. Journal of Mathematical Psychology
(3), 139–154. PDFAbstract
A wealth of research focuses on the decision-making processes that animals and humans employ when selecting actions in the face of reward and punishment. Initially such work stemmed from psychological investigations of conditioned behavior, and explanations of these in terms of computational models. Increasingly, analysis at the computational level has drawn on ideas from reinforcement learning, which provide a normative framework within which decision-making can be analyzed. More recently, the fruits of these extensive lines of research have made contact with investigations into the neural basis of decision making. Converging evidence now links reinforcement learning to specific neural substrates, assigning them precise computational roles. Specifically, electrophysiological recordings in behaving animals and functional imaging of human decision-making have revealed in the brain the existence of a key reinforcement learning signal, the temporal difference reward prediction error. Here, we first introduce the formal reinforcement learning framework. We then review the multiple lines of evidence linking reinforcement learning to the function of dopaminergic neurons in the mammalian midbrain and to more recent data from human imaging experiments. We further extend the discussion to aspects of learning not associated with phasic dopamine signals, such as learning of goal-directed responding that may not be dopamine-dependent, and learning about the vigor (or rate) with which actions should be performed that has been linked to tonic aspects of dopaminergic signaling. We end with a brief discussion of some of the limitations of the reinforcement learning framework, highlighting questions for future research.