Publications

2005
Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2005). Motivational effects on behavior: Towards a reinforcement learning model of rates of responding. In CoSyNe, Salt Lake City, Utah.
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–1711.
A broad range of neural and behavioral data suggests that the brain contains multiple systems for behavioral choice, including one associated with prefrontal cortex and another with dorsolateral striatum. However, such a surfeit of control raises an additional choice problem: how to arbitrate between the systems when they disagree. Here, we consider dual-action choice systems from a normative perspective, using the computational theory of reinforcement learning. We identify a key trade-off pitting computational simplicity against the flexible and statistically efficient use of experience. The trade-off is realized in a competition between the dorsolateral striatal and prefrontal systems. We suggest a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate. This provides a unifying account of a wealth of experimental evidence about the factors favoring dominance by either system.
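The arbitration principle described in this abstract can be illustrated with a toy sketch (all numbers and names below are illustrative assumptions, not the paper's actual model): each controller reports a value estimate together with its uncertainty, and control goes to whichever controller is currently less uncertain. Here a "prefrontal" model-based controller has fixed computation noise, while a "dorsolateral striatal" model-free controller starts out uncertain and sharpens with experience.

```python
# Toy sketch of uncertainty-based arbitration (illustrative assumptions only).
# Model-based controller: fixed uncertainty from computational/search noise.
# Model-free controller: Bayesian posterior over a cached value that sharpens
# with each observed reward (posterior precision grows with sample count).

MB_VARIANCE = 0.2      # assumed fixed noise of the model-based estimate
PRIOR_VARIANCE = 1.0   # assumed model-free prior uncertainty before experience
OBS_VARIANCE = 0.5     # assumed noise of a single reward observation

def model_free_variance(n_observations: int) -> float:
    """Posterior variance of the cached value after n reward observations."""
    precision = 1.0 / PRIOR_VARIANCE + n_observations / OBS_VARIANCE
    return 1.0 / precision

def controller_in_charge(n_observations: int) -> str:
    """Deploy whichever controller is currently less uncertain."""
    mf_var = model_free_variance(n_observations)
    return "model-free" if mf_var < MB_VARIANCE else "model-based"

for n in (0, 1, 3, 10):
    print(n, controller_in_charge(n))
```

In this sketch the model-based system dominates early in training, and control shifts to the cached (model-free) system once its estimates become more reliable than the model-based computation, mirroring the experience-dependent shift in dominance the abstract describes.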
2002
Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks, 15(4-6), 535–547.
A large number of computational models of information processing in the basal ganglia have been developed in recent years. Prominent among these are actor-critic models of basal ganglia functioning, which build on the strong resemblance between dopamine neuron activity and the temporal difference prediction error signal in the critic, and between dopamine-dependent long-term synaptic plasticity in the striatum and learning guided by a prediction error signal in the actor. We selectively review several actor-critic models of the basal ganglia, with an emphasis on two important aspects: the way in which models of the critic reproduce the temporal dynamics of dopamine firing, and the extent to which models of the actor take into account known basal ganglia anatomy and physiology. To complement the efforts to relate basal ganglia mechanisms to reinforcement learning (RL), we introduce an alternative approach to modeling a critic network, which uses Evolutionary Computation techniques to 'evolve' an optimal RL mechanism, and relate the evolved mechanism to the basic model of the critic. We conclude our discussion of models of the critic with a critical assessment of the anatomical plausibility of implementations of a critic in basal ganglia circuitry, concluding that such implementations build on assumptions that are inconsistent with the known anatomy of the basal ganglia. We then return to the actor component of the actor-critic model, which is usually modeled at the striatal level with very little detail. We describe an alternative model of the basal ganglia which takes into account several important, and previously neglected, anatomical and physiological characteristics of basal ganglia-thalamocortical connectivity, and suggests that the basal ganglia perform reinforcement-biased dimensionality reduction of cortical inputs. We further suggest that since such selective encoding may bias the representation at the level of the frontal cortex towards the selection of rewarded plans and actions, the reinforcement-driven dimensionality reduction framework may serve as a basis for basal ganglia actor models. We conclude with a short discussion of the dual role of the dopamine signal in RL and in behavioral switching.
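The temporal-difference prediction error at the heart of the critic models reviewed above can be written in a few lines. The following is a generic TD(0) sketch on a hypothetical two-state cue-then-reward chain (not any specific model from the review), showing the classic dopamine-like profile: the error is large at reward time early in training and vanishes there once the reward is fully predicted.

```python
# Minimal TD(0) critic on a two-state chain: cue -> delay -> reward (r = 1).
# The prediction error delta = r + gamma * V(s') - V(s) is the signal whose
# temporal profile resembles dopamine firing: initially large at reward time,
# and shrinking toward zero there as the reward becomes predicted.

GAMMA = 1.0   # no discounting within the short trial
ALPHA = 0.1   # learning rate

V = {"cue": 0.0, "delay": 0.0}

def run_episode():
    """One pass through cue -> delay -> terminal; returns delta at reward time."""
    # cue -> delay transition, no reward delivered
    delta_cue = 0.0 + GAMMA * V["delay"] - V["cue"]
    V["cue"] += ALPHA * delta_cue
    # delay -> terminal transition, reward of 1 (terminal state has value 0)
    delta_reward = 1.0 + 0.0 - V["delay"]
    V["delay"] += ALPHA * delta_reward
    return delta_reward

first = run_episode()
for _ in range(499):
    last = run_episode()

print(first, last)  # the error at reward time shrinks toward zero with training
```

On the first episode the full reward is unpredicted (error of 1 at reward time); after training, state values converge to the upcoming reward and the error at reward time is near zero, having effectively migrated back to the earliest predictor.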
Niv, Y., Joel, D., Meilijson, I., & Ruppin, E. (2002). Evolution of reinforcement learning in foraging bees: A simple explanation for risk averse behavior. Neurocomputing, 44-46, 951–956.
Reinforcement learning is a fundamental process by which organisms learn to achieve goals from their interactions with the environment. We use evolutionary computation techniques to derive (near-)optimal neuronal learning rules in a simple neural network model of decision-making in simulated bumblebees foraging for nectar. The resulting bees exhibit efficient reinforcement learning. The evolved synaptic plasticity dynamics give rise to varying exploration/exploitation levels and to the well-documented foraging strategy of risk aversion. This behavior is shown to emerge directly from optimal reinforcement learning, providing a biologically founded, parsimonious and novel explanation of risk-averse behavior.
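One intuition for how risk aversion can fall out of simple reinforcement learning is captured by the following toy simulation (assumed reward values and parameters, not the paper's evolved network): a learner that tracks flower value with a high learning rate and chooses mostly greedily abandons a variable flower the moment it pays nothing, but never has a comparable reason to leave a constant flower of equal mean payoff.

```python
import random

random.seed(0)

# Two flowers with equal mean nectar: "constant" always pays 1.0, while
# "variable" pays 2.0 half the time and 0.0 otherwise (assumed values).
ALPHA = 0.9      # high learning rate: value estimates track recent rewards
EPSILON = 0.1    # occasional random exploration

V = {"constant": 0.5, "variable": 0.5}
visits = {"constant": 0, "variable": 0}

for _ in range(2000):
    if random.random() < EPSILON:
        flower = random.choice(["constant", "variable"])
    else:
        flower = max(V, key=V.get)  # greedy choice of higher-valued flower
    reward = 1.0 if flower == "constant" else random.choice([2.0, 0.0])
    V[flower] += ALPHA * (reward - V[flower])  # delta-rule value update
    visits[flower] += 1

print(visits)  # the constant flower is visited far more often
```

Although both flowers pay the same on average, a zero-nectar visit crashes the variable flower's estimate and drives the bee back to the constant one, so risk-averse choice emerges from the learning dynamics without any explicit utility function.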
Niv, Y., Joel, D., Meilijson, I., & Ruppin, E. (2002). Evolution of Reinforcement Learning in Uncertain Environments: A Simple Explanation for Complex Foraging Behaviors. Adaptive Behavior, 10(1), 5–24.
2001
Niv, Y., Joel, D., Meilijson, I., & Ruppin, E. (2001). Evolution of Reinforcement Learning in Uncertain Environments: Emergence of Risk-Aversion and Matching. Tel-Aviv University.
Reinforcement learning (RL) is a fundamental process by which organisms learn to achieve a goal from interactions with the environment. Using Artificial Life techniques we derive (near-)optimal neuronal learning rules in a simple neural network model of decision-making in simulated bumblebees foraging for nectar. The resulting networks exhibit efficient RL, allowing the bees to respond rapidly to changes in reward contingencies. The evolved synaptic plasticity dynamics give rise to varying exploration/exploitation levels from which emerge the well-documented foraging strategies of risk aversion and probability matching. These are shown to be a direct result of optimal RL, providing a biologically founded, parsimonious and novel explanation for these behaviors. Our results are corroborated by a rigorous mathematical analysis and by experiments in mobile robots.
