Publications by Year: Forthcoming

Zorowitz, S., Niv, Y., & Bennett, D. (Forthcoming). Inattentive responding can induce spurious associations between task behavior and symptom measures. PreprintAbstract

A common research design in the field of computational psychiatry involves leveraging the power of online participant recruitment to assess correlations between behavior in cognitive tasks and the self-reported severity of psychiatric symptoms in large, diverse samples. Although large online samples have many advantages for psychiatric research, some potential pitfalls of this research design are not widely understood. Here we detail circumstances in which entirely spurious correlations may arise between task behavior and symptom severity as a result of inadequate screening of careless or low-effort responding on psychiatric symptom surveys. Specifically, since many psychiatric symptom surveys have asymmetric ground-truth score distributions in the general population, participants who respond carelessly on these surveys will show apparently elevated symptom levels. If these participants are similarly careless in their task performance, and are not excluded from analysis, this may result in a spurious association between greater symptom scores and worse behavioral task performance. Here, we demonstrate exactly this pattern of results in N = 386 participants recruited online to complete a self-report symptom battery and a short reversal-learning choice task. We show that many behavior-symptom correlations are entirely abolished when participants flagged for careless responding on surveys are excluded from analysis. We also show that exclusion based on task performance alone is not sufficient to prevent these spurious correlations. Of note, we demonstrate that false-positive rates for these spurious correlations increase with sample size, contrary to common assumptions. We offer guidance on how researchers using this general experimental design can guard against this issue in future research; in particular, we recommend the adoption of screening methods for self-report measures that are currently uncommon in this field.

Hitchcock, P., Forman, E., Rothstein, N., Zhang, F., Kounios, J., Niv, Y., & Sims, C. (Forthcoming). Rumination derails reinforcement learning with possible implications for ineffective behavior. PreprintAbstract

How does rumination affect reinforcement learning — the ubiquitous process by which we adjust behavior after error in order to behave more effectively in the future? In a within-subject design (n=49), we tested whether experimentally induced rumination disrupts reinforcement learning in a multidimensional learning task previously shown to rely on selective attention. Rumination impaired performance, yet unexpectedly this impairment could not be attributed to decreased attentional breadth (quantified using a “decay” parameter in a computational model). Instead, trait rumination (between subjects) was associated with higher decay rates (implying narrower attention), yet not with impaired performance. Our task-performance results accord with the possibility that state rumination promotes stress-generating behavior in part by disrupting reinforcement learning. The trait-rumination finding accords with the predictions of a prominent model of trait rumination (the attentional-scope model). More work is needed to understand the specific mechanisms by which state rumination disrupts reinforcement learning.

Bennett, D., Niv, Y., & Langdon, A. (Forthcoming). Value-free reinforcement learning: Policy optimization as a minimal model of operant behavior. PreprintAbstract

Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for dierent actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural ndings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of `value' in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.

Niv, Y. (Forthcoming). The primacy of behavioral research for understanding the brain. PreprintAbstract

Understanding the brain requires us to answer both what the brain does, and how it does it. Using a series of examples, I make the case that behavior is often more useful than neuroscientific measurements for answering the first question. Moreover, I show that even for “how” questions that pertain to neural mechanism, a well-crafted behavioral paradigm can offer deeper insight and stronger constraints on computational and mechanistic models than do many highly challenging (and very expensive) neural studies. I conclude that behavioral, rather than neuroscientific research, is essential for understanding the brain, contrary to the opinion of prominent funding bodies and scientific journals, who erroneously place neural data on a pedestal and consider behavior to be subsidiary.

Chan, S. C. Y., Schuck, N. W., Lopatina, N., Schoenbaum, G., & Niv, Y. (Forthcoming). Orbitofrontal cortex and learning predictions of state transitions. PreprintAbstract
Learning the transition structure of the environment – the probabilities of transitioning from one environmental state to another – is a key prerequisite for goal-directed planning and model-based decision making. To investigate the role of the orbitofrontal cortex (OFC) in goal-directed planning and decision making, we used fMRI to assess univariate and multivariate activity in the OFC while humans experienced state transitions that varied in degree of surprise. In convergence with recent evidence, we found that OFC activity was related to greater learning about transition structure, both across subjects and on a trial-by-trial basis. However, this relationship was inconsistent with a straightforward interpretation of OFC activity as representing a state prediction error that would facilitate learning of transitions via error-correcting mechanisms. The state prediction error hypothesis predicts that OFC activity at the time of observing an outcome should increase expectation of that observed outcome on subsequent trials. Instead, our results showed that OFC activity was associated with increased expectation of the more probable outcome; that is, with more optimal predictions. Our findings add to the evidence of OFC involvement in learning state-to-state transition structure, while providing new constraints for algorithmic hypotheses regarding how these transitions are learned.
Bennett, D., Davidson, G., & Niv, Y. (Forthcoming). A model of mood as integrated advantage. PreprintAbstract
Mood is an integrative and diffuse affective state that is thought to exert a pervasive effect on cognition and behavior. At the same time, mood itself is thought to fluctuate slowly as a product of feedback from interactions with the environment. Here we present a new computational theory of the valence of mood—the Integrated Advantage model—that seeks to account for this bidirectional interaction. Adopting theoretical formalisms from reinforcement learning, we propose to conceptualize the valence of mood as a leaky integral of an agent’s appraisals of the Advantage of its actions. This model generalizes and extends previous models of mood wherein affective valence was conceptualized as a moving average of reward prediction errors. We give a full theoretical derivation of the Integrated Advantage model and provide a functional explanation of how an integrated-Advantage variable could be deployed adaptively by a biological agent to accelerate learning in complex and/or stochastic environments. Specifically, drawing on stochastic optimization theory, we propose that an agent can utilize our hypothesized form of mood to approximate a momentum-based update to its behavioral policy, thereby facilitating rapid learning of optimal actions. We then show how this model of mood provides a principled and parsimonious explanation for a number of contextual effects on mood from the affective science literature, including expectation- and surprise-related effects, counterfactual effects from information about foregone alternatives, action-typicality effects, and action/inaction asymmetry.