Inverse rational control with partially observable continuous nonlinear dynamics

Minhae Kwon, Saurabh Daptardar, Paul Schrater, Xaq Pitkow

Research output: Contribution to journal › Conference article › peer-review


Abstract

A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information. This is naturally formulated as a reinforcement learning problem under partial observations, where an agent must estimate relevant latent variables in the world from its evidence, anticipate possible future states, and choose actions that optimize total expected reward. This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function. However, animals often appear to behave suboptimally. Why? We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest expected subjective reward according to that flawed model. We describe this behavior as rational but not optimal. Inverse Rational Control (IRC) is the problem of identifying which internal model would best explain an agent’s actions. Our contribution here generalizes past work on Inverse Rational Control, which solved this problem for discrete control in partially observable Markov decision processes. Here we accommodate continuous nonlinear dynamics and continuous actions, and impute sensory observations corrupted by unknown noise that is private to the animal. We first build an optimal Bayesian agent that learns an optimal policy generalized over the entire model space of dynamics and subjective rewards using deep reinforcement learning. Crucially, this allows us to compute a likelihood over models for experimentally observable action trajectories acquired from a suboptimal agent. We then find the model parameters that maximize the likelihood using gradient ascent. Our method successfully recovers the true model of rational agents. This approach provides a foundation for interpreting the behavioral and neural dynamics of animal brains during complex tasks.
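The closing steps of the abstract (a policy conditioned on candidate model parameters, a likelihood over models for recorded action trajectories, and gradient ascent on that likelihood) can be illustrated with a minimal sketch. This is an assumption-laden illustration rather than the paper's implementation: the names theta, policy_net, log_likelihood, observed_states, and observed_actions are hypothetical, the linear policy_net stands in for the deep-RL policy trained over the model space, and a simple Gaussian action model replaces the paper's marginalization over the agent's private sensory noise.

# Hypothetical sketch of the IRC likelihood-ascent step; names and model are illustrative.
import torch

def log_likelihood(theta, policy_net, observed_states, observed_actions):
    # Condition the trained policy on the candidate internal-model parameters theta.
    inputs = torch.cat([observed_states,
                        theta.expand(len(observed_states), -1)], dim=1)
    mean_actions = policy_net(inputs)
    # Assume a Gaussian action model with fixed noise (simplification of the paper's likelihood).
    return -0.5 * ((observed_actions - mean_actions) ** 2).sum()

theta = torch.zeros(3, requires_grad=True)      # candidate internal-model parameters
policy_net = torch.nn.Linear(4 + 3, 2)          # stand-in for the policy trained over the model space
observed_states = torch.randn(100, 4)           # experimenter-visible trajectory (toy data)
observed_actions = torch.randn(100, 2)          # recorded continuous actions (toy data)

optimizer = torch.optim.Adam([theta], lr=1e-2)
for step in range(500):
    optimizer.zero_grad()
    loss = -log_likelihood(theta, policy_net, observed_states, observed_actions)
    loss.backward()                             # gradient ascent on the log-likelihood
    optimizer.step()

In the actual method the policy is pretrained by deep reinforcement learning across the whole space of dynamics and subjective rewards, so only theta is updated when fitting an animal's behavior, as in the loop above.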

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
Volume: 2020-December
State: Published - 2020
Externally published: Yes
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: Dec 6 2020 - Dec 12 2020

Bibliographical note

Funding Information:
The authors thank Dora Angelaki, James Bridgewater, Kaushik Lakshminarasimhan, Baptiste Caziot, Zhengwei Wu, Rajkumar Raju, and Yizhou Chen for useful discussions. MK, SD, and XP were supported in part by an award from the McNair Foundation. SD and XP were supported in part by the Simons Collaboration on the Global Brain award 324143 and NSF 1450923 BRAIN 43092. MK and XP were supported in part by NSF CAREER Award IOS-1552868. MK was supported in part by National Research Foundation of Korea grant NRF-2020R1F1A1069182. PS and XP were supported in part by BRAIN Initiative grant NIH 5U01NS094368.

Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.

