Creating advice-taking reinforcement learners

Richard Maclin, Jude W. Shavlik

Research output: Contribution to journal › Article › peer-review

129 Scopus citations

Abstract

Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present and evaluate a design that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach the advice giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple imperative programming language. Based on techniques from knowledge-based neural networks, we insert these programs directly into the agent's utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that investigates several aspects of our approach and shows that, given good advice, a learner can achieve statistically significant gains in expected reward. A second experiment shows that advice improves the expected reward regardless of the stage of training at which it is given, while another study demonstrates that subsequent advice can result in further gains in reward. Finally, we present experimental results that indicate our method is more powerful than a naive technique for making use of advice.

Original language: English (US)
Pages (from-to): 251-281
Number of pages: 31
Journal: Machine Learning
Volume: 22
Issue number: 1-3
DOIs
State: Published - 1996
Externally published: Yes

Keywords

  • Adaptive agents
  • Advice-giving
  • Knowledge-based neural networks
  • Learning from instruction
  • Neural networks
  • Q-learning
  • Reinforcement learning
  • Theory refinement
