Building relational world models for reinforcement learning

Trevor Walker, Lisa Torrey, Jude Shavlik, Richard Maclin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the relational nature of the domain. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward states, utilizing inductive logic programming to learn their preimages: logical definitions of the regions of state space that lead to the high-reward states via some action. These learned preimages are chained together to form an MDP that abstractly represents the domain. AMBIL estimates the reward and transition probabilities of this MDP from past experience. Since our MDPs are small, AMBIL uses value iteration to quickly estimate the Q-values of each action in the induced states and determine a policy. AMBIL is able to employ complex background knowledge and supports relational representations. Empirical evaluation on both synthetic domains and a sub-task of the RoboCup soccer domain shows significant performance gains compared to standard Q-learning.
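To make the planning step concrete, the sketch below shows value iteration over a small abstract MDP of the kind the abstract describes, where states stand in for learned preimages and transition probabilities and rewards would be estimated from past experience. This is a minimal illustration, not the authors' implementation; the state names, probabilities, rewards, and discount factor are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical abstract MDP: states correspond to learned preimages;
# transitions and rewards would be estimated from past experience.
transitions = {
    # (state, action) -> list of (next_state, probability)
    ("far_from_goal", "advance"): [("near_goal", 0.7), ("far_from_goal", 0.3)],
    ("near_goal", "advance"): [("high_reward", 0.8), ("far_from_goal", 0.2)],
}
rewards = {
    ("far_from_goal", "advance"): 0.0,
    ("near_goal", "advance"): 1.0,
}
states = ["far_from_goal", "near_goal", "high_reward"]
gamma = 0.9  # discount factor (assumed)

def value_iteration(tol=1e-6):
    """Compute Q-values for the abstract MDP and the resulting greedy policy."""
    V = defaultdict(float)
    while True:
        delta = 0.0
        # Backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
        Q = {(s, a): rewards[(s, a)] + gamma * sum(p * V[s2] for s2, p in nexts)
             for (s, a), nexts in transitions.items()}
        for s in states:
            qs = [q for (s_, _), q in Q.items() if s_ == s]
            new_v = max(qs) if qs else 0.0  # states with no actions keep value 0
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    # Greedy policy: pick the highest-Q action in each state that has actions.
    policy = {s: max(((a, Q[(s, a)]) for (s_, a) in Q if s_ == s),
                     key=lambda t: t[1])[0]
              for s in states if any(s_ == s for (s_, _) in Q)}
    return Q, policy
```

Because the induced MDP is small, a call such as `value_iteration()` converges quickly and returns both the Q-values and the greedy policy over the abstract states.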

Original language: English (US)
Title of host publication: Inductive Logic Programming - 17th International Conference, ILP 2007, Revised Selected Papers
Pages: 280-291
Number of pages: 12
DOIs
State: Published - 2008
Event: 17th International Conference on Inductive Logic Programming, ILP 2007 - Corvallis, OR, United States
Duration: Jun 19 2007 - Jun 21 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4894 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 17th International Conference on Inductive Logic Programming, ILP 2007
Country/Territory: United States
City: Corvallis, OR
Period: 6/19/07 - 6/21/07
