Learning Visual Affordances with Target-Orientated Deep Q-Network to Grasp Objects by Harnessing Environmental Fixtures

Hengyue Liang, Xibai Lou, Yang Yang, Changhyun Choi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

This paper introduces a challenging object grasping task and proposes a self-supervised learning approach. The goal of the task is to grasp an object that cannot be grasped by a single parallel gripper alone, but only by harnessing environmental fixtures (e.g., walls, furniture, heavy objects). This Slide-to-Wall grasping task assumes no prior knowledge except a partial observation of the target object. Hence, the robot must learn an effective policy from a scene observation that may include the target object, environmental fixtures, and other disturbing objects. We formulate the problem as visual affordance learning, for which a Target-Oriented Deep Q-Network (TO-DQN) is proposed to efficiently learn visual affordance maps (i.e., Q-maps) that guide robot actions. Since training necessitates the robot's exploration of, and collision with, the fixtures, TO-DQN is first trained safely with a simulated robot manipulator and then applied to a real robot. We empirically show that TO-DQN can learn to solve the task in different environment settings in simulation and outperforms both the standard Deep Q-Network (DQN) and a variant of it in terms of training efficiency and robustness. The testing performance in both simulation and real-robot experiments shows that the policy trained by TO-DQN achieves performance comparable to that of humans.
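To make the affordance-map formulation concrete, below is a minimal sketch of pixel-wise Q-map learning in the spirit of TO-DQN. It is not the authors' implementation: the network architecture, the assumption of a heightmap observation plus a binary target mask as input, one slide direction per Q-map channel, and all module and variable names are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code) of target-oriented
# pixel-wise Q-map learning: a fully convolutional network maps a scene
# observation and a target mask to one Q-map per candidate slide direction.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QMapNet(nn.Module):
    """Fully convolutional net: (scene, target mask) -> per-pixel Q-maps."""

    def __init__(self, in_channels=2, num_directions=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_directions, 1),  # one Q-map per slide direction
        )

    def forward(self, scene, target_mask):
        # Concatenating the target mask with the scene conditions the Q-maps
        # on which object should be grasped (the "target-oriented" part).
        x = torch.cat([scene, target_mask], dim=1)
        return self.decoder(self.encoder(x))  # (B, num_directions, H, W)


def select_action(q_maps):
    """Greedy action: the (direction, pixel) with the highest Q-value."""
    b, d, h, w = q_maps.shape
    flat_idx = q_maps.view(b, -1).argmax(dim=1)
    direction = flat_idx // (h * w)
    pixel = flat_idx % (h * w)
    return direction, pixel // w, pixel % w  # direction, row, col


def td_loss(q_net, target_net, batch, gamma=0.9):
    """Standard DQN temporal-difference loss over the executed pixel actions."""
    scene, mask, act, reward, next_scene, next_mask, done = batch
    q_maps = q_net(scene, mask)
    batch_idx = torch.arange(q_maps.shape[0])
    q_sa = q_maps[batch_idx, act[0], act[1], act[2]]  # Q of chosen actions
    with torch.no_grad():
        next_q = target_net(next_scene, next_mask)
        next_max = next_q.view(next_q.shape[0], -1).max(dim=1).values
        target = reward + gamma * next_max * (1.0 - done)
    return F.smooth_l1_loss(q_sa, target)
```

In this reading, each pixel of the Q-map scores one discrete slide action at that location, and the greedy argmax over all maps picks both where and in which direction to push the target toward a fixture; the TD update is the ordinary DQN rule applied to those pixel-wise values.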

Original language: English (US)
Title of host publication: 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6422-6428
Number of pages: 7
ISBN (Electronic): 9781728190778
DOIs
State: Published - 2021
Event: 2021 IEEE International Conference on Robotics and Automation, ICRA 2021 - Xi'an, China
Duration: May 30, 2021 - Jun 5, 2021

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
Volume: 2021-May
ISSN (Print): 1050-4729

Conference

Conference: 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
Country/Territory: China
City: Xi'an
Period: 5/30/21 - 6/5/21

Bibliographical note

Funding Information:
*This work was supported in part by the MnDRIVE Initiative on Robotics, Sensors, and Advanced Manufacturing. †The authors are with the University of Minnesota, Minneapolis, MN 55455, USA. {liang656, lou00015, yang5276, cchoi}@umn.edu

Publisher Copyright:
© 2021 IEEE

Keywords

  • Deep learning in grasping
  • Grasping
  • Manipulation
  • Perception for grasping and manipulation
