Attention to Action: Leveraging Attention for Object Navigation

Shi Chen, Qi Zhao

Research output: Contribution to conference › Paper › peer-review


Abstract

Navigation towards different objects is prevalent in daily life. State-of-the-art embodied vision methods accomplish the task by implicitly learning the relationship between perception and action or optimizing them with separate objectives. While effective in some cases, they have not yet developed (1) a tight integration of perception and action, and (2) the capability to address visual variance that is significant in the moving and embodied setting. To close these research gaps, we introduce a new attention mechanism, which represents the pursuit of visual information that highlights the potential directions of final targets. Instead of working conventionally as a weighted map for aggregating visual features, the new attention is defined as a compact intermediate state connecting visual observations and action. It is explicitly coupled with action to enable a joint optimization through a consistent action space, and also plays an important role in learning features more robust against visual variance. Our experiments show significant improvements in navigation across various types of unseen environments with known and unknown semantics. Ablation analyses indicate that the proposed method correlates attention patterns with the directions of action, and overcomes visual variance by distilling useful information from visual observations into attention distributions. Our code is publicly available at https://github.com/szzexpoi/ana.
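To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of using an attention distribution as the compact intermediate state between perception and action: a score per spatial location yields an attention distribution, and action logits are predicted from that distribution alone rather than from attention-weighted features. All names, shapes, and weights below are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_to_action(features, w_att, w_act):
    """Hypothetical sketch of attention as a compact intermediate state.

    features: (H*W, D) flattened spatial feature map
    w_att:    (D,)     projection producing one score per location
    w_act:    (H*W, A) mapping the attention distribution to action logits
    """
    scores = features @ w_att          # one relevance score per spatial location
    attention = softmax(scores)        # attention distribution (sums to 1)
    action_logits = attention @ w_act  # actions scored from attention alone
    return attention, action_logits

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 8))       # e.g. a 7x7 grid of 8-dim features
att, logits = attention_to_action(feats, rng.normal(size=8),
                                  rng.normal(size=(49, 4)))
```

Because the action logits depend on visual features only through the attention distribution, the two can be jointly optimized through a shared, spatially consistent action space, which is the coupling the abstract describes.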

Original language: English (US)
State: Published - 2021
Event: 32nd British Machine Vision Conference, BMVC 2021 - Virtual, Online
Duration: Nov 22 2021 - Nov 25 2021

Conference

Conference: 32nd British Machine Vision Conference, BMVC 2021
City: Virtual, Online
Period: 11/22/21 - 11/25/21

Bibliographical note

Publisher Copyright:
© 2021. The copyright of this document resides with its authors.

