Attention to Action: Leveraging Attention for Object Navigation

Shi Chen, Qi Zhao

Research output: Contribution to conference › Paper › peer-review


Abstract

Navigation towards different objects is prevalent in daily life. State-of-the-art embodied vision methods accomplish the task by implicitly learning the relationship between perception and action or optimizing them with separate objectives. While effective in some cases, they have not yet developed (1) a tight integration of perception and action, and (2) the capability to address visual variance that is significant in the moving and embodied setting. To close these research gaps, we introduce a new attention mechanism, which represents the pursuit of visual information that highlights the potential directions of final targets. Instead of working conventionally as a weighted map for aggregating visual features, the new attention is defined as a compact intermediate state connecting visual observations and action. It is explicitly coupled with action to enable a joint optimization through a consistent action space, and also plays an important role in learning features more robust against visual variance. Our experiments show significant improvements in navigation across various types of unseen environments with known and unknown semantics. Ablation analyses indicate that the proposed method correlates attention patterns with the directions of action, and overcomes visual variance by distilling useful information from visual observations into attention distributions. Our code is publicly available at https://github.com/szzexpoi/ana.
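To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of using an attention distribution as the compact intermediate state between perception and action: a score per spatial location yields an attention distribution, and action logits are predicted from that distribution alone rather than from attention-weighted features. All names, shapes, and weights below are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_to_action(features, w_att, w_act):
    """Hypothetical sketch of attention as a compact intermediate state.

    features: (H*W, D) flattened spatial feature map
    w_att:    (D,)     projection producing one score per location
    w_act:    (H*W, A) mapping the attention distribution to action logits
    """
    scores = features @ w_att          # one relevance score per spatial location
    attention = softmax(scores)        # attention distribution (sums to 1)
    action_logits = attention @ w_act  # actions scored from attention alone
    return attention, action_logits

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 8))       # e.g. a 7x7 grid of 8-dim features
att, logits = attention_to_action(feats, rng.normal(size=8),
                                  rng.normal(size=(49, 4)))
```

Because the action logits depend on visual features only through the attention distribution, the two can be jointly optimized through a shared, spatially consistent action space, which is the coupling the abstract describes.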

Original language: English (US)
State: Published - 2021
Event: 32nd British Machine Vision Conference, BMVC 2021 - Virtual, Online
Duration: Nov 22 2021 - Nov 25 2021

Conference

Conference: 32nd British Machine Vision Conference, BMVC 2021
City: Virtual, Online
Period: 11/22/21 - 11/25/21

Bibliographical note

Publisher Copyright:
© 2021. The copyright of this document resides with its authors.

