Boosted Attention: Leveraging Human Attention for Image Captioning

Shi Chen; Qi Zhao

doi:10.1007/978-3-030-01252-6_5

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Visual attention has proven useful in image captioning, where the goal is to enable a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, top-down attention learned this way can fail to focus on the correct regions of interest in the absence of direct supervision of attention. Inspired by the human visual system, which is driven not only by task-specific top-down signals but also by visual stimuli, in this work we propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning. We validate the proposed approach with state-of-the-art performance across various evaluation metrics.
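The abstract does not say how the two kinds of attention are combined. The sketch below is only an illustration of the general idea of letting stimulus-driven (bottom-up) attention bias task-driven (top-down) attention. It assumes hypothetical inputs: one top-down score per image region (as produced by a caption decoder) and one non-negative saliency value per region (as produced by some saliency predictor). The fusion rule, adding log-saliency to the top-down logits before the softmax, is a generic choice and not necessarily the one used in the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of per-region scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attention(
    top_down_scores: Sequence[float],
    saliency: Sequence[float],
    eps: float = 1e-6,
) -> List[float]:
    """Hypothetical fusion of top-down and bottom-up attention.

    Adds log(saliency + eps) to each top-down logit and applies a softmax.
    This equals multiplying the top-down softmax weights by (saliency + eps)
    and renormalizing, so salient regions gain attention mass while the
    top-down ranking still matters. This rule is an illustrative assumption,
    not the method described in the paper.
    """
    if len(top_down_scores) != len(saliency):
        raise ValueError("need one saliency value per region")
    if any(s < 0 for s in saliency):
        raise ValueError("saliency values must be non-negative")
    boosted = [z + math.log(s + eps) for z, s in zip(top_down_scores, saliency)]
    return softmax(boosted)


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.5]       # hypothetical top-down logits for 3 regions
    saliency = [0.1, 0.9, 0.5]     # hypothetical bottom-up saliency per region
    print("top-down only:", softmax(scores))
    print("fused:        ", fuse_attention(scores, saliency))
```

With uniform saliency the log term is the same constant for every region, so the fused weights reduce to plain top-down attention. Non-uniform saliency shifts mass toward regions that are visually conspicuous even when the language model alone would not favor them.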

Original language	English (US)
Title of host publication	Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors	Vittorio Ferrari, Cristian Sminchisescu, Yair Weiss, Martial Hebert
Publisher	Springer Verlag
Pages	72-88
Number of pages	17
ISBN (Print)	9783030012519
DOIs	https://doi.org/10.1007/978-3-030-01252-6_5
State	Published - 2018
Event	15th European Conference on Computer Vision, ECCV 2018, Munich, Germany; Duration: Sep 8 2018 → Sep 14 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11215 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Other	15th European Conference on Computer Vision, ECCV 2018
Country/Territory	Germany
City	Munich
Period	9/8/18 → 9/14/18

Bibliographical note

Publisher Copyright:
© 2018, Springer Nature Switzerland AG.

Keywords

Human attention
Image captioning
Visual attention

Cite this

Chen, S., & Zhao, Q. (2018). Boosted Attention: Leveraging Human Attention for Image Captioning. In V. Ferrari, C. Sminchisescu, Y. Weiss, & M. Hebert (Eds.), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings (pp. 72-88). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11215 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01252-6_5

Boosted Attention: Leveraging Human Attention for Image Captioning. / Chen, Shi; Zhao, Qi.
Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. ed. / Vittorio Ferrari; Cristian Sminchisescu; Yair Weiss; Martial Hebert. Springer Verlag, 2018. p. 72-88 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11215 LNCS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Chen, S & Zhao, Q 2018, Boosted Attention: Leveraging Human Attention for Image Captioning. in V Ferrari, C Sminchisescu, Y Weiss & M Hebert (eds), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11215 LNCS, Springer Verlag, pp. 72-88, 15th European Conference on Computer Vision, ECCV 2018, Munich, Germany, 9/8/18. https://doi.org/10.1007/978-3-030-01252-6_5

Chen S, Zhao Q. Boosted Attention: Leveraging Human Attention for Image Captioning. In Ferrari V, Sminchisescu C, Weiss Y, Hebert M, editors, Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. Springer Verlag. 2018. p. 72-88. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-01252-6_5

Chen, Shi ; Zhao, Qi. / Boosted Attention : Leveraging Human Attention for Image Captioning. Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. editor / Vittorio Ferrari ; Cristian Sminchisescu ; Yair Weiss ; Martial Hebert. Springer Verlag, 2018. pp. 72-88 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{cef9eea9f1a04a3b909560b9c14d543c,
  title = "Boosted Attention: Leveraging Human Attention for Image Captioning",
  abstract = "Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, the learned top-down attention can fail to focus on correct regions of interest without direct supervision of attention. Inspired by the human visual system which is driven by not only the task-specific top-down signals but also the visual stimuli, we in this work propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning. We validate the proposed approach with state-of-the-art performance across various evaluation metrics.",
  keywords = "Human attention, Image captioning, Visual attention",
  author = "Shi Chen and Qi Zhao",
  note = "Publisher Copyright: {\textcopyright} 2018, Springer Nature Switzerland AG.; 15th European Conference on Computer Vision, ECCV 2018 ; Conference date: 08-09-2018 Through 14-09-2018",
  year = "2018",
  doi = "10.1007/978-3-030-01252-6_5",
  language = "English (US)",
  isbn = "9783030012519",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  publisher = "Springer Verlag",
  pages = "72--88",
  editor = "Vittorio Ferrari and Cristian Sminchisescu and Yair Weiss and Martial Hebert",
  booktitle = "Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings",
}

TY  - GEN
T1  - Boosted Attention
T2  - 15th European Conference on Computer Vision, ECCV 2018
AU  - Chen, Shi
AU  - Zhao, Qi
PY  - 2018
Y1  - 2018
N2  - Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, the learned top-down attention can fail to focus on correct regions of interest without direct supervision of attention. Inspired by the human visual system which is driven by not only the task-specific top-down signals but also the visual stimuli, we in this work propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning. We validate the proposed approach with state-of-the-art performance across various evaluation metrics.
AB  - Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, the learned top-down attention can fail to focus on correct regions of interest without direct supervision of attention. Inspired by the human visual system which is driven by not only the task-specific top-down signals but also the visual stimuli, we in this work propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning. We validate the proposed approach with state-of-the-art performance across various evaluation metrics.
KW  - Human attention
KW  - Image captioning
KW  - Visual attention
UR  - http://www.scopus.com/inward/record.url?scp=85055124853&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85055124853&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-01252-6_5
DO  - 10.1007/978-3-030-01252-6_5
M3  - Conference contribution
AN  - SCOPUS:85055124853
SN  - 9783030012519
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 72
EP  - 88
BT  - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2  - Ferrari, Vittorio
A2  - Sminchisescu, Cristian
A2  - Weiss, Yair
A2  - Hebert, Martial
PB  - Springer Verlag
Y2  - 8 September 2018 through 14 September 2018
ER  - 