Leveraging Human Attention in Novel Object Captioning

Xianyu Chen; Ming Jiang; Qi Zhao

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on external image taggers or object detectors to describe novel objects, we present the Attention-based Novel Object Captioner (ANOC) that complements novel object captioners with human attention features that characterize generally important information independent of tasks. It introduces a gating mechanism that adaptively incorporates human attention with self-learned machine attention, with a Constrained Self-Critical Sequence Training method to address the exposure bias while maintaining constraints of novel object descriptions. Extensive experiments conducted on the nocaps and Held-Out COCO datasets demonstrate that our method considerably outperforms the state-of-the-art novel object captioners. Our source code is available at https://github.com/chenxy99/ANOC.
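
As a rough illustration of the gating idea mentioned in the abstract, the sketch below blends two attention distributions over image regions with a scalar sigmoid gate. It is not the ANOC implementation (the authors' repository is the reference for that). The function name `gated_fusion`, the scalar `gate_logit`, and the toy attention values are all assumptions made for illustration; in a trained model the gate would presumably be predicted from learned features rather than passed in by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(human_attn, machine_attn, gate_logit):
    """Blend two attention distributions with a scalar gate.

    Hypothetical helper, not taken from the paper: the gate g = sigmoid(gate_logit)
    weights the human attention, and (1 - g) weights the machine attention.
    """
    if len(human_attn) != len(machine_attn):
        raise ValueError("attention vectors must cover the same regions")
    g = sigmoid(gate_logit)
    # A convex combination of two distributions is itself a distribution.
    return [g * h + (1.0 - g) * m for h, m in zip(human_attn, machine_attn)]


# Toy attention weights over three image regions (made-up values).
human = [0.7, 0.2, 0.1]
machine = [0.1, 0.3, 0.6]

# gate_logit = 0 gives g = 0.5, i.e. an equal blend of both sources.
fused = gated_fusion(human, machine, gate_logit=0.0)
```

Because the gate lies strictly between 0 and 1, the fused weights still sum to one whenever both inputs do; a strongly positive `gate_logit` pushes the result toward the human attention, and a strongly negative one toward the machine attention.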

Original language	English (US)
Title of host publication	Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Editors	Zhi-Hua Zhou
Publisher	International Joint Conferences on Artificial Intelligence
Pages	622-628
Number of pages	7
ISBN (Electronic)	9780999241196
State	Published - 2021
Event	30th International Joint Conference on Artificial Intelligence, IJCAI 2021 - Virtual, Online, Canada
Duration	Aug 19 2021 → Aug 27 2021

Publication series

Name	IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)	1045-0823

Conference

Conference	30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Country/Territory	Canada
City	Virtual, Online
Period	8/19/21 → 8/27/21

Bibliographical note

Funding Information:
This work is supported by NSF Grants 1908711.

Publisher Copyright:
© 2021 International Joint Conferences on Artificial Intelligence. All rights reserved.

Cite this

Leveraging Human Attention in Novel Object Captioning. / Chen, Xianyu; Jiang, Ming ; Zhao, Qi.
Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021. ed. / Zhi-Hua Zhou. International Joint Conferences on Artificial Intelligence, 2021. p. 622-628 (IJCAI International Joint Conference on Artificial Intelligence).

Chen, X, Jiang, M & Zhao, Q 2021, Leveraging Human Attention in Novel Object Captioning. in Z-H Zhou (ed.), Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021. IJCAI International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, pp. 622-628, 30th International Joint Conference on Artificial Intelligence, IJCAI 2021, Virtual, Online, Canada, 8/19/21.

@inproceedings{8aacb9c4af9e447aba95154f4932cc66,
  title = "Leveraging Human Attention in Novel Object Captioning",
  abstract = "Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on external image taggers or object detectors to describe novel objects, we present the Attention-based Novel Object Captioner (ANOC) that complements novel object captioners with human attention features that characterize generally important information independent of tasks. It introduces a gating mechanism that adaptively incorporates human attention with self-learned machine attention, with a Constrained Self-Critical Sequence Training method to address the exposure bias while maintaining constraints of novel object descriptions. Extensive experiments conducted on the nocaps and Held-Out COCO datasets demonstrate that our method considerably outperforms the state-of-the-art novel object captioners. Our source code is available at https://github.com/chenxy99/ANOC.",
  author = "Xianyu Chen and Ming Jiang and Qi Zhao",
  note = "Funding Information: This work is supported by NSF Grants 1908711. Publisher Copyright: {\textcopyright} 2021 International Joint Conferences on Artificial Intelligence. All rights reserved.; 30th International Joint Conference on Artificial Intelligence, IJCAI 2021 ; Conference date: 19-08-2021 Through 27-08-2021",
  year = "2021",
  language = "English (US)",
  series = "IJCAI International Joint Conference on Artificial Intelligence",
  publisher = "International Joint Conferences on Artificial Intelligence",
  pages = "622--628",
  editor = "Zhi-Hua Zhou",
  booktitle = "Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021",
}

TY - GEN
T1 - Leveraging Human Attention in Novel Object Captioning
AU - Chen, Xianyu
AU - Jiang, Ming
AU - Zhao, Qi
PY - 2021
Y1 - 2021

AB - Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on external image taggers or object detectors to describe novel objects, we present the Attention-based Novel Object Captioner (ANOC) that complements novel object captioners with human attention features that characterize generally important information independent of tasks. It introduces a gating mechanism that adaptively incorporates human attention with self-learned machine attention, with a Constrained Self-Critical Sequence Training method to address the exposure bias while maintaining constraints of novel object descriptions. Extensive experiments conducted on the nocaps and Held-Out COCO datasets demonstrate that our method considerably outperforms the state-of-the-art novel object captioners. Our source code is available at https://github.com/chenxy99/ANOC.

UR - http://www.scopus.com/inward/record.url?scp=85121104024&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121104024&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85121104024
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 622
EP - 628
BT - Proceedings of the 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
A2 - Zhou, Zhi-Hua
PB - International Joint Conferences on Artificial Intelligence
T2 - 30th International Joint Conference on Artificial Intelligence, IJCAI 2021
Y2 - 19 August 2021 through 27 August 2021
ER -
