Self-distillation for few-shot image captioning

Xianyu Chen, Ming Jiang, Qi Zhao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Scopus citations

Abstract

The development of large-scale image-captioning datasets is expensive, while the abundance of unpaired images and text corpora can potentially help reduce the manual annotation effort. In this paper, we study the few-shot image captioning problem, which requires only a small number of annotated image-caption pairs. We propose an ensemble-based self-distillation method that allows image captioning models to be trained with unpaired images and captions. The ensemble consists of multiple base models trained with different data samples in each iteration. For learning from unpaired images, we generate multiple pseudo captions with the ensemble and assign them different weights according to their confidence levels. For learning from unpaired captions, we propose a simple yet effective pseudo feature generation method based on gradient descent. The pseudo captions and pseudo features from the ensemble are used to train the base models in future iterations. The proposed method is general across different image captioning models and datasets. Our experiments demonstrate significant performance improvements and meaningful captions generated with only 1% of paired training data. Source code is available at https://github.com/chenxy99/SD-FSIC.
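The two ideas named in the abstract, confidence-weighted pseudo captions and pseudo features obtained by gradient descent, can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that). Everything here is an assumption made for illustration: the function names `confidence_weights` and `pseudo_feature`, the use of a softmax over caption log-likelihoods as the confidence measure, and the frozen linear classifier `W` standing in for a captioning model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_weights(caption_log_probs):
    """Hypothetical weighting: map per-caption log-likelihoods to weights
    that sum to 1, so more confident pseudo captions count more."""
    return softmax(caption_log_probs)

def pseudo_feature(W, target, dim, steps=200, lr=0.5):
    """Hypothetical pseudo-feature generation: run gradient descent on an
    input vector x (the model W stays frozen) so that the linear
    classifier W (one row per vocabulary entry) prefers `target`."""
    x = [0.0] * dim
    for _ in range(steps):
        logits = [sum(w * v for w, v in zip(row, x)) for row in W]
        probs = softmax(logits)
        # Gradient of cross-entropy w.r.t. logits: probs - one_hot(target)
        err = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
        # Chain rule: gradient w.r.t. x is W^T err
        grad = [sum(err[i] * W[i][j] for i in range(len(W))) for j in range(dim)]
        x = [v - lr * g for v, g in zip(x, grad)]
    return x

# Three pseudo captions for one image, scored by an assumed ensemble.
weights = confidence_weights([-2.0, -3.5, -2.0])

# A toy 3-word vocabulary over 2-dimensional features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = pseudo_feature(W, target=1, dim=2)
logits = [sum(w * v for w, v in zip(row, x)) for row in W]
```

In this sketch the two equally scored captions receive equal weight and the lower-scored one receives less, while the optimized vector `x` makes the target entry the highest-scoring row of `W`. A real captioning model would score whole word sequences rather than a single vocabulary entry.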

Original language: English (US)
Title of host publication: Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 545-555
Number of pages: 11
ISBN (Electronic): 9780738142661
State: Published - Jan 2021
Event: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 - Virtual, Online, United States
Duration: Jan 5 2021 to Jan 9 2021

Publication series

Name: Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021

Conference

Conference: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Country/Territory: United States
City: Virtual, Online
Period: 1/5/21 to 1/9/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE.
