Explicit Knowledge Incorporation for Visual Reasoning

Yifeng Zhang; Ming Jiang; Qi Zhao

doi:10.1109/CVPR46437.2021.00141

Explicit Knowledge Incorporation for Visual Reasoning

Yifeng Zhang, Ming Jiang, Qi Zhao

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Scopus citations

Abstract

Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. To addresses the knowledge gap between visual reasoning methods and the semantic complexity of real-world images, we present the first explicit visual reasoning method that incorporates external knowledge and models high-order relational attention for improved generalizability and explainability. Specifically, we propose a knowledge incorporation network that explicitly creates and includes new graph nodes for entities and predicates from external knowledge bases to enrich the semantics of the scene graph used in explicit reasoning. We then create a novel Graph-Relate module to perform high-order relational attention on the enriched scene graph. By explicitly introducing structured external knowledge and high-order relational attention, our method demonstrates significant generalizability and explainability over the state-of-the-art visual reasoning approaches on the GQA and VQAv2 datasets.

Original language	English (US)
Title of host publication	Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Publisher	IEEE Computer Society
Pages	1356-1365
Number of pages	10
ISBN (Electronic)	9781665445092
DOIs	https://doi.org/10.1109/CVPR46437.2021.00141
State	Published - 2021
Event	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States Duration: Jun 19 2021 → Jun 25 2021

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Conference

Conference	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Country/Territory	United States
City	Virtual, Online
Period	6/19/21 → 6/25/21

Bibliographical note

Funding Information:
This work is supported by NSF Grants 1908711.

Publisher Copyright:
© 2021 IEEE

Access

10.1109/CVPR46437.2021.00141

OpenUrl availability

Full text

Cite this

Zhang, Y., Jiang, M., & Zhao, Q. (2021). Explicit Knowledge Incorporation for Visual Reasoning. In Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 (pp. 1356-1365). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR46437.2021.00141

Explicit Knowledge Incorporation for Visual Reasoning. / Zhang, Yifeng; Jiang, Ming ; Zhao, Qi.
Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. IEEE Computer Society, 2021. p. 1356-1365 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Zhang, Y, Jiang, M & Zhao, Q 2021, Explicit Knowledge Incorporation for Visual Reasoning. in Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 1356-1365, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual, Online, United States, 6/19/21. https://doi.org/10.1109/CVPR46437.2021.00141

@inproceedings{c9a5ce1f89bc46649851114e5f36f2ca,

title = "Explicit Knowledge Incorporation for Visual Reasoning",

abstract = "Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. To addresses the knowledge gap between visual reasoning methods and the semantic complexity of real-world images, we present the first explicit visual reasoning method that incorporates external knowledge and models high-order relational attention for improved generalizability and explainability. Specifically, we propose a knowledge incorporation network that explicitly creates and includes new graph nodes for entities and predicates from external knowledge bases to enrich the semantics of the scene graph used in explicit reasoning. We then create a novel Graph-Relate module to perform high-order relational attention on the enriched scene graph. By explicitly introducing structured external knowledge and high-order relational attention, our method demonstrates significant generalizability and explainability over the state-of-the-art visual reasoning approaches on the GQA and VQAv2 datasets.",

author = "Yifeng Zhang and Ming Jiang and Qi Zhao",

note = "Funding Information: This work is supported by NSF Grants 1908711. Publisher Copyright: {\textcopyright} 2021 IEEE; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 ; Conference date: 19-06-2021 Through 25-06-2021",

year = "2021",

doi = "10.1109/CVPR46437.2021.00141",

language = "English (US)",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "1356--1365",

booktitle = "Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021",

}

TY - GEN

T1 - Explicit Knowledge Incorporation for Visual Reasoning

AU - Zhang, Yifeng

AU - Jiang, Ming

AU - Zhao, Qi

PY - 2021

Y1 - 2021

N2 - Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. To addresses the knowledge gap between visual reasoning methods and the semantic complexity of real-world images, we present the first explicit visual reasoning method that incorporates external knowledge and models high-order relational attention for improved generalizability and explainability. Specifically, we propose a knowledge incorporation network that explicitly creates and includes new graph nodes for entities and predicates from external knowledge bases to enrich the semantics of the scene graph used in explicit reasoning. We then create a novel Graph-Relate module to perform high-order relational attention on the enriched scene graph. By explicitly introducing structured external knowledge and high-order relational attention, our method demonstrates significant generalizability and explainability over the state-of-the-art visual reasoning approaches on the GQA and VQAv2 datasets.

AB - Existing explainable and explicit visual reasoning methods only perform reasoning based on visual evidence but do not take into account knowledge beyond what is in the visual scene. To addresses the knowledge gap between visual reasoning methods and the semantic complexity of real-world images, we present the first explicit visual reasoning method that incorporates external knowledge and models high-order relational attention for improved generalizability and explainability. Specifically, we propose a knowledge incorporation network that explicitly creates and includes new graph nodes for entities and predicates from external knowledge bases to enrich the semantics of the scene graph used in explicit reasoning. We then create a novel Graph-Relate module to perform high-order relational attention on the enriched scene graph. By explicitly introducing structured external knowledge and high-order relational attention, our method demonstrates significant generalizability and explainability over the state-of-the-art visual reasoning approaches on the GQA and VQAv2 datasets.

UR - http://www.scopus.com/inward/record.url?scp=85123184303&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85123184303&partnerID=8YFLogxK

U2 - 10.1109/CVPR46437.2021.00141

DO - 10.1109/CVPR46437.2021.00141

M3 - Conference contribution

AN - SCOPUS:85123184303

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 1356

EP - 1365

BT - Proceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021

PB - IEEE Computer Society

T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021

Y2 - 19 June 2021 through 25 June 2021

ER -

Explicit Knowledge Incorporation for Visual Reasoning

Abstract

Publication series

Conference

Bibliographical note

Access

OpenUrl availability

Other files and links

Fingerprint

Cite this