Learning to Predict Sequences of Human Visual Fixations

Ming Jiang; Xavier Boix; Gemma Roig; Juan Xu; Luc Van Gool; Qi Zhao

doi:10.1109/TNNLS.2015.2496306

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

43 Scopus citations

Abstract

Most state-of-the-art visual attention models estimate the probability distribution of fixating the eyes in a location of the image, the so-called saliency maps. Yet, these models do not predict the temporal sequence of eye fixations, which may be valuable for better predicting the human eye fixations, as well as for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, which is learned from the recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for the different stages of visual exploration that capture the importance of the cues during the scanpath. In a series of experiments, we demonstrate the effectiveness of using LSPI for combining multiple cues at different stages of the scanpath. The learned parameters suggest that the low-level and high-level cues (semantics) are similarly important at the first eye fixation of the scanpath, and the contribution of high-level cues keeps increasing during the visual exploration. Results show that our approach obtains the state-of-the-art performances on two challenging data sets: 1) OSIE data set and 2) MIT data set.
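
As a rough illustration of the least-squares policy iteration (LSPI) procedure named in the abstract, the following minimal sketch runs LSPI on a toy four-state chain. It is not the authors' model: the chain environment, the one-hot features, the reward, the discount factor, and every function name below are assumptions made purely for illustration, and the paper's stage-specific parameters and visual cues are not represented.

```python
# Toy LSPI sketch (illustrative only; not the paper's model or data).
GAMMA = 0.9                 # assumed discount factor
N_STATES, N_ACTIONS = 4, 2  # hypothetical chain: action 0 = left, 1 = right


def step(s, a):
    """Deterministic chain dynamics; reward 1 for landing on the last state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    v = [0.0] * (N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def greedy(w, s):
    """Action with the largest linear Q-value; ties go to the lower index."""
    q = [sum(wi * fi for wi, fi in zip(w, phi(s, a))) for a in range(N_ACTIONS)]
    return max(range(N_ACTIONS), key=lambda a: q[a])


def lstdq(samples, w):
    """One LSTD-Q solve: evaluate the policy that is greedy with respect to w."""
    k = N_STATES * N_ACTIONS
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, a, r, s2 in samples:
        f, f2 = phi(s, a), phi(s2, greedy(w, s2))
        for i in range(k):
            b[i] += f[i] * r
            for j in range(k):
                A[i][j] += f[i] * (f[j] - GAMMA * f2[j])
    return solve(A, b)


def lspi(samples, iters=20, tol=1e-9):
    """Alternate policy evaluation and greedy improvement until weights settle."""
    w = [0.0] * (N_STATES * N_ACTIONS)
    for _ in range(iters):
        w_new = lstdq(samples, w)
        if max(abs(x - y) for x, y in zip(w, w_new)) < tol:
            break
        w = w_new
    return w_new


# Exhaustive (state, action, reward, next_state) samples from the toy chain.
samples = [(s, a, step(s, a)[1], step(s, a)[0])
           for s in range(N_STATES) for a in range(N_ACTIONS)]
weights = lspi(samples)
policy = [greedy(weights, s) for s in range(N_STATES)]
print(policy)
```

In this toy chain the learned greedy policy moves right in every state. A scanpath model in the spirit of the abstract would presumably replace the one-hot features with image-derived cues and the chain transitions with recorded fixations, but those details are not given in this record.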

Original language	English (US)
Article number	7374716
Pages (from-to)	1241-1252
Number of pages	12
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	27
Issue number	6
DOIs	https://doi.org/10.1109/TNNLS.2015.2496306
State	Published - Jun 2016

Bibliographical note

Publisher Copyright:
© 2016 IEEE.

Keywords

Scanpath prediction
Visual saliency prediction

Cite this

@article{c876d433368b44779c33b40f4e460d00,

title = "Learning to Predict Sequences of Human Visual Fixations",

abstract = "Most state-of-the-art visual attention models estimate the probability distribution of fixating the eyes in a location of the image, the so-called saliency maps. Yet, these models do not predict the temporal sequence of eye fixations, which may be valuable for better predicting the human eye fixations, as well as for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, which is learned from the recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for the different stages of visual exploration that capture the importance of the cues during the scanpath. In a series of experiments, we demonstrate the effectiveness of using LSPI for combining multiple cues at different stages of the scanpath. The learned parameters suggest that the low-level and high-level cues (semantics) are similarly important at the first eye fixation of the scanpath, and the contribution of high-level cues keeps increasing during the visual exploration. Results show that our approach obtains the state-of-the-art performances on two challenging data sets: 1) OSIE data set and 2) MIT data set.",

keywords = "Scanpath prediction, Visual saliency prediction",

author = "Ming Jiang and Xavier Boix and Gemma Roig and Juan Xu and {Van Gool}, Luc and Qi Zhao",

note = "Publisher Copyright: {\textcopyright} 2016 IEEE.",

year = "2016",

month = jun,

doi = "10.1109/TNNLS.2015.2496306",

language = "English (US)",

volume = "27",

pages = "1241--1252",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

number = "6",

}

TY - JOUR

T1 - Learning to Predict Sequences of Human Visual Fixations

AU - Jiang, Ming

AU - Boix, Xavier

AU - Roig, Gemma

AU - Xu, Juan

AU - Van Gool, Luc

AU - Zhao, Qi

PY - 2016/6

Y1 - 2016/6

AB - Most state-of-the-art visual attention models estimate the probability distribution of fixating the eyes in a location of the image, the so-called saliency maps. Yet, these models do not predict the temporal sequence of eye fixations, which may be valuable for better predicting the human eye fixations, as well as for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, which is learned from the recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for the different stages of visual exploration that capture the importance of the cues during the scanpath. In a series of experiments, we demonstrate the effectiveness of using LSPI for combining multiple cues at different stages of the scanpath. The learned parameters suggest that the low-level and high-level cues (semantics) are similarly important at the first eye fixation of the scanpath, and the contribution of high-level cues keeps increasing during the visual exploration. Results show that our approach obtains the state-of-the-art performances on two challenging data sets: 1) OSIE data set and 2) MIT data set.

KW - Scanpath prediction

KW - Visual saliency prediction

UR - http://www.scopus.com/inward/record.url?scp=84954044819&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84954044819&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2015.2496306

DO - 10.1109/TNNLS.2015.2496306

M3 - Article

C2 - 26761903

AN - SCOPUS:84954044819

SN - 2162-237X

VL - 27

SP - 1241

EP - 1252

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 6

M1 - 7374716

ER -
