TY - JOUR
T1 - Learning to Predict Sequences of Human Visual Fixations
AU - Jiang, Ming
AU - Boix, Xavier
AU - Roig, Gemma
AU - Xu, Juan
AU - Van Gool, Luc
AU - Zhao, Qi
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/6
Y1 - 2016/6
AB - Most state-of-the-art visual attention models estimate the probability distribution of eye fixations over image locations, the so-called saliency map. Yet these models do not predict the temporal sequence of eye fixations, which may be valuable both for better predicting human eye fixations and for understanding the role of the different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, learned from recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for each stage of visual exploration, capturing the importance of the cues at that point in the scanpath. In a series of experiments, we demonstrate the effectiveness of using LSPI for combining multiple cues at different stages of the scanpath. The learned parameters suggest that low-level and high-level (semantic) cues are similarly important at the first eye fixation of the scanpath, and that the contribution of high-level cues keeps increasing during visual exploration. Results show that our approach achieves state-of-the-art performance on two challenging data sets: 1) the OSIE data set and 2) the MIT data set.
KW - Scanpath prediction
KW - Visual saliency prediction
UR - http://www.scopus.com/inward/record.url?scp=84954044819&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84954044819&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2015.2496306
DO - 10.1109/TNNLS.2015.2496306
M3 - Article
C2 - 26761903
AN - SCOPUS:84954044819
SN - 2162-237X
VL - 27
SP - 1241
EP - 1252
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 6
M1 - 7374716
ER -