Explain Vision Focus: Blending Human Saliency Into Synthetic Face Images

  • Kaiwei Zhang
  • , Dandan Zhu
  • , Xiongkuo Min*
  • , Huiyu Duan
  • , Guangtao Zhai*
  • *Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

Synthetic faces have been extensively researched and applied in various fields, such as face parsing and recognition. Compared to real face images, synthetic faces engender more controllable and consistent experimental stimuli due to the ability to precisely merge expression animations onto the facial skeleton. Accordingly, we establish an eye-tracking database with 780 synthetic face images and fixation data collected from 22 participants. The use of synthetic images with consistent expressions ensures reliable data support for exploring the database and determining the following findings: (1) A correlation study between saliency intensity and facial movement reveals that the variation of attention distribution within facial regions is mainly attributed to the movement of the mouth. (2) A categorized analysis of different demographic factors demonstrates that the bias towards salient regions aligns with differences in some demographic categories of synthetic characters. In practice, inference of facial saliency distribution is commonly used to predict the regions of interest for facial video-related applications. Therefore, we propose a benchmark model that accurately predicts saliency maps, closely matching the ground truth annotations. This achievement is made possible by utilizing channel alignment and progressive summation for feature fusion, along with the incorporation of Sinusoidal Position Encoding. The ablation experiment also demonstrates the effectiveness of our proposed model. We hope that this paper will contribute to advancing the photorealism of generative digital humans.

Original languageEnglish
Pages (from-to)489-502
Number of pages14
JournalIEEE Transactions on Multimedia
Volume27
DOIs
StatePublished - 2025

Keywords

  • Eye tracking
  • saliency prediction
  • synthetic face image
  • visual attention

Fingerprint

Dive into the research topics of 'Explain Vision Focus: Blending Human Saliency Into Synthetic Face Images'. Together they form a unique fingerprint.

Cite this