TY - JOUR
T1 - FacePaint
T2 - Two-way cross context semantic network for face image inpainting
AU - Shi, Yongsheng
AU - Huang, Dongjin
AU - Liu, Jinhua
AU - Qu, Jiantao
AU - Xie, Zhifeng
AU - Ma, Lizhuang
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2025/12
Y1 - 2025/12
AB - Due to the lack of publicly available paired data for occluded and unoccluded faces, current face image inpainting methods struggle to generate high-quality results for naturally occluded face images. Moreover, for large occluded areas, most methods are prone to reconstructing distorted structures and blurred textures, which can destroy the global semantic content of face images. To address these challenges, we propose a transformer-based inpainting framework (FacePaint) for automatically inpainting face images occluded by different types of objects. First, we simulate human faces that are occluded in the real world and construct a new face dataset. Second, to enhance the understanding of global semantics, we propose a Two-way Cross Context Semantic Attention network (TCCSA) by incorporating the self-attention (SA), context-to-semantic (CTS), and semantic-to-context (STC) models. TCCSA can capture the semantic information of faces while extracting long-range contextual features guided by semantic priors. Third, to improve the ability to reconstruct large occluded regions, we propose a novel gated convolution-based feed-forward network (FFN) dedicated to extracting local contextual features of images. Finally, to ensure that FacePaint can focus on the structures and textures of images, as well as the semantic information, a new loss function is proposed to guide its training. Extensive experimental results demonstrate that the proposed FacePaint significantly outperforms state-of-the-art approaches both qualitatively and quantitatively on five synthesized datasets. Additionally, FacePaint can be effectively applied to real scenes, generating high-fidelity results from face images occluded by different objects in the wild.
KW - Image inpainting
KW - Occluded face images
KW - Semantic priors
KW - Transformer
UR - https://www.scopus.com/pages/publications/105007227539
U2 - 10.1016/j.displa.2025.103092
DO - 10.1016/j.displa.2025.103092
M3 - Article
AN - SCOPUS:105007227539
SN - 0141-9382
VL - 90
JO - Displays
JF - Displays
M1 - 103092
ER -