TY - JOUR
T1 - Contact-conditioned hand-held object reconstruction from single-view images
AU - Wang, Xiaoyuan
AU - Li, Yang
AU - Boukhayma, Adnane
AU - Wang, Changbo
AU - Christie, Marc
N1 - Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/8
Y1 - 2023/8
N2 - Reconstructing the shape of hand-held objects from single-view color images is a long-standing problem in computer vision and computer graphics. The task is complicated by the ill-posed nature of single-view reconstruction, as well as by potential occlusions from both the hand and the object. Previous works mostly handled the problem by using known object templates as priors to reduce complexity. In contrast, our paper proposes a novel approach that requires no object template; instead, it exploits prior knowledge of contacts in hand-object interactions to train an attention-based network that performs precise hand-held object reconstruction with a single forward pass at inference. The proposed network encodes visual features together with contact features using a multi-head attention module to condition the training of a neural field representation. This neural field outputs a Signed Distance Field representing the reconstructed object, and extensive experiments on three well-known datasets demonstrate that our method achieves superior reconstruction results compared to state-of-the-art techniques, even under severe occlusion.
AB - Reconstructing the shape of hand-held objects from single-view color images is a long-standing problem in computer vision and computer graphics. The task is complicated by the ill-posed nature of single-view reconstruction, as well as by potential occlusions from both the hand and the object. Previous works mostly handled the problem by using known object templates as priors to reduce complexity. In contrast, our paper proposes a novel approach that requires no object template; instead, it exploits prior knowledge of contacts in hand-object interactions to train an attention-based network that performs precise hand-held object reconstruction with a single forward pass at inference. The proposed network encodes visual features together with contact features using a multi-head attention module to condition the training of a neural field representation. This neural field outputs a Signed Distance Field representing the reconstructed object, and extensive experiments on three well-known datasets demonstrate that our method achieves superior reconstruction results compared to state-of-the-art techniques, even under severe occlusion.
KW - Attention-based network
KW - Contact-conditioned reconstruction
KW - Hand-held object reconstruction
KW - Neural field learning
KW - Single-view reconstruction
UR - https://www.scopus.com/pages/publications/85162174139
U2 - 10.1016/j.cag.2023.05.022
DO - 10.1016/j.cag.2023.05.022
M3 - Article
AN - SCOPUS:85162174139
SN - 0097-8493
VL - 114
SP - 150
EP - 157
JO - Computers and Graphics
JF - Computers and Graphics
ER -