TY - GEN
T1 - Object skeleton extraction in natural images by fusing scale-associated deep side outputs
AU - Shen, Wei
AU - Zhao, Kai
AU - Jiang, Yuan
AU - Wang, Yan
AU - Zhang, Zhijiang
AU - Bai, Xiang
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/12/9
Y1 - 2016/12/9
N2 - Object skeleton is a useful cue for object detection, complementary to the object contour, as it provides a structural representation to describe the relationship among object parts. While object skeleton extraction in natural images is a very challenging problem, as it requires the extractor to be able to capture both local and global image context to determine the intrinsic scale of each skeleton pixel. Existing methods rely on per-pixel based multi-scale feature computation, which results in difficult modeling and high time consumption. In this paper, we present a fully convolutional network with multiple scale-associated side outputs to address this problem. By observing the relationship between the receptive field sizes of the sequential stages in the network and the skeleton scales they can capture, we introduce a scale-associated side output to each stage. We impose supervision to different stages by guiding the scale-associated side outputs toward groundtruth skeletons of different scales. The responses of the multiple scaleassociated side outputs are then fused in a scale-specific way to localize skeleton pixels with multiple scales effectively. Our method achieves promising results on two skeleton extraction datasets, and significantly outperforms other competitors.
AB - Object skeleton is a useful cue for object detection, complementary to the object contour, as it provides a structural representation to describe the relationship among object parts. While object skeleton extraction in natural images is a very challenging problem, as it requires the extractor to be able to capture both local and global image context to determine the intrinsic scale of each skeleton pixel. Existing methods rely on per-pixel based multi-scale feature computation, which results in difficult modeling and high time consumption. In this paper, we present a fully convolutional network with multiple scale-associated side outputs to address this problem. By observing the relationship between the receptive field sizes of the sequential stages in the network and the skeleton scales they can capture, we introduce a scale-associated side output to each stage. We impose supervision to different stages by guiding the scale-associated side outputs toward groundtruth skeletons of different scales. The responses of the multiple scaleassociated side outputs are then fused in a scale-specific way to localize skeleton pixels with multiple scales effectively. Our method achieves promising results on two skeleton extraction datasets, and significantly outperforms other competitors.
UR - https://www.scopus.com/pages/publications/84986267484
U2 - 10.1109/CVPR.2016.31
DO - 10.1109/CVPR.2016.31
M3 - 会议稿件
AN - SCOPUS:84986267484
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 222
EP - 230
BT - Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
PB - IEEE Computer Society
T2 - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
Y2 - 26 June 2016 through 1 July 2016
ER -