TY - GEN
T1 - E2DAS
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
AU - Zhang, Nana
AU - Liu, Qian
AU - Zhu, Dandan
AU - Zhu, Kun
AU - Zhai, Guangtao
AU - Yang, Xiaokang
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Recent years have witnessed rapid progress in convolutional neural networks (CNNs) and their successful application to saliency prediction for omnidirectional images (ODIs). Despite achieving substantial performance improvements, these CNN-based saliency models suffer from two major shortcomings: they are spatially content-agnostic and computationally intensive. Inspired by the effectiveness of equivariant networks in many computer vision tasks, we propose a novel efficient equivariant dynamic aggregation saliency (E2DAS) model to efficiently tackle human fixation prediction in ODIs. Specifically, our proposed model consists of an efficient equivariant module, a dynamic convolutional aggregation module, and an optimization computation module. Unlike existing saliency models for ODIs, we make the first attempt to introduce an efficient equivariant dynamic convolutional aggregation operation into the saliency prediction task, which fundamentally alleviates the projection distortion problem and effectively learns spatially content-adaptive features. Moreover, we observe a considerable decrease in the number of parameters resulting from replacing standard convolution with dynamic convolution aggregation. Extensive experiments on several benchmark datasets show the proposed model’s superiority over other state-of-the-art methods.
AB - Recent years have witnessed rapid progress in convolutional neural networks (CNNs) and their successful application to saliency prediction for omnidirectional images (ODIs). Despite achieving substantial performance improvements, these CNN-based saliency models suffer from two major shortcomings: they are spatially content-agnostic and computationally intensive. Inspired by the effectiveness of equivariant networks in many computer vision tasks, we propose a novel efficient equivariant dynamic aggregation saliency (E2DAS) model to efficiently tackle human fixation prediction in ODIs. Specifically, our proposed model consists of an efficient equivariant module, a dynamic convolutional aggregation module, and an optimization computation module. Unlike existing saliency models for ODIs, we make the first attempt to introduce an efficient equivariant dynamic convolutional aggregation operation into the saliency prediction task, which fundamentally alleviates the projection distortion problem and effectively learns spatially content-adaptive features. Moreover, we observe a considerable decrease in the number of parameters resulting from replacing standard convolution with dynamic convolution aggregation. Extensive experiments on several benchmark datasets show the proposed model’s superiority over other state-of-the-art methods.
KW - Equivariant dynamic aggregation
KW - light-weight model
KW - omnidirectional images
KW - saliency prediction
KW - spatial content-adaptive
UR - https://www.scopus.com/pages/publications/85213054248
U2 - 10.1007/978-3-031-78122-3_26
DO - 10.1007/978-3-031-78122-3_26
M3 - Conference contribution
AN - SCOPUS:85213054248
SN - 9783031781216
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 407
EP - 423
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 1 December 2024 through 5 December 2024
ER -