TY - GEN
T1 - Hierarchical Safety Realignment
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
AU - Li, Yue
AU - Yi, Xin
AU - Shi, Dongsheng
AU - de Melo, Gerard
AU - Wang, Xiaoling
AU - Wang, Linlin
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - With the growing size of Large Vision-Language Models (LVLMs), network pruning techniques designed to compress these models for deployment in resource-constrained environments have attracted significant attention. However, we observe that pruning frequently results in a degradation in safety performance. To address this issue, we propose a novel and lightweight approach, named Hierarchical Safety Realignment (HSR). HSR operates by first quantifying the contribution of each attention head to safety, identifying the most critical ones, and then selectively restoring neurons directly within these attention heads that play a pivotal role in maintaining safety. This process hierarchically realigns the safety of pruned LVLMs, progressing from the attention head level to the neuron level. We validate HSR across various models and pruning strategies, consistently achieving notable improvements in safety performance. To the best of our knowledge, this is the first work explicitly focused on restoring safety in LVLMs post-pruning. The code will be available at https://github.com/TheShineyue/HSR.
UR - https://www.scopus.com/pages/publications/105028574266
U2 - 10.18653/v1/2025.findings-acl.394
DO - 10.18653/v1/2025.findings-acl.394
M3 - Conference contribution
AN - SCOPUS:105028574266
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 7600
EP - 7612
BT - Findings of the Association for Computational Linguistics: ACL 2025
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
Y2 - 27 July 2025 through 1 August 2025
ER -