TY - GEN
T1 - Multi-Modal Adversarial Example Detection with Transformer
AU - Ding, Chaoyue
AU - Sun, Shiliang
AU - Zhao, Jing
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Although deep neural networks have shown great potential for many tasks, they are vulnerable to adversarial examples, which are generated by adding small perturbations to natural examples. Recently, many studies have shown that making full use of different modalities can effectively enhance the representational ability of deep neural networks. We propose a multi-modal deep fusion Transformer, termed MDFT. First, audio features and rich semantic text features are extracted by audio encoders and text encoders, respectively. Then, multi-modal attention mechanisms are established to capture the high-level interactions between the audio and linguistic domains and obtain a joint multi-modal representation. Finally, the representation is propagated to a dense layer to generate the detection result. Compared with its unimodal variant, the model improves accuracy by 0.12% and 0.19% on the WiAd and BlAd datasets, respectively. Experimental results on the two datasets show that MDFT outperforms its unimodal variant.
AB - Although deep neural networks have shown great potential for many tasks, they are vulnerable to adversarial examples, which are generated by adding small perturbations to natural examples. Recently, many studies have shown that making full use of different modalities can effectively enhance the representational ability of deep neural networks. We propose a multi-modal deep fusion Transformer, termed MDFT. First, audio features and rich semantic text features are extracted by audio encoders and text encoders, respectively. Then, multi-modal attention mechanisms are established to capture the high-level interactions between the audio and linguistic domains and obtain a joint multi-modal representation. Finally, the representation is propagated to a dense layer to generate the detection result. Compared with its unimodal variant, the model improves accuracy by 0.12% and 0.19% on the WiAd and BlAd datasets, respectively. Experimental results on the two datasets show that MDFT outperforms its unimodal variant.
KW - Transformer
KW - adversarial example detection
KW - multi-modal
UR - https://www.scopus.com/pages/publications/85140719758
U2 - 10.1109/IJCNN55064.2022.9892561
DO - 10.1109/IJCNN55064.2022.9892561
M3 - Conference contribution
AN - SCOPUS:85140719758
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Joint Conference on Neural Networks, IJCNN 2022
Y2 - 18 July 2022 through 23 July 2022
ER -