Reliable Multimodal Semantic Communication for Audio-Visual Event Localization

  • Yuandi Li
  • Zhe Xiang
  • Fei Yu*
  • Zhuoran Zhang
  • Yanhao Wang*
  • Zhangshuang Guan
  • Hui Ji
  • Zhiguo Wan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The widespread adoption of smart mobile devices and applications has driven an exponential growth in wireless data traffic, posing significant challenges to modern communication systems. Ensuring reliable task-oriented multimodal semantic communication has become increasingly critical. In this letter, we propose RMMSC, a novel framework designed to enhance the effectiveness and reliability of Audio-Visual Event (AVE) localization-driven multimodal semantic communication. Specifically, RMMSC improves the accuracy of multimodal semantic information through advanced semantic encoding and cross-modal feature integration. It employs a two-level coding scheme that combines error-correcting codes with semantic encoders to enhance the reliability of multimodal semantic transmission. As an optional design choice, RMMSC supports a hybrid encryption mechanism to protect transmitted data if required by the application context. Simulation results validate the effectiveness of RMMSC, demonstrating significant improvements in accuracy and reliability for the AVE task.
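The two-level coding scheme described above can be sketched with a toy pipeline. Note this is an illustrative sketch under assumptions, not the paper's actual design: the "semantic encoder" here is a simple scalar quantizer standing in for fused audio-visual features, and a rate-1/3 repetition code stands in for whatever error-correcting code RMMSC uses.

```python
import random

def semantic_encode(features, bits_per_value=4):
    """Toy semantic encoder: quantize each feature in [0, 1) to fixed-width bits."""
    out = []
    for f in features:
        q = min(int(f * (1 << bits_per_value)), (1 << bits_per_value) - 1)
        out.extend(int(b) for b in format(q, f"0{bits_per_value}b"))
    return out

def repetition_encode(bits, n=3):
    """Second-level channel code: repeat each bit n times (toy ECC)."""
    return [b for b in bits for _ in range(n)]

def repetition_decode(coded, n=3):
    """Majority-vote decoding of the repetition code."""
    return [1 if 2 * sum(coded[i:i + n]) > n else 0
            for i in range(0, len(coded), n)]

def bsc(bits, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(0)
features = [0.12, 0.55, 0.91]  # stand-in for fused audio-visual semantic features
payload = semantic_encode(features)
received = repetition_decode(bsc(repetition_encode(payload), p=0.05, rng=rng))
print(received == payload)
```

The point of the sketch is the layering: semantic compression decides *what* bits matter, while the outer channel code protects those bits against transmission errors; any real system would replace both toy stages with learned encoders and a stronger code.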

Original language: English
Pages (from-to): 317-321
Number of pages: 5
Journal: IEEE Communications Letters
Volume: 30
State: Published - 2026

Keywords

  • Semantic communication
  • audio-visual event localization
  • multimodal semantic communication
