TY - GEN
T1 - TimeSoccer
T2 - 33rd ACM International Conference on Multimedia, MM 2025
AU - You, Ling
AU - Huang, Wenxuan
AU - Xie, Xinni
AU - Wei, Xiangyi
AU - Li, Bangyan
AU - Lin, Shaohui
AU - Li, Yang
AU - Wang, Changbo
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/10/27
Y1 - 2025/10/27
N2 - Soccer is a globally popular sporting event, typically characterized by long matches and distinctive highlight moments. Recent advances in Multimodal Large Language Models (MLLMs) show promising capabilities in temporal grounding and video understanding. However, generating soccer commentary requires both precise temporal localization and semantically rich descriptions over long-form videos. Existing soccer MLLMs often rely on temporal priors for caption generation, which limits their ability to process the entire video in an end-to-end manner. Traditional approaches, on the other hand, follow a complex two-step paradigm that fails to capture the global context, leading to suboptimal performance. To solve the above issues, we present TimeSoccer, the first end-to-end soccer MLLM for Single-anchor Dense Video Captioning (SDVC) in full-match soccer videos. TimeSoccer jointly predicts timestamps and generates captions in a single pass, enabling global context modeling across 45-minute matches. To support long video understanding of soccer matches, we introduce MoFA-Select, a training-free, motion-aware frame compression module that adaptively selects representative frames via a coarse-to-fine strategy, and incorporates complementary training paradigms to strengthen the model's ability to handle long temporal sequences. Extensive experiments demonstrate that our TimeSoccer achieves State-of-The-Art (SoTA) performance on the SDVC task in an end-to-end form, generating high-quality commentary with accurate temporal alignment and strong semantic relevance. For more information, please visit: https://vpx-ecnu.github.io/TimeSoccer-Website/.
KW - multimodal model
KW - temporal localization
KW - video captioning
UR - https://www.scopus.com/pages/publications/105024074704
U2 - 10.1145/3746027.3755077
DO - 10.1145/3746027.3755077
M3 - Conference contribution
AN - SCOPUS:105024074704
T3 - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
SP - 3418
EP - 3427
BT - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
PB - Association for Computing Machinery, Inc
Y2 - 27 October 2025 through 31 October 2025
ER -