TY - GEN
T1 - DiffLane
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
AU - Liu, Wenxiang
AU - Liu, Yongkang
AU - Meng, Weiliang
AU - He, Gaoqi
AU - Li, Jianhua
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Mask-based video lane detection methods have recently achieved promising performance. However, they generate irregular lane masks in complex scenes, resulting in inaccurate lane positioning. Diffusion models have achieved notable success in image segmentation because of their ability to restore pixel-level details. In this paper, we propose DiffLane, a novel framework for diffusion model-based lane mask generation for accurate video lane detection. The main idea of our work is to exploit the detail-restoring capability of diffusion models to generate high-quality lane masks. DiffLane comprises the MultiFrame Fusion Enhancer (MFFE), the MultiScale De-noising Network (MSDN), and the Dynamic Lane Perception Unit (DLPU). In MFFE, the current frame is enhanced with visual information from the past two frames through global matching-based optical flow estimation; this enhanced frame serves as a condition at each denoising step. MSDN predicts noise through a multi-scale fusion strategy, enabling the diffusion model to remove noise precisely and generate regular lane masks. DLPU regresses coefficient vectors from the generated lane masks with DSConv applied in two directions, completing the accurate video lane detection task. Extensive experiments on the VIL-100 and OpenLane-V datasets demonstrate that our method outperforms other state-of-the-art approaches.
AB - Mask-based video lane detection methods have recently achieved promising performance. However, they generate irregular lane masks in complex scenes, resulting in inaccurate lane positioning. Diffusion models have achieved notable success in image segmentation because of their ability to restore pixel-level details. In this paper, we propose DiffLane, a novel framework for diffusion model-based lane mask generation for accurate video lane detection. The main idea of our work is to exploit the detail-restoring capability of diffusion models to generate high-quality lane masks. DiffLane comprises the MultiFrame Fusion Enhancer (MFFE), the MultiScale De-noising Network (MSDN), and the Dynamic Lane Perception Unit (DLPU). In MFFE, the current frame is enhanced with visual information from the past two frames through global matching-based optical flow estimation; this enhanced frame serves as a condition at each denoising step. MSDN predicts noise through a multi-scale fusion strategy, enabling the diffusion model to remove noise precisely and generate regular lane masks. DLPU regresses coefficient vectors from the generated lane masks with DSConv applied in two directions, completing the accurate video lane detection task. Extensive experiments on the VIL-100 and OpenLane-V datasets demonstrate that our method outperforms other state-of-the-art approaches.
KW - diffusion model
KW - lane mask generation
KW - video lane detection
UR - https://www.scopus.com/pages/publications/105022659500
U2 - 10.1109/ICME59968.2025.11209203
DO - 10.1109/ICME59968.2025.11209203
M3 - Conference contribution
AN - SCOPUS:105022659500
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
Y2 - 30 June 2025 through 4 July 2025
ER -