TY - GEN
T1 - ExplainDrive
T2 - 2025 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2025
AU - Yu, Xing
AU - Peng, Jinghan
AU - Li, Hang
AU - Li, Ermuyun
AU - Du, Dehui
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - End-to-end decision models based on deep learning have become increasingly prominent in automated driving systems. However, their black-box nature poses significant challenges to interpreting decision processes, especially in dynamic and complex scenarios. Existing approaches largely focus on post-hoc analyses or isolated single-step explanations, lacking comprehensive explanations from scenario understanding to decision-making and failing to address the complexity of real-world scenarios with coherent reasoning. To address these limitations, we propose ExplainDrive, a multimodal Chain-of-Thought reasoning framework that integrates causally optimized temporal representations with explainable decision-making. ExplainDrive follows a three-stage pipeline: (i) extracting spatio-temporal features via a Causal Temporal Former, (ii) constructing hierarchical scenario understanding, and (iii) progressively deriving driving decisions with interpretable rationales. This design enhances transparency at each intermediate step and mitigates spurious correlations through causal feature selection. Extensive experiments on the BDD-X and nuScenes datasets demonstrate that ExplainDrive consistently improves the quality of decision explanations and outperforms compared models across multiple key evaluation metrics.
AB - End-to-end decision models based on deep learning have become increasingly prominent in automated driving systems. However, their black-box nature poses significant challenges to interpreting decision processes, especially in dynamic and complex scenarios. Existing approaches largely focus on post-hoc analyses or isolated single-step explanations, lacking comprehensive explanations from scenario understanding to decision-making and failing to address the complexity of real-world scenarios with coherent reasoning. To address these limitations, we propose ExplainDrive, a multimodal Chain-of-Thought reasoning framework that integrates causally optimized temporal representations with explainable decision-making. ExplainDrive follows a three-stage pipeline: (i) extracting spatio-temporal features via a Causal Temporal Former, (ii) constructing hierarchical scenario understanding, and (iii) progressively deriving driving decisions with interpretable rationales. This design enhances transparency at each intermediate step and mitigates spurious correlations through causal feature selection. Extensive experiments on the BDD-X and nuScenes datasets demonstrate that ExplainDrive consistently improves the quality of decision explanations and outperforms compared models across multiple key evaluation metrics.
KW - automated driving systems
KW - chain-of-thought reasoning
KW - explainability
KW - multimodal large language models
UR - https://www.scopus.com/pages/publications/105033154269
U2 - 10.1109/SMC58881.2025.11343663
DO - 10.1109/SMC58881.2025.11343663
M3 - Conference contribution
AN - SCOPUS:105033154269
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 1946
EP - 1952
BT - 2025 IEEE International Conference on Systems, Man, and Cybernetics
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 October 2025 through 8 October 2025
ER -