TY - JOUR
T1 - A Trustworthy Counterfactual Explanation Method with Latent Space Smoothing
AU - Li, Yan
AU - Cai, Xia
AU - Wu, Chunwei
AU - Lin, Xiao
AU - Cao, Guitao
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Despite the large-scale adoption of Artificial Intelligence (AI) models in healthcare, there is an urgent need for trustworthy tools that rigorously backtrack model decisions so that the models behave reliably. Counterfactual explanations take a counter-intuitive approach that allows users to explore 'what if' scenarios, and they are gradually becoming popular in the trustworthy-AI field. However, most previous work on counterfactual explanation of models cannot credibly generate in-distribution attributions, produces adversarial examples, or fails to give a confidence interval for the explanation. Hence, in this paper, we propose a novel approach that generates counterfactuals in a locally smooth, directed semantic embedding space and, at the same time, gives an uncertainty estimate for the counterfactual generation process. Specifically, we identify a low-dimensional directed semantic embedding space by applying Principal Component Analysis (PCA) to a differentiable generative model. Then, we propose a latent-space smoothing regularization that constrains the counterfactual search to remain in-distribution, so that visually imperceptible changes are more robust to adversarial perturbations. Moreover, we put forth an uncertainty estimation framework for evaluating counterfactual uncertainty. Extensive experiments on several challenging, realistic Chest X-ray and CelebA datasets show that our approach performs consistently well and outperforms several existing state-of-the-art baseline approaches.
AB - Despite the large-scale adoption of Artificial Intelligence (AI) models in healthcare, there is an urgent need for trustworthy tools that rigorously backtrack model decisions so that the models behave reliably. Counterfactual explanations take a counter-intuitive approach that allows users to explore 'what if' scenarios, and they are gradually becoming popular in the trustworthy-AI field. However, most previous work on counterfactual explanation of models cannot credibly generate in-distribution attributions, produces adversarial examples, or fails to give a confidence interval for the explanation. Hence, in this paper, we propose a novel approach that generates counterfactuals in a locally smooth, directed semantic embedding space and, at the same time, gives an uncertainty estimate for the counterfactual generation process. Specifically, we identify a low-dimensional directed semantic embedding space by applying Principal Component Analysis (PCA) to a differentiable generative model. Then, we propose a latent-space smoothing regularization that constrains the counterfactual search to remain in-distribution, so that visually imperceptible changes are more robust to adversarial perturbations. Moreover, we put forth an uncertainty estimation framework for evaluating counterfactual uncertainty. Extensive experiments on several challenging, realistic Chest X-ray and CelebA datasets show that our approach performs consistently well and outperforms several existing state-of-the-art baseline approaches.
KW - Trustworthy AI
KW - counterfactual explanation
KW - disentanglement representation
KW - image processing
KW - smoothing
UR - https://www.scopus.com/pages/publications/85201786985
U2 - 10.1109/TIP.2024.3442614
DO - 10.1109/TIP.2024.3442614
M3 - Article
C2 - 39159026
AN - SCOPUS:85201786985
SN - 1057-7149
VL - 33
SP - 4584
EP - 4599
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -