TY - GEN
T1 - FIPSER
T2 - 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
AU - Chen, Junwei
AU - Zhang, Yueling
AU - Zhang, Lingfeng
AU - Zhang, Min
AU - Wan, Chengcheng
AU - Su, Ting
AU - Pu, Geguang
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s).
PY - 2024/10/27
Y1 - 2024/10/27
N2 - As a rapidly evolving AI technology, deep neural networks are becoming increasingly integrated into human society, which raises concerns about fairness. Previous studies have proposed a metric called causal fairness to measure the fairness of machine learning models, along with search algorithms that mine individual discrimination instance pairs (IDIPs). Fairness issues can be alleviated by retraining models with corrected IDIPs. However, in pursuit of efficiency, these methods often use only a limited number of samples as seeds. In addition, the number of IDIPs generated varies from seed to seed, so selecting appropriate samples as seeds matters, yet past studies have not sufficiently considered it. In this paper, we study the imbalance in IDIP quantities across various datasets and sensitive attributes, highlighting the need to select and rank seed samples. We then propose FIPSER, a seed prioritization method based on feature importance and perturbation potential. Our experimental results show that, on average, when applied to the current state-of-the-art IDIP mining method, FIPSER can improve its effectiveness by 45% and its efficiency by 11%.
KW - fairness testing
KW - feature importance
KW - imbalance
KW - seed prioritization
UR - https://www.scopus.com/pages/publications/85212422039
U2 - 10.1145/3691620.3695486
DO - 10.1145/3691620.3695486
M3 - Conference contribution
AN - SCOPUS:85212422039
T3 - Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
SP - 1069
EP - 1081
BT - Proceedings - 2024 39th ACM/IEEE International Conference on Automated Software Engineering, ASE 2024
PB - Association for Computing Machinery, Inc
Y2 - 28 October 2024 through 1 November 2024
ER -