TY - JOUR
T1 - IVKMP: A robust data-driven heterogeneous defect model based on deep representation optimization learning
AU - Zhu, Kun
AU - Ying, Shi
AU - Ding, Weiping
AU - Zhang, Nana
AU - Zhu, Dandan
N1 - Publisher Copyright: © 2021
PY - 2022/1
Y1 - 2022/1
AB - Heterogeneous defect prediction (HDP) aims to transfer informative knowledge, namely the defect-proneness tendency of software metrics, from a source project to predict potential defects in a target project by matching metrics with similar distributions across different software projects. However, the complex intrinsic structure hidden in defect data makes it difficult for prior HDP models to capture and transfer the most informative software metrics, which severely hinders HDP performance. To address these issues, we propose a robust data-driven HDP model called IVKMP. First, we adopt an advanced deep generative network, InfoGAN (Information Maximizing GAN), for data augmentation, simultaneously achieving class balance and generating sufficient defective instances. Second, the multi-objective VaEA (Vector angle-based Evolutionary Algorithm) is employed to select the smallest representative metric subsets while achieving the minimum error. Finally, we build a deep defect predictor for HDP on PCANet (Principal Component Analysis Network), a lightweight but effective deep network, combined with binary hashing and block-wise histograms, to capture more robust and semantically related representations. We compare IVKMP with multiple state-of-the-art baseline models across 542 heterogeneous project pairs drawn from 26 software projects. Experimental results demonstrate the superiority and robustness of IVKMP.
KW - Deep neural network
KW - Heterogeneous defect prediction
KW - Information maximizing GANs
KW - Principal component analysis network
KW - Multi-objective optimization
UR - https://www.scopus.com/pages/publications/85120006971
U2 - 10.1016/j.ins.2021.11.029
DO - 10.1016/j.ins.2021.11.029
M3 - Article
AN - SCOPUS:85120006971
SN - 0020-0255
VL - 583
SP - 332
EP - 363
JO - Information Sciences
JF - Information Sciences
ER -