TY - JOUR
T1 - WSBCV: A data-driven cross-version defect model via multi-objective optimization and incremental representation learning
AU - Zhang, Nana
AU - Zhu, Kun
AU - Ding, Weiping
AU - Zhu, Dandan
N1 - Publisher Copyright: © 2024 Elsevier Inc.
PY - 2024/5
Y1 - 2024/5
N2 - Cross-version defect prediction (CVDP) trains a model on labeled defect data from a previously released project version and then predicts defects in the unlabeled modules of the current version. However, the complex intrinsic structure underlying code defects makes it difficult for existing cross-version defect models to capture discriminative software features, which seriously limits CVDP performance. In this study, we propose a data-driven CVDP model named WSBCV, based on multi-objective optimization and incremental representation learning. First, we use a deep generative adversarial network, WGAN-GP (Wasserstein GAN with Gradient Penalty), for data augmentation, both balancing the defect classes and synthesizing additional training instances. Second, we build a multi-objective feature selection method based on SPEA/R (Strength Pareto-based Evolutionary Algorithm / Reference) that searches for the smallest representative feature subset while minimizing error. Finally, we build a CVDP defect predictor based on the BLS (Broad Learning System) with incremental learning, which learns feature representations and supports fast incremental online model updates. Experimental results across 32 cross-version pairs from 45 versions show that the proposed SPEA/R, BLS, and WSBCV each achieve statistically significant advantages over ten multi-objective feature selection approaches, six defect predictors, and two CVDP models, respectively.
KW - Broad learning system
KW - Cross-version defect prediction
KW - Deep generation adversarial network
KW - Incremental learning
KW - Multi-objective optimization
UR - https://www.scopus.com/pages/publications/85190233911
U2 - 10.1016/j.ins.2024.120595
DO - 10.1016/j.ins.2024.120595
M3 - Article
AN - SCOPUS:85190233911
SN - 0020-0255
VL - 669
JO - Information Sciences
JF - Information Sciences
M1 - 120595
ER -