IVKMP: A robust data-driven heterogeneous defect model based on deep representation optimization learning

  • Kun Zhu
  • Shi Ying*
  • Weiping Ding
  • Nana Zhang
  • Dandan Zhu

  * Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

Heterogeneous defect prediction (HDP) aims to transfer informative knowledge, namely the defect-proneness tendency of software metrics, from a source project to predict potential defects in a target project by matching metrics with similar distributions across the two projects. Nevertheless, the complex intrinsic structure hidden in defect data makes it difficult for prior heterogeneous defect models to capture and transfer the most informative software metrics, which severely hinders HDP performance. To address these issues, we propose a robust data-driven HDP model called IVKMP. First, we adopt an advanced deep generative network, InfoGAN (Information Maximizing GANs), for data augmentation, which simultaneously achieves class balance and generates sufficient defect instances. Second, we employ the multi-objective VaEA (Vector angle-based Evolutionary Algorithm) to select the smallest representative metric subsets while minimizing error. Finally, we build a deep defect predictor for HDP on a lightweight but effective deep network, PCANet (Principal Component Analysis Network), with binary hashing and block-wise histograms, to capture more semantically related and robust representations. We compare the IVKMP model with multiple state-of-the-art baseline models across 542 heterogeneous project pairs drawn from 26 software projects. Experimental results demonstrate the superiority and robustness of the IVKMP model.
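
For readers unfamiliar with the PCANet output stage named in the abstract, the following is a minimal sketch of generic binary hashing followed by a block-wise histogram. It is not the authors' implementation: the function names, the toy 2x4 filter responses, and the choice of non-overlapping blocks are all assumptions made for illustration, and the learned PCA filter stages that would produce these responses are omitted.

```python
from typing import List

Matrix = List[List[float]]


def binary_hash(filter_outputs: List[Matrix]) -> List[List[int]]:
    """Combine L real-valued maps into one integer map in [0, 2**L - 1].

    Each map is binarized with a step function (1 if the value is positive,
    else 0) and contributes one bit to the integer code at each position.
    """
    rows, cols = len(filter_outputs[0]), len(filter_outputs[0][0])
    hashed = [[0] * cols for _ in range(rows)]
    for bit, fmap in enumerate(filter_outputs):
        for r in range(rows):
            for c in range(cols):
                if fmap[r][c] > 0:
                    hashed[r][c] += 1 << bit
    return hashed


def blockwise_histogram(hashed: List[List[int]], num_filters: int,
                        block_rows: int, block_cols: int) -> List[int]:
    """Concatenate per-block histograms (2**num_filters bins per block).

    Blocks are non-overlapping here (an assumption); rows or columns that
    do not fill a whole block are ignored.
    """
    bins = 1 << num_filters
    rows, cols = len(hashed), len(hashed[0])
    features: List[int] = []
    for r0 in range(0, rows - block_rows + 1, block_rows):
        for c0 in range(0, cols - block_cols + 1, block_cols):
            hist = [0] * bins
            for r in range(r0, r0 + block_rows):
                for c in range(c0, c0 + block_cols):
                    hist[hashed[r][c]] += 1
            features.extend(hist)
    return features


# Hypothetical responses of two filters on a 2x4 input (toy values).
map0 = [[0.5, -1.0, 0.2, -0.3], [-0.1, 0.4, -0.6, 0.9]]
map1 = [[-0.2, 0.7, 0.1, -0.8], [0.3, 0.6, -0.4, -0.5]]

hashed_map = binary_hash([map0, map1])
feature_vector = blockwise_histogram(hashed_map, num_filters=2,
                                     block_rows=2, block_cols=2)
print(hashed_map)      # [[1, 2, 3, 0], [2, 3, 0, 1]]
print(feature_vector)  # [0, 1, 2, 1, 2, 1, 0, 1]
```

With two filters each position gets a 2-bit code, so every 2x2 block yields a 4-bin histogram, and the two blocks together give an 8-dimensional feature vector that a downstream classifier could consume.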

Original language: English
Pages (from-to): 332-363
Number of pages: 32
Journal: Information Sciences
Volume: 583
State: Published - Jan 2022
Externally published: Yes

Keywords

  • Deep neural network
  • Heterogeneous defect prediction
  • Information maximizing GANs
  • Principal component analysis network
  • Multi-objective optimization
