TY - JOUR
T1 - Enhancing missing data imputation through combined bipartite graph and complete directed graph
AU - Zhang, Zhaoyang
AU - Zhu, Hongtu
AU - Zhang, Yingjie
AU - Shu, Hai
AU - Chen, Ziqi
N1 - Publisher Copyright: © 2025 Elsevier B.V.
PY - 2025/10/7
Y1 - 2025/10/7
AB - In this paper, we address a central challenge in tabular missing-data imputation: explicitly identifying and exploiting interdependencies among features to improve reconstruction quality. Most current state-of-the-art methods instead model similarity or interdependence between samples. However, our experiments on real-world tabular datasets show that, when samples are truly independent, building such observation-level graphs yields only marginal, dataset-specific performance gains rather than consistent, generalizable benefits. We therefore introduce the Bipartite and Complete Directed Graph Neural Network (BCGNN). In BCGNN, observations and features are treated as two distinct node types, and each observed cell value is converted into an attributed edge connecting its observation node to its feature node. The bipartite component inductively learns node embeddings by fully leveraging the information encoded in these attributed edges, while the complete directed graph component explicitly describes and propagates intricate feature–feature dependencies. The combined graph provides a robust inductive framework for representation learning while explicitly parameterizing higher-order dependencies among features. Across diverse missingness mechanisms, BCGNN outperforms leading imputation baselines, achieving an average 15% reduction in mean absolute error. Extensive experiments confirm that explicitly modeling feature interdependence markedly improves embedding quality. BCGNN also delivers superior performance on downstream label-prediction tasks with missing inputs and generalizes robustly to unseen data.
KW - Bipartite graph
KW - Complete directed graph
KW - Graph neural network
KW - Interdependence
KW - Missing data imputation
UR - https://www.scopus.com/pages/publications/105008701332
U2 - 10.1016/j.neucom.2025.130717
DO - 10.1016/j.neucom.2025.130717
M3 - Article
AN - SCOPUS:105008701332
SN - 0925-2312
VL - 649
JO - Neurocomputing
JF - Neurocomputing
M1 - 130717
ER -