TY - JOUR
T1 - A parallel and incremental approach for data-intensive learning of Bayesian networks
AU - Yue, Kun
AU - Fang, Qiyu
AU - Wang, Xiaoling
AU - Li, Jin
AU - Liu, Weiyi
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2015/12
Y1 - 2015/12
AB - The Bayesian network (BN) has been adopted as an underlying model for representing and inferring uncertain knowledge. As the basis of realistic applications centered on probabilistic inference, learning a BN from data is a critical subject in machine learning, artificial intelligence, and big data paradigms. The classical methods for learning BNs now need to be extended to data-intensive computing or cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring and search algorithm and using MapReduce. First, we adopt the minimum description length as the scoring metric and give two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give the corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as a strategy for storing a BN by pairs. Further, in view of the dynamic characteristics of the changing data, we introduce the concept of influence degree to measure how well the current BN coincides with new data, and then propose the corresponding two-pass MapReduce-based algorithms for incremental learning of BNs. Experimental results show the efficiency, scalability, and effectiveness of our methods.
KW - Bayesian network learning
KW - MapReduce
KW - data-intensive computing
KW - incremental learning
KW - parallel algorithm
KW - uncertain knowledge
UR - https://www.scopus.com/pages/publications/84960411888
U2 - 10.1109/TCYB.2015.2388791
DO - 10.1109/TCYB.2015.2388791
M3 - Article
AN - SCOPUS:84960411888
SN - 2168-2267
VL - 45
SP - 2890
EP - 2904
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 12
M1 - 7018001
ER -