TY - GEN
T1 - Distributed fuzzy rough set for big data analysis in cloud computing
AU - Qu, Wenhao
AU - Kong, Linghe
AU - Wu, Kaishun
AU - Tang, Feilong
AU - Chen, Guihai
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Fuzzy rough set based feature selection is a widely adopted technique for big data analysis. However, its high accuracy depends on all the data correlations, so it has always run in a centralized computing mode. As data volume grows, a centralized server can no longer afford fuzzy rough set computation, particularly in terms of computation capability and memory space. To enable fuzzy rough sets for big data analysis, we propose Distributed Fuzzy Rough Set (DFRS) based feature selection in cloud computing, which separates the tasks and assigns them to multiple nodes for parallel computing. The key challenge is to maintain global information on each distributed node without storing the entire fuzzy relation matrix. We tackle this challenge with a dynamic data decomposition algorithm and a data summarization process on each distributed node. Extensive experiments on multiple real datasets demonstrate that DFRS significantly improves runtime while its feature selection accuracy is nearly the same as that of traditional centralized computing.
KW - Big data
KW - Distributed feature selection
KW - Dynamic data decomposition
KW - Fuzzy rough sets
UR - https://www.scopus.com/pages/publications/85078919438
U2 - 10.1109/ICPADS47876.2019.00023
DO - 10.1109/ICPADS47876.2019.00023
M3 - Conference contribution
AN - SCOPUS:85078919438
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 109
EP - 116
BT - Proceedings - 2019 IEEE 25th International Conference on Parallel and Distributed Systems, ICPADS 2019
PB - IEEE Computer Society
T2 - 25th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2019
Y2 - 4 December 2019 through 6 December 2019
ER -