TY - GEN
T1 - M-kernel merging
T2 - 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003
AU - Zhou, Aoying
AU - Cai, Zhiyuan
AU - Wei, Li
AU - Qian, Weining
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - Density estimation is a costly operation for computing distribution information of data sets underlying many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously in large volumes, because they require linear storage and quadratic computation. This shortcoming limits the application of many effective existing algorithms to data streams, for which the mining problem is urgent for applications and a challenge for research. In this paper, the problem of computing density functions over data streams is examined. A novel method that addresses this shortcoming is developed to enable density estimation for large volumes of data in linear time and fixed-size memory, without loss of accuracy. The method is based on M-Kernel merging, so that the limited number of kernel functions to be maintained is determined intelligently. The application of the new method to different streaming data models is discussed, and the results of intensive experiments are presented. The analytical and empirical results show that this new density estimation algorithm for data streams can calculate density functions on demand at any time with high accuracy for different streaming data models.
AB - Density estimation is a costly operation for computing distribution information of data sets underlying many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously in large volumes, because they require linear storage and quadratic computation. This shortcoming limits the application of many effective existing algorithms to data streams, for which the mining problem is urgent for applications and a challenge for research. In this paper, the problem of computing density functions over data streams is examined. A novel method that addresses this shortcoming is developed to enable density estimation for large volumes of data in linear time and fixed-size memory, without loss of accuracy. The method is based on M-Kernel merging, so that the limited number of kernel functions to be maintained is determined intelligently. The application of the new method to different streaming data models is discussed, and the results of intensive experiments are presented. The analytical and empirical results show that this new density estimation algorithm for data streams can calculate density functions on demand at any time with high accuracy for different streaming data models.
KW - Algorithm design and analysis
KW - Application software
KW - Computer science
KW - Data engineering
KW - Data mining
KW - Data models
KW - Density functional theory
KW - Distributed computing
KW - Information processing
KW - Laboratories
UR - https://www.scopus.com/pages/publications/84943424240
U2 - 10.1109/DASFAA.2003.1192393
DO - 10.1109/DASFAA.2003.1192393
M3 - Conference contribution
AN - SCOPUS:84943424240
T3 - Proceedings - 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003
SP - 285
EP - 292
BT - Proceedings - 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 March 2003 through 28 March 2003
ER -