M-kernel merging: Towards density estimation over data streams

Aoying Zhou, Zhiyuan Cai, Li Wei, Weining Qian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

74 Scopus citations

Abstract

Density estimation is a costly operation for computing the distribution information of data sets, and it underlies many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously and in large volume, because they require linear storage and quadratic computation. This shortcoming limits the application of many existing effective algorithms to data streams, for which the mining problem is both urgent for applications and a challenge for research. In this paper, the problem of computing density functions over data streams is examined. A novel method that addresses this shortcoming of existing methods is developed to enable density estimation for large volumes of data in linear time and fixed-size memory, without loss of accuracy. The method is based on M-Kernel merging, so that the limited number of kernel functions to be maintained is determined intelligently. The application of the new method to different streaming data models is discussed, and the results of intensive experiments are presented. The analytical and empirical results show that this new density estimation algorithm for data streams can calculate density functions on demand at any time with high accuracy for different streaming data models.
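The fixed-memory idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the paper's merge criterion, so this sketch assumes one-dimensional Gaussian kernels, merges the two neighbouring kernels whose means are closest, and replaces them by a single moment-matched kernel. The class name `MergingKDE` and the parameters `max_kernels` and `bandwidth` are hypothetical and chosen only for illustration.

```python
import bisect
import math


class MergingKDE:
    """Illustrative 1-D Gaussian density estimate held in at most max_kernels entries.

    Assumption: the merge rule (closest adjacent means, moment matching) is a
    simple stand-in, not the criterion used in the paper.
    """

    def __init__(self, max_kernels, bandwidth):
        self.max_kernels = max_kernels
        self.bandwidth = bandwidth
        self.kernels = []  # entries are [mean, std, weight], kept sorted by mean
        self.count = 0     # number of stream items seen so far

    def add(self, x):
        # Each arriving item starts as its own kernel with unit weight.
        bisect.insort(self.kernels, [float(x), self.bandwidth, 1.0])
        self.count += 1
        if len(self.kernels) > self.max_kernels:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        # Pick the adjacent pair with the smallest gap between means.
        i = min(range(len(self.kernels) - 1),
                key=lambda j: self.kernels[j + 1][0] - self.kernels[j][0])
        (m1, s1, w1), (m2, s2, w2) = self.kernels[i], self.kernels[i + 1]
        # Moment matching: keep the pair's total weight, mean, and variance.
        w = w1 + w2
        mean = (w1 * m1 + w2 * m2) / w
        second = (w1 * (s1 * s1 + m1 * m1) + w2 * (s2 * s2 + m2 * m2)) / w
        std = math.sqrt(max(second - mean * mean, 0.0))
        self.kernels[i:i + 2] = [[mean, std, w]]

    def density(self, x):
        # Weighted mixture of the stored Gaussians, normalised by item count.
        total = 0.0
        for mean, std, weight in self.kernels:
            z = (x - mean) / std
            total += weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))
        return total / self.count
```

Under these assumptions, each call to `add` costs time proportional to `max_kernels`, so a stream of n items is processed in time linear in n with memory that does not grow, and `density` can be queried at any point in the stream.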

Original language: English
Title of host publication: Proceedings - 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 285-292
Number of pages: 8
ISBN (Electronic): 0769518958, 9780769518954
State: Published - 2003
Externally published: Yes
Event: 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003 - Kyoto, Japan
Duration: 26 Mar 2003 to 28 Mar 2003

Publication series

Name: Proceedings - 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003

Conference

Conference: 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003
Country/Territory: Japan
City: Kyoto
Period: 26/03/03 to 28/03/03

Keywords

  • Algorithm design and analysis
  • Application software
  • Computer science
  • Data engineering
  • Data mining
  • Data models
  • Density functional theory
  • Distributed computing
  • Information processing
  • Laboratories
