Efficiently clustering probabilistic data streams

Chen Zhang*, Cheqing Jin, Aoying Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

Data mining on uncertain data streams has attracted considerable attention in recent years because imprecise data is generated by a wide variety of streaming applications. The main challenge in mining uncertain data streams stems from the strict space and time requirements of processing tuples that arrive at high speed. As new tuples arrive, the number of possible world instances grows exponentially with the volume of the data stream. Clustering, one of the most important mining tasks, has been studied intensively on deterministic data streams, whereas work on uncertain data streams remains rare. This paper proposes a novel solution for clustering uncertain data streams in the point probability model, where the existence of each tuple is uncertain. Detailed analysis and thorough experiments on both synthetic and real data sets illustrate the advantages of the new method in terms of effectiveness and efficiency.
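
The exponential growth of possible worlds mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that tuples exist independently of one another, and the names `possible_worlds` and `stream` and the sample probabilities are hypothetical. Under that assumption, n uncertain tuples yield 2^n possible worlds, which is why enumerating them is infeasible on a stream.

```python
from itertools import product


def possible_worlds(tuples):
    """Yield (world, probability) for every subset of the uncertain tuples.

    `tuples` is a list of (value, existence_probability) pairs.
    Assumption (not from the paper): tuples exist independently.
    """
    for mask in product([False, True], repeat=len(tuples)):
        prob = 1.0
        world = []
        for present, (value, p) in zip(mask, tuples):
            if present:
                world.append(value)
                prob *= p          # tuple exists in this world
            else:
                prob *= 1.0 - p    # tuple is absent from this world
        yield world, prob


# Hypothetical three-tuple stream: (value, existence probability).
stream = [(1.0, 0.9), (2.5, 0.5), (4.0, 0.2)]

worlds = list(possible_worlds(stream))
num_worlds = len(worlds)                              # 2 ** len(stream)
total_probability = sum(p for _, p in worlds)         # probabilities sum to 1
expected_size = sum(len(w) * p for w, p in worlds)    # expected number of existing tuples
```

Each additional tuple doubles `num_worlds`, so a practical method has to summarize the stream without materializing the worlds.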

Original language: English
Title of host publication: Advances in Data and Web Management - Joint International Conferences, APWeb/WAIM 2009, Proceedings
Publisher: Springer Verlag
Pages: 273-284
Number of pages: 12
ISBN (Print): 9783642006715
State: Published - 2009
Event: Joint International Conference on Advances in Data and Web Management, APWeb/WAIM 2009 - Suzhou, China
Duration: 2 Apr 2009 → 4 Apr 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5446
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Joint International Conference on Advances in Data and Web Management, APWeb/WAIM 2009
Country/Territory: China
City: Suzhou
Period: 2/04/09 → 4/04/09

Keywords

  • Clustering
  • Uncertain data stream
