Model-based clustering of short text streams

  • Jianhua Yin
  • Wei Zhang*
  • Daren Chao
  • Xiaohui Yu
  • Zhongkun Liu
  • Jianyong Wang

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

68 Scopus citations

Abstract

Short text stream clustering has become an increasingly important problem due to the explosive growth of short text across diverse social media. In this paper, we propose a model-based short text stream clustering algorithm (MStream) that deals naturally with both the concept drift problem and the sparsity problem. MStream achieves state-of-the-art performance with only one pass over the stream, and performs even better when multiple iterations over each batch are allowed. We further propose an improved version of MStream with forgetting rules, called MStreamF, which efficiently deletes outdated documents by deleting the clusters of outdated batches. Our extensive experimental study shows that MStream and MStreamF achieve better performance than three baselines on several real datasets.

Original language: English
Title of host publication: KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 2634-2642
Number of pages: 9
ISBN (Print): 9781450355520
State: Published - 19 Jul 2018
Event: 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 - London, United Kingdom
Duration: 19 Aug 2018 to 23 Aug 2018

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
Country/Territory: United Kingdom
City: London
Period: 19/08/18 to 23/08/18

Keywords

  • Dirichlet Process
  • Mixture Model
  • Text Stream Clustering
