Contrastive author-aware text clustering

  • Xudong Tang
  • Chao Dong
  • Wei Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

In the era of User Generated Content (UGC), texts commonly carry author identifiers (IDs), and authors play a key role in determining the topic categories of texts. Existing text clustering methods rely mainly on textual information, and the effect of authors on text clustering remains largely underexplored. To address this gap, we propose a novel Contrastive Author-aware Text clustering approach, dubbed CAT. CAT injects author information not only into the representations that characterize texts but also into contrastive learning, a technique rarely adopted in text clustering, which pulls together or pushes apart the text representations of different authors. Specifically, the developed contrastive learning method conducts both cluster-instance contrast through text representation augmentation and instance-instance contrast through multi-view representations. We perform comprehensive experiments on three public datasets, demonstrating that CAT largely outperforms strong competitive text clustering baselines and validating the effectiveness of CAT's main components.

Original language: English
Article number: 108787
Journal: Pattern Recognition
Volume: 130
State: Published - Oct 2022

Keywords

  • Contrastive learning
  • Representation learning
  • Text clustering
