Pólya urn model and its application to text categorization

  • Haibin Zhang
  • Xianyi Wu*
  • Xueqin Zhou
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The Pólya urn model is a basic model widely applied in statistics and text mining. Most algorithms for training the model are slow and complicated, so it is generally difficult to fit a Pólya urn model to big data sets. This paper proposes a new minorization-maximization (MM) algorithm for the maximum likelihood estimation (MLE) of the Pólya urn model, in which the surrogate function is constructed by means of a simple convex function. The convergence of the MM algorithm is analyzed, and the asymptotic normality of the corresponding MLE for non-identically distributed observations is derived. The performance of the new MM algorithm is compared with Newton's method and other MM algorithms. The Pólya urn model is then applied to text categorization, and its superiority to the naive Bayes (NB) classifier, k-nearest neighbor (k-NN), and support vector machine (SVM) is demonstrated on a real newsgroup dataset.
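
For readers unfamiliar with MM-type fitting of this model, the following is a minimal sketch and not the algorithm proposed in the paper, whose surrogate function is not given in this abstract. It assumes the Pólya urn model is parameterized as a Dirichlet-multinomial with parameter vector `alpha`, and it uses the well-known fixed-point (bound-maximization) update usually attributed to Minka as a stand-in. The function names `polya_loglik`, `fixed_point_step`, and `fit_polya` are hypothetical, as is the toy data in the usage note.

```python
import math


def polya_loglik(counts, alpha):
    """Log-likelihood of the Dirichlet-multinomial (Polya) model, omitting
    the multinomial coefficient, which does not depend on alpha.

    For integer counts, lgamma(a + x) - lgamma(a) = sum_{j<x} log(a + j),
    so only the standard library is needed.
    """
    total = sum(alpha)
    ll = 0.0
    for row in counts:
        for x, a in zip(row, alpha):
            ll += sum(math.log(a + j) for j in range(x))
        ll -= sum(math.log(total + j) for j in range(sum(row)))
    return ll


def fixed_point_step(counts, alpha):
    """One bound-maximization update (assumed stand-in, not the paper's MM step).

    Uses digamma(a + x) - digamma(a) = sum_{j<x} 1 / (a + j) for integer x.
    """
    total = sum(alpha)
    denom = sum(1.0 / (total + j) for row in counts for j in range(sum(row)))
    new_alpha = []
    for k, a in enumerate(alpha):
        numer = sum(1.0 / (a + j) for row in counts for j in range(row[k]))
        new_alpha.append(a * numer / denom)
    return new_alpha


def fit_polya(counts, n_iter=200):
    """Iterate the update from alpha = (1, ..., 1); return alpha and the
    log-likelihood after each iteration."""
    alpha = [1.0] * len(counts[0])
    history = [polya_loglik(counts, alpha)]
    for _ in range(n_iter):
        alpha = fixed_point_step(counts, alpha)
        history.append(polya_loglik(counts, alpha))
    return alpha, history
```

As a usage example, `fit_polya([[3, 1, 0], [2, 2, 1], [0, 1, 4], [5, 0, 0]], n_iter=50)` returns a parameter estimate for three categories together with a log-likelihood trace that should never decrease, which is the defining property of an MM iteration. In a text categorization setting, each row would plausibly be the word-count vector of one document, with one parameter vector fitted per class.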

Original language: English
Pages (from-to): 227-237
Number of pages: 11
Journal: Statistics and its Interface
Volume: 12
Issue number: 2
State: Published - 2019

Keywords

  • Asymptotic properties
  • Minorization-maximization
  • Pólya urn model
  • Text categorization
