A Bayesian hierarchical model for comparing average F1 scores

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

25 Scopus citations

Abstract

In multi-class text classification, the performance (effectiveness) of a classifier is usually measured by micro-averaged and macro-averaged F1 scores. However, the scores themselves do not tell us how reliably they forecast the classifier's future performance on unseen data. In this paper, we propose a novel approach to explicitly modelling the uncertainty of average F1 scores through Bayesian reasoning, and demonstrate that it can provide a much more comprehensive performance comparison between text classifiers than traditional frequentist null hypothesis significance testing (NHST).
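To illustrate the general idea of quantifying the uncertainty of an average F1 score with Bayesian reasoning, the sketch below places a Dirichlet posterior over the cells of a confusion matrix and propagates posterior samples through the macro-F1 computation via Monte Carlo. This is a deliberately simplified sketch, not the hierarchical model proposed in the paper; the function names, the flat Dirichlet prior, and the example confusion matrix are all illustrative assumptions.

```python
import numpy as np

def macro_f1(cm):
    """Macro-averaged F1 from a confusion matrix cm[i, j]:
    counts of items with true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return f1.mean()

def posterior_macro_f1(cm, n_samples=5000, prior=1.0, seed=0):
    """Draw macro-F1 samples from a Dirichlet posterior over the
    joint (true class, predicted class) cell probabilities.
    `prior` is a symmetric Dirichlet concentration (assumption)."""
    rng = np.random.default_rng(seed)
    k = cm.shape[0]
    alpha = cm.flatten().astype(float) + prior  # posterior concentration
    samples = np.empty(n_samples)
    for s in range(n_samples):
        p = rng.dirichlet(alpha).reshape(k, k)
        # macro_f1 depends only on ratios, so cell probabilities
        # can be passed in place of raw counts
        samples[s] = macro_f1(p)
    return samples

# Hypothetical 3-class confusion matrix for demonstration
cm = np.array([[80, 10, 10],
               [ 5, 70,  5],
               [ 8,  2, 60]])
draws = posterior_macro_f1(cm)
lo, hi = np.quantile(draws, [0.025, 0.975])
print(f"point estimate: {macro_f1(cm):.3f}")
print(f"posterior mean: {draws.mean():.3f}, 95% CI: [{lo:.3f}, {hi:.3f}]")
```

The width of the resulting credible interval conveys exactly the kind of reliability information that a single point-valued F1 score hides; comparing two classifiers then amounts to comparing posterior distributions rather than running an NHST.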

Original language: English
Title of host publication: Proceedings - 15th IEEE International Conference on Data Mining, ICDM 2015
Editors: Charu Aggarwal, Zhi-Hua Zhou, Alexander Tuzhilin, Hui Xiong, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 589-598
Number of pages: 10
ISBN (Electronic): 9781467395038
DOIs
State: Published - 5 Jan 2016
Event: 15th IEEE International Conference on Data Mining, ICDM 2015 - Atlantic City, United States
Duration: 14 Nov 2015 - 17 Nov 2015

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2016-January
ISSN (Print): 1550-4786

Conference

Conference: 15th IEEE International Conference on Data Mining, ICDM 2015
Country/Territory: United States
City: Atlantic City
Period: 14/11/15 - 17/11/15

Keywords

  • Bayesian inference
  • Hypothesis testing
  • Model comparison
  • Performance evaluation
  • Text classification
