TY - GEN
T1 - Bayesian performance comparison of text classifiers
AU - Zhang, Dell
AU - Wang, Jun
AU - Yilmaz, Emine
AU - Wang, Xiaoling
AU - Zhou, Yuxin
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/7/7
Y1 - 2016/7/7
N2 - How can we know whether one classifier is really better than the other? In the area of text classification, since the publication of Yang and Liu's seminal SIGIR-1999 paper, it has become standard practice for researchers to apply null-hypothesis significance testing (NHST) to their experimental results in order to establish the superiority of a classifier. However, such a frequentist approach has a number of inherent deficiencies and limitations, e.g., the inability to accept the null hypothesis (that the two classifiers perform equally well), the difficulty of comparing commonly-used multivariate performance measures such as F1 scores rather than accuracy, and so on. In this paper, we propose a novel Bayesian approach to the performance comparison of text classifiers, and argue its advantages over the traditional frequentist approach based on the t-test, etc. In contrast to the existing probabilistic model for F1 scores, which is unpaired, our proposed model takes the correlation between classifiers into account and thus achieves greater statistical power. Using several typical text classification algorithms and a benchmark dataset, we demonstrate that our approach provides rich information about the difference between two classifiers' performances.
AB - How can we know whether one classifier is really better than the other? In the area of text classification, since the publication of Yang and Liu's seminal SIGIR-1999 paper, it has become standard practice for researchers to apply null-hypothesis significance testing (NHST) to their experimental results in order to establish the superiority of a classifier. However, such a frequentist approach has a number of inherent deficiencies and limitations, e.g., the inability to accept the null hypothesis (that the two classifiers perform equally well), the difficulty of comparing commonly-used multivariate performance measures such as F1 scores rather than accuracy, and so on. In this paper, we propose a novel Bayesian approach to the performance comparison of text classifiers, and argue its advantages over the traditional frequentist approach based on the t-test, etc. In contrast to the existing probabilistic model for F1 scores, which is unpaired, our proposed model takes the correlation between classifiers into account and thus achieves greater statistical power. Using several typical text classification algorithms and a benchmark dataset, we demonstrate that our approach provides rich information about the difference between two classifiers' performances.
KW - Bayesian inference
KW - Hypothesis testing
KW - Performance evaluation
KW - Text classification
UR - https://www.scopus.com/pages/publications/84980332154
U2 - 10.1145/2911451.2911547
DO - 10.1145/2911451.2911547
M3 - Conference contribution
AN - SCOPUS:84980332154
T3 - SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 15
EP - 24
BT - SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016
Y2 - 17 July 2016 through 21 July 2016
ER -