TY - GEN
T1 - SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis
T2 - 28th International Conference on Computational Linguistics, COLING 2020
AU - Zhou, Jie
AU - Tian, Junfeng
AU - Wang, Rui
AU - Wu, Yuanbin
AU - Xiao, Wenming
AU - He, Liang
N1 - Publisher Copyright:
© 2020 COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Pre-trained language models have been widely applied to cross-domain NLP tasks such as sentiment analysis, achieving state-of-the-art performance. However, due to the variety of users’ emotional expressions across domains, fine-tuning the pre-trained models on the source domain tends to overfit, leading to inferior results on the target domain. In this paper, we pre-train a sentiment-aware language model (SentiX) via domain-invariant sentiment knowledge from large-scale review datasets and utilize it for cross-domain sentiment analysis tasks without fine-tuning. We propose several pre-training tasks based on existing lexicons and annotations at both the token and sentence levels, such as emoticons, sentiment words, and ratings, without human intervention. A series of experiments demonstrates the advantages of our model: we obtain new state-of-the-art results on all cross-domain sentiment analysis tasks, and SentiX trained with only 1% of the samples (18 samples) achieves better performance than BERT trained with 90% of the samples. Code is available at https://github.com/12190143/SentiX.
AB - Pre-trained language models have been widely applied to cross-domain NLP tasks such as sentiment analysis, achieving state-of-the-art performance. However, due to the variety of users’ emotional expressions across domains, fine-tuning the pre-trained models on the source domain tends to overfit, leading to inferior results on the target domain. In this paper, we pre-train a sentiment-aware language model (SentiX) via domain-invariant sentiment knowledge from large-scale review datasets and utilize it for cross-domain sentiment analysis tasks without fine-tuning. We propose several pre-training tasks based on existing lexicons and annotations at both the token and sentence levels, such as emoticons, sentiment words, and ratings, without human intervention. A series of experiments demonstrates the advantages of our model: we obtain new state-of-the-art results on all cross-domain sentiment analysis tasks, and SentiX trained with only 1% of the samples (18 samples) achieves better performance than BERT trained with 90% of the samples. Code is available at https://github.com/12190143/SentiX.
UR - https://www.scopus.com/pages/publications/85147604780
M3 - Conference contribution
AN - SCOPUS:85147604780
T3 - COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
SP - 568
EP - 579
BT - COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
A2 - Scott, Donia
A2 - Bel, Nuria
A2 - Zong, Chengqing
PB - Association for Computational Linguistics (ACL)
Y2 - 8 December 2020 through 13 December 2020
ER -