TY - GEN
T1 - A refined TF-IDF algorithm based on channel distribution information for web news feature extraction
AU - Xu, Mingmin
AU - He, Liang
AU - Lin, Xin
PY - 2010
Y1 - 2010
N2 - The TF-IDF algorithm is widely used in text feature extraction, where the IDF value indicates the importance of a term. When applied to web news processing, however, the traditional IDF does not work well, especially in a collection divided into channels. To address this problem, a refined IDF scheme named Channel Distribution Information (CDI) IDF is proposed, which exploits the distribution of a term's IDF values across the individual channel collections. Based on these statistical features, the top terms and the meaningless terms can be identified. Experiments on a manually labeled test set indicate that, compared with the traditional TF-IDF, CDI TF-IDF increases recall, precision, and the F0.5 measure by 2.71%, 3.07%, and 3.00%, respectively.
AB - The TF-IDF algorithm is widely used in text feature extraction, where the IDF value indicates the importance of a term. When applied to web news processing, however, the traditional IDF does not work well, especially in a collection divided into channels. To address this problem, a refined IDF scheme named Channel Distribution Information (CDI) IDF is proposed, which exploits the distribution of a term's IDF values across the individual channel collections. Based on these statistical features, the top terms and the meaningless terms can be identified. Experiments on a manually labeled test set indicate that, compared with the traditional TF-IDF, CDI TF-IDF increases recall, precision, and the F0.5 measure by 2.71%, 3.07%, and 3.00%, respectively.
KW - Channel distribution information
KW - Feature extraction
KW - TF-IDF
UR - https://www.scopus.com/pages/publications/77953055904
U2 - 10.1109/ETCS.2010.130
DO - 10.1109/ETCS.2010.130
M3 - Conference contribution
AN - SCOPUS:77953055904
SN - 9780769539874
T3 - 2nd International Workshop on Education Technology and Computer Science, ETCS 2010
SP - 15
EP - 19
BT - 2nd International Workshop on Education Technology and Computer Science, ETCS 2010
T2 - 2nd International Workshop on Education Technology and Computer Science, ETCS 2010
Y2 - 6 March 2010 through 7 March 2010
ER -