TY - GEN
T1 - A multiple feature integration model to infer occupation from social media records
AU - Wang, Xiang
AU - Yu, Lele
AU - Yao, Junjie
AU - Cui, Bin
PY - 2013
Y1 - 2013
AB - With the rapid development of social media applications, many users are connected with friends, and their daily lives and opinions are recorded. Social media provides an unprecedented way to collect and analyze information about billions of users. Accurate user attribute identification, or profile inference, is becoming increasingly attractive and feasible. However, the flourishing social records also pose great challenges for effective feature selection and integration in user profile inference. These challenges arise mainly from text sparsity and complex community structures. In this paper, we propose a comprehensive framework to infer a user's occupation from his or her social activities recorded in micro-blog message streams. A multi-source integrated classification model is built on a set of carefully selected features. We first identify beneficial basic content features, and then tailor a community-discovery-based latent dimension solution to extract community features. Extensive empirical studies are conducted on a large, real micro-blog dataset. We not only demonstrate that the integrated model shows advantages over several baseline methods, but also verify the effect of homophily in users' interaction records. The differing effects of heterogeneous interactive networks are also revealed.
KW - Feature Selection
KW - Heterogeneous Network
KW - Micro-blog
KW - Occupation Inference
KW - User Profile Modeling
UR - https://www.scopus.com/pages/publications/84887503885
U2 - 10.1007/978-3-642-41154-0_10
DO - 10.1007/978-3-642-41154-0_10
M3 - Conference contribution
AN - SCOPUS:84887503885
SN - 9783642411533
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 137
EP - 150
BT - Web Information Systems Engineering, WISE 2013 - 14th International Conference, Proceedings
T2 - 14th International Conference on Web Information Systems Engineering, WISE 2013
Y2 - 13 October 2013 through 15 October 2013
ER -