TY - JOUR
T1 - Improving Encarta search engine performance by mining user logs
AU - Ling, Charles X.
AU - Gao, Jianfeng
AU - Zhang, Huajie
AU - Qian, Weining
AU - Zhang, Hongjiang
PY - 2002/12
Y1 - 2002/12
N2 - We propose a data-mining approach that produces generalized query patterns (with generalized keywords) from the raw user logs of the Microsoft Encarta search engine (http://encarta.msn.com). Those query patterns can act as a cache for the search engine, improving its performance. A cache of generalized query patterns is more advantageous than a cache of the most frequent user queries because our patterns are generalized, covering more queries and future queries, even those not previously asked. Our method is unique in that the discovered query patterns reflect the actual dynamic usage and user feedback of the search engine, rather than the syntactic linkage structure of web pages (as Google does). Simulation shows that such generalized query patterns improve the search engine's overall speed considerably. The generalized query patterns, when viewed with a graphical user interface, are also helpful to web editors, who can easily discover the topics in which users are most interested.
AB - We propose a data-mining approach that produces generalized query patterns (with generalized keywords) from the raw user logs of the Microsoft Encarta search engine (http://encarta.msn.com). Those query patterns can act as a cache for the search engine, improving its performance. A cache of generalized query patterns is more advantageous than a cache of the most frequent user queries because our patterns are generalized, covering more queries and future queries, even those not previously asked. Our method is unique in that the discovered query patterns reflect the actual dynamic usage and user feedback of the search engine, rather than the syntactic linkage structure of web pages (as Google does). Simulation shows that such generalized query patterns improve the search engine's overall speed considerably. The generalized query patterns, when viewed with a graphical user interface, are also helpful to web editors, who can easily discover the topics in which users are most interested.
KW - Data mining on the Internet
KW - Search engine improvement
KW - Web log mining
KW - Web mining
UR - https://www.scopus.com/pages/publications/0036976914
U2 - 10.1142/S0218001402002179
DO - 10.1142/S0218001402002179
M3 - Article
AN - SCOPUS:0036976914
SN - 0218-0014
VL - 16
SP - 1101
EP - 1116
JO - International Journal of Pattern Recognition and Artificial Intelligence
JF - International Journal of Pattern Recognition and Artificial Intelligence
IS - 8
ER -