Abstract
The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of that objective function which provides a theoretical approximation guarantee of factor 1−1/e for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.
| Original language | English |
|---|---|
| Pages (from-to) | 477-487 |
| Number of pages | 11 |
| Journal | Frontiers of Computer Science |
| Volume | 10 |
| Issue number | 3 |
| DOIs | |
| State | Published - 1 Jun 2016 |
Keywords
- diversification
- submodular function maximization
- top-k retrieval
Fingerprint
Dive into the research topics of 'Optimizing top-k retrieval: submodularity analysis and search strategies'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver