Dynamic clustering based contextual combinatorial multi-armed bandit for online recommendation

  • Cairong Yan
  • Haixia Han*
  • Yanting Zhang
  • Dandan Zhu
  • Yongquan Wan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Recommender systems still face a trade-off between exploring new items to maximize user satisfaction and exploiting items already interacted with to match user interests. This problem is widely recognized as the exploration/exploitation (EE) dilemma, and the multi-armed bandit (MAB) algorithm has proven to be an effective solution. As the scale of users and items in real-world application scenarios increases, their purchase interactions become sparser, so three issues need to be investigated when building MAB-based recommender systems. First, large-scale users and sparse interactions increase the difficulty of user preference mining. Second, traditional bandits model items as arms and cannot deal with ever-growing item sets effectively. Third, widely used Bernoulli-based reward mechanisms feed back only 0 or 1, ignoring rich implicit feedback such as clicks and add-to-cart actions. To address these problems, we propose an algorithm named Dynamic Clustering based Contextual Combinatorial Multi-Armed Bandits (DC3MAB), which consists of three configurable key components. Specifically, a dynamic user clustering strategy enables different users in the same cluster to cooperate in estimating the expected rewards of arms. A dynamic item partitioning approach based on collaborative filtering significantly reduces the number of arms and produces a recommendation list instead of a single item, providing diversity. In addition, a multi-class reward mechanism based on fine-grained implicit feedback helps better capture user preferences. Extensive empirical experiments on three real-world datasets demonstrate the superiority of our proposed DC3MAB over state-of-the-art bandits (on average, +75.8% in F1 and +54.3% in cumulative reward). The source code is available at https://github.com/HaixHan/DC3MAB.
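The three components described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation (see the linked repository for that): the graded reward values, the LinUCB-style cluster estimator, and all names below (`REWARDS`, `ClusterBandit`) are illustrative assumptions. The sketch shows how users sharing a cluster can pool sparse interactions into one estimator, how arms can represent item partitions scored as a top-k list, and how multi-class implicit feedback replaces a 0/1 Bernoulli reward.

```python
import numpy as np

# Assumed graded rewards for fine-grained implicit feedback
# (a stand-in for the paper's multi-class reward mechanism).
REWARDS = {"none": 0.0, "click": 0.3, "add_to_cart": 0.6, "purchase": 1.0}

class ClusterBandit:
    """Users in one cluster share a single ridge-regression estimator,
    so sparse per-user interactions pool into a cluster-level model
    (a LinUCB-style sketch, not the exact DC3MAB update)."""

    def __init__(self, d, alpha=1.0):
        self.alpha = alpha      # exploration strength
        self.A = np.eye(d)      # shared design matrix (ridge prior)
        self.b = np.zeros(d)    # shared feature-weighted reward vector

    def select(self, arm_features, k=1):
        """Score each arm (an item partition) by UCB and return the
        indices of the top-k arms as a recommendation list."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in arm_features]
        return list(np.argsort(scores)[::-1][:k])

    def update(self, x, feedback):
        """Update the shared cluster model with a graded reward."""
        r = REWARDS[feedback]
        self.A += np.outer(x, x)
        self.b += r * x

rng = np.random.default_rng(0)
bandit = ClusterBandit(d=4)
arms = rng.normal(size=(5, 4))          # contextual features of 5 arms
chosen = bandit.select(arms, k=2)       # a list of items, not one item
bandit.update(arms[chosen[0]], "add_to_cart")
```

The key design point the sketch mirrors is that the reward signal is ordinal rather than binary, so an add-to-cart event moves the cluster-level estimate more than a click but less than a purchase.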

Original language: English
Article number: 109927
Journal: Knowledge-Based Systems
Volume: 257
DOIs
State: Published - 5 Dec 2022
Externally published: Yes

Keywords

  • Contextual multi-armed bandit
  • Dynamic clustering
  • Implicit feedback
  • Online recommendation
