Adversarial Conservative Alternating Q-Learning for Credit Card Debt Collection

  • Wenhui Liu
  • Jiapeng Zhu
  • Lyu Ni
  • Jingyu Bi
  • Zhijian Wu
  • Jiajie Long
  • Mengyao Gao
  • Dingjiang Huang*
  • Shuigeng Zhou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Debt collection is used for risk control after credit card delinquency. The existing rule-based method tends to be myopic and non-adaptive due to delayed feedback. Reinforcement learning (RL) has an inherent advantage in dealing with such tasks and can learn policies end-to-end. However, employing RL here remains difficult because the interaction process differs from that of standard RL and because of the notorious problem of optimistic estimations in the offline setting. To tackle these challenges, we first propose an Alternating Q-Learning (AQL) framework to adapt debt collection processes to comparable procedures in RL. Based on AQL, we further develop Adversarial Conservative Alternating Q-Learning (ACAQL) to address the issue of overoptimistic estimations. Specifically, adversarial conservative value regularization is proposed to balance optimism and conservatism on Q-values of out-of-distribution actions. Furthermore, ACAQL uses counterfactual action stitching to mitigate overestimation by enhancing behavior data. Finally, we evaluate ACAQL on a real-world dataset created from Bank of Shanghai. Offline experimental results show that our approach outperforms state-of-the-art methods and effectively alleviates the optimistic estimation issue. Moreover, we conduct online A/B tests at the bank, and ACAQL achieves at least a 6% improvement in the debt recovery rate, which yields tangible economic benefits.
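
As background on the overestimation problem named in the abstract, here is a minimal sketch of a generic conservative Q-value penalty in the style of Conservative Q-Learning (CQL). It is not the ACAQL regularizer, which the abstract does not specify. The function names, the discrete action set, and the weight `alpha` are hypothetical and chosen only for illustration.

```python
import math


def conservative_penalty(q_values, behavior_action):
    """Generic CQL-style gap (an illustration, not the paper's regularizer).

    Returns log-sum-exp of Q over all actions minus Q of the logged action.
    The gap grows when actions absent from the data get high Q-values.
    """
    m = max(q_values)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[behavior_action]


def regularized_loss(td_error, q_values, behavior_action, alpha=1.0):
    """Squared TD error plus alpha times the penalty (alpha is hypothetical)."""
    return td_error ** 2 + alpha * conservative_penalty(q_values, behavior_action)
```

Minimizing the penalty pushes down Q-values of actions that the logged data does not support while holding up the Q-value of the logged action, which is the kind of conservatism the abstract says has to be balanced against optimism.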

Original language: English
Pages (from-to): 1542-1555
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 37
Issue number: 4
State: Published - 2025

Keywords

  • Credit card
  • adversarial conservatism
  • alternating Q-learning
  • counterfactual action stitching
  • debt collection
  • reinforcement learning
