Adversarial Conservative Alternating Q-Learning for Credit Card Debt Collection

  • Wenhui Liu
  • Jiapeng Zhu
  • Lyu Ni
  • Jingyu Bi
  • Zhijian Wu
  • Jiajie Long
  • Mengyao Gao
  • Dingjiang Huang*
  • Shuigeng Zhou
  • *Corresponding author for this work
  • East China Normal University
  • Bank of Shanghai
  • Fudan University

Research output: Journal article, peer-reviewed

Abstract

Debt collection is used for risk control after credit card delinquency. Existing rule-based methods tend to be myopic and non-adaptive because feedback is delayed. Reinforcement learning (RL) has an inherent advantage in such tasks and can learn policies end-to-end. However, applying RL here remains difficult because the interaction process differs from that of standard RL and because of the notorious problem of optimistic value estimates in the offline setting. To tackle these challenges, we first propose an Alternating Q-Learning (AQL) framework that adapts the debt collection process to a comparable RL procedure. Based on AQL, we further develop Adversarial Conservative Alternating Q-Learning (ACAQL) to address overoptimistic estimation. Specifically, adversarial conservative value regularization is proposed to balance optimism and conservatism in the Q-values of out-of-distribution actions. Furthermore, ACAQL uses counterfactual action stitching to mitigate overestimation by enhancing the behavior data. Finally, we evaluate ACAQL on a real-world dataset built from Bank of Shanghai data. Offline experimental results show that our approach outperforms state-of-the-art methods and effectively alleviates the optimistic estimation issue. Moreover, we conduct online A/B tests at the bank, where ACAQL improves the debt recovery rate by at least 6%, yielding tangible economic benefits.

Original language: English
Pages (from-to): 1542-1555
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 37
Issue: 4
DOI
Publication status: Published - 2025
