TY - JOUR
T1 - Adversarial Conservative Alternating Q-Learning for Credit Card Debt Collection
AU - Liu, Wenhui
AU - Zhu, Jiapeng
AU - Ni, Lyu
AU - Bi, Jingyu
AU - Wu, Zhijian
AU - Long, Jiajie
AU - Gao, Mengyao
AU - Huang, Dingjiang
AU - Zhou, Shuigeng
N1 - Publisher Copyright: © 2025 IEEE. All rights reserved.
PY - 2025
Y1 - 2025
AB - Debt collection is used for risk control after credit card delinquency. Existing rule-based methods tend to be myopic and non-adaptive because of delayed feedback. Reinforcement learning (RL) has an inherent advantage in dealing with such tasks and can learn policies end-to-end. However, employing RL here remains difficult because the interaction process differs from that of standard RL and because of the notorious problem of optimistic estimation in the offline setting. To tackle these challenges, we first propose an Alternating Q-Learning (AQL) framework to adapt debt collection processes to comparable procedures in RL. Based on AQL, we further develop Adversarial Conservative Alternating Q-Learning (ACAQL) to address the issue of overoptimistic estimation. Specifically, adversarial conservative value regularization is proposed to balance optimism and conservatism on Q-values of out-of-distribution actions. Furthermore, ACAQL uses counterfactual action stitching to mitigate overestimation by enhancing the behavior data. Finally, we evaluate ACAQL on a real-world dataset from Bank of Shanghai. Offline experimental results show that our approach outperforms state-of-the-art methods and effectively alleviates the optimistic estimation issue. Moreover, we conduct online A/B tests at the bank, and ACAQL achieves at least a 6% improvement in the debt recovery rate, which yields tangible economic benefits.
KW - Credit card
KW - adversarial conservatism
KW - alternating Q-learning
KW - counterfactual action stitching
KW - debt collection
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/86000436877
U2 - 10.1109/TKDE.2025.3528219
DO - 10.1109/TKDE.2025.3528219
M3 - Article
AN - SCOPUS:86000436877
SN - 1041-4347
VL - 37
SP - 1542
EP - 1555
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
ER -