TY - GEN
T1 - LASCA
T2 - 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
AU - Gu, Yongfeng
AU - Wu, Yupeng
AU - Lu, Huakang
AU - Lu, Xingyu
AU - Qian, Hong
AU - Zhou, Jun
AU - Zhou, Aimin
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/8/24
Y1 - 2024/8/24
N2 - Customer segmentation plays a crucial role in credit risk assessment by dividing users into specific risk levels based on their credit scores. Previous methods fail to comprehensively consider the stability in the segmentation process, resulting in frequent changes and inconsistencies in users' risk levels over time. This increases potential risks to a company. To this end, this paper at first introduces and formalizes the concept of stability regret in the segmentation process. However, evaluating stability is challenging due to its black-box nature and the computational burden posed by vast user data sets. To address these challenges, this paper proposes a large-scale stable customer segmentation approach named LASCA. LASCA consists of two phases: high-quality dataset construction (HDC) and reliable data-driven optimization (RDO). Specifically, HDC utilizes an evolutionary algorithm to collect high-quality binning solutions. RDO subsequently builds a reliable surrogate model to search for the most stable binning solution based on the collected dataset. Extensive experiments conducted on real-world large-scale datasets (up to 0.8 billion) show that LASCA surpasses the state-of-the-art binning methods in finding the most stable binning solution. Notably, HDC greatly enhances data quality by 50%. RDO efficiently discovers more stable binning solutions with a 36% improvement in stability, accelerating the optimization process by 25 times via data-driven evaluation. Currently, LASCA has been successfully deployed in the large-scale credit risk assessment system of Alipay.
AB - Customer segmentation plays a crucial role in credit risk assessment by dividing users into specific risk levels based on their credit scores. Previous methods fail to comprehensively consider the stability in the segmentation process, resulting in frequent changes and inconsistencies in users' risk levels over time. This increases potential risks to a company. To this end, this paper at first introduces and formalizes the concept of stability regret in the segmentation process. However, evaluating stability is challenging due to its black-box nature and the computational burden posed by vast user data sets. To address these challenges, this paper proposes a large-scale stable customer segmentation approach named LASCA. LASCA consists of two phases: high-quality dataset construction (HDC) and reliable data-driven optimization (RDO). Specifically, HDC utilizes an evolutionary algorithm to collect high-quality binning solutions. RDO subsequently builds a reliable surrogate model to search for the most stable binning solution based on the collected dataset. Extensive experiments conducted on real-world large-scale datasets (up to 0.8 billion) show that LASCA surpasses the state-of-the-art binning methods in finding the most stable binning solution. Notably, HDC greatly enhances data quality by 50%. RDO efficiently discovers more stable binning solutions with a 36% improvement in stability, accelerating the optimization process by 25 times via data-driven evaluation. Currently, LASCA has been successfully deployed in the large-scale credit risk assessment system of Alipay.
KW - credit risk assessment
KW - data-driven optimization
KW - large-scale customer segmentation
KW - reliable surrogate model
KW - stability
UR - https://www.scopus.com/pages/publications/85203694515
U2 - 10.1145/3637528.3671550
DO - 10.1145/3637528.3671550
M3 - 会议稿件
AN - SCOPUS:85203694515
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 5006
EP - 5017
BT - KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 25 August 2024 through 29 August 2024
ER -