LASCA: A Large-Scale Stable Customer Segmentation Approach to Credit Risk Assessment

Yongfeng Gu, Yupeng Wu, Huakang Lu, Xingyu Lu, Hong Qian, Jun Zhou, Aimin Zhou

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Customer segmentation plays a crucial role in credit risk assessment by dividing users into specific risk levels based on their credit scores. Previous methods fail to consider stability comprehensively during segmentation, so users' risk levels change frequently and inconsistently over time, which increases the potential risks a company faces. To address this, this paper first introduces and formalizes the concept of stability regret in the segmentation process. Evaluating stability is challenging, however, because of its black-box nature and the computational burden posed by vast user datasets. To address these challenges, this paper proposes a large-scale stable customer segmentation approach named LASCA. LASCA consists of two phases: high-quality dataset construction (HDC) and reliable data-driven optimization (RDO). Specifically, HDC uses an evolutionary algorithm to collect high-quality binning solutions. RDO then builds a reliable surrogate model on the collected dataset and uses it to search for the most stable binning solution. Extensive experiments on real-world large-scale datasets (up to 0.8 billion) show that LASCA outperforms state-of-the-art binning methods in finding the most stable binning solution. Notably, HDC improves data quality by 50%. RDO efficiently discovers more stable binning solutions, improving stability by 36%, and accelerates the optimization process by 25 times through data-driven evaluation. LASCA has been successfully deployed in the large-scale credit risk assessment system of Alipay.
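The abstract does not define stability regret or how a binning solution is represented, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes a binning solution is a sorted list of credit-score cut points, and it measures instability with a hypothetical proxy: the fraction of users whose risk level changes between two scoring periods. This proxy is not the paper's stability regret. The search is a plain (1+1) evolutionary algorithm that evaluates the objective directly; it stands in for, and is much simpler than, LASCA's HDC phase and RDO's surrogate-model search. All function names (`assign_levels`, `level_change_rate`, `objective`, `evolve_cuts`) and the `min_frac` constraint are hypothetical.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def assign_levels(scores: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each score to a risk level; level i covers cuts[i-1] <= score < cuts[i]."""
    return [bisect.bisect_right(cuts, s) for s in scores]


def level_change_rate(scores_t0: Sequence[float], scores_t1: Sequence[float],
                      cuts: Sequence[float]) -> float:
    """Hypothetical instability proxy: share of users whose level changes between periods."""
    before = assign_levels(scores_t0, cuts)
    after = assign_levels(scores_t1, cuts)
    return sum(a != b for a, b in zip(before, after)) / len(before)


def objective(cuts: Sequence[float], scores_t0: Sequence[float],
              scores_t1: Sequence[float], min_frac: float = 0.1) -> float:
    """Instability proxy, or +inf if any level holds fewer than min_frac of users at t0."""
    counts = [0] * (len(cuts) + 1)
    for level in assign_levels(scores_t0, cuts):
        counts[level] += 1
    if min(counts) < min_frac * len(scores_t0):
        # Reject degenerate binnings, e.g. every user placed in a single level.
        return float("inf")
    return level_change_rate(scores_t0, scores_t1, cuts)


def evolve_cuts(scores_t0: Sequence[float], scores_t1: Sequence[float],
                init_cuts: Sequence[float], iters: int = 200, sigma: float = 5.0,
                seed: int = 0) -> Tuple[List[float], float]:
    """(1+1)-EA: Gaussian-perturb every cut point and keep the child if it is no worse."""
    rng = random.Random(seed)
    best = sorted(init_cuts)
    best_f = objective(best, scores_t0, scores_t1)
    for _ in range(iters):
        child = sorted(c + rng.gauss(0.0, sigma) for c in best)
        f = objective(child, scores_t0, scores_t1)
        if f <= best_f:
            best, best_f = child, f
    return best, best_f
```

For example, if `scores_t1` holds noisy copies of `scores_t0`, calling `evolve_cuts(scores_t0, scores_t1, [450.0, 600.0, 750.0])` returns cut points whose level-change rate is no higher than that of the starting cuts. The minimum-occupancy constraint is there because, without it, minimizing level changes alone would trivially push every cut point out of the score range and collapse all users into one risk level.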

Original language: English
Title of host publication: KDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 5006-5017
Number of pages: 12
ISBN (Electronic): 9798400704901
State: Published, 24 Aug 2024
Event: 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024, Barcelona, Spain
Duration: 25 Aug 2024 to 29 Aug 2024

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (Print): 2154-817X

Conference

Conference: 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
Country/Territory: Spain
City: Barcelona
Period: 25/08/24 to 29/08/24

Keywords

  • credit risk assessment
  • data-driven optimization
  • large-scale customer segmentation
  • reliable surrogate model
  • stability
