TY - GEN
T1 - Knowledge Distillation for Job Title Prediction and Project Recommendation in Open Source Communities
AU - Liu, Xin
AU - Su, Hang
AU - Lu, Xuesong
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - In the era of rapid digitalization, the demand for digital talent is surging, and talent management in open source communities has become a crucial research area. This paper explores the application of large language models (LLMs) to two key talent management tasks within open source communities: project recommendation and job title prediction. First, we construct an evaluation dataset, TM-Eval, to assess the performance of LLMs on the two tasks. Second, we construct a QA dataset, JA-QA, from LinkedIn that pairs each job title and its job description with the required APIs. This dataset is used to distill job-API correspondence knowledge from larger LLMs into smaller ones, in order to reduce the computational overhead of the two tasks. We propose a hierarchical knowledge transfer method comprising logit-based distillation, feature-based distillation, and task-specific fine-tuning with Low-Rank Adaptation. Experimental results show that larger LLMs outperform smaller ones on the two tasks. Moreover, the proposed distillation method effectively enhances the performance of smaller LLMs, enabling them to surpass the original larger LLMs in some cases. This study provides a new approach to talent management in open source communities that leverages the knowledge of LLMs to improve prediction and recommendation accuracy while reducing computational overhead. A replication package is available at https://github.com/DaSESmartEdu/KDJPPR.
AB - In the era of rapid digitalization, the demand for digital talent is surging, and talent management in open source communities has become a crucial research area. This paper explores the application of large language models (LLMs) to two key talent management tasks within open source communities: project recommendation and job title prediction. First, we construct an evaluation dataset, TM-Eval, to assess the performance of LLMs on the two tasks. Second, we construct a QA dataset, JA-QA, from LinkedIn that pairs each job title and its job description with the required APIs. This dataset is used to distill job-API correspondence knowledge from larger LLMs into smaller ones, in order to reduce the computational overhead of the two tasks. We propose a hierarchical knowledge transfer method comprising logit-based distillation, feature-based distillation, and task-specific fine-tuning with Low-Rank Adaptation. Experimental results show that larger LLMs outperform smaller ones on the two tasks. Moreover, the proposed distillation method effectively enhances the performance of smaller LLMs, enabling them to surpass the original larger LLMs in some cases. This study provides a new approach to talent management in open source communities that leverages the knowledge of LLMs to improve prediction and recommendation accuracy while reducing computational overhead. A replication package is available at https://github.com/DaSESmartEdu/KDJPPR.
KW - Knowledge Distillation
KW - Large Language Models
KW - Open Source Community
KW - Talent Management
UR - https://www.scopus.com/pages/publications/105020015985
U2 - 10.1007/978-3-032-06129-4_23
DO - 10.1007/978-3-032-06129-4_23
M3 - Conference contribution
AN - SCOPUS:105020015985
SN - 9783032061287
T3 - Lecture Notes in Computer Science
SP - 393
EP - 409
BT - Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track and Demo Track - European Conference, ECML PKDD 2025, Proceedings
A2 - Dutra, Inês
A2 - Jorge, Alípio M.
A2 - Soares, Carlos
A2 - Gama, João
A2 - Pechenizkiy, Mykola
A2 - Cortez, Paulo
A2 - Pashami, Sepideh
A2 - Pasquali, Arian
A2 - Moniz, Nuno
A2 - Abreu, Pedro H.
PB - Springer Science and Business Media Deutschland GmbH
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025
Y2 - 15 September 2025 through 19 September 2025
ER -