Knowledge Distillation for Job Title Prediction and Project Recommendation in Open Source Communities

Xin Liu, Hang Su, Xuesong Lu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In the era of rapid digitalization, the demand for digital talent is surging, and talent management in open source communities has become a crucial research area. This paper explores the application of large language models (LLMs) to two key talent management tasks within open source communities: project recommendation and job title prediction. First, we construct an evaluation dataset, TM-Eval, to assess the performance of LLMs on the two tasks. Second, we construct a QA dataset, JA-QA, from LinkedIn, which describes the required APIs for each job title together with its job description. This dataset is used to distill knowledge about job-API correspondence from larger LLMs into smaller ones, in order to reduce the computational overhead of the two tasks. We propose a hierarchical knowledge transfer method comprising logit-based distillation, feature-based distillation, and task-specific fine-tuning with Low-Rank Adaptation. Experimental results show that larger LLMs outperform smaller ones on the two tasks. Moreover, the proposed distillation method effectively enhances the performance of smaller LLMs, enabling them to surpass the original larger LLMs in some cases. This study provides a new approach to talent management in open source communities, which leverages the knowledge of LLMs to improve prediction and recommendation accuracy while reducing computational overhead. A replication package is available at https://github.com/DaSESmartEdu/KDJPPR.
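The abstract names three distillation components (logit-based distillation, feature-based distillation, and task-specific fine-tuning) without giving details. As an illustration only, the sketch below shows one common way such a combined distillation objective is written in PyTorch; the function name, the temperature and weighting hyperparameters, and the assumption that teacher and student hidden states have already been projected to the same dimensionality are our own choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      labels, temperature=2.0, alpha=0.5, beta=0.3):
    """Illustrative combined objective: logit-based + feature-based
    distillation plus a task-specific loss.

    student_logits / teacher_logits: (batch, seq_len, vocab_size)
    student_hidden / teacher_hidden: (batch, seq_len, hidden_size),
        assumed already projected to the same dimensionality.
    labels: (batch, seq_len) ground-truth token ids.
    Hyperparameters are placeholders, not values from the paper.
    """
    # Logit-based distillation: KL divergence between the softened
    # teacher and student output distributions.
    kd_logits = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Feature-based distillation: match intermediate representations.
    kd_features = F.mse_loss(student_hidden, teacher_hidden)

    # Task-specific loss on the ground-truth labels
    # (e.g. next-token prediction for the recommendation/prediction task).
    task = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )

    return alpha * kd_logits + beta * kd_features + (1 - alpha - beta) * task
```

In the pipeline described in the abstract, the task-specific stage uses Low-Rank Adaptation; in practice this would typically mean wrapping the student model with LoRA adapters (for example via the PEFT library) before optimizing a loss of this form, though the paper's exact setup may differ.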

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track and Demo Track - European Conference, ECML PKDD 2025, Proceedings
Editors: Inês Dutra, Alípio M. Jorge, Carlos Soares, João Gama, Mykola Pechenizkiy, Paulo Cortez, Sepideh Pashami, Arian Pasquali, Nuno Moniz, Pedro H. Abreu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 393-409
Number of pages: 17
ISBN (Print): 9783032061287
DOIs
State: Published - 2026
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025 - Porto, Portugal
Duration: 15 Sep 2025 - 19 Sep 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16022
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025
Country/Territory: Portugal
City: Porto
Period: 15/09/25 - 19/09/25

Keywords

  • Knowledge Distillation
  • Large Language Models
  • Open Source Community
  • Talent Management
