
TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills

  • Qiushi Sun
  • Nuo Chen
  • Jianing Wang
  • Xiang Li*
  • Ming Gao
  • *Corresponding author for this work
  • National University of Singapore
  • East China Normal University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Code pre-trained models (CodePTMs) have recently demonstrated a solid capacity to process various code intelligence tasks, e.g., code clone detection, code translation, and code summarization. The current mainstream method for deploying these models to downstream tasks is to fine-tune them on individual tasks, which is generally costly and, for large models, requires sufficient data. To tackle this issue, we present TransCoder, a unified Transferable fine-tuning strategy for Code representation learning. Inspired by the inherent human skill of knowledge generalization, TransCoder drives the model to learn code-related knowledge the way human programmers do. Specifically, we employ a tunable prefix encoder to first capture cross-task and cross-language transferable knowledge, and then apply the acquired knowledge for optimized downstream adaptation. In addition, our approach benefits tasks with small training sample sizes and languages with smaller corpora, underscoring its versatility and efficacy. Extensive experiments conducted on representative benchmarks demonstrate that our method can lead to superior performance on various code-related tasks and encourage mutual reinforcement, especially in low-resource scenarios. Our code is available at https://github.com/QiushiSun/TransCoder.

Original language: English
Title of host publication: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Publisher: European Language Resources Association (ELRA)
Pages: 16713-16726
Number of pages: 14
ISBN (electronic): 9782493814104
Publication status: Published - 2024
Event: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024 - Hybrid, Torino, Italy
Duration: 20 May 2024 to 25 May 2024

Publication series

Name: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

Conference

Conference: Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Country/Territory: Italy
Hybrid, Torino
Period: 20/05/24 to 25/05/24
