
Self-Supervised Query Reformulation for Code Search

  • Yuetian Mao
  • Chengcheng Wan
  • Yuze Jiang
  • Xiaodong Gu*
  • *Corresponding author for this work
  • Shanghai Jiao Tong University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Automatic query reformulation is a widely utilized technology for enriching user requirements and enhancing the outcomes of code search. It can be conceptualized as a machine translation task, wherein the objective is to rephrase a given query into a more comprehensive alternative. While showing promising results, training such a model typically requires a large parallel corpus of query pairs (i.e., the original query and a reformulated query) that are confidential and unpublished by online code search engines. This restricts its practicality in software development processes. In this paper, we propose SSQR, a self-supervised query reformulation method that does not rely on any parallel query corpus. Inspired by pre-trained models, SSQR treats query reformulation as a masked language modeling task conducted on an extensive unannotated corpus of queries. SSQR extends T5 (a sequence-to-sequence model based on Transformer) with a new pre-training objective named corrupted query completion (CQC), which randomly masks words within a complete query and trains T5 to predict the masked content. Subsequently, for a given query to be reformulated, SSQR identifies potential locations for expansion and leverages the pre-trained T5 model to generate appropriate content to fill these gaps. The selection of expansions is then based on the information gain associated with each candidate. Evaluation results demonstrate that SSQR outperforms unsupervised baselines significantly and achieves competitive performance compared to supervised methods.
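The masking and gap-filling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how spans are chosen, so the single-span masking, the function names `corrupt_query` and `expansion_candidates`, and the `span_len` parameter are all assumptions made for illustration. The only borrowed convention is T5's sentinel tokens (`<extra_id_0>`, `<extra_id_1>`), which mark a masked span in the input and delimit it in the target.

```python
import random

SENTINEL = "<extra_id_0>"      # T5 sentinel marking the masked span
END_SENTINEL = "<extra_id_1>"  # T5 sentinel closing the target sequence


def corrupt_query(query, span_len=1, rng=None):
    """Hypothetical CQC-style corruption: mask one contiguous span of words.

    Returns (corrupted_input, target) as a training pair for a
    sequence-to-sequence model. The span-selection rule is an assumption.
    """
    rng = rng or random.Random()
    words = query.split()
    if not words:
        raise ValueError("query must contain at least one word")
    span_len = max(1, min(span_len, len(words)))
    start = rng.randrange(len(words) - span_len + 1)
    masked = words[start:start + span_len]
    corrupted = words[:start] + [SENTINEL] + words[start + span_len:]
    target = [SENTINEL] + masked + [END_SENTINEL]
    return " ".join(corrupted), " ".join(target)


def expansion_candidates(query):
    """Hypothetical helper: place the sentinel at every gap in the query
    (before the first word, between words, after the last word)."""
    words = query.split()
    return [
        " ".join(words[:i] + [SENTINEL] + words[i:])
        for i in range(len(words) + 1)
    ]
```

At reformulation time, each string returned by `expansion_candidates` would be passed to the pre-trained model, which proposes content for the sentinel position. Ranking those proposals by information gain is left out here because the abstract does not define that measure.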

Original language: English
Title of host publication: ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Satish Chandra, Kelly Blincoe, Paolo Tonella
Publisher: Association for Computing Machinery, Inc
Pages: 363-374
Number of pages: 12
ISBN (electronic): 9798400703270
DOI
Publication status: Published - 30 Nov 2023
Event: 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023 - San Francisco, United States
Duration: 3 Dec 2023 to 9 Dec 2023

Publication series

Name: ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference: 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023
Country/Territory: United States
City: San Francisco
Period: 3 Dec 2023 to 9 Dec 2023
