Code Summarization with Project-Specific Features

Yu Wang, Xin Liu, Xuesong Lu*, Aoying Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Code summarization aims to automatically generate natural language descriptions for code snippets, which help people maintain and understand code snippets. Existing code summarization methods are mostly based on the encoder-decoder structure, where the encoder learns latent features from a code snippet and the decoder generates the corresponding summary based on the features. Such methods do not leverage project-specific information and tend to generate general summaries. However, in practice developers want the generated summaries to be project-specific, i.e., consistent with the existing summaries in the same project on aspects such as sentence patterns and domain concepts. In this work, we investigate project-specific code summarization. We propose a two-stage method CSWPS, which can be seamlessly integrated into any existing encoder-decoder summarization model. In the first stage, CSWPS learns project-specific features from existing summaries in each project using multi-task learning. In the second stage, CSWPS samples from the project-specific features conditioned on the input source code and project information, and extracts the features most relevant to the input code. The features guide the decoder to generate a project-specific summary for the input code. By incorporating CSWPS into existing code summarization models, we can always improve their performance and achieve a new state of the art. We also empirically show that the summaries generated by incorporating CSWPS are more project-specific, via feature visualization and a human study. A replication package for this work is available at https://github.com/DaSESmartEdu/CSWPS.
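
The second stage described in the abstract, extracting the project-specific features most relevant to the input code, can be illustrated with a toy example. The sketch below is not the CSWPS implementation and makes several assumptions that do not come from the abstract: features are plain lists of floats, relevance is a dot product, and the selected features are combined with softmax weights. The function name select_project_features, the parameter top_k, and the example vectors are all hypothetical.

```python
import math


def select_project_features(code_vec, feature_bank, top_k=2):
    """Pick the top_k bank vectors most similar to code_vec and mix them.

    Illustrative assumption only: dot-product relevance plus softmax
    weighting. This is not the sampling procedure used by CSWPS.
    """
    # Relevance score of each project feature with respect to the input code.
    scores = [sum(c * f for c, f in zip(code_vec, feat)) for feat in feature_bank]
    # Indices of the top_k highest-scoring features, best first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Numerically stable softmax over the selected scores.
    max_score = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - max_score) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected features; a decoder could condition on it.
    mixed = [
        sum(w * feature_bank[i][d] for w, i in zip(weights, top))
        for d in range(len(code_vec))
    ]
    return top, weights, mixed


# Hypothetical 2-dimensional example: one code vector, three project features.
code_vec = [1.0, 0.0]
feature_bank = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
top, weights, mixed = select_project_features(code_vec, feature_bank, top_k=2)
print(top, weights, mixed)
```

In an actual encoder-decoder model the vectors would be learned representations, and the abstract states that CSWPS samples features conditioned on both the source code and project information; the replication package linked above is the authoritative reference for how that is done.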

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track - European Conference, ECML PKDD 2024, Proceedings
Editors: Albert Bifet, Tomas Krilavičius, Ioanna Miliou, Slawomir Nowaczyk
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 190-206
Number of pages: 17
ISBN (Print): 9783031703775
State: Published - 2024
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2024 - Vilnius, Lithuania
Duration: 9 Sep 2024 to 13 Sep 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14949 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2024
Country/Territory: Lithuania
City: Vilnius
Period: 9/09/24 to 13/09/24

Keywords

  • Deep Learning
  • Latent Representation Sampling
  • Neural Network
  • Project-Specific
  • Source Code Summarization
