
Exploring retrieval-augmented generation for multi-label discipline classification of academic short texts

  • Duxin Shang*
  • Yufeng Duan
  • Ping Bai
  • Jiahong Xie
  • *Corresponding author for this work
  • East China Normal University

Research output: Contribution to journal › Article › peer-review

Abstract

The discipline classification of academic short texts can effectively promote bibliometric analysis of academic papers. Traditional classification methods face challenges such as data sparsity and limited annotation resources when handling academic short texts. Additionally, these methods exhibit significant limitations in computational complexity and interpretability. To address these issues, this paper proposes a Retrieval-Augmented Generation (RAG)-based multi-label classification framework for academic short texts. This framework enhances the input to generative models by retrieving relevant information from an external knowledge base, thereby enhancing both classification performance and interpretability. The framework comprises four core modules: knowledge base construction, retriever, prompt engineering, and large language model (LLM) invocation. Under this framework, we construct an academic text knowledge base containing multiple disciplines based on the Semantic Scholar Open Research Corpus (S2ORC) academic paper dataset. We also design targeted prompts to guide the Large Language Model in generating discipline classification labels and their justifications. Experimental results demonstrate that the RAG-based approach offers significant advantages in multi-label classification tasks for academic short texts. Compared to traditional deep learning models and standalone Large Language Models, RAG significantly reduces classification error rates and enhances label coverage and the accuracy of top-1 label predictions.
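
The four modules named in the abstract (knowledge base construction, retriever, prompt engineering, LLM invocation) can be pictured with a minimal sketch. This is not the authors' implementation: the knowledge base entries are invented for illustration rather than taken from S2ORC, the retriever is a plain bag-of-words cosine similarity swapped in for whatever retriever the paper uses, and `retrieve`, `build_prompt`, and `classify` are hypothetical names. The LLM is passed in as an ordinary callable so that no particular model API is assumed.

```python
import math
import re
from collections import Counter

# Module 1 (assumed form): a toy knowledge base of (short text, discipline labels).
# These entries are invented for illustration; the paper builds its base from S2ORC.
KNOWLEDGE_BASE = [
    ("Graph neural networks for molecular property prediction",
     ["Computer Science", "Chemistry"]),
    ("Citation network analysis of journal impact",
     ["Information Science"]),
    ("Deep learning for protein structure prediction",
     ["Computer Science", "Biology"]),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Module 2 (stand-in retriever): return the k most similar entries."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), text, labels)
              for text, labels in KNOWLEDGE_BASE]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(query, hits):
    """Module 3: put the retrieved labeled examples in front of the query."""
    lines = ["Assign one or more discipline labels to the text and justify each.",
             "Labeled examples retrieved from the knowledge base:"]
    for _score, text, labels in hits:
        lines.append(f"- {text} -> {', '.join(labels)}")
    lines.append(f"Text to classify: {query}")
    return "\n".join(lines)


def classify(query, llm, k=2):
    """Module 4: call the LLM (any callable str -> str) on the augmented prompt."""
    return llm(build_prompt(query, retrieve(query, k)))
```

Calling `classify(text, llm)` with a real model client wrapped as `llm` would return the model's labels and justifications. The retrieved examples are what distinguish this from prompting a standalone LLM.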

Original language: English
Pages (from-to): 2373-2399
Number of pages: 27
Journal: Scientometrics
Volume: 131
Issue number: 4
DOI
Publication status: Published - Apr 2026
