ParaSum: Contrastive Paraphrasing for Low-Resource Extractive Text Summarization

Moming Tang, Chengyu Wang, Jianing Wang, Cen Chen, Ming Gao, Weining Qian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Existing extractive summarization methods achieve state-of-the-art (SOTA) performance with pre-trained language models (PLMs) and sufficient training data. However, PLM-based methods are known to be data-hungry and often fail to deliver satisfactory results in low-resource scenarios. Constructing a high-quality summarization dataset with human-authored reference summaries is prohibitively expensive. To address these challenges, this paper proposes a novel paradigm for low-resource extractive summarization, called ParaSum. This paradigm reformulates text summarization as textual paraphrasing, aligning the summarization task with the self-supervised Next Sentence Prediction (NSP) task of PLMs. This approach minimizes the training gap between the summarization model and PLMs, enabling more effective probing of the knowledge encoded in PLMs and improving summarization performance. Furthermore, to relax the requirement for large amounts of training data, we introduce a simple yet efficient model and align the training paradigm of summarization with that of textual paraphrasing to facilitate network-based transfer learning. Extensive experiments on two widely used benchmarks (i.e., CNN/DailyMail, XSum) and a recent open-source, high-quality Chinese benchmark (i.e., CNewSum) show that ParaSum consistently outperforms existing PLM-based summarization methods in all low-resource settings, demonstrating its effectiveness across different types of datasets.
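To make the reformulation concrete, the sketch below shows one way the NSP alignment described in the abstract can be probed with an off-the-shelf PLM: each candidate sentence is paired with the source document and scored by BERT's pre-trained NSP head, so that summary-like (paraphrasing) sentences receive higher continuation probabilities. This is a minimal illustration assuming the Hugging Face transformers API and a simple document-sentence pairing scheme; it is not the authors' released implementation, and the paper's full method (contrastive paraphrasing plus transfer learning) is not reproduced here.

import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

# Off-the-shelf BERT with its self-supervised NSP head (no fine-tuning here).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def nsp_scores(document, sentences):
    """Score each candidate sentence against the document with the NSP head."""
    scores = []
    for sent in sentences:
        inputs = tokenizer(document, sent, truncation=True,
                           max_length=512, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits  # shape (1, 2)
        # In Hugging Face's NSP head, index 0 is the "is next sentence" class;
        # softmax turns the two logits into a probability-like score.
        scores.append(torch.softmax(logits, dim=-1)[0, 0].item())
    return scores

# Toy usage: rank candidates and keep the most summary-like sentence.
document = ("The city council approved the new transit plan on Monday. "
            "It adds two bus lines and extends subway hours.")
candidates = ["A transit plan adding bus lines was approved by the council.",
              "The weather was sunny over the weekend."]
ranked = sorted(zip(candidates, nsp_scores(document, candidates)),
                key=lambda pair: -pair[1])
print(ranked[0][0])  # expected: the paraphrase of the document

In a low-resource setting, the appeal of this framing is that the scoring head is already trained during BERT's pre-training, so far fewer labeled summaries are needed than for training a task-specific extraction head from scratch.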

Original language: English
Title of host publication: Knowledge Science, Engineering and Management - 16th International Conference, KSEM 2023, Proceedings
Editors: Zhi Jin, Yuncheng Jiang, Wenjun Ma, Robert Andrei Buchmann, Ana-Maria Ghiran, Yaxin Bi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 106-119
Number of pages: 14
ISBN (Print): 9783031402883
DOIs
State: Published - 2023
Event: 16th International Conference on Knowledge Science, Engineering and Management, KSEM 2023 - Guangzhou, China
Duration: 16 Aug 2023 - 18 Aug 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14119 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference on Knowledge Science, Engineering and Management, KSEM 2023
Country/Territory: China
City: Guangzhou
Period: 16/08/23 - 18/08/23

Keywords

  • extractive summarization
  • low-resource scenarios
  • pre-trained language model
  • textual paraphrasing
  • transfer learning
