Optimizing word set coverage for multi-event summarization

  • Jihong Yan
  • Wenliang Cheng
  • Chengyu Wang
  • Jun Liu
  • Ming Gao*
  • Aoying Zhou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

The Internet has proliferated over the past few decades, and a large amount of textual information is now generated on the Web. It is impossible for individuals to locate and digest all the latest updates available there. Text summarization provides an efficient way to generate short, concise abstracts from massive document collections. These collections cover many events, which are hard for a summarization procedure to identify directly. We propose a novel methodology that identifies events in these text corpora and creates a summary for each event. We employ a probabilistic topic model to learn the latent topics of the documents and then discover events from the documents' topic distributions. To produce the summaries, we define the word set coverage problem (WSCP), which captures the most representative sentences for summarizing an event, and we propose an approximate algorithm to solve this optimization problem. We conduct a set of experiments to evaluate the proposed approach on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, the proposed method outperforms competitive baselines in terms of the harmonic mean of coverage and conciseness.
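The abstract does not state the WSCP formulation or the approximate algorithm, so the following is only a minimal sketch of the general idea, assuming a standard greedy heuristic of the kind commonly used for set-cover-style problems. The function names, the whitespace tokenization, the `max_sentences` budget, and the way coverage and conciseness are passed in as plain numbers are all assumptions made for illustration; none of them are taken from the paper.

```python
def greedy_cover(sentences, target_words, max_sentences=3):
    """Greedily pick sentences that cover the most still-uncovered target words.

    Hypothetical illustration only: this is a generic greedy set-cover
    heuristic, not the algorithm proposed in the paper.
    Returns the indices of the chosen sentences in selection order.
    """
    uncovered = set(target_words)
    # Assumption: naive lowercase whitespace tokenization.
    tokenized = [set(s.lower().split()) for s in sentences]
    chosen = []
    while uncovered and len(chosen) < max_sentences:
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: len(tokenized[i] & uncovered),
            default=None,
        )
        # Stop when no remaining sentence adds any new target word.
        if best is None or not (tokenized[best] & uncovered):
            break
        chosen.append(best)
        uncovered -= tokenized[best]
    return chosen


def harmonic_mean(coverage, conciseness):
    """Harmonic mean of two scores in [0, 1]; returns 0.0 when both are zero."""
    if coverage + conciseness == 0:
        return 0.0
    return 2 * coverage * conciseness / (coverage + conciseness)


sentences = [
    "the court ruled on the recall",
    "the recall affected hip implants",
    "markets rose",
]
target = {"court", "recall", "hip", "implants"}
selected = greedy_cover(sentences, target)
```

In this toy input the second sentence is picked first because it covers three of the four target words, and the first sentence is then added for the remaining one. How the paper actually measures coverage and conciseness is not given in the abstract; `harmonic_mean` only shows how two such scores would be combined.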

Original language: English
Pages (from-to): 996-1015
Number of pages: 20
Journal: Journal of Combinatorial Optimization
Volume: 30
Issue number: 4
State: Published - 1 Nov 2015

Keywords

  • Event summarization
  • Optimization
  • Set coverage
  • Word set
