A sliding-window framework for representative subset selection

Yanhao Wang, Yuchen Li, Kian Lee Tan

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Representative subset selection (RSS) is an important tool for users to draw insights from massive datasets. A common approach is to model RSS as the submodular maximization problem because the utility of extracted representatives often satisfies the 'diminishing returns' property. To capture the data recency issue and support different types of constraints in real-world problems, we formulate RSS as maximizing a submodular function subject to a d-knapsack constraint (SMDK) over sliding windows. Then, we propose a novel KnapWindow framework for SMDK. Theoretically, KnapWindow is 1-1+d-Approximate for SMDK and achieves sublinear complexity. Finally, we evaluate the efficiency and effectiveness of KnapWindow on real-world datasets. The results show that it achieves up to 120x speedups over the batch baseline with at least 94% utility assurance.

Original languageEnglish
Title of host publicationProceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1272-1275
Number of pages4
ISBN (Electronic)9781538655207
DOIs
StatePublished - 24 Oct 2018
Externally publishedYes
Event34th IEEE International Conference on Data Engineering, ICDE 2018 - Paris, France
Duration: 16 Apr 201819 Apr 2018

Publication series

NameProceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018

Conference

Conference34th IEEE International Conference on Data Engineering, ICDE 2018
Country/TerritoryFrance
CityParis
Period16/04/1819/04/18

Keywords

  • Approximation algorithm
  • Data stream
  • Data summarization
  • Sliding window
  • submodular maximization

Fingerprint

Dive into the research topics of 'A sliding-window framework for representative subset selection'. Together they form a unique fingerprint.

Cite this