A simple, yet effective and efficient, sliding window sampling algorithm

  • Xuesong Lu*
  • , Wee Hyong Tok
  • , Chedy Raissi
  • , Stéphane Bressan
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

Sampling streams of continuous data with limited memory, or reservoir sampling, is a utility algorithm. Standard reservoir sampling maintains a random sample of the entire stream as it has arrived so far. This restriction does not meet the requirement of many applications that need to give preference to recent data. The simplest algorithm for maintaining a random sample of a sliding window reproduces periodically the same sample design. This is also undesirable for many applications. Other existing algorithms are using variable size memory, variable size samples or maintain biased samples and allow expired data in the sample. We propose an effective algorithm, which is very simple and therefore efficient, for maintaining a near random fixed size sample of a sliding window. Indeed our algorithm maintains a biased sample that may contain expired data. Yet it is a good approximation of a random sample with expired data being present with low probability. We analytically explain why and under which parameter settings the algorithm is effective. We empirically evaluate its performance (effectiveness) and compare it with the performance of existing representatives of random sampling over sliding windows and biased sampling algorithm.

Original languageEnglish
Title of host publicationDatabase Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings
Pages337-351
Number of pages15
EditionPART 1
DOIs
StatePublished - 2010
Externally publishedYes
Event15th International Conference on Database Systems for Advanced Applications, DASFAA 2010 - Tsukuba, Japan
Duration: 1 Apr 20104 Apr 2010

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
NumberPART 1
Volume5981 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference15th International Conference on Database Systems for Advanced Applications, DASFAA 2010
Country/TerritoryJapan
CityTsukuba
Period1/04/104/04/10

Fingerprint

Dive into the research topics of 'A simple, yet effective and efficient, sliding window sampling algorithm'. Together they form a unique fingerprint.

Cite this