Fair and representative subset selection from data streams

Yanhao Wang, Francesco Fabbri, Michael Mathioudakis

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

28 Scopus citations

Abstract

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint k. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a -approximation algorithm that requires passes over the stream for any constant ? > 0. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.

Original languageEnglish
Title of host publicationThe Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
PublisherAssociation for Computing Machinery, Inc
Pages1340-1350
Number of pages11
ISBN (Electronic)9781450383127
DOIs
StatePublished - 3 Jun 2021
Externally publishedYes
Event30th World Wide Web Conference, WWW 2021 - Ljubljana, Slovenia
Duration: 19 Apr 202123 Apr 2021

Publication series

NameThe Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021

Conference

Conference30th World Wide Web Conference, WWW 2021
Country/TerritorySlovenia
CityLjubljana
Period19/04/2123/04/21

Keywords

  • Algorithmic fairness
  • Approximation algorithm
  • Data summarization
  • Streaming algorithm
  • Submodular maximization

Fingerprint

Dive into the research topics of 'Fair and representative subset selection from data streams'. Together they form a unique fingerprint.

Cite this