TY - JOUR
T1 - Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models †
AU - Wang, Yanhao
AU - Fabbri, Francesco
AU - Mathioudakis, Michael
AU - Li, Jia
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/7
Y1 - 2023/7
N2 - Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of (Formula presented.) elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we study diversity maximization with fairness constraints in streaming and sliding-window models. Specifically, we focus on the max–min diversity maximization problem, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set X is partitioned into m disjoint groups by a specific sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains (Formula presented.) elements from each group (Formula presented.). Although diversity maximization has been extensively studied, existing algorithms for fair max–min diversity maximization are inefficient for data streams. To address the problem, we first design efficient approximation algorithms for this problem in the (insert-only) streaming model, where data arrive one element at a time, and a solution should be computed based on the elements observed in one pass. Furthermore, we propose approximation algorithms for this problem in the sliding-window model, where only the latest w elements in the stream are considered for computation to capture the recency of the data. Experimental results on real-world and synthetic datasets show that our algorithms provide solutions of comparable quality to the state-of-the-art offline algorithms while running several orders of magnitude faster in the streaming and sliding-window settings.
AB - Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of (Formula presented.) elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we study diversity maximization with fairness constraints in streaming and sliding-window models. Specifically, we focus on the max–min diversity maximization problem, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set X is partitioned into m disjoint groups by a specific sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains (Formula presented.) elements from each group (Formula presented.). Although diversity maximization has been extensively studied, existing algorithms for fair max–min diversity maximization are inefficient for data streams. To address the problem, we first design efficient approximation algorithms for this problem in the (insert-only) streaming model, where data arrive one element at a time, and a solution should be computed based on the elements observed in one pass. Furthermore, we propose approximation algorithms for this problem in the sliding-window model, where only the latest w elements in the stream are considered for computation to capture the recency of the data. Experimental results on real-world and synthetic datasets show that our algorithms provide solutions of comparable quality to the state-of-the-art offline algorithms while running several orders of magnitude faster in the streaming and sliding-window settings.
KW - diversity maximization
KW - group fairness
KW - sliding-window algorithm
KW - streaming algorithm
UR - https://www.scopus.com/pages/publications/85166182478
U2 - 10.3390/e25071066
DO - 10.3390/e25071066
M3 - 文章
AN - SCOPUS:85166182478
SN - 1099-4300
VL - 25
JO - Entropy
JF - Entropy
IS - 7
M1 - 1066
ER -