TY - GEN
T1 - Streaming Algorithms for Diversity Maximization with Fairness Constraints
AU - Wang, Yanhao
AU - Fabbri, Francesco
AU - Mathioudakis, Michael
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set X of n elements, it asks to select a subset S of kl n elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set X is partitioned into m disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains ki elements from each group i ? [1, m]. A streaming algorithm should process X sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is 1-?4-approximate and specific for m = 2, where ? E (0,1), and the second of which achieves a 1-?3m+2-approximation for an arbitrary m. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
AB - Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set X of n elements, it asks to select a subset S of kl n elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset S that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set X is partitioned into m disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset S contains ki elements from each group i ? [1, m]. A streaming algorithm should process X sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is 1-?4-approximate and specific for m = 2, where ? E (0,1), and the second of which achieves a 1-?3m+2-approximation for an arbitrary m. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
KW - algorithmic fairness
KW - diversity maximization
KW - max-min dispersion
KW - streaming algorithm
UR - https://www.scopus.com/pages/publications/85136450548
U2 - 10.1109/ICDE53745.2022.00008
DO - 10.1109/ICDE53745.2022.00008
M3 - 会议稿件
AN - SCOPUS:85136450548
T3 - Proceedings - International Conference on Data Engineering
SP - 41
EP - 53
BT - Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PB - IEEE Computer Society
T2 - 38th IEEE International Conference on Data Engineering, ICDE 2022
Y2 - 9 May 2022 through 12 May 2022
ER -