TY - GEN
T1 - Fair and representative subset selection from data streams
AU - Wang, Yanhao
AU - Fabbri, Francesco
AU - Mathioudakis, Michael
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/6/3
Y1 - 2021/6/3
N2 - We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint k. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a -approximation algorithm that requires passes over the stream for any constant ? > 0. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.
AB - We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint k. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a -approximation algorithm that requires passes over the stream for any constant ? > 0. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.
KW - Algorithmic fairness
KW - Approximation algorithm
KW - Data summarization
KW - Streaming algorithm
KW - Submodular maximization
UR - https://www.scopus.com/pages/publications/85107961103
U2 - 10.1145/3442381.3449799
DO - 10.1145/3442381.3449799
M3 - 会议稿件
AN - SCOPUS:85107961103
T3 - The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
SP - 1340
EP - 1350
BT - The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
PB - Association for Computing Machinery, Inc
T2 - 30th World Wide Web Conference, WWW 2021
Y2 - 19 April 2021 through 23 April 2021
ER -