TY - JOUR
T1 - Hierarchical Multiagent Reinforcement Learning for Allocating Guaranteed Display Ads
AU - Wang, Lu
AU - Han, Lei
AU - Chen, Xinru
AU - Li, Chengchang
AU - Huang, Junzhou
AU - Zhang, Weinan
AU - Zhang, Wei
AU - He, Xiaofeng
AU - Luo, Dijun
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2022/10/1
Y1 - 2022/10/1
N2 - In this article, we study the problem of guaranteed display ads (GDAs) allocation, which requires proactively allocating display ads to different impressions to fulfill the impression demands specified in the contracts. Existing methods for this problem either assume that the impressions are static or consider only a specific ad's benefits. Thus, they are hard to generalize to industrial production scenarios, where the impressions are dynamic and large scale and the overall allocation optimality across all the considered GDAs is required. To bridge this gap, we formulate the problem as a sequential decision-making problem in the scope of multiagent reinforcement learning (MARL), by assigning an allocation agent to each ad and coordinating all the agents to allocate GDAs. The inputs are the states of each ad (e.g., the demand of the ad and the remaining time steps for displaying it) and the impressions at different time steps, and the outputs are the display ratios of each ad for each impression. Specifically, we propose a novel hierarchical MARL (HMARL) method that creates hierarchies over the agent policies to handle a large number of ads and the dynamics of impressions. HMARL contains: 1) a manager policy that guides each agent to choose an appropriate subpolicy and 2) a set of subpolicies that let the agents behave diversely conditioned on their states. Extensive experiments on three real-world data sets from the Tencent advertising platform, with tens of millions of records, demonstrate significant improvements of HMARL over state-of-the-art approaches.
AB - In this article, we study the problem of guaranteed display ads (GDAs) allocation, which requires proactively allocating display ads to different impressions to fulfill the impression demands specified in the contracts. Existing methods for this problem either assume that the impressions are static or consider only a specific ad's benefits. Thus, they are hard to generalize to industrial production scenarios, where the impressions are dynamic and large scale and the overall allocation optimality across all the considered GDAs is required. To bridge this gap, we formulate the problem as a sequential decision-making problem in the scope of multiagent reinforcement learning (MARL), by assigning an allocation agent to each ad and coordinating all the agents to allocate GDAs. The inputs are the states of each ad (e.g., the demand of the ad and the remaining time steps for displaying it) and the impressions at different time steps, and the outputs are the display ratios of each ad for each impression. Specifically, we propose a novel hierarchical MARL (HMARL) method that creates hierarchies over the agent policies to handle a large number of ads and the dynamics of impressions. HMARL contains: 1) a manager policy that guides each agent to choose an appropriate subpolicy and 2) a set of subpolicies that let the agents behave diversely conditioned on their states. Extensive experiments on three real-world data sets from the Tencent advertising platform, with tens of millions of records, demonstrate significant improvements of HMARL over state-of-the-art approaches.
KW - Artificial intelligence
KW - computational and artificial intelligence
KW - decision support systems
UR - https://www.scopus.com/pages/publications/85107173785
U2 - 10.1109/TNNLS.2021.3070484
DO - 10.1109/TNNLS.2021.3070484
M3 - Article
C2 - 33999823
AN - SCOPUS:85107173785
SN - 2162-237X
VL - 33
SP - 5361
EP - 5373
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 10
ER -