Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes

  • Lu Wang
  • Wenchao Yu
  • Xiaofeng He
  • Wei Cheng
  • Martin Renqiang Ren
  • Wei Wang
  • Bo Zong
  • Haifeng Chen
  • Hongyuan Zha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

29 Scopus citations

Abstract

Recent developments in discovering dynamic treatment regimes (DTRs) have heightened the importance of deep reinforcement learning (DRL), which is used to recover doctors' treatment policies. However, existing DRL-based methods have the following limitations: 1) supervised methods based on behavior cloning suffer from compounding errors; 2) the self-defined reward signals in reinforcement learning models are either too sparse or require clinical guidance; 3) current imitation learning models consider only positive trajectories (e.g., survived patients) and largely ignore negative trajectories (e.g., deceased patients), even though the latter are examples of what not to do and could help the learned policy avoid repeating mistakes. To address these limitations, we propose the adversarial cooperative imitation learning model, ACIL, which deduces optimal dynamic treatment regimes that mimic the positive trajectories while differing from the negative trajectories. Specifically, two discriminators are used to achieve this goal: an adversarial discriminator is designed to minimize the discrepancies between the trajectories generated by the policy and the positive trajectories, and a cooperative discriminator is used to distinguish the negative trajectories from the positive and generated trajectories. The reward signals from the discriminators are used to refine the policy for dynamic treatment regimes. Experiments on publicly available real-world medical data demonstrate that ACIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting information from both positive and negative trajectories.
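The abstract does not give the exact reward formula, so the following is only a minimal sketch of how two discriminators could jointly shape a reward, under stated assumptions. It assumes each state-action step is a small feature vector, replaces the paper's discriminators with hypothetical logistic-regression stand-ins, and assumes a combined reward of log D_adv(x) + λ·log(1 − D_coop(x)), where D_adv scores "looks like a positive trajectory" and D_coop scores "looks like a negative trajectory". The names `LogisticDiscriminator`, `train_discriminators`, `acil_style_reward`, and `coop_weight` are illustrative, and the policy-update step that would consume this reward is omitted.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticDiscriminator:
    """Hypothetical stand-in for a neural discriminator: logistic regression
    over state-action feature vectors."""

    def __init__(self, dim: int) -> None:
        self.w: List[float] = [0.0] * dim
        self.b: float = 0.0

    def prob(self, x: Vector) -> float:
        # Probability that x comes from the class labeled 1.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def step(self, label1: Sequence[Vector], label0: Sequence[Vector], lr: float = 0.5) -> None:
        # One full-batch gradient-descent step on binary cross-entropy.
        data = [(x, 1.0) for x in label1] + [(x, 0.0) for x in label0]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in data:
            err = self.prob(x) - y  # derivative of BCE with respect to the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(data)
        self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= lr * grad_b / n


def train_discriminators(positive, negative, generated, dim, steps=500):
    # Adversarial discriminator: positive (label 1) vs. generated (label 0).
    d_adv = LogisticDiscriminator(dim)
    # Cooperative discriminator: negative (label 1) vs. positive + generated (label 0).
    d_coop = LogisticDiscriminator(dim)
    for _ in range(steps):
        d_adv.step(positive, generated)
        d_coop.step(negative, list(positive) + list(generated))
    return d_adv, d_coop


def acil_style_reward(x, d_adv, d_coop, coop_weight=1.0, eps=1e-8):
    # Assumed reward: high when x resembles positive trajectories (d_adv -> 1)
    # and does not resemble negative trajectories (d_coop -> 0).
    return math.log(d_adv.prob(x) + eps) + coop_weight * math.log(1.0 - d_coop.prob(x) + eps)


# Toy 2-D state-action features, made up for illustration only.
positive = [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]        # e.g. steps from survived patients
negative = [[-1.0, -0.9], [-1.1, -1.2], [-0.8, -1.0]]  # e.g. steps from deceased patients
generated = [[0.0, 0.1], [0.1, -0.1], [-0.1, 0.0]]     # rollouts of the current policy

d_adv, d_coop = train_discriminators(positive, negative, generated, dim=2)
print(acil_style_reward([1.0, 1.0], d_adv, d_coop), acil_style_reward([-1.0, -1.0], d_adv, d_coop))
```

On this toy data, a step near the positive cluster should receive a higher reward than a step near the negative cluster, which is the behavior the two-discriminator design aims for. In a full system the policy would then be updated with some reinforcement learning algorithm to maximize this reward while the discriminators are retrained on fresh rollouts; which algorithm ACIL actually uses is not stated here.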

Original language: English
Title of host publication: The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
Publisher: Association for Computing Machinery, Inc
Pages: 1785-1795
Number of pages: 11
ISBN (Electronic): 9781450370233
State: Published, 20 Apr 2020
Event: 29th International World Wide Web Conference, WWW 2020 - Taipei, Taiwan, Province of China
Duration: 20 Apr 2020 to 24 Apr 2020

Publication series

Name: The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020

Conference

Conference: 29th International World Wide Web Conference, WWW 2020
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 20/04/20 to 24/04/20

Keywords

  • dynamic treatment regimes
  • generative adversarial networks
  • imitation learning
  • reinforcement learning

