Sequential classification-based optimization for direct policy search

Yi Qi Hu, Hong Qian, Yang Yu

Research output: Contribution to conference › Paper › peer-review

40 Scopus citations

Abstract

Direct policy search, which employs an optimization algorithm to search for policy parameters that maximize total reward, often yields high-quality policies in complex reinforcement learning problems. Classification-based optimization is a recently developed framework for derivative-free optimization that has been shown to be effective and efficient for non-convex optimization problems with many local optima, and it may provide a powerful optimization tool for direct policy search. However, this framework requires sampling a batch of solutions for every update of the search model, whereas in reinforcement learning the environment often offers only sequential policy evaluation. Classification-based optimization may therefore be inefficient for direct policy search, where solutions have to be sampled sequentially. In this paper, we adapt classification-based optimization to sequentially sampled solutions by forming the sample batch through the reuse of historical solutions. Experiments on a helicopter hovering task and on control tasks in OpenAI Gym show that the new algorithm significantly improves on the performance of several state-of-the-art derivative-free optimization approaches.
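The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a box-shaped classifier that separates one good solution from the worse ones, a fixed-size pool of historical solutions that stands in for the batch, and a toy objective in place of a policy's negative total reward. All names (`sphere`, `learn_box`, `sequential_cbo`) and all parameter defaults are hypothetical.

```python
import random


def sphere(x):
    """Toy objective to minimize; a stand-in for a policy's negative total reward."""
    return sum(v * v for v in x)


def learn_box(positive, negatives, lo, hi, rng):
    """Shrink an axis-aligned box around `positive` until it excludes every negative."""
    dim = len(positive)
    box = [[lo, hi] for _ in range(dim)]
    # Ignore exact duplicates of the positive point; they cannot be separated.
    remaining = [n for n in negatives if n != positive]
    while remaining:
        d = rng.randrange(dim)
        neg = rng.choice(remaining)
        if neg[d] == positive[d]:
            continue  # no separation possible on this dimension, try again
        cut = rng.uniform(min(positive[d], neg[d]), max(positive[d], neg[d]))
        if positive[d] < neg[d]:
            box[d][1] = min(box[d][1], cut)  # tighten the upper bound
        else:
            box[d][0] = max(box[d][0], cut)  # tighten the lower bound
        # Keep only the negatives that still fall inside the box.
        remaining = [
            n for n in remaining
            if all(b[0] <= v <= b[1] for v, b in zip(n, box))
        ]
    return box


def sequential_cbo(f, dim, lo, hi, budget, pool_size=20, n_pos=2, eps=0.1, seed=0):
    """Evaluate one solution per step and rebuild the 'batch' from stored solutions."""
    rng = random.Random(seed)
    pool = []
    for _ in range(pool_size):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        pool.append((f(x), x))
    pool.sort(key=lambda p: p[0])  # best (lowest value) first

    for _ in range(budget - pool_size):
        positives = [x for _, x in pool[:n_pos]]
        negatives = [x for _, x in pool[n_pos:]]
        anchor = rng.choice(positives)
        box = learn_box(anchor, negatives, lo, hi, rng)
        if rng.random() < eps:
            # Occasional uniform sample over the whole space for exploration.
            x = [rng.uniform(lo, hi) for _ in range(dim)]
        else:
            # Otherwise sample inside the learned box.
            x = [rng.uniform(b[0], b[1]) for b in box]
        y = f(x)  # a single sequential evaluation
        # Reuse history: the new solution replaces the worst stored one if better.
        if y < pool[-1][0]:
            pool[-1] = (y, x)
            pool.sort(key=lambda p: p[0])
    return pool[0]
```

Under these assumptions, a call such as `sequential_cbo(sphere, dim=5, lo=-1.0, hi=1.0, budget=300)` returns the best `(value, solution)` pair found. In a reinforcement learning setting, `f` would instead run one episode with the policy parameters `x` and return the negative total reward.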

Original language: English
Pages: 2029-2035
Number of pages: 7
State: Published - 2017
Externally published: Yes
Event: 31st AAAI Conference on Artificial Intelligence, AAAI 2017 - San Francisco, United States
Duration: 4 Feb 2017 to 10 Feb 2017

Conference

Conference: 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Country/Territory: United States
City: San Francisco
Period: 4/02/17 to 10/02/17
