Efficient Continuous Space Policy Optimization for High-frequency Trading

  • Li Han
  • Nan Ding
  • Guoxuan Wang
  • Dawei Cheng*
  • Yuqi Liang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

20 Scopus citations

Abstract

High-frequency trading is an extraordinarily intricate financial task that is normally treated as a near real-time sequential decision problem. Compared with the traditional two-phase approach, which forecasts equity trends and then weights the equities by combinatorial optimization, deep reinforcement learning (DRL) methods have shown advantages in pursuing reward with optimal policies. However, existing DRL-based methods either apply portfolio optimization to low-frequency scenarios or support only a very limited number of assets with a discrete action space, and they face significant computational efficiency challenges. Therefore, we propose an efficient DRL-based policy optimization (DRPO) method for high-frequency trading. In particular, we model the portfolio management task as a Markov Decision Process by directly inferring the equity weights in the action space, guided by maximum accumulated returns. To reduce the agents' interaction complexity without reducing interpretability, we decompose the environment into "static" market states and "dynamic" portfolio weight states. Then, we design an efficient reward expectation calculation algorithm via probabilistic dynamic programming, which enables our agents to collect feedback directly and avoid the morass of trajectory sampling. To the best of our knowledge, this is the first work that solves the high-frequency portfolio optimization problem by devising an efficient continuous space policy optimization algorithm in the DRL framework. Through extensive experiments on real-world data from the Dow Jones, Coinbase and SSE exchanges, we show that our proposed DRPO significantly outperforms state-of-the-art benchmark methods. The results demonstrate the practical applicability and effectiveness of the proposed method.
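
As a rough illustration of the setup the abstract describes, the sketch below treats the action as a vector of continuous portfolio weights and scores it by accumulated log return. It is not the DRPO algorithm from the paper: the softmax parameterization, the function names, and the toy price relatives are all assumptions made for this example, and transaction costs are ignored. Market price relatives play the role of a "static" state the agent cannot influence, while the drifted weights stand in for the "dynamic" portfolio weight state.

```python
import numpy as np

def softmax(scores):
    """Map unconstrained scores to long-only weights that sum to 1 (an assumed parameterization)."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def step(weights, price_relatives):
    """Apply one period of price changes to a weight vector.

    price_relatives[i] is close/open for asset i (hypothetical input format).
    Returns the portfolio growth factor and the weights after price drift.
    """
    growth = float(weights @ price_relatives)
    drifted = weights * price_relatives / growth
    return growth, drifted

def accumulated_log_return(score_sequence, relatives_sequence):
    """Sum of per-step log growth for a sequence of actions (no transaction costs)."""
    total = 0.0
    for scores, relatives in zip(score_sequence, relatives_sequence):
        growth, _ = step(softmax(scores), relatives)
        total += np.log(growth)
    return total

# Toy data: two time steps, three assets (values invented for illustration).
toy_relatives = np.array([[1.01, 0.99, 1.00],
                          [0.98, 1.02, 1.00]])
uniform_scores = np.zeros((2, 3))  # equal scores give equal weights
toy_return = accumulated_log_return(uniform_scores, toy_relatives)
print(toy_return)
```

With equal weights, the toy result comes out at roughly zero, because the average price relative at each step is 1. A policy optimizer would adjust the scores to push this quantity up.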

Original language: English
Title of host publication: KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 4112-4122
Number of pages: 11
ISBN (Electronic): 9798400701030
State: Published - 4 Aug 2023
Event: 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023 - Long Beach, United States
Duration: 6 Aug 2023 to 10 Aug 2023

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (Print): 2154-817X

Conference

Conference: 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023
Country/Territory: United States
City: Long Beach
Period: 6/08/23 to 10/08/23

Keywords

  • deep reinforcement learning
  • high-frequency trading
  • policy optimization
