A comparison of action selection methods for implicit policy method reinforcement learning in continuous action-space

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper I investigate methods of applying reinforcement learning to continuous state- and action-space problems without a policy function. I compare the performance of four methods: discretisation of the action-space, and three optimisation techniques applied to finding the greedy action without discretisation. The optimisation methods I apply are gradient descent, Nelder-Mead and Newton's Method. The action selection methods are applied in conjunction with the SARSA algorithm, with a multilayer perceptron utilised for the approximation of the value function. The approaches are applied to two simulated continuous state- and action-space control problems: Cart-Pole and double Cart-Pole. The results are compared both in terms of action selection time and the number of trials required to train on the benchmark problems.
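To illustrate the kind of action selection the abstract describes, the following is a minimal sketch (not the paper's implementation) of choosing a greedy action from a learned Q(s, a) in a continuous 1-D action space, contrasting discretisation with a derivative-free optimiser (Nelder-Mead). The `q_value` function, the action bounds, and all names are hypothetical stand-ins for the paper's multilayer-perceptron value function.

```python
# Hypothetical sketch: greedy action selection over a learned Q(s, a)
# in a continuous action space, comparing discretisation with
# Nelder-Mead optimisation. q_value is a toy surrogate for the MLP.
import numpy as np
from scipy.optimize import minimize

ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed 1-D action bounds


def q_value(state, action):
    """Toy stand-in for the MLP approximation of Q(s, a)."""
    return -(action - 0.3 * state[0]) ** 2  # peaked at a = 0.3 * s[0]


def greedy_action_discretised(state, n_bins=21):
    """Evaluate Q on a fixed grid of actions and take the argmax."""
    actions = np.linspace(ACTION_LOW, ACTION_HIGH, n_bins)
    values = [q_value(state, a) for a in actions]
    return actions[int(np.argmax(values))]


def greedy_action_nelder_mead(state, a0=0.0):
    """Maximise Q(s, a) over a by minimising its negation with Nelder-Mead."""
    result = minimize(lambda a: -q_value(state, float(a[0])),
                      x0=np.array([a0]), method="Nelder-Mead")
    return float(np.clip(result.x[0], ACTION_LOW, ACTION_HIGH))


if __name__ == "__main__":
    s = np.array([0.5, 0.0])
    print(greedy_action_discretised(s), greedy_action_nelder_mead(s))
```

The trade-off the paper measures follows directly from this structure: the grid search's cost grows with the number of bins (and exponentially with action dimensionality), while the optimisation-based selection avoids discretisation at the cost of per-step optimiser iterations.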
Original language: English
Title of host publication: 2016 International Joint Conference on Neural Networks (IJCNN)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 3785-3792
Number of pages: 8
ISBN (Print): 978-1-5090-0621-2
DOIs
Publication status: Published - 29 Jul 2016
Event: 2016 International Joint Conference on Neural Networks (IJCNN) - Vancouver, BC, Canada
Duration: 24 Jul 2016 - 29 Jul 2016

Conference

Conference: 2016 International Joint Conference on Neural Networks (IJCNN)
Period: 24/07/16 - 29/07/16

Keywords

  • reinforcement learning
  • artificial neural networks
  • optimization methods
  • action selection
  • continuous state- and action-space
