Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-Up of the Acrobot

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Here I apply three reinforcement learning methods to the full, continuous-action, swing-up acrobot control benchmark problem. These include two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as Nelder Mead-SARSA. Nelder Mead-SARSA, like NM-SARSA, directly optimises the state-action value function for action selection, in order to allow continuous-action reinforcement learning without a separate policy function. However, as it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved good results in terms of swing-up times, comparable to previous approaches from the literature. In particular, Nelder Mead-SARSA performed the swing-up in a shorter time than many approaches from the literature.
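
The action-selection idea described in the abstract, choosing an action by directly maximising the state-action value function with a derivative-free search, can be illustrated with a minimal sketch. This is not code from the paper. The function names, the toy value function `toy_q`, the action bounds of [-1, 1], the starting point, and the simplified two-point simplex update are all assumptions made for illustration. The sketch covers only greedy action selection and omits exploration and the SARSA value update.

```python
import math

def nelder_mead_1d(f, x0, step=0.5, iters=80, lo=-1.0, hi=1.0):
    """Minimise f over one bounded scalar with a simplified two-point simplex.

    Assumption: a reduced Nelder-Mead variant (reflect, expand, contract),
    written out by hand so that no derivatives of f are ever needed.
    """
    def clip(x):
        return max(lo, min(hi, x))

    simplex = [clip(x0), clip(x0 + step)]
    for _ in range(iters):
        simplex.sort(key=f)                      # best point first, worst last
        best, worst = simplex
        refl = clip(best + (best - worst))       # reflect worst through best
        if f(refl) < f(best):
            # Reflection improved on the best point: try going further.
            exp = clip(best + 2.0 * (best - worst))
            simplex[1] = exp if f(exp) < f(refl) else refl
        else:
            # No improvement: pull the worst point halfway towards the best.
            simplex[1] = best + 0.5 * (worst - best)
    return min(simplex, key=f)

def toy_q(state, action):
    """Hypothetical stand-in for a learned Q(s, a); peaks at action = tanh(state)."""
    return -(action - math.tanh(state)) ** 2

def select_action(state, q=toy_q):
    """Greedy action: maximise Q(state, .) by minimising its negation."""
    return nelder_mead_1d(lambda a: -q(state, a), x0=0.0)
```

With the toy value function above, `select_action(0.5)` returns an action close to `math.tanh(0.5)`. In a real agent, `toy_q` would be replaced by a learned function approximator, and the search would only ever evaluate it, never differentiate it.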
Original language: English
Title of host publication: 2015 IEEE International Conference on Systems, Man, and Cybernetics
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 2084-2089
Number of pages: 6
ISBN (Print): 978-1-4799-8696-5
DOIs
Publication status: Published - 12 Oct 2015
Event: 2015 IEEE International Conference on Systems, Man, and Cybernetics - Hong Kong, China
Duration: 9 Oct 2015 → 12 Oct 2015

Conference

Conference: 2015 IEEE International Conference on Systems, Man, and Cybernetics
Period: 9/10/15 → 12/10/15

Keywords

  • reinforcement learning
  • continuous action-space
  • computational intelligence
  • artificial neural networks
  • intelligent control
