Abstract
Here I apply three reinforcement learning methods to the full, continuous-action swing-up acrobot control benchmark. Two of these are approaches from the literature, CACLA and NM-SARSA. The third is a novel approach which I refer to as Nelder-Mead SARSA. Like NM-SARSA, Nelder-Mead SARSA selects actions by directly optimising the state-action value function, which allows continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free method, it does not require the first or second partial derivatives of the value function. All three methods achieved good swing-up times, comparable to those of previous approaches from the literature. Nelder-Mead SARSA in particular performed the swing-up in a shorter time than many approaches from the literature.
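A minimal sketch of the action-selection step that Nelder-Mead SARSA relies on: for the current state, the greedy action is found by maximising Q(s, a) over the action with the Nelder-Mead simplex method, using only evaluations of Q and no derivatives. This is an illustration under stated assumptions, not the paper's implementation. `nelder_mead_max`, `select_action` and `toy_q` are hypothetical names. The simplex coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5) are textbook defaults. The torque bounds of [-1, 1], the initial step and the tolerances are assumed. `toy_q` is a quadratic stand-in for a learned value function and uses a scalar toy state.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def nelder_mead_max(
    f: Callable[[Vector], float],
    x0: Sequence[float],
    step: float = 0.5,      # assumed initial simplex size
    iters: int = 200,
    xtol: float = 1e-6,
    ftol: float = 1e-10,
) -> Vector:
    """Maximise f with the Nelder-Mead simplex method (only f values, no derivatives)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced by `step` along each axis.
    simplex: List[Vector] = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    # Minimise g = -f so the textbook (minimisation) steps apply unchanged.
    values = [-f(v) for v in simplex]

    for _ in range(iters):
        # Sort vertices from best (lowest g) to worst (highest g).
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        size = max(abs(a - b) for v in simplex[1:] for a, b in zip(v, simplex[0]))
        if size < xtol and values[-1] - values[0] < ftol:
            break

        worst = simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(coef: float) -> Vector:
            # The point centroid + coef * (centroid - worst).
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        g_r = -f(reflected)
        if g_r < values[0]:
            # Reflection beat the best vertex: try expanding further.
            expanded = along(2.0)
            g_e = -f(expanded)
            if g_e < g_r:
                simplex[-1], values[-1] = expanded, g_e
            else:
                simplex[-1], values[-1] = reflected, g_r
        elif g_r < values[-2]:
            # Reflection beats the second-worst vertex: accept it.
            simplex[-1], values[-1] = reflected, g_r
        else:
            # Contract outside (toward the reflected point) or inside (toward the worst).
            if g_r < values[-1]:
                contracted, bound = along(0.5), g_r
            else:
                contracted, bound = along(-0.5), values[-1]
            g_c = -f(contracted)
            if g_c <= bound:
                simplex[-1], values[-1] = contracted, g_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [-f(v) for v in simplex[1:]]

    best_k = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_k]


def select_action(
    q: Callable[[float, float], float],
    state: float,
    a_low: float = -1.0,    # assumed torque bounds
    a_high: float = 1.0,
    a_start: float = 0.0,
) -> float:
    """Greedy action: argmax over a bounded scalar torque a of q(state, a)."""

    def clip(a: float) -> float:
        return min(max(a, a_low), a_high)

    # Q is evaluated at the clipped action, so points outside the bounds act as the boundary.
    best = nelder_mead_max(lambda a: q(state, clip(a[0])), [a_start])
    return clip(best[0])


def toy_q(state: float, a: float) -> float:
    # Hypothetical stand-in for a learned Q(s, a); it peaks at a = 0.3 * state.
    return -(a - 0.3 * state) ** 2


print(round(select_action(toy_q, 1.0), 3))  # optimum 0.3 lies inside the bounds
print(select_action(toy_q, 10.0))           # optimum 3.0 is clipped to the upper bound
```

Clipping inside the objective keeps the search within the torque bounds without a constrained optimiser. In a complete SARSA agent this greedy choice would normally be combined with some form of exploration, which is omitted here.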
| Original language | English |
|---|---|
| Title of host publication | 2015 IEEE International Conference on Systems, Man, and Cybernetics |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Pages | 2084-2089 |
| Number of pages | 6 |
| ISBN (Print) | 978-1-4799-8696-5 |
| Publication status | Published - 12 Oct 2015 |
| Event | 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China (9 Oct 2015 → 12 Oct 2015) |
Conference
| Conference | 2015 IEEE International Conference on Systems, Man, and Cybernetics |
|---|---|
| Period | 9/10/15 → 12/10/15 |
Keywords
- reinforcement learning
- continuous action-space
- computational intelligence
- artificial neural networks
- intelligent control
Title
Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-Up of the Acrobot