Abstract
Here I apply three reinforcement learning methods to the full, continuous-action, swing-up acrobot control benchmark problem: two approaches from the literature, CACLA and NM-SARSA, and a novel approach which I refer to as Nelder Mead-SARSA. Like NM-SARSA, Nelder Mead-SARSA directly optimises the state-action value function for action selection, allowing continuous-action reinforcement learning without a separate policy function. However, because it uses a derivative-free optimisation method, it does not require the first or second partial derivatives of the value function. All three methods achieved swing-up times comparable to previous approaches from the literature; Nelder Mead-SARSA in particular performed the swing-up in a shorter time than many previously reported approaches.
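The abstract only states the action-selection idea at a high level. As a rough illustration (not the paper's implementation), the sketch below shows how a continuous action could be chosen by maximising a learned Q-function with a derivative-free Nelder-Mead search, here via SciPy. The function names, the scalar torque action, and the action bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def select_action(q_function, state, a0, a_min, a_max):
    """Pick a continuous action by maximising Q(state, a) over a,
    using the derivative-free Nelder-Mead simplex method."""
    # Minimise the negative value; no gradient of Q is needed.
    result = minimize(
        lambda a: -q_function(state, np.clip(a, a_min, a_max)),
        x0=np.atleast_1d(a0),
        method="Nelder-Mead",
    )
    return float(np.clip(result.x, a_min, a_max)[0])

# Toy example: a quadratic Q-function whose maximum is at a = 0.5
if __name__ == "__main__":
    q = lambda s, a: -(a[0] - 0.5) ** 2
    print(select_action(q, state=None, a0=0.0, a_min=-1.0, a_max=1.0))
```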
Original language | English |
---|---|
Title of host publication | 2015 IEEE International Conference on Systems, Man, and Cybernetics |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 2084-2089 |
Number of pages | 6 |
ISBN (Print) | 978-1-4799-8696-5 |
Publication status | Published - 12 Oct 2015 |
Event | 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9 Oct 2015 → 12 Oct 2015 |
Keywords
- reinforcement learning
- continuous action-space
- computational intelligence
- artificial neural networks
- intelligent control