Abstract
We propose the use of kernel-based methods as the underlying
function approximator in the least-squares-based policy
evaluation framework of LSPE(λ) and LSTD(λ). In particular, we
present the ‘kernelization’ of model-free LSPE(λ). The ‘kernelization’
is made computationally feasible by using the subset of
regressors approximation, which approximates the kernel using
a vastly reduced number of basis functions. The core of our
proposed solution is an efficient recursive implementation with
automatic supervised selection of the relevant basis functions. The
LSPE method is well suited to optimistic policy iteration and
can thus be used in the context of online reinforcement learning.
We demonstrate this on the high-dimensional Octopus benchmark.
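To give a feel for the subset of regressors idea mentioned in the abstract, the sketch below shows how a value function can be represented with kernel functions centred on a small dictionary of retained states, and how a state might be admitted as a new basis function. This is a minimal illustrative sketch, not the paper's implementation: the class and method names (`SubsetOfRegressors`, `maybe_admit`, the threshold `nu`) are assumptions, and the admission rule here is a simple unsupervised novelty test standing in for the supervised basis-function selection described in the paper.

```python
import numpy as np


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel between two state vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))


class SubsetOfRegressors:
    """Sketch of the subset-of-regressors approximation: the value
    function is represented as V(s) ~ sum_i alpha_i * k(s, d_i) over a
    small dictionary of states d_1..d_m, instead of one basis function
    per visited state (illustrative, not the authors' code)."""

    def __init__(self, kernel=rbf_kernel, nu=0.1):
        self.kernel = kernel
        self.nu = nu          # novelty threshold for admitting a basis state
        self.dictionary = []  # retained basis states d_1, ..., d_m

    def features(self, s):
        """Reduced feature vector k_m(s) = [k(s, d_1), ..., k(s, d_m)],
        which would feed into the LSPE(λ)/LSTD(λ) least-squares updates."""
        return np.array([self.kernel(s, d) for d in self.dictionary])

    def maybe_admit(self, s):
        """Admit s as a new basis state if k(s, .) is poorly represented
        by the current dictionary (a crude stand-in for the supervised
        selection used in the paper)."""
        s = np.asarray(s, dtype=float)
        if not self.dictionary:
            self.dictionary.append(s)
            return True
        k_m = self.features(s)
        K_mm = np.array([[self.kernel(a, b) for b in self.dictionary]
                         for a in self.dictionary])
        # Residual of projecting k(s, .) onto the span of the dictionary kernels.
        coeffs = np.linalg.solve(K_mm + 1e-8 * np.eye(len(K_mm)), k_m)
        residual = self.kernel(s, s) - float(k_m @ coeffs)
        if residual > self.nu:
            self.dictionary.append(s)
            return True
        return False


# Example: grow the dictionary from sampled states, then use features()
# as the (reduced) basis for a least-squares policy-evaluation step.
sor = SubsetOfRegressors(nu=0.05)
for state in np.random.default_rng(0).normal(size=(200, 4)):
    sor.maybe_admit(state)
phi = sor.features(np.zeros(4))  # reduced feature vector for one state
```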
| | |
| --- | --- |
| Original language | English |
| Title of host publication | Procs of the 2007 Symposium on Approximate Dynamic Programming & Reinforcement Learning (ADPRL 2007) |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Pages | 338-345 |
| Volume | 2007 |
| ISBN (Print) | 1-4244-0706-0 |
| Publication status | Published - 2007 |