Least squares SVM for least squares TD learning

Tobias Jung, Daniel Polani

Research output: Chapter in Book/Report/Conference proceeding › Chapter

15 Citations (Scopus)

Abstract

We formulate the problem of least squares temporal difference learning (LSTD) in the framework of least squares SVM (LS-SVM). To cope with the large amount (and possibly sequential nature) of training data arising in reinforcement learning, we employ a subspace-based variant of LS-SVM that processes the data sequentially and is hence especially suited for online learning. This approach is adapted from the context of Gaussian process regression and turns the unwieldy original optimization problem (with computational complexity cubic in the number of processed data points) into a reduced problem (with computational complexity linear in the number of processed data points). We introduce a QR-decomposition-based approach that solves the resulting generalized normal equations incrementally and is numerically more stable than existing recursive least squares based update algorithms. We also allow a forgetting factor in the updates to track non-stationary target functions (i.e. for use with optimistic policy iteration). Experimental comparison with standard CMAC function approximation indicates that LS-SVMs are well-suited for online RL.
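
The abstract's core numerical device, an incremental QR-based solver for exponentially weighted least squares with a forgetting factor, can be illustrated in miniature. The sketch below is only that: it assumes plain linear features and invented names (IncrementalQRLS, forget, ridge), and it does not reproduce the paper's subspace-based LS-SVM machinery, kernel setting, or the LSTD-specific generalized normal equations.

import numpy as np

class IncrementalQRLS:
    """Recursive least squares via QR updates with a forgetting factor.

    Maintains the triangular factor R and the rotated target z = Q^T y
    of an exponentially weighted least squares problem, so the current
    weights solve R w = z. Illustrative sketch only; the paper's actual
    subspace LS-SVM update differs.
    """

    def __init__(self, dim, forget=1.0, ridge=1e-6):
        self.forget = forget                   # forgetting factor in (0, 1]
        self.R = np.sqrt(ridge) * np.eye(dim)  # small ridge keeps R invertible
        self.z = np.zeros(dim)

    def update(self, x, y):
        """Fold in one sample (feature vector x, scalar target y)."""
        s = np.sqrt(self.forget)
        # Stack the down-weighted old factor on top of the new row and
        # re-triangularize; thin QR keeps the per-step cost modest.
        A = np.vstack([s * self.R, x[None, :]])
        b = np.append(s * self.z, y)
        Q, self.R = np.linalg.qr(A)            # R is (dim x dim) upper triangular
        self.z = Q.T @ b

    def solve(self):
        """Current least squares weights from R w = z."""
        # Generic solve; since R is triangular, back-substitution would suffice.
        return np.linalg.solve(self.R, self.z)


# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
model = IncrementalQRLS(dim=3, forget=0.99)
for _ in range(500):
    x = rng.normal(size=3)
    model.update(x, x @ true_w + 0.01 * rng.normal())
print(model.solve())  # approximately [2.0, -1.0, 0.5]

Scaling R and z by the square root of the forgetting factor before each new row realizes the exponential down-weighting of old data, and working with the QR factors avoids ever forming the worse-conditioned normal equations explicitly, which is the numerical-stability point the abstract makes against plain recursive least squares updates.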
Original language: English
Title of host publication: ECAI 2006
Subtitle of host publication: 17th European Conference on Artificial Intelligence, August 29 - September 1, 2006, Riva del Garda, Italy
Editors: Gerhard Brewka, Silvia Coradeschi, Anna Perini, Paolo Traverso
Pages: 499-503
Number of pages: 5
Publication status: Published - 1 Dec 2006

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 141
ISSN (Print): 0922-6389
