Non-Parametric Policy Gradients: A Unified Treatment of Propositional and Relational Domains (2008)

Authors

Abstract

Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult, if not impossible, to apply them within structured domains, in which, for example, the number of objects and the relations among them vary. In this paper, we describe a non-parametric policy gradient approach, called NPPG, that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
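
The stage-wise idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a Gibbs (softmax) policy over a potential function, a hypothetical four-state toy task whose rewards are known exactly rather than estimated from sampled trajectories, and a lookup-table regressor (`MeanTableRegressor`, a made-up name) standing in for an off-the-shelf regression learner. Each stage fits one regression model to the pointwise functional gradient of the expected reward and adds it to the weighted sum.

```python
import math
from collections import defaultdict

# Hypothetical toy task (an assumption, not from the paper).
STATES = [0, 1, 2, 3]
ACTIONS = [0, 1]


def reward(s, a):
    # The rewarded action is the parity of the state.
    return 1.0 if a == s % 2 else 0.0


class MeanTableRegressor:
    """Stand-in regression learner: predicts the mean target seen per input."""

    def fit(self, xs, ys):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in zip(xs, ys):
            sums[x] += y
            counts[x] += 1
        self.means = {x: sums[x] / counts[x] for x in sums}
        return self

    def predict(self, x):
        return self.means.get(x, 0.0)


def potential(models, s, a):
    # The policy potential is a weighted sum of the regression models so far.
    return sum(w * m.predict((s, a)) for w, m in models)


def policy(models, s):
    # Gibbs (softmax) distribution over actions, computed stably.
    scores = [potential(models, s, a) for a in ACTIONS]
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(models):
    total = 0.0
    for s in STATES:
        probs = policy(models, s)
        total += sum(p * reward(s, a) for p, a in zip(probs, ACTIONS))
    return total / len(STATES)


def boost(n_stages=20, step=1.0):
    models = []
    for _ in range(n_stages):
        xs, ys = [], []
        for s in STATES:
            probs = policy(models, s)
            baseline = sum(p * reward(s, b) for p, b in zip(probs, ACTIONS))
            for a in ACTIONS:
                # Derivative of the expected reward in state s with respect
                # to the potential at (s, a) under a softmax policy.
                xs.append((s, a))
                ys.append(probs[a] * (reward(s, a) - baseline))
        # One boosting stage: fit a regressor to the gradient examples.
        models.append((step, MeanTableRegressor().fit(xs, ys)))
    return models


if __name__ == "__main__":
    print(expected_reward([]), expected_reward(boost()))
```

Because the regressor is only queried through `fit` and `predict`, it could in principle be swapped for a learner over continuous or relational inputs without changing the boosting loop, which is the sense in which the approach is unified across domain types.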

paper/2008/259.txt · Last modified: 2009/05/24 18:48 (external edit)
 