Active Policy Learning for Robot Planning and Exploration under Uncertainty

Rubén Martínez-Cantín, Nando de Freitas, Arnaud Doucet, José A. Castellanos


Open Access


Publication year - 2007

Language(s) - English

Resource type - Conference proceedings

DOI - 10.15607/rss.2007.iii.041

Subject(s) - computer science, robot, active learning (machine learning), artificial intelligence, human–computer interaction, knowledge management

This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested in the domain of robot navigation and exploration under uncertainty. In such a setting, the expected cost, which must be minimized, is a function of the belief state (filtering distribution). This filtering distribution is in turn nonlinear and subject to discontinuities, which arise because of constraints in the robot motion and control models. As a result, the expected cost is non-differentiable and very expensive to simulate. The new algorithm overcomes the first difficulty and reduces the number of required simulations as follows. First, it assumes that we have carried out previous simulations which returned values of the expected cost for different corresponding policy parameters. Second, it fits a Gaussian process (GP) regression model to these values, so as to approximate the expected cost as a function of the policy parameters. Third, it uses the GP predicted mean and variance to construct a statistical measure that determines which policy parameters should be used in the next simulation. The process is then repeated using the new parameters and the newly gathered expected cost observation. Since the objective is to find the policy parameters that minimize the expected cost, this iterative active learning approach effectively trades off exploration (in regions where the GP variance is large) against exploitation (where the GP mean is low). In our experiments, a robot uses the proposed algorithm to plan an optimal path for accomplishing a series of tasks, while maximizing the information about its pose and map estimates. These estimates are obtained with a standard filter for simultaneous localization and mapping. Upon gathering new observations, the robot updates the state estimates and is able to replan a new path in the spirit of open-loop feedback control.
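The three-step loop described in the abstract (reuse past simulations, fit a GP, pick the next policy parameters from the GP mean and variance) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the abstract does not name the statistical measure, so a lower confidence bound (mean minus a multiple of the standard deviation) stands in for it; the policy has a single scalar parameter; the squared-exponential kernel, its length scale, and the noise term are arbitrary choices; and `simulated_cost` is a hypothetical toy function with a kink, standing in for the expensive, non-differentiable expected-cost simulation.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar policy parameters (assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(thetas, costs, query, noise=1e-4):
    """GP predictive mean and variance of the cost at `query`.

    The prior mean is set to the average observed cost (an assumption)."""
    mean_cost = sum(costs) / len(costs)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(thetas)]
         for i, a in enumerate(thetas)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [c - mean_cost for c in costs]))
    k_star = [rbf(query, t) for t in thetas]
    mu = mean_cost + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(query, query) - sum(x * x for x in v), 0.0)
    return mu, var

def next_theta(thetas, costs, candidates, kappa=2.0):
    """Pick the untried candidate with the lowest lower confidence bound.

    A low mean favours exploitation; a large variance favours exploration."""
    def lcb(q):
        mu, var = gp_posterior(thetas, costs, q)
        return mu - kappa * math.sqrt(var)
    fresh = [c for c in candidates if c not in thetas]
    return min(fresh, key=lcb)

def simulated_cost(theta):
    """Hypothetical stand-in for the expensive simulation (kink at 0.6)."""
    return (theta - 0.3) ** 2 + 0.1 * abs(theta - 0.6)

def active_policy_search(n_iter=15):
    """Run the fit-then-select loop from three initial simulations."""
    thetas = [0.0, 0.5, 1.0]
    costs = [simulated_cost(t) for t in thetas]
    candidates = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        theta = next_theta(thetas, costs, candidates)
        thetas.append(theta)
        costs.append(simulated_cost(theta))
    best = min(range(len(costs)), key=costs.__getitem__)
    return thetas[best], costs[best], thetas

if __name__ == "__main__":
    best_theta, best_cost, tried = active_policy_search()
    print("best parameter:", best_theta, "cost:", best_cost)
    print("simulations run:", len(tried))
```

The sketch only ever calls the cost function once per iteration, which is the point of the approach when each simulation is expensive. It never differentiates the cost, so the kink in the toy function causes no difficulty.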
