z-logo
open-access-imgOpen Access
Approximate Value Iteration with Temporally Extended Actions
Author(s) -
Timothy Mann,
Shie Mannor,
Doina Precup
Publication year - 2015
Publication title -
journal of artificial intelligence research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.79
H-Index - 123
eISSN - 1943-5037
pISSN - 1076-9757
DOI - 10.1613/jair.4676
Subject(s) - landmark , computer science , convergence (economics) , bellman equation , mathematical optimization , reinforcement learning , value (mathematics) , function (biology) , term (time) , state space , mathematics , artificial intelligence , machine learning , statistics , physics , quantum mechanics , evolutionary biology , economics , biology , economic growth
Temporally extended actions have proven useful for reinforcement learning, but their duration also makes them valuable for efficient planning. The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm called Fitted Value Iteration (FVI) with options. Our analysis reveals that longer duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the statespace. Next we consider the problem of generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), that represents the value function only at the landmark states. We analyze both FVI and LAVI using the proposed landmark-based options and compare the two algorithms. Our experimental results in three different domains demonstrate the key properties from the analysis. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom