Open Access
Constructing optimal policies for agents with constrained architectures
Author(s) -
Dmitri Dolgov,
Edmund H. Durfee
Publication year - 2003
Publication title -
CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Conference proceedings
ISBN - 1-58113-683-8
DOI - 10.1145/860575.860752
Subject(s) - computer science , distributed computing
Optimal behavior is a very desirable property of autonomous agents [13] and, as such, has received much attention over the years. However, making optimal decisions and executing optimal actions typically requires substantial effort on the part of an agent, and in some situations the agent might lack the necessary sensory, computational, or actuating resources to carry out the optimal policy. In such cases, the agent will have to do the best it can, given its architectural constraints. We distinguish between three ways in which an agent’s architecture can affect policy optimality. An agent might have limitations that impact its ability to formulate, operationalize (convert to internal representation), or execute an optimal policy. In this paper, we focus on agents facing the latter two types of limitations. We adopt the transient [7] constrained Markov decision problem (CMDP) framework [2, 11] in our search for optimal policies and show how gradations of increasingly constrained architectures create more complex optimization problems, ranging from polynomial to NP-complete. We also present algorithms based on linear and integer programming that work across a range of such constrained optimization problems. The contribution of the full paper [5] is a characterization of a portion of the landscape of constrained agent architectures in terms of the complexity of finding optimal policies, together with algorithms for doing so. The new results of most interest include the complexity proof and the algorithm for finding deterministic policies under linear execution constraints, the analysis of operationalization constraints on action utilization costs, and an algorithm for approximating optimal policies that bounds the probability of exceeding upper limits on the total costs of the policy. Here we summarize this work.
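As background to the LP-based approach the abstract mentions, the standard way to pose a constrained MDP as a linear program is over occupancy measures: maximize expected reward subject to flow-conservation equalities and a linear bound on expected cost. The sketch below is a minimal discounted-horizon illustration of that idea (the paper itself works in the transient CMDP setting, which differs); the toy problem data (`P`, `r`, `c`, `C_MAX`, `gamma`) are invented for illustration and are not from the paper.

```python
# Occupancy-measure LP sketch for a constrained MDP (CMDP).
# Hypothetical 2-state, 2-action problem; all numbers are illustrative.
import numpy as np
from scipy.optimize import linprog

n_s, n_a = 2, 2
gamma = 0.9                      # discount factor
alpha = np.array([1.0, 0.0])     # initial state distribution

# P[s, a, s'] = transition probability from s under a to s'
P = np.zeros((n_s, n_a, n_s))
P[0, 0] = [0.8, 0.2]
P[0, 1] = [0.1, 0.9]
P[1, 0] = [0.5, 0.5]
P[1, 1] = [0.9, 0.1]

r = np.array([[1.0, 2.0], [0.0, 3.0]])   # rewards r[s, a]
c = np.array([[0.0, 1.0], [0.0, 2.0]])   # resource costs c[s, a]
C_MAX = 5.0                              # budget on expected total cost

# Variables: occupancy measure x[s, a], flattened to a length n_s*n_a vector.
# Flow conservation for each state s:
#   sum_a x[s, a] - gamma * sum_{s', a} P[s', a, s] * x[s', a] = alpha[s]
A_eq = np.zeros((n_s, n_s * n_a))
for s in range(n_s):
    for s2 in range(n_s):
        for a in range(n_a):
            j = s2 * n_a + a
            A_eq[s, j] += (1.0 if s2 == s else 0.0) - gamma * P[s2, a, s]
b_eq = alpha

# Linear cost constraint: sum_{s,a} c[s, a] * x[s, a] <= C_MAX
A_ub = c.flatten()[None, :]
b_ub = [C_MAX]

# Maximize expected discounted reward <=> minimize -r . x
res = linprog(-r.flatten(), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(n_s, n_a)

# Recover a (generally randomized) policy: pi[s, a] = x[s, a] / sum_a x[s, a]
pi = x / x.sum(axis=1, keepdims=True)
print("occupancy measure:\n", x)
print("policy:\n", pi)
```

A randomized policy falls out of the LP directly; restricting to deterministic policies, as the abstract notes, turns the same formulation into an integer program (e.g. forcing each row of `pi` to be a 0/1 vector), which is where the NP-completeness results enter.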
