Entry Date:
October 7, 2013

Producing Efficient Error-Bounded Solutions for Transition Independent Decentralized MDPs

There has been substantial progress on algorithms for single-agent sequential decision making using partially observable Markov de- cision processes (POMDPs). A number of efficient algorithms for solving POMDPs share two desirable properties: error-bounds and fast convergence rates . Despite significant efforts, no algorithms for solving decentralized POMDPs benefit from these properties, leading to either poor solution quality or limited scalability. This paper presents the first approach for solving transition indepen- dent decentralized Markov decision processes (Dec-MDPs), that inherits these properties. Two related algorithms illustrate this ap- proach. The first recasts the original problem as a deterministic and completely observable Markov decision process. In this form, the original problem is solved by combining heuristic search with con- straint optimization to quickly converge into a near-optimal pol- icy. This algorithm also provides the foundation for the first al- gorithm for solving infinite-horizon transition independent decen- tralized MDPs. We demonstrate that both methods outperform state-of-the-art algorithms by multiple orders of magnitude, and for infinite-horizon decentralized MDPs, the algorithm is able to construct more concise policies by searching cyclic policy graphs