Entry Date:
December 12, 2013

Scaling Reinforcement Learning Methods by Incremental Feature Dependency Discovery


Optimal planning under uncertainty is a challenging problem for practitioners dealing with real-world domains. Markov decision processes (MDPs) provide a mathematical framework for solving these problems, but for realistic multi-agent planning the size of the state space grows exponentially in the number of agents. Researchers quickly recognized the limitations of tabular representations and introduced approximations to cope with this complexity and to boost learning speed through generalization. Finding the right representation is a critical milestone in scaling existing MDP solvers to larger domains.

Due to their ease of use and their strong theoretical and empirical results, the linear family of function approximators has been favored in the literature. In this setting, the target function is approximated as a linear combination of a set of feature vectors. Many approaches try to find the right set of features offline, but online methods enjoy the advantages of adaptability to dynamic environments and, often, lower computational complexity. These methods have improved the learning speed of existing reinforcement learning (RL) algorithms in low-dimensional domains, yet existing online expansion methods do not scale well to high-dimensional problems.
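To make the setting concrete, the sketch below shows linear value approximation with sparse binary features and a single TD(0) update. It is a minimal illustration, not the method from this work; the tiny feature layout and the step parameters are assumptions for the example.

```python
# Linear value approximation: V(s) = theta . phi(s).
# With binary features, phi(s) is represented sparsely as the set of
# indices of its active features, so the dot product is a simple sum.

def value(theta, active_features):
    """Value estimate as the sum of weights of the active binary features."""
    return sum(theta[f] for f in active_features)

def td0_update(theta, phi_s, phi_next, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step: theta <- theta + alpha * delta * phi(s),
    applied only to the active (nonzero) features of s."""
    delta = reward + gamma * value(theta, phi_next) - value(theta, phi_s)
    for f in phi_s:
        theta[f] += alpha * delta
    return delta

# Hypothetical example: two binary features are active in the current state,
# and the transition reaches a terminal state (no active features) with reward 1.
theta = [0.0, 0.0, 0.0, 0.0]
delta = td0_update(theta, phi_s=[0, 2], phi_next=[], reward=1.0)
```

Because the features are binary, each update touches only the handful of weights whose features are active, which is what makes these methods cheap per step.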

Research has explored the conjecture that one of the main difficulties limiting this scaling is that features defined over the full-dimensional state space often generalize poorly. Hence, we introduced incremental Feature Dependency Discovery (iFDD) as a computationally inexpensive method for representational expansion that can be combined with any online, value-based RL method that uses binary features. Unlike other online expansion techniques, iFDD creates new features in low-dimensional subspaces of the full state space where the approximation error persists. We provided convergence and computational-complexity guarantees for iFDD.
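The expansion rule can be sketched as follows: candidate features are conjunctions of pairs of co-active binary features, each candidate accumulates a relevance statistic from the TD error, and a candidate is promoted to a real feature once its relevance crosses a threshold. This is a simplified illustration of that idea, assuming an accumulated-|delta| relevance and a fixed threshold `xi`; the actual bookkeeping and sorting in iFDD are more involved.

```python
from itertools import combinations

class IFDDSketch:
    """Simplified sketch of iFDD-style feature expansion: promote a
    conjunction of co-active binary features once the TD error observed
    while both are active accumulates past a threshold."""

    def __init__(self, xi):
        self.xi = xi            # expansion threshold (assumed fixed here)
        self.relevance = {}     # candidate conjunction -> accumulated |TD error|
        self.features = set()   # conjunctions already promoted to features

    def discover(self, active, delta):
        """Update relevance for every pair of co-active features and
        return the list of conjunctions promoted on this step."""
        added = []
        for pair in combinations(sorted(active), 2):
            if pair in self.features:
                continue        # already a feature; nothing to track
            self.relevance[pair] = self.relevance.get(pair, 0.0) + abs(delta)
            if self.relevance[pair] > self.xi:
                self.features.add(pair)
                added.append(pair)
        return added

# Hypothetical usage: features 0 and 1 are repeatedly co-active while the
# TD error stays large, so their conjunction is eventually discovered.
ifdd = IFDDSketch(xi=1.0)
first = ifdd.discover(active={0, 1}, delta=0.6)    # relevance 0.6, below xi
second = ifdd.discover(active={0, 1}, delta=0.6)   # relevance 1.2, promoted
```

Because candidates live only over pairs of currently active features, the per-step cost depends on the number of active features rather than the size of the full state space, which is the key to keeping expansion cheap.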

The planning space for each of the evaluated domains exceeds a hundred million possibilities. By adding useful features, iFDD allowed the agent to learn the task substantially better than the other methods.