Modular self-reconfiguring robots are robots composed of a number of modules that can disconnect and reconnect automatically and have degrees of freedom to move about each other. The promise of self-reconfigurable robotics is cheap, versatile, robust and self-repairing universal tools. Delivering on this promise requires distributed adaptive algorithms to control the robots in changing, unpredictable environments such as those that arise, for example, during space exploration or search and rescue missions.
Until recently, all distributed controllers for modular robots were designed by hand with a particular task in mind. Carefully crafting and testing local rule-based controllers, for example, took hours of developer time. We are researching probabilistic methods that learn controllers automatically as the robot explores its environment and acts in it.
We start by casting the problem traditionally, as a distributed, discrete Partially Observable Markov Decision Process (POMDP), and notice that this approach suffers from a lack of structure in a combinatorially explosive domain. We can mitigate this problem by manipulating the size of the modular robot.
However, for learning to be applicable to real robots in the physical world, the problem must be solved in the continuous domain of sensor readings and velocity or force motor commands. We are designing a structured adaptive architecture of skills that spans the robot's abilities from each module's motor control to distributed goal-level algorithms.
The approach taken here is to study the possibilities of adaptation and reinforcement learning for self-organizing systems such as self-reconfiguring modular robots, both in a generic, discrete representation and on a concrete robot model in a physics simulator. We are examining what physical instantiation and direct sensorimotor interaction with the environment afford the learning mechanisms.
In the discrete domain we use the Gradient Ascent in Policy Space (GAPS) learning algorithm. We analyzed the objective difficulty of even the simplest scenario: motion in a straight line on an even surface without obstacles in 2D. By attempting to solve such a toy problem with a state-of-the-art reinforcement learning technique, we discovered the limitations of unstructured search in our domain. We find, however, that it is possible to leverage the modular nature of the locomotion problem to improve the convergence rate.
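To make the idea of gradient ascent in policy space concrete, here is a minimal sketch of a GAPS-style episodic update. It assumes a lookup-table policy with Boltzmann (softmax) action selection and the commonly cited update theta[o][a] += alpha * R * (N_oa - N_o * pi(a|o)), where N_oa counts how often action a was taken under observation o during the episode. The function names, the learning rate, and the one-observation toy task (reward 1 for each "forward" action) are illustrative assumptions, not the controller or simulator used in this work.

```python
import math
import random


def softmax_policy(theta, obs):
    """Boltzmann action probabilities for one observation (temperature 1)."""
    prefs = theta[obs]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def gaps_update(theta, episode, reward, alpha=0.02):
    """Apply one episodic GAPS-style step in place and return theta.

    episode is a list of (observation, action) pairs; reward is the
    total reward R collected over that episode.
    """
    n_obs, n_act = len(theta), len(theta[0])
    counts = [[0] * n_act for _ in range(n_obs)]
    for obs, act in episode:
        counts[obs][act] += 1
    for o in range(n_obs):
        n_o = sum(counts[o])
        if n_o == 0:
            continue  # observation never seen: no gradient signal
        pi = softmax_policy(theta, o)  # probabilities before the update
        for a in range(n_act):
            theta[o][a] += alpha * reward * (counts[o][a] - n_o * pi[a])
    return theta


def run_toy(episodes=500, steps=10, seed=0):
    """Hypothetical toy task: one observation, action 1 ('forward') pays 1."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0]]
    for _ in range(episodes):
        episode, reward = [], 0.0
        for _ in range(steps):
            pi = softmax_policy(theta, 0)
            act = 0 if rng.random() < pi[0] else 1
            episode.append((0, act))
            reward += float(act)
        gaps_update(theta, episode, reward)
    return theta
```

In this toy setting the expected update pushes probability toward the rewarded action. The real locomotion problem has many observations per module, which is where the combinatorial growth of the search space comes from.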
If the robot starts with only two modules and more are added incrementally, the state space that must be explored at each stage is effectively reduced. With only two modules, given the physical coupling between them, there are only four states to explore. Adding one more module adds only another fourteen possible observations, and so forth. The problem becomes more manageable. We therefore implemented the Incremental GAPS (IGAPS) algorithm, which takes as input the policy parameters to which a previous instance of IGAPS, run with fewer modules, has converged. We observe a dramatic decrease in convergence time.
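The incremental step can be sketched as a warm start: parameters learned with fewer modules are carried over, and only the newly reachable observations start from scratch. The sketch below assumes the policy is a table keyed by observation and that the smaller robot's observations are a subset of the larger robot's. The function name `seed_policy`, the observation labels, and the action count are hypothetical; the source does not specify how IGAPS stores or transfers its parameters.

```python
def seed_policy(prev_theta, observations, n_actions, init=0.0):
    """Build a policy table for a larger robot from a converged smaller one.

    prev_theta: dict mapping observation -> list of action preferences,
                as learned with fewer modules.
    observations: all observations possible for the larger robot.
    Observations already known keep their learned preferences; new ones
    start at a neutral value (an assumption of this sketch).
    """
    theta = {}
    for obs in observations:
        if obs in prev_theta:
            theta[obs] = list(prev_theta[obs])  # copy, do not alias
        else:
            theta[obs] = [init] * n_actions
    return theta


# Usage with the counts quoted in the text: 4 states for two modules,
# plus 14 further observations once a third module is added.
two_module = {f"s{i}": [0.3, -0.3] for i in range(4)}  # placeholder values
three_module_obs = list(two_module) + [f"n{i}" for i in range(14)]
three_module = seed_policy(two_module, three_module_obs, n_actions=2)
```

Learning then resumes from `three_module` rather than from a uniform policy, so the search starts near behavior that already worked for the smaller robot.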
The incremental algorithm is essentially a naive structuring of a given problem. Its success depends entirely on the nature of the task the robot is to perform. We are currently developing a framework in which some search space structuring would occur automatically, as part of the learning process. This can be achieved by building an architecture of skills for the robot.
A skill is a learned motor pattern that is used to achieve or sustain a goal. We are researching an architecture of module-level, cluster-level, and robot-level skills, with a view to simplifying the learning problem. Each skill can be learned separately. The results can then be combined by a number of methods, such as a fixed hierarchical network, a dynamic planning algorithm, or another learning algorithm.
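As one illustration of the first combination method, a fixed hierarchical network can be sketched as skills that delegate to lower-level skills through a hard-wired selector. Everything in this sketch is hypothetical: the skill names, the observation fields `phase` and `goal_distance`, and the string commands are stand-ins chosen for illustration, and the hand-written functions stand in for skills that would be learned.

```python
def make_composite(selector, sub_skills):
    """Return a skill that picks one sub-skill per observation.

    selector: maps an observation dict to a key of sub_skills.
    sub_skills: dict of name -> skill, where a skill maps an
                observation dict to a command string.
    """
    def composite(obs):
        return sub_skills[selector(obs)](obs)
    return composite


# Module-level skills (placeholders for learned motor patterns).
def extend(obs):
    return "extend"


def contract(obs):
    return "contract"


def hold(obs):
    return "hold"


# Cluster-level skill: alternate module skills by gait phase.
crawl = make_composite(
    lambda obs: "extend" if obs["phase"] == 0 else "contract",
    {"extend": extend, "contract": contract},
)

# Robot-level skill: crawl until the goal is reached, then hold.
robot = make_composite(
    lambda obs: "crawl" if obs["goal_distance"] > 0 else "hold",
    {"crawl": crawl, "hold": hold},
)
```

Because each level only sees the interface of the level below, a learned skill could replace any of the placeholder functions without changing the rest of the hierarchy.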
We are currently working on testing the principles of the skills architecture on a simulated link-based modular robot. In the future, lab robots such as MultiShady could provide an interesting platform for our learning algorithms.
The system should demonstrate increased performance at a task through reconfiguration. As an example, the modules could learn to assemble small cluster structures that are able to walk. The clusters could then learn to reconfigure into building blocks for a larger assembly, which would also be able to examine itself for damage and repair broken parts by self-reconfiguration.
Such demonstrated adaptability will be a great step towards fulfilling the promise of self-organizing modular robots.