Entry Date:
February 11, 1999

Automatic Adaptive Network Operation: Delivering Better Performance Without Human-Scale Environments

Principal Investigator: David Clark

Co-investigators: Karen Sollins, John Wroclawski

Robust networks must adapt to changing conditions: changes that occur over a range of time-scales. At very short time-scales, statistical overloads can lead to inappropriately lost or delayed traffic and poor performance. Especially in military networks, externally forced sudden changes in topology or capacity can demand corresponding changes in application behavior and priority. Over longer time-scales, the network can redeploy its communications assets in different ways to meet evolving needs. Today, short time-scale performance adaptation occurs more slowly than is desirable, especially when individual traffic flows use a significant part of the total capacity (a condition more often true of military networks). Mid- and long-term network engineering and reconfiguration usually involve human intervention, which implies slower response and the chance of human error. The goal of this project is to develop new strategies for network design that lead to effective and automatic adaptation of the network to its changing environment at many time-scales.

Networks that adapt their performance to changing conditions effectively, rapidly, automatically, and appropriately will reduce the human effort of deployment and operation, broaden the range of conditions over which applications can operate, and increase the reliability of the delivered service.

We identify two major limitations of current technology. First, current algorithms tend to depend on a consistent global view of network conditions, and seek optimal solutions in that context. Second, current algorithms typically operate on measures of function and performance, but not on policy and administrative constraints. By breaking free of these limitations, we expect to demonstrate new classes of network algorithms and protocols that support these highly adaptable networks.

The approach is based on several core concepts:
(1) Rule-based expression of local administrative and performance metrics.
(2) The ability to bound rules and heuristics so that they apply only within physical or virtual domains, or "regions." One form of this regionalization is expressed in our work on the Metanet.
(3) Algorithms that make decisions based on partial and local knowledge. These algorithms may sacrifice optimality for operationally simple and effective behavior.
(4) Use of "underdamped" control algorithms that apply short time-scale active control to an otherwise unstable system, in order to achieve faster adaptation to changing conditions.
(5) A scalable multicast-based strategy for implementing software agents that automatically locate, monitor and report on events of interest within the network.
(6) Support of basic network functions through sets of composable lower-level building blocks, as opposed to fixed pre-specified services.
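Concepts (1) and (3) above can be illustrated together. The following is a minimal, hypothetical sketch, not part of the project itself: a node evaluates locally scoped policy rules against partial, locally measured metrics and reaches a decision without any global view. All names and the example metrics (`Rule`, `LocalNode`, `load`, `loss_rate`) are illustrative assumptions.

```python
# Sketch: rule-based decisions from partial, local knowledge.
# A node consults an ordered list of administrative/performance rules;
# the first rule whose predicate matches the local metrics fires.
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]  # locally measured values, e.g. load, loss rate


@dataclass
class Rule:
    name: str
    applies: Callable[[Metrics], bool]  # predicate over local metrics
    action: str                         # action taken when the rule fires


class LocalNode:
    """Decides using only its own (partial) measurements and local rules."""

    def __init__(self, rules: List[Rule]):
        self.rules = rules

    def decide(self, metrics: Metrics) -> str:
        # First matching rule wins: operationally simple and effective,
        # not globally optimal -- the trade-off concept (3) describes.
        for rule in self.rules:
            if rule.applies(metrics):
                return rule.action
        return "forward-normally"


rules = [
    Rule("shed-low-priority",
         lambda m: m["load"] > 0.9, "drop-low-priority-traffic"),
    Rule("reroute-on-loss",
         lambda m: m["loss_rate"] > 0.05, "prefer-alternate-path"),
]

node = LocalNode(rules)
print(node.decide({"load": 0.95, "loss_rate": 0.01}))  # drop-low-priority-traffic
print(node.decide({"load": 0.40, "loss_rate": 0.10}))  # prefer-alternate-path
```

Because the rules are data rather than hard-wired logic, a region boundary in the sense of concept (2) can be modeled simply as a different rule set installed in a different set of nodes.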

Past work in this area has focused on the controlled allocation of network resources to applications with different service requirements, and on improving the adaptation of applications and network protocols to changing network conditions. We have developed the definition of a real-time service for the Internet, the Controlled Load service. We have proposed an approach called RIO that allocates the bandwidth of a data network to different data transfers in a way that is highly scalable and explicitly controllable. We have also proposed extensions to the Internet's current congestion controls, and explored alternatives based on explicit rate-based feedback.
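The core idea of RIO ("RED with In/Out") can be sketched briefly: packets are tagged as in- or out-of-profile, and out-of-profile packets face a more aggressive random-drop curve, so in-profile traffic keeps its allocated share. The sketch below uses the classic RED drop-probability shape; the thresholds and parameters are illustrative, not the published values.

```python
# Sketch of RIO-style dropping: two RED curves, one per profile class.
import random


def red_drop_prob(avg_queue: float, min_th: float, max_th: float,
                  max_p: float) -> float:
    """Classic RED drop probability as a function of average queue size:
    zero below min_th, max_p at max_th, linear in between, 1.0 above."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def rio_drop(in_profile: bool, avg_in_queue: float,
             avg_total_queue: float) -> bool:
    """Drop decision: in-profile packets are judged against a gentler
    curve over the in-profile queue average; out-of-profile packets
    against a stricter curve over the total queue average."""
    if in_profile:
        p = red_drop_prob(avg_in_queue, min_th=40, max_th=70, max_p=0.02)
    else:
        p = red_drop_prob(avg_total_queue, min_th=10, max_th=40, max_p=0.20)
    return random.random() < p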

The successful completion of this project will lead to networks that operate effectively and deliver good performance in the face of rapidly changing conditions and incomplete global knowledge. These networks will configure and monitor themselves without significant human intervention, using algorithms that consider administrative as well as technical requirements. Among the specific examples are:
(1) High-performance applications that learn of critical conditions such as persistent congestion or lossy and failing paths, and reconfigure their internal structure or adapt their behavior to meet performance goals.
(2) Agents working on behalf of a high-performance application that operate within the network, converging on critical points and reporting current conditions while they facilitate the application's traffic.
(3) Regionalization primitives that create local contexts for algorithms or administrative constraints, in order to achieve global goals by piecewise composition.
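Example (1) can be illustrated with a hypothetical sketch: an application tracks delivery outcomes over a sliding window and only reacts (here, by halving a normalized sending rate) once loss is persistent rather than transient. The class name, window size, and threshold are illustrative assumptions, not part of the project.

```python
# Sketch: distinguish persistent congestion from one-off loss, then adapt.
from collections import deque


class CongestionAdapter:
    def __init__(self, window: int = 20, loss_threshold: float = 0.1):
        self.events = deque(maxlen=window)  # 1 = lost, 0 = delivered
        self.loss_threshold = loss_threshold
        self.rate = 1.0                     # normalized sending rate

    def record(self, lost: bool) -> None:
        self.events.append(1 if lost else 0)
        # React only once the window is full: this filters out transient
        # loss and responds to persistent congestion.
        if len(self.events) == self.events.maxlen:
            loss_rate = sum(self.events) / len(self.events)
            if loss_rate > self.loss_threshold:
                self.rate = max(0.1, self.rate * 0.5)  # back off
```

The same measurement loop could instead trigger a structural reconfiguration (for instance, switching to an alternate path), which is the broader adaptation the paragraph above describes.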