Automatic Software Upgrades for Distributed Systems

Principal Investigator: Barbara Liskov

Co-investigator: Liuba Shrira

Project Website: http://www.pmg.csail.mit.edu/upgrades/

This project aims to solve the problem of automatic software upgrades: how to change the code running in a system while the system continues to run. We are interested in two different but related subproblems: how to upgrade the code of objects in a distributed object-oriented database, and how to upgrade code in robust distributed systems that are intended to provide continuous service over a very long lifetime. In either case, there is a compelling need for automatic upgrades to correct software errors, improve performance, or change system behavior, e.g., to support new features.

Internet services face challenging and ever-changing requirements: huge quantities of data must be managed and made continuously available to rapidly growing client populations. Examples include online email services, search engines, persistent online games, scientific and financial data processing systems, content distribution networks, and file sharing networks.

The distributed systems that provide these services are large and long-lived and therefore will need changes (upgrades) to fix bugs, add features, and improve performance. Yet while a system is upgrading, it must continue to provide service to users. The aim of our research is to develop a flexible and generic automatic upgrade system that enables distributed systems to provide service during upgrades.

The system is designed to satisfy a number of requirements. To begin with, upgrades must be easy to define. In particular, we want modularity: to define an upgrade, the upgrader must understand only a few versions of the system software, e.g., the current and new versions.

In addition, we require generality: an upgrade should be able to change the software in arbitrary ways. This implies that the new version can be incompatible with the old one: it can stop supporting legacy behavior and can change communication protocols. Generality is important because otherwise a system must continue to support legacy behavior, which complicates software and makes it less robust. Our approach allows legacy behavior to be supported as needed, but in a way that avoids complicating the current version and that makes it easy to retire the legacy behavior when the time comes.

A third requirement is that upgrades must be able to retain yet transform persistent state. Persistent state may need to be transformed in some application-dependent way, e.g., to move to a new file format, and transformations can be costly, e.g., if the local file state is large. We do not attempt to preserve volatile state (e.g., open connections) because upgrades can be scheduled (see below) to minimize the inconvenience that losing volatile state causes users.
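To make the idea of transforming persistent state concrete, here is a minimal sketch in Python. It is illustrative only and not taken from the project: the state layout, the field names, and the functions transform_v1_to_v2 and upgrade_persistent_state are all hypothetical. It assumes a node keeps its persistent state as JSON and that the upgrade changes the record format.

```python
import json
from typing import Any, Dict


def transform_v1_to_v2(old_state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical transform: v1 stored a single 'name' string, v2 splits it."""
    first, _, last = old_state["name"].partition(" ")
    return {
        "version": 2,
        "first_name": first,
        "last_name": last,
        # Fields the upgrade does not change are carried over as they are.
        "quota_bytes": old_state["quota_bytes"],
    }


def upgrade_persistent_state(serialized: str) -> str:
    """Load persisted v1 state, transform it, and serialize it in the v2 format."""
    old_state = json.loads(serialized)
    if old_state.get("version") != 1:
        # The transform only needs to understand the current and new versions.
        raise ValueError("this transform only understands version 1 state")
    return json.dumps(transform_v1_to_v2(old_state), sort_keys=True)
```

Note that the transform needs knowledge of only two versions, in line with the modularity requirement. A real transform over large file state could take much longer, which is one reason the timing of upgrades matters.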

A fourth requirement is automatic deployment. The systems of interest are too large to upgrade manually (e.g., via remote login). Instead, upgrades must be deployed automatically: the upgrader defines an upgrade at a central location, and the upgrade system propagates and installs it on each node.

A fifth requirement is controlled deployment. The upgrader must be able to control when nodes upgrade. Reasons for controlled deployment include: allowing a system to provide service while an upgrade is happening, e.g., by upgrading replicas in a replicated system one at a time (especially when the upgrade involves a time-consuming persistent state transform); testing an upgrade on a few nodes before installing it everywhere; and scheduling an upgrade to happen at times when the load on nodes being upgraded is light.
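The kind of decision a deployment policy makes can be sketched as a small predicate. The following Python is a hypothetical illustration, not the project's interface: NodeStatus, may_upgrade_now, and the max_load threshold are assumed names and values. It combines two of the reasons above, upgrading replicas one at a time and waiting for light load.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class NodeStatus:
    node_id: str
    load: float  # fraction of capacity in use, from 0.0 to 1.0


def may_upgrade_now(
    node: NodeStatus,
    upgrading: Set[str],
    group_members: Set[str],
    max_load: float = 0.5,
) -> bool:
    """Return True if this node may start upgrading now.

    The node waits while any other member of its replica group is upgrading,
    and while its own load is above the (assumed) max_load threshold.
    """
    others_upgrading = (upgrading & group_members) - {node.node_id}
    return not others_upgrading and node.load <= max_load
```

For example, with replicas "a", "b", and "c" in one group, node "b" is held back while "a" is upgrading and becomes eligible once "a" has finished and the load on "b" is light.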

A sixth requirement is continuous service. Controlled deployment implies there can be long periods of time when the system is running in mixed mode, i.e., when some nodes have upgraded and others have not. Nonetheless, the system must provide service, even when the upgrade is incompatible. This implies the upgrade system must provide a way for nodes running different versions to interoperate, without restricting the kinds of changes an upgrade can make.

We have developed an upgrade infrastructure that supports these requirements. This is the first approach to provide a complete solution for automatic and controlled upgrades in distributed systems. It allows upgraders to define scheduling functions that control upgrade deployment, transform functions that control the transformation of persistent state, and simulation objects that enable the system to run in mixed mode. Our techniques are either entirely new or major extensions of what has been done before. We support all schedules used in real systems, and our support for mixed mode improves on what is done in practice.

Support for mixed mode operation raises a question: what should happen when a node runs several versions at once, and different clients interact with the different versions? We address this question by defining requirements for upgrades and providing a way to specify upgrades that enables reasoning about whether the requirements are satisfied. The specification captures the meaning of executions in which different clients interact with different versions of an object and identifies when calls must fail due to irreconcilable incompatibilities. The upgrade requirements and specification technique are entirely new.
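As a rough illustration of mixed mode, the sketch below treats a simulation object as an adapter that presents an old version's interface on top of the new version's object, and fails a call when the two versions cannot be reconciled. This reading is an assumption made for the example; the classes FileStoreV2 and FileStoreV1Simulation, their methods, and IncompatibleCallError are hypothetical and do not come from the project.

```python
class IncompatibleCallError(Exception):
    """Raised when an old-version call has no sensible meaning in the new version."""


class FileStoreV2:
    """Hypothetical new version: sizes are in bytes and files can be read-only."""

    def __init__(self) -> None:
        self._files = {}         # file name -> contents (bytes)
        self._read_only = set()  # names of read-only files

    def write(self, name: str, data: bytes) -> None:
        if name in self._read_only:
            raise PermissionError(name)
        self._files[name] = data

    def set_read_only(self, name: str) -> None:
        self._read_only.add(name)

    def size_bytes(self, name: str) -> int:
        return len(self._files[name])


class FileStoreV1Simulation:
    """Presents the hypothetical v1 interface on top of a v2 object."""

    def __init__(self, current: FileStoreV2) -> None:
        self._current = current

    def write(self, name: str, data: bytes) -> None:
        try:
            self._current.write(name, data)
        except PermissionError as exc:
            # v1 has no notion of read-only files, so the call must fail.
            raise IncompatibleCallError(f"cannot write {name!r}") from exc

    def size_kb(self, name: str) -> int:
        # v1 reported sizes in whole kilobytes, rounded up.
        return -(-self._current.size_bytes(name) // 1024)
```

Here an old client and a new client share the same underlying state: a write through the v1 interface is visible through v2, while a v1 write to a file that a v2 client has made read-only is one example of a call that fails because the versions are incompatible.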

We have implemented a prototype, called Upstart, that automatically deploys upgrades on distributed systems. Results of experiments using Upstart show that our infrastructure introduces only modest overhead, and therefore our approach is practical.