Concurrency Control for Machine Learning

Principal Investigator Stefanie Jegelka

Many machine learning algorithms iteratively transform some global state (e.g., model parameters or variable assignment) giving the illusion of serial dependencies between each operation. However, due to sparsity, exchangeability, and other symmetries, it is often the case that many, but not all, of the state-transforming operations can be computed concurrently while still preserving serializability: the equivalence to some serial execution where individual operations have been reordered.
This opportunity for serializable concurrency forms the foundation of distributed database systems. In this project, we implement updates in machine learning algorithms as concurrent transactions in a distributed database. As a result, we achieve high scalability while maintaining the semantics and theoretical properties of original serial algorithm.