Entry Date: January 19, 2017

Fast Algorithms for Learning Graphical Models from Scarce Data

Principal Investigator: Guy Bresler

Project Start Date: March 2016

Project End Date: February 2018


Graphical models (GMs) are a powerful framework for succinctly representing complex high-dimensional phenomena. Statistical dependence between variables is represented combinatorially via edges in a graph, which enables both model interpretability and computationally efficient inference. For these reasons, GMs are at the core of machine learning and artificial intelligence and have been used in a variety of applied fields, including finance, operations research, biology, signal processing, and social networks.

For large, complex data with non-obvious structure, the central problem is that of learning an appropriate model. Learning a graphical model presents both a computational and a statistical challenge. The combinatorial nature of the problem means that the number of candidate models to explore grows exponentially in the number of variables. At the same time, the high-dimensional nature of modern applications means that the number of data points is often much smaller than the dimension of the ambient parameter space: learning algorithms must therefore make efficient use of the data, which is scarce relative to the problem size. Existing approaches to learning graphical models achieve either statistical efficiency or computational efficiency, but not both.
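To make the learning problem concrete, the following is a minimal, self-contained sketch, not the project's algorithm: it draws samples from a small Ising model (a canonical graphical model over binary variables) by Gibbs sampling and then estimates the graph by thresholding empirical pairwise correlations. All function names and parameter choices here (the coupling strength beta, the threshold tau) are illustrative assumptions.

import numpy as np

def gibbs_sample_ising(adj, beta, n_samples, burn_in=500, thin=5, rng=None):
    """Draw spin configurations x in {-1,+1}^p from an Ising model with
    coupling matrix beta * adj, using single-site Gibbs updates."""
    rng = np.random.default_rng(0) if rng is None else rng
    p = adj.shape[0]
    x = rng.choice([-1, 1], size=p)
    samples = np.empty((n_samples, p), dtype=int)
    k = 0
    for t in range(burn_in + n_samples * thin):
        for i in range(p):
            # The conditional law of spin i depends only on its graph neighbors.
            field = beta * (adj[i] @ x)
            prob_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[i] = 1 if rng.random() < prob_plus else -1
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples[k] = x  # keep every thin-th sweep to reduce autocorrelation
            k += 1
    return samples

def learn_graph_by_correlation(samples, tau):
    """Naive structure estimate: declare an edge wherever the empirical
    correlation between two variables exceeds the threshold tau."""
    corr = np.corrcoef(samples.T)
    est = (np.abs(corr) > tau).astype(int)
    np.fill_diagonal(est, 0)
    return est

# Example: a 5-node cycle.
p = 5
adj = np.zeros((p, p), dtype=int)
for i in range(p):
    adj[i, (i + 1) % p] = adj[(i + 1) % p, i] = 1

samples = gibbs_sample_ising(adj, beta=0.4, n_samples=2000)
est = learn_graph_by_correlation(samples, tau=0.3)
print("true adjacency:\n", adj)
print("estimated adjacency:\n", est)
print("exact recovery:", np.array_equal(est, adj))

Correlation thresholding is precisely the kind of baseline that is statistically wasteful and can fail outright for stronger couplings, since non-neighbors may be strongly correlated through indirect paths; this gap between naive baselines and what is statistically possible is the one efficient learning algorithms aim to close.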

This research aims for the best of both worlds: algorithms that are simultaneously computationally and statistically efficient. While practical applications demand such efficiency, it is unlikely to be attainable in complete generality, for all possible models. The central question is: what features of real-world systems allow for tractable learning? The research entails identifying specific model subclasses of interest and developing algorithms with provable performance guarantees. Concretely, the research provides new information-theoretic lower bounds on the amount of data required to learn and, informed by these lower bounds, gives fast (computationally efficient) algorithms that are statistically near optimal.
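As one illustration of the flavor of such lower bounds, here is a standard Fano-type argument from the information-theory literature, stated generically rather than as the project's specific results. Suppose the true graph is drawn uniformly from a finite family of candidate graphs and we observe n independent samples from the corresponding model. Fano's inequality then gives (in LaTeX):

% Fano's inequality: any estimator \hat{G} of a graph G drawn uniformly
% from the candidate family \mathcal{G}, given n i.i.d. samples X^n, satisfies
\[
  \Pr\bigl[\hat{G}(X^n) \neq G\bigr]
  \;\ge\; 1 \;-\; \frac{n \max_{G, G' \in \mathcal{G}} D(P_G \,\|\, P_{G'}) + \log 2}
                       {\log |\mathcal{G}|},
\]
% so driving the error probability to zero forces the sample size to satisfy
\[
  n \;\gtrsim\; \frac{\log |\mathcal{G}|}{\max_{G, G' \in \mathcal{G}} D(P_G \,\|\, P_{G'})}.
\]

Because the number of candidate graphs grows exponentially with the number of variables while the Kullback-Leibler divergence between models differing in a single edge can be small, bounds of this shape make precise how much data is unavoidably needed; algorithms whose sample requirements match such bounds are the "statistically near optimal" target described above.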