Learning Graphical Models: Hardness and Tractability

Principal Investigator Devavrat Shah

Project Website http://www.nsf.gov/awardsearch/showAward?AWD_ID=1462158&HistoricalAwards=false

Project Start Date August 2015

Project End Date  July 2018

Graphical models provide means to capture uncertainty present in complex, unstructured data in a succinct manner. They are particularly suited for performing inference computation at scale. They hold potential to become primary way to model uncertainty in modern data-driven decision-making tasks. This project will enable this by developing efficient methods to learn graphical model representation from observations. The outcome of this project will have wide ranging impact in society and industry. This will include, but not restricted to, our ability to understand human behavior in social networks, processing information captured in biological experiments, automated language and speech processing as well as capturing inter-relationship between various financial instruments.

A generic application of Graphical model requires solving two basic tasks. First, given data, what is an appropriate graphical model? And second, given a graphical model, how does one perform inference from partial observations? Historically, domain knowledge was used to choose the model, for example Hidden Markov Models in speech processing. Therefore, a significant amount of effort has been devoted to the second problem, of developing inference algorithms (for example, Belief propagation). However, it is not at all clear what type of model to use for most modern applications. Thus for modern applications the first problem of learning the model is paramount. The goal of this project is to develop understanding, both conceptual and algorithmic, of the problem of learning GMs from observed data. One aspect of the project is to derive new lower bounds in order to understand the interplay between fundamental statistical and computational limits. Informed by these lower bounds, the project will seek to determine new and relevant model subclasses for which learning is tractable accompanied by efficient learning algorithms.