Entry Date: January 25, 2017

Theory and Algorithms for Learning Perturbation Models

Principal Investigator Tommi Jaakkola

Project Start Date September 2015

Project End Date August 2018


Machine learning concerns designing and understanding computer programs that learn from experience. Modern complex settings (for example, natural language) require flexible probability models that can entertain large numbers of possible hypotheses (semantics) underlying the observations (sentences). In such models, likely structures (parse trees) are guided by functions that assess the suitability of a structure by breaking it into smaller pieces. Richer models require larger pieces, which makes it challenging to explore large sets of possible hypotheses efficiently.

This project takes a fresh look at structured modeling by developing a new modeling paradigm that combines randomization of parameters with combinatorial optimization. The combination provides a mechanism for inducing complex distributions over structures while explicitly keeping the generation of likely structures easy. We pursue a comprehensive plan to understand, extend, and design these perturbation models, with the end goal of solving significant cross-cutting applied problems in natural language processing, such as parsing, or in structured recommender tasks, such as paraphrasing. Beyond modeling, the proposed work has the potential to merge tools and techniques across areas ranging from theoretical computer science (stability, tractability) and combinatorial optimization (relaxations, certificates) to probability (sampling from convex bodies). The tools developed will be broadly useful across prominent areas, from computer vision and natural language processing to medical informatics and computational biology. The proposed work by its very nature compels strong collaborative relationships across disciplinary boundaries, from theory to applications, and the PI will actively pursue these opportunities. All software produced in this project will be open-sourced and made available for download. The PI will also engage in outreach activities that enable high school students to participate.
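
The core mechanism of a perturbation model, randomizing the parameters and then solving an optimization problem, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the project: the toy set of three candidate structures, the choice of Gumbel noise, and the function names (`sample_gumbel`, `perturb_and_maximize`, `empirical_distribution`). A plain `max` over a short list stands in for the combinatorial solver that a real structured problem, such as parsing, would need. With independent Gumbel noise added to each structure's score, the maximizer is known to follow the softmax (Gibbs) distribution over the scores, which the sketch checks empirically.

```python
import math
import random
from collections import Counter


def sample_gumbel(rng):
    """Draw standard Gumbel noise via the inverse transform -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def perturb_and_maximize(scores, rng):
    """Perturb each score with fresh noise and return the index of the best one.

    The argmax over a short list is a stand-in (an assumption of this sketch)
    for a combinatorial optimizer over a large structured space.
    """
    perturbed = [s + sample_gumbel(rng) for s in scores]
    return max(range(len(scores)), key=perturbed.__getitem__)


def softmax(scores):
    """Gibbs distribution over the scores, computed stably."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def empirical_distribution(scores, num_samples, seed=0):
    """Estimate the distribution induced by repeated perturb-and-maximize."""
    rng = random.Random(seed)
    counts = Counter(perturb_and_maximize(scores, rng) for _ in range(num_samples))
    return [counts[i] / num_samples for i in range(len(scores))]


if __name__ == "__main__":
    scores = [1.0, 0.0, -1.0]  # hypothetical scores for three candidate structures
    print("empirical:", empirical_distribution(scores, 20000))
    print("softmax:  ", softmax(scores))
```

Each sample costs one optimization call, which is the property that makes generating likely structures easy. In a structured setting, perturbing every structure independently is not feasible, so noise would instead be added to the parameters of the smaller pieces; the induced distribution then generally differs from the Gibbs distribution.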