Nonparametric Bayesian Statistics

Principal Investigator Tamara Broderick

Project Website https://stat.mit.edu/research/nonparametric-bayesian-statistics/

The promise of Big Data isn’t simply to estimate a mean with greater accuracy; rather, practitioners are interested in learning complex, hierarchical information from data sets. Bayesian statistics allows not only this flexible modeling but also a coherent treatment of model uncertainty as data accrue. Bayesian nonparametrics goes a step further by providing models whose complexity grows with the size of the data. We expect to see, e.g., a greater diversity of topics as we read more documents from a news publication, a greater diversity of image subjects as we view more photographs online, and more friend groups as we examine more individuals participating in a social network. Bayesian nonparametrics provides modeling solutions in all of these cases by replacing the finite-dimensional prior distributions of classical Bayesian analysis with infinite-dimensional stochastic processes. Novel structures and relationships in data—from clustering, to admixtures, to graphs, to phylogenetic trees—motivate the creation of new Bayesian nonparametric models. And data sets of increasing size call for new computational tools and algorithms for learning and inference in these models. Our work demonstrates how to retain the strengths of the Bayesian paradigm and infinite-dimensional, nonparametric analysis while simultaneously enabling fast, and even streaming, inference on modern, large data sets.