Entry Date: January 25, 2017

Large Scale Stochastic Optimization and Statistics

Principal Investigator: Philippe Rigollet

Project Start Date: February 2015

Project End Date: June 2017


Stochastic optimization offers a general framework for studying many fundamental statistical problems related to prediction, such as regression, classification and density estimation. It is also a natural framework for importing powerful algorithms from numerical optimization, especially for large-scale problems. The broad goal of this project is to understand the fundamental interactions between statistics and stochastic optimization. To accomplish this, the investigator (a) identifies new problems from statistics, especially problems with complex structure, that can be recast as stochastic optimization problems; (b) develops new algorithms that solve large-scale problems both optimally and efficiently; (c) determines the essential characteristics of these problems that govern the performance of algorithms and their fundamental limitations; and (d) explores peripheral problems of stochastic optimization, including stochastic optimization with stochastic constraints and stochastic optimization with limited feedback.
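As a minimal illustration of how a statistical problem can be recast as stochastic optimization (a generic textbook example, not the project's own algorithm), least-squares regression seeks the parameter theta minimizing the expected loss E[(Y - <theta, X>)^2] / 2, and stochastic gradient descent estimates it by processing one observation at a time. The function name `sgd_least_squares`, the step-size schedule step0 / sqrt(t), and the synthetic data below are assumptions chosen for illustration.

```python
import random


def sgd_least_squares(samples, dim, step0=0.1):
    """Estimate theta minimizing E[(y - <theta, x>)^2] / 2 by stochastic
    gradient descent, using one (x, y) sample per step (illustrative sketch)."""
    theta = [0.0] * dim
    avg = [0.0] * dim  # running (Polyak-Ruppert) average of the iterates
    for t, (x, y) in enumerate(samples, start=1):
        # Gradient of the single-sample loss (<theta, x> - y)^2 / 2 is residual * x.
        residual = sum(th * xi for th, xi in zip(theta, x)) - y
        step = step0 / t ** 0.5  # assumed decreasing step size
        theta = [th - step * residual * xi for th, xi in zip(theta, x)]
        # Update the running mean of theta_1, ..., theta_t.
        avg = [a + (th - a) / t for a, th in zip(avg, theta)]
    return avg


# Hypothetical synthetic data: y = <theta_true, x> + small Gaussian noise.
rng = random.Random(0)
theta_true = [2.0, -1.0, 0.5]
samples = []
for _ in range(20000):
    x = [rng.gauss(0.0, 1.0) for _ in theta_true]
    y = sum(a * b for a, b in zip(theta_true, x)) + rng.gauss(0.0, 0.1)
    samples.append((x, y))

theta_hat = sgd_least_squares(samples, dim=len(theta_true))
print([round(v, 2) for v in theta_hat])
```

Each update touches a single observation, so the cost per iteration does not depend on the size of the data set, which is one reason such methods are attractive at large scale. Averaging the iterates is one common way to reduce the variance introduced by the decreasing step sizes.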

The information era has seen an explosion in data collection, and large-scale data sets are now ubiquitous in a wide range of applications, including biology, networks, environmental science, sociology and marketing. This creates an acute need for new statistical methods to analyze data sets of unprecedented size. While techniques from numerical optimization can be used in several scenarios, their analysis remains largely dissociated from that of the statistical task at hand. This research aims to provide a unified treatment of a number of large-scale problems arising from statistical learning and, more generally, from optimization under uncertainty. As a result, the project will yield not only new and effective algorithms but also a novel theoretical framework that supports the analysis of stochastic optimization problems and enables further improvement of these algorithms.