Principal Investigator Philippe Rigollet
Project Website http://www.nsf.gov/awardsearch/showAward?AWD_ID=1541100&HistoricalAwards=false
Project Start Date February 2015
Project End Date August 2017
For a variety of statistical problems, most of which involve searching for sparse structure in a high-dimensional parameter space, a gap has been observed between the performance attained by computationally efficient procedures and the best performance achievable without computational constraints. Examples include sparse regression, sparse principal component analysis, community detection, clustering, network analysis, and matrix completion. This observation hints at an inherent statistical price to pay for using computationally efficient methods. The investigators study how large this price can be by drawing new bridges between theoretical computer science and statistical learning theory. In practice, this agenda requires shifting the notion of statistical optimality so that it remains meaningful under limited computational power, and developing new algorithms that achieve this constrained optimality.
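As an informal illustration of such a gap (a hedged sketch using sparse principal component analysis, with rates stated only up to constants and logarithmic factors, not as claims specific to this award): when detecting a k-sparse leading eigenvector from n observations in dimension p, detection is information-theoretically possible once the signal strength \theta satisfies

    \theta \gtrsim \sqrt{\frac{k \log p}{n}},

whereas, conditional on average-case hardness conjectures such as the planted clique conjecture, no known polynomial-time procedure succeeds below roughly

    \theta \gtrsim \sqrt{\frac{k^2}{n}}.

The ratio between these two thresholds is one way to quantify the statistical price of restricting attention to computationally efficient methods.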
The recent establishment of big data as the new norm is causing a paradigm shift in statistics: computational power, not the number of observations, is now the bottleneck. The investigators lay theoretical foundations for studying this tradeoff between statistical and computational performance. A direct benefit of this research is to help statisticians and practitioners navigate the ocean of available heuristics and avoid the common pitfalls associated with using them.