Inference Methods for Machine Learning and High-Dimensional Data in Policy Evaluation and Structural Economic Models

Principal Investigator Victor Chernozhukov

Project Website http://www.nsf.gov/awardsearch/showAward?AWD_ID=1559172&HistoricalAwards=false

Project Start Date May 2016

Project End Date April 2019

Much of empirical economics focuses on estimating, and drawing credible inferences about, the causal effects of economic policies or the parameters of underlying economic models. The data that researchers use for this task are becoming increasingly rich and complex. While these richer data resources open up many new opportunities, they also pose additional challenges, and naïve application of the techniques used to analyze such data may invalidate the conclusions drawn about economic effects. This research project will establish a general, formal framework to guide the construction of estimation and inference procedures that make appropriate use of tools from "big data" or data mining and deliver reliable conclusions about economic objects of interest. The proposed research will develop the methods and their theoretical guarantees for a variety of situations encountered in empirical research in economics and the social sciences, offer empirical applications, and provide usable software in statistical packages popular within the social sciences. This theoretical and empirical work will thus help bridge the gap between social science practice and "big data", and will provide methods that enhance the credibility of the resulting scientific conclusions.

The proposed research will build bridges between high-dimensional statistical modeling and applied social science research. Integrating high-dimensional methods with economically relevant modeling frameworks and targets is important for providing researchers with tools that can be used to analyze modern, complex data and to make reliable inferential statements about the objects of interest. The proposed research will advance the theory of inference following regularization, which is a key element of inference in modern, large data sets. The main goal of this research project is to generalize available results on inference for a low-dimensional target parameter of interest by providing an encompassing framework that includes interesting nonlinear models and estimation procedures such as maximum likelihood and the generalized method of moments. The investigators will also provide an extension to cases where the target of interest is function-valued, such as when interest lies in a set of quantile treatment effects across a range of quantile indices. This advance will extend high-dimensional methods to applications where the goal is inference about sets of model parameters. This expansion is useful even in low-dimensional models and is likely to become crucial as large, complicated data sets become more readily available. In addition to providing theoretical results, the research aims to provide illustrative empirical examples and software in both R and Stata for applying these methods.
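The abstract does not spell out a specific procedure. As one illustration of what "inference following regularization" for a low-dimensional target can look like, the sketch below implements the partialling-out lasso estimator, a standard approach in this literature, for a partially linear model y = θd + x'β + ε with many controls x. Both y and d are residualized on x with the lasso, and the y-residual is then regressed on the d-residual to obtain θ̂ and a robust standard error. Everything here is an assumption made for illustration: the helper names (`simulate`, `partialling_out`, `lasso_residuals`), the simulated sparse design, and the hand-picked penalty `lam=0.1`. Common refinements (a data-driven penalty, post-lasso refitting, cross-fitting) are omitted. This is not the project's software, nor the interface of its R or Stata packages.

```python
# Illustrative sketch only: partialling-out lasso for one target coefficient.
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def center(v):
    """Subtract the sample mean from a list of numbers."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def center_columns(X):
    """Subtract each column's sample mean from a row-major matrix."""
    n, p = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    return [[X[i][j] - means[j] for j in range(p)] for i in range(n)]


def lasso_residuals(X, y, lam, max_sweeps=500, tol=1e-8):
    """Lasso of y on X by cyclic coordinate descent; returns y - X b.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1. Assumes y and the
    columns of X are already centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # running residuals y - X b
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # (1/n) x_j'(r + x_j b_j): fit of column j to the partial residual
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return r


def partialling_out(X, d, y, lam):
    """Estimate theta in y = theta*d + x'beta + e with many controls x.

    Residualizes y and d on x with the lasso, then regresses the y-residual
    on the d-residual; returns (theta_hat, heteroskedasticity-robust SE).
    """
    Xc = center_columns(X)
    ry = lasso_residuals(Xc, center(y), lam)
    rd = lasso_residuals(Xc, center(d), lam)
    sdd = sum(v * v for v in rd)
    theta_hat = sum(a * b for a, b in zip(rd, ry)) / sdd
    eps = [a - theta_hat * b for a, b in zip(ry, rd)]
    var = sum((b * e) ** 2 for b, e in zip(rd, eps)) / sdd ** 2  # HC0 variance
    return theta_hat, math.sqrt(var)


def simulate(n, p, theta, seed):
    """Hypothetical sparse design: only a few of the p controls matter."""
    rng = random.Random(seed)
    X, d, y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        di = x[0] + 0.5 * x[1] + 0.5 * x[2] + rng.gauss(0.0, 1.0)  # confounded treatment
        yi = theta * di + x[0] + 0.5 * x[1] - 0.5 * x[3] + rng.gauss(0.0, 1.0)
        X.append(x)
        d.append(di)
        y.append(yi)
    return X, d, y


if __name__ == "__main__":
    X, d, y = simulate(n=400, p=40, theta=0.5, seed=1)
    theta_hat, se = partialling_out(X, d, y, lam=0.1)
    print(f"theta_hat = {theta_hat:.3f}, 95% CI = "
          f"[{theta_hat - 1.96 * se:.3f}, {theta_hat + 1.96 * se:.3f}]")
```

The design choice worth noting is that both y and d are residualized. Because of this, small regularization errors in the two nuisance fits affect θ̂ only through their product rather than one-for-one. That is the intuition for why a confidence interval built from the final residual-on-residual regression can remain approximately valid after lasso selection.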