Principal Investigator Ronitt Rubinfeld
Project Website http://www.nsf.gov/awardsearch/showAward?AWD_ID=1650733&HistoricalAwards=false
Project Start Date September 2016
Project End Date August 2018
It is common scientific practice to propose a model which randomly generates combinatorial objects that are intended to be good approximations of data that we see in the real world. Such models include widely studied random graph models, preferential attachment models and small world networks. Since these models are in such widespread use, many analytical algorithms have been proposed to work with data assumed to be generated according to them. This project has the larger goal of giving a methodology for understanding when these hypothesized models accurately describe the actual data. Unfortunately, it can be shown that for these and other related models, testing that the data comes from such a model requires a tremendous number of samples in the worst case. This project studies methods of making the problem more tractable by determining whether the models are "good enough" descriptions of the data: that is, determining whether the actual data yields the same behavior as the data generated by the models, from the point of view of the analytical algorithms being used. The PIs will exploit potential computational limitations of the analytical algorithms to expedite the tasks of testing randomness properties of the actual samples. In technical terms, this project investigates ways of testing the pseudo-randomness of distributions against various specific classes of algorithms and circuits. Connections to the complexity theoretic concepts of derandomization and circuit lower bounds will be explored.
The broader impact of this project includes engagement in Computer Science Unplugged activities for elementary school children, MIT PRIMES mathematical research with high school students, participation in activities for promoting women in research, mentoring and education of young researchers, development of new courses, and dissemination of results through publications, surveys, and public lectures.