Data Systems and AI Lab (DSAIL)

Co-investigators Michael Stonebraker, Samuel Madden

Project Website http://dsail.csail.mit.edu/

Google, Intel and Microsoft team up with CSAIL on new data-driven initiative
Data Systems and AI Lab (DSAIL) will focus on using machine learning to improve data systems, and vice versa

Recent years have seen an explosion in the creation of machine learning models for everything from self-driving cars to social media feeds. Despite their success at perception and simple prediction, these models have yet to make a significant impact on traditional enterprise computing and data processing applications.

Over the past decade, AI has made substantial methodological advances in learning complex relationships within data. In addition, “deep learning” has excelled at a number of perceptual tasks, including image recognition and speech processing. These advances have enabled applications ranging from personal digital assistants to autonomous vehicles. An open question, however, is: how far can AI technology be pushed into other application domains?

We founded the Data Systems and AI Lab (DSAIL) to explore this frontier: going beyond the use of AI to automate simple perceptual tasks, and investigating how learned components synthesized using AI can enhance and optimize large-scale data systems and enterprise applications. This includes applying AI to the construction of traditional data structures such as indexes; to database methods such as query optimization, schema design, and logical and physical database design; and to systems algorithms such as load balancing and scheduling. In addition, large-scale enterprise applications, including data integration and predictive modelling, are already benefiting from AI technology. At enterprise scale, however, applying AI is hampered by a lack of supporting tools and scalable algorithms.
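To make the idea of a "learned component" inside a traditional data structure more concrete, here is a minimal sketch of a learned index, written under simplifying assumptions: the keys are numeric and held in a sorted in-memory array, and a single linear model (fit by ordinary least squares) stands in for whatever model a real system would use. The class name `LinearLearnedIndex` is hypothetical and this is an illustration of the general technique, not DSAIL's implementation. The model predicts where a key sits in the array, and the recorded worst-case prediction error bounds a small binary search that corrects the guess.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical illustration: a linear model predicts a key's position in a
    sorted array, and a bounded binary search corrects the prediction."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares
        # (an assumption; real learned indexes may use other model families).
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case error of the (clamped, rounded) prediction over all stored
        # keys; this bounds the search window used at lookup time.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        guess = int(round(self.slope * key + self.intercept))
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Only positions within max_err of the guess can hold a stored key.
        lo = max(0, guess - self.max_err)
        hi = min(n, guess + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

For example, `LinearLearnedIndex([3, 7, 12, 20, 21, 35, 50, 51, 80, 99]).lookup(35)` returns `5`, while looking up a missing key such as `4` returns `None`. The design trade-off this sketch highlights is that when the key distribution is close to what the model captures, `max_err` stays small and each lookup inspects only a narrow window instead of searching the whole array.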

To achieve these goals, several things need to happen. First, we need new AI algorithms that are efficient enough to operate in the inner loop of large-scale systems. Second, before AI can be widely used in mission-critical enterprise applications (as opposed to inherently imprecise applications like web search and information retrieval), we need new systems that systematically manage the process of collecting, cleaning and preparing data, as well as the process of building models and integrating them into deployed systems. Third, software systems and AI algorithms need to co-evolve to take advantage of emerging hardware trends, including specialized accelerators, new high-speed interconnects and advanced memory technologies. If successful, this research will change the way we build the large-scale systems of the future, and the way we use AI techniques inside the modern enterprise.

Multiple groups at CSAIL are already developing key systems in this space. Madden’s data-discovery tool Data Civilizer, for example, allows organizations to discover related datasets among thousands of distinct business databases and files. Kraska’s work on Northstar helps inexperienced data scientists quickly build high-quality models.

DSAIL builds on the lab’s existing initiatives that focus on financial technology, cybersecurity and systems approaches to machine learning. It represents an expansion of Intel’s previous five-year collaboration with CSAIL, the Intel Science and Technology Center for Big Data (ISTC).