Project Website http://people.csail.mit.edu/kraska/research.html
The goal of this project is to explore the frontier of which components of a data management system can be replaced by learned components. In a recent paper called the The Case for Learned Index Structures we showed that core index/data structures can be considered as models: a BTree-Index is a model to map a key to the position a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. As such, all existing index structures can be replaced with other types of models, including deep-learning models, which we termed learned indexes. Initial results show, that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of- magnitude in memory over several real-world data sets.
Yet, why initial results are very promising many open research challenges remain to make the idea viable for real-world systems. However, if it is possible to overcome the current limitations of learned indexes and broaden the scope to other components (e.g., query optimization), the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs. Most importantly, it might be possible to transform the complexity class of some algorithms (e.g., the O(log n) insert performance of B-Trees to O(1)). Similarly, learned components might make it possible to take better advantage of GPU/TPUs for systems.