Intelligence Everywhere: Redefining AI Infrastructure from the City Edge to the Data Center Core
Nir Shavit, Professor, MIT Computer Science & Artificial Intelligence Laboratory; Co-Founder, Neural Magic
AI inference workloads are exploding, and with them a new type of parameter-driven computation that, unlike the data-driven computations of the past, requires a new execution layer between the models and the hardware they run on. Enter Red Hat with its vLLM-based Inference Server, a highly efficient runtime that provides seamless execution of models, no matter how large, on a multitude of state-of-the-art acceleration hardware. I will discuss vLLM, the technical issues in designing this new inference abstraction layer, and how it will change the industry.
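For context, a minimal sketch of what this abstraction layer looks like to a user, using vLLM's offline inference API; the model name and sampling settings here are illustrative, not drawn from the talk:

    # Minimal vLLM offline inference sketch. The same code runs unchanged
    # across the accelerator backends vLLM supports; the runtime handles
    # batching, KV-cache management, and hardware-specific kernels.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # illustrative; any supported model works
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["What is an inference server?"], params)
    for out in outputs:
        print(out.outputs[0].text)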