Redefining AI Infrastructure from the City Edge to the Data Center Core: Nir Shavit

Conference Video | Duration: 21:26
September 30, 2025

    Intelligence Everywhere: Redefining AI Infrastructure from the City Edge to the Data Center Core

    Nir Shavit
    Professor, MIT Computer Science & Artificial Intelligence Laboratory
    Co-Founder, Neural Magic

    AI inference workloads are exploding, and with them comes a new type of parameter-driven computation that, unlike the data-driven computations of the past, requires a new execution layer between the models and the hardware they run on. In steps Red Hat with its vLLM-based Inference Server, a highly efficient runtime that provides seamless execution of models, no matter how large, on a multitude of state-of-the-art acceleration hardware. I will discuss vLLM, the technical issues in designing this new inference abstraction layer, and how it will change the industry.
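    For a sense of the abstraction layer the talk describes, vLLM exposes a simple Python interface for offline inference. A minimal sketch follows; the model checkpoint and sampling settings are illustrative placeholders, not taken from the talk:

        from vllm import LLM, SamplingParams

        # Load a model; the checkpoint name is a placeholder for illustration.
        llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

        # Sampling settings are arbitrary example values.
        params = SamplingParams(temperature=0.7, max_tokens=128)

        # Generate completions for a batch of prompts; vLLM handles batching
        # and KV-cache management under the hood, independent of the
        # accelerator the model runs on.
        outputs = llm.generate(["What does an inference server do?"], params)
        print(outputs[0].outputs[0].text)

    The same engine can also be run as a standalone server process, which is the deployment mode the Red Hat Inference Server builds on.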
