2024 MIT R&D Conference: Track 5 - AI - Efficient Multi-modal LLM on the Edge

Conference Video | Duration: 20:30
November 19, 2024
  • Video details
     
    Efficient Multi-modal LLM on the Edge
    Song Han
    Associate Professor, MIT Electrical Engineering & Computer Science Department
    This talk presents efficient multi-modal LLM innovations through algorithm and system co-design. I'll first present VILA, a visual language model deployable on the edge that is capable of visual in-context learning, multi-image reasoning, video captioning, and video QA. I'll then cover SmoothQuant and AWQ for LLM quantization, which make VILA deployable on edge devices and bring new capabilities to mobile vision applications. Finally, I'll present StreamingLLM, a KV cache optimization technique for long conversations, and QUEST, which leverages sparsity for KV cache compression.