Glia: An AI for Autonomous System Design and Optimization
Hari Balakrishnan, Fujitsu Professor of Computer Science
Can an AI autonomously design mechanisms for computer systems with creativity and reasoning on par with human experts? We have developed Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Each agent specializes in one role (reasoning, experimentation, or analysis), and the agents collaborate through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at the level of human experts while yielding novel insights into workload behavior. Crucially, it achieves these results 10x to 100x faster than human experts. Our results suggest that by combining reasoning LLMs with structured experimentation, AI can produce creative and understandable designs for difficult problems in computer and AI systems design.
This is based on joint work with Prof. Mohammad Alizadeh and our students.