Rethinking How We Communicate With Our Machines
Jacob Andreas, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science and the Computer Science and Artificial Intelligence Laboratory (CSAIL), leads the Language and Intelligence Group at MIT, where he builds software systems that can communicate with people in natural language. The general idea behind the field of natural language processing (NLP), a subfield of computer science and artificial intelligence (AI), is to give machines the ability to understand text or human speech and respond much as a human would.
We already interact with natural language technologies regularly in the form of virtual agents, chatbots, and smart assistants like Siri or Alexa. But there are a host of other ways in which human language can be used to communicate with computers. Andreas suggests that everything done with a programming language or a graphical interface, from building robots that work in factories to designing computer vision systems capable of identifying defective products, could potentially be enhanced by NLP.
Today, there is tremendous industry interest in training machine learning models and building AI systems that make decisions automatically. But these technologies, deep neural network models in particular, are difficult to understand: it is not always clear why they make the decisions they do, what they have learned from their training data, or how they will behave on inputs that differ from those they were trained on.
Andreas thinks language could play a key role in clearing up some of the mystery. “Language is an important and underutilized tool when it comes to understanding AI,” he says. “Humans use language to communicate with one another and explain our own decisions. At my lab, we’re using language as a tool for helping people understand machine learning models as well.”
Andreas and his group have recently developed MILAN (mutual-information-guided linguistic annotation of neurons), a procedure that automatically assigns a natural language description to each of the features a deep network computes. MILAN is a groundbreaking first step toward building tools for the developers tasked with understanding deep networks, telling them whether they need to modify their training data, whether their models are ready to deploy, and whether they are safe to deploy at all.
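At its core, the mutual-information-guided idea is to prefer descriptions that fit a neuron’s most strongly activating inputs much better than they fit text in general. The sketch below illustrates that selection rule; the log-probabilities and the weighting here are illustrative assumptions, not MILAN’s actual implementation, which derives them from learned captioning and language models.

```python
def pmi_score(log_p_desc_given_exemplars, log_p_desc, weight=1.0):
    """Weighted pointwise mutual information between a candidate
    description d and a neuron's top-activating inputs ("exemplars"):
    log p(d | exemplars) - weight * log p(d)."""
    return log_p_desc_given_exemplars - weight * log_p_desc

def describe_neuron(candidates, weight=1.0):
    """Return the candidate description with the highest PMI score.

    `candidates` maps each description string to a pair
    (log p(d | exemplars), log p(d)); in a real system these would
    come from a captioner conditioned on the neuron's exemplars and
    an unconditioned language model, respectively.
    """
    return max(candidates, key=lambda d: pmi_score(*candidates[d], weight))

# Hypothetical log-probabilities for one neuron's candidate descriptions.
candidates = {
    "dog faces":         (-2.0, -7.0),  # specific and well supported
    "animals":           (-1.5, -3.0),  # likely a priori, so less informative
    "green backgrounds": (-6.0, -6.5),  # poorly supported by the exemplars
}
print(describe_neuron(candidates))  # -> "dog faces"
```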
Historically, developers have attempted to understand how models learn through visualization. But the context in which those models will be used in the real world often falls by the wayside. For example, if you want a model to differentiate between a dog and a zebra, you want it to pay attention to the object being classified. However, past research has demonstrated that a model asked to recognize a zebra also notes the color of the ground the zebra is standing on. If a model was trained on images of zebras in a particular context (e.g., the African bush), it will likely misclassify a zebra standing in a grassy field.
Furthermore, auditing a model through visualization alone is a time-consuming process. A developer may ask a model to produce images it considers similar, or to highlight the aspects of an image most relevant to its decisions. However, a model typically constructs thousands, even hundreds of thousands, of these features, which a human conducting an audit would then need to sift through and identify one by one.
MILAN, on the other hand, catalogs an index of descriptions of everything a model knows how to recognize or do. This makes it easier for a human user to check whether a model is picking up on a particular feature, such as a sensitive demographic attribute that is irrelevant to the task, and to automatically detect aspects of a model’s behavior that are surprising or potentially dangerous.
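Once such an index exists, auditing can be as simple as searching the descriptions. The snippet below is a hypothetical illustration of that workflow, not MILAN’s actual interface; the neuron names, descriptions, and list of sensitive terms are invented for the example.

```python
# Hypothetical MILAN-style index: neuron identifier -> natural language
# description of what that neuron detects.
description_index = {
    "layer3.unit42": "zebra stripes",
    "layer3.unit77": "dry grass and brown soil",
    "layer4.unit13": "human faces",
}

# Terms a developer might consider task-irrelevant or sensitive.
sensitive_terms = ["face", "skin", "gender"]

# Flag every neuron whose description mentions a sensitive term.
flagged = {
    neuron: desc
    for neuron, desc in description_index.items()
    if any(term in desc for term in sensitive_terms)
}
print(flagged)  # {'layer4.unit13': 'human faces'}
```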
In addition to using language to understand and interpret the decisions behind machine learning models, Andreas is also interested in using language to train machine learning models. Here, he suggests, it is important to consider how we as humans learn new skills. If, for example, you want to learn to bake a cake, you are unlikely to watch thousands of videos of people baking cakes. More likely, you’ll read a cookbook and perhaps watch one video. “Human learning from natural language instruction doesn’t have a counterpart in broader machine learning practice today,” Andreas explains, “and that’s what we’re trying to change in my group. We’re developing tools to train machine learning models by teaching them with language, rather than by just showing them examples.”
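As a concrete illustration of that idea, the sketch below conditions a classifier on an embedding of a natural-language task description, so the task can be changed by changing the instruction rather than by collecting new labeled examples. The architecture, dimensions, and random inputs are assumptions made for illustration, not the group’s actual method.

```python
import torch
import torch.nn as nn

class InstructionConditionedClassifier(nn.Module):
    """Toy classifier that takes both an input and an embedded
    natural-language instruction describing the task."""

    def __init__(self, input_dim, instr_dim, hidden_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim + instr_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x, instruction_embedding):
        # Concatenate the input features with an embedding of the
        # instruction text (produced by any sentence encoder).
        return self.net(torch.cat([x, instruction_embedding], dim=-1))

model = InstructionConditionedClassifier(64, 32, 128, 2)
x = torch.randn(8, 64)      # a batch of inputs
instr = torch.randn(8, 32)  # stand-in for embedded instructions
logits = model(x, instr)
print(logits.shape)         # torch.Size([8, 2])
```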
According to Andreas, his desire to explore problems with real-world impact is directly related to his experience building machine learning models designed to interact with real users.
Before coming to the Institute, he was involved in a startup that spun out of his graduate research group at UC Berkeley and was ultimately acquired by Microsoft. The technology he helped develop now powers conversational interfaces within several Microsoft products.
“I think about the tools I’m developing scaling up to millions, tens of millions, or hundreds of millions of users,” he says. He frequently engages with industry via collaborations fostered through the MIT Industrial Liaison Program (ILP) and the CSAIL Alliances Program. Currently, he is working with Sony, applying the tools he develops at the Institute to provide insight into the multinational’s in-house computer vision models. He has also worked with organizations like Google on core tools to improve their natural language processing.
Through the exploration of language and how we as humans learn, Andreas and the Language and Intelligence Group at MIT are helping to rethink how we interact with AI. One of the big takeaways of his research is that collecting more robot demonstrations or more images of products may not be the most effective way to build better machine learning models; using language as a source of information can often be far more efficient. “At the Language and Intelligence group, we want to devise methods for training AI systems that use language in the same way that we as humans learn most of our complex skills: by reading, by being instructed, and not just by looking at demonstrations,” he says.