Making Sense of Sight
James DiCarlo studies how we see and recognize objects.
The act of seeing – visually perceiving an image and recognizing its meaning – is one of the most universal and automatic biological responses. But it also turns out to be quite computationally complex, explains James DiCarlo, MD, PhD, department head and professor of neuroscience in the Department of Brain and Cognitive Sciences at MIT and an investigator in the McGovern Institute for Brain Research, who studies visual recognition.
The unifying theme of DiCarlo’s lab is to understand how we recognize objects in the world using visual data – or how we are able to see. How do individuals take in that data naturally and come to know what ‘it’ is? “That is quite a challenging problem that the brain solves somehow,” he says. “Our hope is to understand exactly how the brain solves that problem.”
Achieving that goal opens the door to multiple applications. “The one that drove me initially is that if we could understand how the brain computes, we should be able to build machines that can compute in a similar way. That is, to build better machine vision systems,” DiCarlo explains, adding that his job is to extract, mimic, and perhaps improve upon those mechanisms from the brain – what he calls the best available model of intelligence – and use them to build artificial systems.
“The brain teaches us that human perception and neural activity are effortless and almost automatic,” he says. “It’s because we come somehow prebuilt, at least after development, to have a very fast, automatic processing system. It is the standard model now of how vision should be done even in machine systems.”
DiCarlo is working to discover and sort out the various layered neuronal patterns in the brain that support visual object recognition. He hopes to eventually develop new algorithms of visual processing that could be brought to industry. Already, his lab has contributed knowledge and ideas that are included in some of the best vision systems in the world. At the heart of these systems are deep neural networks that DiCarlo describes as driven by decades of brain research with a lot of computer engineering on top of them.
“We view those algorithms as models of what might be happening in our own visual systems,” he explains. He and his colleagues build computer models from sensor data and transform that data in the way they think the brain transforms it to make its content explicit. Those visual processing algorithms naturally involve several layers of processing. DiCarlo explains, “We look at the internals of those layers and make detailed comparisons of the responses of the simulated neurons in those algorithms with the actual neuronal responses we record in the brain. That allows us to ask if each proposed algorithm is acting like the algorithm we measure in the brain. One of our projects is to further develop these algorithms to be more and more like the brain.” His team hopes their work will contribute to the next generation of algorithms that work even better than current artificial vision systems, and possibly also to auditory and tactile systems.
One of DiCarlo’s other major projects involves controlling perceptions through direct neural control, to replace lost vision or augment vision. The idea is to capture and manipulate the outputs of neurons in the brain and understand how they relate to, say, the perception of a face. “These are the outputs at the top of the brain’s visual object representation algorithm,” he says. “We suspect that if we directly manipulate that neural activity, we might be able to produce repeatable changes in the subject’s perception of the visual world. You could imagine applications for the blind and even for those who have sight but want to access the brain’s visual representations directly,” he says.
This work involves controlling neurons with light (optogenetics), turning neurons on and off to directly interface with the brain tissue that underlies visual object perception. “As we get that understanding, we aim to transform it into devices that might be used to directly bypass damage or loss at lower levels of the visual system, including most forms of blindness,” he says. “We are currently tapping the top levels of the visual system and then seeing how a single injection at those high levels leads to changes in perception in a predictable way.” His team is studying what spatial and temporal format those injected signals should take to produce the best perceptual effects.
DiCarlo’s work has attracted industry interest, particularly from companies interested in modeling the brain’s deep neural networks to produce simulated deep neural networks for computer vision tasks. “Some of the most cutting-edge industries are interested in partnering with us to generate the next generation of deep neural networks, which can perhaps be even more brain-like, not just in the way they process but in the way they learn about the visual world,” DiCarlo says.
Future applications for DiCarlo’s work are incredibly varied, ranging from medical imaging to security in the form of face recognition. “The applications are boundless, and it’s just a matter of how good these systems can get and how human we’d like them to be.”