Entry Date:
December 1, 2023

Brain Netflix: Materializing Human Thought with Generative Models

Principal Investigator: Aude Oliva

Project Start Date: December 2023


The human brain processes 11 million bits of information per second coming from our senses and compresses most of this information in ways that are still, in great part, unknown. What if we could recover this information by simply reading out brain signals? A myriad of applications, such as thought visualization, mind-operated automata, and even telepathy might become possible. All these applications, however, hinge on a key underlying scientific question: what is the fundamental limit of information we can recover from encoded brain signals? Does such a limit allow for applications that require fine-grained thought decoding, such as mind-based communication?
 
Drawing on the multi-disciplinary expertise of our research group, we aim to address these questions in three stages: 1) collecting and curating the field's largest combined set of non-invasive human brain data, spanning fMRI (functional magnetic resonance imaging) and M/EEG (magneto-/electroencephalography) datasets; 2) building brain-to-video models that translate human brain signals into meaningful visuals; and 3) using this knowledge to develop models that generate multiple modalities from brain signals, including video, audio, and text, and studying their characteristics to determine information-recovery limits at both the experimental and theoretical level.
 
With our unique framework for homogenizing neural data from different individuals' brains, we aim to build a multi-modal database of millions of brain signals gathered through visual, textual, and auditory tasks: watching a video, listening to sounds or speech, thinking about an action, reading a phrase, or repeating words or ideas in one's mind. To kickstart this project, the MIT Computational Perception and Cognition lab has collected a unique neuroscience dataset comprising 30k high-quality fMRI responses of individuals watching over 1,000 video clips. This has allowed us to develop a first version of a brain-to-video machine learning model that reconstructs the observed short video clip from fMRI brain signals. Leveraging large text-to-video generative models, we extract latent and conditioning vectors from fMRI signals that recover the watched video, and we have shown that pooling data from different subjects through our homogenization method increases reconstruction quality.
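As a rough illustration of how fMRI signals can condition a pretrained text-to-video generator, the sketch below maps a flattened fMRI response to a latent seed and a conditioning sequence whose shapes mimic those consumed by a diffusion-based video model. All dimensions, module names, and training targets here are illustrative assumptions, not the lab's actual architecture.

```python
# Minimal sketch (PyTorch): map an fMRI response to the latent seed and
# conditioning embedding of a hypothetical diffusion-based video generator.
# Dimensions and names are illustrative assumptions, not the actual model.
import math
import torch
import torch.nn as nn

N_VOXELS = 15_000                   # flattened fMRI response (assumed size)
COND_TOKENS, COND_DIM = 77, 768     # text-encoder-like conditioning sequence
LATENT_SHAPE = (4, 8, 32, 32)       # (channels, frames, height, width) latent

class FMRIToVideoConditioning(nn.Module):
    """Projects a subject-homogenized fMRI vector into the two inputs a
    pretrained text-to-video diffusion model expects: a conditioning
    sequence (in place of text-encoder outputs) and an initial latent."""
    def __init__(self):
        super().__init__()
        hidden = 2048
        self.backbone = nn.Sequential(
            nn.Linear(N_VOXELS, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
        )
        self.to_cond = nn.Linear(hidden, COND_TOKENS * COND_DIM)
        self.to_latent = nn.Linear(hidden, math.prod(LATENT_SHAPE))

    def forward(self, fmri):                       # fmri: (batch, N_VOXELS)
        h = self.backbone(fmri)
        cond = self.to_cond(h).view(-1, COND_TOKENS, COND_DIM)
        latent = self.to_latent(h).view(-1, *LATENT_SHAPE)
        return cond, latent

# Training would regress these outputs onto the conditioning/latent vectors
# extracted from the video the subject actually watched, e.g. with MSE:
model = FMRIToVideoConditioning()
fmri_batch = torch.randn(2, N_VOXELS)              # stand-in for real recordings
cond_target = torch.randn(2, COND_TOKENS, COND_DIM)
latent_target = torch.randn(2, *LATENT_SHAPE)
cond_pred, latent_pred = model(fmri_batch)
loss = (nn.functional.mse_loss(cond_pred, cond_target)
        + nn.functional.mse_loss(latent_pred, latent_target))
loss.backward()
```

At generation time, the predicted conditioning and latent would be handed to the frozen video generator in place of its usual text-derived inputs; the design choice of regressing onto the generator's own latent space is what lets a comparatively small brain-decoding model reuse a large pretrained model's visual knowledge.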
 
This basic science project is the starting point for multiple high-impact scientific and technological advances over the next few years. First, a generative approach allows us to test levels of neural data granularity and to discover the spatial and temporal bandwidth necessary and sufficient for a person and an AI system to communicate. Second, a model trained to reconstruct multiple modalities accurately would allow us to recover images, videos, and text from thought alone, enabling applications in the medical sector such as communicators for speech-impaired individuals. Third, models that extract actions from brain signals could be used to remotely control electronic devices, digital interfaces, and robotic agents in an intuitive way, potentially facilitating human-AI collaboration through a medium complementary to, or more seamless than, text or speech.

Unique to our approach is the development of techniques to project brain signals into a shared digital embedding space, which opens the way to two key future developments. First, it allows us to leverage data across different individuals to train reconstruction models, an alignment that is otherwise hard to achieve between different people's brains. Second, this shared digital space lets us map one individual's signals into another individual's brain, recovering and predicting how the receiving person's brain would have responded to a given stimulus (a simplified sketch of this mapping appears below). With enough research, this technology could be refined to allow for mind-to-mind communication mediated via a digital model. Achieving these far-reaching technologies requires answering the basic scientific question of how much brain information is recoverable: understanding what we can decode and simulate through generative models.

Funding request: $450,000 would allow us to collect and store the multi-modal neuroscience datasets needed to train and test the generative models, and to hire a postdoctoral researcher for 18-24 months.
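To make the cross-subject mapping idea concrete, here is a minimal sketch that learns a linear map from one subject's voxel space to another's using responses to stimuli both subjects saw, then predicts the second subject's response to a new recording from the first. It is a simplified stand-in, assuming paired recordings and a ridge-regularized least-squares fit; it is not the group's actual homogenization method, which projects signals into a shared learned embedding space.

```python
# Minimal sketch (NumPy): predict how subject B's brain would respond to a
# stimulus given only subject A's recorded response, via a linear map fit on
# shared stimuli. Illustrative only; sizes and data are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_shared_stimuli, n_voxels_a, n_voxels_b = 800, 1200, 1500

# Responses to the same stimuli: rows are stimuli, columns are voxels.
X_a = rng.standard_normal((n_shared_stimuli, n_voxels_a))   # subject A
X_b = rng.standard_normal((n_shared_stimuli, n_voxels_b))   # subject B

# Ridge-regularized least squares: W minimizes ||X_a W - X_b||^2 + lam ||W||^2.
lam = 10.0
W = np.linalg.solve(X_a.T @ X_a + lam * np.eye(n_voxels_a), X_a.T @ X_b)

# Given a new recording from subject A, predict subject B's response pattern.
new_response_a = rng.standard_normal((1, n_voxels_a))
predicted_response_b = new_response_a @ W        # shape: (1, n_voxels_b)
```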