2024 MIT R&D Conference: Track 5 - AI - Is AI Ready to Transform Chemistry & Materials Science

Conference Video | Duration: 18:38
November 19, 2024
  • Video details
     
    Is AI Ready to Transform Chemistry and Materials Science?
    Rafael Gomez-Bombarelli
    Jeffrey Cheah Career Development Chair,
    Associate Professor, MIT Department of Materials Science and Engineering (DMSE)

    AI’s influence is undeniable in the digital realm, affecting consumers’ lives and corporate operations. Transferring these advancements to sectors producing physical goods, such as drug discovery and biotech, commodity chemicals, materials for energy and sustainability, and manufacturing, presents a thrilling prospect and a translational challenge. This talk will explore the present use cases and the potential of applying generative AI within the chemistry and materials domain. Unlike a large part of the tech sector, these industries are capital-intensive and cautious, meaning that AI must bridge an “execution gap” between the digital and physical realms for value generation. We will outline strategies to overcome current technical and cultural hurdles.

  • Interactive transcript

    RAFAEL GOMEZ-BOMBARELLI: And I realized I made a rookie mistake, because every title that is a question, hints that the answer is no, right? This applies to headlines, Betteridge's law of headlines. If you ask a question for a title, that means you're not sure.

    So, let me rephrase. Actually, yes, AI is ready to transform chemistry and material science. So, please take it on the affirmative, not as a question. And I'll give you some evidence of why I'm ready to say yes.

    You all have probably used or experienced GPT, right? If anything has changed over the last decade, it's things that touch machine learning and deep learning. We've really been seeing sort of transformational change at a scale that is hard to perceive, since we're inside it.

    But Go was solved. It was supposed to be an intractable combinatorial problem, and it was solved a decade ago. And you can see the examples of things that we thought required lots of human creativity and human talent, like making good images, turn out, they can be just interpolated from known images in ways that we really didn't expect, like I said, five to 10 years ago.

    And then in the physical sciences, or in the sciences, we've got our big examples over the last four to five weeks, when two Nobel prizes were awarded to AI technologies. One for the foundational work in physics, and another one for the AlphaFold protein structure prediction task.

    And this has been a combination of compute, algorithms, and data. Being able to deploy big data sets or make them, put them into these GPUs that NVIDIA makes, and that's why they're one of the most valuable companies in the world, and algorithms that are specific to the tasks.

    And I, for instance, will never write a cover letter in my life. I'm going to ask GPT to write a cover letter for me. And I suspect people that apply for jobs at your companies or my school are using GPT for cover letters.

    OK, we want that in material science and chemistry, right? We want to have AI tell us, well, this is an amazing drug. This is an absorbent that will capture CO2 out of the air. This is a battery electrolyte that will enable lithium metal batteries for aviation. OK?

    Well, this is slightly different from what AI has done in most other places, because we're asking it to be better than the best scientists have been so far, which is a much harder task than just getting a picture of an astronaut on the moon.

    Now, we're asking for the best possible picture of an astronaut on the moon, one I want to patent and sell. And we're never going to have as much data as some of these other tasks. There are a lot more images on the internet than there are electrolytes or etching materials.

    To our advantage, perhaps, is that we know the rules of the game. Physics governs, or should govern, everything we do in the physical sciences, meaning that there are underlying equations that we either know and should be able to exploit, or can assume are there even if we cannot write them explicitly. Although that's not necessarily a foregone conclusion. I'll say more about that.

    So what are the things that AI does today, and maybe your companies do, that companies out there do? What we call structure-property prediction. Just like you trust a machine vision algorithm to classify whether a picture is a cat or not, because there is plenty of cat data to train it on.

    If there is enough data, in the thousands, maybe tens of thousands, of something, we can totally train neural networks that go from matter to property. And this could be the toxicity of molecules, the ionic conductivity of crystals, the fluorescence of proteins. All of these materials classes, chemical classes, have had neural networks made for them that allow structure-property predictions.
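    To make the structure-to-property idea concrete, here is a deliberately tiny sketch: a hypothetical fingerprint vector per material and a nearest-neighbor regressor standing in for a real neural network. All names and numbers below are made up for illustration; real systems use learned molecular or crystal representations.

```python
# Toy sketch: predict a property from a structural "fingerprint".
# The data and fingerprints are hypothetical stand-ins.

def knn_predict(fingerprint, training_data, k=3):
    """Predict a property as the mean over the k nearest training points."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(training_data, key=lambda fp_y: dist(fingerprint, fp_y[0]))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical training set: (fingerprint, measured property) pairs.
train = [
    ([0.1, 0.9], 1.2),
    ([0.2, 0.8], 1.1),
    ([0.9, 0.1], 3.4),
    ([0.8, 0.2], 3.5),
]

print(knn_predict([0.15, 0.85], train, k=2))  # averages the two closest points
```

    The point is only the shape of the task: with enough (matter, property) pairs, a model interpolates properties for unseen structures.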

    We can use AI for synthesis planning. Not just telling you, well, this is a good chemical, but also planning how to make it. In chemistry, data has been very structured for a long time: chemists have very laboriously recorded the ways they make chemicals. There are big databases, and there are retrosynthesis-as-a-service products today.

    So you can just go to the internet and look, chemical synthesis AI, and you will find vendors that will give you today the GPT version of chemical synthesis, where you say, how would I make this chemical, and the AI says, buy these precursors, put them together in this way, you will be able to make this molecule. That exists today. It's commoditized for molecules.

    For materials it is a little bit trickier, because when you say materials, maybe you're thinking of steel, and maybe here, somebody else is thinking about nanoparticles for drug delivery. Materials are a way more heterogeneous class, where we really don't record synthesis in the same way. There's no one unique repository for how to make materials.

    Actually, making materials could mean casting for some people, and it could mean sol-gel for other people. So, that's happening right now. How to have LLMs, large language models, read this literature and deploy back protocols to make stuff, is happening as we speak.

    And then in the same spirit as these protocols, there are LLMs or agents for orchestration, where individual pieces of AI with narrow, focused tasks coordinate with one another to orchestrate a more complex function that, again, we thought a human would need to do.

    And there are examples in chemistry, examples in metal-organic frameworks, which are materials, at the end of the day, where an agent reads a paper, another agent summarizes a recipe, maybe an agent talks to a robot to do the chemistry. I'll talk more about that.

    But in general, we're seeing AI being the glue between different agents, some of which are AI and some of which, like I said, are traditional simulators. I have a lot on simulations.

    Or maybe all of these are just Googling. Maybe an agent just goes and Googles, or goes to the patent database and checks if this new chemical has been registered by somebody else. So this orchestration is definitely happening as we speak.

    And then generative AI. I mean, we've seen it work for faces. In 2018, we already could make, the community could make photorealistic faces of made-up people. And over the years, we've been able to make chemically realistic versions of molecules to some degree, and definitely, definitely of proteins.

    Protein design, we're seeing it peak right now, with Xaira raising $1 billion and a Nobel award for David Baker. It's definitely happening right now. And materials, again, are we designing an alloy, are we designing soft matter? We've got a broader mandate, but the algorithms are coming along. As you can see from the dates of those papers, these things are happening right now.

    And then there is a place that is particularly interesting to me, which is the combination of AI and simulations, because it turns out, if we really know the physics, we can make arbitrary amounts of data. We can just simulate more. We'll pay more compute, but compute is infinitely scalable. You just rent another computer, get more data.

    So we can generate data in a way that most other tasks can't, and we have these very well-defined computational workflows that can benefit from AI. So in my world, in atomistic simulations, when we look at atoms, one resolution down from what Professor Boning was saying, typically, we care about three things.

    How well we're going to get the energies and the forces of how these atoms talk to one another. Typically, we do quantum mechanics, which is really, really expensive. How many atoms do we get to simulate? Typically, we're capped at tens of thousands, hundreds of thousands; some Herculean efforts can do millions, but that's already rare.

    And because of the way we simulate the world in one femtosecond increments, it takes forever to simulate materials. So we typically stop at timescales of nanoseconds to microseconds. It's actually very hard to simulate something like protein folding that takes milliseconds.
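    The arithmetic behind that timescale wall is easy to check: at a one-femtosecond timestep, every order of magnitude in physical time is another order of magnitude in simulation steps. A rough sketch (real timesteps vary by method):

```python
FEMTOSECOND = 1e-15  # typical atomistic timestep, in seconds

def steps_needed(target_seconds, dt=FEMTOSECOND):
    """Number of timesteps to cover a target physical duration."""
    return round(target_seconds / dt)

print(f"{steps_needed(1e-9):,} steps for a nanosecond")
print(f"{steps_needed(1e-6):,} steps for a microsecond")
print(f"{steps_needed(1e-3):,} steps for a millisecond (protein folding)")
```

    A millisecond of dynamics is a trillion sequential force evaluations, which is why accelerated methods matter.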

    So all of these things are being actively researched by folks at MIT, my group, but also, Meta, DeepMind, Intel, and Nvidia. Both academic teams and tech companies are thinking about this fusion of AI and simulation. So like I said, ML potentials mean machine learning models that substitute quantum chemistry.

    Meta has released 110 million data points to train these models on, openly. They just put them out there for everybody to use. And I suspect there's more coming. DeepMind made them, but didn't release them. So, kudos to Meta for opening data. Boo to Google, for not opening.

    Accelerated MD, like I said, we can only get to see nanoseconds to microseconds of timescale. But if you're looking at a protein that is responsible for some disease, its evolution takes milliseconds, or maybe seconds. OK, well, how can we run this faster?

    We do this in my lab for battery materials. I want to understand how lithium moves inside a battery material, but I cannot wait to see that timescale in an atomistic simulation. So how can I accelerate, while preserving all the physics, based on using machine learning as a surrogate model for these simulators?

    And something that, again, we're pursuing actively is learning the types of continuum or semiempirical models that people parameterize with PDEs, from the all-atom, fully resolved quantum mechanical evolution of systems.

    So I have a couple of specific examples that maybe will relate to some of your interests. One is going beyond traditional simulations of surfaces. If you've been close to material surfaces, you will know that theoreticians typically model them as a perfectly perfect, amazing, pristine cut of the solid.

    Well, that's not what surfaces look like in the real world. Maybe at very low pressure, when you're doing epitaxy, everything is amazing. But the moment things are happening, surfaces undergo reconstruction. Some atoms move around; they creep up on top of one another.

    If you're growing a material, the material is physically stacking up more layers. All of this is way beyond the traditional assumption of tiny unit cells that are perfectly pristine. So this is a place where machine learning can definitely help.

    Last year we put out a paper where we did both parts. We need good energies and forces, so we train a machine learning potential on tens of thousands of quantum mechanical calculations, such that we can scale it to larger sizes and longer timescales.

    So we do AI to make our energies and forces better. And then we do AI to sample the phase space of what these surfaces could look like. And this is a diagram of the amount of learning it takes. This is not something where people come in and click a button yet. This is an active area of research, but it's possible.
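    In the same spirit, the "train a cheap surrogate on expensive reference energies" step can be sketched in a few lines. The harmonic reference function below stands in for real quantum mechanical calculations, and the functional form and grid search are purely illustrative, not the actual potential used in the paper:

```python
# Toy surrogate in the spirit of an ML potential: fit a cheap model
# to "quantum" reference energies, then evaluate the fit everywhere.

def reference_energy(r):
    """Stand-in for an expensive quantum calculation (harmonic well)."""
    return 0.5 * (r - 1.0) ** 2

# "Training set": interatomic distances and their reference energies.
train = [(r / 10.0, reference_energy(r / 10.0)) for r in range(5, 20)]

# Fit E(r) = 0.5 * k * (r - r0)^2 by brute-force search over (k, r0).
best = min(
    ((k / 10.0, r0 / 100.0) for k in range(5, 20) for r0 in range(90, 111)),
    key=lambda p: sum((0.5 * p[0] * (r - p[1]) ** 2 - e) ** 2 for r, e in train),
)
print(best)  # recovers roughly k = 1.0, r0 = 1.0
```

    Real ML potentials replace the hand-picked functional form with a neural network over atomic environments, but the workflow — expensive reference data in, cheap evaluations out — is the same.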

    So we can make surrogate models of quantum chemistry that are fast enough and scalable enough that, in this particular example, we were able to recover the experimentally known reconstruction of a complex material like strontium titanium oxide.

    And we agree with experiments on when, as a function of the pressure of oxygen and of the composition of the bulk, we get different surface reconstructions. And on the stoichiometry of the surface reconstructions, what atoms they're made of, we agreed with prior calculations.

    But in how the atoms are arranged, we actually found more realistic, lower-energy, more believable reconstructions than prior work had, because AI can be more aggressive in exploring these spaces. So I would say surfaces used to be a big no-no because of accuracy issues. They're at play now in terms of growth, in terms of reconstruction, in terms of functionalization.

    Disorder is another challenge that traditionally has been hard to model in crystalline materials. You might have a material that is perfectly ordered, in the sense that its atoms sit on a lattice. Fine. But then, which atoms sit on which lattice sites can be disordered.

    And again, if you're in a world where you can only simulate tens or hundreds of atoms, you can't really access disorder, because every pattern you choose, because the unit cell is so small and it's symmetrical, turns out to be ordered. You can't really simulate disorder with traditional methods.

    Well, over the last year, we have used the same machine learning potentials that I told you can do energies and forces, and they can do alchemy. They can interpolate between known elements, representing disorder. This is a little bit technical, but I would say one of the big Achilles' heels of materials discovery used to be that you're constrained to making small unit cells. And small unit cells cannot capture disorder.

    The fact that different atoms in an alloy, if you're thinking about a high-entropy alloy, for instance, where many elements can be distributed randomly on different positions of the lattice, cannot be modeled with traditional simulation setups. It's very expensive.
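    A back-of-the-envelope count shows why small cells cannot capture this: the number of ways to place a minority species on the lattice explodes with cell size. The site counts below are illustrative, not from any specific simulation:

```python
from math import comb

# Arrangements of a binary 90/10 alloy on N lattice sites:
# choose which sites host the 10% minority species.
for n_sites in (10, 100, 1000):
    n_minority = n_sites // 10
    print(n_sites, "sites:", comb(n_sites, n_minority), "arrangements")
```

    With tens of sites you can enumerate the configurations; with thousands, the count is astronomical, so brute-force sampling of disorder is hopeless without cheap surrogates.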

    Well, it turns out, these machine learning models not only scale in size, but can play this alchemical trick, as we call it, where when we want to simulate a 90/10 alloy of copper and silver, we can actually cheat and simulate atoms that are 90% silver.

    Sorry, 90% copper, 10% silver, and they themselves behave like a disordered alloy. And this recovers experimental trends that are nonlinear and unexpected, in ways that go beyond classical simulations.

    And then my last example is for when you say, all this theory is nice and dandy, Rafa, but at the end of the day, materials need to be made. Well, this is exactly where all these tools come together with robotization and automation in the lab, and what we call closing the loop, or autonomous research.

    And I have an example here on molecular dyes, but this is something that is sort of moving around through the field. We've also done polymer electrolytes and different labs do different classes of materials. You may hear more about it today.

    And at the end of the day, this is about, like I said, connecting this AI and computational intelligence to actual execution in the real world. That top robot was made at MIT and was in Nature in 2023, from Klavs Jensen's lab.

    That bottom robot is from Berkeley. The top robot makes molecules; the bottom robot makes solid powders. And this is sort of happening. Each of these machines costs multiple millions of dollars and takes multiple FTE-years to get set up. So, this is not trivial yet, and it's not commoditized yet, but it's happening.

    So in this example, we made a special neural network that blends theory and experiment to predict optical properties of dyes. So it's a place where we have tens of thousands of simulations, and we have tens of thousands of experiments. So we put them together, and we make this neural network that is very good at predicting optical properties.

    This is just a small piece, because then comes the robot, and like I said, big shout-out to Professor Jensen, because the robot itself, driven by all these pieces of AI talking to one another, achieves what we said. A generative model proposes molecules. A retrosynthesis model tells you how to make them. A master controller calls the robot to execute the experiments.

    We measure the properties of these online, all the properties that are amenable to this type of robotization. For other properties, if you need a synchrotron, you cannot do that in a robot. You need to go to the synchrotron.

    And then we update the models and do the same active learning or Bayesian optimization that Professor Boning just referred to, which is, we look at the models and they themselves identify the regions that combine the most informative new experiments to run, with the most optimistic outcomes you could get.

    The math of this is well-controlled. You need to decide how much you want to balance the two, but the math of how to balance exploration and exploitation is well understood. So the model itself decides: I really don't know what's going to happen if I do this, plus, this compound looks really performant according to the properties you care about.
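    One standard way to write that balance down is an upper-confidence-bound score: predicted value plus a tunable multiple of predicted uncertainty. This is a generic sketch, not the specific acquisition function used in the paper, and the candidate dyes and numbers are hypothetical:

```python
def ucb_score(pred_mean, pred_std, kappa=2.0):
    """Upper confidence bound: favor candidates that look good (mean)
    and candidates the model is unsure about (std). kappa sets the balance."""
    return pred_mean + kappa * pred_std

# Hypothetical candidates: (name, predicted property, model uncertainty).
candidates = [
    ("dye_A", 0.80, 0.05),  # good and well-understood
    ("dye_B", 0.60, 0.30),  # mediocre but very uncertain -> worth exploring
    ("dye_C", 0.75, 0.10),
]

best = max(candidates, key=lambda c: ucb_score(c[1], c[2]))
print(best[0])  # prints "dye_B": at kappa=2.0, exploration wins
```

    Lowering kappa shifts the choice toward exploitation: at kappa=0 the model simply picks the highest predicted value.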

    And in this particular case, we went around the merry-go-round three times, and hundreds of molecules were made, in this obviously collaborative Science paper.

    I will close by saying that we are, I feel, at peak excitement about AI for science in materials. You can see we're at the third wave. It was small molecules in the late 2010s. It's proteins peaking now.

    The main tools were-- the idea that this was going to work was put out in 2018, 2019, 2020. And now we're in the materials design peak. And you can see the startups. You can see, like I said, that Meta just put out a catalysis data set the day before yesterday. NVIDIA is making AI for chemistry.

    We're definitely at the sort of peak of interest in energy and sustainability. And I think manufacturing is the next stop. The digital twins are coming. I was on two proposals for a digital twin, though on the ones that didn't get it. But it's clear that it's coming. That's the next frontier.

    OK, how do we hook this not just to scientific research, but to real world scaling up? I'm excited to talk about that, too. And with that, I'll thank the team, and hopefully, there's a little bit of time for questions. Thank you.

    [APPLAUSE]

  • Video details
     
    Is AI Ready to Transform Chemistry and Materials Science?
    Rafael Gomez-Bombarelli
    Jeffrey Cheah Career Development Chair,
    Associate Professor, MIT Department of Materials Science and Engineering (DMSE)

    AI’s influence is undeniable in the digital realm, affecting consumers’ lives and corporate operations. Transferring these advancements to sectors producing physical goods, such as drug discovery and biotech, commodity chemicals, materials for energy and sustainability, and manufacturing, presents a thrilling prospect and a translational challenge. This talk will explore the present use cases and the potential of applying generative AI within the chemistry and materials domain. Unlike a large part of the tech sector, these industries are capital-intensive and cautious, meaning that AI must bridge an “execution gap” between the digital and physical realms for value generation. We will outline strategies to overcome current technical and cultural hurdles.

  • Interactive transcript
    Share

    RAFAEL GOMEZ-BOMBARELLI: And I realized I made a rookie mistake, because every title that is a question, hints that the answer is no, right? This applies to headlines, Betteridge's law of headlines. If you ask a question for a title, that means you're not sure.

    So, let me rephrase. Actually, yes, AI is ready to transform chemistry and material science. So, please take it on the affirmative, not as a question. And I'll give you some evidence of why I'm ready to say yes.

    You all use probably or experience GPT, right? If anything has changed over the last decade, it's things that touch machine learning and deep learning. We've really been seeing sort of transformational change at a scale that is hard to perceive, since we're inside it.

    But Go was solved. It was supposed to be an intractable combinatorial problem, and it was solved a decade ago. And you can see the examples of things that we thought required lots of human creativity and human talent, like making good images, turn out, they can be just interpolated from known images in ways that we really didn't expect, like I said, five to 10 years ago.

    And then in the physical sciences, or in the sciences, we've got our big examples over the last four to five weeks, when two Nobel prizes were awarded to AI technologies. One for the foundational work in physics, and another one for the AlphaFold protein structure prediction task.

    And this has been a combination of compute, algorithms, and data. Being able to deploy big data sets or make them, put them into these GPUs that NVIDIA makes, and that's why they're one of the most valuable companies in the world, and algorithms that are specific to the tasks.

    And I, for instance, will never write a cover letter in my life. I'm going to ask GPT to write a cover letter for me. And I suspect people that apply for jobs at your companies or my school are using GPT for cover letters.

    OK, we want that in material science and chemistry, right? We want to have AI tell us, well, this is an amazing drug. This is an absorbent that will capture CO2 out of the air. This is a battery electrolyte that will enable lithium metal batteries for aviation. OK?

    Well, this is kind of a slightly different than what AI has done in most other places, because we're asking it to be better than the best scientists has been so far, which is kind of a much harder task than just getting a picture of an astronaut on the moon.

    Now, we're asking for the best possible picture of an astronaut on the moon, one I want to patent and sell. And we're never going to have as much data as some of these other tasks. There is a lot more images in the internet than there are electrolytes, or etching materials.

    To our advantage perhaps, is that we know the rules of the game. Physics governs, or should govern everything we do in the physical sciences, meaning that there are underlying equations that we either know and should be able to exploit, or can assume are there, even if we cannot explicitly write them. Although that's not necessarily a foregone conclusion. I'll give you more.

    So what are the things that AI does today, and maybe your companies do, companies out there do? What we call structural property prediction. Just like you trust a machine vision algorithm to classify whether a picture is a cat or not, because there is plenty of cat data to train it on.

    If there is enough data, in the thousands, maybe tens of thousands of something, we can totally train neural networks that go from matter to property. And this could be toxicity of molecules, ionic conductivity of crystals, fluorescence of proteins. So, all of these materials, classes of chemical classes, have had neural networks made for them that allow structure property predictions.

    We can use AI for synthesis planning. Not just telling you, well, this is a good chemical, but also, to plan how to make it. In chemistry, because data has been very structured for a long time, the chemists are very laboriously collecting the way they make chemicals. There are big databases, and there are retrosynthesis as a product services today.

    So you can just go to the internet and look, chemical synthesis AI, and you will find vendors that will give you today the GPT version of chemical synthesis, where you say, how would I make this chemical, and the AI says, buy these precursors, put them together in this way, you will be able to make this molecule. That exists today. It's commoditized for molecules.

    For materials is a little bit tricky, because when you say materials, maybe you're thinking of steel, and maybe here, somebody else is thinking about nanoparticles for drug delivery. Materials are a way more heterogeneous class, where we really don't record synthesis in the same way. There's no one unique repository for how to make materials.

    Actually, make materials could mean casting for some people, and it could make mean sol-gel for other people. So, that's happening right now. How to have LLMs, large language models, read this literature and deploy back protocols to make stuff, is happening as we speak.

    And then in the same spirit as these protocols, there is LLMs or agents for orchestration, where individual pieces of AI with narrow focus tasks, coordinate one another to orchestrate a more complex function that we again, thought a human person would need to do.

    And there is example in chemistry, there is example in metal organic frameworks, which are materials, at the end of the day, where an agent reads a paper, another agent summarizes a recipe, maybe an agent talks to that robot to make the chemistry. I'll talk more about that.

    But in general, we're seeing sort of AI being the glue between different agents, some of which are AI and some of which, like I said, are a traditional simulator. I have a lot on simulations.

    Or maybe all of these are just Googling. Maybe it's an agent just goes and Googles, or goes to the patent database and checks if this new chemical has been registered by somebody else. So this orchestration is definitely happening as we speak.

    And then generative AI. I mean, we've seen it work for faces. In 2018, we already could make, the community could make photorealistic faces of made-up people. And over the years, we've been able to make chemically realistic versions of molecules to some degree, and definitely, definitely of proteins.

    Protein design is, we're seeing it peak right now with Xaira has raised $1 billion out of a Nobel Award. And David Baker, it's definitely happening right now. And materials, again, are we designing an analogy, are we designing soft matter? We've got a broader mandate, but the algorithms are coming along. As you can see from the dates of those papers, these things are happening right now.

    And then there is a place that is particularly interesting to me, which is the combination of AI and simulations, because it turns out, if we really know the physics, we can make arbitrary amounts of data. We can just simulate more. We'll pay more compute, but compute is infinitely scalable. You just rent another computer, get more data.

    So we can generate data in a way that most other tasks don't, and we have this very well-defined computational workflows that can benefit from AI. So there is, in my world, in atomistic simulations, when we look at atoms one resolution down from what Professor Boning was saying, typically, we get about three things.

    How well we're going to get the energies and the forces of how these atoms talk to one another. Typically, we do quantum mechanics, really, really expensive. How many atoms do we get to simulate? Typically, we're capped tens of thousands, hundreds of thousands, some Herculean efforts can do millions. But that's already rare.

    And because of the way we simulate the world in one femtosecond increments, it takes forever to simulate materials. So we typically stop at timescales of nanoseconds to microseconds. It's actually very hard to simulate something like protein folding that takes milliseconds.

    So all of these things are being actively researched by folks at MIT, my group, but also, Meta, DeepMind, Intel, and Nvidia. Both academic teams and tech companies are thinking about this fusion of AI and simulation. So like I said, ML potentials mean machine learning models that substitute quantum chemistry.

    Meta has released 110 million data points to train these models on, openly. They just put them out there for everybody to use. And I suspect there's more coming. DeepMind made them, but didn't release them. So, kudos to Meta for opening data. Boo to Google, for not opening.

    Accelerated MD, like I said, we can only get to see nanoseconds of microseconds of timescale. But if you're looking at a protein that is responsible for some disease, its evolution takes milliseconds, or maybe seconds. OK, well, how can we run this faster?

    We do this in my lab for battery materials. I want to understand how lithium moves inside a battery material, but I cannot wait to see the time scale in an atomistic simulation. So how can I accelerate, while preserving all the physics placed on using machine learning as a surrogate model for these simulators?

    And something that, again, we're pursuing actively, which is learning the types of continuum or semiempirical models that people parameterize with PDEs from the all-atom, fully resolved quantum mechanical evolution of systems.

    So I have a couple of specific examples, and maybe we'll relate to some of your interests. One is going beyond traditional simulations of surfaces. So if you've been close to material surfaces, you will know that theoreticians typically model material surfaces as perfectly perfect, amazing, pristine cut of the solid.

    Well, that's not how surfaces look like in the real world. At the moment, maybe a very low pressure when you're doing epitaxy, everything is amazing. But the moment things are happening, surfaces undergo reconstruction. Some atoms move around, they creep up on top of one another.

    If you're growing a material, the material is physically stacking up more layers, all of this is way beyond the traditional assumption of tiny unit cells that are perfectly pristine. So this is a place where machine learning can definitely help. And over the last--

    This was last year, put a paper, where we did the whole two parts. We need good energies and forces. So we train a machine learning potential on tens of thousands of quantum mechanical calculations, such that we can scale it to larger sizes and longer timescales.

    So we do AI to make our energies and forces better. And then we do AI to sample the phase space of these surfaces could look like. And this is a diagram of the amount of learning it takes. This is not something that people come in and click a button yet. This is an active area of research, but it's possible.

    So we can make surrogate models of quantum chemistry that are fast enough, scalable enough that in this particular example, we were able to recover the experimentally known reconstruction of a complex material, like a strontium titanium oxide.

    And agree with experiments on when as a function of the pressure of oxygen and as a function of the composition of the bulk, we can get different surface reconstructions. And the stoichiometry of the surface reconstructions, what atoms they're made of, we agreed with prior calculations.

    But how the atoms are arranged, we actually found more realistic, lower energy, more believable reconstructions that prior work had. Because AI can be more aggressive in exploring these spaces. So I would say surfaces used to be a big no-no for of an issue, accuracy. They're at play now in terms of growth, in terms of reconstruction, in terms of functionalization.

    Disorder is another challenge that traditionally has been hard to model in crystalline materials. The fact that the crystal itself, you might have a material that is perfectly ordered, it has atoms in a lattice. Fine. But then, which atoms sit on which lattice can be disordered.

    And again, if you're in a world where you can only simulate tens or hundreds of atoms, you can't really access disorder, because every pattern you choose, because the unit cell is so small and it's symmetrical, turns out to be ordered. You can't really simulate disorder with traditional methods.

    Well, over the last year, we have used the same machine learning potentials that I told you can do energies and forces, and they can do alchemy. They can interpolate between known elements, representing disorder. This is a little bit technical, but I would say one of the big Achilles' heels of materials discovery used to be that you're constrained to making small unit cells, and small unit cells cannot capture disorder.

    The fact that different atoms in an alloy, a high-entropy alloy, for instance, where many elements can be distributed randomly over different positions of the lattice, cannot be modeled with traditional simulation methods. It's very expensive.

    Well, it turns out these machine learning models not only scale in size, but can play what we call an alchemical trick: when we want to simulate a 90/10 alloy of copper and silver, we can actually cheat and simulate atoms that are themselves 90% copper and 10% silver, and they behave like a disordered alloy. And this recovers experimental trends that are nonlinear and unexpected, in ways that go beyond classical simulations.
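    At its simplest, the alchemical trick amounts to composition-weighted averaging. This is a toy sketch with made-up element feature vectors (the real models do this inside a learned interatomic potential): a 90% copper / 10% silver site is represented as the weighted average of the two element vectors, so one fractional "average atom" stands in for a disordered lattice site.

    ```python
    import numpy as np

    # Invented per-element features for illustration:
    # atomic number, atomic radius (Å), electronegativity.
    elements = {
        "Cu": np.array([29.0, 1.28, 1.90]),
        "Ag": np.array([47.0, 1.44, 1.93]),
    }

    def alchemical_site(composition):
        """Composition-weighted 'average atom' feature vector."""
        total = sum(composition.values())
        return sum((w / total) * elements[el] for el, w in composition.items())

    # One fractional atom representing a disordered 90/10 Cu/Ag site.
    site = alchemical_site({"Cu": 0.9, "Ag": 0.1})
    ```

    Because the averaged site varies smoothly with composition, a model built this way can sweep alloy fractions continuously instead of enumerating huge ordered supercells.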

    And then my last example: all this theory is fine and dandy, Rafa, but at the end of the day, materials need to be made. Well, this is exactly where all these tools come together with robotization and automation in the lab, in what we call closing the loop, or autonomous research.

    And I have an example here on molecular dyes, but this is something that is moving through the field. We've also done polymer electrolytes, and different labs do different classes of materials. You may hear more about it today.

    And at the end of the day, this is about, like I said, connecting this AI and computational intelligence to actual execution in the real world. That top robot was made at MIT and was in Nature in 2023, from Klavs Jensen's lab.

    That bottom robot is from Berkeley. The top robot makes molecules; the bottom robot makes solid powders. And this is happening. Each of these machines costs multiple millions of dollars and takes multiple person-years to get set up. So this is not trivial yet, and it's not commoditized yet, but it's happening.

    So in this example, we made a special neural network that blends theory and experiment to predict the optical properties of dyes. It's a place where we have tens of thousands of simulations and tens of thousands of experiments, so we put them together and make this neural network that is very good at predicting optical properties.
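    The talk doesn't spell out the architecture, but one common way to blend cheap theory with experiment is "delta learning": fit a small model to the residual between experimental and simulated values, then predict as simulation plus learned correction. This is a generic sketch of that idea with synthetic data, not the paper's actual network.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic data: a molecular descriptor, a simulated absorption
    # peak (nm), and experiments that differ by a systematic theory error.
    n = 500
    descriptor = rng.uniform(0.0, 1.0, size=n)
    simulated = 400.0 + 150.0 * descriptor
    experimental = simulated + 20.0 + 5.0 * descriptor

    # Learn the correction: experimental - simulated ≈ a * descriptor + b.
    a, b = np.polyfit(descriptor, experimental - simulated, 1)

    def predict(feat, sim):
        # Blended prediction: cheap simulation plus learned correction.
        return sim + a * feat + b
    ```

    The appeal is data efficiency: the residual is usually smoother and smaller than the property itself, so far fewer experiments are needed than for a purely empirical model.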

    This is just a small piece, because then comes the robot, and like I said, a big shout-out to Professor Jensen, because the robot itself, driven by all these pieces of AI talking to one another, achieves what we said: a generative model proposes molecules, a retrosynthesis model tells you how to make them, and a master controller calls the robot to execute the experiments.

    We measure the properties of these online, that is, all properties that are amenable to this type of robotization. Other properties you cannot do in a robot: if you need a synchrotron, you need to go to the synchrotron.

    And then we update the models and do the same active learning, or Bayesian optimization, that Professor Boning just referred to: we look at the models, and they themselves identify the regions that combine the most informative new experiments to run with the most optimistic outcomes you could get.

    The math of this is well controlled. You need to decide how much you want to balance the two, but the math of how to balance exploration and exploitation is well understood. So the model itself decides: I really don't know what's going to happen if I do this, plus this compound looks really, really performant according to the properties you care about.
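    That exploration/exploitation balance can be made concrete with an upper confidence bound (UCB) acquisition, one standard choice among several; the candidate scores, uncertainties, and the weight kappa below are invented for illustration.

    ```python
    import numpy as np

    mu = np.array([0.60, 0.85, 0.40, 0.70])     # "this compound looks performant"
    sigma = np.array([0.05, 0.02, 0.30, 0.25])  # "I really don't know what happens"
    kappa = 2.0                                  # how much to reward uncertainty

    # UCB: predicted performance plus a bonus for being uncertain.
    acquisition = mu + kappa * sigma
    next_experiment = int(np.argmax(acquisition))  # candidate chosen to run next
    greedy_choice = int(np.argmax(mu))             # what pure exploitation would pick
    ```

    With these numbers, the greedy choice and the UCB choice differ: the model passes over the current best predicted compound in favor of an uncertain one that could be either very informative or very good.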

    And in this particular case, we went around the merry-go-round three times, and hundreds of molecules were made in this obviously collaborative science paper.

    I will close by saying that we are, I feel, at peak excitement about AI for science in materials. You can see we're at the third wave: it was small molecules in the late 2010s, and it's been proteins, peaking now.

    The main tools were-- the idea that this was going to work was put out in 2018, 2019, 2020. And now we're in the materials design peak. And you can see the startups. You can see, like I said, that Meta just put out a catalysis data set the day before yesterday, and NVIDIA is making AI for chemistry.

    We're definitely at the peak of interest in energy and sustainability, and I think manufacturing is the next stop. It's coming: the digital twins. I was on two proposals for the digital twin, but they were among the ones that didn't get it. But it's clear that it's coming. That's the next frontier.

    OK, so how do we hook this not just to scientific research, but to real-world scaling up? I'm excited to talk about that, too. And with that, I'll thank the team, and hopefully there's a little bit of time for questions. Thank you.

    [APPLAUSE]