2024 MIT Digital Technology and Strategy Conference: Lightning Talk - Themis AI

-
Video details
Know What Your Model Does Not Know
Stewart Jamieson
Head of Technology, Themis AI
-
Interactive transcript
STEWART JAMIESON: Great. Hi everyone, I'm Stewart, MIT PhD. I actually just graduated in March, and I'll be showing you the technology we developed at CSAIL to make AI reliable.
But first, a bit about Themis. We're an MIT spin-off from CSAIL. Our founding team includes Professor Daniela Rus, who's the director of CSAIL, as well as Dr. Alexander Amini and Elaheh Ahmadi. We're post-seed, and we currently have about eight employees.
One of our beta customers is one of the largest oil and gas companies in the world. They've been using our product for about a year, and they've been thrilled with it. And so our product Capsa helps you to know what your model does not know. And I'm going to explain why that matters and how you can benefit.
So globally, what motivates us? AI has a trust problem. We've seen a number of public failures or embarrassments of AI systems making mistakes. We know AI models can be more reliable and should be more reliable. And so we've developed Capsa as a way to take any AI model and help you to know when you can trust it.
So we can give you the uncertainty or the confidence of every single output generated by any model. And using these, we can help you to reduce time to market, reduce development costs. And we've designed it to be as easy to integrate into your existing workflows as possible.
So, first, just backing up for a moment, what is uncertainty and why should you care about it? So here, we took an example data set. So this is from vision. We have a multimodal solution.
And so we took an obstacle detection network used for self-driving vehicles. And this was trained on urban driving data. So on the left, there's an example of an image from a data set that this was trained on. And the middle is an example of an output from this model that's designed to look for obstacles in those images.
And so it picked up here on the cars because it was trained to do that. But we injected this deer into the image, which doesn't appear anywhere else in the training data because it's only urban scenarios. And we can see that the model does not pick up on that as an obstacle.
Now, with Capsa, you get this confidence for every single output generated by your model. So we can see in that highlighted region there's a boost in uncertainty, because Capsa knows your model has never seen anything like that before, and so you can't rely on your model's outputs in that region of the image.
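As a minimal sketch (not Capsa's actual interface), here is how a downstream planner might combine a detector's obstacle mask with a per-pixel uncertainty map like the one described; the array names and the 0.8 threshold are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch: treat high-uncertainty regions as potential obstacles,
# even if the detector itself reported nothing there. The threshold and
# array shapes are made up for this example.

def safe_obstacle_mask(obstacle_mask: np.ndarray,
                       uncertainty_map: np.ndarray,
                       max_uncertainty: float = 0.8) -> np.ndarray:
    """Union of detected obstacles and highly uncertain (unfamiliar) regions."""
    unknown_region = uncertainty_map > max_uncertainty
    return obstacle_mask.astype(bool) | unknown_region

# Toy example: no detections, but one patch (the "deer") is highly uncertain.
detections = np.zeros((4, 4))
uncertainty = np.zeros((4, 4))
uncertainty[1:3, 1:3] = 0.95
print(safe_obstacle_mask(detections, uncertainty))
```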
And so in this case, we can tell you maybe don't go there anyway. There might be an obstacle there. Now, this is a multimodal solution. I can also show you an example with LLMs.
So I'm sure you're all very familiar with hallucinations. We can try to prompt hallucinations by asking misleading questions. So here we asked an LLM, "What is the average size of tiger eggs?" And for each token in the response, Capsa tells us the confidence or the uncertainty. So here we're highlighting, in different shades of red, like a heat map, the risk or uncertainty in those tokens.
And so as we generate the response, we can check the risk or uncertainty level of that response and use that to detect hallucinations or unreliable outputs. And if it exceeds some kind of maximum risk level, we can use this to, say, block the answer before it ever goes to a user, if you're in a chatbot scenario, for example.
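A minimal sketch of that gating logic, assuming something upstream already produces (token, risk) pairs; the answer_or_block helper and the 0.7 threshold are hypothetical, not part of Capsa's API:

```python
from typing import Iterable, Tuple

MAX_RISK = 0.7  # assumed application-specific threshold, not a Capsa default

def answer_or_block(token_risks: Iterable[Tuple[str, float]]) -> str:
    """Return the generated text, or a refusal if any token is too risky."""
    tokens, risks = zip(*token_risks)
    if max(risks) > MAX_RISK:
        # Block the hallucination-prone answer before it reaches the user.
        return "I'm not confident enough to answer that."
    return "".join(tokens)

# Made-up (token, risk) pairs for the tiger-egg prompt; in practice these
# would come from whatever wrapper scores each generated token.
scored = [("The", 0.2), (" average", 0.85), (" tiger", 0.6), (" egg", 0.95)]
print(answer_or_block(scored))
```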
And now here's another example with LLMs. This was actually Llama 3, and we asked it for a book with characters from MIT. With base Llama 3, you just get this answer. You probably wouldn't know if you haven't read the book recently, but there are no characters from MIT in Ender's Game.
But with Capsa, we get these confidence values for every token. Here, we're averaging over phrases. We can see that the LLM is very confident in its description of the book, but it's not at all confident in its answer to the question, that passage at the end about characters from MIT. So we can not only help you recognize hallucinations, but also show you where in the output the hallucinations are most likely to appear.
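A minimal sketch of the phrase-level aggregation just described, using made-up risk scores rather than real Capsa outputs:

```python
from statistics import mean

def phrase_risks(phrases):
    """phrases: list of (phrase_text, per_token_risks). Returns mean risk per phrase."""
    return [(text, mean(risks)) for text, risks in phrases]

# Invented numbers for the Llama 3 / Ender's Game example: the plot summary
# scores low risk, the claim about MIT characters scores high.
example = [
    ("Ender's Game follows a gifted child trained for war...", [0.10, 0.12, 0.08]),
    ("Several of its main characters are students at MIT.", [0.82, 0.90, 0.87]),
]
for text, risk in sorted(phrase_risks(example), key=lambda p: p[1], reverse=True):
    print(f"{risk:.2f}  {text}")
```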
And so we believe that these capabilities should be standard in all AI systems, and that's exactly what Capsa does. The examples I've been going over are primarily in quality control: we're helping you reduce hallucinations and ensure the accuracy of your AI models.
But this uncertainty quantification has a number of other applications. For example, data cleaning and data curation: we can help you find mislabeled data, because uncertainty tends to highlight those examples. We can also help you select new data to improve your models, helping you find the most valuable training data.
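A minimal sketch of that data-curation workflow, assuming each training example has already been assigned an uncertainty score; the example records and scores are invented for illustration:

```python
# Made-up uncertainty scores; in practice they would come from the wrapped model.
examples = [
    {"id": "img_001", "uncertainty": 0.12},
    {"id": "img_042", "uncertainty": 0.91},  # likely mislabeled or a rare case
    {"id": "img_107", "uncertainty": 0.47},
]

# Surface the most uncertain samples first, for relabeling or for collecting
# similar data to add to the training set.
review_queue = sorted(examples, key=lambda e: e["uncertainty"], reverse=True)
for ex in review_queue[:2]:
    print(ex["id"], ex["uncertainty"])
```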
We can also help to eliminate bias; we actually have some published work on de-biasing AI models using this capability. And then anomaly detection: when your systems are in production, we can help you find potential errors, avoid them, and mitigate failures in real time.
So we're here to look for partnerships and more proofs of concept. So please reach out to us if you'd like to see more specific use case demos; we have some for medical insurance, for example. And I'd like to highlight that it works for any architecture, any model, any industry.
Please do reach out to us. We're happy to work with you if you're interested as a customer, partner, distributor, and so on. So, great, and we'll be in the next room. Thanks.