AI Generated Synthetic Data for Testing Software Applications: Kalyan Veeramachaneni

Conference Video | Duration: 39:02
April 2, 2025
  • Video details

    AI Generated Synthetic Data for Testing Software Applications

    Kalyan Veeramachaneni
    Principal Research Scientist, MIT Laboratory for Information and Decision Systems

    Our world runs on software applications. More and more of these software applications are data-driven; that is, the logic of the application depends on the data that comes in, which determines which pathway is taken through the application during run time. In order to test these applications, developers need data. Currently, their options are to wait to get access to production data, to create fake data using Faker or test data management tools, or to manually generate data. 

    We set out to test an alternative: we wanted to see whether AI-generated synthetic data could improve the supply of test data. This new paradigm involves training a generative AI model on a very small subsample of the production data. Once the model is trained, the developer can port it to different environments, sample as much data as they want, and even sample data that fits specific conditions.
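
    The fit-then-sample workflow described above can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the `ToySynthesizer` class, the `plan` and `monthly_spend` columns, and the per-category Gaussian are hypothetical stand-ins, not the models or tools discussed in the talk. Conditional sampling is done by simple rejection, which a real system would likely replace with something more efficient.

```python
import random
import statistics


class ToySynthesizer:
    """Toy stand-in for a generative model (hypothetical, for illustration only).

    Models one categorical column by its observed frequencies and one numeric
    column as a Gaussian fitted separately within each category.
    """

    def fit(self, rows, cat_col, num_col):
        self.cat_col, self.num_col = cat_col, num_col
        groups = {}
        for row in rows:
            groups.setdefault(row[cat_col], []).append(row[num_col])
        self.categories = list(groups)
        # Category frequencies observed in the subsample.
        self.weights = [len(groups[c]) for c in self.categories]
        # Mean and population standard deviation per category.
        self.params = {
            c: (statistics.mean(v), statistics.pstdev(v)) for c, v in groups.items()
        }
        return self

    def sample(self, n, condition=None, seed=None, max_tries=100_000):
        """Draw up to n synthetic rows; keep only rows where condition(row) is true.

        Uses rejection sampling, so it may return fewer than n rows if the
        condition is rarely satisfied within max_tries draws.
        """
        rng = random.Random(seed)
        out = []
        for _ in range(max_tries):
            if len(out) == n:
                break
            cat = rng.choices(self.categories, weights=self.weights)[0]
            mean, std = self.params[cat]
            row = {self.cat_col: cat, self.num_col: rng.gauss(mean, std)}
            if condition is None or condition(row):
                out.append(row)
        return out


# A very small (made-up) subsample standing in for production data.
subsample = [
    {"plan": "free", "monthly_spend": 0.0},
    {"plan": "free", "monthly_spend": 0.0},
    {"plan": "pro", "monthly_spend": 48.0},
    {"plan": "pro", "monthly_spend": 52.0},
    {"plan": "enterprise", "monthly_spend": 900.0},
]

model = ToySynthesizer().fit(subsample, "plan", "monthly_spend")

# Sample as much data as needed, independent of the subsample size.
test_rows = model.sample(1000, seed=0)

# Sample only rows that fit a specific condition.
pro_rows = model.sample(50, condition=lambda r: r["plan"] == "pro", seed=1)
```

    Because the fitted object holds only summary statistics rather than the original rows, it can be moved between environments and sampled there; that is the property this sketch is meant to show, under the simplifying assumptions stated above.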

    Training a generative AI model to create realistic data for enterprise-grade applications required a number of foundational developments, as we needed to improve generative AI’s ability to create realistic data and to handle the complexities that come with enterprise data. This included incorporating the ability to model relational databases, to capture and model data patterns pertaining to business logic, to model data based on context not available in data schemas, and to address a sprawl of data types. In this talk, I will cover how we are revolutionizing generative AI models so that they can produce data for enterprise-grade applications and datasets, and go over some recent success stories. 

    I will also cover another important aspect of generative AI that is often overlooked. Adopting generative AI algorithms at an enterprise level will require human involvement: they are not as automatic as we think, and they require new innovations and deliberate, value-driven planning in order to succeed in this environment. I will also cover how MIT plays a unique role in an enterprise's AI adoption journey. In a field crowded with analysts, media, influencers, and numerous other avenues that, while educational, are far from where the rubber hits the road, this talk is an opportunity to get real about this technology and what it can do.
