Tapping Big Data for Disciplined Decisions
Bart Van Parys, assistant professor in operations research and statistics at MIT's Sloan School of Management, develops advanced analytics for operations research.
Airline flights and chemical plants are far too complicated and fast-moving for humans to schedule, so giant operations such as these depend on optimization algorithms. But are those algorithms really close to optimal, and how well will they respond to surprises or other stresses?
Such puzzles are addressed by Bart Van Parys, assistant professor in operations research and statistics at MIT's Sloan School of Management, who develops advanced analytics for operations research.
"Our bread and butter is decision-making in a disciplined way," Van Parys says. "It's not decision-making as you think about it in a boardroom where people decide out of three possibilities. With our methods, you try to let the computer run past all decisions in a way that is fast and leads to decisions that are actually sensible."
"But there's always a human in the loop," he emphasizes, "because ultimately, all decisions are based on design criteria, and you decide which criteria to include in your model."
Van Parys fell in love with the use of algorithms and math to solve engineering problems, he says, while studying electrical engineering as an undergraduate at KU Leuven in Belgium. After earning a PhD in control engineering at the Swiss Federal Institute of Technology in Zurich, he focused on operations research as a postdoc at the Sloan School, after which he joined the faculty there.
Traditionally, operations research applies scientific methods to decision-making by building and then optimizing models of processes and operations. "My research is moving away from using those models to using operational data, which we now can collect much more cheaply than we could 10 years ago," Van Parys says. "The hope is that you now can make decisions that are either more personalized or more immediate."
But models built on operational data can come with downsides. For instance, the data might be noisy or corrupted. "My research focuses on how to make decisions that are safeguarded against such bad data," he says.
"You might think that this is not a very big problem,” he acknowledges. “But many more decisions are being made by data now, in self-driving cars and other uses. And many of our most powerful systems are very susceptible to data corruptions. For example, some image-based tools are very sensitive to even just changing a few pixels; they can go from being very great to being very bad."
He employs mathematical techniques known as "robust optimization" to keep models from being hypersensitive to parameters that aren't well understood. His algorithms try to solve each problem exactly if possible, or guarantee an acceptable solution if not. In airline operations, for instance, if a delay in one flight can disrupt the whole schedule, the scheduling model must be adjusted for greater robustness.
"Of course, there's a cost to robustness," Van Parys says. "In a supply chain, for instance, it's a tradeoff between how robust you want to be and how much more money you want to put into keeping extra stock in the stores or scheduling a bit less tightly. But my research shows that typically adding a little bit of robustness is not that expensive and can have a very big effect."
Van Parys also develops enhanced ways to analyze high-dimensional data sets where very little of the data is truly useful. "You might think, the more the merrier," he says. "But if only a tiny fraction of that data is relevant, you must filter out the actual signal from all of this noise. And that is a quite challenging problem."
He uses "integer optimization" methods to find out which data points are relevant and which must be ignored to avoid potentially nonsensical decisions.
One case in point is cancer genomics, where analyses looking for the mutations that drive rare tumors might measure 10,000 genes for every patient. "Out of those 10,000 genes, maybe 20 to 30 might be relevant," Van Parys says. "The catch is that you don't know which ones. You have to find out of this enormous haystack the needle of those 30 that actually matter."
This is no small task; there might be as many potential combinations of genes as there are atoms in the universe, he says. Fortunately, methods are being developed that can do the job without explicitly enumerating all possibilities—in fact, finding solutions much faster than many experts would have expected.
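The size of that haystack is easy to check. The short sketch below counts the ways to pick 30 genes out of 10,000, using the figures from the genomics example. The comparison value of 10^80 is an assumption here: it is a commonly cited rough estimate for the number of atoms in the observable universe, not a figure from the article.

```python
import math
from itertools import combinations


def count_subsets(n_features, k):
    """Number of distinct ways to choose k features out of n_features."""
    return math.comb(n_features, k)


# Sanity check on a tiny case, where listing every subset is still feasible.
assert sum(1 for _ in combinations(range(6), 2)) == count_subsets(6, 2)

n_genes, n_relevant = 10_000, 30
total = count_subsets(n_genes, n_relevant)

# Rough, commonly cited estimate (an assumption, not from the article).
ATOMS_IN_UNIVERSE = 10**80

print("digits in the subset count:", len(str(total)))
print("more subsets than atoms:", total > ATOMS_IN_UNIVERSE)
```

The count comes out on the order of 10^87, which is why any practical method has to rule out most candidate subsets without ever listing them.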
"You can solve such problems rather quickly on modern hardware with modern algorithms," he says. "We hope that with these improvements we can push the boundaries on problems in learning with high-dimensional data that we don't know how to solve nowadays."
Among his corporate collaborations, Van Parys has worked with General Motors to analyze how options such as airbags and automated steering can help to meet a design goal of zero deaths from car accidents.
He has also worked with electrical power suppliers to make power distribution more robust, a growing concern in the United States.
"Most power schedules nowadays are robust in the sense that if you take any power line out, the system should recover," Van Parys says. "The problem is cascades; five power lines might all go out at once in a storm. It's very costly to prevent against this. Especially in a free market, there's an incentive to have less and less robustness, simply because it might be more expensive. And 99% of the time, it's okay."
His most successful industry projects begin with a clearly defined goal. "People come to me and say, 'We have this data, we have this problem, we just don't know how to connect the tool,'" he says.
Operations research is never a one-shot effort where you design a model, optimize the design and simply implement the solution. "There's a lot of back and forth," says Van Parys. "Ultimately you converge to something everyone is happy with; ultimately you have to convince management that whatever you do is sensible."
At MIT, his students quickly grasp the essentials of this process. In their capstone projects with industry, the students then take on tough real-world puzzles.
"We can let the computer do the hard work and we can worry about the big picture, which frees up a lot of possibilities," Van Parys emphasizes.
"That being said, decisions based on such algorithms and data must be fair in some sense," he adds. "With more and more decisions being made this way, we are responsible to make sure that whatever we propose doesn't lead to bad decisions. We try to instill in every one of our students that with great power comes great responsibility."