

ILP Institute Insider

February 2, 2015

Better Machine Learning

When Kalyan Veeramachaneni joined the Any Scale Learning For All (ALFA) group at MIT’s CSAIL as a postdoc in 2010, he worked on large-scale machine learning platforms that enable the construction of models from huge data sets. “The question then was how to decompose a learning algorithm and data into pieces, so each piece could be locally loaded into different machines and several models could be learnt independently,” says Veeramachaneni, currently a research scientist at ALFA.
“We then had to decompose the learning algorithm so we could parallelize the compute on each node,” says Veeramachaneni. “In this way, the data on each node could be learned by the system, and then we could combine all the solutions and models that had been independently learned.”
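The decompose-and-combine idea described above can be sketched in a few lines. In this illustrative example (not ALFA's actual platform), each data partition stands in for a machine, the locally learned "model" is a set of class centroids, and the local models are merged exactly by count-weighted averaging:

```python
def fit_local(partition):
    """Learn class centroids from one partition of (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in partition:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: ([v / counts[y] for v in sums[y]], counts[y]) for y in sums}

def merge(models):
    """Combine independently learned local models by count-weighted averaging."""
    sums, counts = {}, {}
    for model in models:
        for y, (centroid, n) in model.items():
            acc = sums.setdefault(y, [0.0] * len(centroid))
            for i, c in enumerate(centroid):
                acc[i] += c * n
            counts[y] = counts.get(y, 0) + n
    return {y: [v / counts[y] for v in sums[y]] for y in sums}

def predict(centroids, x):
    """Nearest-centroid classification with squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))
```

A centroid model merges exactly; most learners (neural networks, decision trees) do not, which is why decomposing a general learning algorithm for parallel execution is the harder research problem.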

By 2013, once ALFA had built multiple platforms to accomplish these goals, the team started on a new problem: the growing bottleneck caused by the process of translating the raw data into the formats required by most machine learning systems.

“Most machine learning systems usually require a covariates table in a column-wise format, as well as a response variable that we try to predict,” says Veeramachaneni. “The process to get these from raw data involves curation, syncing and linking of data, and even generating ideas for variables that we can then operationalize and form.”
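A toy version of that raw-data-to-covariates step might look like the following. The event records, column names, and aggregations are invented for illustration; real curation involves far more linking, syncing, and cleaning:

```python
raw_events = [
    {"user": "u1", "action": "login",    "duration": 30},
    {"user": "u1", "action": "purchase", "duration": 120},
    {"user": "u2", "action": "login",    "duration": 10},
]
outcomes = {"u1": 1, "u2": 0}  # the response variable we try to predict

def to_covariates(events, outcomes):
    """Aggregate raw events into one row (covariates + response) per user."""
    rows = {}
    for e in events:
        r = rows.setdefault(e["user"], {"n_events": 0, "total_duration": 0})
        r["n_events"] += 1
        r["total_duration"] += e["duration"]
    return [{"user": u, **r, "response": outcomes[u]}
            for u, r in sorted(rows.items())]
```

The output is the column-wise covariates table the quote describes: one row per entity, one column per variable, plus the response.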

Much of Veeramachaneni’s recent research has focused on how to automate this lengthy data prep process. “Data scientists go to all these boot camps in Silicon Valley to learn open source Big Data software like Hadoop, and they come back, and say ‘Great, but we’re still stuck with the problem of getting the raw data to a place where we can use all these tools,’” says Veeramachaneni.

Veeramachaneni and his team are also exploring how to efficiently integrate the expertise of domain experts, “so it won’t take up too much of their time,” he says. “Our biggest challenge is how to use human input efficiently, and how to make the interactions seamless and efficient. What sort of collaborative frameworks and mechanisms can we build to increase the pool of people who participate?”

GigaBeats and BeatDB
One project in which Veeramachaneni tested his automated data prep concepts was ALFA’s GigaBeats project. GigaBeats analyzes arterial blood pressure signals from thousands of patients to predict a patient’s future condition. With GigaBeats, numerous steps are involved in preparing the data for analysis, says Veeramachaneni. These include cleaning and conditioning, low-pass filtering, and extracting features by applying signal-level transformations.
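Those prep steps can be illustrated on a toy pressure signal: discard obviously bad samples, apply a simple moving-average low-pass filter, then extract summary features. The bounds, window size, and feature set here are assumptions for illustration; GigaBeats itself operates on far richer per-beat representations:

```python
def clean(signal, lo=20.0, hi=300.0):
    """Discard physiologically implausible samples (assumed mmHg bounds)."""
    return [s for s in signal if lo <= s <= hi]

def low_pass(signal, window=3):
    """Moving-average filter as a crude low-pass stage."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def features(signal):
    """Signal-level transformations: a few per-segment summary statistics."""
    return {
        "mean": sum(signal) / len(signal),
        "max": max(signal),
        "min": min(signal),
        "range": max(signal) - min(signal),
    }
```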

Many of these steps involve human decision-making. Often, domain experts know how to make these choices, but sometimes they fall to the computer scientist. In either case, there’s no easy way to go back and revisit those human interventions when a choice made earlier in the pipeline does not result in the expected level of predictive accuracy, says Veeramachaneni.

To automate and accelerate data translation, while also enabling visibility into earlier decision-making, ALFA has recently built a “complete solution” called BeatDB, shrinking the prep time from months to a few days.

“With BeatDB, we have tunable parameters that in some cases can be input by domain experts, and the rest are automatically tuned,” says Veeramachaneni. “From this, we can learn how decisions made at the low-level, raw representation stage can impact the final predictive accuracy. This deep mining solution combines all layers of machine learning into a single pipeline and then optimizes and tunes with other machine learning algorithms on top of it. It really enables fast discovery.”
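The tuning loop described in the quote can be sketched generically: try candidate values for the low-level parameters and keep the combination that maximizes the pipeline’s final predictive score. The grid-search strategy, parameter names, and scorer below are stand-ins, not BeatDB’s actual internals:

```python
import itertools

def tune(param_grid, evaluate):
    """Return the parameter combination with the best downstream score.

    param_grid: dict mapping parameter name -> list of candidate values.
    evaluate:   callable taking a {name: value} dict and returning a score,
                e.g. end-to-end predictive accuracy of the full pipeline.
    """
    best, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```

The point of the quote is that `evaluate` measures the *end* of the pipeline, so low-level choices (a filter window, a threshold) are judged by their effect on final accuracy, not in isolation.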

Now that ALFA has made progress on integrating and recording human input, the group is also looking for better ways to present the processed data. For example, when showing GigaBeats data to medical professionals, “they are often much more comfortable if a better representation is given to them instead of showing them raw data,” says Veeramachaneni. “It makes it easier to provide input. A lot of our focus is on improving the presentation so we can more easily pull their input into our algorithms, clean or fix the data, or create variables.”

A Crowdsourcing Solution
While automating ALFA’s machine learning pipelines, Veeramachaneni has also contributed to a number of real-world analytics projects. Recently, he has been analyzing raw click data from MOOCs (massive open online courses) in hopes of improving courseware. The initial project is to determine stop-out (drop-out) rates based on online click behavior.

“The online learning platforms record data coming from the interaction of hundreds of thousands of learners,” says Veeramachaneni. “We are now able to identify variables that can predict stop-out on a single course. The next stage is to reveal the variables of stop-out and show how to improve the course design.”

The first challenge in the MOOC project was to organize the data. There are multiple data streams in addition to clickstream data, and they are usually spread over multiple databases and stored in multiple formats. Veeramachaneni has standardized these sources, integrating them into a single database called MOOCdb. “In this way, software written on top of the database can be re-used,” says Veeramachaneni.
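A toy illustration of that standardization step: click events arriving in two different source formats are mapped into one common schema, so everything downstream can be written once against that schema. The field names here are invented; the real MOOCdb schema is far richer:

```python
def from_platform_a(rec):
    """Map platform A's record layout into the common schema."""
    return {"student": rec["user_id"], "ts": rec["timestamp"], "event": rec["type"]}

def from_platform_b(rec):
    """Map platform B's record layout into the common schema."""
    return {"student": rec["learner"], "ts": rec["time"], "event": rec["action"]}

def standardize(records_a, records_b):
    """Merge both sources into one time-ordered stream in the common schema."""
    rows = [from_platform_a(r) for r in records_a] + \
           [from_platform_b(r) for r in records_b]
    return sorted(rows, key=lambda r: r["ts"])
```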

The next challenge is to decide what variables to look at. ALFA has explored all sorts of theories about MOOC behavior. For example, if a student is studying early in the morning, he or she is more likely to stay in the course. Another theory is based on dividing the time spent on the course by how many problems a student gets right. But, Veeramachaneni says, “If I’m trying to predict stop-out, there’s no algorithm that automatically comes up with the behavioral variables that influence it. The biggest challenge is that the variables are defined by humans, which creates a big bottleneck.”
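Operationalizing the two hypothesized variables above might look like this on a per-student basis: the share of study time in the early morning, and total time spent divided by problems answered correctly. The event fields and the definition of "early morning" are illustrative assumptions:

```python
def stopout_features(events):
    """Per-student behavioral variables from a list of session events.

    events: list of {"hour": 0-23, "minutes": float, "correct": bool},
    one entry per study session or problem attempt (illustrative schema).
    """
    total_min = sum(e["minutes"] for e in events)
    early_min = sum(e["minutes"] for e in events if 5 <= e["hour"] < 9)
    n_correct = sum(1 for e in events if e["correct"])
    return {
        "early_morning_share": early_min / total_min if total_min else 0.0,
        "minutes_per_correct": total_min / n_correct if n_correct else float("inf"),
    }
```

As the quote notes, no algorithm proposes these definitions automatically; humans hypothesize the variable, and a script like this merely extracts it.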

They turned to crowdsourcing “to tap into as many people as we can,” says Veeramachaneni. “We have built a crowdsourcing platform where people can submit an idea against problems such as stop-out. Another set of people can operationalize that, such as by writing a script to extract that variable on a per-student basis.”

This research could apply to a number of domains where analysts are trying to predict human behavior based on captured data, such as fraud detection, says Veeramachaneni. Banks and other companies are increasingly analyzing their transaction databases to try to determine whether the person doing the transaction is authentic.

“One variable would be how far the transaction happened from the person’s home, or how the amount compares to the total that was spent by the person over the last year,” says Veeramachaneni. “Coming up with these ideas is based on very relatable data with which we can all identify. So crowdsourcing could be helpful here, too.”
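The two fraud variables quoted above are easy to sketch in code. The transaction fields, the cardholder's home coordinates, and the use of a haversine distance are all illustrative assumptions:

```python
import math

def fraud_features(txn, home, yearly_total):
    """Two illustrative fraud variables for one transaction.

    txn:  {"lat": ..., "lon": ..., "amount": ...} (assumed schema)
    home: (lat, lon) of the cardholder's home address
    """
    # Rough great-circle distance in km (haversine formula).
    r = 6371.0
    lat1, lon1 = map(math.radians, home)
    lat2, lon2 = math.radians(txn["lat"]), math.radians(txn["lon"])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return {
        "km_from_home": 2 * r * math.asin(math.sqrt(a)),
        "amount_vs_yearly": txn["amount"] / yearly_total if yearly_total else 0.0,
    }
```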

Research News

February 24, 2015

Quick test for Ebola

When diagnosing a case of Ebola, time is of the essence. However, existing diagnostic tests take at least a day or two to yield results, preventing health care workers from quickly determining whether a patient needs immediate treatment and isolation.

A new test from MIT researchers could change that: The device, a simple paper strip similar to a pregnancy test, can rapidly diagnose Ebola, as well as other viral hemorrhagic fevers such as yellow fever and dengue fever.

“As we saw with the recent Ebola outbreak, sometimes people present with symptoms and it’s not clear what they have,” says Kimberly Hamad-Schifferli, a visiting scientist in MIT’s Department of Mechanical Engineering and a member of the technical staff at MIT’s Lincoln Laboratory. “We wanted to come up with a rapid diagnostic that could differentiate between different diseases.”

MIT Sloan
Management Review

February 11, 2015

The Motivating Power of Team Collaborations

“Freedom from conformity was a welcome change that enabled [employee] creativity to flourish,” write Hari Kumar and Satish Raghavendran, in the Fall 2014 issue of MIT Sloan Management Review.

In their article “Bringing Fun and Creativity to Work,” Kumar and Raghavendran detail how leaders in their own organization, Deloitte LLP, worked to boost employee engagement and promote innovation and entrepreneurship.

Companies are hungry for engagement, but they often struggle to figure out what works. “On the surface, large organizations should be able to handle the ups and downs of intelligent risk-taking,” write Kumar and Raghavendran. “In practice, however, their talent management processes often enforce conformity, legitimize mediocrity and penalize failed attempts at innovative thinking.”

Kumar and Raghavendran initiated a contest across four Deloitte LLP offices in India. They write that “employees were invited to join teams, which were asked to develop solutions to a wide range of challenging, real-life business problems.” In the program, called Maverick, teams were judged on their ability to identify critical issues, come up with solutions that were smart, challenging and practical, and present their ideas to the organization.

Like a reality TV show, each week a losing team was eliminated from the contest, while winning teams advanced. Winners received small financial rewards and a chance to work closely with senior leaders on projects such as branding exercises for new facilities.

Kumar and Raghavendran followed five strategies in designing the contest:

Teams were small to promote collaboration.

Each team had four members, which minimized free-riding and allowed for constructive conflict resolution.

Teams could focus on any business problem they wanted.

“In an effort to encourage people to think creatively, Deloitte rewarded out-of-the-box, original thinking that challenged received wisdom,” write Kumar and Raghavendran. “In our experience, giving employees license to experiment can act as a powerful motivator and raise their level of contribution.”

The contest was designed to be playful.

The program “gave employees opportunities to engage and experiment,” say Kumar and Raghavendran. It was a safe place to let imaginations run a little wild.

None of the regular management hierarchy was involved.

Team members worked together without involvement from their reporting managers or supervisors. Employees were encouraged to make decisions independently and to examine status quo practices and look for creative solutions.

The broader goal was to affect corporate culture.

Kumar and Raghavendran say that the program, which launched in 2009, was repeated in 2010 and then expanded to several university campuses across India in 2012. “These moves have helped Deloitte to revitalize its brand identity in India and to recruit high-quality talent.”

As Kumar and Raghavendran note, the program “was designed to challenge the conventional view of employer-employee relationships as transactional and to find new ways to win the hearts and minds of our organization’s employees.”

And it seems that the program did just that. Over 500 participants indicated in a survey that the contest “had an extremely positive impact on the Deloitte culture,” write Kumar and Raghavendran. “As a group, the respondents valued the program most positively as a networking opportunity, a fun and engaging experience and an opportunity to engage in teamwork.”

Other companies would do well to consider implementing a similar program, they write. Maverick offered “a mechanism for confronting the cycle of complacency and low expectations within organizations that can undermine dynamism, entrepreneurship and growth.”

This article draws from “Bringing Fun and Creativity to Work,” by Hari Kumar (Deloitte LLP) and Satish Raghavendran (Deloitte Financial Advisory Services), which appeared in the Fall 2014 issue of MIT Sloan Management Review.