Entry Date:
March 14, 2006

Sophie and Teachable Characters: Understanding How People Teach Reinforcement-Based Learning Agents


As robots become a mass consumer product, they will need to learn new skills by interacting with typical human users. However, the design of machines that learn by interacting with ordinary people is a relatively neglected topic in machine learning. To address this, we advocate a systems approach that integrates machine learning into a Human-Robot Interaction (HRI) framework.
The first goal is to understand the nature of the teacher's input so as to adequately support how people want to teach. The second goal is to incorporate these insights into standard machine learning frameworks to improve a robot's learning performance.

To contribute to each of these goals, we use a computer game framework to log and analyze interactive training sessions that human teachers have with a Reinforcement Learning (RL) agent -- called Sophie. Although RL was originally formulated for agents that learn on their own rather than from a human teacher, we study it because of its popularity as a technique for teaching robots and game characters new skills by giving the human access to the agent's reward signal. However, we question the implicit assumption that people will only want to give the learner feedback on its past actions.
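
To make this setup concrete, here is a minimal sketch of a tabular Q-learning agent whose reward signal comes interactively from a human teacher rather than from a predefined reward function. It illustrates the general idea only, not the exact algorithm behind Sophie; all names here are hypothetical.

```python
import random
from collections import defaultdict

class InteractiveQLearner:
    """Tabular Q-learner whose reward is delivered by a human teacher."""

    def __init__(self, actions, alpha=0.3, gamma=0.75, epsilon=0.1):
        self.q = defaultdict(float)  # Q-values keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, state):
        # Epsilon-greedy exploration over the available actions.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, human_reward, next_state):
        # Standard Q-learning backup, except `human_reward` is whatever
        # scalar the teacher delivered (zero if no feedback was given).
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = human_reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

A training loop would call choose_action, execute the chosen action in the game world, poll the interface for any reward the teacher delivered, and then call update; the per-step scalar reward is a simplification of the asynchronous feedback a real teacher gives.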

To explore this topic, we carried out two user studies.

First User Study -- In the initial user study, people trained Sophie to perform a novel task within a reinforcement-based learning framework. Analysis of the data yields several important lessons about how humans approach the task of teaching an RL agent: (1) they want the ability to direct the agent's attention; (2) they communicate both instrumental and motivational intentions; (3) they beneficially tailor their instruction for the agent in accordance with how it expresses its internal state; and (4) they use negative communication both as feedback for the previous action and as a suggestion for the next action (i.e., a "do over"); a sketch of this dual interpretation follows.
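
The following hypothetical sketch illustrates lesson (4): a single negative signal from the teacher serves both as a penalty on the action just taken and, when that action is reversible, as a "do over" cue that undoes it. The names handle_negative_signal, world.execute, and undo_action are illustrative placeholders, not part of the actual Sophie implementation; the agent is assumed to be the Q-learner sketched above.

```python
def handle_negative_signal(agent, world, last_transition, undo_action=None):
    # `last_transition` is the (state, action, next_state) the agent
    # just experienced; `undo_action` reverses it when that is possible.
    state, action, next_state = last_transition
    # (a) Feedback: treat the signal as a penalty on the previous action.
    agent.update(state, action, human_reward=-1.0, next_state=next_state)
    # (b) "Do over": reverse the action so the teacher can prompt a
    # different choice from the same state.
    if undo_action is not None:
        world.execute(undo_action)
```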

Second User Study -- Given these findings, we made specific modifications to Sophie and to the game interface to improve the teaching/learning interaction. Modifications included: (1) an embellished communication channel that distinguishes among guidance, feedback, and motivational intents; (2) transparency behaviors that reveal specific aspects of Sophie's learning process; and (3) a more natural reaction to negative feedback.
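
As one illustration of modification (1), the sketch below shows how a guidance message naming an object could bias the agent's next action toward that object, falling back to ordinary action selection when no guidance is given. This is an assumed mechanism for illustration, not the published implementation; involves and guidance_object are hypothetical names, and the agent is the Q-learner sketched earlier.

```python
def involves(action, obj):
    # Hypothetical predicate: does this action manipulate `obj`?
    return getattr(action, "target", None) == obj

def choose_action_with_guidance(agent, state, guidance_object=None):
    if guidance_object is not None:
        # Restrict the choice to actions involving the object the
        # teacher pointed out, if any exist in this state.
        guided = [a for a in agent.actions if involves(a, guidance_object)]
        if guided:
            return max(guided, key=lambda a: agent.q[(state, a)])
    # No guidance (or no matching actions): ordinary epsilon-greedy choice.
    return agent.choose_action(state)
```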

This second user study shows that these empirically informed modifications yield learning improvements across several dimensions, including the speed of task learning, the efficiency of state exploration, the understandability of the agent's learning process to the human, and a significant drop in the number of failed trials encountered during learning (which makes the agent's exploration appear more sensible to the human).

HRI meets Machine Learning -- This work demonstrates the importance of understanding the human-teacher/robot-learner system as a whole in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning performance.

We present these user studies, lessons learned, and subsequent improvements to the learning agent and its game interface as empirical results to better inform and ground the design of teachable agents -- such as personal robots or interactive game characters. We believe these lessons and modifications apply to the general class of reinforcement-based learning agents and are not specific to the particular algorithm or character used in these studies. In this way, we hope to contribute broadly to the creation of fun and engaging teachable robots (physical or virtual) that learn in real time and in situ from humans.