Promoting Portability in Dialogue Management and Modeling

Principal Investigator Stephanie Seneff

Co-investigator James Glass

Project Website http://www.sls.lcs.mit.edu/sls/technologies/dialogue.shtml

Dialogue modeling is a key component of every application developed by the Spoken Language Systems (SLS) Group. It is the role of the dialogue manager to evaluate the relevance and completeness of the user's request, retrieve the requested information from the database, and format an appropriate reply in the form of a semantic frame. The dialogue manager uses an ordered set of rules to guide its actions. Which dialogue actions are executed depends on the current state of a dialogue frame containing all relevant information pertaining to the user's request. The dialogue modeling tools created by the SLS Group enable developers to construct complex mixed-initiative dialogue systems.
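As a rough illustration of rule-driven dialogue control, the following Python sketch walks an ordered rule list against a dialogue frame and returns the action of the first rule whose condition holds. It is not the SLS implementation: the names `Rule`, `RULES`, and `next_action`, the flight-domain frame keys, and the "first matching rule fires" policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A dialogue frame is modeled here as a plain dictionary (an assumption).
Frame = Dict[str, object]


@dataclass
class Rule:
    """One entry in the ordered rule list: a condition on the frame and an action name."""
    condition: Callable[[Frame], bool]
    action: str


# Hypothetical rules for a flight-information domain, listed in priority order.
RULES: List[Rule] = [
    Rule(lambda f: "source" not in f, "prompt_for_source"),
    Rule(lambda f: "destination" not in f, "prompt_for_destination"),
    Rule(lambda f: "date" not in f, "prompt_for_date"),
    Rule(lambda f: "results" not in f, "query_database"),
    Rule(lambda f: True, "speak_results"),  # fallback once everything is known
]


def next_action(frame: Frame, rules: List[Rule] = RULES) -> Optional[str]:
    """Return the action of the first rule whose condition matches the frame."""
    for rule in rules:
        if rule.condition(frame):
            return rule.action
    return None  # no rule applies


if __name__ == "__main__":
    frame: Frame = {"source": "BOS", "destination": "SFO"}
    print(next_action(frame))
```

Because the rules live in data rather than in control flow, changing the system's behavior in this sketch means editing the rule list, not the loop that interprets it.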

One of the hardest parts of designing and building a conversational system is configuring and coding the dialogue manager. Components such as speech recognition and language understanding have been modularized, with domain-specific information contained in external files and models. SpeechBuilder was designed to make these two components even easier to configure, with a Web-based graphical interface to help developers write grammars and create language models for recognizers. However, dialogue management has resisted this push towards portability and modularity, since its role in planning and response generation was considered ultimately too domain-dependent.

In the course of building dialogue managers for each of our separate systems (e.g., weather, air travel, flight status, urban navigation, task delegation, and on-line shopping), we have noticed that the same basic functionality recurs across domains. For example, each system must gather information from a user and prompt the user for critical missing pieces, and each system must have a way of filtering responses from the database to ensure that they match user-specified constraints. Furthermore, certain categories of information, such as dates and times, recur in multiple domains. Users can ask for flights on "Tuesday," or about the weather "the day after tomorrow," or for the estimated landing time of a flight scheduled for "late this afternoon."
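The two recurring operations mentioned above, finding missing information and filtering database responses, can be written without reference to any one domain. The Python sketch below is a minimal illustration under assumed data shapes (frames and database records as dictionaries, constraints matched by equality); the function names and the sample flight records are hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def missing_fields(frame: Record, required: Iterable[str]) -> List[str]:
    """List the required attributes the user has not yet supplied, in order."""
    return [name for name in required if name not in frame]


def filter_results(records: Iterable[Record], constraints: Record) -> List[Record]:
    """Keep only the records that agree with every user-specified constraint."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in constraints.items())
    ]


if __name__ == "__main__":
    # Hypothetical flight records; a weather or shopping domain would supply its own.
    flights = [
        {"airline": "AA", "source": "BOS", "destination": "SFO"},
        {"airline": "UA", "source": "BOS", "destination": "DEN"},
    ]
    user_frame = {"source": "BOS", "destination": "SFO"}
    print(missing_fields(user_frame, ["source", "destination", "date"]))
    print(filter_results(flights, user_frame))
```

Only the list of required attributes and the records themselves change from one domain to the next; the functions stay the same.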

We have recently begun an effort to develop a domain-independent dialogue manager that can be used as part of the SpeechBuilder framework. This dialogue manager is being designed to enable developers to construct more complex conversational systems without modifying underlying code. We are also incorporating pre-specified grammars for semantic concepts such as dates and times, along with servers for interpreting and canonicalizing these concepts. The purpose is to give SpeechBuilder developers a pre-compiled way of understanding and representing generic concepts, drastically reducing the work required to quickly configure and deploy a conversational system.
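To make the idea of canonicalizing a generic concept concrete, here is a small Python sketch that maps a few relative date phrases onto calendar dates. It is an illustration only, not the date server described above: the function name `canonicalize_date`, the handful of phrases covered, and the choice to resolve a bare weekday name to its next occurrence (counting today) are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def canonicalize_date(phrase: str, today: date) -> Optional[date]:
    """Resolve a relative date phrase against a reference date.

    Returns None for phrases this sketch does not cover.
    """
    text = phrase.strip().lower()
    if text == "today":
        return today
    if text == "tomorrow":
        return today + timedelta(days=1)
    if text in ("the day after tomorrow", "day after tomorrow"):
        return today + timedelta(days=2)
    if text in WEEKDAYS:
        # Assumption: a bare weekday means its next occurrence, counting today.
        days_ahead = (WEEKDAYS.index(text) - today.weekday()) % 7
        return today + timedelta(days=days_ahead)
    return None


if __name__ == "__main__":
    reference = date(2024, 1, 1)  # a Monday
    print(canonicalize_date("Tuesday", reference))
    print(canonicalize_date("the day after tomorrow", reference))
```

Once a phrase is reduced to a canonical date, the same value can constrain a flight query or a weather lookup without either domain having to parse the phrase itself.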

We plan to make the generic dialogue manager part of the SpeechBuilder distribution and to encourage system developers to use it. In doing so, we expect to discover areas in which we must expand and enhance the server. We would also like to make the GUI for the dialogue control table accessible to SpeechBuilder developers and to extend it based on their experience.