SpeechBuilder: A Conversational System Development Tool

Principal Investigator James Glass

Project Website http://groups.csail.mit.edu/sls//technologies/speechbuilder.html

The SpeechBuilder utility is intended to allow people unfamiliar with speech and language processing to create their own speech-based applications. The focus of SpeechBuilder version 1.0 is to allow developers to specify the knowledge representation and linguistic constraints necessary to automate the design of speech recognition and natural language understanding. To do this, SpeechBuilder uses a simple web-based interface which allows a developer to describe the important semantic concepts (e.g., objects, attributes) for their application, and to show, via example sentences, what kinds of actions can be performed. Once the developer has provided this information, along with the URL of their CGI-based application, they can use SpeechBuilder to automatically create their own spoken dialogue system, which they and others can talk to in order to access information.
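To make the three developer inputs described above more concrete (semantic concepts, example sentences for actions, and the URL of a CGI-based application), here is a minimal Python sketch. It is an illustration only and does not reflect SpeechBuilder's actual file format or internals: the names `DomainSpec`, `extract_concepts`, and `guess_action`, the weather-domain values, and the `example.org` URL are all hypothetical, and the substring and word-overlap matching are deliberately naive stand-ins for real speech recognition and language understanding.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DomainSpec:
    """Hypothetical container for the information a developer supplies."""
    concepts: Dict[str, List[str]]  # concept name -> example values
    actions: Dict[str, List[str]]   # action name -> example sentences
    backend_url: str                # URL of the developer's CGI application


def extract_concepts(spec: DomainSpec, utterance: str) -> Dict[str, str]:
    """Return {concept: value} for the first known value found per concept."""
    text = utterance.lower()
    found = {}
    for concept, values in spec.concepts.items():
        for value in values:
            if value.lower() in text:  # naive substring match
                found[concept] = value
                break
    return found


def guess_action(spec: DomainSpec, utterance: str) -> Optional[str]:
    """Pick the action whose example sentence shares the most words."""
    tokens = set(utterance.lower().split())
    best_action, best_overlap = None, 0
    for action, examples in spec.actions.items():
        for example in examples:
            overlap = len(tokens & set(example.lower().split()))
            if overlap > best_overlap:
                best_action, best_overlap = action, overlap
    return best_action


# Hypothetical weather domain, invented for this sketch.
spec = DomainSpec(
    concepts={
        "city": ["Boston", "Denver"],
        "day": ["today", "tomorrow"],
    },
    actions={
        "forecast": [
            "what is the weather in Boston tomorrow",
            "will it rain in Denver today",
        ],
        "goodbye": ["thank you goodbye"],
    },
    backend_url="http://example.org/cgi-bin/weather.cgi",  # placeholder URL
)

utterance = "what is the forecast for Denver tomorrow"
print(guess_action(spec, utterance), extract_concepts(spec, utterance))
```

In this sketch the developer never writes a grammar by hand: the concept values and example sentences are the only linguistic input, which is the spirit of the approach described above, although a real system would generalize from the examples far more robustly than word overlap does.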

SpeechBuilder makes use of human language technology (HLT) (e.g., speech recognition, language understanding, system architecture, etc.) developed by scientists in the Spoken Language Systems Group at the MIT Laboratory for Computer Science. Researchers there are trying to develop next-generation human language technologies which will allow users to converse naturally with computers, anywhere, anytime. In contrast to many current speech-based applications, which constrain what a user can say during a dialogue, their goal is to give users much more freedom in the way they talk with computers. In order to demonstrate and improve this technology, they have created several conversational systems which have been publicly deployed on toll-free telephone numbers in North America, including the widely used Jupiter system for weather forecast information, the Pegasus system for flight status information, and the more recent Mercury system for flight information and pricing. If you have not used these systems before, please try them to see how this technology works! (i.e., donate your voice to science!) If you are in the Boston area, you can visit the MIT Museum and try talking to our systems, which have a display for output.

Although these applications have been successful, there are limited resources at MIT to develop a large number of new domains. In order to encourage and enable others to build their own domains, the SpeechBuilder utility was created to make it easier for HLT novices to create their own applications, or for researchers learning about speech and language to create a prototype application which they can subsequently modify manually. If successful, this utility will benefit others by allowing them to tailor an application to their particular interests. In addition, it will facilitate the collection of a wide variety of conversational speech data which can be used to further improve the basic human language technologies used by these applications. SpeechBuilder developers will also stress the ability of HLT to be rapidly ported to a variety of application domains with different vocabularies, grammars, knowledge representations, and discourse and dialogue structures.

We are currently improving SpeechBuilder capabilities by incorporating additional technology such as confidence scoring and concatenative speech synthesis. We are increasing the sophistication of the discourse and dialogue managers used within SpeechBuilder, so that they may be configured to handle more complex dialogues. We are also creating modules which handle common concepts such as numbers, dates, and times, so that developers can reuse them in their own applications. Finally, we have begun to develop SpeechBuilder capabilities for Japanese and Mandarin Chinese, which leverage the ongoing research efforts in multilingual conversational systems.