Entry Date May 28, 2001

GALAXY: An Architecture for Conversational Speech Systems

Principal Investigator James Glass

Co-investigator Stephanie Seneff

Project Start Date September 1996

Project End Date August 2006


The advent of the information age places increasing demands on the notion of universal access. For information to be truly accessible to everyone (especially the technologically naive), anytime, and anywhere, we must seriously address the issue of user interface. An interface based on a user's own language is particularly appealing, because it is the most natural, flexible, and efficient means of communication among humans. Conversational systems are particularly appropriate when the information space is broad and diverse, or when the users' requests contain complex constraints.

The research and development of human language technologies has been embedded in GALAXY, a system that enables universal information access using spoken dialogue. GALAXY has a distributed, client/server architecture that shares compute servers (for speech recognition and natural language understanding) and domain servers among many users, and relies on lightweight clients for input/output. The domain servers each encapsulate some area of expertise, and are each capable of dealing with a certain set of queries. They contain general knowledge about the structure of their domain in addition to the capability of accessing specific databases. The servers interpret user requests, locate required information, and compose a suitable response. The client program provides the interface to the user. It captures audio or typed input from the user, and presents the servers' responses using graphics, text, and synthetic speech. It is our intention to minimize the computational needs of the client program, thus providing information access to the widest user population in the most affordable way.

The GALAXY system was first demonstrated in the spring of 1994. Since then, GALAXY has served as the testbed for our research and development of human language technologies, resulting in systems in different domains (e.g., automobile classified ads, restaurant guide and weather information), different languages (e.g., Mandarin Chinese and Spanish), and different access mechanisms (telephone-only or with displays). In 1996, we made our first significant architectural redesign to permit universal access via any web browser. The resulting WEBGALAXY architecture makes use of a "hub" to mediate between a Java GUI client and various compute and domain servers, dispatching messages among the various servers and maintaining a log of server activities and outputs.
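As a rough illustration of the hub idea described above, the following Python sketch routes messages between named servers and keeps a log of each exchange. It is only a toy model under stated assumptions: the names Hub, register, dispatch, handle_turn, and build_demo_hub, the dictionary message format, and the two stand-in servers are all hypothetical and are not taken from the GALAXY software.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical message format: a plain dictionary passed between servers.
Message = Dict[str, object]
Server = Callable[[Message], Message]


class Hub:
    """Toy hub: routes messages to named servers and logs every exchange."""

    def __init__(self) -> None:
        self.servers: Dict[str, Server] = {}
        self.log: List[Tuple[str, Message, Message]] = []

    def register(self, name: str, server: Server) -> None:
        """Make a server reachable under the given name."""
        self.servers[name] = server

    def dispatch(self, name: str, message: Message) -> Message:
        """Send a message to one server, record the exchange, return the reply."""
        reply = self.servers[name](message)
        self.log.append((name, dict(message), dict(reply)))
        return reply


def recognizer(message: Message) -> Message:
    # Stand-in for a speech recognition compute server (assumption):
    # it simply lowercases the "audio" field instead of decoding speech.
    return {"text": str(message["audio"]).lower()}


def weather_server(message: Message) -> Message:
    # Stand-in for a domain server (assumption): echoes the query it received.
    return {"reply": f"(weather answer for: {message['text']})"}


def build_demo_hub() -> Hub:
    """Create a hub with the two stand-in servers registered."""
    hub = Hub()
    hub.register("recognizer", recognizer)
    hub.register("weather", weather_server)
    return hub


def handle_turn(hub: Hub, audio: str) -> str:
    """One user turn: client input goes to the recognizer, then to a domain server."""
    hypothesis = hub.dispatch("recognizer", {"audio": audio})
    answer = hub.dispatch("weather", hypothesis)
    return str(answer["reply"])
```

In this sketch the client only calls handle_turn and displays the returned string, which mirrors the goal of keeping the client lightweight while the hub and servers do the work.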

In the process of developing dialogue modules for various domains in GALAXY, we realized that it is critical for researchers to be able to easily visualize program flow through the dialogue, and to flexibly manipulate the decision-making process at the highest level. To this end, we developed a simple high-level scripting language that permits boolean and arithmetic tests on variables to decide whether particular functions are executed. We found this mechanism to be very powerful, and successfully incorporated it into our newest domain servers for weather and flight status information. We then began to contemplate incorporating an analogous mechanism into the program control of the entire system, which was being maintained by the GALAXY hub. In 1998, a new version of the architecture, called GALAXY-II, was designed and implemented. In addition to serving our own needs, it has also been designated as the reference architecture for the DARPA Communicator Program, whose goal is partly to promote resource sharing and plug-and-play interoperability across multiple sites for the research and development of dialogue-based systems.
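The flavor of such rule-based control can be sketched in a few lines of Python. This is a minimal illustration, assuming each rule pairs a test on dialogue variables with a function to run when the test passes. It does not reproduce the syntax of the actual scripting language, and every name in it (run_rules, RULES, TOY_CITIES, and the four actions) is invented for the example.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Rule = Tuple[str, Callable[[State], bool], Callable[[State], None]]

# Invented data standing in for a domain database.
TOY_CITIES = ["Springfield IL", "Springfield MA", "Boston MA"]


def ask_city(state: State) -> None:
    state["prompt"] = "Which city?"


def lookup(state: State) -> None:
    # Stand-in for a database query: prefix match on the city name.
    state["matches"] = [c for c in TOY_CITIES if c.startswith(str(state["city"]))]


def narrow(state: State) -> None:
    state["prompt"] = f"I found {len(state['matches'])} cities. Which one?"


def report(state: State) -> None:
    state["prompt"] = f"Here is the weather for {state['matches'][0]}."


# Each rule is (name, test on variables, function to execute).
# Tests are either boolean (is a variable set?) or arithmetic (how many matches?).
RULES: List[Rule] = [
    ("ask_city", lambda s: "city" not in s, ask_city),
    ("lookup", lambda s: "city" in s, lookup),
    ("narrow", lambda s: len(s.get("matches", [])) > 1, narrow),
    ("report", lambda s: len(s.get("matches", [])) == 1, report),
]


def run_rules(rules: List[Rule], state: State) -> List[str]:
    """Evaluate rules top to bottom; run each function whose test passes.

    Returns the names of the rules that fired, which gives a simple
    trace of program flow through the dialogue.
    """
    fired = []
    for name, test, action in rules:
        if test(state):
            action(state)
            fired.append(name)
    return fired
```

Because the control logic lives in the RULES table rather than inside the functions, changing the dialogue strategy means editing or reordering rules, and the list returned by run_rules shows which decisions were taken for a given turn.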

In the coming year, we will continue to refine and improve human language technology components in all areas, and to apply this technology in both existing and new application domains. One example of a new technology is dynamic vocabulary and language modeling, which would give our conversational systems greater range and flexibility of coverage. We will continue to make infrastructure improvements to the GALAXY architecture to enable faster prototype development in new application domains, such as flight-status information. We will also develop the necessary tools and software so that application developers outside of our group will be able to develop their own applications for the DoD.