Entry Date: October 9, 2004

Spoken Lecture Processing: Speech and Language Processing of Media

Principal Investigator: James Glass

Project Start Date: September 2003

Project End Date: December 2006


In the past decade, we have seen a dramatic increase in the availability of on-line academic lecture material: low-cost media and fast networks allow new and exciting ways of disseminating knowledge in a variety of forms, ranging from audio recordings to streaming video. These educational resources can potentially change the way people learn: students with disabilities can enhance their educational experience, professionals can keep up with recent advances in their field, and people of all ages can satisfy their thirst for knowledge. It is striking, however, that in contrast to many other communicative activities, lecture processing has so far benefited relatively little from developments in human language technology.

Recorded lectures could be more widely and effectively disseminated if the material could be automatically indexed, allowing students to access selected portions via web browsers and text-based queries (e.g., "tell me about A* search"). However, existing technology is severely limited when it comes to processing lectures. Automatic speech recognition of lecture material often suffers from high word error rates due to specialized technical vocabulary and the lack of in-domain spoken data for training. Although spoken document retrieval algorithms can operate acceptably in the presence of recognition errors, most support only keyword searches. The ability to accurately capture the structural information required for concept-based retrieval is beyond the reach of existing speech analysis techniques. Thus, the goal of this research is to enable fast, accurate, and easy access to lecture content by developing speech technology for spoken lecture transcription, tagging, and retrieval, and ultimately for automatic structure induction and summarization.
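To illustrate the kind of keyword-based retrieval described above, the following is a minimal sketch (not part of the project's actual system) of a conjunctive keyword search over an inverted index built from transcript segments; the lecture identifiers and toy transcripts are hypothetical stand-ins for recognizer output.

```python
from collections import defaultdict

def build_index(transcripts):
    """Build an inverted index mapping each word to the (lecture, segment)
    positions where it occurs. `transcripts` maps a lecture id to a list
    of segment strings (e.g., automatically recognized utterances)."""
    index = defaultdict(set)
    for lecture_id, segments in transcripts.items():
        for seg_no, text in enumerate(segments):
            for word in text.lower().split():
                index[word].add((lecture_id, seg_no))
    return index

def search(index, query):
    """Return segments containing every query word: a simple conjunctive
    keyword search, with no concept- or structure-based matching."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

# Hypothetical toy transcripts standing in for noisy recognizer output.
transcripts = {
    "lecture03": ["today we discuss search", "a star search uses a heuristic"],
    "lecture04": ["dynamic programming recurrences"],
}
index = build_index(transcripts)
print(search(index, "star search"))  # → {('lecture03', 1)}
```

Note that such a search only matches literal word occurrences; a query phrased in different vocabulary than the lecture (or misrecognized by the recognizer) finds nothing, which is precisely the limitation that motivates concept-based retrieval.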