Entry Date:
November 22, 2005

Segmental and Prosodic Aspects of Speech Planning

Principal Investigator Kenneth N Stevens

Co-investigators Joseph S Perkell, Stefanie Hufnagel


Models of speech production planning have to deal with many different aspects of the sound structure of spoken utterances. These include how the speaker retrieves the sounds of the intended words from their long-term store in the mental lexicon, organizes the words and sounds into appropriate intonational and rhythmic structures, and determines the articulatory movements required to produce the sounds in a fluent, coordinated and natural-sounding way. We know that these characteristics of an utterance require a planning process, because a sequence of words in a given sentence structure does not specify them; any such sequence can be uttered in many different ways. In this project we study several aspects of the utterance planning process. First, we study the serial ordering process, which (somewhat surprisingly) is required to place the sounds of words in their correct locations for each new utterance, as suggested by sound-level serial ordering errors such as ‘buddy moots’ for ‘muddy boots’. Second, we study the generation of intonational contours and the alignment of these contours with the words of the utterance. Finally, we study the generation of hand and head gestures that accompany the speech.

Studies of sound-level serial ordering errors have shown that, in American English at least, syllables are not commonly observed as error units, while larger elements (such as morphemes) and smaller elements (such as syllable onsets and rimes, or individual segments) are. Intensive experimentation currently focuses on the role of individual articulatory gestures in these errors.
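As a rough illustration of the onset exchange in ‘buddy moots’ for ‘muddy boots’, the following Python sketch swaps the word-initial consonants of two words. It is only an assumption-laden toy: it works on spelling rather than on phonological segments, the helper names `split_onset` and `exchange_onsets` are hypothetical, and it is not a model of the planning process studied in this project.

```python
VOWELS = set("aeiou")  # assumption: vowel letters stand in for vowel sounds


def split_onset(word):
    """Split a word into (onset, remainder) at its first vowel letter."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel letter found: treat the whole word as onset


def exchange_onsets(first, second):
    """Swap the word-initial onsets of two words, leaving the rest in place."""
    onset_a, rest_a = split_onset(first)
    onset_b, rest_b = split_onset(second)
    return onset_b + rest_a, onset_a + rest_b


print(exchange_onsets("muddy", "boots"))  # ('buddy', 'moots')
```

The exchanged unit here is the word onset, one of the sub-syllabic units that the error studies report as common; the remainder of each word stays in place.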

Studies of the effect of prosodic structure on systematic phonetic variation have shown that phrase-onset vowels and pitch-accented word-onset vowels are significantly more likely to begin with non-modal phonation than are phrase-medial and unaccented-word vowels. Higher-level intonational phrases show this behavior more than lower-level intonational phrases, despite a striking degree of variation among individual speakers. Studies of the alignment of gestures with spoken prosody have shown that gestures with sudden sharp stops (termed ‘hits’) are aligned with pitch-accented (i.e., intonationally prominent) syllables. Moreover, hand hits align with accented syllables more accurately than head hits, which (perhaps because of the greater inertia of the head) tend to align with the syllable just after the accented syllable. Such findings provide evidence for the role of prosody in the speech production planning process, help to distinguish among competing models of that process, and move us closer to the goal of synthesizing natural-sounding speech from text.