Entry Date: December 22, 2016

CompCog: The Edge of the Lexicon: Productive Knowledge and Direct Experience in the Acquisition and Processing of Multiword Expressions

Principal Investigator Roger Levy

Project Start Date August 2016

Project End Date January 2020


Language is the most discrete, measurable cultural record of the human mind, and is uniquely expressive among the communicative systems found in nature. Every day we comprehend hundreds of sentences that we hear or read but have never encountered before, and we produce hundreds more. Yet our success at these many acts of communication belies the difficulty of the task: language is rife with ambiguity, our attention is limited, our environments may be noisy, and we often have incomplete information about the shared knowledge and beliefs of the people we engage with. This ability to understand and produce novel language, unique to our species, poses profound challenges for our scientific understanding of the capabilities of the human mind. Deepening our understanding of these capabilities requires a combination of ideas and methods from linguistics, psychology, and computer science. Advances in this area help lay the groundwork for improvements in natural language technologies such as document summarization, paraphrasing, question answering, and machine translation, and for better identification, diagnosis, and treatment of language disorders.

Within this broader research enterprise, this project focuses on the "edge of the lexicon", elucidating the conditions under which a linguistic expression comes to be stored in the mind of the native speaker who uses it, and the consequences of its being stored as a holistic unit. Native speakers know both productive rules, which license and allow interpretation of phrases and sentences that they have never before encountered, and a rich inventory of lexical items that can be combined through these productive rules. Many of these lexical items are individual words, but there is evidence that specific, frequent multi-word expressions, such as "meat and potatoes" or "large majority", may also be stored in the lexicon. This project combines artificial intelligence-based computational models, large linguistic datasets, and controlled psychological experimentation to explore the edge of the lexicon, probing how direct experience with specific multi-word expressions leads to their being stored in one's mental lexicon, how such storage is reconciled with productive knowledge in language comprehension and production, and how these expressions emerge and change over time.