
Colloquium - Chris Parisien

Finding Structure in the Muck: Bayesian Models of How Kids Learn to Use Verbs

Chris Parisien
University of Toronto

Children are fantastic data miners. In the first few years of their lives, they discover a vast amount of knowledge about their native language. This means learning not just the abstract representations that make up a language, but also how to generalize that knowledge to new situations -- in other words, figuring out how language is productive. Given the noise and complexity in what kids hear, this is incredibly difficult, yet it seems effortless. In verb learning, much of this generalization appears to be driven by strong regularities between form and meaning. Seeing how a verb has been used, kids can make a decent guess about what it means; knowing what a verb means can suggest how to use it.

In this talk, I present a series of hierarchical Bayesian models to explain how children can acquire and generalize abstract knowledge of verbs from the language they would naturally hear. Using a large, messy corpus of child-directed speech, these models can discover a broad range of abstractions governing verb argument structure, verb classes, and alternation patterns. By simulating experimental studies in child development, I show that these complex probabilistic abstractions are robust enough to capture key generalization behaviours of children and adults. Finally, I will discuss some promising ways that the insights gained from modeling child language can benefit the development of a valuable large-scale linguistic resource, namely VerbNet.
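The actual models are nonparametric and considerably richer, but the core intuition behind hierarchical Bayesian generalization can be sketched simply: statistics are shared across levels, so a sparsely attested verb inherits the argument-structure tendencies of its class, while well-attested verbs keep their own preferences. The sketch below illustrates this with beta-binomial shrinkage; the verb "pilk", the frame names, the counts, and the prior strength are all invented for illustration and are not from the talk.

```python
# Hypothetical counts: how often each verb appears in two argument-structure
# frames in a toy corpus. "pilk" is a novel verb seen only once.
frame_counts = {
    "give": {"double_object": 60, "prep_object": 40},
    "send": {"double_object": 30, "prep_object": 70},
    "pilk": {"double_object": 0,  "prep_object": 1},
}

def class_prior(counts, frame):
    """Class-level frame preference: counts pooled across all verbs."""
    total = sum(sum(c.values()) for c in counts.values())
    in_frame = sum(c[frame] for c in counts.values())
    return in_frame / total

def posterior_pref(counts, verb, frame, strength=10.0):
    """A verb's frame preference, shrunk toward the class-level prior.

    `strength` plays the role of the prior's pseudocount mass: the fewer
    observations a verb has, the more its estimate is pulled toward the
    class-level pattern.
    """
    prior = class_prior(counts, frame)
    c = counts[verb]
    n = sum(c.values())
    return (c[frame] + strength * prior) / (n + strength)

# The novel verb's estimate is dominated by the class-level pattern,
# while a frequent verb like "give" retains its own preference.
print(round(posterior_pref(frame_counts, "pilk", "double_object"), 3))
print(round(posterior_pref(frame_counts, "give", "double_object"), 3))
```

In the full models this sharing happens over learned verb classes and alternation patterns rather than a single pooled prior, which is what lets the models simulate how children generalize a novel verb from one or two exposures.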

Chris Parisien is a PhD Candidate in Computer Science at the University of Toronto, working in the Computational Linguistics group. He holds a BMath in Computer Science and Cognitive Science from the University of Waterloo and an MSc in Computer Science from Toronto. His work explores ways of using computational models to answer important questions in language development and psycholinguistics. By using nonparametric topic models to discover abstract structure in noisy, sparse corpus data, this work also considers how unsupervised learning methods can build detailed lexical resources from messy text. Chris enjoys collaborations with computer scientists, linguists, psychologists, and philosophers. He will complete his PhD in August of this year.

Hosted by Martha Palmer and James Martin.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)