Professor Simon Todd's Learning by Listening: Implications for Humans and Machines
This talk describes three studies under the theme of what humans and machines can learn from passive exposure to language.
The first study explores how much information humans learn simply by being surrounded by a language they don't speak. I present behavioral experiments into what New Zealanders know about Māori, the Indigenous language of New Zealand. They are frequently exposed to Māori, but the experiments show that they know only about 70 Māori words on average. Computational and statistical modeling show that New Zealanders nevertheless have extensive knowledge of fine-grained probabilistic patterns in the way that Māori words are formed. Results such as these motivate efforts in the unsupervised learning of linguistic information by machines.
Given that humans can learn to identify word-parts in a language on the basis of their statistical recurrence, the second study asks how a machine might go about doing the same thing. I focus on an abstract word-formation template in Māori known as reduplication, in which a new word is formed by repeating all or part of another word. Humans identify the abstraction behind reduplication fairly easily and generalize it to new words, but existing unsupervised word segmentation algorithms do not. Extending a popular algorithm to incorporate abstract reduplication templates allows it to better identify word-parts, demonstrating that language technologies can be improved by incorporating human linguistic knowledge.
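To make the idea of a reduplication template concrete, here is a minimal sketch in Python. It is not the algorithm discussed in the talk, and it is not a word segmentation algorithm; it only checks whether a word begins with a chunk that is immediately repeated. The function name `find_reduplication` and the `min_copy` parameter are hypothetical, and the sketch assumes that words are plain letter strings, ignoring digraphs such as "wh" and vowel length.

```python
from typing import Optional, Tuple


def find_reduplication(word: str, min_copy: int = 2) -> Optional[Tuple[str, str]]:
    """Return (copy, base) if `word` starts with a chunk that the rest of the
    word repeats, else None.

    Illustrative assumption: each character is one unit, so digraphs and
    long vowels are not treated specially.
    """
    n = len(word)
    # Try the longest candidate copy first, so that full reduplication
    # (copy == base) is preferred over partial reduplication.
    for k in range(n // 2, min_copy - 1, -1):
        copy, base = word[:k], word[k:]
        if base.startswith(copy):
            return copy, base
    return None


if __name__ == "__main__":
    # Illustrative forms: a fully reduplicated word, a partially
    # reduplicated word, and a word with no initial repetition.
    for w in ["pakipaki", "papaki", "whare"]:
        print(w, find_reduplication(w))
```

A purely statistical segmenter has to learn each repeated chunk separately, whereas a template like this one captures the pattern "copy, then base" once and applies it to words it has never seen, which is the kind of abstraction the second study adds to an existing algorithm.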
The third study returns to the implicit learning of linguistic information in humans to ask how general it is, and what can facilitate or inhibit it. I replicate the first study, looking at the knowledge of Spanish among Californians and Texans who don't speak it. Non-Spanish speakers have extensive implicit knowledge of Spanish, just as non-Māori speakers did in the first study, affirming the generality of those findings. However, the extent of implicit knowledge that participants display is affected by their attitudes toward Spanish and its speakers: as participants' attitudes decline, so too does their display of implicit knowledge. The results suggest that we should think carefully about how human-like we want machine learning to be.
About the Speaker
Simon Todd is an Assistant Professor in the Department of Linguistics at the University of California, Santa Barbara. His research focuses on the incredible power of passive listening for developing and accessing knowledge about language varieties and the people who speak them. For example, he has examined how people who don't speak a language can nevertheless gain impressive implicit knowledge of its regularities by being exposed to it often; how social stereotypes based on the way that someone sounds can influence what listeners remember them saying; and how words demonstrate changes in accent over time at different rates, based on how easily they can be understood. His work delves into the rich biases associated with in-the-moment listening and explores their large-scale, long-term implications, using a combination of computational modeling, behavioral experiments, and statistical analysis of large bodies of language data.