Committee: Lisa Davidson (co-chair), Frans Adriaans (co-chair), Shigeto Kawahara (Keio University), Gillian Gallagher, Maria Gouskova
The primary goal of this dissertation is to investigate how signal-based and knowledge-based mechanisms interact during speech production and perception. Signal-based mechanisms refer to processes that affect low-level phonetic implementation in the acoustic signal. Knowledge-based mechanisms encompass lexical knowledge as well as sublexical grammatical knowledge such as syllable structure and phonotactic probability. The focus of this dissertation is on phonotactic predictability and how phonotactic knowledge affects the production and perception of a target segment’s phonetic cues.
The case under study is Japanese high vowel devoicing, whereby the two high vowels of the language, /i, u/, lose their phonation when they occur between two voiceless consonants. The phonotactic distribution of Japanese is such that only one of the two high vowels can occur after certain consonants, while both can occur after others, leading to different levels of phonotactic predictability. The dissertation comprises two experiments and one computational modeling component, with a chapter devoted to each. Both experiments and the computational model draw on data from the Corpus of Spontaneous Japanese.
Chapter 1 lays the groundwork for the dissertation by discussing works related to signal-based and knowledge-based processes and works on Japanese high vowel devoicing, defining essential terminology, and delimiting the scope of research.
Chapter 2 provides an overview of works that discuss the phonotactic structure of Japanese before presenting the experiments and computational model. Japanese is well-known for its strong CV preference, but high vowel devoicing often results in consonant clusters that seemingly violate this very preference. I propose that Japanese phonotactics and high vowel devoicing apply at different phonological levels, thereby reconciling the apparent conflict.
Chapter 3 presents a production experiment showing that phonotactically predictable high vowels are deleted as a consequence of devoicing, while unpredictable high vowels retain their oral gestures, which colour the burst/frication noise of the preceding consonant and thereby aid perceptibility.
Chapter 4 presents a perception experiment that builds on the production results. Mirroring those results, in which the phonetic cues of a devoiced vowel are modulated by its predictability, listeners are shown to attend to high vowel cues in unpredictable environments but not in predictable ones.
Chapter 5 presents a simulated learner that models how the process of high vowel devoicing, and its effects on the perception of high vowels, can be captured through data-driven constraint induction. The model is trained on data from the Corpus of Spontaneous Japanese and induces lexicon-based rules as well as lexicon-free phonotactic constraints.
Chapter 6 concludes the dissertation by summarizing the findings from Chapters 2 through 5 and making suggestions for future work.