Program 2017
Monday, 17th July
19:00h - (Informal) Get-together @Hallmackenreuther
Tuesday, 18th July
09:15h - Welcome
09:30h - Martijn Wieling: “Discovering articulatory patterns using generalized additive modeling”
In this presentation I will explain how generalized additive models (GAMs) may be used to analyze articulatory data. I will illustrate the approach with two different studies involving articulatory data. The first study focused on Dutch dialect data, whereas the second compared articulatory trajectories of native and second-language speakers of English. Both studies show that GAMs are well suited to detecting group differences in articulatory trajectories which are not always visible in the acoustic signal. In the final part of my presentation I will not use articulatory data, but will analyze pronunciation variation on the basis of differences in transcriptions. The purpose of this part is to illustrate how GAMs can be used to model non-linear interactions (such as the influence of geography).
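GAM analyses of this kind are normally run with dedicated tools (commonly the R package mgcv, an assumption here, not stated in the abstract). As a rough sketch of the underlying idea, fitting a penalized smooth to a noisy trajectory, the following pure-Python toy smoother uses a Gaussian basis with a ridge penalty; all function names and parameter values are illustrative, not the speaker's method.

```python
import math, random

def rbf_design(xs, centers, width):
    # Gaussian radial basis functions plus an intercept column
    return [[math.exp(-((x - c) / width) ** 2) for c in centers] + [1.0] for x in xs]

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_smooth(xs, ys, n_basis=8, lam=1e-3):
    # penalized least squares: (X'X + lam*I) beta = X'y
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * i / (n_basis - 1) for i in range(n_basis)]
    width = (hi - lo) / n_basis
    X = rbf_design(xs, centers, width)
    p = n_basis + 1
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    beta = solve(XtX, Xty)
    return lambda x: sum(b * v for b, v in zip(beta, rbf_design([x], centers, width)[0]))

# toy "articulatory trajectory": one period of a sine with measurement noise
random.seed(1)
xs = [i / 59 for i in range(60)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.05) for x in xs]
f = fit_smooth(xs, ys)
mse = sum((f(x) - math.sin(2 * math.pi * x)) ** 2 for x in xs) / len(xs)
```

A real GAM additionally chooses the penalty by cross-validation or REML and supports smooth interactions (e.g. over geography), which is what the abstract's non-linear interaction example refers to.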
10:30h - Susanne Fuchs: “Stability and flexibility of respiratory rhythms in different speech and motor tasks”
Respiration is a vital biological rhythm in all mammals, from the first cry at birth to the last sigh at death. It is highly stable: the absence of inhalation for ca. 10-15 minutes leads to brain death. Apart from this stability of the respiratory rhythm, the temporal properties of breathing cycles and their form are flexible and adaptable to sleep-wake cycles and to changes in attention, cognition and motion.
In this talk I will focus on how respiration adapts to speech production, speech perception, and production-perception loops in face-to-face interactions. In the recent scientific literature the adaptive nature of respiration has attracted increasing attention. Perspectives have slightly shifted from the role of respiration in oxygen supply and its involvement in the Central Pattern Generator to a discussion of its role as a “basis for understanding and imitating actions performed by other people” (Paccalin & Jeannerod, 2000). I will provide empirical evidence for the adaptive nature of respiration in speech communication based on a series of experiments I conducted together with my colleagues.
Furthermore, I will introduce our recent work on dual task experiments where we integrate speech in the context of body motion (rhythmical motions of the legs and arms). It shows how speech and body motion can compete for respiration and that respiratory constraints due to motion may affect linguistic structure.
Paccalin, C., and Jeannerod, M. (2000). Changes in breathing during observation of effortful actions. Brain Res. 862, 194–200. doi:10.1016/S0006-8993(00)02145-4
10:55h - Coffee
11:10h - Aude Noiray: “Coarticulatory organization in children”
In this talk, I will present recent research that examined (V)CV coarticulation in German children, spanning the preschool years, when children have limited phonological knowledge and limited control over their speech production system, to the first school year, when both motor and phonological knowledge are recruited for literacy acquisition. I will focus on the development of lingual coarticulation, which has mainly been investigated acoustically due to the practical difficulties of collecting tongue data from young children. Using ultrasound imaging, we recorded movement of the tongue during the production of pseudowords in various vocalic and consonantal contexts and tested for developmental differences in degree of coarticulation between children and adults.
I will discuss the results in light of current debates regarding 1) the development of speech motor control and 2) the temporal domain of coarticulatory organization in the first years of language acquisition. Finally, I will relate our findings to research on phonological development to show that speech motor control and phonological development strongly influence each other and should therefore be investigated in parallel.
12:10h - Ioana Chitoran & Harim Kwon: “Perception-production of “non-native” CCV and CVCV: What matters for speakers of a cluster-heavy language?”
Consonant clusters in different languages are produced with different degrees of timing lag. For instance, German and French onset clusters are produced with relatively shorter inter-consonant lag than Georgian ones. The present study examines sensitivity to these timing differences in both perception and production, across two languages: Georgian, a cluster-heavy language with longer lag, and French, a language with simpler phonotactics and shorter lag. The longer timing lag in Georgian often results in vocalic transitions (acoustic vowels) between the two consonants within a cluster. In addition to the differences in phonotactics and timing, the two languages also have opposite prominence patterns in CVCV forms: initial prominence in Georgian, final prominence in French. We report here the results of two experiments meant to test the extent to which the production-perception system can be affected by exposure to variable input.
Fourteen native speakers of Georgian participated in two experiments: shadowing and discrimination. The order of the two experiments was counterbalanced.
Experiment 1 – shadowing. Participants produced CCV and CVCV sequences by reading Georgian script (baseline), and shadowed (heard and repeated) the same sequences produced by a native speaker of French (test). The results show that the participants converged to the French model speaker in the test condition. Thus, compared to the baseline production, they produced vocalic transitions in clusters less frequently, and they produced clusters with shorter vocalic transitions. In addition, they produced “illusory clusters” when shadowing French CVCV́ in a small number of cases, and only when V1 was /ø/, similar to the quality of the vocalic transition in Georgian.
Experiment 2 – discrimination. Participants heard the same CCV and CVCV sequences in an AX discrimination task. The stimuli included the same recordings used in Experiment 1, produced by the French native speaker (test condition), and those produced by a Georgian native speaker (control condition). The participants were asked to judge whether the two sequences they heard (A and X) were the same or different. The results show that the participants confused French CøCV́ not only with CuCV́, but also with CCV.
Taken together, these results lead us to claim that the effects of native language on the production-perception of word-initial consonant clusters are not limited to the segmental composition of the clusters. Native speakers of cluster-heavy Georgian confused CCV without any vocalic transition with French CøCV́, suggesting that the native inter-consonant timing pattern, together with the prominence pattern, influences the production-perception of consonant clusters.
12:35h - Lunch
14:15h - Aviad Albert & Bruno Nicenboim: “Linking sonority with periodic energy: Preliminary findings from production and perception”
Pitch intelligibility and its acoustic correlate—periodic energy—are a pair of often neglected perceptual-acoustic dimensions of speech. We present measurements of periodic energy of speech recordings that reveal a tight correlation with the sonority hierarchy, suggesting pitch intelligibility as the long-debated phonetic basis of sonority. In this talk we will focus on the following implications that this finding motivates:
1. Contrary to traditional accounts, the segmental makeup alone is not enough for an adequate description of sonority. When viewing sonority in terms of measurable periodic energy it becomes apparent that a suitable model of sonority needs to incorporate prosodic details along with segmental ones: The segmental makeup determines the periodic potential, which is modulated by prosody (namely changes of intensity and duration). This prosodic contribution to the perceived strength of the periodic component is such that the sonority levels of speech portions cannot be fully predicted by symbolic models of sonority that rely on a purely segmental description;
2. Sonority-based principles such as the Sonority Sequencing Principle (SSP) are useful in predicting syllabic illformedness, but they are far from providing the complete picture. Given the stance on sonority taken here, we propose an alternative model of perceptual syllabic illformedness, whereby periodic portions in the acoustic stream of speech compete for the attraction of syllabic nuclei in perception. Illformedness in this model is proportional to the degree of competition within syllables. We will present methods to derive this nucleus competition potential, which we equate with illformedness, using measurements that we apply to periodic energy curves. We will also present some preliminary data from a perceptual experiment that was designed to test the model’s illformedness predictions with CCV syllables.
3. The addition of periodicity as a relevant dimension for linguistic theory invites novel findings capable of unifying superficially separate phenomena. With this in mind, we note that periodic energy is at the center of a perceptual prominence ‘conspiracy’, whereby intensity, duration and F0 are the most commonly cited cues, which different languages employ in different ways. These apparently different and separate cues all directly influence, or are influenced by, periodic energy. Thus, periodic energy lends itself as a unified phonetic quality behind another phonological phenomenon—prominence—that has puzzled generations of researchers with regard to its phonetic basis.
14:40h - Kamil Kazmierski & Andreas Baumann: “Perceptual effects of ambiguity in the long-term development of boundary-signaling consonant clusters: combining experiments and dynamical systems in (mor)phonotactic research”
Despite having vanishingly small effects on a short time scale, articulatory, perceptual and cognitive factors can greatly influence the long-term development of linguistic systems via multiple repeated and parallel production-perception loops. In linguistic research, short-term effects are usually measured by employing experimental methods, while long-term developments can be inferred from inspecting diachronic data or comparing synchronic data. Linking these two domains, i.e. predicting diachronic developments from experimental results, is often not trivial, in particular if multiple short-term factors contribute to the long-term development of linguistic items. In this paper, we demonstrate the use of relatively simple dynamical-systems models in the establishment of this link (Hofbauer & Sigmund 1998). More precisely, we show how dynamical-systems models can be informed by experimental results in order to yield long-term predictions about parts of a linguistic system that conform to what can be observed in synchronic and diachronic data.
In order to illustrate our argument, we consider the system of medial consonant diphones in Polish. Consonant diphones have been suggested to fulfill the function of signaling morpheme boundaries within words (Dressler & Dziubalska-Kołaczyk 2006; Hay & Baayen 2005). This signaling function, however, is said to be inhibited if diphone types occur across boundaries as well as morpheme-internally, thereby creating ‘ambiguous’ configurations in which the listener cannot reliably use that diphone for decomposing words into morphemes. As a consequence of this short-term cognitive effect, it has been argued that diphone inventories should diachronically evolve in such a way that diphone types occur either exclusively across morpheme-boundaries (‘low lexical probability’) or within morphemes (‘high lexical probability’) (Dressler et al. 2010).
Our approach unfolds in two steps. First, we investigate the effect that signaling ambiguity (i.e. the lexical probability of a diphone) exerts on the perception of boundary-signaling diphones in a perceptual experiment (AX discrimination task) with speakers of Polish. In a second step, we formulate a simple population dynamical model (Nowak 2000; Solé et al. 2010) in which the reproductive success of a diphone depends on its lexical probability exactly as empirically attested in the experiment. Subsequently, we use the model to simulate the parallel long-term evolution of a large set of diphone types (Geritz et al. 2002; Doebeli 2011). Finally, we compare the resulting distribution of degrees of ambiguity (i.e. lexical probabilities) in the simulated diphone inventory to that observed in empirical synchronic Polish data. We discuss the ways in which the distributions deviate from one another, as well as potential strategies for refining the underlying population-dynamical model.
References
Doebeli, M. 2011. Adaptive diversification. Princeton: Princeton University Press.
Dressler, Wolfgang U. & Katarzyna Dziubalska-Kołaczyk. 2006. Proposing Morphonotactics. Wiener Linguistische Gazette 73. 69–87.
Dressler, Wolfgang U., Katarzyna Dziubalska-Kołaczyk & Lina Pestal. 2010. Change and variation in morphonotactics. Folia Linguistica Historica 31. 51–68.
Geritz, Stefan A. H., Mats Gyllenberg, Frans J. A. Jacobs & Kalle Parvinen. 2002. Invasion dynamics and attractor inheritance. Journal of Mathematical Biology 44. 548–560.
Hay, Jennifer & Harald Baayen. 2005. Shifting paradigms: gradient structure in morphology. Trends in Cognitive Sciences 9. 342–348.
Hofbauer, Josef & Karl Sigmund. 1998. Evolutionary games and population dynamics. Cambridge: Cambridge University Press.
Nowak, Martin A. 2000. The basic reproductive ratio of a word, the maximum size of a lexicon. Journal of Theoretical Biology 204(2). 179–189.
Solé, Ricard V., Bernat Corominas-Murtra & Jordi Fortuny. 2010. Diversity, competition, extinction: the ecophysics of language change. Journal of The Royal Society Interface 7(53). 1647–1664.
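The population-dynamical step of the argument can be pictured with a minimal replicator-style simulation. In the actual study the reproductive success of a diphone is informed by the experimental results; the fitness function below is a made-up placeholder that merely encodes the hypothesis that diphones with ambiguous lexical probabilities (near 0.5) fare worst, so the simulated inventory evolves toward unambiguous types.

```python
import random

def fitness(p):
    # placeholder assumption: ambiguous diphones (lexical probability near 0.5)
    # reproduce worst; unambiguous ones (near 0 or 1) reproduce best
    return 0.1 + (p - 0.5) ** 2

def replicator_step(freqs, fits):
    # discrete replicator dynamics: each type grows in proportion to
    # its fitness relative to the population mean fitness
    mean_fit = sum(x * w for x, w in zip(freqs, fits))
    return [x * w / mean_fit for x, w in zip(freqs, fits)]

random.seed(0)
probs = [random.random() for _ in range(50)]   # lexical probabilities of 50 diphone types
fits = [fitness(p) for p in probs]
freqs = [1.0 / 50] * 50                        # start from a uniform inventory

for _ in range(200):                           # simulate long-term evolution
    freqs = replicator_step(freqs, fits)

# frequency mass that ends up on unambiguous diphone types
mass = sum(x for x, p in zip(freqs, probs) if p < 0.2 or p > 0.8)
```

Under this toy fitness assumption, virtually all frequency mass drifts to the unambiguous tails of the distribution, which is the qualitative diachronic prediction of Dressler et al. (2010) that the paper tests against Polish data.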
15:05h - Christopher Carignan: “Gradiency in (mis)perception of articulatory co-variation: Naïve listener imitations of native articulatory strategies”
Although a traditional characterization of vowel nasality involves the somewhat implicit assumption that vowel nasalization is binary (i.e., nasal vowels are nasal, oral vowels are not), both phonetic and phonological vowel nasalization have been shown to involve changes to the shape of the oral tract, as well as an increase in breathy voicing, in addition to velo-pharyngeal coupling. Evidence suggests that these “non-nasal” articulatory modifications enhance nasalization in both production (they have similar acoustic effects to nasalization) and perception (they are integrated with nasality in listener perception). Thus, it has been suggested that these cases of articulatory co-variation have arisen diachronically due to mis-perception by listeners, or due to phonetic enhancement by speakers, or perhaps due to both. Although previous research has established the relations between these co-varying articulations in both the acoustic and perceptual domains, it has yet to be shown how these articulations interact at the interface between the two domains at the level of the individual. In this talk, I will present preliminary analyses of synchronous acoustic, ultrasound, nasalance, and EGG data related to naïve Australian English speakers’ imitations of native Southern French nasal and oral vowel productions. The results suggest that these naïve listeners-turned-speakers do, indeed, exhibit evidence of mis-perception of the underlying native articulations, but that these effects are highly gradient and speaker-dependent.
15:30h - Coffee
15:55h - Gabriel Mindlin: “A dynamical systems approach to birdsong production”
Birdsong production is a complex behavior that emerges when a highly specialized peripheral vocal organ, the syrinx, is driven by a set of well-coordinated physiological instructions. These are generated by a neural circuitry which is reasonably well characterized. In this presentation, I will describe a computational model whose variables are the average activities of different neural nuclei of the song system of oscine birds. Two of the variables are linked to the air sac pressure and the tension of the labia during canary song production. I will show that these time-dependent gestures are capable of driving a model of the vocal organ to synthesize realistic canary-like songs. I will also discuss a road map for extending this research program to the problem of human voice production.
16:55h - Mark Tiede & Louis Goldstein: “A mutual power analysis of speech error data”
The production of metronome-driven CVC sequences shows an imbalance in observed error rates depending on whether the alternation is in the onset (e.g. “cop top”) or the coda (e.g. “top tock”). Kinematic data from nine AE speakers, aggregated across intrusions, reductions, and substitutions, show error rates of 5.4% for onset and 13.7% for coda alternations, i.e., coda errors occur more than twice as often (Mooshammer et al., 2010). Here we use a cross-wavelet analysis of mutual power between TR, TT, and LA trajectories to show that power at the alternation frequency is higher for alternating onset than coda sequences. This is consistent with the frequency-locking account of gestural intrusions in repetitive production (Goldstein et al., 2007): the higher mutual power between articulators observed in onsets at the disyllabic alternation frequency confers stability that helps the sequence resist the simplest attractor, in which all articulators are inappropriately oscillating every syllable.
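The frequency-locking account treats articulators as coupled oscillators whose joint rhythm is stabilized by coupling strength. A generic two-oscillator phase model (a textbook Kuramoto-style sketch, not the authors' cross-wavelet analysis; all parameter values are arbitrary) illustrates the core mechanism: sufficiently strong coupling, the analogue of high mutual power, locks two rhythms together, while weak coupling lets them drift apart.

```python
import math

def drift(omega1, omega2, K, steps=20000, dt=0.01):
    """Average frequency difference of two coupled phase oscillators
    (Euler integration). A value near zero means they are frequency-locked."""
    th1 = th2 = 0.0
    for _ in range(steps):
        d1 = omega1 + K * math.sin(th2 - th1)   # each oscillator is pulled
        d2 = omega2 + K * math.sin(th1 - th2)   # toward the other's phase
        th1 += d1 * dt
        th2 += d2 * dt
    return (th1 - th2) / (steps * dt)

locked = drift(1.0, 1.2, K=0.3)   # coupling exceeds the detuning: rhythms lock
free = drift(1.0, 1.2, K=0.02)    # coupling too weak: rhythms drift apart
```

For this model the locking threshold is K > |ω1 − ω2| / 2; the abstract's claim is the articulatory analogue, that higher mutual power in onsets keeps the produced pattern from slipping into the once-per-syllable attractor.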
17:20h - Reminiscing about Eric Vatikiotis-Bateson
18:30h - BBQ & Live music @IfL Phonetik
Wednesday, 19th July
10:00h - Doug Whalen: “Characteristics and usefulness of phonetic variability”
Speech is well known to be quite variable, and this variability has both impeded and informed theoretical and practical endeavors for decades. In this talk, I will outline aspects of the consistency of variability within speakers; re-examine possible differences in variability in acoustics vs. articulation; and explore the possibility that variability is useful for establishing and maintaining flexibility, both in production and in understanding the speech of others. I will also discuss new analysis methods indicating that some variability at the kinematic level may reflect increased, rather than decreased, control at a higher level. Overall, new means of collecting and analyzing large amounts of data open new avenues for understanding variability in speech.
11:00h - Adamantios Gafos & Stephan Kuberski: “Kinematics of repetitive speech movements”
The standard model of isolated speech movements is a linear second-order system (Saltzman and Munhall, 1989). Several variations of this model have been proposed which claim to capture isolated movements as well as sequences of speech movements more accurately. There are also proposals from other areas of motor control which aim to capture both non-repetitive and repetitive movement sequences, not specifically of speech. We used the Harvard/Haskins database of regularly-timed speech (Patel et al., 1999) to extract kinematic relations of jaw and lower lip movements in opening and closing /b, m/ gestures. These relations are then compared with the relations predicted via simulation by linear and nonlinear models of the speech gesture, as well as by proposed autonomous and nonautonomous two-dimensional models. In the simulations, topological differences with regard to the kind and number of model-specific attractors (fixed points, limit cycles) are considered. It is shown that, for the experimental data in this work, none of these models can completely account for the data. Relative time to peak velocity (RTTP) values of closing movements tend to be higher than the values predicted by any fixed-point model. Those models whose simulations indicate that they can generate higher RTTP values exhibit other incompatible kinematic relations (e.g., a strongly nonlinear peak velocity vs. movement amplitude relation). We conclude with a discussion of the extent to which the task at hand uniquely determines the dynamical regime (repeated isolated gestures vs. limit cycle) underlying the observed performance. We also review results from our own work acquiring similar data under more controlled conditions (specifically, with speech rate controlled), in an attempt to address the nature of the dynamical regime underlying the movements.
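The RTTP prediction of a fixed-point model can be seen in a minimal numerical sketch. Assuming a critically damped linear second-order gesture and a 99% amplitude criterion for movement end (both assumptions of this sketch, not taken from the abstract), RTTP comes out near 0.15 and, characteristically, is invariant under changes of stiffness:

```python
def rttp(omega, dt=1e-4, thresh=0.99):
    """Relative time to peak velocity of a critically damped second-order
    (fixed-point) gesture moving from rest at 0 toward target 1."""
    x = v = t = 0.0
    t_peak = v_peak = 0.0
    while x < thresh:                               # integrate until movement end
        a = -2.0 * omega * v - omega ** 2 * (x - 1.0)   # x'' = -2ζω x' - ω²(x - T), ζ = 1
        v += a * dt
        x += v * dt
        t += dt
        if v > v_peak:                              # track the velocity peak
            v_peak, t_peak = v, t
    return t_peak / t

r_slow, r_fast = rttp(8.0), rttp(16.0)   # two stiffness settings, same RTTP
```

Whatever the stiffness, this regime cannot produce the higher closing-movement RTTP values the abstract reports, which is the sense in which the data exceed fixed-point predictions.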
11:25h - Coffee
11:40h - Marianne Pouplier & Phil Hoole: “Consonant timing around the world – data from seven languages”
In this talk, we aim to further our understanding of the range of consonant coarticulation, or overlap, patterns found cross-linguistically. It has long been known that languages differ in coarticulation patterns, yet there is little systematic cross-linguistic work which would allow us to estimate the range of possible patterns. Such knowledge is important, however, for understanding the relationship between physiological constraints and linguistic diversity, and the degree to which coarticulation is learned. The extent to which segmentally identical consonant clusters truly differ in their coarticulation between languages is far from clear, since direct cross-linguistic comparisons are rare. Methodological differences between independently conducted experiments limit the reliability of meta-studies, especially since articulatory timing measures are highly sensitive to data treatment. In the current study, we use articulography data from seven languages (American English, German, French, Romanian, Polish, Russian, Georgian), comparing the degree of consonant overlap across languages and clusters. First results suggest that languages differ in the range of coarticulatory patterns one can observe: while all languages in our sample have relatively high overlap for some of the clusters, only some of the languages also show low-overlap patterns. Thus certain clusters seem to favor high-overlap patterns cross-linguistically, while others allow for a relatively high degree of cross-linguistic variation.
12:05h - Taehong Cho: “Articulatory studies on preboundary lengthening in American English and Korean”
In this talk I will present some preliminary articulatory data (obtained with EMA) on kinematic characteristics of preboundary lengthening in American English and Korean. The study on American English examined the distribution of preboundary lengthening as a function of prominence (lexical stress and accent) in di- and tri-syllabic pseudowords, and the study on Korean examined the distribution of preboundary lengthening as a function of focus and phonetic content (i.e., intrinsic vowel duration) in di- and tri-syllabic pseudowords. Preliminary results from 10 speakers of American English indicated that preboundary lengthening was modulated by the degree of prominence: the less prominent the final syllable, the more preboundary lengthening, with no preboundary lengthening when the last syllable was most prominent (stressed and accented), a kind of ceiling effect. Preliminary results from 11 speakers of Korean indicated that preboundary lengthening was distributed quite sporadically over the non-final syllables, with a tendency towards progressively attenuating preboundary lengthening, and that the effect appeared to be conditioned by the phonetic content: when the final syllable had an intrinsically shorter vowel (/i/), preboundary lengthening tended to extend more to the preceding syllable. Kinematically, both the English and Korean data showed that preboundary lengthening was accompanied by a faster and larger movement, showing a kind of prosodic strengthening similar to hyperarticulation due to prominence. The results will be further discussed in terms of their implications for dynamical theories.
12:30h - Tine Mooshammer, Malte Belz & Oksana Rasskazova: “How final is final? Kinematic aspects of phrase-final and utterance-final lengthening”
Phrasal structure within utterances is signaled by tonal variation, pauses, phrase-initial strengthening and final lengthening. The latter has been seen as localized speech rate reduction, a slowing down at the end of a phrase. For German, two kinds of phrase boundaries are generally assumed, namely intermediate phrase (ip) and intonational phrase (IP) boundaries (see e.g. Grice et al. 2005). In this study we compare acoustic measures and kinematic characteristics of pre-pausal gestures at phrase-final vs. utterance-final boundaries. In both contexts there is an IP boundary, but they differ in whether the utterance continues (phrase-final boundary) or ends (utterance-final boundary). Using EMA, articulatory movements of 8 native speakers of German were recorded while they read aloud. Three types of prosodic boundaries were elicited: phrase-medial, phrase-final and utterance-final. Kinematic parameters such as gesture duration, displacement and peak velocity showed only minor and inconsistent differences between the two final positions.
Grice, M., S. Baumann & R. Benzmüller (2005). German Intonation in Autosegmental-Metrical Phonology. In: Jun, Sun-Ah (ed.) Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford University Press. 55-83.