Program
Monday, 7th May, 2012
09:00h - Welcome
09:10h - Argyro Katsika: Coordination of prosodic events at boundaries
In this talk, a theoretical account of boundary tones and their interaction with stress is presented, framed within Articulatory Phonology. Via an articulatory magnetometer study of Greek, the coordination of boundary tones with oral gestures is explored. A variety of boundary tones in both accented and de-accented phrase-final words, stressed on the ultima, penult or antepenult, are examined. Results show that the onset of boundary tones co-occurs with the articulatory target of the phrase-final vowel, with stress further modifying this timing. In particular, the boundary tone is initiated earlier as stress occurs earlier within the phrase-final word. A parallel effect of stress on boundary lengthening is found: the scope of boundary lengthening extends towards the stressed syllable. However, stress does not influence the timing of the pause postures that follow final vowels with respect to boundary tones. Based on these results, a gestural account is proposed, in which these complex interactions at boundaries emerge from specific coordinations among the lexical oral, lexical prosodic and phrasal prosodic events involved. Possible future research directions are discussed. [Supported by NIH NIDCD DC 008780 to Louis Goldstein]
Download PDF: Dymo_Katsika
09:40h - Susanne Fuchs: Breathing cycles during speaking and listening: Data and modelling (with Amélie Rochet-Capellan, Leonardo Lancia & Pascal Perrier)
In this talk we will introduce the largest physiological unit of prosody, the breath group (Lieberman, 1966). Based on acoustic and respiratory data, we will focus on three main topics:
(1) Breath groups and linguistic structure in spontaneous speech data
Speakers do not take breaths at random points while talking; inhalation largely follows linguistic structure. According to Conrad et al. (1983), inhalation occurs consistently at syntactic constituents in read speech. However, it is unclear to what extent this also holds in spontaneous speech. Our work focuses on spontaneous speech data from 28 female German speakers summarizing short stories. We analysed the occurrence of breathing pauses and the number of syllables and clauses (main or embedded) per breath group. Results will be described and their implications for a better understanding of speech planning will be discussed.
(2) Breathing cycles while listening to speech
Few studies have investigated the relationship between speakers’ and listeners’ breathing cycles. In dialogue, speakers’ and listeners’ breathing seems to be synchronized at the time of turn-taking (McFarland, 2001). This suggests that breathing may be crucial for turn-taking and may contribute to the convergence of prosodic and rhythmic characteristics in dialogue. We investigated potential breathing adaptation in listeners who listened to a male or a female speaker (differing in lung volume) under different speech conditions (loud versus normal speech). Inhalation depth, breath-cycle duration and breathing frequency are discussed in relation to the speaker and the loudness condition.
(3) Modelling the synchronisation in breathing between speaker and listener
Modelling the synchronisation between speakers’ and listeners’ breathing cycles turned out to be much more challenging than we had first assumed. A drawback of many existing methods is that they assume the multivariate data either share the same frequency or stand in a constant frequency relation. In breathing, however, listener-speaker synchronisation may switch over time between 1-to-1, 2-to-1 and 1-to-2 patterns. We therefore adapted a specific version of wavelet analysis, as well as cross-recurrence analysis, to investigate the synchronisation. The methods will be introduced and results will be presented.
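As a rough illustration of the second of these methods, the sketch below builds a simple cross-recurrence plot from two synthetic breathing signals whose frequencies stand in a 2-to-1 relation. The embedding parameters, radius and toy data are assumptions for illustration only, not those used in the study.

```python
# Illustrative sketch only: a minimal cross-recurrence analysis of two breathing
# signals. All names, parameters and the synthetic data are hypothetical.
import numpy as np

def embed(x, dim=3, lag=10):
    """Time-delay embedding of a 1-D signal into dim-dimensional state vectors."""
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

def cross_recurrence(x, y, dim=3, lag=10, radius=0.5):
    """Binary cross-recurrence matrix: 1 where the embedded states of x and y
    are closer than `radius` (both signals are z-scored first)."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    X, Y = embed(x, dim, lag), embed(y, dim, lag)
    # pairwise Euclidean distances between all speaker/listener states
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return (d < radius).astype(int)

# toy example: listener breathing at half the speaker's rate (2-to-1 coupling)
t = np.linspace(0, 60, 600)                       # 60 s sampled at ~10 Hz
speaker = np.sin(2 * np.pi * 0.30 * t)            # ~0.30 Hz breathing
listener = np.sin(2 * np.pi * 0.15 * t + 0.4)     # ~0.15 Hz breathing
crp = cross_recurrence(speaker, listener)
print("recurrence rate:", crp.mean())             # crude global coupling index
```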
References:
Conrad, B., Thalacker, S. & Schönle, P. (1983). Speech respiration as an indicator of integrative contextual processing. Folia Phoniatrica 35: 220–225.
Lieberman, P. (1966). Intonation, perception and language. PhD thesis, Massachusetts Institute of Technology.
McFarland, D. H. (2001). Respiratory markers of conversational interaction. Journal of Speech, Language, and Hearing Research 44: 128–143.
Download PDF: Dymo_Fuchs
10:10h - Štefan Beňuš: Temporal alignment of oral and glottal gestures under continuous prosodic variability
It is well known that discrete contrasts in prosodic structure – in terms of the strength of a prosodic boundary or the type/presence of a pitch accent – affect the organization of units in speech production. We ask whether (and how) non-discrete, functionally low-level prosodic variability, arising from induced continuous variation in articulatory precision and tempo, also affects the temporal organization of consonantal and vocalic gestures in ‘abi’ and ‘iba’ sequences. The data suggest that the vicinity of a prosodic boundary (realized either as a juncture or as the presence of a pitch target) induces temporal re-organization in VCV sequences. The alignment patterns between laryngeal targets and articulatory landmarks, however, do not seem to provide a stable, subject-independent pattern.
Download PDF: Dymo_Benus
10:40h - Coffee Break
11:10h - Taehong Cho: Effects of prosodic boundary and syllable structure on CV articulation and its intergestural timing in Korean
In this talk, I will report some results of an EMA study investigating the effects of prosodic boundary and syllable structure on various aspects of CV articulation in Korean. Syllable structure was manipulated by creating a two-word sequence in which the consonant was either the onset of the following word (#CV) or the coda of the final syllable of the preceding word (C#V), and the prosodic boundary (#) was either an Intonational Phrase (IP) or a Prosodic Word boundary. Some of the important questions addressed in this talk are (1) how prosodic boundary strength and syllable structure influence the spatio-temporal realization of consonantal and vocalic articulation, as reflected in kinematic parameters such as movement duration, peak velocity and displacement; (2) how the degree of CV gestural overlap is modulated by prosodic boundary and syllable structure; and (3) how the stability of CV intergestural timing is modified as a function of whether C and V belong to the same or to different syllables across a lexical boundary, and whether the boundary is an IP or a word boundary. The results that bear on these questions will then be further discussed under the general theory of prosodic strengthening, as well as in terms of their implications for a mass-spring gestural model of speech production.
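For readers unfamiliar with such kinematic measures, the sketch below shows one common way of deriving movement duration, peak velocity and displacement from a single articulator trajectory using a velocity-threshold criterion. The 20% threshold and the synthetic trajectory are assumptions for illustration, not the procedure used in the study.

```python
# A minimal sketch (not the author's code) of extracting kinematic parameters
# from an articulator trajectory; the 20%-of-peak-velocity criterion is assumed.
import numpy as np

def kinematic_landmarks(pos, fs, threshold=0.2):
    """Return onset/offset indices, duration (s), peak velocity and displacement
    for a single movement stored in `pos` (sampled at fs Hz)."""
    vel = np.abs(np.gradient(pos) * fs)        # absolute velocity
    i_peak = int(np.argmax(vel))
    crit = threshold * vel[i_peak]
    onset = i_peak
    while onset > 0 and vel[onset] > crit:     # walk back to the velocity threshold
        onset -= 1
    offset = i_peak
    while offset < len(vel) - 1 and vel[offset] > crit:
        offset += 1
    duration = (offset - onset) / fs
    displacement = abs(pos[offset] - pos[onset])
    return onset, offset, duration, vel[i_peak], displacement

# toy closing gesture: a 10 mm sigmoid movement sampled at 200 Hz
fs = 200
t = np.linspace(-0.25, 0.25, int(0.5 * fs))
pos = 10.0 / (1.0 + np.exp(-t / 0.04))         # position in mm
print(kinematic_landmarks(pos, fs))
```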
11:40h - Cécile Fougeron & Laurianne Georgeton: Domain-initial strengthening on French vowels: interaction with phonological contrasts.
Domain-initial strengthening has mainly been studied for consonants; little is known about its effect on vowels.
While the basic question in this talk is whether vowels also undergo boundary-induced variation, our main interest is in understanding how these variations interact with phonological contrast in a dense vowel system such as that of French.
The effect of prosodic boundary is investigated for 7 French vowels contrasting in rounding and/or height. Vowels are positioned in absolute initial position of the Word, Accentual Phrase and Intonational Phrase domains. Variations in the articulation of the 3 vowel pairs contrasting in rounding (/i-y/, /e-ø/, /ɛ-œ/) are based on lip data. Variations for the 4 front vowels contrasting in height /i, e, ɛ, a/ are based on lip data and preliminary tongue results.
Articulatory variations and their acoustic consequences are found to enhance the contrast between vowel pairs in initial position of higher prosodic domains.
Download PDF: Dymo_Fougeron_Georgeton
12:10h - Martine Grice, Doris Mücke & Simon Ritter: Tonal onglides and oral gestures
This talk looks at the production and perception of focus types in German. A corpus of read sentences was analysed in terms of intonational onglide (the pitch movement _onto_ the accented syllable) and offglide (the pitch movement _from_ the accented syllable), as well as in terms of lip kinematics (displacement and duration of the transvocalic opening gesture during accented syllable production). We investigate which parameters, and which combinations of parameters, lead to the perception of different focus types.
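As a rough illustration, the sketch below measures an onglide and offglide from an f0 track given the accented-syllable span and the time of the tonal target. This operationalisation and the toy contour are assumptions, not necessarily the authors' procedure.

```python
# A minimal sketch (assumed operationalisation) of onglide/offglide measurement.
import numpy as np

def on_off_glide(t, f0, syll_start, syll_end, target_time):
    """Onglide: pitch change from the accented-syllable onset to the tonal target.
    Offglide: pitch change from the target to the end of the accented syllable."""
    f0_at = lambda x: np.interp(x, t, f0)
    onglide = f0_at(target_time) - f0_at(syll_start)
    offglide = f0_at(syll_end) - f0_at(target_time)
    return onglide, offglide

# toy rising-falling accent: f0 peaks inside the accented syllable
t = np.linspace(0.0, 0.6, 120)
f0 = 180 + 60 * np.exp(-((t - 0.35) ** 2) / 0.01)   # peak near 0.35 s
print(on_off_glide(t, f0, syll_start=0.20, syll_end=0.45, target_time=0.35))
```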
Download PDF: Dymo_Grice_Muecke_Ritter
12:40h - Adrian Simpson: Sex-specific acoustic and articulatory correlates of phonological contrast (with Melanie Weirich)
A number of differences between male and female speech can be accounted for by examining anatomical and physiological differences between male and female speakers. Longer, thicker male vocal folds vibrate at a lower natural frequency than their female counterparts. The lowering of the larynx during puberty gives rise to an adult male vocal tract that is on average 15-20% longer than that of an adult female, in turn producing lower formant frequencies. However, consistent durational differences between female and male speech that have repeatedly been found across languages cannot be readily explained by such bio-physical factors. At first sight, female speakers appear to produce longer durations both at the segmental and at the utterance level. On closer inspection, however, these durational differences seem to result from female speakers achieving greater temporal enhancement of phonological contrast, in particular at places of prosodic significance.
This paper describes ways in which males and females differ in their realisation of phonological contrast at an acoustic and articulatory level and evaluates explanations that can be advanced to account for the differences.
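As background to the vocal-tract-length point above, a back-of-the-envelope calculation with the quarter-wavelength resonator formula shows how a 15-20% longer tract lowers all formants by roughly the same proportion. The specific lengths below are assumed for illustration and are not taken from the talk.

```python
# Rough illustration: for a uniform tube closed at the glottis, resonances follow
# the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
c = 35000.0                       # speed of sound in cm/s
for L in (14.5, 17.5):            # assumed female vs. male vocal tract lengths in cm
    formants = [(2 * n - 1) * c / (4 * L) for n in (1, 2, 3)]
    print(f"L = {L} cm ->", [round(f) for f in formants], "Hz")
# 17.5 cm is ~20% longer than 14.5 cm, so every formant comes out ~17% lower.
```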
13:10h - Lunch
14:40h - Louis Goldstein: Can cognitive barriers (on prosody) act like physical barriers (on articulation)?
New methods for the collection of articulatory data and the development of techniques for automatic alignment of speech with a phonetic transcription have produced an order-of-magnitude increase in the amount of available data on key phonetic observables and how they vary over subjects, utterances, and contexts. This makes it possible to begin to probe the distribution of these observables for potential structure that could provide insight into the dynamics underlying speech planning and production and the problem of learning. Distributions of large (>1000) numbers of observations of constriction degree (CD), constriction location (CL), and articulator positions for various consonants and vowels will be presented in this light, gathered from both the Wisconsin x-ray microbeam corpus and the new USC real-time MRI TIMIT corpus. One finding to be discussed is that the distributions of CD values for gestures in which the tongue contacts a physical barrier (the palate and/or teeth) are highly skewed. This is not surprising and can result from predicted saturation effects (as Fujimura proposed): potential variations in the force with which a gesture is articulated can fail to change the distance to the palate and instead produce tissue compression. More surprising are cases in which prosodic events show such skewed distributions despite the lack of physical barriers. The amount of f0 fall over the last two pitch accents of a declarative Intonational Phrase shows this kind of highly skewed distribution, as will be shown in data from both English and Spanish. F0 always falls in this context, with 0 Hz appearing to function like a barrier and the mode of the distribution lying close to 0. This skewing must emerge from some aspect of the planning process, as there is no physical barrier in this case, only a phonological/cognitive one. The cognitive nature of this boundary is shown by companion data from imperative IPs in Spanish, which show mean pitch falls similar to those for declaratives but completely symmetric distributions: a 0 Hz pitch fall is no barrier in this case.
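The contrast between skewed and symmetric fall distributions can be made concrete with synthetic data. The sketch below is my own illustration, not the corpus results: it compares the skewness of a barrier-bounded, exponential-like distribution of f0 falls with that of a symmetric distribution of the same mean.

```python
# Hypothetical illustration with synthetic stand-ins for the two distribution types.
import numpy as np

def skewness(x):
    x = np.asarray(x, dtype=float)
    return np.mean(((x - x.mean()) / x.std()) ** 3)

rng = np.random.default_rng(1)
declarative = rng.exponential(scale=30.0, size=2000)      # falls piling up near the 0 Hz "barrier"
imperative = rng.normal(loc=30.0, scale=10.0, size=2000)  # same mean fall, no barrier
print("barrier-like skewness:   ", round(skewness(declarative), 2))  # strongly positive
print("symmetric-like skewness: ", round(skewness(imperative), 2))   # near zero
```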
Download PDF: Dymo_Goldstein
15:10h - Marianne Pouplier: Articulatory correlates of syllable structure: results across languages (with Stefania Marin)
We have over the past years investigated articulatory correlates of syllable structure in a variety of languages, including German, English, Slovak and Romanian. We present here an overview of our results. In particular, we focus on the question of how segmental effects and the internal structure of a cluster interact with general, syllable-position-specific timing effects.
The gestural model of syllable structure has proposed that the segmental grouping effects linguistically expressed in the syllable manifest themselves in particular articulatory timing relations. Onset effects have their basis in multiple coupling relations between prevocalic consonants and the vowel. These coupling relations comprise both in-phase and eccentric coupling modes. Codas, however, only allow for eccentric coupling modes, and only the postvocalic consonant in immediate adjacency to the nucleus is directly coupled to the nucleus. In agreement with this difference in timing relations, codas do not show the same grouping effects as onsets.
Overall, our data confirm across all languages that there is a fundamental difference between onsets and codas. This pertains to the timing of onsets/codas relative to the nucleus as much as to the internal timing of onset/coda clusters. However, the timing relations are quite manifold. In particular, differences between obstruents and liquids point to the interaction of segment- and language-specific effects with general principles of syllabic organisation. This becomes evident, for instance, in the different behavior of /l/ coda clusters in English and Romanian on the one hand and German on the other. The syllabic consonants of Slovak also provide evidence that syllable-level timing relations may vary as a function of the segments involved. Finally, the Romanian data point to a possible role of frequency in syllable timing: onset clusters of low lexical frequency seem to behave differently from clusters of high frequency. Interestingly, these low-frequency clusters are also typologically rare cross-linguistically. This, like the Slovak data, is in line with the idea that articulatory coordination patterns may be linked to linguistic markedness.
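The coupling account summarised above can be illustrated with a minimal phase-coupling simulation. This is a schematic sketch under assumed parameters, not the authors' model: two onset consonants coupled in-phase to the vowel and anti-phase to each other settle on relative phases in which they straddle the vowel (the so-called c-center pattern).

```python
# Schematic sketch of competing in-phase / anti-phase coupling among planning
# oscillators for C1, C2 and V; all parameters are assumptions for illustration.
import numpy as np

def settle(pairs, n_osc=3, steps=4000, dt=0.01, k=1.0):
    """Relax oscillator phases under pairwise coupling; `pairs` maps (i, j) to a
    target relative phase. Returns final phases (degrees) relative to oscillator 0."""
    rng = np.random.default_rng(0)
    phase = rng.uniform(-0.1, 0.1, n_osc)
    for _ in range(steps):
        dphi = np.zeros(n_osc)
        for (i, j), target in pairs.items():
            err = np.sin((phase[i] - phase[j]) - target)
            dphi[i] -= k * err                 # push the pair toward its target phase
            dphi[j] += k * err
        phase += dt * dphi
    return np.degrees(phase - phase[0])

# C1=0, C2=1, V=2: both consonants in-phase with the vowel, anti-phase with each other
onset_graph = {(0, 2): 0.0, (1, 2): 0.0, (0, 1): np.pi}
print(settle(onset_graph))   # C1 and C2 end up on opposite sides of V (~120 deg apart)
```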
15:40h - Juraj Simko: Emergent phonological and prosodic patterns: An optimization account
We present a recent development of our modeling paradigm, which combines a task-dynamic implementation of articulatory phonology with an optimization approach akin to hypo-/hyper-articulation theory. Temporal details of intergestural sequencing, as well as the dynamical parameters of active gestures, are treated as emergent from trade-offs between competing efficiency requirements imposed by articulatory, auditory and communicative constraints.
Three types of phenomena, traditionally investigated in different domains of speech science, are shown to be, in principle, accounted for by our model. First, phonetic patterns – temporal details of intergestural sequencing – lawfully reflect the articulatory properties of the gestures involved in producing an utterance. Second, the phonological contrast between singletons and geminates emerges in the form of two discretely distinct patterns adhering to the acoustic and articulatory characteristics of these two types of gestures. Finally, a simplified version of our model is shown to be fully compatible with task-dynamic coupled-oscillator accounts used for modeling various prosodic phenomena such as stress or final lengthening. The quasi-oscillatory characteristics of speech postulated by these accounts arise in our model from a hierarchically organized set of production and perception efficiency constraints.
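To make the optimization idea concrete, the toy sketch below (my own illustration, not the model itself) picks a gesture duration by trading off articulatory effort, perceptual clarity and time pressure. All cost terms, scalings and weights are assumptions.

```python
# Toy hypo-/hyper-articulation-style trade-off; every term is an illustrative assumption.
import numpy as np

def total_cost(T, w_effort=1.0, w_percept=1.0, w_time=0.5):
    effort = (0.03 / T) ** 2                         # force/stiffness rises steeply for fast gestures
    target = 1.0                                     # intended displacement (arbitrary units)
    achieved = target * (1.0 - np.exp(-T / 0.08))    # undershoot when the gesture is too short
    percept = (target - achieved) ** 2               # penalty for a reduced, unclear target
    return w_effort * effort + w_percept * percept + w_time * T

durations = np.linspace(0.05, 0.60, 500)             # candidate gesture durations in seconds
costs = [total_cost(T) for T in durations]
best = durations[int(np.argmin(costs))]
print(f"optimal gesture duration ~ {best:.3f} s")
# Raising w_time (pressure to speak fast) shifts the optimum toward shorter, more
# reduced gestures -- a hypoarticulation-like pattern; lowering it does the opposite.
```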
Download PDF: Dymo_Simko
16:10h - Coffee Break
16:40h - Stefan Kopp: Speech-gesture synchrony and multimodal prosody
Speakers structure their utterances prosodically, not only in speech but also in their non-verbal modalities. Their multimodal behavior is thus a result of each modality’s dynamic production process as well as of the need to create coherent and consistent multimodal deliveries. For example, speech and co-verbal gesturing are commonly assumed to be finely coordinated with regard to temporal, semantic and pragmatic aspects.
However, a range of different definitions and operationalizations have been used for these synchrony phenomena, leading to partly inconsistent empirical findings. In speech, for example, concepts as different as the lexical affiliate, the co-expressive element, or the prosodic nucleus have been employed; in gesture, the meaningful phase, the moment of greatest effort, the acme, or the thrust have been taken as anchor points. Depending on these choices, different synchrony rules have been found. In addition, the different coordination phenomena have not yet been analyzed in conjunction, as would be necessary to gain a complete picture of how speech and gesture work together.
I will report results from work on modeling speech and gesture production with artificial speakers, based on a large corpus of speech and gesture use in dialogue. This will include results from analyses of speech and gesture synchronies and their translation into an operational simulation model of coordinated speech and gesture production.
Download PDF: Dymo_Kopp
17:10h - Tine Mooshammer: Speech errors around the world (with Mark Tiede, Louis Goldstein, Hosung Nam, Hansook Choi, Man Gao, Argyro Katsika, Yueh-chin Chang, Feng-fan Hsieh, Christina Hagedorn)
As has been shown in a number of EMA studies using tongue-twister-like syllable sequence tasks, the majority of errors are intrusive gestures, varying in amplitude, that are coproduced with the intended target; phonemic substitution errors are only rarely observed (see Pouplier & Goldstein 2005, Goldstein, Pouplier et al. 2007, among others). The first aim of this study is to investigate the role of the word-internal location of dissimilarity, i.e. whether errors occur more frequently with onset mismatches (top cop) or coda mismatches (top tock). The second aim is to compare error patterns from four languages (English, Greek, Taiwanese, Korean) that vary in coda complexity restrictions, preferred syllable structure, prosodic system and voicing contrast. The stimuli and the repetition tasks were matched for timing and for segmental content as closely as possible across the investigated languages. Our assumption is that if error patterns differ between the languages, this can be attributed to differences in their linguistic systems; if not, the observed patterns can be seen as universal.
To test this we recorded articulatory movements of the tongue, jaw and lips for 9 native speakers of American English, 3 of Greek and 4 each of Taiwanese and Korean. The task in these experiments was to repeatedly produce two-word sequences timed to an accelerating metronome beat: same-word sequences (top top), and word pairs with either different codas (top tock) or onsets (top cop). Only onset alternation was tested for Greek because of coda restrictions on monosyllabic words. Two different measures will be presented here: an intrusion, substitution and reduction count (see Goldstein et al. 2007) and the Delta measure, adapted from McMillan & Corley (2010).
English, Korean and Taiwanese all showed significantly higher intrusion and reduction rates for coda alternations than for onset alternations, with the highest coda-alternation error rate found for Taiwanese. However, due to a large degree of speaker-specific variability it is difficult to generalize across languages. We therefore assume that the increased error rate seen in codas compared to onsets is not a language-specific phenomenon. Instead, it arises from the universal tendency for coda consonants, which are known to exhibit more phonetic variability, to also undergo phonological assimilation and neutralization processes more frequently than onset consonants. This asymmetry can be modeled within the coupled-oscillator model (Nam et al. 2009) as a mismatch between the gestural amplitudes associated with syllable position.
Download PDF: Dymo_Mooshammer
17:40h - Eric Vatikiotis-Bateson: Computing spatial and temporal coordination using correlation map analysis
Recently, we have developed a powerful tool for computing momentary (instantaneous) correlation between signals for any range of temporal offsets between the two signals (Barbosa et al. 2012, JASA 131(3): 2162–2172). The result is a two-dimensional map that provides a surprisingly realistic means of assessing the temporal fluctuations that occur continually in biological coordination. Our current task is to make this tool useful and evaluate that utility. In this talk, we present recent efforts to compute and evaluate the robustness of the time-varying path of “optimum” correlation using EMA data recorded simultaneously for two speakers face-to-face.
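A simplified version of the underlying idea can be sketched as follows: for every time point and every lag, correlate the two signals inside a short sliding window, yielding a time-by-lag map. The window length, lag range and toy data below are arbitrary choices for illustration, not the method of Barbosa et al. (2012).

```python
# Naive O(n * lags * win) sketch of a windowed time-by-lag correlation map.
import numpy as np

def correlation_map(x, y, max_lag, win):
    """Entry [k, t] is the Pearson correlation between x and y in a window of
    +/- win samples around t, with y shifted by lag = k - max_lag samples."""
    n = len(x)
    out = np.full((2 * max_lag + 1, n), np.nan)
    for k, lag in enumerate(range(-max_lag, max_lag + 1)):
        for t in range(win + abs(lag), n - win - abs(lag)):
            xs = x[t - win : t + win]
            ys = y[t - win + lag : t + win + lag]
            out[k, t] = np.corrcoef(xs, ys)[0, 1]
    return out

# toy data: y is x delayed by 15 samples plus noise
rng = np.random.default_rng(0)
t = np.arange(1000)
x = np.sin(2 * np.pi * t / 100.0)
y = np.roll(x, 15) + 0.1 * rng.standard_normal(len(x))
cmap = correlation_map(x, y, max_lag=30, win=50)
best_lag = np.nanargmax(cmap[:, 200:210], axis=0) - 30   # path of "optimum" correlation
print(best_lag)                                          # hovers around +15
```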
Download PDF: Dymo_Bateson_Cologne
19:00h - Barbecue (at IfL Phonetik)
Tuesday, 8th May, 2012
09:00h - Ingo Hertrich: Speech timing in the listening brain – syllables and pitch
The speech signal is a rapid sequence of acoustic events that has to be encoded under time-critical conditions. During speech perception, the speech envelope, i.e. the time course of acoustic intensity, is directly reflected in electrophysiological brain activity. Acoustic correlates of syllable onsets and of pitch periodicity show even sharper brain responses, as can be demonstrated by cross-correlation analysis. Such phase-locking mechanisms may serve as neuronal triggers for the extraction of information-bearing elements, bearing in mind that the information density of the speech signal is particularly high in the temporal region of syllable onsets and that the acoustic salience of formant information is highest at the beginning of pitch periods. Furthermore, blind individuals with the ability to understand ultra-fast speech (up to ca. 20 syllables/s) show enhanced time-locking at the level of auditory cortex and an additional right-occipital MEG component phase-locked to the syllable onsets of ultra-fast speech. Based on these results and additional fMRI data, a model of speech perception can be built that shows how acoustic signal properties directly synchronize our speech understanding mechanism with the incoming signal.
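The kind of cross-correlation analysis mentioned above can be sketched as follows (an illustration under assumed parameters, not the study's pipeline): correlate the speech envelope with a brain signal over a range of lags and read off the lag of maximal tracking.

```python
# Illustrative sketch only: moving-average envelope, sampling rate and the
# synthetic "brain" channel are assumptions, not the study's data or methods.
import numpy as np

def envelope(signal, fs, win_s=0.02):
    """Crude amplitude envelope: rectify, then smooth with a moving average."""
    win = int(win_s * fs)
    kernel = np.ones(win) / win
    return np.convolve(np.abs(signal), kernel, mode="same")

def xcorr_lags(env, brain, fs, max_lag_s=0.3):
    """Cross-correlation of the z-scored signals for lags up to +/- max_lag_s."""
    z = lambda v: (v - v.mean()) / v.std()
    env, brain = z(env), z(brain)
    lags = np.arange(-int(max_lag_s * fs), int(max_lag_s * fs) + 1)
    r = [np.mean(env[max(0, -l): len(env) - max(0, l)] *
                 brain[max(0, l): len(brain) - max(0, -l)]) for l in lags]
    return lags / fs, np.array(r)

fs = 500                                              # Hz, downsampled signals
t = np.arange(0, 20, 1 / fs)
speech = np.sin(2 * np.pi * 4 * t) * np.random.default_rng(0).standard_normal(len(t))
env = envelope(speech, fs)
brain = np.roll(env, int(0.1 * fs)) + 0.5 * np.random.default_rng(1).standard_normal(len(t))
lag_s, r = xcorr_lags(env, brain, fs)
print("peak tracking lag: %.3f s" % lag_s[int(np.argmax(r))])   # ~0.1 s for this toy case
```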
Download PDF: Dymo_Hertrich
09:30h - Phil Hoole & Lasse Bombien: Articulatory correlates of prosodic boundaries: evidence from mouth and throat
Previous research has shown that prosodic structure manifests itself in many different ways in the acoustic and articulatory domains. Articulatory studies so far have focused on prosodic strengthening and lengthening effects on oral articulations. In this talk, we will present initial results of a laryngeal transillumination study of Dutch speakers. The recorded speech material was designed to include a large number of existing word onsets. Two levels of boundary strength and two levels of accentuation were elicited using three types of carrier sentences. The results suggest that the previously reported prosodic “weakening” of aspiration in Dutch voiceless stops in prosodically strong positions may be accounted for by the coordination of oral and laryngeal speech gestures. The data will also be discussed in the light of previously collected electromagnetic articulography recordings of word-initial German consonant clusters in varying prosodic positions. In particular, the gradedness of boundary-related lengthening effects across consonant sequences will be addressed with respect to perceptual and bio-mechanical constraints on consonant cluster production.
Download PDF: Dymo_Bombien_Hoole
10:00h - Adamantios Gafos: Dynamic invariance
One view of the relation between phonological organization and phonetic indices holds that the phonetic reflexes of different phonological organizations are fixed (e.g., in syllable-final position stops are voiceless, high vowels have low F1, syllable onsets show a specific stability pattern, and so on). This view is attractive because it makes strong predictions about the relation between phonology and phonetics. I will contrast it with an alternative perspective, the dynamic invariance view. According to the latter, the reflexes of phonological organization need not be fixed: any given phonological organization makes specific predictions about how phonetic indices change (hence, dynamic) as various non-essential parameters are scaled. The phonetic indices are allowed to change individually, but their relation remains invariant, owing to the phonological organization they instantiate. Invariance is then to be found in the distinct relations or patterns of change prescribed by the different phonological organizations, rather than in static correspondences between any given phonological organization and its expected phonetic indices. I will illustrate the dynamic invariance view with examples. I will also argue that, even though the dynamic invariance view may seem like a retreat from the search for invariance, or from a principled theory of the relation between phonology and phonetics, it is in fact stronger than the view of fixed or static invariance, because it offers predictions even in circumstances where the latter either makes no predictions or makes the wrong ones.