
Vowel Hunter Documentation

VOWEL HUNTER was developed in the context of Doris Mücke's dissertation project, which deals with the transcription of vowels. The study is in progress; no results have been incorporated into VOWEL HUNTER. This version is distributed to all participants of our 1999 vowel experiments as thanks for their support. We hope it will serve as a teaching tool. If you plan to use VOWEL HUNTER for research, please respect the copyright and remember to cite the authors. If you have any suggestions for improvement, please feel free to contact us.

VOWEL HUNTER is a graphical extension to the Klatt formant synthesis (KLATT 1980, CMU 1995). It is designed for real-time generation of vowels using inverse-filtered glottis signals from recorded speech. The user interface allows intuitive manipulation of the center frequencies of the first two formants by mapping these two parameters onto a logarithmically scaled two-dimensional grid. The third formant may be set either manually or automatically as described by LADEFOGED/HARSHMAN 1979 (within the synthesis this feature is called "Automatic F3"). The center frequencies of the fourth and fifth formants are fixed. The bandwidths of the formants are calculated as suggested in HAWKS/MILLER 1995. We used the source code of the "Klatt-style speech synthesizer implemented in C" developed by ILES/ING-SIMMONS. You can find it at the CMU Artificial Intelligence Repository:

 

www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/speech/systems/klatt/0.html
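
To illustrate what "formant synthesis" means here, the following is a minimal sketch of the second-order digital resonator described in KLATT 1980, chained in cascade. It is not the ILES/ING-SIMMONS code and not the VOWEL HUNTER implementation. The function names are hypothetical, and the bandwidth values in the usage note are placeholders, not the HAWKS/MILLER 1995 estimates. The 10 kHz sampling rate is the one named in the file format section.

```python
import math


def resonator_coefficients(f_hz, bw_hz, sample_rate_hz=10000.0):
    """Return (a, b, c) for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(samples, f_hz, bw_hz, sample_rate_hz=10000.0):
    """Filter a source signal through one formant resonator."""
    a, b, c = resonator_coefficients(f_hz, bw_hz, sample_rate_hz)
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(samples, formants, sample_rate_hz=10000.0):
    """Apply one resonator per (center frequency, bandwidth) pair in series."""
    out = list(samples)
    for f_hz, bw_hz in formants:
        out = resonate(out, f_hz, bw_hz, sample_rate_hz)
    return out
```

A call such as `cascade(source, [(500.0, 60.0), (1500.0, 90.0), (2700.0, 150.0), (3850.0, 200.0), (4950.0, 200.0)])` would shape a source signal with five formants; only the F4 and F5 center frequencies in this call are taken from this documentation.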

Several voice sources are available in VOWEL HUNTER. Make sure you select the correct gender for each male or female voice, because it affects the center frequencies of the higher formants. The user can choose among the available voice sources or add new ones. A voice source may be based on an inverse-filtered open vowel. New voice source presets are added by copying files with the extension "*.dat" to the program directory. One way to generate new voice sources is the program PRAAT, developed by BOERSMA/WEENINK, which is available on the Internet at fonsg3.let.uva.nl/praat/. For more information about the voice files, see the file format section.

Some comments on the formant frequencies and the scales are in order. For our purposes a logarithmic scale has been sufficient. At the suggestion of Professor Peter Ladefoged we also implemented a Bark scale (TRAUNMÜLLER 1990), which has not yet been systematically tested within the program. The two-dimensional grid ranges from 200 to 1314 Hz for the first formant and from 500 to 13921 Hz for the second. Normally the frequency values are hidden, but you can display them by selecting the menu entry "Formant Position"; they then appear below the vowel space. Choosing the third formant was more difficult, but the following presets have worked well: F3 defaults to 2700 Hz for male and 3000 Hz for female voices, except for male [i], which uses 3100 Hz, and male front rounded vowels, which use 2300 Hz. You need not keep these settings: F3 can also be set manually (slide control) or automatically (Auto F3). The upper formants are fixed at F4 = 3850 Hz and F5 = 4950 Hz.
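
The F3 presets described above can be summarized as a small lookup. This is only an illustrative sketch: the function name and the vowel class labels ("i", "front_rounded", "other") are hypothetical and do not come from the program.

```python
# Fixed upper formants as stated in the text (Hz).
FIXED_UPPER_FORMANTS = {"F4": 3850, "F5": 4950}


def default_f3(gender, vowel_class="other"):
    """Return the preset F3 in Hz for a gender and a (hypothetical) vowel class."""
    if gender == "female":
        return 3000  # one default for all female vowels
    if gender == "male":
        if vowel_class == "i":
            return 3100  # exception for male [i]
        if vowel_class == "front_rounded":
            return 2300  # exception for male front rounded vowels
        return 2700
    raise ValueError("gender must be 'male' or 'female'")
```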

Concerning the formant bandwidths, we did not use the revised wider bandwidths for female speech, because they did not work well within the synthesis. The sounds in the gray map and the preselection of the vowels are based on measurements of formant frequencies of the cardinal vowels spoken by WELLS/HOUSE 1995. They may serve as a quick orientation in the vowel space; they do not attempt to represent IPA sounds (though the IPA symbols are used in the gray map).

Short Tutorial

VOWEL HUNTER is very easy to use. Just click anywhere in the vowel space and you will see the vowel position, represented by the red dot, change. As the dot moves, the audible vowel quality changes, too. Moving along the horizontal axis modifies the frequency of F1, and moving along the vertical axis changes F2. Use the cursor keys to increase or decrease the values in small steps.

To save your adjustments click on the save button. The reset button brings the preset values back.

To make sure the higher center frequencies (F3) are correct, first select the vowel you want to "hunt" for in the vowel selector (blue column on the left). Use the vowel selector for your own changes as well; the stored vowel position and the corresponding F3 are recalled automatically.

As you may have noticed, only "primary vowels" are available in the vowel selector at first. To access other vowel types, either press the tab key or click on the button showing the selected vowel type. Don't be confused by the names "cardinal" and "native": this menu item simply lets you save two different vowel subsets in one file, regardless of their contents.

At the end of your session, you may save your changes permanently to disk by selecting the File->Save menu entry. They may be reread into memory in another session by the File->Open menu item.

For orientation, and to visualize how the vowels relate to one another, you may display the various maps in the background of the vowel space. Try the entries available in the View menu. For a more detailed description of their functionality, read the following description of the menu items.

Menu Entries

File

  • New: Resets all values to their presets.
  • Open: Allows users to load their own value sets from a file.
  • Save: Saves the current values into a file.
  • Save As: Saves the current values into a file. The name of the file may be entered.
  • Batch Convert: Converts all vow files in a directory into four ASCII representations named "AllFemaleCardinal.txt", "AllFemaleNative.txt", "AllMaleCardinal.txt" and "AllMaleNative.txt". This file type can be imported into spreadsheet applications (e.g. Microsoft Excel (TM)). To start the conversion process, select any vow file in a directory and click on Open. All vow files in that directory will then be converted into the four ASCII files.
  • Exit: Closes the application.

Voice

  • Gender: Affects the center frequencies of the 3rd, 4th and 5th formants. They are gender specific, so please select "male" or "female".
  • Play: Plays the selected vowel.
  • Voices: Selects the voice file to use. All voice files (extension: *.dat) in the program directory are listed here.
  • Play Wave File: Plays the wave file selected via the "wave selection window".

View

  • Vowel Type: Switches through the primary and secondary subsets, "schwa variations" and further vowels.
  • Vowel Mode: Chooses between two subsets, for example your native and the cardinal vowels. Don't be confused by the names "cardinal" and "native": this menu item simply lets you save two different vowel subsets in one file, regardless of their contents.
  • Formant Map: Displays the vowel presets of the currently chosen vowel types.
  • User Formant Map: Displays the user-defined vowels of the currently chosen vowel class.
  • Modified Map: Shows all modified vowels.
  • Modified Vowels: Selects all modified vowels for further processing.
  • Gray Map: Shows the VOWEL HUNTER preset on a background map for quick orientation.
  • Formant Position: Shows the values (Hz) of the center frequencies of the first two formants in a status bar below the vowel space.

Zoom

  • Enable Zoom: Enables zoom mode in the vowel space. The center of the zoomed area depends on the preset of the currently selected vowel. The size of the zoom can be adjusted with the menu entry "Zoom Parameters".
  • Zoom Parameters: Shows a dialogue for entering the number of "clickable" positions that will be displayed in the main window area.
  • Show Grid: The grid divides the zoomed part of the vowel space into small clickable areas. It is available in zoom mode only.

Options

  • Mode: In the "normal mode" the generated vowel should be a monophthong apart from the influence of pitch variation. The "diphthong mode" allows transitions to be generated. It enables the user to set a second "point" in the vowel space by clicking the right mouse button. This feature is not fully developed yet and only works over very short distances inside the vowel space.
  • Scale: Choose either the logarithmic Hertz scale or the Bark scale. In the Hertz scale, a step of 5 pixels changes the frequency by a factor of 1.025 on both axes. The frequency range is 200 to 1314 Hertz on the x-axis and 500 to 13,921 Hertz on the y-axis. In the Bark scale, a step of 5 pixels on the x-axis changes the frequency by 0.105 Bark in the range between 2.15 Bark and 10.158 Bark; a step of 5 pixels on the y-axis changes it by 0.148 Bark in the range between 5 Bark and 16.346 Bark. Different steps are used for the two axes in order to get nearly the same frequency values at the bottom left of the screen for both scale systems. Hertz values are converted into Bark values by this formula (Traunmüller 1990): Bark = (26.81*Hertz)/(1960+Hertz) - 0.53
  • Duration of Vowels: Allows adjustment of the duration in msec (slide control on the display).
  • Female Tilt: This is a parameter used in the Klatt synthesis implementation. It is basically a low-pass filter with a cutoff frequency of 3 kHz. The given value describes the steepness of the filter slope in dB/octave.
  • Circle Mode Parameters: Go through the vowel space in circles. Define the radius and the step width in degrees for the circular movement around the selected vowels. Use the "<" and ">" keys.
  • Wavelist Window: Compare recorded sounds with the sounds of the synthesis. This menu entry opens a second window for playback of recorded sound.
  • Advanced Controls: Shows the slide controls for the vowel duration and F3.
  • Automatic F3 Calculation: The third formant will be calculated automatically as described by LADEFOGED/HARSHMAN 1979.
  • Select Temporary Path: Select the path where temporary files are stored.
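
The two scales of the Scale entry can be tried out with a few lines of code. The sketch below implements the formula given for the Scale entry (which yields Bark from Hertz), its algebraic inverse, and the logarithmic step of factor 1.025 per 5 pixels. The function names are hypothetical and this is not the program's own code. With these definitions, 2.15 Bark corresponds to roughly 218 Hz and 5 Bark to roughly 509 Hz, which is close to the 200 Hz and 500 Hz lower limits of the Hertz scale.

```python
def hz_to_bark(hz):
    """Traunmueller (1990): Bark = 26.81*Hz / (1960 + Hz) - 0.53."""
    return 26.81 * hz / (1960.0 + hz) - 0.53


def bark_to_hz(bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)


def log_scale_step(hz, steps, factor=1.025):
    """Frequency after moving `steps` units of 5 pixels on the Hertz scale."""
    return hz * factor ** steps


if __name__ == "__main__":
    print(round(bark_to_hz(2.15), 1))  # lower x-axis limit of the Bark scale in Hz
    print(round(bark_to_hz(5.0), 1))   # lower y-axis limit of the Bark scale in Hz
    print(log_scale_step(200.0, 1))    # one 5-pixel step above 200 Hz
```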

File Formats

The Voice Files (*.dat)

The "Voice Files" contain a source signal used for the filter operation of the Klatt synthesis. The examples are inverse-filtered voice signals. You can generate your own voice source files with software such as PRAAT. Binary audio files must be converted before they can be used as signal sources:

The Voice Files used by VOWEL HUNTER are basically ASCII files. They are placed in a "Dat" folder in the program directory. If you place them elsewhere, you are asked to locate the "Dat" folder when the program starts. The file format is rather simple:

All values are "floating point values" represented in ASCII and separated by a carriage return code. The sampling rate should be 10 kHz according to the standard operation rate.

The first line contains the number of samples following. The second line contains a scaling factor for the samples. Just ignore this line since the program always searches for the maximum value. All following lines contain the sample values described above.
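
As an illustration of this layout, the following sketch writes and reads a file with one value per line: sample count, scaling factor, then the samples. It is an assumption-laden example, not the program's own reader: the function names are hypothetical, the line ending is whatever the platform's text mode produces, and the count is parsed leniently in case it is written as a floating point value.

```python
def write_voice_file(path, samples, scale=1.0):
    """Write samples as ASCII: count, scaling factor, then one value per line."""
    with open(path, "w") as handle:
        handle.write(f"{len(samples)}\n")
        handle.write(f"{scale}\n")  # ignored by the reader below, as in the text
        for value in samples:
            handle.write(f"{value}\n")


def read_voice_file(path):
    """Return (samples, peak); the scaling line is skipped."""
    with open(path, "r") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    count = int(float(lines[0]))  # accept "3" as well as "3.0"
    samples = [float(text) for text in lines[2:2 + count]]
    if len(samples) != count:
        raise ValueError("file contains fewer samples than announced")
    # The maximum absolute value, which a player could use for normalization.
    peak = max((abs(value) for value in samples), default=0.0)
    return samples, peak
```

For example, `write_voice_file("demo.dat", [0.0, 0.5, -1.5])` followed by `read_voice_file("demo.dat")` returns the three samples and a peak of 1.5.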

The User Data Files (*.vow)

These files are in binary format. Please use the integrated Batch Convert feature to access the data.

Batch Converted Files

  • AllFemaleCardinal.txt
  • AllFemaleNative.txt
  • AllMaleCardinal.txt
  • AllMaleNative.txt

These files (ASCII files) are created by the menu entry File->Batch Convert. The values are separated by semicolons and can be imported into any other program. The first line of every file contains the names of the columns. The following lines contain the name of the vow file, followed by each selected vowel type with its saved center frequencies and length, one after the other.
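
A semicolon-separated file with a header line can be read with a standard CSV parser. The sketch below shows one way to do this; the function name is hypothetical, and the column names in the usage note are invented placeholders, since the actual column names are not listed in this documentation.

```python
import csv


def read_batch_file(handle):
    """Return (column names, rows as dicts) from a semicolon-separated file."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)  # first line: names of the columns
    rows = [dict(zip(header, row)) for row in reader if row]
    return header, rows
```

With an open file, `header, rows = read_batch_file(open("AllMaleCardinal.txt", newline=""))` gives the column names and one dictionary per line. For a file whose header were, say, `File;Vowel;F1;F2;F3;Length` (placeholder names), `rows[0]["F1"]` would hold the first stored F1 value as text.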

Authors / Copyright

VOWEL HUNTER was written in 1999/00 by Doris Mücke, Frank Christian Stoffel and Martin Wilz at the Institute of Phonetics, University of Cologne, Germany. Please feel free to contact us for questions concerning VOWEL HUNTER. Suggestions for improvement are welcome.

doris.muecke@uni-koeln.de
chr.stoffel@uni-koeln.de
martin@wilz.de

phone: +49 - (0) 221 - 470 42 53
fax: +49 - (0) 221 - 470 59 38

Acknowledgements

We would like to thank all the participants who joined the vowel experiments in 1999 and gave us their time and helpful advice. The results will be published soon. We also thank the Summer Institute of Linguistics for allowing us to use the SIL Encore IPA 93 fonts. By courtesy of Peter Ladefoged, John Wells, Jill House, Motoko Ueyama and Georg Sachse we were able to implement voice sources which the user will appreciate. Our thanks also go to the IPKöln staff, especially to Georg Heike, Reinhold Greisbach, Theo Klinker, Franziska Craesmeyer and Christine Riek.

References

Boersma, Paul & David Weenink. PRAAT. Institute of Phonetic Sciences of the University of Amsterdam, The Netherlands. Internet: fonsg3.let.uva.nl/praat/

Iles, Jon & Nick Ing-Simmons, 1995. Klatt: A Klatt-style speech synthesizer implemented in C. CMU Artificial Intelligence Repository, 1995. Internet:
www.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/areas/speech/systems/klatt/0.html

Hawks, John & James Miller, 1995. A formant bandwidth estimation procedure for vowel synthesis. JASA 97 (2), pp. 1343-4.

Klatt, Dennis H., 1980. Software for a cascade/parallel formant synthesizer. JASA 67 (3), pp. 971-95.

Ladefoged, Peter & R. Harshman, 1979. Formant Frequencies and Movements of the Tongue. In: Frontiers of Speech Communication Research, pp. 25-34.

Traunmüller, H., 1990. Analytical expressions for the tonotopic sensory scale. JASA 88, pp. 97-100.

Wells, John & Jill House, 1995. The Sounds of the International Phonetic Alphabet. London: Phonetics & Linguistics U.C.L.