Instructions for Presentation
Instructions for Oral Presentation
Oral presentation slots are strictly limited to 25 minutes. Aim for a presentation of no more than 20 minutes, leaving 5 minutes for questions and the change of speakers. You may present by connecting your own PC to the projector pre-installed in the presentation room. If you cannot or prefer not to use your own PC, you may use the PC we provide at the workshop site. Test and confirm that your presentation is projected properly no later than the start of the session before your own.
Instructions for Poster Presentation
Each poster is presented on one side of a poster board that is 200 cm wide and 100 cm high. A standard A0 poster (1189 mm x 841 mm) fits on the board in landscape (wide) orientation; in portrait orientation its 1189 mm height would exceed the 100 cm board height. Each poster session is 80 minutes long. Presenters must put up their posters before the start of their session. Materials for affixing posters to the boards will be provided at the poster sessions.
Technical Program
Program at a glance
Wednesday 22nd September
09:00-10:00
10:00-10:10
10:10-11:00
Tutorial 1
Chair: Yoshinori Sagisaka
- T-1 "Exploration of the other aspect of Vocoder revisited, -- A-Z STRAIGHT, TANDEM-STRAIGHT and morphing --" by Hideki Kawahara
11:00-11:25
11:25-12:40
Lecture Session 1: Concatenative Speech Synthesis
Chair: Jerome Bellegarda
- L-1.1, 11:25-11:50 "Crafting small databases for unit selection TTS: effects on intelligibility" by H. Timothy Bunnell
- L-1.2, 11:50-12:15 "Composite TTS voices" by Alistair Conkie and Ann K. Syrdal
- L-1.3, 12:15-12:40 "Compression of line spectral frequency parameters using the asynchronous interpolation model" by Alexander Kain and Todd Leen
12:40-14:00
14:00-15:20
Poster Session 1 & Coffee
Chair: Heiga Zen
- P-1.1 "Implementation of VTLN for statistical speech synthesis" by Lakshmi Saheer, John Dines, Philip N. Garner, and Hui Liang
- P-1.2 "Do prosodic cues influence uncertainty perception in articulatory speech synthesis?" by Eva Lasarcyk and Charlotte Wollermann
- P-1.3 "An unified and automatic approach of Mandarin HTS system" by Yong Guan, Jilei Tian, Yi-Jian Wu, Junichi Yamagishi, and Jani Nurminen
- P-1.4 "Synthesis of listener vocalisations with imposed intonation contours" by Sathish Pammi, Marc Schroeder, Marcela Charfuelan, Oytun Turk, and Ingmar Steiner
- P-1.5 "An investigation of the impact of speech transcript errors on HMM voices" by Jinfu Ni and Hisashi Kawai
- P-1.6 "An HMM-based singing style modeling system for singing voice synthesizers" by Keijiro Saino, Makoto Tachibana, and Hideki Kenmochi
- P-1.7 "Lombard effect mimicking" by Dong-Yan Huang, Susanto Rahardja, and Ee Ping Ong
- P-1.8 "Unsupervised prosody labeling for constructing Mandarin TTS" by Chen-Yu Chiang, Sin-Horng Chen, and Yih-Ru Wang
- P-1.9 "Analysis and synthesis of hypo and hyperarticulated speech" by Benjamin Picart, Thomas Drugman, and Thierry Dutoit
- P-1.10 "Evaluating prosody in synthetic speech with online (eye-tracking) and offline (rating) methods" by Rajakrishnan Rajkumar, Michael White, Shari R. Speer, and Kiwako Ito
15:20-17:25
Lecture Session 2: Voice Conversion
Chair: Frank Soong
- L-2.1, 15:20-15:45 "GMM-PCA based speaker-timbre conversion on full-quality speech" by Fernando Villavicencio and Esteban Maestre
- L-2.2, 15:45-16:10 "Voice conversion using precise speech alignment based on spectral property and eigen-codeword distribution" by Yi-Chin Huang, Chung-Hsien Wu, Chung-Han Lee, and Yu-Ting Chao
- L-2.3, 16:10-16:35 "On transforming spectral peaks in voice conversion" by Elizabeth Godoy, Olivier Rosec, and Thierry Chonavel
- L-2.4, 16:35-17:00 "Linear transformation approaches to many-to-one voice conversion" by Chie Hayashida, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, and Kiyohiro Shikano
- L-2.5, 17:00-17:25 "HMM-based robust voice conversion using adaptive F0 quantization" by Takashi Nose and Takao Kobayashi
17:25-17:40
18:00-20:00
Welcome Reception at ATR Cafeteria (1st floor)
Thursday 23rd September
09:00-10:40
Lecture Session 3: Statistical Parametric Speech Synthesis
Chair: Simon King
- L-3.1, 09:00-09:25 "Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters" by Ranniery Maia, Heiga Zen, and Mark Gales
- L-3.2, 09:25-09:50 "From discontinuous to continuous F0 modelling in HMM-based speech synthesis" by Kai Yu, Blaise Thomson, and Steve Young
- L-3.3, 09:50-10:15 "Spectral modeling with contextual additive structure for HMM-based speech synthesis" by Shinji Takaki, Yoshihiko Nankaku, and Keiichi Tokuda
- L-3.4, 10:15-10:40 "Bayesian speech synthesis framework integrating training and synthesis processes" by Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda
10:40-11:00
11:00-12:40
Lecture Session 4: Expressive Speech Synthesis
Chair: Nick Campbell
- L-4.1, 11:00-11:25 "Symbolic vs. acoustics-based style control for expressive unit selection" by Ingmar Steiner, Marc Schroeder, Marcela Charfuelan, and Annette Klepp
- L-4.2, 11:25-11:50 "Application of expressive TTS synthesis in an advanced ECA system" by Jan Romportl, Enrico Zovato, Raul Santos, Pavel Ircing, Jose Relano Gil, and Morena Danieli
- L-4.3, 11:50-12:15 "A hidden Markov model-based approach for emotional speech synthesis" by Chih-Yung Yang and Chia-Ping Chen
- L-4.4, 12:15-12:40 "Two vocoder techniques for neutral to emotional timbre conversion" by Fabio Tesser, Enrico Zovato, Mauro Nicolao, and Piero Cosi
12:40-14:00
14:00-15:20
Poster Session 2 & Coffee
Chair: Alistair Conkie
- P-2.1 "Refined statistical model tuning for speech synthesis" by Xu Shao, Vincent Pollet, and Andrew Breen
- P-2.2 "High quality TTS voices within one day" by Didier Cadic and Christophe d'Alessandro
- P-2.3 "Nativization of English words in Spanish using analogy" by Tatyana Polyakova and Antonio Bonafonte
- P-2.4 "Automatic prosodic labeling of accent information for Japanese spoken sentences" by Asami Yamamoto, Kazuhiro Suzuki, Kook Cho, and Yoichi Yamashita
- P-2.5 "An automatic pitch model with distance function" by Mohamed Abou-Zleikha, Peter Cahill, and Julie Carson-Berndsen
- P-2.6 "Considering readability in Text-to-Speech recording script design" by Minghui Dong, Ling Cen, Paul Chan, and Haizhou Li
- P-2.7 "Letter-based speech synthesis" by Oliver Watts, Junichi Yamagishi, and Simon King
- P-2.8 "Joint prosodic and segmental unit selection for expressive speech synthesis" by Christophe Veaux, Pierre Lanchantin, and Xavier Rodet
- P-2.9 "Speech synthesis in the mobile user interface" by Pieter E. Scholtz, Justus C. Roux, and Jacques P. du Toit
15:20-17:00
Lecture Session 5: Evaluation and Applications
Chair: Gerard Bailly
- L-5.1, 15:20-15:45 "Evaluating speech synthesis intelligibility using Amazon Mechanical Turk" by Maria K. Wolters, Karl B. Isaac, and Steve Renals
- L-5.2, 15:45-16:10 "Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech" by Anna C. Janska and Robert A.J. Clark
- L-5.3, 16:10-16:35 "Handling large audio files in audio books for building synthetic voices" by Kishore Prahallad and Alan W Black
- L-5.4, 16:35-17:00 "Improving speech synthesis for noisy environments" by Gopala Krishna Anumanchipalli, Prasanna Kumar Muthukumar, Udhyakumar Nallasamy, Alok Parlikar, Alan W Black, and Brian Langner
17:30-18:00
18:00-19:00
Free time in Nara for walking around and shopping
19:00-21:00
Banquet at Nikko Nara Hotel (4th floor)
Friday 24th September
09:00-10:40
Lecture Session 6: Prosody and Conversation
Chair: Paul Taylor
- L-6.1, 09:00-09:25 "Learning speaker-specific phrase breaks for Text-to-Speech systems" by Kishore Prahallad, E. Veera Raghavendra, and Alan W Black
- L-6.2, 09:25-09:50 "Substitution of state distributions to reproduce natural prosody on HMM-based speech synthesizers" by Nobuyuki Nishizawa and Tsuneo Kato
- L-6.3, 09:50-10:15 "Utilising spontaneous conversational speech in HMM-based speech synthesis" by Sebastian Andersson, Junichi Yamagishi, and Robert Clark
- L-6.4, 10:15-10:40 "Speech acts and dialog TTS" by Ann K. Syrdal, Alistair Conkie, Yeon-Jun Kim, and Mark Beutnagel
10:40-11:00
11:00-11:50
Tutorial 2
Chair: Keiichi Tokuda
- T-2 "Speech synthesis without the right data" by Simon King
11:50-12:40
Lecture Session 7: Multi-Lingual Speech Synthesis
Chair: Alan Black
- L-7.1, 11:50-12:15 "HMM-based polyglot speech synthesis by speaker and language adaptive training" by Heiga Zen, Norbert Braunschweiler, Sabine Buchholz, Kate Knill, Sacha Krstulovic, and Javier Latorre
- L-7.2, 12:15-12:40 "Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project" by Mirjam Wester, John Dines, Matthew Gibson, Hui Liang, Yi-Jian Wu, Lakshmi Saheer, Simon King, Keiichiro Oura, Philip N. Garner, William Byrne, Yong Guan, Teemu Hirsimaki, Reima Karhila, Mikko Kurimo, Matt Shannon, Sayaka Shiota, Jilei Tian, Keiichi Tokuda, and Junichi Yamagishi
12:40-14:00
14:00-15:20
Poster Session 3 & Coffee
Chair: Kishore Prahallad
- P-3.1 "Comparison of formant enhancement methods for HMM-based speech synthesis" by Tuomo Raitio, Antti Suni, Hannu Pulakka, Martti Vainio, and Paavo Alku
- P-3.2 "EM-HTS: real-time HMM-based Malay emotional speech synthesis" by Mumtaz B. Mustafa, Raja N. Ainon, and Roziati Zainuddin
- P-3.3 "High level emotional speech morphing using STRAIGHT" by Dong-Yan Huang, Susanto Rahardja, and Ee Ping Ong
- P-3.4 "Adding speaking style to a TTS system" by Jean-Philippe Goldman, Sophie Roekhaut, and Anne Catherine Simon
- P-3.5 "Synthesizing fast speech by implementing multi-phone units in unit selection speech synthesis" by Donata Moers, Igor Jauk, Bernd Moebius, and Petra Wagner
- P-3.6 "Improved generation of prosodic features in HMM-based Mandarin speech synthesis" by Miaomiao Wang, Miaomiao Wen, Daisuke Saito, Keikichi Hirose, and Nobuaki Minematsu
- P-3.7 "An HMM-based speech synthesiser using glottal post-filtering" by Joao P. Cabral, Steve Renals, Korin Richmond, and Junichi Yamagishi
- P-3.8 "A study of lexical stress patterns in unit selection synthesis" by Yeon-Jun Kim and Mark C. Beutnagel
- P-3.9 "Automatic prominence annotation of a German speech synthesis corpus: towards prominence-based prosody generation for unit selection synthesis" by Andreas Windmann, Petra Wagner, Fabio Tamburini, Denis Arnold, and Catharine Oertel
15:20-17:00
Lecture Session 8: Selected Topics
Chair: Bernd Moebius
- L-8.1, 15:20-15:45 "Toward naturally expressive speech synthesis: data-driven emotion detection using latent affective analysis" by Jerome R. Bellegarda
- L-8.2, 15:45-16:10 "KLATTSTAT: knowledge-based parametric speech synthesis" by Gopala Krishna Anumanchipalli, Ying-Chang Cheng, Joseph Fernandez, Xiaohan Huang, Qi Mao, and Alan W Black
- L-8.3, 16:10-16:35 "Recent development of the HMM-based singing voice synthesis system - Sinsy" by Keiichiro Oura, Ayami Mase, Tomohiko Yamada, Satoru Muto, Yoshihiko Nankaku, and Keiichi Tokuda
- L-8.4, 16:35-17:00 "Photo-real lips synthesis with trajectory-guided sample selection" by Lijuan Wang, Xiaojun Qian, Wei Han, and Frank K. Soong
17:00-17:10
18:00-20:00+
Optional: Open Source Initiatives for Speech Synthesis at Keihanna Plaza Hotel (5th floor)
[Tentative program]
- 18:00-18:10 Opening remarks
- 18:10-18:15 Short presentation from Festival
- 18:15-18:20 Short presentation from HTS
- 18:20-18:25 Short presentation from Festvox voice building tools
- 18:25-18:30 Short presentation from Open Mary
- 18:30-18:35 Other tools and databases
- 18:35-18:45 Open discussion
- 18:45-20:00+ Chatting (with snacks and drinks)