


INTERSPEECH/EUROSPEECH 2005: Lisbon, Portugal
- 9th European Conference on Speech Communication and Technology, INTERSPEECH-Eurospeech 2005, Lisbon, Portugal, September 4-8, 2005. ISCA 2005
Keynote Papers
- Graeme M. Clark:
The multiple-channel cochlear implant: interfacing electronic technology to human consciousness. 1-4
Speech Recognition - Language Modelling I-III
- Yik-Cheung Tam, Tanja Schultz:
Dynamic language model adaptation using variational Bayes inference. 5-8 - Vidura Seneviratne, Steve J. Young:
The hidden vector state language model. 9-12 - Shinsuke Mori, Gakuto Kurata:
Class-based variable memory length Markov model. 13-16 - Alexander Gruenstein, Chao Wang, Stephanie Seneff:
Context-sensitive statistical language modeling. 17-20 - Chao Wang, Stephanie Seneff, Grace Chung:
Language model data filtering via user simulation and dialogue resynthesis. 21-24 - Jen-Tzung Chien, Meng-Sung Wu, Chia-Sheng Wu:
Bayesian learning for latent semantic analysis. 25-28
Prosody in Language Performance I, II
- Daniel Hirst, Caroline Bouzon:
The effect of stress and boundaries on segmental duration in a corpus of authentic speech (British English). 29-32 - Tomoko Ohsuga, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Investigation of the relationship between turn-taking and prosodic features in spontaneous dialogue. 33-36 - Michiko Watanabe, Keikichi Hirose, Yasuharu Den, Nobuaki Minematsu:
Filled pauses as cues to the complexity of following phrases. 37-40 - Katrin Schneider, Bernd Möbius:
Perceptual magnet effect in German boundary tones. 41-44 - Angela Grimm, Jochen Trommer:
Constraints on the acquisition of simplex and complex words in German. 45-48 - Julien Meyer:
Whistled speech: a natural phonetic description of languages adapted to human perception and to the acoustical environment. 49-52
Spoken Language Extraction / Retrieval I, II
- Olivier Siohan, Michiel Bacchiani:
Fast vocabulary-independent audio search using path-based graph indexing. 53-56 - John Makhoul, Alex Baron, Ivan Bulyko, Long Nguyen, Lance A. Ramshaw, David Stallard, Richard M. Schwartz, Bing Xiang:
The effects of speech recognition and punctuation on information extraction performance. 57-60 - Ciprian Chelba, Alex Acero:
Indexing uncertainty for spoken document search. 61-64 - Tomoyosi Akiba, Hiroyuki Abe:
Exploiting passage retrieval for n-best rescoring of spoken questions. 65-68 - BalaKrishna Kolluru, Heidi Christensen, Yoshihiko Gotoh:
Multi-stage compaction approach to broadcast news summarisation. 69-72 - Chien-Lin Huang, Chia-Hsin Hsieh, Chung-Hsien Wu:
Audio-video summarization of TV news using speech recognition and shot change detection. 73-76
The Blizzard Challenge 2005
- Alan W. Black, Keiichi Tokuda:
The blizzard challenge - 2005: evaluating corpus-based speech synthesis on common datasets. 77-80 - Shinsuke Sakai, Han Shu:
A probabilistic approach to unit selection for corpus-based speech synthesis. 81-84 - John Kominek, Christina L. Bennett, Brian Langner, Arthur R. Toth:
The blizzard challenge 2005 CMU entry - a method for improving speech synthesis systems. 85-88 - H. Timothy Bunnell, Christopher A. Pennington, Debra Yarrington, John Gray:
Automatic personal synthetic voice construction. 89-92 - Heiga Zen, Tomoki Toda:
An overview of the Nitech HMM-based speech synthesis system for blizzard challenge 2005. 93-96 - Wael Hamza, Raimo Bakis, Zhiwei Shuang, Heiga Zen:
On building a concatenative speech synthesis system from the blizzard challenge speech databases. 97-100 - Robert A. J. Clark, Korin Richmond, Simon King:
Multisyn voices from ARCTIC data for the blizzard challenge. 101-104 - Christina L. Bennett:
Large scale evaluation of corpus-based synthesizers: results and lessons from the blizzard challenge 2005. 105-108
New Applications
- Berlin Chen, Yi-Ting Chen, Chih-Hao Chang, Hung-Bin Chen:
Speech retrieval of Mandarin broadcast news via mobile devices. 109-112 - Michiaki Katoh, Kiyoshi Yamamoto, Jun Ogata, Takashi Yoshimura, Futoshi Asano, Hideki Asoh, Nobuhiko Kitawaki:
State estimation of meetings by information fusion using Bayesian network. 113-116 - Roger K. Moore:
Results from a survey of attendees at ASRU 1997 and 2003. 117-120 - Reinhold Haeb-Umbach, Basilis Kladis, Joerg Schmalenstroeer:
Speech processing in the networked home environment - a view on the amigo project. 121-124 - Masahide Sugiyama:
Fixed distortion segmentation in efficient sound segment searching. 125-128 - Tin Lay Nwe, Haizhou Li:
Identifying singers of popular songs. 129-132 - Jun Ogata, Masataka Goto:
Speech repair: quick error correction just by using selection operation for speech input interfaces. 133-136 - Dirk Olszewski, Fransiskus Prasetyo, Klaus Linhard:
Steerable highly directional audio beam loudspeaker. 137-140 - Hassan Ezzaidi, Jean Rouat:
Automatic music genre classification using second-order statistical measures for the prescriptive approach. 141-144 - Alberto Abad, Dusan Macho, Carlos Segura, Javier Hernando, Climent Nadeu:
Effect of head orientation on the speaker localization performance in smart-room environment. 145-148 - Corinne Fredouille, Gilles Pouchoulin, Jean-François Bonastre, M. Azzarello, Antoine Giovanni, Alain Ghio:
Application of automatic speaker recognition techniques to pathological voice assessment (dysphonia). 149-152 - Upendra V. Chaudhari, Ganesh N. Ramaswamy, Edward A. Epstein, Sasha Caskey, Mohamed Kamal Omar:
Adaptive speech analytics: system, infrastructure, and behavior. 153-156
E-learning and Spoken Language Processing
- Katherine Forbes-Riley, Diane J. Litman:
Correlating student acoustic-prosodic profiles with student learning in spoken tutoring dialogues. 157-160 - Diane J. Litman, Katherine Forbes-Riley:
Speech recognition performance and learning in spoken dialogue tutoring. 161-164 - Satoshi Asakawa, Nobuaki Minematsu, Toshiko Isei-Jaakkola, Keikichi Hirose:
Structural representation of the non-native pronunciations. 165-168 - Fu-Chiang Chou:
Ya-ya language box - a portable device for English pronunciation training with speech recognition technologies. 169-172 - Akinori Ito, Yen-Ling Lim, Motoyuki Suzuki, Shozo Makino:
Pronunciation error detection method based on error rule clustering using a decision tree. 173-176 - Abhinav Sethy, Shrikanth S. Narayanan, Nicolaus Mote, W. Lewis Johnson:
Modeling and automating detection of errors in Arabic language learner speech. 177-180 - Felicia Zhang, Michael Wagner:
Effects of F0 feedback on the learning of Chinese tones by native speakers of English. 181-184
E-inclusion and Spoken Language Processing I, II
- Tom Brøndsted, Erik Aaskoven:
Voice-controlled internet browsing for motor-handicapped users: design and implementation issues. 185-188 - Briony Williams, Delyth Prys, Ailbhe Ní Chasaide:
Creating an ongoing research capability in speech technology for two minority languages: experiences from the WISPR project. 189-192 - Anestis Vovos, Basilis Kladis, Nikolaos D. Fakotakis:
Speech operated smart-home control system for users with special needs. 193-196 - Takatoshi Jitsuhiro, Shigeki Matsuda, Yutaka Ashikari, Satoshi Nakamura, Ikuko Eguchi Yairi, Seiji Igi:
Spoken dialog system and its evaluation of geographic information system for elderly persons' mobility support. 197-200 - Daniele Falavigna, Toni Giorgino, Roberto Gretter:
A frame based spoken dialog system for home care. 201-204
Acoustic Processing for ASR I-III
- Matthias Wölfel:
Frame based model order selection of spectral envelopes. 205-208 - Vivek Tyagi, Christian Wellekens, Hervé Bourlard:
On variable-scale piecewise stationary spectral analysis of speech signals for ASR. 209-212 - Arlo Faria, David Gelbart:
Efficient pitch-based estimation of VTLN warp factors. 213-216 - Yanli Zheng, Richard Sproat, Liang Gu, Izhak Shafran, Haolang Zhou, Yi Su, Daniel Jurafsky, Rebecca Starr, Su-Youn Yoon:
Accent detection and speech recognition for Shanghai-accented Mandarin. 217-220 - Loïc Barrault, Renato de Mori, Roberto Gemello, Franco Mana, Driss Matrouf:
Variability of automatic speech recognition systems using different features. 221-224 - Slavomír Lihan, Jozef Juhár, Anton Cizmar:
Crosslingual and bilingual speech recognition with Slovak and Czech speechdat-e databases. 225-228 - Carmen Peláez-Moreno, Qifeng Zhu, Barry Y. Chen, Nelson Morgan:
Automatic data selection for MLP-based feature extraction for ASR. 229-232 - Thilo Köhler, Christian Fügen, Sebastian Stüker, Alex Waibel:
Rapid porting of ASR-systems to mobile devices. 233-236 - Hugo Meinedo, João Paulo Neto:
A stream-based audio segmentation, classification and clustering pre-processing system for broadcast news using ANN models. 237-240 - Etienne Marcheret, Karthik Visweswariah, Gerasimos Potamianos:
Speech activity detection fusing acoustic phonetic and energy features. 241-244 - Zoltán Tüske, Péter Mihajlik, Zoltán Tobler, Tibor Fegyó:
Robust voice activity detection based on the entropy of noise-suppressed spectrum. 245-248 - Masamitsu Murase, Shun'ichi Yamamoto, Jean-Marc Valin, Kazuhiro Nakadai, Kentaro Yamada, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Multiple moving speaker tracking by microphone array on mobile robot. 249-252
Speech Recognition - Adaptation I, II
- Yaxin Zhang, Bian Wu, Xiaolin Ren, Xin He:
A speaker biased SI recognizer for embedded mobile applications. 253-256 - Bart Bakker, Carsten Meyer, Xavier L. Aubert:
Fast unsupervised speaker adaptation through a discriminative eigen-MLLR algorithm. 257-260 - Rusheng Hu, Jian Xue, Yunxin Zhao:
Incremental largest margin linear regression and MAP adaptation for speech separation in telemedicine applications. 261-264 - Giulia Garau, Steve Renals, Thomas Hain:
Applying vocal tract length normalization to meeting recordings. 265-268 - Srinivasan Umesh, András Zolnay, Hermann Ney:
Implementing frequency-warping and VTLN through linear transformation of conventional MFCC. 269-272 - Xiaodong Cui, Abeer Alwan:
MLLR-like speaker adaptation based on linearization of VTLN with MFCC features. 273-276 - Chandra Kant Raut, Takuya Nishimoto, Shigeki Sagayama:
Model adaptation by state splitting of HMM for long reverberation. 277-280 - Daben Liu, Daniel Kiecza, Amit Srivastava, Francis Kubala:
Online speaker adaptation and tracking for real-time speech recognition. 281-284 - Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Automatic speech recognition based on adaptation and clustering using temporal-difference learning. 285-288 - Hui Ye, Steve J. Young:
Improving the speech recognition performance of beginners in spoken conversational interaction for language learning. 289-292 - Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation based on multi-template HMM sufficient statistics in noisy environments. 293-296 - Dong-jin Choi, Yung-Hwan Oh:
Rapid speaker adaptation for continuous speech recognition using merging eigenvoices. 297-300
Signal Analysis, Processing and Feature Estimation I-III
- Jian Liu, Thomas Fang Zheng, Jing Deng, Wenhu Wu:
Real-time pitch tracking based on combined SMDSF. 301-304 - András Bánhalmi, Kornél Kovács, András Kocsor, László Tóth:
Fundamental frequency estimation by least-squares harmonic model fitting. 305-308 - Siu Wa Lee, Frank K. Soong, Pak-Chung Ching:
Harmonic filtering for joint estimation of pitch and voiced source with single-microphone input. 309-312 - Marián Képesi, Luis Weruaga:
High-resolution noise-robust spectral-based pitch estimation. 313-316 - John-Paul Hosom:
F0 estimation for adult and children's speech. 317-320 - Ben Milner, Xu Shao, Jonathan Darch:
Fundamental frequency and voicing prediction from MFCCs for speech reconstruction from unconstrained speech. 321-324 - Nelly Barbot, Olivier Boëffard, Damien Lolive:
F0 stylisation with a free-knot b-spline model and simulated-annealing optimization. 325-328 - Friedhelm R. Drepper:
Voiced excitation as entrained primary response of a reconstructed glottal master oscillator. 329-332 - Damien Vincent, Olivier Rosec, Thierry Chonavel:
Estimation of LF glottal source parameters based on an ARX model. 333-336 - Leigh D. Alsteris, Kuldip K. Paliwal:
Some experiments on iterative reconstruction of speech from STFT phase and magnitude spectra. 337-340 - R. Muralishankar, Abhijeet Sangwan, Douglas D. O'Shaughnessy:
Statistical properties of the warped discrete cosine transform cepstrum compared with MFCC. 341-344 - Aníbal J. S. Ferreira:
New signal features for robust identification of isolated vowels. 345-348 - Jonathan Pincas, Philip J. B. Jackson:
Amplitude modulation of frication noise by voicing saturates. 349-352 - Ron M. Hecht, Naftali Tishby:
Extraction of relevant speech features using the information bottleneck method. 353-356 - Mohammad Firouzmand, Laurent Girin, Sylvain Marchand:
Comparing several models for perceptual long-term modeling of amplitude and phase trajectories of sinusoidal speech. 357-360 - Hynek Hermansky, Petr Fousek:
Multi-resolution RASTA filtering for TANDEM-based ASR. 361-364 - Woojay Jeon, Biing-Hwang Juang:
A category-dependent feature selection method for speech signals. 365-368 - Trausti T. Kristjansson, Sabine Deligne, Peder A. Olsen:
Voicing features for robust speech detection. 369-372
Robust Speech Recognition I-IV
- Svein Gunnar Pettersen, Magne Hallstein Johnsen, Tor André Myrvoll:
Joint Bayesian predictive classification and parallel model combination for robust speech recognition. 373-376 - Glauco F. G. Yared, Fábio Violaro, Lívio C. Sousa:
Gaussian elimination algorithm for HMM complexity reduction in continuous speech recognition systems. 377-380 - Luis Buera, Eduardo Lleida, Antonio Miguel, Alfonso Ortega:
Robust speech recognition in cars using phoneme dependent multi-environment linear normalization. 381-384 - Yi Chen, Lin-Shan Lee:
Energy-based frame selection for reliable feature normalization and transformation in robust speech recognition. 385-388 - Yoshitaka Nakajima, Hideki Kashioka, Kiyohiro Shikano, Nick Campbell:
Remodeling of the sensor for non-audible murmur (NAM). 389-392 - Amarnag Subramanya, Jeff A. Bilmes, Chia-Ping Chen:
Focused word segmentation for ASR. 393-396
Speech Perception I, II
- Jennifer A. Alexander, Patrick C. M. Wong, Ann R. Bradlow:
Lexical tone perception in musicians and non-musicians. 397-400 - Joan K.-Y. Ma, Valter Ciocca, Tara L. Whitehill:
Contextual effect on perception of lexical tones in Cantonese. 401-404 - Hansjörg Mixdorff, Yu Hu, Denis Burnham:
Visual cues in Mandarin tone perception. 405-408 - Hansjörg Mixdorff, Yu Hu:
Cross-language perception of word stress. 409-412 - Anne Cutler:
The lexical statistics of word recognition problems caused by L2 phonetic confusion. 413-416 - Chun-Fang Huang, Masato Akagi:
A multi-layer fuzzy logical model for emotional speech perception. 417-420
Spoken Language Understanding I, II
- Ian R. Lane, Tatsuya Kawahara:
Utterance verification incorporating in-domain confidence and discourse coherence measures. 421-424 - Constantinos Boulis, Mari Ostendorf:
Using symbolic prominence to help design feature subsets for topic classification and clustering of natural human-human conversations. 425-428 - Katsuhito Sudoh, Hajime Tsukada:
Tightly integrated spoken language understanding using word-to-concept translation. 429-432 - Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Vaibhava Goel, Yuqing Gao:
Exploiting unlabeled data using multiple classifiers for improved natural language call-routing. 433-436 - Hong-Kwang Jeff Kuo, Vaibhava Goel:
Active learning with minimum expected error for spoken language understanding. 437-440 - Matthias Thomae, Tibor Fábián, Robert Lieb, Günther Ruske:
Lexical out-of-vocabulary models for one-stage speech interpretation. 441-444
E-inclusion and Spoken Language Processing I, II
- Mark S. Hawley, Phil D. Green, Pam Enderby, Stuart P. Cunningham, Roger K. Moore:
Speech technology for e-inclusion of people with physical disabilities and disordered speech. 445-448 - Björn Granström:
Speech technology for language training and e-inclusion. 449-452 - Roger C. F. Tucker, Ksenia Shalonova:
Supporting the creation of TTS for local language voice information systems. 453-456 - Ove Andersen, Christian Hjulmand:
Access for all - a talking internet service. 457-460 - Knut Kvale, Narada D. Warakagoda:
A speech centric mobile multimodal service useful for dyslectics and aphasics. 461-464
Paralinguistic and Nonlinguistic Information in Speech
- Nick Campbell, Hideki Kashioka, Ryo Ohara:
No laughing matter. 465-468 - Christophe Blouin, Valérie Maffiolo:
A study on the automatic detection and characterization of emotion in a voice service context. 469-472 - Raul Fernandez, Rosalind W. Picard:
Classical and novel discriminant features for affect recognition from speech. 473-476 - Jaroslaw Cichosz, Krzysztof Slot:
Low-dimensional feature space derivation for emotion recognition. 477-480 - Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita:
Proposal of acoustic measures for automatic detection of vocal fry. 481-484 - Khiet P. Truong, David A. van Leeuwen:
Automatic detection of laughter. 485-488 - Anton Batliner, Stefan Steidl, Christian Hacker, Elmar Nöth, Heinrich Niemann:
Tales of tuning - prototyping for automatic classification of emotional user states. 489-492 - Iker Luengo, Eva Navas, Inmaculada Hernáez, Jon Sánchez:
Automatic emotion recognition using prosodic parameters. 493-496 - Sungbok Lee, Serdar Yildirim, Abe Kazemzadeh, Shrikanth S. Narayanan:
An articulatory study of emotional speech production. 497-500 - Gregor Hofer, Korin Richmond, Robert A. J. Clark:
Informed blending of databases for emotional speech synthesis. 501-504 - Fabio Tesser, Piero Cosi, Carlo Drioli, Graziano Tisato:
Emotional FESTIVAL-MBROLA TTS synthesis. 505-508 - Felix Burkhardt:
Emofilt: the simulation of emotional speech by prosody-transformation. 509-512 - Andrew Rosenberg, Julia Hirschberg:
Acoustic/prosodic and lexical correlates of charismatic speech. 513-516 - Yoko Greenberg, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka:
Communicative speech synthesis using constituent word attributes. 517-520 - Angelika Braun, Matthias Katerbow:
Emotions in dubbed speech: an intercultural approach with respect to F0. 521-524 - Nicolas Audibert, Véronique Aubergé, Albert Rilliard:
The prosodic dimensions of emotion in speech: the relative weights of parameters. 525-528 - Susanne Schötz:
Stimulus duration and type in perception of female and male speaker age. 529-532 - Cecilia Ovesdotter Alm, Richard Sproat:
Perceptions of emotions in expressive storytelling. 533-536 - Hideki Kawahara, Alain de Cheveigné, Hideki Banno, Toru Takahashi, Toshio Irino:
Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT. 537-540 - Tomoko Yonezawa, Noriko Suzuki, Kenji Mase, Kiyoshi Kogure:
Gradually changing expression of singing voice based on morphing. 541-544
Issues in Large Vocabulary Decoding
- I. Lee Hetherington:
A multi-pass, dynamic-vocabulary approach to real-time, large-vocabulary speech recognition. 545-548 - George Saon, Daniel Povey, Geoffrey Zweig:
Anatomy of an extremely fast LVCSR decoder. 549-552 - Dong Yu, Li Deng, Alex Acero:
Evaluation of a long-contextual-span hidden trajectory model and phonetic recognizer using A* lattice search. 553-556 - Takaaki Hori, Atsushi Nakamura:
Generalized fast on-the-fly composition algorithm for WFST-based speech recognition. 557-560 - Hiroaki Nanjo, Teruhisa Misu, Tatsuya Kawahara:
Minimum Bayes-risk decoding considering word significance for information retrieval system. 561-564 - Arthur Chan, Mosur Ravishankar, Alexander I. Rudnicky:
On improvements to CI-based GMM selection. 565-568 - Dominique Massonié, Pascal Nocera, Georges Linarès:
Scalable language model look-ahead for LVCSR. 569-572 - Miroslav Novak:
Memory efficient approximative lattice generation for grammar based decoding. 573-576 - Dong-Hoon Ahn, Su-Byeong Oh, Minhwa Chung:
Improved semi-dynamic network decoding using WFSTs. 577-580 - Janne Pylkkönen:
New pruning criteria for efficient decoding. 581-584 - Tibor Fábián, Robert Lieb, Günther Ruske, Matthias Thomae:
A confidence-guided dynamic pruning approach - utilization of confidence measurement in speech recognition. 585-588
Spoken Language Extraction / Retrieval I, II
- Toru Taniguchi, Akishige Adachi, Shigeki Okawa, Masaaki Honda, Katsuhiko Shirai:
Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals. 589-592 - Gabriel Murray, Steve Renals, Jean Carletta:
Extractive summarization of meeting recordings. 593-596 - Arjan van Hessen, Jaap Hinke:
IR-based classification of customer-agent phone calls. 597-600 - Benoît Favre, Frédéric Béchet, Pascal Nocera:
Mining broadcast news data: robust information extraction from word lattices. 601-604 - Mikko Kurimo, Ville T. Turunen:
To recover from speech recognition errors in spoken document retrieval. 605-608 - Edgar González, Jordi Turmo:
Unsupervised clustering of spontaneous speech documents. 609-612 - Masahide Yamaguchi, Masaru Yamashita, Shoichi Matsunaga:
Spectral cross-correlation features for audio indexing of broadcast news and meetings. 613-616 - Chiori Hori, Alex Waibel:
Spontaneous speech consolidation for spoken language applications. 617-620 - Sameer Maskey, Julia Hirschberg:
Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization. 621-624 - Te-Hsuan Li, Ming-Han Lee, Berlin Chen, Lin-Shan Lee:
Hierarchical topic organization and visual presentation of spoken documents using probabilistic latent semantic analysis (PLSA) for efficient retrieval/browsing applications. 625-628 - Janez Zibert, France Mihelic, Jean-Pierre Martens, Hugo Meinedo, João Paulo Neto, Laura Docío Fernández, Carmen García-Mateo, Petr David, Jindrich Zdánský, Matús Pleva, Anton Cizmar, Andrej Zgank, Zdravko Kacic, Csaba Teleki, Klára Vicsi:
The COST278 broadcast news segmentation and speaker clustering evaluation - overview, methodology, systems, results. 629-632 - Igor Szöke, Petr Schwarz, Pavel Matejka, Lukás Burget, Martin Karafiát, Michal Fapso, Jan Cernocký:
Comparison of keyword spotting approaches for informal continuous speech. 633-636 - Teruhisa Misu, Tatsuya Kawahara:
Dialogue strategy to clarify user's queries for document retrieval system with speech interface. 637-640 - Nicolas Moreau, Shan Jin, Thomas Sikora:
Comparison of different phone-based spoken document retrieval methods with text and spoken queries. 641-644
Signal Analysis, Processing and Feature Estimation I-III
- Pedro Gómez, Francisco Díaz Pérez, Agustín Álvarez Marquina, Rafael Martínez, Victoria Rodellar, Roberto Fernández-Baíllo, Alberto Nieto, Francisco J. Fernandez:
PCA of perturbation parameters in voice pathology detection. 645-648 - Anindya Sarkar, T. V. Sreenivas:
Dynamic programming based segmentation approach to LSF matrix reconstruction. 649-652 - T. Nagarajan, Douglas D. O'Shaughnessy:
Explicit segmentation of speech based on frequency-domain AR modeling. 653-656 - Petr Motlícek, Lukás Burget, Jan Cernocký:
Non-parametric speaker turn segmentation of meeting data. 657-660 - Petri Korhonen, Unto K. Laine:
Unsupervised segmentation of continuous speech using vector autoregressive time-frequency modeling errors. 661-664 - P. Vijayalakshmi, M. Ramasubba Reddy:
The analysis on band-limited hypernasal speech using group delay based formant extraction technique. 665-668 - Jindrich Zdánský, Jan Nouza:
Detection of acoustic change-points in audio records via global BIC maximization and dynamic programming. 669-672 - Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu:
Multi-band approach of audio source discrimination with empirical mode decomposition. 673-676 - Minoru Tsuzaki, Satomi Tanaka, Hiroaki Kato, Yoshinori Sagisaka:
Application of auditory image model for speech event detection. 677-680 - José Anibal Arias:
Unsupervised identification of speech segments using kernel methods for clustering. 681-684 - Georgios Evangelopoulos, Petros Maragos:
Speech event detection using multiband modulation energy. 685-688 - John Kominek, Alan W. Black:
Measuring unsupervised acoustic clustering through phoneme pair merge-and-split tests. 689-692 - Fabio Valente, Christian Wellekens:
Variational Bayesian speaker change detection. 693-696 - Sarah Borys, Mark Hasegawa-Johnson:
Distinctive feature based SVM discriminant features for improvements to phone recognition on telephone band speech. 697-700 - P. Vijayalakshmi, M. Ramasubba Reddy:
Detection of hypernasality using statistical pattern classifiers. 701-704 - Luis Weruaga, Marián Képesi:
Self-organizing chirp-sensitive artificial auditory cortical model. 705-708 - Sotiris Karabetsos, Pirros Tsiakoulis, Stavroula-Evita Fotinea, Ioannis Dologlou:
On the use of a decimative spectral estimation method based on eigenanalysis and SVD for formant and bandwidth tracking of speech signals. 709-712 - Alexei V. Ivanov, Marek Parfieniuk, Alexander A. Petrovsky:
Frequency-domain auditory suppression modelling (FASM) - a WDFT-based anthropomorphic noise-robust feature extraction algorithm for speech recognition. 713-716
Keynote Papers
- Fernando C. N. Pereira:
Linear models for structure prediction. 717-720
Speech Recognition - Language Modelling I-III
- Chuang-Hua Chueh, To-Chang Chien, Jen-Tzung Chien:
Discriminative maximum entropy language model for speech recognition. 721-724 - Maximilian Bisani, Hermann Ney:
Open vocabulary speech recognition with flat hybrid models. 725-728 - Minwoo Jeong, Jihyun Eun, Sangkeun Jung, Gary Geunbae Lee:
An error-corrective language-model adaptation for automatic speech recognition. 729-732 - Shiuan-Sung Lin, François Yvon:
Discriminative training of finite state decoding graphs. 733-736 - Holger Schwenk, Jean-Luc Gauvain:
Building continuous space language models for transcribing European languages. 737-740 - Peng Xu, Lidia Mangu:
Using random forest language models in the IBM RT-04 CTS system. 741-744
Spoken Language Acquisition, Development and Learning I, II
- Willemijn Heeren:
Perceptual development of the duration cue in Dutch /a-a:/. 745-748 - Hong You, Abeer Alwan, Abe Kazemzadeh, Shrikanth S. Narayanan:
Pronunciation variations of Spanish-accented English spoken by young children. 749-752 - Willemijn Heeren:
L2 development of quantity perception: Dutch listeners learning Finnish /t-t:/. 753-756 - Claudio Zmarich, Serena Bonifacio:
Phonetic inventories in Italian children aged 18-27 months: a longitudinal study. 757-760 - Hiroko Hirano, Goh Kawai:
Pitch patterns of intonational phrases and intonational phrase groups in native and non-native speech. 761-764 - Rebecca Hincks:
Measuring liveliness in presentation speech. 765-768
Multi-modal / Multi-media Processing I, II
- Nick Campbell:
Non-verbal speech processing for a communicative agent. 769-772 - Stuart N. Wrigley
, Guy J. Brown:
Physiologically motivated audio-visual localisation and tracking. 773-776 - Jing Huang, Daniel Povey:
Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition. 777-780 - Graziano Tisato, Piero Cosi, Carlo Drioli, Fabio Tesser:
INTERFACE: a new tool for building emotive/expressive talking heads. 781-784 - Pascual Ejarque, Javier Hernando:
Variance reduction by using separate genuine-impostor statistics in multimodal biometrics. 785-788 - Volker Schubert, Stefan W. Hamerich:
The dialog application metalanguage GDialogXML. 789-792 - Jonas Beskow, Mikael Nordenberg:
Data-driven synthesis of expressive visual speech using an MPEG-4 talking head. 793-796 - Oytun Türk, Marc Schröder, Baris Bozkurt, Levent M. Arslan:
Voice quality interpolation for emotional text-to-speech synthesis. 797-800 - Murtaza Bulut, Carlos Busso, Serdar Yildirim, Abe Kazemzadeh, Chul Min Lee, Sungbok Lee, Shrikanth S. Narayanan:
Investigating the role of phoneme-level modifications in emotional speech resynthesis. 801-804 - Björn W. Schuller, Ronald Müller, Manfred K. Lang, Gerhard Rigoll:
Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles. 805-808 - Jonghwa Kim, Elisabeth André, Matthias Rehm, Thurid Vogt, Johannes Wagner:
Integrating information from speech and physiological signals to achieve emotional sensitivity. 809-812 - Ellen Douglas-Cowie, Laurence Devillers, Jean-Claude Martin, Roddy Cowie, Suzie Savvidou, Sarkis Abrilian, Cate Cox:
Multimodal databases of everyday emotion: facing up to complexity. 813-816
Spoken / Multi-modal Dialogue Systems I, II
- Francisco Torres, Emilio Sanchis, Encarna Segarra:
Learning of stochastic dialog models through a dialog simulation technique. 817-820 - Lesley-Ann Black, Michael F. McTear, Norman D. Black, Roy Harper, Michelle Lemon:
Evaluating the DI@l-log system on a cohort of elderly, diabetic patients: results from a preliminary study. 821-824 - Pavel Král, Christophe Cerisara, Jana Klecková:
Combination of classifiers for automatic recognition of dialog acts. 825-828 - Xiaojun Wu, Thomas Fang Zheng, Michael Brasser, Zhanjiang Song:
Rapidly developing spoken Chinese dialogue systems with the d-ear SDS SDK. 829-832 - Daniela Oria, Akos Vetek:
Robust algorithms and interaction strategies for voice spelling. 833-836 - Ioannis Toptsis, Axel Haasch, Sonja Hüwel, Jannik Fritsch, Gernot A. Fink:
Modality integration and dialog management for a robotic assistant. 837-840 - Norbert Reithinger, Daniel Sonntag:
An integration framework for a mobile multimodal dialogue system accessing the semantic web. 841-844 - Ryuichi Nisimura, Akinobu Lee, Masashi Yamada, Kiyohiro Shikano:
Operating a public spoken guidance system in real environment. 845-848 - Esa-Pekka Salonen, Markku Turunen, Jaakko Hakulinen, Leena Helin, Perttu Prusi, Anssi Kainulainen:
Distributed dialogue management for smart terminal devices. 849-852 - Jaakko Hakulinen, Markku Turunen, Esa-Pekka Salonen:
Visualization of spoken dialogue systems for demonstration, debugging and tutoring. 853-856 - César González Ferreras, Valentín Cardeñoso-Payo:
Development and evaluation of a spoken dialog system to access a newspaper web site. 857-860 - Olivier Pietquin, Richard Beaufort:
Comparing ASR modeling methods for spoken dialogue simulation and optimal strategy learning. 861-864 - Shiu-Wah Chu, Ian M. O'Neill, Philip Hanna, Michael F. McTear:
An approach to multi-strategy dialogue management. 865-868 - Anna Hjalmarsson:
Towards user modelling in conversational dialogue systems: a qualitative study of the dynamics of dialogue parameters. 869-872 - Kouichi Katsurada, Kazumine Aoki, Hirobumi Yamada, Tsuneo Nitta:
Reducing the description amount in authoring MMI applications. 873-876 - Kazunori Komatani, Naoyuki Kanda, Tetsuya Ogata, Hiroshi G. Okuno:
Contextual constraints based on dialogue models in database search task for spoken dialogue systems. 877-880 - Mihai Rotaru, Diane J. Litman:
Using word-level pitch features to better predict student emotions during spoken tutoring dialogues. 881-884 - Antoine Raux, Brian Langner, Dan Bohus, Alan W. Black, Maxine Eskénazi:
Let's go public! taking a spoken dialog system to the real world. 885-888 - Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi:
Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system. 889-892 - Kallirroi Georgila, James Henderson, Oliver Lemon:
Learning user simulations for information state update dialogue systems. 893-896 - Darío Martín-Iglesias, Yago Pereiro-Estevan, Ana I. García-Moral, Ascensión Gallardo-Antolín, Fernando Díaz-de-María:
Design of a voice-enabled interface for real-time access to stock exchange from a PDA through GPRS. 897-900 - William Schuler, Timothy A. Miller:
Integrating denotational meaning into a DBN language model. 901-904 - Louis ten Bosch:
Improving out-of-coverage language modelling in a multimodal dialogue system using small training sets. 905-908 - Olivier Galibert, Gabriel Illouz, Sophie Rosset:
Ritel: an open-domain, human-computer dialog system. 909-912
Robust Speech Recognition I-IV
- Reinhold Haeb-Umbach, Joerg Schmalenstroeer:
A comparison of particle filtering variants for speech feature enhancement. 913-916 - Ilyas Potamitis, Nikolaos D. Fakotakis:
Enhancement of mel log-power spectrum of speech using particle filtering. 917-920 - Makoto Shozakai, Goshu Nagino:
Improving robustness of speech recognition performance to aggregate of noises by two-dimensional visualization. 921-924 - Woohyung Lim, Bong Kyoung Kim, Nam Soo Kim:
Feature compensation based on switching linear dynamic model and soft decision. 925-928 - Shilei Huang, Xiang Xie, Jingming Kuang:
Using output probability distribution for improving speech recognition in adverse environment. 929-932 - Eric H. C. Choi:
A generalized framework for compensation of mel-filterbank outputs in feature extraction for robust ASR. 933-936 - Hesham Tolba, Zili Li, Douglas D. O'Shaughnessy:
Robust automatic speech recognition using a perceptually-based optimal spectral amplitude estimator speech enhancement algorithm in various low-SNR environments. 937-940 - Stephen So, Kuldip K. Paliwal:
Improved noise-robustness in distributed speech recognition via perceptually-weighted vector quantisation of filterbank energies. 941-944 - Babak Nasersharif, Ahmad Akbari:
Sub-band weighted projection measure for robust sub-band speech recognition. 945-948 - Jianping Deng, Martin Bouchard, Tet Hin Yeap:
Noise compensation using interacting multiple kalman filters. 949-952 - Veronique Stouten, Hugo Van hamme, Patrick Wambacq:
Kalman and unscented kalman filter feature enhancement for noise robust ASR. 953-956 - Chia-Yu Wan, Lin-Shan Lee:
Histogram-based quantization (HQ) for robust and scalable distributed speech recognition. 957-960 - Yong-Joo Chung:
A data-driven approach for the model parameter compensation in noisy speech recognition. 961-964 - Satoshi Kobashikawa, Satoshi Takahashi, Yoshikazu Yamaguchi, Atsunori Ogawa:
Rapid response and robust speech recognition by preliminary model adaptation for additive and convolutional noise. 965-968 - Saurabh Prasad, Stephen A. Zahorian:
Nonlinear and linear transformations of speech features to compensate for channel and noise effects. 969-972 - Motoyuki Suzuki, Yusuke Kato, Akinori Ito, Shozo Makino:
Construction method of acoustic models dealing with various background noises based on combination of HMMs. 973-976 - Haitian Xu, Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:
Robust speech recognition based on noise and SNR classification - a multiple-model framework. 977-980 - Hwa Jeon Song, Hyung Soon Kim:
Eigen-environment based noise compensation method for robust speech recognition. 981-984 - Martin Graciarena, Horacio Franco, Gregory K. Myers, Victor Abrash:
Robust feature compensation in nonstationary and multiple noise environments. 985-988 - Jasha Droppo, Alex Acero:
Maximum mutual information SPLICE transform for seen and unseen conditions. 989-992 - Sven E. Krüger, Martin Schafföner, Marcel Katz, Edin Andelic, Andreas Wendemuth:
Speech recognition with support vector machines in a hybrid system. 993-996 - Vincent Barreaud, Douglas D. O'Shaughnessy, Jean-Guy Dahan:
Experiments on speaker profile portability. 997-1000 - Daniele Colibro, Luciano Fissore, Claudio Vair, Emanuele Dalmasso, Pietro Laface:
A confidence measure invariant to language and grammar. 1001-1004 - Ken Schutte, James R. Glass:
Robust detection of sonorant landmarks. 1005-1008
Speech Production I
- Amélie Rochet-Capellan, Jean-Luc Schwartz:
The labial-coronal effect and CVCV stability during reiterant speech production: an acoustic analysis. 1009-1012 - Amélie Rochet-Capellan, Jean-Luc Schwartz:
The labial-coronal effect and CVCV stability during reiterant speech production: an articulatory analysis. 1013-1016 - Mitsuhiro Nakamura:
Articulatory constraints and coronal stops: an EPG study. 1017-1020 - Vincent Robert, Brigitte Wrobel-Dautcourt, Yves Laprie, Anne Bonneau:
Strategies of labial coarticulation. 1021-1024 - Jianwu Dang, Jianguo Wei, Takeharu Suzuki, Pascal Perrier:
Investigation and modeling of coarticulation during speech. 1025-1028 - Fang Hu:
Tongue kinematics in diphthong production in Ningbo Chinese. 1029-1032 - Takayuki Arai:
Comparing tongue positions of vowels in oral and nasal contexts. 1033-1036 - Slim Ouni:
Can we retrieve vocal tract dynamics that produced speech? toward a speaker articulatory strategy model. 1037-1040 - Pascal Perrier, Liang Ma, Yohan Payan:
Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue. 1041-1044 - Xiaochuan Niu, Alexander Kain, Jan P. H. van Santen:
Estimation of the acoustic properties of the nasal tract during the production of nasalized vowels. 1045-1048 - Kohichi Ogata:
A web-based articulatory speech synthesis system for distance education. 1049-1052 - Paavo Alku, Matti Airas, Tom Bäckström, Hannu Pulakka:
Group delay function as a means to assess quality of glottal inverse filtering. 1053-1056 - Eva Björkner, Johan Sundberg, Paavo Alku:
Subglottal pressure and NAQ variation in voice production of classically trained baritone singers. 1057-1060 - Gunnar Fant, Anita Kruckenberg:
Covariation of subglottal pressure, F0 and intensity. 1061-1064 - Javier Pérez, Antonio Bonafonte:
Automatic voice-source parameterization of natural speech. 1065-1068 - Chakir Zeroual, John H. Esling, Lise Crevier-Buchman:
Physiological study of whispered speech in Moroccan Arabic. 1069-1072 - Carla P. Moura, D. Andrade, Luis M. Cunha, Maria J. Cunha, Helena Vilarinho, Henrique Barros, Diamantino Freitas, M. Pais-Clemente:
Voice quality in down syndrome children treated with rapid maxillary expansion. 1073-1076 - Julien Hanquinet, Francis Grenez, Jean Schoentgen:
Synthesis of disordered speech. 1077-1080 - Julie Fontecave, Frédéric Berthommier:
Quasi-automatic extraction of tongue movement from a large existing speech cineradiographic database. 1081-1084 - Shimon Sapir, Ravit Cohen Mimran:
The working memory token test (WMTT): preliminary findings in young adults with and without dyslexia. 1085-1088 - Sérgio Paulo, Luís C. Oliveira:
Reducing the corpus-based TTS signal degradation due to speaker's word pronunciations. 1089-1092 - Wai-Sum Lee:
A phonetic study of the "er-hua" rimes in Beijing Mandarin. 1093-1096
Acoustic Processing for ASR I-III
- Li Deng, Dong Yu, Alex Acero:
Learning statistically characterized resonance targets in a hidden trajectory model of speech coarticulation and reduction. 1097-1100 - Daniil Kocharov, András Zolnay, Ralf Schlüter, Hermann Ney:
Articulatory motivated acoustic features for speech recognition. 1101-1104 - Shinji Watanabe, Atsushi Nakamura:
Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition. 1105-1108 - Yu Tsao, Jinyu Li, Chin-Hui Lee:
A study on separation between acoustic models and its applications. 1109-1112 - Mohamed Afify:
Extended baum-welch reestimation of Gaussian mixture models based on reverse Jensen inequality. 1113-1116 - Asela Gunawardana, Milind Mahajan, Alex Acero, John C. Platt:
Hidden conditional random fields for phone classification. 1117-1120
Signal Analysis, Processing and Feature Estimation I-III
- Francesco Gianfelici, Giorgio Biagetti, Paolo Crippa, Claudio Turchetti:
Asymptotically exact AM-FM decomposition based on iterated hilbert transform. 1121-1124 - Athanassios Katsamanis, Petros Maragos:
Advances in statistical estimation and tracking of AM-FM speech components. 1125-1128 - Jonathan Darch, Ben P. Milner, Saeed Vaseghi:
Formant frequency prediction from MFCC vectors in noisy environments. 1129-1132 - S. R. Mahadeva Prasanna, B. Yegnanarayana:
Detection of vowel onset point events using excitation information. 1133-1136 - João P. Cabral, Luís C. Oliveira:
Pitch-synchronous time-scaling for prosodic and voice quality transformations. 1137-1140 - Yasunori Ohishi, Masataka Goto, Katunobu Itou, Kazuya Takeda:
Discrimination between singing and speaking voices. 1141-1144
Spoken Language Resources and Technology Evaluation I, II
- Douglas A. Jones, Wade Shen, Elizabeth Shriberg, Andreas Stolcke, Teresa M. Kamm, Douglas A. Reynolds:
Two experiments comparing reading with listening for human processing of conversational telephone speech. 1145-1148 - Sylvain Galliano, Edouard Geoffrois, Djamel Mostefa, Khalid Choukri, Jean-François Bonastre, Guillaume Gravier:
The ESTER phase II evaluation campaign for the rich transcription of French broadcast news. 1149-1152 - Takashi Saito:
A method of multi-layered speech segmentation tailored for speech synthesis. 1153-1156 - Sérgio Paulo, Luís C. Oliveira:
Generation of word alternative pronunciations using weighted finite state transducers. 1157-1160 - Helmer Strik, Diana Binnenpoorte, Catia Cucchiarini:
Multiword expressions in spontaneous speech: do we really speak like that? 1161-1164 - Jáchym Kolár, Jan Svec, Stephanie M. Strassel, Christopher Walker, Dagmar Kozlíková, Josef Psutka:
Czech spontaneous speech corpus with structural metadata. 1165-1168
Early Language Acquisition
- Kentaro Ishizuka, Ryoko Mugitani, Hiroko Kato Solvang, Shigeaki Amano:
A longitudinal analysis of the spectral peaks of vowels for a Japanese infant. 1169-1172 - Krisztina Zajdó, Jeannette M. van der Stelt, Ton G. Wempe, Louis C. W. Pols:
Cross-linguistic comparison of two-year-old children's acoustic vowel spaces: contrasting Hungarian with dutch. 1173-1176 - Britta Lintfert, Katrin Schneider:
Acoustic correlates of contrastive stress in German children. 1177-1180 - Giampiero Salvi:
Ecological language acquisition via incremental model-based clustering. 1181-1184 - Tamami Sudo, Ken Mogi:
Perceptual and linguistic category formation in infants. 1185-1188
Multi-modal / Multi-media Processing I, II
- Raghunandan S. Kumaran, Karthik Narayanan, John N. Gowdy:
Myoelectric signals for multimodal speech recognition. 1189-1192 - Philippe Daubias:
Is color information really useful for lip-reading? (or what is lost when color is not used). 1193-1196 - Islam Shdaifat, Rolf-Rainer Grigat:
A system for audio-visual speech recognition. 1197-1200 - Norihide Kitaoka, Hironori Oshikawa, Seiichi Nakagawa:
Multimodal interface for organization name input based on combination of isolated word recognition and continuous base-word recognition. 1201-1204 - Yosuke Matsusaka:
Recognition of 3 party conversation using prosody and gaze. 1205-1208 - Dongdong Li, Yingchun Yang, Zhaohui Wu:
Combining voiceprint and face biometrics for speaker identification using SDWS. 1209-1212 - Neil Cooke, Martin J. Russell:
Using the focus of visual attention to improve spontaneous speech recognition. 1213-1216 - Sabri Gurbuz:
Real-time outer lip contour tracking for HCI applications. 1217-1220 - Jing Huang, Karthik Visweswariah:
Improving lip-reading with feature space transforms for multi-stream audio-visual speech recognition. 1221-1224 - Hansjörg Mixdorff, Denis Burnham, Guillaume Vignali, Patavee Charnvivit:
Are there facial correlates of Thai syllabic tones? 1225-1228 - Rowan Seymour, Ji Ming, Darryl Stewart:
A new posterior based audio-visual integration method for robust speech recognition. 1229-1232
Bridging the Gap ASR-HSR
- Sorin Dusan, Lawrence R. Rabiner:
On integrating insights from human speech perception into automatic speech recognition. 1233-1236 - Odette Scharenborg:
Parallels between HSR and ASR: how ASR can contribute to HSR. 1237-1240 - Louis ten Bosch, Odette Scharenborg:
ASR decoding in a computational model of human word recognition. 1241-1244 - Viktoria Maier, Roger K. Moore:
An investigation into a simulation of episodic memory for automatic speech recognition. 1245-1248 - Eric Fosler-Lussier, C. Anton Rytting, Soundararajan Srinivasan:
Phonetic ignorance is bliss: investigating the effects of phonetic information reduction on ASR performance. 1249-1252 - Marcus Holmberg, David Gelbart, Ulrich Ramacher, Werner Hemmert:
Automatic speech recognition with neural spike trains. 1253-1256 - Michael J. Carey, Tuan P. Quang:
A speech similarity distance weighting for robust recognition. 1257-1260 - Takao Murakami, Kazutaka Maruyama, Nobuaki Minematsu, Keikichi Hirose:
Japanese vowel recognition based on structural representation of speech. 1261-1264 - Soundararajan Srinivasan, DeLiang Wang:
Modeling the perception of multitalker speech. 1265-1268 - Sue Harding, Jon P. Barker, Guy J. Brown:
Binaural feature selection for missing data speech recognition. 1269-1272 - Thorsten Wesker, Bernd T. Meyer, Kirsten Wagener, Jörn Anemüller, Alfred Mertins, Birger Kollmeier:
Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines. 1273-1276
Speech Recognition - Language Modelling I-III
- Jen-Wei Kuo, Berlin Chen:
Minimum word error based discriminative training of language models. 1277-1280 - A. Ghaoui, François Yvon, Chafic Mokbel, Gérard Chollet:
On the use of morphological constraints in n-gram statistical language model. 1281-1284 - Elvira I. Sicilia-Garcia, Ji Ming, Francis Jack Smith:
A posteriori multiple word-domain language model. 1285-1288 - Javier Dieguez-Tirado, Carmen García-Mateo, Antonio Cardenal López:
Effective topic-tree based language model adaptation. 1289-1292 - Abhinav Sethy, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Building topic specific language models from webdata using competitive models. 1293-1296 - Carlos Troncoso, Tatsuya Kawahara:
Trigger-based language model adaptation for automatic meeting transcription. 1297-1300 - Jacques Duchateau, Dong Hoon Van Uytsel, Hugo Van hamme, Patrick Wambacq:
Statistical language models for large vocabulary spontaneous speech recognition in dutch. 1301-1304 - Alexandre Allauzen, Jean-Luc Gauvain:
Diachronic vocabulary adaptation for broadcast news transcription. 1305-1308 - Vesa Siivola, Bryan L. Pellom:
Growing an n-gram language model. 1309-1312 - Harald Hüning, Manuel Kirschner, Fritz Class, André Berton, Udo Haiber:
Embedding grammars into statistical language models. 1313-1316 - Simo Broman, Mikko Kurimo:
Methods for combining language models in speech recognition. 1317-1320 - Airenas Vaiciunas, Gailius Raskinis:
Review of statistical modeling of highly inflected lithuanian using very large vocabulary. 1321-1324 - Genevieve Gorrell, Brandyn Webb:
Generalized hebbian algorithm for incremental latent semantic analysis. 1325-1328 - Arnar Thor Jensson, Edward W. D. Whittaker, Koji Iwano, Sadaoki Furui:
Language model adaptation for resource deficient languages using translated data. 1329-1332 - Petra Witschel, Sergey Astrov, Gabriele Bakenecker, Josef G. Bauer, Harald Höge:
POS-based language models for large vocabulary speech recognition on embedded systems. 1333-1336
Speech Recognition - Pronunciation Modelling
- Je Hun Jeon, Minhwa Chung:
Automatic generation of domain-dependent pronunciation lexicon with data-driven rules and rule adaptation. 1337-1340 - Michael Tjalve, Mark A. Huckvale:
Pronunciation variation modelling using accent features. 1341-1344 - Khiet P. Truong, Ambra Neri, Febe de Wet, Catia Cucchiarini, Helmer Strik:
Automatic detection of frequent pronunciation errors made by L2-learners. 1345-1348 - Josef Psutka, Pavel Ircing, Josef V. Psutka, Jan Hajic, William J. Byrne, Jirí Mírovský:
Automatic transcription of Czech, Russian, and Slovak spontaneous speech in the MALACH project. 1349-1352 - Stéphane Dupont, Christophe Ris, Laurent Couvreur, Jean-Marc Boite:
A study of implicit and explicit modeling of coarticulation and pronunciation variation. 1353-1356 - Shinya Takahashi, Tsuyoshi Morimoto, Sakashi Maeda, Naoyuki Tsuruta:
Detection of coughs from user utterances using imitated phoneme model. 1357-1360 - V. Ramasubramanian, P. Srinivas, T. V. Sreenivas:
Stochastic pronunciation modeling by ergodic-HMM of acoustic sub-word units. 1361-1364 - Chen Liu, Lynette Melnar:
An automated linguistic knowledge-based cross-language transfer method for building acoustic models for a language without native training data. 1365-1368 - Ghazi Bouselmi, Dominique Fohr, Irina Illina, Jean Paul Haton:
Fully automated non-native speech recognition using confusion-based acoustic model integration. 1369-1372
Prosodic Structure
- Véronique Aubergé, Albert Rilliard:
The focus prosody: more than a simple binary function. 1373-1376 - Martha Dalton, Ailbhe Ní Chasaide:
Peak timing in two dialects of connaught irish. 1377-1380 - Janet Fletcher:
Compound rises and "uptalk" in spoken English. 1381-1384 - Li-chiung Yang:
Duration and the temporal structure of Mandarin discourse. 1385-1388 - Bei Wang:
Prosodic realization of split noun phrases in Mandarin Chinese compared in topic and focus contexts. 1389-1392 - Ziyu Xiong:
Downstep effect on disyllabic words of citation forms in standard Chinese. 1393-1396 - Jinfu Ni, Hisashi Kawai, Keikichi Hirose:
Estimation of intonation variation with constrained tone transformations. 1397-1400 - Ho-hsien Pan:
Voice quality of falling tones in taiwan min. 1401-1404 - Chiu-yu Tseng, Bau-Ling Fu:
Duration, intensity and pause predictions in relation to prosody organization. 1405-1408 - Jiahong Yuan, Jason M. Brenier, Daniel Jurafsky:
Pitch accent prediction: effects of genre and speaker. 1409-1412 - Hiroya Fujisaki, Sumio Ohno:
Analysis and modeling of fundamental frequency contours of hindi utterances. 1413-1416 - Natasha Govender, Etienne Barnard, Marelie H. Davel:
Fundamental frequency and tone in isizulu: initial experiments. 1417-1420 - Judith Bishop, Marc Peake, Dmitry Sityaev:
Intonational sequences in tuscan Italian. 1421-1424 - Caterina Petrone:
Effects of raddoppiamento sintattico on tonal alignment in Italian. 1425-1428 - Tomás Dubeda, Jan Votrubec:
Acoustic analysis of Czech stress: intonation, duration and intensity revisited. 1429-1432 - Mohamed Yeou:
Variability of F0 peak alignment in moroccan Arabic accentual focus. 1433-1436 - Anne Lacheret, Ch. Lyche, Michel Morel:
Phonological analysis of schwa and liaison within the PFC project (phonologie du français contemporain): how determinant are the prosodic factors? 1437-1440 - Plínio A. Barbosa, Pablo Arantes, Alexsandro R. Meireles, Jussara M. Vieira:
Abstractness in speech-metronome synchronisation: P-centres as cyclic attractors. 1441-1444
Applications of Confidence Related Measures to ASR
- Makoto Yamada, Tsuneo Kato, Masaki Naito, Hisashi Kawai:
Improvement of rejection performance of keyword spotting using anti-keywords derived from large vocabulary considering acoustical similarity to keywords. 1445-1448 - Ralf Schlüter, T. Scharrenbach, Volker Steinbiss, Hermann Ney:
Bayes risk minimization using metric loss functions. 1449-1452 - Akio Kobayashi, Kazuo Onoe, Shoei Sato, Toru Imai:
Word error rate minimization using an integrated confidence measure. 1453-1456 - Bin Dong, Qingwei Zhao, Yonghong Yan:
Fast confidence measure algorithm for continuous speech recognition. 1457-1460 - Hamed Ketabdar, Jithendra Vepa, Samy Bengio, Hervé Bourlard:
Developing and enhancing posterior based speech recognition systems. 1461-1464 - Peng Liu, Ye Tian, Jian-Lai Zhou, Frank K. Soong:
Background model based posterior probability for measuring confidence. 1465-1468
Multilingual TTS
- Laura Mayfield Tomokiyo, Alan W. Black, Kevin A. Lenzo:
Foreign accents in synthetic speech: development and evaluation. 1469-1472 - Raul Fernandez, Wei Zhang, Ellen Eide, Raimo Bakis, Wael Hamza, Yi Liu, Michael Picheny, John F. Pitrelli, Yong Qing, Zhiwei Shuang, Li Qin Shen:
Toward multiple-language TTS: experiments in English and Mandarin. 1473-1476 - Javier Latorre, Koji Iwano, Sadaoki Furui:
Cross-language synthesis with a polyglot synthesizer. 1477-1480 - Mucemi Gakuru, Frederick K. Iraki, Roger C. F. Tucker, Ksenia Shalonova, Kamanda Ngugi:
Development of a Kiswahili text to speech system. 1481-1484 - Jaime Botella Ordinas, Volker Fischer, Claire Waast-Richard:
Multilingual models in the IBM bilingual text-to-speech systems. 1485-1488 - Artur Janicki, Piotr Herman:
Reconstruction of Polish diacritics in a text-to-speech system. 1489-1492
Speech Bandwidth Extension
- Hiroyuki Ehara, Toshiyuki Morii, Masahiro Oshikiri, Koji Yoshida, Kouichi Honma:
Design of bandwidth scalable LSF quantization using interframe and intraframe prediction. 1493-1496 - Bernd Geiser, Peter Jax, Peter Vary:
Artificial bandwidth extension of speech supported by watermark-transmitted side information. 1497-1500 - Rongqiang Hu, Venkatesh Krishnan, David V. Anderson:
Speech bandwidth extension by improved codebook mapping towards increased phonetic classification. 1501-1504 - Dhananjay Bansal, Bhiksha Raj, Paris Smaragdis:
Bandwidth expansion of narrowband speech using non-negative matrix factorization. 1505-1508 - Michael L. Seltzer, Alex Acero, Jasha Droppo:
Robust bandwidth extension of noise-corrupted narrowband speech. 1509-1512 - João P. Cabral, Luís C. Oliveira:
Pitch-synchronous time-scaling for high-frequency excitation regeneration. 1513-1516
Spoken Language Resources and Technology Evaluation I, II
- Felix Burkhardt, Astrid Paeschke, M. Rolfes, Walter F. Sendlmeier, Benjamin Weiss:
A database of German emotional speech. 1517-1520 - Philippe Boula de Mareüil, Christophe d'Alessandro, Gérard Bailly, Frédéric Béchet, Marie-Neige Garcia, Michel Morel, Romain Prudon, Jean Véronis:
Evaluating the pronunciation of proper names by four French grapheme-to-phoneme converters. 1521-1524 - Filip Jurcícek, Jirí Zahradil, Libor Jelínek:
A human-human train timetable dialogue corpus. 1525-1528 - Gloria Branco, Luís Almeida, Rui Gomes, Nuno Beires:
A Portuguese spoken and multi-modal dialog corpora. 1529-1532 - Joyce Y. C. Chan, P. C. Ching, Tan Lee:
Development of a Cantonese-English code-mixing speech corpus. 1533-1536 - Andrej Zgank, Darinka Verdonik, Aleksandra Zögling Markus, Zdravko Kacic:
BNSI Slovenian broadcast news database - speech and text corpus. 1537-1540 - Jan Volín, Radek Skarnitzl, Petr Pollák:
Confronting HMM-based phone labelling with human evaluation of speech production. 1541-1544 - Stephanie M. Strassel, Jáchym Kolár, Zhiyi Song, Leila Barclay, Meghan Lammie Glenn:
Structural metadata annotation: moving beyond English. 1545-1548 - Delphine Charlet, Sacha Krstulovic, Frédéric Bimbot, Olivier Boëffard, Dominique Fohr, Odile Mella, Filip Korkmazsky, Djamel Mostefa, Khalid Choukri, Arnaud Vallée:
Neologos: an optimized database for the development of new speech processing algorithms. 1549-1552 - Cheng-Yuan Lin, Kuan-Ting Chen, Jyh-Shing Roger Jang:
A hybrid approach to automatic segmentation and labeling for Mandarin Chinese speech corpus. 1553-1556 - Yuang-Chin Chiang, Min-Siong Liang, Hong-Yi Lin, Ren-Yuan Lyu:
The multiple pronunciations in Taiwanese and the automatic transcription of Buddhist sutra with augmented read speech. 1557-1560 - Marelie H. Davel, Etienne Barnard:
Bootstrapping pronunciation dictionaries: practical issues. 1561-1564 - Nigel G. Ward, Anais G. Rivera, Karen Ward, David G. Novick:
Root causes of lost time and user stress in a simple dialog system. 1565-1568 - Julie A. Parisi, Douglas Brungart:
Evaluating communication effectiveness in team collaboration. 1569-1572 - David Conejero, Alan Lounds, Carmen García-Mateo, Leandro Rodríguez Liñares, Raquel Mochales, Asunción Moreno:
Bilingual aligned corpora for speech to speech translation for Spanish, English and Catalan. 1573-1576 - Hynek Boril, Petr Pollák:
Design and collection of Czech Lombard speech database. 1577-1580 - Abe Kazemzadeh, Hong You, Markus Iseli, Barbara Jones, Xiaodong Cui, Margaret Heritage, Patti Price, Elaine Andersen, Shrikanth S. Narayanan, Abeer Alwan:
TBALL data collection: the making of a young children's speech corpus. 1581-1584 - Hitomi Tohyama, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki:
Construction and utilization of bilingual speech corpus for simultaneous machine interpretation research. 1585-1588 - Rebecca A. Bates, Patrick Menning, Elizabeth Willingham, Chad Kuyper:
Meeting acts: a labeling system for group interaction in meetings. 1589-1592 - Marius-Calin Silaghi, Rachna Vargiya:
A new evaluation criteria for keyword spotting techniques and a new algorithm. 1593-1596 - Christoph Draxler, Alexander Steffen:
Phattsessionz: recording 1000 adolescent speakers in schools in Germany. 1597-1600 - Solomon Teferra Abate, Wolfgang Menzel, Bairu Tafila:
An Amharic speech corpus for large vocabulary continuous speech recognition. 1601-1604 - Hans Dolfing, David Reitter, Luís Almeida, Nuno Beires, Michael Cody, Rui Gomes, Kerry Robinson, Roman Zielinski:
The FASil speech and multimodal corpora. 1605-1608 - Karin Müller:
Revealing phonological similarities between German and dutch. 1609-1612
Large Vocabulary Speech Recognition Systems
- Dimitra Vergyri, Katrin Kirchhoff, Venkata Ramana Rao Gadde, Andreas Stolcke, Jing Zheng:
Development of a conversational telephone speech recognizer for Levantine Arabic. 1613-1616 - Bhuvana Ramabhadran:
Exploiting large quantities of spontaneous speech for unsupervised training of acoustic models. 1617-1620 - Che-Kuang Lin, Lin-Shan Lee:
Improved spontaneous Mandarin speech recognition by disfluency interruption point (IP) detection using prosodic features. 1621-1624 - Jeff Z. Ma, Spyros Matsoukas:
Improvements to the BBN RT04 Mandarin conversational telephone speech recognition system. 1625-1628 - Sakriani Sakti, Satoshi Nakamura, Konstantin Markov:
Incorporating a Bayesian wide phonetic context model for acoustic rescoring. 1629-1632 - Abdelkhalek Messaoudi, Lori Lamel, Jean-Luc Gauvain:
Modeling vowels for Arabic BN transcription. 1633-1636 - Mohamed Afify, Long Nguyen, Bing Xiang, Sherif M. Abdou, John Makhoul:
Recent progress in Arabic broadcast news transcription at BBN. 1637-1640 - Spyros Matsoukas, Rohit Prasad, Srinivas Laxminarayan, Bing Xiang, Long Nguyen, Richard M. Schwartz:
The 2004 BBN 1xRT recognition systems for English broadcast news and conversational telephone speech. 1641-1644 - Rohit Prasad, Spyros Matsoukas, Chia-Lin Kao, Jeff Z. Ma, Dongxin Xu, Thomas Colthurst, Owen Kimball, Richard M. Schwartz, Jean-Luc Gauvain, Lori Lamel, Holger Schwenk, Gilles Adda, Fabrice Lefèvre:
The 2004 BBN/LIMSI 20xRT English conversational telephone speech recognition system. 1645-1648 - Bing Xiang, Long Nguyen, Xuefeng Guo, Dongxin Xu:
The BBN Mandarin broadcast news transcription system. 1649-1652 - Paul Deléglise, Yannick Estève, Sylvain Meignier, Téva Merlin:
The LIUM speech transcription system: a CMU Sphinx III-based system for French broadcast news. 1653-1656 - Lori Lamel, Gilles Adda, Éric Bilinski, Jean-Luc Gauvain:
Transcribing lectures and seminars. 1657-1660 - Thomas Hain, John Dines, Giulia Garau, Martin Karafiát, Darren Moore, Vincent Wan, Roeland Ordelman, Steve Renals:
Transcription of conference room meetings: an investigation. 1661-1664 - Jean-Luc Gauvain, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Véronique Gendner, Lori Lamel, Holger Schwenk:
Where are we in transcribing French broadcast news? 1665-1668 - Odette Scharenborg, Stephanie Seneff:
Two-pass strategy for handling OOVs in a large vocabulary recognition task. 1669-1672 - Long Nguyen, Bing Xiang, Mohamed Afify, Sherif M. Abdou, Spyros Matsoukas, Richard M. Schwartz, John Makhoul:
The BBN RT04 English broadcast news transcription system. 1673-1676 - Rong Zhang, Ziad Al Bawab, Arthur Chan, Ananlada Chotimongkol, David Huggins-Daines, Alexander I. Rudnicky:
Investigations on ensemble based semi-supervised acoustic model training. 1677-1680 - Jan Nouza, Jindrich Zdánský, Petr David, Petr Cerva, Jan Kolorenc, Dana Nejedlová:
Fully automated system for Czech spoken broadcast transcription with very large (300k+) lexicon. 1681-1684 - Mike Schuster, Takaaki Hori, Atsushi Nakamura:
Experiments with probabilistic principal component analysis in LVCSR. 1685-1688 - Thang Tat Vu, Dung Tien Nguyen, Chi Mai Luong, John-Paul Hosom:
Vietnamese large vocabulary continuous speech recognition. 1689-1692 - Takahiro Shinozaki, Mari Ostendorf, Les E. Atlas:
Data sampling for improved speech recognizer training. 1693-1696
Speech Perception I, II
- Do Dat Tran, Eric Castelli, Jean-François Serignat, Van Loan Trinh, Le Xuan Hung:
Influence of F0 on Vietnamese syllable perception. 1697-1700 - Barbara Schwanhäußer, Denis Burnham:
Lexical tone and pitch perception in tone and non-tone language speakers. 1701-1704 - Isabel Falé, Isabel Hub Faria:
Intonational contrasts in EP: a categorical perception approach. 1705-1708 - Bettina Braun, Andrea Weber, Matthew W. Crocker:
Does narrow focus activate alternative referents? 1709-1712 - Kiyoaki Aikawa, Hayato Hashimoto:
Audiovisual interaction on the perception of frequency glide of linear sweep tones. 1713-1716 - Kei Omata, Ken Mogi:
Audiovisual integration in dichotic listening. 1717-1720 - Gunilla Svanfeldt, Dirk Olszewski:
Perception experiment combining a parametric loudspeaker and a synthetic talking head. 1721-1724 - Catherine Mayo, Robert A. J. Clark, Simon King:
Multidimensional scaling of listener responses to synthetic speech. 1725-1728 - Hiroko Terasawa, Malcolm Slaney, Jonathan Berger:
A timbre space for speech. 1729-1732 - Abdellah Kacha, Francis Grenez, Jean Schoentgen:
Voice quality assessment by means of comparative judgments of speech tokens. 1733-1736 - Toshio Irino, Satoru Satou, Shunsuke Nomura, Hideki Banno, Hideki Kawahara:
Speech intelligibility derived from time-frequency and source smearing. 1737-1740 - Nahoko Hayashi, Takayuki Arai, Nao Hodoshima, Yusuke Miyauchi, Kiyohiro Kurisu:
Steady-state pre-processing for improving speech intelligibility in reverberant environments: evaluation in a hall with an electrical reverberator. 1741-1744 - Patrick C. M. Wong, Kiara M. Lee, Todd B. Parrish:
Neural bases of listening to speech in noise. 1745-1748 - P. Jongmans, Frans J. M. Hilgers, Louis C. W. Pols, Corina J. van As-Brooks:
The intelligibility of tracheoesophageal speech: first results. 1749-1752 - Guy J. Brown, Kalle J. Palomäki:
A computational model of the speech reception threshold for laterally separated speech and noise. 1753-1756 - Esther Janse:
Lexical inhibition effects in time-compressed speech. 1757-1760 - Caroline Jacquier, Fanny Meunier:
Perception of time-compressed rapid acoustic cues in French CV syllables. 1761-1764 - Claire-Léonie Grataloup, Michel Hoen, François Pellegrino, E. Veuillet, Lionel Collet, Fanny Meunier:
Reversed speech comprehension depends on the auditory efferent system functionality. 1765-1768 - Won Tokuma, Shinichi Tokuma:
Perceptual space of English fricatives for Japanese learners. 1769-1772 - Ioana Vasilescu, Maria Candea, Martine Adda-Decker:
Perceptual salience of language-specific acoustic differences in autonomous fillers across eight languages. 1773-1776 - Marc D. Pell:
Effects of cortical and subcortical brain damage on the processing of emotional prosody. 1777-1780
Keynote Papers
- Elizabeth Shriberg:
Spontaneous speech: how people really talk and why engineers should care. 1781-1784
Speech Recognition - Adaptation I, II
- Karthik Visweswariah, Peder A. Olsen:
Feature adaptation using projection of Gaussian posteriors. 1785-1788 - Xiao Li, Jeff A. Bilmes, Jonathan Malkin:
Maximum margin learning and adaptation of MLP classifiers. 1789-1792 - Arindam Mandal, Mari Ostendorf, Andreas Stolcke:
Leveraging speaker-dependent variation of adaptation. 1793-1796 - Roger Wend-Huu Hsiao, Brian Kan-Wing Mak:
A comparative study of two kernel eigenspace-based speaker adaptation methods on large vocabulary continuous speech recognition. 1797-1800 - Xuechuan Wang, Douglas D. O'Shaughnessy:
Environmental compensation using ASR model adaptation by a Bayesian parametric representation method. 1801-1804 - Jun Luo, Zhijian Ou, Zuoying Wang:
Discriminative speaker adaptation with eigenvoices. 1805-1808
Prosody Modelling and Speech Technology I, II
- Gina-Anne Levow:
Context in multi-lingual tone and pitch accent recognition. 1809-1812 - Fabio Tamburini:
Automatic prominence identification and prosodic typology. 1813-1816 - Tommy Ingulfsen, Tina Burrows, Sabine Buchholz:
Influence of syntax on prosodic boundary prediction. 1817-1820 - Roberto Gretter, Dino Seppi:
Using prosodic information for disambiguation purposes. 1821-1824 - Wentao Gu, Keikichi Hirose, Hiroya Fujisaki:
Analysis of the effects of word emphasis and echo question on F0 contours of Cantonese utterances. 1825-1828 - Tina Burrows, Peter Jackson, Katherine M. Knill, Dmitry Sityaev:
Combining models of prosodic phrasing and pausing. 1829-1832
Detecting and Synthesizing Speaker State
- Julia Hirschberg, Stefan Benus, Jason M. Brenier, Frank Enos, Sarah Friedman, Sarah Gilman, Cynthia Girand, Martin Graciarena, Andreas Kathol, Laura A. Michaelis, Bryan L. Pellom, Elizabeth Shriberg, Andreas Stolcke:
Distinguishing deceptive from non-deceptive speech. 1833-1836 - Jackson Liscombe, Julia Hirschberg, Jennifer J. Venditti:
Detecting certainness in spoken tutorial dialogues. 1837-1840 - Laurence Vidrascu, Laurence Devillers:
Detection of real-life emotions in call centers. 1841-1844 - Jackson Liscombe, Giuseppe Riccardi, Dilek Hakkani-Tür:
Using context to improve emotion detection in spoken dialog systems. 1845-1848 - Irena Yanushevskaya, Christer Gobl, Ailbhe Ní Chasaide:
Voice quality and f0 cues for affect expression: implications for synthesis. 1849-1852 - Toru Takahashi, Takeshi Fujii, Masashi Nishi, Hideki Banno, Toshio Irino, Hideki Kawahara:
Voice and emotional expression transformation based on statistics of vowel parameters in an emotional speech database. 1853-1856
Rapid Development of Spoken Dialogue Systems
- Giuseppe Di Fabbrizio, Gökhan Tür, Dilek Hakkani-Tür:
Automated wizard-of-oz for spoken dialogue systems. 1857-1860 - Kouichi Katsurada, Kunitoshi Sato, Hiroaki Adachi, Hirobumi Yamada, Tsuneo Nitta:
A rapid prototyping tool for constructing web-based MMI applications. 1861-1864 - Philip Hanna, Ian M. O'Neill, Xingkun Liu, Michael F. McTear:
Developing extensible and reusable spoken dialogue components: an examination of the Queen's communicator. 1865-1868 - Ye-Yi Wang, Alex Acero:
SGStudio: rapid semantic grammar development for spoken language understanding. 1869-1872 - Murat Akbacak, Yuqing Gao, Liang Gu, Hong-Kwang Jeff Kuo:
Rapid transition to new spoken dialogue domains: language model training using knowledge from previous domain applications and web text resources. 1873-1876 - Manny Rayner, Pierrette Bouillon, Nikos Chatzichrisafis, Beth Ann Hockey, Marianne Santaholma, Marianne Starlander, Hitoshi Isahara, Kyoko Kanzaki, Yukie Nakao:
A methodology for comparing grammar-based and robust approaches to speech understanding. 1877-1880
Text-to-Speech I, II
- François Mairesse, Marilyn A. Walker:
Learning to personalize spoken generation for dialogue systems. 1881-1884 - S. Revelin, Didier Cadic, Claire Waast-Richard:
Optimization of text-to-speech phonetic transcriptions using a-posteriori signal comparison. 1885-1888 - Özgül Salor, Mübeccel Demirekler:
Voice transformation using principal component analysis based LSF quantization and dynamic programming approach. 1889-1892 - Hai Ping Li, Wei Zhang:
Adapt Mandarin TTS system to Chinese dialect TTS systems. 1893-1896 - Min Zheng, Qin Shi, Wei Zhang, Lianhong Cai:
Grapheme-to-phoneme conversion based on TBL algorithm in Mandarin TTS system. 1897-1900 - Paolo Massimino, Alberto Pacchiotti:
An automaton-based machine learning technique for automatic phonetic transcription. 1901-1904 - Tasanawan Soonklang, Robert I. Damper, Yannick Marchand:
Comparative objective and subjective evaluation of three data-driven techniques for proper name pronunciation. 1905-1908 - Olov Engwall:
Articulatory synthesis using corpus-based estimation of line spectrum pairs. 1909-1912 - Aoju Chen, Els den Os:
Effects of pitch accent type on interpreting information status in synthetic speech. 1913-1916 - Perttu Prusi, Anssi Kainulainen, Jaakko Hakulinen, Markku Turunen, Esa-Pekka Salonen, Leena Helin:
Towards generic spatial object model and route guidance grammar for speech-based systems. 1917-1920 - Chi-Chun Hsia, Chung-Hsien Wu, Te-Hsien Liu:
Duration-embedded bi-HMM for expressive voice conversion. 1921-1924 - Toshio Hirai, Hisashi Kawai, Minoru Tsuzaki, Nobuyuki Nishizawa:
Analysis of major factors of naturalness degradation in concatenative synthesis. 1925-1928 - Jilei Tian, Jani Nurminen, Imre Kiss:
Duration modeling and memory optimization in a Mandarin TTS system. 1929-1932 - Min-Siong Liang, Ke-Chun Chuang, Rhuei-Cheng Yang, Yuang-Chin Chiang, Ren-Yuan Lyu:
A bi-lingual Mandarin-to-Taiwanese text-to-speech system. 1933-1936 - Uwe D. Reichel, Florian Schiel:
Using morphology and phoneme history to improve grapheme-to-phoneme conversion. 1937-1940 - Olga Goubanova, Simon King:
Predicting consonant duration with Bayesian belief networks. 1941-1944 - Per-Anders Jande:
Inducing decision tree pronunciation variation models from annotated speech data. 1945-1948 - Lijuan Wang, Yong Zhao, Min Chu, Frank K. Soong, Zhigang Cao:
Phonetic transcription verification with generalized posterior probability. 1949-1952 - Hua Cheng, Fuliang Weng, Niti Hantaweepant, Lawrence Cavedon, Stanley Peters:
Training a maximum entropy model for surface realization. 1953-1956 - Tomoki Toda, Kiyohiro Shikano:
NAM-to-speech conversion with Gaussian mixture models. 1957-1960 - Michelina Savino, Mario Refice, Massimo Mitaritonna:
Which Italian do current systems speak? a first step towards pronunciation modelling of Italian varieties. 1961-1964 - Dominika Oliver, Robert A. J. Clark:
Modelling pitch accent types for Polish speech synthesis. 1965-1968 - Chatchawarn Hansakunbuntheung, Ausdang Thangthai, Chai Wutiwiwatchai, Rungkarn Siricharoenchai:
Learning methods and features for corpus-based phrase break prediction on Thai. 1969-1972 - Paul Taylor:
Hidden Markov models for grapheme to phoneme conversion. 1973-1976
Speaker Characterization and Recognition I-IV
- Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa:
Robust distant speaker recognition based on position dependent cepstral mean normalization. 1977-1980 - David A. van Leeuwen:
Speaker adaptation in the NIST speaker recognition evaluation 2004. 1981-1984 - Jacob Goldberger, Hagai Aronowitz:
A distance measure between GMMs based on the unscented transform and its application to speaker recognition. 1985-1988 - Sorin Dusan:
Estimation of speaker's height and vocal tract length from speech signal. 1989-1992 - Doroteo Torre Toledano, Carlos Fombella, Joaquin Gonzalez-Rodriguez, Luis A. Hernández Gómez:
On the relationship between phonetic modeling precision and phonetic speaker recognition accuracy. 1993-1996 - J. Fortuna, P. Sivakumaran, Aladdin M. Ariyaeeinia, Amit S. Malegaonkar:
Open-set speaker identification using adapted Gaussian mixture models. 1997-2000 - James McAuley, Ji Ming, Pat Corr:
Speaker verification in noisy conditions using correlated subband features. 2001-2004 - Mikaël Collet, Yassine Mami, Delphine Charlet, Frédéric Bimbot:
Probabilistic anchor models approach for speaker verification. 2005-2008 - Mijail Arcienega, Anil Alexander, Philipp Zimmermann, Andrzej Drygajlo:
A Bayesian network approach combining pitch and spectral envelope features to reduce channel mismatch in speaker verification and forensic speaker recognition. 2009-2012 - Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung:
Channel robust speaker verification via Bayesian blind stochastic feature transformation. 2013-2016 - Tomoko Matsui, Kunio Tanabe:
dPLRM-based speaker identification with log power spectrum. 2017-2020 - Xianxian Zhang, John H. L. Hansen, Pongtep Angkititrakul, Kazuya Takeda:
Speaker verification using Gaussian mixture models within changing real car environments. 2021-2024 - Kanae Amino, Tsutomu Sugawara, Takayuki Arai:
The correspondences between the perception of the speaker individualities contained in speech sounds and their acoustic properties. 2025-2028 - Samuel Kim, Sung-Wan Yoon, Thomas Eriksson, Hong-Goo Kang, Dae Hee Youn:
A noise-robust pitch synchronous feature extraction algorithm for speaker recognition systems. 2029-2032 - Jing Deng, Thomas Fang Zheng, Zhanjiang Song, Jian Liu:
Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition. 2033-2036 - Xianxian Zhang, John H. L. Hansen:
In-set/out-of-set speaker identification based on discriminative speech frame selection. 2037-2040 - Zhenchun Lei, Yingchun Yang, Zhaohui Wu:
Mixture of support vector machines for text-independent speaker recognition. 2041-2044 - Shilei Zhang, Junmei Bai, Shuwu Zhang, Bo Xu:
Optimal model order selection based on regression tree in speaker identification. 2045-2048 - Marcos Faúndez-Zanuy, Jordi Solé-Casals:
Speaker verification improvement using blind inversion of distortions. 2049-2052
Single-channel Speech Enhancement
- Israel Cohen:
Supergaussian GARCH models for speech signals. 2053-2056 - Athanasios Mouchtaris, Jan Van der Spiegel, Paul Mueller, Panagiotis Tsakalides:
A spectral conversion approach to feature denoising and speech enhancement. 2057-2060 - Alfonso Ortega, Eduardo Lleida, Enrique Masgrau, Luis Buera, Antonio Miguel:
Acoustic feedback cancellation in speech reinforcement systems for vehicles. 2061-2064 - Julien Bourgeois, Jürgen Freudenberger, Guillaume Lathoud:
Implicit control of noise canceller for speech enhancement. 2065-2068 - T. M. Sunil Kumar, T. V. Sreenivas:
Speech enhancement using Markov model of speech segments. 2069-2072 - Vladimir Braquet, Takao Kobayashi:
A wavelet based noise reduction algorithm for speech signal corrupted by coloured noise. 2073-2076 - Esfandiar Zavarehei, Saeed Vaseghi:
Speech enhancement in temporal DFT trajectories using Kalman filters. 2077-2080 - Qin Yan, Saeed Vaseghi, Esfandiar Zavarehei, Ben P. Milner:
Formant-tracking linear prediction models for speech processing in noisy environments. 2081-2084 - Hui Jiang, Qian-Jie Fu:
Statistical noise compensation for cochlear implant processing. 2085-2088 - Tuan Van Pham, Gernot Kubin:
WPD-based noise suppression using nonlinearly weighted threshold quantile estimation and optimal wavelet shrinking. 2089-2092 - Weifeng Li, Katunobu Itou, Kazuya Takeda, Fumitada Itakura:
Subjective and objective quality assessment of regression-enhanced speech in real car environments. 2093-2096 - Masashi Unoki, Masaaki Kubo, Atsushi Haniu, Masato Akagi:
A model for selective segregation of a target instrument sound from the mixed sound of various instruments. 2097-2100 - Richard C. Hendriks, Richard Heusdens, Jesper Jensen:
Improved decision directed approach for speech enhancement using an adaptive time segmentation. 2101-2104 - Heinrich W. Löllmann, Peter Vary:
Generalized filter-bank equalizer for noise reduction with reduced signal delay. 2105-2108 - Nicoleta Roman, DeLiang Wang:
A pitch-based model for separation of reverberant speech. 2109-2112 - David Yuheng Zhao, W. Bastiaan Kleijn:
On noise gain estimation for HMM-based speech enhancement. 2113-2116 - Om Deshmukh, Carol Y. Espy-Wilson:
Speech enhancement using auditory phase opponency model. 2117-2120
Acoustic Modelling for LVCSR
- Brian Mak, Jeff Siu-Kei Au-Yeung, Yiu-Pong Lai, Man-Hung Siu:
High-density discrete HMM with the use of scalar quantization indexing. 2121-2124 - Jing Zheng, Andreas Stolcke:
Improved discriminative training using phone lattices. 2125-2128 - Qifeng Zhu, Barry Y. Chen, Frantisek Grézl, Nelson Morgan:
Improved MLP structures for data-driven feature extraction for ASR. 2129-2132 - Wolfgang Macherey, Lars Haferkamp, Ralf Schlüter, Hermann Ney:
Investigations on error minimizing training criteria for discriminative training in automatic speech recognition. 2133-2136 - Khe Chai Sim, Mark J. F. Gales:
Temporally varying model parameters for large vocabulary continuous speech recognition. 2137-2140 - Qifeng Zhu, Andreas Stolcke, Barry Y. Chen, Nelson Morgan:
Using MLP features in SRI's conversational speech recognition system. 2141-2144
Speech Production I
- Matti Airas, Hannu Pulakka, Tom Bäckström, Paavo Alku:
A toolkit for voice inverse filtering and parametrisation. 2145-2148 - Denisse Sciamarella, Christophe d'Alessandro:
Stylization of glottal-flow spectra produced by a mechanical vocal-fold model. 2149-2152 - Hideyuki Nomura, Tetsuo Funada:
Numerical glottal sound source model as coupled problem between vocal cord vibration and glottal flow. 2153-2156 - Marianne Pouplier, Maureen Stone:
A tagged-cine MRI investigation of German vowels. 2157-2160 - Antoine Serrurier, Pierre Badin:
A three-dimensional linear articulatory model of velum based on MRI data. 2161-2164 - Anne Cros, Didier Demolin, Ana Georgina Flesia, Antonio Galves:
On the relationship between intra-oral pressure and speech sonority. 2165-2168
Speaker Characterization and Recognition I-IV
- Mohamed Kamal Omar, Jirí Navrátil, Ganesh N. Ramaswamy:
Maximum conditional mutual information modeling for speaker verification. 2169-2172 - Luciana Ferrer, M. Kemal Sönmez, Sachin S. Kajarekar:
Class-dependent score combination for speaker recognition. 2173-2176 - Hagai Aronowitz, Dror Irony, David Burshtein:
Modeling intra-speaker variability for speaker recognition. 2177-2180 - Girija Chetty, Michael Wagner:
Liveness detection using cross-modal correlations in face-voice person authentication. 2181-2184 - Taichi Asami, Koji Iwano, Sadaoki Furui:
Stream-weight optimization by LDA and adaboost for multi-stream speaker verification. 2185-2188 - Yosef A. Solewicz, Moshe Koppel:
Considering speech quality in speaker verification fusion. 2189-2192
Gender and Age Issues in Speech and Language Research I, II
- Matteo Gerosa, Diego Giuliani, Fabio Brugnara:
Speaker adaptive acoustic modeling with mixture of adult and children's speech. 2193-2196 - Shona D'Arcy
, Martin J. Russell:
A comparison of human and computer recognition accuracy for children's speech. 2197-2200 - Piero Cosi, Bryan L. Pellom:
Italian children's speech recognition for advanced interactive literacy tutors. 2201-2204 - Martine Adda-Decker, Lori Lamel:
Do speech recognizers prefer female speakers? 2205-2208 - Serdar Yildirim, Chul Min Lee, Sungbok Lee, Alexandros Potamianos, Shrikanth S. Narayanan:
Detecting politeness and frustration state of a child in a conversational computer game. 2209-2212 - Diana Binnenpoorte, Christophe Van Bael, Els den Os, Lou Boves:
Gender in everyday speech and language: a corpus-based study. 2213-2216
Spoken Language Acquisition, Development and Learning I, II
- Shigeaki Amano:
Developmental change of phoneme duration in a Japanese infant and mother. 2217-2220 - Haiping Jia, Hiroki Mori, Hideki Kasuya:
Mora timing organization in producing contrastive geminate/single consonants and long/short vowels by native and non-native speakers of Japanese: effects of speaking rate. 2221-2224 - Hongyan Wang, Vincent J. van Heuven:
Mutual intelligibility of American, Chinese and Dutch-accented speakers of English. 2225-2228 - Peter Juel Henrichsen:
Deriving a bi-lingual dictionary from raw transcription data. 2229-2232 - Kei Ohta, Seiichi Nakagawa:
A statistical method of evaluating pronunciation proficiency for Japanese words. 2233-2236
Language and Dialect Identification I, II
- Pavel Matejka, Petr Schwarz, Jan Cernocký, Pavel Chytil:
Phonotactic language identification using high quality phoneme recognition. 2237-2240 - Rongqing Huang, John H. L. Hansen:
Advances in word based dialect/accent classification. 2241-2244 - Rym Hamdi, Salem Ghazali, Melissa Barkat-Defradas:
Syllable structure in spoken Arabic: a comparative investigation. 2245-2248 - J. C. Marcadet, Volker Fischer, Claire Waast-Richard:
A transformation-based learning approach to language identification for mixed-lingual text-to-speech synthesis. 2249-2252 - Shuichi Itahashi, Shiwei Zhu, Mikio Yamamoto:
Constructing family trees of multilingual speech using Gaussian mixture models. 2253-2256 - Jean-Luc Rouas:
Modeling long and short-term prosody for language identification. 2257-2260
Spoken Language Translation I, II
- Matthias Paulik, Christian Fügen, Sebastian Stüker, Tanja Schultz, Thomas Schaaf, Alex Waibel:
Document driven machine translation enhanced ASR. 2261-2264 - Shahram Khadivi, András Zolnay, Hermann Ney:
Automatic text dictation in computer-assisted translation. 2265-2268 - Luis Rodríguez, Jorge Civera, Enrique Vidal, Francisco Casacuberta, César Ernesto Martínez:
On the use of speech recognition in computer assisted translation. 2269-2272 - Andreas Kathol, Kristin Precoda, Dimitra Vergyri, Wen Wang, Susanne Z. Riehemann:
Speech translation for low-resource languages: the case of Pashto. 2273-2276 - David Picó, Jorge González, Francisco Casacuberta, Diamantino Caseiro, Isabel Trancoso:
Finite-state transducer inference for a speech-input Portuguese-to-English machine translation system. 2277-2280 - Kenko Ohta, Keiji Yasuda, Gen-ichiro Kikui, Masuzo Yanagida:
Quantitative evaluation of effects of speech recognition errors on speech translation quality. 2281-2284
Multi-channel Speech Enhancement
- Thomas Lotter, Bastian Sauert, Peter Vary:
A stereo input-output superdirective beamformer for dual channel noise reduction. 2285-2288 - Ulrich Klee, Tobias Gehrig, John W. McDonough:
Kalman filters for time delay of arrival-based source localization. 2289-2292 - Osamu Ichikawa, Masafumi Nishimura:
Simultaneous adaptation of echo cancellation and spectral subtraction for in-car speech recognition. 2293-2296 - Rong Hu, Yunxin Zhao:
Variable step size adaptive decorrelation filtering for competing speech separation. 2297-2300 - Daisuke Saitoh, Atsunobu Kaminuma, Hiroshi Saruwatari, Tsuyoki Nishikawa, Akinobu Lee:
Speech extraction in a car interior using frequency-domain ICA with rapid filter adaptations. 2301-2304 - Rongqiang Hu, Sunil D. Kamath, David V. Anderson:
Speech enhancement using non-acoustic sensors. 2305-2308 - Marc Delcroix, Takafumi Hikichi, Masato Miyoshi:
Improved blind dereverberation performance by using spatial information. 2309-2312 - Junfeng Li, Masato Akagi:
A hybrid microphone array post-filter in a diffuse noise field. 2313-2316 - Venkatesh Krishnan, Phil Spencer Whitehead, David V. Anderson, Mark A. Clements:
A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems. 2317-2320 - Yuki Denda, Takanobu Nishiura, Yoichi Yamashita:
A study of weighted CSP analysis with average speech spectrum for noise robust talker localization. 2321-2324 - Young-Ik Kim, Sung Jun An, Rhee Man Kil, Hyung-Min Park:
Sound segregation based on binaural zero-crossings. 2325-2328 - Jürgen Freudenberger, Klaus Linhard:
A two-microphone diversity system and its application for hands-free car kits. 2329-2332 - Takahiro Murakami, Kiyoshi Kurihara, Yoshihisa Ishida:
Directionally constrained minimization of power algorithm for speech signals. 2333-2336 - Alessio Brutti, Maurizio Omologo, Piergiorgio Svaizer:
Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays. 2337-2340 - Nilesh Madhu, Rainer Martin:
Robust speaker localization through adaptive weighted pair TDOA (AWEPAT) estimation. 2341-2344 - Guillaume Lathoud, Mathew Magimai-Doss, Bertrand Mesot:
A spectrogram model for enhanced source localization and noise-robust ASR. 2345-2348 - Sriram Srinivasan, Mattias Nilsson, W. Bastiaan Kleijn:
Denoising through source separation and minimum tracking. 2349-2352 - Louisa Busca Grisoni, John H. L. Hansen:
Collaborative voice activity detection for hearing aids. 2353-2356 - Enrique Robledo-Arnuncio, Biing-Hwang Juang:
Using inter-frequency decorrelation to reduce the permutation inconsistency problem in blind source separation. 2357-2360 - Amarnag Subramanya, Zhengyou Zhang, Zicheng Liu, Jasha Droppo, Alex Acero:
A graphical model for multi-sensory speech processing in air-and-bone conductive microphones. 2361-2364
Prosody in Language Performance I, II
- Heejin Kim, Jennifer Cole:
The stress foot as a unit of planned timing: evidence from shortening in the prosodic phrase. 2365-2368 - Pauline Welby, Hélène Loevenbruck:
Segmental "anchorage" and the French late rise. 2369-2372 - Ivan Chow:
Prosodic cues for syntactically-motivated junctures. 2373-2376 - Isabel Falé, Isabel Hub Faria:
A glimpse of the time-course of intonation processing in European Portuguese. 2377-2380 - Petra Wagner:
Great expectations - introspective vs. perceptual prominence ratings and their acoustic correlates. 2381-2384 - Christian Jensen, John Tøndering:
Choosing a scale for measuring perceived prominence. 2385-2388 - Jens Edlund, David House, Gabriel Skantze:
The effects of prosodic features on the interpretation of clarification ellipses. 2389-2392 - Matthias Jilka:
Exploration of different types of intonational deviations in foreign-accented and synthesized speech. 2393-2396 - Jörg Bröggelwirth:
A rhythmic-prosodic model of poetic speech. 2397-2400 - Sonja Biersack, Vera Kempe, Lorna Knapton:
Fine-tuning speech registers: a comparison of the prosodic features of child-directed and foreigner-directed speech. 2401-2404 - Timothy Arbisi-Kelm:
An analysis of the intonational structure of stuttered speech. 2405-2408 - Britta Lintfert, Wolfgang Wokurek:
Voice quality dimensions of pitch accents. 2409-2412 - Marion Dohen, Hélène Loevenbruck:
Audiovisual production and perception of contrastive focus in French: a multispeaker study. 2413-2416 - Pashiera Barkhuysen, Emiel Krahmer, Marc Swerts:
Predicting end of utterance in multimodal and unimodal conditions. 2417-2420 - Saori Tanaka, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Production of prominence in Japanese sign language. 2421-2424
Speaker Characterization and Recognition I-IV
- Andreas Stolcke, Luciana Ferrer, Sachin S. Kajarekar, Elizabeth Shriberg, Anand Venkataraman:
MLLR transforms as features in speaker recognition. 2425-2428 - Brendan Baker, Robbie Vogt, Sridha Sridharan:
Gaussian mixture modelling of broad phonetic and syllabic events for text-independent speaker verification. 2429-2432 - Hagai Aronowitz, David Burshtein:
Efficient speaker identification and retrieval. 2433-2436 - Rohit Sinha, S. E. Tranter, Mark J. F. Gales, Philip C. Woodland:
The Cambridge University March 2005 speaker diarisation system. 2437-2440 - Xuan Zhu, Claude Barras, Sylvain Meignier, Jean-Luc Gauvain:
Combining speaker identification and BIC for speaker diarization. 2441-2444 - Dan Istrate, Nicolas Scheffer, Corinne Fredouille, Jean-François Bonastre:
Broadcast news speaker tracking for ESTER 2005 campaign. 2445-2448
Phonetics and Phonology I, II
- Sorin Dusan:
On the nature of acoustic information in identification of coarticulated vowels. 2449-2452 - Cédric Gendrot, Martine Adda-Decker:
Impact of duration on F1/F2 formant values of oral vowels: an automatic analysis of large broadcast news corpora in French and German. 2453-2456 - Hugo Quené:
Modeling of between-speaker and within-speaker variation in spontaneous speech tempo. 2457-2460 - Masahiko Komatsu, Makiko Aoyagi:
Vowel devoicing vs. mora-timed rhythm in spontaneous Japanese - inspection of phonetic labels of OGI_TS. 2461-2464 - Jalal-Eddin Al-Tamimi, Emmanuel Ferragne:
Does vowel space size depend on language vowel inventories? evidence from two Arabic dialects and French. 2465-2468 - Chilin Shih:
Understanding phonology by phonetic implementation. 2469-2472
Spoken / Multi-modal Dialogue Systems I, II
- Niels Ole Bernsen, Laila Dybkjær:
User evaluation of conversational agent H. C. Andersen. 2473-2476 - Silke Goronzy, Nicole Beringer:
Integrated development and on-the-fly simulation of multimodal dialogs. 2477-2480 - Mihai Rotaru, Diane J. Litman, Katherine Forbes-Riley:
Interactions between speech recognition problems and user emotions. 2481-2484 - Junlan Feng, Srihari Reddy, Murat Saraclar:
Webtalk: mining websites for interactively answering questions. 2485-2488 - Sebastian Möller:
Towards generic quality prediction models for spoken dialogue systems - a case study. 2489-2492 - S. Parthasarathy, Cyril Allauzen, R. Munkong:
Robust access to large structured data using voice form-filling. 2493-2496
Human factors, User Experience and Natural Language Application Design
- Esther Levin, Alex Levin:
Spoken dialog system for real-time data capture. 2497-2500 - Michael Pucher, Peter Fröhlich:
A user study on the influence of mobile device class, synthesis method, data rate and lexicon on speech synthesis quality. 2501-2504 - Fang Chen, Yael Katzenellenbogen:
User's experience of a commercial speech dialogue system. 2505-2508 - Esther Levin, Amir M. Mané:
Voice user interface design for automated directory assistance. 2509-2512 - Maria Gabriela Alvarez-Ryan, Narendra K. Gupta, Barbara Hollister, Tirso Alonso:
Optimizing user experience through design of the spoken language understanding (SLU) module. 2513-2516 - Jeremy H. Wright, David A. Kapilow, Alicia Abella:
Interactive visualization of human-machine dialogs. 2517-2520
TTS Inventory
- Matthew P. Aylett:
Synthesising hyperarticulation in unit selection TTS. 2521-2524 - Daniel Tihelka:
Symbolic prosody driven unit selection for highly natural synthetic speech. 2525-2528 - Jindrich Matousek, Zdenek Hanzlícek, Daniel Tihelka:
Hybrid syllable/triphone speech synthesis. 2529-2532 - Francisco Campillo Díaz, José Luis Alba, Eduardo Rodríguez Banga:
A neural network approach for the design of the target cost function in unit-selection speech synthesis. 2533-2536 - Christian Weiss:
FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis. 2537-2540 - Gui-Lin Chen, Ke-Song Han, Zhen-Li Yu, Dong-Jian Yue, Yi-Qing Zu:
An embedded and concatenative approach to TTS of multiple languages. 2541-2544 - Tony Ezzat, Ethan Meyers, James R. Glass, Tomaso A. Poggio:
Morphing spectral envelopes using audio flow. 2545-2548 - Vincent Colotte, Richard Beaufort:
Linguistic features weighting for a text-to-speech system without prosody model. 2549-2552 - Ingunn Amdal, Torbjørn Svendsen:
Unit selection synthesis database development using utterance verification. 2553-2556 - Yong Zhao, Lijuan Wang, Min Chu, Frank K. Soong, Zhigang Cao:
Refining phoneme segmentations using speaker-adaptive context dependent boundary models. 2557-2560 - Yining Chen, Yong Zhao, Min Chu:
Customizing base unit set with speech database in TTS systems. 2561-2564 - Soufiane Rouibia, Olivier Rosec:
Unit selection for speech synthesis based on a new acoustic target cost. 2565-2568 - Dan Chazan, Ron Hoory, Zvi Kons, Ariel Sagi, Slava Shechtman, Alexander Sorin:
Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling. 2569-2572 - Francesc Alías, Ignasi Iriondo Sanz, Lluís Formiga, Xavier Gonzalvo, Carlos Monzo, Xavier Sevillano:
High quality Spanish restricted-domain TTS oriented to a weather forecast application. 2573-2576 - Ingmund Bjørkan, Torbjørn Svendsen, Snorre Farner:
Comparing spectral distance measures for join cost optimization in concatenative speech synthesis. 2577-2580 - Maria João Barros, Ranniery Maia,