INTERSPEECH 2002: Denver, Colorado, USA
- John H. L. Hansen, Bryan L. Pellom: 7th International Conference on Spoken Language Processing, ICSLP 2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002. ISCA 2002
Keynotes
- W. Tecumseh Fitch: The evolution of spoken language: a comparative approach. 1-8
- Steve J. Young: Talking to machines (statistically speaking). 9-16
Speech Recognition in Noise - I
- Duncan Macho, Laurent Mauuary, Bernhard Noé, Yan Ming Cheng, Douglas Ealey, Denis Jouvet, Holly Kelleher, David Pearce, Fabien Saadoun: Evaluation of a noise-robust DSR front-end on Aurora databases. 17-20
- André Gustavo Adami, Lukás Burget, Stéphane Dupont, Harinath Garudadri, Frantisek Grézl, Hynek Hermansky, Pratibha Jain, Sachin S. Kajarekar, Nelson Morgan, Sunil Sivadas: Qualcomm-ICSI-OGI features for ASR. 21-24
- Michael Kleinschmidt, David Gelbart: Improving word accuracy with Gabor feature extraction. 25-28
- Jasha Droppo, Li Deng, Alex Acero: Evaluation of SPLICE on the Aurora 2 and 3 tasks. 29-32
- Brian Kan-Wing Mak, Yik-Cheung Tam: Performance of discriminatively trained auditory features on Aurora2 and Aurora3. 33-36
- José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio: Feature extraction combining spectral noise reduction and cepstral histogram equalization for robust ASR. 225-228
- Jingdong Chen, Dimitris Dimitriadis, Hui Jiang, Qi Li, Tor André Myrvoll, Olivier Siohan, Frank K. Soong: Bell Labs approach to Aurora evaluation on connected digit recognition. 229-232
- Hong Kook Kim, Richard C. Rose: Algorithms for distributed speech recognition in a noisy automobile environment. 233-236
- Florian Hilger, Sirko Molau, Hermann Ney: Quantile based histogram equalization for online applications. 237-240
- Chia-Ping Chen, Karim Filali, Jeff A. Bilmes: Frontend post-processing and backend model enhancement on the Aurora 2.0/3.0 databases. 241-244
- Masaki Ida, Satoshi Nakamura: HMM composition-based rapid model adaptation using a priori noise GMM adaptation evaluation on Aurora2 corpus. 437-440
- Jeih-Weih Hung, Lin-Shan Lee: Data-driven temporal filters obtained via different optimization criteria evaluated on Aurora2 database. 441-444
- Bojan Kotnik, Damjan Vlaj, Zdravko Kacic, Bogomir Horvat: Efficient additive and convolutional noise reduction procedures. 445-448
- Markus Lieb, Alexander Fischer: Progress with the Philips continuous ASR system on the Aurora 2 noisy digits database. 449-452
- Jian Wu, Qiang Huo: An environment compensated minimum classification error training approach and its evaluation on Aurora2 database. 453-456
- Kaisheng Yao, Donglai Zhu, Satoshi Nakamura: Evaluation of a noise adaptive speech recognition system on the Aurora 3 database. 457-460
- Laura Docío Fernández, Carmen García-Mateo: Distributed speech recognition over IP networks on the Aurora 3 database. 461-464
- Masakiyo Fujimoto, Yasuo Ariki: Evaluation of noisy speech recognition based on noise reduction and acoustic model adaptation on the Aurora2 tasks. 465-468
- George Saon, Juan M. Huerta: Improvements to the IBM Aurora 2 multi-condition system. 469-472
- Pratibha Jain, Hynek Hermansky, Brian Kingsbury: Distributed speech recognition using noise-robust MFCC and TRAPS-estimated manner features. 473-476
- Norihide Kitaoka, Seiichi Nakagawa: Evaluation of spectral subtraction with smoothing of time direction on the Aurora 2 task. 477-480
- Xiaodong Cui, Markus Iseli, Qifeng Zhu, Abeer Alwan: Evaluation of noise robust features on the Aurora databases. 481-484
- Nicholas W. D. Evans, John S. D. Mason: Computationally efficient noise compensation for robust automatic speech recognition assessed under the Aurora 2/3 framework. 485-488
- Omar Farooq, Sekharjit Datta: Mel-scaled wavelet filter based features for noisy unvoiced phoneme recognition. 1017-1020
- Kazuo Onoe, Hiroyuki Segi, Takeshi Kobayakawa, Shoei Sato, Toru Imai, Akio Ando: Filter bank subtraction for robust speech recognition. 1021-1024
- Andrew C. Morris, Simon Payne, Hervé Bourlard: Low cost duration modelling for noise robust speech recognition. 1025-1028
- Yifan Gong: A comparative study of approximations for parallel model combination of static and dynamic parameters. 1029-1032
- Petr Motlícek, Lukás Burget: Noise estimation for efficient speech enhancement and robust speech recognition. 1033-1036
- Özgür Çetin, Harriet J. Nock, Katrin Kirchhoff, Jeff A. Bilmes, Mari Ostendorf: The 2001 GMTK-based SPINE ASR system. 1037-1040
- Wei-Wen Hung: Using adaptive signal limiter together with weighting techniques for noisy speech recognition. 1041-1044
- Shingo Yamade, Kanako Matsunami, Akira Baba, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano: Spectral subtraction in noisy environments applied to speaker adaptation based on HMM sufficient statistics. 1045-1048
- Man-Hung Siu, Yu-Chung Chan: Robust speech recognition against short-time noise. 1049-1052
- Mario Toma, Andrea Lodi, Roberto Guerrieri: Word endpoints detection in the presence of non-stationary noise. 1053-1056
- Pere Pujol Marsal, Susagna Pol, Astrid Hagen, Hervé Bourlard, Climent Nadeu: Comparison and combination of RASTA-PLP and FF features in a hybrid HMM/MLP speech recognition system. 1057-1060
- Tao Xu, Zhigang Cao: Robust MMSE-FW-LAASR scheme at low SNRs. 1061-1064
- András Zolnay, Ralf Schlüter, Hermann Ney: Robust speech recognition using a voiced-unvoiced feature. 1065-1068
- Febe de Wet, Johan de Veth, Bert Cranen, Lou Boves: Accumulated Kullback divergence for analysis of ASR performance in the presence of noise. 1069-1072
- Brian Kingsbury, Pratibha Jain, André Gustavo Adami: A hybrid HMM/TRAPS model for robust voice activity detection. 1073-1076
- Chengyi Zheng, Yonghong Yan: Run time information fusion in speech recognition. 1077-1080
- Jon A. Arrowood, Mark A. Clements: Using observation uncertainty in HMM decoding. 1561-1564
- Matthew N. Stuttle, Mark J. F. Gales: Combining a Gaussian mixture model front end with MFCC parameters. 1565-1568
- Jasha Droppo, Alex Acero, Li Deng: A nonlinear observation model for removing noise from corrupted speech log mel-spectral energies. 1569-1572
- Carlos S. Lima, Luís B. Almeida, João L. Monteiro: Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition. 1573-1576
- Venkata Ramana Rao Gadde, Andreas Stolcke, Dimitra Vergyri, Jing Zheng, M. Kemal Sönmez, Anand Venkataraman: Building an ASR system for noisy environments: SRI's 2001 SPINE evaluation system. 1577-1580
Experimental Phonetics
- R. J. J. H. van Son, Louis C. W. Pols: Evidence for efficiency in vowel production. 37-40
- Matthew P. Aylett: Stochastic suprasegmentals: relationship between the spectral characteristics of vowels, redundancy and prosodic structure. 41-44
- Jihène Serkhane, Jean-Luc Schwartz, Louis-Jean Boë, Barbara L. Davis, Christine L. Matyear: Motor specifications of a baby robot via the analysis of infants' vocalizations. 45-48
- Laura L. Koenig, Jorge C. Lucero: Oral-laryngeal control patterns for fricatives in 5-year-olds and adults. 49-52
- Véronique Delvaux, Thierry Metens, Alain Soquet: French nasal vowels: acoustic and articulatory properties. 53-56
Speech Recognition: Adaptation
- Patrick Kenny, Gilles Boulianne, Pierre Dumouchel: Maximum likelihood estimation of eigenvoices and residual variances for large vocabulary speech recognition tasks. 57-60
- Ernest Pusateri, Timothy J. Hazen: Rapid speaker adaptation using speaker clustering. 61-64
- Chao Huang, Tao Chen, Eric Chang: Adaptive model combination for dynamic speaker selection training. 65-68
- Ka-Yan Kwan, Tan Lee, Chen Yang: Unsupervised n-best based model adaptation using model-level confidence measures. 69-72
- Patrick Nguyen, Luca Rigazio, Christian Wellekens, Jean-Claude Junqua: LU factorization for feature transformation. 73-76
- Guo-Hong Ding, Yi-Fei Zhu, Chengrong Li, Bo Xu: Implementing vocal tract length normalization in the MLLR framework. 1389-1392
- Dong Kook Kim, Nam Soo Kim: Markov models based on speaker space model evolution. 1393-1396
- Baojie Li, Keikichi Hirose, Nobuaki Minematsu: Robust speech recognition using inter-speaker and intra-speaker adaptation. 1397-1400
- Carlos S. Lima, Luís B. Almeida, João L. Monteiro: Continuous environmental adaptation of a speech recogniser in telephone line conditions. 1401-1404
- Irina Illina: Tree-structured maximum a posteriori adaptation for a segment-based speech recognition system. 1405-1408
- Thomas Plötz, Gernot A. Fink: Robust time-synchronous environmental adaptation for continuous speech recognition systems. 1409-1412
- Thomas Niesler, Daniel Willett: Unsupervised language model adaptation for lecture speech transcription. 1413-1416
- Yongxin Li, Hakan Erdogan, Yuqing Gao, Etienne Marcheret: Incremental on-line feature space MLLR adaptation for telephony speech recognition. 1417-1420
- Sirko Molau, Florian Hilger, Daniel Keysers, Hermann Ney: Enhanced histogram normalization in the acoustic feature space. 1421-1424
- David N. Levin: Blind normalization of speech from different channels and speakers. 1425-1428
- Jun Ogata, Yasuo Ariki: Unsupervised acoustic model adaptation based on phoneme error minimization. 1429-1432
- Bowen Zhou, John H. L. Hansen: Improved structural maximum likelihood eigenspace mapping for rapid speaker adaptation. 1433-1436
- Ángel de la Torre, Dominique Fohr, Jean Paul Haton: Statistical adaptation of acoustic models to noise conditions for robust speech recognition. 1437-1440
- Fabio Brugnara, Mauro Cettolo, Marcello Federico, Diego Giuliani: Issues in automatic transcription of historical audio data. 1441-1444
Language Identification
- Verna Stockmal, Zinny S. Bond: Same talker, different language: a replication. 77-80
- A. K. V. Sai Jayram, V. Ramasubramanian, T. V. Sreenivas: Automatic language identification using acoustic sub-word units. 81-84
- Ian Maddieson, Ioana Vasilescu: Factors in human language identification. 85-88
- Pedro A. Torres-Carrasquillo, Elliot Singer, Mary A. Kohler, Richard J. Greene, Douglas A. Reynolds, John R. Deller Jr.: Approaches to language identification using Gaussian mixture models and shifted delta cepstral features. 89-92
- Eddie Wong, Sridha Sridharan: Methods to improve Gaussian mixture model based language identification system. 93-96
Speech Synthesis
- Hongyan Jing, Evelyne Tzoukermann: Part-of-speech tagging in French text-to-speech synthesis: experiments in tagset selection. 97-100
- Ulla Uebler: Grapheme-to-phoneme conversion using pseudo-morphological units. 101-104
- Maximilian Bisani, Hermann Ney: Investigations on joint-multigram models for grapheme-to-phoneme conversion. 105-108
- Lucian Galescu, James F. Allen: Pronunciation of proper names with a joint n-gram model for bi-directional grapheme-to-phoneme conversion. 109-112
- Matthias Jilka, Ann K. Syrdal: The AT&T German text-to-speech system: realistic linguistic description. 113-116
- Haiping Li, Fangxin Chen, Liqin Shen: Generating script using statistical information of the context variation unit vector. 117-120
- Chih-Chung Kuo, Jing-Yi Huang: Efficient and scalable methods for text script generation in corpus-based TTS design. 121-124
- Peter Rutten, Matthew P. Aylett, Justin Fackrell, Paul Taylor: A statistically motivated database pruning technique for unit selection synthesis. 125-128
- Yi-Jian Wu, Yu Hu, Xiaoru Wu, Ren-Hua Wang: A new method of building decision tree based on target information. 129-132
- Junichi Yamagishi, Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi: A context clustering technique for average voice model in HMM-based speech synthesis. 133-136
- Minoru Tsuzaki, Hisashi Kawai: Feature extraction for unit selection in concatenative speech synthesis: comparison between AIM, LPC, and MFCC. 137-140
- Francisco Campillo Díaz, Eduardo Rodríguez Banga: Combined prosody and candidate unit selections for corpus-based text-to-speech systems. 141-144
- Yeon-Jun Kim, Alistair Conkie: Automatic segmentation combining an HMM-based approach and spectral boundary correction. 145-148
- Abhinav Sethy, Shrikanth S. Narayanan: Refined speech segmentation for concatenative speech synthesis. 149-152
- Andrew P. Breen, Barry Eggleton, Peter Dion, Steve Minnis: Refocussing on the text normalisation process in text-to-speech systems. 153-156
- Jithendra Vepa, Jahnavi Ayachitam, K. V. K. Kalpana Reddy: A text-to-speech synthesis system for Telugu. 157-160
- Diamantino Freitas, Daniela Braga: Towards an intonation module for a Portuguese TTS system. 161-164
- Takashi Saito, Masaharu Sakamoto: Applying a hybrid intonation model to a seamless speech synthesizer. 165-168
- Toshio Hirai, Seiichi Tenpaku, Kiyohiro Shikano: Using start/end timings of spectral transitions between phonemes in concatenative speech synthesis. 2357-2360
- Jinfu Ni, Hisashi Kawai: Design of a Mandarin sentence set for corpus-based speech synthesis by use of a multi-tier algorithm taking account of the varied prosodic and spectral characteristics. 2361-2364
- Hiroki Mori, Takahiro Ohtsuka, Hideki Kasuya: A data-driven approach to source-formant type text-to-speech system. 2365-2368
- Yu Shi, Eric Chang, Hu Peng, Min Chu: Power spectral density based channel equalization of large speech database for concatenative TTS system. 2369-2372
- Helen M. Meng, Chi-Kin Keung, Kai-Chung Siu, Tien Ying Fung, P. C. Ching: CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects. 2373-2376
- Jinlin Lu, Hisashi Kawai: Perceptual evaluation of naturalness due to substitution of Chinese syllable for concatenative speech synthesis. 2377-2380
- Dan Chazan, Ron Hoory, Zvi Kons, Dorel Silberstein, Alexander Sorin: Reducing the footprint of the IBM trainable speech synthesis system. 2381-2384
- Sung-Joo Lee, Hyung Soon Kim: Computationally efficient time-scale modification of speech using 3 level clipping. 2385-2388
- Zhiwei Shuang, Yu Hu, Zhen-Hua Ling, Ren-Hua Wang: A miniature Chinese TTS system based on tailored corpus. 2389-2392
- Hoeun Song, Jaein Kim, Kyongrok Lee, Jinyoung Kim: Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system. 2393-2396
- Hideki Kawahara, Parham Zolfaghari, Alain de Cheveigné: On F0 trajectory optimization for very high-quality speech manipulation. 2397-2400
- Tan Lee, Greg Kochanski, Chilin Shih, Yujia Li: Modeling tones in continuous Cantonese speech. 2401-2404
- Minghui Dong, Kim-Teng Lua: Pitch contour model for Chinese text-to-speech using CART and statistical model. 2405-2408
- Phuay Hui Low, Saeed Vaseghi: Application of microprosody models in text to speech synthesis. 2413-2416
- Sheng Zhao, Jianhua Tao, Lianhong Cai: Prosodic phrasing with inductive learning. 2417-2420
- Ben Milner, Xu Shao: Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model. 2421-2424
- Hiromichi Kawanami, Tsuyoshi Masuda, Tomoki Toda, Kiyohiro Shikano: Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer. 2425-2428
Multimodal Spoken Language Processing
- Dirk Bühler, Wolfgang Minker, Jochen Häußler, Sven Krüger: Flexible multimodal human-machine interaction in mobile environments. 169-172
- Edward C. Kaiser, Philip R. Cohen: Implementation testing of a hybrid symbolic/statistical multimodal architecture. 173-176
- Yoko Yamakata, Tatsuya Kawahara, Hiroshi G. Okuno: Belief network based disambiguation of object reference in spoken dialogue system for robot. 177-180
- Jonas Beskow, Jens Edlund, Magnus Nordstrand: Specification and realisation of multimodal output in dialogue systems. 181-184
- Francis K. H. Quek, Yingen Xiong, David McNeill: Gestural trajectory symmetries and discourse segmentation. 185-188
- Francis K. H. Quek, David McNeill, Robert K. Bryll, Mary P. Harper: Gestural spatialization in natural discourse segmentation. 189-192
- Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano: Real-time sound source localization and separation for robot audition. 193-196
- Jiyong Ma, Jie Yan, Ronald A. Cole: CU animate tools for enabling conversations with animated characters. 197-200
- Philip R. Cohen, Rachel Coulston, Kelly Krout: Multiparty multimodal interaction: a preliminary analysis. 201-204
- Peter Poller, Jochen Müller: Distributed audio-visual speech synchronization. 205-208
- Philippe Daubias, Paul Deléglise: Lip-reading based on a fully automatic statistical model. 209-212
- Xiaoxing Liu, Yibao Zhao, Xiaobo Pi, Luhong Liang, Ara V. Nefian: Audio-visual continuous speech recognition using a coupled hidden Markov model. 213-216