


INTERSPEECH 2002: Denver, Colorado, USA
- John H. L. Hansen, Bryan L. Pellom (eds.): 7th International Conference on Spoken Language Processing, ICSLP2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002. ISCA 2002
Keynotes
- W. Tecumseh Fitch: The evolution of spoken language: a comparative approach.
- Steve J. Young: Talking to machines (statistically speaking).
Speech Recognition in Noise - I
- Duncan Macho, Laurent Mauuary, Bernhard Noé, Yan Ming Cheng, Douglas Ealey, Denis Jouvet, Holly Kelleher, David Pearce, Fabien Saadoun: Evaluation of a noise-robust DSR front-end on Aurora databases.
- André Gustavo Adami, Lukás Burget, Stéphane Dupont, Harinath Garudadri, Frantisek Grézl, Hynek Hermansky, Pratibha Jain, Sachin S. Kajarekar, Nelson Morgan, Sunil Sivadas: Qualcomm-ICSI-OGI features for ASR.
- Michael Kleinschmidt, David Gelbart: Improving word accuracy with Gabor feature extraction.
- Jasha Droppo, Li Deng, Alex Acero: Evaluation of SPLICE on the Aurora 2 and 3 tasks.
- Brian Kan-Wing Mak, Yik-Cheung Tam: Performance of discriminatively trained auditory features on Aurora2 and Aurora3.
- José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio: Feature extraction combining spectral noise reduction and cepstral histogram equalization for robust ASR.
- Jingdong Chen, Dimitris Dimitriadis, Hui Jiang, Qi Li, Tor André Myrvoll, Olivier Siohan, Frank K. Soong: Bell Labs approach to Aurora evaluation on connected digit recognition.
- Hong Kook Kim, Richard C. Rose: Algorithms for distributed speech recognition in a noisy automobile environment.
- Florian Hilger, Sirko Molau, Hermann Ney: Quantile based histogram equalization for online applications.
- Chia-Ping Chen, Karim Filali, Jeff A. Bilmes: Frontend post-processing and backend model enhancement on the Aurora 2.0/3.0 databases.
- Masaki Ida, Satoshi Nakamura: HMM composition-based rapid model adaptation using a priori noise GMM adaptation evaluation on Aurora2 corpus.
- Jeih-Weih Hung, Lin-Shan Lee: Data-driven temporal filters obtained via different optimization criteria evaluated on Aurora2 database.
- Bojan Kotnik, Damjan Vlaj, Zdravko Kacic, Bogomir Horvat: Efficient additive and convolutional noise reduction procedures.
- Markus Lieb, Alexander Fischer: Progress with the Philips continuous ASR system on the Aurora 2 noisy digits database.
- Jian Wu, Qiang Huo: An environment compensated minimum classification error training approach and its evaluation on Aurora2 database.
- Kaisheng Yao, Donglai Zhu, Satoshi Nakamura: Evaluation of a noise adaptive speech recognition system on the Aurora 3 database.
- Laura Docío Fernández, Carmen García-Mateo: Distributed speech recognition over IP networks on the Aurora 3 database.
- Masakiyo Fujimoto, Yasuo Ariki: Evaluation of noisy speech recognition based on noise reduction and acoustic model adaptation on the Aurora2 tasks.
- George Saon, Juan M. Huerta: Improvements to the IBM Aurora 2 multi-condition system.
- Pratibha Jain, Hynek Hermansky, Brian Kingsbury: Distributed speech recognition using noise-robust MFCC and TRAPS-estimated manner features.
- Norihide Kitaoka, Seiichi Nakagawa: Evaluation of spectral subtraction with smoothing of time direction on the Aurora 2 task.
- Xiaodong Cui, Markus Iseli, Qifeng Zhu, Abeer Alwan: Evaluation of noise robust features on the Aurora databases.
- Nicholas W. D. Evans, John S. D. Mason: Computationally efficient noise compensation for robust automatic speech recognition assessed under the Aurora 2/3 framework.
- Omar Farooq, Sekharjit Datta: Mel-scaled wavelet filter based features for noisy unvoiced phoneme recognition.
- Kazuo Onoe, Hiroyuki Segi, Takeshi Kobayakawa, Shoei Sato, Toru Imai, Akio Ando: Filter bank subtraction for robust speech recognition.
- Andrew C. Morris, Simon Payne, Hervé Bourlard: Low cost duration modelling for noise robust speech recognition.
- Yifan Gong: A comparative study of approximations for parallel model combination of static and dynamic parameters.
- Petr Motlícek, Lukás Burget: Noise estimation for efficient speech enhancement and robust speech recognition.
- Özgür Çetin, Harriet J. Nock, Katrin Kirchhoff, Jeff A. Bilmes, Mari Ostendorf: The 2001 GMTK-based SPINE ASR system.
- Wei-Wen Hung: Using adaptive signal limiter together with weighting techniques for noisy speech recognition.
- Shingo Yamade, Kanako Matsunami, Akira Baba, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano: Spectral subtraction in noisy environments applied to speaker adaptation based on HMM sufficient statistics.
- Man-Hung Siu, Yu-Chung Chan: Robust speech recognition against short-time noise.
- Mario Toma, Andrea Lodi, Roberto Guerrieri: Word endpoints detection in the presence of non-stationary noise.
- Pere Pujol Marsal, Susagna Pol, Astrid Hagen, Hervé Bourlard, Climent Nadeu: Comparison and combination of RASTA-PLP and FF features in a hybrid HMM/MLP speech recognition system.
- Tao Xu, Zhigang Cao: Robust MMSE-FW-LAASR scheme at low SNRs.
- András Zolnay, Ralf Schlüter, Hermann Ney: Robust speech recognition using a voiced-unvoiced feature.
- Febe de Wet, Johan de Veth, Bert Cranen, Lou Boves: Accumulated Kullback divergence for analysis of ASR performance in the presence of noise.
- Brian Kingsbury, Pratibha Jain, André Gustavo Adami: A hybrid HMM/TRAPS model for robust voice activity detection.
- Chengyi Zheng, Yonghong Yan: Run time information fusion in speech recognition.
- Jon A. Arrowood, Mark A. Clements: Using observation uncertainty in HMM decoding.
- Matthew N. Stuttle, M. J. F. Gales: Combining a Gaussian mixture model front end with MFCC parameters.
- Jasha Droppo, Alex Acero, Li Deng: A nonlinear observation model for removing noise from corrupted speech log mel-spectral energies.
- Carlos S. Lima, Luís B. Almeida, João L. Monteiro: Improving the role of unvoiced speech segments by spectral normalisation in robust speech recognition.
- Venkata Ramana Rao Gadde, Andreas Stolcke, Dimitra Vergyri, Jing Zheng, M. Kemal Sönmez, Anand Venkataraman: Building an ASR system for noisy environments: SRI's 2001 SPINE evaluation system.
Experimental Phonetics
- R. J. J. H. van Son, Louis C. W. Pols: Evidence for efficiency in vowel production.
- Matthew P. Aylett: Stochastic suprasegmentals: relationship between the spectral characteristics of vowels, redundancy and prosodic structure.
- Jihène Serkhane, Jean-Luc Schwartz, Louis-Jean Boë, Barbara L. Davis, Christine L. Matyear: Motor specifications of a baby robot via the analysis of infants' vocalizations.
- Laura L. Koenig, Jorge C. Lucero: Oral-laryngeal control patterns for fricatives in 5-year-olds and adults.
- Véronique Delvaux, Thierry Metens, Alain Soquet: French nasal vowels: acoustic and articulatory properties.
Speech Recognition: Adaptation
- Patrick Kenny, Gilles Boulianne, Pierre Dumouchel: Maximum likelihood estimation of eigenvoices and residual variances for large vocabulary speech recognition tasks.
- Ernest Pusateri, Timothy J. Hazen: Rapid speaker adaptation using speaker clustering.
- Chao Huang, Tao Chen, Eric Chang: Adaptive model combination for dynamic speaker selection training.
- Ka-Yan Kwan, Tan Lee, Chen Yang: Unsupervised n-best based model adaptation using model-level confidence measures.
- Patrick Nguyen, Luca Rigazio, Christian Wellekens, Jean-Claude Junqua: LU factorization for feature transformation.
- Guo-Hong Ding, Yi-Fei Zhu, Chengrong Li, Bo Xu: Implementing vocal tract length normalization in the MLLR framework.
- Dong Kook Kim, Nam Soo Kim: Markov models based on speaker space model evolution.
- Baojie Li, Keikichi Hirose, Nobuaki Minematsu: Robust speech recognition using inter-speaker and intra-speaker adaptation.
- Carlos S. Lima, Luís B. Almeida, João L. Monteiro: Continuous environmental adaptation of a speech recogniser in telephone line conditions.
- Irina Illina: Tree-structured maximum a posteriori adaptation for a segment-based speech recognition system.
- Thomas Plötz, Gernot A. Fink: Robust time-synchronous environmental adaptation for continuous speech recognition systems.
- Thomas Niesler, Daniel Willett: Unsupervised language model adaptation for lecture speech transcription.
- Yongxin Li, Hakan Erdogan, Yuqing Gao, Etienne Marcheret: Incremental on-line feature space MLLR adaptation for telephony speech recognition.
- Sirko Molau, Florian Hilger, Daniel Keysers, Hermann Ney: Enhanced histogram normalization in the acoustic feature space.
- David N. Levin: Blind normalization of speech from different channels and speakers.
- Jun Ogata, Yasuo Ariki: Unsupervised acoustic model adaptation based on phoneme error minimization.
- Bowen Zhou, John H. L. Hansen: Improved structural maximum likelihood eigenspace mapping for rapid speaker adaptation.
- Ángel de la Torre, Dominique Fohr, Jean Paul Haton: Statistical adaptation of acoustic models to noise conditions for robust speech recognition.
- Fabio Brugnara, Mauro Cettolo, Marcello Federico, Diego Giuliani: Issues in automatic transcription of historical audio data.
Language Identification
- Verna Stockmal, Zinny S. Bond: Same talker, different language: a replication.
- A. K. V. Sai Jayram, V. Ramasubramanian, T. V. Sreenivas: Automatic language identification using acoustic sub-word units.
- Ian Maddieson, Ioana Vasilescu: Factors in human language identification.
- Pedro A. Torres-Carrasquillo, Elliot Singer, Mary A. Kohler, Richard J. Greene, Douglas A. Reynolds, John R. Deller Jr.: Approaches to language identification using Gaussian mixture models and shifted delta cepstral features.
- Eddie Wong, Sridha Sridharan: Methods to improve Gaussian mixture model based language identification system.
Speech Synthesis
- Hongyan Jing, Evelyne Tzoukermann: Part-of-speech tagging in French text-to-speech synthesis: experiments in tagset selection.
- Ulla Uebler: Grapheme-to-phoneme conversion using pseudo-morphological units.
- Maximilian Bisani, Hermann Ney: Investigations on joint-multigram models for grapheme-to-phoneme conversion.
- Lucian Galescu, James F. Allen: Pronunciation of proper names with a joint n-gram model for bi-directional grapheme-to-phoneme conversion.
- Matthias Jilka, Ann K. Syrdal: The AT&T German text-to-speech system: realistic linguistic description.
- Haiping Li, Fangxin Chen, Liqin Shen: Generating script using statistical information of the context variation unit vector.
- Chih-Chung Kuo, Jing-Yi Huang: Efficient and scalable methods for text script generation in corpus-based TTS design.
- Peter Rutten, Matthew P. Aylett, Justin Fackrell, Paul Taylor: A statistically motivated database pruning technique for unit selection synthesis.
- Yi-Jian Wu, Yu Hu, Xiaoru Wu, Ren-Hua Wang: A new method of building decision tree based on target information.
- Junichi Yamagishi, Masatsune Tamura, Takashi Masuko, Keiichi Tokuda, Takao Kobayashi: A context clustering technique for average voice model in HMM-based speech synthesis.
- Minoru Tsuzaki, Hisashi Kawai: Feature extraction for unit selection in concatenative speech synthesis: comparison between AIM, LPC, and MFCC.
- Francisco Campillo Díaz, Eduardo Rodríguez Banga: Combined prosody and candidate unit selections for corpus-based text-to-speech systems.
- Yeon-Jun Kim, Alistair Conkie: Automatic segmentation combining an HMM-based approach and spectral boundary correction.
- Abhinav Sethy, Shrikanth S. Narayanan: Refined speech segmentation for concatenative speech synthesis.
- Andrew P. Breen, Barry Eggleton, Peter Dion, Steve Minnis: Refocussing on the text normalisation process in text-to-speech systems.
- Jithendra Vepa, Jahnavi Ayachitam, K. V. K. Kalpana Reddy: A text-to-speech synthesis system for Telugu.
- Diamantino Freitas, Daniela Braga: Towards an intonation module for a Portuguese TTS system.
- Takashi Saito, Masaharu Sakamoto: Applying a hybrid intonation model to a seamless speech synthesizer.
- Toshio Hirai, Seiichi Tenpaku, Kiyohiro Shikano: Using start/end timings of spectral transitions between phonemes in concatenative speech synthesis.
- Jinfu Ni, Hisashi Kawai: Design of a Mandarin sentence set for corpus-based speech synthesis by use of a multi-tier algorithm taking account of the varied prosodic and spectral characteristics.
- Hiroki Mori, Takahiro Ohtsuka, Hideki Kasuya: A data-driven approach to source-formant type text-to-speech system.
- Yu Shi, Eric Chang, Hu Peng, Min Chu: Power spectral density based channel equalization of large speech database for concatenative TTS system.
- Helen M. Meng, Chi-Kin Keung, Kai-Chung Siu, Tien Ying Fung, P. C. Ching: CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects.
- Jinlin Lu, Hisashi Kawai: Perceptual evaluation of naturalness due to substitution of Chinese syllable for concatenative speech synthesis.
- Dan Chazan, Ron Hoory, Zvi Kons, Dorel Silberstein, Alexander Sorin: Reducing the footprint of the IBM trainable speech synthesis system.
- Sung-Joo Lee, Hyung Soon Kim: Computationally efficient time-scale modification of speech using 3 level clipping.
- Zhiwei Shuang, Yu Hu, Zhen-Hua Ling, Ren-Hua Wang: A miniature Chinese TTS system based on tailored corpus.
- Hoeun Song, Jaein Kim, Kyongrok Lee, Jinyoung Kim: Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system.
- Hideki Kawahara, Parham Zolfaghari, Alain de Cheveigné: On F0 trajectory optimization for very high-quality speech manipulation.
- Tan Lee, Greg Kochanski, Chilin Shih, Yujia Li: Modeling tones in continuous Cantonese speech.
- Minghui Dong, Kim-Teng Lua: Pitch contour model for Chinese text-to-speech using CART and statistical model.
- Phuay Hui Low, Saeed Vaseghi: Application of microprosody models in text to speech synthesis.
- Sheng Zhao, Jianhua Tao, Lianhong Cai: Prosodic phrasing with inductive learning.
- Ben Milner, Xu Shao: Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model.
- Hiromichi Kawanami, Tsuyoshi Masuda, Tomoki Toda, Kiyohiro Shikano: Designing Japanese speech database covering wide range in prosody for hybrid speech synthesizer.
Multimodal Spoken Language Processing
- Dirk Bühler, Wolfgang Minker, Jochen Häußler, Sven Krüger: Flexible multimodal human-machine interaction in mobile environments.
- Edward C. Kaiser, Philip R. Cohen: Implementation testing of a hybrid symbolic/statistical multimodal architecture.
- Yoko Yamakata, Tatsuya Kawahara, Hiroshi G. Okuno: Belief network based disambiguation of object reference in spoken dialogue system for robot.
- Jonas Beskow, Jens Edlund, Magnus Nordstrand: Specification and realisation of multimodal output in dialogue systems.
- Francis K. H. Quek, Yingen Xiong, David McNeill: Gestural trajectory symmetries and discourse segmentation.
- Francis K. H. Quek, David McNeill, Robert K. Bryll, Mary P. Harper: Gestural spatialization in natural discourse segmentation.
- Kazuhiro Nakadai, Hiroshi G. Okuno, Hiroaki Kitano: Real-time sound source localization and separation for robot audition.
- Jiyong Ma, Jie Yan, Ronald A. Cole: CU animate tools for enabling conversations with animated characters.
- Philip R. Cohen, Rachel Coulston, Kelly Krout: Multiparty multimodal interaction: a preliminary analysis.
- Peter Poller, Jochen Müller: Distributed audio-visual speech synchronization.
- Philippe Daubias, Paul Deléglise: Lip-reading based on a fully automatic statistical model.
- Xiaoxing Liu, Yibao Zhao, Xiaobo Pi, Luhong Liang, Ara V. Nefian: Audio-visual continuous speech recognition using a coupled hidden Markov model.
- Laila Dybkjær, Niels Ole Bernsen: Data, annotation schemes and coding tools for natural interactivity.
- Francis K. H. Quek, Yang Shi, Cemil Kirbas, Shunguang Wu: VisSTA: a tool for analyzing multimodal discourse data.
Perception: Non-Native
- Stephen G. Lambacher, William L. Martens, Kazuhiko Kakehi: The influence of identification training on identification and production of the American English mid and low vowels by native speakers of Japanese.
- Keiichi Tajima, Reiko Akahane-Yamada, Tsuneo Yamada: Perceptual learning of second-language syllable rhythm by elderly listeners.
- Constance M. Clarke: Perceptual adjustment to foreign-accented English with short term exposure.
- Denis K. Burnham, Ron Brooker: Absolute pitch and lexical tones: tone perception by non-musician, musician, and absolute pitch non-tonal language speakers.