ICASSP 1996: Atlanta, Georgia, USA
- 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, ICASSP '96, Atlanta, Georgia, USA, May 7-10, 1996. IEEE Computer Society 1996, ISBN 0-7803-3192-3
Volume 1
Robust Recognition: Signals and Features
- San Zhu, Dao Wen Chen, Taiyi Huang:
Feature parameter curve method for high performance NN-based speech recognition. 1-4
- Jialong He, Li Liu, Günther Palm:
On the use of residual cepstrum in speech recognition. 5-8
- J. A. Thripuraneni, Wei Lou, Victor E. DeBrunner:
Mixed Malvar-wavelets for non-stationary signal representation. 13-16
- Daniel J. Mashao:
Experiments on a parametric nonlinear spectral warping for an HMM-based speech recognizer. 17-20
- John C. Pearson, Qiguang Lin, ChiWei Che, Dong-Suk Yuk, Limin Jin, Bert de Vries, James L. Flanagan:
Robust distant-talking speech recognition. 21-24
- Adam B. Fineberg, Kevin C. Yu:
Time-frequency representation based cepstral processing for speech recognition. 25-28
- Nabil N. Bitar, Carol Y. Espy-Wilson:
Knowledge-based parameters for HMM speech recognition. 29-32
- Ted H. Applebaum, Philippe Morin, Brian A. Hanson:
A phoneme-similarity based ASR front-end. 33-36
- Brian Strope, Abeer Alwan:
A model of dynamic auditory perception and its application to robust speech recognition. 37-40
Robust Recognition: Noise and Environment
- Hiroki Yamamoto, Masayuki Yamada, Tetsuo Kosaka, Yasuhiro Komori, Yasunori Ohora:
Independent calculation of power parameters on PMC method. 41-44
- Jen-Tzung Chien, Lee-Min Lee, Hsiao-Chuan Wang:
Noisy speech recognition using variance adapted likelihood measure. 45-48
- Ruikang Yang, Petri Haavisto:
An improved noise compensation algorithm for speech recognition in noise. 49-52
- Brian D. Womack, John H. L. Hansen:
Improved speech recognition via speaker stress directed classification. 53-56
- Sunil K. Gupta, Frank K. Soong, Raziel Haimi-Cohen:
High-accuracy connected digit recognition for mobile applications. 57-60
- Doh-Suk Kim, Jae-Hoon Jeong, Jae-Weon Kim, Soo-Young Lee:
Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments. 61-64
- Philip C. Woodland, Mark John Francis Gales, David Pye:
Improving environmental robustness in large vocabulary speech recognition. 65-68
- Satoshi Nakamura, Tetsuya Takiguchi, Kiyohiro Shikano:
Noise and room acoustics distorted speech recognition by HMM composition. 69-72
- Jean-Luc Gauvain, Lori Lamel, Gilles Adda, Driss Matrouf:
Developments in continuous speech dictation using the 1995 ARPA NAB news task. 73-76
- Sandra Dufour, Catherine Glorion, Philip Lockwood:
Evaluation of root-normalised front-end (RN LFCC) for speech recognition in wireless GSM network environments. 77-80
Speaker Recognition I
- Aaron E. Rosenberg, Sarangarajan Parthasarathy:
Speaker background models for connected digit password speaker verification. 81-84
- John M. Colombi, Dennis W. Ruck, Timothy R. Anderson, Steven K. Rogers, Mark E. Oxley:
Cohort selection and word grammar effects for speaker recognition. 85-88
- Cesar Martín del Alamo, Francisco Javier Caminero Gil, Celinda de la Torre-Munilla, Luis A. Hernández Gómez:
Discriminative training of GMM for speaker identification. 89-92
- Manish Sharma, Richard J. Mammone:
Subword-based text-dependent speaker verification system with user-selectable passwords. 93-96
- Tomoko Matsui, Takashi Nishitani, Sadaoki Furui:
Robust methods of updating model and a priori threshold in speaker verification. 97-100
- Ivan Magrin-Chagnolleau, Joachim Wilke, Frédéric Bimbot:
A further investigation on AR-vector models for text-independent speaker identification. 101-104
- Michael Schmidt, Herbert Gish:
Speaker identification via support vector classifiers. 105-108
- Anand R. Setlur, Rafid A. Sukkar, Malan B. Gandhi:
Speaker verification using mixture likelihood profiles extracted from speaker independent hidden Markov models. 109-112
- Douglas A. Reynolds:
The effects of handset variability on speaker recognition performance: experiments on the Switchboard corpus. 113-116
- Pierre J. Castellano, Sridha Sridharan, David R. Cole:
Speaker recognition in reverberant enclosures. 117-120
Speech Recognition: Large Vocabulary
- Zhishun Li, Douglas D. O'Shaughnessy:
Using a transcription graph for large vocabulary continuous speech recognition. 121-124
- Jia-Lin Shen, Lin-Shan Lee:
Fast and accurate recognition of very-large-vocabulary continuous Mandarin speech for Chinese language with improved segmental probability modeling. 125-128
- Ilija Zeljkovic:
Decoding optimal state sequence with smooth state likelihoods. 129-132
- Fil Alleva, Xuedong Huang, Mei-Yuh Hwang:
Improvements on the pronunciation prefix tree search organization. 133-136
- Monika Woszczyna, Michael Finke:
Minimizing search errors due to delayed bigrams in real-time speech recognition systems. 137-140
- Gary D. Cook, James Christie, Philip Clarkson, Michael M. Hochberg, Beth T. Logan, Anthony J. Robinson, Carl W. Seymour:
Real-time recognition of broadcast radio speech. 141-144
- Tohru Shimizu, Hirofumi Yamamoto, Hirokazu Masataki, Shoichi Matsunaga, Yoshinori Sagisaka:
Spontaneous dialogue speech recognition using cross-word context constrained word graphs. 145-148
- Steve Renals, Mike Hochberg:
Efficient evaluation of the LVCSR search space using the NOWAY decoder. 149-152
- Martine Adda-Decker, Gilles Adda, Lori Lamel, Jean-Luc Gauvain:
Developments in large vocabulary, continuous speech recognition of German. 153-156
- Fu-Hua Liu, Michael Picheny, Patibandla Srinivasa, Michael D. Monkowski, C. Julian Chen:
Speech recognition on Mandarin Call Home: a large-vocabulary, conversational, and telephone speech corpus. 157-160
Speech Recognition: Language Modeling II
- Michèle Jardino:
Multilingual stochastic n-gram class language models. 161-163
- Thomas Niesler, Philip C. Woodland:
A variable-length category-based n-gram language model. 164-167
- Peter O'Boyle, Ji Ming, John G. McMahon, Francis Jack Smith:
Improving n-gram models by incorporating enhanced distributions. 168-171
- Jerome R. Bellegarda, John W. Butzberger, Yen-Lu Chow, Noah B. Coccaro, Devang Naik:
A novel word clustering algorithm based on latent semantic analysis. 172-175
- Mark Epstein, Kishore Papineni, Salim Roukos, Todd Ward, Stephen Della Pietra:
Statistical natural language understanding using hidden clumpings. 176-179
- Azarshid Farhat, Jean-Francois Isabelle, Douglas D. O'Shaughnessy:
Clustering words for statistical language models based on contextual word similarity. 180-183
- Pascale Fung:
Domain word translation by space-frequency analysis of context length histograms. 184-187
- Hirokazu Masataki, Yoshinori Sagisaka:
Variable-order N-gram generation by word-class splitting and consecutive word grouping. 188-191
- Takeshi Kawabata, Masafumi Tamoto:
Back-off method for n-gram smoothing based on binomial posteriori distribution. 192-195
- Hubert Hin-Cheung Law, Chorkin Chan:
Ergodic multigram HMM integrating word segmentation and class tagging for Chinese language modeling. 196-199
Low-Rate Speech Coding
- Alan McCree, Kwan K. Truong, E. Bryan George, Thomas P. Barnwell III, Vishu R. Viswanathan:
A 2.4 kbit/s MELP coder candidate for the new U.S. Federal Standard. 200-203
- Claude Laflamme, Redwan Salami, Ridha Matmti, Jean-Pierre Adoul:
Harmonic-stochastic excitation (HSX) speech coding below 4 kbit/s. 204-207
- Tian Wang, Kun Tang, Chongxi Feng:
A high quality MBE-LPC-FE speech coder at 2.4 kbps and 1.2 kbps. 208-211
- W. Bastiaan Kleijn, Yair Shoham, Deep Sen, Roar Hagen:
A low-complexity waveform interpolation coder. 212-215
- Juan Carlos De Martin, Allen Gersho:
Mixed-domain coding of speech at 3 kb/s. 216-219
- Costas Xydeas, Binshi Cao:
Source driven variable bit rate prototype interpolation coding. 220-223
- Shahrokh Ghaemmaghami, Mohamed A. Deriche:
A new approach to very low-rate speech coding using temporal decomposition. 224-227
- Xiaoshu Qian, Ramdas Kumaresan:
A variable frame pitch estimator and test results. 228-231
- Nobuyuki Kunieda, Tetsuya Shimamura, Jouji Suzuki:
Robust method of measurement of fundamental frequency by ACLOS: autocorrelation of log spectrum. 232-235
- Stan A. McClellan, Jerry D. Gibson:
Lag-indexed VQ for pitch filter coding. 236-239
Wideband Coding and Emerging Techniques
- Minjie Xie, Jean-Pierre Adoul:
Embedded algebraic vector quantizers (EAVQ) with application to wideband speech coding. 240-243
- Ladan Baghai-Ravary, Steve W. Beet, M. Osman Tokhi:
The two-dimensional discrete cosine transform applied to speech data. 244-247
- Kiyoshi Matsumoto:
Real-time high accurate cell loss recovery technique for speech over ATM networks. 248-250
- Zhicheng Wang:
Predictive fractal interpolation mapping: differential speech coding at low bit rates. 251-254
- Jürgen W. Paulus, Jürgen Schnitzler:
16 kbit/s wideband speech coding based on unequal subbands. 255-258
- Thomas Kleinmann, Arild Lacroix:
Low delay IIR QMF banks with high perceptive quality for speech processing. 259-262
- Shan Lu, Peter C. Doerschuk:
Demodulators for AM-FM models of speech signals: a comparison. 263-266
- Gernot Kubin:
Synthesis and coding of continuous speech with the nonlinear oscillator model. 267-270
- E. Bryan George, Alan McCree, Vishu R. Viswanathan:
Variable frame rate parameter encoding via adaptive frame selection using dynamic programming. 271-274
- Juin-Hwey Chen, Dongmei Wang:
Transform predictive coding of wideband speech signals. 275-278
Topic Identification and Spoken Information Retrieval
- David A. James:
A system for unrestricted topic retrieval from radio news broadcasts. 279-282
- Neeraj Deshmukh, Mary Weber, Joe Picone:
Automated generation of N-best pronunciations of proper nouns. 283-286
- Sung-Chien Lin, Lee-Feng Chien, Keh-Jiann Chen, Lin-Shan Lee:
An efficient voice retrieval system for very-large-vocabulary Chinese textual databases with a clustered language model. 287-290
- Tatsuya Kawahara, Norihide Kitaoka, Shuji Doshita:
Concept-based phrase spotting approach for spontaneous speech understanding. 291-298
- Patrick Schone, Douglas J. Nelson:
A Dictionary Based Method for Determining Topics in Text and Transcribed Speech. 295-
- Philippe Gelin, Christian Wellekens:
Keyword spotting for video soundtrack indexing. 299-302
- Barbara Peskin, Sean Connolly, Lawrence Gillick, Steve Lowe, Don McAllaster, Venkatesh Nagesha:
Improvements in switchboard recognition and topic identification. 303-306
- Jerry H. Wright, Michael J. Carey, Eluned S. Parris:
Statistical models for topic identification using phoneme substrings. 307-310
- Gareth J. F. Jones, J. T. Foote, Karen Spärck Jones, Steve J. Young:
Robust talker-independent audio document retrieval. 311-314
- Beth A. Carlson:
Unsupervised topic clustering of switchboard speech messages. 315-318
Robust Recognition: Compensation and Normalization
- Yasuo Ariki, Shigeaki Tagashira, Masayuki Nishijima:
Speaker recognition and speaker normalization by projection to speaker subspace. 319-322
- Rivarol Vergin, Douglas D. O'Shaughnessy, Vishwa Gupta:
Compensated mel frequency cepstrum coefficients. 323-326
- Yasuhiro Minami, Sadaoki Furui:
Adaptation method based on HMM composition and EM algorithm. 327-330
- Tom Claes, Dirk Van Compernolle:
SNR-normalisation for robust speech recognition. 331-334
- Nikki Mirghafori, Eric Fosler-Lussier, Nelson Morgan:
Towards robustness to fast speech in ASR. 335-338
- Steven Wegmann, Don McAllaster, Jeremy Orloff, Barbara Peskin:
Speaker normalization on conversational telephone speech. 339-341
- Alex Acero, Xuedong Huang:
Speaker and gender normalization for continuous-density hidden Markov models. 342-345
- Ellen Eide, Herbert Gish:
A parametric approach to vocal tract length normalization. 346-348
- Jay G. Wilpon, Claus Jacobsen:
A study of speech recognition for children and the elderly. 349-352
- Li Lee, Richard C. Rose:
Speaker normalization using efficient frequency warping procedures. 353-356
Speech Synthesis
- Richard A. Sharman, Jerry H. Wright:
A fast stochastic parser for determining phrase boundaries for text-to-speech synthesis. 357-360
- Michael W. Macon, Mark A. Clements:
Speech concatenation and synthesis using an overlap-add sinusoidal model. 361-364
- Werner Verhelst, Johan Mertens:
Voice conversion using partitions of spectral feature space. 365-368
- Zhenli Yu, P. C. Ching:
Determination of vocal-tract shapes from formant frequencies based on perturbation theory and interpolation method. 369-372
- Andrew J. Hunt, Alan W. Black:
Unit selection in a concatenative speech synthesis system using a large speech database. 373-376
- Shrikanth S. Narayanan, Abeer Alwan:
Parametric hybrid source models for voiced and voiceless fricative consonants. 377-380
- Takashi Saito, Yasuhide Hashimoto, Masaharu Sakamoto:
High-quality speech synthesis using context-dependent syllabic units. 381-384
- Colin C. Goodyear, Dongbing Wei:
Articulatory copy synthesis using a nine-parameter vocal tract model. 385-388
- Takashi Masuko, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai:
Speech synthesis using HMMs with dynamic features. 389-392
- King-fai Lam, Cheung-Fat Chan:
Interpolating V/UV mixture functions of a harmonic model for concatenative speech synthesis. 393-396
Speech Recognition: Language Modeling I
- Holger Stahl, Johannes Müller, Manfred K. Lang:
An efficient top-down parsing algorithm for understanding speech by using stochastic syntactic and semantic models. 397-400
- Francisco Javier Caminero Gil, Jorge Alvarez-Cercadillo, Carlos Crespo-Casas, Daniel Tapias Merino:
Data-driven discourse modeling for semantic interpretation. 401-404
- Andreas Stolcke, Elizabeth Shriberg:
Statistical language modeling for speech disfluencies. 405-408
- Alex Waibel, Michael Finke, Donna Gates, Marsal Gavaldà, Thomas Kemp, Alon Lavie, Lori S. Levin, Martin Maier, Laura Mayfield, Arthur E. McNair, Ivica Rogina, Kaori Shima, Tilo Sloboda, Monika Woszczyna, Torsten Zeppenfeld, Puming Zhan:
JANUS-II-translation of spontaneous conversational speech. 409-412
- Tatsuo Matsuoka, Robert Hasson, Michael Barlow, Sadaoki Furui:
Language model acquisition from a text corpus for speech understanding. 413-415
- Wayne H. Ward, Sunil Issar:
A class based language model for speech recognition. 416-418
- Elmar Nöth, Renato De Mori, Julia Fischer, Arnd Gebhard, Stefan Harbeck, Ralf Kompe, Roland Kuhn, Heinrich Niemann, Marion Mast:
An integrated model of acoustics and language using semantic classification trees. 419-422
- Wieland Eckert, Florian Gallwitz, Heinrich Niemann:
Combining stochastic and linguistic language models for recognition of spontaneous speech. 423-426
- Eric K. Ringger, James F. Allen:
Error correction via a post-processor for continuous speech recognition. 427-430
- Akito Nagai, Yasushi Ishikawa, Kunio Nakajima:
Integration of concept-driven semantic interpretation with speech recognition. 431-434
Speech Recognition: Acoustic Modeling
- Jean-François Mari, Dominique Fohr, Jean-Claude Junqua:
A second-order HMM for high performance word and phoneme-based continuous speech recognition. 435-438
- Seiichi Nakagawa, Kazumasa Yamamoto:
Evaluation of segmental unit input HMM. 439-442
- Michiel Bacchiani, Mari Ostendorf, Yoshinori Sagisaka, Kuldip K. Paliwal:
Design of a speech recognition system based on acoustically derived segmental units. 443-446
- Wendy J. Holmes, Martin J. Russell:
Modeling speech variability with segmental HMMs. 447-450
- Luis Villarrubia, Luis A. Hernández Gómez, Jose Maria Elvira, Juan Carlos Torrecilla:
Context-dependent units for vocabulary-independent Spanish speech recognition. 451-454
- Bin Ma, Taiyi Huang, Bo Xu, Xijun Zhang, Fei Qu:
Context-dependent acoustic models for Chinese speech recognition. 455-458
- Claus Jacobsen, Jay G. Wilpon:
Automatic recognition of Danish natural numbers for telephone applications. 459-462