


INTERSPEECH 2010: Makuhari, Japan
- Takao Kobayashi, Keikichi Hirose, Satoshi Nakamura: 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Makuhari, Chiba, Japan, September 26-30, 2010. ISCA 2010
Keynotes
- Steve J. Young: Still talking to machines (cognitively speaking). 1-10
- Tohru Ifukube: Sound-based assistive technology supporting "seeing", "hearing" and "speaking" for the disabled and the elderly. 11-19
- Chiu-yu Tseng: Beyond sentence prosody. 20-29
Special Session: Models of Speech - In Search of Better Representations
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson, Mark Hasegawa-Johnson: A procedure for estimating gestural scores from natural speech. 30-33
- Yen-Liang Shue, Gang Chen, Abeer Alwan: On the interdependencies between voice quality, glottal gaps, and voice-source related acoustic measures. 34-37
- Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino: Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems. 38-41
- Sadao Hiroya, Takemi Mochida: Phase equalization-based autoregressive model of speech signals. 42-45
- Yi Xu, Santitham Prom-on: Articulatory-functional modeling of speech prosody: a review. 46-49
- Humberto M. Torres, Hansjörg Mixdorff, Jorge A. Gurlekian, Hartmut R. Pfitzinger: Two new estimation methods for a superpositional intonation model. 50-53
ASR: Acoustic Models I-III
- Simon Wiesler, Georg Heigold, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney: A discriminative splitting criterion for phonetic decision trees. 54-57
- Mark J. F. Gales, Kai Yu: Canonical state models for automatic speech recognition. 58-61
- Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder A. Olsen: Restructuring exponential family mixture models. 62-65
- Françoise Beaufays, Vincent Vanhoucke, Brian Strope: Unsupervised discovery and training of maximally dissimilar cluster models. 66-69
- Khe Chai Sim: Probabilistic state clustering using conditional random field for context-dependent acoustic modelling. 70-73
- Xie Sun, Yunxin Zhao: Integrate template matching and statistical modeling for speech recognition. 74-77
- George Saon, Hagen Soltau: Boosting systems for LVCSR. 1341-1344
- Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder A. Olsen, David Nahamoo, Dimitri Kanevsky: Incorporating sparse representation phone identification features in automatic speech recognition using exponential families. 1345-1348
- Xin Chen, Yunxin Zhao: Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling. 1349-1352
- Jui-Ting Huang, Mark Hasegawa-Johnson: Semi-supervised training of Gaussian mixture models by conditional entropy minimization. 1353-1356
- Guangchuan Shi, Yu Shi, Qiang Huo: A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR. 1357-1360
- Roger Hsiao, Florian Metze, Tanja Schultz: Improvements to generalized discriminative feature transformation for speech recognition. 1361-1364
- Karel Veselý, Lukás Burget, Frantisek Grézl: Parallel training of neural networks for speech recognition. 2934-2937
- Rita Singh, Benjamin Lambert, Bhiksha Raj: The use of sense in unsupervised training of acoustic models for ASR systems. 2938-2941
- Jun Du, Yu Hu, Hui Jiang: Boosted mixture learning of Gaussian mixture HMMs for speech recognition. 2942-2945
- Volker Leutnant, Reinhold Haeb-Umbach: On the exploitation of hidden Markov models and linear dynamic models in a hybrid decoder architecture for continuous speech recognition. 2946-2949
- Alberto Abad, Thomas Pellegrini, Isabel Trancoso, João Paulo Neto: Context dependent modelling approaches for hybrid speech recognizers. 2950-2953
- Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi: A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination. 2954-2957
- Hank Liao, Christopher Alberti, Michiel Bacchiani, Olivier Siohan: Decision tree state clustering with word and syllable features. 2958-2961
- Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori: A duration modeling technique with incremental speech rate normalization. 2962-2965
- Martin Wöllmer, Yang Sun, Florian Eyben, Björn W. Schuller: Long short-term memory networks for noise robust speech recognition. 2966-2969
- Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada: One-model speech recognition and synthesis based on articulatory movement HMMs. 2970-2973
- Xiaodong Cui, Jian Xue, Pierre L. Dognin, Upendra V. Chaudhari, Bowen Zhou: Acoustic modeling with bootstrap and restructuring for low-resourced languages. 2974-2977
- Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Katoh: Lecture speech recognition by combining word graphs of various acoustic models. 2978-2981
- Khe Chai Sim, Shilin Liu: Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition. 2982-2985
- Dong Yu, Li Deng: Deep-structured hidden conditional random fields for phonetic recognition. 2986-2989
- Jonathan Malkin, Jeff A. Bilmes: Semi-supervised learning for improved expression of uncertainty in discriminative classifiers. 2990-2993
- Peder A. Olsen, Vaibhava Goel, Charles A. Micchelli, John R. Hershey: Modeling posterior probabilities using the linear exponential family. 2994-2997
Spoken Dialogue Systems I, II
- Fabrice Lefèvre, François Mairesse, Steve J. Young: Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation. 78-81
- Rajesh Balchandran, Leonid Rachevsky, Bhuvana Ramabhadran, Miroslav Novak: Techniques for topic detection based processing in spoken dialog systems. 82-85
- Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin: Optimizing spoken dialogue management with fitted value iteration. 86-89
- Filip Jurcícek, Blaise Thomson, Simon Keizer, François Mairesse, Milica Gasic, Kai Yu, Steve J. Young: Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems. 90-93
- Alexander Schmitt, Michael Scholz, Wolfgang Minker, Jackson Liscombe, David Suendermann: Is it possible to predict task completion in automated troubleshooters? 94-97
- David Suendermann, Jackson Liscombe, Roberto Pieraccini: Minimally invasive surgery for spoken dialog systems. 98-101
Spoken Dialogue Systems II
- Ramón López-Cózar, David Griol: New technique to enhance the performance of spoken dialogue systems based on dialogue states-dependent language models and grammatical rules. 2998-3001
- Lluís F. Hurtado, Joaquin Planells, Encarna Segarra, Emilio Sanchis, David Griol: A stochastic finite-state transducer approach to spoken dialog management. 3002-3005
- Romain Laroche, Philippe Bretier, Ghislain Putois: Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience. 3006-3009
- Romain Laroche, Ghislain Putois, Philippe Bretier: Optimising a handcrafted dialogue system design. 3010-3013
- Felix Putze, Tanja Schultz: Utterance selection for speech acts in a cognitive tourguide scenario. 3014-3017
- Gabriel Parent, Maxine Eskénazi: Lexical entrainment of real users in the Let's Go spoken dialog system. 3018-3021
- Silvia Quarteroni, Meritxell González, Giuseppe Riccardi, Sebastian Varges: Combining user intention and error modeling for statistical dialog simulators. 3022-3025
- Jaakko Hakulinen, Markku Turunen, Raúl Santos de la Cámara, Nigel T. Crook: Parallel processing of interruptions and feedback in companions affective dialogue system. 3026-3029
- Antoine Raux, Neville Mehta, Deepak Ramachandran, Rakesh Gupta: Dynamic language modeling using Bayesian networks for spoken dialog systems. 3030-3033
- Sunao Hara, Norihide Kitaoka, Kazuya Takeda: Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act n-gram. 3034-3037
- Wei-Bin Liang, Chung-Hsien Wu, Yu-Cheng Hsiao: Dialogue act detection in error-prone spoken dialogue systems using partial sentence tree and latent dialogue act matrix. 3038-3041
- Tatsuya Kawahara, Kouhei Sumi, Zhi-Qiang Chang, Katsuya Takanashi: Detection of hot spots in poster conversations based on reactive tokens of audience. 3042-3045
- Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi: Psychological evaluation of a group communication activation robot in a party game. 3046-3049
- Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno: Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy. 3050-3053
- Mattias Heldner, Jens Edlund, Julia Hirschberg: Pitch similarity in the vicinity of backchannels. 3054-3057
- Khiet P. Truong, Ronald Poppe, Dirk Heylen: A rule-based backchannel prediction model using pitch and pause information. 3058-3061
Speech Perception: Factors Influencing Perception
- Paul Boersma, Katerina Chládková: Detecting categorical perception in continuous discrimination data. 102-105
- Titia Benders, Paola Escudero: The interrelation between the stimulus range and the number of response categories in vowel categorization. 106-109
- Marie Nilsenová, Martijn Goudbeek, Luuk Kempen: The relation between pitch perception preference and emotion identification. 110-113
- Takashi Otake, James M. McQueen, Anne Cutler: Competition in the perception of spoken Japanese words. 114-117
- Makiko Sadakata, Lotte van der Zanden, Kaoru Sekiyama: Influence of musical training on perception of L2 speech. 118-121
- Donald Derrick, Bryan Gick: Full body aero-tactile integration in speech perception. 122-125
Prosody: Models
- Tomás Dubeda, Katalin Mády: Nucleus position within the intonation phrase: a typological study of English, Czech and Hungarian. 126-129
- Yong-cheol Lee, Satoshi Nambu: Focus-sensitive operator or focus inducer: always and only. 130-133
- Jiahong Yuan, Mark Y. Liberman: F0 declination in English and Mandarin broadcast news speech. 134-137
- Katrin Schweitzer, Michael Walsh, Bernd Möbius, Hinrich Schütze: Frequency of occurrence effects on pitch accent realisation. 138-141
- César González Ferreras, Carlos Vivaracho-Pascual, David Escudero Mancebo, Valentín Cardeñoso-Payo: On the automatic ToBI accent type identification from data. 142-145
- Andrew Rosenberg: AutoBI - a tool for automatic ToBI annotation. 146-149
Speech Synthesis: Unit Selection and Others
- Volker Strom, Simon King: A classifier-based target cost for unit selection speech synthesis trained on perceptual data. 150-153
- Wei Zhang, Xiaodong Cui: Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech. 154-157
- Mitsuaki Isogai, Hideyuki Mizuno: Speech database reduction method for corpus-based TTS system. 158-161
- Heng Lu, Zhen-Hua Ling, Si Wei, Li-Rong Dai, Ren-Hua Wang: Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier. 162-165
- Hanna Silén, Elina Helander, Jani Nurminen, Konsta Koppinen, Moncef Gabbouj: Using robust Viterbi algorithm and HMM-modeling in unit selection TTS to replace units of poor quality. 166-169
- Yeon-Jun Kim, Marc C. Beutnagel: Automatic detection of abnormal stress patterns in unit selection synthesis. 170-173
- Daniel Tihelka, Jirí Kala, Jindrich Matousek: Enhancements of Viterbi search for fast unit selection synthesis. 174-177
- Thomas Ewender, Beat Pfister: Accurate pitch marking for prosodic modification of speech segments. 178-181
- Shifeng Pan, Meng Zhang, Jianhua Tao: A novel hybrid approach for Mandarin speech synthesis. 182-185
- Josafá de Jesus Aguiar Pontes, Sadaoki Furui: Modeling liaison in French by using decision trees. 186-189
- Jian Luan, Jian Li: Improvement on plural unit selection and fusion. 190-193
- Alok Parlikar, Alan W. Black, Stephan Vogel: Improving speech synthesis of machine translation output. 194-197
- Ghislain Putois, Jonathan Chevelu, Cédric Boidin: Paraphrase generation to improve text-to-speech synthesis. 198-201
ASR: Search, Decoding and Confidence Measures I, II
- Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam Soo Kim: Phone mismatch penalty matrices for two-stage keyword spotting via multi-pass phone recognizer. 202-205
- Petr Motlícek, Fabio Valente, Philip N. Garner: English spoken term detection in multilingual recordings. 206-209
- Icksang Han, Chiyoun Park, Jeongmi Cho, Jeongsu Kim: A hybrid approach to robust word lattice generation via acoustic-based word detection. 210-213
- Volker Steinbiss, Martin Sundermeyer, Hermann Ney: Direct observation of pruning errors (DOPE): a search analysis tool. 214-217
- David Rybach, Michael Riley: Direct construction of compact context-dependency transducers from data. 218-221
- Miroslav Novak: Incremental composition of static decoding graphs with label pushing. 222-225
- Zhanlei Yang, Wenju Liu: A novel path extension framework using steady segment detection for Mandarin speech recognition. 226-229
- Ralf Schlüter, Markus Nußbaum-Thom, Hermann Ney: On the relation of Bayes risk, word error, and word posteriors in ASR. 230-233
- David Nolden, Hermann Ney, Ralf Schlüter: Time conditioned search in automatic speech recognition reconsidered. 234-237
- Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi: Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models. 238-241
- Atsunori Ogawa, Atsushi Nakamura: A novel confidence measure based on marginalization of jointly estimated error cause probabilities. 242-245
- Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier, Patrick Gros: CRF-based combination of contextual features to improve a posteriori word-level confidence measures. 1942-1945
- Martin Wöllmer, Florian Eyben, Björn W. Schuller, Gerhard Rigoll: Recognition of spontaneous conversational speech using long short-term memory phoneme predictions. 1946-1949
- Thomas Pellegrini, Isabel Trancoso: Improving ASR error detection with non-decoder based features. 1950-1953
- Ladan Golipour, Douglas D. O'Shaughnessy: Phoneme classification and lattice rescoring based on a k-NN approach. 1954-1957
- Jeff A. Bilmes, Hui Lin: Online adaptive learning for speech recognition decoding. 1958-1961
- Takaaki Hori, Shinji Watanabe, Atsushi Nakamura: Improvements of search error risk minimization in Viterbi beam search for speech recognition. 1962-1965
Special-Purpose Speech Applications
- Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko: Evaluation of a silent speech interface based on magnetic sensing. 246-249
- Rubén San Segundo, Verónica López-Ludeña, Raquel Martín, Syaheerah L. Lutfi, Javier Ferreiros, Ricardo de Córdoba, José Manuel Pardo: Advanced speech communication system for deaf people. 250-253
- Sethserey Sam, Eric Castelli, Laurent Besacier: Unsupervised acoustic model adaptation for multi-origin non native ASR. 254-257
- Dilek Hakkani-Tür, Dimitra Vergyri, Gökhan Tür: Speech-based automated cognitive status assessment. 258-261
- Toru Imai, Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato: Speech recognition with a seamlessly updated language model for real-time closed-captioning. 262-265
- Takuya Nishimoto, Takayuki Watanabe: The comparison between the deletion-based methods and the mixing-based methods for audio CAPTCHA systems. 266-269
- Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren: Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of Luxembourgish. 270-273
- R. J. J. H. van Son, Irene Jacobi, Frans J. M. Hilgers: Manipulating tracheoesophageal speech. 274-277
- David Imseng, Hervé Bourlard, Mathew Magimai-Doss: Towards mixed language speech recognition systems. 278-281
- Etienne Barnard, Johan Schalkwyk, Charl Johannes van Heerden, Pedro J. Moreno: Voice search for development. 282-285
- Gina-Anne Levow, Susan Duncan, Edward T. King: Cross-cultural investigation of prosody in verbal feedback in interactional rapport. 286-289
- Mary Tai Knox, Gerald Friedland: Multimodal speaker diarization using oriented optical flow histograms. 290-293
- Catherine Middag, Yvan Saeys, Jean-Pierre Martens: Towards an ASR-free objective analysis of pathological speech. 294-297
Speech Analysis
- Keith W. Godin, John H. L. Hansen: Session variability contrasts in the MARP corpus. 298-301
- Kazuhiro Kondo, Yusuke Takano: Estimation of two-to-one forced selection intelligibility scores by speech recognizers using noise-adapted models. 302-305
- Thomas Schaaf, Florian Metze: Analysis of gender normalization using MLP and VTLN features. 306-309
- Guillaume Aimetti, Roger K. Moore, Louis ten Bosch: Discovering an optimal set of minimally contrasting acoustic speech units: a point of focus for whole-word pattern matching. 310-313
- Themos Stafylakis, Xavier Anguera: Improvements to the equal-parameter BIC for speaker diarization. 314-317
- Nima Mesgarani, Samuel Thomas, Hynek Hermansky: A multistream multiresolution framework for phoneme recognition. 318-321
- Giampiero Salvi, Fabio Tesser, Enrico Zovato, Piero Cosi: Cluster analysis of differential spectral envelopes on emotional speech. 322-325
- Samuel R. Bowman, Karen Livescu: Modeling pronunciation variation with context-dependent articulatory feature decision trees. 326-329
- Bhiksha Raj, Kevin W. Wilson, Alexander Krueger, Reinhold Haeb-Umbach: Ungrounded independent non-negative factor analysis. 330-333
- John R. Hershey, Peder A. Olsen, Steven J. Rennie: Signal interaction and the devil function. 334-337
Systems for LVCSR
- Yuya Akita, Masato Mimura, Graham Neubig, Tatsuya Kawahara: Semi-automated update of automatic transcription system for the Japanese national congress. 338-341
- Xunying Liu, Mark J. F. Gales, Philip C. Woodland: Language model cross adaptation for LVCSR system combination. 342-345
- Shinji Watanabe, Takaaki Hori, Atsushi Nakamura: Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data. 346-349
- Pavel Kveton, Miroslav Novak: Accelerating hierarchical acoustic likelihood computation on graphics processors. 350-353
- Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche, Pedro J. Moreno: Search by voice in Mandarin Chinese. 354-357
- Thomas Hain, Lukás Burget, John Dines, Philip N. Garner, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, Vincent Wan: The AMIDA 2009 meeting transcription system. 358-361
Speaker Characterization and Recognition I-IV
- William M. Campbell, Zahi N. Karam: Simple and efficient speaker comparison using approximate KL divergence. 362-365
- Hanwu Sun, Bin Ma, Chien-Lin Huang, Trung Hieu Nguyen, Haizhou Li: The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems. 366-369
- Chien-Lin Huang, Hanwu Sun, Bin Ma, Haizhou Li: Speaker characterization using long-term and temporal information. 370-373
- Sergio Perez-Gomez, Daniel Ramos, Javier Gonzalez-Dominguez, Joaquin Gonzalez-Rodriguez: Score-level compensation of extreme speech duration variability in speaker verification. 374-377
- Alberto Abad, Isabel Trancoso: Speaker recognition experiments using connectionist transformation network features. 378-381
- Yun Lei, John H. L. Hansen: Speaker recognition using supervised probabilistic principal component analysis. 382-385
- Benjamin Bigot, Julien Pinquier, Isabelle Ferrané, Régine André-Obrecht: Looking for relevant features for speaker role recognition. 1057-1060
- Marcel Kockmann, Lukás Burget, Ondrej Glembek, Luciana Ferrer, Jan Cernocký: Prosodic speaker verification using subspace multinomial models with intersession compensation. 1061-1064
- Eryu Wang, Kong-Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Li-Rong Dai: The estimation and kernel metric of spectral correlation for text-independent speaker verification. 1065-1068
- Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti: Improving monaural speaker identification by double-talk detection. 1069-1072
- B. Avinash, Sunitha Guruprasad, B. Yegnanarayana: Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals. 1073-1076
- Qingsong Liu, Wei Huang, Dongxing Xu, Hongbin Cai, Beiqian Dai: A fast implementation of factor analysis for speaker verification. 1077-1080
- Ce Zhang, Rong Zheng, Bo Xu: An investigation into direct scoring methods without SVM training in speaker verification. 1437-1440
- Reda Jourani, Khalid Daoudi, Régine André-Obrecht, Driss Aboutajdine: Large margin Gaussian mixture models for speaker identification. 1441-1444
- Rong Zheng, Bo Xu: On the use of Gaussian component information in the generative likelihood ratio estimation for speaker verification. 1445-1448
- Man-Wai Mak, Wei Rao: Acoustic vector resampling for GMMSVM-based speaker verification. 1449-1452
- Konstantin Biatov: A fast speaker indexing using vector quantization and second order statistics with adaptive threshold computation. 1453-1456
- Gang Wang, Xiaojun Wu, Thomas Fang Zheng: Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech. 1457-1460
- Claudio Garretón, Néstor Becerra Yoma: On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech. 1461-1464
- Donglai Zhu, Bin Ma, Kong-Aik Lee, Cheung-Chi Leung, Haizhou Li: MAP estimation of subspace transform for speaker recognition. 1465-1468
- Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming: A longest matching segment approach for text-independent speaker recognition. 1469-1472
- Ville Hautamäki, Tomi Kinnunen, Mohaddeseh Nosratighods, Kong-Aik Lee, Bin Ma, Haizhou Li: Approaching human listener accuracy with modern speaker verification. 1473-1476
- Jouni Pohjalainen, Rahim Saeidi, Tomi Kinnunen, Paavo Alku: Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions. 1477-1480
- Guoli Ye, Brian Mak: The use of subvector quantization and discrete densities for fast GMM computation for speaker verification. 1481-1484
- Fred S. Richardson, Joseph P. Campbell: Transcript-dependent speaker recognition using Mixer 1 and 2. 2102-2105
- Thomas Drugman, Thierry Dutoit: On the potential of glottal signatures for speaker recognition. 2106-2109
- R. Padmanabhan, Hema A. Murthy: Acoustic feature diversity and speaker verification. 2110-2113
- Omid Dehzangi, Bin Ma, Engsiong Chng, Haizhou Li: A discriminative performance metric for GMM-UBM speaker identification. 2114-2117
- Xavier Anguera, Jean-François Bonastre: A novel speaker binary key derived from anchor models. 2118-2121
- Weiqiang Zhang, Yan Deng, Liang He, Jia Liu: Variant time-frequency cepstral features for speaker recognition. 2122-2125
- Ning Wang, P. C. Ching, Tan Lee: Exploitation of phase information for speaker recognition. 2126-2129
- Yanhua Long, Li-Rong Dai, Bin Ma, Wu Guo: Effects of the phonological relevance in speaker verification. 2130-2133
- Gabriel Hernández Sierra, Jean-François Bonastre, Driss Matrouf, José R. Calvo: Topological representation of speech for speaker recognition. 2134-2137
- Seyed Omid Sadjadi, John H. L. Hansen: Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions. 2138-2141
- Xiang Zhang, Chuan Cao, Lin Yang, Hongbin Suo, Jianping Zhang, Yonghong Yan: Speaker recognition using the resynthesized speech via spectrum modeling. 2142-2145
Source Separation
- Robert Peharz, Michael Stark, Franz Pernkopf, Yannis Stylianou: A factorial sparse coder model for single channel source separation. 386-389
- Yasmina Benabderrahmane, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy: Oriented PCA method for blind speech separation of convolutive mixtures. 390-393
- Hsin-Lung Hsieh, Jen-Tzung Chien: Online Gaussian process for nonstationary speech separation. 394-397
- Meng Yu, Wenye Ma, Jack Xin, Stanley J. Osher: Convexity and fast speech extraction by split Bregman method. 398-401
- Wenye Ma, Meng Yu, Jack Xin, Stanley J. Osher: Reducing musical noise in blind source separation by time-domain sparse filters and split Bregman method. 402-405
- John Woodruff, Rohit Prabhavalkar, Eric Fosler-Lussier, DeLiang Wang: Combining monaural and binaural evidence for reverberant speech segregation. 406-409
Speech Synthesis: HMM-Based Speech Synthesis I, II
- Heiga Zen: Speaker and language adaptive training for HMM-based polyglot speech synthesis. 410-413
- Kai Yu, Heiga Zen, François Mairesse, Steve J. Young: Context adaptive training with factorized decision trees for HMM-based speech synthesis. 414-417
- Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev: Roles of the average voice in speaker-adaptive HMM-based speech synthesis. 418-421
- Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong: An HMM trajectory tiling (HTT) approach to high quality TTS. 422-425
- Yining Chen, Zhi-Jie Yan, Frank K. Soong: A perceptual study of acceleration parameters in HMM-based TTS. 426-429
- Shuji Yokomizo, Takashi Nose, Takao Kobayashi: Evaluation of prosodic contextual factors for HMM-based speech synthesis. 430-433
- Slava Shechtman, Alexander Sorin: Sinusoidal model parameterization for HMM-based TTS system. 805-808
- Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai: Improved training of excitation for HMM-based parametric speech synthesis. 809-812
- June Sig Sung, Doo Hwa Hong, Kyung Hwan Oh, Nam Soo Kim: Excitation modeling based on waveform interpolation for HMM-based speech synthesis. 813-816
- Xin Zhuang, Yao Qian, Frank K. Soong, Yi-Jian Wu, Bo Zhang: Formant-based frequency warping for improving speaker adaptation in HMM TTS. 817-820
- Hongwei Hu, Martin J. Russell: Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis. 821-824
- Zhen-Hua Ling, Yu Hu, Li-Rong Dai: Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis. 825-828
- Matt Shannon, William Byrne: Autoregressive clustering for HMM speech synthesis. 829-832
- Nicholas Pilkington, Heiga Zen: An implementation of decision tree-based context clustering on graphics processing units. 833-836
- Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor: Quantized HMMs for low footprint text-to-speech synthesis. 837-840
- Oliver Watts, Junichi Yamagishi, Simon King: The role of higher-level linguistic features in HMM-based speech synthesis. 841-844
- Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda: HMM-based singing voice synthesis system using pitch-shifted pseudo training data. 845-848
- Jinfu Ni, Hisashi Kawai: An unsupervised approach to creating web audio contents-based HMM voices. 849-852
- Tomoki Koriyama, Takashi Nose, Takao Kobayashi: Conversational spontaneous speech synthesis using average voice model. 853-856
Multi-Modal Signal Processing
- Jonas Hörnstein, José Santos-Victor: Learning words and speech units through natural interactions. 434-437
- Qingju Liu, Wenwu Wang, Philip J. B. Jackson: Bimodal coherence based scale ambiguity cancellation for target speech extraction and enhancement. 438-441
- Hiroaki Kawashima, Yu Horii, Takashi Matsuyama: Speech estimation in non-stationary noise environments using timing structures between mouth movements and sound signals. 442-445
- Lijuan Wang, Xiaojun Qian, Wei Han, Frank K. Soong: Synthesizing photo-real talking head via trajectory-guided sample selection. 446-449
- Victoria M. Florescu, Lise Crevier-Buchman, Bruce Denby, Thomas Hueber, Antonia Colazo-Simon, Claire Pillot-Loiseau, Pierre Roussel-Ragot, Cédric Gendrot, Sophie Quattrocchi: Silent vs vocalized articulation for a portable ultrasound-based silent speech interface. 450-453
- Gregor Hofer, Korin Richmond: Comparison of HMM and TMDN methods for lip synchronisation. 454-457
Paralanguage
- Florian Schiel, Christian Heinrich, Veronika Neumeyer: Rhythm and formant features for automatic alcohol detection. 458-461
- Irena Yanushevskaya, Christer Gobl, John Kane, Ailbhe Ní Chasaide: An exploration of voice source correlates of focus. 462-465
- James D. Harnsberger, Rahul Shrivastav, W. S. Brown Jr.: Modeling perceived vocal age in American English. 466-469
- Marie-José Caraty, Claude Montacié: Multivariate analysis of vocal fatigue in continuous reading. 470-473
- Alexander Kain, Jan P. H. van Santen: Frequency-domain delexicalization using surrogate vowels. 474-477
- Florian Metze, Anton Batliner, Florian Eyben, Tim Polzehl, Björn W. Schuller, Stefan Steidl: Emotion recognition using imperfect speech recognition. 478-481
- Gang Liu, Yun Lei, John H. L. Hansen: A novel feature extraction strategy for multi-stream robust emotion identification. 482-485
- Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger: Setup for acoustic-visual speech synthesis by concatenating bimodal units. 486-489
- Bart Jochems, Martha A. Larson, Roeland Ordelman, Ronald Poppe, Khiet P. Truong: Towards affective state modeling in narrative and conversational settings. 490-493
- Narichika Nomoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi: Detection of anger emotion in dialog speech using prosody feature and temporal relation of utterances. 494-497
- Benjamin Roustan, Marion Dohen: Gesture and speech coordination: the influence of the relationship between manual gesture and speech. 498-501
- Hynek Boril, Seyed Omid Sadjadi, Tristan Kleinschmidt, John H. L. Hansen: Analysis and detection of cognitive load and frustration in drivers' speech. 502-505
- Akira Sasou, Yasuharu Hashimoto, Katsuhiko Sakaue: Acoustic-based recognition of head gestures accompanying speech. 506-509
- Sandro Castronovo, Angela Mahr, Margarita Pentcheva, Christian A. Müller: Multimodal dialog in the car: combining speech and turn-and-push dial to control comfort functions. 510-513
- Danil Korchagin, Philip N. Garner, Petr Motlícek: Hands free audio analysis from home entertainment. 514-517
- Shaikh Mostafa Al Masum, Antonio Rui Ferreira Rebordão, Keikichi Hirose: Affective story teller: a TTS system for emotional expressivity. 518-521
ASR: Speaker Adaptation, Robustness Against Reverberation
- Shweta Ghai, Rohit Sinha: Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization. 522-525
- Bo Li, Khe Chai Sim: Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems. 526-529
- Ravichander Vipperla, Steve Renals, Joe Frankel: Augmentation of adaptation data. 530-533
- Lukás Machlica, Zbynek Zajíc, Ludek Müller: Discriminative adaptation based on fast combination of DMAP and dfMLLR. 534-537
- Doddipatla Rama Sanand, Ralf Schlüter, Hermann Ney: Revisiting VTLN using linear transformation on conventional MFCC. 538-541
- Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: Speaker adaptation based on nonlinear spectral transform for speech recognition. 542-545
- Tetsuo Kosaka, Takashi Ito, Masaharu Katoh, Masaki Kohda: Speaker adaptation based on system combination using speaker-class models. 546-549
- Yongwon Jeong, Young Rok Song, Hyung Soon Kim: Speaker adaptation in transformation space using two-dimensional PCA. 550-553
- Jan Trmal, Jan Zelinka, Ludek Müller: On speaker adaptive training of artificial neural networks. 554-557
- Yongjun He, Jiqing Han: Model synthesis for band-limited speech recognition. 558-561
- Takahiro Fukumori, Masanori Morise, Takanobu Nishiura: Performance estimation of reverberant speech recognition based on reverberant criteria RSR-dn with acoustic parameters. 562-565
- Armin Sehr, Christian Hofmann, Roland Maas, Walter Kellermann: A novel approach for matched reverberant training of HMMs using data pairs. 566-569
- Hari Krishna Maganti, Marco Matassoni: An auditory based modulation spectral feature for reverberant speech recognition. 570-573
- Martin Wolf, Climent Nadeu: On the potential of channel selection for recognition of reverberated speech with multiple microphones. 574-577
- Randy Gomez, Tatsuya Kawahara: An improved wavelet-based dereverberation for robust automatic speech recognition. 578-581
- Rico Petrick, Thomas Fehér, Masashi Unoki, Rüdiger Hoffmann: Methods for robust speech recognition in reverberant environments: a comparison. 582-585
Language Learning, TTS, and Other Applications
- Masayuki Suzuki, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose:

Integration of multilayer regression analysis with structure-based pronunciation assessment. 586-589 - Joost van Doremalen, Catia Cucchiarini, Helmer Strik:

Using non-native error patterns to improve pronunciation verification. 590-593 - Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose:

Regularized-MLLR speaker adaptation for computer-assisted language learning system. 594-597 - Kuniaki Hirabayashi, Seiichi Nakagawa:

Automatic evaluation of English pronunciation by Japanese speakers using various acoustic features and pattern recognition techniques. 598-601 - Hsien-Cheng Liao, Jiang-Chun Chen, Sen-Chia Chang, Ying-Hua Guan, Chin-Hui Lee:

Decision tree based tone modeling with corrective feedbacks for automatic Mandarin tone assessment. 602-605 - Jingli Lu, Ruili Wang, Liyanage C. De Silva, Yang Gao, Jia Liu:

CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language. 606-609 - Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:

Automatic reference independent evaluation of prosody quality using multiple knowledge fusions. 610-613 - Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat:

Landmark-based automated pronunciation error detection. 614-617 - Zhiwei Shuang, Shiyin Kang, Yong Qin, Li-Rong Dai, Lianhong Cai:

HMM based TTS for mixed language text. 618-621 - Hui Liang, John Dines:

An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation. 622-625 - Tatsuya Kawahara, Norihiro Katsumaru, Yuya Akita, Shinsuke Mori:

Classroom note-taking system for hearing impaired students using automatic speech recognition adapted to lectures. 626-629 - Paul R. Dixon, Sadaoki Furui:

Exploring web-browser based runtime engines for creating ubiquitous speech interfaces. 630-632
Pitch and Glottal-Waveform Estimation and Modeling I, II
- Xuejing Sun, Sameer Gadre:

Efficient three-stage pitch estimation for packet loss concealment. 633-636 - Keiichi Funaki:

On evaluation of the F0 estimation based on time-varying complex speech analysis. 637-640 - Feng Huang, Tan Lee:

Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks. 641-644 - Tianyu T. Wang, Thomas F. Quatieri:

Multi-pitch estimation by a joint 2-d representation of pitch and pitch dynamics. 645-648 - Pirros Tsiakoulis, Alexandros Potamianos:

On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances. 649-652 - M. Shahidur Rahman, Tetsuya Shimamura:

Pitch determination using autocorrelation function in spectral domain. 653-656 - Thomas Drugman, Thierry Dutoit:

Chirp complex cepstrum-based decomposition for asynchronous glottal analysis. 657-660 - Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle:

Exploiting glottal formant parameters for glottal inverse filtering and parameterization. 661-664 - Nicolas Sturmel, Christophe d'Alessandro, Boris Doval:

Glottal parameters estimation on speech using the zeros of the z-transform. 665-668 - Sri Harish Reddy Mallidi, Kishore Prahallad, Suryakanth V. Gangashetty, B. Yegnanarayana:

Significance of pitch synchronous analysis for speaker recognition using AANN models. 669-672 - Gang Chen, Xue Feng, Yen-Liang Shue, Abeer Alwan:

On using voice source measures in automatic gender classification of children's speech. 673-676 - Wei Chu, Abeer Alwan:

SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech. 2590-2593 - Jung Ook Hong, Patrick J. Wolfe:

Robust and efficient pitch estimation using an iterative ARMA technique. 2594-2597 - Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino:

Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases. 2598-2601 - Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai:

Applying geometric source separation for improved pitch extraction in human-robot interaction. 2602-2605 - John Kane, Mark Kane, Christer Gobl:

A spectral LF model based approach to voice source parameterisation. 2606-2609 - Thomas Drugman, Thierry Dutoit:

Glottal-based analysis of the Lombard effect. 2610-2613
Open Vocabulary Spoken Document Retrieval (Special Session)
- Yoshiaki Itoh, Hiromitsu Nishizaki, Xinhui Hu, Hiroaki Nanjo, Tomoyosi Akiba, Tatsuya Kawahara, Seiichi Nakagawa, Tomoko Matsui, Yoichi Yamashita, Kiyoaki Aikawa:

Constructing Japanese test collections for spoken term detection. 677-680 - Satoshi Natori, Hiromitsu Nishizaki, Yoshihiro Sekiguchi:

Japanese spoken term detection using syllable transition network derived from multiple speech recognizers' outputs. 681-684 - Sha Meng, Weiqiang Zhang, Jia Liu:

Combining Chinese spoken term detection systems via side-information conditioned linear logistic regression. 685-688 - Taisuke Kaneko, Tomoyosi Akiba:

Metric subspace indexing for fast spoken term detection. 689-692 - Chun-an Chan, Lin-Shan Lee:

Unsupervised spoken-term detection with spoken queries using segment-based dynamic time warping. 693-696 - Daniel Schneider, Timo Mertens, Martha A. Larson, Joachim Köhler:

Contextual verification for open vocabulary spoken term detection. 697-700 - Javier Tejedor, Doroteo T. Toledano, Miguel Bautista, Simon King, Dong Wang, José Colás:

Augmented set of features for confidence estimation in spoken term detection. 701-704 - Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Cluster-based language model for spoken document retrieval using NMF-based document clustering. 705-708
Robust ASR
- Rogier C. van Dalen, Mark J. F. Gales:

Asymptotically exact noise-corrupted speech likelihoods. 709-712 - Ramón Fernandez Astudillo, Reinhold Orglmeister:

A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation. 713-716 - Bhiksha Raj, Tuomas Virtanen, Sourish Chaudhuri, Rita Singh:

Non-negative matrix factorization based compensation of music for automatic speech recognition. 717-720 - Kris Demuynck, Xueru Zhang, Dirk Van Compernolle, Hugo Van hamme:

Feature versus model based noise robustness. 721-724 - Ji Hun Park, Seon Man Kim, Jae Sam Yoon, Hong Kook Kim, Sung Joo Lee, Yunkeun Lee:

SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment. 725-728 - Chanwoo Kim, Richard M. Stern, Kiwan Eom, Jaewon Lee:

Automatic selection of thresholds for signal separation algorithms based on interaural delay. 729-732
Language and Dialect Identification
- Florian Verdet, Driss Matrouf, Jean-François Bonastre, Jean Hennebert:

Channel detectors for system fusion in the context of NIST LRE 2009. 733-736 - Rong Tong, Bin Ma, Haizhou Li, Engsiong Chng:

Selecting phonotactic features for language recognition. 737-740 - Abualsoud Hanani, Michael J. Carey, Martin J. Russell:

Improved language recognition using mixture components statistics. 741-744 - Mikel Peñagarikano, Amparo Varona, Luis Javier Rodríguez-Fuentes, Germán Bordel:

Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition. 745-748 - Oscar Koller, Alberto Abad, Isabel Trancoso, Céu Viana:

Exploiting variety-dependent phones in Portuguese variety identification applied to broadcast news transcription. 749-752 - Fadi Biadsy, Julia Hirschberg, Michael Collins:

Dialect recognition using a phone-GMM-supervector-based SVM kernel. 753-756
Technologies for Learning and Education
- Xiaojun Qian, Frank K. Soong, Helen M. Meng:

Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT). 757-760 - Liang-Yu Chen, Jyh-Shing Roger Jang:

Automatic pronunciation scoring using learning to rank and DP-based score segmentation. 761-764 - Wai Kit Lo, Shuang Zhang, Helen M. Meng:

Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system. 765-768 - Minh Duong, Jack Mostow:

Adapting a duration synthesis model to rate children's oral reading prosody. 769-772 - Su-Youn Yoon, Lei Chen, Klaus Zechner:

Predicting word accuracy for the automatic speech recognition of non-native speech. 773-776 - Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu:

A new approach for automatic tone error detection in strong accented Mandarin based on dominant set. 777-780
Emotional Speech
- S. R. Mahadeva Prasanna, D. Govind:

Analysis of excitation source information in emotional speech. 781-784 - Dongrui Wu, Thomas D. Parsons, Shrikanth S. Narayanan:

Acoustic feature analysis in speech emotion primitives estimation. 785-788 - Lan-Ying Yeh, Tai-Shih Chi:

Spectro-temporal modulations for robust speech emotion recognition. 789-792 - Chi-Chun Lee, Matthew Black, Athanasios Katsamanis, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

Quantification of prosodic entrainment in affective spontaneous spoken interactions of married couples. 793-796 - Emily Mower, Kyu Jeong Han, Sungbok Lee, Shrikanth S. Narayanan:

A cluster-profile representation of emotion using agglomerative hierarchical clustering. 797-800 - Björn W. Schuller, Laurence Devillers:

Incremental acoustic valence recognition: an inter-corpus perspective on features, matching, and performance in a gating paradigm. 801-804
New Paradigms in ASR I, II
- Xiaodong Wang, Kunihiko Owa, Makoto Shozakai:

Mandarin digit recognition assisted by selective tone distinction. 857-860 - Kazuhiko Abe, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Brazilian Portuguese acoustic model training based on data borrowing from other language. 861-864 - Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanja Schultz:

Rapid bootstrapping of five Eastern European languages using the rapid language adaptation toolkit. 865-868 - Houwei Cao, Tan Lee, P. C. Ching:

Cross-lingual speaker adaptation via Gaussian component mapping. 869-872 - Mohamed Elmahdy, Rainer Gruhn, Wolfgang Minker, Slim Abdennadher:

Cross-lingual acoustic modeling for dialectal Arabic speech recognition. 873-876 - Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:

Cross-lingual and multi-stream posterior features for low resource LVCSR systems. 877-880 - Shiva Sundaram, Jerome R. Bellegarda:

Latent perceptual mapping: a new acoustic modeling framework for speech recognition. 881-884 - Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise:

Unsupervised model adaptation on targeted speech segments for LVCSR system combination. 885-888 - Irene Ayllón Clemente, Martin Heckmann, Alexander Denecke, Britta Wrede, Christian Goerick:

Incremental word learning using large-margin discriminative training and variance floor estimation. 889-892 - Tuomas Virtanen, Jort F. Gemmeke, Antti Hurmalainen:

State-based labelling for a sparse representation of speech and its application to robust speech recognition. 893-896 - Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukás Burget:

Similarity scoring for recognizing repeated out-of-vocabulary words. 897-900 - Dino Seppi, Dirk Van Compernolle:

Data pruning for template-based automatic speech recognition. 901-904 - Man-Hung Siu, Herbert Gish, Arthur Chan, William Belfield:

Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision. 2838-2841 - Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo:

An analysis of sparseness and regularization in exemplar-based methods for speech classification. 2842-2845 - Abdel-rahman Mohamed, Dong Yu, Li Deng:

Investigation of full-sequence training of deep belief networks for speech recognition. 2846-2849 - Yow-Bang Wang, Lin-Shan Lee:

Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram. 2850-2853 - Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, Alex Acero:

Continuous speech recognition with a TF-IDF acoustic model. 2854-2857 - Geoffrey Zweig, Patrick Nguyen:

SCARF: a segmental conditional random field toolkit for speech recognition. 2858-2861
Speech Production: Various Approaches
- Akiko Amano-Kusumoto, John-Paul Hosom, Alexander Kain:

Speaking style dependency of formant targets. 905-908 - Tatsuya Kitamura:

Similarity of effects of emotions on the speech organ configuration with and without speaking. 909-912 - Daniel Bone, Samuel Kim, Sungbok Lee, Shrikanth S. Narayanan:

A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms. 913-916 - Ken-Ichi Sakakibara, Hiroshi Imagawa, Miwako Kimura, Hisayuki Yokonishi, Niro Tayama:

Modal analysis of vocal fold vibrations using laryngotopography. 917-920 - Martti Vainio, Matti Airas, Juhani Järvikivi, Paavo Alku:

Laryngeal voice quality in the expression of focus. 921-924 - Masako Fujimoto, Kikuo Maekawa, Seiya Funatsu:

Laryngeal characteristics during the production of geminate consonants. 925-928 - Julien Cisonni, Kazunori Nozaki, Annemie Van Hirtum, Shigeo Wada:

Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling. 929-932 - Iris Hanique, Barbara Schuppler, Mirjam Ernestus:

Morphological and predictability effects on schwa reduction: the case of Dutch word-initial syllables. 933-936 - Samer Al Moubayed, Gopal Ananthakrishnan:

Acoustic-to-articulatory inversion based on local regression. 937-940 - Mirjam Broersma:

Korean lenis, fortis, and aspirated stops: effect of place of articulation on acoustic realization. 941-944 - Toru Nakashika, Ryuki Tachibana, Masafumi Nishimura, Tetsuya Takiguchi, Yasuo Ariki:

Speech synthesis by modeling harmonics structure with multiple function. 945-948 - Makoto Otani, Tatsuya Hirahara:

Physics of body-conducted silent speech - production, propagation and representation of non-audible murmur. 949-952
Speech Enhancement
- Subhojit Chakladar, Nam Soo Kim, Yu Gwang Jin, Tae Gyoon Kang:

Multichannel noise reduction using low order RTF estimate. 953-956 - Inho Lee, Jongsung Yoon, Yoonjae Lee, Hanseok Ko:

Reinforced blocking matrix with cross channel projection for speech enhancement. 957-960 - Ning Cheng, Wenju Liu, Lan Wang:

Masking property based microphone array post-filter design. 961-964 - Yusuke Sato, Tetsuya Hoya, Hovagim Bakardjian, Andrzej Cichocki:

Reduction of broadband noise in speech signals by multilinear subspace analysis. 965-968 - Jungpyo Hong, Seung Ho Han, Sangbae Jeong, Minsoo Hahn:

Novel probabilistic control of noise reduction for improved microphone array beamforming. 969-972 - Kai Li, Qiang Fu, Yonghong Yan:

Speech enhancement using improved generalized sidelobe canceller in frequency domain with multi-channel postfiltering. 973-976 - Jani Even, Carlos Toshinori Ishi, Hiroshi Saruwatari, Norihiro Hagita:

Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface. 977-980 - Ajay Srinivasamurthy, Thippur V. Sreenivas:

Multi-channel iterative dereverberation based on codebook constrained iterative multi-channel Wiener filter. 981-984 - Anand Joseph Xavier Medabalimi, Sri Harish Reddy Mallidi, B. Yegnanarayana:

Speaker-dependent mapping of source and system features for enhancement of throat microphone speech. 985-988 - Jun Cai, Stefano Marini, Pierre Malarme, Francis Grenez, Jean Schoentgen:

An analytic modeling approach to enhancing throat microphone speech commands for keyword spotting. 989-992 - Stephen So, Kamil K. Wójcicki, Kuldip K. Paliwal:

Single-channel speech enhancement using Kalman filtering in the modulation domain. 993-996 - Miao Yao, Weiqian Liang:

Integrated feedback and noise reduction algorithm in digital hearing aids via oscillation detection. 997-1000 - Charles Mercier, Roch Lefebvre:

A blind signal-to-noise ratio estimator for high noise speech recordings. 1001-1004
Special Session: Fact and Replica of Speech Production
- Hiroshi Imagawa, Ken-Ichi Sakakibara, Isao T. Tokuda, Mamiko Otsuka, Niro Tayama:

Estimation of glottal area function using stereo-endoscopic high-speed digital imaging. 1005-1008 - Kazunori Nozaki, Youhei Ohnishi, Takashi Suda, Shigeo Wada, Shinji Shimojo:

Toward aero-acoustical analysis of the sibilant /s/: an oral cavity modeling. 1009-1012 - Kunitoshi Motoki:

Effects of wall impedance on transmission and attenuation of higher-order modes in vocal-tract model. 1013-1016 - Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube:

Articulatory synthesis and perception of plosive-vowel syllables with virtual consonant targets. 1017-1020 - Kotaro Fukui, Toshihiro Kusano, Yoshikazu Mukaeda, Yuto Suzuki, Atsuo Takanishi, Masaaki Honda:

Speech robot mimicking human articulatory motion. 1021-1024 - Takayuki Arai:

Mechanical vocal-tract models for speech dynamics. 1025-1028 - Michael C. Brady:

Prosodic timing analysis for articulatory re-synthesis using a bank of resonators with an adaptive oscillator. 1029-1032
ASR: Language Modeling
- Ahmad Emami, Stanley F. Chen, Abraham Ittycheriah, Hagen Soltau, Bing Zhao:

Decoding with shrinkage-based language models. 1033-1036 - Stanley F. Chen, Stephen M. Chu:

Enhanced word classing for model M. 1037-1040 - Junho Park, Xunying Liu, Mark J. F. Gales, Philip C. Woodland:

Improved neural network based language modelling and adaptation. 1041-1044 - Tomás Mikolov, Martin Karafiát, Lukás Burget, Jan Cernocký, Sanjeev Khudanpur:

Recurrent neural network based language model. 1045-1048 - Preethi Jyothi, Eric Fosler-Lussier:

Discriminative language modeling using simulated ASR errors. 1049-1052 - Graham Neubig, Masato Mimura, Shinsuke Mori, Tatsuya Kawahara:

Learning a language model from continuous speech. 1053-1056
Single-Channel Speech Enhancement
- Stephen So, Kuldip K. Paliwal:

Fast converging iterative Kalman filtering for speech enhancement using long and overlapped tapered windows with large side lobe attenuation. 1081-1084 - Xuejing Sun, Kuan-Chieh Yen, Rogerio Guedes Alves:

Robust noise estimation using minimum correction with harmonicity control. 1085-1088 - Mahdi Triki:

New insights into subspace noise tracking. 1089-1092 - Mahdi Triki, Kees Janse:

Bias considerations for minimum subspace noise tracking. 1093-1096 - Ji Ming, Ramji Srinivasan, Danny Crookes:

A corpus-based approach to speech enhancement from nonstationary noise. 1097-1100 - Zhe Chen, You-Chi Cheng, Fuliang Yin, Chin-Hui Lee:

Bandwidth expansion of speech based on wavelet transform modulus maxima vector mapping. 1101-1104
Speech Synthesis: Miscellaneous Topics
- Kalu U. Ogbureke, Peter Cahill, Julie Carson-Berndsen:

Hidden Markov models with context-sensitive observations for grapheme-to-phoneme conversion. 1105-1108 - Brian Langner, Stephan Vogel, Alan W. Black:

Evaluating a dialog language generation system: comparing the mountain system to other NLG approaches. 1109-1112 - Wesley Mattheyses, Lukas Latacz, Werner Verhelst:

Active appearance models for photorealistic visual speech synthesis. 1113-1116 - Jerome R. Bellegarda:

Latent affective mapping: a novel framework for the data-driven analysis of emotion in text. 1117-1120 - Anna C. Janska, Robert A. J. Clark:

Native and non-native speaker judgements on the quality of synthesized speech. 1121-1124 - Dominic Espinosa, Michael White, Eric Fosler-Lussier, Chris Brew:

Machine learning for text selection with expressive unit-selection voices. 1125-1128
Prosody: Basics and Applications
- Alexei V. Ivanov, Giuseppe Riccardi, Sucheta Ghosh, Sara Tonelli, Evgeny A. Stepanov:

Acoustic correlates of meaning structure in conversational speech. 1129-1132 - Nicolas Obin, Xavier Rodet, Anne Lacheret:

HMM-based prosodic structure model using rich linguistic context. 1133-1136 - Charlotte Wollermann, Bernhard Schröder, Ulrich Schade:

Audiovisual congruence and pragmatic focus marking. 1137-1140 - Margaret Zellers, Michele Gubian, Brechtje Post:

Redescribing intonational categories with functional data analysis. 1141-1144 - Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu:

Exploring goodness of prosody by diverse matching templates. 1145-1148 - Mickael Rouvier, Richard Dufour, Georges Linarès, Yannick Estève:

A language-identification inspired method for spontaneous speech detection. 1149-1152 - Gérard Bailly, Amélie Lelong:

Speech dominoes and phonetic convergence. 1153-1156 - Mátyás Brendel, Riccardo Zaccarelli, Laurence Devillers:

A quick sequential forward floating feature selection algorithm for emotion detection from speech. 1157-1160 - Géza Kiss, Jan P. H. van Santen:

Automated vocal emotion recognition using phoneme class specific features. 1161-1164 - Adrian Pass, Jianguo Zhang, Darryl Stewart:

Feature selection for pose invariant lip biometrics. 1165-1168 - Hussein Hussein, Rüdiger Hoffmann:

Signal-based accent and phrase marking using the Fujisaki model. 1169-1172 - Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:

A study of interplay between articulatory movement and prosodic characteristics in emotional speech production. 1173-1176
ASR: Feature Extraction I, II
- Shang-wen Li, Liang-Che Sun, Lin-Shan Lee:

Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features. 1177-1180 - Suman V. Ravuri, Nelson Morgan:

Using spectro-temporal features to improve AFE feature extraction for ASR. 1181-1184 - Ibon Saratxaga, Inma Hernáez, Igor Odriozola, Eva Navas, Iker Luengo, Daniel Erro:

Using harmonic phase information to improve ASR rate. 1185-1188 - Kazumasa Yamamoto, Eiichi Sueyoshi, Seiichi Nakagawa:

Speech recognition using long-term phase information. 1189-1192 - Jan Zelinka, Jan Trmal, Ludek Müller:

Low-dimensional space transforms of posteriors in speech recognition. 1193-1196 - Christian Plahl, Ralf Schlüter, Hermann Ney:

Hierarchical bottle neck features for LVCSR. 1197-1200 - Frantisek Grézl, Martin Karafiát:

Hierarchical neural net architectures for feature extraction in ASR. 1201-1204 - Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:

Mutual information analysis for feature and sensor subset selection in surface electromyography based speech recognition. 1205-1208 - Bernd T. Meyer, Birger Kollmeier:

Learning from human errors: prediction of phoneme confusions based on modified ASR training. 1209-1212 - Bo Li, Khe Chai Sim:

Hidden logistic linear regression for support vector machine based phone verification. 2614-2617 - Tim Ng, Bing Zhang, Long Nguyen:

Jointly optimized discriminative features for speech recognition. 2618-2621 - Florian Müller, Alfred Mertins:

Invariant integration features combined with speaker-adaptation methods. 2622-2625 - Mark Raugas, Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan:

Multi resolution discriminative models for subvocalic speech recognition. 2626-2629 - Fabio Valente, Mathew Magimai-Doss, Christian Plahl, Suman V. Ravuri, Wen Wang:

A comparative large scale study of MLP features for Mandarin ASR. 2630-2633 - Cong-Thanh Do, Dominique Pastor, Gaël Le Lan, André Goalic:

Recognizing cochlear implant-like spectrally reduced speech with HMM-based ASR: experiments with MFCCs and PLP coefficients. 2634-2637
Speech Perception: Cross Language and Age
- Kazuhiro Kondo, Takayuki Kanda, Yosuke Kobayashi, Hiroyuki Yagyu:

Speech intelligibility of diagonally localized speech with competing noise using bone-conduction headphones. 1213-1216 - Pierre L. Divenyi:

Masking of vowel-analog transitions by vowel-analog distracters. 1217-1220 - François Pellegrino, Emmanuel Ferragne, Fanny Meunier:

2010, a speech oddity: phonetic transcription of reversed speech. 1221-1224 - Hsin-Yi Lin, Janice Fon:

Perception on pitch reset at discourse boundaries. 1225-1228 - Marjorie Dole, Michel Hoen, Fanny Meunier:

Effect of spatial separation on speech-in-noise comprehension in dyslexic adults. 1229-1232 - Ellen Marklund, Francisco Lacerda, Anna Ericsson:

Speech categorization context effects in seven- to nine-month-old infants. 1233-1236 - Diane Kewley-Port, Larry E. Humes, Daniel Fogerty:

Changes in temporal processing of speech across the adult lifespan. 1237-1240 - Jared Bernstein, Jian Cheng, Masanori Suzuki:

Fluency and structural complexity as predictors of L2 oral proficiency. 1241-1244 - Marco van de Ven, Benjamin V. Tucker, Mirjam Ernestus:

Semantic facilitation in bilingual everyday speech comprehension. 1245-1248 - Bo-ren Hsieh, Ho-hsien Pan:

L2 experience and non-native vowel categorization of L1-Mandarin speakers. 1249-1252 - Mirjam Wester:

Cross-lingual talker discrimination. 1253-1256 - Takashi Otake:

Dajare is not the lowest form of wit. 1257-1260
SLP Systems
- Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:

Comparison of methods for topic classification in a speech-oriented guidance system. 1261-1264 - Pere Comas, Jordi Turmo, Lluís Màrquez:

Using dependency parsing and machine learning for factoid question answering on spoken documents. 1265-1268 - Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek:

A spoken term detection framework for recovering out-of-vocabulary words using the web. 1269-1272 - Hung-yi Lee, Chia-Ping Chen, Ching-feng Yeh, Lin-Shan Lee:

Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback. 1273-1276 - Sebastian Tschöpel, Daniel Schneider:

A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts. 1277-1280 - Noboru Kanedera, Tetsuo Funada, Seiichi Nakagawa:

Lecture subtopic retrieval by retrieval keyword expansion using subordinate concept. 1281-1284 - Hiroaki Nanjo, Yusuke Iyonaga, Takehiko Yoshimi:

Spoken document retrieval for oral presentations integrating global document similarities into local document similarities. 1285-1288 - Joseph Polifroni, Stephanie Seneff:

Combining word-based features, statistical language models, and parsing for named entity recognition. 1289-1292 - Azeddine Zidouni, Sophie Rosset, Hervé Glotin:

Efficient combined approach for named entity recognition in spoken language. 1293-1296 - Sree Harsha Yella, Vasudeva Varma, Kishore Prahallad:

Prominence based scoring of speech segments for automatic speech-to-speech summarization. 1297-1300 - Zihan Liu, Lei Xie, Wei Feng:

Maximum lexical cohesion for fine-grained news story segmentation. 1301-1304 - Xiaoxuan Wang, Lei Xie, Bin Ma, Engsiong Chng, Haizhou Li:

Phoneme lattice based texttiling towards multilingual story segmentation. 1305-1308
Quality of Experiencing Speech Services (Special Session)
- Anton Schlesinger, Marinus M. Boone:

The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech. 1309-1312 - Marcel Wältermann, Alexander Raake, Sebastian Möller:

Analytical assessment and distance modeling of speech transmission quality. 1313-1316 - Nicolas Côté, Vincent Koehl, Valérie Gautier-Turbin, Alexander Raake, Sebastian Möller:

An intrusive super-wideband speech quality model: DIAL. 1317-1320 - Sebastian Egger, Raimund Schatz, Stefan Scherer:

It takes two to tango - assessing the impact of delay on conversational interactivity on perceived speech quality. 1321-1324 - Sebastian Möller, Florian Hinterleitner, Tiago H. Falk, Tim Polzehl:

Comparison of approaches for instrumentally predicting the quality of text-to-speech systems. 1325-1328 - Imre Kiss, Joseph Polifroni, Chao Wang, Ghinwa F. Choueiter, Mike Phillips:

A hybrid architecture for mobile voice user interfaces. 1329-1332 - Markku Turunen, Jaakko Hakulinen, Tomi Heimonen:

Assessment of spoken and multimodal applications: lessons learned from laboratory and field studies. 1333-1336 - Klaus-Peter Engelbrecht, Hamed Ketabdar, Sebastian Möller:

Improving cross database prediction of dialogue quality using mixture of experts. 1337-1340
Language Processing
- Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot:

Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations. 1365-1368 - Saturnino Luz, Jing Su:

The relevance of timing, pauses and overlaps in dialogues: detecting topic changes in scenario based meetings. 1369-1372 - Richard Dufour, Benoît Favre:

Semi-supervised part-of-speech tagging in speech applications. 1373-1376 - Frédéric Tantini, Christophe Cerisara, Claire Gardent:

Memory-based active learning for French broadcast news. 1377-1380 - Dan Gillick:

Can conversational word usage be used to predict speaker demographics?. 1381-1384 - Chao-Hong Liu, Chung-Hsien Wu:

Prosodic word-based error correction in speech recognition using prosodic word expansion and contextual information. 1385-1388
Speech and Audio Segmentation
- Sarah Hoffmann, Beat Pfister:

Fully automatic segmentation for prosodic speech corpora. 1389-1392 - Vahid Khanagha, Khalid Daoudi, Oriol Pont, Hussein M. Yahia:

A novel text-independent phonetic segmentation algorithm based on the microcanonical multiscale formalism. 1393-1396 - You-Yu Lin, Yih-Ru Wang, Yuan-Fu Liao:

Phone boundary detection using sample-based acoustic parameters. 1397-1400 - Utpala Musti, Asterios Toutios, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger:

HMM-based automatic visual speech segmentation using facial data. 1401-1404 - David Wang, Robert Vogt, Sridha Sridharan:

Bayes factor based speaker segmentation for speaker diarization. 1405-1408 - Qiang Huang, Stephen J. Cox:

Using high-level information to detect key audio events in a tennis game. 1409-1412
Prosody: Analysis
- Catherine Lai:

What do you mean, you're uncertain?: the interpretation of cue words and rising intonation in dialogue. 1413-1416 - Yi-Fen Liu, Shu-Chuan Tseng, Jyh-Shing Roger Jang, C.-H. Alvin Chen:

Coping imbalanced prosodic unit boundary detection with linguistically-motivated prosodic features. 1417-1420 - Zhigang Chen, Guoping Hu, Wei Jiang:

Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction. 1421-1424 - Yujia Li, Tan Lee:

Perception-based automatic approximation of F0 contours in Cantonese speech. 1425-1428 - Raul Fernandez, Bhuvana Ramabhadran:

Discriminative training and unsupervised adaptation for labeling prosodic events with limited training data. 1429-1432 - Erin Cvejic, Jeesun Kim, Chris Davis, Guillaume Gibert:

Prosody for the eyes: quantifying visual prosody using guided principal component analysis. 1433-1436
Systems for LVCSR and Rich Transcription
- Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen:

Parallel lexical-tree based LVCSR on multi-core processors. 1485-1488 - Jike Chong, Ekaterina Gonina, Kisun You, Kurt Keutzer:

Exploring recognition network representations for efficient speech inference on highly parallel platforms. 1489-1492 - Diamantino Caseiro:

WFST compression for automatic speech recognition. 1493-1496 - Ivan Bulyko:

Speech recognizer optimization under speed constraints. 1497-1500 - Florian Metze, Roger Hsiao, Qin Jin, Udhyakumar Nallasamy, Tanja Schultz:

The 2010 CMU GALE speech-to-text system. 1501-1504 - Tin Lay Nwe, Hanwu Sun, Bin Ma, Haizhou Li:

Speaker diarization in meeting audio for single distant microphone. 1505-1508 - Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Isabel Mata, Nuno J. Mamede:

Extending the punctuation module for european portuguese. 1509-1512 - Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Utilizing a noisy-channel approach for Korean LVCSR. 1513-1516 - Markus Nußbaum-Thom, Simon Wiesler, Martin Sundermeyer, Christian Plahl, Stefan Hahn, Ralf Schlüter, Hermann Ney:

The RWTH 2009 quaero ASR evaluation system for English and German. 1517-1520
Phonetics
- Benjamin Munson, Renata Solum:

When is indexical information about speech activated? evidence from a cross-modal priming experiment. 1521-1524 - Benjamin Munson:

The influence of actual and perceived sexual orientation on diadochokinetic rate in women and men. 1525-1528 - Kristine M. Yu:

Laryngealization and features for Chinese tonal recognition. 1529-1532 - Viet Son Nguyen, Eric Castelli, René Carré:

Production and perception of vietnamese short vowels in V1V2 context. 1533-1536 - Gertraud Fenk-Oczlon, August Fenk:

Measuring basic tempo across languages and some implications for speech rhythm. 1537-1540 - Yukari Hirata, Shigeaki Amano:

Durational structure of Japanese single/geminate stops in three- and four-mora words spoken at varied rates. 1541-1544 - Shin-ichiro Sano, Tomohiko Ooigawa:

Distribution and trichotomic realization of voiced velars in Japanese - an experimental study. 1545-1548 - Jagoda Sieczkowska, Bernd Möbius, Grzegorz Dogil:

Specification in context - devoicing processes in Polish, French, american English and German sonorants. 1549-1552 - Kuniko Y. Nielsen:

Phonetic imitation of Japanese vowel devoicing. 1553-1556 - Mary Stevens, John Hajek:

Post-aspiration in standard Italian: some first cross-regional acoustic evidence. 1557-1560 - Mirko Grimaldi, Andrea Calabrese, Francesco Sigona, Luigia Garrapa, Bianca Sisinni:

Articulatory grounding of southern salentino harmony processes. 1561-1564 - Yuuki Tanida, Taiji Ueno, Satoru Saito, Matthew A. Lambon Ralph:

Effects of accent typicality and phonotactic frequency on nonword immediate serial recall performance in Japanese. 1565-1567 - Osamu Fujimura:

How abstract is phonetics? 1568-1571
Speech Production: Vocal Tract Modeling and Imaging
- Adam C. Lammert, Michael I. Proctor, Shrikanth S. Narayanan:

Data-driven analysis of realtime vocal tract MRI using correlated image regions. 1572-1575 - Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan:

Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis. 1576-1579 - Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak:

Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order. 1580-1583 - Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan:

Statistical multi-stream modeling of real-time MRI articulatory speech data. 1584-1587 - Gopal Ananthakrishnan, Pierre Badin, Julián Andrés Valdés Vargas, Olov Engwall:

Predicting unseen articulations from multi-speaker articulatory models. 1588-1591 - Chao Qin, Miguel Á. Carreira-Perpiñán:

Estimating missing data sequences in x-ray microbeam recordings. 1592-1595 - Chao Qin, Miguel Á. Carreira-Perpiñán, Mohsen Farhadloo:

Adaptation of a tongue shape model by local feature transformations. 1596-1599 - Sungbok Lee, Shrikanth S. Narayanan:

Vocal tract contour analysis of emotional speech by the functional data curve representation. 1600-1603 - Adam C. Lammert, Louis Goldstein, Khalil Iskarous:

Locally-weighted regression for estimating the forward kinematics of a geometric vocal tract model. 1604-1607 - Michael Reimer, Frank Rudzicz:

Identifying articulatory goals from kinematic data using principal differential analysis. 1608-1611 - Zuheng Ming, Denis Beautemps, Gang Feng, Sébastien Schmerber:

Estimation of speech lip features from discrete cosinus transform. 1612-1615 - Farzaneh Ahmadi, Ian Vince McLoughlin, Hamid R. Sharifzadeh:

Autoregressive modelling for linear prediction of ultrasonic speech. 1616-1619
Speech Intelligibility Enhancement for All Ages, Health Conditions and Environments (Special Session)
- Takayuki Arai, Nao Hodoshima:

Enhanced speech yielding higher intelligibility for all listeners and environments. 1620-1623 - Seyed Omid Sadjadi, Sanjay A. Patil, John H. L. Hansen:

Quality conversion of non-acoustic signals for facilitating human-to-human speech communication under harsh acoustic conditions. 1624-1627 - Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:

The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. 1628-1631 - Gibak Kim, Philipos C. Loizou:

A new binary mask based on noise constraints for improved speech intelligibility. 1632-1635 - Yan Tang, Martin Cooke:

Energy reallocation strategies for speech enhancement in known noise conditions. 1636-1639 - Jing Chen, Thomas Baer, Brian C. J. Moore:

Effects of enhancement of spectral changes on speech quality and subjective speech intelligibility. 1640-1643
ASR: Acoustic Model Adaptation
- Catherine Breslin, K. K. Chin, Mark J. F. Gales, Kate M. Knill, Haitian Xu:

Prior information for rapid speaker adaptation. 1644-1647 - Jonas Lööf, Ralf Schlüter, Hermann Ney:

Discriminative adaptation for log-linear acoustic models. 1648-1651 - Dimitra Vergyri, Lori Lamel, Jean-Luc Gauvain:

Automatic speech recognition of multiple accented English data. 1652-1655 - Jinyu Li, Yu Tsao, Chin-Hui Lee:

Shrinkage model adaptation in automatic speech recognition. 1656-1659 - Jinyu Li, Dong Yu, Yifan Gong, Li Deng:

Unscented transform with online distortion estimation for HMM adaptation. 1660-1663 - Michael L. Seltzer, Alex Acero:

HMM adaptation using linear spline interpolation with integrated spline parameter training for robust speech recognition. 1664-1667
SLP Systems for Information Extraction/Retrieval
- Dong Wang, Simon King, Nicholas W. D. Evans, Raphaël Troncy:

CRF-based stochastic pronunciation modeling for out-of-vocabulary spoken term detection. 1668-1671 - Chia-Ping Chen, Hung-yi Lee, Ching-feng Yeh, Lin-Shan Lee:

Improved spoken term detection by feature space pseudo-relevance feedback. 1672-1675 - Aren Jansen, Kenneth Church, Hynek Hermansky:

Towards spoken term discovery at scale with zero resources. 1676-1679 - Evandro B. Gouvêa, Tony Ezzat:

Vocabulary independent spoken query: a case for subword units. 1680-1683 - Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen:

Extractive speech summarization - from the view of decision theory. 1684-1687 - Gabriel Murray, Giuseppe Carenini, Raymond T. Ng:

The impact of ASR on abstractive vs. extractive meeting summaries. 1688-1691
Speech Representation
- Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton:

Binary coding of speech spectrograms using a deep auto-encoder. 1692-1695 - Juhan Nam, Gautham J. Mysore, Joachim Ganseman, Kyogu Lee, Jonathan S. Abel:

A super-resolution spectrogram using coupled PLCA. 1696-1699 - Georgios Tzedakis, Yannis Pantazis, Olivier Rosec, Yannis Stylianou:

Fast least-squares solution for sinusoidal, harmonic and quasi-harmonic models. 1700-1703 - Afsaneh Asaei, Hervé Bourlard, Philip N. Garner:

Sparse component analysis for speech recognition in multi-speaker environment. 1704-1707 - Trond Skogstad, Torbjørn Svendsen:

Intra-frame variability as a predictor of frame classifiability. 1708-1711 - Tetsuya Shimamura, Ngoc Dinh Nguyen:

Autocorrelation and double autocorrelation based spectral representations for a noisy word recognition system. 1712-1715
Voice Conversion
- Elina Helander, Hanna Silén, Joaquín Míguez, Moncef Gabbouj:

Maximum a posteriori voice conversion using sequential monte carlo methods. 1716-1719 - Pierre Lanchantin, Xavier Rodet:

Dynamic model selection for spectral voice conversion. 1720-1723 - Takashi Nose, Takao Kobayashi:

Speaker-independent HMM-based voice conversion using quantized fundamental frequency. 1724-1727 - Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu:

Probabilistic integration of joint density model and speaker model for voice conversion. 1728-1731 - Zhizheng Wu, Tomi Kinnunen, Engsiong Chng, Haizhou Li:

Text-independent F0 transformation with non-parallel data for voice conversion. 1732-1735 - Xiaodan Zhuang, Lijuan Wang, Frank K. Soong, Mark Hasegawa-Johnson:

A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion. 1736-1739
Prosody: Language-Specific Models
- Anastasia Karlsson, David House, Jan-Olof Svantesson, Damrong Tayanin:

Influence of lexical tones on intonation in kammu. 1740-1743 - Satoshi Nambu, Yong-cheol Lee:

Phonetic realization of second occurrence focus in Japanese. 1744-1747 - Jianjing Kuang:

Prosodic grouping and relative clause disambiguation in Mandarin. 1748-1751 - Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoying Xu:

Text-based unstressed syllable prediction in Mandarin. 1752-1755 - Tomás Dubeda:

"flat pitch accents" in Czech. 1756-1759 - Tomás Dubeda:

Positional variability of pitch accents in Czech. 1760-1763 - Shyamal Kr. Das Mandal, Arup Saha, Tulika Basu, Keikichi Hirose, Hiroya Fujisaki:

Modeling of sentence-medial pauses in bangla readout speech: occurrence and duration. 1764-1767 - Adrian Leemann, Lucy Zuberbühler:

Declarative sentence intonation patterns in 8 swiss German dialects. 1768-1771 - Je Hun Jeon, Yang Liu:

Syllable-level prominence detection with acoustic evidence. 1772-1775 - Sankalan Prasad, Kalika Bali:

Prosody cues for classification of the discourse particle "hã" in hindi. 1776-1779 - Yuan Jia, Aijun Li:

Interaction of syntax-marked focus and wh-question induced focus in standard Chinese. 1780-1783 - Samer Al Moubayed, Jonas Beskow:

Prominence detection in Swedish using syllable correlates. 1784-1787 - Na Zhi, Daniel Hirst, Pier Marco Bertinetto:

Automatic analysis of the intonation of a tone language. applying the momel algorithm to spontaneous standard Chinese (beijing). 1788-1791 - Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma, Haizhou Li:

Towards long-range prosodic attribute modeling for language recognition. 1792-1795 - Robert Schubert, Oliver Jokisch, Diane Hirschfeld:

A modified parameterization of the Fujisaki model. 1796-1799
ASR: Language Modeling and Speech Understanding I
- Saeedeh Momtazi, Friedrich Faubel, Dietrich Klakow:

Within and across sentence boundary language model. 1800-1803 - Ruhi Sarikaya, Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran:

Impact of word classing on shrinkage-based language models. 1804-1807 - Stanislas Oger, Vladimir Popescu, Georges Linarès:

Combination of probabilistic and possibilistic language models. 1808-1811 - Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk:

On-demand language model interpolation for mobile speech input. 1812-1815 - Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz:

Text normalization based on statistical machine translation and internet user support. 1816-1819 - Tanel Alumäe, Mikko Kurimo:

Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension. 1820-1823 - Christian Gillot, Christophe Cerisara, David Langlois, Jean Paul Haton:

Similar n-gram language model. 1824-1827 - Markpong Jongtaveesataporn, Sadaoki Furui:

Topic and style-adapted language modeling for Thai broadcast news ASR. 1828-1831 - Ahmad Emami, Hong-Kwang Jeff Kuo, Imed Zitouni, Lidia Mangu:

Augmented context features for Arabic speech recognition. 1832-1835 - Lucía Ortega, Isabel Galiano, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:

A statistical segment-based approach for spoken language understanding. 1836-1839 - Benjamin Lecouteux, Raphaël Rubino, Georges Linarès:

Improving back-off models with bag of words and hollow-grams. 2418-2421 - Ciprian Chelba, Thorsten Brants, Will Neveitt, Peng Xu:

Study on interaction between entropy pruning and kneser-ney smoothing. 2422-2425 - Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda:

Dynamic language model adaptation using keyword category classification. 2426-2429 - Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa:

Integration of cache-based model and topic dependent class model with soft clustering and soft voting. 2430-2433 - Frédéric Duvert, Renato de Mori:

Conditional models for detecting lambda-functions in a spoken language understanding system. 2434-2437 - Md. Akmal Haidar, Douglas D. O'Shaughnessy:

Novel weighting scheme for unsupervised language model adaptation using latent dirichlet allocation. 2438-2441 - Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan:

Automatic speech recognition system channel modeling. 2442-2445 - Takanobu Oba, Takaaki Hori, Atsushi Nakamura:

Round-robin discrimination model for reranking ASR hypotheses. 2446-2449 - Hasim Sak, Murat Saraclar, Tunga Güngör:

On-the-fly lattice rescoring for real-time automatic speech recognition. 2450-2453
First and Second Language Acquisition
- Angela Cooper, Yue Wang:

Cantonese tone word learning by tone and non-tone language speakers. 1840-1843 - Anne Cutler, Janise Shanley:

Validation of a training method for L2 continuous-speech segmentation. 1844-1847 - Jiahong Yuan:

Linguistic rhythm in foreign accent. 1848-1849 - Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka:

The effect of a word embedded in a sentence and speaking rate variation on the perceptual training of geminate and singleton consonant distinction. 1850-1853 - Chiharu Tsurutani:

Foreign accent matters most when timing is wrong. 1854-1857 - Hyejin Hong, Jina Kim, Minhwa Chung:

Effects of Korean learners' consonant cluster reduction strategies on English speech recognition performance. 1858-1861 - June S. Levitt, William F. Katz:

The effects of EMA-based augmented visual feedback on the English speakers' acquisition of the Japanese flap: a perceptual study. 1862-1865 - Hinako Masuda, Takayuki Arai:

Perception of voiceless fricatives by Japanese listeners of advanced and intermediate level English proficiency. 1866-1869 - Lya Meister, Einar Meister:

Perception of estonian vowel categories by native and non-native speakers. 1870-1873 - Qin Shi, Kun Li, Shilei Zhang, Stephen M. Chu, Ji Xiao, Zhijian Ou:

Spoken English assessment system for non-native speakers using acoustic and prosodic features. 1874-1877 - Elena E. Lyakso, Olga V. Frolova, Anna V. Kurazhova, Julia S. Gaikova:

Russian infants and children's sounds and speech corpuses for language acquisition studies. 1878-1881 - Julia Monnin, Hélène Loevenbruck:

Language-specific influence on phoneme development: French and drehu data. 1882-1885 - Jeffrey J. Holliday, Mary E. Beckman, Chanelle Mays:

Did you say susi or shushi? measuring the emergence of robust fricative contrasts in English- and Japanese-acquiring children. 1886-1889
Spoken Language Resources, Systems and Evaluation I, II
- Josef R. Novak, Paul R. Dixon, Sadaoki Furui:

An empirical comparison of the t3, juicer, HDecode and sphinx3 decoders. 1890-1893 - Philip N. Garner, John Dines:

Tracter: a lightweight dataflow framework. 1894-1897 - Marelie H. Davel, Febe de Wet:

Verifying pronunciation dictionaries using conflict analysis. 1898-1901 - Brandon Roy, Soroush Vosoughi, Deb Roy:

Automatic estimation of transcription accuracy and difficulty. 1902-1905 - Benjamin Lambert, Rita Singh, Bhiksha Raj:

Creating a linguistic plausibility dataset with non-expert annotators. 1906-1909 - Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura:

Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition. 1910-1913 - Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, Mike LeBeau:

Building transcribed speech corpora quickly and cheaply for many languages. 1914-1917 - Heidi Christensen, Jon Barker, Ning Ma, Phil D. Green:

The CHiME corpus: a resource and a challenge for computational hearing in multisource environments. 1918-1921 - Wen Cao, Dongning Wang, Jinsong Zhang, Ziyu Xiong:

Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training. 1922-1925 - Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa:

How children acquire situation understanding skills?: a developmental analysis utilizing multimodal speech behavior corpus. 1926-1929 - Ina Wechsung, Stefan Schaffer, Robert Schleicher, Anja Naumann, Sebastian Möller:

The influence of expertise and efficiency on modality selection strategies and perceived mental effort. 1930-1933 - Christine Kühnel, Benjamin Weiss, Sebastian Möller:

Parameters describing multimodal interaction - definitions and three usage scenarios. 1934-1937 - Alexander Zgorzelski, Alexander Schmitt, Tobias Heinroth, Wolfgang Minker:

Repair strategies on trial: which error recovery do users like best? 1938-1941 - Maryam Kamvar, Doug Beeferman:

Say what? why users choose to speak their web queries. 1966-1969 - Jonathan Teutenberg, Catherine Inez Watson:

The effect of audience familiarity on the perception of modified accent. 1970-1973 - Korin Richmond, Robert A. J. Clark, Susan Fitt:

On generating combilex pronunciations via morphological analysis. 1974-1977 - Florian Gödde, Sebastian Möller:

Say it as you mean it - analyzing free user comments in the VOICE awards corpus. 1978-1981 - Viktor Rozgic, Bo Xiao, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

A new multichannel multi modal dyadic interaction database. 1982-1985 - Dau-Cheng Lyu, Tien Ping Tan, Engsiong Chng, Haizhou Li:

SEAME: a Mandarin-English code-switching speech corpus in south-east asia. 1986-1989
Speech Production: Analysis
- Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, Ricardo Gutierrez-Osuna:

Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database. 1990-1993 - Vikram Ramanarayanan, Dani Byrd, Louis Goldstein, Shrikanth S. Narayanan:

Investigating articulatory setting - pauses, ready position, and rest - using real-time MRI. 1994-1997 - Chao Qin, Miguel Á. Carreira-Perpiñán:

Articulatory inversion of american English /turnr/ by conditional density modes. 1998-2001 - Atef Ben Youssef, Pierre Badin, Gérard Bailly:

Can tongue be recovered from face? the answer of data-driven statistical models. 2002-2005 - Francisco Torreira, Mirjam Ernestus:

Phrase-medial vowel devoicing in spontaneous French. 2006-2009 - Chierh Cheng, Yi Xu, Michele Gubian:

Exploring the mechanism of tonal contraction in taiwan Mandarin. 2010-2013
Paralanguage Cognition
- Benjamin Weiss, Felix Burkhardt:

Voice attributes affecting likability perception. 2014-2017 - Kristiina Jokinen, Kazuaki Harada, Masafumi Nishida, Seiichi Yamamoto:

Turn-alignment using eye-gaze and speech in conversational interaction. 2018-2021 - Tet Fei Yap, Julien Epps, Eliathamby Ambikairajah, Eric H. C. Choi:

An investigation of formant frequencies for cognitive load classification. 2022-2025 - Martijn Goudbeek, Mirjam Broersma:

Language specific effects of emotion on phoneme duration. 2026-2029 - Matthew Black, Athanasios Katsamanis, Chi-Chun Lee, Adam C. Lammert, Brian R. Baucom, Andrew Christensen, Panayiotis G. Georgiou, Shrikanth S. Narayanan:

Automatic classification of married couples' behavior using audio features. 2030-2033 - Gideon Kowadlo, Patrick Ye, Ingrid Zukerman:

Influence of gestural salience on the interpretation of spoken requests. 2034-2037
Robust ASR Against Noise
- Vikramjit Mitra, Hosung Nam, Carol Y. Espy-Wilson, Elliot Saltzman, Louis Goldstein:

Robust word recognition using articulatory trajectories and gestures. 2038-2041 - Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino:

Performance estimation of noisy speech recognition considering recognition task complexity. 2042-2045 - Friedrich Faubel, Dietrich Klakow:

Estimating noise from noisy speech features with a monte carlo variant of the expectation maximization algorithm. 2046-2049 - Satoshi Tamura, Eriko Hishikawa, Wataru Taguchi, Satoru Hayamizu:

Template-based spectral estimation using microphone array for speech recognition. 2050-2053 - Aleem Mushtaq, Yu Tsao, Chin-Hui Lee:

A particle filter feature compensation approach to robust speech recognition. 2054-2057 - Chanwoo Kim, Richard M. Stern:

Nonlinear enhancement of onset for robust speech recognition. 2058-2061 - Shirin Badiezadegan, Richard C. Rose:

Mask estimation in non-stationary noise environments for missing feature based robust speech recognition. 2062-2065 - Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson:

Robust automatic speech recognition with decoder oriented ideal binary mask estimation. 2066-2069 - Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura:

A robust speech recognition system against the ego noise of a robot. 2070-2073 - Kuo-Hao Wu, Chia-Ping Chen:

Empirical mode decomposition for noise-robust automatic speech recognition. 2074-2077 - Wooil Kim, Jun-Won Suh, John H. L. Hansen:

An effective feature compensation scheme tightly matched with speech recognizer employing SVM-based GMM generation. 2078-2081 - Jort F. Gemmeke, Tuomas Virtanen:

Artificial and online acquired noise dictionaries for noise robust ASR. 2082-2085 - Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda:

Voice activity detection based on conditional random fields using multiple features. 2086-2089 - Yong Zhao, Biing-Hwang Juang:

A comparative study of noise estimation algorithms for VTS-based robust speech recognition. 2090-2093 - Frank Seide, Pei Zhao:

On using missing-feature theory with cepstral features - approximations to the multivariate integral. 2094-2097 - Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:

Using a DBN to integrate sparse classification and GMM-based ASR. 2098-2101
Voice Conversion and Speech Synthesis
- Axel Röbel:

Shape-invariant speech transformation with the phase vocoder. 2146-2149 - Kayoko Yanagisawa, Mark A. Huckvale:

A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity. 2150-2153 - Esther Klabbers, Alexander Kain, Jan P. H. van Santen:

Evaluation of speaker mimic technology for personalizing SGD voices. 2154-2157 - Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:

Adaptive voice-quality control based on one-to-many eigenvoice conversion. 2158-2161 - Fernando Villavicencio, Jordi Bonada:

Applying voice conversion to concatenative singing-voice synthesis. 2162-2165 - Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu:

Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model. 2166-2169 - Ming Lei, Yi-Jian Wu, Frank K. Soong, Zhen-Hua Ling, Li-Rong Dai:

A hierarchical F0 modeling method for HMM-based speech synthesis. 2170-2173 - Javier Latorre, Mark J. F. Gales, Heiga Zen:

Training a parametric-based logF0 model with the minimum generation error criterion. 2174-2177 - Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu:

Improving Mandarin segmental duration prediction with automatically extracted syntax features. 2178-2181 - Daniel R. van Niekerk, Etienne Barnard:

An intonation model for TTS in sepedi. 2182-2185 - Michael Pucher, Dietmar Schabus, Junichi Yamagishi:

Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners. 2186-2189 - Gabriel Webster, Sacha Krstulovic, Kate M. Knill:

A comparison of pronunciation modeling approaches for HMM-TTS. 2190-2193 - Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi:

HMM-based text-to-articulatory-movement prediction and analysis of critical articulators. 2194-2197
Detection, Classification, and Segmentation
- Jiaxing Ye, Takumi Kobayashi, Tetsuya Higuchi:

Audio-based sports highlight detection by fourier local auto-correlations. 2198-2201 - Hynek Boril, Abhijeet Sangwan, Taufiq Hasan, John H. L. Hansen:

Automatic excitement-level detection for sports highlights generation. 2202-2205 - Jörg-Hendrik Bach, Jörn Anemüller:

Detecting novel objects in acoustic scenes through classifier incongruence. 2206-2209 - Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:

A multidomain approach for automatic home environmental sound classification. 2210-2213 - Patrick Cardinal, Vishwa Gupta, Gilles Boulianne:

Content-based advertisement detection. 2214-2217 - Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis:

Identification of abnormal audio events based on probabilistic novelty detection. 2218-2221 - Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz:

Lightly supervised recognition for automatic alignment of large coherent speech recordings. 2222-2225 - Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman:

Incremental diarization of telephone conversations. 2226-2229 - Srikanth Cherla, V. Ramasubramanian:

Audio analytics by template modeling and 1-pass DP based decoding. 2230-2233 - Mariusz Ziólko, Jakub Galka, Bartosz Ziólko, Tomasz Drwiega:

Perceptual wavelet decomposition for speech segmentation. 2234-2237 - Venkatesh Keri, Kishore Prahallad:

A comparative study of constrained and unconstrained approaches for segmentation of speech signal. 2238-2241 - Morgan Sonderegger, Joseph Keshet:

Automatic discriminative measurement of voice onset time. 2242-2245 - Yi Ren Leng, Tran Huy Dat, Norihide Kitaoka, Haizhou Li:

Selective gammatone filterbank feature for robust sound event recognition. 2246-2249
Compressive Sensing for Speech and Language Processing (Special Session)
- Allen Y. Yang, Zihan Zhou, Yi Ma, Shankar Sastry:

Towards a robust face recognition system using compressive sensing. 2250-2253 - Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Abhinav Sethy:

Sparse representation features for speech recognition. 2254-2257 - Abhinav Sethy, Tara N. Sainath, Bhuvana Ramabhadran, Dimitri Kanevsky:

Data selection for language modeling using sparse representations. 2258-2261 - Jort F. Gemmeke, Ulpu Remes, Kalle J. Palomäki:

Observation uncertainty measures for sparse imputation. 2262-2265 - Tara N. Sainath, Sameer Maskey, Dimitri Kanevsky, Bhuvana Ramabhadran, David Nahamoo, Julia Hirschberg:

Sparse representations for text categorization. 2266-2269 - Garimella S. V. S. Sivaram, Sriram Ganapathy, Hynek Hermansky:

Sparse auto-associative neural networks: theory and application to speech recognition. 2270-2273
ASR: Lexical and Pronunciation Modeling
- Chi Hu, Xiaodan Zhuang, Mark Hasegawa-Johnson:

FSM-based pronunciation modeling using articulatory phonological code. 2274-2277 - Denis Jouvet, Dominique Fohr, Irina Illina:

Detailed pronunciation variant modeling for speech transcription. 2278-2281 - Line Adde, Bert Réveil, Jean-Pierre Martens, Torbjørn Svendsen:

A minimum classification error approach to pronunciation variation modeling of non-native proper names. 2282-2285 - Antoine Laurent, Sylvain Meignier, Téva Merlin, Paul Deléglise:

Acoustics-based phonetic transcription method for proper nouns. 2286-2289 - Tim Schlippe, Sebastian Ochs, Tanja Schultz:

Wiktionary as a source for automatic pronunciation extraction. 2290-2293 - Ibrahim Badr, Ian McGraw, James R. Glass:

Learning new word pronunciations from spoken examples. 2294-2297
Speaker Recognition and Diarization
- I-Fan Chen, Shih-Sian Cheng, Hsin-Min Wang:

Phonetic subspace mixture model for speaker diarization. 2298-2301 - Martin Zelenák, Carlos Segura, Javier Hernando:

Overlap detection for speaker diarization by fusing spectral and spatial features. 2302-2305 - Alfred Dielmann, Giulia Garau, Hervé Bourlard:

Floor holder detection and end of speaker turn prediction in meetings. 2306-2309 - Carlos Vaquero, Alfonso Ortega, Jesús Antonio Villalba López, Antonio Miguel, Eduardo Lleida:

Confidence measures for speaker segmentation and their relation to speaker verification. 2310-2313 - Anthony Larcher, Christophe Lévy, Driss Matrouf, Jean-François Bonastre:

Decoupling session variability modelling and speaker characterisation. 2314-2317 - Cheung-Chi Leung, Donglai Zhu, Kong-Aik Lee, Bin Ma, Haizhou Li:

Incorporating MAP estimation and covariance transform for SVM based speaker recognition. 2318-2321
Speech and Audio Classification
- Stéphane Rossignol, Olivier Pietquin:

Single-speaker/multi-speaker co-channel speech classification. 2322-2325 - Oriol Vinyals, Gerald Friedland, Nelson Morgan:

Discriminative training for hierarchical clustering in speaker diarization. 2326-2329 - Jürgen T. Geiger, Frank Wallhoff, Gerhard Rigoll:

GMM-UBM based open-set online speaker diarization. 2330-2333 - Ladan Golipour, Douglas D. O'Shaughnessy:

A segment-based non-parametric approach for monophone recognition. 2334-2337 - Taras Butko, Climent Nadeu:

A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data. 2338-2341 - Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:

Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition. 2342-2345
Emotion Recognition
- Ling He, Margaret Lech, Nicholas B. Allen:

On the importance of glottal flow spectral energy for the recognition of emotions in speech. 2346-2349 - Laurence Devillers, Christophe Vaudable, Clément Chastagnol:

Real-life emotion-related states detection in call centers: a cross-corpora study. 2350-2353 - Ali Hassan, Robert I. Damper:

Multi-class and hierarchical SVMs for emotion recognition. 2354-2357 - David Philippou-Hübner, Bogdan Vlasenko, Tobias Grosser, Andreas Wendemuth:

Determining optimal features for emotion recognition from speech by applying an evolutionary algorithm. 2358-2361 - Martin Wöllmer, Angeliki Metallinou, Florian Eyben, Björn W. Schuller, Shrikanth S. Narayanan:

Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling. 2362-2365 - Kartik Audhkhasi, Shrikanth S. Narayanan:

Data-dependent evaluator modeling and its application to emotional valence classification from speech. 2366-2369
Speech Coding, Modeling, and Transmission
- Zhanyu Ma, Arne Leijon:

Modelling speech line spectral frequencies with dirichlet mixture models. 2370-2373 - Zhanyu Ma, Arne Leijon:

PDF-optimized LSF vector quantization based on beta mixture models. 2374-2377 - José Enrique García Laínez, Alfonso Ortega, Antonio Miguel, Eduardo Lleida:

Non-linear predictive vector quantization of feature vectors for distributed speech recognition. 2378-2381 - Lasse Laaksonen, Mikko Tammi, Vladimir Malenovsky, Tommy Vaillancourt, Mi Suk Lee, Tomofumi Yamanashi, Masahiro Oshikiri, Claude Lamblin, Balázs Kövesi, Lei Miao, Deming Zhang, Jon Gibbs, Holly Francois:

Superwideband extension of g.718 and g.729.1 speech codecs. 2382-2385 - José L. Carmona, Angel M. Gomez, Antonio M. Peinado, José L. Pérez-Córdoba, José A. González:

A multipulse FEC scheme based on amplitude estimation for CELP codecs over packet networks. 2386-2389 - Anssi Rämö, Henri Toukomaa:

Voice quality evaluation of recent open source codecs. 2390-2393 - Bengt J. Borgström, Per Henrik Borgström, Abeer Alwan:

Efficient HMM-based estimation of missing features, with applications to packet loss concealment. 2394-2397 - Xiaoqiang Xiao, Robert M. Nickel

:
Speech inventory based discriminative training for joint speech enhancement and low-rate speech coding. 2398-2401 - Qipeng Gong, Peter Kabal:

Quality-based playout buffering with FEC for conversational voIP. 2402-2405 - Masatsune Tamura, Takehiko Kagoshima, Masami Akamine:

Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding. 2406-2409 - Sundar Harshavardhan, Chandra Sekhar Seelamantula, Thippur V. Sreenivas:

A multimodal density function estimation approach to formant tracking. 2410-2413 - Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen:

Estimation studies of vocal tract shape trajectory using a variable length and lossy kelly-lochbaum model. 2414-2417
Speech Perception: Processing and Intelligibility
- Serajul Haque, Roberto Togneri: A feature extraction method for automatic speech recognition based on the cochlear nucleus. 2454-2457
- Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky: A phoneme recognition framework based on auditory spectro-temporal receptive fields. 2458-2461
- Amy V. Beeston, Guy J. Brown: Perceptual compensation for effects of reverberation in speech identification: a computer model based on auditory efferent processing. 2462-2465
- Barbara Schuppler, Mirjam Ernestus, Wim A. van Dommelen, Jacques C. Koreman: Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties. 2466-2469
- Matthew Robertson, Guy J. Brown, Wendy Lecluyse, Manasa Panda, Christine M. Tan: A speech-in-noise test based on spoken digits: comparison of normal and impaired listeners using a computer model. 2470-2473
- Takayuki Kagomiya, Seiji Nakagawa: Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of paralinguistic information: a comparison with cochlear implant simulator. 2474-2477
- Tim Jürgens, Stefan Fredelake, Ralf M. Meyer, Birger Kollmeier, Thomas Brand: Challenging the speech intelligibility index: macroscopic vs. microscopic prediction of sentence recognition in normal and hearing-impaired listeners. 2478-2481
- Verena N. Uslar, Thomas Brand, Mirko Hanke, Rebecca Carroll, Esther Ruigendijk, Cornelia Hamann, Birger Kollmeier: Does sentence complexity interfere with intelligibility in noise? Evaluation of the Oldenburg linguistically and audiologically controlled sentence test (OLACS). 2482-2485
- Juan-Pablo Ramirez, Hamed Ketabdar, Alexander Raake: Intelligibility predictions for speech against fluctuating masker. 2486-2489
- Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano: An effect of formant amplitude in vowel perception. 2490-2493
- Christopher I. Petkov, Benjamin Wilson: Functional imaging of brain regions sensitive to communication sounds in primates. 2494-2497
Spoken Language Understanding and Spoken Language Translation I, II
- Ye-Yi Wang: Strategies for statistical spoken language understanding with small amount of data - an empirical study. 2498-2501
- Bassam Jabaian, Laurent Besacier, Fabrice Lefèvre: Investigating multiple approaches for SLU portability to a new language. 2502-2505
- Anja Austermann, Seiji Yamada, Kotaro Funakoshi, Mikio Nakano: Learning naturally spoken commands for a robot. 2506-2509
- Amparo Albalate, Aparna Suchindranath, David Suendermann, Wolfgang Minker: A semi-supervised cluster-and-label approach for utterance classification. 2510-2513
- Silvia Quarteroni, Giuseppe Riccardi: Classifying dialog acts in human-human and human-machine spoken conversations. 2514-2517
- Fei Liu, Yang Liu: Exploring speaker characteristics for meeting summarization. 2518-2521
- Shasha Xie, Hui Lin, Yang Liu: Semi-supervised extractive speech summarization via co-training algorithm. 2522-2525
- Asli Celikyilmaz, Dilek Hakkani-Tür: Extractive summarization using a latent variable model. 2526-2529
- Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan: Hierarchical classification for speech-to-speech translation. 2530-2533
- Matthias Paulik, Alex Waibel: Rapid development of speech translation using consecutive interpretation. 2534-2537
- Sameer Maskey, Steven J. Rennie, Bowen Zhou: Combining many alignments for speech to speech translation. 2538-2541
- Pierre Gotab, Géraldine Damnati, Frédéric Béchet, Lionel Delphin-Poulat: Online SLU model adaptation with a partial oracle. 2862-2865
- Om Deshmukh, Harish Doddala, Ashish Verma, Karthik Visweswariah: Role of language models in spoken fluency evaluation. 2866-2869
- Sibel Yaman, Dilek Hakkani-Tür, Gökhan Tür: Social role discovery from spoken language using dynamic Bayesian networks. 2870-2873
- Michelle Hewlett Sanchez, Gökhan Tür, Luciana Ferrer, Dilek Hakkani-Tür: Domain adaptation and compensation for emotion detection. 2874-2877
- Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan: Phrase alignment confidence for statistical machine translation. 2878-2881
- Ian R. Lane, Alex Waibel: Named-entity projection and data-driven morphological decomposition for field maintainable speech-to-speech translation systems. 2882-2885
Social Signals in Speech (Special Session)
- Paul M. Brunet, Marcela Charfuelan, Roderick Cowie, Marc Schröder, Hastings Donnan, Ellen Douglas-Cowie: Detecting politeness and efficiency in a cooperative social interaction. 2542-2545
- Nick Campbell, Stefan Scherer: Comparing measures of synchrony and alignment in dialogue speech timing with respect to turn-taking activity. 2546-2549
- Emina Kurtic, Guy J. Brown, Bill Wells: Resources for turn competition in overlap in multi-party conversations: speech rate, pausing and duration. 2550-2553
- Khiet P. Truong, Dirk Heylen: Disambiguating the functions of conversational sounds with prosody: the case of 'yeah'. 2554-2557
- Marcela Charfuelan, Marc Schröder, Ingmar Steiner: Prosody and voice quality of vocal social signals: the case of dominance in scenario meetings. 2558-2561
- Daniel Neiberg, Joakim Gustafson: The prosody of Swedish conversational grunts. 2562-2565
Physiology and Pathology of Spoken Language
- Christophe Mertens, Francis Grenez, Lise Crevier-Buchman, Jean Schoentgen: Reliable tracking based on speech sample salience of vocal cycle length perturbations. 2566-2569
- Hideki Kasuya, Hajime Yoshida, Satoshi Ebihara, Hiroki Mori: Longitudinal changes of selected voice source parameters. 2570-2573
- Ali Alpan, Jean Schoentgen, Youri Maryn, Francis Grenez: Automatic perceptual categorization of disordered connected speech. 2574-2577
- Heejin Kim, Panying Rong, Torrey M. Loucks, Mark Hasegawa-Johnson: Kinematic analysis of tongue movement control in spastic dysarthria. 2578-2581
- Irene Jacobi, Lisette van der Molen, Maya van Rossum, Frans J. M. Hilgers: Pre- and short-term posttreatment vocal functioning in patients with advanced head and neck cancer treated with concomitant chemoradiotherapy. 2582-2585
- Joan K. Y. Ma, Rüdiger Hoffmann: Acoustic analysis of intonation in Parkinson's disease. 2586-2589
Speaker Diarization
- Carlos Vaquero, Oriol Vinyals, Gerald Friedland: A hybrid approach to online speaker diarization. 2638-2641
- Simon Bozonnet, Nicholas W. D. Evans, Xavier Anguera, Oriol Vinyals, Gerald Friedland, Corinne Fredouille: System output combination for improved speaker diarization. 2642-2645
- Simon Bozonnet, Nicholas W. D. Evans, Corinne Fredouille, Dong Wang, Raphaël Troncy: An integrated top-down/bottom-up approach to speaker diarization. 2646-2649
- Deepu Vijayasenan, Fabio Valente, Hervé Bourlard: Advances in fast multistream diarization based on the information bottleneck framework. 2650-2653
- Giulia Garau, Alfred Dielmann, Hervé Bourlard: Audio-visual synchronisation for speaker diarisation. 2654-2657
- Kyu Jeong Han, Shrikanth S. Narayanan: An improved cluster model selection method for agglomerative hierarchical speaker clustering using incremental Gaussian mixture models. 2658-2661
- Nigel G. Ward, Olac Fuentes, Alejandro Vega: Dialog prediction for a general model of turn-taking. 2662-2665
- Tobias Herbig, Franz Gerl, Wolfgang Minker: Speaker tracking in an unsupervised speech controlled system. 2666-2669
- Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo: MultiBIC: an improved speaker segmentation technique for TV shows. 2670-2673
Multi-Modal ASR, Including Audio-Visual ASR
- John-Paul Hosom, Tom Jakobs, Allen Baker, Susan Fager: Automatic speech recognition for assistive writing in speech supplemented word prediction. 2674-2677
- Alexey Karpov, Andrey Ronzhin, Konstantin Markov, Milos Zelezný: Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition. 2678-2681
- Louis H. Terry, Karen Livescu, Janet B. Pierrehumbert, Aggelos K. Katsaggelos: Audio-visual anticipatory coarticulation modeling by human and machine. 2682-2685
- Matthias Janke, Michael Wand, Tanja Schultz: Impact of lack of acoustic feedback in EMG-based silent speech recognition. 2686-2689
- Chong-Jia Ni, Wenju Liu, Bo Xu: Using prosody to improve Mandarin automatic speech recognition. 2690-2693
- Satoshi Tamura, Masato Ishikawa, Takashi Hashiba, Shin'ichi Takeuchi, Satoru Hayamizu: A robust audio-visual speech recognition using audio-visual voice activity detection. 2694-2697
- Dorothea Kolossa, Jike Chong, Steffen Zeiler, Kurt Keutzer: Efficient manycore CHMM speech recognition for audiovisual and multistream data. 2698-2701
- Takami Yoshida, Kazuhiro Nakadai: Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots. 2702-2705
- Panikos Heracleous, Norihiro Hagita: Non-audible murmur recognition based on fusion of audio and visual streams. 2706-2709
Speaker and Language Recognition
- Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel: Improved n-gram phonotactic models for language recognition. 2710-2713
- Sirinoot Boonsuk, Donglai Zhu, Bin Ma, Atiwong Suchato, Proadpran Punyabukkana, Nattanun Thatphithakkul, Chai Wutiwiwatchai: A study of term weighting in phonotactic approach to spoken language recognition. 2714-2717
- Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee: Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition. 2718-2721
- David Imseng, Mathew Magimai-Doss, Hervé Bourlard: Hierarchical multilayer perceptron based language identification. 2722-2725
- Alvin F. Martin, Craig S. Greenberg: The NIST 2010 speaker recognition evaluation. 2726-2729
- Shih-Sian Cheng, I-Fan Chen, Hsin-Min Wang: Bayesian speaker recognition using Gaussian mixture model and Laplace approximation. 2730-2733
- Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria Hansson-Sandsten: What else is new than the Hamming window? Robust MFCCs for speaker recognition via multitapering. 2734-2737
- Achintya Kumar Sarkar, Srinivasan Umesh: Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework. 2738-2741
- Zahi N. Karam, William M. Campbell: Graph-embedding for speaker recognition. 2742-2745
- Chang Huai You, Haizhou Li, Kong-Aik Lee: A hybrid modeling strategy for GMM-SVM speaker recognition with adaptive relevance factor. 2746-2749
- Sundar Harshavardhan, Thippur V. Sreenivas: Robust mixture modeling using t-distribution: application to speaker ID. 2750-2753
- Chi-Sang Jung, Kyu Jeong Han, Hyunson Seo, Shrikanth S. Narayanan, Hong-Goo Kang: A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification. 2754-2757
Source Localization and Separation
- Kohei Hayashida, Masanori Morise, Takanobu Nishiura: Near field sound source localization based on cross-power spectrum phase analysis with multiple microphones. 2758-2761
- Jinho Choi, Chang D. Yoo: A maximum a posteriori sound source localization in reverberant and noisy conditions. 2762-2765
- Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto: Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model. 2766-2769
- Duc Thanh Chau, Junfeng Li, Masato Akagi: A DOA estimation algorithm based on equalization-cancellation theory. 2770-2773
- Tania Habib, Harald Romsdorfer: Concurrent speaker localization using multi-band position-pitch (m-popi) algorithm with spectro-temporal pre-processing. 2774-2777
- Ji-Hyun Song, Kyu-Ho Lee, Yun-Sik Park, Sang-Ick Kang, Joon-Hyuk Chang: On using Gaussian mixture model for double-talk detection in acoustic echo suppression. 2778-2781
- Cemil Demir, A. Taylan Cemgil, Murat Saraclar: Catalog-based single-channel speech-music separation. 2782-2785
- Ke Hu, DeLiang Wang: Unvoiced speech segregation based on CASA and spectral subtraction. 2786-2789
- Ke Hu, DeLiang Wang: Unsupervised sequential organization for cochannel speech separation. 2790-2793
INTERSPEECH 2010 Paralinguistic Challenge (Special Session)
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Felix Burkhardt, Laurence Devillers, Christian A. Müller, Shrikanth S. Narayanan: The INTERSPEECH 2010 paralinguistic challenge. 2794-2797
- Florian Lingenfelser, Johannes Wagner, Thurid Vogt, Jonghwa Kim, Elisabeth André: Age and gender classification from speech using decision level fusion and ensemble based techniques. 2798-2801
- Je Hun Jeon, Rui Xia, Yang Liu: Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence. 2802-2805
- Phuoc Nguyen, Trung Le, Dat Tran, Xu Huang, Dharmendra Sharma: Fuzzy support vector machines for age and gender classification. 2806-2809
- Rok Gajsek, Janez Zibert, Tadej Justin, Vitomir Struc, Bostjan Vesnicer, France Mihelic: Gender and affect recognition based on GMM and GMM-UBM modeling with relevance MAP estimation. 2810-2813
- Royi Porat, Dan Lange, Yaniv Zigel: Age recognition based on speech signals using weights supervector. 2814-2817
- Hugo Meinedo, Isabel Trancoso: Age and gender classification using fusion of acoustic and prosodic features. 2818-2821
- Marcel Kockmann, Lukás Burget, Jan Cernocký: Brno University of Technology system for INTERSPEECH 2010 paralinguistic challenge. 2822-2825
- Ming Li, Chi-Sang Jung, Kyu Jeong Han: Combining five acoustic level modeling methods for automatic speaker age and gender recognition. 2826-2829
- Tobias Bocklet, Georg Stemmer, Viktor Zeißler, Elmar Nöth: Age and gender recognition based on multiple systems - early vs. late fusion. 2830-2833
- Michael Feld, Felix Burkhardt, Christian A. Müller: Automatic speaker age and gender recognition in the car for tailoring dialog and mobile services. 2834-2837
Signal Processing for Music and Song
- Kiyoaki Aikawa, Junko Uenuma, Tomoko Akitake: Acoustic correlates of voice quality improvement by voice training. 2886-2889
- Minghui Dong, Paul Y. Chan, Ling Cen, Haizhou Li, Jason Teo, Ping Jen Kua: Phonetic segmentation of singing voice using MIDI and parallel speech. 2890-2893
- Keijiro Saino, Makoto Tachibana, Hideki Kenmochi: A singing style modeling system for singing voice synthesizers. 2894-2897
- Jingzhou Yang, Jia Liu, Weiqiang Zhang: A fast query by humming system based on notes. 2898-2901
- Seokhwan Jo, Sihyun Joo, Chang D. Yoo: Melody pitch estimation based on range estimation and candidate extraction using harmonic structure model. 2902-2905
- Jihoon Park, Kwang-Ki Kim, Jeongil Seo, Minsoo Hahn: Modified spatial audio object coding scheme with harmonic extraction and elimination structure for interactive audio service. 2906-2909
Modeling First Language Acquisition
- Christina Bergmann, Michele Gubian, Lou Boves: Modelling the effect of speaker familiarity and noise on infant word recognition. 2910-2913
- Kouki Miyazawa, Hideaki Kikuchi, Reiko Mazuka: Unsupervised learning of vowels from continuous speech based on self-organized phoneme acquisition model. 2914-2917
- Andrew R. Plummer, Mary E. Beckman, Mikhail Belkin, Eric Fosler-Lussier, Benjamin Munson: Learning speaker normalization using semisupervised manifold alignment. 2918-2921
- Okko Johannes Räsänen: Fully unsupervised word learning from continuous speech using transitional probabilities of atomic acoustic events. 2922-2925
- Louis ten Bosch, Lou Boves: Language acquisition and cross-modal associations: computational simulation of the result of infant studies. 2926-2929
- Maarten Versteegh, Louis ten Bosch, Lou Boves: Active word learning under uncertain input conditions. 2930-2933
Discourse and Dialogue
- Rémi Lavalley, Chloé Clavel, Patrice Bellot, Marc El-Bèze: Combining text categorization and dialog modeling for speaker role identification on call center conversations. 3062-3065
- Akira Nakamura, Satoru Hayamizu: Topic-dependent n-gram models based on optimization of context lengths in LDA. 3066-3069
- Nicolas Obin, Volker Dellwo, Anne Lacheret, Xavier Rodet: Expectations for discourse genre identification: a prosodic study. 3070-3073
- Ramón Granell, Stephen G. Pulman, Carlos D. Martínez-Hinarejos, José-Miguel Benedí: Dialogue act tagging and segmentation with a single perceptron. 3074-3077
- Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa: Improving the readability of class lecture ASR results using a confusion network. 3078-3081
Voice Activity and Turn Detection
- Sang-Kyun Kim, Jae-Hun Choi, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang: Toward detecting voice activity employing soft decision in second-order conditional MAP. 3082-3085
- Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura: Voice activity detection in a regularized reproducing kernel Hilbert space. 3086-3089
- Ji Wu, Xiao-Lei Zhang, Wei Li: A new VAD framework using statistical model and human knowledge based empirical rule. 3090-3093
- Mark C. Huggins, Brett Y. Smolenski, Aaron D. Lawson: Adaptive high accuracy approaches to speech activity detection in noisy and hostile audio environments. 3094-3097
- Prasanta Kumar Ghosh, Andreas Tsiartas, Panayiotis G. Georgiou, Shrikanth S. Narayanan: Robust voice activity detection in stereo recording with crosstalk. 3098-3101
- Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani: Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization. 3102-3105
- Bowon Lee, Debargha Muhkerjee: Spectral entropy-based voice activity detector for videoconferencing systems. 3106-3109
- David Dean, Sridha Sridharan, Robert Vogt, Michael Mason: The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. 3110-3113
- Tao Yu, John H. L. Hansen: A Bayesian approach to voice activity detection using multiple statistical models and discriminative training. 3114-3117
- Houman Ghaemmaghami, Brendan Baker, Robert Vogt, Sridha Sridharan: Noise robust voice activity detection using features extracted from the time-domain autocorrelation function. 3118-3121
- Tasuku Oonishi, Koji Iwano, Sadaoki Furui: VAD-measure-embedded decoder with online model adaptation. 3122-3125
- Shiwen Deng, Jiqing Han: Robust statistical voice activity detection using a likelihood ratio sign test. 3126-3129
- Alexei V. Ivanov, Giuseppe Riccardi: Automatic turn segmentation in spoken conversations. 3130-3133
- Yohei Kawaguchi, Masahito Togami, Yasunari Obuchi: Turn taking-based conversation detection by using DOA estimation. 3134-3137
