INTERSPEECH 2014: Singapore
- Haizhou Li, Helen M. Meng, Bin Ma, Engsiong Chng, Lei Xie:
15th Annual Conference of the International Speech Communication Association, INTERSPEECH 2014, Singapore, September 14-18, 2014. ISCA 2014
Keynote
- Anne Cutler:
Learning about speech. - K. J. Ray Liu:
Decision learning in data science: where John Nash meets social media. - Lori Lamel:
Language diversity: speech processing in a multi-lingual context. - William S.-Y. Wang:
Sound patterns in language. - Li Deng:
Achievements and challenges of deep learning - from speech analysis and recognition to language and multimodal processing.
Deep Neural Networks for Speech Generation and Synthesis (Special Session)
- Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoffrey Zweig, Christopher J. Rossbach, Jon Currey:
An introduction to computational networks and the computational network toolkit (invited talk).
Multi-Lingual ASR
- Yu Zhang, Ekapol Chuangsuwanich, James R. Glass:
Language ID-based training of multilingual stacked bottleneck features. 1-5 - Van Hai Do, Xiong Xiao, Chng Eng Siong, Haizhou Li:
Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR. 6-10 - Ngoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova, Tanja Schultz:
Improving ASR performance on non-native speech using multilingual and crosslingual information. 11-15 - Kate M. Knill, Mark J. F. Gales, Anton Ragni, Shakti P. Rath:
Language independent and unsupervised acoustic models for speech recognition and keyword spotting. 16-20 - Peter Bell, Joris Driesen, Steve Renals:
Cross-lingual adaptation with multi-task adaptive networks. 21-25 - Marzieh Razavi, Mathew Magimai-Doss:
On recognition of non-native speech using probabilistic lexical model. 26-30
Prosody Processing
- Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Direct F0 control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation. 31-35 - Daniel R. van Niekerk, Etienne Barnard:
A target approximation intonation model for Yorùbá TTS. 36-40 - Anandaswarup Vadapalli, Kishore Prahallad:
Learning continuous-valued word representations for phrase break prediction. 41-45 - Hao Che, Jianhua Tao, Ya Li:
Improving Mandarin prosodic boundary prediction with rich syntactic features. 46-50 - Rasmus Dall, Marcus Tomalin, Mirjam Wester, William J. Byrne, Simon King:
Investigating automatic & human filled pause insertion for speech synthesis. 51-55 - Rasmus Dall, Mirjam Wester, Martin Corley:
The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech. 56-60
Speaker Recognition - Applications
- Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sébastien Marcel:
Introducing i-vectors for joint anti-spoofing and speaker verification. 61-65 - Ryan Leary, Walter Andrews:
Random projections for large-scale speaker search. 66-70 - Corinne Fredouille, Delphine Charlet:
Analysis of i-vector framework for speaker identification in TV-shows. 71-75 - Antoine Laurent, Nathalie Camelin, Christian Raymond:
Boosting bonsai trees for efficient features combination: application to speaker role identification. 76-80 - Yves Raimond, Thomas Nixon:
Identifying contributors in the BBC world service archive. 81-85 - Finnian Kelly, Rahim Saeidi, Naomi Harte, David A. van Leeuwen:
Effect of long-term ageing on i-vector speaker verification. 86-90
Phonetics and Phonology 1, 2
- Maarten Versteegh, Amanda Seidl, Alejandrina Cristià:
Acoustic correlates of phonological status. 91-95 - Manu Airaksinen, Paavo Alku:
Parameterization of the glottal source with the phase plane plot. 96-100 - Phil Rose:
Transcribing tone - a likelihood-based quantitative evaluation of Chao's tone letters. 101-105 - Diyana Hamzah, James Sneed German:
Intonational phonology and prosodic hierarchy in Malay. 106-110 - Uwe D. Reichel, Katalin Mády:
Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian. 111-115 - George Christodoulides, Mathieu Avanzi:
An evaluation of machine learning methods for prominence detection in French. 116-119
Open Domain Situated Conversational Interaction (Special Session)
- Aasish Pappu, Alexander I. Rudnicky:
Learning situated knowledge bases through dialog. 120-124 - Teruhisa Misu:
Crowdsourcing for situated dialog systems in a moving car. 125-129 - Ryuichiro Higashinaka, Toyomi Meguro, Kenji Imamura, Hiroaki Sugiyama, Toshiro Makino, Yoshihiro Matsuo:
Evaluating coherence in open domain conversational systems. 130-134 - Frédéric Béchet, Alexis Nasr, Benoît Favre:
Adapting dependency parsing to spontaneous speech for open domain spoken language understanding. 135-139 - Milica Gasic, Dongho Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, Martin Szummer, Blaise Thomson, Steve J. Young:
Incremental on-line adaptation of POMDP-based dialogue managers to extended domains. 140-144 - Jean-Philippe Robichaud, Paul A. Crook, Puyang Xu, Omar Zia Khan, Ruhi Sarikaya:
Hypotheses ranking for robust domain classification and tracking in dialogue systems. 145-149
Speech Production: Models and Acoustics
- Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan:
Motor control primitives arising from a learned dynamical systems model of speech articulation. 150-154 - Chia-Hsin Yeh, Chiung-Yao Wang, Jung-Yueh Tu:
Nonword repetition of Taiwanese disyllabic tonal sequences in adults with language attrition. 155-158 - Andreas Windmann, Juraj Simko, Petra Wagner:
A unified account of prominence effects in an optimization-based model of speech timing. 159-163 - Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan:
Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints. 164-168 - Prasad Sudhakar, Prasanta Kumar Ghosh:
Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognition. 169-173 - Jun Wang, William F. Katz, Thomas F. Campbell:
Contribution of tongue lateral to consonant production. 174-178 - Min Liu, Shuju Shi, Jinsong Zhang:
A preliminary study on acoustic correlates of tone2+tone2 disyllabic word stress in Mandarin. 179-183 - Mohammad Abuoudeh, Olivier Crouzet:
Vowel length impact on locus equation parameters: an investigation on Jordanian Arabic. 184-188 - Philip J. Roberts, Henning Reetz, Aditi Lahiri:
Corpus-testing a fricative discriminator; or, just how invariant is this invariant? 189-192 - Brian O. Bush, Alexander Kain:
Modeling coarticulation in continuous speech. 193-197 - Khalid Daoudi, Blaise Bertrac:
On classification between normal and pathological voices using the MEEI-KayPENTAX database: issues and consequences. 198-202 - Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber:
Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change. 203-207
Extraction of Para-Linguistic Information
- Rahul Gupta, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
Predicting client's inclination towards target behavior change in motivational interviewing and investigating the role of laughter. 208-212 - Bo Xiao, Daniel Bone, Maarten Van Segbroeck, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Modeling therapist empathy through prosody in drug addiction counseling. 213-217 - Daniel Bone, Chi-Chun Lee, Alexandros Potamianos, Shrikanth S. Narayanan:
An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model. 218-222 - Kun Han, Dong Yu, Ivan Tashev:
Speech emotion recognition using deep neural network and extreme learning machine. 223-227 - Khiet P. Truong, Gerben J. Westerhof, Franciska de Jong, Dirk Heylen:
An annotation scheme for sighs in spontaneous dialogue. 228-232 - Lei He, Volker Dellwo:
Speaker idiosyncratic variability of intensity across syllables. 233-237 - Soroosh Mariooryad, Reza Lotfian, Carlos Busso:
Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora. 238-242 - Saeid Safavi, Martin J. Russell, Peter Jancovic:
Identification of age-group from children's speech by computers and humans. 243-247
Spoken Language Understanding
- Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès, Renato De Mori:
Theme identification in human-human conversations with features from specific speaker type hidden spaces. 248-252 - Alex Marin, Roman Holenstein, Ruhi Sarikaya, Mari Ostendorf:
Learning phrase patterns for text classification using a knowledge graph and unlabeled data. 253-257 - Puyang Xu, Ruhi Sarikaya:
Targeted feature dropout for robust slot filling in natural language understanding. 258-262 - Sz-Rung Shiang, Hung-yi Lee, Lin-Shan Lee:
Spoken question answering using tree-structured conditional random fields and two-layer random walk. 263-267 - Ruhi Sarikaya, Asli Celikyilmaz, Anoop Deoras, Minwoo Jeong:
Shrinkage based features for slot tagging with conditional random fields. 268-272 - Yangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang:
Cluster based Chinese abbreviation modeling. 273-277 - Xiantao Zhang, Dongchen Li, Xihong Wu:
Parsing named entity as syntactic structure. 278-282 - Gökhan Tür, Anoop Deoras, Dilek Hakkani-Tür:
Detecting out-of-domain utterances addressed to a virtual personal assistant. 283-287 - Spiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian Walter, Philipp Cimiano, Euripides G. M. Petrakis, Alexandros Potamianos:
Fusion of knowledge-based and data-driven approaches to grammar induction. 288-292 - Denys Katerenchuk, Andrew Rosenberg:
Improving named entity recognition with prosodic features. 293-297 - Suman V. Ravuri, Andreas Stolcke:
Neural network models for lexical addressee detection. 298-302 - Valerie Freeman, Julian Chan, Gina-Anne Levow, Richard A. Wright, Mari Ostendorf, Victoria Zayats:
Manipulating stance and involvement using collaborative tasks: an exploratory comparison. 303-307
Spoken Dialogue Systems
- Fabrizio Ghigi, Maxine Eskénazi, M. Inés Torres, Sungjin Lee:
Incremental dialog processing in a task-oriented dialog. 308-312 - Naoki Hotta, Kazunori Komatani, Satoshi Sato, Mikio Nakano:
Detecting incorrectly-segmented utterances for posteriori restoration of turn-taking and ASR results. 313-317 - Hany Hassan, Lee Schwartz, Dilek Hakkani-Tür, Gökhan Tür:
Segmentation and disfluency removal for conversational speech translation. 318-322 - Shinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji:
Cost-level integration of statistical and rule-based dialog managers. 323-327 - Dongho Kim, Catherine Breslin, Pirros Tsiakoulis, Milica Gasic, Matthew Henderson, Steve J. Young:
Inverse reinforcement learning for micro-turn management. 328-332 - John Kane, Irena Yanushevskaya, Céline De Looze, Brian Vaughan, Ailbhe Ní Chasaide:
Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions. 333-337
DNN Architectures and Robust Recognition
- Hasim Sak, Andrew W. Senior, Françoise Beaufays:
Long short-term memory recurrent neural network architectures for large scale acoustic modeling. 338-342 - George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny:
Unfolded recurrent neural networks for speech recognition. 343-347 - Vikrant Singh Tomar, Richard C. Rose:
Manifold regularized deep neural networks. 348-352 - Bo Li, Khe Chai Sim:
Modeling long temporal contexts for robust DNN-based speech recognition. 353-357 - Feipeng Li, Phani S. Nidadavolu, Hynek Hermansky:
A long, deep and wide artificial neural net for robust speech recognition in unknown noise. 358-362 - Ladislav Seps, Jirí Málek, Petr Cerva, Jan Nouza:
Investigation of deep neural networks for robust recognition of nonlinearly distorted speech. 363-367
Speaker Recognition - Evaluation and Forensics
- Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Alvin F. Martin, Alan McCree, Mark A. Przybocki, Douglas A. Reynolds:
Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge. 368-372 - David A. van Leeuwen, Niko Brümmer:
Constrained speaker linking. 373-377 - Sergey Novoselov, Timur Pekhovsky, Konstantin Simonchik, Andrey Shulipa:
RBM-PLDA subsystem for the NIST i-vector challenge. 378-382 - Stephen H. Shum, Najim Dehak, James R. Glass:
Limited labels for unlimited data: active learning for speaker recognition. 383-387 - Niko Brümmer, Albert Swart:
Bayesian calibration for forensic evidence reporting. 388-392 - Shunichi Ishihara:
Replicate mismatch between test/background and development databases: the impact on the performance of likelihood ratio-based forensic voice comparison. 393-397
Speech Production I, II
- Manu Airaksinen, Tom Bäckström, Paavo Alku:
Automatic estimation of the lip radiation effect in glottal inverse filtering. 398-402 - Marcelo de Oliveira Rosa:
Simulation of 3D larynges with asymmetric distribution of viscoelastic properties in their vocal folds. 403-407 - Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura:
Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods. 408-412 - Jangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth S. Narayanan:
A study of invariant properties and variation patterns in the converter/distributor model for emotional speech. 413-417 - Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer:
A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation. 418-421 - Tokihiko Kaburagi:
Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative model. 422-426
INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval, Erik Marchi, Yue Zhang:
The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. 427-431 - Jouni Pohjalainen, Paavo Alku:
Filtering and subspace selection for spectral features in detecting speech under physical stress. 432-436 - Ming Li:
Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens. 437-441 - Heysem Kaya, Tugçe Özkaptan, Albert Ali Salah, Sadik Fikret Gürgen:
Canonical correlation analysis and local Fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction. 442-446 - How Jing, Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao, Hsin-Min Wang:
Ensemble of machine learning algorithms for cognitive and physical speaker load detection. 447-451 - Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth:
Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks. 452-456
Hearing and Perception
- Nandini Iyer, Eric R. Thompson, Brian D. Simpson, Griffin D. Romigh:
Revisiting the right-ear advantage for speech: implications for speech displays. 457-461 - Louis ten Bosch, Mirjam Ernestus, Lou Boves:
Comparing reaction time sequences from human participants and computational models. 462-466 - Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu:
Detecting the number of competing speakers - human selective hearing versus spectrogram distance based estimator. 467-470 - Guo Li, Gang Peng:
The influence of sensory memory and attention on the context effect in talker normalization. 471-475 - Payton Lin, Fei Chen, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao:
Automatic speech recognition with primarily temporal envelope information. 476-480 - Ying-Hui Lai, Fei Chen, Yu Tsao:
An adaptive envelope compression strategy for speech processing in cochlear implants. 481-484 - Brian S. Helfer, Thomas F. Quatieri, James R. Williamson, Laurel Keyes, Benjamin Evans, W. Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey E. Shenk, Thomas M. Talavage, Jeff Palmer, Kristin Heaton:
Articulatory dynamics and coordination in classifying cognitive change with preclinical mTBI. 485-489 - Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
A hearing impairment simulation method using audiogram-based approximation of auditory characteristics. 490-494 - Dongmei Wang, James M. Kates, John H. L. Hansen:
Investigation of the relative perceptual importance of temporal envelope and temporal fine structure between tonal and non-tonal languages. 495-498 - Daniel Fogerty, Fei Chen:
Vowel spectral contributions to English and Mandarin sentence intelligibility. 499-503 - Vinay Kumar Mittal, B. Yegnanarayana:
Significance of aperiodicity in the pitch perception of expressive voices. 504-508
Cross-Linguistic Studies
- Mirjam Wester, María Luisa García Lecumberri, Martin Cooke:
DIAPIX-FL: a symmetric corpus of problem-solving dialogues in first and second languages. 509-513 - Christophe Coupé, Yoon Mi Oh, François Pellegrino, Egidio Marsico:
Cross-linguistic investigations of oral and silent reading. 514-518 - Juul Coumans, Roeland van Hout, Odette Scharenborg:
Non-native word recognition in noise: the role of word-initial and word-final information. 519-523 - Janice Wing Sze Wong:
The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levels. 524-528 - Pepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik:
Dutch vowel production by Spanish learners: duration and spectral features. 529-533 - Angelos Lengeris, Katerina Nicolaidis:
English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memory. 534-538 - Sylvain Detey, Isabelle Racine, Julien Eychenne, Yuji Kawaguchi:
Corpus-based L2 phonological data and semi-automatic perceptual analysis: the case of nasal vowels produced by beginner Japanese learners of French. 539-543 - Gábor Pintér, Shinobu Mizuguchi, Koichi Tateishi:
Perception of prosodic prominence and boundaries by L1 and L2 speakers of English. 544-547 - Rose Thomas Kalathottukaren, Suzanne C. Purdy, Elaine Ballard:
Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged children. 548-552