


INTERSPEECH 2012: Portland, Oregon, USA
- 13th Annual Conference of the International Speech Communication Association, INTERSPEECH 2012, Portland, Oregon, USA, September 9-13, 2012. ISCA 2012
An Information-Extraction Approach to Speech Analysis and Processing
- Chin-Hui Lee:
An Information-Extraction Approach to Speech Analysis and Processing. 1-5
ASR: Deep Neural Networks I
- Dong Yu, Li Deng, Frank Seide:
Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks. 6-9
- Brian Kingsbury, Tara N. Sainath, Hagen Soltau:
Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization. 10-13
- George Saon, Brian Kingsbury:
Discriminative feature-space transforms using deep neural networks. 14-17
- Zoltán Tüske, Ralf Schlüter, Hermann Ney, Martin Sundermeyer:
Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both? 18-21
- Andrew L. Maas, Quoc V. Le, Tyler M. O'Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng:
Recurrent Neural Networks for Noise Reduction in Robust ASR. 22-25
- Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide:
Pipelined Back-Propagation for Context-Dependent Deep Neural Networks. 26-29
Language Recognition
- Hynek Boril, Abhijeet Sangwan, John H. L. Hansen:
Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations. 30-33
- Craig S. Greenberg, Alvin F. Martin, Mark A. Przybocki:
The 2011 NIST Language Recognition Evaluation. 34-37
- Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Mireia Díez, Germán Bordel, Alberto Abad, David Martínez González, Jesús Antonio Villalba López, Alfonso Ortega, Eduardo Lleida:
The BLZ Submission to the NIST 2011 LRE: Data Collection, System Development and Performance. 38-41
- Luis Fernando D'Haro, Ondrej Glembek, Oldrich Plchot, Pavel Matejka, Mehdi Soufifar, Ricardo de Córdoba, Jan Cernocký:
Phonotactic Language Recognition using i-vectors and Phoneme Posteriogram Counts. 42-45
- Alan McCree, Bengt J. Borgstrom:
Supervector LDA: A New Approach to Reduced-Complexity I-vector Language Recognition. 46-49
- Pavel Matejka, Oldrich Plchot, Mehdi Soufifar, Ondrej Glembek, Luis Fernando D'Haro, Karel Veselý, Frantisek Grézl, Jeff Z. Ma, Spyros Matsoukas, Najim Dehak:
Patrol Team Language Identification System for DARPA RATS P1 Evaluation. 50-53
Communication Disorders and Assistive Technologies
- Fang Hu, Yungang Wu, Wen Xu, Demin Han:
Articulatory Strategies in Obstruent Production in Mandarin Esophageal Speech. 54-57
- Marion Bechet, Fabrice Hirsch, Camille Fauth, Rudolph Sock:
Consonantal space area in Children with a Cleft Palate An acoustic Study. 58-61
- Milton Orlando Sarria-Paja, Tiago H. Falk:
Automated Dysarthria Severity Classification for Improved Objective Intelligibility Assessment of Spastic Dysarthric Speech. 62-65
- Abdellah Kacha, Francis Grenez, Jean Schoentgen:
Assessment of Disordered Voices Using Empirical Mode Decomposition in the Log-Spectral Domain. 66-69
- Anna Katharina Fuchs, Martin Hagmüller:
Learning an Artificial F0-Contour for ALT Speech. 70-73
- Korin Richmond, Steve Renals:
Ultrax: An Animated Midsagittal Vocal Tract Display for Speech Therapy. 74-77
Voice Conversion
- Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, Sin-Horng Chen:
A Study of Mutual Information for GMM-Based Spectral Conversion. 78-81
- Na Li, Yu Qiao:
Bayesian Mixture of Probabilistic Linear Regressions for Voice Conversion. 82-85
- Daniel Erro, Eva Navas, Inma Hernáez:
Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation. 86-89
- Winston S. Percybrooks, Elliot Moore:
A HMM approach to residual estimation for high resolution voice conversion. 90-93
- Tomoki Toda, Takashi Muramatsu, Hideki Banno:
Implementation of Computationally Efficient Real-Time Voice Conversion. 94-97
- Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose:
Effects of Speaker Adaptive Training on Tensor-based Arbitrary Speaker Conversion. 98-101
Speaker Trait Challenge - Part 1
- Björn W. Schuller, Stefan Steidl, Anton Batliner, Elmar Nöth, Alessandro Vinciarelli, Felix Burkhardt, Rob van Son, Felix Weninger, Florian Eyben, Tobias Bocklet, Gelareh Mohammadi, Benjamin Weiss:
The INTERSPEECH 2012 Speaker Trait Challenge. 254-257
- Tim Polzehl, Katrin Schoenenberg, Sebastian Möller, Florian Metze, Gelareh Mohammadi, Alessandro Vinciarelli:
On Speaker-Independent Personality Perception and Prediction from Speech. 258-261
- Kartik Audhkhasi, Angeliki Metallinou, Ming Li, Shrikanth S. Narayanan:
Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network. 262-265
- Clément Chastagnol, Laurence Devillers:
Personality traits detection using a parallelized modified SFFS algorithm. 266-269
- Jouni Pohjalainen, Serdar Kadioglu, Okko Räsänen:
Feature Selection for Speaker Traits. 270-273
- Johannes Wagner, Florian Lingenfelser, Elisabeth André:
A Frame Pruning Approach for Paralinguistic Recognition Tasks. 274-277
- Alexei Ivanov, Xin Chen:
Modulation Spectrum Analysis for Speaker Personality Trait Recognition. 278-281
- Nicholas Cummins, Julien Epps, Jia Min Karen Kua:
A Comparison of Classification Paradigms for Speaker Likeability Determination. 282-285
- Dingchao Lu, Fei Sha:
Predicting Likability of Speakers with Gaussian Processes. 286-289
- Raymond Brueckner, Björn W. Schuller:
Likability Classification - A Not so Deep Neural Network Approach. 290-293
- Dongrui Wu:
Genetic Algorithm Based Feature Selection for Speaker Trait Classification. 294-297
Phonetics and Phonology
- Felix Weninger, Björn W. Schuller:
Discrimination of Linguistic and Non-Linguistic Vocalizations in Spontaneous Speech: Intra- and Inter-Corpus Perspectives. 102-105
- Mathieu Avanzi, Pauline Dubosson, Sandra Schwab, Nicolas Obin:
Accentual Transfer from Swiss-German to French. A Study of "Français Fédéral". 106-109
- Stefanie Jannedy, Melanie Weirich:
Phonology & the Interpretation of Fine Phonetic Detail in Berlin German. 110-113
- Carlos Toshinori Ishi, Chaoran Liu, Hiroshi Ishiguro, Norihiro Hagita:
Evaluation of a formant-based speech-driven lip motion generation. 114-117
- Jeffrey Kallay, Jeffrey J. Holliday:
Using spectral measures to differentiate Mandarin and Korean sibilant fricatives. 118-121
- Hua-Li Jian, Richard Konopka:
EFL Conversational Triads: Foreigner-directed Speech and Hyperarticulation. 122-125
- Iris Chuoying Ouyang, Khalil Iskarous:
Syllable perception depends on tone perception. 126-129
- Masako Fujimoto, Seiya Funatsu, Ichiro Fujimoto:
How consonants, dialect and speech rate affect vowel devoicing? 134-137
Enhancement
- Thomas Fehér, Dietmar Richter, Oliver Jokisch, Rüdiger Hoffmann:
Distance-Dependent Noise Reduction for Two-Channel Microphones. 138-141
- Wei Xue, Wenju Liu:
Direction of Arrival Estimation Based on Subband Weighting for Noisy Conditions. 142-145
- Jorge I. Marin-Hurtado, David V. Anderson:
Binaural Noise Reduction Using Frequency-Warped FIR Filters. 146-149
- Meng Yu, Jack Xin:
Exploring Off Time Nature for Speech Enhancement. 150-153
- Xulei Bao, Jie Zhu:
Model-based Single-Channel Dereverberation in Noisy Acoustical Environments. 154-157
- Majid Mirbagheri, Sahar Akram, Shihab A. Shamma:
An Auditory Inspired Multimodal Framework for Speech Enhancement. 158-161
- Oldooz Hazrati, Jaewook Lee, Philipos C. Loizou:
Binary Mask Estimation for Improved Speech Intelligibility in Reverberant Environments. 162-165
- Petko Nikolov Petkov, W. Bastiaan Kleijn, Gustav Eje Henter:
Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech. 166-169
Language Modeling
- Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney:
Morpheme Level Feature-based Language Models for German LVCSR. 170-173
- Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka:
Tied-State Mixture Language Model for WFST-based Speech Recognition. 174-177
- Tanel Alumäe, Kaarel Kaljurand:
Maximum Entropy Language Model Adaptation for Mobile Speech Input. 178-181
- Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlícek:
Supervised and unsupervised Web-based language model domain adaptation. 182-185
- Yik-Cheung Tam, Paul Vozila:
A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling. 186-189
- Youzheng Wu, Kazuhiko Abe, Paul R. Dixon, Chiori Hori, Hideki Kashioka:
Leveraging Social Annotation for Topic Language Model Adaptation. 190-193
- Martin Sundermeyer, Ralf Schlüter, Hermann Ney:
LSTM Neural Networks for Language Modeling. 194-197
- Puyang Xu, Brian Roark, Sanjeev Khudanpur:
Phrasal Cohort Based Unsupervised Discriminative Language Modeling. 198-201
- Damianos G. Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Tucker Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Daniel M. Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith B. Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley:
Deriving conversation-based features from unlabeled speech for discriminative language modeling. 202-205
- Erinç Dikici, Arda Çelebi, Murat Saraclar:
Performance Comparison of Training Algorithms for Semi-Supervised Discriminative Language Modeling. 206-209
- Kapil Thadani, Fadi Biadsy, Daniel M. Bikel:
On-the-fly Topic Adaptation for YouTube Video Transcription. 210-213
Spoken Language Understanding and Dialog
- Bassam Jabaian, Fabrice Lefèvre, Laurent Besacier:
Portability of Semantic Annotations for Fast Development of Dialogue Corpora. 214-217
- Zoraida Callejas, Ramón López-Cózar:
Optimization of Dialog Strategies using Automatic Dialog Simulation and Statistical Dialog Management Techniques. 218-221
- Hiroaki Sugiyama, Toyomi Meguro, Yasuhiro Minami:
Preference-learning based Inverse Reinforcement Learning for Dialog Control. 222-225
- Raveesh Meena, Gabriel Skantze, Joakim Gustafson:
A Data-driven Approach to Understanding Spoken Route Directions in Human-Robot Dialogue. 226-229
- Kazunori Komatani, Akira Hirano, Mikio Nakano:
Detecting System-directed Utterances using Dialogue-level Features. 230-233
- Joaquin Planells, Lluís F. Hurtado, Emilio Sanchis, Encarna Segarra:
An Online Generated Transducer to Increase Dialog Manager Coverage. 234-237
- Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
A Sequential Bayesian Dialog Agent for Computational Ethnography. 238-241
- Frank Seide, Sean McDirmid:
ClippyScript: A Programming Language for Multi-Domain Dialogue Systems. 242-245
- Klaus-Peter Engelbrecht, Sebastian Möller:
Correlation Between Model-based Approximations of Grounding-related Cognition and User Judgments. 246-249
- Keith Vertanen, Per Ola Kristensson:
Spelling as a Complementary Strategy for Speech Recognition. 2294-2297
ASR: Noise Robustness
- Ken'ichi Kumatani, Bhiksha Raj, Rita Singh, John W. McDonough:
Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition. 298-301
- Felix Weninger, Martin Wöllmer, Björn W. Schuller:
Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise. 302-305
- Liang Lu, K. K. Chin, Arnab Ghoshal, Steve Renals:
Noise Compensation for Subspace Gaussian Mixture Models. 306-309
- Yang Sun, Mathew M. Doss, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves:
Combination of Sparse Classification and Multilayer Perceptron for Noise-robust ASR. 310-313
- Weifeng Li, Hervé Bourlard:
Sub-band based Log-energy and Its Dynamic Range Stretching for Robust In-car Speech Recognition. 314-317
- Mohamed Bouallegue, Driss Matrouf, Georges Linarès, Mickael Rouvier:
Subspace Gaussian Mixture Models Based on Noise Compensation for Speech Recognition. 318-321
Spoken Language Understanding and Dialog II
- Florian Kretzschmar, Sebastian Möller:
"Help Me, I Need More User Tests!" User Simulations as Supportive Tool in the Development Process of Spoken Dialogue Systems. 322-325
- Silke M. Witt:
Caller Response Timing Patterns in Spoken Dialog Systems. 326-329
- Dilek Hakkani-Tür, Gökhan Tür, Larry P. Heck, Ashley Fidler, Asli Celikyilmaz:
A Discriminative Classification-Based Approach to Information State Updates for a Multi-Domain Dialog System. 330-333
- Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tür, Larry P. Heck:
Learning When to Listen: Detecting System-Addressed Speech in Human-Human-Computer Dialog. 334-337
- Gökhan Tür, Minwoo Jeong, Ye-Yi Wang, Dilek Hakkani-Tür, Larry P. Heck:
Exploiting the Semantic Web for Unsupervised Natural Language Semantic Parsing. 338-341
- Andrew Fandrianto, Maxine Eskénazi:
Prosodic Entrainment in an Information-Driven Dialog System. 342-345
Paralinguistics I
- Fabien Ringeval, Mohamed Chetouani, Björn W. Schuller:
Novel Metrics of Speech Rhythm for the Assessment of Emotion. 346-349
- Martin Wöllmer, Florian Eyben, Björn W. Schuller, Gerhard Rigoll:
Temporal and Situational Context Modeling for Improved Dominance Recognition in Meetings. 350-353
- Marc Swerts, Kitty Leuverink, Madelène Munnik, Vera Nijveld:
Audiovisual correlates of basic emotions in blind and sighted people. 354-357
- Houwei Cao, Ragini Verma, Ani Nenkova:
Combining Ranking and Classification to Improve Emotion Recognition in Spontaneous Speech. 358-361
- Zixing Zhang, Björn W. Schuller:
Active Learning by Sparse Instance Tracking and Classifier Confidence in Acoustic Emotion Recognition. 362-365
- Viktor Rozgic, Sankaranarayanan Ananthakrishnan, Shirin Saleem, Rohit Kumar, Aravind Namandi Vembu, Rohit Prasad:
Emotion Recognition using Acoustic and Lexical Features. 366-369
Pitch and Harmonic Analysis
- Phillip L. De Leon, Bryan Stewart, Junichi Yamagishi:
Synthetic Speech Discrimination using Pitch Pattern Statistics Derived from Image Analysis. 370-373
- Zhengqi Wen, Hideki Kawahara, Jianhua Tao:
Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis. 374-377
- Feng Huang, Tan Lee:
Robust Pitch Estimation Using l1-regularized Maximum Likelihood Estimation. 378-381
- Gilles Degottex, Yannis Stylianou:
A full-band adaptive harmonic representation of speech. 382-385
- Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino:
Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation. 386-389
- Kota Yoshizato, Hirokazu Kameoka, Daisuke Saito, Shigeki Sagayama:
Hidden Markov Convolutive Mixture Model for Pitch Contour Analysis of Speech. 390-393
Speaker Trait Challenge - Part 2
- Benjamin Weiss, Felix Burkhardt:
Is 'not bad' good enough? Aspects of unknown voices' likability. 510-513
- Michelle Hewlett Sanchez, Aaron Lawson, Dimitra Vergyri, Harry Bratt:
Multi-System Fusion of Extended Context Prosodic and Cepstral Features for Paralinguistic Speaker Trait Classification. 514-517
- Harm Buisman, Eric O. Postma:
The log-Gabor method: speech classification using spectrogram image analysis. 518-521
- Yazid Attabi, Pierre Dumouchel:
Anchor Models and WCCN Normalization For Speaker Trait Classification. 522-525
- Claude Montacié, Marie-José Caraty:
Pitch and Intonation Contribution to Speakers' Traits Classification. 526-529
- Gopala Krishna Anumanchipalli, Hugo Meinedo, Miguel M. F. Bugalho, Isabel Trancoso, Luís C. Oliveira, Alan W. Black:
Text-dependent pathological voice detection. 530-533
- Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth S. Narayanan:
Intelligibility classification of pathological speech using fusion of multiple high level descriptors. 534-537
- Anthony P. Stark, Alireza Bayestehtashk, Meysam Asgari, Izhak Shafran:
Interspeech Pathology Challenge: Investigations into Speaker and Sentence Specific Effects. 538-541
- Xinhui Zhou, Daniel Garcia-Romero, Nima Mesgarani, Maureen L. Stone, Carol Y. Espy-Wilson, Shihab A. Shamma:
Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations. 542-545
- Dong-Yan Huang, Yongwei Zhu, Dajun Wu, Rongshan Yu:
Detecting Intelligibility by Linear Dimensionality Reduction and Normalized Voice Quality Hierarchical Features. 546-549
Perceptual Learning and Perceptual Cues to Segments and Tones
- Matthias J. Sjerps, James M. McQueen, Holger Mitterer:
Extrinsic normalization for vocal tracts depends on the signal, not on attention. 394-397
- Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki:
Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers. 402-405
- Natthawut Kertkeidkachorn, Surapol Vorapatratorn, Sirinart Tangruamsub, Proadpran Punyabukkana, Atiwong Suchato:
Contribution of Spectral Shapes to Tone Perception. 414-417
- Julien Meyer:
Pitch and phonological perception of tone in the Suruí language of Rondônia (Brazil): identification task of LHL and LHH tonal patterns. 422-425
- Rui Cao, Ratree Wayland, Edith Kaan:
The Role of Creaky Voice in Mandarin Tone 2 and Tone 3 Perception. 426-429
- K. S. Nataraj, Prem C. Pandey:
Detection of Transition Segments in VCV Utterances for Estimation of the Place of Closure of Oral Stops for Speech Training. 406-409
- Odette Scharenborg, Esther Janse, Andrea Weber:
Perceptual Learning of /f/-/s/ by Older Listeners. 398-401
- Cyril Dubois, Rudolph Sock:
Audiovisual discrimination of CV syllables: a simultaneous fMRI-EEG study. 410-413
- Charturong Tantibundhit, Chutamanee Onsuwan, P. Phienphanich, Chai Wutiwiwatchai:
Methodological Issues in Assessing Perceptual Representation of Consonant Sounds in Thai. 418-421
- Michael D. Tyler, Mona Faris:
Can litheners retune native categories acroth a thoneme boundary? 430-433
Speech Synthesis: Prosody
- Eric Morley, Esther Klabbers, Jan P. H. van Santen, Alexander Kain, Seyed Hamidreza Mohammadi:
Synthetic F0 Can Effectively Convey Speaker ID in Delexicalized Speech. 434-437
- Timo Baumann, David Schlangen:
Evaluating Prosodic Processing for Incremental Speech Synthesis. 438-441