


17th Interspeech 2016: San Francisco, CA, USA
- Nelson Morgan:
17th Annual Conference of the International Speech Communication Association, Interspeech 2016, San Francisco, CA, USA, September 8-12, 2016. ISCA 2016
Keynote 1: ISCA Medalist: John Makhoul
- John Makhoul:
A 50-Year Retrospective on Speech and Language Processing. 1
Neural Networks in Speech Recognition
- Ivan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy:
Improving English Conversational Telephone Speech Recognition. 2-6
- George Saon, Tom Sercu, Steven J. Rennie, Hong-Kwang Jeff Kuo:
The IBM 2016 English Conversational Telephone Speech Recognition System. 7-11
- Liang Lu, Steve Renals:
Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition. 12-16
- Dong Yu, Wayne Xiong, Jasha Droppo, Andreas Stolcke, Guoli Ye, Jinyu Li, Geoffrey Zweig:
Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention. 17-21
- Golan Pundak, Tara N. Sainath:
Lower Frame Rate Neural Network Acoustic Models. 22-26
- Gakuto Kurata, Brian Kingsbury:
Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling. 27-31
Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines
- Lei Chen, Gary Feng, Michelle P. Martin-Raugh, Chee Wee Leong, Christopher Kitchen, Su-Youn Yoon, Blair Lehman, Harrison Kell, Chong Min Lee:
Automatic Scoring of Monologue Video Interviews Using Multimodal Cues. 32-36
- Chee Seng Chong, Jeesun Kim, Chris Davis:
The Sound of Disgust: How Facial Expression May Influence Speech Production. 37-41
- Zhaojun Yang, Shrikanth S. Narayanan:
Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions. 42-46
- Attigodu C. Ganesh, Frédéric Berthommier, Jean-Luc Schwartz:
Audiovisual Speech Scene Analysis in the Context of Competing Sources. 47-51
- Najmeh Sadoughi, Carlos Busso:
Head Motion Generation with Synthetic Speech: A Data Driven Approach. 52-56
- Jeesun Kim, Chris Davis:
The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes. 57-61
- Jeesun Kim, Gérard Bailly:
Introduction to Poster Presentation of Part II.
Prosody
- Irene Vogel, Laura Spinu:
The Unit of Speech Encoding: The Case of Romanian. 62-66
- Jeanin Jügler, Frank Zimmerer, Jürgen Trouvain, Bernd Möbius:
The Perceptual Effect of L1 Prosody Transplantation on L2 Speech: The Case of French Accented German. 67-71
- Bijun Ling, Jie Liang:
Organizing Syllables into Sandhi Domains - Evidence from F0 and Duration Patterns in Shanghai Chinese. 72-76
- Neville Ryant, Mark Y. Liberman:
Automatic Analysis of Phonetic Speech Style Dimensions. 77-81
- Angeliki Athanasopoulou, Irene Vogel:
The Acoustic Manifestation of Prominence in Stressless Languages. 82-86
- Wei Lai, Jiahong Yuan, Ya Li, Xiaoying Xu, Mark Y. Liberman:
The Rhythmic Constraint on Prosodic Boundaries in Mandarin Chinese Based on Corpora of Silent Reading and Speech Perception. 87-91
Speech and Language Processing for Clinical Health Applications
- Fu-Sheng Tsai, Ya-Ling Hsu, Wei-Chen Chen, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee:
Toward Development and Evaluation of Pain Level-Rating Scale for Emergency Triage based on Vocal Characteristics and Facial Expressions. 92-96
- Tan Lee, Yuanyuan Liu, Yu Ting Yeung, Thomas K. T. Law, Kathy Y. S. Lee:
Predicting Severity of Voice Disorder from DNN-HMM Acoustic Posteriors. 97-101
- Klaske E. van Sluis, Michiel W. M. van den Brekel, Frans J. M. Hilgers, Rob J. J. H. van Son:
Long-Term Stability of Tracheoesophageal Voices. 102-106
- Gábor Gosztolya, László Tóth, Tamás Grósz, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Magdolna Pákáski, János Kálmán:
Detecting Mild Cognitive Impairment from Spontaneous Speech by Correlation-Based Phonetic Feature Selection. 107-111
- Jen J. Gong, Maryann Gong, Dina Levy-Lambert, Jordan R. Green, Tiffany P. Hogan, John V. Guttag:
Towards an Automated Screening Tool for Developmental Speech and Language Impairments. 112-116
- Vikram C. M., Nagaraj Adiga, S. R. Mahadeva Prasanna:
Spectral Enhancement of Cleft Lip and Palate Speech. 117-121
Speech Coding and Audio Processing for Noise Reduction
- Tian Guan, Guangxing Chu, Fei Chen, Feng Yang:
Assessing Level-Dependent Segmental Contribution to the Intelligibility of Speech Processed by Single-Channel Noise-Suppression Algorithms. 122-125
- Tudor-Catalin Zorila, Sheila Flanagan, Brian C. J. Moore, Yannis Stylianou:
Effectiveness of Near-End Speech Enhancement Under Equal-Loudness and Equal-Level Constraints. 126-130
- Bidisha Sharma, S. R. Mahadeva Prasanna:
Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence. 131-135
- Lei Wang, Shufeng Zhu, Diliang Chen, Yong Feng, Fei Chen:
Relative Contributions of Amplitude and Phase to the Intelligibility Advantage of Ideal Binary Masked Sentences. 136-139
- Qingju Liu, Yan Tang, Philip J. B. Jackson, Wenwu Wang:
Predicting Binaural Speech Intelligibility from Signals Estimated by a Blind Source Separation Algorithm. 140-144
- Petko Nikolov Petkov, Norbert Braunschweiler, Yannis Stylianou:
Automated Pause Insertion for Improved Intelligibility Under Reverberation. 145-149
Speech Analysis
- Jean-Luc Rouas, Leonidas Ioannidis:
Automatic Classification of Phonation Modes in Singing Voice: Towards Singing Style Characterisation and Application to Ethnomusicological Recordings. 150-154
- Himanshu N. Bhavsar, Tanvina B. Patel, Hemant A. Patil:
Novel Nonlinear Prediction Based Features for Spoofed Speech Detection. 155-159
- Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana:
Robust Vowel Landmark Detection Using Epoch-Based Features. 160-164
- Johannes Töger, Yongwan Lim, Sajan Goud Lingala, Shrikanth S. Narayanan, Krishna S. Nayak:
Sensitivity of Quantitative RT-MRI Metrics of Vocal Tract Dynamics to Image Reconstruction Settings. 165-169
- Milos Cernak, Afsaneh Asaei, Pierre-Edouard Honnet, Philip N. Garner, Hervé Bourlard:
Sound Pattern Matching for Automatic Prosodic Event Detection. 170-174
- Mostafa Ali Shahin, Julien Epps, Beena Ahmed:
Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning. 175-179
First and Second Language Acquisition
- Fei Chen, Nan Yan, Xunan Huang, Hao Zhang, Lan Wang, Gang Peng:
Development of Mandarin Onset-Rime Detection in Relation to Age and Pinyin Instruction. 180-184
- Xinyi Wen, Yuan Jia:
Joint Effect of Dialect and Mandarin on English Vowel Production: A Case Study in Changsha EFL Learners. 185-189
- Tamami Katayama:
Effects of L1 Phonotactic Constraints on L2 Word Segmentation Strategies. 190-194
- Jane Wottawa, Martine Adda-Decker, Frédéric Isel:
Putting German [ʃ] and [ç] in Two Different Boxes: Native German vs L2 German of French Learners. 195-199
- Dean Luo, Ruxin Luo, Lixin Wang:
Naturalness Judgement of L2 English Through Dubbing Practice. 200-203
- Yasuaki Shinohara:
Audiovisual Training Effects for Japanese Children Learning English /r/-/l/. 204-207
- Sarah Harper, Louis Goldstein, Shrikanth S. Narayanan:
L2 Acquisition and Production of the English Rhotic Pharyngeal Gesture. 208-212
Speech and Hearing Disorders & Perception
- Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen:
Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary Results. 213-217
- Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik:
Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech. 218-222
- Imed Laaridh, Corinne Fredouille, Christine Meunier:
Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric Speech. 223-227
- Chitralekha Bhat, Bhavik Vachhani, Sunil Kumar Kopparapu:
Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation. 228-232
- Fei Chen, Nan Yan, Xiaojie Pan, Feng Yang, Zhuanzhuan Ji, Lan Wang, Gang Peng:
Impaired Categorical Perception of Mandarin Tones and its Relationship to Language Ability in Autism Spectrum Disorders. 233-237
- Kathleen F. Nagle, James T. Heaton:
Perceived Naturalness of Electrolaryngeal Speech Produced Using sEMG-Controlled vs. Manual Pitch Modulation. 238-242
- Shamima Najnin, Bonny Banerjee, Lisa Lucks Mendel, Masoumeh Heidari Kapourchali, Jayanta Kumar Dutta, Sungmin Lee, Chhayakanta Patro, Monique Pousson:
Identifying Hearing Loss from Learned Speech Kernels. 243-247
- Panying Rong, Yana Yunusova, Jordan R. Green:
Differential Effects of Velopharyngeal Dysfunction on Speech Intelligibility During Early and Late Stages of Amyotrophic Lateral Sclerosis. 248-252
- Véronique Delvaux, Virginie Roland, Kathy Huet, Myriam Piccaluga, Marie-Claire Haelewyck, Bernard Harmegnies:
The Production of Intervocalic Glides in Non Dysarthric Parkinsonian Speech. 253-256
- Yang Feng, Zhang Lu:
Auditory Processing Impairments Under Background Noise in Children with Non-Syndromic Cleft Lip and/or Palate. 257-261
- Zhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki:
Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear Implants. 262-266
- Keiko Ochi, Koichi Mori, Naomi Sakai, Nobutaka Ono:
Automatic Discrimination of Soft Voice Onset Using Acoustic Features of Breathy Voicing. 267-271
- Jing Shao, Caicai Zhang, Gang Peng, Yike Yang, William S.-Y. Wang:
Effect of Noise on Lexical Tone Perception in Cantonese-Speaking Amusics. 272-276
- Yuki Takashima, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki, Nobuyuki Mitani, Kiyohiro Omori, Kaoru Nakazono:
Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss. 277-281
- Yuling Gu, Boon Pang Lim, Nancy F. Chen:
Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin. 282-286
Speech Synthesis Poster
- Feng-Long Xie, Frank K. Soong, Haifeng Li:
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences. 287-291
- Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki:
Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-Embedded Non-Negative Matrix Factorization. 292-296
- Yu Gu, Zhen-Hua Ling, Li-Rong Dai:
Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks. 297-301
- Yi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features. 302-306
- Naoki Hosaka, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance. 307-311
- Sandesh Aryal, Ricardo Gutierrez-Osuna:
Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents. 312-316
- Seyyed Saeed Sarfjoo, Cenk Demiroglu:
Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data. 317-321
- Lifa Sun, Hao Wang, Shiyin Kang, Kun Li, Helen M. Meng:
Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams. 322-326
- Anusha Prakash, Jeena J. Prakash, Hema A. Murthy:
Acoustic Analysis of Syllables Across Indian Languages. 327-331
- Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, Carsten Isert:
Objective Evaluation Methods for Chinese Text-To-Speech Systems. 332-336
- Yusuke Ijima, Taichi Asami, Hideyuki Mizuno:
Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis. 337-341
- Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda:
A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks. 342-346
- Monika Podsiadlo, Shweta Chahar:
Text-to-Speech for Individuals with Vision Loss: A User Study. 347-351
- Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi:
Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks. 352-356
- Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg:
Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis. 357-361
Topics in Speech Processing
- Fei Tao, Louis Daudet, Christian Poellabauer, Sandra L. Schneider, Carlos Busso:
A Portable Automatic PA-TA-KA Syllable Detection System to Derive Biomarkers for Neurological Disorders. 362-366
- Omid Ghahabi, Antonio Bonafonte, Javier Hernando, Asunción Moreno:
Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars. 367-371
- Abraham Woubie, Jordi Luque, Javier Hernando:
Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features. 372-376
Show & Tell Session 1
- Aaron Lawson, Mitchell McLaren, Harry Bratt, Martin Graciarena, Horacio Franco, Christopher George, Allen R. Stauffer, Chris Bartels, Julien van Hout:
Open Language Interface for Voice Exploitation (OLIVE). 377-378
- Lubos Smídl, Adam Chýlek, Jan Svec:
A Multimodal Dialogue System for Air Traffic Control Trainees Based on Discrete-Event Simulation. 379-380
- Elodie Gauthier, David Blachon, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, Annie Rialland, Gilles Adda, Grégoire Bachman:
Lig-Aikuma: A Mobile App to Collect Parallel Speech for Under-Resourced Language Studies. 381-382
- Martin Gruber, Jindrich Matousek, Zdenek Hanzlícek, Zdenek Krnoul, Zbynek Zajíc:
ARET - Automatic Reading of Educational Texts for Visually Impaired Students. 383-384
New Trends in Neural Networks for Speech Recognition
- Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals:
Segmental Recurrent Neural Networks for End-to-End Speech Recognition. 385-389
- Markus Nußbaum-Thom, Jia Cui, Bhuvana Ramabhadran, Vaibhava Goel:
Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units. 390-394
- Wei-Ning Hsu, Yu Zhang, Ann Lee, James R. Glass:
Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition. 395-399
- Chunyang Wu, Penny Karanasou, Mark J. F. Gales, Khe Chai Sim:
Stimulated Deep Neural Network for Speech Recognition. 400-404
- Leonardo Badino:
Phonetic Context Embeddings for DNN-HMM Phone Recognition. 405-409
- Ying Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, Aaron C. Courville:
Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks. 410-414
Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances
- Guangsen Wang, Kong-Aik Lee, Trung Hieu Nguyen, Hanwu Sun, Bin Ma:
Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker. 415-419
- Md. Jahangir Alam, Patrick Kenny, Vishwa Gupta:
Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus. 420-424
- Achintya Kumar Sarkar, Zheng-Hua Tan:
Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM. 425-429
- Tomi Kinnunen, Md. Sahidullah, Ivan Kukanov, Héctor Delgado, Massimiliano Todisco, Achintya Kumar Sarkar, Nicolai Bæk Thomsen, Ville Hautamäki, Nicholas W. D. Evans, Zheng-Hua Tan:
Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus. 430-434
- Jianbo Ma, Saad Irtza, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah:
Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification. 435-439
- Hossein Zeinali, Hossein Sameti, Lukás Burget, Jan Cernocký, Nooshin Maghsoodi, Pavel Matejka:
i-Vector/HMM Based Text-Dependent Speaker Verification System for RedDots Challenge. 440-444
- Rohan Kumar Das, Sarfaraz Jelil, S. R. Mahadeva Prasanna:
Exploring Session Variability and Template Aging in Speaker Verification for Fixed Phrase Short Utterances. 445-449
Articulatory Measurements and Analysis
- Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Prediction of the Articulatory Movements of Unseen Phonemes of a Speaker Using the Speech Structure of Another Speaker. 450-454
- Ganesh Sivaraman, Vikramjit Mitra, Hosung Nam, Mark K. Tiede, Carol Y. Espy-Wilson:
Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion. 455-459
- Adam C. Lammert, Christine H. Shadle, Shrikanth S. Narayanan, Thomas F. Quatieri:
Investigation of Speed-Accuracy Tradeoffs in Speech Production Using Real-Time Magnetic Resonance Imaging. 460-464
- Tanner Sorensen, Asterios Toutios, Louis Goldstein, Shrikanth S. Narayanan:
Characterizing Vocal Tract Dynamics Across Speakers Using Real-Time MRI. 465-469
- Mathieu Labrunie, Pierre Badin, Dirk Voit, Arun A. Joseph, Laurent Lamalle, Coriandre Vilain, Louis-Jean Boë, Jens Frahm:
Tracking Contours of Orofacial Articulators from Real-Time MRI of Speech. 470-474
- Sajan Goud Lingala, Asterios Toutios, Johannes Töger, Yongwan Lim, Yinghua Zhu, Yoon-Chul Kim, Colin Vaz, Shrikanth S. Narayanan, Krishna S. Nayak:
State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and Function. 475-479
Automatic Assessment of Emotions
- Rui Xia, Yang Liu:
DBN-ivector Framework for Acoustic Emotion Recognition. 480-484
- Brian Stasak, Julien Epps, Nicholas Cummins, Roland Goecke:
An Investigation of Emotional Speech in Depression Classification. 485-489
- Reza Lotfian, Carlos Busso:
Retrieving Categorical Emotions Using a Probabilistic Framework to Define Preference Learning Samples. 490-494
- Maximilian Schmitt, Fabien Ringeval, Björn W. Schuller:
At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech. 495-499
- Arodami Chorianopoulou, Polychronis Koutsakis, Alexandros Potamianos:
Speech Emotion Recognition Using Affective Saliency. 500-504
- Rahul Gupta, Nishant Nath, Taruna Agrawal, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan:
Laughter Valence Prediction in Motivational Interviewing Based on Lexical and Acoustic Cues. 505-509
Acoustic and Articulatory Phonetics
- Marcin Wlodarczak, Mattias Heldner:
Respiratory Belts and Whistles: A Preliminary Study of Breathing Acoustics for Turn-Taking. 510-514
- Constantijn Kaland, Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti:
/r/ as Language Marker in Bilingual Speech Production and Perception. 515-519
- Manfred Pützer, Frank Zimmerer, Wolfgang Wokurek, Jeanin Jügler:
Evaluation of Phonatory Behavior of German and French Speakers in Native and Non-Native Speech. 520-524
- Sofia Strömbergsson:
Today's Most Frequently Used F0 Estimation Methods, and Their Accuracy in Estimating Male and Female Pitch in Clean Speech. 525-529
- Lei He, Volker Dellwo:
A Praat-Based Algorithm to Extract the Amplitude Envelope and Temporal Fine Structure Using the Hilbert Transform. 530-534
- Ewald Enzinger:
Likelihood Ratio Calculation in Acoustic-Phonetic Forensic Voice Comparison: Comparison of Three Statistical Modelling Approaches. 535-539
Source Separation and Spatial Audio
- Xiaoke Qi, Jianhua Tao:
A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions. 540-544
- Yusuf Ziya Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey:
Single-Channel Multi-Speaker Separation Using Deep Clustering. 545-549
- Hao Li, Shuai Nie, Xueliang Zhang, Hui Zhang:
Jointly Optimizing Activation Coefficients of Convolutive NMF Using DNN for Speech Separation. 550-554
- Masood Delfarah, DeLiang Wang:
A Feature Study for Masking-Based Reverberant Speech Separation. 555-559
- Chung-Chien Hsu, Tai-Shih Chi, Jen-Tzung Chien:
Discriminative Layered Nonnegative Matrix Factorization for Speech Separation. 560-564
- Arpita Gang, Pravesh Biyani:
On Discriminative Framework for Single Channel Audio Source Separation. 565-569