


18th Interspeech 2017: Stockholm, Sweden
- Francisco Lacerda:
18th Annual Conference of the International Speech Communication Association, Interspeech 2017, Stockholm, Sweden, August 20-24, 2017. ISCA 2017
ISCA Medal 2017 Ceremony
- Haizhou Li:
ISCA Medal for Scientific Achievement. 1
Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1
- Tomi Kinnunen, Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Nicholas W. D. Evans, Junichi Yamagishi, Kong-Aik Lee:
The ASVspoof 2017 Challenge: Assessing the Limits of Replay Spoofing Attack Detection. 2-6
- Roberto Font, Juan M. Espín, María José Cano:
Experimental Analysis of Features for Replay Attack Detection - Results on the ASVspoof 2017 Challenge. 7-11
- Hemant A. Patil, Madhu R. Kamble, Tanvina B. Patel, Meet H. Soni:
Novel Variable Length Teager Energy Separation Based Instantaneous Frequency Features for Replay Detection. 12-16
- Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li:
Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion. 17-21
- Sarfaraz Jelil, Rohan Kumar Das, S. R. Mahadeva Prasanna, Rohit Sinha:
Spoof Detection Using Source, Instantaneous Frequency and Cepstral Features. 22-26
- Marcin Witkowski, Stanislaw Kacprzak, Piotr Zelasko, Konrad Kowalczyk, Jakub Galka:
Audio Replay Attack Detection Using High-Frequency Features. 27-31
- Xianliang Wang, Yanhong Xiao, Xuan Zhu:
Feature Selection Based on CQCCs for Automatic Speaker Verification Spoofing. 32-36
Special Session: Speech Technology for Code-Switching in Multilingual Communities
- Emre Yilmaz, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, Henk van den Heuvel, David A. van Leeuwen:
Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech. 37-41
- Emre Yilmaz, Henk van den Heuvel, David A. van Leeuwen:
Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection. 42-46
- Vikram Ramanarayanan, David Suendermann-Oeft:
Jee haan, I'd like both, por favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog. 47-51
- Sai Krishna Rallabandi, Alan W. Black:
On Building Mixed Lingual Speech Synthesis Systems. 52-56
- Khyathi Raghavi Chandu, Sai Krishna Rallabandi, Sunayana Sitaram, Alan W. Black:
Speech Synthesis for Mixed-Language Navigation Instructions. 57-61
- Djegdjiga Amazouz, Martine Adda-Decker, Lori Lamel:
Addressing Code-Switching in French/Algerian Arabic Speech. 62-66
- Gualberto A. Guzmán, Joseph Ricard, Jacqueline Serigos, Barbara E. Bullock, Almeida Jacqueline Toribio:
Metrics for Modeling Code-Switching Across Corpora. 67-71
- Ewald van der Westhuizen, Thomas Niesler:
Synthesising isiZulu-English Code-Switch Bigrams Using Word Embeddings. 72-76
- Victor Soto, Julia Hirschberg:
Crowdsourcing Universal Part-of-Speech Tags for Code-Switching. 77-81
Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2
- Galina Lavrentyeva, Sergey Novoselov, Egor Malykh, Alexander Kozlov, Oleg Kudashev, Vadim Shchemelinin:
Audio Replay Attack Detection with Deep Learning Frameworks. 82-86
- Zhe Ji, Zhi-Yi Li, Peng Li, MaoBo An, Shengxiang Gao, Dan Wu, Faru Zhao:
Ensemble Learning for Countermeasure of Audio Replay Spoofing Attack in ASVspoof2017. 87-91
- Lantian Li, Yixiang Chen, Dong Wang, Thomas Fang Zheng:
A Study on Replay Attack and Anti-Spoofing for Automatic Speaker Verification. 92-96
- Parav Nagarsheth, Elie Khoury, Kailash Patil, Matt Garland:
Replay Attack Detection Using DNN for Channel Discrimination. 97-101
- Zhuxin Chen, Zhifeng Xie, Weibin Zhang, Xiangmin Xu:
ResNet and Model Fusion for Automatic Spoofing Detection. 102-106
- K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala:
SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017. 107-111
Conversational Telephone Speech Recognition
- William Hartmann, Roger Hsiao, Tim Ng, Jeff Z. Ma, Francis Keith, Man-Hung Siu:
Improved Single System Conversational Telephone Speech Recognition with VGG Bottleneck Features. 112-116
- Jeremy Heng Meng Wong, Mark J. F. Gales:
Student-Teacher Training with Diverse Decision Tree Ensembles. 117-121
- Xiaodong Cui, Vaibhava Goel, George Saon:
Embedding-Based Speaker Adaptive Training of Deep Neural Networks. 122-126
- Jeff Z. Ma, Francis Keith, Tim Ng, Man-Hung Siu, Owen Kimball:
Improving Deliverable Speech-to-Text Systems with Multilingual Knowledge Transfer. 127-131
- George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall:
English Conversational Telephone Speech Recognition by Humans and Machines. 132-136
- Andreas Stolcke, Jasha Droppo:
Comparing Human and Machine Errors in Conversational Speech Transcription. 137-141
Multimodal Paralinguistics
- Volha Petukhova, Manoj Raju, Harry Bunt:
Multimodal Markers of Persuasive Speech: Designing a Virtual Debate Coach. 142-146
- Daniel Bone, Julia Mertens, Emily Zane, Sungbok Lee, Shrikanth S. Narayanan, Ruth B. Grossman:
Acoustic-Prosodic and Physiological Response to Stressful Interactions in Children with Autism Spectrum Disorder. 147-151
- Alec Burmania, Carlos Busso:
A Stepwise Analysis of Aggregated Crowdsourced Labels Describing Multimodal Emotional Behaviors. 152-156
- Gaurav Fotedar, Prasanta Kumar Ghosh:
An Information Theoretic Analysis of the Temporal Synchrony Between Head Gestures and Prosodic Patterns in Spontaneous Speech. 157-161
- Dong-Yan Huang, Wan Ding, Mingyu Xu, Huaiping Ming, Minghui Dong, Xinguo Yu, Haizhou Li:
Multimodal Prediction of Affective Dimensions via Fusing Multiple Regression Techniques. 162-165
- Marion Dohen, Benjamin Roustan:
Co-Production of Speech and Pointing Gestures in Clear and Perturbed Interactive Tasks: Multimodal Designation Strategies. 166-170
Dereverberation, Echo Cancellation and Speech
- Peter Guzewich, Stephen A. Zahorian:
Improving Speaker Verification for Reverberant Conditions with Deep Neural Network Dereverberation Processing. 171-175
- Philipp Bulling, Klaus Linhard, Arthur Wolf, Gerhard Schmidt:
Stepsize Control for Acoustic Feedback Cancellation Based on the Detection of Reverberant Signal Periods and the Estimated System Distance. 176-180
- Jan Franzen, Tim Fingscheidt:
A Delay-Flexible Stereo Acoustic Echo Cancellation for DFT-Based In-Car Communication (ICC) Systems. 181-185
- Dongmei Wang, John H. L. Hansen:
Speech Enhancement Based on Harmonic Estimation Combined with MMSE to Improve Speech Intelligibility for Cochlear Implant Recipients. 186-190
- David Ayllón, Roberto Gil-Pita, Manuel Rosa-Zurera:
Improving Speech Intelligibility in Binaural Hearing Aids by Estimating a Time-Frequency Mask with a Weighted Least Squares Classifier. 191-195
- Tsung-Chen Wu, Tai-Shih Chi, Chia-Fone Lee:
Simulations of High-Frequency Vocoder on Mandarin Speech Recognition for Acoustic Hearing Preserved Cochlear Implant. 196-200
Acoustic and Articulatory Phonetics
- Zainab Hermes, Marissa S. Barlaz, Ryan Shosted, Zhi-Pei Liang, Bradley P. Sutton:
Phonetic Correlates of Pharyngeal and Pharyngealized Consonants in Saudi, Lebanese, and Jordanian Arabic: An rt-MRI Study. 201-205
- Benjamin Elie, Yves Laprie:
Glottal Opening and Strategies of Production of Fricatives. 206-209
- Mohamed Yassine Frej, Christopher Carignan, Catherine T. Best:
Acoustics and Articulation of Medial versus Final Coronal Stop Gemination Contrasts in Moroccan Arabic. 210-214
- Giuseppina Turco, Karim Shoul, Rachid Ridouane:
How are Four-Level Length Distinctions Produced? Evidence from Moroccan Arabic. 215-218
- Caroline Jones, Katherine Demuth, Weicong Li, Andre Almeida:
Vowels in the Barunga Variety of North Australian Kriol. 219-223
- Indranil Dutta, Irfan S., Pamir Gogoi, Priyankoo Sarmah:
Nature of Contrast and Coarticulation: Evidence from Mizo Tones and Assamese Vowel Harmony. 224-228
Multimodal and Articulatory Synthesis
- João Paulo Cabral, Benjamin R. Cowan, Katja Zibrek, Rachel McDonnell:
The Influence of Synthetic Voice on the Evaluation of a Virtual Character. 229-233
- Amelia Jane Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda:
Articulatory Text-to-Speech Synthesis Using the Digital Waveguide Mesh Driven by a Deep Neural Network. 234-238
- Sébastien Le Maguer, Ingmar Steiner, Alexander Hewer:
An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis. 239-243
- Rachel Alexander, Tanner Sorensen, Asterios Toutios, Shrikanth S. Narayanan:
VCV Synthesis Using Task Dynamics to Animate a Factor-Based Articulatory Model. 244-248
- Joseph Mendelson, Matthew P. Aylett:
Beyond the Listening Test: An Interactive Approach to TTS Evaluation. 249-253
- Beiming Cao, Myung Jong Kim, Jan P. H. van Santen, Ted Mau, Jun Wang:
Integrating Articulatory Information in Deep Learning-Based Text-to-Speech Synthesis. 254-258
Neural Networks for Language Modeling
- Min Ma, Michael Nirschl, Fadi Biadsy, Shankar Kumar:
Approaches for Neural-Network Language Model Adaptation. 259-263
- Youssef Oualil, Dietrich Klakow:
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models. 264-268
- Xie Chen, Anton Ragni, Xunying Liu, Mark J. F. Gales:
Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition. 269-273
- Yinghui Huang, Abhinav Sethy, Bhuvana Ramabhadran:
Fast Neural Network Language Model Lookups at N-Gram Speeds. 274-278
- Gakuto Kurata, Abhinav Sethy, Bhuvana Ramabhadran, George Saon:
Empirical Exploration of Novel Architectures and Objectives for Language Models. 279-283
- Karel Benes, Murali Karthick Baskar, Lukás Burget:
Residual Memory Networks in Language Modeling: Improving the Reputation of Feed-Forward Networks. 284-288
Pathological Speech and Language
- Amir Hossein Poorjam, Jesper Rindom Jensen, Max A. Little, Mads Græsbøll Christensen:
Dominant Distortion Classification for Pre-Processing of Vowels in Remote Biomedical Voice Analysis. 289-293
- Duc Le, Keli Licata, Emily Mower Provost:
Automatic Paraphasia Detection from Aphasic Speech: A Preliminary Study. 294-298
- Nicanor García, Juan Rafael Orozco-Arroyave, Luis Fernando D'Haro, Najim Dehak, Elmar Nöth:
Evaluation of the Neurological State of People with Parkinson's Disease Using i-Vectors. 299-303
- Yu-Ren Chien, Michal Borský, Jón Guðnason:
Objective Severity Assessment from Disordered Voice Using Estimated Glottal Airflow. 304-308
- Florian B. Pokorny, Björn W. Schuller, Peter B. Marschik, Raymond Brueckner, Pär Nyström, Nicholas Cummins, Sven Bölte, Christa Einspieler, Terje Falck-Ytter:
Earlier Identification of Children with Autism Spectrum Disorder: An Automatic Vocalisation-Based Approach. 309-313
- Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson's Disease. 314-318
Speech Analysis and Representation 1
- Linxue Bai, Peter Jancovic, Martin J. Russell, Philip Weber, Stephen M. Houghton:
Phone Classification Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs. 319-323
- Siyuan Chen, Julien Epps, Eliathamby Ambikairajah, Phu Ngoc Le:
An Investigation of Crowd Speech for Room Occupancy Estimation. 324-328
- Karthika Vijayan, Jitendra Kumar Dhiman, Chandra Sekhar Seelamantula:
Time-Frequency Coherence for Periodic-Aperiodic Decomposition of Speech Signals. 329-333
- Alexsandro R. Meireles, Antônio R. M. Simões, Antonio Celso Ribeiro, Beatriz Raposo de Medeiros:
Musical Speech: A New Methodology for Transcribing Speech Prosody. 334-338
- K. S. Nataraj, Prem C. Pandey, Hirak Dasgupta:
Estimation of Place of Articulation of Fricatives from Spectral Characteristics for Speech Training. 339-343
- Tom Bäckström:
Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source. 344-348
Perception of Dialects and L2
- Sucheta Ghosh, Camille Fauth, Yves Laprie, Aghilas Sini:
End-to-End Acoustic Feedback in Language Learning for Correcting Devoiced French Final-Fricatives. 349-353
- Ewa Jacewicz, Robert Allen Fox:
Dialect Perception by Older Children. 354-358
- Kiyoko Yoneyama, Mafuyu Kitahara, Keiichi Tajima:
Perception of Non-Contrastive Variations in American English by Japanese Learners: Flaps are Less Favored Than Stops. 359-363
- Lieke van Maastricht, Tim Zee, Emiel Krahmer, Marc Swerts:
L1 Perceptions of L2 Prosody: The Interplay Between Intonation, Rhythm, and Speech Rate and Their Contribution to Accentedness and Comprehensibility. 364-368
- Izumi Takiguchi:
Effects of Pitch Fall and L1 on Vowel Length Identification in L2 Japanese. 369-373
- Yuanyuan Zhang, Hongwei Ding:
A Preliminary Study of Prosodic Disambiguation by Chinese EFL Learners. 374-378
Far-field Speech Recognition
- Chanwoo Kim, Ananya Misra, Kean K. Chin, Thad Hughes, Arun Narayanan, Tara N. Sainath, Michiel Bacchiani:
Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home. 379-383
- Keisuke Kinoshita, Marc Delcroix, Haeyong Kwon, Takuma Mori, Tomohiro Nakatani:
Neural Network-Based Spectrum Estimation for Online WPE Dereverberation. 384-388
- Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Steven J. Rennie:
Factorial Modeling for Effective Suppression of Directional Noise. 389-393
- Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Chin-Hui Lee:
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones. 394-398
- Bo Li, Tara N. Sainath, Arun Narayanan, Joe Caroselli, Michiel Bacchiani, Ananya Misra, Izhak Shafran, Hasim Sak, Golan Pundak, Kean K. Chin, Khe Chai Sim, Ron J. Weiss, Kevin W. Wilson, Ehsan Variani, Chanwoo Kim, Olivier Siohan, Mitchel Weintraub, Erik McDermott, Richard Rose, Matt Shannon:
Acoustic Modeling for Google Home. 399-403
- Seyedmahdad Mirsamadi, John H. L. Hansen:
On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition. 404-408
Speech Analysis and Representation 2
- Masanori Morise, Genta Miyashita, Kenji Ozawa:
Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System. 409-413
- Erfan Loweimi, Jon Barker, Oscar Saz-Torralba, Thomas Hain:
Robust Source-Filter Separation of Speech Signal in the Phase Domain. 414-418
- Simon Stone, Peter Steiner, Peter Birkholz:
A Time-Warping Pitch Tracking Algorithm Considering Fast f0 Changes. 419-423
- Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda:
A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation. 424-428
- Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan:
Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments. 429-433
- Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh:
Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis. 434-438
- Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-Weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao:
Wavelet Speech Enhancement Based on Robust Principal Component Analysis. 439-443
- Bidisha Sharma, S. R. Mahadeva Prasanna:
Vowel Onset Point Detection Using Sonority Information. 444-448
- Unto K. Laine:
Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies. 449-453
- Christian Kroos, Mark D. Plumbley:
Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks. 454-458
Speech and Audio Segmentation and Classification 2
- Jia Dai, Wei Xue, Wenju Liu:
Multilingual i-Vector Based Statistical Modeling for Music Genre Classification. 459-463
- Banriskhem K. Khonglah, K. T. Deepak, S. R. Mahadeva Prasanna:
Indoor/Outdoor Audio Classification Using Foreground Speech Segmentation. 464-468
- Jinxi Guo, Ning Xu, Li-Jia Li, Abeer Alwan:
Attention Based CLDNNs for Short-Duration Acoustic Scene Classification. 469-473
- Xianjun Xia, Roberto Togneri, Ferdous Ahmed Sohel, David Huang:
Frame-Wise Dynamic Threshold Based Polyphonic Acoustic Event Detection. 474-478
- Inseon Jang, Chunghyun Ahn, Jeongil Seo, Younseon Jang:
Enhanced Feature Extraction for Speech Detection in Media Audio. 479-483
- Sukanya Sonowal, Tushar Sandhan, In Kyu Choi, Nam Soo Kim:
Audio Classification Using Class-Specific Learned Descriptors. 484-487
- Janek Ebbers, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach, Bhiksha Raj:
Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery. 488-492
- Matthias Zöhrer, Franz Pernkopf:
Virtual Adversarial Training and Data Augmentation for Acoustic Event Detection with Gated Recurrent Neural Networks. 493-497
- Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger:
Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. 498-502
- G. Nisha Meenakshi, Prasanta Kumar Ghosh:
A Robust Voiced/Unvoiced Phoneme Classification from Whispered Speech Using the 'Color' of Whispered Phonemes and Deep Neural Network. 503-507
Search, Computational Strategies and Language Modeling
- Ian Williams, Petar S. Aleksic:
Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition. 508-512
- Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel:
Comparison of Decoding Strategies for CTC Acoustic Models. 513-517
- Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur:
Phone Duration Modeling for LVCSR Using Neural Networks. 518-522
- Jan Chorowski, Navdeep Jaitly:
Towards Better Decoding and Language Model Integration in Sequence to Sequence Models. 523-527
- Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu:
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling. 528-532
- Xu Xiang, Yanmin Qian, Kai Yu:
Binary Deep Neural Networks for Speech Recognition. 533-537
- Akshay Chandrashekaran, Ian R. Lane:
Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization. 538-542
- Shohei Toyama, Daisuke Saito, Nobuaki Minematsu:
Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition. 543-547
- Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas C. Raykar, Lili Kotlerman, Guy Lev:
Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks. 548-552
- Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow:
Estimation of Gap Between Current Language Models and Human Performance. 553-557
- Anna Moró, György Szaszák:
A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery. 558-562
Speech Perception
- Lei Wang, Fei Chen:
Factors Affecting the Intelligibility of Low-Pass Filtered Speech. 563-566
- Shiyu Wang, Fei Chen:
Phonetic Restoration of Temporally Reversed Speech. 567-570
- Mako Ishida:
Simultaneous Articulatory and Acoustic Distortion in L1 and L2 Listening: Locally Time-Reversed "Fast" Speech. 571-575
- L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler:
Lexically Guided Perceptual Learning in Mandarin Chinese. 576-580
- Chris Davis, Chee Seng Chong, Jeesun Kim:
The Effect of Spectral Profile on the Intelligibility of Emotional Speech in Noise. 581-585
- Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker:
Whether Long-Term Tracking of Speech Rate Affects Perception Depends on Who is Talking. 586-590
- Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto:
Emotional Thin-Slicing: A Proposal for a Short- and Long-Term Division of Emotional Speech. 591-595
- Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux:
Predicting Epenthetic Vowel Quality from Acoustics. 596-600
- Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson:
The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds. 601-605
- Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Eje Henter, Junichi Yamagishi:
Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car. 606-610
- Oliver Niebuhr, Jana Winkler:
The Relative Cueing Power of F0 and Duration in German Prominence Perception. 611-615
- Luciana Marques, Rebecca Scarborough:
Perception and Acoustics of Vowel Nasality in Brazilian Portuguese. 616-620
- Jonny Kim, Katie Drager:
Sociophonetic Realizations Guide Subsequent Lexical Access. 621-625
Speech Production and Perception
- Samuel Silva, António J. S. Teixeira:
Critical Articulators Identification from RT-MRI of the Vocal Tract. 626-630
- Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan:
Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images. 631-635
- Sasan Asadiabadi, Engin Erzin:
Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors. 636-640
- T. V. Ananthapadmanabha, A. G. Ramakrishnan, Shubham Sharma:
An Objective Critical Distance Measure Based on the Relative Level of Spectral Valley. 641-644
- Tanner Sorensen, Zisis Iason Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam C. Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan:
Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science. 645-649
- Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang:
The Influence on Realization and Perception of Lexical Tones from Affricate's Aspiration. 650-654
- Matthias K. Franken, Frank Eisner, Jan-Mathijs Schoffelen, Daniel J. Acheson, Peter Hagoort, James M. McQueen:
Audiovisual Recalibration of Vowel Categories. 655-658
- Judith Peters, Marieke Hoetjes:
The Effect of Gesture on Persuasive Speech. 659-663
- Wei Lai:
Auditory-Visual Integration of Talker Gender in Cantonese Tone Perception. 664-668
- Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent L. Gracco:
Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception. 669-673
- Lena F. Renner, Marcin Wlodarczak:
When a Dog is a Cat and How it Changes Your Pupil Size: Pupil Dilation in Response to Information Mismatch. 674-678
- Win Thuzar Kyaw, Yoshinori Sagisaka:
Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations. 679-683
- Daryush D. Mehta, Patrick C. Chwalek, Thomas F. Quatieri, Laura J. Brattain:
Wireless Neck-Surface Accelerometer and Microphone on Flex Circuit with Application to Noise-Robust Monitoring of Lombard Speech. 684-688
- Andrea Bandini, Aravind Namasivayam, Yana Yunusova:
Video-Based Tracking of Jaw Movements During Speech: Preliminary Results and Future Directions. 689-693
- S. B. Sunil Kumar, K. Sreenivasa Rao, Tanumay Mandal:
Accurate Synchronization of Speech and EGG Signal Using Phase Information. 694-698
- Anna Sara H. Romøren, Aoju Chen:
The Acquisition of Focal Lengthening in Stockholm Swedish. 699-703
Multi-lingual Models and Adaptation for ASR
- Shiyu Zhou, Yuanyuan Zhao, Shuang Xu, Bo Xu:
Multilingual Recurrent Neural Networks with Residual Learning for Low-Resource Speech Recognition. 704-708
- Olivier Siohan:
CTC Training of Multi-Phone Acoustic Models for Speech Recognition. 709-713
- Sibo Tong, Philip N. Garner, Hervé Bourlard:
An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation. 714-718
- Martin Karafiát, Murali Karthick Baskar, Pavel Matejka, Karel Veselý, Frantisek Grézl, Lukás Burget, Jan Cernocký:
2016 BUT Babel System: Multilingual BLSTM Acoustic Model with i-Vector Based Adaptation. 719-723
- Marco Matassoni, Alessio Brutti, Daniele Falavigna:
Optimizing DNN Adaptation for Recognition of Enhanced Speech. 724-728
- Younggwan Kim, Hyungjun Lim, Jahyun Goo, Hoirin Kim:
Deep Least Squares Regression for Speaker Adaptation. 729-733
- Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson:
Multi-Task Learning Using Mismatched Transcription for Under-Resourced Speech Recognition. 734-738
- Neethu Mariam Joy, Sandeep Reddy Kothinti, Srinivasan Umesh, Basil Abraham:
Generalized Distillation Framework for Speaker Normalization. 739-743
- Lahiru Samarakoon, Brian Mak, Khe Chai Sim:
Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models. 744-748
- Joachim Fainberg, Steve Renals, Peter Bell:
Factorised Representations for Neural Network Adaptation to Diverse Acoustic Environments. 749-753
Prosody and Text Processing
- Richard Sproat, Navdeep Jaitly:
An RNN Model of Text Normalization. 754-758
- Asaf Rendel, Raul Fernandez, Zvi Kons, Andrew Rosenberg, Ron Hoory, Bhuvana Ramabhadran:
Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels. 759-763
- Yusuke Ijima, Nobukatsu Hojo, Ryo Masumura, Taichi Asami:
Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis. 764-768
- Jinfu Ni, Yoshinori Shiga, Hisashi Kawai:
Global Syllable Vectors for Building TTS Front-End with Deep Learning. 769-773
- Ishin Fukuoka, Kazuhiko Iwata, Tetsunori Kobayashi:
Prosody Control of Utterance Sequence for Information Delivering. 774-778
- Yuchen Huang, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai:
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer. 779-783
- Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li, Bin Liu:
Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction. 784-788
- Bo Chen, Tianling Bian, Kai Yu:
Discrete Duration Model for Speech Synthesis. 789-793
- Bo Chen, Jiahao Lai, Kai Yu:
Comparison of Modeling Target in LSTM-RNN Duration Model. 794-798
- Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi:
Learning Word Vector Representations Based on Acoustic Counts. 799-803
- Éva Székely, Joseph Mendelson, Joakim Gustafson:
Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies. 804-808
Show & Tell 1
- Alp Öktem, Mireia Farrús, Leo Wanner:
Prosograph: A Tool for Prosody Visualisation of Large Speech Corpora. 809-810
- Svetlana Vetchinnikova, Anna Mauranen, Nina Mikusová:
ChunkitApp: Investigating the Relevant Units of Online Speech Processing. 811-812
- Markus Jochim:
Extending the EMU Speech Database Management System: Cloud Hosting, Team Collaboration, Automatic Revision Control. 813-814
- Anne S. Warlaumont, Mark VanDam, Elika Bergelson, Alejandrina Cristià:
HomeBank: A Repository for Long-Form Real-World Audio Recordings of Children. 815-816
- Peter Bell, Joachim Fainberg, Catherine Lai, Mark Sinclair:
A System for Real Time Collaborative Transcription Correction. 817-818
- Chitralekha Bhat, Anjali Kant, Bhavik Vachhani, Sarita Rautara, Ashok Kumar Sinha, Sunil Kumar Kopparapu:
MoPAReST - Mobile Phone Assisted Remote Speech Therapy Platform. 819-820
Show & Tell 2
- Aurore Jaumard-Hakoun, Samy Chikhi, Takfarinas Medani, Angelika Nair, Gérard Dreyfus, François-Benoît Vialatte:
An Apparatus to Investigate Western Opera Singing Skill Learning Using Performance and Result Biofeedback, and Measuring its Neural Correlates. 821-822
- Christoph Draxler:
PercyConfigurator - Perception Experiments as a Service. 823-824
- Askars Salimbajevs, Indra Ikauniece:
System for Speech Transcription and Post-Editing in Microsoft Word. 825-826
- Ji Ho Park, Nayeon Lee, Dario Bertero, Anik Dey, Pascale Fung:
Emojive! Collecting Emotion Data from Speech and Facial Expression Using Mobile Game App. 827-828
- Mietta Lennes, Jussi Piitulainen, Martin Matthiesen:
Mylly - The Mill: A New Platform for Processing Speech and Text Corpora Easily and Efficiently. 829-830
- Kyori Suzuki, Ian Wilson, Hayato Watanabe:
Visual Learning 2: Pronunciation App Using Ultrasound, Video, and MRI. 831-832
Keynote 1: James Allen
- James Allen:
Dialogue as Collaborative Problem Solving. 833
Special Session: Speech and Human-Robot Interaction
- Brian Stasak, Julien Epps, Roland Goecke:
Elicitation Design for Acoustic Depression Classification: An Investigation of Articulation Effort, Linguistic Complexity, and Word Affect. 834-838
- José Novoa, Jorge Wuth, Juan Pablo Escudero, Josué Fredes, Rodrigo Mahú, Richard M. Stern, Néstor Becerra Yoma:
Robustness Over Time-Varying Channels in DNN-HMM ASR Based Human-Robot Interaction. 839-843
- Bekir Berker Türker, Zana Buçinca, Engin Erzin, Yücel Yemez, T. Metin Sezgin:
Analysis of Engagement and User Experience with a Laughter Responsive Social Robot. 844-848
- Alice Baird, Shahin Amiriparian, Nicholas Cummins, Alyssa M. Alcorn, Anton Batliner, Sergey Pugachevskiy, Michael Freitag, Maurice Gerczuk, Björn W. Schuller:
Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results. 849-853
- Catharine Oertel, Patrik Jonell, Dimosthenis Kontogiorgos, Joseph Mendelson, Jonas Beskow, Joakim Gustafson:
Crowd-Sourced Design of Artificial Attentive Listeners. 854-858
- Leonardo Lancia, Thierry Chaminade, Noël Nguyen, Laurent Prévot:
Studying the Link Between Inter-Speaker Coordination and Speech Imitation Through Human-Machine Interactions. 859-863
Special Session: Incremental Processing and Responsive Behaviour
- Samuel Delalez, Christophe d'Alessandro:
Adjusting the Frame: Biphasic Performative Control of Speech Rhythm. 864-868
- Raheleh Saryazdi, Craig G. Chambers:
Attentional Factors in Listeners' Uptake of Gesture Cues During Speech Processing. 869-873
- Carlos Toshinori Ishi, Takashi Minato, Hiroshi Ishiguro:
Motion Analysis in Vocalized Surprise Expressions. 874-878
- Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel:
Enhancing Backchannel Prediction Using Word Embeddings. 879-883
- Eran Raveh, Ingmar Steiner, Bernd Möbius:
A Computational Model for Phonetically Responsive Spoken Dialogue Systems. 884-888
- Eustace Ebhotemhen, Volha Petukhova, Dietrich Klakow:
Incremental Dialogue Act Recognition: Token- vs Chunk-Based Classification. 889-893
Special Session: Acoustic Manifestations of Social Characteristics
- Oliver Niebuhr:
Clear Speech - Mere Speech? How Segmental and Prosodic Speech Reduction Shape the Impression That Speakers Create on Listeners. 894-898
- Charlotte Kouklia, Nicolas Audibert:
Relationships Between Speech Timing and Perceived Hostility in a French Corpus of Political Debates. 899-903
- Laura Fernández Gallardo, Benjamin Weiss:
Towards Speaker Characterization: Identifying and Predicting Dimensions of Person Attribution. 904-908
- Carlos Toshinori Ishi, Jun Arai, Norihiro Hagita:
Prosodic Analysis of Attention-Drawing Speech. 909-913
- Adrian P. Simpson, Riccarda Funk, Frederik Palmer:
Perceptual and Acoustic Correlates of Gender in the Prepubertal Voice. 914-918
- Katrin Schweitzer, Michael Walsh, Antje Schweitzer:
To See or not to See: Interlocutor Visibility and Likeability Influence Convergence in Intonation. 919-923
- Melanie Weirich, Adrian P. Simpson:
Acoustic Correlates of Parental Role and Gender Identity in the Speech of Expecting Parents. 924-928
- Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Vera Cabarrão, Anna Pompili, Ramón Fernandez Astudillo, Joana Campos, Ana Paiva, Isabel Trancoso:
A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains. 929-933
- Rachael Tatman, Conner Kasten:
Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. 934-938
Neural Network Acoustic Models for ASR 1
- Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson, Navdeep Jaitly:
A Comparison of Sequence-to-Sequence Models for Speech Recognition. 939-943
- Albert Zeyer, Eugen Beck, Ralf Schlüter, Hermann Ney:
CTC in the Context of Generalized Full-Sum HMM Training. 944-948
- Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan:
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM. 949-953
- Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith:
Multitask Learning with CTC and Segmental CRF for Speech Recognition. 954-958
- Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo:
Direct Acoustics-to-Word Models for English Conversational Speech Recognition. 959-963
- Bo Li, Tara N. Sainath:
Reducing the Computational Complexity of Two-Dimensional LSTMs. 964-968
Models of Speech Production
- Jorge C. Lucero:
Functional Principal Component Analysis of Vocal Tract Area Functions. 969-973
- Ganesh Sivaraman, Carol Y. Espy-Wilson, Martijn Wieling:
Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages. 974-978
- Takayuki Arai:
Integrated Mechanical Model of [r]-[l] and [b]-[m]-[w] Producing Consonant Cluster [br]. 979-983
- Leonardo Badino, Luca Franceschi, Raman Arora, Michele Donini, Massimiliano Pontil:
A Speaker Adaptive DNN Training Approach for Speaker-Independent Acoustic Inversion. 984-988
- Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Acoustic-to-Articulatory Mapping Based on Mixture of Probabilistic Canonical Correlation Analysis. 989-993
- Tanner Sorensen, Asterios Toutios, Johannes Töger, Louis Goldstein, Shrikanth S. Narayanan:
Test-Retest Repeatability of Articulatory Strategies Using Real-Time Magnetic Resonance Imaging. 994-998
Speaker Recognition
- David Snyder, Daniel Garcia-Romero, Daniel Povey, Sanjeev Khudanpur:
Deep Neural Network Embeddings for Text-Independent Speaker Verification. 999-1003
- Jesús Villalba, Niko Brümmer, Najim Dehak:
Tied Variational Autoencoder Backends for i-Vector Speaker Recognition. 1004-1008
- Shivesh Ranjan, John H. L. Hansen:
Improved Gender Independent Speaker Recognition Using Convolutional Neural Network Based Bottleneck Features. 1009-1013
- Suwon Shon, Seongkyu Mun, Wooil Kim, Hanseok Ko:
Autoencoder Based Domain Adaptation for Speaker Recognition Under Insufficient Channel Information. 1014-1018
- Abbas Khosravani, Mohammad Mehdi Homayounpour:
Nonparametrically Trained Probabilistic Linear Discriminant Analysis for i-Vector Speaker Verification. 1019-1023
- Jesús Jorrín, Paola García, Luis Buera:
DNN Bottleneck Features for Speaker Clustering. 1024-1028
Phonation and Voice Quality
- Kätlin Aare, Pärtel Lippus, Juraj Simko:
Creak as a Feature of Lexical Stress in Estonian. 1029-1033
- Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl:
Cross-Speaker Variation in Voice Source Correlates of Focus and Deaccentuation. 1034-1038
- Sishir Kalita, Wendy Lalhminghlui, Luke Horo, Priyankoo Sarmah, S. R. Mahadeva Prasanna, Samarendra Dandapat:
Acoustic Characterization of Word-Final Glottal Stops in Mizo and Assam Sora. 1039-1043
- Parham Mokhtari, Hiroshi Ando:
Iterative Optimal Preemphasis for Improved Glottal-Flow Estimation by Iterative Adaptive Inverse Filtering. 1044-1048
- Yaniv Sheena, Mísa Hejná, Yossi Adi, Joseph Keshet:
Automatic Measurement of Pre-Aspiration. 1049-1053
- Kiranpreet Nara:
Acoustic and Electroglottographic Study of Breathy and Modal Vowels as Produced by Heritage and Native Gujarati Speakers. 1054-1058
Speech Synthesis Prosody
- Xin Wang, Shinji Takaki, Junichi Yamagishi:
An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis. 1059-1063
- Viacheslav Klimkov, Adam Nadolski, Alexis Moinet, Bartosz Putrycz, Roberto Barra-Chicote, Thomas Merritt, Thomas Drugman:
Phrase Break Prediction for Long-Form Reading TTS: Exploiting Text Structure Information. 1064-1068
- Kou Tanaka, Hirokazu Kameoka, Tomoki Toda, Satoshi Nakamura:
Physically Constrained Statistical F0 Prediction for Electrolaryngeal Speech Enhancement. 1069-1073
- Nobukatsu Hojo, Yasuhito Ohsugi, Yusuke Ijima, Hirokazu Kameoka:
DNN-SPACE: DNN-HMM-Based Generative Model of Voice F0 Contours for Statistical Phrase/Accent Command Estimation. 1074-1078
- Zofia Malisz, Harald Berthelsen, Jonas Beskow, Joakim Gustafson:
Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis. 1079-1083
- Simon Betz, Jana Voße, Sina Zarrieß, Petra Wagner:
Increasing Recall of Lengthening Detection via Semi-Automatic Classification. 1084-1088
Emotion Recognition
- Aharon Satt, Shai Rozenberg, Ron Hoory:
Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms. 1089-1093
- Ruo Zhang, Atsushi Ando, Satoshi Kobashikawa, Yushi Aono:
Interaction and Transition Model for Speech Emotion Recognition in Dialogue. 1094-1097
- John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost:
Progressive Neural Networks for Transfer Learning in Emotion Recognition. 1098-1102
- Srinivas Parthasarathy, Carlos Busso:
Jointly Predicting Arousal, Valence and Dominance with Multi-Task Learning. 1103-1107
- Duc Le, Zakaria Aldeneh, Emily Mower Provost:
Discretized Continuous Speech Emotion Recognition with Multi-Task Deep Recurrent Neural Network. 1108-1112
- Jaebok Kim, Gwenn Englebienne, Khiet P. Truong, Vanessa Evers:
Towards Speech Emotion Recognition "in the Wild" Using Aggregated Corpora and Deep Multi-Task Learning. 1113-1117
WaveNet and Novel Paradigms
- Akira Tamamori, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Tomoki Toda:
Speaker-Dependent WaveNet Vocoder. 1118-1122
- Yu Gu, Zhen-Hua Ling:
Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension. 1123-1127
- Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi:
Direct Modeling of Frequency Spectra and Waveform Generation Based on Phase Recovery for DNN-Based Speech Synthesis. 1128-1132
- Srikanth Ronanki, Oliver Watts, Simon King:
A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis. 1133-1137
- Kazuhiro Kobayashi, Tomoki Hayashi, Akira Tamamori, Tomoki Toda:
Statistical Voice Conversion with WaveNet-Based Waveform Generation. 1138-1142
- Vincent Wan, Yannis Agiomyrgiannakis, Hanna Silén, Jakub Vít:
Google's Next-Generation Real-Time Unit-Selection Synthesizer Using Sequence-to-Sequence LSTM-Based Autoencoders. 1143-1147
Models of Speech Perception
- Alexander Kain, Max Del Giudice, Kris Tjaden:
A Comparison of Sentence-Level Speech Intelligibility Metrics. 1148-1152
- Toshio Irino, Eri Takimoto, Toshie Matsui, Roy D. Patterson:
An Auditory Model of Speaker Size Perception for Voiced Speech Sounds. 1153-1157
- Louis ten Bosch, Lou Boves, Mirjam Ernestus:
The Recognition of Compounds: A Computational Account. 1158-1162
- Mohsen Zareian Jahromi, Jan Østergaard, Jesper Jensen:
Humans do not Maximize the Probability of Correct Decision When Recognizing DANTALE Words in Noise. 1163-1167
- Rainer Huber, Constantin Spille, Bernd T. Meyer:
Single-Ended Prediction of Listening Effort Based on Automatic Speech Recognition. 1168-1172
- Chris Neufeld:
Modeling Categorical Perception with the Receptive Fields of Auditory Neurons. 1173-1177
Source Separation and Auditory Scene Analysis
- Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee:
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation. 1178-1182
- Takuya Higuchi, Keisuke Kinoshita, Marc Delcroix, Katerina Zmolíková, Tomohiro Nakatani:
Deep Clustering-Based Beamforming for Separation with Unknown Number of Sources. 1183-1187
- Shadi Pirhosseinloo, Kostas Kokkinakis:
Time-Frequency Masking for Blind Source Separation with Preserved Spatial Cues. 1188-1192
- Jen-Tzung Chien, Kuan-Ting Kuo:
Variational Recurrent Neural Networks for Speech Separation. 1193-1197
- Valentin Andrei, Horia Cucu, Corneliu Burileanu:
Detecting Overlapped Speech on Short Timeframes Using Deep Learning. 1198-1202
- Xu Li, Junfeng Li, Yonghong Yan:
Ideal Ratio Mask Estimation Using Deep Neural Networks for Monaural Speech Segregation in Noisy Reverberant Conditions. 1203-1207
Prosody: Tone and Intonation
- Sergio I. Quiroz, Marzena Zygis:
The Vocative Chant and Beyond: German Calling Melodies Under Routine and Urgent Contexts. 1208-1212
- Juraj Simko, Antti Suni, Katri Hiovain, Martti Vainio:
Comparing Languages Using Hierarchical Prosodic Analysis. 1213-1217
- Martin Ho Kwan Ip, Anne Cutler:
Intonation Facilitates Prediction of Focus Even in the Presence of Lexical Tones. 1218-1222
- Katharina Zahner, Heather Kember, Bettina Braun:
Mind the Peak: When Museum is Temporarily Understood as Musical in Australian English. 1223-1227
- Luca Rognoni, Judith Bishop, Miriam Corris:
Pashto Intonation Patterns. 1228-1232
- Kikuo Maekawa:
A New Model of Final Lowering in Spontaneous Monologue. 1233-1237
Emotion Modeling
- Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai:
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space. 1238-1242
- Saurabh Sahu, Rahul Gupta, Ganesh Sivaraman, Wael AbdAlmageed, Carol Y. Espy-Wilson:
Adversarial Auto-Encoders for Speech Based Emotion Recognition. 1243-1247
- Ting Dang, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah:
An Investigation of Emotion Prediction Uncertainty Using Gaussian Mixture Regression. 1248-1252
- Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin G. McInnis, Emily Mower Provost:
Capturing Long-Term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition. 1253-1257
- Ailbhe Ní Chasaide, Irena Yanushevskaya, Christer Gobl:
Voice-to-Affect Mapping: Inferences on Language Voice Baseline Settings. 1258-1262
- Michael Neumann, Ngoc Thang Vu:
Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech. 1263-1267
Voice Conversion 1
- Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities. 1268-1272
- Wei-Ning Hsu, Yu Zhang, James R. Glass:
Learning Latent Representations for Speech Generation and Transformation. 1273-1277
- Tetsuya Hashimoto, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu:
Parallel-Data-Free Many-to-Many Voice Conversion Based on DNN Integrated with Eigenspace Using a Non-Parallel Speech Corpus. 1278-1282
- Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, Kunio Kashino:
Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. 1283-1287
- Luc Ardaillon, Axel Roebel:
A Mouth Opening Effect Based on Pole Modification for Expressive Singing Voice Transformation. 1288-1292
- Seyed Hamidreza Mohammadi, Alexander Kain:
Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion. 1293-1297
Neural Network Acoustic Models for ASR 2
- Hasim Sak, Matt Shannon, Kanishka Rao, Françoise Beaufays:
Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping. 1298-1302
- Golan Pundak, Tara N. Sainath:
Highway-LSTM and Recurrent Highway Networks for Speech Recognition. 1303-1307
- Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio:
Improving Speech Recognition by Revising Gated Recurrent Units. 1308-1312
- Jen-Tzung Chien, Chen Shen:
Stochastic Recurrent Neural Network for Speech Recognition. 1313-1317
- Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf:
Frame and Segment Level Recurrent Neural Networks for Phone Classification. 1318-1322
- Kyu Jeong Han, Seongjun Hahm, Byung-Hak Kim, Jungsuk Kim, Ian R. Lane:
Deep Learning-Based Telephony Speech Recognition in the Wild. 1323-1327
Speaker Recognition Evaluation
- Kong-Aik Lee, Ville Hautamäki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu, Mickaël Rouvier, Wei Rao, Federico Alegre, J. Ma, Man-Wai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Bin Ma, Ville Vestman, Md. Sahidullah, M. Halonen, Anssi Kanervisto, Gaël Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch, Georgios Tzimiropoulos, Q. Qian, Z. Wang, Q. Zhao, T. Wang, H. Li, J. Xue, S. Zhu, R. Jin, T. Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit G. B. Fauve, Kaavya Sriskandaraja, Vidhyasaharan Sethu, W. W. Lin, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Massimiliano Todisco, Nicholas W. D. Evans, Haizhou Li, John H. L. Hansen, Jean-François Bonastre, Eliathamby Ambikairajah:
The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016. 1328-1332
- Pedro A. Torres-Carrasquillo, Fred Richardson, Shahan C. Nercessian, Douglas E. Sturim, William M. Campbell, Youngjune Gwon, Swaroop Vattam, Najim Dehak, Sri Harish Reddy Mallidi, Phani Sankar Nidadavolu, Ruizhi Li, Réda Dehak:
The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System. 1333-1337
- Daniele Colibro, Claudio Vair, Emanuele Dalmasso, Kevin Farrell, Gennady Karvitsky, Sandro Cumani, Pietro Laface:
Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation System. 1338-1342
- Chunlei Zhang, Fahimeh Bahmaninezhad, Shivesh Ranjan, Chengzhu Yu, Navid Shokouhi, John H. L. Hansen:
UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation. 1343-1347
- Oldrich Plchot, Pavel Matejka, Anna Silnova, Ondrej Novotný, Mireia Díez Sánchez, Johan Rohdin, Ondrej Glembek, Niko Brümmer, Albert Swart, Jesús Jorrín-Prieto, Paola García, Luis Buera, Patrick Kenny, Md. Jahangir Alam, Gautam Bhattacharya:
Analysis and Description of ABC Submission to NIST SRE 2016. 1348-1352
- Seyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig S. Greenberg, Douglas A. Reynolds, Elliot Singer, Lisa P. Mason, Jaime Hernandez-Cordero:
The 2016 NIST Speaker Recognition Evaluation. 1353-1357
Glottal Source Modeling
- Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino:
A New Cosine Series Antialiasing Function and its Application to Aliasing-Free Glottal Source Models for Speech and Singing Synthesis. 1358-1362
- Ana Ramírez López, Shreyas Seshadri, Lauri Juvela, Okko Räsänen, Paavo Alku:
Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMs. 1363-1367
- Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku:
Reducing Mismatch in Training of DNN-Based Glottal Excitation Models in a Statistical Parametric Text-to-Speech System. 1368-1372
- Alexander Sorin, Slava Shechtman, Asaf Rendel:
Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities. 1373-1377
- Rodrigo Manríquez, Sean D. Peterson, Pavel Prado, Patricio Orio, Matías Zañartu:
Modeling Laryngeal Muscle Activation Noise for Low-Order Physiological Based Speech Synthesis. 1378-1382
- Felipe Espic, Cassia Valentini-Botinhao, Simon King:
Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis. 1383-1387
Prosody: Rhythm, Stress, Quantity and Phrasing
- Heather Kember, Ann-Kathrin Grohe, Katharina Zahner, Bettina Braun, Andrea Weber, Anne Cutler:
Similar Prosodic Structure Perceived Differently in German and English. 1388-1392
- Luying Hou, Bert Le Bruyn, René Kager:
Disambiguate or not? - The Role of Prosody in Unambiguous and Potentially Ambiguous Anaphora Production in Strictly Mandarin Parallel Structures. 1393-1397
- Angeliki Athanasopoulou, Irene Vogel, Hossep Dolatian:
Acoustic Properties of Canonical and Non-Canonical Stress in French, Turkish, Armenian and Brazilian Portuguese. 1398-1402
- Leendert Plug, Rachel Smith:
Phonological Complexity, Segment Rate and Speech Tempo Perception. 1403-1406
- Jing Yang, Yu Zhang, Aijun Li, Li Xu:
On the Duration of Mandarin Tones. 1407-1411
- Otto Ewald, Eva Liina Asu, Susanne Schötz:
The Formant Dynamics of Long Close Vowels in Three Varieties of Swedish. 1412-1416
Speech Recognition for Language Learning
- Yao Qian, Keelan Evanini, Xinhao Wang, Chong Min Lee, Matthew Mulholland:
Bidirectional LSTM-RNN for Improving Automated Assessment of Non-Native Children's Speech. 1417-1421
- Junwei Yue, Fumiya Shiozawa, Shohei Toyama, Yutaka Yamauchi, Kayoko Ito, Daisuke Saito, Nobuaki Minematsu:
Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW. 1422-1426
- Chong Min Lee, Su-Youn Yoon, Xihao Wang, Matthew Mulholland, Ikkyu Choi, Keelan Evanini:
Off-Topic Spoken Response Detection Using Siamese Convolutional Neural Networks. 1427-1431
- Vipul Arora, Aditi Lahiri, Henning Reetz:
Phonological Feature Based Mispronunciation Detection and Diagnosis Using Multi-Task DNNs and Active Learning. 1432-1436
- Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão:
Detection of Mispronunciations and Disfluencies in Children Reading Aloud. 1437-1441
- David Escudero Mancebo, César González Ferreras, Lourdes Aguilar, Eva Estebas-Vilaplana:
Automatic Assessment of Non-Native Prosody by Measuring Distances on Prosodic Label Sequences. 1442-1446
Stance, Credibility, and Deception
- Nigel G. Ward, Jason C. Carlson, Olac Fuentes, Diego Castán, Elizabeth Shriberg, Andreas Tsiartas:
Inferring Stance from Prosody. 1447-1451
- Gina-Anne Levow, Richard A. Wright:
Exploring Dynamic Measures of Stance in Spoken Interaction. 1452-1456
- Valentin Barrière, Chloé Clavel, Slim Essid:
Opinion Dynamics Modeling for Movie Review Transcripts Classification with Hidden Conditional Random Fields. 1457-1461
- Qinyi Luo, Rahul Gupta, Shrikanth S. Narayanan:
Transfer Learning Between Concepts for Human Behavior Modeling: An Application to Sincerity and Deception Prediction. 1462-1466
- Anne Schröder, Simon Stone, Peter Birkholz:
The Sound of Deception - What Makes a Speaker Credible? 1467-1471
- Gideon Mendels, Sarah Ita Levitan, Kai-Zhan Lee, Julia Hirschberg:
Hybrid Acoustic-Lexical Deep Learning Approach for Deception Detection. 1472-1476
Short Utterances Speaker Recognition
- Albert Swart, Niko Brümmer:
A Generative Model for Score Normalization in Speaker Recognition. 1477-1481
- Subhadeep Dey, Srikanth R. Madikeri, Petr Motlícek, Marc Ferras:
Content Normalization for Text-Dependent Speaker Verification. 1482-1486
- Chunlei Zhang, Kazuhito Koishida:
End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances. 1487-1491
- Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo:
Adversarial Network Bottleneck Features for Noise Robust Speaker Verification. 1492-1496
- Shuai Wang, Yanmin Qian, Kai Yu:
What Does the Speaker Embedding Encode? 1497-1501
- Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong-Aik Lee:
Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification. 1502-1506
- Jinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng:
DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances. 1507-1511
- Ville Vestman, Dhananjaya N. Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen:
Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions. 1512-1516
- Gautam Bhattacharya, Jahangir Alam, Patrick Kenny:
Deep Speaker Embeddings for Short-Duration Speaker Verification. 1517-1521
- Soo Jin Park, Gary Yeung, Jody Kreiman, Patricia A. Keating, Abeer Alwan:
Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems. 1522-1526
- Kong-Aik Lee, Haizhou Li:
Gain Compensation for Fast i-Vector Extraction Over Short Duration. 1527-1531
- Hee-Soo Heo, Jee-weon Jung, Il-Ho Yang, Sung-Hyun Yoon, Ha-Jin Yu:
Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification. 1532-1536
Speaker Characterization and Recognition
- Chen Chen, Jiqing Han, Yilin Pan:
Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares. 1537-1541
- Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang:
Deep Speaker Feature Learning for Text-Independent Speaker Verification. 1542-1546
- Pierre-Michel Bousquet, Mickael Rouvier:
Duration Mismatch Compensation Using Four-Covariance Model and Deep Neural Network for Speaker Verification. 1547-1551
- Alan McCree, Gregory Sell, Daniel Garcia-Romero:
Extended Variability Modeling and Unsupervised Adaptation for PLDA Speaker Recognition. 1552-1556
- Bengt J. Borgström, Elliot Singer, Douglas A. Reynolds, Seyed Omid Sadjadi:
Improving the Effectiveness of Speaker Verification Domain Adaptation with Inadequate In-Domain Data. 1557-1561
- Zhili Tan, Man-Wai Mak:
i-Vector DNN Scoring and Calibration for Noise Robust Speaker Verification. 1562-1566
- Pavel Matejka, Ondrej Novotný, Oldrich Plchot, Lukás Burget, Mireia Díez Sánchez, Jan Cernocký:
Analysis of Score Normalization in Multilingual Speaker Recognition. 1567-1571
- Anna Silnova, Lukás Burget, Jan Cernocký:
Alternative Approaches to Neural Network Based Speaker Verification. 1572-1575
- Ruchir Travadi, Shrikanth S. Narayanan:
A Distribution Free Formulation of the Total Variability Model. 1576-1580
- Md. Hafizur Rahman, Ivan Himawan, David Dean, Sridha Sridharan:
Domain Mismatch Modeling of Out-Domain i-Vectors for PLDA Speaker Verification. 1581-1585
Acoustic Models for ASR 1
- Gaofeng Cheng, Vijayaditya Peddinti, Daniel Povey, Vimal Manohar, Sanjeev Khudanpur, Yonghong Yan:
An Exploration of Dropout with LSTMs. 1586-1590
- Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee:
Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition. 1591-1595
- Dung T. Tran, Marc Delcroix, Shigeki Karita, Michael Hentschel, Atsunori Ogawa, Tomohiro Nakatani:
Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling. 1596-1600
- Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani:
Forward-Backward Convolutional LSTM for Acoustic Modeling. 1601-1605
- Sercan Ömer Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Christopher Fougner, Ryan Prenger, Adam Coates:
Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting. 1606-1610
- Chunyang Wu, Mark J. F. Gales:
Deep Activation Mixture Model for Speech Recognition. 1611-1615
- Michael Heck, Masayuki Suzuki, Takashi Fukuda, Gakuto Kurata, Satoshi Nakamura:
Ensembles of Multi-Scale VGG Acoustic Models. 1616-1620
- Tamás Grósz, Gábor Gosztolya, László Tóth:
Training Context-Dependent DNN Acoustic Models Using Probabilistic Sampling. 1621-1625
- Tamás Grósz, Gábor Gosztolya, László Tóth:
A Comparative Evaluation of GMM-Free State Tying Methods for ASR. 1626-1630
Acoustic Models for ASR 2
- Yiming Wang, Vijayaditya Peddinti, Hainan Xu, Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur:
Backstitch: Counteracting Finite-Sample Bias via Negative Steps. 1631-1635
- Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani:
Node Pruning Based on Entropy of Weights and Node Activity for Small-Footprint Acoustic Model Based on Deep Neural Networks. 1636-1640
- Ehsan Variani, Tom Bagby, Erik McDermott, Michiel Bacchiani:
End-to-End Training of Acoustic Models for Large Vocabulary Continuous Speech Recognition with TensorFlow. 1641-1645
- Khe Chai Sim, Arun Narayanan:
An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication. 1646-1650
- Zoltán Tüske, Wilfried Michel, Ralf Schlüter, Hermann Ney:
Parallel Neural Network Features for Improved Tandem Acoustic Modeling. 1651-1655
- Qingming Tang, Weiran Wang, Karen Livescu:
Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis. 1656-1660
Dialog Modeling
- Ryo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka:
Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks. 1661-1665
- Marcin Wlodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare:
Improving Prediction of Speech Activity Using Multi-Participant Respiratory State. 1666-1670
- Peter A. Heeman, Rebecca Lunsford:
Turn-Taking Offsets and Dialogue Context. 1671-1675
- Angelika Maier, Julian Hough, David Schlangen:
Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue Systems. 1676-1680
- Yuichi Ishimoto, Takehiro Teraoka, Mika Enomoto:
End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech. 1681-1685
- Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro:
Turn-Taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents. 1686-1690
- Hirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara:
Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC. 1691-1695
- Zahra Rahimi, Anish Kumar, Diane J. Litman, Susannah Paletz, Mingzhi Yu:
Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic Levels. 1696-1700
- Justine Reverdy, Carl Vogel:
Measuring Synchrony in Task-Based Dialogues. 1701-1705
- Paul A. Crook, Alex Marin:
Sequence to Sequence Modeling for User Simulation in Dialog Systems. 1706-1710
- Vikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft:
Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human-Machine Spoken Dialog Interactions. 1711-1715
- Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono:
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls. 1716-1720
- Stefan Ultes, Pawel Budzianowski, Iñigo Casanueva, Nikola Mrksic, Lina Maria Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gasic, Steve J. Young:
Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning. 1721-1725
- Shizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara:
Analysis of the Relationship Between Prosodic Features of Fillers and its Forms or Occurrence Positions. 1726-1730
- Syeda Narjis Fatima, Engin Erzin:
Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions. 1731-1735
L1 and L2 Acquisition
- Micha Elsner, Kiwako Ito:
An Automatically Aligned Corpus of Child-Directed Speech. 1736-1740
- Ocke-Schwen Bohn, Trine Askjær-Jørgensen:
A Comparison of Danish Listeners' Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences. 1741-1744
- Felicitas Kleber:
On the Role of Temporal Variability in the Acquisition of the German Vowel Length Contrast. 1745-1749
- Patrick F. Reidy, Mary E. Beckman, Jan Edwards, Benjamin Munson:
A Data-Driven Approach for Perceptually Validated Acoustic Features for Children's Sibilant Fricative Productions. 1750-1754
- Yujia Xiao, Frank K. Soong:
Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference. 1755-1759
- Si Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang:
Mechanisms of Tone Sandhi Rule Application by Non-Native Speakers. 1760-1764
- Seth Wiener:
Changes in Early L2 Cue-Weighting of Non-Native Speech: Evidence from Learners of Mandarin Chinese. 1765-1769
- Ying Chen, Eric Pederson:
Directing Attention During Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin Speakers. 1770-1774
- Dean Luo, Ruxin Luo, Lixin Wang:
Prosody Analysis of L2 English for Naturalness Evaluation Through Speech Modification. 1775-1778
- Gintare Grigonyte, Gerold Schneider:
Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production. 1779-1783
- Adriana Hanulíková, Jenny Ekström:
Lexical Adaptation to a Novel Accent in German: A Comparison Between German, Swedish, and Finnish Listeners. 1784-1788
- Alejandra Keidel Fernández, Thomas Hörberg:
Qualitative Differences in L3 Learners' Neurophysiological Response to L1 versus L2 Transfer. 1789-1793
- Johan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva:
Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled for. 1794-1798
- Kaile Zhang, Gang Peng:
The Relationship Between the Perception and Production of Non-Native Tones. 1799-1803
- Ellen Marklund, Elísabet Eir Cortes, Johan Sjons:
MMN Responses in Adults After Exposure to Bimodal and Unimodal Frequency Distributions of Rotated Speech. 1804-1808
Voice, Speech and Hearing Disorders
- Visar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig:
Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad Ali. 1809-1813
- Antonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi:
Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control Speakers. 1814-1818
- Andrea Bandini, Jordan R. Green, Lorne Zinman, Yana Yunusova:
Classification of Bulbar ALS from Kinematic Features of the Jaw and Lips: Towards Computer-Mediated Assessment. 1819-1823
- Nagaraj Adiga, Vikram C. M., Keerthi Pullela, S. R. Mahadeva Prasanna:
Zero Frequency Filter Based Analysis of Voice Disorders. 1824-1828
- Nikitha K., Sishir Kalita, Vikram C. M., M. Pushpavathi, S. R. Mahadeva Prasanna:
Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space Area. 1829-1833
- Imed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier:
Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech. 1834-1838
- Philipp Klumpp, Thomas Janu, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Apkinson - A Mobile Monitoring Solution for Parkinson's Disease. 1839-1843
- Jan Hlavnicka, Tereza Tykalová, Roman Cmejla, Jirí Klempír, Evzen Ruzicka, Jan Rusz:
Dysprosody Differentiate Between Parkinson's Disease, Progressive Supranuclear Palsy, and Multiple System Atrophy. 1844-1848
- Ming Tu, Visar Berisha, Julie Liss:
Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks. 1849-1853
- Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu:
Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition. 1854-1858
- Jason Lilley, Madhavi Vedula Ratnagiri, H. Timothy Bunnell:
Prediction of Speech Delay from Acoustic Measurements. 1859-1863
- Aijun Li, Hua Zhang, Wen Sun:
The Frequency Range of "The Ling Six Sounds" in Standard Chinese. 1864-1868
- Wentao Gu, Jiao Yin, James J. Mahshie:
Production of Sustained Vowels and Categorical Perception of Tones in Mandarin Among Cochlear-Implanted Children. 1869-1873
Source Separation and Voice Activity Detection
- Anurag Kumar, Benjamin Elizalde, Bhiksha Raj:
Audio Content Based Geotagging in Multimedia. 1874-1878 - Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan:
Time Delay Histogram Based Speech Source Separation Using a Planar Array. 1879-1883 - Gayadhar Pradhan
, Avinash Kumar, Syed Shahnawazuddin
:
Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence. 1884-1888 - Wei Gao, Roberto Togneri
, Victor Sreeram
:
A Contrast Function and Algorithm for Blind Separation of Audio Signals. 1889-1893 - Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng
, Haizhou Li
:
Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source. 1894-1898 - Feng Guo, Yuhang Cao, Zheng Liu
, Jiaen Liang, Baoqing Li, Xiaobing Yuan:
Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern. 1899-1903 - Xianyun Wang, Changchun Bao, Feng Bao:
A Mask Estimation Method Integrating Data Field Model for Speech Enhancement. 1904-1908 - Matt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada:
Improved End-of-Query Detection for Streaming Speech Recognition. 1909-1913 - Di He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen:
Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED. 1914-1918 - Jeroen Zegers, Hugo Van hamme:
Improving Source Separation via Multi-Speaker Representations. 1919-1923 - Bing Yang, Hong Liu, Cheng Pang:
Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector. 1924-1928 - Girija Ramesan Karthik, Prasanta Kumar Ghosh:
Subband Selection for Binaural Speech Source Localization. 1929-1933 - Bo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu:
Unmixing Convolutive Mixtures by Exploiting Amplitude Co-Modulation: Methods and Evaluation on Mandarin Speech Recordings. 1934-1937 - Fei Tao, Carlos Busso:
Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection. 1938-1942 - Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister:
Domain-Specific Utterance End-Point Detection for Speech Recognition. 1943-1947 - Vinay Kothapally, John H. L. Hansen:
Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments. 1948-1952
Speech-enhancement
- Yi-Chiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang:
A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement. 1953-1957 - Hui Zhang, Xueliang Zhang, Guanglai Gao:
Multi-Target Ensemble Learning for Monaural Speech Separation. 1958-1962 - Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani:
Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search. 1963-1967 - Femke B. Gelderblom, Tron V. Tronstad, Erlend Magnus Viggen:
Subjective Intelligibility of Deep Neural Network-Based Speech Enhancement. 1968-1972 - Maria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh:
Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility. 1973-1977 - Hans-Günter Hirsch, Michael Gref:
On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech Signals. 1978-1982 - Robert Rehr, Timo Gerkmann:
MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement. 1983-1987 - Ricard Marxer, Jon Barker:
Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement. 1988-1992 - Se Rim Park, Jinwon Lee:
A Fully Convolutional Neural Network for Speech Enhancement. 1993-1997 - Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino:
Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization. 1998-2002 - Danny Websdale, Ben Milner:
A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation. 2003-2007 - Daniel Michelsanti, Zheng-Hua Tan:
Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification. 2008-2012 - Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson:
Speech Enhancement Using Bayesian Wavenet. 2013-2017 - Xueliang Zhang, DeLiang Wang:
Binaural Reverberant Speech Separation Based on Deep Neural Networks. 2018-2022 - Tudor-Catalin Zorila, Yannis Stylianou:
On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening Enhancement. 2023-2027
Show & Tell 3
- Ralf Meermeier, Sean Colbath:
Applications of the BBN Sage Speech Processing Platform. 2028-2029 - Milos Cernak, Alain Komaty, Amir Mohammadi, André Anjos, Sébastien Marcel:
Bob Speaks Kaldi. 2030-2031 - Michal Lenarczyk:
Real Time Pitch Shifting with Formant Structure Preservation Using the Phase Vocoder. 2032-2033 - Nivedita Chennupati, B. H. V. S. Narayana Murthy, B. Yegnanarayana:
A Signal Processing Approach for Speaker Separation Using SFF Analysis. 2034-2035 - Georg Stemmer, Munir Georges, Joachim Hofer, Piotr Rozen, Josef G. Bauer, Jakub Nowicki, Tobias Bocklet, Hannah R. Colett, Ohad Falik, Michael Deisher, Sylvia J. Downing:
Speech Recognition and Understanding on Hardware-Accelerated DSP. 2036-2037 - Sho Tsuji, Christina Bergmann, Molly Lewis, Mika Braginsky, Page Piccinini, Michael C. Frank, Alejandrina Cristià:
MetaLab: A Repository for Meta-Analyses on Language Development, and More. 2038-2039
Show & Tell 4
- Adrien Daniel:
Evolving Recurrent Neural Networks That Process and Classify Raw Audio in a Streaming Fashion. 2040-2041 - Milana Milosevic, Ulrike Glavitsch:
Combining Gaussian Mixture Models and Segmental Feature Models for Speaker Recognition. 2042-2043 - Gerhard Hagerer, Nicholas Cummins, Florian Eyben, Björn W. Schuller:
"Did you laugh enough today?" - Deep Neural Networks for Mobile and Wearable Laughter Trackers. 2044-2045 - Kwang Myung Jeon, Nam Kyun Kim, Chan Woong Kwak, Jung Min Moon, Hong Kook Kim:
Low-Frequency Ultrasonic Communication for Speech Broadcasting in Public Transportation. 2046-2047 - Sean U. N. Wood, Jean Rouat:
Real-Time Speech Enhancement with GCC-NMF: Demonstration on the Raspberry Pi and NVIDIA Jetson. 2048-2049 - Aku Rouhe, Reima Karhila, Peter Smit, Mikko Kurimo:
Reading Validation for Pronunciation Evaluation in the Digitala Project. 2050-2051
Keynote 2: Catherine Pelachaud
- Catherine Pelachaud:
Conversing with Social Agents That Smile and Laugh. 2052
Special Session: Digital Revolution for Under-resourced Languages 1
- Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin, Ondrej Glembek, Murali Karthick Baskar, Martin Karafiát, Lukás Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan May, Kevin Knight, Shrikanth S. Narayanan:
Team ELISA System for DARPA LORELEI Speech Evaluation 2016. 2053-2057 - Péter Mihajlik, Lili Szabó, Balázs Tarján, András Balog, Krisztina Rábai:
First Results in Developing a Medieval Latin Language Charter Dictation System for the East-Central Europe Region. 2058-2062 - Catherine Inez Watson, Peter Keegan, Margaret Maclagan, Ray Harlow, J. King:
The Motivation and Development of MPAi, a Māori Pronunciation Aid. 2063-2067 - Siyuan Feng, Tan Lee:
On the Linguistic Relevance of Speech Units Learned by Unsupervised Acoustic Modeling. 2068-2072 - Amit Das, Mark Hasegawa-Johnson, Karel Veselý:
Deep Auto-Encoder Based Multi-Task Learning Using Probabilistic Transcriptions. 2073-2077 - Alexander Gutkin, Richard Sproat:
Areal and Phylogenetic Features for Multilingual Speech Synthesis. 2078-2082
Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition
- Kathleen Currie Hall, Scott Mackie, Michael Fry, Oksana Tkachman:
SLPAnnotator: Tools for Implementing Sign Language Phonetic Annotation. 2083-2087 - Iris-Corinna Schwarz, Noor Botros, Alekzandra Lord, Amelie Marcusson, Henrik Tidelius, Ellen Marklund:
The LENA System Applied to Swedish: Reliability of the Adult Word Count Estimate. 2088-2092 - Marisa Casillas, Andrei Amatuni, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Elika Bergelson:
What do Babies Hear? Analyses of Child- and Adult-Directed Speech. 2093-2097 - Marisa Casillas, Elika Bergelson, Anne S. Warlaumont, Alejandrina Cristià, Melanie Soderstrom, Mark VanDam, Han Sloetjes:
A New Workflow for Semi-Automatized Annotations: Tests with Long-Form Naturalistic Recordings of Childrens Language Environments. 2098-2102 - Christina Bergmann, Sho Tsuji, Alejandrina Cristià:
Top-Down versus Bottom-Up Theories of Phonological Acquisition: A Big Data Approach. 2103-2107 - Sho Tsuji, Alejandrina Cristià:
Which Acoustic and Phonological Factors Shape Infants' Vowel Discrimination? Exploiting Natural Variation in InPhonDB. 2108-2112
Special Session: Digital Revolution for Under-resourced Languages 2
- Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl:
The ABAIR Initiative: Bringing Spoken Irish into the Digital Space. 2113-2117 - Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John A. Quinn, Thomas Niesler:
Very Low Resource Radio Browsing for Agile Developmental and Humanitarian Monitoring. 2118-2122 - Nikolaos Malandrakis, Ondrej Glembek, Shrikanth S. Narayanan:
Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot Results. 2123-2127 - Daniil Kocharov, Tatiana Kachkovskaia, Pavel A. Skrelin:
Eliciting Meaningful Units from Speech. 2128-2132 - Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty:
Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications. 2133-2137 - Elodie Gauthier, Laurent Besacier, Sylvie Voisin:
Machine Assisted Analysis of Vowel Length Contrasts in Wolof. 2138-2142 - Thomas Glarner, Benedikt T. Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach:
Leveraging Text Data for Word Segmentation for Underresourced Languages. 2143-2147 - Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu:
Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization. 2148-2152 - Basil Abraham, Srinivasan Umesh, Neethu Mariam Joy:
Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages. 2153-2157 - Basil Abraham, Tejaswi Seeram, Srinivasan Umesh:
Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages. 2158-2162 - Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jón Guðnason:
Building an ASR Corpus Using Althingi's Parliamentary Speeches. 2163-2167 - Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister:
Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software. 2168-2172 - Jón Guðnason, Matthías Pétursson, Róbert Kjaran, Simon Klüpfel, Anna Björk Nikulásdóttir:
Building ASR Corpora Using Eyra. 2173-2177 - Daniel R. van Niekerk, Charl Johannes van Heerden, Marelie H. Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha:
Rapid Development of TTS Corpora for Four South African Languages. 2178-2182 - Alexander Gutkin:
Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages. 2183-2187 - Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King:
Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili. 2188-2192
Special Session: Computational Models in Child Language Acquisition
- Rong Tong, Nancy F. Chen, Bin Ma:
Multi-Task Learning for Mispronunciation Detection on Singapore Children's Mandarin Speech. 2193-2197 - Elin Larsen, Alejandrina Cristià, Emmanuel Dupoux:
Relating Unsupervised Word Segmentation to Reported Vocabulary Acquisition. 2198-2202 - Mats Wirén, Kristina N. Björkenstam, Robert Östling:
Modelling the Informativeness of Non-Verbal Cues in Parent-Child Interaction. 2203-2207 - Ellen Marklund, David Pagmar, Tove Gerholm, Lisa Gustavsson:
Computational Simulations of Temporal Vocalization Behavior in Adult-Child Interaction. 2208-2212 - Sofia Strömbergsson, Jens Edlund, Jana Götze, Kristina Nilsson Björkenstam:
Approximating Phonotactic Input in Children's Linguistic Environments from Orthographic Transcripts. 2213-2217 - Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux:
Learning Weakly Supervised Multimodal Phoneme Embeddings. 2218-2222
Special Session: Voice Attractiveness
- Yasunari Obuchi:
Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space. 2223-2227 - Hans Rutger Bosker:
The Role of Temporal Amplitude Modulations in the Political Arena: Hillary Clinton vs. Donald Trump. 2228-2232 - Laura Fernández Gallardo, Rafael Zequeira Jiménez, Sebastian Möller:
Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing. 2233-2237 - Jürgen Trouvain, Frank Zimmerer:
Attractiveness of French Voices for German Listeners - Results from Native and Non-Native Read Speech. 2238-2242 - Antje Schweitzer, Natalie Lewandowski, Daniel Duran:
Social Attractiveness in Dialogs. 2243-2247 - Eszter Novák-Tót, Oliver Niebuhr, Aoju Chen:
A Gender Bias in the Acoustic-Melodic Features of Charismatic Speech? 2248-2252 - Jan Michalsky, Heike Schoormann:
Pitch Convergence as an Effect of Perceived Attractiveness and Likability. 2253-2256 - Li Jiao, Chengxia Wang, Cristiane Hsu, Peter Birkholz, Yi Xu:
Does Posh English Sound Attractive? 2257-2261 - Timo Baumann:
Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings. 2262-2266
Speech Production and Physiology
- Rosario Signorello, Sergio Hassid, Didier Demolin:
Aerodynamic Features of French Fricatives. 2267-2271 - Antoine Serrurier, Pierre Badin, Louis-Jean Boë, Laurent Lamalle, Christiane Neuschaefer-Rube:
Inter-Speaker Variability: Speaker Normalisation and Quantitative Estimation of Articulatory Invariants in Speech Production for French. 2272-2276 - Nimisha Patil, Timothy Greer, Reed Blaylock, Shrikanth S. Narayanan:
Comparison of Basic Beatboxing Articulations Between Expert and Novice Artists Using Real-Time Magnetic Resonance Imaging. 2277-2281 - Keyi Tang, Negar M. Harandi, Jonghye Woo, Georges El Fakhri, Maureen Stone, Sidney S. Fels:
Speaker-Specific Biomechanical Model-Based Investigation of a Simple Speech Task Based on Tagged-MRI. 2282-2286 - Reed Blaylock, Nimisha Patil, Timothy Greer, Shrikanth S. Narayanan:
Sounds of the Human Vocal Tract. 2287-2291 - Yasufumi Uezu, Tokihiko Kaburagi:
A Simulation Study on the Effect of Glottal Boundary Conditions on Vocal Tract Formants. 2292-2296
Speech and Harmonic Analysis
- P. Gangamohan, B. Yegnanarayana:
A Robust and Alternative Approach to Zero Frequency Filtering Method for Epoch Extraction. 2297-2300 - Kanru Hua:
Improving YANGsaf F0 Estimator with Adaptive Kalman Filter. 2301-2305 - Jitendra Kumar Dhiman, Nagaraj Adiga, Chandra Sekhar Seelamantula:
A Spectro-Temporal Demodulation Technique for Pitch Estimation. 2306-2310 - Kenichiro Miwa, Masashi Unoki:
Robust Method for Estimating F0 of Complex Tone Based on Pitch Perception of Amplitude Modulated Signal. 2311-2315 - Simon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt:
Low-Complexity Pitch Estimation Based on Phase Differences Between Low-Resolution Spectra. 2316-2320 - Masanori Morise:
Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals. 2321-2325
Dialog and Prosody
- Sabrina Stehwien, Ngoc Thang Vu:
Prosodic Event Recognition Using Convolutional Neural Networks with Context Information. 2326-2330 - Ramiro H. Gálvez, Stefan Benus, Agustín Gravano, Marián Trnka:
Prosodic Facilitation and Interference While Judging on the Veracity of Synthesized Statements. 2331-2335 - Margaret Zellers, Antje Schweitzer:
An Investigation of Pitch Matching Across Adjacent Turns in a Corpus of Spontaneous German. 2336-2340 - Sankar Mukherjee, Alessandro D'Ausilio, Noël Nguyen, Luciano Fadiga, Leonardo Badino:
The Relationship Between F0 Synchrony and Speech Convergence in Dyadic Interaction. 2341-2345 - Jordi Luque, Carlos Segura, Ariadna Sánchez, Martí Umbert, Luis Angel Galindo:
The Role of Linguistic and Prosodic Cues on the Prediction of Self-Reported Satisfaction in Contact Centre Phone Calls. 2346-2350 - Pablo Brusco, Juan Manuel Pérez, Agustín Gravano:
Cross-Linguistic Study of the Production of Turn-Taking Cues in American English and Argentine Spanish. 2351-2355
Social Signals, Styles, and Interaction
- Olga Egorow, Andreas Wendemuth:
Emotional Features for Speech Overlaps Classification. 2356-2360 - Chin-Po Chen, Xian-Hong Tseng, Susan Shur-Fen Gau, Chi-Chun Lee:
Computing Multimodal Dyadic Behaviors During Spontaneous Diagnosis Interviews Toward Automatic Categorization of Autism Spectrum Disorder. 2361-2365 - Yun-Shao Lin, Chi-Chun Lee:
Deriving Dyad-Level Interaction Representation Using Interlocutors Structural and Expressive Multimodal Behavior Features. 2366-2370 - Raymond Brueckner, Maximilian Schmitt, Maja Pantic, Björn W. Schuller:
Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective. 2371-2375 - Gábor Gosztolya:
Optimized Time Series Filters for Detecting Laughter and Filler Events. 2376-2380 - Fasih Haider, Fahim A. Salim, Saturnino Luz, Carl Vogel, Owen Conlan, Nick Campbell:
Visual, Laughter, Applause and Spoken Expression Features for Predicting Engagement Within TED Talks. 2381-2385
Acoustic Model Adaptation
- Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong:
Large-Scale Domain Adaptation via Teacher-Student Learning. 2386-2390 - Waquar Ahmad, Syed Shahnawazuddin, Hemant Kumar Kathania, Gayadhar Pradhan, Arun B. Samaddar:
Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion. 2391-2395 - Xurong Xie, Xunying Liu, Tan Lee, Lan Wang:
RNN-LDA Clustering for Feature Based DNN Adaptation. 2396-2400 - Harish Arsikere, Sri Garimella:
Robust Online i-Vectors for Unsupervised Adaptation of DNN Acoustic Models: A Study in the Context of Digital Voice Assistants. 2401-2405 - Ajay Srinivasamurthy, Petr Motlícek, Ivan Himawan, György Szaszák, Youssef Oualil, Hartmut Helmke:
Semi-Supervised Learning with Semantic Knowledge Extraction for Improved Speech Recognition in Air Traffic Control. 2406-2410 - Taesup Kim, Inchul Song, Yoshua Bengio:
Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition. 2411-2415
Cognition and Brain Studies
- Hans Rutger Bosker, Anne Kösem:
An Entrained Rhythm's Frequency, Not Phase, Influences Temporal Sampling of Speech. 2416-2420 - Xiao Wang, Yanhui Zhang, Gang Peng:
Context Regularity Indexed by Auditory N1 and P2 Event-Related Potentials. 2421-2425 - Sakshi Verma, K. L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur, Hema A. Murthy:
Discovering Language in Marmoset Vocalization. 2426-2430 - Hiroki Watanabe, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura:
Subject-Independent Classification of Japanese Spoken Sentences by Multiple Frequency Bands Phase Pattern of EEG Response During Speech Perception. 2431-2435 - Noémie te Rietmolen, Radouane El Yagoubi, Alain Ghio, Corine Astésano:
The Phonological Status of the French Initial Accent and its Role in Semantic Processing: An Event-Related Potentials Study. 2436-2440 - Bin Zhao, Jianwu Dang, Gaoyan Zhang:
A Neuro-Experimental Evidence for the Motor Theory of Speech Perception. 2441-2445
Noise Robust Speech Recognition
- Purvi Agrawal, Sriram Ganapathy:
Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR. 2446-2450 - Masato Mimura, Yoshiaki Bando, Kazuki Shimada, Shinsuke Sakai, Kazuyoshi Yoshii, Tatsuya Kawahara:
Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition. 2451-2455 - Dong Yu, Xuankai Chang, Yanmin Qian:
Recognizing Multi-Talker Speech with Permutation Invariant Training. 2456-2460 - Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe, Jonathan Le Roux:
Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information. 2461-2465 - Erfan Loweimi, Jon Barker, Thomas Hain:
Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR. 2466-2470 - Brian John King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, Sree Hari Krishnan Parthasarathi, Björn Hoffmeister:
Robust Speech Recognition via Anchor Word Representations. 2471-2475
Topic Spotting, Entity Extraction and Semantic Analysis
- Ankur Bapna, Gökhan Tür, Dilek Hakkani-Tür, Larry P. Heck:
Towards Zero-Shot Frame Semantic Parsing for Domain Scaling. 2476-2480 - Despoina Georgiadou, Vassilios Diakoloukas, Vassilios Tsiaras, Vassilios Digalakis:
ClockWork-RNN Based Architectures for Slot Filling. 2481-2485 - Mohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset:
Investigating the Effect of ASR Tuning on Named Entity Recognition. 2486-2490 - Marco Dinarelli, Vedran Vukotic, Christian Raymond:
Label-Dependency Coding in Simple Recurrent Networks for Spoken Language Understanding. 2491-2495 - Zhong Meng, Biing-Hwang Juang:
Minimum Semantic Error Cost Training of Deep Long Short-Term Memory Networks for Topic Spotting on Conversational Speech. 2496-2500 - Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur:
Topic Identification for Speech Without ASR. 2501-2505
Dialog Systems
- Bing Liu, Ian R. Lane:
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog. 2506-2510 - Heriberto Cuayáhuitl, Seunghak Yu:
Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates. 2511-2515 - Ali Orkan Bayer, Evgeny A. Stepanov, Giuseppe Riccardi:
Towards End-to-End Spoken Dialogue Systems with Turn Embeddings. 2516-2520 - Oleg Akhtiamov, Maxim Sidorov, Alexey A. Karpov, Wolfgang Minker:
Speech and Text Analysis for Multimodal Addressee Detection in Human-Human-Computer Interaction. 2521-2525 - Vikram Ramanarayanan, Chee Wee Leong, David Suendermann-Oeft:
Rushing to Judgement: How do Laypeople Rate Caller Engagement in Thin-Slice Videos of Human-Machine Dialog? 2526-2530 - Ivan Kraljevski, Diane Hirschfeld:
Hyperarticulation of Corrections in Multilingual Dialogue Systems. 2531-2535
Lexical and Pronunciation Modeling
- Benjamin Milde, Christoph Schmidt, Joachim Köhler:
Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion. 2536-2540 - Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur:
Acoustic Data-Driven Lexicon Learning Based on a Greedy Pronunciation Selection Framework. 2541-2545 - Takahiro Shinozaki, Shinji Watanabe, Daichi Mochihashi, Graham Neubig:
Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text. 2546-2550 - Peter Smit, Sami Virpioja, Mikko Kurimo:
Improved Subword Modeling for WFST-Based Speech Recognition. 2551-2555 - Antoine Bruguier, Danushen Gnanapragasam, Leif Johnson, Kanishka Rao, Françoise Beaufays:
Pronunciation Learning with RNN-Transducers. 2556-2560 - Einat Naaman, Yossi Adi, Joseph Keshet:
Learning Similarity Functions for Pronunciation Variations. 2561-2565
Language Recognition
- Gregory Gelly, Jean-Luc Gauvain:
Spoken Language Identification Using LSTM-Based Angular Proximity. 2566-2570 - Ma Jin, Yan Song, Ian Vince McLoughlin, Wu Guo, Li-Rong Dai:
End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling. 2571-2575 - Qian Zhang, John H. L. Hansen:
Dialect Recognition Based on Unsupervised Bottleneck Features. 2576-2580 - Saad Irtza, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Haizhou Li:
Investigating Scalability in Hierarchical Language Identification System. 2581-2585 - Yao Qian, Keelan Evanini, Xinhao Wang, David Suendermann-Oeft, Robert A. Pugh, Patrick L. Lange, Hillary R. Molloy, Frank K. Soong:
Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech. 2586-2590 - Sameer Khurana, Maryam Najafian, Ahmed Ali, Tuka Al Hanai, Yonatan Belinkov, James R. Glass:
QMDIS: QCRI-MIT Advanced Dialect Identification System. 2591-2595
Speaker Database and Anti-spoofing
- K. N. R. K. Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, Anil Kumar Vuppala:
Detection of Replay Attacks Using Single Frequency Filtering Cepstral Coefficients. 2596-2600 - Hardik B. Sailor, Madhu R. Kamble, Hemant A. Patil:
Unsupervised Representation Learning Using Convolutional Restricted Boltzmann Machine for Spoof Speech Detection. 2601-2605 - Gajan Suthokumar, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah:
Independent Modelling of High and Low Energy Speech Frames for Spoofing Detection. 2606-2610 - Achintya Kumar Sarkar, Md. Sahidullah, Zheng-Hua Tan, Tomi Kinnunen:
Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data. 2611-2615 - Arsha Nagrani, Joon Son Chung, Andrew Zisserman:
VoxCeleb: A Large-Scale Speaker Identification Dataset. 2616-2620 - Karen Jones, Stephanie M. Strassel, Kevin Walker, David Graff, Jonathan Wright:
Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology. 2621-2624
Speech Translation
- Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen:
Sequence-to-Sequence Models Can Directly Translate Foreign Speech. 2625-2629 - Takatomo Kano, Sakriani Sakti, Satoshi Nakamura:
Structured-Based Curriculum Learning for End-to-End English-Japanese Speech Translation. 2630-2634 - Nicholas Ruiz, Mattia Antonino Di Gangi, Nicola Bertoldi, Marcello Federico:
Assessing the Tolerance of Neural Machine Translation Systems Against Speech Recognition Errors. 2635-2639 - Quoc Truong Do, Sakriani Sakti, Satoshi Nakamura:
Toward Expressive Speech Translation: A Unified Sequence-to-Sequence LSTMs Approach for Translating Words and Emphasis. 2640-2644 - Eunah Cho, Jan Niehues, Alex Waibel:
NMT-Based Segmentation and Punctuation Insertion for Real-Time Spoken Language Translation. 2645-2649
Multi-channel Speech Enhancement
- Lukas Drude, Reinhold Haeb-Umbach:
Tight Integration of Spatial and Spectral Features for BSS with Deep Clustering Embeddings. 2650-2654 - Katerina Zmolíková, Marc Delcroix, Keisuke Kinoshita, Takuya Higuchi, Atsunori Ogawa, Tomohiro Nakatani:
Speaker-Aware Neural Network Based Beamformer for Speaker Extraction in Speech Mixtures. 2655-2659 - Lukas Pfeifenberger, Matthias Zöhrer, Franz Pernkopf:
Eigenvector-Based Speech Mask Estimation Using Logistic Regression. 2660-2664 - Sean U. N. Wood, Jean Rouat:
Real-Time Speech Enhancement with GCC-NMF. 2665-2669 - Youna Ji, Jun Byun, Young-Cheol Park:
Coherence-Based Dual-Channel Noise Reduction Algorithm in a Complex Noisy Environment. 2670-2674 - Yang Zhang, Dinei Florêncio, Mark Hasegawa-Johnson:
Glottal Model Based Speech Beamforming for ad-hoc Microphone Arrays. 2675-2679
Speech Recognition: Applications in Medical Practice
- Yuanyuan Liu, Tan Lee, P. C. Ching, Thomas K. T. Law, Kathy Y. S. Lee:
Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features. 2680-2684 - Emre Yilmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik:
Multi-Stage DNN Training for Automatic Recognition of Dysarthric Speech. 2685-2689 - Daniel V. Smith, Alex Sneddon, Lauren Ward, Andreas Duenser, Jill Freyne, David Silvera-Tawil, Angela Morgan:
Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech. 2690-2694 - Neethu Mariam Joy, Srinivasan Umesh, Basil Abraham:
On Improving Acoustic Models for TORGO Dysarthric Speech Database. 2695-2699 - Olympia Simantiraki, Paulos Charonyktakis, Anastasia Pampouchidou, Manolis Tsiknakis, Martin Cooke:
Glottal Source Features for Automatic Speech-Based Depression Assessment. 2700-2704 - Roozbeh Sadeghian, J. David Schaffer, Stephen A. Zahorian:
Speech Processing Approach for Diagnosing Dementia in an Early Stage. 2705-2709
Language models for ASR
- Fadi Biadsy, Mohammadreza Ghodsi, Diamantino Caseiro:
Effectively Building Tera Scale MaxEnt Language Models Incorporating Non-Linguistic Signals. 2710-2714 - Salil Deena, Raymond W. M. Ng, Pranava Swaroop Madhyastha, Lucia Specia, Thomas Hain:
Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features. 2715-2719 - Mittul Singh, Youssef Oualil, Dietrich Klakow:
Approximated and Domain-Adapted LSTM Language Models for First-Pass Decoding in Speech Recognition. 2720-2724 - Ciprian Chelba, Diamantino Caseiro, Fadi Biadsy:
Sparse Non-Negative Matrix Language Modeling: Maximum Entropy Flexibility on the Cheap. 2725-2729 - Manoj Kumar, Daniel Bone, Kelly McWilliams, Shanna Williams, Thomas D. Lyon, Shrikanth S. Narayanan:
Multi-Scale Context Adaptation for Improving Child Automatic Speech Recognition in Child-Adult Spoken Interactions. 2730-2734 - Weiwu Zhu:
Using Knowledge Graph and Search Query Click Logs in Statistical Language Model for Speech Recognition. 2735-2738
Speech Recognition: Technologies for New Applications and Paradigms
- Dimitrios Dimitriadis, Petr Fousek:
Developing On-Line Speaker Diarization System. 2739-2743 - Shreyas Seshadri, Ulpu Remes, Okko Räsänen:
Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing. 2744-2748 - Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão:
Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords. 2749-2753 - Su-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini:
Off-Topic Spoken Response Detection with Word Embeddings. 2754-2758 - Wei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee:
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models. 2759-2763 - Shoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa:
Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides. 2764-2768 - Myung Jong Kim, Beiming Cao, Ted Mau, Jun Wang:
Multiview Representation Learning via Deep CCA for Silent Speech Recognition. 2769-2773 - Kate M. Knill, Mark J. F. Gales, Konstantinos Kyriakopoulos, Anton Ragni, Yu Wang:
Use of Graphemic Lexicons for Spoken Language Assessment. 2774-2778 - Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li:
Distilling Knowledge from an Ensemble of Models for Punctuation Prediction. 2779-2783 - Ernest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Plátek, Donald McAllaster, Venki Nagesha:
A Mostly Data-Driven Approach to Inverse Text Normalization. 2784-2788 - Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim:
Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach. 2789-2793 - William Gale, Sarangarajan Parthasarathy:
Experiments in Character-Level Neural Network Models for Punctuation. 2794-2798 - Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen:
Multi-Channel Apollo Mission Speech Transcripts Calibration. 2799-2803
Speaker and Language Recognition Applications
- Mitchell McLaren, Luciana Ferrer, Diego Castán, Aaron Lawson:
Calibration Approaches for Language Detection. 2804-2808 - Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps:
Bidirectional Modelling for Short Duration Language Identification. 2809-2813 - Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai:
Conditional Generative Adversarial Nets Classifier for Spoken Language Identification. 2814-2818 - Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida:
Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition. 2819-2823 - Sungrack Yun, Hye Jin Jang, Taesu Kim:
Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels. 2824-2828 - Ignacio Viñals, Alfonso Ortega, Jesús Antonio Villalba López, Antonio Miguel, Eduardo Lleida:
Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering. 2829-2833 - Miquel India, José A. R. Fonollosa, Javier Hernando:
LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language Modelling. 2834-2838 - Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre:
Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization. 2839-2843 - Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn:
Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice Comparison. 2844-2848 - Yosef A. Solewicz, Michael Jessen, David van der Vloed:
Null-Hypothesis LLR: A Proposal for Forensic Automatic Speaker Recognition. 2849-2853 - Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao:
The Opensesame NIST 2016 Speaker Recognition Evaluation System. 2854-2858 - Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B. K, H. Kashyap, K. Sri Rama Murty, Sriram Ganapathy, Rohit Sinha, S. R. Mahadeva Prasanna:
IITG-Indigo System for NIST 2016 SRE Challenge. 2859-2863 - Abhinav Misra, Shivesh Ranjan, John H. L. Hansen:
Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification. 2864-2868 - Suwon Shon, Seongkyu Mun, Hanseok Ko:
Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition. 2869-2873
Spoken Document Processing
- Shane Settle, Keith D. Levin, Herman Kamper, Karen Livescu:
Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings. 2874-2878 - Daisuke Kaneko, Ryota Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh:
Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection. 2879-2883 - Yuri Y. Khokhlov, Natalia A. Tomashenko, Ivan Medennikov, Aleksei Romanenko:
Fast and Accurate OOV Decoder on High-Level Features. 2884-2888 - Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen:
Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval. 2889-2893 - Hiroto Tasaki, Tomoyosi Akiba:
Incorporating Acoustic Features for Spontaneous Speech Driven Content Retrieval. 2894-2898 - Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-yi Lee, Lin-Shan Lee:
Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification. 2899-2903 - Masatoshi Tsuchiya, Ryo Minamiguchi:
Automatic Alignment Between Classroom Lecture Utterances and Slide Components. 2904-2908 - Paula Lopez-Otero, Laura Docío Fernández, Carmen García-Mateo:
Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion. 2909-2913 - Anjishnu Kumar, Pavankumar Reddy Muddireddy, Markus Dreyer, Björn Hoffmeister:
Zero-Shot Learning Across Heterogeneous Overlapping Domains. 2914-2918 - Emiru Tsunoo, Peter Bell, Steve Renals:
Hierarchical Recurrent Neural Network for Story Segmentation. 2919-2923 - Abdessalam Bouchekif, Delphine Charlet, Géraldine Damnati, Nathalie Camelin, Yannick Estève:
Evaluating Automatic Topic Segmentation as a Segment Retrieval Task. 2924-2928 - Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon:
Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps. 2929-2933 - Jan Svec, Josef V. Psutka, Lubos Smídl, Jan Trmal:
A Relevance Score Estimation for Spoken Term Detection Based on RNN-Generated Pronunciation Embeddings. 2934-2938
Speech Intelligibility
- Laura Fernández Gallardo, Sebastian Möller, John Beerends:
Predicting Automatic Speech Recognition Performance Over Communication Channels from Instrumental Speech Quality and Intelligibility Scores. 2939-2943 - Cassia Valentini-Botinhao, Junichi Yamagishi:
Speech Intelligibility in Cars: The Effect of Speaking Style, Noise and Listener Age. 2944-2948 - Katsuhiko Yamamoto, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani:
Predicting Speech Intelligibility Using a Gammachirp Envelope Distortion Index Based on the Signal-to-Distortion Ratio. 2949-2953 - Yafan Chen, Yong Xu, Jun Yang:
Intelligibilities of Mandarin Chinese Sentences with Spectral "Holes". 2954-2957 - Lauren Ward, Ben G. Shirley, Yan Tang, William J. Davies:
The Effect of Situation-Specific Non-Speech Acoustic Cues on the Intelligibility of Speech in Noise. 2958-2962 - Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen:
On the Use of Band Importance Weighting in the Short-Time Objective Intelligibility Measure. 2963-2967 - Constantin Spille, Bernd T. Meyer:
Listening in the Dips: Comparing Relevant Features for Speech Recognition in Humans and Machines. 2968-2972
Articulatory and Acoustic Phonetics
- Kosuke Sugai:
Mental Representation of Japanese Mora; Focusing on its Intrinsic Duration. 2973-2977 - Jia Ying, Christopher Carignan, Jason A. Shaw, Michael I. Proctor, Donald Derrick
,