20th Interspeech 2019: Graz, Austria
- Gernot Kubin, Zdravko Kacic:
20th Annual Conference of the International Speech Communication Association, Interspeech 2019, Graz, Austria, September 15-19, 2019. ISCA 2019
ISCA Medal 2019 Keynote Speech
- Keiichi Tokuda:
Statistical Approach to Speech Synthesis: Past, Present and Future.
Spoken Language Processing for Children’s Speech
- Fei Wu, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network. 1-5
- Gary Yeung, Abeer Alwan:
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception. 6-10
- Robert Gale, Liu Chen, Jill Dolata, Jan P. H. van Santen, Meysam Asgari:
Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques. 11-15
- Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals:
Ultrasound Tongue Imaging for Diarization and Alignment of Child Speech Therapy Sessions. 16-20
- Anastassia Loukina, Beata Beigman Klebanov, Patrick L. Lange, Yao Qian, Binod Gyawali, Nitin Madnani, Abhinav Misra, Klaus Zechner, Zuowei Wang, John Sabatini:
Automated Estimation of Oral Reading Fluency During Summer Camp e-Book Reading with MyTurnToRead. 21-25
- Vanessa Lopes, João Magalhães, Sofia Cavaco:
Sustained Vowel Game: A Computer Therapy Game for Children with Dysphonia. 26-30
Dynamics of Emotional Speech Exchanges in Multimodal Communication
- Anna Esposito, Terry Amorese, Marialucia Cuciniello, Maria Teresa Riviello, Antonietta Maria Esposito, Alda Troncone, Gennaro Cordasco:
The Dependability of Voice on Elders' Acceptance of Humanoid Agents. 31-35
- Oliver Niebuhr, Uffe Schjoedt:
God as Interlocutor - Real or Imaginary? Prosodic Markers of Dialogue Speech and Expected Efficacy in Spoken Prayer. 36-40
- Michelle Cohn, Georgia Zellou:
Expressiveness Influences Human Vocal Alignment Toward voice-AI. 41-45
- Catherine Lai, Beatrice Alex, Johanna D. Moore, Leimin Tian, Tatsuro Hori, Gianpiero Francesca:
Detecting Topic-Oriented Speaker Stance in Conversational Speech. 46-50
- Jilt Sebastian, Piero Pierucci:
Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts. 51-55
- Marvin Rajwadi, Cornelius Glackin, Julie A. Wall, Gérard Chollet, Nigel Cannings:
Explaining Sentiment Classification. 56-60
- Ricardo Kleinlein, Cristina Luna Jiménez, Juan Manuel Montero, Zoraida Callejas, Fernando Fernández Martínez:
Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models. 61-65
End-to-End Speech Recognition
- Ralf Schlüter:
Survey Talk: Modeling in Automatic Speech Recognition: Beyond Hidden Markov Models.
- Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Müller, Alex Waibel:
Very Deep Self-Attention Networks for End-to-End Speech Recognition. 66-70
- Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde:
Jasper: An End-to-End Convolutional Neural Acoustic Model. 71-75
- Niko Moritz, Takaaki Hori, Jonathan Le Roux:
Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition. 76-80
- Yonatan Belinkov, Ahmed Ali, James R. Glass:
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition. 81-85
Speech Enhancement: Multi-Channel
- Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa:
Multi-Channel Speech Enhancement Using Time-Domain Convolutional Denoising Autoencoder. 86-90
- Kristina Tesch, Robert Rehr, Timo Gerkmann:
On Nonlinear Spatial Filtering in Multichannel Speech Enhancement. 91-95
- Juan M. Martín-Doñas, Jens Heitkaemper, Reinhold Haeb-Umbach, Angel M. Gomez, Antonio M. Peinado:
Multi-Channel Block-Online Source Extraction Based on Utterance Adaptation. 96-100
- Saeed Bagheri, Daniele Giacobello:
Exploiting Multi-Channel Speech Presence Probability in Parametric Multi-Channel Wiener Filter. 101-105
- Masahito Togami, Tatsuya Komatsu:
Variational Bayesian Multi-Channel Speech Dereverberation Under Noisy Environments with Probabilistic Convolutive Transfer Function. 106-110
- Tomohiro Nakatani, Keisuke Kinoshita:
Simultaneous Denoising and Dereverberation for Low-Latency Applications Using Frame-by-Frame Online Unified Convolutional Beamformer. 111-115
Speech Production: Individual Differences and the Brain
- Cathryn Snyder, Michelle Cohn, Georgia Zellou:
Individual Variation in Cognitive Processing Style Predicts Differences in Phonetic Imitation of Device and Human Voices. 116-120
- Aravind Illa, Prasanta Kumar Ghosh:
An Investigation on Speaker Specific Articulatory Synthesis with Speaker Independent Articulatory Inversion. 121-125
- Xiaohan Zhang, Chongke Bi, Kiyoshi Honda, Wenhuan Lu, Jianguo Wei:
Individual Difference of Relative Tongue Size and its Acoustic Effects. 126-130
- Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada:
Individual Differences of Airflow and Sound Generation in the Vocal Tract of Sibilant /s/. 131-135
- Shashwat Uttam, Yaman Kumar, Dhruva Sahrawat, Mansi Aggarwal, Rajiv Ratn Shah, Debanjan Mahata, Amanda Stent:
Hush-Hush Speak: Speech Reconstruction Using Silent Videos. 136-140
- Pramit Saha, Muhammad Abdul-Mageed, Sidney S. Fels:
SPEAK YOUR MIND! Towards Imagined Speech Recognition with Hierarchical Deep Learning. 141-145
Speech Signal Characterization 1
- Yu-An Chung, Wei-Ning Hsu, Hao Tang, James R. Glass:
An Unsupervised Autoregressive Model for Speech Representation Learning. 146-150
- Feng Huang, Péter Balázs:
Harmonic-Aligned Frame Mask Based on Non-Stationary Gabor Transform with Application to Content-Dependent Speaker Comparison. 151-155
- Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das:
Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual. 156-160
- Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio:
Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks. 161-165
- Bhanu Teja Nellore, Sri Harsha Dumpala, Karan Nathwani, Suryakanth V. Gangashetty:
Excitation Source and Vocal Tract System Based Acoustic Features for Detection of Nasals in Continuous Speech. 166-170
- Aggelina Chatziagapi, Georgios Paraskevopoulos, Dimitris Sgouropoulos, Georgios Pantazopoulos, Malvina Nikandrou, Theodoros Giannakopoulos, Athanasios Katsamanis, Alexandros Potamianos, Shrikanth Narayanan:
Data Augmentation Using GANs for Speech Emotion Recognition. 171-175
Neural Waveform Generation
- Zvi Kons, Slava Shechtman, Alexander Sorin, Carmel Rabinovitz, Ron Hoory:
High Quality, Lightweight and Adaptable TTS Using LPCNet. 176-180
- Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal:
Towards Achieving Robust Universal Neural Vocoding. 181-185
- Paarth Neekhara, Chris Donahue, Miller S. Puckette, Shlomo Dubnov, Julian J. McAuley:
Expediting TTS Synthesis with Adversarial Vocoding. 186-190
- Ahmed Mustafa, Arijit Biswas, Christian Bergler, Julia Schottenhamml, Andreas K. Maier:
Analysis by Adversarial Synthesis - A Novel Approach for Speech Vocoding. 191-195
- Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation. 196-200
- Xiaohai Tian, Eng Siong Chng, Haizhou Li:
A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data. 201-205
Attention Mechanism for Speaker State Recognition
- Kyu Jeong Han, Ramon Prieto, Tao Ma:
Survey Talk: When Attention Meets Speech Applications: Speech & Speaker Recognition Perspective.
- Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Björn W. Schuller:
Attention-Enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition. 206-210
- Jeng-Lin Li, Chi-Chun Lee:
Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile. 211-215
- Ascensión Gallardo-Antolín, Juan Manuel Montero:
A Saliency-Based Attention LSTM Model for Cognitive Load Classification from Speech. 216-220
- Adria Mallol-Ragolta, Ziping Zhao, Lukas Stappen, Nicholas Cummins, Björn W. Schuller:
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews. 221-225
ASR Neural Network Training - 1
- Andrea Carmantini, Peter Bell, Steve Renals:
Untranscribed Web Audio for Low Resource Speech Recognition. 226-230
- Christoph Lüscher, Eugen Beck, Kazuki Irie, Markus Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney:
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention. 231-235
- Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe:
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition. 236-240
- Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong:
Speaker Adaptation for Attention-Based End-to-End Speech Recognition. 241-245
- Peidong Wang, Jia Cui, Chao Weng, Dong Yu:
Large Margin Training for Attention Based End-to-End Speech Recognition. 246-250
- Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny:
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition. 251-255
Zero-Resource ASR
- Benjamin Milde, Chris Biemann:
SparseSpeech: Unsupervised Acoustic Unit Discovery with Memory-Augmented Sequence Autoencoders. 256-260
- Lucas Ondel, Hari Krishna Vydana, Lukás Burget, Jan Cernocký:
Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery. 261-265
- Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa:
Speaker Adversarial Training of DPGMM-Based Feature Extractor for Zero-Resource Languages. 266-270
- Manasa Prasad, Daan van Esch, Sandy Ritchie, Jonas Fromseier Mortensen:
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data. 271-275
- Emmanuel Azuh, David Harwath, James R. Glass:
Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio. 276-280
- Siyuan Feng, Tan Lee:
Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation. 281-285
Sociophonetics
- Shawn L. Nissen, Sharalee Blunck, Anita Dromey, Christopher Dromey:
Listeners' Ability to Identify the Gender of Preadolescent Children in Different Linguistic Contexts. 286-290
- Wiebke Ahlers, Philipp Meer:
Sibilant Variation in New Englishes: A Comparative Sociophonetic Study of Trinidadian and American English /s(tr)/-Retraction. 291-295
- Michele Gubian, Jonathan Harrington, Mary Stevens, Florian Schiel, Paul Warren:
Tracking the New Zealand English NEAR/SQUARE Merger Using Functional Principal Components Analysis. 296-300
- Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner:
Phonetic Accommodation in a Wizard-of-Oz Experiment: Intonation and Segments. 301-305
- Oliver Niebuhr, Jan Michalsky:
PASCAL and DPA: A Pilot Study on Using Prosodic Competence Scores to Predict Communicative Skills for Team Working and Public Speaking. 306-310
- Jan Michalsky, Heike Schoormann, Thomas Schultze:
Towards the Prosody of Persuasion in Competitive Negotiation. The Relationship Between f0 and Negotiation Success in Same Sex Sales Tasks. 311-315
Resources – Annotation – Evaluation
- Jacob Sager, Ravi Shankar, Jacob Reinhold, Archana Venkataraman:
VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English. 316-320
- Jia Xin Koh, Aqilah Mislan, Kevin Khoo, Brian Ang, Wilson Ang, Charmaine Ng, Ying-Ying Tan:
Building the Singapore English National Speech Corpus. 321-325
- Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon:
Challenging the Boundaries of Speech Recognition: The MALACH Corpus. 326-330
- Pravin Bhaskar Ramteke, Sujata Supanekar, Pradyoth Hegde, Hanna Nelson, Venkataraja Aithal, Shashidhar G. Koolagudi:
NITK Kids' Speech Corpus. 331-335
- Ahmed Ali, Salam Khalifa, Nizar Habash:
Towards Variability Resistant Dialectal Speech Evaluation. 336-340
- Per Fallgren, Zofia Malisz, Jens Edlund:
How to Annotate 100 Hours in 45 Minutes. 341-345
Speaker Recognition and Diarization
- Mireia Díez, Lukás Burget, Shuai Wang, Johan Rohdin, Jan Cernocký:
Bayesian HMM Based x-Vector Clustering for Speaker Diarization. 346-350
- Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka:
Unleashing the Unused Potential of i-Vectors Enabled by GPU Acceleration. 351-355
- Suwon Shon, Najim Dehak, Douglas A. Reynolds, James R. Glass:
MCE 2018: The 1st Multi-Target Speaker Detection and Identification Challenge Evaluation. 356-360
- Zhifu Gao, Yan Song, Ian McLoughlin, Pengcheng Li, Yiheng Jiang, Li-Rong Dai:
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System. 361-365
- Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras:
LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization. 366-370
- Joon Son Chung, Bong-Jin Lee, Icksang Han:
Who Said That?: Audio-Visual Speaker Diarisation of Real-World Meetings. 371-375
- Jiamin Xie, Leibny Paola García-Perera, Daniel Povey, Sanjeev Khudanpur:
Multi-PLDA Diarization on Children's Speech. 376-380
- Alan McCree, Gregory Sell, Daniel Garcia-Romero:
Speaker Diarization Using Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings. 381-385
- Omid Ghahabi, Volker Fischer:
Speaker-Corrupted Embeddings for Online Speaker Diarization. 386-390
- Tae Jin Park, Kyu Jeong Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis G. Georgiou, Shrikanth Narayanan:
Speaker Diarization with Lexical Information. 391-395
- Laurent El Shafey, Hagen Soltau, Izhak Shafran:
Joint Speech Recognition and Speaker Diarization via Sequence Transduction. 396-400
- Sandro Cumani:
Normal Variance-Mean Mixtures for Unsupervised Score Calibration. 401-405
- Hitoshi Yamamoto, Kong Aik Lee, Koji Okabe, Takafumi Koshinaka:
Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding. 406-410
- Emre Yilmaz, Adem Derinel, Kun Zhou, Henk van den Heuvel, Niko Brümmer, Haizhou Li, David A. van Leeuwen:
Large-Scale Speaker Diarization of Radio Broadcast Archives. 411-415
- Harishchandra Dubey, Abhijeet Sangwan, John H. L. Hansen:
Toeplitz Inverse Covariance Based Robust Speaker Clustering for Naturalistic Audio Streams. 416-420
ASR for Noisy and Far-Field Speech
- György Kovács, László Tóth, Dirk Van Compernolle, Marcus Liwicki:
Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition. 421-425
- Meet H. Soni, Ashish Panda:
Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition. 426-430
- Long Wu, Hangting Chen, Li Wang, Pengyuan Zhang, Yonghong Yan:
Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning. 431-435
- Ji Ming, Danny Crookes:
Full-Sentence Correlation: A Method to Handle Unpredictable Noise for Robust Speech Recognition. 436-440
- Meet H. Soni, Sonal Joshi, Ashish Panda:
Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions. 441-445
- Shashi Kumar, Shakti P. Rath:
Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition. 446-450
- Marc Delcroix, Shinji Watanabe, Tsubasa Ochiai, Keisuke Kinoshita, Shigeki Karita, Atsunori Ogawa, Tomohiro Nakatani:
End-to-End SpeakerBeam for Single Channel Target Speech Recognition. 451-455
- I-Hung Hsu, Ayush Jaiswal, Premkumar Natarajan:
NIESR: Nuisance Invariant End-to-End Speech Recognition. 456-460
- Takahito Suzuki, Jun Ogata, Takashi Tsunakawa, Masafumi Nishida, Masafumi Nishimura:
Knowledge Distillation for Throat Microphone Speech Recognition. 461-465
- Jian Wu, Yong Xu, Shi-Xiong Zhang, Lianwu Chen, Meng Yu, Lei Xie, Dong Yu:
Improved Speaker-Dependent Separation for CHiME-5 Challenge. 466-470
- Peidong Wang, Ke Tan, DeLiang Wang:
Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling. 471-475
- Peidong Wang, DeLiang Wang:
Enhanced Spectral Features for Distortion-Independent Acoustic Modeling. 476-480
- Paarth Neekhara, Shehzeen Hussain, Prakhar Pandey, Shlomo Dubnov, Julian J. McAuley, Farinaz Koushanfar:
Universal Adversarial Perturbations for Speech Recognition Systems. 481-485
- Masakiyo Fujimoto, Hisashi Kawai:
One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features. 486-490
- Bin Liu, Shuai Nie, Shan Liang, Wenju Liu, Meng Yu, Lianwu Chen, Shouye Peng, Changliang Li:
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition. 491-495
Social Signals Detection and Speaker Traits Analysis
- Zixiaofan Yang, Bingyan Hu, Julia Hirschberg:
Predicting Humor by Learning from Time-Aligned Comments. 496-500
- Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov:
Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information. 501-505
- Guozhen An, Rivka Levitan:
Mitigating Gender and L1 Differences to Improve State and Trait Recognition. 506-509
- Felix Weninger, Yang Sun, Junho Park, Daniel Willett, Puming Zhan:
Deep Learning Based Mandarin Accent Identification for Accent Robust ASR. 510-514
- Gábor Gosztolya, László Tóth:
Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection from Audio Data. 515-519
- Hiroki Mori, Tomohiro Nagata, Yoshiko Arimoto:
Conversational and Social Laughter Synthesis with WaveNet. 520-523
- Bogdan Ludusan, Petra Wagner:
Laughter Dynamics in Dyadic Conversations. 524-528
- Khiet P. Truong, Jürgen Trouvain, Michel-Pierre Jansen:
Towards an Annotation Scheme for Complex Laughter in Speech Corpora. 529-533
- Alice Baird, Shahin Amiriparian, Nicholas Cummins, Sarah Sturmbauer, Johanna Janson, Eva-Maria Meßner, Harald Baumeister, Nicolas Rohleder, Björn W. Schuller:
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test. 534-538
- Alice Baird, Eduardo Coutinho, Julia Hirschberg, Björn W. Schuller:
Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results. 539-543
- Oliver Niebuhr, Kerstin Fischer:
Do not Hesitate! - Unless You Do it Shortly or Nasally: How the Phonetics of Filled Pauses Determine Their Subjective Frequency and Perceived Speaker Performance. 544-548
- Juan Camilo Vásquez-Correa, Philipp Klumpp, Juan Rafael Orozco-Arroyave, Elmar Nöth:
Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech. 549-553
Applications of Language Technologies
- Ching-Ting Chang, Shun-Po Chuang, Hung-yi Lee:
Code-Switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation. 554-558