


21st Interspeech 2020: Shanghai, China
- Helen Meng, Bo Xu, Thomas Fang Zheng: 21st Annual Conference of the International Speech Communication Association, Interspeech 2020, Virtual Event, Shanghai, China, October 25-29, 2020. ISCA 2020
Keynote 1
- Janet B. Pierrehumbert: The cognitive status of simple and complex models.
ASR Neural Network Architectures I
- Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu: On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition. 1-5
- Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin: SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition. 6-10
- Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf: Contextual RNN-T for Open Domain ASR. 11-15
- Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu Jeong Han, Tao Lei, Tao Ma: ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition. 16-20
- Deepak Kadetotad, Jian Meng, Visar Berisha, Chaitali Chakrabarti, Jae-sun Seo: Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity. 21-25
- Timo Lohrenz, Tim Fingscheidt: BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example. 26-30
- Ngoc-Quan Pham, Thanh-Le Ha, Tuan-Nam Nguyen, Thai-Son Nguyen, Elizabeth Salesky, Sebastian Stüker, Jan Niehues, Alex Waibel: Relative Positional Encoding for Speech Recognition and Direct Translation. 31-35
- Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka: Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers. 36-40
- Takashi Fukuda, Samuel Thomas: Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework. 41-45
- Jinhwan Park, Wonyong Sung: Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition. 46-50
Multi-Channel Speech Enhancement
- Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao: Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-Channel Speech Recognition. 51-55
- Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Chao Weng, Jianming Liu, Dong Yu: Neural Spatio-Temporal Beamformer for Target Speech Separation. 56-60
- Li Li, Kazuhito Koishida, Shoji Makino: Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis. 61-65
- Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu: End-to-End Multi-Look Keyword Spotting. 66-70
- Weilong Huang, Jinwei Feng: Differential Beamforming for Uniform Circular Array with Directional Microphones. 71-75
- Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee: Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement. 76-80
- Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie: An End-to-End Architecture of Online Multi-Channel Speech Separation. 81-85
- Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi: Mentoring-Reverse Mentoring for Unsupervised Multi-Channel Speech Source Separation. 86-90
- Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Shoko Araki: Computationally Efficient and Versatile Framework for Joint Optimization of Blind Speech Separation and Dereverberation. 91-95
- Yanhui Tu, Jun Du, Lei Sun, Feng Ma, Jia Pan, Chin-Hui Lee: A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge. 96-100
Speech Processing in the Brain
- Youssef Hmamouche, Laurent Prévot, Magalie Ochs, Thierry Chaminade: Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation. 101-105
- Di Zhou, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Zhuo Zhang: Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals. 106-110
- Chongyuan Lian, Tianqi Wang, Mingxiao Gu, Manwa L. Ng, Feiqi Zhu, Lan Wang, Nan Yan: Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell. 111-115
- Zhen Fu, Jing Chen: Congruent Audiovisual Speech Enhances Cortical Envelope Tracking During Auditory Selective Attention. 116-120
- Lei Wang, Ed X. Wu, Fei Chen: Contribution of RMS-Level-Based Speech Segments to Target Speech Decoding Under Noisy Conditions. 121-124
- Bin Zhao, Jianwu Dang, Gaoyan Zhang, Masashi Unoki: Cortical Oscillatory Hierarchy for Natural Sentence Processing. 125-129
- Louis ten Bosch, Kimberley Mulder, Lou Boves: Comparing EEG Analyses with Different Epoch Alignments in an Auditory Lexical Decision Experiment. 130-134
- Tanya Talkar, Sophia Yuditskaya, James R. Williamson, Adam C. Lammert, Hrishikesh Rao, Daniel J. Hannon, Anne T. O'Brien, Gloria Vergara-Diaz, Richard DeLaura, Douglas E. Sturim, Gregory A. Ciccarelli, Ross Zafonte, Jeff Palmer, Paolo Bonato, Thomas F. Quatieri: Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait. 135-139
Speech Signal Representation
- Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Félix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv: Towards Learning a Universal Non-Semantic Representation of Speech. 140-144
- Rajeev Rajan, Aiswarya Vinod Kumar, Ben P. Babu: Poetic Meter Classification Using i-Vector-MTF Fusion. 145-149
- Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie: Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism. 150-154
- Na Hu, Berit Janssen, Judith Hanssen, Carlos Gussenhoven, Aoju Chen: Automatic Analysis of Speech Prosody in Dutch. 155-159
- Adrien Gresse, Mathias Quillot, Richard Dufour, Jean-François Bonastre: Learning Voice Representation Using Knowledge Distillation for Automatic Voice Casting. 160-164
- B. Yegnanarayana, Joseph M. Anand, Vishala Pannala: Enhancing Formant Information in Spectrographic Display of Speech. 165-169
- Michael Gump, Wei-Ning Hsu, James R. Glass: Unsupervised Methods for Evaluating Speech Representations. 170-174
- Dung N. Tran, Uros Batricevic, Kazuhito Koishida: Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments. 175-179
- Amrith Setlur, Barnabás Póczos, Alan W. Black: Nonlinear ISA with Auxiliary Variables for Learning Speech Representations. 180-184
- Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi, Hiroshi Saruwatari: Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals. 185-189
Speech Synthesis: Neural Waveform Generation I
- Yang Ai, Zhen-Hua Ling: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders. 190-194
- Qiao Tian, Zewang Zhang, Heng Lu, Ling-Hui Chen, Shan Liu: FeatherWave: An Efficient High-Fidelity Neural Vocoder with Multi-Band Linear Prediction. 195-199
- Jinhyeok Yang, Junmo Lee, Young-Ik Kim, Hoon-Young Cho, Injung Kim: VocGAN: A High-Fidelity Real-Time Vocoder with a Hierarchically-Nested Adversarial Network. 200-204
- Hiroki Kanagawa, Yusuke Ijima: Lightweight LPCNet-Based Neural Vocoder with Tensor Decomposition. 205-209
- Po-Chun Hsu, Hung-yi Lee: WG-WaveNet: Real-Time High-Fidelity Speech Synthesis Without GPU. 210-214
- Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber: What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS. 215-219
- Vadim Popov, Stanislav Kamenev, Mikhail A. Kudinov, Sergey Repyevsky, Tasnima Sadekova, Vitalii Bushaev, Vladimir Kryzhanovskiy, Denis Parkhomenko: Fast and Lightweight On-Device TTS with Tacotron2 and LPCNet. 220-224
- Wei Song, Guanghui Xu, Zhengchen Zhang, Chao Zhang, Xiaodong He, Bowen Zhou: Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed. 225-229
- Sébastien Le Maguer, Naomi Harte: Can Auditory Nerve Models Tell us What's Different About WaveNet Vocoded Speech? 230-234
- Dipjyoti Paul, Yannis Pantazis, Yannis Stylianou: Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions. 235-239
- Zhijun Liu, Kuan Chen, Kai Yu: Neural Homomorphic Vocoder. 240-244
Automatic Speech Recognition for Non-Native Children’s Speech
- Roberto Gretter, Marco Matassoni, Daniele Falavigna, Keelan Evanini, Chee Wee Leong: Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech. 245-249
- Tien-Hong Lo, Fu-An Chao, Shi-Yan Weng, Berlin Chen: The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge. 250-254
- Kate M. Knill, Linlin Wang, Yu Wang, Xixin Wu, Mark J. F. Gales: Non-Native Children's Automatic Speech Recognition: The INTERSPEECH 2020 Shared Task ALTA Systems. 255-259
- Hemant Kumar Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo: Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech. 260-264
- Mostafa Ali Shahin, Renée Lu, Julien Epps, Beena Ahmed: UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children's Speech. 265-268
Speaker Diarization
- Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Kenji Nagamatsu: End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors. 269-273
- Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Y. Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko: Target-Speaker Voice Activity Detection: A Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario. 274-278
- Hagai Aronowitz, Weizhong Zhu, Masayuki Suzuki, Gakuto Kurata, Ron Hoory: New Advances in Speaker Diarization. 279-283
- Qingjian Lin, Yu Hou, Ming Li: Self-Attentive Similarity Measurement Strategies in Speaker Diarization. 284-288
- Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno: Speaker Attribution with Voice Profiles by Graph-Based Semi-Supervised Learning. 289-293
- Prachi Singh, Sriram Ganapathy: Deep Self-Supervised Hierarchical Clustering for Speaker Diarization. 294-298
- Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman: Spot the Conversation: Speaker Diarisation in the Wild. 299-303
Noise Robust and Distant Speech Recognition
- Wangyou Zhang, Yanmin Qian: Learning Contextual Language Embeddings for Monaural Multi-Talker Speech Recognition. 304-308
- Zhihao Du, Jiqing Han, Xueliang Zhang: Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition. 309-313
- Antoine Bruguier, Ananya Misra, Arun Narayanan, Rohit Prabhavalkar: Anti-Aliasing Regularization in Stacking Layers. 314-318
- Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov: Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription. 319-323
- Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Shinji Watanabe, Yanmin Qian: End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming. 324-328
- Xinchi Qiu, Titouan Parcollet, Mirco Ravanelli, Nicholas D. Lane, Mohamed Morchid: Quaternion Neural Networks for Multi-Channel Distant Speech Recognition. 329-333
- Hangting Chen, Pengyuan Zhang, Qian Shi, Zuozhen Liu: Improved Guided Source Separation Integrated with a Strong Back-End for the CHiME-6 Dinner Party Scenario. 334-338
- Dongmei Wang, Zhuo Chen, Takuya Yoshioka: Neural Speech Separation Using Spatially Distributed Microphones. 339-343
- Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu: Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones. 344-348
- Jack Deadman, Jon Barker: Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset. 349-353
Speech in Multimodality
- Catarina Botelho, Lorenz Diener, Dennis Küster, Kevin Scheck, Shahin Amiriparian, Björn W. Schuller, Tanja Schultz, Alberto Abad, Isabel Trancoso: Toward Silent Paralinguistics: Speech-to-EMG - Retrieving Articulatory Muscle Activity from Speech. 354-358
- Jiaxuan Zhang, Sarah Ita Levitan, Julia Hirschberg: Multimodal Deception Detection Using Automatically Extracted Acoustic, Visual, and Lexical Features. 359-363
- Zexu Pan, Zhaojie Luo, Jichen Yang, Haizhou Li: Multi-Modal Attention for Speech Emotion Recognition. 364-368
- Guang Shen, Riwei Lai, Rui Chen, Yu Zhang, Kejia Zhang, Qilong Han, Hongtao Song: WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition. 369-373
- Ming Chen, Xudong Zhao: A Multi-Scale Fusion Framework for Bimodal Speech Emotion Recognition. 374-378
- Pengfei Liu, Kun Li, Helen Meng: Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition. 379-383
- Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram: Multi-Modal Embeddings Using Multi-Task Learning for Emotion Recognition. 384-388
- Jeng-Lin Li, Chi-Chun Lee: Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network. 389-393
- Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li: Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition. 394-398
Speech, Language, and Multimodal Resources
- Bo Yang, Xianlong Tan, Zhengmao Chen, Bing Wang, Min Ruan, Dan Li, Zhongping Yang, Xiping Wu, Yi Lin: ATCSpeech: A Multilingual Pilot-Controller Speech Corpus from Real Air Traffic Control Environment. 399-403
- Alexander Gutkin, Isin Demirsahin, Oddur Kjartansson, Clara Rivera, Kólá Túbosún: Developing an Open-Source Corpus of Yoruba Speech. 404-408
- Jung-Woo Ha, Kihyun Nam, Jingu Kang, Sang-Woo Lee, Sohee Yang, Hyunhoon Jung, Hyeji Kim, Eunmi Kim, Soojin Kim, Hyun Ah Kim, Kyoungtae Doh, Chan Kyu Lee, Nako Sung, Sunghun Kim: ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers. 409-413
- Yanhong Wang, Huan Luan, Jiahong Yuan, Bin Wang, Hui Lin: LAIX Corpus of Chinese Learner English: Towards a Benchmark for L2 English ASR. 414-418
- Vikram Ramanarayanan: Design and Development of a Human-Machine Dialog Corpus for the Automated Assessment of Conversational English Proficiency. 419-423
- Si Ioi Ng, Cymie Wing-Yee Ng, Jiarui Wang, Tan Lee, Kathy Yuet-Sheung Lee, Michael Chi-Fai Tong: CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment. 424-428
- Katri Leino, Juho Leinonen, Mittul Singh, Sami Virpioja, Mikko Kurimo: FinChat: Corpus and Evaluation Setup for Finnish Chat Conversations on Everyday Topics. 429-433
- Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas: DiPCo - Dinner Party Corpus. 434-436
- Bo Wang, Yue Wu, Niall Taylor, Terry J. Lyons, Maria Liakata, Alejo J. Nevado-Holgado, Kate E. A. Saunders: Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews. 437-441
- Andreas Kirkedal, Marija Stepanovic, Barbara Plank: FT Speech: Danish Parliament Speech Corpus. 442-446
Language Recognition
- Raphaël Duroselle, Denis Jouvet, Irina Illina: Metric Learning Loss Functions to Reduce Domain Mismatch in the x-Vector Space for Language Recognition. 447-451
- Zheng Li, Miao Zhao, Jing Li, Yiming Zhi, Lin Li, Qingyang Hong: The XMUSPEECH System for the AP19-OLR Challenge. 452-456
- Zheng Li, Miao Zhao, Jing Li, Lin Li, Qingyang Hong: On the Usage of Multi-Feature Integration for Speaker Verification and Language Identification. 457-461
- Shammur A. Chowdhury, Ahmed Ali, Suwon Shon, James R. Glass: What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information? 462-466
- Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo: Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets. 467-471
- Aitor Arronte Alvarez, Elsayed Sabry Abdelaal Issa: Learning Intonation Pattern Embeddings for Arabic Dialect Identification. 472-476
- Badr M. Abdullah, Tania Avgustinova, Bernd Möbius, Dietrich Klakow: Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages. 477-481
Speech Processing and Analysis
- Noé Tits, Kevin El Haddad, Thierry Dutoit: ICE-Talk: An Interface for a Controllable Expressive Talking Machine. 482-483
- Mathieu Hu, Laurent Pierron, Emmanuel Vincent, Denis Jouvet: Kaldi-Web: An Installation-Free, On-Device Speech Recognition System. 484-485
- Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Agape Deng, Arnaud Letondor, Robert O'Regan, Qiru Zhou: Soapbox Labs Verification Platform for Child Speech. 486-487
- Amelia C. Kelly, Eleni Karamichali, Armin Saeb, Karel Veselý, Nicholas Parslow, Gloria Montoya Gomez, Agape Deng, Arnaud Letondor, Niall Mullally, Adrian Hempel, Robert O'Regan, Qiru Zhou: SoapBox Labs Fluency Assessment Platform for Child Speech. 488-489
- Baybars Külebi, Alp Öktem, Alex Peiró Lilja, Santiago Pascual, Mireia Farrús: CATOTRON - A Neural Text-to-Speech System in Catalan. 490-491
- Vikram Ramanarayanan, Oliver Roesler, Michael Neumann, David Pautler, Doug Habberstad, Andrew Cornish, Hardik Kothare, Vignesh Murali, Jackson Liscombe, Dirk Schnelle-Walka, Patrick L. Lange, David Suendermann-Oeft: Toward Remote Patient Monitoring of Speech, Video, Cognitive and Respiratory Biomarkers Using Multimodal Dialog Technology. 492-493
- Baihan Lin, Xinxin Zhang: VoiceID on the Fly: A Speaker Recognition System that Learns from Scratch. 494-495
Speech Emotion Recognition I
- Zhao Ren, Jing Han, Nicholas Cummins, Björn W. Schuller: Enhancing Transferability of Black-Box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models. 496-500
- Han Feng, Sei Ueno, Tatsuya Kawahara: End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model. 501-505
- Bo-Hao Su, Chun-Min Chang, Yun-Shao Lin, Chi-Chun Lee: Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network. 506-510
- Adria Mallol-Ragolta, Nicholas Cummins, Björn W. Schuller: An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition. 511-515
- Kusha Sridhar, Carlos Busso: Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition. 516-520
- Siddique Latif, Muhammad Asim, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller: Augmenting Generative Adversarial Networks for Speech Emotion Recognition. 521-525
- Vipula Dissanayake, Haimo Zhang, Mark Billinghurst, Suranga Nanayakkara: Speech Emotion Recognition 'in the Wild' Using an Autoencoder. 526-530
- Shuiyang Mao, Pak-Chung Ching, Tan Lee: Emotion Profile Refinery for Speech Emotion Classification. 531-535
- Sung-Lin Yeh, Yun-Shao Lin, Chi-Chun Lee: Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation. 536-540
ASR Neural Network Architectures and Training I
- Kshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu: Fast and Slow Acoustic Model. 541-545
- Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix: Self-Distillation for Improving CTC-Transformer-Based ASR Systems. 546-550
- Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury: Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard. 551-555
- Zhehuai Chen, Andrew Rosenberg, Yu Zhang, Gary Wang, Bhuvana Ramabhadran, Pedro J. Moreno: Improving Speech Recognition Using GAN-Based Speech Synthesis and Contrastive Unspoken Text Selection. 556-560
- Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur: PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR. 561-565
- Keyu An, Hongyu Xiang, Zhijian Ou: CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency. 566-570
- Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara: CTC-Synchronous Training for Monotonic Attention Model. 571-575
- Brady Houston, Katrin Kirchhoff: Continual Learning for Multi-Dialect Acoustic Models. 576-580
- Xingcheng Song, Zhiyong Wu, Yiheng Huang, Dan Su, Helen Meng: SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition. 581-585
Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation
- Adriana Stan: RECOApy: Data Recording, Pre-Processing and Phonetic Transcription for End-to-End Speech-Based Applications. 586-590
- Yuan Shangguan, Kate Knister, Yanzhang He, Ian McGraw, Françoise Beaufays: Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer. 591-595
- Zhe Liu, Fuchun Peng: Statistical Testing on ASR Performance via Blockwise Bootstrap. 596-600
- Anil Ramakrishna, Shrikanth Narayanan: Sentence Level Estimation of Psycholinguistic Norms Using Joint Multidimensional Annotations. 601-605
- Kai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhijie Yan: Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System. 606-610
- Alejandro Woodward, Clara Bonnín, Issey Masuda, David Varas, Elisenda Bou-Balust, Juan Carlos Riveiro: Confidence Measures in Encoder-Decoder Models for Speech Recognition. 611-615
- Ahmed Ali, Steve Renals: Word Error Rate Estimation Without ASR Output: e-WER2. 616-620
- Bogdan Ludusan, Petra Wagner: An Evaluation of Manual and Semi-Automatic Laughter Annotation. 621-625
- Joshua L. Martin, Kevin Tang: Understanding Racial Disparities in Automatic Speech Recognition: The Case of Habitual "be". 626-630
Phonetics and Phonology
- Georgia Zellou, Rebecca Scarborough, Renee Kemp: Secondary Phonetic Cues in the Production of the Nasal Short-a System in California English. 631-635
- Louis-Marie Lorin, Lorenzo Maselli, Léo Varnet, Maria Giavazzi: Acoustic Properties of Strident Fricatives at the Edges: Implications for Consonant Discrimination. 636-640
- Mingqiong Luo: Processes and Consequences of Co-Articulation in Mandarin V1N.(C2)V2 Context: Phonology and Phonetics. 641-645
- Yang Yue, Fang Hu: Voicing Distinction of Obstruents in the Hangzhou Wu Chinese Dialect. 646-650
- Lei Wang: The Phonology and Phonetics of Kaifeng Mandarin Vowels. 651-655
- Margaret Zellers, Barbara Schuppler: Microprosodic Variability in Plosives in German and Austrian German. 656-660
- Jing Huang, Feng-fan Hsieh, Yueh-Chin Chang: Er-Suffixation in Southwestern Mandarin: An EMA and Ultrasound Study. 661-665
- Yinghao Li, Jinghua Zhang: Electroglottographic-Phonetic Study on Korean Phonation Induced by Tripartite Plosives in Yanbian Korean. 666-670
- Nicholas Wilkins, Max Cordes Galbraith, Ifeoma Nwogu: Modeling Global Body Configurations in American Sign Language. 671-675
Topics in ASR I
- Hang Li, Siyuan Chen, Julien Epps: Augmenting Turn-Taking Prediction with Wearable Eye Activity During Conversation. 676-680
- Weiyi Lu, Yi Xu, Peng Yang, Belinda Zeng: CAM: Uninteresting Speech Detector. 681-685
- Diamantino Caseiro, Pat Rondon, Quoc-Nam Le The, Petar S. Aleksic: Mixed Case Contextual ASR Using Capitalization Masks. 686-690
- Huanru Henry Mao, Shuyang Li, Julian J. McAuley, Garrison W. Cottrell: Speech Recognition and Multi-Speaker Diarization of Long Conversations. 691-695
- Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng: Investigation of Data Augmentation Techniques for Disordered Speech Recognition. 696-700
- Wenqi Wei, Jianzong Wang, Jiteng Ma, Ning Cheng, Jing Xiao: A Real-Time Robot-Based Auxiliary System for Risk Evaluation of COVID-19 Infection. 701-705
- David S. Barbera, Mark A. Huckvale, Victoria Fleming, Emily Upton, Henry Coley-Fisher, Ian Shaw, William H. Latham, Alexander P. Leff, Jenny Crinion: An Utterance Verification System for Word Naming Therapy in Aphasia. 706-710
- Shansong Liu, Xurong Xie, Jianwei Yu, Shoukang Hu, Mengzhe Geng, Rongfeng Su, Shi-Xiong Zhang, Xunying Liu, Helen Meng: Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition. 711-715
- Binghuai Lin, Liyuan Wang: Joint Prediction of Punctuation and Disfluency in Speech Transcripts. 716-720
- Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ye Bai, Cunhang Fan: Focal Loss for Punctuation Prediction. 721-725
Large-Scale Evaluation of Short-Duration Speaker Verification
- Zhuxin Chen, Yue Lin: Improving X-Vector and PLDA for Text-Dependent Speaker Verification. 726-730
- Hossein Zeinali, Kong Aik Lee, Jahangir Alam, Lukás Burget: SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification. 731-735
- Tao Jiang, Miao Zhao, Lin Li, Qingyang Hong: The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020. 736-740
- Sung Hwan Mun, Woo Hyun Kang, Min Hyun Han, Nam Soo Kim: Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020. 741-745
- Tanel Alumäe, Jörgen Valk: The TalTech Systems for the Short-Duration Speaker Verification Challenge 2020. 746-750
- Peng Shen, Xugang Lu, Hisashi Kawai: Investigation of NICT Submission for Short-Duration Speaker Verification Challenge 2020. 751-755
- Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck: Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization. 756-760
- Alicia Lozano-Diez, Anna Silnova, Bhargav Pulugundla, Johan Rohdin, Karel Veselý, Lukás Burget, Oldrich Plchot, Ondrej Glembek, Ondrej Novotný, Pavel Matejka: BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. 761-765
- Vijay Ravi, Ruchao Fan, Amber Afshan, Huanhua Lu, Abeer Alwan: Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification. 766-770
Voice Conversion and Adaptation I
- Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai: Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning. 771-775
- Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna: Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition. 776-780
- Yanping Li, Dongxiang Xu, Yan Zhang, Yang Wang, Binbin Chen: Non-Parallel Many-to-Many Voice Conversion with PSR-StarGAN. 781-785
- Adam Polyak, Lior Wolf, Yaniv Taigman: TTS Skins: Speaker Conversion via ASR. 786-790
- Zining Zhang, Bingsheng He, Zhenjie Zhang: GAZEV: GAN-Based Zero-Shot Voice Conversion Over Non-Parallel Speech Corpus. 791-795
- Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Rongxiu Zhong: Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation. 796-800
- Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman: Unsupervised Cross-Domain Singing Voice Conversion. 801-805
- Tatsuma Ishihara, Daisuke Saito: Attention-Based Speaker Embeddings for One-Shot Voice Conversion. 806-810
- Jian Cong, Shan Yang, Lei Xie, Guoqiao Yu, Guanglu Wan: Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training. 811-815
Acoustic Event Detection
- Sixin Hong, Yuexian Zou, Wenwu Wang: Gated Multi-Head Attention Pooling for Weakly Labelled Audio Tagging. 816-820
- Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang: Environmental Sound Classification with Parallel Temporal-Spectral Attention. 821-825
- Luyu Wang, Kazuya Kawakami, Aäron van den Oord: Contrastive Predictive Coding of Audio with an Adversary. 826-830
- Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos: Memory Controlled Sequential Self Attention for Sound Recognition. 831-835
- Donghyeon Kim, Jaihyun Park, David K. Han, Hanseok Ko: Dual Stage Learning Based Dynamic Time-Frequency Mask Generation for Audio Event Classification. 836-840
- Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin, Lin Liu: An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection. 841-845
- Chieh-Chi Kao, Bowen Shi, Ming Sun, Chao Wang: A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling. 846-850
- Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang: Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging. 851-855
- In Young Park, Hong Kook Kim: Two-Stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-Token Connectionist Temporal Classification. 856-860
- Amit Jindal, Narayanan Elavathur Ranganatha, Aniket Didolkar, Arijit Ghosh Chowdhury, Di Jin, Ramit Sawhney, Rajiv Ratn Shah: SpeechMix - Augmenting Deep Sound Recognition Using Hidden Space Interpolations. 861-865
Spoken Language Understanding I
- Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann: End-to-End Neural Transformer Based Spoken Language Understanding. 866-870
- Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu: Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding. 871-875
- Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow: Speech to Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces. 876-880
- Pavel Denisov, Ngoc Thang Vu: Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning. 881-885
- Srikanth Raj Chetupalli, Sriram Ganapathy: Context Dependent RNNLM for Automatic Transcription of Conversations. 886-890
- Yusheng Tian, Philip John Gorinski: Improving End-to-End Speech-to-Intent Classification with Reptile. 891-895
- Won-Ik Cho, Donghyun Kwak, Ji Won Yoon, Nam Soo Kim: Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation. 896-900
- Weitong Ruan, Yaroslav Nechaev, Luoxin Chen, Chengwei Su, Imre Kiss: Towards an ASR Error Robust Spoken Language Understanding System. 901-905
- Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras: End-to-End Spoken Language Understanding Without Full Transcripts. 906-910
- Karthik Gopalakrishnan, Behnam Hedayatnia, Longshaokan Wang, Yang Liu, Dilek Hakkani-Tür: Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study. 911-915
DNN Architectures for Speaker Recognition
- Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang: AutoSpeech: Neural Architecture Search for Speaker Recognition. 916-920
- Ya-Qi Yu, Wu-Jun Li: Densely Connected Time Delay Neural Network for Speaker Verification. 921-925
- Siqi Zheng, Yun Lei, Hongbin Suo: Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification. 926-930
- Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim: Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification Using CTC-Based Soft VAD and Global Query Attention. 931-935
- Yanfeng Wu, Chenkai Guo, Hongcan Gao, Xiaolei Hou, Jing Xu: Vector-Based Attentive Pooling for Text-Independent Speaker Verification. 936-940
- Pooyan Safari, Miquel India, Javier Hernando: Self-Attention Encoding and Pooling for Speaker Recognition. 941-945
- Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Longbiao Wang, Meng Liu, Lin Zhang, Jiayu Jin, Junhai Xu: ARET: Aggregated Residual Extended Time-Delay Neural Networks for Speaker Verification. 946-950
- Hanyi Zhang, Longbiao Wang, Yunchun Zhang, Meng Liu, Kong Aik Lee, Jianguo Wei: Adversarial Separation Network for Speaker Recognition. 951-955
- Jingyu Li, Tan Lee: Text-Independent Speaker Verification with Dual Attention Network. 956-960
- Xiaoyang Qu, Jianzong Wang, Jing Xiao: Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification. 961-965
ASR Model Training and Strategies
- Chao Weng, Chengzhu Yu, Jia Cui, Chunlei Zhang, Dong Yu:

Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition. 966-970 - Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou:

Semantic Mask for Transformer Based End-to-End Speech Recognition. 971-975 - Frank Zhang, Yongqiang Wang, Xiaohui Zhang, Chunxi Liu, Yatharth Saraf, Geoffrey Zweig:

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces. 976-980 - Dimitrios Dimitriadis, Ken'ichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez:

A Federated Approach in Training Acoustic Models. 981-985 - Imran A. Sheikh, Emmanuel Vincent, Irina Illina:

On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data. 986-990 - Yixin Gao, Noah D. Stein, Chieh-Chi Kao, Yunliang Cai, Ming Sun, Tao Zhang, Shiv Naga Prasad Vitaladevuni:

On Front-End Gain Invariant Modeling for Wake Word Spotting. 991-995 - Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du:

Unsupervised Regularization-Based Adaptive Training for Speech Recognition. 996-1000 - Erfan Loweimi, Peter Bell, Steve Renals:

On the Robustness and Training Dynamics of Raw Waveform Models. 1001-1005 - Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Y. Hannun, Gabriel Synnaeve, Ronan Collobert:

Iterative Pseudo-Labeling for Speech Recognition. 1006-1010
Speech Annotation and Speech Assessment
- Naoko Kawamura, Tatsuya Kitamura, Kenta Hamada:

Smart Tube: A Biofeedback System for Vocal Training and Therapy Through Tube Phonation. 1011-1012 - Seong Choi, Seunghoon Jeong, Jeewoo Yoon, Migyeong Yang, Minsam Ko, Eunil Park, Jinyoung Han, Munyoung Lee, Seonghee Lee:

VCTUBE : A Library for Automatic Speech Data Annotation. 1013-1014 - Yanlu Xie, Xiaoli Feng, Boxue Li, Jinsong Zhang, Yujia Jin:

A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback. 1015-1016 - Tejas Udayakumar, Kinnera Saranu, Mayuresh Sanjay Oak, Ajit Ashok Saunshikhar, Sandip Shriram Bapat:

Rapid Enhancement of NLP Systems by Acquisition of Data in Correlated Domains. 1017-1018 - Ke Shi, Kye Min Tan, Richeng Duan, Siti Umairah Md. Salleh, Nur Farah Ain Suhaimi, Rajan Vellu, Ngoc Thuy Huong Helen Thai, Nancy F. Chen:

Computer-Assisted Language Learning System: Automatic Speech Evaluation for Children Learning Malay and Tamil. 1019-1020 - Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:

Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. 1021-1022 - Xiaoli Feng, Yanlu Xie, Yayue Deng, Boxue Li:

A Dynamic 3D Pronunciation Teaching Model Based on Pronunciation Attributes and Anatomy. 1023-1024 - Naoki Kimura, Zixiong Su, Takaaki Saeki:

End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge. 1025-1026
Cross/Multi-Lingual and Code-Switched Speech Recognition
- Jialu Li, Mark Hasegawa-Johnson:

Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? 1027-1031 - Martha Yifiru Tachbelie, Solomon Teferra Abate, Tanja Schultz:

Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages. 1032-1036 - Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, Takahiro Shinozaki:

Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning. 1037-1041 - Xinyuan Zhou, Emre Yilmaz, Yanhua Long, Yijie Li, Haizhou Li:

Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition. 1042-1046 - Solomon Teferra Abate, Martha Yifiru Tachbelie, Tanja Schultz:

Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages. 1047-1051 - Yushi Hu, Shane Settle, Karen Livescu:

Multilingual Jointly Trained Acoustic and Written Word Embeddings. 1052-1056 - Chia-Yu Li, Ngoc Thang Vu:

Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks. 1057-1061 - Xinhui Hu, Qi Zhang, Lei Yang, Binbin Gu, Xinkang Xu:

Data Augmentation for Code-Switch Language Modeling by Fusing Multiple Text Generation Methods. 1062-1066 - Xinxing Li, Edward Lin:

A 43 Language Multilingual Punctuation Prediction Neural Network Model. 1067-1071 - Jisung Wang, Jihwan Kim, Sangki Kim, Yeha Lee:

Exploring Lexicon-Free Modeling Units for End-to-End Korean and Korean-English Code-Switching Speech Recognition. 1072-1075
Anti-Spoofing and Liveness Detection
- Patrick von Platen, Fei Tao, Gökhan Tür:

Multi-Task Siamese Neural Network for Improving Replay Attack Detection. 1076-1080 - Kosuke Akimoto, Seng Pei Liew, Sakiko Mishima, Ryo Mizushima, Kong Aik Lee:

POCO: A Voice Spoofing and Liveness Detection Corpus Based on Pop Noise. 1081-1085 - Hongji Wang, Heinrich Dinkel, Shuai Wang, Yanmin Qian, Kai Yu:

Dual-Adversarial Domain Adaptation for Generalized Replay Attack Detection. 1086-1090 - Hye-jin Shim, Hee-Soo Heo, Jee-weon Jung, Ha-Jin Yu:

Self-Supervised Pre-Training with Acoustic Configurations for Replay Spoofing Detection. 1091-1095 - Abhijith Girish, Adharsh Sabu, Akshay Prasannan Latha, Rajeev Rajan:

Competency Evaluation in Voice Mimicking Using Acoustic Cues. 1096-1100 - Zhenzong Wu, Rohan Kumar Das, Jichen Yang, Haizhou Li:

Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks. 1101-1105 - Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas W. D. Evans, Massimiliano Todisco:

Spoofing Attack Detection Using the Non-Linear Fusion of Sub-Band Classifiers. 1106-1110 - Prasanth Parasu, Julien Epps, Kaavya Sriskandaraja, Gajan Suthokumar:

Investigating Light-ResNet Architecture for Spoofing Detection Under Mismatched Conditions. 1111-1115 - Zhenchun Lei, Yingen Yang, Changhong Liu, Jihua Ye:

Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection. 1116-1120
Noise Reduction and Intelligibility
- Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Pascal Zobel, Andreas Maier:

Lightweight Online Noise Reduction on Embedded Devices Using Hierarchical Recurrent Neural Networks. 1121-1125 - Marco Tagliasacchi, Yunpeng Li, Karolis Misiunas, Dominik Roblek:

SEANet: A Multi-Modal Speech Enhancement Network. 1126-1130 - Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang:

Lite Audio-Visual Speech Enhancement. 1131-1135 - Christian Bergler, Manuel Schmitt, Andreas Maier, Simeon Smeele, Volker Barth, Elmar Nöth:

ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication. 1136-1140 - Hao Zhang, DeLiang Wang:

A Deep Learning Approach to Active Noise Control. 1141-1145 - Tuan Dinh, Alexander Kain, Kris Tjaden:

Improving Speech Intelligibility Through Speaker Dependent and Independent Spectral Style Conversion. 1146-1150 - Mathias Bach Pedersen, Morten Kolbæk, Asger Heidemann Andersen, Søren Holdt Jensen, Jesper Jensen:

End-to-End Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks. 1151-1155 - Kenichi Arai, Shoko Araki, Atsunori Ogawa, Keisuke Kinoshita, Tomohiro Nakatani, Toshio Irino:

Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-Based ASR System. 1156-1160 - Ali Abavisani, Mark Hasegawa-Johnson:

Automatic Estimation of Intelligibility Measure for Consonants in Speech. 1161-1165 - Viet Anh Trinh, Michael I. Mandel:

Large Scale Evaluation of Importance Maps in Automatic Speech Recognition. 1166-1170
Acoustic Scene Classification
- Jixiang Li, Chuming Liang, Bo Zhang, Zhao Wang, Fei Xiang, Xiangxiang Chu:

Neural Architecture Search on Acoustic Scene Classification. 1171-1175 - Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Seung-bin Kim, Ha-Jin Yu:

Acoustic Scene Classification Using Audio Tagging. 1176-1180 - Liwen Zhang, Jiqing Han, Ziqiang Shi:

ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification. 1181-1185 - Jivitesh Sharma, Ole-Christoffer Granmo, Morten Goodwin:

Environment Sound Classification Using Multiple Feature Channels and Attention Based Deep Convolutional Neural Network. 1186-1190 - Weimin Wang, Weiran Wang, Ming Sun, Chao Wang:

Acoustic Scene Analysis with Multi-Head Attention Networks. 1191-1195 - Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee:

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification. 1196-1200 - Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee:

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances. 1201-1205 - Dhanunjaya Varma Devalraju, H. Muralikrishna, Padmanabhan Rajan, Dileep Aroor Dinesh:

Attention-Driven Projections for Soundscape Classification. 1206-1210 - Panagiotis Tzirakis, Alexander Shiarella, Robert M. Ewers, Björn W. Schuller:

Computer Audition for Continuous Rainforest Occupancy Monitoring: The Case of Bornean Gibbons' Call Detection. 1211-1215 - Zuzanna Kwiatkowska, Beniamin Kalinowski, Michal Kosmider, Krzysztof Rykaczewski:

Deep Learning Based Open Set Acoustic Scene Classification. 1216-1220
Singing Voice Computing and Processing in Music
- Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, Thomas Drugman:

Singing Synthesis: With a Little Help from my Attention. 1221-1225 - Yusong Wu, Shengchen Li, Chengzhu Yu, Heng Lu, Chao Weng, Liqiang Zhang, Dong Yu:

Peking Opera Synthesis via Duration Informed Attention Network. 1226-1230 - Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu:

DurIAN-SC: Duration Informed Attention Network Based Singing Voice Conversion System. 1231-1235 - Yuanbo Hou, Frank K. Soong, Jian Luan, Shengchen Li:

Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music. 1236-1240 - Haohe Liu, Lei Xie, Jian Wu, Geng Yang:

Channel-Wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music. 1241-1245
Acoustic Model Adaptation for ASR
- Samik Sadhu, Hynek Hermansky:

Continual Learning in Automatic Speech Recognition. 1246-1250 - Genshun Wan, Jia Pan, Qingran Wang, Jianqing Gao, Zhongfu Ye:

Speaker Adaptive Training for Speech Recognition Based on Attention-Over-Attention Mechanism. 1251-1255 - Yan Huang, Jinyu Li, Lei He, Wenning Wei, William Gale, Yifan Gong:

Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator. 1256-1260 - Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma:

Speech Transformer with Speaker Aware Persistent Memory. 1261-1265 - Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du:

Adaptive Speaker Normalization for CTC-Based Speech Recognition. 1266-1270 - Akhil Mathur, Nadia Berthouze, Nicholas D. Lane:

Unsupervised Domain Adaptation Under Label Space Mismatch for Speech Classification. 1271-1275 - Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung:

Learning Fast Adaptation on Cross-Accented Speech Recognition. 1276-1280 - Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, Sunita Sarawagi:

Black-Box Adaptation of ASR for Accented Speech. 1281-1285 - M. A. Tugtekin Turan, Emmanuel Vincent, Denis Jouvet:

Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation. 1286-1290 - Ryu Takeda, Kazunori Komatani:

Frame-Wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering. 1291-1295
Singing and Multimodal Synthesis
- Jie Wu, Jian Luan:

Adversarially Trained Multi-Singer Sequence-to-Sequence Singing Synthesizer. 1296-1300 - JinHong Lu, Hiroshi Shimodaira:

Prediction of Head Motion from Speech Waveforms with a Canonical-Correlation-Constrained Autoencoder. 1301-1305 - Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou:

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System. 1306-1310 - Ravindra Yadav, Ashish Sardana, Vinay P. Namboodiri, Rajesh M. Hegde:

Stochastic Talking Face Generation Using Latent Distribution Matching. 1311-1315 - Da-Yi Wu, Yi-Hsuan Yang:

Speech-to-Singing Conversion Based on Boundary Equilibrium GAN. 1316-1320 - Shunsuke Goto, Kotaro Onishi, Yuki Saito, Kentaro Tachibana, Koichiro Mori:

Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image. 1321-1325 - Wentao Wang, Yan Wang, Jianqing Sun, Qingsong Liu, Jiaen Liang, Teng Li:

Speech Driven Talking Head Generation via Attentional Landmarks Based Representation. 1326-1330
Intelligibility-Enhancing Speech Modification
- Marc René Schädler:

Optimization and Evaluation of an Intelligibility-Improving Signal Processing Approach (IISPA) for the Hurricane Challenge 2.0 with FADE. 1331-1335 - Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi:

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise Using Generative Adversarial Network-Based Metric Learning. 1336-1340 - Jan Rennies, Henning F. Schepker, Cassia Valentini-Botinhao, Martin Cooke:

Intelligibility-Enhancing Speech Modifications - The Hurricane Challenge 2.0. 1341-1345 - Olympia Simantiraki, Martin Cooke:

Exploring Listeners' Speech Rate Preferences. 1346-1350 - Felicitas Bederna, Henning F. Schepker, Christian Rollwage, Simon Doclo, Arne Pusch, Jörg Bitzer, Jan Rennies:

Adaptive Compressive Onset-Enhancement for Improved Speech Intelligibility in Noise and Reverberation. 1351-1355 - Carol Chermaz, Simon King:

A Sound Engineering Approach to Near End Listening Enhancement. 1356-1360 - Dipjyoti Paul, P. V. Muhammed Shifas, Yannis Pantazis, Yannis Stylianou:

Enhancing Speech Intelligibility in Text-To-Speech Synthesis Using Speaking Style Conversion. 1361-1365
Human Speech Production I
- Takayuki Arai:

Two Different Mechanisms of Movable Mandible for Vocal-Tract Model with Flexible Tongue. 1366-1370 - Qiang Fang:

Improving the Performance of Acoustic-to-Articulatory Inversion by Removing the Training Loss of Noncritical Portions of Articulatory Channels Dynamically. 1371-1375 - Aravind Illa, Prasanta Kumar Ghosh:

Speaker Conditioned Acoustic-to-Articulatory Inversion Using x-Vectors. 1376-1380 - Zirui Liu, Yi Xu, Feng-fan Hsieh:

Coarticulation as Synchronised Sequential Target Approximation: An EMA Study. 1381-1385 - Jônatas Santos, Jugurta Montalvão, Israel Santos:

Improved Model for Vocal Folds with a Polyp with Potential Application. 1386-1390 - Lin Zhang, Kiyoshi Honda, Jianguo Wei, Seiji Adachi:

Regional Resonance of the Lower Vocal Tract and its Contribution to Speaker Characteristics. 1391-1395 - Renuka Mannem, Navaneetha Gaddam, Prasanta Kumar Ghosh:

Air-Tissue Boundary Segmentation in Real Time Magnetic Resonance Imaging Video Using 3-D Convolutional Neural Network. 1396-1400 - Tilak Purohit, Prasanta Kumar Ghosh:

An Investigation of the Virtual Lip Trajectories During the Production of Bilabial Stops and Nasal at Different Speaking Rates. 1401-1405
Targeted Source Separation
- Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li:

SpEx+: A Complete Time Domain Speaker Extraction Network. 1406-1410 - Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li:

Atss-Net: Target Speaker Separation via Attention-Based Neural Network. 1411-1415 - Leyuan Qu, Cornelius Weber, Stefan Wermter:

Multimodal Target Speech Separation with Voice and Face References. 1416-1420 - Zining Zhang, Bingsheng He, Zhenjie Zhang:

X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network. 1421-1425 - Chenda Li, Yanmin Qian:

Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation. 1426-1430 - Yunzhe Hao, Jiaming Xu, Jing Shi, Peng Zhang, Lei Qin, Bo Xu:

A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments. 1431-1435 - Jianshu Zhao, Shengzhou Gao, Takahiro Shinozaki:

Time-Domain Target-Speaker Speech Separation with Waveform-Based Speaker Embedding. 1436-1440 - Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki:

Listen to What You Want: Neural Network-Based Universal Sound Selector. 1441-1445 - Masahiro Yasuda, Yasunori Ohishi, Yuma Koizumi, Noboru Harada:

Crossmodal Sound Retrieval Based on Specific Target Co-Occurrence Denoted with Weak Labels. 1446-1450 - Jiahao Xu, Kun Hu, Chang Xu, Tran Duc Chung, Zhiyong Wang:

Speaker-Aware Monaural Speech Separation. 1451-1455
Keynote 2
- Barbara G. Shinn-Cunningham:

Brain networks enabling speech perception in everyday settings.
Speech Translation and Multilingual/Multimodal Learning
- Liming Wang, Mark Hasegawa-Johnson:

A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions. 1456-1460 - Maha Elbayad, Laurent Besacier, Jakob Verbeek:

Efficient Wait-k Models for Simultaneous Machine Translation. 1461-1465 - Ha Nguyen, Fethi Bougares, Natalia A. Tomashenko, Yannick Estève, Laurent Besacier:

Investigating Self-Supervised Pre-Training for End-to-End Speech Translation. 1466-1470 - Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Mauro Cettolo, Marco Turchi:

Contextualized Translation of Automatically Segmented Speech. 1471-1475 - Juan Miguel Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang:

Self-Training for End-to-End Speech Translation. 1476-1480 - Marcello Federico, Yogesh Virkar, Robert Enyedi, Roberto Barra-Chicote:

Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing. 1481-1485 - Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, David Harwath, James R. Glass:

Pair Expansion for Learning Multilingual Semantic Embeddings Using Disjoint Visually-Grounded Speech Audio Datasets. 1486-1490 - Anne Wu, Changhan Wang, Juan Miguel Pino, Jiatao Gu:

Self-Supervised Representations Improve End-to-End Speech Translation. 1491-1495
Speaker Recognition I
- Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu:

Improved RawNet with Feature Map Scaling for Text-Independent Speaker Verification Using Raw Waveforms. 1496-1500 - Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim:

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances. 1501-1505 - Bin Gu, Wu Guo, Fenglin Ding, Zhen-Hua Ling, Jun Du:

An Adaptive X-Vector Model for Text-Independent Speaker Verification. 1506-1510 - Santi Prieto, Alfonso Ortega Giménez, Iván López-Espejo, Eduardo Lleida:

Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions. 1511-1515 - Aaron Nicolson, Kuldip K. Paliwal:

Sum-Product Networks for Robust Automatic Speaker Identification. 1516-1520 - Seung-bin Kim, Jee-weon Jung, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu:

Segment Aggregation for Short Utterances Speaker Verification Using Raw Waveforms. 1521-1525 - Shai Rozenberg, Hagai Aronowitz, Ron Hoory:

Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition. 1526-1529 - Yanpei Shi, Qiang Huang, Thomas Hain:

Speaker Re-Identification with Speaker Dependent Speech Enhancement. 1530-1534 - Galina Lavrentyeva, Marina Volkova, Anastasia Avdeeva, Sergey Novoselov, Artem Gorlanov, Tseren Andzhukaev, Artem Ivanov, Alexander Kozlov:

Blind Speech Signal Quality Estimation for Speaker Verification Systems. 1535-1539 - Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng:

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification. 1540-1544
Spoken Language Understanding II
- Vaishali Pal, Fabien Guillot, Manish Shrivastava, Jean-Michel Renders, Laurent Besacier:

Modeling ASR Ambiguity for Neural Dialogue State Tracking. 1545-1549 - Haoyu Wang, Shuyan Dong, Yue Liu, James Logan, Ashish Kumar Agrawal, Yang Liu:

ASR Error Correction with Augmented Transformer for Entity Retrieval. 1550-1554 - Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao:

Large-Scale Transfer Learning for Low-Resource Spoken Language Understanding. 1555-1559 - Judith Gaspers, Quynh Ngoc Thi Do, Fabian Triefenbach:

Data Balancing for Boosting Performance of Low-Frequency Classes in Spoken Language Understanding. 1560-1564 - Yu Wang, Yilin Shen, Hongxia Jin:

An Interactive Adversarial Reward Learning-Based Spoken Language Understanding System. 1565-1569 - Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li:

Style Attuned Pre-Training and Parameter Efficient Fine-Tuning for Spoken Language Understanding. 1570-1574 - Shota Orihashi, Mana Ihori, Tomohiro Tanaka, Ryo Masumura:

Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training. 1575-1579 - Leda Sari, Mark Hasegawa-Johnson:

Deep F-Measure Maximization for End-to-End Speech Understanding. 1580-1584 - Taesun Whang, Dongyub Lee, Chanhee Lee, Kisu Yang, Dongsuk Oh, Heuiseok Lim:

An Effective Domain Adaptive Post-Training Method for BERT in Response Selection. 1585-1589 - Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin:

Confidence Measure for Speech-to-Concept End-to-End Spoken Language Understanding. 1590-1594
Human Speech Processing
- Grant L. McGuire, Molly Babel:

Attention to Indexical Information Improves Voice Recall. 1595-1599 - Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier:

Categorization of Whistled Consonants by French Speakers. 1600-1604 - Anaïs Tran Ngoc, Julien Meyer, Fanny Meunier:

Whistled Vowel Identification by French Listeners. 1605-1609 - Maria del Mar Cordero, Fanny Meunier, Nicolas Grimault, Stéphane Pota, Elsa Spinelli:

F0 Slope and Mean: Cues to Speech Segmentation in French. 1610-1614 - Amandine Michelas, Sophie Dufour:

Does French Listeners' Ability to Use Accentual Information at the Word Level Depend on the Ear of Presentation? 1615-1619 - Wen Liu:

A Perceptual Study of the Five Level Tones in Hmu (Xinzhai Variety). 1620-1623 - Zhen Zeng, Karen Mattock, Liquan Liu, Varghese Peter, Alba Tuninetti, Feng-Ming Tsao:

Mandarin and English Adults' Cue-Weighting of Lexical Stress. 1624-1628 - Yan Feng, Gang Peng, William Shi-Yuan Wang:

Age-Related Differences of Tone Perception in Mandarin-Speaking Seniors. 1629-1633 - Georgia Zellou, Michelle Cohn:

Social and Functional Pressures in Vocal Alignment: Differences for Human and Voice-AI Interlocutors. 1634-1638 - Hassan Salami Kavaki, Michael I. Mandel:

Identifying Important Time-Frequency Locations in Continuous Speech Utterances. 1639-1643
Feature Extraction and Distant ASR
- Erfan Loweimi, Peter Bell, Steve Renals:

Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling. 1644-1648 - Purvi Agrawal, Sriram Ganapathy:

Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations. 1649-1653 - Dino Oglic, Zoran Cvetkovic, Peter Bell, Steve Renals:

A Deep 2D Convolutional Network for Waveform-Based Speech Recognition. 1654-1658 - Ludwig Kürzinger, Nicolas Lindae, Palle Klewitz, Gerhard Rigoll:

Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions. 1659-1663 - Pegah Ghahramani, Hossein Hadian, Daniel Povey, Hynek Hermansky, Sanjeev Khudanpur:

An Alternative to MFCCs for ASR. 1664-1667 - Anirban Dutta, Ashishkumar Prabhakar Gudmalwar, Ch. V. Rama Rao:

Phase Based Spectro-Temporal Features for Building a Robust ASR System. 1668-1672 - Neethu M. Joy, Dino Oglic, Zoran Cvetkovic, Peter Bell, Steve Renals:

Deep Scattering Power Spectrum Features for Robust Speech Recognition. 1673-1677 - Titouan Parcollet, Xinchi Qiu, Nicholas D. Lane:

FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition. 1678-1682 - Kshitiz Kumar, Bo Ren, Yifan Gong, Jian Wu:

Bandpass Noise Generation and Augmentation for Unified ASR. 1683-1687 - Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy:

Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition. 1688-1692
Voice Privacy Challenge
- Natalia A. Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas W. D. Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco:

Introducing the VoicePrivacy Initiative. 1693-1697 - Andreas Nautsch, Jose Patino, Natalia A. Tomashenko, Junichi Yamagishi, Paul-Gauthier Noé, Jean-François Bonastre, Massimiliano Todisco, Nicholas W. D. Evans:

The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment. 1698-1702 - Candy Olivia Mawalim, Kasorn Galajit, Jessada Karnjana, Masashi Unoki:

X-Vector Singular Value Modification and Statistical-Based Decomposition with Ensemble Regression Modeling for Speaker Anonymization System. 1703-1707 - Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent:

A Comparative Study of Speech Anonymization Metrics. 1708-1712 - Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi:

Design Choices for X-Vector Based Speaker Anonymization. 1713-1717 - Paul-Gauthier Noé, Jean-François Bonastre, Driss Matrouf, Natalia A. Tomashenko, Andreas Nautsch, Nicholas W. D. Evans:

Speech Pseudonymisation Assessment Using Voice Similarity Matrices. 1718-1722
Speech Synthesis: Text Processing, Data and Evaluation
- Kyubyong Park, Seanie Lee:

g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset. 1723-1727 - Haiteng Zhang, Huashan Pan, Xiulin Li:

A Mask-Based Model for Mandarin Chinese Polyphone Disambiguation. 1728-1732 - Michelle Cohn, Georgia Zellou:

Perception of Concatenative vs. Neural Text-To-Speech (TTS): Differences in Intelligibility in Noise and Language Attitudes. 1733-1737 - Jason Taylor, Korin Richmond:

Enhancing Sequence-to-Sequence Text-to-Speech with Morphology. 1738-1742 - Yeunju Choi, Youngmoon Jung, Hoirin Kim:

Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling. 1743-1747 - Gabriel Mittag, Sebastian Möller:

Deep Learning Based Assessment of Synthetic Speech Naturalness. 1748-1752 - Jiawen Zhang, Yuanyuan Zhao, Jiaqi Zhu, Jinba Xiao:

Distant Supervision for Polyphone Disambiguation in Mandarin Chinese. 1753-1757 - Pilar Oplustil Gallegos, Jennifer Williams, Joanna Rownicka, Simon King:

An Unsupervised Method to Select a Speaker Subset from Large Multi-Speaker Speech Synthesis Datasets. 1758-1762 - Anurag Das, Guanlong Zhao, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna:

Understanding the Effect of Voice Quality and Accent on Talker Similarity. 1763-1767
Search for Speech Recognition
- Wei Zhou, Ralf Schlüter, Hermann Ney:

Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition Without Length Bias. 1768-1772 - Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin:

Transformer with Bidirectional Decoder for Speech Recognition. 1773-1777 - Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher:

An Investigation of Phone-Based Subword Units for End-to-End Speech Recognition. 1778-1782 - Jeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, Jinyu Li, Yifan Gong:

Combination of End-to-End and Hybrid Models for Speech Recognition. 1783-1787 - Jihwan Kim, Jisung Wang, Sangki Kim, Yeha Lee:

Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition. 1788-1792 - Abhinav Garg, Ashutosh Gupta, Dhananjaya Gowda, Shatrughan Singh, Chanwoo Kim:

Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition. 1793-1797 - Eugen Beck, Ralf Schlüter, Hermann Ney:

LVCSR with Transformer Language Models. 1798-1802 - Yi-Chen Chen, Jui-Yang Hsu, Cheng-Kuang Lee, Hung-yi Lee:

DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation. 1803-1807
Computational Paralinguistics I
- Lukas Stappen, Georgios Rizos, Madina Hasan, Thomas Hain, Björn W. Schuller:

Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus. 1808-1812 - Michelle Cohn, Melina Sarian, Kristin Predeck, Georgia Zellou:

Individual Variation in Language Attitudes Toward Voice-AI: The Role of Listeners' Autistic-Like Traits. 1813-1817 - Michelle Cohn, Eran Raveh, Kristin Predeck, Iona Gessinger, Bernd Möbius, Georgia Zellou:

Differences in Gradient Emotion Perception: Human vs. Alexa Voices. 1818-1822 - Luz Martinez-Lucas, Mohammed Abdelwahab, Carlos Busso:

The MSP-Conversation Corpus. 1823-1827 - Fuxiang Tao, Anna Esposito, Alessandro Vinciarelli:

Spotting the Traces of Depression in Read Speech: An Approach Based on Computational Paralinguistics and Social Signal Processing. 1828-1832 - Yelin Kim, Joshua Levy, Yang Liu:

Speech Sentiment and Customer Satisfaction Estimation in Socialbot Conversations. 1833-1837 - Haley Lepp, Gina-Anne Levow:

Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments. 1838-1842 - Jana Neitsch, Oliver Niebuhr:

Are Germans Better Haters Than Danes? Language-Specific Implicit Prosodies of Types of Hate Speech and How They Relate to Perceived Severity and Societal Rules. 1843-1847 - Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan:

An Objective Voice Gender Scoring System and Identification of the Salient Acoustic Measures. 1848-1852 - Sadari Jayawardena, Julien Epps, Zhaocheng Huang:

How Ordinal Are Your Data? 1853-1857
Acoustic Phonetics and Prosody
- Vincent Hughes, Frantz Clermont, Philip Harrison:

Correlating Cepstra with Formant Frequencies: Implications for Phonetically-Informed Forensic Voice Comparison. 1858-1862 - Jana Neitsch, Plínio A. Barbosa, Oliver Niebuhr:

Prosody and Breathing: A Comparison Between Rhetorical and Information-Seeking Questions in German and Brazilian Portuguese. 1863-1867 - Rebecca Defina, Catalina Torres, Hywel Stoakes:

Scaling Processes of Clause Chains in Pitjantjatjara. 1868-1872 - Ai Mizoguchi, Ayako Hashimoto, Sanae Matsui, Setsuko Imatomi, Ryunosuke Kobayashi, Mafuyu Kitahara:

Neutralization of Voicing Distinction of Stops in Tohoku Dialects of Japanese: Field Work and Acoustic Measurements. 1873-1877 - Lou Lee, Denis Jouvet, Katarina Bartkova, Yvon Keromnes, Mathilde Dargnat:

Correlation Between Prosody and Pragmatics: Case Study of Discourse Markers in French and English. 1878-1882 - Dina El Zarka, Anneliese Kelterer, Barbara Schuppler:

An Analysis of Prosodic Prominence Cues to Information Structure in Egyptian Arabic. 1883-1887 - Benazir Mumtaz, Tina Bögel, Miriam Butt:

Lexical Stress in Urdu. 1888-1892 - Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan-Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi:

Vocal Markers from Sustained Phonation in Huntington's Disease. 1893-1897 - Laure Dentel, Julien Meyer:

How Rhythm and Timbre Encode Mooré Language in Bendré Drummed Speech. 1898-1902
Keynote 3
- Lin-Shan Lee:

Doing Something we Never could with Spoken Language Technologies - from early days to the era of deep learning.
Tonal Aspects of Acoustic Phonetics and Prosody
- Wendy Lalhminghlui, Priyankoo Sarmah:

Interaction of Tone and Voicing in Mizo. 1903-1907 - Yaru Wu, Martine Adda-Decker, Lori Lamel:

Mandarin Lexical Tones: A Corpus-Based Study of Word Length, Syllable Position and Prosodic Position on Duration. 1908-1912 - Yingming Gao, Xinyu Zhang, Yi Xu, Jinsong Zhang, Peter Birkholz:

An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech. 1913-1917 - Wei Lai, Aini Li:

Integrating the Application and Realization of Mandarin 3rd Tone Sandhi in the Resolution of Sentence Ambiguity. 1918-1922 - Zhenrui Zhang, Fang Hu:

Neutral Tone in Changde Mandarin. 1923-1927 - Ping Cui, Jianjing Kuang:

Pitch Declination and Final Lowering in Northeastern Mandarin. 1928-1932 - Phil Rose:

Variation in Spectral Slope and Interharmonic Noise in Cantonese Tones. 1933-1937 - Ping Tang, Shanpeng Li:
The Acoustic Realization of Mandarin Tones in Fast Speech. 1938-1941
Speech Classification
- Anastassia Loukina, Keelan Evanini, Matthew Mulholland, Ian Blood, Klaus Zechner:

Do Face Masks Introduce Bias in Speech Technologies? The Case of Automated Scoring of Speaking Proficiency. 1942-1946 - Mohamed Mhiri, Samuel Myer, Vikrant Singh Tomar:

A Low Latency ASR-Free End to End Spoken Language Understanding System. 1947-1951 - Joe Wang, Rajath Kumar, Mike Rodehorst, Brian Kulis, Shiv Naga Prasad Vitaladevuni:

An Audio-Based Wakeword-Independent Verification System. 1952-1956 - Tyler Vuong, Yangyang Xia, Richard M. Stern:
Learnable Spectro-Temporal Receptive Fields for Robust Voice Type Discrimination. 1957-1961 - Shuo-Yiin Chang, Bo Li, David Rybach, Yanzhang He, Wei Li, Tara N. Sainath, Trevor Strohman:
Low Latency Speech Recognition Using End-to-End Prefetching. 1962-1966 - Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie:

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification. 1967-1971 - Rajath Kumar, Mike Rodehorst, Joe Wang, Jiacheng Gu, Brian Kulis:

Building a Robust Word-Level Wakeword Verification Network. 1972-1976 - Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito:

A Transformer-Based Audio Captioning Model with Keyword Estimation. 1977-1981 - Tong Mo, Yakun Yu, Mohammad Salameh, Di Niu, Shangling Jui:

Neural Architecture Search for Keyword Spotting. 1982-1986 - Ximin Li, Xiaodong Wei, Xiaowei Qin:

Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution. 1987-1991
Speech Synthesis Paradigms and Methods I
- Xin Wang, Junichi Yamagishi:
Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model. 1992-1996 - Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang:

Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization. 1997-2001 - Toru Nakashika:

Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra. 2002-2006 - Seungwoo Choi, Seungju Han, Dongyoung Kim, Sungjoo Ha:

Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding. 2007-2011 - Hyeong Rae Ihm, Joun Yeop Lee, Byoung Jin Choi, Sung Jun Cheon, Nam Soo Kim:

Reformer-TTS: Neural Speech Synthesis with Reformer Network. 2012-2016 - Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo:

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion. 2017-2021 - Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Aimilios Chalamandaris, Georgia Maniati, Panos Kakoulidis, Spyros Raptis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis:
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency. 2022-2026 - Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu:

DurIAN: Duration Informed Attention Network for Speech Synthesis. 2027-2031 - Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari:
Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian Processes. 2032-2036 - Mano Ranjith Kumar M., Sudhanshu Srivastava, Anusha Prakash, Hema A. Murthy:

A Hybrid HMM-Waveglow Based Text-to-Speech Synthesizer Using Histogram Equalization for Low Resource Indian Languages. 2037-2041
The INTERSPEECH 2020 Computational Paralinguistics ChallengE (ComParE)
- Björn W. Schuller, Anton Batliner, Christian Bergler, Eva-Maria Messner, Antonia F. de C. Hamilton, Shahin Amiriparian, Alice Baird, Georgios Rizos, Maximilian Schmitt, Lukas Stappen, Harald Baumeister, Alexis Deighton MacIntyre, Simone Hantke:
The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & Masks. 2042-2046 - Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto:
Learning Higher Representations from Pre-Trained Deep Models with Data Augmentation for the COMPARE 2020 Challenge Mask Task. 2047-2051 - Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien:
Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms. 2052-2056 - Philipp Klumpp, Tomás Arias-Vergara, Juan Camilo Vásquez-Correa, Paula Andrea Pérez-Toro, Florian Hönig, Elmar Nöth, Juan Rafael Orozco-Arroyave:
Surgical Mask Detection with Deep Recurrent Phonetic Models. 2057-2061 - Claude Montacié, Marie-José Caraty:
Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE Challenge. 2062-2066 - Mariana Julião, Alberto Abad, Helena Moniz:
Exploring Text and Audio Embeddings for Multi-Dimension Elderly Emotion Recognition. 2067-2071 - Maxim Markitantov, Denis Dresvyanskiy, Danila Mamontov, Heysem Kaya, Wolfgang Minker, Alexey Karpov:
Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges. 2072-2076 - John Mendonça, Francisco Teixeira, Isabel Trancoso, Alberto Abad:
Analyzing Breath Signals for the Interspeech 2020 ComParE Challenge. 2077-2081 - Alexis Deighton MacIntyre, Georgios Rizos, Anton Batliner, Alice Baird, Shahin Amiriparian, Antonia F. de C. Hamilton, Björn W. Schuller:
Deep Attentive End-to-End Continuous Breath Sensing from Speech. 2082-2086 - Jeno Szep, Salim Hariri:

Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion. 2087-2091 - Ziqing Yang, Zifan An, Zehao Fan, Chengye Jing, Houwei Cao:
Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge. 2092-2096 - Gizem Sogancioglu, Oxana Verkholyak, Heysem Kaya, Dmitrii Fedotov, Tobias Cadèe, Albert Ali Salah, Alexey Karpov:
Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition. 2097-2101 - Nicolae-Catalin Ristea, Radu Tudor Ionescu:

Are you Wearing a Mask? Improving Mask Detection from Speech Using Augmentation by Cycle-Consistent GANs. 2102-2106
Streaming ASR
- Kshitiz Kumar, Chaojun Liu, Yifan Gong, Jian Wu:

1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM. 2107-2111 - Chengyi Wang, Yu Wu, Liang Lu, Shujie Liu, Jinyu Li, Guoli Ye, Ming Zhou:
Low Latency End-to-End Streaming Speech Recognition with a Scout Network. 2112-2116 - Gakuto Kurata, George Saon:
Knowledge Distillation from Offline to Streaming RNN Transducer for End-to-End Speech Recognition. 2117-2121 - Wei Li, James Qin, Chung-Cheng Chiu, Ruoming Pang, Yanzhang He:

Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition. 2122-2126 - Pau Baquero-Arnal, Javier Jorge, Adrià Giménez, Joan Albert Silvestre-Cerdà, Javier Iranzo-Sánchez, Albert Sanchís, Jorge Civera, Alfons Juan:
Improved Hybrid Streaming ASR with Transformer Language Models. 2127-2131 - Chunyang Wu, Yongqiang Wang, Yangyang Shi, Ching-Feng Yeh, Frank Zhang:

Streaming Transformer-Based Acoustic Models Using Self-Attention with Augmented Memory. 2132-2136 - Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara:
Enhancing Monotonic Multihead Attention for Streaming ASR. 2137-2141 - Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie:

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition. 2142-2146 - Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel:

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition. 2147-2151 - Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li:
Transfer Learning Approaches for Streaming End-to-End Speech Recognition System. 2152-2156
Alzheimer’s Dementia Recognition Through Spontaneous Speech
- Matej Martinc, Senja Pollak:

Tackling the ADReSS Challenge: A Multimodal Approach to the Automated Recognition of Alzheimer's Dementia. 2157-2161 - Jiahong Yuan, Yuchen Bian, Xingyu Cai, Jiaji Huang, Zheng Ye, Kenneth Church:
Disfluencies and Fine-Tuning Pre-Trained Language Models for Detection of Alzheimer's Disease. 2162-2166 - Aparna Balagopalan, Benjamin Eyre, Frank Rudzicz, Jekaterina Novikova:

To BERT or not to BERT: Comparing Speech and Language-Based Approaches for Alzheimer's Disease Detection. 2167-2171 - Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney:
Alzheimer's Dementia Recognition Through Spontaneous Speech: The ADReSS Challenge. 2172-2176 - Raghavendra Pappagari, Jaejin Cho, Laureano Moro-Velázquez, Najim Dehak:
Using State of the Art Speaker Recognition and Natural Language Processing Technologies to Detect Alzheimer's Disease and Assess its Severity. 2177-2181 - Nicholas Cummins, Yilin Pan, Zhao Ren, Julian Fritsch, Venkata Srikanth Nallanthighal, Heidi Christensen, Daniel Blackburn, Björn W. Schuller, Mathew Magimai-Doss, Helmer Strik, Aki Härmä:
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition. 2182-2186 - Morteza Rohanian, Julian Hough, Matthew Purver:
Multi-Modal Fusion with Gating Using Audio, Lexical and Disfluency Features for Alzheimer's Dementia Recognition from Spontaneous Speech. 2187-2191 - Thomas Searle, Zina M. Ibrahim, Richard J. B. Dobson:
Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech. 2192-2196 - Erik Edwards, Charles Dognin, Bajibabu Bollepalli, Maneesh Kumar Singh:

Multiscale System for Alzheimer's Dementia Recognition Through Spontaneous Speech. 2197-2201 - Anna Pompili, Thomas Rolland, Alberto Abad:
The INESC-ID Multi-Modal System for the ADReSS 2020 Challenge. 2202-2206 - Shahla Farzana, Natalie Parde:
Exploring MMSE Score Prediction Using Verbal and Non-Verbal Cues. 2207-2211 - Utkarsh Sarawgi, Wazeer Zulfikar, Nouran Soliman, Pattie Maes:

Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its Severity. 2212-2216 - Junghyun Koo, Jie Hwan Lee, Jaewoo Pyo, Yujin Jo, Kyogu Lee:
Exploiting Multi-Modal Features from Pre-Trained Networks for Alzheimer's Dementia Recognition. 2217-2221 - Muhammad Shehram Shah Syed, Zafi Sherhan Syed, Margaret Lech, Elena Pirogova:
Automated Screening for Alzheimer's Dementia Through Spontaneous Speech. 2222-2226
Speaker Recognition Challenges and Applications
- Kong Aik Lee, Koji Okabe, Hitoshi Yamamoto, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Keisuke Ishikawa, Koichi Shinoda:
NEC-TT Speaker Verification System for SRE'19 CTS Challenge. 2227-2231 - Ruyun Li, Tianyu Liang, Dandan Song, Yi Liu, Yangcheng Wu, Can Xu, Peng Ouyang, Xianwei Zhang, Xianhong Chen, Weiqiang Zhang, Shouyi Yin, Liang He:
THUEE System for NIST SRE19 CTS Challenge. 2232-2236 - Grigory Antipov, Nicolas Gengembre, Olivier Le Blouch, Gaël Le Lan:
Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe Submission to NIST SRE Challenge 2019. 2237-2241 - Ruijie Tao, Rohan Kumar Das, Haizhou Li:

Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network. 2242-2246 - Suwon Shon, James R. Glass:

Multimodal Association for Speaker Verification. 2247-2251 - Zhengyang Chen, Shuai Wang, Yanmin Qian:

Multi-Modality Matters: A Performance Leap on VoxCeleb. 2252-2256 - Zhenyu Wang, Wei Xia, John H. L. Hansen:

Cross-Domain Adaptation with Discrepancy Minimization for Text-Independent Forensic Speaker Verification. 2257-2261 - Mufan Sang, Wei Xia, John H. L. Hansen:
Open-Set Short Utterance Forensic Speaker Verification Using Teacher-Student Network with Explicit Inductive Bias. 2262-2266 - Anurag Chowdhury, Austin Cozzo, Arun Ross:
JukeBox: A Multilingual Singer Recognition Dataset. 2267-2271 - Ruirui Li, Jyun-Yu Jiang, Xian Wu, Chu-Cheng Hsieh, Andreas Stolcke:

Speaker Identification for Household Scenarios with Self-Attention and Adversarial Training. 2272-2276
Applications of ASR
- Oleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirkó Visontai, Stella Laurenzo:

Streaming Keyword Spotting on Mobile Devices. 2277-2281 - Hongyi Liu, Apurva Abhyankar, Yuriy Mishchenko, Thibaud Sénéchal, Gengshen Fu, Brian Kulis, Noah D. Stein, Anish Shah, Shiv Naga Prasad Vitaladevuni:

Metadata-Aware End-to-End Keyword Spotting. 2282-2286 - Yehao Kong, Jiliang Zhang:

Adversarial Audio: A New Information Hiding Method. 2287-2291 - Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg:
S2IGAN: Speech-to-Image Generation via Adversarial Learning. 2292-2296 - Juan Zuluaga-Gomez, Petr Motlícek, Qingran Zhan, Karel Veselý, Rudolf A. Braun:

Automatic Speech Recognition Benchmark for Air-Traffic Communications. 2297-2301 - Prithvi R. R. Gudepu, Gowtham P. Vadisetti, Abhishek Niranjan, Kinnera Saranu, Raghava Sarma, M. Ali Basha Shaik, Periyasamy Paramasivam:

Whisper Augmented End-to-End/Hybrid Speech Recognition System - CycleGAN Approach. 2302-2306 - Ramit Sawhney, Arshiya Aggarwal, Piyush Khanna, Puneet Mathur, Taru Jain, Rajiv Ratn Shah:
Risk Forecasting from Earnings Calls Acoustics and Network Correlations. 2307-2311 - Huili Chen, Bita Darvish Rouhani, Farinaz Koushanfar:
SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems. 2312-2316 - Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg:
Evaluating Automatically Generated Phoneme Captions for Images. 2317-2321
Speech Emotion Recognition II
- Wei-Cheng Lin, Carlos Busso:
An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of Chunks. 2322-2326 - Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Björn W. Schuller:
Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-Corpus Setting for Speech Emotion Recognition. 2327-2331 - Takuya Fujioka, Takeshi Homma, Kenji Nagamatsu:
Meta-Learning for Speech Emotion Recognition Considering Ambiguity of Emotion Labels. 2332-2336 - Jiaxing Liu, Zhilei Liu, Longbiao Wang, Yuan Gao, Lili Guo, Jianwu Dang:
Temporal Attention Convolutional Network for Speech Emotion Recognition with Latent Representation. 2337-2341 - Zhi Zhu, Yoshinao Sato:

Reconciliation of Multiple Corpora for Speech Emotion Recognition by Multiple Classifiers with an Adversarial Corpus Discriminator. 2342-2346 - Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li:
Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks. 2347-2351 - Shuiyang Mao, P. C. Ching, Tan Lee:
EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification. 2352-2356 - Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, Tan Lee:
Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition. 2357-2361
Bi- and Multilinguality
- Rubén Pérez Ramón, María Luisa García Lecumberri, Martin Cooke:

The Effect of Language Proficiency on the Perception of Segmental Foreign Accent. 2362-2366 - Yi Liu, Jinghong Ning:
The Effect of Language Dominance on the Selective Attention of Segments and Tones in Urdu-Cantonese Speakers. 2367-2371 - Mengrou Li, Ying Chen, Jie Cui:

The Effect of Input on the Production of English Tense and Lax Vowels by Chinese Learners: Evidence from an Elementary School in China. 2372-2376 - Laura Spinu, Jiwon Hwang, Nadya Pincus, Mariana Vasilita:
Exploring the Use of an Artificial Accent of English to Assess Phonetic Learning in Monolingual and Bilingual Speakers. 2377-2381 - Shammur A. Chowdhury, Younes Samih, Mohamed Eldesouki, Ahmed Ali:
Effects of Dialectal Code-Switching on Speech Modules: A Study Using Egyptian Arabic Broadcast Speech. 2382-2386 - Khia A. Johnson, Molly Babel, Robert A. Fuhrman:
Bilingual Acoustic Voice Variation is Similarly Structured Across Languages. 2387-2391 - Haobo Zhang, Haihua Xu, Van Tung Pham, Hao Huang, Eng Siong Chng:

Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-Switching Speech Recognition. 2392-2396 - Dan Du, Xianjin Zhu, Zhu Li, Jinsong Zhang:
Perception and Production of Mandarin Initial Stops by Native Urdu Speakers. 2397-2401 - Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman:

Now You're Speaking My Language: Visual Language Identification. 2402-2406 - Nari Rhee, Jianjing Kuang:

The Different Enhancement Roles of Covarying Cues in Thai and Mandarin Tones. 2407-2411
Single-Channel Speech Enhancement I
- Hao Shi, Longbiao Wang, Sheng Li, Chenchen Ding, Meng Ge, Nan Li, Jianwu Dang, Hiroshi Seki:
Singing Voice Extraction with Attention-Based Spectrograms Fusion. 2412-2416 - Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao:

Incorporating Broad Phonetic Information for Speech Enhancement. 2417-2421 - Andong Li, Chengshi Zheng, Cunhang Fan, Renhua Peng, Xiaodong Li:

A Recursive Network with Dynamic Attention for Monaural Speech Enhancement. 2422-2426 - Hongjiang Yu, Wei-Ping Zhu, Yuhong Yang:
Constrained Ratio Mask for Speech Enhancement Using DNN. 2427-2431 - Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao:
SERIL: Noise Adaptive Speech Enhancement Using Regularization-Based Incremental Learning. 2432-2436 - Yoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii:
Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder. 2437-2441 - Ahmet Emin Bulut, Kazuhito Koishida:

Low-Latency Single Channel Speech Dereverberation Using U-Net Convolutional Neural Networks. 2442-2446 - Dung N. Tran, Kazuhito Koishida:

Single-Channel Speech Enhancement by Subspace Affinity Minimization. 2447-2451 - Haoyu Li, Junichi Yamagishi:

Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement. 2452-2456 - Feng Deng, Tao Jiang, Xiaorui Wang, Chen Zhang, Yan Li:

NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement. 2457-2461
Deep Noise Suppression Challenge
- Xiaofei Li, Radu Horaud:

Online Monaural Speech Enhancement Using Delayed Subband LSTM. 2462-2466 - Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt:
INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising. 2467-2471 - Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie:

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement. 2472-2476 - Nils L. Westhausen, Bernd T. Meyer:

Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression. 2477-2481 - Jean-Marc Valin, Umut Isik, Neerad Phansalkar, Ritwik Giri, Karim Helwani, Arvindh Krishnaswamy:

A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech. 2482-2486 - Umut Isik, Ritwik Giri, Neerad Phansalkar, Jean-Marc Valin, Karim Helwani, Arvindh Krishnaswamy:

PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss. 2487-2491 - Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke:

The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results. 2492-2496
Voice and Hearing Disorders
- Sara Akbarzadeh, Sungmin Lee, Chin-Tuan Tan:
The Implication of Sound Level on Spatial Selective Auditory Attention for Cochlear Implant Users: Behavioral and Electrophysiological Measurement. 2497-2501 - Yangyang Wan, Huali Zhou, Qinglin Meng, Nengheng Zheng:

Enhancing the Interaural Time Difference of Bilateral Cochlear Implants with the Temporal Limits Encoder. 2502-2506 - Toshio Irino, Soichi Higashiyama, Hanako Yoshigi:

Speech Clarity Improvement by Vocal Self-Training Using a Hearing Impairment Simulator and its Correlation with an Auditory Modulation Index. 2507-2511 - Zhuohuang Zhang, Donald S. Williamson, Yi Shen:
Investigation of Phase Distortion on Perceived Speech Quality for Hearing-Impaired Listeners. 2512-2516 - Zhuo Zhang, Gaoyan Zhang, Jianwu Dang, Shuang Wu, Di Zhou, Longbiao Wang:
EEG-Based Short-Time Auditory Attention Detection Using Multi-Task Deep Learning. 2517-2521 - Sondes Abderrazek, Corinne Fredouille, Alain Ghio, Muriel Lalain, Christine Meunier, Virginie Woisard:
Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders - Step 1: CNN Model-Based Phone Classification. 2522-2526 - Bahman Mirheidari, Daniel Blackburn, Ronan O'Malley, Annalena Venneri, Traci Walker, Markus Reuber, Heidi Christensen:
Improving Cognitive Impairment Classification by Generative Neural Network-Based Feature Augmentation. 2527-2531 - Meredith Moore, Piyush Papreja, Michael Saxon, Visar Berisha, Sethuraman Panchanathan:
UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech. 2532-2536 - Purva Barche, Krishna Gurugubelli, Anil Kumar Vuppala:
Towards Automatic Assessment of Voice Disorders: A Clinical Approach. 2537-2541 - Abhishek Shivkumar, Jack Weston, Raphael Lenain, Emil Fristed:

BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages. 2542-2546
Spoken Term Detection
- Menglong Xu, Xiao-Lei Zhang:

Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-Footprint Keyword Spotting. 2547-2551 - Théodore Bluche, Thibault Gisselbrecht:

Predicting Detection Filters for Small Footprint Open-Vocabulary Keyword Spotting. 2552-2556 - Emre Yilmaz, Özgür Bora Gevrek, Jibin Wu, Yuxiang Chen, Xuanbo Meng, Haizhou Li:

Deep Convolutional Spiking Neural Networks for Keyword Spotting. 2557-2561 - Haiwei Wu, Yan Jia, Yuanfei Nie, Ming Li:

Domain Aware Training for Far-Field Small-Footprint Keyword Spotting. 2562-2566 - Kun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song:

Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting. 2567-2571 - Peng Zhang, Xueliang Zhang:

Deep Template Matching for Small-Footprint and Configurable Keyword Spotting. 2572-2576 - Chen Yang, Xue Wen, Liming Song:

Multi-Scale Convolution for Robust Keyword Spotting. 2577-2581 - Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li:
An Investigation of Few-Shot Learning in Spoken Term Classification. 2582-2586 - Zeyu Zhao, Weiqiang Zhang:
End-to-End Keyword Search Based on Attention and Energy Scorer for Low Resource Languages. 2587-2591 - Takuya Higuchi, Mohammad Ghasemzadeh, Kisun You, Chandra Dhir:

Stacked 1D Convolutional Networks for End-to-End Small Footprint Voice Trigger Detection. 2592-2596
The Fearless Steps Challenge Phase-02
- Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach:
Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments. 2597-2601 - Xueshuai Zhang, Wenchao Wang, Pengyuan Zhang:

Speaker Diarization System Based on DPCA Algorithm for Fearless Steps Challenge Phase-2. 2602-2606 - Qingjian Lin, Tingle Li, Ming Li:

The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02. 2607-2611 - Arseniy Gorin, Daniil Kulko, Steven Grima, Alex Glasman:

"This is Houston. Say again, please". The Behavox System for the Apollo-11 Fearless Steps Challenge (Phase II). 2612-2616 - Aditya Joglekar, John H. L. Hansen, Meena Chandra Shekhar, Abhijeet Sangwan:

FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo Data. 2617-2621
Monaural Source Separation
- Yi Luo, Nima Mesgarani:

Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss. 2622-2626 - Jingjing Chen, Qirong Mao, Dong Liu:

On Synthesis for Supervised Monaural Speech Separation in Time Domain. 2627-2631 - Jun Wang:

Learning Better Speech Representations by Worsening Interference. 2632-2636 - Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent:
Asteroid: The PyTorch-Based Audio Source Separation Toolkit for Researchers. 2637-2641 - Jingjing Chen, Qirong Mao, Dong Liu:

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation. 2642-2646 - Chengyun Deng, Yi Zhang, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li:

Conv-TasSAN: Separative Adversarial Network Based on Conv-TasNet. 2647-2651 - Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach:
Multi-Path RNN for Hierarchical Modeling of Long Sequential Data and its Application to Speaker Stream Separation. 2652-2656 - Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Rushil Anirudh, Andreas Spanias:

Unsupervised Audio Source Separation Using Generative Priors. 2657-2661
Single-Channel Speech Enhancement II
- Yuanhang Qiu, Ruili Wang:

Adversarial Latent Representation Learning for Speech Enhancement. 2662-2666 - Yang Xiang, Liming Shi, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen:
An NMF-HMM Speech Enhancement Method Based on Kullback-Leibler Divergence. 2667-2671 - Lu Zhang, Mingjiang Wang:

Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech Enhancement. 2672-2676 - Quan Wang, Ignacio López-Moreno, Mert Saglam, Kevin W. Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein:

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition. 2677-2681 - Ziqiang Shi, Rujie Liu, Jiqing Han:

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss. 2682-2686 - Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li:
Sub-Band Knowledge Distillation Framework for Speech Enhancement. 2687-2691 - Sujan Kumar Roy, Aaron Nicolson, Kuldip K. Paliwal:
A Deep Learning-Based Kalman Filter for Speech Enhancement. 2692-2696 - Hongjiang Yu, Wei-Ping Zhu, Benoît Champagne:
Subband Kalman Filtering with DNN Estimated Parameters for Speech Enhancement. 2697-2701 - Xiaoqi Li, Yaxing Li, Yuanjie Dong, Shan Xu, Zhihui Zhang, Dan Wang, Shengwu Xiong:

Bidirectional LSTM Network with Ordered Neurons for Speech Enhancement. 2702-2706 - Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu:
Speaker-Conditional Chain Model for Speech Separation and Extraction. 2707-2711
Topics in ASR II
- Leanne Nortje, Herman Kamper:
Unsupervised vs. Transfer Learning for Multimodal One-Shot Matching of Speech and Images. 2712-2716 - Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung:

Multimodal Speech Emotion Recognition Using Cross Attention with Aligned Audio and Text. 2717-2721 - Tamás Gábor Csapó:

Speaker Dependent Articulatory-to-Acoustic Mapping Using Real-Time MRI of the Vocal Tract. 2722-2726 - Tamás Gábor Csapó, Csaba Zainkó, László Tóth, Gábor Gosztolya, Alexandra Markó:

Ultrasound-Based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis. 2727-2731 - Siyuan Feng, Odette Scharenborg:
Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling. 2732-2736 - Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara:
Generative Adversarial Training Data Adaptation for Very Low-Resource Automatic Speech Recognition. 2737-2741 - Kazuki Tsunematsu, Johanes Effendi, Sakriani Sakti, Satoshi Nakamura:

Neural Speech Completion. 2742-2746 - Benjamin Milde, Chris Biemann:

Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization. 2747-2751 - Katerina Papadimitriou, Gerasimos Potamianos:

Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning. 2752-2756 - Vineel Pratap, Qiantong Xu, Anuroop Sriram, Gabriel Synnaeve, Ronan Collobert:
MLS: A Large-Scale Multilingual Dataset for Speech Research. 2757-2761
Neural Signals for Spoken Communication
- Ivan Halim Parmonangan, Hiroki Tanaka, Sakriani Sakti, Satoshi Nakamura:
Combining Audio and Brain Activity for Predicting Speech Quality. 2762-2766 - Rini A. Sharon, Hema A. Murthy:

The "Sound of Silence" in EEG - Cognitive Voice Activity Detection. 2767-2771 - Siqi Cai, Enze Su, Yonghao Song, Longhan Xie, Haizhou Li:

Low Latency Auditory Attention Detection with Common Spatial Pattern Analysis of EEG Signals. 2772-2776 - Miguel Angrick, Christian Herff, Garett D. Johnson, Jerry J. Shih, Dean J. Krusienski, Tanja Schultz:
Speech Spectrogram Estimation from Intracranial Brain Activity Using a Quantization Approach. 2777-2781 - Debadatta Dash, Paul Ferrari, Angel W. Hernandez-Mulero, Daragh Heitzman, Sara G. Austin, Jun Wang:
Neural Speech Decoding for Amyotrophic Lateral Sclerosis. 2782-2786
Training Strategies for ASR
- Yang Chen, Weiran Wang, Chao Wang:

Semi-Supervised ASR by End-to-End Self-Training. 2787-2791 - Hitesh Tulsiani, Ashtosh Sapru, Harish Arsikere, Surabhi Punjabi, Sri Garimella:

Improved Training Strategies for End-to-End Speech Recognition in Digital Voice Assistants. 2792-2796 - Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:

Serialized Output Training for End-to-End Overlapped Speech Recognition. 2797-2801 - Felix Weninger, Franco Mana, Roberto Gemello, Jesús Andrés-Ferrer, Puming Zhan:

Semi-Supervised Learning with Data Augmentation for End-to-End ASR. 2802-2806 - Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas:

Efficient Minimum Word Error Rate Training of RNN-Transducer for End-to-End Speech Recognition. 2807-2811 - Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney:
A New Training Pipeline for an Improved Neural Transducer. 2812-2816 - Daniel S. Park, Yu Zhang, Ye Jia, Wei Han, Chung-Cheng Chiu, Bo Li, Yonghui Wu, Quoc V. Le:

Improved Noisy Student Training for Automatic Speech Recognition. 2817-2821 - Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi:

Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition. 2822-2826 - Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Hejung Yang, Abhinav Garg, Sachin Singh, Jiyeon Kim, Mehul Kumar, Sichen Jin, Shatrughan Singh, Chanwoo Kim:
Utterance Invariant Training for Hybrid Two-Pass End-to-End Speech Recognition. 2827-2831 - Gary Wang, Andrew Rosenberg, Zhehuai Chen, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno:

SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR. 2832-2836
Speech Transmission & Coding
- Sneha Das, Tom Bäckström, Guillaume Fuchs:
Fundamental Frequency Model for Postfiltering at Low Bitrates in a Transform-Domain Speech and Audio Codec. 2837-2841 - Arthur Van Den Broucke, Deepak Baby, Sarah Verhulst:

Hearing-Impaired Bio-Inspired Cochlear Models for Real-Time Auditory Applications. 2842-2846 - Jan Skoglund, Jean-Marc Valin:
Improving Opus Low Bit Rate Quality with Neural Speech Synthesis. 2847-2851 - Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin:
A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences. 2852-2856 - Piotr Masztalski, Mateusz Matuszewski, Karol Piaskowski, Michal Romaniuk:
StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation. 2857-2861 - Babak Naderi, Ross Cutler:

An Open Source Implementation of ITU-T Recommendation P.808 with Validation. 2862-2866 - Gabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner:

DNN No-Reference PSTN Speech Quality Prediction. 2867-2871 - Sebastian Möller, Tobias Hübschen, Thilo Michael, Gabriel Mittag, Gerhard Schmidt:
Non-Intrusive Diagnostic Monitoring of Fullband Speech Quality. 2872-2876
Bioacoustics and Articulation
- Abdolreza Sabzi Shahrebabaki, Negar Olfati, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen:
Transfer Learning of Articulatory Information Through Phone Information. 2877-2881 - Abdolreza Sabzi Shahrebabaki, Sabato Marco Siniscalchi, Giampiero Salvi, Torbjørn Svendsen:
Sequence-to-Sequence Articulatory Inversion Through Time Convolution of Sub-Band Frequency Signals. 2882-2886 - Bernardo B. Gatto, Eulanda Miranda dos Santos, Juan Gabriel Colonna, Naoya Sogi, Lincon Sales de Souza, Kazuhiro Fukui:
Discriminative Singular Spectrum Analysis for Bioacoustic Classification. 2887-2891 - Renuka Mannem, Hima Jyothi R., Aravind Illa, Prasanta Kumar Ghosh:

Speech Rate Task-Specific Representation Learning from Acoustic-Articulatory Data. 2892-2896 - Abner Hernandez, Eun Jung Yeo, Sunhee Kim, Minhwa Chung:

Dysarthria Detection and Severity Assessment Using Rhythm-Based Metrics. 2897-2901 - Yi Ma, Xinzi Xu, Yongfu Li:

LungRN+NL: An Improved Adventitious Lung Sound Classification Using Non-Local Block ResNet Neural Network with Mixup Data Augmentation. 2902-2906 - Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh:

Attention and Encoder-Decoder Based Models for Transforming Articulatory Movements at Different Speaking Rates. 2907-2911 - Zijiang Yang, Shuo Liu, Meishu Song, Emilia Parada-Cabaleiro, Björn W. Schuller:

Adventitious Respiratory Classification Using Attentive Residual Neural Networks. 2912-2916 - Raphael Lenain, Jack Weston, Abhishek Shivkumar, Emil Fristed:

Surfboard: Audio Feature Extraction for Modern Machine Learning. 2917-2921 - Abinay Reddy Naini, Malla Satyapriya, Prasanta Kumar Ghosh:

Whisper Activity Detection Using CNN-LSTM Based Attention Pooling Network Trained for a Speaker Identification Task. 2922-2926
Speech Synthesis: Multilingual and Cross-Lingual Approaches
- Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma:

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion. 2927-2931 - Zhaoyu Liu, Brian Mak:
Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment. 2932-2936 - Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang:

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis. 2937-2941 - Marlene Staib, Tian Huey Teh, Alexandra Torresquintero, Devang S. Ram Mohan, Lorenzo Foglianti, Raphael Lenain, Jiameng Gao:

Phonological Features for 0-Shot Multilingual Speech Synthesis. 2942-2946 - Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space. 2947-2951 - Ruolan Liu, Xue Wen, Chunhui Lu, Xiao Chen:

Tone Learning in Low-Resource Bilingual TTS. 2952-2956 - Shubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Kumar Mehta:

On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model. 2957-2961 - Anusha Prakash, Hema A. Murthy:

Generic Indic Text-to-Speech Synthesisers with Rapid Adaptation in an End-to-End Framework. 2962-2966 - Marcel de Korte, Jaebok Kim, Esther Klabbers:

Efficient Neural Speech Synthesis for Low-Resource Languages Through Multilingual Modeling. 2967-2971 - Tomás Nekvinda, Ondrej Dusek:

One Model, Many Languages: Meta-Learning for Multilingual Text-to-Speech. 2972-2976
Learning Techniques for Speaker Recognition I
- Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee-Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han:

In Defence of Metric Learning for Speaker Recognition. 2977-2981 - Seong Min Kye, Youngmoon Jung, Haebeom Lee, Sung Ju Hwang, Hoirin Kim:

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs. 2982-2986 - Kai Li, Masato Akagi, Yibo Wu, Jianwu Dang:

Segment-Level Effects of Gender, Nationality and Emotion Information on Text-Independent Speaker Verification. 2987-2991 - Yanpei Shi, Qiang Huang, Thomas Hain:
Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification. 2992-2996 - Ana Montalvo, José R. Calvo, Jean-François Bonastre:
Multi-Task Learning for Voice Related Recognition Tasks. 2997-3001 - Umair Khan, Javier Hernando:

Unsupervised Training of Siamese Networks for Speaker Verification. 3002-3006 - Ying Liu, Yan Song, Yiheng Jiang, Ian McLoughlin, Lin Liu, Li-Rong Dai:
An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions. 3007-3011 - Naijun Zheng, Xixin Wu, Jinghua Zhong, Xunying Liu, Helen Meng:
Speaker-Aware Linear Discriminant Analysis in Speaker Verification. 3012-3016 - Zhengyang Chen, Shuai Wang, Yanmin Qian:

Adversarial Domain Adaptation for Speaker Verification Using Partially Shared Network. 3017-3021
Pronunciation
- Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang:

Automatic Scoring at Multi-Granularity for L2 Pronunciation. 3022-3026 - Tien-Hong Lo, Shi-Yan Weng, Hsiu-Jui Chang, Berlin Chen:

An Effective End-to-End Modeling Approach for Mispronunciation Detection. 3027-3031 - Bi-Cheng Yan, Meng-Che Wu, Hsiao-Tsung Hung, Berlin Chen:

An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling. 3032-3036 - Richeng Duan, Nancy F. Chen:
Unsupervised Feature Adaptation Using Adversarial Multi-Task Training for Automatic Evaluation of Children's Speech. 3037-3041 - Longfei Yang, Kaiqi Fu, Jinsong Zhang, Takahiro Shinozaki:

Pronunciation Erroneous Tendency Detection with Language Adversarial Represent Learning. 3042-3046 - Sitong Cheng, Zhixin Liu, Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng:
ASR-Free Pronunciation Assessment. 3047-3051 - Konstantinos Kyriakopoulos, Kate M. Knill, Mark J. F. Gales:

Automatic Detection of Accent and Lexical Pronunciation Errors in Spontaneous Non-Native English Speech. 3052-3056 - Jiatong Shi, Nan Huo, Qin Jin:

Context-Aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training. 3057-3061 - Wei Chu, Yang Liu, Jianwei Zhou:

Recognize Mispronunciations to Improve Non-Native Acoustic Modeling Through a Phone Decoder Built from One Edit Distance Finite State Automaton. 3062-3066
Diarization
- Pablo Gimeno, Victoria Mingote, Alfonso Ortega Giménez, Antonio Miguel, Eduardo Lleida:
Partial AUC Optimisation Using Recurrent Neural Networks for Music Detection with Limited Training Data. 3067-3071 - Marvin Lavechin, Ruben Bousbib, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristià:

An Open-Source Voice Type Classifier for Child-Centered Daylong Recordings. 3072-3076 - Chao Peng, Xihong Wu, Tianshu Qu:

Competing Speaker Count Estimation on the Fusion of the Spectral and Spatial Embedding Space. 3077-3081 - Shoufeng Lin, Xinyuan Qian:

Audio-Visual Multi-Speaker Tracking Based on the GLMB Framework. 3082-3086 - Shuo Liu, Andreas Triantafyllopoulos, Zhao Ren, Björn W. Schuller:

Towards Speech Robustness for Acoustic Scene Classification. 3087-3091 - Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari:

Identify Speakers in Cocktail Parties with End-to-End Attention. 3092-3096 - Thilo von Neumann, Christoph Böddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach:
Multi-Talker ASR for an Unknown Number of Sources: Joint Training of Source Counting, Separation and ASR. 3097-3101 - Shreya G. Upadhyay, Bo-Hao Su, Chi-Chun Lee:
Attentive Convolutional Recurrent Neural Network Using Phoneme-Level Acoustic Representation for Rare Sound Event Detection. 3102-3106 - Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent:
Detecting and Counting Overlapping Speakers in Distant Speech Scenarios. 3107-3111 - Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux:

All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection. 3112-3116
Computational Paralinguistics II
- Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz:
Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals. 3117-3121 - Shun-Chang Zhong, Bo-Hao Su, Wei Huang, Yi-Ching Liu, Chi-Chun Lee:
Predicting Collaborative Task Performance Using Graph Interlocutor Acoustic Network in Small Group Interaction. 3122-3126 - Gábor Gosztolya:

Very Short-Term Conflict Intensity Estimation Using Fisher Vectors. 3127-3131 - Hiroki Mori, Yuki Kikuchi:
Gaming Corpus for Studying Social Screams. 3132-3135 - Amber Afshan, Jody Kreiman, Abeer Alwan:

Speaker Discrimination in Humans and Machines: Effects of Speaking Style Variability. 3136-3140 - Kamini Sabu, Preeti Rao:
Automatic Prediction of Confidence Level from Children's Oral Reading Recordings. 3141-3145 - Wei Xue, Viviana Mendoza Ramos, Wieke Harmsen, Catia Cucchiarini, R. W. N. M. van Hout, Helmer Strik:
Towards a Comprehensive Assessment of Speech Intelligibility for Pathological Speech. 3146-3150 - Yi Lin, Hongwei Ding:

Effects of Communication Channels and Actor's Gender on Emotion Identification by Native Mandarin Speakers. 3151-3155 - Ivo Anjos, Maxine Eskénazi, Nuno Marques, Margarida Grilo, Isabel Guimarães, João Magalhães, Sofia Cavaco:
Detection of Voicing and Place of Articulation of Fricatives with Deep Learning in a Virtual Speech and Language Therapy Tutor. 3156-3160
Speech Synthesis Paradigms and Methods II
- Haitong Zhang, Yue Lin:

Unsupervised Learning for Sequence-to-Sequence Text-to-Speech for Low-Resource Languages. 3161-3165 - Kasperi Palkama, Lauri Juvela, Alexander Ilin:
Conditional Spoken Digit Generation with StyleGAN. 3166-3170 - Jingzhou Yang, Lei He:

Towards Universal Text-to-Speech. 3171-3175 - Kouichi Katsurada, Korin Richmond:
Speaker-Independent Mel-Cepstrum Estimation from Articulator Movements Using D-Vector Input. 3176-3180 - Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao, Helen Meng:

Enhancing Monotonicity for Robust Autoregressive Transformer TTS. 3181-3185 - Devang S. Ram Mohan, Raphael Lenain, Lorenzo Foglianti, Tian Huey Teh, Marlene Staib, Alexandra Torresquintero, Jiameng Gao:

Incremental Text to Speech for Neural Sequence-to-Sequence Models Using Reinforcement Learning. 3186-3190 - Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-yi Lee:

Semi-Supervised Learning for Multi-Speaker Text-to-Speech Synthesis Using Discrete Speech Representation. 3191-3195 - Pramit Saha, Sidney S. Fels:
Learning Joint Articulatory-Acoustic Representations with Normalizing Flows. 3196-3200 - Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari:
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis. 3201-3205 - Jacob J. Webber, Olivier Perrotin, Simon King:
Hider-Finder-Combiner: An Adversarial Architecture for General Speech Signal Modification. 3206-3210
Speaker Embedding
- Wei-Wei Lin, Man-Wai Mak:
Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms. 3211-3215 - Minh Pham, Zeqian Li, Jacob Whitehill:

How Does Label Noise Affect the Quality of Speaker Embeddings? 3216-3220 - Xuechen Liu, Md. Sahidullah, Tomi Kinnunen:

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings. 3221-3225 - Wei Xia, John H. L. Hansen:

Speaker Representation Learning Using Global Context Guided Channel and Time-Frequency Transformations. 3226-3230 - Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang:

Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework. 3231-3235 - Munir Georges, Jonathan Huang, Tobias Bocklet:
Compact Speaker Embedding: lrx-Vector. 3236-3240 - Florian L. Kreyssig, Philip C. Woodland:

Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings. 3241-3245 - Junyi Peng, Rongzhi Gu, Yuexian Zou:

Deep Speaker Embedding with Long Short Term Centroid Learning for Text-Independent Speaker Verification. 3246-3250 - Lantian Li, Dong Wang, Thomas Fang Zheng:
Neural Discriminant Analysis for Deep Speaker Embedding. 3251-3255 - Jaejin Cho, Piotr Zelasko, Jesús Villalba, Shinji Watanabe, Najim Dehak:
Learning Speaker Embedding from Text-to-Speech. 3256-3260
Single-Channel Speech Enhancement III
- Yan Zhao, DeLiang Wang:

Noisy-Reverberant Speech Enhancement Using DenseUNet with Time-Frequency Attention. 3261-3265 - Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li:
On Loss Functions and Recurrency Training for GAN-Based Speech Enhancement Systems. 3266-3270 - Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang:

Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement. 3271-3275 - Mikolaj Kegler, Pierre Beckmann, Milos Cernak:
Deep Speech Inpainting of Time-Frequency Masks. 3276-3280 - Nikhil Shankar, Gautam Shreedhar Bhat, Issa M. S. Panahi:

Real-Time Single-Channel Deep Neural Network-Based Speech Enhancement on Edge Devices. 3281-3285 - Ju Lin, Sufeng Niu, Adriaan J. de Lind van Wijngaarden, Jerome L. McClendon, Melissa C. Smith, Kuang-Ching Wang:
Improved Speech Enhancement Using a Time-Domain GAN with Mask Learning. 3286-3290 - Alexandre Défossez, Gabriel Synnaeve, Yossi Adi:

Real Time Speech Enhancement in the Waveform Domain. 3291-3295 - Michal Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski:
Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks. 3296-3300
Multi-Channel Audio and Emotion Recognition
- Yuya Chiba, Takashi Nose, Akinori Ito:
Multi-Stream Attention-Based BLSTM with Feature Segmentation for Speech Emotion Recognition. 3301-3305 - Guanjun Li, Shan Liang, Shuai Nie, Wenju Liu, Zhanlei Yang, Longshuai Xiao:

Microphone Array Post-Filter for Target Speech Enhancement Without a Prior Information of Point Interferers. 3306-3310 - Atsuo Hiroe:
Similarity-and-Independence-Aware Beamformer: Method for Target Source Extraction Using Magnitude Spectrogram as Reference. 3311-3315 - Oleg Golokolenko, Gerald Schuller:

The Method of Random Directions Optimization for Stereo Audio Source Separation. 3316-3320 - Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen:

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations. 3321-3325 - Robin Scheibler:

Generalized Minimal Distortion Principle for Blind Source Separation. 3326-3330 - Ying Zhong, Ying Hu, Hao Huang, Wushour Silamu:

A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition. 3331-3335 - Ruichu Cai, Kaibin Guo, Boyan Xu, Xiaoyan Yang, Zhenjie Zhang:

Meta Multi-Task Learning for Speech Emotion Recognition. 3336-3340 - François Grondin, Jean-Samuel Lauzon, Jonathan Vincent, François Michaud:

GEV Beamforming Supported by DOA-Based Masks Generated on Pairs of Microphones. 3341-3345
Computational Resource Constrained Speech Recognition
- Christin Jose, Yuriy Mishchenko, Thibaud Sénéchal, Anish Shah, Alex Escott, Shiv Naga Prasad Vitaladevuni:
Accurate Detection of Wake Word Start and End Using a CNN. 3346-3350 - Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir:

Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering. 3351-3355 - Somshubra Majumdar, Boris Ginsburg:

MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition. 3356-3360 - Abhinav Mehrotra, Lukasz Dudziak, Jinsu Yeo, Young-Yoon Lee, Ravichander Vipperla, Mohamed S. Abdelfattah, Sourav Bhattacharya, Samin Ishtiaq, Alberto Gil C. P. Ramos, SangJeong Lee, Daehyun Kim, Nicholas D. Lane:

Iterative Compression of End-to-End ASR Model Using AutoML. 3361-3365 - Hieu Duy Nguyen, Anastasios Alexandridis, Athanasios Mouchtaris:

Quantization Aware Training with Absolute-Cosine Regularization for Automatic Speech Recognition. 3366-3370 - Abhinav Garg, Gowtham P. Vadisetti, Dhananjaya Gowda, Sichen Jin, Aditya Jayasimha, Youngho Han, Jiyeon Kim, Junmo Park, Kwangyoun Kim, Sooyeon Kim, Young-Yoon Lee, Kyungbo Min, Chanwoo Kim:
Streaming On-Device End-to-End ASR System for Privacy-Sensitive Voice-Typing. 3371-3375 - Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert:

Scaling Up Online Speech Recognition Using ConvNets. 3376-3380 - Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang:

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition. 3381-3385 - Grant P. Strimel, Ariya Rastrow, Gautam Tiwari, Adrien Piérard, Jon Webb:

Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for n-Gram Language Models. 3386-3390
Speech Synthesis: Prosody and Emotion
- Ravi Shankar, Hsi-Wei Hsieh, Nicolas Charon, Archana Venkataraman:

Multi-Speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network. 3391-3395 - Ravi Shankar, Jacob Sager, Archana Venkataraman:

Non-Parallel Emotion Conversion Using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator. 3396-3400 - Noé Tits, Kevin El Haddad, Thierry Dutoit:
Laughter Synthesis: Combining Seq2seq Modeling with Transfer Learning. 3401-3405 - Yuexin Cao, Zhengchen Liu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao:
Nonparallel Emotional Speech Conversion Using VAE-GAN. 3406-3410 - Alexander Sorin, Slava Shechtman, Ron Hoory:

Principal Style Components: Expressive Style Control and Cross-Speaker Transfer in Neural TTS. 3411-3415 - Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li:

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion. 3416-3420 - Kento Matsumoto, Sunao Hara, Masanobu Abe:

Controlling the Strength of Emotions in Speech-Like Emotional Sound Generated by WaveNet. 3421-3425 - Guangyan Zhang, Ying Qin, Tan Lee:
Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation. 3426-3430 - Takuya Kishida, Shin Tsukamoto, Toru Nakashika:

Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM. 3431-3435 - Fengyu Yang, Shan Yang, Qinghua Wu, Yujun Wang, Lei Xie:

Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis. 3436-3440 - Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:

Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis. 3441-3445 - Sefik Emre Eskimez, Dimitrios Dimitriadis, Robert Gmyr, Kenichi Kumanati:
GAN-Based Data Generation for Speech Emotion Recognition. 3446-3450 - Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh:

The Phonetic Bases of Vocal Expressed Emotion: Natural versus Acted. 3451-3455
The Interspeech 2020 Far Field Speaker Verification Challenge
- Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li:

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge. 3456-3460 - Peng Zhang, Peng Hu, Xueliang Zhang:

Deep Embedding Learning for Text-Dependent Speaker Verification. 3461-3465 - Aleksei Gusev, Vladimir Volokhov, Alisa Vinogradova, Tseren Andzhukaev, Andrey Shulipa, Sergey Novoselov, Timur Pekhovsky, Alexander Kozlov:

STC-Innovation Speaker Recognition Systems for Far-Field Speaker Verification Challenge 2020. 3466-3470 - Li Zhang, Jian Wu, Lei Xie:

NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge. 3471-3475 - Ying Tong, Wei Xue, Shanluo Huang, Fan Lu, Chao Zhang, Guohong Ding, Xiaodong He:
The JD AI Speaker Verification System for the FFSVC 2020 Challenge. 3476-3480
Multimodal Speech Processing
- Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang:

FaceFilter: Audio-Visual Speech Separation Using Still Images. 3481-3485 - Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung:

Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision. 3486-3490 - Michael Wand, Jürgen Schmidhuber:
Fusion Architectures for Word-Based Audiovisual Speech Recognition. 3491-3495 - Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng:

Audio-Visual Multi-Channel Recognition of Overlapped Speech. 3496-3500 - Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li:

TMT: A Transformer-Based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-Aware Dialog. 3501-3505 - George Sterpu, Christian Saam, Naomi Harte:
Should we Hard-Code the Recurrence Concept or Learn it Instead ? Exploring the Transformer Architecture for Audio-Visual Speech Recognition. 3506-3509 - Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson da Silva Morais:

Resource-Adaptive Deep Learning for Visual Speech Recognition. 3510-3514 - Masood S. Mortazavi:

Speech-Image Semantic Alignment Does Not Depend on Any Prior Classification Tasks. 3515-3519 - Hong Liu, Zhan Chen, Bing Yang:

Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion. 3520-3524 - Vighnesh Reddy Konda, Mayur Warialani, Rakesh Prasanth Achari, Varad Bhatnagar, Jayaprakash Akula, Preethi Jyothi, Ganesh Ramakrishnan, Gholamreza Haffari, Pankaj Singh:
Caption Alignment for Low Resource Audio-Visual Data. 3525-3529
Keynote 4
- Shehzad Mevawalla:

Successes, Challenges and Opportunities for Speech Technology in Conversational Agents.
Speech Synthesis: Neural Waveform Generation II
- Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen:
Vocoder-Based Speech Synthesis from Silent Videos. 3530-3534 - Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-Autoregressive Pitch-Dependent Dilated Convolution Model for Parametric Speech Generation. 3535-3539 - Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda:
A Cyclical Post-Filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-Speech Systems. 3540-3544 - Hyun-Wook Yoon, Sang-Hoon Lee, Hyeong-Rae Noh, Seong-Whan Lee:
Audio Dequantization for High Fidelity Audio Generation in Flow-Based Neural Vocoder. 3545-3549 - Manish Sharma, Tom Kenter, Rob Clark:

StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes. 3550-3554 - Yang Cui, Xi Wang, Lei He, Frank K. Soong:

An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis. 3555-3559 - Yang Ai, Xin Wang, Junichi Yamagishi, Zhen-Hua Ling:
Reverberation Modeling for Source-Filter-Based Neural Vocoder. 3560-3564 - Ravichander Vipperla, Sangjun Park, Kihyun Choo, Samin Ishtiaq, Kyoungbo Min, Sourav Bhattacharya, Abhinav Mehrotra, Alberto Gil C. P. Ramos, Nicholas D. Lane:

Bunched LPCNet: Vocoder for Low-Cost Neural Text-To-Speech Systems. 3565-3569 - Eunwoo Song, Min-Jae Hwang, Ryuichi Yamamoto, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim:
Neural Text-to-Speech with a Modeling-by-Generation Excitation Vocoder. 3570-3574 - Jan Vainer, Ondrej Dusek:

SpeedySpeech: Efficient Neural Speech Synthesis. 3575-3579
ASR Neural Network Architectures and Training II
- Zi-qiang Zhang, Yan Song, Jianshu Zhang, Ian McLoughlin, Li-Rong Dai:
Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution. 3580-3584 - Ashtosh Sapru, Sri Garimella:

Leveraging Unlabeled Speech for Sequence Discriminative Training of Acoustic Models. 3585-3589 - Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong:
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability. 3590-3594 - Xuankai Chang, Aswin Shanmugam Subramanian, Pengcheng Guo, Shinji Watanabe, Yuya Fujita, Motoi Omachi:
End-to-End ASR with Adaptive Span Self-Attention. 3595-3599 - Egor Lakomkin, Jahn Heymann, Ilya Sklyar, Simon Wiesler:

Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition. 3600-3604 - Wilfried Michel, Ralf Schlüter, Hermann Ney:
Early Stage LM Integration Using Local and Global Log-Linear Combination. 3605-3609 - Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu:

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context. 3610-3614 - Tara N. Sainath, Ruoming Pang, David Rybach, Basi García, Trevor Strohman:
Emitting Word Timings with End-to-End Models. 3615-3619 - Danni Liu, Gerasimos Spanakis, Jan Niehues:
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection. 3620-3624
Neural Networks for Language Modeling
- Ke Li, Daniel Povey, Sanjeev Khudanpur:

Neural Language Modeling with Implicit Cache Pointers. 3625-3629 - Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo:
Finnish ASR with Deep Transformer Models. 3630-3634 - Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara:
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR. 3635-3639 - Jen-Tzung Chien, Yu-Min Huang:
Stochastic Convolutional Recurrent Networks for Language Modeling. 3640-3644 - Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney:
Investigation of Large-Margin Softmax in Neural Language Modeling. 3645-3649 - Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig:

Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model. 3650-3654 - Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi:
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict. 3655-3659 - Yuya Fujita, Shinji Watanabe, Motoi Omachi, Xuankai Chang:
Insertion-Based Modeling for End-to-End Automatic Speech Recognition. 3660-3664
Phonetic Event Detection and Segmentation
- Yefei Chen, Heinrich Dinkel, Mengyue Wu, Kai Yu:

Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection. 3665-3669 - Joohyung Lee, Youngmoon Jung, Hoirin Kim:

Dual Attention in Time and Frequency Domain for Voice Activity Detection. 3670-3674 - Tianjiao Xu, Hui Zhang, Xueliang Zhang:

Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection. 3675-3679 - Avinash Kumar, S. Shahnawazuddin, Waquar Ahmad:
A Noise Robust Technique for Detecting Vowels in Speech Signals. 3680-3684 - Marvin Lavechin, Marie-Philippe Gill, Ruben Bousbib, Hervé Bredin, Leibny Paola García-Perera:

End-to-End Domain-Adversarial Voice Activity Detection. 3685-3689 - Ayush Agarwal, Jagabandhu Mishra, S. R. Mahadeva Prasanna:
VOP Detection in Variable Speech Rate Condition. 3690-3694 - Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao:
MLNET: An Adaptive Multiple Receptive-Field Attention Neural Network for Voice Activity Detection. 3695-3699 - Felix Kreuk, Joseph Keshet, Yossi Adi:
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation. 3700-3704 - Piotr Zelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak:
That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages. 3705-3709 - S. Limonard, Catia Cucchiarini, R. W. N. M. van Hout, Helmer Strik:
Analyzing Read Aloud Speech by Primary School Pupils: Insights for Research and Development. 3710-3714
Human Speech Production II
- Heikki Rasilo, Yannick Jadoul:
Discovering Articulatory Speech Targets from Synthesized Random Babble. 3715-3719 - Tamás Gábor Csapó:

Speaker Dependent Acoustic-to-Articulatory Inversion Using Real-Time MRI of the Vocal Tract. 3720-3724 - Narjes Bozorg, Michael T. Johnson:

Acoustic-to-Articulatory Inversion with Deep Autoregressive Articulatory-WaveNet. 3725-3729 - Ioannis K. Douros, Ajinkya Kulkarni, Chrysanthi Dourou, Yu Xie, Jacques Felblinger, Karyna Isaieva, Pierre-André Vuissoz, Yves Laprie:
Using Silence MR Image to Synthesise Dynamic MRI Vocal Tract Data of CV. 3730-3734 - Tamás Gábor Csapó, Kele Xu:
Quantification of Transducer Misalignment in Ultrasound Tongue Imaging. 3735-3739 - Maud Parrot, Juliette Millet, Ewan Dunbar:

Independent and Automatic Evaluation of Speaker-Independent Acoustic-to-Articulatory Reconstruction. 3740-3744 - Lorenz Diener, Mehrdad Roustay Vishkasougheh, Tanja Schultz:
CSL-EMG_Array: An Open Access Corpus for EMG-to-Speech Conversion. 3745-3749 - Joshua Penney, Felicity Cox, Anita Szakay:
Links Between Production and Perception of Glottalisation in Individual Australian English Speaker/Listeners. 3750-3754
New Trends in Self-Supervised Speech Processing
- Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, Suranga Nanayakkara:
Jointly Fine-Tuning "BERT-Like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition. 3755-3759 - Yu-An Chung, Hao Tang, James R. Glass:

Vector-Quantized Autoregressive Predictive Coding. 3760-3764 - Xingchen Song, Guangsen Wang, Yiheng Huang, Zhiyong Wu, Dan Su, Helen Meng:

Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks. 3765-3769 - Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross B. Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed:

Large Scale Weakly and Semi-Supervised Learning for Low-Resource Video ASR. 3770-3774 - Ken'ichi Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, Robert Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng:
Sequence-Level Self-Learning with Multiple Hypotheses. 3775-3779 - Haibin Wu, Andy T. Liu, Hung-yi Lee:

Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning. 3780-3784 - Shu-Wen Yang, Andy T. Liu, Hung-yi Lee:

Understanding Self-Attention of Self-Supervised Audio Transformers. 3785-3789 - Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James R. Glass:
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning. 3790-3794 - Ayimunishagu Abulimiti, Jochen Weiner, Tanja Schultz:
Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings Covering Aging and Cognitive Decline. 3795-3799
Learning Techniques for Speaker Recognition II
- Dao Zhou, Longbiao Wang, Kong Aik Lee, Yibo Wu, Meng Liu, Jianwu Dang, Jianguo Wei:
Dynamic Margin Softmax Loss for Speaker Verification. 3800-3804 - Magdalena Rybicka, Konrad Kowalczyk:
On Parameter Adaptation in Softmax-Based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-Based Speaker Recognition. 3805-3809 - Victoria Mingote, Antonio Miguel, Alfonso Ortega Giménez, Eduardo Lleida:
Training Speaker Enrollment Models by Network Optimization. 3810-3814 - Seyyed Saeed Sarfjoo, Srikanth R. Madikeri, Petr Motlícek, Sébastien Marcel:

Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data. 3815-3819 - Yuheng Wei, Junzhao Du, Hui Liu:

Angular Margin Centroid Loss for Text-Independent Speaker Recognition. 3820-3824 - Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng:
Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning. 3825-3829 - Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck:
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification. 3830-3834 - Wenda Chen, Jonathan Huang, Tobias Bocklet:
Length- and Noise-Aware Training Techniques for Short-Utterance Speaker Recognition. 3835-3839
Spoken Language Evaluation
- Yiting Lu, Mark J. F. Gales, Yu Wang:

Spoken Language 'Grammatical Error Correction'. 3840-3844 - Sara Papi, Edmondo Trentin, Roberto Gretter, Marco Matassoni, Daniele Falavigna:
Mixtures of Deep Neural Experts for Automated Speech Scoring. 3845-3849 - Xinhao Wang, Klaus Zechner, Christopher Hamill:

Targeted Content Feedback in Spoken Language Learning and Assessment. 3850-3854 - Vyas Raina, Mark J. F. Gales, Kate M. Knill:

Universal Adversarial Attacks on Spoken Language Assessment Systems. 3855-3859 - Xixin Wu, Kate M. Knill, Mark J. F. Gales, Andrey Malinin:

Ensemble Approaches for Uncertainty in Spoken Language Assessment. 3860-3864 - Zhenchao Lin, Ryo Takashima, Daisuke Saito, Nobuaki Minematsu, Noriko Nakanishi:

Shadowability Annotation with Fine Granularity on L2 Utterances and its Improvement with Native Listeners' Script-Shadowing. 3865-3869 - Yu Bai, Ferdy Hubers, Catia Cucchiarini, Helmer Strik:
ASR-Based Evaluation and Feedback for Individualized Reading Practice. 3870-3874 - Dominika Woszczyk, Stavros Petridis, David E. Millard:

Domain Adversarial Neural Networks for Dysarthric Speech Recognition. 3875-3879 - Shunsuke Hidaka, Yogaku Lee, Kohei Wakamiya, Takashi Nakagawa, Tokihiko Kaburagi:

Automatic Estimation of Pathological Voice Quality Based on Recurrent Neural Network Using Amplitude and Phase Spectrogram. 3880-3884
Spoken Dialogue System
- Jen-Tzung Chien, Po-Chien Hsu:
Stochastic Curiosity Exploration for Dialogue Systems. 3885-3889 - Myeongho Jeong, Seungtaek Choi, Hojae Han, Kyungho Kim, Seung-won Hwang:
Conditional Response Augmentation for Dialogue Using Knowledge Distillation. 3890-3894 - Hongyin Luo, Shang-Wen Li, James R. Glass:

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption. 3895-3899 - Teakgyu Hong, Oh-Woog Kwon, Young-Kil Kim:

End-to-End Task-Oriented Dialog System Through Template Slot Value Generation. 3900-3904 - Zhenhao He, Jiachun Wang, Jian Chen:

Task-Oriented Dialog Generation with Enhanced Entity Representation. 3905-3909 - Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara:
End-to-End Speech-to-Dialog-Act Recognition. 3910-3914 - Yao Qian, Yu Shi, Michael Zeng:

Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-Oriented Spoken Dialog. 3915-3919 - Xinnuo Xu, Yizhe Zhang, Lars Liden, Sungjin Lee:

Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task. 3920-3924
Dereverberation and Echo Cancellation
- Ziteng Wang, Yueyue Na, Zhang Liu, Yun Li, Biao Tian, Qiang Fu:

A Semi-Blind Source Separation Approach for Speech Dereverberation. 3925-3929 - Joon-Young Yang, Joon-Hyuk Chang:

Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation. 3930-3934 - Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang:

SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation Using Optimally Smoothed Spectral Mapping. 3935-3939 - Chenggang Zhang, Xueliang Zhang:

A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning. 3940-3944 - Yi Zhang, Chengyun Deng, Shiqian Ma, Yongtao Sha, Hui Song, Xiangang Li:

Generative Adversarial Network Based Acoustic Echo Cancellation. 3945-3949 - Lukas Pfeifenberger, Franz Pernkopf:
Nonlinear Residual Echo Suppression Using a Recurrent Neural Network. 3950-3954 - Yi Gao, Ian Liu, J. Zheng, Cheng Luo, Bin Li:

Independent Echo Path Modeling for Stereophonic Acoustic Echo Cancellation. 3955-3958 - Hongsheng Chen, Teng Xiang, Kai Chen, Jing Lu:

Nonlinear Residual Echo Suppression Based on Multi-Stream Conv-TasNet. 3959-3963 - Wenzhi Fan, Jing Lu:

Improving Partition-Block-Based Acoustic Echo Canceler in Under-Modeling Scenarios. 3964-3968 - Jung-Hee Kim, Joon-Hyuk Chang:

Attention Wave-U-Net for Acoustic Echo Cancellation. 3969-3973
Speech Synthesis: Toward End-to-End Synthesis
- Zexin Cai, Chuxiong Zhang, Ming Li:

From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint. 3974-3978 - Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi:

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS? 3979-3983 - Tao Wang, Xuefei Liu, Jianhua Tao, Jiangyan Yi, Ruibo Fu, Zhengqi Wen:

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding. 3984-3988 - Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Chunyu Qiang:

Bi-Level Speaker Supervision for One-Shot Speech Synthesis. 3989-3993 - Alex Peiró Lilja, Mireia Farrús:

Naturalness Enhancement with Linguistic Information in End-to-End TTS Using Unsupervised Parallel Encoding. 3994-3998 - Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, Ming Liu, Ming Zhou:

MoBoAligner: A Neural Alignment Model for Non-Autoregressive TTS with Monotonic Boundary Search. 3999-4003 - Dan Lim, Won Jang, Gyeonghwan O, Heayoung Park, Bongwan Kim, Jaesam Yoon:

JDI-T: Jointly Trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment. 4004-4008 - Masashi Aso, Shinnosuke Takamichi, Hiroshi Saruwatari:

End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention. 4009-4013 - Qingyun Dou, Joshua Efiong, Mark J. F. Gales:

Attention Forcing for Speech Synthesis. 4014-4018 - Jason Fong, Jason Taylor, Simon King:

Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis. 4019-4023 - Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin:
MultiSpeech: Multi-Speaker Text to Speech with Transformer. 4024-4028
Speech Enhancement, Bandwidth Extension and Hearing Aids
- Pavlos Papadopoulos, Shrikanth Narayanan:

Exploiting Conic Affinity Measures to Design Speech Enhancement Systems Operating in Unseen Noise Conditions. 4029-4033 - Yunyun Ji, Longting Xu, Wei-Ping Zhu:
Adversarial Dictionary Learning for Monaural Speech Enhancement. 4034-4038 - Shogo Seki, Moe Takada, Tomoki Toda:
Semi-Supervised Self-Produced Speech Enhancement and Suppression Based on Joint Source Modeling of Air- and Body-Conducted Signals Using Variational Autoencoder. 4039-4043 - Ran Weisman, Vladimir Tourbabin, Paul Calamia, Boaz Rafaely:

Spatial Covariance Matrix Estimation for Reverberant Speech with Application to Speech Enhancement. 4044-4048 - Minh Tri Ho, Jinyoung Lee, Bong-Ki Lee, Dong Hoon Yi, Hong-Goo Kang:

A Cross-Channel Attention-Based Wave-U-Net for Multi-Channel Speech Enhancement. 4049-4053 - Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough:

TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids. 4054-4058 - Shu Hikosaka, Shogo Seki, Tomoki Hayashi, Kazuhiro Kobayashi, Kazuya Takeda, Hideki Banno, Tomoki Toda:
Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment. 4059-4063 - Nana Hou, Chenglin Xu, Van Tung Pham, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li:

Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network. 4064-4068 - Nana Hou, Chenglin Xu, Joey Tianyi Zhou, Eng Siong Chng, Haizhou Li:

Multi-Task Learning for End-to-End Noise-Robust Bandwidth Extension. 4069-4073 - Shichao Hu, Bin Zhang, Beici Liang, Ethan Zhao, Simon Lui:

Phase-Aware Music Super-Resolution Using Generative Adversarial Networks. 4074-4078
Speech Emotion Recognition III
- Jian Huang, Jianhua Tao, Bin Liu, Zheng Lian:
Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition. 4079-4083 - Md Asif Jalal, Rosanna Milner, Thomas Hain, Roger K. Moore:
Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition. 4084-4088 - Weiquan Fan, Xiangmin Xu, Xiaofen Xing, Dongyan Huang:

Adaptive Domain-Aware Representation Learning for Speech Emotion Recognition. 4089-4093 - Huan Zhou, Kai Liu:

Speech Emotion Recognition with Discriminative Feature Learning. 4094-4097 - Hengshun Zhou, Jun Du, Yanhui Tu, Chin-Hui Lee:

Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions. 4098-4102 - Yongwei Li, Jianhua Tao, Bin Liu, Donna Erickson, Masato Akagi:

Comparison of Glottal Source Parameter Values in Emotional Vowels. 4103-4107 - Huang-Cheng Chou, Chi-Chun Lee:
Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels. 4108-4112 - Md. Asif Jalal, Rosanna Milner, Thomas Hain:
Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition. 4113-4117
Acoustic Phonetics of L1-L2 and Other Interactions
- Iona Gessinger, Bernd Möbius, Bistra Andreeva, Eran Raveh, Ingmar Steiner:
Phonetic Accommodation of L2 German Speakers to the Virtual Language Learning Tutor Mirabella. 4118-4122 - Yuling Gu, Nancy F. Chen:
Characterization of Singaporean Children's English: Comparisons to American and British Counterparts Using Archetypal Analysis. 4123-4127 - Svetlana Kaminskaïa:

Rhythmic Convergence in Canadian French Varieties? 4128-4132 - Sreeja Manghat, Sreeram Manghat, Tanja Schultz:
Malayalam-English Code-Switched: Grapheme to Phoneme System. 4133-4137 - Mathilde Hutin, Adèle Jatteau, Ioana Vasilescu, Lori Lamel, Martine Adda-Decker:
Ongoing Phonologization of Word-Final Voicing Alternations in Two Romance Languages: Romanian and French. 4138-4142 - Maxwell Hope, Jason Lilley:
Cues for Perception of Gender in Synthetic Voices and the Role of Identity. 4143-4147 - Alla Menshikova, Daniil Kocharov, Tatiana Kachkovskaia:
Phonetic Entrainment in Cooperative Dialogues: A Case of Russian. 4148-4152 - Chengwei Xu, Wentao Gu:
Prosodic Characteristics of Genuine and Mock (Im)polite Mandarin Utterances. 4153-4157 - Yanping Li, Catherine T. Best, Michael D. Tyler, Denis Burnham:
Tone Variations in Regionally Accented Mandarin. 4158-4162 - Yike Yang, Si Chen, Xi Chen:
F0 Patterns in Mandarin Statements of Mandarin and Cantonese Speakers. 4163-4167
Conversational Systems
- Yung-Sung Chuang, Chi-Liang Liu, Hung-yi Lee, Lin-Shan Lee:
SpeechBERT: An Audio-and-Text Jointly Learned Language Model for End-to-End Spoken Question Answering. 4168-4172 - Chia-Chih Kuo, Shang-Bao Luo, Kuan-Yu Chen:

An Audio-Enriched BERT-Based Framework for Spoken Multiple-Choice Question Answering. 4173-4177 - Binxuan Huang, Han Wang, Tong Wang, Yue Liu, Yang Liu:

Entity Linking for Short Text Using Structured Knowledge Graph via Multi-Grained Text Matching. 4178-4182 - Mingxin Zhang, Tomohiro Tanaka, Wenxin Hou, Shengzhou Gao, Takahiro Shinozaki:

Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition. 4183-4187 - Kenta Yamamoto, Koji Inoue, Tatsuya Kawahara:
Semi-Supervised Learning for Character Expression of Spoken Dialogue Systems. 4188-4192 - Xiaohan Shi, Sixia Li, Jianwu Dang:
Dimensional Emotion Prediction Based on Interactive Context in Conversation. 4193-4197 - Asma Atamna, Chloé Clavel:
HRI-RNN: A User-Robot Dynamics-Oriented RNN for Engagement Decrease Detection. 4198-4202 - Simone Fuscone, Benoît Favre, Laurent Prévot:
Neural Representations of Dialogical History for Improving Upcoming Turn Acoustic Parameters Prediction. 4203-4207 - Shengli Hu:

Detecting Domain-Specific Credibility and Expertise in Text and Speech. 4208-4212
The Attacker’s Perspective on Automatic Speaker Verification
- Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li:

The Attacker's Perspective on Automatic Speaker Verification: An Overview. 4213-4217 - Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee:
Extrapolating False Alarm Rates in Automatic Speaker Verification. 4218-4222 - Ziyue Jiang, Hongcheng Zhu, Li Peng, Wenbing Ding, Yanzhen Ren:
Self-Supervised Spoofing Audio Detection Scheme. 4223-4227 - Qing Wang, Pengcheng Guo, Lei Xie:

Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition. 4228-4232 - Jesús Villalba, Yuekai Zhang, Najim Dehak:
x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification. 4233-4237 - Yuekai Zhang, Ziyan Jiang, Jesús Villalba, Najim Dehak:
Black-Box Attacks on Spoofing Countermeasures Using Transferability of Adversarial Examples. 4238-4242
Summarization, Semantic Analysis and Classification
- Krishna D. N, Ankita Patil:

Multimodal Emotion Recognition Using Cross-Modal Attention and 1D Convolutional Neural Networks. 4243-4247 - Potsawee Manakul, Mark J. F. Gales, Linlin Wang:
Abstractive Spoken Document Summarization Using Hierarchical Model with Multi-Stage Attention Diversity Optimization. 4248-4252 - Yichi Zhang, Yinpei Dai, Zhijian Ou, Huixin Wang, Junlan Feng:
Improved Learning of Word Embeddings with Word Definitions and Semantic Injection. 4253-4257 - Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur:

Wake Word Detection with Alignment-Free Lattice-Free MMI. 4258-4262 - Thai Binh Nguyen, Quang Minh Nguyen, Thi Thu Hien Nguyen, Quoc Truong Do, Luong Chi Mai:
Improving Vietnamese Named Entity Recognition from Speech Using Word Capitalization and Punctuation Recovery Models. 4263-4267 - Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah:
End-to-End Named Entity Recognition from English Speech. 4268-4272 - Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris:

Semantic Complexity in End-to-End Spoken Language Understanding. 4273-4277 - Trang Tran, Morgan Tinkler, Gary Yeung, Abeer Alwan, Mari Ostendorf:
Analysis of Disfluency in Children's Speech. 4278-4282 - Ashish R. Mittal, Samarth Bharadwaj, Shreya Khare, Saneem A. Chemmengath, Karthik Sankaranarayanan, Brian Kingsbury:

Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition. 4283-4287 - Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, Devang Naik:

Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation. 4288-4292
Speaker Recognition II
- Tianchi Liu, Rohan Kumar Das, Maulik C. Madhavi, Shengmei Shen, Haizhou Li:
Speaker-Utterance Dual Attention for Speaker and Utterance Verification. 4293-4297 - Lu Yi, Man-Wai Mak:

Adversarial Separation and Adaptation Network for Far-Field Speaker Verification. 4298-4302 - Hyewon Han, Soo-Whan Chung, Hong-Goo Kang:
MIRNet: Learning Multiple Identities Representations in Overlapped Speech. 4303-4307 - Weiwei Lin, Man-Wai Mak, Jen-Tzung Chien:
Strategies for End-to-End Text-Independent Speaker Verification. 4308-4312 - Rosa González Hautamäki, Tomi Kinnunen:

Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data. 4313-4317 - Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Alan McCree, Abeer Alwan:

Variable Frame Rate-Based Data Augmentation to Handle Speaking-Style Variability for Automatic Speaker Verification. 4318-4322 - Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin:

A Machine of Few Words: Interactive Speaker Recognition with Reinforcement Learning. 4323-4327 - Filip Granqvist, Matt Seigel, Rogier C. van Dalen, Áine Cahill, Stephen Shum, Matthias Paulik:

Improving On-Device Speaker Verification Using Federated Learning with Privacy. 4328-4332 - Shreyas Ramoji, Prashant Krishnan V, Sriram Ganapathy:

Neural PLDA Modeling for End-to-End Speaker Verification. 4333-4337
General Topics in Speech Recognition
- Kuba Lopatka, Tobias Bocklet:
State Sequence Pooling Training of Acoustic Models for Keyword Spotting. 4338-4342 - Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio López-Moreno, Rajiv Mathews:

Training Keyword Spotting Models on Non-IID Data with Federated Learning. 4343-4347 - Rongqing Huang, Ossama Abdel-Hamid, Xinwei Li, Gunnar Evermann:

Class LM and Word Mapping for Contextual Biasing in End-to-End ASR. 4348-4351 - Lasse Borgholt, Jakob D. Havtorn, Zeljko Agic, Anders Søgaard, Lars Maaløe, Christian Igel:
Do End-to-End Speech Recognition Models Care About Context? 4352-4356 - Ankur Kumar, Sachin Singh, Dhananjaya Gowda, Abhinav Garg, Shatrughan Singh, Chanwoo Kim:
Utterance Confidence Measure for End-to-End Speech Recognition with Applications to Distributed Speech Recognition Scenarios. 4357-4361 - Huaxin Wu, Genshun Wan, Jia Pan:

Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-Learning. 4362-4366 - Han Zhu, Jiangjiang Zhao, Yuling Ren, Li Wang, Pengyuan Zhang:
Domain Adaptation Using Class Similarity for Robust Speech Recognition. 4367-4371 - Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura:

Incremental Machine Speech Chain Towards Enabling Listening While Speaking in Real-Time. 4372-4376 - Tina Raissi, Eugen Beck, Ralf Schlüter, Hermann Ney:
Context-Dependent Acoustic Modeling Without Explicit Phone Clustering. 4377-4381 - S. Shahnawazuddin, Nagaraj Adiga, Kunal Kumar, Aayushi Poddar, Waquar Ahmad:
Voice Conversion Based Data Augmentation to Improve Children's Speech Recognition in Limited Data Scenario. 4382-4386
Speech Synthesis: Prosody Modeling
- Sri Karlapati, Alexis Moinet, Arnaud Joly, Viacheslav Klimkov, Daniel Sáez-Trigueros, Thomas Drugman:

CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech. 4387-4391 - Binghuai Lin, Liyuan Wang, Xiaoli Feng, Jinsong Zhang:

Joint Detection of Sentence Stress and Phrase Boundary for Prosody. 4392-4396 - Ajinkya Kulkarni, Vincent Colotte, Denis Jouvet:
Transfer Learning of the Expressivity Using FLOW Metric Learning in Multispeaker Text-to-Speech Synthesis. 4397-4401 - Jae-Sung Bae, Hanbin Bae, Young-Sun Joo, Junmo Lee, Gyeong-Hoon Lee, Hoon-Young Cho:
Speaking Speed Control of End-to-End Speech Synthesis Using Sentence-Level Conditioning. 4402-4406 - Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba:

Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection. 4407-4411 - Tom Kenter, Manish Sharma, Rob Clark:

Improving the Prosody of RNN-Based English Text-To-Speech Synthesis by Incorporating a BERT Model. 4412-4416 - Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi:

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction. 4417-4421 - Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao:
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit. 4422-4426 - Yuma Shirahata, Daisuke Saito, Nobuaki Minematsu:

Discriminative Method to Extract Coarse Prosodic Structure and its Application for Statistical Phrase/Accent Command Estimation. 4427-4431 - Tuomo Raitio, Ramya Rasipuram, Dan Castellani:

Controllable Neural Text-to-Speech Synthesis Using Intuitive Prosodic Features. 4432-4436 - Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore:

Controllable Neural Prosody Synthesis. 4437-4441 - Matt Whitehill, Shuang Ma, Daniel McDuff, Yale Song:

Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency. 4442-4446 - Yang Gao, Weiyi Zheng, Zhaojun Yang, Thilo Köhler, Christian Fuegen, Qing He:

Interactive Text-to-Speech System via Joint Style Analysis. 4447-4451
Language Learning
- Kevin Hirschi, Okim Kang, Catia Cucchiarini, John H. L. Hansen, Keelan Evanini, Helmer Strik:
Mobile-Assisted Prosody Training for Limited English Proficiency: Learner Background and Speech Learning Pattern. 4452-4456 - Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul Konstantin Krug, Peter Birkholz, Yi Xu:
Finding Intelligible Consonant-Vowel Sounds Using High-Quality Articulatory Synthesis. 4457-4461 - Venkat Krishnamohan, Akshara Soman, Anshul Gupta, Sriram Ganapathy:
Audiovisual Correspondence Learning in Humans and Machines. 4462-4466 - Yizhou Lan:

Perception of English Fricatives and Affricates by Advanced Chinese Learners of English. 4467-4470 - Kimiko Tsukada, Joo-Yeon Kim, Jeong-Im Han:
Perception of Japanese Consonant Length by Native Speakers of Korean Differing in Japanese Learning Experience. 4471-4475 - Si Ioi Ng, Tan Lee:
Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder. 4476-4480 - Hongwei Ding, Binghuai Lin, Liyuan Wang, Hui Wang, Ruomei Fang:

A Comparison of English Rhythm Produced by Native American Speakers and Mandarin ESL Primary School Learners. 4481-4485 - Chao Zhou, Silke Hamann:
Cross-Linguistic Interaction Between Phonological Categorization and Orthography Predicts Prosodic Effects in the Acquisition of Portuguese Liquids by L1-Mandarin Learners. 4486-4490 - Wenqian Li, Jung-Yueh Tu:
Cross-Linguistic Perception of Utterances with Willingness and Reluctance in Mandarin by Korean L2 Learners. 4491-4495
Speech Enhancement
- Rui Cheng, Changchun Bao:
Speech Enhancement Based on Beamforming and Post-Filtering by Combining Phase Information. 4496-4500 - Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan:

A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement. 4501-4505 - Jiaqi Su, Zeyu Jin, Adam Finkelstein:

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks. 4506-4510 - Ashutosh Pandey, DeLiang Wang:

Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-Corpus Generalization. 4511-4515 - Julius Richter, Guillaume Carbajal, Timo Gerkmann:
Speech Enhancement with Stochastic Temporal Convolutional Networks. 4516-4520 - Mandar Gogate, Kia Dashtipour, Amir Hussain:
Visual Speech In Real Noisy Environments (VISION): A Novel Benchmark Dataset and Deep Learning-Based Baseline System. 4521-4525 - Aswin Sivaraman, Minje Kim:
Sparse Mixture of Local Experts for Efficient Speech Enhancement. 4526-4530 - Vinith Kishore, Nitya Tiwari, Periyasamy Paramasivam:

Improved Speech Enhancement Using TCN with Multiple Encoder-Decoder Layers. 4531-4535 - Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen:

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations. 4536-4540 - Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Kazuyoshi Yoshii:

Unsupervised Robust Speech Enhancement Based on Alpha-Stable Fast Multichannel Nonnegative Matrix Factorization. 4541-4545
Speech in Health II
- Merlin Albes, Zhao Ren, Björn W. Schuller, Nicholas Cummins:
Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition. 4546-4550 - Nadee Seneviratne, James R. Williamson, Adam C. Lammert, Thomas F. Quatieri, Carol Y. Espy-Wilson:

Extended Study on the Use of Vocal Tract Variables to Quantify Neuromotor Coordination in Depression. 4551-4555 - Danai Xezonaki, Georgios Paraskevopoulos, Alexandros Potamianos, Shrikanth Narayanan:

Affective Conditioning on Hierarchical Attention Networks Applied to Depression Detection from Transcribed Clinical Interviews. 4556-4560 - Zhaocheng Huang, Julien Epps, Dale Joachim, Brian Stasak, James R. Williamson, Thomas F. Quatieri:
Domain Adaptation for Enhancing Speech-Based Depression Detection in Natural Environmental Conditions Using Dilated CNNs. 4561-4565 - Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann:

Making a Distinction Between Schizophrenia and Bipolar Disorder Based on Temporal Parameters in Spontaneous Speech. 4566-4570 - Mark A. Huckvale, András Beke, Mirei Ikushima:

Prediction of Sleepiness Ratings from Voice by Man and Machine. 4571-4575 - Kristin J. Teplansky, Alan Wisler, Beiming Cao, Wendy Liang, Chad W. Whited, Ted Mau, Jun Wang:
Tongue and Lip Motion Patterns in Alaryngeal Speech. 4576-4580 - Zhengjun Yue, Heidi Christensen, Jon Barker:
Autoencoder Bottleneck Features with Multi-Task Optimisation for Improved Continuous Dysarthric Speech Recognition. 4581-4585 - Jhansi Mallela, Aravind Illa, Yamini Belur, Atchayaram Nalini, Ravi Yadav, Pradeep Reddy, Dipanjan Gope, Prasanta Kumar Ghosh:
Raw Speech Waveform Based Classification of Patients with ALS, Parkinson's Disease and Healthy Controls Using CNN-BLSTM. 4586-4590 - Anna Pompili, Rubén Solera-Ureña, Alberto Abad, Rita Cardoso, Isabel Guimarães, Margherita Fabbri, Isabel P. Martins, Joaquim J. Ferreira:
Assessment of Parkinson's Disease Medication State Through Automatic Speech Analysis. 4591-4595
Speech and Audio Quality Assessment
- Chao Zhang, Junjie Cheng, Yanmei Gu, Huacan Wang, Jun Ma, Shaojun Wang, Jing Xiao:

Improving Replay Detection System with Channel Consistency DenseNeXt for the ASVspoof 2019 Challenge. 4596-4600 - Przemyslaw Falkowski-Gilski, Grzegorz Debita, Marcin Habrych, Bogdan Miedzinski, Przemyslaw Jedlikowski, Bartosz Polnik, Jan Wandzio, Xin Wang:
Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System. 4601-4605 - Waito Chiu, Yan Xu, Andrew Abel, Chun Lin, Zhengzheng Tu:

Investigating the Visual Lombard Effect with Gabor Based Features. 4606-4610 - Qiang Huang, Thomas Hain:
Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models. 4611-4615 - Alessandro Ragano, Emmanouil Benetos, Andrew Hines:
Development of a Speech Quality Database Under Uncontrolled Conditions. 4616-4620 - Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux:

Evaluating the Reliability of Acoustic Speech Embeddings. 4621-4625 - Hao Li, DeLiang Wang, Xueliang Zhang, Guanglai Gao:

Frame-Level Signal-to-Noise Ratio Estimation Using Deep Learning. 4626-4630 - Xuan Dong, Donald S. Williamson:
A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals. 4631-4635 - Avamarie Brueggeman, John H. L. Hansen:

Effect of Spectral Complexity Reduction and Number of Instruments on Musical Enjoyment with Cochlear Implants. 4636-4640 - Michal Kosmider:

Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices. 4641-4645
Privacy and Security in Speech Communication
- Matthew O'Connor, W. Bastiaan Kleijn:
Distributed Summation Privacy for Speech Enhancement. 4646-4650 - Anna Leschanowsky, Sneha Das, Tom Bäckström, Pablo Pérez Zarazaga:
Perception of Privacy Measured in the Crowd - Paired Comparison on the Effect of Background Noises. 4651-4655 - Felix Kreuk, Yossi Adi, Bhiksha Raj, Rita Singh, Joseph Keshet:
Hide and Speak: Towards Deep Neural Networks for Speech Steganography. 4656-4660 - Sina Däubener, Lea Schönherr, Asja Fischer, Dorothea Kolossa:
Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification. 4661-4665 - David Ifeoluwa Adelani, Ali Davody, Thomas Kleinbauer, Dietrich Klakow:

Privacy Guarantees for De-Identifying Text Transformations. 4666-4670 - Tejas Jayashankar, Jonathan Le Roux, Pierre Moulin:

Detecting Audio Attacks on ASR Systems with Dropout Uncertainty. 4671-4675
Voice Conversion and Adaptation II
- Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining. 4676-4680 - Hitoshi Suda, Gaku Kotani, Daisuke Saito:

Nonparallel Training of Exemplar-Based Voice Conversion System Using INCA-Based Alignment Technique. 4681-4685 - Chen-Yu Chen, Wei-Zhong Zheng, Syu-Siang Wang, Yu Tsao, Pei-Chun Li, Ying-Hui Lai:
Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-Based Voice Conversion System. 4686-4690 - Da-Yi Wu, Yen-Hao Chen, Hung-yi Lee:

VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net Architecture. 4691-4695 - Seung Won Park, Doo-young Kim, Myun-chul Joe:

Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion Without Parallel Data. 4696-4700 - Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Tao Wang, Chunyu Qiang:

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis. 4701-4705 - Zheng Lian, Zhengqi Wen, Xinyong Zhou, Songbai Pu, Shengkai Zhang, Jianhua Tao:
ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data. 4706-4710 - Shahan Nercessian:

Improved Zero-Shot Voice Conversion Using Explicit Conditioning Signals. 4711-4715 - Minchuan Chen, Weijian Hou, Jun Ma, Shaojun Wang, Jing Xiao:

Non-Parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks. 4716-4720 - Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su, Dong Yu, Helen Meng:

Transferring Source Style in Non-Parallel Voice Conversion. 4721-4725 - Ehab A. AlBadawy, Siwei Lyu:

Voice Conversion Using Speech-to-Speech Neuro-Style Transfer. 4726-4730
Multilingual and Code-Switched ASR
- Changhan Wang, Juan Miguel Pino, Jiatao Gu:

Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation. 4731-4735 - Samuel Thomas, Kartik Audhkhasi, Brian Kingsbury:

Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings. 4736-4740 - Yun Zhu, Parisa Haghani, Anshuman Tripathi, Bhuvana Ramabhadran, Brian Farris, Hainan Xu, Han Lu, Hasim Sak, Isabel Leal, Neeraj Gaur, Pedro J. Moreno, Qian Zhang:

Multilingual Speech Recognition with Self-Attention Structured Parameterization. 4741-4745 - Srikanth R. Madikeri, Banriskhem K. Khonglah, Sibo Tong, Petr Motlícek, Hervé Bourlard, Daniel Povey:

Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems. 4746-4750 - Vineel Pratap, Anuroop Sriram, Paden Tomasello, Awni Y. Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert:
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters. 4751-4755 - Hardik B. Sailor, Thomas Hain:
Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages. 4756-4760 - Khyathi Raghavi Chandu, Alan W. Black:

Style Variation as a Vantage Point for Code-Switching. 4761-4765 - Yizhou Lu, Mingkun Huang, Hao Li, Jiaqi Guo, Yanmin Qian:

Bi-Encoder Transformer Network for Mandarin-English Code-Switching Speech Recognition Using Mixture of Experts. 4766-4770 - Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi:

Improving Low Resource Code-Switched ASR Using Augmented Code-Switched TTS. 4771-4775 - Zimeng Qiu, Yiyuan Li, Xinjian Li, Florian Metze, William M. Campbell:

Towards Context-Aware End-to-End Code-Switching Speech Recognition. 4776-4780
Speech and Voice Disorders
- Tuan Dinh, Alexander Kain, Robin Samlan, Beiming Cao, Jun Wang:

Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency. 4781-4785 - Han Tong, Hamid R. Sharifzadeh, Ian McLoughlin:
Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning. 4786-4790 - Yuqin Lin, Longbiao Wang, Sheng Li, Jianwu Dang, Chenchen Ding:
Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription. 4791-4795 - Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki:

Dysarthric Speech Recognition Based on Deep Metric Learning. 4796-4800 - Divya Degala, M. V. Achuth Rao, Rahul Krishnamurthy, Pebbili Gopikishore, Veeramani Priyadharshini, Prakash T. K., Prasanta Kumar Ghosh:
Automatic Glottis Detection and Segmentation in Stroboscopic Videos Using Convolutional Networks. 4801-4805 - Yilin Pan, Bahman Mirheidari, Zehai Tu, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Daniel Blackburn, Heidi Christensen:
Acoustic Feature Extraction with Interpretable Deep Neural Network for Neurodegenerative Related Disorder Classification. 4806-4810 - Neeraj Kumar Sharma, Prashant Krishnan V, Rohit Kumar, Shreyas Ramoji, Srikanth Raj Chetupalli, Nirmala R., Prasanta Kumar Ghosh, Sriram Ganapathy:
Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis. 4811-4815 - Hannah P. Rowe

, Sarah E. Gutz, Marc F. Maffei, Jordan R. Green:
Acoustic-Based Articulatory Phenotypes of Amyotrophic Lateral Sclerosis and Parkinson's Disease: Towards an Interpretable, Hypothesis-Driven Framework of Motor Control. 4816-4820 - Lubna Alhinti

, Stuart P. Cunningham, Heidi Christensen
:
Recognising Emotions in Dysarthric Speech Using Typical Speech Data. 4821-4825 - Bence Mark Halpern, Rob van Son

, Michiel W. M. van den Brekel
, Odette Scharenborg
:
Detecting and Analysing Spontaneous Oral Cancer Speech in the Wild. 4826-4830
The Zero Resource Speech Challenge 2020
- Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux:

The Zero Resource Speech Challenge 2020: Discovering Discrete Subword and Word Units. 4831-4835 - Benjamin van Niekerk, Leanne Nortje, Herman Kamper:
Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge. 4836-4840 - Karthik Pandia D. S, Anusha Prakash, Mano Ranjith Kumar M., Hema A. Murthy:
Exploration of End-to-End Synthesisers for Zero Resource Speech Challenge 2020. 4841-4845 - Batuhan Gündogdu, Bolaji Yusuf, Mansur Yesilbursa, Murat Saraclar:

Vector Quantized Temporally-Aware Correspondence Sparse Autoencoders for Zero-Resource Acoustic Unit Discovery. 4846-4850 - Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:

Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge. 4851-4855 - Takashi Morita, Hiroki Koda:
Exploring TTS Without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020). 4856-4860 - Patrick Lumban Tobing, Tomoki Hayashi, Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Toda:
Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling. 4861-4865 - Mingjie Chen, Thomas Hain:
Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders. 4866-4870 - Okko Räsänen, María Andrea Cruz Blandón:
Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics. 4871-4875 - Saurabhchand Bhati, Jesús Villalba, Piotr Zelasko, Najim Dehak:
Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery. 4876-4880 - Juliette Millet, Ewan Dunbar:

Perceptimatic: A Human Speech Perception Benchmark for Unsupervised Subword Modelling. 4881-4885 - Jonathan Clayton, Scott Wellington, Cassia Valentini-Botinhao, Oliver Watts:
Decoding Imagined, Heard, and Spoken Speech: Classification and Regression of EEG Using a 14-Channel Dry-Contact Mobile Headset. 4886-4890 - Gurunath Reddy M., K. Sreenivasa Rao, Partha Pratim Das:

Glottal Closure Instants Detection from EGG Signal by Classification Approach. 4891-4895 - Hua Li, Fei Chen:

Classify Imaginary Mandarin Tones with Cortical EEG Signals. 4896-4900
LM Adaptation, Lexical Units and Punctuation
- Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura:

Augmenting Images for ASR and TTS Through Single-Loop and Dual-Loop Multimodal Chain Framework. 4901-4905 - Lukasz Augustyniak, Piotr Szymanski, Mikolaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak:
Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings? 4906-4910 - Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff:

Multimodal Semi-Supervised Learning Framework for Punctuation Prediction in Conversational Speech. 4911-4915 - Ruizhe Huang, Ke Li, Ashish Arora, Daniel Povey, Sanjeev Khudanpur:

Efficient MDI Adaptation for n-Gram Language Models. 4916-4920 - Cal Peyser, Sepand Mavandadi, Tara N. Sainath, James Apfel, Ruoming Pang, Shankar Kumar:

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus. 4921-4925 - Atsunori Ogawa, Naohiro Tawara, Marc Delcroix:
Language Model Data Augmentation Based on Text Domain Transfer. 4926-4930 - Krzysztof Wolk:
Contemporary Polish Language Model (Version 2) Using Big Data and Sub-Word Approach. 4931-4935 - Prabhat Pandey, Volker Leutnant, Simon Wiesler, Jahn Heymann, Daniel Willett:

Improving Speech Recognition of Compound-Rich Languages. 4936-4940 - Simone Wills, Pieter Uys, Charl Johannes van Heerden, Etienne Barnard:

Language Modeling for Speech Analytics in Under-Resourced Languages. 4941-4945
Speech in Health I
- Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller:
An Early Study on Intelligent Analysis of Speech Under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety. 4946-4950 - Alice Baird, Nicholas Cummins, Sebastian Schnieder, Jarek Krajewski, Björn W. Schuller:
An Evaluation of the Effect of Anxiety on Speech - Computational Prediction of Anxiety from Sustained Vowels. 4951-4955 - Ziping Zhao, Qifei Li, Nicholas Cummins, Bin Liu, Haishuai Wang, Jianhua Tao, Björn W. Schuller:
Hybrid Network Feature Extraction for Depression Assessment from Speech. 4956-4960 - Yilin Pan, Bahman Mirheidari, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen:
Improving Detection of Alzheimer's Disease Using Automatic Speech Recognition to Identify High-Quality Segments for More Robust Feature Extraction. 4961-4965 - Amrit Romana, John Bandon, Noelle Carlozzi, Angela Roberts, Emily Mower Provost:
Classification of Manifest Huntington Disease Using Vowel Distortion Measures. 4966-4970 - Sudarsana Reddy Kadiri, Rashmi Kethireddy, Paavo Alku:
Parkinson's Disease Detection from Speech Using Single Frequency Filtering Cepstral Coefficients. 4971-4975 - Sebastião Quintas, Julie Mauclair, Virginie Woisard, Julien Pinquier:

Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer. 4976-4980 - Ajish K. Abraham, M. Pushpavathi, N. Sreedevi, A. Navya, Vikram C. Mathad, S. R. Mahadeva Prasanna:

Spectral Moment and Duration of Burst of Plosives in Speech of Children with Hearing Impairment and Typically Developing Children - A Comparative Study. 4981-4985 - Matthew Perez, Zakaria Aldeneh, Emily Mower Provost:
Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts. 4986-4990 - Ina Kodrasi, Michaela Pernon, Marina Laganaro, Hervé Bourlard:
Automatic Discrimination of Apraxia of Speech and Dysarthria Using a Minimalistic Set of Handcrafted Features. 4991-4995
ASR Neural Network Architectures II — Transformers
- Yangyang Shi, Yongqiang Wang, Chunyang Wu, Christian Fuegen, Frank Zhang, Duc Le, Ching-Feng Yeh, Michael L. Seltzer:

Weak-Attention Suppression for Transformer Based Speech Recognition. 4996-5000 - Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen:

Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition. 5001-5005 - Song Li, Lin Li, Qingyang Hong, Lingling Liu:

Improving Transformer-Based Speech Recognition with Unsupervised Pre-Training and Multi-Task Semantic Knowledge Learning. 5006-5010 - Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux:

Transformer-Based Long-Context End-to-End Speech Recognition. 5011-5015 - Xinyuan Zhou, Grandee Lee, Emre Yilmaz, Yanhua Long, Jiaen Liang, Haizhou Li:

Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR. 5016-5020 - Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma:

Universal Speech Transformer. 5021-5025 - Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen:

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition. 5026-5030 - Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq R. Joty, Eng Siong Chng, Bin Ma:

Cross Attention with Monotonic Alignment for Speech Transformer. 5031-5035 - Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang:

Conformer: Convolution-augmented Transformer for Speech Recognition. 5036-5040 - Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong:
Exploring Transformers for Large-Scale Speech Recognition. 5041-5045
Spatial Audio
- Masahito Togami, Robin Scheibler:

Sparseness-Aware DOA Estimation with Majorization Minimization. 5046-5050 - Xiaoli Zhong, Hao Song, Xuejie Liu:

Spatial Resolution of Early Reflection for Speech and White Noise. 5051-5055 - Aditya Raikar, Karan Nathwani, Ashish Panda, Sunil Kumar Kopparapu:

Effect of Microphone Position Measurement Error on RIR and its Impact on Speech Intelligibility and Quality. 5056-5060 - Shuwen Deng, Wolfgang Mack, Emanuël A. P. Habets:
Online Blind Reverberation Time Estimation Using CRNNs. 5061-5065 - Wolfgang Mack, Shuwen Deng, Emanuël A. P. Habets:
Single-Channel Blind Direct-to-Reverberation Ratio Estimation Using Masking. 5066-5070 - Hanan Beit-On, Vladimir Tourbabin, Boaz Rafaely:

The Importance of Time-Frequency Averaging for Binaural Speaker Localization in Reverberant Environments. 5071-5075 - Yonggang Hu, Prasanga N. Samarasinghe, Thushara D. Abhayapala:
Acoustic Signal Enhancement Using Relative Harmonic Coefficients: Spherical Harmonics Domain Approach. 5076-5080 - B. H. V. S. Narayana Murthy, J. V. Satyanarayana, Nivedita Chennupati, B. Yegnanarayana:

Instantaneous Time Delay Estimation of Broadband Signals. 5081-5085 - Hao Wang, Kai Chen, Jing Lu:

U-Net Based Direct-Path Dominance Test for Robust Direction-of-Arrival Estimation. 5086-5090 - Wei Xue, Ying Tong, Chao Zhang, Guohong Ding, Xiaodong He, Bowen Zhou:
Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-Task Learning. 5091-5095
