


SLT 2021: Shenzhen, China
- IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021. IEEE 2021, ISBN 978-1-7281-7066-4

- Mohan Li, Catalin Zorila, Rama Doddipatla:
Transformer-Based Online Speech Recognition with Decoder-end Adaptive Computation Steps. 1-7
- Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer:
Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition. 8-14
- Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie:
Cascade RNN-Transducer: Syllable Based Streaming On-Device Mandarin Speech Recognition with a Syllable-To-Character Converter. 15-21
- Emiru Tsunoo, Yosuke Kashiwagi, Shinji Watanabe:
Streaming Transformer Asr With Blockwise Synchronous Beam Search. 22-29
- Jinhwan Park, Chanwoo Kim, Wonyong Sung:
Convolution-Based Attention Model With Positional Encoding For Streaming Speech Recognition On Embedded Devices. 30-37
- George Sterpu, Christian Saam, Naomi Harte:
Learning to Count Words in Fluent Speech Enables Online Speech Recognition. 38-45
- Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig:
Benchmarking LF-MMI, CTC And RNN-T Criteria For Streaming ASR. 46-51
- Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer:
Alignment Restricted Streaming Recurrent Neural Network Transducer. 52-59
- Huahuan Zheng, Keyu An, Zhijian Ou:
Efficient Neural Architecture Search for End-to-End Speech Recognition Via Straight-Through Gradients. 60-67
- Ke Hu, Ruoming Pang, Tara N. Sainath, Trevor Strohman:
Transformer Based Deliberation for Two-Pass Speech Recognition. 68-74
- Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie:
Simplified Self-Attention for Transformer-Based end-to-end Speech Recognition. 75-81
- Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao:
Multi-Quartznet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion. 82-88
- Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals:
On The Usefulness of Self-Attention for Automatic Speech Recognition with Transformers. 89-96
- Thomas Pellegrini, Romain Zimmer, Timothée Masquelier:
Low-Activity Supervised Convolutional Spiking Neural Networks Applied to Speech Commands Recognition. 97-103
- Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie:
Multi-Channel Automatic Speech Recognition Using Deep Complex Unet. 104-110
- Kiran Praveen, Abhishek Pandey, Deepak Kumar, Shakti Prasad Rath, Sandip Shriram Bapat:
Dynamically Weighted Ensemble Models for Automatic Speech Recognition. 111-116
- Kazuhiro Nakadai, Yosuke Fukumoto, Ryu Takeda:
Investigation of Node Pruning Criteria for Neural Networks Model Compression with Non-Linear Function and Non-Uniform Network Topology. 117-124
- Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Y. Hannun:
Semi-Supervised end-to-end Speech Recognition via Local Prior Matching. 125-132
- Jaesung Huh, Minjae Lee, Heesoo Heo, Seongkyu Mun, Joon Son Chung:
Metric Learning for Keyword Spotting. 133-140
- Alexandru-Lucian Georgescu, Cristian Manolache, Dan Oneata, Horia Cucu, Corneliu Burileanu:
Data-Filtering Methods for Self-Training of Automatic Speech Recognition Systems. 141-147
- Prakhar Swarup, Debmalya Chakrabarty, Ashtosh Sapru, Hitesh Tulsiani, Harish Arsikere, Sri Garimella:
Efficient Large Scale Semi-Supervised Learning for CTC Based Acoustic Models. 148-155
- Morgane Rivière, Emmanuel Dupoux:
Towards Unsupervised Learning of Speech Features in the Wild. 156-163
- Bowen Shi, Shane Settle, Karen Livescu:
Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings. 164-171
- Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig:
Improving RNN Transducer Based ASR with Auxiliary Tasks. 172-179
- Songjun Cao, Yike Zhang, Xiaobing Feng, Long Ma:
Improving Speech Recognition Accuracy of Local POI Using Geographical Models. 180-185
- Heng-Jui Chang, Alexander H. Liu, Hung-yi Lee, Lin-Shan Lee:
End-to-End Whispered Speech Recognition with Frequency-Weighted Approaches and Pseudo Whisper Pre-training. 186-193
- Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang, Yanmin Qian:
Data Augmentation for end-to-end Code-Switching Speech Recognition. 194-200
- Bin Wu, Sakriani Sakti, Satoshi Nakamura:
Incorporating Discriminative DPGMM Posteriorgrams for Low-Resource ASR. 201-208
- Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu:
Frame-Level Specaugment for Deep Convolutional Neural Networks in Hybrid ASR Systems. 209-214
- Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux:
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain. 215-222
- Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf:
Dual Application of Speech Enhancement for Automatic Speech Recognition. 223-228
- Ruizhi Li, Gregory Sell, Hynek Hermansky:
Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream end-to-end ASR. 229-235
- Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu:
Block-Online Guided Source Separation. 236-242
- Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong:
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition. 243-250
- Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer:
Deep Shallow Fusion for RNN-T Personalization. 251-257
- Dan Oneata, Alexandru Caranica, Adriana Stan, Horia Cucu:
An Evaluation of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition. 258-265
- Shih-Hsuan Chiu, Berlin Chen:
Innovative Bert-Based Reranking Language Models for Speech Recognition. 266-271
- Bipasha Sen, Aditya Agarwal, Mirishkar Sai Ganesh, Anil Kumar Vuppala:
Reed: An Approach Towards Quickly Bootstrapping Multilingual Acoustic Models. 272-279
- Minguang Song, Yunxin Zhao, Shaojun Wang, Mei Han:
Word Similarity Based Label Smoothing in Rnnlm Training for ASR. 280-285
- Seong Min Kye, Joon Son Chung, Hoirin Kim:
Supervised Attention for Speaker Recognition. 286-293
- Seong Min Kye, Yoohwan Kwon, Joon Son Chung:
Cross Attentive Pooling for Speaker Verification. 294-300
- Tianyan Zhou, Yong Zhao, Jian Wu:
ResNeXt and Res2Net Structures for Speaker Verification. 301-307
- Danwei Cai, Ming Li:
Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays. 308-315
- Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang:
Synth2Aug: Cross-Domain Speaker Recognition with TTS Synthesized Speech. 316-322
- Md. Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent:
UIAI System for Short-Duration Speaker Verification Challenge 2020. 323-329
- Zheng Li, Miao Zhao, Lin Li, Qingyang Hong:
Multi-Feature Learning with Canonical Correlation Analysis Constraint for Text-Independent Speaker Verification. 330-337
- Hrishikesh Rao, Kedar Phatak, Elie Khoury:
Improving Speaker Recognition with Quality Indicators. 338-343
- Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Yen-Hao Chen, Shang-Wen Li, Hung-yi Lee:
Audio Albert: A Lite Bert for Self-Supervised Learning of Audio Representation. 344-350
- Bo-Hao Su, Chi-Chun Lee:
A Conditional Cycle Emotion Gan for Cross Corpus Speech Emotion Recognition. 351-357
- Michael Neumann, Ngoc Thang Vu:
Investigations on audiovisual emotion recognition in noisy conditions. 358-364
- Patrick Meyer, Ziyi Xu, Tim Fingscheidt:
Improving Convolutional Recurrent Neural Networks for Speech Emotion Recognition. 365-372
- Manon Macary, Marie Tahon, Yannick Estève, Anthony Rousseau:
On the Use of Self-Supervised Pre-Trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition. 373-380
- Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram:
Self-Supervised Learning with Cross-Modal Transformers for Emotion Recognition. 381-388
- Shi-wook Lee:
Domain Generalization with Triplet Network for Cross-Corpus Speech Emotion Recognition. 389-396
- Alice Baird, Shahin Amiriparian, Manuel Milling, Björn W. Schuller:
Emotion Recognition in Public Speaking Scenarios Utilising An LSTM-RNN Approach with Attention. 397-402
- Haohan Guo, Shaofei Zhang, Frank K. Soong, Lei He, Lei Xie:
Conversational End-to-End TTS for Voice Agents. 403-409
- Liangqi Liu, Jiankun Hu, Zhiyong Wu, Song Yang, Songfan Yang, Jia Jia, Helen Meng:
Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis. 410-414
- Kun Zhou, Berrak Sisman, Haizhou Li:
Vaw-Gan For Disentanglement And Recomposition Of Emotional Elements In Speech. 415-422
- Yi Lei, Shan Yang, Lei Xie:
Fine-Grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis. 423-430
- Slava Shechtman, Raul Fernandez, David Haws:
Supervised and unsupervised approaches for controlling narrow lexical focus in sequence-to-sequence speech synthesis. 431-437
- Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Lingwei Kong, Jing Xiao:
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis. 438-445
- Chung-Ming Chien, Hung-yi Lee:
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis. 446-453
- Qiong Hu, Tobias Bleisch, Petko Petkov, Tuomo Raitio, Erik Marchi, Varun Lakshminarasimhan:
Whispered and Lombard Neural Speech Synthesis. 454-461
- Yeunju Choi, Youngmoon Jung, Hoirin Kim:
Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification. 462-469
- Eunwoo Song, Ryuichi Yamamoto, Min-Jae Hwang, Jin-Seob Kim, Ohsung Kwon, Jae-Min Kim:
Improved Parallel Wavegan Vocoder with Perceptually Weighted Spectrogram Loss. 470-476
- Yang Ai, Haoyu Li, Xin Wang, Junichi Yamagishi, Zhen-Hua Ling:
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation. 477-484
- Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao:
MelGlow: Efficient Waveform Generative Network Based On Location-Variable Convolution. 485-491
- Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie:
Multi-Band Melgan: Faster Waveform Generation For High-Quality Text-To-Speech. 492-498
- Song Li, Beibei Ouyang, Lin Li, Qingyang Hong:
Lightspeech: Lightweight Non-Autoregressive Multi-Speaker Text-To-Speech. 499-506
- Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li:
Optimizing Voice Conversion Network with Cycle Consistency Loss of Speaker Identity. 507-513
- Tzu-hsien Huang, Jheng-Hao Lin, Hung-yi Lee:
How Far Are We from Robust Voice Conversion: A Survey. 514-521
- Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li:
Learn2Sing: Target Speaker Singing Voice Synthesis by Learning from a Singing Teacher. 522-529
- Hayato Shibata, Mingxin Zhang, Takahiro Shinozaki:
Unsupervised Acoustic-to-Articulatory Inversion Neural Network Learning Based on Deterministic Policy Gradient. 530-537
- Tianxiang Chen, Elie Khoury:
Spoofprint: A New Paradigm for Spoofing Attacks Detection. 538-543
- Yang Gao, Jiachen Lian, Bhiksha Raj, Rita Singh:
Detection and Evaluation of Human and Machine Generated Speech in Spoofing Attacks on Automatic Speaker Verification Systems. 544-551
- Chien-yu Huang, Yist Y. Lin, Hung-yi Lee, Lin-Shan Lee:
Defending Your Voice: Adversarial Attack on Voice Conversion. 552-559
- Hiroto Kai, Shinnosuke Takamichi, Sayaka Shiota, Hitoshi Kiya:
Lightweight Voice Anonymization Based on Data-Driven Optimization of Cascaded Voice Modification Modules. 560-566
- Youngki Kwon, Hee Soo Heo, Jaesung Huh, Bong-Jin Lee, Joon Son Chung:
Look Who's Not Talking. 567-573
- Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland:
Discriminative Neural Clustering for Speaker Diarisation. 574-581
- Desh Raj, Zili Huang, Sanjeev Khudanpur:
Multi-Class Spectral Clustering with Overlaps for Speaker Diarization. 582-589
- Suchitra Krishnamachari, Manoj Kumar, So Hyun Kim, Catherine Lord, Shrikanth Narayanan:
Developing Neural Representations for Robust Child-Adult Diarization. 590-597
- You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee:
End-To-End Lip Synchronisation Based on Pattern Classification. 598-605
- Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao:
End-To-End Silent Speech Recognition with Acoustic Sensing. 606-612
- Timothy Israel Santos, Andrew Abel, Nick Wilson, Yan Xu:
Speaker-Independent Visual Speech Recognition with the Inception V3 Model. 613-620
- Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li:
Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations. 621-628
- Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi:
Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue. 629-635
- Srinivas Parthasarathy, Shiva Sundaram:
Detecting Expressions with Multimodal Transformers. 636-643
- Muralikrishna H, Shikha Gupta, Dileep Aroor Dinesh, Padmanabhan Rajan:
Noise-Robust Spoken Language Identification Using Language Relevance Factor Based Embedding. 644-651
- Jörgen Valk, Tanel Alumäe:
VOXLINGUA107: A Dataset for Spoken Language Recognition. 652-658
- Xiaosu Tong, Che-Wei Huang, Sri Harish Mallidi, Shaun Joseph, Sonal Pareek, Chander Chandak, Ariya Rastrow, Roland Maas:
Streaming ResLSTM with Causal Mean Aggregation for Device-Directed Utterance Detection. 659-664
- Fang Kang, Feiran Yang, Jun Yang:
Real-Time Independent Vector Analysis with a Deep-Learning-Based Source Model. 665-669
- Amit Meghanani, Chandran Savithri Anoop, A. G. Ramakrishnan:
An Exploration of Log-Mel Spectrogram and MFCC Features for Alzheimer's Dementia Recognition from Spontaneous Speech. 670-677
- Su Ji Park, Alan Rozet:
Film Quality Prediction Using Acoustic, Prosodic and Lexical Cues. 678-684
- Yulan Feng, Alan W. Black, Maxine Eskénazi:
Towards Automatic Route Description Unification in Spoken Dialog Systems. 685-692
- Subash Khanal, Michael T. Johnson, Narjes Bozorg:
Articulatory Comparison of L1 and L2 Speech for Mispronunciation Diagnosis. 693-697
- Yang Shen, Ayano Yasukagawa, Daisuke Saito, Nobuaki Minematsu, Kazuya Saito:
Optimized Prediction of Fluency of L2 English Based on Interpretable Network Using Quantity of Phonation and Quality of Pronunciation. 698-704
- Xinhao Wang, Keelan Evanini, Yao Qian, Matthew Mulholland:
Automated Scoring of Spontaneous Speech from Young Learners of English Using Transformers. 705-712
- Binghuai Lin, Liyuan Wang, Hongwei Ding, Xiaoli Feng:
Improving L2 English Rhythm Evaluation with Automatic Sentence Stress Detection. 713-719
- Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna:
Enhancing the Intelligibility of Cleft Lip and Palate Speech Using Cycle-Consistent Adversarial Networks. 720-727
- Ram C. M. C. Shekar, Chelzy Belitz, John H. L. Hansen:
Development of CNN-Based Cochlear Implant and Normal Hearing Sound Recognition Models Using Natural and Auralized Environmental Audio. 728-733
- Haoyu Li, Yang Ai, Junichi Yamagishi:
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model. 734-741
- Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han:
Can We Trust Deep Speech Prior? 742-749
- Yanpei Shi, Thomas Hain:
Contextual Joint Factor Acoustic Embeddings. 750-757
- Yanpei Shi, Thomas Hain:
Supervised Speaker Embedding De-Mixing in Two-Speaker Environment. 758-765
- Jianming Liu, Meng Yu, Yong Xu, Chao Weng, Shi-Xiong Zhang, Lianwu Chen, Dong Yu:
Neural Mask based Multi-channel Convolutional Beamforming for Joint Dereverberation, Echo Cancellation and Denoising. 766-770
- Aditya Jayasimha, Periyasamy Paramasivam:
Personalizing Speech Start Point and End Point Detection in ASR Systems from Speaker Embeddings. 771-777
- Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki:
Multimodal Attention Fusion for Target Speaker Extraction. 778-784
- Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. 785-792
- Catalin Zorila, Mohan Li, Rama Doddipatla:
An Investigation into the Multi-channel Time Domain Speaker Extraction Network. 793-800
- Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu:
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks. 801-808
- Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. 809-816
- Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shi-Xiong Zhang, Dong Yu, Michael I. Mandel:
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation. 817-824
- Yi Luo, Cong Han, Nima Mesgarani:
Distortion-Controlled Training for end-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss. 825-832
- Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription. 833-840
- Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola García, Kenji Nagamatsu:
Online End-To-End Neural Diarization with Speaker-Tracing Buffer. 841-848
- Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu:
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection. 849-856
- Yihui Fu, Jian Wu, Yanxin Hu, Mengtao Xing, Lei Xie:
DESNet: A Multi-Channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation. 857-864
- Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Böddeker, Yanmin Qian, Shinji Watanabe, Zhuo Chen:
Dual-Path RNN for Long Recording Speech Separation. 865-872
- Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu:
RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions. 873-880
- Desh Raj, Leibny Paola García-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur:
DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs. 881-888
- Katerina Zmolíková, Marc Delcroix, Lukás Burget, Tomohiro Nakatani, Jan Honza Cernocký:
Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation. 889-896
- Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. 897-904
- Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin W. Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey:
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement. 905-911
- Li Chai, Jun Du, Diyuan Liu, Yanhui Tu, Chin-Hui Lee:
Acoustic Modeling for Multi-Array Conversational Speech Recognition in the Chime-6 Challenge. 912-918
- Christiaan Jacobs, Yevgen Matusevych, Herman Kamper:
Acoustic Word Embeddings for Zero-Resource Languages Using Self-Supervised Contrastive Learning and Multilingual Adaptation. 919-926
- Lisa van Staden, Herman Kamper:
A Comparison of Self-Supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings. 927-934
- Yushi Hu, Shane Settle, Karen Livescu:
Acoustic Span Embeddings for Multilingual Query-by-Example Search. 935-942
- Merve Ünlü, Ebru Arisoy:
Uncertainty-Aware Representations for Spoken Question Answering. 943-949
- Parnia Bahar, Tobias Bieschke, Ralf Schlüter, Hermann Ney:
Tight Integrated End-to-End Training for Cascaded Speech Translation. 950-957
- Takatomo Kano, Sakriani Sakti, Satoshi Nakamura:
Transformer-Based Direct Speech-To-Speech Translation with Transcoder. 958-965
- Manoj Kumar, Varun Kumar, Hadrien Glaude, Cyprien de Lichy, Aman Alok, Rahul Gupta:
Protoda: Efficient Transfer Learning for Few-Shot Intent Classification. 966-972
- Grégory Senay, Badr Youbi Idrissi, Marine Haziza:
VirAAL: Virtual Adversarial Active Learning for NLU. 973-980
- Mahdi Namazifar, Gökhan Tür, Dilek Hakkani-Tür:
Warped Language Models for Noise Robust Language Understanding. 981-988
- Prashanth Gurunath Shivakumar, Naveen Kumar, Panayiotis G. Georgiou, Shrikanth Narayanan:
RNN Based Incremental Online Spoken Language Understanding. 989-996
- Pu Wang, Hugo Van hamme:
A Light Transformer For Speech-To-Intent Applications. 997-1003
- Shang-Wen Li, Jason Krone, Shuyan Dong, Yi Zhang, Yaser Al-Onaizan:
Meta Learning to Classify Intent and Slot Labels with Noisy Few Shot Examples. 1004-1011
- Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi:
Large-Context Conversational Representation Learning: Self-Supervised Learning For Conversational Documents. 1012-1019
- Zhengyu Zhou, In Gyu Choi, Yongliang He, Vikas Yadav, Chin-Hui Lee:
Using Paralinguistic Information to Disambiguate User Intentions for Distinguishing Phrase Structure and Sarcasm in Spoken Dialog Systems. 1020-1027
- Ting-Yun Chang, Yang Liu, Karthik Gopalakrishnan, Behnam Hedayatnia, Pei Zhou, Dilek Hakkani-Tür:
Go Beyond Plain Fine-Tuning: Improving Pretrained Models for Social Commonsense. 1028-1035
- Zexin Lu, Jing Li, Yingyi Zhang, Haisong Zhang:
Getting Your Conversation on Track: Estimation of Residual Life for Conversations. 1036-1043
- Hiroaki Takatsu, Mayu Okuda, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi:
Personalized Extractive Summarization for a News Dialogue System. 1044-1051
- Tomek Rutowski, Elizabeth Shriberg, Amir Harati, Yang Lu, Ricardo Oliveira, Piotr Chlebek:
Cross-Demographic Portability of Deep NLP-Based Depression Models. 1052-1057
- Huan-Yu Chen, Yun-Shao Lin, Chi-Chun Lee:
Through the Words of Viewers: Using Comment-Content Entangled Network for Humor Impression Recognition. 1058-1064
- Parnia Bahar, Christopher Brix, Hermann Ney:
Two-Way Neural Machine Translation: A Proof of Concept for Bidirectional Translation Modeling Using a Two-Dimensional Grid. 1065-1070
- Maya Epps, Juan Uribe, Mandy Korpusik:
A New Dataset for Natural Language Understanding of Exercise Logs in a Food and Fitness Spoken Dialogue System. 1071-1078
- Chiara Semenzin, Lisa Hamrick, Amanda Seidl, Bridgette Kelleher, Alejandrina Cristià:
Towards Large-Scale Data Annotation of Audio from Wearables: Validating Zooniverse Annotations of Infant Vocalization Types. 1079-1085
- Marco Marini, Mauro Viganò, Massimo Corbo, Marina Zettin, Gloria Simoncini, Bruno Fattori, Clelia D'Anna, Massimiliano Donati, Luca Fanucci:
IDEA: An Italian Dysarthric Speech Database. 1086-1093
- Delowar Hossain, Yoshinao Sato:
Efficient corpus design for wake-word detection. 1094-1100
- Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlícek, Jean-Marc Odobez:
IEEE SLT 2021 Alpha-Mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines. 1101-1108
- Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals:
Tal: A Synchronised Multi-Speaker Corpus of Ultrasound Tongue Imaging, Audio, and Lip Videos. 1109-1116
- Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao:
The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines. 1117-1123
