


Shinji Watanabe 0001
Person information
- affiliation: Carnegie Mellon University, Pittsburgh, PA, USA
- affiliation (former): Johns Hopkins University, Baltimore, MD, USA
- affiliation (2012 - 2017): Mitsubishi Electric Research Laboratories, Cambridge, MA, USA
- affiliation (2001 - 2011): NTT Communication Science Laboratories, Kyoto, Japan
- affiliation (PhD 2006): Waseda University, Tokyo, Japan
Other persons with the same name
- Shinji Watanabe 0002 — Kanagawa University, Department of Electrical Engineering, Yokohama, Japan
- Shinji Watanabe 0003 — Osaka Prefecture University, School of Knowledge and Information Systems, Sakai, Japan
- Shinji Watanabe 0004 — Renesas Electronics Corporation, Kawasaki, Japan
- Shinji Watanabe 0005 — Nintendo Co.,Ltd, Kyoto, Japan
- Shinji Watanabe 0006 — Gifu National College of Technology, Motosu-gun, Gifu-ken, Japan
- Shinji Watanabe 0007 — University of Miyazaki, Miyazaki, Japan
2020 – today
- 2025
- [c456]Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, Shinji Watanabe:
Enhancing Audiovisual Speech Recognition Through Bifocal Preference Optimization. AAAI 2025: 25516-25524
- [c455]Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Shinji Watanabe:
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics. ICLR 2025
- [c454]Masao Someki, Yifan Peng, Siddhant Arora, Markus Müller, Athanasios Mouchtaris, Grant P. Strimel, Jing Liu, Shinji Watanabe:
Context-aware Dynamic Pruning for Speech Foundation Models. ICLR 2025
- [c453]Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David R. Mortensen:
Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment. NAACL (Long Papers) 2025: 2613-2628
- [c452]Yifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg:
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. NAACL (Long Papers) 2025: 5787-5802
- [i344]Tomohiko Nakamura, Kwanghee Choi, Keigo Hojo, Yoshiaki Bando, Satoru Fukayama, Shinji Watanabe:
Discrete Speech Unit Extraction via Independent Component Analysis. CoRR abs/2501.06562 (2025)
- [i343]Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David R. Mortensen:
Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment. CoRR abs/2502.07029 (2025)
- [i342]William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, Shinji Watanabe:
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models. CoRR abs/2502.10373 (2025)
- [i341]Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han Huck Yang, Graham Neubig, Shinji Watanabe:
ESPnet-SpeechLM: An Open Speech Language Model Toolkit. CoRR abs/2502.15218 (2025)
- [i340]Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Shinji Watanabe:
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics. CoRR abs/2503.01174 (2025)
- [i339]Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems. CoRR abs/2503.08533 (2025)
- [i338]Yichen Huang, Zachary Novack, Koichi Saito, Jiatong Shi, Shinji Watanabe, Yuki Mitsufuji, John Thickstun, Chris Donahue:
Aligning Text-to-Music Evaluation with Human Preferences. CoRR abs/2503.16669 (2025)
- [i337]Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe:
On The Landscape of Spoken Language Models: A Comprehensive Survey. CoRR abs/2504.08528 (2025)
- 2024
- [j62]Xuankai Chang, Pengcheng Guo, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
MC-Whisper: Extending Speech Foundation Models to Multichannel Distant Speech Recognition. IEEE Signal Process. Lett. 31: 2850-2854 (2024)
- [j61]Xuankai Chang, Shinji Watanabe, Marc Delcroix, Tsubasa Ochiai, Wangyou Zhang, Yanmin Qian:
Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing]. IEEE Signal Process. Mag. 41(6): 39-50 (2024)
- [j60]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. IEEE ACM Trans. Audio Speech Lang. Process. 32: 325-351 (2024)
- [j59]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1829-1844 (2024)
- [j58]Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan:
Music ControlNet: Multiple Time-Varying Controls for Music Generation. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2692-2703 (2024)
- [j57]Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. IEEE ACM Trans. Audio Speech Lang. Process. 32: 2884-2899 (2024)
- [c451]Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Yuexian Zou, Zhou Zhao, Shinji Watanabe:
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. AAAI 2024: 23802-23804
- [c450]Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori S. Levin:
Wav2Gloss: Generating Interlinear Glossed Text from Speech. ACL (1) 2024: 568-582
- [c449]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. ACL (1) 2024: 10192-10209
- [c448]Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On the Evaluation of Speech Foundation Models for Spoken Language Understanding. ACL (Findings) 2024: 11923-11938
- [c447]Yichen Lu, Jiaqi Song, Chao-Han Huck Yang, Shinji Watanabe:
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model. EMNLP (Industry Track) 2024: 440-451
- [c446]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. EMNLP 2024: 10205-10224
- [c445]Hang Chen, Shilong Wu, Chenxi Wang, Jun Du, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Jingdong Chen, Odette Scharenborg, Zhong-Qiu Wang, Bao-Cai Yin, Jia Pan:
Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge. ICASSP Workshops 2024: 123-124
- [c444]Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-Weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe:
Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation. ICASSP 2024: 316-320
- [c443]Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhongqiu Wang, Shinji Watanabe:
Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor. ICASSP 2024: 446-450
- [c442]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. ICASSP Workshops 2024: 570-574
- [c441]Kwanghee Choi, Jee-Weon Jung, Shinji Watanabe:
Understanding Probe Behaviors Through Variational Bounds of Mutual Information. ICASSP 2024: 5655-5659
- [c440]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens. ICASSP 2024: 7970-7974
- [c439]Salvador Medina, Sarah L. Taylor, Carsten Stoll, Gareth Edwards, Alex Hauptmann, Shinji Watanabe, Iain A. Matthews:
PhISANet: Phonetically Informed Speech Animation Network. ICASSP 2024: 8225-8229
- [c438]Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro:
Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper. ICASSP 2024: 10471-10475
- [c437]Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe:
Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR. ICASSP 2024: 10641-10645
- [c436]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search. ICASSP 2024: 10896-10900
- [c435]Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu:
Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models. ICASSP 2024: 11156-11160
- [c434]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. ICASSP 2024: 11481-11485
- [c433]Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury:
Semi-Autoregressive Streaming ASR with Label Context. ICASSP 2024: 11681-11685
- [c432]Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model. ICASSP 2024: 11741-11745
- [c431]Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. ICASSP 2024: 11831-11835
- [c430]Samuele Cornell, Jee-Weon Jung, Shinji Watanabe, Stefano Squartini:
One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition. ICASSP 2024: 11856-11860
- [c429]Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe:
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. ICASSP 2024: 11941-11945
- [c428]Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur:
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. ICASSP 2024: 11971-11975
- [c427]Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur:
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora. ICASSP 2024: 12006-12010
- [c426]Jee-Weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe:
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models. ICASSP 2024: 12071-12075
- [c425]Chien-Yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee:
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech. ICASSP 2024: 12136-12140
- [c424]Doyeop Kwak, Jaemin Jung, Kihyun Nam, Youngjoon Jang, Jee-Weon Jung, Shinji Watanabe, Joon Son Chung:
VoxMM: Rich Transcription of Conversations in the Wild. ICASSP 2024: 12551-12555
- [c423]William Chen, Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing. ICASSP 2024: 13066-13070
- [c422]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-Weon Jung, Xuankai Chang, Shinji Watanabe:
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks. ICASSP 2024: 13326-13330
- [c421]Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe:
Cross-Talk Reduction. IJCAI 2024: 5171-5180
- [c420]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. INTERSPEECH 2024
- [c419]Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Barry-John Theobald:
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? INTERSPEECH 2024
- [c418]Yoshiaki Bando, Tomohiko Nakamura, Shinji Watanabe:
Neural Blind Source Separation and Diarization for Distant Speech Recognition. INTERSPEECH 2024
- [c417]Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin:
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. INTERSPEECH 2024
- [c416]Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe:
Self-Supervised Speech Representations are More Phonetic than Semantic. INTERSPEECH 2024
- [c415]Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model. INTERSPEECH 2024
- [c414]Jee-weon Jung, Xin Wang, Nicholas W. D. Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, Joon Son Chung:
To what extent can ASV systems naturally defend against spoofing attacks? INTERSPEECH 2024
- [c413]Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Alex Gichamba, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe:
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models. INTERSPEECH 2024
- [c412]Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting. INTERSPEECH 2024
- [c411]Hyung Yong Kim, Byeong-Yeol Kim, Yunkyu Lim, Jihwan Park, Shukjae Choi, Yooncheol Ju, Jinseok Park, Youshin Lim, Seung Woo Yu, Hanbin Lee, Shinji Watanabe:
Self-training ASR Guided by Unsupervised ASR Teacher. INTERSPEECH 2024
- [c410]Kwangyoun Kim, Suwon Shon, Yi-Te Hsu, Prashant Sridhar, Karen Livescu, Shinji Watanabe:
Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition. INTERSPEECH 2024
- [c409]Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe:
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. INTERSPEECH 2024
- [c408]Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe:
MULTI-CONVFORMER: Extending Conformer with Multiple Convolution Kernels. INTERSPEECH 2024
- [c407]Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann:
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation. INTERSPEECH 2024
- [c406]Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari:
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. INTERSPEECH 2024
- [c405]Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe:
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing. INTERSPEECH 2024
- [c404]Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, Shinji Watanabe:
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model. INTERSPEECH 2024
- [c403]Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets. INTERSPEECH 2024
- [c402]Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu:
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding. INTERSPEECH 2024
- [c401]Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe:
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios. INTERSPEECH 2024
- [c400]Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe:
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. INTERSPEECH 2024
- [c399]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Decoder-only Architecture for Streaming End-to-end Speech Recognition. INTERSPEECH 2024
- [c398]Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian:
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. INTERSPEECH 2024
- [c397]Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian:
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. INTERSPEECH 2024
- [c396]Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. ACM Multimedia 2024: 11279-11281
- [c395]Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions. NAACL-HLT 2024: 2754-2774
- [c394]Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. SLT 2024: 43-48
- [c393]Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition With Dynamic Vocabulary. SLT 2024: 78-85
- [c392]Shih-Heng Wang, Jiatong Shi, Chien-Yu Huang, Shinji Watanabe, Hung-Yi Lee:
Fusion Of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition. SLT 2024: 247-254
- [c391]Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian:
Diffusion-Based Generative Modeling With Discriminative Guidance for Streamable Speech Enhancement. SLT 2024: 333-340
- [c390]Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Marco Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, Andreas Stolcke:
Large Language Model Based Generative Error Correction: A Challenge and Baselines For Speech Recognition, Speaker Tagging, and Emotion Recognition. SLT 2024: 371-378
- [c389]Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-Weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe:
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs For Audio, Music, and Speech. SLT 2024: 562-569
- [c388]Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kaiwei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James R. Glass, Shinji Watanabe, Hung-Yi Lee:
Codec-Superb @ SLT 2024: A Lightweight Benchmark For Neural Audio Codec Models. SLT 2024: 570-577
- [c387]Yifeng Yu, Jiatong Shi, Yuning Wu, Yuxun Tang, Shinji Watanabe:
Visinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation. SLT 2024: 719-726
- [c386]Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-EZ: Python-Only ESPnet For Easy Fine-Tuning And Integration. SLT 2024: 863-870
- [c385]William Chen, Brian Yan, Chih-Chen Chen, Shinji Watanabe:
Floras 50: A Massively Multilingual Multitask Benchmark for Long-Form Conversational Speech. SLT 2024: 891-898
- [i336]Jee-weon Jung, Roshan S. Sharma, William Chen, Bhiksha Raj, Shinji Watanabe:
AugSumm: towards generalizable speech summarization using synthetic labels from large language model. CoRR abs/2401.06806 (2024)
- [i335]Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Improving ASR Contextual Biasing with Guided Attention. CoRR abs/2401.08835 (2024)
- [i334]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search. CoRR abs/2401.10449 (2024)
- [i333]Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe:
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor. CoRR abs/2401.12473 (2024)
- [i332]Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian:
Improving Design of Input Condition Invariant Speech Enhancement. CoRR abs/2401.14271 (2024)
- [i331]Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe:
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer. CoRR abs/2401.16658 (2024)
- [i330]Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari:
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. CoRR abs/2401.16812 (2024)
- [i329]Jee-weon Jung, Wangyou Zhang, Jiatong Shi, Zakaria Aldeneh, Takuya Higuchi, Barry-John Theobald, Ahmed Hussen Abdelaziz, Shinji Watanabe:
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models. CoRR abs/2401.17230 (2024)
- [i328]Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe:
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2. CoRR abs/2401.17619 (2024)
- [i327]Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song:
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition. CoRR abs/2401.18045 (2024)
- [i326]Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Skyler Seto, Tatiana Likhomanenko, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Barry-John Theobald:
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? CoRR abs/2402.00340 (2024)
- [i325]Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj:
Evaluating and Improving Continual Learning in Spoken Language Understanding. CoRR abs/2402.10427 (2024)
- [i324]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. CoRR abs/2402.12654 (2024)
- [i323]Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro:
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages. CoRR abs/2402.16021 (2024)
- [i322]Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori S. Levin:
Wav2Gloss: Generating Interlinear Glossed Text from Speech. CoRR abs/2403.13169 (2024)
- [i321]Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee:
A Large-Scale Evaluation of Speech Foundation Models. CoRR abs/2404.09385 (2024)
- [i320]Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Contextualized Automatic Speech Recognition with Dynamic Vocabulary. CoRR abs/2405.13344 (2024)
- [i319]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation. CoRR abs/2405.13514 (2024)
- [i318]Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe:
Cross-Talk Reduction. CoRR abs/2405.20402 (2024)
- [i317]Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
YODAS: Youtube-Oriented Dataset for Audio and Speech. CoRR abs/2406.00899 (2024)
- [i316]Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur:
Less Peaky and More Accurate CTC Forced Alignment by Label Priors. CoRR abs/2406.02560 (2024)
- [i315]Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe:
4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders. CoRR abs/2406.02950 (2024)
- [i314]Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian:
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. CoRR abs/2406.04269 (2024)
- [i313]Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian:
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. CoRR abs/2406.04660 (2024)
- [i312]Jee-weon Jung, Xin Wang, Nicholas W. D. Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, Joon Son Chung:
To what extent can ASV systems naturally defend against spoofing attacks? CoRR abs/2406.05339 (2024)
- [i311]Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann:
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation. CoRR abs/2406.06185 (2024)
- [i310]Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin:
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. CoRR abs/2406.07725 (2024)
- [i309]Yoshiaki Bando, Tomohiko Nakamura, Shinji Watanabe:
Neural Blind Source Separation and Diarization for Distant Speech Recognition. CoRR abs/2406.08396 (2024)
- [i308]Kwanghee Choi, Ankita Pasad, Tomohiko Nakamura, Satoru Fukayama, Karen Livescu, Shinji Watanabe:
Self-Supervised Speech Representations are More Phonetic than Semantic. CoRR abs/2406.08619 (2024)
- [i307]Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets. CoRR abs/2406.08641 (2024)
- [i306]Yifeng Yu, Jiatong Shi, Yuning Wu, Shinji Watanabe:
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation. CoRR abs/2406.08761 (2024)
- [i305]Jinchuan Tian, Yifan Peng, William Chen, Kwanghee Choi, Karen Livescu, Shinji Watanabe:
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models. CoRR abs/2406.09282 (2024)
- [i304]Suwon Shon, Kwangyoun Kim, Yi-Te Hsu, Prashant Sridhar, Shinji Watanabe, Karen Livescu:
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding. CoRR abs/2406.09345 (2024)
- [i303]Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Y. Sun, Shinji Watanabe:
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model. CoRR abs/2406.09869 (2024)
- [i302]Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan S. Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
On the Evaluation of Speech Foundation Models for Spoken Language Understanding. CoRR abs/2406.10083 (2024)
- [i301]Hayato Futami, Siddhant Arora, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model. CoRR abs/2406.12317 (2024)
- [i300]Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting. CoRR abs/2406.12611 (2024)
- [i299]Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian:
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement. CoRR abs/2406.13471 (2024)
- [i298]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Decoder-only Architecture for Streaming End-to-end Speech Recognition. CoRR abs/2406.16107 (2024)
- [i297]Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe:
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss. CoRR abs/2406.16120 (2024)
- [i296]Hye-jin Shim, Md. Sahidullah, Jee-weon Jung, Shinji Watanabe, Tomi Kinnunen:
Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing. CoRR abs/2406.17246 (2024)
- [i295]William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe:
Towards Robust Speech Representation Learning for Thousands of Languages. CoRR abs/2407.00837 (2024)
- [i294]Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe:
Multi-Convformer: Extending Conformer with Multiple Convolution Kernels. CoRR abs/2407.03718 (2024)
- [i293]Samuele Cornell, Taejin Park, Steve Huang, Christoph Böddeker, Xuankai Chang, Matthew Maciejewski, Matthew Wiesner, Paola García, Shinji Watanabe:
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization. CoRR abs/2407.16447 (2024)
- [i292]Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe:
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data. CoRR abs/2408.00624 (2024)
- [i291]Xi Xu, Siqi Ouyang, Brian Yan, Patrick Fernandes, William Chen, Lei Li, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2024 Simultaneous Speech Translation System. CoRR abs/2408.07452 (2024) - [i290]Samuele Cornell
, Jordan Darefsky, Zhiyao Duan, Shinji Watanabe:
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition. CoRR abs/2408.09215 (2024) - [i289]Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. CoRR abs/2409.07226 (2024) - [i288]Jee-weon Jung, Wangyou Zhang, Soumi Maiti, Yihan Wu, Xin Wang, Ji-Hoon Kim, Yuta Matsunaga, Seyun Um, Jinchuan Tian, Hye-jin Shim, Nicholas W. D. Evans, Joon Son Chung, Shinnosuke Takamichi, Shinji Watanabe:
Text-To-Speech Synthesis In The Wild. CoRR abs/2409.08711 (2024) - [i287]Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe:
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration. CoRR abs/2409.09506 (2024) - [i286]Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, Piotr Zelasko, Chao Zhang, Yun-Nung Chen, Yu Tsao, Jagadeesh Balam, Boris Ginsburg, Sabato Marco Siniscalchi, Eng Siong Chng, Peter Bell, Catherine Lai, Shinji Watanabe, Andreas Stolcke:
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition. CoRR abs/2409.09785 (2024) - [i285]Li-Wei Chen, Takuya Higuchi, He Bai, Ahmed Hussen Abdelaziz, Alexander Rudnicky, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald, Zakaria Aldeneh:
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models. CoRR abs/2409.10788 (2024) - [i284]Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald:
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels. CoRR abs/2409.10791 (2024) - [i283]Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe:
Task Arithmetic for Language Expansion in Speech Translation. CoRR abs/2409.11274 (2024) - [i282]Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe:
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. CoRR abs/2409.12370 (2024) - [i281]Jinchuan Tian, Chunlei Zhang, Jiatong Shi, Hao Zhang, Jianwei Yu, Shinji Watanabe, Dong Yu:
Preference Alignment Improves Language Model-Based TTS. CoRR abs/2409.12403 (2024) - [i280]Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James R. Glass, Shinji Watanabe, Hung-yi Lee:
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models. CoRR abs/2409.14085 (2024) - [i279]Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe:
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens. CoRR abs/2409.15732 (2024) - [i278]Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe:
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech. CoRR abs/2409.15897 (2024) - [i277]Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas W. D. Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe:
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild. CoRR abs/2409.17285 (2024) - [i276]Brian Yan, Vineel Pratap, Shinji Watanabe, Michael Auli:
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking. CoRR abs/2409.18428 (2024) - [i275]Yichen Lu, Jiaqi Song, Chao-Han Huck Yang, Shinji Watanabe:
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model. CoRR abs/2410.03007 (2024) - [i274]Yifan Peng, Krishna C. Puvvada, Zhehuai Chen, Piotr Zelasko, He Huang, Kunal Dhawan, Ke Hu, Shinji Watanabe, Jagadeesh Balam, Boris Ginsburg:
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning. CoRR abs/2410.17485 (2024) - [i273]Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondrej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubinski, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John P. McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, John E. Ortega, Sara Papi, Peter Polák, Adam Pospísil, Pavel Pecina, Elizabeth Salesky, Nivedita Sethiya, Balaram Sarkar, Jiatong Shi, Claytone Sikasote, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Brian Thompson, Marco Turchi, Alex Waibel, Shinji Watanabe, Patrick Wilken, Petr Zemánek, Rodolfo Zevallos:
Findings of the IWSLT 2024 Evaluation Campaign. CoRR abs/2411.05088 (2024) - [i272]Chien-yu Huang, Wei-Chih Chen, Shu-Wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Cheng-Hsiu Hsieh, Yi-Cheng Lin, Chee-En Yu, I-Hsiang Chiu, Heitor R. Guimarães, Jionghao Han, Tzu-Quan Lin, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Chun Wei Chen, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Chi An Fu, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsi-Che Lin, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Ankita Pasad, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chao-Han Huck Yang, Chieh-Chi Yang, Jia Qi Yip, Shao-Xiang Yuan, Vahid Noroozi, Zhehuai Chen, Haibin Wu, Karen Livescu, David Harwath, Shinji Watanabe, Hung-yi Lee:
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks. CoRR abs/2411.05361 (2024) - [i271]Shih-Heng Wang, Jiatong Shi, Chien-yu Huang, Shinji Watanabe, Hung-yi Lee:
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition. CoRR abs/2411.18107 (2024) - [i270]Pengcheng Guo, Xuankai Chang, Hang Lv, Shinji Watanabe, Lei Xie:
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR. CoRR abs/2412.05589 (2024) - [i269]Peter Wu, Bohan Yu, Kevin Scheck, Alan W. Black, Aditi S. Krishnapriyan, Irene Y. Chen, Tanja Schultz, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Multimodal Articulatory Representations. CoRR abs/2412.13387 (2024) - [i268]Jiatong Shi, Hye-jin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, Shinji Watanabe:
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music. CoRR abs/2412.17667 (2024) - [i267]Yihan Wu, Yichen Lu, Yifan Peng, Xihua Wang, Ruihua Song, Shinji Watanabe:
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization. CoRR abs/2412.19005 (2024)
- 2023
- [j56]Matthew Maciejewski, Jing Shi, Shinji Watanabe, Sanjeev Khudanpur:
A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data. Comput. Speech Lang. 77: 101410 (2023) - [j55]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing. J. Open Source Softw. 8(91): 5403 (2023) - [j54]Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. IEEE ACM Trans. Audio Speech Lang. Process. 31: 397-410 (2023) - [j53]Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. IEEE ACM Trans. Audio Speech Lang. Process. 31: 706-720 (2023) - [j52]Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao:
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information. IEEE ACM Trans. Audio Speech Lang. Process. 31: 2738-2750 (2023) - [j51]Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed:
LegoNN: Building Modular Encoder-Decoder Models. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3112-3126 (2023) - [j50]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. 31: 3221-3236 (2023) - [c384]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. AAAI 2023: 12644-12652 - [c383]Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polak, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe:
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. ACL (demo) 2023: 400-411 - [c382]Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan S. Sharma, Wei-Lun Wu, Hung-yi Lee, Karen Livescu, Shinji Watanabe:
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks. ACL (1) 2023: 8906-8937 - [c381]Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino:
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. ACL (1) 2023: 15655-15680 - [c380]Tuan Vu Ho, Shota Horiguchi, Shinji Watanabe, Paola García, Takashi Sumiyoshi:
Synthetic Data Augmentation for ASR with Domain Filtering. APSIPA ASC 2023: 1760-1765 - [c379]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning. ASRU 2023: 1-8 - [c378]Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku:
LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models. ASRU 2023: 1-6 - [c377]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao:
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch. ASRU 2023: 1-9 - [c376]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Kohei Matsuura, Takanori Ashihara, William Chen, Shinji Watanabe:
Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation. ASRU 2023: 1-8 - [c375]Xinjian Li, Shinnosuke Takamichi, Takaaki Saeki, William Chen, Sayaka Shiota, Shinji Watanabe:
Yodas: Youtube-Oriented Dataset for Audio and Speech. ASRU 2023: 1-8 - [c374]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-Weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data. ASRU 2023: 1-8 - [c373]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction. ASRU 2023: 1-6 - [c372]Roshan S. Sharma, William Chen, Takatomo Kano, Ruchira Sharma, Siddhant Arora, Shinji Watanabe, Atsunori Ogawa, Marc Delcroix, Rita Singh, Bhiksha Raj:
Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems. ASRU 2023: 1-8 - [c371]Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe:
Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond. ASRU 2023: 1-8 - [c370]Yusuke Shinohara, Shinji Watanabe:
Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition. ASRU 2023: 1-7 - [c369]Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. ASRU 2023: 1-8 - [c368]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement For Diverse Input Conditions. ASRU 2023: 1-6 - [c367]Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W. Black, Shinji Watanabe:
CTC Alignments Improve Autoregressive Translation. EACL 2023: 1615-1631 - [c366]Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. ICASSP 2023: 1-5 - [c365]Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge. ICASSP 2023: 1-2 - [c364]Dan Berrebbi, Brian Yan, Shinji Watanabe:
Avoid Overthinking in Self-Supervised Models for Speech Recognition. ICASSP 2023: 1-5 - [c363]Hang Chen, Shilong Wu, Yusheng Dai, Zhe Wang, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu:
Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge. ICASSP 2023: 1-2 - [c362]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units. ICASSP 2023: 1-5 - [c361]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR with Auxiliary CTC Objectives. ICASSP 2023: 1-5 - [c360]Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono, Stefano Squartini:
Multi-Channel Speaker Extraction with Adversarial Training: The Wavlab Submission to The Clarity ICASSP 2023 Grand Challenge. ICASSP 2023: 1-2 - [c359]Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based data Augmentation Toward Stop Low-Resource Challenge. ICASSP 2023: 1-2 - [c358]Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe:
Streaming Joint Speech Recognition and Disfluency Detection. ICASSP 2023: 1-5 - [c357]Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-Yi Lee, Shinji Watanabe, Sanjeev Khudanpur:
Euro: Espnet Unsupervised ASR Open-Source Toolkit. ICASSP 2023: 1-5 - [c356]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
Intermpl: Momentum Pseudo-Labeling With Intermediate CTC Loss. ICASSP 2023: 1-5 - [c355]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BECTRA: Transducer-Based End-To-End ASR with Bert-Enhanced Encoder. ICASSP 2023: 1-5 - [c354]Junwei Huang, Karthik Ganesan, Soumi Maiti, Young Min Kim, Xuankai Chang, Paul Liang, Shinji Watanabe:
FindAdaptNet: Find and Insert Adapters by Learned Layer Importance. ICASSP 2023: 1-5 - [c353]Jee-Weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung:
In Search of Strong Embedding Extractors for Speaker Diarisation. ICASSP 2023: 1-5 - [c352]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Roshan S. Sharma, Kohei Matsuura, Shinji Watanabe:
Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders. ICASSP 2023: 1-5 - [c351]Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
E-Branchformer-Based E2E SLU Toward Stop on-Device Challenge. ICASSP 2023: 1-2 - [c350]Jiachen Lian, Alan W. Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Articulatory Representation Learning via Joint Factor Analysis and Neural Matrix Factorization. ICASSP 2023: 1-5 - [c349]Takashi Maekaku, Yuya Fujita, Xuankai Chang, Shinji Watanabe:
Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model. ICASSP 2023: 1-5 - [c348]Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
Speechlmscore: Evaluating Speech Generation Using Speech Language Model. ICASSP 2023: 1-5 - [c347]Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe:
Align, Write, Re-Order: Explainable End-to-End Speech Translation via Operation Sequence Generation. ICASSP 2023: 1-5 - [c346]Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding. ICASSP 2023: 1-5 - [c345]Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition. ICASSP 2023: 1-5 - [c344]Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Paola García, Shinji Watanabe, Ann Lee, Hung-Yi Lee:
Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR. ICASSP 2023: 1-5 - [c343]Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe:
Enhancing Speech-To-Speech Translation with Multiple TTS Targets. ICASSP 2023: 1-5 - [c342]Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe:
Context-Aware Fine-Tuning of Self-Supervised Speech Models. ICASSP 2023: 1-5 - [c341]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GRIDNET: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. ICASSP 2023: 1-5 - [c340]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated full- and sub-band Modeling. ICASSP 2023: 1-5 - [c339]Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The Multimodal Information Based Speech Processing (Misp) 2022 Challenge: Audio-Visual Diarization And Recognition. ICASSP 2023: 1-5 - [c338]Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Speaker-Independent Acoustic-to-Articulatory Speech Inversion. ICASSP 2023: 1-5 - [c337]Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Jeong Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi:
Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages. ICASSP 2023: 1-5 - [c336]Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg:
Multi-Blank Transducers for Speech Recognition. ICASSP 2023: 1-5 - [c335]Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe:
Towards Zero-Shot Code-Switched Speech Recognition. ICASSP 2023: 1-5 - [c334]Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Paaploss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. ICASSP 2023: 1-5 - [c333]Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. ICASSP 2023: 1-5 - [c332]Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks. ICLR 2023 - [c331]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. ICML 2023: 38462-38484 - [c330]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. IJCAI 2023: 5179-5187 - [c329]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. INTERSPEECH 2023: 62-66 - [c328]Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. INTERSPEECH 2023: 396-400 - [c327]Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
Tensor decomposition for minimization of E2E SLU model toward on-device processing. INTERSPEECH 2023: 710-714 - [c326]Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding. INTERSPEECH 2023: 720-724 - [c325]Jiatong Shi, Dan Berrebbi, William Chen, En-Pei Hu, Wei-Ping Huang, Ho-Lam Chung, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. INTERSPEECH 2023: 884-888 - [c324]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition. INTERSPEECH 2023: 1369-1373 - [c323]Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. INTERSPEECH 2023: 1399-1403 - [c322]Roshan Sharma, Siddhant Arora, Kenneth Zheng, Shinji Watanabe, Rita Singh, Bhiksha Raj:
BASS: Block-wise Adaptation for Speech Summarization. INTERSPEECH 2023: 1454-1458 - [c321]Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. INTERSPEECH 2023: 1528-1532 - [c320]Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. INTERSPEECH 2023: 2208-2212 - [c319]Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe:
Exploration on HuBERT with Multiple Resolution. INTERSPEECH 2023: 3287-3291 - [c318]Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe:
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. INTERSPEECH 2023: 3312-3316 - [c317]Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondrej Bojar:
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. INTERSPEECH 2023: 3979-3983 - [c316]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. INTERSPEECH 2023: 4404-4408 - [c315]Yui Sudo, Muhammad Shakeel, Yifan Peng, Shinji Watanabe:
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training. INTERSPEECH 2023: 4479-4483 - [c314]Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. INTERSPEECH 2023: 4968-4972 - [c313]Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W. Black, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from MRI-Based Articulatory Representations. INTERSPEECH 2023: 5132-5136 - [c312]Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondrej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Anna Currey, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny Matusov, Paul McNamee, John P. McCrae, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Ha Nguyen, Jan Niehues, Xing Niu, Atul Kr. Ojha, John E. Ortega, Proyag Pal, Juan Pino, Lonneke van der Plas, Peter Polák, Elijah Rippeth, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Yun Tang, Brian Thompson, Kevin Tran, Marco Turchi, Alex Waibel, Mingxuan Wang, Shinji Watanabe, Rodolfo Zevallos:
Findings of the IWSLT 2023 Evaluation Campaign. IWSLT@ACL 2023: 1-61 - [c311]Brian Yan, Jiatong Shi, Soumi Maiti, William Chen, Xinjian Li, Yifan Peng, Siddhant Arora, Shinji Watanabe:
CMU's IWSLT 2023 Simultaneous Speech Translation System. IWSLT@ACL 2023: 235-240 - [c310]Zhong-Qiu Wang, Shinji Watanabe:
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. NeurIPS 2023 - [c309]Taiqi He, Lindia Tjuatja, Nathaniel R. Robinson, Shinji Watanabe, David R. Mortensen, Graham Neubig, Lori S. Levin:
SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing. SIGMORPHON 2023: 209-216 - [c308]Georgios Karakasidis, Nathaniel R. Robinson, Yaroslav Getman, Atieno Ogayo, Ragheb Al-Ghezi, Ananya Ayasi, Shinji Watanabe, David R. Mortensen, Mikko Kurimo:
Multilingual TTS Accent Impressions for Accented ASR. TSD 2023: 317-327 - [c307]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. WASPAA 2023: 1-5 - [d1]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310). Zenodo, 2023 - [i266]Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali:
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study. CoRR abs/2301.09099 (2023) - [i265]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. CoRR abs/2301.12596 (2023) - [i264]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. CoRR abs/2302.04215 (2023) - [i263]Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Speaker-Independent Acoustic-to-Articulatory Speech Inversion. CoRR abs/2302.06774 (2023) - [i262]Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono:
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge. CoRR abs/2302.07928 (2023) - [i261]Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08088 (2023) - [i260]Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08095 (2023) - [i259]William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR With Auxiliary CTC Objectives. CoRR abs/2302.12829 (2023) - [i258]Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding. CoRR abs/2302.14132 (2023) - [i257]Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. CoRR abs/2303.03329 (2023) - [i256]Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition. CoRR abs/2303.06326 (2023) - [i255]Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition. CoRR abs/2303.07624 (2023) - [i254]Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe:
Enhancing Speech-to-Speech Translation with Multiple TTS Targets. CoRR abs/2304.04618 (2023) - [i253]Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. CoRR abs/2304.06795 (2023) - [i252]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling. CoRR abs/2304.08707 (2023) - [i251]Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe:
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. CoRR abs/2304.12995 (2023) - [i250]Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. CoRR abs/2305.00926 (2023) - [i249]Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge. CoRR abs/2305.01194 (2023) - [i248]Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge. CoRR abs/2305.01620 (2023) - [i247]Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-Yi Lee:
Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation. CoRR abs/2305.07455 (2023) - [i246]Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei-Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. CoRR abs/2305.10615 (2023) - [i245]Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. CoRR abs/2305.11073 (2023) - [i244]Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. CoRR abs/2305.11095 (2023) - [i243]Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. CoRR abs/2305.13331 (2023) - [i242]Yifan Peng, Yui Sudo, Muhammad Shakeel, Shinji Watanabe:
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models. CoRR abs/2305.17651 (2023) - [i241]Xuankai Chang, Brian Yan, Yuya Fujita, Takashi Maekaku, Shinji Watanabe:
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning. CoRR abs/2305.18108 (2023) - [i240]Zhong-Qiu Wang, Shinji Watanabe:
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures. CoRR abs/2305.20054 (2023) - [i239]Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe:
Exploration on HuBERT with Multiple Resolutions. CoRR abs/2306.01084 (2023) - [i238]William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, Shinji Watanabe:
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. CoRR abs/2306.06672 (2023) - [i237]Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola García, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur:
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios. CoRR abs/2306.13734 (2023) - [i236]Roshan S. Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj:
BASS: Block-wise Adaptation for Speech Summarization. CoRR abs/2307.08217 (2023) - [i235]Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding. CoRR abs/2307.11005 (2023) - [i234]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. CoRR abs/2307.12231 (2023) - [i233]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition. CoRR abs/2307.12767 (2023) - [i232]Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction. CoRR abs/2308.10107 (2023) - [i231]Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe:
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks. CoRR abs/2309.07937 (2023) - [i230]Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao:
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction. CoRR abs/2309.08348 (2023) - [i229]Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro:
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens. CoRR abs/2309.08531 (2023) - [i228]Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro:
Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model. CoRR abs/2309.08535 (2023) - [i227]Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe:
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation. CoRR abs/2309.08876 (2023) - [i226]Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan S. Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee:
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech. CoRR abs/2309.09510 (2023) - [i225]Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee:
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models. CoRR abs/2309.10787 (2023) - [i224]Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury:
Semi-Autoregressive Streaming ASR With Label Context. CoRR abs/2309.10926 (2023) - [i223]Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondrej Bojar:
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. CoRR abs/2309.11379 (2023) - [i222]Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan S. Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe:
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data. CoRR abs/2309.13876 (2023) - [i221]Masao Someki, Nicholas Eng, Yosuke Higuchi, Shinji Watanabe:
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference. CoRR abs/2309.14922 (2023) - [i220]William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe:
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning. CoRR abs/2309.15317 (2023) - [i219]Amir Hussein, Dorsa Zeinali, Ondrej Klejch, Matthew Wiesner, Brian Yan, Shammur Absar Chowdhury, Ahmed M. Ali, Shinji Watanabe, Sanjeev Khudanpur:
Speech collage: code-switched audio generation by collaging monolingual corpora. CoRR abs/2309.15674 (2023) - [i218]Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur:
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization. CoRR abs/2309.15686 (2023) - [i217]Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-Weon Jung, Yichen Lu, Soumi Maiti, Roshan S. Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang:
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study. CoRR abs/2309.15800 (2023) - [i216]Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe:
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing. CoRR abs/2309.15826 (2023) - [i215]Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe:
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation. CoRR abs/2309.17352 (2023) - [i214]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement for Diverse Input Conditions. CoRR abs/2309.17384 (2023) - [i213]Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng:
UniAudio: An Audio Foundation Model Toward Universal Audio Generation. CoRR abs/2310.00704 (2023) - [i212]Samuele Cornell, Jee-weon Jung, Shinji Watanabe, Stefano Squartini:
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition. CoRR abs/2310.01688 (2023) - [i211]Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan S. Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network. CoRR abs/2310.02973 (2023) - [i210]Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe:
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios. CoRR abs/2310.03938 (2023) - [i209]Takashi Maekaku, Jiatong Shi, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model. CoRR abs/2310.03975 (2023) - [i208]Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chung, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond. CoRR abs/2310.05513 (2023) - [i207]Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa:
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction. CoRR abs/2310.08277 (2023) - [i206]Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis:
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch. CoRR abs/2310.17864 (2023) - [i205]Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan:
Music ControlNet: Multiple Time-varying Controls for Music Generation. CoRR abs/2311.07069 (2023) - [i204]Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe:
Phoneme-aware Encoding for Prefix-tree-based Contextual ASR. CoRR abs/2312.09582 (2023) - [i203]Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu:
Generative Context-aware Fine-tuning of Self-supervised Speech Models. CoRR abs/2312.09895 (2023) - [i202]Kwanghee Choi, Jee-weon Jung, Shinji Watanabe:
Understanding Probe Behaviors through Variational Bounds of Mutual Information. CoRR abs/2312.10019 (2023)
- 2022
- [j49]Amir Hussein, Shinji Watanabe, Ahmed Ali:
Arabic speech recognition by end-to-end, modular systems and human. Comput. Speech Lang. 71: 101272 (2022) - [j48]Zili Huang, Marc Delcroix, Leibny Paola García-Perera, Shinji Watanabe, Desh Raj, Sanjeev Khudanpur:
Joint speaker diarization and speech recognition based on region proposal networks. Comput. Speech Lang. 72: 101316 (2022) - [j47]Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A review of speaker diarization: Recent advances with deep learning. Comput. Speech Lang. 72: 101317 (2022) - [j46]Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer. Comput. Speech Lang. 73: 101327 (2022) - [j45]Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. Comput. Speech Lang. 75: 101360 (2022) - [j44]Jing Shi, Xuankai Chang, Shinji Watanabe, Bo Xu:
Train from scratch: Single-stage joint training of speech separation and recognition. Comput. Speech Lang. 76: 101387 (2022) - [j43]Hung-Yi Lee, Shinji Watanabe, Karen Livescu, Abdelrahman Mohamed, Tara N. Sainath:
Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1174-1178 (2022) - [j42]Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe:
Self-Supervised Speech Representation Learning: A Review. IEEE J. Sel. Top. Signal Process. 16(6): 1179-1210 (2022) - [j41]Zhong-Qiu Wang, Shinji Watanabe:
Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction. IEEE Signal Process. Lett. 29: 1422-1426 (2022) - [j40]Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola García:
Encoder-Decoder Based Attractors for End-to-End Neural Diarization. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1493-1507 (2022) - [j39]Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3173-3188 (2022) - [c306]Xinjian Li, Florian Metze, David R. Mortensen, Shinji Watanabe, Alan W. Black:
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. ACL (Findings) 2022: 2106-2115 - [c305]Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. ACL (1) 2022: 8479-8492 - [c304]Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe:
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. EMNLP (Findings) 2022: 5419-5429 - [c303]Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. EMNLP (Findings) 2022: 5486-5503 - [c302]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion. ICASSP 2022: 6237-6241 - [c301]Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu:
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization. ICASSP 2022: 6412-6416 - [c300]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations. ICASSP 2022: 6552-6556 - [c299]Motoi Omachi, Yuya Fujita, Shinji Watanabe, Tianzi Wang:
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing. ICASSP 2022: 6772-6776 - [c298]Zili Huang, Shinji Watanabe, Shu-Wen Yang, Paola García, Sanjeev Khudanpur:
Investigating Self-Supervised Learning for Speech Enhancement and Separation. ICASSP 2022: 6837-6841 - [c297]Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair:
Torchaudio: Building Blocks for Audio and Speech Processing. ICASSP 2022: 6982-6986 - [c296]Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion. ICASSP 2022: 7107-7111 - [c295]Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe:
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet. ICASSP 2022: 7167-7171 - [c294]Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-Based Supervision. ICASSP 2022: 7212-7216 - [c293]Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. ICASSP 2022: 7322-7326 - [c292]Shota Horiguchi, Yuki Takashima, Paola García, Shinji Watanabe, Yohei Kawaguchi:
Multi-Channel End-To-End Neural Diarization with Distributed Microphones. ICASSP 2022: 7332-7336 - [c291]Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao:
Conditional Diffusion Probabilistic Model for Speech Enhancement. ICASSP 2022: 7402-7406 - [c290]Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe:
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition. ICASSP 2022: 7872-7876 - [c289]Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe:
Joint Speech Recognition and Audio Captioning. ICASSP 2022: 7892-7896 - [c288]Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe:
Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR. ICASSP 2022: 8287-8291 - [c287]Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang:
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models. ICASSP 2022: 8522-8526 - [c286]Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge. ICASSP 2022: 9201-9205 - [c285]Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results. ICASSP 2022: 9266-9270 - [c284]Yifan Peng, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. ICML 2022: 17627-17643 - [c283]Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. INTERSPEECH 2022: 16-20 - [c282]Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Articulatory Representations. INTERSPEECH 2022: 779-783 - [c281]Takashi Maekaku, Yuya Fujita, Yifan Peng, Shinji Watanabe:
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. INTERSPEECH 2022: 1071-1075 - [c280]Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao:
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1111-1115 - [c279]Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury:
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. INTERSPEECH 2022: 1656-1660 - [c278]Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora:
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. INTERSPEECH 2022: 1746-1750 - [c277]Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan:
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1766-1770 - [c276]Yusuke Shinohara, Shinji Watanabe:
Minimum latency training of sequence transducers for streaming end-to-end speech recognition. INTERSPEECH 2022: 2098-2102 - [c275]Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Leibny Paola García-Perera, Yohei Kawaguchi:
Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models. INTERSPEECH 2022: 2218-2222 - [c274]Muqiao Yang, Ian R. Lane, Shinji Watanabe:
Online Continual Learning of End-to-End Speech Recognition Models. INTERSPEECH 2022: 2668-2672 - [c273]Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Improving Speech Enhancement through Fine-Grained Speech Characteristics. INTERSPEECH 2022: 2953-2957 - [c272]Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe:
Two-Pass Low Latency End-to-End Spoken Language Understanding. INTERSPEECH 2022: 3478-3482 - [c271]Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe:
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. INTERSPEECH 2022: 3533-3537 - [c270]Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, Shinji Watanabe:
When Is TTS Augmentation Through a Pivot Language Useful? INTERSPEECH 2022: 3538-3542 - [c269]Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe:
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. INTERSPEECH 2022: 3819-3823 - [c268]Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Prasad Narisetty, Shinji Watanabe:
Residual Language Model for End-to-end Speech Recognition. INTERSPEECH 2022: 3899-3903 - [c267]Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin:
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. INTERSPEECH 2022: 4272-4276 - [c266]Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin:
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. INTERSPEECH 2022: 4277-4281 - [c265]Jaesong Lee, Lukas Lee, Shinji Watanabe:
Memory-Efficient Training of RNN-Transducer with Sampled Softmax. INTERSPEECH 2022: 4441-4445 - [c264]Yui Sudo, Muhammad Shakeel, Kazuhiro Nakadai, Jiatong Shi, Shinji Watanabe:
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection. INTERSPEECH 2022: 4641-4645 - [c263]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
ASR2K: Speech Recognition for Around 2000 Languages without Audio. INTERSPEECH 2022: 4885-4889 - [c262]Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida:
Better Intermediates Improve CTC Inference. INTERSPEECH 2022: 4965-4969 - [c261]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. INTERSPEECH 2022: 5458-5462 - [c260]Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondrej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vera Kloudová, Surafel Melaku Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Miguel Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe:
Findings of the IWSLT 2022 Evaluation Campaign. IWSLT@ACL 2022: 98-157 - [c259]Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2022 Dialect Speech Translation System. IWSLT@ACL 2022: 298-307 - [c258]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
Phone Inventories and Recognition for Every Language. LREC 2022: 1061-1067 - [c257]Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. SLT 2022: 84-91 - [c256]Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. SLT 2022: 260-265 - [c255]Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. SLT 2022: 406-413 - [c254]Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. SLT 2022: 480-487 - [c253]Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-Speaker ASR with Independent Vector Analysis. SLT 2022: 496-501 - [c252]Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García:
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. SLT 2022: 620-625 - [c251]Tzu-hsun Feng, Shuyan Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee:
Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. SLT 2022: 1096-1103 - [c250]Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang:
On Compressing Sequences for Self-Supervised Speech Models. SLT 2022: 1128-1135 - [i201]Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe:
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies. CoRR abs/2201.05420 (2022) - [i200]Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang:
Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models. CoRR abs/2201.10103 (2022) - [i199]Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe:
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR. CoRR abs/2201.10190 (2022) - [i198]Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe:
Joint Speech Recognition and Audio Captioning. CoRR abs/2202.01405 (2022) - [i197]Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao:
Conditional Diffusion Probabilistic Model for Speech Enhancement. CoRR abs/2202.05256 (2022) - [i196]Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi:
Acoustic Event Detection with Classifier Chains. CoRR abs/2202.08470 (2022) - [i195]Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge. CoRR abs/2202.12298 (2022) - [i194]Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. CoRR abs/2203.00232 (2022) - [i193]Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse H. Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk:
HEAR 2021: Holistic Evaluation of Audio Representations. CoRR abs/2203.03022 (2022) - [i192]Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. CoRR abs/2203.06849 (2022) - [i191]Jaesong Lee, Lukas Lee, Shinji Watanabe:
Memory-Efficient Training of RNN-Transducer with Sampled Softmax. CoRR abs/2203.16868 (2022) - [i190]Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin:
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. CoRR abs/2203.17001 (2022) - [i189]Yushi Ueda, Soumi Maiti, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. CoRR abs/2203.17068 (2022) - [i188]Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida:
Better Intermediates Improve CTC Inference. CoRR abs/2204.00176 (2022) - [i187]Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-speaker ASR with Independent Vector Analysis. CoRR abs/2204.00218 (2022) - [i186]Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe:
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. CoRR abs/2204.00540 (2022) - [i185]Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe:
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. CoRR abs/2204.02470 (2022) - [i184]Zhong-Qiu Wang, Shinji Watanabe:
Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction. CoRR abs/2204.07566 (2022) - [i183]Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora:
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. CoRR abs/2204.08920 (2022) - [i182]Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency. CoRR abs/2204.09911 (2022) - [i181]Felix Wu, Kwangyoun Kim, Shinji Watanabe, Kyu Jeong Han, Ryan McDonald, Kilian Q. Weinberger, Yoav Artzi:
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages. CoRR abs/2205.01086 (2022) - [i180]Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin:
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis. CoRR abs/2205.04029 (2022) - [i179]Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu
, Lars Maaløe, Tara N. Sainath, Shinji Watanabe
:
Self-Supervised Speech Representation Learning: A Review. CoRR abs/2205.10643 (2022) - [i178]Shota Horiguchi, Shinji Watanabe
, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers. CoRR abs/2206.02432 (2022) - [i177]Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe
, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed:
LegoNN: Building Modular Encoder-Decoder Models. CoRR abs/2206.03318 (2022) - [i176]Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Narisetty, Shinji Watanabe
:
Residual Language Model for End-to-end Speech Recognition. CoRR abs/2206.07430 (2022) - [i175]Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe
, Bhiksha Raj:
Improving Speech Enhancement through Fine-Grained Speech Characteristics. CoRR abs/2207.00237 (2022) - [i174]Yifan Peng
, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe
:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. CoRR abs/2207.02971 (2022) - [i173]Muqiao Yang, Ian R. Lane, Shinji Watanabe
:
Online Continual Learning of End-to-End Speech Recognition Models. CoRR abs/2207.05071 (2022) - [i172]Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe
:
Two-Pass Low Latency End-to-End Spoken Language Understanding. CoRR abs/2207.06670 (2022) - [i171]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. CoRR abs/2207.09514 (2022) - [i170]Nathaniel R. Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, Shinji Watanabe:
When Is TTS Augmentation Through a Pivot Language Useful? CoRR abs/2207.09889 (2022) - [i169]Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury:
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. CoRR abs/2208.01818 (2022) - [i168]Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
ASR2K: Speech Recognition for Around 2000 Languages without Audio. CoRR abs/2209.02842 (2022) - [i167]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation. CoRR abs/2209.03952 (2022) - [i166]Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Articulatory Representations. CoRR abs/2209.06337 (2022) - [i165]Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced merging for speech recognition. CoRR abs/2210.00077 (2022) - [i164]Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García:
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. CoRR abs/2210.03459 (2022) - [i163]Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W. Black, Shinji Watanabe:
CTC Alignments Improve Autoregressive Translation. CoRR abs/2210.05200 (2022) - [i162]Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang:
On Compressing Sequences for Self-Supervised Speech Models. CoRR abs/2210.07189 (2022) - [i161]Jinchuan Tian, Brian Yan, Jianwei Yu, Chao Weng, Dong Yu, Shinji Watanabe:
Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks. CoRR abs/2210.07499 (2022) - [i160]Tzu-hsun Feng, Shuyan Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee:
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. CoRR abs/2210.08634 (2022) - [i159]Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. CoRR abs/2210.10742 (2022) - [i158]Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesong Lee, Hye-jin Shim, Youngki Kwon, Joon Son Chung, Shinji Watanabe:
Large-scale learning of generalised representations for speaker recognition. CoRR abs/2210.10985 (2022) - [i157]Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung:
In search of strong embedding extractors for speaker diarisation. CoRR abs/2210.14682 (2022) - [i156]Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe:
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. CoRR abs/2210.15734 (2022) - [i155]Jiachen Lian, Alan W. Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala Krishna Anumanchipalli:
Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization. CoRR abs/2210.16498 (2022) - [i154]Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. CoRR abs/2210.16663 (2022) - [i153]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder. CoRR abs/2211.00792 (2022) - [i152]Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss. CoRR abs/2211.00795 (2022) - [i151]Brian Yan, Matthew Wiesner, Ondrej Klejch, Preethi Jyothi, Shinji Watanabe:
Towards Zero-Shot Code-Switched Speech Recognition. CoRR abs/2211.01458 (2022) - [i150]Yusuke Shinohara, Shinji Watanabe:
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition. CoRR abs/2211.02333 (2022) - [i149]Jiatong Shi, Chan-Jan Hsu, Ho-Lam Chung, Dongji Gao, Paola García, Shinji Watanabe, Ann Lee, Hung-yi Lee:
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR. CoRR abs/2211.03025 (2022) - [i148]Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg:
Multi-blank Transducers for Speech Recognition. CoRR abs/2211.03541 (2022) - [i147]Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. CoRR abs/2211.05869 (2022) - [i146]Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe:
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation. CoRR abs/2211.05967 (2022) - [i145]Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units. CoRR abs/2211.06535 (2022) - [i144]Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe:
Streaming Joint Speech Recognition and Disfluency Detection. CoRR abs/2211.08726 (2022) - [i143]Dan Berrebbi, Brian Yan, Shinji Watanabe:
Avoid Overthinking in Self-Supervised Models for Speech Recognition. CoRR abs/2211.08989 (2022) - [i142]Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation. CoRR abs/2211.12433 (2022) - [i141]Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola García, Hung-yi Lee, Shinji Watanabe, Sanjeev Khudanpur:
EURO: ESPnet Unsupervised ASR Open-source Toolkit. CoRR abs/2211.17196 (2022) - [i140]Soumi Maiti, Yifan Peng, Takaaki Saeki, Shinji Watanabe:
SpeechLMScore: Evaluating speech generation using speech language model. CoRR abs/2212.04559 (2022) - [i139]Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino:
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units. CoRR abs/2212.08055 (2022) - [i138]Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe:
Context-aware Fine-tuning of Self-supervised Speech Models. CoRR abs/2212.08542 (2022) - [i137]Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan S. Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe:
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks. CoRR abs/2212.10525 (2022) - [i136]Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe:
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders. CoRR abs/2212.10818 (2022)
- 2021
- [j38]Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani:
Far-Field Automatic Speech Recognition. Proc. IEEE 109(2): 124-148 (2021) - [j37]Nanxin Chen, Shinji Watanabe, Jesús Villalba, Piotr Zelasko, Najim Dehak:
Non-Autoregressive Transformer for Speech Recognition. IEEE Signal Process. Lett. 28: 121-125 (2021) - [c249]Yen-Ju Lu, Yu Tsao, Shinji Watanabe:
A Study on Speech Enhancement Based on Diffusion Probabilistic Model. APSIPA ASC 2021: 659-666 - [c248]Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, Louis-Philippe Morency:
Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks. APSIPA ASC 2021: 841-848 - [c247]Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe:
A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies. ASRU 2021: 16-23 - [c246]Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe:
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation. ASRU 2021: 47-54 - [c245]Shota Horiguchi, Shinji Watanabe, Paola García, Yawen Xue, Yuki Takashima, Yohei Kawaguchi:
Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors. ASRU 2021: 98-105 - [c244]Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-Wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe:
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition. ASRU 2021: 228-235 - [c243]Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Attention-Based Multi-Hypothesis Fusion for Speech Summarization. ASRU 2021: 487-494 - [c242]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS Based Voice Conversion. ASRU 2021: 642-649 - [c241]Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang:
Conferencingspeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing. ASRU 2021: 679-686 - [c240]Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe:
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. ASRU 2021: 922-929 - [c239]Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W. Black:
Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity. ASRU 2021: 1050-1057 - [c238]Chaitanya Prasad Narisetty, Tomoki Hayashi, Ryunosuke Ishizaki, Shinji Watanabe, Kazuya Takeda:
Leveraging State-of-the-art ASR Techniques to Audio Captioning. DCASE 2021: 160-164 - [c237]Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe:
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec. EACL 2021: 1134-1145 - [c236]Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian:
Dual-Path Modeling for Long Recording Speech Separation in Meetings. ICASSP 2021: 5739-5743 - [c235]Matthew Maciejewski, Jing Shi, Shinji Watanabe, Sanjeev Khudanpur:
Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step. ICASSP 2021: 5774-5778 - [c234]Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang:
Recent Developments on Espnet Toolkit Boosted By Conformer. ICASSP 2021: 5874-5878 - [c233]Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition. ICASSP 2021: 6214-6218 - [c232]Jaesong Lee, Shinji Watanabe:
Intermediate Loss Regularization for CTC-Based Speech Recognition. ICASSP 2021: 6224-6228 - [c231]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Ramón Fernandez Astudillo, Jan Honza Cernocký:
Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition. ICASSP 2021: 6753-6757 - [c230]Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. ICASSP 2021: 6898-6902 - [c229]Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation. ICASSP 2021: 6908-6912 - [c228]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. ICASSP 2021: 7183-7187 - [c227]Shota Horiguchi, Paola García, Yusuke Fujita, Shinji Watanabe, Kenji Nagamatsu:
End-To-End Speaker Diarization as Post-Processing. ICASSP 2021: 7188-7192 - [c226]Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe:
ORTHROS: non-autoregressive end-to-end speech translation With dual-decoder. ICASSP 2021: 7503-7507 - [c225]Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi:
Improved Mask-CTC for Non-Autoregressive End-to-End ASR. ICASSP 2021: 8363-8367 - [c224]Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu:
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization. ICASSP 2021: 8433-8437 - [c223]Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, Shinji Watanabe:
Data Augmentation Methods for End-to-End Speech Recognition on Distant-Talk Scenarios. Interspeech 2021: 301-305 - [c222]Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi:
Acoustic Event Detection with Classifier Chains. Interspeech 2021: 601-605 - [c221]Shu-Wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB: Speech Processing Universal PERformance Benchmark. Interspeech 2021: 1194-1198 - [c220]Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W. Black:
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding. Interspeech 2021: 1264-1268 - [c219]Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko:
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition. Interspeech 2021: 1434-1438 - [c218]Katerina Zmolíková, Marc Delcroix, Desh Raj, Shinji Watanabe, Jan Cernocký:
Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics. Interspeech 2021: 1464-1468 - [c217]Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, Alexander I. Rudnicky:
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021. Interspeech 2021: 1564-1568 - [c216]Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
Multi-Mode Transformer Transducer with Stochastic Future Context. Interspeech 2021: 1827-1831 - [c215]Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe:
Differentiable Allophone Graphs for Language-Universal Speech Recognition. Interspeech 2021: 2471-2475 - [c214]Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen:
Continuous Speech Separation Using Speaker Inventory for Long Recording. Interspeech 2021: 3036-3040 - [c213]Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Leibny Paola García-Perera, Kenji Nagamatsu:
Semi-Supervised Training with Pseudo-Labeling for End-To-End Neural Diarization. Interspeech 2021: 3096-3100 - [c212]Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Leibny Paola García-Perera, Kenji Nagamatsu:
Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers. Interspeech 2021: 3116-3120 - [c211]Suwon Shon, Pablo Brusco, Jing Pan, Kyu Jeong Han, Shinji Watanabe:
Leveraging Pre-Trained Language Model for Speech Sentiment Analysis. Interspeech 2021: 3420-3424 - [c210]Matthew Maciejewski, Shinji Watanabe, Sanjeev Khudanpur:
Speaker Verification-Based Evaluation of Single-Channel Speech Separation. Interspeech 2021: 3520-3524 - [c209]Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe:
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker. Interspeech 2021: 3555-3559 - [c208]Guoguo Chen, Shuzhou Chai, Guan-Bo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Zhao You, Zhiyong Yan:
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio. Interspeech 2021: 3670-3674 - [c207]Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie:
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. Interspeech 2021: 3720-3724 - [c206]Yuya Fujita, Tianzi Wang, Shinji Watanabe, Motoi Omachi:
Toward Streaming ASR with Non-Autoregressive Insertion-Based Model. Interspeech 2021: 3740-3744 - [c205]Jaesong Lee, Jingu Kang, Shinji Watanabe:
Layer Pruning on Demand with Intermediate CTC. Interspeech 2021: 3745-3749 - [c204]Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe:
Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models. Interspeech 2021: 3755-3759 - [c203]Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe:
ESPnet-ST IWSLT 2021 Offline Speech Translation System. IWSLT 2021: 100-109 - [c202]Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda:
Self-Guided Curriculum Learning for Neural Machine Translation. IWSLT 2021: 206-214 - [c201]Motoi Omachi, Yuya Fujita, Shinji Watanabe, Matthew Wiesner:
End-to-end ASR to jointly predict transcriptions and linguistic annotations. NAACL-HLT 2021: 1861-1871 - [c200]Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe:
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation. NAACL-HLT 2021: 1872-1881 - [c199]Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe:
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks. NAACL-HLT 2021: 1882-1896 - [c198]Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse H. Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk:
HEAR: Holistic Evaluation of Audio Representations. NeurIPS (Competition and Demos) 2021: 125-145 - [c197]Emiru Tsunoo, Yosuke Kashiwagi, Shinji Watanabe:
Streaming Transformer Asr With Blockwise Synchronous Beam Search. SLT 2021: 22-29 - [c196]Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Böddeker, Zhuo Chen, Shinji Watanabe:
ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration. SLT 2021: 785-792 - [c195]Yawen Xue, Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Paola García, Kenji Nagamatsu:
Online End-To-End Neural Diarization with Speaker-Tracing Buffer. SLT 2021: 841-848 - [c194]Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu:
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection. SLT 2021: 849-856 - [c193]Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Böddeker, Yanmin Qian, Shinji Watanabe, Zhuo Chen:
Dual-Path RNN for Long Recording Speech Separation. SLT 2021: 865-872 - [c192]Desh Raj, Leibny Paola García-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur:
DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs. SLT 2021: 881-888 - [c191]Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. SLT 2021: 897-904 - [c190]Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin W. Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey:
Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement. SLT 2021: 905-911 - [c189]Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian:
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions. WASPAA 2021: 146-150 - [i135]Amir Hussein, Shinji Watanabe, Ahmed Ali:
Arabic Speech Recognition by End-to-End, Modular Systems and Human. CoRR abs/2101.08454 (2021) - [i134]Yawen Xue, Shota Horiguchi, Yusuke Fujita, Yuki Takashima, Shinji Watanabe, Paola García, Kenji Nagamatsu:
Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers. CoRR abs/2101.08473 (2021) - [i133]Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu Jeong Han, Shinji Watanabe, Shrikanth Narayanan:
A Review of Speaker Diarization: Recent Advances with Deep Learning. CoRR abs/2101.09624 (2021) - [i132]Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe:
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec. CoRR abs/2101.10877 (2021) - [i131]Shota Horiguchi, Nelson Yalta, Paola García, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur:
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap. CoRR abs/2102.01363 (2021) - [i130]Jaesong Lee, Shinji Watanabe:
Intermediate Loss Regularization for CTC-based Speech Recognition. CoRR abs/2102.03216 (2021) - [i129]Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition. CoRR abs/2102.07955 (2021) - [i128]Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe:
Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition. CoRR abs/2102.09168 (2021) - [i127]Wangyou Zhang, Christoph Böddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend. CoRR abs/2102.11525 (2021) - [i126]Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian:
Dual-Path Modeling for Long Recording Speech Separation in Meetings. CoRR abs/2102.11634 (2021) - [i125]Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang:
INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing. CoRR abs/2104.00960 (2021) - [i124]Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko:
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition. CoRR abs/2104.02014 (2021) - [i123]Hirofumi Inaguma, Tatsuya Kawahara, Shinji Watanabe:
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation. CoRR abs/2104.06457 (2021) - [i122]Murali Karthick Baskar, Lukás Burget, Shinji Watanabe, Ramón Fernandez Astudillo, Jan Honza Cernocký:
EAT: Enhanced ASR-TTS for Self-supervised Speech Recognition. CoRR abs/2104.07474 (2021) - [i121]Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe:
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks. CoRR abs/2105.00573 (2021) - [i120]Shu-Wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB: Speech processing Universal PERformance Benchmark. CoRR abs/2105.01051 (2021) - [i119]Soumi Maiti, Hakan Erdogan, Kevin W. Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey:
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings. CoRR abs/2105.02096 (2021) - [i118]Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda:
Self-Guided Curriculum Learning for Neural Machine Translation. CoRR abs/2105.04475 (2021) - [i117]Emiru Tsunoo, Kentaro Shibata, Chaitanya Narisetty, Yosuke Kashiwagi, Shinji Watanabe:
Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios. CoRR abs/2106.03419 (2021) - [i116]Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu:
End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection. CoRR abs/2106.04078 (2021) - [i115]Yuki Takashima, Yusuke Fujita, Shota Horiguchi, Shinji Watanabe, Paola García, Kenji Nagamatsu:
Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization. CoRR abs/2106.04764 (2021) - [i114]Suwon Shon, Pablo Brusco, Jing Pan, Kyu Jeong Han, Shinji Watanabe:
Leveraging Pre-trained Language Model for Speech Sentiment Analysis. CoRR abs/2106.06598 (2021) - [i113]Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan:
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio. CoRR abs/2106.06909 (2021) - [i112]Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie:
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. CoRR abs/2106.08595 (2021) - [i111]Jaesong Lee, Jingu Kang, Shinji Watanabe:
Layer Pruning on Demand with Intermediate CTC. CoRR abs/2106.09216 (2021) - [i110]Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
Multi-mode Transformer Transducer with Stochastic Future Context. CoRR abs/2106.09760 (2021) - [i109]Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola García:
Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization. CoRR abs/2106.10654 (2021) - [i108]Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W. Black:
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding. CoRR abs/2106.15065 (2021) - [i107]Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe:
ESPnet-ST IWSLT 2021 Offline Speech Translation System. CoRR abs/2107.00636 (2021) - [i106]Shota Horiguchi, Shinji Watanabe, Paola García, Yawen Xue, Yuki Takashima, Yohei Kawaguchi:
Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors. CoRR abs/2107.01545 (2021) - [i105]Takashi Maekaku, Xuankai Chang, Yuya Fujita, Li-Wei Chen, Shinji Watanabe, Alexander I. Rudnicky:
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021. CoRR abs/2107.05899 (2021) - [i104]Tianzi Wang, Yuya Fujita, Xuankai Chang, Shinji Watanabe:
Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models. CoRR abs/2107.09428 (2021) - [i103]Wen-Chin Huang, Tomoki Hayashi, Xinjian Li, Shinji Watanabe, Tomoki Toda:
On Prosody Modeling for ASR+TTS based Voice Conversion. CoRR abs/2107.09477 (2021) - [i102]Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe:
Differentiable Allophone Graphs for Language-Universal Speech Recognition. CoRR abs/2107.11628 (2021) - [i101]Yen-Ju Lu, Yu Tsao, Shinji Watanabe:
A Study on Speech Enhancement Based on Diffusion Probabilistic Model. CoRR abs/2107.11876 (2021) - [i100]Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe:
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring. CoRR abs/2109.04411 (2021) - [i99]Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe:
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. CoRR abs/2109.12804 (2021) - [i98]Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-Wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe:
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition. CoRR abs/2110.04590 (2021) - [i97]Shota Horiguchi, Yuki Takashima, Paola García, Shinji Watanabe, Yohei Kawaguchi:
Multi-Channel End-to-End Neural Diarization with Distributed Microphones. CoRR abs/2110.04694 (2021) - [i96]Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe:
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation. CoRR abs/2110.05249 (2021) - [i95]Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Jeong Han, Shinji Watanabe:
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition. CoRR abs/2110.05571 (2021) - [i94]Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations. CoRR abs/2110.06280 (2021) - [i93]Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe:
ESPnet2-TTS: Extending the Edge of TTS Research. CoRR abs/2110.07840 (2021) - [i92]Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian:
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions. CoRR abs/2110.14139 (2021) - [i91]Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi:
TorchAudio: Building Blocks for Audio and Speech Processing. CoRR abs/2110.15018 (2021) - [i90]Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-based Supervision. CoRR abs/2111.01272 (2021) - [i89]Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W. Black:
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity. CoRR abs/2111.01326 (2021) - [i88]