


Shinji Watanabe 0001
Person information

- affiliation: Carnegie Mellon University, Pittsburgh, PA, USA
- affiliation (former): Johns Hopkins University, Baltimore, MD, USA
- affiliation (2012 - 2017): Mitsubishi Electric Research Laboratories, Cambridge, MA, USA
- affiliation (2001 - 2011): NTT Communication Science Laboratories, Kyoto, Japan
- affiliation (PhD 2006): Waseda University, Tokyo, Japan
Other persons with the same name
- Shinji Watanabe — disambiguation page
- Shinji Watanabe 0002 — Kanagawa University, Department of Electrical Engineering, Yokohama, Japan
- Shinji Watanabe 0003 — Osaka Prefecture University, School of Knowledge and Information Systems, Sakai, Japan
- Shinji Watanabe 0004 — Renesas Electronics Corporation, Kawasaki, Japan
- Shinji Watanabe 0005 — Nintendo Co.,Ltd, Kyoto, Japan
- Shinji Watanabe 0006 — Gifu National College of Technology, Motosu-gun, Gifu-ken, Japan
- Shinji Watanabe 0007 — University of Miyazaki, Miyazaki, Japan
2020 – today
- 2023
- [j52] Matthew Maciejewski, Jing Shi, Shinji Watanabe, Sanjeev Khudanpur:
A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data. Comput. Speech Lang. 77: 101410 (2023)
- [j51] Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux:
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency. IEEE ACM Trans. Audio Speech Lang. Process. 31: 397-410 (2023)
- [j50] Shota Horiguchi, Shinji Watanabe, Paola García, Yuki Takashima, Yohei Kawaguchi:
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors. IEEE ACM Trans. Audio Speech Lang. Process. 31: 706-720 (2023)
- [c306] Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W. Black, Shinji Watanabe:
CTC Alignments Improve Autoregressive Translation. EACL 2023: 1615-1631
- [i224] Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali:
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study. CoRR abs/2301.09099 (2023)
- [i223] Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. CoRR abs/2301.12596 (2023)
- [i222] Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky:
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. CoRR abs/2302.04215 (2023)
- [i221] Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Speaker-Independent Acoustic-to-Articulatory Speech Inversion. CoRR abs/2302.06774 (2023)
- [i220] Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono:
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge. CoRR abs/2302.07928 (2023)
- [i219] Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08088 (2023)
- [i218] Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement. CoRR abs/2302.08095 (2023)
- [i217] William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe:
Improving Massively Multilingual ASR With Auxiliary CTC Objectives. CoRR abs/2302.12829 (2023)
- [i216] Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe:
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding. CoRR abs/2302.14132 (2023)
- [i215] Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe:
End-to-End Speech Recognition: A Survey. CoRR abs/2303.03329 (2023)
- [i214] Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Baocai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The Multimodal Information based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition. CoRR abs/2303.06326 (2023)
- [i213] Yifan Peng, Jaesong Lee, Shinji Watanabe:
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition. CoRR abs/2303.07624 (2023)
- [i212] Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe:
Enhancing Speech-to-Speech Translation with Multiple TTS Targets. CoRR abs/2304.04618 (2023)
- [i211] Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg:
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. CoRR abs/2304.06795 (2023)
- [i210] Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, Younglo Lee, Byeong-Yeol Kim, Shinji Watanabe:
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling. CoRR abs/2304.08707 (2023)
- [i209] Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe:
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. CoRR abs/2304.12995 (2023)
- [i208] Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History. CoRR abs/2305.00926 (2023)
- [i207] Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe:
The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge. CoRR abs/2305.01194 (2023)
- [i206] Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe:
A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge. CoRR abs/2305.01620 (2023)
- [i205] Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-Yi Lee:
Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation. CoRR abs/2305.07455 (2023)
- [i204] Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei-Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe:
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark. CoRR abs/2305.10615 (2023)
- [i203] Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe:
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks. CoRR abs/2305.11073 (2023)
- [i202] Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath:
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization. CoRR abs/2305.11095 (2023)
- [i201] Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney:
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning. CoRR abs/2305.13331 (2023)
- 2022
- [j49] Amir Hussein, Shinji Watanabe, Ahmed Ali:
Arabic speech recognition by end-to-end, modular systems and human. Comput. Speech Lang. 71: 101272 (2022)
- [j48] Zili Huang, Marc Delcroix, Leibny Paola García-Perera, Shinji Watanabe, Desh Raj, Sanjeev Khudanpur:
Joint speaker diarization and speech recognition based on region proposal networks. Comput. Speech Lang. 72: 101316 (2022)
- [j47] Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan:
A review of speaker diarization: Recent advances with deep learning. Comput. Speech Lang. 72: 101317 (2022)
- [j46] Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer. Comput. Speech Lang. 73: 101327 (2022)
- [j45] Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu:
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition. Comput. Speech Lang. 75: 101360 (2022)
- [j44] Jing Shi, Xuankai Chang, Shinji Watanabe, Bo Xu:
Train from scratch: Single-stage joint training of speech separation and recognition. Comput. Speech Lang. 76: 101387 (2022)
- [j43] Hung-Yi Lee, Shinji Watanabe, Karen Livescu, Abdelrahman Mohamed, Tara N. Sainath:
Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1174-1178 (2022)
- [j42] Abdelrahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe:
Self-Supervised Speech Representation Learning: A Review. IEEE J. Sel. Top. Signal Process. 16(6): 1179-1210 (2022)
- [j41] Zhong-Qiu Wang, Shinji Watanabe:
Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction. IEEE Signal Process. Lett. 29: 1422-1426 (2022)
- [j40] Shota Horiguchi, Yusuke Fujita, Shinji Watanabe, Yawen Xue, Paola García:
Encoder-Decoder Based Attractors for End-to-End Neural Diarization. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1493-1507 (2022)
- [j39] Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3173-3188 (2022)
- [c305] Xinjian Li, Florian Metze, David Mortensen, Shinji Watanabe, Alan W. Black:
Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble. ACL (Findings) 2022: 2106-2115
- [c304] Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-Wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee:
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. ACL (1) 2022: 8479-8492
- [c303] Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W. Black, Shinji Watanabe:
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models. EMNLP (Findings) 2022: 5419-5429
- [c302] Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe:
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model. EMNLP (Findings) 2022: 5486-5503
- [c301] Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe:
Integrating Multiple ASR Systems into NLP Backend with Attention Fusion. ICASSP 2022: 6237-6241
- [c300] Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu:
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization. ICASSP 2022: 6412-6416
- [c299] Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda:
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations. ICASSP 2022: 6552-6556
- [c298] Motoi Omachi, Yuya Fujita, Shinji Watanabe, Tianzi Wang:
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing. ICASSP 2022: 6772-6776
- [c297] Zili Huang, Shinji Watanabe, Shu-Wen Yang, Paola García, Sanjeev Khudanpur:
Investigating Self-Supervised Learning for Speech Enhancement and Separation. ICASSP 2022: 6837-6841
- [c296] Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jeff Hwang, Ji Chen, Peter Goldsborough, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair:
Torchaudio: Building Blocks for Audio and Speech Processing. ICASSP 2022: 6982-6986
- [c295] Takashi Maekaku, Xuankai Chang, Yuya Fujita, Shinji Watanabe:
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion. ICASSP 2022: 7107-7111
- [c294] Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W. Black, Shinji Watanabe:
ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet. ICASSP 2022: 7167-7171
- [c293] Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Sequence Transduction with Graph-Based Supervision. ICASSP 2022: 7212-7216
- [c292] Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. ICASSP 2022: 7322-7326
- [c291] Shota Horiguchi, Yuki Takashima, Paola García, Shinji Watanabe, Yohei Kawaguchi:
Multi-Channel End-To-End Neural Diarization with Distributed Microphones. ICASSP 2022: 7332-7336
- [c290] Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao:
Conditional Diffusion Probabilistic Model for Speech Enhancement. ICASSP 2022: 7402-7406
- [c289] Jing Pan, Tao Lei, Kwangyoun Kim, Kyu J. Han, Shinji Watanabe:
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition. ICASSP 2022: 7872-7876
- [c288] Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe:
Joint Speech Recognition and Audio Captioning. ICASSP 2022: 7892-7896
- [c287] Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe:
Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR. ICASSP 2022: 8287-8291
- [c286] Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang:
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models. ICASSP 2022: 8522-8526
- [c285] Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge. ICASSP 2022: 9201-9205
- [c284] Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, Bao-Cai Yin, Jia Pan, Jianqing Gao, Cong Liu:
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results. ICASSP 2022: 9266-9270
- [c283] Yifan Peng, Siddharth Dalmia, Ian R. Lane, Shinji Watanabe:
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding. ICML 2022: 17627-17643
- [c282] Yooncheol Ju, Ilhwan Kim, Hongsun Yang, Ji-Hoon Kim, Byeongyeol Kim, Soumi Maiti, Shinji Watanabe:
TriniTTS: Pitch-controllable End-to-end TTS without External Aligner. INTERSPEECH 2022: 16-20
- [c281] Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W. Black, Gopala Krishna Anumanchipalli:
Deep Speech Synthesis from Articulatory Representations. INTERSPEECH 2022: 779-783
- [c280] Takashi Maekaku, Yuya Fujita, Yifan Peng, Shinji Watanabe:
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR. INTERSPEECH 2022: 1071-1075
- [c279] Hengshun Zhou, Jun Du, Gongzhen Zou, Zhaoxu Nian, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Shifu Xiong, Jianqing Gao:
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1111-1115
- [c278] Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury:
VQ-T: RNN Transducers using Vector-Quantized Prediction Network States. INTERSPEECH 2022: 1656-1660
- [c277] Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora:
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. INTERSPEECH 2022: 1746-1750
- [c276] Hang Chen, Jun Du, Yusheng Dai, Chin-Hui Lee, Sabato Marco Siniscalchi, Shinji Watanabe, Odette Scharenborg, Jingdong Chen, Baocai Yin, Jia Pan:
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis. INTERSPEECH 2022: 1766-1770
- [c275] Yusuke Shinohara, Shinji Watanabe:
Minimum latency training of sequence transducers for streaming end-to-end speech recognition. INTERSPEECH 2022: 2098-2102
- [c274] Yuki Takashima, Shota Horiguchi, Shinji Watanabe, Leibny Paola García-Perera, Yohei Kawaguchi:
Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models. INTERSPEECH 2022: 2218-2222
- [c273] Muqiao Yang, Ian R. Lane, Shinji Watanabe:
Online Continual Learning of End-to-End Speech Recognition Models. INTERSPEECH 2022: 2668-2672
- [c272] Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj:
Improving Speech Enhancement through Fine-Grained Speech Characteristics. INTERSPEECH 2022: 2953-2957
- [c271] Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan W. Black, Shinji Watanabe:
Two-Pass Low Latency End-to-End Spoken Language Understanding. INTERSPEECH 2022: 3478-3482
- [c270] Dan Berrebbi, Jiatong Shi, Brian Yan, Osbel López-Francisco, Jonathan D. Amith, Shinji Watanabe:
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation. INTERSPEECH 2022: 3533-3537
- [c269] Nathaniel Romney Robinson, Perez Ogayo, Swetha R. Gangu, David R. Mortensen, Shinji Watanabe:
When Is TTS Augmentation Through a Pivot Language Useful? INTERSPEECH 2022: 3538-3542
- [c268] Xuankai Chang, Takashi Maekaku, Yuya Fujita, Shinji Watanabe:
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation. INTERSPEECH 2022: 3819-3823
- [c267] Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Prasad Narisetty, Shinji Watanabe:
Residual Language Model for End-to-end Speech Recognition. INTERSPEECH 2022: 3899-3903
- [c266] Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin:
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. INTERSPEECH 2022: 4272-4276
- [c265] Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin:
Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. INTERSPEECH 2022: 4277-4281
- [c264] Jaesong Lee, Lukas Lee, Shinji Watanabe:
Memory-Efficient Training of RNN-Transducer with Sampled Softmax. INTERSPEECH 2022: 4441-4445
- [c263] Yui Sudo, Muhammad Shakeel, Kazuhiro Nakadai, Jiatong Shi, Shinji Watanabe:
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection. INTERSPEECH 2022: 4641-4645
- [c262] Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
ASR2K: Speech Recognition for Around 2000 Languages without Audio. INTERSPEECH 2022: 4885-4889
- [c261] Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida:
Better Intermediates Improve CTC Inference. INTERSPEECH 2022: 4965-4969
- [c260] Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. INTERSPEECH 2022: 5458-5462
- [c259] Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondrej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vera Kloudová, Surafel Melaku Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Miguel Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe:
Findings of the IWSLT 2022 Evaluation Campaign. IWSLT@ACL 2022: 98-157
- [c258] Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe:
CMU's IWSLT 2022 Dialect Speech Translation System. IWSLT@ACL 2022: 298-307
- [c257] Xinjian Li, Florian Metze, David R. Mortensen, Alan W. Black, Shinji Watanabe:
Phone Inventories and Recognition for Every Language. LREC 2022: 1061-1067
- [c256] Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. SLT 2022: 84-91
- [c255] Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. SLT 2022: 260-265
- [c254] Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. SLT 2022: 406-413
- [c253] Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. SLT 2022: 480-487
- [c252] Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-Speaker ASR with Independent Vector Analysis. SLT 2022: 496-501
- [c251] Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García:
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. SLT 2022: 620-625
- [c250] Tzu-hsun Feng, Shuyan Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee:
Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. SLT 2022: 1096-1103
- [c249] Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola García, Hung-yi Lee, Hao Tang:
On Compressing Sequences for Self-Supervised Speech Models. SLT 2022: 1128-1135
- [i200] Florian Boyer, Yusuke Shinohara, Takaaki Ishii, Hirofumi Inaguma, Shinji Watanabe:
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies. CoRR abs/2201.05420 (2022)
- [i199] Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang:
Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models. CoRR abs/2201.10103 (2022)
- [i198] Emiru Tsunoo, Chaitanya Narisetty, Michael Hentschel, Yosuke Kashiwagi, Shinji Watanabe:
Run-and-back stitch search: novel block synchronous decoding for streaming encoder-decoder ASR. CoRR abs/2201.10190 (2022)
- [i197] Chaitanya Narisetty, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, Shinji Watanabe:
Joint Speech Recognition and Audio Captioning. CoRR abs/2202.01405 (2022)
- [i196] Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao:
Conditional Diffusion Probabilistic Model for Speech Enhancement. CoRR abs/2202.05256 (2022)
- [i195] Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi:
Acoustic Event Detection with Classifier Chains. CoRR abs/2202.08470 (2022)
- [i194] Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe:
Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge. CoRR abs/2202.12298 (2022)
- [i193] Xuankai Chang, Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux:
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR. CoRR abs/2203.00232 (2022)
- [i192] Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse H. Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk:
HEAR 2021: Holistic Evaluation of Audio Representations. CoRR abs/2203.03022 (2022)
- [i191]