




SLT 2022: Doha, Qatar
- IEEE Spoken Language Technology Workshop, SLT 2022, Doha, Qatar, January 9-12, 2023. IEEE 2023, ISBN 979-8-3503-9690-4
- Vasista Sai Lodagala, Sreyan Ghosh, Srinivasan Umesh:
CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-Supervised Learning of Speech Representations. 1-8 - Hyung Yong Kim, Byeong-Yeol Kim, Seung Woo Yoo, Youshin Lim, Yunkyu Lim, Hanbin Lee:
ASBERT: ASR-Specific Self-Supervised Learning with Self-Training. 9-14 - Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris:
Sub-8-Bit Quantization for On-Device Speech Recognition: A Regularization-Free Approach. 15-22 - Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park:
G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR. 23-30 - David Qiu, Tsendsuren Munkhdalai, Yanzhang He, Khe Chai Sim:
Context-Aware Neural Confidence Estimation for Rare Word Speech Recognition. 31-37 - Antoine Bruguier, David Qiu, Trevor Strohman, Yanzhang He:
Flickering Reduction with Partial Hypothesis Reranking for Streaming ASR. 38-45 - Tatsuya Komatsu, Yusuke Fujita:
Interdecoder: Using Attention Decoders as Intermediate Regularization for CTC-Based Speech Recognition. 46-51 - Tara N. Sainath, Rohit Prabhavalkar, Ankur Bapna, Yu Zhang, Zhouyuan Huo, Zhehuai Chen, Bo Li, Weiran Wang, Trevor Strohman:
JOIST: A Joint Speech and Text Streaming Model for ASR. 52-59 - Ke-Han Lu, Kuan-Yu Chen:
A Context-Aware Knowledge Transferring Strategy for CTC-Based ASR. 60-67 - Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro J. Moreno, Nanxin Chen:
Maestro-U: Leveraging Joint Speech-Text Representation Learning for Zero Supervised Speech ASR. 68-75 - Yusuke Fujita, Tatsuya Komatsu, Yusuke Kida:
Alternate Intermediate Conditioning with Syllable-Level and Character-Level Targets for Japanese ASR. 76-83 - Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe:
E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition. 84-91 - Jinhwan Park, Sichen Jin, Junmo Park, Sungsoo Kim, Dhairya Sandhyana, Changheon Lee, Myoungji Han, Jungin Lee, Seokyeong Jung, Changwoo Han, Chanwoo Kim:
Conformer-Based on-Device Streaming Speech Recognition with KD Compression and Two-Pass Architecture. 92-99 - Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow:
Accelerator-Aware Training for Transducer-Based Speech Recognition. 100-107 - Lahiru Samarakoon, Ivan Fung:
Untied Positional Encodings for Efficient Transformer-Based Speech Recognition. 108-114 - Yan Gao, Javier Fernández-Marqués, Titouan Parcollet, Pedro P. B. de Gusmao, Nicholas D. Lane:
Match to Win: Analysing Sequences Lengths for Efficient Self-Supervised Learning in Speech and Audio. 115-122 - Peng Shen, Xugang Lu, Hisashi Kawai:
Pronunciation-Aware Unique Character Encoding for RNN Transducer-Based Mandarin Speech Recognition. 123-129 - Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg:
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition. 130-135 - Vasista Sai Lodagala, Sreyan Ghosh, Srinivasan Umesh:
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations. 136-143 - Fan Yu, Shiliang Zhang, Pengcheng Guo, Yuhao Liang, Zhihao Du, Yuxiao Lin, Lei Xie:
MFCCA: Multi-Frame Cross-Channel Attention for Multi-Speaker ASR in Multi-Party Meeting Scenario. 144-151 - Aleksandr Laptev, Boris Ginsburg:
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition. 152-159 - Sungjun Han, Deepak Baby, Valentin Mendelev:
Residual Adapters for Targeted Updates in RNN-Transducer Based Speech Recognition System. 160-166 - Tian Li, Qingliang Meng, Yujian Sun:
Improved Noisy Iterative Pseudo-Labeling for Semi-Supervised Speech Recognition. 167-173 - Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas:
Guided Contrastive Self-Supervised Pre-Training for Automatic Speech Recognition. 174-181 - Jakob Poncelet, Hugo Van hamme:
Learning to Jointly Transcribe and Subtitle for End-To-End Spontaneous Speech Recognition. 182-189 - Tsendsuren Munkhdalai, Zelin Wu, Golan Pundak, Khe Chai Sim, Jiayang Li, Pat Rondon, Tara N. Sainath:
NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR. 190-196 - Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno:
Modular Hybrid Autoregressive Transducer. 197-204 - Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Seyyed Saeed Sarfjoo, Petr Motlícek, Matthias Kleinert, Hartmut Helmke, Oliver Ohneiser, Qingran Zhan:
How Does Pre-Trained Wav2Vec 2.0 Perform on Domain-Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. 205-212 - Adam Stooke, Khe Chai Sim, Mason Chua, Tsendsuren Munkhdalai, Trevor Strohman:
Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features. 213-220 - Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski:
Towards End-to-End Unsupervised Speech Recognition. 221-228 - Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney:
Monotonic Segmental Attention for Automatic Speech Recognition. 229-236 - Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphonso, Jinyu Li, Yifan Gong:
Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition. 237-244 - Cal Peyser, W. Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho:
Dual Learning for Large Vocabulary On-Device ASR. 245-251 - Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh R. Mehta:
Streaming Bilingual End-to-End ASR Model Using Attention Over Multiple Softmax. 252-259 - Yoshiki Masuyama, Xuankai Chang, Samuele Cornell, Shinji Watanabe, Nobutaka Ono:
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation. 260-265 - Dongjune Lee, Minchan Kim, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim:
Fully Unsupervised Training of Few-Shot Keyword Spotting. 266-272 - Chunxi Liu, Yuan Shangguan, Haichuan Yang, Yangyang Shi, Raghuraman Krishnamoorthi, Ozlem Kalinli:
Learning a Dual-Mode Speech Recognition Model via Self-Pruning. 273-279 - Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim:
Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition. 280-286 - Tina Raissi, Wei Zhou, Simon Berger, Ralf Schlüter, Hermann Ney:
HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch. 287-294 - Vrunda N. Sukhadia, Srinivasan Umesh:
Domain Adaptation of Low-Resource Target-Domain Models Using Well-Trained ASR Conformer Models. 295-301 - Saket Dingliwal, Monica Sunkara, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff, Sravan Bodapati:
Personalization of CTC Speech Recognition Models. 302-309 - Shaan Bijwadia, Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Chao Zhang, Yanzhang He:
Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems. 310-316 - Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi:
Learning Mask Scalars for Improved Robust Automatic Speech Recognition. 317-323 - Niko Moritz, Frank Seide, Duc Le, Jay Mahadeokar, Christian Fuegen:
An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition. 324-330 - Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung:
Macro-Block Dropout for Improved Regularization in Training End-to-End Speech Recognition Models. 331-338 - Ragheb Al-Ghezi, Yaroslav Getman, Ekaterina Voskoboinik, Mittul Singh, Mikko Kurimo:
Automatic Rating of Spontaneous Speech for Low-Resource Languages. 339-345 - Benjamin Kleiner, Jack G. M. Fitzgerald, Haidar Khan, Gokhan Tur:
Mixture of Domain Experts for Language Understanding: an Analysis of Modularity, Task Performance, and Memory Tradeoffs. 346-352 - Anupama Chingacham, Vera Demberg, Dietrich Klakow:
A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification. 353-360 - Gaëlle Laperrière, Valentin Pelloin, Mickaël Rouvier, Themos Stafylakis, Yannick Estève:
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding. 361-368 - Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi:
Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction. 369-374 - Jinzi Qi, Hugo Van hamme:
Weak-Supervised Dysarthria-Invariant Features for Spoken Language Understanding Using an FHVAE and Adversarial Training. 375-381 - Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng:
Building Markovian Generative Architectures Over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems. 382-389 - Mohan Li, Rama Doddipatla:
Non-Autoregressive End-to-End Approaches for Joint Automatic Speech Recognition and Spoken Language Understanding. 390-397 - Yasufumi Moriya, Gareth J. F. Jones:
Improving Noise Robustness for Spoken Content Retrieval Using Semi-Supervised ASR and N-Best Transcripts for BERT-Based Ranking Models. 398-405 - Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe:
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding. 406-413 - Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee:
On the Efficiency of Integrating Self-Supervised Learning and Meta-Learning for User-Defined Few-Shot Keyword Spotting. 414-421 - Liang Wen, Lizhong Wang, Ying Zhang, Kwang Pyo Choi:
Multi-Stage Progressive Audio Bandwidth Extension. 422-427 - Sandipana Dowerah, Romain Serizel, Denis Jouvet, Mohammad MohammadAmini, Driss Matrouf:
Joint Optimization of Diffusion Probabilistic-Based Multichannel Speech Enhancement with Far-Field Speaker Verification. 428-435 - Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang:
Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement. 436-443 - Martin Strauss, Matteo Torcoli, Bernd Edler:
Improved Normalizing Flow-Based Speech Enhancement Using an all-Pole Gammatone Filterbank for Conditional Input Representation. 444-450 - Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu:
Exploring WavLM on Speech Enhancement. 451-457 - Yu-sheng Tsao, Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen:
Adaptive-FSN: Integrating Full-Band Extraction and Adaptive Sub-Band Encoding for Monaural Speech Enhancement. 458-464 - Andrea Lorena Aldana Blanco, Cassia Valentini-Botinhao, Ondrej Klejch, Mandar Gogate, Kia Dashtipour, Amir Hussain, Peter Bell:
AVSE Challenge: Audio-Visual Speech Enhancement Challenge. 465-471 - Yukai Ju, Shimin Zhang, Wei Rao, Yannan Wang, Tao Yu, Lei Xie, Shidong Shang:
TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement. 472-479 - Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu:
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. 480-487 - Qinghua Liu, Yating Huang, Yunzhe Hao, Jiaming Xu, Bo Xu:
LiMuSE: Lightweight Multi-Modal Speaker Extraction. 488-495 - Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian:
End-to-End Multi-Speaker ASR with Independent Vector Analysis. 496-501 - Wolfgang Mack, Emanuël A. P. Habets:
A Hybrid Acoustic Echo Reduction Approach Using Kalman Filtering and Informed Source Extraction with Improved Training. 502-508 - Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao:
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation. 509-516 - Tianyu Cao, Laureano Moro-Velázquez, Piotr Zelasko, Jesús Villalba, Najim Dehak:
Vsameter: Evaluation of a New Open-Source Tool to Measure Vowel Space Area and Related Metrics. 517-524 - Tyler Vuong, Nikhil Madaan, Rohan Panda, Richard M. Stern:
Investigating the Important Temporal Modulations for Deep-Learning-Based Speech Activity Detection. 525-531 - Anna Favaro, Chelsie Motley, Tianyu Cao, Miguel Iglesias, Ankur Butala, Esther S. Oh, Robert D. Stevens, Jesús Villalba, Najim Dehak, Laureano Moro-Velázquez:
A Multi-Modal Array of Interpretable Features to Evaluate Language and Speech Patterns in Different Neurological Disorders. 532-539 - Donghyeon Kim, Jeong-gi Kwak, Hanseok Ko:
Efficient Dynamic Filter For Robust and Low Computational Feature Extraction. 540-547 - Sung Hwan Mun, Jee-weon Jung, Min Hyun Han, Nam Soo Kim:
Frequency and Multi-Scale Selective Kernel Attention for Speaker Verification. 548-554 - Junyi Peng, Oldrich Plchot, Themos Stafylakis, Ladislav Mosner, Lukás Burget, Jan Cernocký:
An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification. 555-562 - Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan:
Flow-ER: A Flow-Based Embedding Regularization Strategy for Robust Speech Representation Learning. 563-570 - Ismail Rasim Ülgen, Levent M. Arslan:
Unsupervised Domain Adaptation of Neural PLDA Using Segment Pairs for Speaker Verification. 571-576 - Bhusan Chettri:
The Clever Hans Effect in Voice Spoofing Detection. 577-584 - Xin Wang, Junichi Yamagishi:
Investigating Active-Learning-Based Training Data Selection for Speech Spoofing Countermeasure. 585-592 - Xinyue Ma, Shanshan Zhang, Shen Huang, Ji Gao, Ying Hu, Liang He:
How to Boost Anti-Spoofing with X-Vectors. 593-598 - Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng:
A Comprehensive Study on Self-Supervised Distillation for Speaker Representation Learning. 599-604 - Jeremy Heng Meng Wong, Yifan Gong:
Joint Speaker Diarisation and Tracking in Switching State-Space Model. 605-612 - Jeremy Heng Meng Wong, Igor Abramovski, Xiong Xiao, Yifan Gong:
Diarisation Using Location Tracking with Agglomerative Clustering. 613-619 - Shota Horiguchi, Yuki Takashima, Shinji Watanabe, Paola García:
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization. 620-625 - Juan Manuel Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset:
Continual Self-Supervised Domain Adaptation for End-to-End Speaker Diarization. 626-632 - Juan Zuluaga-Gomez, Seyyed Saeed Sarfjoo, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlícek, Karel Ondrej, Oliver Ohneiser, Hartmut Helmke:
Bertraffic: Bert-Based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. 633-640 - Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini:
Low-Latency Speech Separation Guided Diarization for Telephone Conversations. 641-646 - Samantha Kotey, Rozenn Dahyot, Naomi Harte:
Fine Grained Spoken Document Summarization Through Text Segmentation. 647-654 - Jwala Dhamala, Varun Kumar, Rahul Gupta, Kai-Wei Chang, Aram Galstyan:
An Analysis of The Effects of Decoding Algorithms on Fairness in Open-Ended Language Generation. 655-662 - Lu Zeng, Sree Hari Krishnan Parthasarathi, Dilek Hakkani-Tur:
N-Best Hypotheses Reranking for Text-to-SQL Systems. 663-670 - Jia Cui, Heng Lu, Wenjie Wang, Shiyin Kang, Liqiang He, Guangzhi Li, Dong Yu:
Efficient Text Analysis with Pre-Trained Neural Network Models. 671-676 - Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang:
Four-in-One: a Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition. 677-684 - Hiroaki Sugiyama, Masahiro Mizukami, Tsunehiro Arimoto, Hiromi Narimatsu, Yuya Chiba, Hideharu Nakajima, Toyomi Meguro:
Empirical Analysis of Training Strategies of Transformer-Based Japanese Chit-Chat Systems. 685-691 - Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang:
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection. 692-699 - Leanne Nortje, Herman Kamper:
Towards Visually Prompted Keyword Localisation for Zero-Resource Spoken Languages. 700-707 - Binghuai Lin, Liyuan Wang:
Exploiting Information From Native Data for Non-Native Automatic Pronunciation Assessment. 708-714 - Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath:
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model. 715-722 - Zhengyang Li, Timo Lohrenz, Matthias Dunkelberg, Tim Fingscheidt:
Transformer-Based Lip-Reading with Regularized Dropout and Relaxed Attention. 723-730 - Kayode Olaleye, Dan Oneata, Herman Kamper:
YFACC: A Yorùbá Speech-Image Dataset for Cross-Lingual Keyword Localisation Through Visual Grounding. 731-738 - Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato:
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis. 739-746 - Muhammad Huzaifah, Ivan Kukanov:
An Analysis of Semantically-Aligned Speech-Text Embeddings. 747-754 - Brady Houston, Katrin Kirchhoff:
Exploration of Language-Specific Self-Attention Parameters for Multilingual End-to-End Speech Recognition. 755-762 - Shelly Jain, Aditya Yadavalli, Ganesh Mirishkar, Anil Kumar Vuppala:
How do Phonological Properties Affect Bilingual Automatic Speech Recognition? 763-770 - Ke Hu, Bo Li, Tara N. Sainath:
Scaling Up Deliberation For Multilingual ASR. 771-776 - Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur:
Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition. 777-784 - Joshua Jansen van Vüren, Thomas Niesler:
Code-Switched Language Modelling Using a Code Predictive LSTM in Under-Resourced South African Languages. 785-791 - Le Minh Nguyen, Shekhar Nayak, Matt Coler:
Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations. 792-797 - Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna:
FLEURS: Few-Shot Learning Evaluation of Universal Representations of Speech. 798-805 - Zihan Wang, Qi Meng, HaiFeng Lan, XinRui Zhang, KeHao Guo, Akshat Gupta:
Multilingual Speech Emotion Recognition with Multi-Gating Mechanism and Neural Architecture Search. 806-813 - Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng:
Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using ß-VAE. 814-821 - Chia-Yu Li, Ngoc Thang Vu:
Improving Semi-Supervised End-To-End Automatic Speech Recognition Using Cyclegan and Inter-Domain Losses. 822-829 - C. S. Anoop
, A. G. Ramakrishnan
:
Exploring a Unified ASR for Multiple South Indian Languages Leveraging Multilingual Acoustic and Language Models. 830-837 - Sepand Mavandadi, Bo Li, Chao Zhang, Brian Farris, Tara N. Sainath, Trevor Strohman:
A Truly Multilingual First Pass and Monolingual Second Pass Streaming on-Device ASR System. 838-845