


Yanmin Qian
2020 – today
- 2024
- [j40]Shuai Wang, Zhengyang Chen, Bing Han, Hongji Wang, Chengdong Liang, Binbin Zhang, Xu Xiang, Wen Ding, Johan Rohdin, Anna Silnova, Yanmin Qian, Haizhou Li:
Advancing speaker embedding learning: Wespeaker toolkit for research and production. Speech Commun. 162: 103104 (2024) - [j39]Xuankai Chang, Shinji Watanabe, Marc Delcroix, Tsubasa Ochiai, Wangyou Zhang, Yanmin Qian:
Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing]. IEEE Signal Process. Mag. 41(6): 39-50 (2024) - [j38]Bing Han, Zhengyang Chen, Yanmin Qian:
Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification. IEEE ACM Trans. Audio Speech Lang. Process. 32: 529-541 (2024) - [j37]Wei Wang, Yanmin Qian:
Universal Cross-Lingual Data Generation for Low Resource ASR. IEEE ACM Trans. Audio Speech Lang. Process. 32: 973-983 (2024) - [j36]Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian:
Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1636-1649 (2024) - [j35]Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian:
Advanced Long-Content Speech Recognition With Factorized Neural Transducer. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1803-1815 (2024) - [j34]Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian:
Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1941-1953 (2024) - [j33]Bei Liu, Haoyu Wang, Yanmin Qian:
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3771-3784 (2024) - [j32]Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li:
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4971-4998 (2024) - [c186]Bing Han, Zhiqiang Lv, Anbai Jiang, Wen Huang, Zhengyang Chen, Yufeng Deng, Jiawei Ding, Cheng Lu, Wei-Qiang Zhang, Pingyi Fan, Jia Liu, Yanmin Qian:
Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection. ICASSP 2024: 1326-1330 - [c185]Wangyou Zhang, Jee-weon Jung, Yanmin Qian:
Improving Design of Input Condition Invariant Speech Enhancement. ICASSP 2024: 10696-10700 - [c184]Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li:
Leveraging in-the-wild Data for Effective Self-supervised Pretraining in Speaker Recognition. ICASSP 2024: 10901-10905 - [c183]Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li:
Prompt-Driven Target Speech Diarization. ICASSP 2024: 11086-11090 - [c182]Hang Shao, Bei Liu, Yanmin Qian:
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models. ICASSP 2024: 11296-11300 - [c181]Wen Huang, Bing Han, Shuai Wang, Zhengyang Chen, Yanmin Qian:
Robust Cross-Domain Speaker Verification with Multi-Level Domain Adapters. ICASSP 2024: 11781-11785 - [c180]Linfeng Yu, Wangyou Zhang, Chenpeng Du, Leying Zhang, Zheng Liang, Yanmin Qian:
Generation-Based Target Speech Extraction with Speech Discretization and Vocoder. ICASSP 2024: 12612-12616 - [c179]Wen Huang, Anbai Jiang, Bing Han, Xinhu Zheng, Yihong Qiu, Wenxi Chen, Yuzhe Liang, Pingyi Fan, Wei-Qiang Zhang, Cheng Lu, Xie Chen, Jia Liu, Yanmin Qian:
Semi-Supervised Acoustic Scene Classification with Test-Time Adaptation. ICME Workshops 2024: 1-5 - [c178]Yuzhe Liang, Wenxi Chen, Anbai Jiang, Yihong Qiu, Xinhu Zheng, Wen Huang, Bing Han, Yanmin Qian, Pingyi Fan, Wei-Qiang Zhang, L. Cheng, Jia Liu, Xie Chen:
Improving Acoustic Scene Classification via Self-Supervised and Semi-Supervised Learning with Efficient Audio Transformer. ICME Workshops 2024: 1-6 - [c177]Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song:
InstructME: An Instruction Guided Music Edit Framework with Latent Diffusion Models. IJCAI 2024: 5835-5843 - [c176]Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Michael Zeng:
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation. NeurIPS 2024 - [c175]Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng:
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations. NeurIPS 2024 - [i83]Wangyou Zhang, Jee-weon Jung, Shinji Watanabe, Yanmin Qian:
Improving Design of Input Condition Invariant Speech Enhancement. CoRR abs/2401.14271 (2024) - [i82]Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian:
Advanced Long-Content Speech Recognition With Factorized Neural Transducer. CoRR abs/2403.13423 (2024) - [i81]Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng:
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations. CoRR abs/2404.06690 (2024) - [i80]Bo Chen, Shoukang Hu, Qi Chen, Chenpeng Du, Ran Yi, Yanmin Qian, Xie Chen:
GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting. CoRR abs/2404.19040 (2024) - [i79]Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian:
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs. CoRR abs/2405.17233 (2024) - [i78]Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng:
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation. CoRR abs/2405.17809 (2024) - [i77]Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian:
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement. CoRR abs/2406.04269 (2024) - [i76]Wangyou Zhang, Robin Scheibler, Kohei Saijo, Samuele Cornell, Chenda Li, Zhaoheng Ni, Anurag Kumar, Jan Pirklbauer, Marvin Sach, Shinji Watanabe, Tim Fingscheidt, Yanmin Qian:
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. CoRR abs/2406.04660 (2024) - [i75]Bei Liu, Haoyu Wang, Yanmin Qian:
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization. CoRR abs/2406.05359 (2024) - [i74]Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li:
Target Speech Diarization with Multimodal Prompts. CoRR abs/2406.07198 (2024) - [i73]Zhengyang Chen, Xuechen Liu, Erica Cooper, Junichi Yamagishi, Yanmin Qian:
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems. CoRR abs/2406.08812 (2024) - [i72]Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan:
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection. CoRR abs/2406.11364 (2024) - [i71]Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian:
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement. CoRR abs/2406.13471 (2024) - [i70]Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li:
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning. CoRR abs/2407.15188 (2024) - [i69]Zhengyang Chen, Bing Han, Shuai Wang, Yidi Jiang, Yanmin Qian:
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching. CoRR abs/2409.04859 (2024) - [i68]Zhengyang Chen, Shuai Wang, Mingyang Zhang, Xuechen Liu, Junichi Yamagishi, Yanmin Qian:
Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion. CoRR abs/2409.05004 (2024) - [i67]Xinhu Zheng, Anbai Jiang, Bing Han, Yanmin Qian, Pingyi Fan, Jia Liu, Wei-Qiang Zhang:
Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models. CoRR abs/2409.07016 (2024) - [i66]Shuai Wang, Ke Zhang, Shaoxiong Lin, Junjie Li, Xuefei Wang, Meng Ge, Jianwei Yu, Yanmin Qian, Haizhou Li:
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction. CoRR abs/2409.15799 (2024) - [i65]Wen Huang, Bing Han, Zhengyang Chen, Shuai Wang, Yanmin Qian:
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification. CoRR abs/2410.17033 (2024) - [i64]Bing Han, Wen Huang, Zhengyang Chen, Anbai Jiang, Pingyi Fan, Cheng Lu, Zhiqiang Lv, Jia Liu, Wei-Qiang Zhang, Yanmin Qian:
Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning. CoRR abs/2410.20775 (2024) - [i63]Bei Liu, Yanmin Qian:
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification. CoRR abs/2412.01195 (2024) - [i62]Leying Zhang, Wangyou Zhang, Chenda Li, Yanmin Qian:
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling. CoRR abs/2412.14890 (2024)
- 2023
- [j31]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing. J. Open Source Softw. 8(91): 5403 (2023) - [j30]Bei Liu, Zhengyang Chen, Yanmin Qian:
Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification. IEEE ACM Trans. Audio Speech Lang. Process. 31: 1825-1838 (2023) - [c174]Chang Chen, Xun Gong, Yanmin Qian:
Efficient Text-Only Domain Adaptation For CTC-Based ASR. ASRU 2023: 1-7 - [c173]Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. ASRU 2023: 1-8 - [c172]Shaoxiong Lin, Chao Zhang, Yanmin Qian:
Improving Speech Enhancement Using Audio Tagging Knowledge From Pre-Trained Representations and Multi-Task Learning. ASRU 2023: 1-7 - [c171]Dongning Yang, Wei Wang, Yanmin Qian:
FAT-HuBERT: Front-End Adaptive Training of Hidden-Unit BERT For Distortion-Invariant Robust Speech Recognition. ASRU 2023: 1-8 - [c170]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement For Diverse Input Conditions. ASRU 2023: 1-6 - [c169]Wangyou Zhang, Lei Yang, Yanmin Qian:
Exploring Time-Frequency Domain Target Speaker Extraction For Causal and Non-Causal Processing. ASRU 2023: 1-6 - [c168]Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian:
LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer. ICASSP 2023: 1-5 - [c167]Xun Gong, Wei Wang, Hang Shao, Xie Chen, Yanmin Qian:
Factorized AED: Factorized Attention-Based Encoder-Decoder for Text-Only Domain Adaptive ASR. ICASSP 2023: 1-5 - [c166]Bing Han, Zhengyang Chen, Yanmin Qian:
Exploring Binary Classification Loss for Speaker Verification. ICASSP 2023: 1-5 - [c165]Bing Han, Wen Huang, Zhengyang Chen, Yanmin Qian:
Improving Dino-Based Self-Supervised Speaker Verification with Progressive Cluster-Aware Training. ICASSP Workshops 2023: 1-5 - [c164]Jiahong Li, Chenda Li, Yifei Wu, Yanmin Qian:
Robust Audio-Visual ASR with Unified Cross-Modal Attention. ICASSP 2023: 1-5 - [c163]Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng:
Target Sound Extraction with Variable Cross-Modality Clues. ICASSP 2023: 1-5 - [c162]Chenda Li, Yifei Wu, Yanmin Qian:
Predictive Skim: Contrastive Predictive Coding for Low-Latency Online Speech Separation. ICASSP 2023: 1-5 - [c161]Tao Liu, Zhengyang Chen, Yanmin Qian, Kai Yu:
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge. ICASSP 2023: 1-2 - [c160]Hang Shao, Tian Tan, Wei Wang, Xun Gong, Yanmin Qian:
Joint Discriminator and Transfer Based Fast Domain Adaptation For End-To-End Speech Recognition. ICASSP 2023: 1-5 - [c159]Haoyu Wang, Bei Liu, Yifei Wu, Zhengyang Chen, Yanmin Qian:
Lowbit Neural Network Quantization for Speaker Verification. ICASSP Workshops 2023: 1-5 - [c158]Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian:
Wespeaker: A Research and Production Oriented Speaker Embedding Learning Toolkit. ICASSP 2023: 1-5 - [c157]Wei Wang, Yanmin Qian:
HuBERT-AGG: Aggregated Representation Distillation of Hidden-Unit Bert for Robust Speech Recognition. ICASSP 2023: 1-5 - [c156]Yifei Wu, Chenda Li, Yanmin Qian:
Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation. ICASSP Workshops 2023: 1-5 - [c155]Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng:
Code-Switching Text Generation and Injection in Mandarin-English ASR. ICASSP 2023: 1-5 - [c154]Leying Zhang, Zhengyang Chen, Yanmin Qian:
Adaptive Large Margin Fine-Tuning For Robust Speaker Verification. ICASSP 2023: 1-5 - [c153]Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. INTERSPEECH 2023: 1314-1318 - [c152]Bei Liu, Haoyu Wang, Yanmin Qian:
Extremely Low Bit Quantization for Mobile Speaker Verification Systems Under 1MB Memory. INTERSPEECH 2023: 1973-1977 - [c151]Zhilong Zhang, Wei Wang, Yanmin Qian:
Fast and Efficient Multilingual Self-Supervised Pre-training for Low-Resource Speech Recognition. INTERSPEECH 2023: 2248-2252 - [c150]Wei Wang, Yanmin Qian:
UniSplice: Universal Cross-Lingual Data Splicing for Low-Resource ASR. INTERSPEECH 2023: 2253-2257 - [c149]Bei Liu, Yanmin Qian:
Reversible Neural Networks for Memory-Efficient Speaker Verification. INTERSPEECH 2023: 3127-3131 - [c148]Bei Liu, Yanmin Qian:
ECAPA++: Fine-grained Deep Embedding Learning for TDNN Based Speaker Verification. INTERSPEECH 2023: 3132-3136 - [c147]Zhengyang Chen, Bing Han, Xu Xiang, Houjun Huang, Bei Liu, Yanmin Qian:
Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022. INTERSPEECH 2023: 3202-3206 - [c146]Wei Wang, Xun Gong, Hang Shao, Dongning Yang, Yanmin Qian:
Text Only Domain Adaptation with Phoneme Guided Data Splicing for End-to-End Speech Recognition. INTERSPEECH 2023: 3347-3351 - [c145]Linfeng Yu, Wangyou Zhang, Chenda Li, Yanmin Qian:
Overlap Aware Continuous Speech Separation without Permutation Invariant Training. INTERSPEECH 2023: 3512-3516 - [c144]Wangyou Zhang, Yanmin Qian:
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition. INTERSPEECH 2023: 3517-3521 - [c143]Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian:
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor. INTERSPEECH 2023: 3552-3556 - [c142]Haoyu Wang, Bei Liu, Yifei Wu, Yanmin Qian:
Adaptive Neural Network Quantization For Lightweight Speaker Verification. INTERSPEECH 2023: 5331-5335 - [c141]Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng, Xuedong Huang:
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation. NeurIPS 2023 - [c140]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. WASPAA 2023: 1-5 - [d1]Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe:
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310). Zenodo, 2023 - [i61]Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng:
Target Sound Extraction with Variable Cross-modality Clues. CoRR abs/2303.08372 (2023) - [i60]Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng:
Code-Switching Text Generation and Injection in Mandarin-English ASR. CoRR abs/2303.10949 (2023) - [i59]Bing Han, Zhengyang Chen, Yanmin Qian:
Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification. CoRR abs/2304.05754 (2023) - [i58]Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian:
Attention-based Encoder-Decoder Network for End-to-End Neural Speaker Diarization with Target Speaker Attractor. CoRR abs/2305.10704 (2023) - [i57]Hang Shao, Wei Wang, Bei Liu, Xun Gong, Haoyu Wang, Yanmin Qian:
Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR. CoRR abs/2305.10788 (2023) - [i56]Wangyou Zhang, Yanmin Qian:
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition. CoRR abs/2305.16286 (2023) - [i55]Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. CoRR abs/2305.18747 (2023) - [i54]Bing Han, Zhengyang Chen, Yanmin Qian:
Exploring Binary Classification Loss For Speaker Verification. CoRR abs/2307.08205 (2023) - [i53]Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe:
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation. CoRR abs/2307.12231 (2023) - [i52]Bing Han, Junyu Dai, Xuchen Song, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian:
InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models. CoRR abs/2308.14360 (2023) - [i51]Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian:
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer. CoRR abs/2309.06672 (2023) - [i50]Junyi Ao, Mehmet Sinan Yildirim, Meng Ge, Shuai Wang, Ruijie Tao, Yanmin Qian, Liqun Deng, Longshuai Xiao, Haizhou Li:
USED: Universal Speaker Extraction and Diarization. CoRR abs/2309.10674 (2023) - [i49]Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li:
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition. CoRR abs/2309.11730 (2023) - [i48]Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu:
The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR. CoRR abs/2309.13573 (2023) - [i47]Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng:
Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction. CoRR abs/2309.13874 (2023) - [i46]Wangyou Zhang, Kohei Saijo, Zhong-Qiu Wang, Shinji Watanabe, Yanmin Qian:
Toward Universal Speech Enhancement for Diverse Input Conditions. CoRR abs/2309.17384 (2023) - [i45]Hang Shao, Bei Liu, Yanmin Qian:
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models. CoRR abs/2310.09499 (2023) - [i44]Dongning Yang, Wei Wang, Yanmin Qian:
FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition. CoRR abs/2311.17790 (2023)
- 2022
- [j29]Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1505-1518 (2022) - [j28]Yanmin Qian, Zhikai Zhou:
Optimizing Data Usage for Low-Resource Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 30: 394-403 (2022) - [j27]Chenda Li, Zhuo Chen, Yanmin Qian:
Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1508-1520 (2022) - [j26]Yanmin Qian, Xun Gong, Houjun Huang:
Layer-Wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 30: 2842-2853 (2022) - [j25]Wangyou Zhang, Xuankai Chang, Christoph Böddeker, Tomohiro Nakatani, Shinji Watanabe, Yanmin Qian:
End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party. IEEE ACM Trans. Audio Speech Lang. Process. 30: 3173-3188 (2022) - [c139]Yifei Wu, Chenda Li, Jinfeng Bai, Zhongqin Wu, Yanmin Qian:
Time-Domain Audio-Visual Speech Separation on Low Quality Videos. ICASSP 2022: 256-260 - [c138]Chenda Li, Lei Yang, Weiqin Wang, Yanmin Qian:
Skim: Skipping Memory Lstm for Low-Latency Real-Time Continuous Speech Separation. ICASSP 2022: 681-685 - [c137]Zhengyang Chen, Sanyuan Chen, Yu Wu, Yao Qian, Chengyi Wang, Shujie Liu, Yanmin Qian, Michael Zeng:
Large-Scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification. ICASSP 2022: 6147-6151 - [c136]Bing Han, Zhengyang Chen, Yanmin Qian:
Local Information Modeling with Self-Attention for Speaker Verification. ICASSP 2022: 6727-6731 - [c135]Zhikai Zhou, Tian Tan, Yanmin Qian:
Punctuation Prediction for Streaming On-Device Speech Recognition. ICASSP 2022: 7277-7281 - [c134]Bing Han, Zhengyang Chen, Bei Liu, Yanmin Qian:
MLP-SVNET: A Multi-Layer Perceptrons Based Network for Speaker Verification. ICASSP 2022: 7522-7526 - [c133]Bei Liu, Haoyu Wang,