


IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 29
Volume 29, 2021
- Bijue Jia, Jiancheng Lv, Xi Peng, Yao Chen, Shenglan Yang: Hierarchical Regulated Iterative Network for Joint Task of Music Detection and Music Relative Loudness Estimation. 1-13
- Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy: Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings. 14-27
- Midia Yousefi, John H. L. Hansen: Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection. 28-40
- Jiaming Cheng, Ruiyu Liang, Zhenlin Liang, Li Zhao, Chengwei Huang, Björn W. Schuller: A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy. 41-53
- Franz Anders, Mario Hlawitschka, Mirco Fuchs: Comparison of Artificial Neural Network Types for Infant Vocalization Classification. 54-67
- Tomohiko Nakamura, Hirokazu Kameoka: Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds. 68-82
- Jens Ahrens, Stefan Bilbao: Computation of Spherical Harmonic Representations of Source Directivity Based on the Finite-Distance Signature. 83-92
- Shun-Po Chuang, Alexander H. Liu, Tzu-Wei Sung, Hung-yi Lee: Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction. 93-105
- Li Chai, Jun Du, Qing-Feng Liu, Chin-Hui Lee: A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement. 106-117
- De Hu, Zhe Chen, Fuliang Yin: Passive Geometry Calibration for Microphone Arrays Based on Distributed Damped Newton Optimization. 118-131
- Berrak Sisman, Junichi Yamagishi, Simon King, Haizhou Li: An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning. 132-157
- Jilu Jin, Gongping Huang, Xuehan Wang, Jingdong Chen, Jacob Benesty, Israel Cohen: Steering Study of Linear Differential Microphone Arrays. 158-170
- Ching Hua Lee, Bhaskar D. Rao, Harinath Garudadri: Proportionate Adaptive Filtering Algorithms Derived Using an Iterative Reweighting Framework. 171-186
- Shakeel Ahmed, Muhammad Tufail, Muhammad Rehan, Tanveer Abbas, Amna Majid: A Novel Approach for Improved Noise Reduction Performance in Feed-Forward Active Noise Control Systems With (Loudspeaker) Saturation Non-Linearity in the Secondary Path. 187-197
- Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen: Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition. 198-209
- Amin Edraki, Wai-Yip Chan, Jesper Jensen, Daniel Fogerty: Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis. 210-225
- Phan Le Son: On the Design of Sparse Arrays With Frequency-Invariant Beam Pattern. 226-238
- Dylan Menzies, Philip Coleman, Filippo Maria Fazi: A Room Compensation Method by Modification of Reverberant Audio Objects. 239-252
- Yonggang Hu, Thushara D. Abhayapala, Prasanga N. Samarasinghe: Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC. 253-264
- Alan Kan, Qinglin Meng: The Temporal Limits Encoder as a Sound Coding Strategy for Bilateral Cochlear Implants. 265-273
- Rui Liu, Berrak Sisman, Feilong Bao, Jichen Yang, Guanglai Gao, Haizhou Li: Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis. 274-285
- Fei Ma, Thushara D. Abhayapala, Wen Zhang: Multiple Circular Arrays of Vector Sensors for Real-Time Sound Field Analysis. 286-299
- David Diaz-Guerra, Antonio Miguel, José Ramón Beltrán: Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks. 300-311
- Viet Anh Trinh, Michael I. Mandel: Directly Comparing the Listening Strategies of Humans and Machines. 312-323
- Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas: Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection. 324-333
- Jielong Yang, Xionghu Zhong, Weiguang Chen, Wenwu Wang: Multiple Acoustic Source Localization in Microphone Array Networks. 334-347
- Bin Wu, Sakriani Sakti, Jinsong Zhang, Satoshi Nakamura: Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load. 348-362
- Taewoong Lee, Liming Shi, Jesper Kjær Nielsen, Mads Græsbøll Christensen: Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain. 363-378
- Maoshen Jia, Yuxuan Wu, Changchun Bao, Christian H. Ritz: Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points. 379-392
- Wei Xue, Alastair H. Moore, Mike Brookes, Patrick A. Naylor: Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering. 393-405
- Wei Song, Jingjin Guo, Ruiji Fu, Ting Liu, Lizhen Liu: A Knowledge Graph Embedding Approach for Metaphor Processing. 406-420
- Longbiao Cheng, Xingwei Sun, Dingding Yao, Junfeng Li, Yonghong Yan: Estimation Reliability Function Assisted Sound Source Localization With Enhanced Steering Vector Phase Difference. 421-435
- Wangyang Yu, W. Bastiaan Kleijn: Room Acoustical Parameter Estimation From Room Impulse Responses Using Deep Neural Networks. 436-447
- Miguel Ferrer, Maria de Diego, Gema Piñero, Alberto González: Affine Projection Algorithm Over Acoustic Sensor Networks for Active Noise Control. 448-461
- Nico Gößling, Daniel Marquardt, Simon Doclo: Performance Analysis of the Extended Binaural MVDR Beamformer With Partial Noise Estimation. 462-476
- Gábor Gosztolya, Róbert Busa-Fekete: Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy. 477-488
- Alfred Mertins, Marco Maaß, Fabrice Katzberg: Room Impulse Response Reshaping and Crosstalk Cancellation Using Convex Optimization. 489-502
- Xuefeng Bai, Pengbo Liu, Yue Zhang: Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network. 503-514
- Bengt J. Borgström, Michael S. Brandstein: Speech Enhancement via Attention Masking Network (SEAMNET): An End-to-End System for Joint Suppression of Noise and Reverberation. 515-526
- Juan Manuel Miramont, Marcelo Alejandro Colominas, Gastón Schlotthauer: Voice Jitter Estimation Using High-Order Synchrosqueezing Operators. 527-536
- Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong: Speaker Separation Using Speaker Inventories and Estimated Speech. 537-546
- Sandro Cumani: On the Distribution of Speaker Verification Scores: Generative Models for Unsupervised Calibration. 547-562
- Yu-Ren Chien, Jón Guðnason: Acoustic Measure of Vocal Strain Based on Glottal Airflow Periodicity. 563-574
- Xingfa Shen, Xingkun Shao, Quanbo Ge, Lili Liu: RARS: Recognition of Audio Recording Source Based on Residual Neural Network. 575-584
- Gang Chen, Yang Liu, Huanbo Luan, Meng Zhang, Qun Liu, Maosong Sun: Learning to Generate Explainable Plots for Neural Story Generation. 585-593
- Wenxing Yang, Jacob Benesty, Gongping Huang, Jingdong Chen: A New Class of Differential Beamformers. 594-606
- Yuki Mitsufuji, Norihiro Takamune, Shoichi Koyama, Hiroshi Saruwatari: Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain. 607-617
- Dörte Fischer, Simon Doclo: Robust Constrained MFMVDR Filters for Single-Channel Speech Enhancement Based on Spherical Uncertainty Set. 618-631
- Xudong Zhao, Jacob Benesty, Jingdong Chen, Gongping Huang: Differential Beamforming From the Beampattern Factorization Perspective. 632-643
- Yuki Kawara, Chenhui Chu, Yuki Arase: Preordering Encoding on Transformer for Translation. 644-655
- Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda: Many-to-Many Voice Transformer Network. 656-670
- Jie Zhang, Huawei Chen, Li-Rong Dai, Richard Christian Hendriks: A Study on Reference Microphone Selection for Multi-Microphone Speech Enhancement. 671-683
- Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen: Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019. 684-698
- Markus Niermann, Peter Vary: Listening Enhancement in Noisy Environments: Solutions in Time and Frequency Domain. 699-709
- Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim: Gated Recurrent Context: Softmax-Free Attention for Online Encoder-Decoder Speech Recognition. 710-719
- Elizabeth Vargas, James R. Hopgood, Keith E. Brown, Kartic Subr: On Improved Training of CNN for Acoustic Source Localisation. 720-732
- Yunqi Cai, Lantian Li, Andrew Abel, Xiaoyan Zhu, Dong Wang: Deep Normalization for Speaker Vectors. 733-744
- Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda: Pretraining Techniques for Sequence-to-Sequence Voice Conversion. 745-755
- Arindam Jati, Amrutha Nadarajan, Raghuveer Peri, Karel Mundnich, Tiantian Feng, Benjamin Girault, Shrikanth Narayanan: Temporal Dynamics of Workplace Acoustic Scenes: Egocentric Analysis and Prediction. 756-769
- Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, Tiejun Zhao: Modeling Future Cost for Neural Machine Translation. 770-781
- Kashif Munir, Hai Zhao, Zuchao Li: Adaptive Convolution for Semantic Role Labeling. 782-791
- Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda: Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. 792-806
- Weitao Yuan, Bofei Dong, Shengbei Wang, Masashi Unoki, Wenwu Wang: Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation. 807-822
- Liming Shi, Taewoong Lee, Lijun Zhang, Jesper Kjær Nielsen, Mads Græsbøll Christensen: Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method. 823-837
- Xi Chen, Jacob Benesty, Gongping Huang, Jingdong Chen: On the Robustness of the Superdirective Beamformer. 838-849
- Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg: Generating Images From Spoken Descriptions. 850-865
- Vevake Balaraman, Bernardo Magnini: Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems. 866-873
- Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Helen Meng: Exemplar-Based Emotive Speech Synthesis. 874-886
- Heinrich Dinkel, Mengyue Wu, Kai Yu: Towards Duration Robust Weakly Supervised Sound Event Detection. 887-900
- Zamir Ben-Hur, David Lou Alon, Ravish Mehra, Boaz Rafaely: Binaural Reproduction Based on Bilateral Ambisonics and Ear-Aligned HRTFs. 901-913
- Philipp Aichinger, Franz Pernkopf: Synthesis and Analysis-By-Synthesis of Modulated Diplophonic Glottal Area Waveforms. 914-926
- Finnian Kelly, John H. L. Hansen: Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition. 927-942
- Matthias Müller, Thilo Schulz, Tatiana Ermakova, Philipp P. Caffier: Lyric or Dramatic - Vibrato Analysis for Voice Type Classification in Professional Opera Singers. 943-955
- Demóstenes Z. Rodríguez, Dick Carrillo, Miguel Arjona Ramírez, Pedro H. J. Nardelli, Sebastian Möller: Incorporating Wireless Communication Parameters Into the E-Model Algorithm. 956-968
- Tianrui Zong, Yong Xiang, Iynkaran Natgunanathan, Longxiang Gao, Guang Hua, Wanlei Zhou: Non-Linear-Echo Based Anti-Collusion Mechanism for Audio Signals. 969-984
- Zheng Lian, Bin Liu, Jianhua Tao: CTNet: Conversational Transformer Network for Emotion Recognition. 985-1000
- Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Yang Liu: Neural Machine Translation With Explicit Phrase Alignment. 1001-1010
- Maria Vukovic, Melissa N. Stolar, Margaret Lech: Cognitive Load Estimation From Speech Commands to Simulated Aircraft. 1011-1022
- De Hu, Zhe Chen, Fuliang Yin: Geometry Calibration for Acoustic Transceiver Networks Based on Network Newton Distributed Optimization. 1023-1032
- Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari: Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling. 1033-1048
- Tadashi Sakata, Naomitsu Ikeda, Yuichi Ueda, Akira Watanabe: Vocal Tract Length Estimation Using Accumulated Means of Formants and Its Effects on Speaker-Normalization. 1049-1064
- Jichen Yang, Hongji Wang, Rohan Kumar Das, Yanmin Qian: Modified Magnitude-Phase Spectrum Information for Spoofing Detection. 1065-1078
- Yanmin Qian, Zhengyang Chen, Shuai Wang: Audio-Visual Deep Neural Network for Robust Person Verification. 1079-1092
- Peiqin Lin, Meng Yang, Jianhuang Lai: Deep Selective Memory Network With Selective Attention and Inter-Aspect Modeling for Aspect Level Sentiment Classification. 1093-1106
- Herman Kamper, Yevgen Matusevych, Sharon Goldwater: Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer. 1107-1118
- Weiqing Wang, Jin Pan, Hua Yi, Zhanmei Song, Ming Li: Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism. 1119-1133
- Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda: Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. 1134-1148
- Vesa Välimäki, Karolina Prawda: Late-Reverberation Synthesis Using Interleaved Velvet-Noise Sequences. 1149-1160
- Zhuosheng Zhang, Junlong Li, Hai Zhao: Multi-Turn Dialogue Reading Comprehension With Pivot Turns and Knowledge. 1161-1173
- Clément Gaultier, Srdan Kitic, Rémi Gribonval, Nancy Bertin: Sparsity-Based Audio Declipping Methods: Selected Overview, New Algorithms, and Large-Scale Evaluation. 1174-1187
- Lachlan Birnie, Thushara D. Abhayapala, Vladimir Tourbabin, Prasanga N. Samarasinghe: Mixed Source Sound Field Translation for Virtual Binaural Application With Perceptual Validation. 1188-1203
- Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan: Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization. 1204-1219
- Jie Zhang, Jun Du, Li-Rong Dai: Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers. 1220-1232
- Huang Xie, Tuomas Virtanen: Zero-Shot Audio Classification Via Semantic Embeddings. 1233-1242
- Xianhong Chen, Changchun Bao: Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification. 1243-1255
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen: Optimal Output-Constrained Active Noise Control Based on Inverse Adaptive Modeling Leak Factor Estimate. 1256-1269
- Ashutosh Pandey, DeLiang Wang: Dense CNN With Self-Attention for Time-Domain Speech Enhancement. 1270-1279
- Libo Qin, Wanxiang Che, Minheng Ni, Yangming Li, Ting Liu: Knowing Where to Leverage: Context-Aware Graph Convolutional Network With an Adaptive Fusion Layer for Contextual Spoken Language Understanding. 1280-1289
- Mingyang Zhang, Yi Zhou, Li Zhao, Haizhou Li: Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data. 1290-1302
- Weipeng He, Petr Motlícek, Jean-Marc Odobez: Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation. 1303-1317
- Yile Wang, Leyang Cui, Yue Zhang: Improving Skip-Gram Embeddings Using BERT. 1318-1328
- Linzhi Wu, Meishan Zhang: Deep Graph-Based Character-Level Chinese Dependency Parsing. 1329-1339
- Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang: Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data. 1340-1351
- Byung Joon Cho, Hyung-Min Park: Convolutional Maximum-Likelihood Distortionless Response Beamforming With Steering Vector Estimation for Robust Speech Recognition. 1352-1367
- Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen: An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation. 1368-1396
- Gal Itzhak, Jacob Benesty, Israel Cohen: On the Design of Differential Kronecker Product Beamformers. 1397-1410
- Zhongshu Ge, Liang Li, Tianshu Qu: Partially Matching Projection Decoding Method Evaluation Under Different Playback Conditions. 1411-1423
- Sijie Mai, Songlong Xing, Haifeng Hu: Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network. 1424-1437
- Tao Qian, Meishan Zhang, Yinxia Lou, Daiwen Hua: A Joint Model for Named Entity Recognition With Sentence-Level Entity Type Attentions. 1438-1448
- Ryotaro Sato, Kenta Niwa, Kazunori Kobayashi: Ambisonic Signal Processing DNNs Guaranteeing Rotation, Scale and Time Translation Equivariance. 1449-1462
- Sooyeon Park, Jung-Woo Choi: Iterative Echo Labeling Algorithm With Convex Hull Expansion for Room Geometry Estimation. 1463-1478
- Aidan O. T. Hogg, Christine Evers, Alastair H. Moore, Patrick A. Naylor: Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking of Fundamental Frequency. 1479-1490
- Rajib Sharma, Israel Cohen, Baruch Ber