


INTERSPEECH 2015: Dresden, Germany
- 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany, September 6-10, 2015. ISCA 2015
Keynotes
- Mary E. Beckman:
The emergence of compositional structure in language evolution and development. - Ruhi Sarikaya:
The technology powering personal digital assistants. - Katrin Amunts:
The HBP-atlas - concept, perspectives, and application for language and speech research. - Klaus R. Scherer:
Voices of power, passion, and personality.
Feature Extraction and Modeling with Neural Networks
- Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin W. Wilson, Oriol Vinyals:
Learning the speech front-end with raw waveform CLDNNs. 1-5 - Mayank Bhargava, Richard Rose:
Architectures for deep neural network based acoustic models defined over windowed speech waveforms. 6-10 - Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert:
Analysis of CNN-based speech recognition system using raw speech as input. 11-15 - Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta:
Bilinear map of filter-bank outputs for DNN-based speech recognition. 16-20 - Payton Lin, Dau-Cheng Lyu, Yun-Fan Chang, Yu Tsao:
Speech recognition with temporal neural networks. 21-25 - Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney:
Convolutional neural networks for acoustic modeling of raw time signal in LVCSR. 26-30
Prosody 1-3
- Ulrike Glavitsch, Lei He, Volker Dellwo:
Stable and unstable intervals as a basic segmentation procedure of the speech signal. 31-35 - Andreas Windmann, Juraj Simko, Petra Wagner:
Polysyllabic shortening and word-final lengthening in English. 36-40 - Anders Eriksson, Mattias Heldner:
The acoustics of word stress in English as a function of stress level and speaking style. 41-45 - Katharina Zahner, Muna Pohl, Bettina Braun:
Pitch accent distribution in German infant-directed speech. 46-50 - Hansjörg Mixdorff, Christian G. Cossio Mercado, Angelika Hönemann, Jorge A. Gurlekian, Diego A. Evin, Humberto M. Torres:
Acoustic correlates of perceived syllable prominence in German. 51-55 - Simone Simonetti, Jeesun Kim, Chris Davis:
Cross-modality matching of linguistic and emotional prosody. 56-59
Speech Intelligibility Enhancement
- Tudor-Catalin Zorila, Yannis Stylianou:
A fast algorithm for improved intelligibility of speech-in-noise based on frequency and time domain energy reallocation. 60-64 - Maria Koutsogiannaki, Petko Nikolov Petkov, Yannis Stylianou:
Intelligibility enhancement of casual speech for reverberant environments inspired by clear speech properties. 65-69 - Amira Ben Jemaa, N. Mechergui, G. Courtois, A. Mudry, Sonia Djaziri Larbi, Monia Turki, Hervé Lissek, Meriem Jaïdane:
Intelligibility enhancement of vocal announcements for public address systems: a design for all through a presbycusis pre-compensation filter. 70-74 - Henning F. Schepker, David Hülsmeier, Jan Rennies, Simon Doclo:
Model-based integration of reverberation for noise-adaptive near-end listening enhancement. 75-79 - Sebastian Rottschäfer, Hendrik Buschmeier, Herwin van Welbergen, Stefan Kopp:
Online Lombard adaptation in incremental speech synthesis. 80-84 - Emma Jokinen, Ulpu Remes, Paavo Alku:
Comparison of Gaussian process regression and Gaussian mixture models in spectral tilt modelling for intelligibility enhancement of telephone speech. 85-89
Detecting and Predicting Mental and Social Disorders
- Naveen Kumar, Shrikanth S. Narayanan:
A discriminative reliability-aware classification model with applications to intelligibility classification in pathological speech. 90-94 - Juan Rafael Orozco-Arroyave, Florian Hönig, Julián D. Arias-Londoño, Jesús Francisco Vargas-Bonilla, Sabine Skodda, Jan Rusz, Elmar Nöth:
Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson's disease. 95-99 - Tatiana Villa-Cañas, Julián D. Arias-Londoño, Juan Rafael Orozco-Arroyave, Jesús Francisco Vargas-Bonilla, Elmar Nöth:
Low-frequency components analysis in running speech for the automatic detection of Parkinson's disease. 100-104 - Juan Camilo Vásquez-Correa, Tomás Arias-Vergara, Juan Rafael Orozco-Arroyave, Jesús Francisco Vargas-Bonilla, Julián D. Arias-Londoño, Elmar Nöth:
Automatic detection of Parkinson's disease from continuous speech recorded in non-controlled noise conditions. 105-109 - Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski:
Relevance vector machine for depression prediction. 110-114 - Erik Marchi, Björn W. Schuller, Simon Baron-Cohen, Ofer Golan, Sven Bölte, Prerna Arora, Reinhold Häb-Umbach:
Typicality and emotion in the voice of children with autism spectrum condition: evidence across three languages. 115-119
Spoken Language Understanding 1-3
- Chunxi Liu, Puyang Xu, Ruhi Sarikaya:
Deep contextual language understanding in spoken dialogue systems. 120-124 - Yik-Cheung Tam, Yangyang Shi, Hunk Chen, Mei-Yuh Hwang:
RNN-based labeled data generation for spoken language understanding. 125-129 - Vedran Vukotic, Christian Raymond, Guillaume Gravier:
Is it time to switch to word embedding and recurrent neural networks for spoken language understanding? 130-134 - Suman V. Ravuri, Andreas Stolcke:
Recurrent neural network and LSTM models for lexical utterance classification. 135-139 - Hung-tsung Lu, Yuan-ming Liou, Hung-yi Lee, Lin-Shan Lee:
Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors. 140-144 - Mohamed Morchid, Richard Dufour, Driss Matrouf:
A comparison of normalization techniques applied to latent space representations for speech analytics. 145-149
Active Perception in Human and Machine Speech Communication (Special Session)
- Éva Székely, Mark T. Keane, Julie Carson-Berndsen:
The effect of soft, modal and loud voice levels on entrainment in noisy conditions. 150-154 - Benjamin R. Cowan, Holly P. Branigan:
Does voice anthropomorphism affect lexical alignment in speech-based human-computer dialogue? 155-159 - Ning Ma, Guy J. Brown, José A. González:
Exploiting top-down source models to improve binaural localisation of multiple sources in reverberant environments. 160-164 - Christopher Schymura, Fiete Winter, Dorothea Kolossa, Sascha Spors:
Binaural sound source localisation and tracking using a dynamic spherical head model. 165-169 - Tobias May, Thomas Bentsen, Torsten Dau:
The role of temporal resolution in modulation-based speech segregation. 170-174 - Hendrik Kayser, Constantin Spille, Daniel Marquardt, Bernd T. Meyer:
Improving automatic speech recognition in spatially-aware hearing aids. 175-179 - Randy Gomez, Levko Ivanchuk, Keisuke Nakamura, Takeshi Mizumoto, Kazuhiro Nakadai:
Dereverberation for active human-robot communication robust to speaker's face orientation. 180-184
Speaker Recognition and Diarization 1-3
- Nanxin Chen, Yanmin Qian, Kai Yu:
Multi-task learning for text-dependent speaker verification. 185-189 - Themos Stafylakis, Patrick Kenny, Md. Jahangir Alam, Marcel Kockmann:
JFA for speaker recognition with random digit strings. 190-194 - Elena Knyazeva, Guillaume Wisniewski, Hervé Bredin, François Yvon:
Structured prediction for speaker identification in TV series. 195-199 - Sandro Cumani, Pietro Laface, Farzana Kulsoom:
Speaker recognition by means of acoustic and phonetically informed GMMs. 200-204 - Ashish Panda:
A fast approach to psychoacoustic model compensation for robust speaker recognition in additive noise. 205-209 - Danila Doroshin, Nikolay Lubimov, Marina Nastasenko, Mikhail Kotov:
Blind score normalization method for PLDA based speaker recognition. 210-213 - Sergey Novoselov, Timur Pekhovsky, Oleg Kudashev, Valentin S. Mendelev, Alexey Prudnikov:
Non-linear PLDA for i-vector speaker verification. 214-218 - Carlos Vaquero, Patricia Rodríguez:
On the need of template protection for voice authentication. 219-223 - Finnian Kelly, John H. L. Hansen:
Evaluation and calibration of short-term aging effects in speaker verification. 224-228 - Liping Chen, Kong-Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai:
Phone-centric local variability vector for text-constrained speaker verification. 229-233 - Kuruvachan K. George, C. Santhosh Kumar, K. I. Ramachandran, Ashish Panda:
Cosine distance features for robust speaker verification. 234-238 - Sayaka Shiota, Fernando Villavicencio, Junichi Yamagishi, Nobutaka Ono, Isao Echizen, Tomoko Matsui:
Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification. 239-243 - Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen:
Noise robust speaker recognition with convolutive sparse coding. 244-248 - Md. Jahangir Alam, Patrick Kenny, Themos Stafylakis:
Combining amplitude and phase-based features for speaker verification with short duration utterances. 249-253
Speech Synthesis 1-3
- Tuomo Raitio, Lauri Juvela, Antti Suni, Martti Vainio, Paavo Alku:
Phase perception of the glottal excitation of vocoded speech. 254-258 - Sunayana Sitaram, Serena Jeblee, Alan W. Black:
Using acoustics to improve pronunciation for synthesis of low resource languages. 259-263 - Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno:
Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum. 264-268 - Heng Lu, Wei Zhang, Xu Shao, Quan Zhou, Wenhui Lei, Hongbin Zhou, Andrew P. Breen:
Pruning redundant synthesis units based on static and delta unit appearance frequency. 269-273 - Yamato Ohtani, Yu Nasu, Masahiro Morita, Masami Akamine:
Emotional transplant in statistical speech synthesis based on emotion additive model. 274-278 - Xurong Xie, Xunying Liu, Lan Wang, Rongfeng Su:
Generalized variable parameter HMMs based acoustic-to-articulatory inversion. 279-283 - Seyed Hamidreza Mohammadi, Alexander Kain:
Semi-supervised training of a voice conversion mapping function using a joint-autoencoder. 284-288 - Stefan Huber, Axel Roebel:
On glottal source shape parameter transformation using a novel deterministic and stochastic speech analysis and synthesis system. 289-293 - Yi-Chin Huang, Chung-Hsien Wu, Ming-Ge Shie:
Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation. 294-298 - Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura:
Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics. 299-303 - Markus Toman, Michael Pucher:
Evaluation of state mapping based foreign accent conversion. 304-308 - Zhizheng Wu, Simon King:
Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features. 309-313
Mining and Annotation of Spoken and Multimodal Resources
- Jindrich Matousek, Daniel Tihelka:
Anomaly-based annotation errors detection in TTS corpora. 314-318 - Katrin Schweitzer, Markus Gärtner, Arndt Riester, Ina Rösiger, Kerstin Eckart, Jonas Kuhn, Grzegorz Dogil:
Analysing automatic descriptions of intonation with ICARUS. 319-323 - Nancy F. Chen, Rong Tong, Darren Wee, Pei Xuan Lee, Bin Ma, Haizhou Li:
iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent. 324-328 - Ka-Ho Wong, Yu Ting Yeung, Edwin H. Y. Chan, Patrick C. M. Wong, Gina-Anne Levow, Helen M. Meng:
Development of a Cantonese dysarthric speech corpus. 329-333 - Harish Arsikere, Sonal Patil, Ranjeet Kumar, Kundan Shrivastava, Om Deshmukh:
Stylex: a corpus of educational videos for research on speaking styles and their impact on engagement and learning. 334-338 - Dogan Can, David C. Atkins, Shrikanth S. Narayanan:
A dialog act tagging approach to behavioral coding: a case study of addiction counseling conversations. 339-343 - Valentina Vapnarsky, Claude Barras, Cédric Becquey, David Doukhan, Martine Adda-Decker, Lori Lamel:
Analysing rhythm in ritual discourse in Yucatec Maya using automatic speech alignment. 344-348 - Madina Hasan, Rama Doddipatla, Thomas Hain:
Noise-matched training of CRF based sentence end detection models. 349-353 - Jianjing Kuang, Mark Y. Liberman:
The effect of spectral slope on pitch perception. 354-358
Speech Production Data and Models
- Honghao Bao, Wenhuan Lu, Kiyoshi Honda, Jianguo Wei, Qiang Fang, Jianwu Dang:
Combined cine- and tagged-MRI for tracking landmarks on the tongue surface. 359-363 - Guillaume Barbier, Louis-Jean Boë, Guillaume Captier, Rafael Laboissière:
Human vocal tract growth: a longitudinal study of the development of various anatomical structures. 364-368 - Ganesh Sivaraman, Vikramjit Mitra, Mark K. Tiede, Elliot Saltzman, Louis Goldstein, Carol Y. Espy-Wilson:
Analysis of coarticulated speech using estimated articulatory trajectories. 369-373 - Guillaume Barbier, Pascal Perrier, Lucie Ménard, Yohan Payan, Mark K. Tiede, Joseph S. Perkell:
Speech planning in 4-year-old children versus adults: acoustic and articulatory analyses. 374-378 - Tokihiko Kaburagi:
Morphological and acoustic analysis of the vocal tract using a multi-speaker volumetric MRI dataset. 379-383 - Zisis Iason Skordilis, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan:
Experimental assessment of the tongue incompressibility hypothesis during speech production. 384-388
Deep Neural Networks in Language and Accent Recognition
- Radek Fér, Pavel Matejka, Frantisek Grézl, Oldrich Plchot, Jan Cernocký:
Multilingual bottleneck features for language recognition. 389-393 - Alan McCree, Daniel Garcia-Romero:
DNN senone MAP multinomial i-vectors for phonotactic language recognition. 394-397 - Yan Song, Xinhai Hong, Bing Jiang, Ruilian Cui, Ian McLoughlin, Li-Rong Dai:
Deep bottleneck network based i-vector representation for language identification. 398-402 - Alicia Lozano-Diez, Rubén Zazo-Candil, Javier Gonzalez-Dominguez, Doroteo T. Toledano, Joaquín González-Rodríguez:
An end-to-end approach to language identification in short utterances using convolutional neural networks. 403-407 - Ville Hautamäki, Sabato Marco Siniscalchi, Hamid Behravan, Valerio Mario Salerno, Ivan Kukanov:
Boosting universal speech attributes classification with deep neural network for foreign accent characterization. 408-412 - Wang Geng, Jie Li, Shanshan Zhang, Xinyuan Cai, Bo Xu:
Multilingual tandem bottleneck feature for language identification. 413-417
Speech Transmission
- Afsaneh Asaei, Milos Cernak, Hervé Bourlard:
On compressibility of neural network phonological features for low bit rate speech coding. 418-422 - Michal Lenarczyk:
Robust and accurate LSF location with Laguerre method. 423-427 - Jochen Issing, Nikolaus Färber, Reinhard German:
Interactivity-aware playout adaptation. 428-432 - Jochen Issing, Nikolaus Färber, Reinhard German:
Advanced time shrinking using a drop classifier based on codec features. 433-437 - Andrew Hines, Eoin Gillen, Naomi Harte:
Measuring and monitoring speech quality for voice over IP with POLQA, viSQOL and P.563. 438-442 - Laura Fernández Gallardo, Sebastian Möller:
Towards the prediction of human speaker identification performance from measured speech quality. 443-447
Language Modeling for Conversational Speech
- Michael Levit, Andreas Stolcke, R. Subba, Sarangarajan Parthasarathy, Shuangyu Chang, S. Xie, T. Anastasakos, Benoît Dumoulin:
Personalization of word-phrase-entity language models. 448-452 - Akio Kobayashi, Manon Ichiki, Takahiro Oku, Kazuo Onoe, Shoei Sato:
Discriminative bilinear language modeling for broadcast transcriptions. 453-457 - Xi Ma, Xiaoxi Wang, Dong Wang, Zhiyong Zhang:
Recognize foreign low-frequency words with similar pairs. 458-462 - Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito:
Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition. 463-467 - Petar S. Aleksic, Mohammadreza Ghodsi, Assaf Hurwitz Michaely, Cyril Allauzen, Keith B. Hall, Brian Roark, David Rybach, Pedro J. Moreno:
Bringing contextual information to Google speech recognition. 468-472 - Lucy Vasserman, Vlad Schogol, Keith B. Hall:
Sequence-based class tagging for robust transcription in ASR. 473-477
Interspeech 2015 Computational Paralinguistics ChallengE (ComParE): Degree of Nativeness, Parkinson's & Eating Condition (Special Session)
- Florian Hönig:
The degree of nativeness sub-challenge: the data. - Juan Rafael Orozco-Arroyave:
The Parkinson's condition sub-challenge: the data. - Anton Batliner:
The eating condition sub-challenge: the data. - Stefan Steidl:
The INTERSPEECH 2015 computational paralinguistics challenge: a summary of results. - Björn W. Schuller, Stefan Steidl, Anton Batliner, Simone Hantke, Florian Hönig, Juan Rafael Orozco-Arroyave, Elmar Nöth, Yue Zhang, Felix Weninger:
The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson's & eating condition. 478-482 - Claude Montacié, Marie-José Caraty:
Phrase accentuation verification and phonetic variation measurement for the degree of nativeness sub-challenge. 483-487 - Eugénio Ribeiro, Jaime Ferreira, Julia Olcoz, Alberto Abad, Helena Moniz, Fernando Batista, Isabel Trancoso:
Combining multiple approaches to predict the degree of nativeness. 488-492 - Matthew P. Black, Daniel Bone, Zisis Iason Skordilis, Rahul Gupta, Wei Xia, Pavlos Papadopoulos, Sandeep Nallan Chakravarthula, Bo Xiao, Maarten Van Segbroeck, Jangwon Kim, Panayiotis G. Georgiou, Shrikanth S. Narayanan:
Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales. 493-497 - David Sztahó, Gábor Kiss, Klára Vicsi:
Estimating the severity of Parkinson's disease from speech using linear regression and database partitioning. 498-502 - Alexander Zlotnik, Juan Manuel Montero, Rubén San Segundo, Ascensión Gallardo-Antolín:
Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features. 503-507 - Guozhen An, David Guy Brizan, Min Ma, Michelle Morales, Ali Raza Syed, Andrew Rosenberg:
Automatic recognition of unified Parkinson's disease rating from speech with acoustic, i-vector and phonotactic features. 508-512 - Seongjun Hahm, Jun Wang:
Parkinson's condition estimation using speech acoustic and inversely mapped articulatory data. 513-517 - James R. Williamson, Thomas F. Quatieri, Brian S. Helfer, Joseph Perricone, Satrajit S. Ghosh, Gregory A. Ciccarelli, Daryush D. Mehta:
Segment-dependent dynamics in predicting Parkinson's disease. 518-522
Pronunciation, Prosody and Audiovisual Features and Models
- S. M. Houghton, Colin J. Champion, Philip Weber:
Recognition of voiced sounds with a continuous state HMM. 523-527 - Xiangyu Zeng, Shi Yin, Dong Wang:
Learning speech rate in speech recognition. 528-532 - Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey, Sanjeev Khudanpur:
Pronunciation and silence probability modeling for ASR. 533-537 - Marelie H. Davel, Etienne Barnard, Charl Johannes van Heerden, William Hartmann, Damianos G. Karakos, Richard M. Schwartz, Stavros Tsakalidis:
Exploring minimal pronunciation modeling for low resource languages. 538-542 - Hao Zheng, Zhanlei Yang, Liwei Qiao, Jianping Li, Wenju Liu:
Attribute knowledge integration for speech recognition based on multi-task learning neural networks. 543-547 - Etienne Marcheret, Gerasimos Potamianos, Josef Vopicka, Vaibhava Goel:
Detecting audio-visual synchrony using deep neural networks. 548-552 - Shahram Kalantari, David Dean, Houman Ghaemmaghami, Sridha Sridharan, Clinton Fookes:
Cross database training of audio-visual hidden Markov models for phone recognition. 553-557 - Shahram Kalantari, David Dean, Sridha Sridharan:
Incorporating visual information for spoken term detection. 558-562