Odyssey 2024: Quebec City, Canada
- Najim Dehak, Patrick Cardinal:
Odyssey 2024: The Speaker and Language Recognition Workshop, Quebec City, Canada, June 18-21, 2024. ISCA 2024
Keynotes
- Didier Meuwly:
Development and validation of an automatic approach addressing the forensic question of identity of source - the contribution of the speaker recognition field.
- Craig S. Greenberg:
A Brief History of the NIST Speaker Recognition Evaluations.
- Jesús Villalba:
Towards Speech Processing Robust to Adversarial Deceptions.
- Carlos Busso:
Toward Robust and Discriminative Emotional Speech Representations.
- Joon Son Chung:
Multimodal Learning of Speech and Speaker Representations.
Forensic Speaker Recognition
- Vincent Hughes, Chenzi Xu, Paul Foulkes, Philip Harrison, Poppy Welch, Finnian Kelly, David van der Vloed:
Exploring individual speaker behaviour within a forensic automatic speaker recognition system. 1-8
- Imen Ben Amor, Jean-François Bonastre, David van der Vloed:
Forensic speaker recognition with BA-LR: calibration and evaluation on a forensically realistic database. 9-16
- Petr Motlícek, Erinç Dikici, Srikanth R. Madikeri, Pradeep Rangappa, Miroslav Jánosík, Gerhard Backfried, Dorothea Thomas-Aniola, Maximilian Schürz, Johan Rohdin, Petr Schwarz, Marek Kovác, Kvetoslav Malý, Dominik Bobos, Mathias Leibiger, Costas Kalogiros, Andreas Alexopoulos, Daniel Kudenko, Zahra Ahmadi, Hoang H. Nguyen, Aravind Krishnan, Dawei Zhu, Dietrich Klakow, Maria Jofre, Francesco Calderoni, Denis Marraud, Nikolaos Koutras, Nikos Nikolau, Christiana Aposkiti, Panagiotis Douris, Konstantinos Gkountas, Eleni-Konstantina Sergidou, Wauter Bosma, Joshua Hughes, Hellenic Police Team:
ROXSD: The ROXANNE Multimodal and Simulated Dataset for Advancing Criminal Investigations. 17-24
- Linda Gerlach, Finnian Kelly, Kirsty McDougall, Anil Alexander:
Exploring speaker similarity based selection of relevant populations for forensic automatic speaker recognition. 25-30
Speaker Verification
- Nathan Griot, Mohammad MohammadAmini, Driss Matrouf, Raphaël Blouet, Jean-François Bonastre:
Attention-based Comparison on Aligned Utterances for Text-Dependent Speaker Verification. 31-37
- Théo Lepage, Réda Dehak:
Additive Margin in Contrastive Self-Supervised Frameworks to Learn Discriminative Speaker Representations. 38-42
- Abderrahim Fathan, Xiaolin Zhu, Jahangir Alam:
An investigative study of the effect of several regularization techniques on label noise robustness of self-supervised speaker verification systems. 43-50
- Oleksandra Zamana, Priit Käärd, Tanel Alumäe:
Using Pretrained Language Models for Improved Speaker Identification. 51-58
- Thomas Thebaud, Gabriel Hernández, Sarah Flora Samson Juan, Marie Tahon:
A Phonetic Analysis of Speaker Verification Systems through Phoneme selection and Integrated Gradients. 59-66
Speaker and Language Recognition
- Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide:
Low-resource speech recognition and dialect identification of Irish in a multi-task framework. 67-73
- Aleix Espuña, Amrutha Prasad, Petr Motlícek, Srikanth R. Madikeri, Christof Schüpbach:
Normalizing Flows for Speaker and Language Recognition Backend. 74-80
- Satwik Dutta, Iván López-Espejo, Dwight Irvin, John H. L. Hansen:
Joint Language and Speaker Classification in Naturalistic Bilingual Adult-Toddler Interactions. 81-85
- Karen Jones, Kevin Walker, Christopher Caruso, Stephanie M. Strassel:
MAGLIC: The Maghrebi Language Identification Corpus. 86-90
Speaker Diarization
- Desh Raj, Matthew Wiesner, Matthew Maciejewski, Paola García, Daniel Povey, Sanjeev Khudanpur:
On Speaker Attribution with SURT. 91-98
- Can Cui, Imran A. Sheikh, Mostafa Sadeghi, Emmanuel Vincent:
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications. 99-106
- Juan Ignacio Álvarez-Trejos, Beltrán Labrador, Alicia Lozano-Diez:
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios. 107-114
- Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin:
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings. 115-122
- Lin Zhang, Themos Stafylakis, Federico Landini, Mireia Díez, Anna Silnova, Lukás Burget:
Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information? 123-130
- Jenthe Thienpondt, Kris Demuynck:
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization. 131-136
Spoofing and Adversarial Attacks
- Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das:
Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks. 137-144
- Anh-Tuan Dao, Nicholas W. D. Evans, Driss Matrouf:
Spoofing detection in the wild: an investigation of approaches to improve generalisation. 145-150
- Matan Karo, Arie Yeredor, Itshak Lapidot:
Meaningful Embeddings for Explainable Countermeasures. 151-157
- Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas W. D. Evans, Jean-François Bonastre, Itshak Lapidot:
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification. 158-164
- Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak:
Unraveling Adversarial Examples against Speaker Identification - Techniques for Attack Detection and Victim Model Classification. 165-171
Speech Synthesis
- Zongyang Du, Junchen Lu, Kun Zhou, Lakshmish Kaushik, Berrak Sisman:
Converting Anyone's Voice: End-to-End Expressive Voice Conversion with A Conditional Diffusion Model. 172-179
- Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li:
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion. 180-186
- Thibault Gaudier, Marie Tahon, Anthony Larcher, Yannick Estève:
Automatic Voice Identification after Speech Resynthesis using PPG. 187-193
- Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman:
Exploring speech style spaces with language models: Emotional TTS without emotion labels. 194-200
Speech Pathologies and Fairness
- Anna Favaro, Najim Dehak, Thomas Thebaud, Jesús Villalba, Esther S. Oh, Laureano Moro-Velázquez:
Discovering Invariant Patterns of Cognitive Decline Via an Automated Analysis of the Cookie Thief Picture Description Task. 201-208
- Oubaïda Chouchane, Christoph Busch, Chiara Galdi, Nicholas W. D. Evans, Massimiliano Todisco:
A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness. 209-216
- Japan Bhatt, Harsh Patel, Hemant A. Patil:
Noise Robust Whisper Features for Dysarthric Automatic Speech Recognition. 217-224
Applications and Multimedia
- Reza Amini Gougeh, Nu Zhang, Zeljko Zilic:
Optimizing Auditory Immersion Safety on Edge Devices: An On-Device Sound Event Detection System. 225-231
- Martin Lebourdais, Pablo Gimeno, Théo Mariotte, Marie Tahon, Alfonso Ortega, Anthony Larcher:
3MAS: a multitask, multilabel, multidataset semi-supervised audio segmentation model. 232-239
- Gnana Praveen Rajasekhar, Jahangir Alam:
Cross-Modal Transformers for Audio-Visual Person Verification. 240-246
Emotion Challenge 1
- Lucas Goncalves, Ali N. Salman, Abinay Reddy Naini, Laureano Moro-Velázquez, Thomas Thebaud, Paola García, Najim Dehak, Berrak Sisman, Carlos Busso:
Odyssey 2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results. 247-254
- Henry Härm, Tanel Alumäe:
TalTech Systems for the Odyssey 2024 Emotion Recognition Challenge. 255-259
- Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua D. Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain:
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem. 260-265
- Federico Costa, Miquel India, Javier Hernando:
Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge. 266-273
Emotion Challenge 2
- Miguel Ángel Pastor Yoldi, Alfonso Ortega, Antonio Miguel, Dayana Ribas:
The ViVoLab System for the Odyssey Emotion Recognition Challenge 2024 Evaluation. 274-280
- Meysam Shamsi, Lara Gauder, Marie Tahon:
The CONILIUM proposition for Odyssey Emotion Challenge: Leveraging major class with complex annotations. 281-287
- Jaime Bellver-Soler, Iván Martín-Fernández, Jose M. Bravo-Pacheco, Sergio Esteban Romero, Fernando Fernández Martínez, Luis Fernando D'Haro:
Multimodal Audio-Language Model for Speech Emotion Recognition. 288-295
- Adrien Lafore, Clément Pagés, Leila Moudjari, Sebastião Quintas, Hervé Bredin, Thomas Pellegrini, Farah Benamara, Isabelle Ferrané, Jérôme Bertrand, Marie-Françoise Bertrand, Véronique Moriceau, Jérôme Farinas:
IRIT-MFU Multi-modal systems for emotion classification for Odyssey 2024 challenge. 296-302
- Daria Diatlova, Anton Udalov, Vitalii Shutov, Egor Spirin:
Adapting WavLM for Speech Emotion Recognition. 303-308
- Jarod Duret, Yannick Estève, Mickael Rouvier:
MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition. 309-314