TTS

Published: 2025-10-03

Updated: 2025-11-27


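⚠️ All summaries below are generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are intended only for first-pass screening before reading a paper.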

2025-10-03 Update

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

Authors: Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger

Text-to-Speech (TTS) technology offers notable benefits, such as providing a voice for individuals with speech impairments, but it also facilitates the creation of audio deepfakes and spoofing attacks. AI-based detection methods can help mitigate these risks; however, the performance of such models is inherently dependent on the quality and diversity of their training data. Presently, the available datasets are heavily skewed towards English and Chinese audio, which limits the global applicability of these anti-spoofing systems. To address this limitation, this paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), version 8, created using 119 TTS models, comprising 58 different architectures, to generate 570.3 hours of synthetic voice in 40 different languages. We train and evaluate three state-of-the-art deepfake detection models with MLAAD and observe that it demonstrates superior performance over comparable datasets like InTheWild and Fake-Or-Real when used as a training resource. Moreover, compared to the renowned ASVspoof 2019 dataset, MLAAD proves to be a complementary resource. In tests across eight datasets, MLAAD and ASVspoof 2019 alternately outperformed each other, each excelling on four datasets. By publishing MLAAD and making a trained model accessible via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists, and contributing to global efforts against audio spoofing and deepfakes.


Paper and project links

PDF IJCNN 2024

Summary

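To counter the risks posed by audio deepfakes and spoofing attacks built on text-to-speech (TTS) technology, the researchers created the Multi-Language Audio Anti-Spoofing Dataset (MLAAD). The dataset contains 570.3 hours of synthetic speech in 40 languages, generated by 119 TTS models spanning 58 architectures. Three state-of-the-art deepfake detection models trained on MLAAD outperform the same models trained on comparable datasets such as InTheWild and Fake-Or-Real. Against ASVspoof 2019, MLAAD is complementary: across eight test datasets, each training set comes out ahead on four. By publishing MLAAD and offering a trained model through an interactive webserver, the authors aim to democratize anti-spoofing technology and contribute to global efforts against audio spoofing and deepfakes.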

Key Takeaways