TTS


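⚠️ All summaries below were generated by a large language model. They may contain errors; treat them as reference only and use them with caution.
🔴 Note: do not rely on them in serious academic settings. They are meant only for initial screening before reading the paper.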

Updated 2025-10-03

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

Authors: Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl, Konstantin Böttinger

Text-to-Speech (TTS) technology offers notable benefits, such as providing a voice for individuals with speech impairments, but it also facilitates the creation of audio deepfakes and spoofing attacks. AI-based detection methods can help mitigate these risks; however, the performance of such models is inherently dependent on the quality and diversity of their training data. Presently, the available datasets are heavily skewed towards English and Chinese audio, which limits the global applicability of these anti-spoofing systems. To address this limitation, this paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), version 8, created using 119 TTS models, comprising 58 different architectures, to generate 570.3 hours of synthetic voice in 40 different languages. We train and evaluate three state-of-the-art deepfake detection models with MLAAD and observe that it demonstrates superior performance over comparable datasets like InTheWild and Fake-Or-Real when used as a training resource. Moreover, compared to the renowned ASVspoof 2019 dataset, MLAAD proves to be a complementary resource. In tests across eight datasets, MLAAD and ASVspoof 2019 alternately outperformed each other, each excelling on four datasets. By publishing MLAAD and making a trained model accessible via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists, and contributing to global efforts against audio spoofing and deepfakes.

Paper and project links

PDF IJCNN 2024

Summary

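Motivated by the risk of audio deepfakes and spoofing attacks enabled by text-to-speech (TTS) technology, the researchers created the Multi-Language Audio Anti-Spoofing Dataset (MLAAD). The dataset contains synthetic speech generated by 119 TTS models spanning 58 architectures and covers 40 languages. The results indicate that state-of-the-art deepfake detection models trained on MLAAD outperform the same models trained on comparable datasets such as InTheWild and Fake-Or-Real. By releasing MLAAD and offering a trained model through an interactive website, the authors aim to democratize anti-spoofing technology and contribute to global efforts against audio spoofing and deepfakes.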

Key Takeaways

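  1. TTS technology gives a voice to people with speech impairments, but it also facilitates audio deepfakes and spoofing attacks.
  2. AI-based detection methods can help mitigate these risks, but their performance depends on the quality and diversity of the training data.
  3. Currently available datasets are heavily skewed towards English and Chinese audio, which limits the global applicability of anti-spoofing systems.
  4. MLAAD provides synthetic speech in many languages to address this limitation.
  5. MLAAD contains synthetic speech from 119 TTS models of different architectures, totaling more than 570 hours.
  6. State-of-the-art deepfake detection models trained on MLAAD perform better across multiple test datasets than models trained on other datasets such as InTheWild and Fake-Or-Real.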

Post author: Kedreamix
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix as the source when reposting!