Published: 2025-09-16

Updated: 2025-10-07

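⚠️ All summaries below are generated by a large language model and may contain errors. Treat them as reference only and use them with caution.
🔴 Note: do not rely on these summaries in serious academic work. They are intended only for screening papers before reading them.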

Updated 2025-09-16

Prototypical Contrastive Learning For Improved Few-Shot Audio Classification

Authors: Christos Sgouropoulos, Christos Nikou, Stefanos Vlachos, Vasileios Theiou, Christos Foukanelis, Theodoros Giannakopoulos

Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing challenges in scenarios where large-scale annotation is impractical. While extensive research has been conducted in the image domain, few-shot learning in audio classification remains relatively underexplored. In this work, we investigate the effect of integrating supervised contrastive loss into prototypical few-shot training for audio classification. In detail, we demonstrate that angular loss further improves the performance compared to the standard contrastive loss. Our method leverages SpecAugment followed by a self-attention mechanism to encapsulate diverse information of augmented input versions into one unified embedding. We evaluate our approach on MetaAudio, a benchmark including five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in a 5-way, 5-shot setting.
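
To make the "5-way, 5-shot" prototypical setting concrete, here is a minimal sketch of how a prototypical classifier labels a query: each class prototype is the mean of that class's support embeddings, and a query is assigned to the nearest prototype. The embeddings below are synthetic toy vectors, and every function name (`build_prototypes`, `classify`, `make_toy_episode`) is hypothetical. The paper's audio encoder, SpecAugment pipeline, and attention module are not reproduced here.

```python
import random


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(embs) for label, embs in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


def make_toy_episode(n_way=5, k_shot=5, dim=8, seed=0):
    """Build a synthetic episode: n_way classes, k_shot noisy embeddings each.

    Class centers are placed far apart (an assumption made purely for
    illustration) so that the toy problem is trivially separable.
    """
    rng = random.Random(seed)
    centers = {c: [10.0 * c] * dim for c in range(n_way)}
    support = {
        c: [[v + rng.gauss(0.0, 0.1) for v in center] for _ in range(k_shot)]
        for c, center in centers.items()
    }
    return support, centers


support, centers = make_toy_episode()
prototypes = build_prototypes(support)
predictions = [classify(centers[c], prototypes) for c in sorted(centers)]
```

In a real episode the support and query embeddings would come from a trained audio encoder rather than from hand-placed class centers.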

Paper and project links

Accepted and presented at the IEEE International Workshop on Machine Learning for Signal Processing, Aug. 31–Sep. 3, 2025, Istanbul, Turkey; 6 pages, 2 figures, 1 table

Summary

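This paper studies the effect of integrating a supervised contrastive loss into prototypical few-shot training for audio classification. It shows that an angular loss improves performance further than the standard contrastive loss does. The method applies SpecAugment and then uses a self-attention mechanism to fold the diverse information from the augmented versions of an input into one unified embedding. On the MetaAudio benchmark, the approach achieves state-of-the-art performance in the 5-way, 5-shot setting.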
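
As a rough illustration of the supervised contrastive term, the sketch below implements the widely used SupCon formulation: embeddings are L2-normalized, and each anchor is pulled toward same-label samples and pushed away from all others through a temperature-scaled softmax. This is an assumption about the general form only. The paper's exact loss, its angular variant, and how the term is weighted against the prototypical loss are not shown, and the function names and the `temperature=0.1` default are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean SupCon loss over all anchors that have at least one positive."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy check: labels that match the geometric clusters give a lower loss.
points = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
loss_matched = supervised_contrastive_loss(points, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(points, [0, 1, 0, 1])
```

The loss is small when same-class embeddings already point in similar directions, which is the property that would complement prototype-based classification.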

Key Takeaways