MMT

Published: 2025-01-10

Updated: 2025-01-10

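⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use this for serious academic purposes. It is only suitable for an initial screening before reading the paper.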

2025-01-10 Update

Authors: Yuxuan Zhou, Mario Fritz, Margret Keuper

SoftMax is a ubiquitous ingredient of modern machine learning algorithms. It maps an input vector onto a probability simplex and reweights the input by concentrating the probability mass at large entries. Yet, as a smooth approximation to the Argmax function, a significant amount of probability mass is distributed to other, residual entries, leading to poor interpretability and noise. Although sparsity can be achieved by a family of SoftMax variants, they often require an alternative loss function and do not preserve multi-modality. We show that this trade-off between multi-modality and sparsity limits the expressivity of SoftMax as well as its variants. We provide a solution to this tension between objectives by proposing a piece-wise differentiable function, termed MultiMax, which adaptively modulates the output distribution according to input entry range. Through comprehensive analysis and evaluation, we show that MultiMax successfully produces a distribution that suppresses irrelevant entries while preserving multi-modality, with benefits in image classification, language modeling and machine translation. The code is available at https://github.com/ZhouYuxuanYX/MultiMax.
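To make the sparsity versus multi-modality tension concrete, here is a minimal sketch in plain Python. It is not the authors' code and does not implement MultiMax. It only shows standard SoftMax, with a temperature parameter added purely for illustration, applied to a hypothetical score vector that has two large entries and six small ones.

```python
import math

def softmax(x, temperature=1.0):
    """Numerically stable SoftMax; `temperature` is an illustrative knob, not from the source."""
    scaled = [v / temperature for v in x]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores: two comparably large entries (two "modes") plus six small ones.
scores = [3.0, 2.8] + [0.0] * 6

for t in (1.0, 0.1):
    p = softmax(scores, temperature=t)
    residual = sum(p[2:])  # mass assigned to the six small entries
    print(f"T={t}: modes={p[0]:.3f}, {p[1]:.3f}  residual mass={residual:.3f}")
```

With these inputs, at T=1.0 the six small entries together still hold a noticeable share of the probability mass. At T=0.1 that share becomes negligible, but the second-largest entry also loses most of its weight relative to the largest one. Sharpening a single global temperature therefore buys sparsity at the cost of the second mode, which is the kind of trade-off the abstract describes.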

Paper and project links

PDF Accepted at ICML 2024

Summary

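SoftMax, a common component of modern machine learning algorithms, reweights its input by mapping the input vector onto the probability simplex and concentrating probability mass on the large entries. Because it is a smooth approximation to the Argmax function, however, it also assigns a significant amount of probability mass to the remaining, residual entries, which hurts interpretability and introduces noise. Some SoftMax variants achieve sparsity, but they often require an alternative loss function and do not preserve multi-modality. This work proposes MultiMax, a piece-wise differentiable function that adaptively modulates the output distribution according to the range of the input entries, resolving this trade-off between the two objectives. Analysis and evaluation show that MultiMax produces a distribution that suppresses irrelevant entries while preserving multi-modality, with benefits in image classification, language modeling and machine translation. The code is available at https://github.com/ZhouYuxuanYX/MultiMax.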
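As an illustration of what a sparse SoftMax variant can look like, the sketch below implements sparsemax (the Euclidean projection onto the probability simplex), one well-known sparse alternative. The summary does not say which variants the paper compares against, so treating sparsemax as representative is an assumption, and the input scores are hypothetical.

```python
def sparsemax(z):
    """Sparsemax: Euclidean projection of z onto the probability simplex."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, v in enumerate(z_sorted, start=1):
        cumsum += v
        # Entry j belongs to the support while 1 + j * z_(j) > sum of the top-j entries.
        if 1.0 + j * v > cumsum:
            tau = (cumsum - 1.0) / j  # threshold for the current support size
    # Everything at or below the threshold becomes exactly zero.
    return [max(v - tau, 0.0) for v in z]

# Hypothetical scores: a clear maximum, a smaller second peak, and four small entries.
scores = [3.0, 1.5, 0.0, 0.0, 0.0, 0.0]
print(sparsemax(scores))
```

For this input, sparsemax returns exact zeros everywhere except the largest entry, so the second-largest score is dropped along with the small ones. This is a single hand-picked example rather than evidence for the paper's claims, but it shows how a sparse output can come at the expense of secondary modes.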

Key Takeaways