I2I Translation

Updated 2025-01-10

MultiMax: Sparse and Multi-Modal Attention Learning

Authors: Yuxuan Zhou, Mario Fritz, Margret Keuper

SoftMax is a ubiquitous ingredient of modern machine learning algorithms. It maps an input vector onto a probability simplex and reweights the input by concentrating the probability mass at large entries. Yet, as a smooth approximation to the Argmax function, a significant amount of probability mass is distributed to other, residual entries, leading to poor interpretability and noise. Although sparsity can be achieved by a family of SoftMax variants, they often require an alternative loss function and do not preserve multi-modality. We show that this trade-off between multi-modality and sparsity limits the expressivity of SoftMax as well as its variants. We provide a solution to this tension between objectives by proposing a piece-wise differentiable function, termed MultiMax, which adaptively modulates the output distribution according to input entry range. Through comprehensive analysis and evaluation, we show that MultiMax successfully produces a distribution that suppresses irrelevant entries while preserving multi-modality, with benefits in image classification, language modeling and machine translation. The code is available at https://github.com/ZhouYuxuanYX/MultiMax.
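The trade-off described above can be observed with plain SoftMax and a temperature parameter t. The sketch below is a minimal illustration, not MultiMax and not code from the linked repository. It assumes a hypothetical six-entry input with two comparable large entries (the "modes") and four small residual entries, and it uses temperature scaling, a common way to sharpen SoftMax, as the sparsity knob. The helper names `softmax` and `summarize` are illustrative only.

```python
import math


def softmax(x, temperature=1.0):
    """SoftMax with temperature: exp(x_i / t) / sum_j exp(x_j / t)."""
    scaled = [v / temperature for v in x]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(probs, num_modes=2):
    """Return (mass outside the top `num_modes` entries, second/first mode ratio)."""
    ranked = sorted(probs, reverse=True)
    residual = 1.0 - sum(ranked[:num_modes])
    mode_ratio = ranked[1] / ranked[0]
    return residual, mode_ratio


# Hypothetical input: two comparable "relevant" entries plus four small "irrelevant" ones.
x = [3.0, 2.9, 0.5, 0.4, 0.3, 0.2]

for t in (1.0, 0.5, 0.1):
    residual, ratio = summarize(softmax(x, t))
    print(f"t={t}: residual mass={residual:.3f}, second/first mode={ratio:.3f}")
```

Under these assumptions, lowering t from 1.0 to 0.1 shrinks the residual mass on the four small entries from roughly 0.13 to nearly zero, but the second mode also falls from about 0.90 to about 0.37 of the first. Sparsity is gained at the cost of multi-modality, which is the tension MultiMax is designed to resolve.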



Accepted at ICML 2024



Key Takeaways

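  1. SoftMax is a key ingredient of machine learning that maps an input vector to a probability distribution.
  2. SoftMax suffers from poor interpretability and noise, because probability mass is not only concentrated on the large entries but also spread over the remaining entries.
  3. Although several SoftMax variants achieve sparsity, they often sacrifice multi-modality.
  4. The trade-off between multi-modality and sparsity limits the expressivity of SoftMax and its variants.
  5. The paper proposes MultiMax, a piece-wise differentiable function that adaptively modulates the output distribution.
  6. MultiMax suppresses irrelevant entries while preserving multi-modality, improving performance in image classification, language modeling and machine translation.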



