MMT

发布日期: 2025-08-09

更新日期: 2025-08-20

文章字数: 1.1k

阅读时长: 4 分

阅读次数:

⚠️ 以下所有内容总结都来自于大语言模型的能力，如有错误，仅供参考，谨慎使用
🔴 请注意：千万不要用于严肃的学术场景，只能用于论文阅读前的初筛！
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ，还请您给我们一些鼓励！⭐️ HuggingFace免费体验

2025-08-09 更新

MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs

Authors:Yufei Gao, Jiaying Fei, Nuo Chen, Ruirui Chen, Guohang Yan, Yunshi Lan, Botian Shi

Multimodal Large Language Models (MLLMs) have shown remarkable performance in high-resource languages. However, their effectiveness diminishes significantly in the contexts of low-resource languages. Current multilingual enhancement methods are often limited to text modality or rely solely on machine translation. While such approaches help models acquire basic linguistic capabilities and produce “thin descriptions”, they neglect the importance of multimodal informativeness and cultural groundedness, both of which are crucial for serving low-resource language users effectively. To bridge this gap, in this study, we identify two significant objectives for a truly effective MLLM in low-resource language settings, namely 1) linguistic capability and 2) cultural groundedness, placing special emphasis on cultural awareness. To achieve these dual objectives, we propose a dual-source strategy that guides the collection of data tailored to each goal, sourcing native web alt-text for culture and MLLM-generated captions for linguistics. As a concrete implementation, we introduce MELLA, a multimodal, multilingual dataset. Experiment results show that after fine-tuning on MELLA, there is a general performance improvement for the eight languages on various MLLM backbones, with models producing “thick descriptions”. We verify that the performance gains are from both cultural knowledge enhancement and linguistic capability enhancement. Our dataset can be found at https://opendatalab.com/applyMultilingualCorpus.

多模态大型语言模型（MLLMs）在高资源语言中表现出了显著的性能。然而，它们在低资源语言的背景下有效性会大大降低。当前的多语种增强方法通常局限于文本模式或仅依赖于机器翻译。虽然这些方法有助于模型获得基本语言能力并产生“简洁描述”，但它们忽视了多模态信息的重要性和文化根基的重要性，这两者对于有效地为低资源语言用户服务都是至关重要的。为了弥补这一差距，本研究中，我们确定了在低资源语言环境中真正有效的MLLM的两个重要目标，即1）语言能力和2）文化根基，特别强调了文化意识的重要性。为了实现这两个目标，我们提出了双源策略，该策略指导针对每个目标定制的数据收集，从网络alt文本中采集文化数据，从MLLM生成的描述中采集语言学数据。作为具体实现，我们介绍了MELLA，这是一个多模态、多语言的数据集。实验结果表明，在MELLA上进行微调后，八种语言的MLLM主干模型的整体性能得到了提高，模型能够产生“丰富描述”。我们验证了性能提升既来自于文化知识的增强，也来自于语言能力的提升。我们的数据集可在https://opendatalab.com/applyMultilingualCorpus找到。

论文及项目相关链接

PDF

Summary

本文研究了多模态大型语言模型在低资源语言环境下的表现，并提出了一个双源策略来解决该问题。该研究强调了文化意识和语言能力的重要性，并引入了MELLA数据集来提高模型的性能。实验结果显示，使用MELLA数据集精细调整后的模型性能有所提高，且改进来源于文化知识的增强和语言能力的提升。数据集可通过 https://opendatalab.com/applyMultilingualCorpus 下载。

Key Takeaways

多模态大型语言模型在高资源语言环境下表现优异，但在低资源语言环境下有效性降低。
当前的多语种增强方法常常局限于文本模式或者依赖机器翻译，忽视了多模态信息和文化背景的重要性。
为了解决这一问题，提出了双源策略，包括两个目标：语言能力和文化根植性。
双源策略的数据采集来源为本地网页替代文本和MLLM生成的标题，以实现这两个目标。
MELLA数据集被引入作为一个具体实现方式，其是支持多模态和多语言的。

Cool Papers

点此查看论文截图

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-08-09/MMT/

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源 Kedreamix !

MMT

Few-Shot

Few-Shot 方向最新论文已更新，请持续关注 Update in 2025-08-09 SMOL-MapSeg Show Me One Label

2025-08-09 Few-Shot

Few-Shot

Agent

Agent 方向最新论文已更新，请持续关注 Update in 2025-08-09 MV-Debate Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media

2025-08-09 Agent

Agent