

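⚠️ All of the summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never use them in serious academic settings; they are intended only for initial screening before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace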

Updated 2025-03-04

Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation

Authors: Shaharukh Khan, Ayush Tarun, Ali Faraz, Palash Kamble, Vivek Dahiya, Praveen Pokala, Ashish Kulkarni, Chandra Khatri, Abhinav Ravi, Shubham Agarwal

In this work, we describe the system we submitted to the English to Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that integrates a multilingual LLM with a vision module for multimodal translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings; an adapter layer projects these embeddings into the LLM's embedding space, and the LLM then generates the translation autoregressively. We participated in all three tracks (image captioning, text-only translation, and multimodal translation) for Indic languages (i.e., English to Hindi, Bengali, and Malayalam) and achieved SOTA results for Hindi in all of them on the Challenge set, while remaining competitive for the other languages in the shared task.
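The abstract says the LLM produces the translation autoregressively once it has received the projected visual tokens. As a rough illustration only, the sketch below shows greedy autoregressive decoding over such a prefix. `greedy_decode`, `toy_step_logits`, the token ids, and the tiny vocabulary are all hypothetical; the stub stands in for a real multilingual LLM, and the paper summary does not say which decoding strategy (greedy, beam search, or sampling) Chitranuvad actually uses.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A step function maps (prefix embeddings, tokens generated so far) to
# one logit per vocabulary entry for the next position.
StepFn = Callable[[Sequence[Vector], List[int]], List[float]]


def greedy_decode(prefix: Sequence[Vector], step_logits: StepFn,
                  eos_id: int, max_new_tokens: int = 32) -> List[int]:
    """Greedy autoregressive decoding: pick the highest-scoring token at each
    step, append it, and condition the next step on everything so far."""
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = step_logits(prefix, generated)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break  # stop as soon as the model predicts end-of-sequence
        generated.append(next_id)
    return generated


# Hypothetical toy "LLM": ignores the prefix and always prefers the next
# token of a fixed target sequence, then end-of-sequence.
EOS_ID = 0
VOCAB_SIZE = 10
TARGET_IDS = [5, 7, 9]


def toy_step_logits(prefix: Sequence[Vector], generated: List[int]) -> List[float]:
    pos = len(generated)
    best = TARGET_IDS[pos] if pos < len(TARGET_IDS) else EOS_ID
    return [1.0 if i == best else 0.0 for i in range(VOCAB_SIZE)]


visual_prefix = [[0.0] * 4 for _ in range(3)]  # 3 dummy projected visual tokens
print(greedy_decode(visual_prefix, toy_step_logits, EOS_ID))  # [5, 7, 9]
```

With a real model, `step_logits` would run the LLM over the prefix embeddings plus the embeddings of the tokens generated so far. Re-encoding the whole sequence at every step is simple but wasteful, which is why practical implementations cache attention key/value states between steps.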





Key Takeaways:

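  1. Describes the authors' submission to the English to Lowres (low-resource) Multimodal Translation Task at WAT2024.
  2. Introduces Chitranuvad, a multimodal translation model.
  3. The model combines a multilingual LLM with a vision module for multimodal translation.
  4. A ViT image encoder extracts visual representations and turns them into visual token embeddings.
  5. An adapter layer projects the visual token embeddings into the LLM's embedding space.
  6. Achieves SOTA results for Hindi in all three tracks on the Challenge set.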
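Takeaways 4 and 5 describe how visual information reaches the LLM. A minimal sketch, assuming the adapter is a single shared linear layer (the summary does not specify its exact form) and that the projected visual tokens are simply prepended to the text embeddings. `project_visual_tokens`, `build_llm_input`, the embedding widths, and all weights are hypothetical toy values, not Chitranuvad's.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b, where W has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def project_visual_tokens(visual_tokens: List[Vector], weight: Matrix,
                          bias: Vector) -> List[Vector]:
    """Hypothetical adapter: map each ViT token embedding (vision width)
    into the LLM embedding space (LLM width) with one shared linear layer."""
    return [linear(t, weight, bias) for t in visual_tokens]


def build_llm_input(visual_tokens: List[Vector], text_embeddings: List[Vector],
                    weight: Matrix, bias: Vector) -> List[Vector]:
    """Prepend the projected visual tokens to the text token embeddings so the
    LLM receives one sequence whose vectors all share the LLM width."""
    projected = project_visual_tokens(visual_tokens, weight, bias)
    if text_embeddings and any(len(v) != len(text_embeddings[0]) for v in projected):
        raise ValueError("adapter output width must match LLM embedding width")
    return projected + list(text_embeddings)


vision_to_llm_w = [[1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0]]  # (LLM width 2) x (vision width 3)
vision_to_llm_b = [0.0, 0.5]
vit_tokens = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 dummy ViT token embeddings
prompt_embeddings = [[0.1, 0.2]]                 # 1 dummy text token embedding
print(build_llm_input(vit_tokens, prompt_embeddings, vision_to_llm_w, vision_to_llm_b))
# [[1.0, 5.5], [4.0, 11.5], [0.1, 0.2]]
```

A linear map is the simplest way to reconcile the vision encoder's width with the LLM's; some vision-language models use a small MLP here instead.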



Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix as the source when reposting!