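⚠️ All summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are only suitable for an initial screening before reading the papers!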

Updated 2025-01-10

H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving

Authors: Siran Chen, Yuxiao Luo, Yue Ma, Yu Qiao, Yali Wang

With the prevalence of Multimodal Large Language Models (MLLMs), autonomous driving has encountered new opportunities and challenges. In particular, multi-modal video understanding is critical for interactively analyzing what will happen during autonomous driving. However, videos of such dynamic scenes often contain complex spatial-temporal movements, which restricts the generalization capacity of existing MLLMs in this field. To bridge the gap, we propose a novel Hierarchical Mamba Adaptation (H-MBA) framework to fit the complicated motion changes in autonomous driving videos. Specifically, our H-MBA consists of two distinct modules: Context Mamba (C-Mamba) and Query Mamba (Q-Mamba). First, C-Mamba contains various types of structured state space models, which can effectively capture multi-granularity video context at different temporal resolutions. Second, Q-Mamba flexibly transforms the current frame into a learnable query and attentively selects multi-granularity video context into the query. Consequently, it can adaptively integrate the video contexts of multi-scale temporal resolutions to enhance video understanding. Via a plug-and-play paradigm in MLLMs, our H-MBA shows remarkable performance on multi-modal video tasks in autonomous driving; e.g., for risk object detection, it outperforms the previous SOTA method with a 5.5% mIoU improvement.



PDF 7 pages, 4 figures



Key Takeaways

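  1. Multimodal Large Language Models (MLLMs) bring new opportunities and challenges to autonomous driving.
  2. Multi-modal video understanding is critical for analyzing interactions during autonomous driving.
  3. Existing MLLMs are limited in handling the complex dynamic scenes found in autonomous driving videos.
  4. The paper proposes the Hierarchical Mamba Adaptation (H-MBA) framework to handle the complex motion changes in autonomous driving videos.
  5. H-MBA comprises a C-Mamba module, which captures multi-granularity video context, and a Q-Mamba module, which turns the current frame into a query.
  6. H-MBA adaptively integrates video context across multi-scale temporal resolutions to improve video understanding.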
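
To make the two-stage idea in the takeaways more concrete, here is a toy sketch in plain Python. It is not the authors' implementation. It assumes a simple fixed-decay linear recurrence as a stand-in for the structured state space models in C-Mamba, and it uses the raw current-frame features as the query with softmax dot-product weighting as a stand-in for Q-Mamba's learnable query. The function names, the strides `(1, 2, 4)`, and the decay value are all hypothetical choices made for illustration.

```python
import math


def ssm_scan(frames, decay=0.9):
    """Toy linear recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.

    A fixed-decay stand-in for a state space layer (assumption, not Mamba's
    selective scan). Returns the final hidden state.
    """
    state = [0.0] * len(frames[0])
    for x in frames:
        state = [decay * h + (1.0 - decay) * v for h, v in zip(state, x)]
    return state


def multi_granularity_context(frames, strides=(1, 2, 4)):
    """Build one context vector per temporal stride (hypothetical strides).

    Larger strides subsample the clip more coarsely. The start offset is
    chosen so that every subsampled sequence ends at the current frame.
    """
    last = len(frames) - 1
    return [ssm_scan(frames[last % s::s]) for s in strides]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_fuse(query, contexts):
    """Weight each context by its scaled dot product with the query.

    The query here is the raw current-frame feature vector; the paper
    describes a learnable query, which this sketch does not model.
    """
    dim = len(query)
    scores = [
        sum(q * c for q, c in zip(query, ctx)) / math.sqrt(dim)
        for ctx in contexts
    ]
    weights = softmax(scores)
    fused = [
        sum(w * ctx[i] for w, ctx in zip(weights, contexts))
        for i in range(dim)
    ]
    return fused, weights


# Eight synthetic frames, each a 3-dimensional feature vector.
frames = [[math.sin(0.3 * t), math.cos(0.3 * t), 0.1 * t] for t in range(8)]
contexts = multi_granularity_context(frames)
fused, weights = query_fuse(frames[-1], contexts)
```

Here `contexts` holds one summary per temporal resolution, and `weights` shows how strongly the current frame attends to each of them. In a real model both stages would be learned layers operating on visual tokens rather than hand-written recurrences on short lists.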



Post author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!