
Detection / Segmentation / Tracking


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: do NOT rely on these for serious academic work; they are only meant as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-07

MSDNet: Multi-Scale Decoder for Few-Shot Semantic Segmentation via Transformer-Guided Prototyping

Authors:Amirreza Fateh, Mohammad Reza Mohammadi, Mohammad Reza Jahed Motlagh

Few-shot Semantic Segmentation addresses the challenge of segmenting objects in query images with only a handful of annotated examples. However, many previous state-of-the-art methods either have to discard intricate local semantic features or suffer from high computational complexity. To address these challenges, we propose a new Few-shot Semantic Segmentation framework based on the Transformer architecture. Our approach introduces the spatial transformer decoder and the contextual mask generation module to improve the relational understanding between support and query images. Moreover, we introduce a multi-scale decoder to refine the segmentation mask by incorporating features from different resolutions in a hierarchical manner. Additionally, our approach integrates global features from intermediate encoder stages to improve contextual understanding, while maintaining a lightweight structure to reduce complexity. This balance between performance and efficiency enables our method to achieve competitive results on benchmark datasets such as PASCAL-5^i and COCO-20^i in both 1-shot and 5-shot settings. Notably, our model with only 1.5 million parameters demonstrates competitive performance while overcoming limitations of existing methodologies. https://github.com/amirrezafateh/MSDNet


Paper and project links

PDF

Summary

Building on the Transformer architecture, the paper proposes a new few-shot semantic segmentation framework. A spatial transformer decoder and a contextual mask generation module improve the relational understanding between support and query images, while a multi-scale decoder refines the segmentation mask by hierarchically incorporating features at different resolutions. The method also integrates global features from intermediate encoder stages to improve contextual understanding, and keeps the structure lightweight to reduce complexity. On benchmark datasets such as PASCAL-5^i and COCO-20^i, it achieves competitive results in both 1-shot and 5-shot settings.

Key Takeaways

  • Proposes a new Transformer-based few-shot semantic segmentation framework.
  • A spatial transformer decoder and a contextual mask generation module improve relational understanding between support and query images.
  • A multi-scale decoder hierarchically incorporates features at different resolutions.
  • Global features from intermediate encoder stages improve contextual understanding.
  • The model stays lightweight (about 1.5 million parameters) to reduce complexity.
  • Achieves strong results on benchmark datasets such as PASCAL-5^i and COCO-20^i.
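
The prototype-based pipeline implied by the takeaways (support features + support mask → class prototype → dense matching against query features) can be sketched roughly as below. This is a generic masked-average-pooling baseline, not MSDNet's actual transformer-guided prototyping; all function names and shapes here are illustrative assumptions:

```python
import numpy as np

def masked_average_pooling(feat, mask):
    """Compute a class prototype from support features.

    feat: (C, H, W) encoder feature map of the support image.
    mask: (H, W) binary foreground mask.
    Returns a (C,) prototype vector (a common simplification of
    prototype extraction in few-shot segmentation).
    """
    mask = mask.astype(feat.dtype)
    denom = mask.sum() + 1e-8  # avoid division by zero for empty masks
    return (feat * mask[None]).reshape(feat.shape[0], -1).sum(axis=1) / denom

def cosine_similarity_map(feat_q, prototype):
    """Dense cosine similarity between query features and the prototype."""
    q = feat_q.reshape(feat_q.shape[0], -1)                       # (C, H*W)
    q_norm = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    p_norm = prototype / (np.linalg.norm(prototype) + 1e-8)
    sim = p_norm @ q_norm                                         # (H*W,)
    return sim.reshape(feat_q.shape[1:])                          # (H, W)

# Toy example: 4-channel features on an 8x8 grid.
rng = np.random.default_rng(0)
feat_s = rng.standard_normal((4, 8, 8))
mask_s = np.zeros((8, 8))
mask_s[2:6, 2:6] = 1.0
proto = masked_average_pooling(feat_s, mask_s)
sim = cosine_similarity_map(feat_s, proto)
```

In a real model, the similarity (or prior) map would then be fed, together with multi-resolution features, into a decoder such as the paper's multi-scale decoder to produce the final mask.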



Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!