Published: 2025-09-20

Updated: 2025-11-27

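⚠️ All summaries below were generated by a large language model and may contain errors. Treat them as reference only and use them with caution.
🔴 Note: never use these summaries in serious academic settings. They are only meant for screening papers before you read them!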

Updated 2025-09-20

DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut

Authors: Paul Couairon, Mustafa Shukor, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome

Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. While prior works have addressed unsupervised image segmentation, they significantly lag behind supervised models. In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method that solely harnesses the output features from the final self-attention block. Through extensive experimentation, we demonstrate that using these diffusion features in a graph-based segmentation algorithm significantly outperforms previous state-of-the-art methods on zero-shot segmentation. Specifically, we leverage a recursive Normalized Cut algorithm that softly regulates the granularity of detected objects and produces well-defined segmentation maps that precisely capture intricate image details. Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks. Project page at https://diffcut-segmentation.github.io
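For reference, the graph-based step in the abstract builds on the normalized cut criterion. The block below states the standard Shi and Malik (2000) formulation, not DiffCut's exact variant: the edge weights w_ij (some similarity between the diffusion features of patches i and j) and the stopping threshold τ are illustrative assumptions.

```latex
\begin{aligned}
\mathrm{cut}(A,B) &= \sum_{i \in A,\; j \in B} w_{ij}, \qquad
\mathrm{assoc}(A,V) = \sum_{i \in A,\; j \in V} w_{ij}, \\
\mathrm{Ncut}(A,B) &= \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
                    + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}, \\
(D - W)\,y &= \lambda D\,y, \qquad D = \mathrm{diag}(d), \quad d_i = \sum_{j} w_{ij}.
\end{aligned}
```

A two-way split thresholds the eigenvector y of the second-smallest eigenvalue λ. A recursive variant keeps splitting a segment only while the best Ncut stays below τ, so under this stopping rule a larger τ yields finer segments.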

Paper and project links

NeurIPS 2024. Project page: https://diffcut-segmentation.github.io. Code: https://github.com/PaulCouairon/DiffCut

Summary

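The paper introduces DiffCut, an unsupervised zero-shot segmentation method built on a diffusion UNet encoder. It uses only the output features of the final self-attention block and segments images with a graph-based algorithm, significantly outperforming previous state-of-the-art zero-shot segmentation methods. The authors use a recursive Normalized Cut algorithm that softly regulates the granularity of detected objects and produces well-defined segmentation maps that capture fine image details. The work highlights the accurate semantic knowledge embedded in diffusion UNet encoders, which could serve as foundation vision encoders for downstream tasks.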

Key Takeaways

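Uses a diffusion UNet encoder as the foundation vision encoder.
Proposes DiffCut, an unsupervised zero-shot segmentation method.
Segments images using only the output features of the final self-attention block.
Applies a graph-based segmentation algorithm to these features, significantly outperforming previous state-of-the-art zero-shot methods.
Uses a recursive Normalized Cut algorithm that softly regulates the granularity of detected objects.
Produces well-defined segmentation maps that precisely capture fine image details.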
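The recursive Normalized Cut idea from the takeaways can be sketched in Python with NumPy only. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the DiffCut repository): the (n, d) feature array stands in for the final self-attention block outputs, and the affinity exp(alpha * (cos - 1)), the threshold-based stopping rule, and the names cosine_affinity, ncut_value, bipartition, recursive_ncut, tau and min_size are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_affinity(features: np.ndarray, alpha: float = 10.0) -> np.ndarray:
    """Dense affinity W_ij = exp(alpha * (cos_ij - 1)).

    ASSUMPTION: illustrative kernel, not necessarily DiffCut's; a larger
    alpha sharpens the graph so weakly similar patches separate more easily.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return np.exp(alpha * (unit @ unit.T - 1.0))


def ncut_value(W: np.ndarray, mask: np.ndarray) -> float:
    """Shi-Malik normalized cut of the split (mask, ~mask)."""
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()


def bipartition(W: np.ndarray):
    """Best 2-way split along the second generalized eigenvector of (D - W) y = lam D y."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian; its eigenvectors z map back via y = D^{-1/2} z.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]
    best = None
    # Try every threshold on y and keep the split with the lowest Ncut.
    for t in np.unique(y)[:-1]:
        mask = y <= t
        cost = ncut_value(W, mask)
        if best is None or cost < best[1]:
            best = (mask, cost)
    return best  # None if y is constant


def recursive_ncut(W: np.ndarray, tau: float = 0.5, min_size: int = 2) -> np.ndarray:
    """Split recursively while the best Ncut stays below tau (hypothetical stopping rule)."""
    segments = []

    def split(idx: np.ndarray) -> None:
        if len(idx) >= max(2, 2 * min_size):
            best = bipartition(W[np.ix_(idx, idx)])
            if best is not None:
                mask, cost = best
                if cost < tau and min(mask.sum(), (~mask).sum()) >= min_size:
                    split(idx[mask])
                    split(idx[~mask])
                    return
        segments.append(idx)  # leaf: stop splitting this segment

    split(np.arange(len(W)))
    labels = np.empty(len(W), dtype=int)
    for k, idx in enumerate(segments):
        labels[idx] = k
    return labels


if __name__ == "__main__":
    # Two toy "patch feature" groups pointing in different directions.
    feats = np.array([
        [1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [0.98, 0.0, 0.1],
        [0.0, 1.0, 0.0], [0.1, 0.99, 0.0], [0.0, 0.98, 0.1],
    ])
    W = cosine_affinity(feats, alpha=10.0)
    print(recursive_ncut(W, tau=0.5))
```

On the toy data, tau=0.5 should separate the two groups, while a tau close to zero blocks the first split and returns a single segment. That threshold is the granularity knob in this sketch. The dense eigendecomposition costs O(n^3), so real patch grids would call for a sparse eigensolver.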

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-09-20/%E6%A3%80%E6%B5%8B_%E5%88%86%E5%89%B2_%E8%B7%9F%E8%B8%AA/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix as the source when reposting!