⚠️ All of the summaries below are generated by a large language model; they may contain errors, are for reference only, and should be used with caution.
🔴 Note: never use these summaries in serious academic settings; they are only for an initial screening before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-23
Beyond Single Images: Retrieval Self-Augmented Unsupervised Camouflaged Object Detection
Authors:Ji Du, Xin Wang, Fangwei Hao, Mingyang Yu, Chunyuan Chen, Jiesheng Wu, Bin Wang, Jing Xu, Ping Li
At the core of Camouflaged Object Detection (COD) lies segmenting objects from their highly similar surroundings. Previous efforts navigate this challenge primarily through image-level modeling or annotation-based optimization. Despite considerable progress, this commonplace practice either hardly taps valuable dataset-level contextual information or relies on laborious annotations. In this paper, we propose RISE, a RetrIeval SElf-augmented paradigm that exploits the entire training dataset to generate pseudo-labels for single images, which can be used to train COD models. RISE begins by constructing prototype libraries for environments and camouflaged objects using training images (without ground truth), followed by K-Nearest Neighbor (KNN) retrieval to generate pseudo-masks for each image based on these libraries. It is important to recognize that using only training images without annotations poses a pronounced challenge in crafting high-quality prototype libraries. In this light, we introduce a Clustering-then-Retrieval (CR) strategy, where coarse masks are first generated through clustering, facilitating subsequent histogram-based image filtering and cross-category retrieval to produce high-confidence prototypes. In the KNN retrieval stage, to alleviate the effect of artifacts in feature maps, we propose Multi-View KNN Retrieval (MVKR), which integrates retrieval results from diverse views to produce more robust and precise pseudo-masks. Extensive experiments demonstrate that RISE outperforms state-of-the-art unsupervised and prompt-based methods. Code is available at https://github.com/xiaohainku/RISE.
Paper and project links
PDF ICCV 2025
Summary
This paper proposes RISE, a retrieval self-augmented paradigm for the core challenge of camouflaged object detection (COD). RISE constructs prototype libraries and uses K-Nearest Neighbor (KNN) retrieval to generate pseudo-labels that can train COD models. Because building high-quality prototype libraries from unannotated training images is difficult, a Clustering-then-Retrieval (CR) strategy is introduced. In addition, Multi-View KNN Retrieval (MVKR) is proposed to reduce the effect of feature-map artifacts and produce more robust and precise pseudo-labels. Experiments show that RISE outperforms existing unsupervised and prompt-based methods.
Key Takeaways
- RISE is a retrieval self-augmented paradigm for tackling the core challenge of camouflaged object detection (COD).
- It builds prototype libraries and uses KNN retrieval over them to generate pseudo-labels for model training.
- Building high-quality prototype libraries from unannotated training images is a major challenge, motivating the Clustering-then-Retrieval (CR) strategy.
- MVKR is proposed to reduce the impact of feature-map artifacts and improve the accuracy and robustness of the pseudo-labels.
- RISE outperforms existing unsupervised and prompt-based methods.
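The paper's full pipeline is not reproduced here; as a rough illustration of the KNN-retrieval step summarized above, the sketch below labels each patch of an image as object or environment by comparing its mean cosine similarity to the k nearest prototypes in each library. All names, the toy data, and the top-k cosine scoring are assumptions for illustration, not the authors' code.

```python
import numpy as np

def knn_pseudo_mask(patch_feats, fg_protos, bg_protos, k=5):
    """Assign each patch to object/background by comparing its mean
    cosine similarity to the k nearest prototypes in each library.

    patch_feats: (N, D) L2-normalized patch features of one image.
    fg_protos:   (Mf, D) camouflaged-object prototype library.
    bg_protos:   (Mb, D) environment prototype library.
    Returns a binary (N,) pseudo-mask (1 = object).
    """
    def topk_mean(sims, k):
        k = min(k, sims.shape[1])
        # mean of the k largest similarities per patch
        top = np.partition(sims, -k, axis=1)[:, -k:]
        return top.mean(axis=1)

    fg_score = topk_mean(patch_feats @ fg_protos.T, k)
    bg_score = topk_mean(patch_feats @ bg_protos.T, k)
    return (fg_score > bg_score).astype(np.uint8)

# Toy example: prototype libraries clustered around two orthogonal directions.
rng = np.random.default_rng(0)
fg = rng.normal([1.0, 0.0], 0.05, (10, 2))
bg = rng.normal([0.0, 1.0], 0.05, (10, 2))
fg /= np.linalg.norm(fg, axis=1, keepdims=True)
bg /= np.linalg.norm(bg, axis=1, keepdims=True)
patches = np.array([[1.0, 0.0], [0.0, 1.0]])
mask = knn_pseudo_mask(patches, fg, bg, k=3)
print(mask)  # patch 0 -> object, patch 1 -> environment
```

MVKR would repeat this retrieval under several views (e.g. flips or scales) and fuse the resulting masks; that fusion step is omitted here.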
Click here to view paper screenshots
MUSE: Model-based Uncertainty-aware Similarity Estimation for zero-shot 2D Object Detection and Segmentation
Authors:Sungmin Cho, Sungbum Park, Insoo Oh
In this work, we introduce MUSE (Model-based Uncertainty-aware Similarity Estimation), a training-free framework designed for model-based zero-shot 2D object detection and segmentation. MUSE leverages 2D multi-view templates rendered from 3D unseen objects and 2D object proposals extracted from input query images. In the embedding stage, it integrates class and patch embeddings, where the patch embeddings are normalized using generalized mean pooling (GeM) to capture both global and local representations efficiently. During the matching stage, MUSE employs a joint similarity metric that combines absolute and relative similarity scores, enhancing the robustness of matching under challenging scenarios. Finally, the similarity score is refined through an uncertainty-aware object prior that adjusts for proposal reliability. Without any additional training or fine-tuning, MUSE achieves state-of-the-art performance on the BOP Challenge 2025, ranking first across the Classic Core, H3, and Industrial tracks. These results demonstrate that MUSE offers a powerful and generalizable framework for zero-shot 2D object detection and segmentation.
Paper and project links
PDF 11 pages with 6 figures
Summary
MUSE, a training-free framework for model-based zero-shot 2D object detection and segmentation, is introduced. It leverages 2D multi-view templates rendered from unseen 3D objects together with 2D object proposals extracted from input query images. Through its embedding and matching stages, MUSE captures global and local representations efficiently and refines the similarity score with an uncertainty-aware object prior that accounts for proposal reliability, improving matching robustness. Without any additional training or fine-tuning, MUSE achieves state-of-the-art performance on the BOP Challenge 2025, ranking first on the Classic Core, H3, and Industrial tracks, demonstrating strong generalizability.
Key Takeaways
- MUSE is a training-free framework for model-based zero-shot 2D object detection and segmentation.
- The framework leverages 2D multi-view templates rendered from unseen 3D objects together with 2D object proposals.
- In the matching stage, MUSE combines absolute and relative similarity scores to improve matching robustness.
- MUSE captures global and local representations efficiently via generalized mean (GeM) pooling.
- MUSE introduces an uncertainty-aware object prior to adjust for proposal reliability.
- MUSE achieves state-of-the-art performance on the BOP Challenge 2025, ranking first across multiple tracks.
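The abstract names GeM pooling and a joint absolute/relative similarity but gives no formulas. The sketch below shows standard GeM pooling and one plausible way to combine an absolute score with a relative one; the `alpha` weighting, the softmax-based relative score, and all function names are assumptions for illustration, not MUSE's actual formulation.

```python
import numpy as np

def gem_pool(patch_embeds, p=3.0, eps=1e-6):
    """Generalized mean (GeM) pooling over patch embeddings.

    patch_embeds: (N, D) patch embeddings of one image or template.
    p = 1 recovers average pooling; larger p approaches max pooling.
    """
    x = np.clip(patch_embeds, eps, None)        # GeM assumes positive inputs
    return np.mean(x ** p, axis=0) ** (1.0 / p)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def joint_similarity(query, templates, alpha=0.5):
    """Blend an absolute score (cosine to the best template) with a
    relative score (softmax weight of that best match over all templates)."""
    sims = np.array([cosine(query, t) for t in templates])
    absolute = sims.max()
    relative = np.exp(sims).max() / np.exp(sims).sum()
    return alpha * absolute + (1 - alpha) * relative

# Toy usage: pool three patch embeddings into one global descriptor.
feats = np.array([[0.2, 0.9], [0.4, 0.8], [0.3, 0.7]])
g = gem_pool(feats, p=3.0)
print(g.shape)  # (2,)
score = joint_similarity(np.array([0.3, 0.8]), feats)
```

A relative term like this penalizes proposals that match many templates equally well, which is one reason such combinations are more robust than an absolute cosine alone.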
Click here to view paper screenshots