⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: do not rely on these summaries in serious academic settings; use them only for an initial screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-26
SAM3-Adapter: Efficient Adaptation of Segment Anything 3 for Camouflage Object Segmentation, Shadow Detection, and Medical Image Segmentation
Authors:Tianrun Chen, Runlong Cao, Xinda Yu, Lanyun Zhu, Chaotao Ding, Deyi Ji, Cheng Chen, Qi Zhu, Chunyan Xu, Papa Mao, Ying Zang
The rapid rise of large-scale foundation models has reshaped the landscape of image segmentation, with models such as Segment Anything achieving unprecedented versatility across diverse vision tasks. However, previous generations, including SAM and its successor, still struggle with fine-grained, low-level segmentation challenges such as camouflaged object detection, medical image segmentation, cell image segmentation, and shadow detection. To address these limitations, we originally proposed SAM-Adapter in 2023, demonstrating substantial gains on these difficult scenarios. With the emergence of Segment Anything 3 (SAM3), a more efficient and higher-performing evolution with a redesigned architecture and improved training pipeline, we revisit these long-standing challenges. In this work, we present SAM3-Adapter, the first adapter framework tailored for SAM3 that unlocks its full segmentation capability. SAM3-Adapter not only reduces computational overhead but also consistently surpasses both SAM and SAM2-based solutions, establishing new state-of-the-art results across multiple downstream tasks, including medical imaging, camouflaged (concealed) object segmentation, and shadow detection. Built upon the modular and composable design philosophy of the original SAM-Adapter, SAM3-Adapter provides stronger generalizability, richer task adaptability, and significantly improved segmentation precision. Extensive experiments confirm that integrating SAM3 with our adapter yields superior accuracy, robustness, and efficiency compared to all prior SAM-based adaptations. We hope SAM3-Adapter can serve as a foundation for future research and practical segmentation applications. Code, pre-trained models, and data processing pipelines are available.
Paper and project links
Summary
The rapid development of large-scale foundation models such as Segment Anything has reshaped the landscape of image segmentation, showing unprecedented versatility across vision tasks. However, SAM and its successors still fall short on fine-grained, low-level segmentation tasks such as camouflaged object detection, medical image segmentation, cell image segmentation, and shadow detection. To address this, the authors, who previously proposed SAM-Adapter, now introduce SAM3-Adapter, the first adapter framework tailored for SAM3. SAM3-Adapter not only reduces computational overhead but also surpasses SAM- and SAM2-based solutions on a range of downstream tasks, including medical imaging, camouflaged object segmentation, and shadow detection. Built on a modular, composable design, it offers stronger generalizability, richer task adaptability, and markedly improved segmentation precision. Experiments confirm that pairing SAM3 with the adapter achieves higher accuracy, robustness, and efficiency than all previous SAM adaptations. The authors hope SAM3-Adapter will serve as a foundation for future research and practical applications. Code, pre-trained models, and data processing pipelines are publicly available.
Key Takeaways
- Large foundation models such as Segment Anything have driven progress in image segmentation with broad multi-task versatility.
- Earlier models such as SAM still struggle with fine-grained, low-level segmentation tasks.
- SAM3-Adapter is the first adapter framework tailored for SAM3 and is designed to address these limitations.
- SAM3-Adapter reduces computational overhead and surpasses SAM- and SAM2-based solutions on multiple downstream tasks.
- Built on a modular, composable design, SAM3-Adapter offers stronger generalizability, richer task adaptability, and higher segmentation precision.
- Compared with all earlier SAM adaptations, combining SAM3 with SAM3-Adapter improves accuracy, robustness, and efficiency.
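To make the adapter idea concrete, here is a minimal PyTorch sketch of the generic SAM-Adapter-style recipe summarized above: a small trainable bottleneck module is attached to a frozen backbone block, so only the adapter weights are updated on the downstream task. The module names, dimensions, and attachment point are illustrative assumptions, not the released SAM3-Adapter code.

```python
# Hedged sketch of adapter-style tuning: freeze the backbone, train only a
# small bottleneck MLP added after each block. Shapes are hypothetical.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck MLP whose output is added back to the block's features."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))  # residual adapter update


class AdaptedBlock(nn.Module):
    """Wraps a frozen backbone block with a trainable adapter."""

    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False          # backbone stays frozen
        self.adapter = Adapter(dim)          # only these weights are trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))


if __name__ == "__main__":
    # Stand-in for one frozen image-encoder block (hypothetical layer choice).
    backbone_block = nn.Sequential(nn.LayerNorm(256), nn.Linear(256, 256))
    block = AdaptedBlock(backbone_block, dim=256)
    tokens = torch.randn(2, 196, 256)        # (batch, tokens, dim)
    out = block(tokens)
    trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
    print(out.shape, trainable)              # only adapter params are trainable
```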
SupLID: Geometrical Guidance for Out-of-Distribution Detection in Semantic Segmentation
Authors:Nimeshika Udayangani, Sarah Erfani, Christopher Leckie
Out-of-Distribution (OOD) detection in semantic segmentation aims to localize anomalous regions at the pixel level, advancing beyond traditional image-level OOD techniques to better suit real-world applications such as autonomous driving. Recent literature has successfully explored the adaptation of commonly used image-level OOD methods, primarily based on classifier-derived confidence scores (e.g., energy or entropy), for this pixel-precise task. However, these methods inherit a set of limitations, including vulnerability to overconfidence. In this work, we introduce SupLID, a novel framework that effectively guides classifier-derived OOD scores by exploiting the geometrical structure of the underlying semantic space, particularly using Linear Intrinsic Dimensionality (LID). While LID effectively characterizes the local structure of high-dimensional data by analyzing distance distributions, its direct application at the pixel level remains challenging. To overcome this, SupLID constructs a geometrical coreset that captures the intrinsic structure of the in-distribution (ID) subspace. It then computes OOD scores at the superpixel level, enabling both efficient real-time inference and improved spatial smoothness. We demonstrate that geometrical cues derived from SupLID serve as a complementary signal to traditional classifier confidence, enhancing the model’s ability to detect diverse OOD scenarios. Designed as a post-hoc scoring method, SupLID can be seamlessly integrated with any semantic segmentation classifier at deployment time. Our results demonstrate that SupLID significantly enhances existing classifier-based OOD scores, achieving state-of-the-art performance across key evaluation metrics, including AUR, FPR, and AUP. Code is available at https://github.com/hdnugit/SupLID.
Paper and project links
PDF 10 pages, CIKM 2025
Summary
This paper studies out-of-distribution (OOD) detection in semantic segmentation, which localizes anomalous regions at the pixel level to better suit real-world applications such as autonomous driving. Prior work has adapted image-level OOD methods based on classifier confidence scores to this task, but those methods inherit limitations such as overconfidence. The paper proposes SupLID, a new framework that guides classifier-derived OOD scores with the geometrical structure of the semantic space, in particular Linear Intrinsic Dimensionality (LID). SupLID builds a geometrical coreset that captures the intrinsic structure of the in-distribution (ID) subspace and computes OOD scores at the superpixel level, enabling efficient real-time inference and better spatial smoothness. Experiments show that these geometrical cues complement traditional classifier confidence and improve detection of diverse OOD scenarios. As a post-hoc scoring method, SupLID integrates seamlessly with any semantic segmentation classifier. Results show that SupLID significantly improves existing classifier-based OOD scores and achieves state-of-the-art performance on key evaluation metrics.
Key Takeaways
- OOD detection in semantic segmentation localizes anomalous regions at the pixel level, making it better suited to real-world applications such as autonomous driving than traditional image-level OOD techniques.
- Existing classifier-based methods have limitations, including vulnerability to overconfidence.
- SupLID is a new framework that guides classifier-derived OOD scores using the geometrical structure captured by Linear Intrinsic Dimensionality (LID).
- SupLID builds a geometrical coreset to capture the intrinsic structure of the in-distribution subspace and computes OOD scores at the superpixel level.
- The geometrical cues act as a complementary signal to classifier confidence, strengthening detection of diverse OOD scenarios.
- As a post-hoc scoring method, SupLID can be seamlessly integrated with any semantic segmentation classifier.
- Experiments show that SupLID significantly improves classifier-based OOD scores and achieves state-of-the-art results on key evaluation metrics.
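As an illustration of the geometric scoring described above, the sketch below computes a LID-style score for per-pixel features against a small in-distribution coreset using the standard maximum-likelihood k-NN estimator, averages it within superpixels, and adds it to a stand-in classifier score. The feature shapes, the value of k, the superpixel labels, and the additive fusion are assumptions for illustration, not the exact SupLID formulation.

```python
# Hedged sketch: LID-style geometric cue from k-NN distances to an ID coreset,
# smoothed at the superpixel level and fused with a stand-in classifier score.
import numpy as np


def lid_mle(query: np.ndarray, coreset: np.ndarray, k: int = 20) -> np.ndarray:
    """Maximum-likelihood LID estimate of each query feature w.r.t. a coreset."""
    # Pairwise Euclidean distances: (n_query, n_coreset)
    d = np.linalg.norm(query[:, None, :] - coreset[None, :, :], axis=-1)
    d = np.sort(d, axis=1)[:, :k]                    # k nearest distances
    d = np.maximum(d, 1e-12)                         # numerical safety
    # LID(x) = -( mean_i log(r_i / r_k) )^{-1}
    return -1.0 / np.mean(np.log(d / d[:, -1:]), axis=1)


def superpixel_scores(pixel_scores: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average a per-pixel score within each superpixel and broadcast it back."""
    out = np.zeros_like(pixel_scores)
    for sp in np.unique(labels):
        mask = labels == sp
        out[mask] = pixel_scores[mask].mean()
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coreset = rng.normal(size=(256, 16))             # ID coreset features
    feats = rng.normal(size=(64, 16))                # flattened per-pixel features
    labels = rng.integers(0, 8, size=64)             # hypothetical superpixel ids
    geo = lid_mle(feats, coreset)                    # geometric cue
    energy = rng.normal(size=64)                     # stand-in classifier score
    ood = superpixel_scores(geo, labels) + energy    # simple additive fusion
    print(ood.shape)
```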
DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving
Authors:Hongbin Lin, Yiming Yang, Chaoda Zheng, Yifan Zhang, Shuaicheng Niu, Zilu Guo, Yafeng Li, Gui Gui, Shuguang Cui, Zhen Li
In autonomous driving, vision-centric 3D object detection recognizes and localizes 3D objects from RGB images. However, due to high annotation costs and diverse outdoor scenes, training data often fails to cover all possible test scenarios, known as the out-of-distribution (OOD) issue. Training-free image editing offers a promising solution for improving model robustness by training data enhancement without any modifications to pre-trained diffusion models. Nevertheless, inversion-based methods often suffer from limited effectiveness and inherent inaccuracies, while recent rectified-flow-based approaches struggle to preserve objects with accurate 3D geometry. In this paper, we propose DriveFlow, a Rectified Flow Adaptation method for training data enhancement in autonomous driving based on pre-trained Text-to-Image flow models. Based on frequency decomposition, DriveFlow introduces two strategies to adapt noise-free editing paths derived from text-conditioned velocities. 1) High-Frequency Foreground Preservation: DriveFlow incorporates a high-frequency alignment loss for foreground to maintain precise 3D object geometry. 2) Dual-Frequency Background Optimization: DriveFlow also conducts dual-frequency optimization for background, balancing editing flexibility and semantic consistency. Comprehensive experiments validate the effectiveness and efficiency of DriveFlow, demonstrating comprehensive performance improvements on all categories across OOD scenarios. Code is available at https://github.com/Hongbin98/DriveFlow.
Paper and project links
PDF Accepted by AAAI 2026
Summary
Vision-centric 3D object detection in autonomous driving suffers from training data that cannot cover all possible test scenarios. This paper proposes DriveFlow, a rectified-flow adaptation method built on pre-trained text-to-image flow models that enhances the training data to improve model robustness. Based on frequency decomposition, it introduces two strategies to adapt noise-free editing paths: high-frequency foreground preservation and dual-frequency background optimization. Experiments show that the method is effective and efficient, with clear performance gains in OOD scenarios.
Key Takeaways
- Vision-centric 3D object detection in autonomous driving faces training data that cannot cover all test scenarios (the OOD issue).
- Training data enhancement is an effective way to improve model robustness.
- DriveFlow is a training-data enhancement method built on pre-trained text-to-image flow models.
- DriveFlow uses frequency decomposition and introduces high-frequency foreground preservation and dual-frequency background optimization.
- DriveFlow preserves accurate 3D object geometry while balancing editing flexibility and semantic consistency.
- Experiments show that DriveFlow clearly improves performance in OOD scenarios.
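To clarify the frequency-decomposition idea behind the two strategies above, the sketch below splits images into low- and high-frequency parts with an FFT mask and penalizes high-frequency changes inside a foreground mask, so edits keep object detail. The cutoff radius, the L1 form of the loss, and the mask handling are illustrative assumptions rather than DriveFlow's actual losses.

```python
# Hedged sketch of a high-frequency foreground-preservation loss based on a
# simple FFT low-/high-frequency split. Constants and shapes are hypothetical.
import torch


def freq_split(img: torch.Tensor, cutoff: float = 0.1):
    """Split (B, C, H, W) images into low/high-frequency components via FFT."""
    _, _, H, W = img.shape
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    fy = torch.linspace(-0.5, 0.5, H).view(H, 1)
    fx = torch.linspace(-0.5, 0.5, W).view(1, W)
    low_mask = ((fy ** 2 + fx ** 2).sqrt() <= cutoff).to(img.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask, dim=(-2, -1))).real
    return low, img - low


def hf_foreground_loss(edited: torch.Tensor, original: torch.Tensor,
                       fg_mask: torch.Tensor) -> torch.Tensor:
    """L1 distance between high-frequency parts, restricted to foreground pixels."""
    _, hf_edit = freq_split(edited)
    _, hf_orig = freq_split(original)
    diff = (hf_edit - hf_orig).abs() * fg_mask
    return diff.sum() / fg_mask.sum().clamp(min=1.0)


if __name__ == "__main__":
    img = torch.rand(1, 3, 64, 64)
    edited = img + 0.05 * torch.randn_like(img)
    mask = torch.zeros(1, 1, 64, 64)
    mask[..., 16:48, 16:48] = 1.0                 # hypothetical foreground box
    print(hf_foreground_loss(edited, img, mask).item())
```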
Exploring Surround-View Fisheye Camera 3D Object Detection
Authors:Changcai Li, Wenwei Lin, Zuoxun Hou, Gang Chen, Wei Zhang, Huihui Zhou, Weishi Zheng
In this work, we explore the technical feasibility of implementing end-to-end 3D object detection (3DOD) with surround-view fisheye camera system. Specifically, we first investigate the performance drop incurred when transferring classic pinhole-based 3D object detectors to fisheye imagery. To mitigate this, we then develop two methods that incorporate the unique geometry of fisheye images into mainstream detection frameworks: one based on the bird’s-eye-view (BEV) paradigm, named FisheyeBEVDet, and the other on the query-based paradigm, named FisheyePETR. Both methods adopt spherical spatial representations to effectively capture fisheye geometry. In light of the lack of dedicated evaluation benchmarks, we release Fisheye3DOD, a new open dataset synthesized using CARLA and featuring both standard pinhole and fisheye camera arrays. Experiments on Fisheye3DOD show that our fisheye-compatible modeling improves accuracy by up to 6.2% over baseline methods.
Paper and project links
PDF 9 pages, 6 figures, accepted at AAAI 2026
Summary
This work explores the technical feasibility of end-to-end 3D object detection with a surround-view fisheye camera system. The authors first examine the performance drop incurred when transferring classic pinhole-based 3D detectors to fisheye imagery. To mitigate it, they propose two detectors that incorporate the unique geometry of fisheye images: FisheyeBEVDet, based on the bird's-eye-view (BEV) paradigm, and FisheyePETR, based on the query paradigm. Both adopt spherical spatial representations to capture fisheye geometry effectively. Because dedicated evaluation benchmarks are lacking, the authors also release Fisheye3DOD, a synthetic dataset built with CARLA that contains both standard pinhole and fisheye camera arrays. Experiments show that the fisheye-compatible modeling improves accuracy by up to 6.2% over baseline methods.
Key Takeaways
- This work explores the feasibility of end-to-end 3D object detection with a surround-view fisheye camera system.
- It studies the performance drop when transferring classic pinhole-based 3D detectors to fisheye imagery.
- Two detectors that incorporate the unique geometry of fisheye images are proposed: FisheyeBEVDet and FisheyePETR.
- FisheyeBEVDet follows the bird's-eye-view (BEV) paradigm, while FisheyePETR follows the query-based paradigm.
- Both methods use spherical spatial representations to effectively capture fisheye geometry.
- Because dedicated evaluation benchmarks are lacking, the authors release the synthetic dataset Fisheye3DOD.
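As a small example of the spherical representation mentioned above, the sketch below back-projects fisheye pixels to unit viewing rays under an assumed equidistant projection model (r = f * theta). The intrinsics and the projection model are hypothetical; the paper's actual camera model and BEV/query lifting may differ.

```python
# Hedged sketch: map fisheye pixel coordinates to unit ray directions on the
# sphere, assuming an equidistant fisheye model. Intrinsics are hypothetical.
import numpy as np


def fisheye_pixels_to_rays(u: np.ndarray, v: np.ndarray,
                           cx: float, cy: float, f: float) -> np.ndarray:
    """Back-project pixels (u, v) to unit-norm 3D viewing rays."""
    dx, dy = u - cx, v - cy
    r = np.hypot(dx, dy)                       # radial distance in pixels
    theta = r / f                              # equidistant model: angle from axis
    phi = np.arctan2(dy, dx)                   # azimuth around the optical axis
    rays = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)
    return rays                                # already unit-norm


if __name__ == "__main__":
    H, W, f = 640, 640, 220.0                  # hypothetical image size and focal
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = fisheye_pixels_to_rays(u.astype(float), v.astype(float),
                                  cx=W / 2, cy=H / 2, f=f)
    print(rays.shape, np.abs(np.linalg.norm(rays, axis=-1) - 1.0).max())
```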
Matching-Based Few-Shot Semantic Segmentation Models Are Interpretable by Design
Authors:Pasquale De Marinis, Uzay Kaymak, Rogier Brussee, Gennaro Vessio, Giovanna Castellano
Few-Shot Semantic Segmentation (FSS) models achieve strong performance in segmenting novel classes with minimal labeled examples, yet their decision-making processes remain largely opaque. While explainable AI has advanced significantly in standard computer vision tasks, interpretability in FSS remains virtually unexplored despite its critical importance for understanding model behavior and guiding support set selection in data-scarce scenarios. This paper introduces the first dedicated method for interpreting matching-based FSS models by leveraging their inherent structural properties. Our Affinity Explainer approach extracts attribution maps that highlight which pixels in support images contribute most to query segmentation predictions, using matching scores computed between support and query features at multiple feature levels. We extend standard interpretability evaluation metrics to the FSS domain and propose additional metrics to better capture the practical utility of explanations in few-shot scenarios. Comprehensive experiments on FSS benchmark datasets, using different models, demonstrate that our Affinity Explainer significantly outperforms adapted standard attribution methods. Qualitative analysis reveals that our explanations provide structured, coherent attention patterns that align with model architectures and enable effective model diagnosis. This work establishes the foundation for interpretable FSS research, enabling better model understanding and diagnosis for more reliable few-shot segmentation systems. The source code is publicly available at https://github.com/pasqualedem/AffinityExplainer.
Paper and project links
Summary
Few-Shot Semantic Segmentation (FSS) models segment novel classes well from only a few labeled examples, yet their decision-making remains largely opaque. Although explainable AI has advanced in standard computer vision tasks, interpretability in FSS is almost unexplored. This paper presents the first dedicated method for interpreting matching-based FSS models by exploiting their structural properties. The Affinity Explainer extracts attribution maps from matching scores computed between support and query features, highlighting which support-image pixels contribute most to the query segmentation prediction. The authors extend standard interpretability metrics to FSS and propose additional metrics that capture the practical utility of explanations in few-shot scenarios. Experiments on FSS benchmarks with different models show that Affinity Explainer clearly outperforms adapted standard attribution methods. Qualitative analysis shows that its explanations form structured, coherent attention patterns that align with the model architecture and enable effective model diagnosis, laying a foundation for interpretable FSS research and more reliable few-shot segmentation systems.
Key Takeaways
- FSS models segment novel classes well from limited labeled samples, but their decision process lacks transparency.
- This work proposes Affinity Explainer, the first interpretation method dedicated to matching-based FSS models.
- Affinity Explainer extracts attribution maps from matching scores between support and query features, highlighting the support pixels that contribute most to the segmentation prediction.
- Standard interpretability metrics are extended to FSS, and new metrics are introduced to assess the practical utility of explanations in few-shot scenarios.
- Experiments show that Affinity Explainer outperforms adapted standard attribution methods on FSS benchmarks.
- Qualitative analysis shows that its explanations form structured, coherent attention patterns that align with the model architecture.
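To make the matching-based attribution idea concrete, the sketch below computes cosine affinities between support and query feature maps and averages them over a predicted query region to obtain an attribution map over support pixels. The single feature level, the cosine similarity, and the mean aggregation are illustrative assumptions, not the exact Affinity Explainer procedure.

```python
# Hedged sketch: support-pixel attribution from support-query cosine affinities,
# aggregated over a masked query region. Feature shapes are hypothetical.
import torch
import torch.nn.functional as F


def support_attribution(support_feat: torch.Tensor,
                        query_feat: torch.Tensor,
                        query_mask: torch.Tensor) -> torch.Tensor:
    """Attribution over support locations for the masked query region.

    support_feat: (C, Hs, Ws), query_feat: (C, Hq, Wq), query_mask: (Hq, Wq).
    """
    C, Hs, Ws = support_feat.shape
    s = F.normalize(support_feat.reshape(C, -1), dim=0)   # (C, Ns)
    q = F.normalize(query_feat.reshape(C, -1), dim=0)     # (C, Nq)
    affinity = s.t() @ q                                   # (Ns, Nq) cosine scores
    weights = query_mask.reshape(-1).float()               # which query pixels matter
    weights = weights / weights.sum().clamp(min=1.0)
    attribution = affinity @ weights                       # mean affinity per support pixel
    return attribution.reshape(Hs, Ws)


if __name__ == "__main__":
    sup = torch.randn(64, 32, 32)                          # hypothetical feature maps
    qry = torch.randn(64, 32, 32)
    mask = torch.zeros(32, 32)
    mask[8:24, 8:24] = 1.0                                 # predicted query region
    attr = support_attribution(sup, qry, mask)
    print(attr.shape, attr.max().item())
```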