⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never use them for serious academic purposes; they are only for an initial screen before reading a paper!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-20
SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation
Authors:Junjie Jiang, Zelin Wang, Manqi Zhao, Yin Li, DongSheng Jiang
Inspired by Segment Anything 2, which generalizes segmentation from images to videos, we propose SAM2MOT, a novel segmentation-driven paradigm for multi-object tracking that breaks away from the conventional detection-association framework. In contrast to previous approaches that treat segmentation as auxiliary information, SAM2MOT places it at the heart of the tracking process, systematically tackling challenges like false positives and occlusions. Its effectiveness has been thoroughly validated on major MOT benchmarks. Furthermore, SAM2MOT integrates a pre-trained detector and a pre-trained segmentor with tracking logic into a zero-shot MOT system that requires no fine-tuning. This significantly reduces dependence on labeled data and paves the way for transitioning MOT research from task-specific solutions to general-purpose systems. Experiments on DanceTrack, UAVDT, and BDD100K show state-of-the-art results. Notably, SAM2MOT outperforms existing methods on DanceTrack by +2.1 HOTA and +4.5 IDF1, highlighting its effectiveness in MOT. Code is available at https://github.com/TripleJoy/SAM2MOT.
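The zero-shot design described in the abstract composes off-the-shelf parts: a pre-trained detector proposes objects, SAM2-style mask propagation carries them across frames, and lightweight tracking logic decides when to spawn tracks. Below is a minimal sketch of that composition; the wrapper signatures (`detect`, `propagate`, `init_track`) are assumptions for illustration, and the actual interfaces live in the linked repository.

```python
# A minimal sketch of the segmentation-driven, zero-shot pipeline described
# above. Class and function signatures here are illustrative assumptions;
# the real interfaces are in the SAM2MOT repository.
from dataclasses import dataclass
from typing import Callable, Iterable

import numpy as np


@dataclass
class Track:
    track_id: int
    mask: np.ndarray  # (H, W) boolean mask in the current frame


def run_pipeline(
    frames: Iterable[np.ndarray],
    detect: Callable[[np.ndarray], list[np.ndarray]],             # frame -> boxes (x1, y1, x2, y2)
    propagate: Callable[[np.ndarray, list[Track]], list[Track]],  # SAM2-style mask propagation
    init_track: Callable[[np.ndarray, np.ndarray, int], Track],   # (frame, box, id) -> Track
) -> list[list[Track]]:
    """Detections only initialize tracks; frame-to-frame association is
    carried by mask propagation rather than by box matching."""
    tracks: list[Track] = []
    next_id = 0
    per_frame: list[list[Track]] = []
    for frame in frames:
        # 1. Propagate existing object masks into the new frame.
        tracks = propagate(frame, tracks)
        # 2. Spawn tracks only for detections not already covered by a mask,
        #    which suppresses duplicate boxes and many false positives.
        for box in detect(frame):
            if not any(_box_covered_by_mask(box, t.mask) for t in tracks):
                tracks.append(init_track(frame, box, next_id))
                next_id += 1
        per_frame.append(list(tracks))
    return per_frame


def _box_covered_by_mask(box: np.ndarray, mask: np.ndarray, thr: float = 0.5) -> bool:
    # A detection counts as "already tracked" if most of its box is masked.
    x1, y1, x2, y2 = (int(v) for v in box)
    region = mask[y1:y2, x1:x2]
    return region.size > 0 and float(region.mean()) > thr
```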
Paper and Project Links
Summary
Inspired by Segment Anything 2, the authors propose SAM2MOT, a segmentation-driven paradigm for multi-object tracking that abandons the conventional detection-association framework. SAM2MOT places segmentation at the core of the tracking process, effectively handling challenges such as false positives and occlusion, and its effectiveness has been thoroughly validated on the major MOT benchmarks. By integrating a pre-trained detector and a pre-trained segmentor with tracking logic, it forms a zero-shot MOT system that works without fine-tuning, greatly reducing the dependence on labeled data and paving the way for MOT research to move from task-specific solutions to general-purpose systems. Experiments on DanceTrack, UAVDT, and BDD100K show excellent performance; notably, SAM2MOT outperforms existing methods on DanceTrack, improving HOTA by +2.1 and IDF1 by +4.5. Code is available at: https://github.com/TripleJoy/SAM2MOT.
Key Takeaways
- SAM2MOT is a segmentation-driven multi-object tracking paradigm that departs from the conventional detection-association framework.
- SAM2MOT places segmentation at the core of the tracking process to handle challenges such as false positives and occlusion.
- The effectiveness of SAM2MOT has been validated on the major multi-object tracking benchmarks.
- SAM2MOT builds a zero-shot multi-object tracking system that integrates a pre-trained detector and segmentor, reducing the dependence on labeled data.
- SAM2MOT outperforms other methods on DanceTrack, improving both the HOTA and IDF1 scores.
Efficient Fourier Filtering Network with Contrastive Learning for AAV-based Unaligned Bimodal Salient Object Detection
Authors:Pengfei Lyu, Pak-Hei Yeung, Xiaosheng Yu, Xiufei Cheng, Chengdong Wu, Jagath C. Rajapakse
Autonomous aerial vehicle (AAV)-based bi-modal salient object detection (BSOD) aims to segment salient objects in a scene utilizing complementary cues in unaligned RGB and thermal image pairs. However, the high computational expense of existing AAV-based BSOD models limits their applicability to real-world AAV devices. To address this problem, we propose an efficient Fourier filter network with contrastive learning that achieves both real-time and accurate performance. Specifically, we first design a semantic contrastive alignment loss to align the two modalities at the semantic level, which facilitates mutual refinement in a parameter-free way. Second, inspired by the fast Fourier transform that obtains global relevance in linear complexity, we propose synchronized alignment fusion, which aligns and fuses bi-modal features in the channel and spatial dimensions by a hierarchical filtering mechanism. Our proposed model, AlignSal, reduces the number of parameters by 70.0%, decreases the floating point operations by 49.4%, and increases the inference speed by 152.5% compared to the cutting-edge BSOD model (i.e., MROS). Extensive experiments on the AAV RGB-T 2400 and seven bi-modal dense prediction datasets demonstrate that AlignSal achieves both real-time inference speed and better performance and generalizability compared to nineteen state-of-the-art models across most evaluation metrics. In addition, our ablation studies further verify AlignSal’s potential in boosting the performance of existing aligned BSOD models on AAV-based unaligned data. The code is available at: https://github.com/JoshuaLPF/AlignSal.
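The synchronized alignment fusion step exploits the fact that a pointwise product in the Fourier domain mixes spatial information globally at near-linear cost. Below is a minimal PyTorch sketch of one Fourier filtering fusion block built on that idea; the module layout and the learnable complex filters are assumptions for illustration, not AlignSal's exact hierarchical channel/spatial design (see the repository for that).

```python
# A minimal sketch of Fourier-domain filtering for bi-modal feature fusion,
# in the spirit of the synchronized alignment fusion described above. The
# module layout is an illustrative assumption, not AlignSal's exact design.
import torch
import torch.nn as nn


class FourierFusion(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable complex filter per modality, applied in the frequency
        # domain; rfft2 keeps only width // 2 + 1 frequency columns.
        w_freq = width // 2 + 1
        self.filter_rgb = nn.Parameter(torch.randn(channels, height, w_freq, 2) * 0.02)
        self.filter_t = nn.Parameter(torch.randn(channels, height, w_freq, 2) * 0.02)

    def forward(self, feat_rgb: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> complex spectra of shape (B, C, H, W // 2 + 1).
        spec_rgb = torch.fft.rfft2(feat_rgb, norm="ortho")
        spec_t = torch.fft.rfft2(feat_t, norm="ortho")
        # Pointwise complex multiplication = global spatial mixing per channel.
        spec_rgb = spec_rgb * torch.view_as_complex(self.filter_rgb)
        spec_t = spec_t * torch.view_as_complex(self.filter_t)
        # Fuse in the frequency domain, then return to the spatial domain.
        return torch.fft.irfft2(spec_rgb + spec_t, s=feat_rgb.shape[-2:], norm="ortho")


# Usage: fuse 64-channel RGB and thermal feature maps of size 32x32.
fusion = FourierFusion(channels=64, height=32, width=32)
out = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
assert out.shape == (2, 64, 32, 32)
```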
Paper and Project Links
PDF Accepted by TGRS 2025
Summary
This paper addresses autonomous aerial vehicle (AAV)-based bi-modal salient object detection (BSOD). To overcome the high computational cost of existing models, it proposes a solution built on a Fourier filtering network and contrastive learning. A semantic contrastive alignment loss aligns the two modalities at the semantic level, enabling parameter-free mutual refinement. Drawing on the fast Fourier transform's ability to capture global relevance in linear complexity, a synchronized alignment fusion method aligns and fuses bi-modal features in the channel and spatial dimensions through a hierarchical filtering mechanism. The resulting AlignSal model reduces parameters and computation, raises inference speed, and achieves real-time performance with better accuracy and generalizability than state-of-the-art BSOD models.
Key Takeaways
- AAV-based bi-modal salient object detection (BSOD) segments salient objects in a scene using complementary cues in unaligned RGB and thermal image pairs.
- Existing BSOD models suffer from high computational costs, which limits their application on real-world AAV devices.
- An efficient Fourier filtering network, combined with contrastive learning, achieves both real-time and accurate performance.
- A semantic contrastive alignment loss aligns the two modalities at the semantic level, enabling parameter-free mutual refinement (a minimal sketch of such a loss follows this list).
- Synchronized alignment fusion draws on the fast Fourier transform and aligns and fuses bi-modal features in the channel and spatial dimensions through a hierarchical filtering mechanism.
- Compared with the cutting-edge BSOD model, AlignSal reduces parameters and floating point operations while increasing inference speed.
- AlignSal outperforms other state-of-the-art models on multiple datasets across most metrics, offering real-time speed, better performance, and generalizability.
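To make the parameter-free alignment idea in the takeaways concrete, here is a minimal InfoNCE-style sketch of a semantic contrastive alignment loss between pooled RGB and thermal embeddings. The pooling choice, temperature, and symmetric cross-entropy form are common-practice assumptions, not necessarily the paper's exact formulation.

```python
# A minimal InfoNCE-style sketch of a semantic contrastive alignment loss
# between RGB and thermal features. This illustrates how such a loss is
# commonly built; it is an assumption, not AlignSal's exact formulation.
import torch
import torch.nn.functional as F


def semantic_contrastive_alignment(
    feat_rgb: torch.Tensor,  # (B, C, H, W) RGB features
    feat_t: torch.Tensor,    # (B, C, H, W) thermal features
    temperature: float = 0.07,
) -> torch.Tensor:
    # Pool each modality to one semantic vector per image, then L2-normalize.
    z_rgb = F.normalize(feat_rgb.mean(dim=(2, 3)), dim=1)  # (B, C)
    z_t = F.normalize(feat_t.mean(dim=(2, 3)), dim=1)      # (B, C)
    # Cosine-similarity logits: matching RGB/thermal pairs are positives,
    # all other pairs in the batch serve as negatives.
    logits = z_rgb @ z_t.T / temperature                   # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy pulls paired modalities together; the loss
    # itself adds no parameters, matching the "parameter-free" refinement.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))


# Usage on random features for a batch of 4 unaligned image pairs.
loss = semantic_contrastive_alignment(torch.randn(4, 64, 32, 32), torch.randn(4, 64, 32, 32))
```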