Published: 2025-05-14

Updated: 2025-05-14

Word count: 20.5k

Reading time: 84 min

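⚠️ All summaries below were generated by a large language model. They may contain errors; treat them as a reference only and use them with caution.
🔴 Note: do not use them in serious academic settings. They are intended only for initial screening before reading a paper.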

U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord

Authors:Qi Zhang, Xiuyuan Chen, Ziyi He, Kun Wang, Lianming Wu, Hongxing Shen, Jianqi Sun

T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy. However, current clinical diagnoses primarily rely on manual evaluation. Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets. Unsupervised anomaly detection (UAD) offers a compelling alternative by eliminating the need for abnormal data annotations. However, existing UAD methods rely on curated normal datasets and their performance frequently deteriorates when applied to clinical datasets due to domain shifts. We propose an Uncertainty-based Unsupervised Anomaly Detection framework, termed U2AD, to address these limitations. Unlike traditional methods, U2AD is designed to be trained and tested within the same clinical dataset, following a “mask-and-reconstruction” paradigm built on a Vision Transformer-based architecture. We introduce an uncertainty-guided masking strategy to resolve task conflicts between normal reconstruction and anomaly detection to achieve an optimal balance. Specifically, we employ a Monte-Carlo sampling technique to estimate reconstruction uncertainty mappings during training. By iteratively optimizing reconstruction training under the guidance of both epistemic and aleatoric uncertainty, U2AD reduces overall reconstruction variance while emphasizing regions. Experimental results demonstrate that U2AD outperforms existing supervised and unsupervised methods in patient-level identification and segment-level localization tasks. This framework establishes a new benchmark for incorporating uncertainty guidance into UAD, highlighting its clinical utility in addressing domain shifts and task conflicts in medical image anomaly detection. Our code is available: https://github.com/zhibaishouheilab/U2AD

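Summary
This paper proposes U2AD, an uncertainty-based unsupervised anomaly detection framework for detecting T2 hyperintensities in spinal cord MR images. The framework requires no annotations of abnormal data, is trained under a "mask-and-reconstruction" paradigm, and uses an uncertainty-guided masking strategy to resolve the task conflict between normal reconstruction and anomaly detection. Experiments show that U2AD performs well on patient-level identification and segment-level localization, supporting its clinical utility for handling domain shift and task conflict in medical image anomaly detection.

Key Takeaways

T2 hyperintensities are key biomarkers of spinal cord conditions, yet clinical diagnosis still relies mainly on manual evaluation.
Deep learning shows promise for lesion detection, but most supervised methods need large annotated datasets.
Unsupervised anomaly detection (UAD) offers an alternative that needs no annotations of abnormal data.
Existing UAD methods depend on curated normal datasets, and their performance degrades on clinical datasets because of domain shift.
U2AD combines an uncertainty-guided masking strategy with the "mask-and-reconstruction" paradigm to address domain shift and task conflict.
U2AD uses Monte-Carlo sampling to estimate reconstruction uncertainty maps and optimizes reconstruction training to reduce overall reconstruction variance while emphasizing key regions.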
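
The Monte-Carlo uncertainty estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes a stochastic reconstruction model that returns a per-pixel mean and a per-pixel predicted variance on each call; `stochastic_model` and `toy_model` are hypothetical stand-ins, and the decomposition used here (epistemic as the variance of the sampled means, aleatoric as the average predicted variance) is a common convention rather than U2AD's confirmed formula.

```python
import numpy as np

def mc_uncertainty(stochastic_model, x, n_samples=8):
    """Monte-Carlo estimate of per-pixel uncertainty maps.

    `stochastic_model(x)` is a hypothetical callable returning (mean, variance)
    arrays shaped like `x`; each call is expected to be stochastic.
    """
    means, variances = [], []
    for _ in range(n_samples):
        mean, var = stochastic_model(x)
        means.append(mean)
        variances.append(var)
    means = np.stack(means)
    variances = np.stack(variances)
    epistemic = means.var(axis=0)       # disagreement between sampled reconstructions
    aleatoric = variances.mean(axis=0)  # average noise level predicted by the model
    return epistemic, aleatoric

rng = np.random.default_rng(0)

def toy_model(x):
    # Toy stand-in: noisy identity reconstruction with a constant predicted variance.
    noise = rng.normal(scale=0.1, size=x.shape)
    return x + noise, np.full(x.shape, 0.01)

image = np.zeros((4, 4))
epistemic, aleatoric = mc_uncertainty(toy_model, image, n_samples=16)
```

Regions with high values in either map could then be given priority when choosing what to mask, although how U2AD turns the maps into a masking policy is not specified in the abstract.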

Scale Efficient Training for Large Datasets

Authors:Qing Zhou, Junyu Gao, Qi Wang

The rapid growth of dataset scales has been a key driver in advancing deep learning research. However, as dataset scale increases, the training process becomes increasingly inefficient due to the presence of low-value samples, including excessive redundant samples, overly challenging samples, and inefficient easy samples that contribute little to model improvement. To address this challenge, we propose Scale Efficient Training (SeTa) for large datasets, a dynamic sample pruning approach that losslessly reduces training time. To remove low-value samples, SeTa first performs random pruning to eliminate redundant samples, then clusters the remaining samples according to their learning difficulty measured by loss. Building upon this clustering, a sliding window strategy is employed to progressively remove both overly challenging and inefficient easy clusters following an easy-to-hard curriculum. We conduct extensive experiments on large-scale synthetic datasets, including ToCa, SS1M, and ST+MJ, each containing over 3 million samples. SeTa reduces training costs by up to 50% while maintaining or improving performance, with minimal degradation even at 70% cost reduction. Furthermore, experiments on various scale real datasets across various backbones (CNNs, Transformers, and Mambas) and diverse tasks (instruction tuning, multi-view stereo, geo-localization, composed image retrieval, referring image segmentation) demonstrate the powerful effectiveness and universality of our approach. Code is available at https://github.com/mrazhou/SeTa.

Accepted by CVPR 2025.

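Summary
As datasets grow rapidly, low-value samples make training increasingly inefficient. Scale Efficient Training (SeTa) is a dynamic sample pruning method for large datasets that reduces training time losslessly. It first prunes randomly to remove redundant samples, then clusters the remaining samples by learning difficulty as measured by loss. A sliding-window strategy then progressively removes clusters that are overly hard or inefficiently easy, following an easy-to-hard curriculum. In extensive experiments on large synthetic datasets, SeTa cuts training cost by up to 50% while maintaining or improving performance, with minimal degradation even at a 70% cost reduction. Experiments on real datasets of various scales, across several backbones (CNNs, Transformers, and Mambas) and tasks (instruction tuning, multi-view stereo, geo-localization, composed image retrieval, referring image segmentation), support the method's effectiveness and generality.

Key Takeaways

The rapid growth of dataset scale is a key driver of progress in deep learning research.
As datasets grow, low-value samples make training increasingly inefficient.
SeTa is a dynamic sample pruning method for large datasets that reduces training time losslessly.
SeTa removes redundant samples by random pruning, then clusters the remaining samples by learning difficulty as measured by loss.
SeTa applies a sliding-window strategy that progressively removes overly hard and inefficiently easy clusters, following an easy-to-hard curriculum.
On large synthetic datasets, SeTa cuts training cost by up to 50% while maintaining or improving performance.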
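
The three steps described above (random pruning, grouping by loss, sliding window) can be sketched roughly as follows. This is an illustration under stated assumptions, not the released SeTa code: equal-size loss-sorted groups stand in for the clustering step, and the window schedule (`epoch % ...`) as well as the parameters `keep_ratio`, `n_clusters`, and `window` are hypothetical.

```python
import numpy as np

def select_samples(losses, epoch, keep_ratio=0.8, n_clusters=5, window=3, seed=0):
    """Return the indices of samples to train on in this epoch."""
    losses = np.asarray(losses)
    rng = np.random.default_rng(seed + epoch)
    n = len(losses)
    # Step 1: random pruning to drop redundant samples.
    kept = rng.choice(n, size=int(n * keep_ratio), replace=False)
    # Step 2: group the kept samples by difficulty (loss), ordered easy -> hard.
    order = kept[np.argsort(losses[kept])]
    clusters = np.array_split(order, n_clusters)
    # Step 3: a window over the groups that moves from easy to hard with the epoch,
    # leaving out the groups outside the window.
    start = epoch % (n_clusters - window + 1)
    return np.concatenate(clusters[start:start + window])

losses = np.linspace(0.0, 1.0, 100)
early = select_samples(losses, epoch=0)
late = select_samples(losses, epoch=2)
```

With these toy settings each epoch trains on 48 of the 100 samples, and later epochs draw from harder groups than earlier ones.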

Sampling Innovation-Based Adaptive Compressive Sensing

Authors:Zhifu Tian, Tao Hu, Chaoyang Niu, Di Wu, Shu Wang

Scene-aware Adaptive Compressive Sensing (ACS) has attracted significant interest due to its promising capability for efficient and high-fidelity acquisition of scene images. ACS typically prescribes adaptive sampling allocation (ASA) based on previous samples in the absence of ground truth. However, when confronting unknown scenes, existing ACS methods often lack accurate judgment and robust feedback mechanisms for ASA, thus limiting the high-fidelity sensing of the scene. In this paper, we introduce a Sampling Innovation-Based ACS (SIB-ACS) method that can effectively identify and allocate sampling to challenging image reconstruction areas, culminating in high-fidelity image reconstruction. An innovation criterion is proposed to judge ASA by predicting the decrease in image reconstruction error attributable to sampling increments, thereby directing more samples towards regions where the reconstruction error diminishes significantly. A sampling innovation-guided multi-stage adaptive sampling (AS) framework is proposed, which iteratively refines the ASA through a multi-stage feedback process. For image reconstruction, we propose a Principal Component Compressed Domain Network (PCCD-Net), which efficiently and faithfully reconstructs images under AS scenarios. Extensive experiments demonstrate that the proposed SIB-ACS method significantly outperforms the state-of-the-art methods in terms of image reconstruction fidelity and visual effects. Codes are available at https://github.com/giant-pandada/SIB-ACS_CVPR2025.

Accepted by CVPR 2025.

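Summary

The Sampling Innovation-Based Adaptive Compressive Sensing (SIB-ACS) method identifies image regions that are hard to reconstruct and allocates more sampling to them, yielding high-fidelity reconstruction. It judges adaptive sampling allocation (ASA) by predicting how much the reconstruction error would decrease for a given sampling increment, and directs more samples to regions where the error drops significantly. The paper also proposes a sampling innovation-guided multi-stage adaptive sampling framework and a Principal Component Compressed Domain Network (PCCD-Net) for efficient and faithful reconstruction under adaptive sampling. Experiments show that the method significantly outperforms existing methods in reconstruction fidelity and visual quality.

Key Takeaways

SIB-ACS targets image reconstruction for unknown scenes, using a sampling innovation mechanism to identify challenging regions and allocate sampling to them.
An innovation criterion judges ASA by predicting the decrease in reconstruction error attributable to a sampling increment.
A multi-stage adaptive sampling framework refines ASA iteratively through feedback.
PCCD-Net reconstructs images efficiently and faithfully under adaptive sampling.
SIB-ACS significantly improves reconstruction fidelity and visual quality.
The code is publicly available.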
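
The idea of steering samples toward regions where extra measurements are predicted to reduce the error most can be sketched as a simple proportional allocation. This is an assumed simplification for illustration only: `predicted_error_drop` would come from a learned predictor in SIB-ACS, and the proportional rule with a largest-gain tie-break is a hypothetical choice, not the paper's allocation scheme.

```python
import numpy as np

def allocate_samples(predicted_error_drop, budget):
    """Split a measurement budget across blocks in proportion to predicted gain."""
    gain = np.clip(np.asarray(predicted_error_drop, dtype=float), 0.0, None)
    if gain.sum() == 0:
        share = np.full(gain.shape, 1.0 / gain.size)  # no information: split evenly
    else:
        share = gain / gain.sum()
    alloc = np.floor(share * budget).astype(int)
    # Hand any leftover measurements to the blocks with the largest predicted gain.
    leftover = int(budget - alloc.sum())
    top = np.argsort(-gain, kind="stable")[:leftover]
    alloc[top] += 1
    return alloc

alloc = allocate_samples([0.5, 0.1, 0.0, 0.4], budget=10)
```

In a multi-stage setting the same routine could be called once per stage with refreshed predictions, which is one way to read the feedback loop described above.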

MedLoRD: A Medical Low-Resource Diffusion Model for High-Resolution 3D CT Image Synthesis

Authors:Marvin Seyfarth, Salman Ul Hassan Dar, Isabelle Ayx, Matthias Alexander Fink, Stefan O. Schoenberg, Hans-Ulrich Kauczor, Sandy Engelhardt

Advancements in AI for medical imaging offer significant potential. However, their applications are constrained by the limited availability of data and the reluctance of medical centers to share it due to patient privacy concerns. Generative models present a promising solution by creating synthetic data as a substitute for real patient data. However, medical images are typically high-dimensional, and current state-of-the-art methods are often impractical for computational resource-constrained healthcare environments. These models rely on data sub-sampling, raising doubts about their feasibility and real-world applicability. Furthermore, many of these models are evaluated on quantitative metrics that alone can be misleading in assessing the image quality and clinical meaningfulness of the generated images. To address this, we introduce MedLoRD, a generative diffusion model designed for computational resource-constrained environments. MedLoRD is capable of generating high-dimensional medical volumes with resolutions up to 512×512×256, utilizing GPUs with only 24GB VRAM, which are commonly found in standard desktop workstations. MedLoRD is evaluated across multiple modalities, including Coronary Computed Tomography Angiography and Lung Computed Tomography datasets. Extensive evaluations through radiological evaluation, relative regional volume analysis, adherence to conditional masks, and downstream tasks show that MedLoRD generates high-fidelity images closely adhering to segmentation mask conditions, surpassing the capabilities of current state-of-the-art generative models for medical image synthesis in computational resource-constrained environments.

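Summary
AI for medical imaging has great potential, but it is constrained by limited data availability and by medical centers' reluctance to share data over patient privacy concerns. Generative models offer a promising remedy by creating synthetic data as a substitute for real patient data. However, medical images are typically high-dimensional, and current state-of-the-art methods are often impractical in healthcare environments with limited computational resources. They rely on data sub-sampling, which raises doubts about their feasibility and real-world applicability. In addition, many models are evaluated only on quantitative metrics, which alone can be misleading about image quality and clinical meaningfulness. To address these issues, the paper introduces MedLoRD, a generative diffusion model designed for resource-constrained environments. MedLoRD generates high-dimensional medical volumes at resolutions up to 512×512×256 on GPUs with only 24GB of VRAM, which are common in standard desktop workstations. It is evaluated on Coronary Computed Tomography Angiography and Lung Computed Tomography datasets. Evaluation through radiological assessment, relative regional volume analysis, adherence to conditional masks, and downstream tasks shows that MedLoRD generates high-fidelity images that closely follow segmentation mask conditions and surpasses current state-of-the-art generative models for medical image synthesis in resource-constrained environments.

Key Takeaways

AI for medical imaging has great potential but is limited by data availability and privacy concerns.
Generative models offer a way around data privacy constraints by creating synthetic data.
The high dimensionality of medical images makes current state-of-the-art methods impractical in resource-constrained healthcare environments.
Existing models rely on data sub-sampling, raising doubts about their feasibility and real-world applicability.
Evaluating generative models requires both quantitative metrics and an assessment of clinical meaningfulness.
MedLoRD is a generative diffusion model for resource-constrained environments that generates high-resolution medical volumes.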
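
To give a sense of scale for a 512×512×256 volume, the following back-of-envelope calculation counts its voxels and the memory one such volume takes. The 4 bytes per voxel (32-bit floats) is an assumption for illustration; the figure covers a single stored volume only and says nothing about the memory a diffusion model needs for activations or weights.

```python
def volume_size(shape, bytes_per_voxel=4):
    """Return (voxel count, size in MiB) for a dense volume of the given shape."""
    voxels = 1
    for dim in shape:
        voxels *= dim
    return voxels, voxels * bytes_per_voxel / 2**20

voxels, mib = volume_size((512, 512, 256))
print(voxels, mib)  # 67108864 256.0
```

A single volume at this resolution therefore holds about 67 million voxels, which helps explain why methods that work on full-resolution 3D data are demanding on a 24GB GPU.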

Enhancing zero-shot learning in medical imaging: integrating clip with advanced techniques for improved chest x-ray analysis

Authors:Prakhar Bhardwaj, Sheethal Bhat, Andreas Maier

Due to the large volume of medical imaging data, advanced AI methodologies are needed to assist radiologists in diagnosing thoracic diseases from chest X-rays (CXRs). Existing deep learning models often require large, labeled datasets, which are scarce in medical imaging due to the time-consuming and expert-driven annotation process. In this paper, we extend the existing approach to enhance zero-shot learning in medical imaging by integrating Contrastive Language-Image Pre-training (CLIP) with Momentum Contrast (MoCo), resulting in our proposed model, MoCoCLIP. Our method addresses challenges posed by class-imbalanced and unlabeled datasets, enabling improved detection of pulmonary pathologies. Experimental results on the NIH ChestXray14 dataset demonstrate that MoCoCLIP outperforms the state-of-the-art CheXZero model, achieving relative improvement of approximately 6.5%. Furthermore, on the CheXpert dataset, MoCoCLIP demonstrates superior zero-shot performance, achieving an average AUC of 0.750 compared to CheXZero with 0.746 AUC, highlighting its enhanced generalization capabilities on unseen data.

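Summary
Deep learning models for diagnosing thoracic diseases from large volumes of medical images are held back by the scarcity of labeled datasets. This paper proposes MoCoCLIP, which enhances zero-shot learning by combining Contrastive Language-Image Pre-training (CLIP) with Momentum Contrast (MoCo). The model addresses class-imbalanced and unlabeled datasets and improves the detection of pulmonary pathologies. On the NIH ChestXray14 dataset, MoCoCLIP achieves a relative improvement of approximately 6.5% over the state-of-the-art CheXZero model. On the CheXpert dataset, it achieves an average zero-shot AUC of 0.750, compared with 0.746 for CheXZero, indicating better generalization to unseen data.

Key Takeaways

With large volumes of medical imaging data, AI methods are needed to assist radiologists in diagnosing thoracic diseases.
Existing deep learning models are limited by the scarcity of labeled datasets.
MoCoCLIP combines CLIP with MoCo to enhance zero-shot learning.
MoCoCLIP addresses class-imbalanced and unlabeled datasets and improves pulmonary pathology detection.
On NIH ChestXray14, MoCoCLIP achieves a relative improvement of approximately 6.5% over CheXZero.
On CheXpert, MoCoCLIP achieves an average zero-shot AUC of 0.750 versus 0.746 for CheXZero.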
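
Because the two datasets are reported differently (a relative improvement on ChestXray14, absolute AUCs on CheXpert), it can help to put the CheXpert numbers on the same footing. The helper below is a plain relative-change calculation applied to the two AUC values quoted above; the function name is an illustrative choice.

```python
def relative_improvement(new, baseline):
    """Relative change of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

chexpert_gain = relative_improvement(0.750, 0.746)
print(f"{chexpert_gain:.2f}%")  # 0.54%
```

On CheXpert the gain is thus roughly half a percent in relative terms, much smaller than the approximately 6.5% reported for ChestXray14.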

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

Authors:Tao Wang, Changxu Cheng, Lingfeng Wang, Senda Chen, Wuyue Zhao

The remarkable performance of large multimodal models (LMMs) has attracted significant interest from the image segmentation community. To align with the next-token-prediction paradigm, current LMM-driven segmentation methods either use object boundary points to represent masks or introduce special segmentation tokens, whose hidden states are decoded by a segmentation model requiring the original image as input. However, these approaches often suffer from inadequate mask representation and complex architectures, limiting the potential of LMMs. In this work, we propose the Hierarchical Mask Tokenizer (HiMTok), which represents segmentation masks with up to 32 tokens and eliminates the need for the original image during mask de-tokenization. HiMTok allows for compact and coarse-to-fine mask representations, aligning well with the LLM next-token-prediction paradigm and facilitating the direct acquisition of segmentation capabilities. We develop a 3-stage training recipe for progressive learning of segmentation and visual capabilities, featuring a hierarchical mask loss for effective coarse-to-fine learning. Additionally, we enable bidirectional information flow, allowing conversion between bounding boxes and mask tokens to fully leverage multi-task training potential. Extensive experiments demonstrate that our method achieves state-of-the-art performance across various segmentation tasks, while also enhancing visual grounding and maintaining overall visual understanding.

Technical report.

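Summary

This paper proposes the Hierarchical Mask Tokenizer (HiMTok) for image segmentation. It represents a segmentation mask with up to 32 tokens and does not need the original image for mask de-tokenization. HiMTok supports compact, coarse-to-fine mask representations that align with the LLM next-token-prediction paradigm and allow segmentation capabilities to be acquired directly. A 3-stage training recipe with a hierarchical mask loss enables progressive learning of segmentation and visual capabilities. Bidirectional information flow, which allows conversion between bounding boxes and mask tokens, exploits the potential of multi-task training. Experiments show state-of-the-art performance across various segmentation tasks, along with improved visual grounding and preserved overall visual understanding.

Key Takeaways

Large multimodal models (LMMs) have attracted wide interest in image segmentation.
Current LMM-driven segmentation methods represent masks with object boundary points or special segmentation tokens, which leads to inadequate mask representation and complex architectures.
HiMTok represents a segmentation mask with up to 32 tokens and needs no original image for de-tokenization.
HiMTok supports compact, coarse-to-fine mask representations aligned with the LLM next-token-prediction paradigm.
A 3-stage training recipe and a hierarchical mask loss enable progressive learning of segmentation and visual capabilities.
Bidirectional information flow between bounding boxes and mask tokens exploits the potential of multi-task training.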

Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation

Authors:Xingguo Lv, Xingbo Dong, Liwen Wang, Jiewen Yang, Lei Zhao, Bin Pu, Zhe Jin, Xuejun Li

Although domain generalization (DG) has significantly addressed the performance degradation of pre-trained models caused by domain shifts, it often falls short in real-world deployment. Test-time adaptation (TTA), which adjusts a learned model using unlabeled test data, presents a promising solution. However, most existing TTA methods struggle to deliver strong performance in medical image segmentation, primarily because they overlook the crucial prior knowledge inherent to medical images. To address this challenge, we incorporate morphological information and propose a framework based on multi-graph matching. Specifically, we introduce learnable universe embeddings that integrate morphological priors during multi-source training, along with novel unsupervised test-time paradigms for domain adaptation. This approach guarantees cycle-consistency in multi-matching while enabling the model to more effectively capture the invariant priors of unseen data, significantly mitigating the effects of domain shifts. Extensive experiments demonstrate that our method outperforms other state-of-the-art approaches on two medical image segmentation benchmarks for both multi-source and single-source domain generalization tasks. The source code is available at https://github.com/Yore0/TTDG-MGM.

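Summary
Domain generalization (DG) has made significant progress against the performance degradation of pre-trained models under domain shift, but it still falls short in real-world deployment. Test-time adaptation (TTA), which adjusts a learned model using unlabeled test data, is a promising solution. However, most existing TTA methods perform poorly on medical image segmentation, mainly because they overlook the crucial prior knowledge inherent to medical images. To address this, the paper incorporates morphological information and proposes a framework based on multi-graph matching (MGM). Learnable universe embeddings integrate morphological priors during multi-source training, and novel unsupervised test-time paradigms perform domain adaptation. The approach guarantees cycle-consistency in multi-matching and lets the model capture the invariant priors of unseen data more effectively, significantly mitigating the effects of domain shift. Extensive experiments on two medical image segmentation benchmarks show that the method outperforms other state-of-the-art approaches on both multi-source and single-source domain generalization tasks. The source code is available at https://github.com/Yore0/TTDG-MGM.

Key Takeaways

Test-time adaptation (TTA) is a promising way to handle domain shift in medical image segmentation.
Most existing TTA methods overlook the crucial prior knowledge in medical images, which limits them in practice.
The paper proposes a framework based on multi-graph matching (MGM) that incorporates morphological information.
Learnable universe embeddings integrate morphological priors during multi-source training.
Novel unsupervised test-time paradigms perform domain adaptation, guaranteeing cycle-consistency and capturing the invariant priors of unseen data.
The method outperforms other state-of-the-art approaches on two medical image segmentation benchmarks.
The source code is publicly available.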
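
Why matching every graph to a shared "universe" yields cycle-consistency can be shown with a small numerical sketch. It makes a strong simplifying assumption: each graph-to-universe assignment is a hard permutation matrix with the same number of nodes, whereas the paper uses learnable embeddings. The names `U` and `pairwise` are illustrative only.

```python
import numpy as np

def random_universe_assignment(n_nodes, rng):
    # Permutation matrix: row a has a 1 in the column of the universe point
    # that node a of this graph is assigned to.
    return np.eye(n_nodes, dtype=int)[rng.permutation(n_nodes)]

rng = np.random.default_rng(0)
U = [random_universe_assignment(4, rng) for _ in range(3)]  # three graphs

def pairwise(i, j):
    # Matching between graph i and graph j, obtained by going through the universe.
    return U[i] @ U[j].T

# Cycle-consistency: matching 0 -> 1 -> 2 equals matching 0 -> 2 directly,
# because U[1].T @ U[1] is the identity for a permutation matrix.
cycle_ok = np.array_equal(pairwise(0, 1) @ pairwise(1, 2), pairwise(0, 2))
```

Since every pairwise matching is derived from the same universe assignments, consistency around any cycle holds by construction rather than having to be enforced separately.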

UniReg: Foundation Model for Controllable Medical Image Registration

Authors:Zi Li, Jianpeng Zhang, Tai Ma, Tony C. W. Mok, Yan-Jie Zhou, Zeli Chen, Xianghua Ye, Le Lu, Dakai Jin

Learning-based medical image registration has achieved performance parity with conventional methods while demonstrating a substantial advantage in computational efficiency. However, learning-based registration approaches lack generalizability across diverse clinical scenarios, requiring the laborious development of multiple isolated networks for specific registration tasks, e.g., inter-/intra-subject registration or organ-specific alignment. To overcome this limitation, we propose UniReg, the first interactive foundation model for medical image registration, which combines the precision advantages of task-specific learning methods with the generalization of traditional optimization methods. Our key innovation is a unified framework for diverse registration scenarios, achieved through a conditional deformation field estimation within a unified registration model. This is realized through a dynamic learning paradigm that explicitly encodes: (1) anatomical structure priors, (2) registration type constraints (inter/intra-subject), and (3) instance-specific features, enabling the generation of scenario-optimal deformation fields. Through comprehensive experiments encompassing 90 anatomical structures at different body regions, our UniReg model demonstrates comparable performance with contemporary state-of-the-art methodologies while achieving ~50% reduction in required training iterations relative to the conventional learning-based paradigm. This optimization contributes to a significant reduction in computational resources, such as training time. Code and model will be available.

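Summary

Learning-based medical image registration has reached performance parity with conventional methods while offering a substantial advantage in computational efficiency. However, learning-based approaches generalize poorly across diverse clinical scenarios. To overcome this limitation, the paper proposes UniReg, the first interactive foundation model for medical image registration, which combines the precision of task-specific learning methods with the generalization of traditional optimization methods. Its key innovation is a unified framework for diverse registration scenarios, achieved through conditional deformation field estimation within a single registration model. A dynamic learning paradigm explicitly encodes (1) anatomical structure priors, (2) registration type constraints (inter-/intra-subject), and (3) instance-specific features to generate scenario-optimal deformation fields. Experiments show that UniReg performs comparably to current state-of-the-art methods while needing approximately 50% fewer training iterations than the conventional learning-based paradigm, which reduces computational cost such as training time.

Key Takeaways

Learning-based medical image registration matches conventional methods in performance and is more computationally efficient.
Learning-based registration methods generalize poorly across clinical scenarios.
UniReg is the first interactive foundation model for medical image registration, combining the strengths of task-specific learning methods and traditional optimization methods.
UniReg unifies diverse registration scenarios through conditional deformation field estimation.
Its dynamic learning paradigm encodes anatomical structure priors, registration type constraints, and instance-specific features.
UniReg performs comparably to current state-of-the-art methods while needing approximately 50% fewer training iterations, which lowers computational cost.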

Adaptive Transformer Attention and Multi-Scale Fusion for Spine 3D Segmentation

Authors:Yanlin Xiang, Qingyuan He, Ting Xu, Ran Hao, Jiacheng Hu, Hanchao Zhang

This study proposes a 3D semantic segmentation method for the spine based on the improved SwinUNETR to improve segmentation accuracy and robustness. Aiming at the complex anatomical structure of spinal images, this paper introduces a multi-scale fusion mechanism to enhance the feature extraction capability by using information of different scales, thereby improving the recognition accuracy of the model for the target area. In addition, the introduction of the adaptive attention mechanism enables the model to dynamically adjust the attention to the key area, thereby optimizing the boundary segmentation effect. The experimental results show that compared with 3D CNN, 3D U-Net, and 3D U-Net + Transformer, the model of this study has achieved significant improvements in mIoU, mDice, and mAcc indicators, and has better segmentation performance. The ablation experiment further verifies the effectiveness of the proposed improved method, proving that multi-scale fusion and adaptive attention mechanism have a positive effect on the segmentation task. Through the visualization analysis of the inference results, the model can better restore the real anatomical structure of the spinal image. Future research can further optimize the Transformer structure and expand the data scale to improve the generalization ability of the model. This study provides an efficient solution for the task of medical image segmentation, which is of great significance to intelligent medical image analysis.

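Summary

To handle the complex anatomical structure of spinal images, this paper proposes a 3D semantic segmentation method based on an improved SwinUNETR. It introduces a multi-scale fusion mechanism and an adaptive attention mechanism, which strengthen feature extraction, improve recognition accuracy for the target area, and refine boundary segmentation. Experiments show significant improvements in mIoU, mDice, and mAcc over 3D CNN, 3D U-Net, and 3D U-Net + Transformer.

Key Takeaways

The study proposes a 3D semantic segmentation method based on an improved SwinUNETR, designed for the complex anatomy of spinal images.
A multi-scale fusion mechanism uses information at different scales to strengthen feature extraction.
An adaptive attention mechanism lets the model dynamically adjust its attention to key areas, refining boundary segmentation.
The method shows significant improvements in mIoU, mDice, and mAcc over the compared models.
Ablation experiments confirm that multi-scale fusion and adaptive attention both benefit the segmentation task.
Visualization of the inference results shows that the model better restores the real anatomical structure in spinal images.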
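
The mIoU and mDice metrics named above can be computed from integer label maps as in the sketch below. The averaging convention (an unweighted mean over classes that appear in either the prediction or the target) is an assumption, since the abstract does not define how the paper averages.

```python
import numpy as np

def mean_dice_iou(pred, target, n_classes):
    """Mean Dice and mean IoU over classes present in `pred` or `target`."""
    dices, ious = [], []
    for c in range(n_classes):
        p, t = pred == c, target == c
        total = p.sum() + t.sum()
        if total == 0:
            continue  # class absent from both maps: skip it
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        dices.append(2 * inter / total)
        ious.append(inter / union)
    return float(np.mean(dices)), float(np.mean(ious))

# Tiny 2D example; the same code works unchanged on 3D label volumes.
pred = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 1, 1], [1, 1, 2]])
mdice, miou = mean_dice_iou(pred, target, n_classes=3)
```

Dice weights the overlap twice relative to the combined size of the two regions, so for the same prediction it is never lower than IoU.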

Mixed-granularity Implicit Representation for Continuous Hyperspectral Compressive Reconstruction

Authors:Jianan Li, Huan Chen, Wangcai Zhao, Rui Chen, Tingfa Xu

Hyperspectral Images (HSIs) are crucial across numerous fields but are hindered by the long acquisition times associated with traditional spectrometers. The Coded Aperture Snapshot Spectral Imaging (CASSI) system mitigates this issue through a compression technique that accelerates the acquisition process. However, reconstructing HSIs from compressed data presents challenges due to fixed spatial and spectral resolution constraints. This study introduces a novel method using implicit neural representation for continuous hyperspectral image reconstruction. We propose the Mixed Granularity Implicit Representation (MGIR) framework, which includes a Hierarchical Spectral-Spatial Implicit Encoder for efficient multi-scale implicit feature extraction. This is complemented by a Mixed-Granularity Local Feature Aggregator that adaptively integrates local features across scales, combined with a decoder that merges coordinate information for precise reconstruction. By leveraging implicit neural representations, the MGIR framework enables reconstruction at any desired spatial-spectral resolution, significantly enhancing the flexibility and adaptability of the CASSI system. Extensive experimental evaluations confirm that our model produces reconstructed images at arbitrary resolutions and matches state-of-the-art methods across varying spectral-spatial compression ratios. The code will be released at https://github.com/chh11/MGIR.

Accepted by TNNLS.

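Summary

This paper introduces a method for continuous hyperspectral image reconstruction based on implicit neural representation. The proposed Mixed Granularity Implicit Representation (MGIR) framework uses a Hierarchical Spectral-Spatial Implicit Encoder and a Mixed-Granularity Local Feature Aggregator for efficient multi-scale implicit feature extraction and adaptive integration of local features. The framework can reconstruct hyperspectral images at any desired spatial-spectral resolution, which significantly improves the flexibility and adaptability of the CASSI system.

Key Takeaways

Implicit neural representation is used for hyperspectral image reconstruction.
The MGIR framework comprises a Hierarchical Spectral-Spatial Implicit Encoder and a Mixed-Granularity Local Feature Aggregator.
MGIR performs multi-scale implicit feature extraction and adaptive local feature integration.
MGIR can reconstruct hyperspectral images at any desired spatial-spectral resolution.
A decoder that merges coordinate information enables precise reconstruction.
Experiments confirm that MGIR reconstructs images at arbitrary resolutions and matches state-of-the-art methods across varying spectral-spatial compression ratios.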
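
The reason an implicit representation allows reconstruction at any resolution is that the output is a function of continuous coordinates, so the caller chooses the sampling grid. The sketch below illustrates only that querying step; `toy_field` is a hypothetical stand-in for a learned decoder, and the [-1, 1] coordinate normalization is an assumed convention.

```python
import numpy as np

def query_grid(height, width, bands):
    """Normalized (y, x, wavelength) coordinates for every output sample."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    ls = np.linspace(-1.0, 1.0, bands)
    grid = np.stack(np.meshgrid(ys, xs, ls, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)

def toy_field(coords):
    # Hypothetical continuous function standing in for the learned decoder.
    return np.sin(np.pi * coords).sum(axis=1)

# The same field is sampled at two different spatial-spectral resolutions.
low = toy_field(query_grid(4, 4, 8)).reshape(4, 4, 8)
high = toy_field(query_grid(8, 8, 31)).reshape(8, 8, 31)
```

In MGIR the decoder would also receive features aggregated from the compressed measurement, which this sketch leaves out.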

Adaptive Deep Learning for Multiclass Breast Cancer Classification via Misprediction Risk Analysis

Authors:Gul Sheeraz, Qun Chen, Liu Feiyu, Zhou Fengjin MD

Breast cancer remains one of the leading causes of cancer-related deaths worldwide. Early detection is crucial for improving patient outcomes, yet the diagnostic process is often complex and prone to inconsistencies among pathologists. Computer-aided diagnostic approaches have significantly enhanced breast cancer detection, particularly in binary classification (benign vs. malignant). However, these methods face challenges in multiclass classification, leading to frequent mispredictions. In this work, we propose a novel adaptive learning approach for multiclass breast cancer classification using H&E-stained histopathology images. First, we introduce a misprediction risk analysis framework that quantifies and ranks the likelihood of an image being mislabeled by a classifier. This framework leverages an interpretable risk model that requires only a small number of labeled samples for training. Next, we present an adaptive learning strategy that fine-tunes classifiers based on the specific characteristics of a given dataset. This approach minimizes misprediction risk, allowing the classifier to adapt effectively to the target workload. We evaluate our proposed solutions on real benchmark datasets, demonstrating that our risk analysis framework more accurately identifies mispredictions compared to existing methods. Furthermore, our adaptive learning approach significantly improves the performance of state-of-the-art deep neural network classifiers.

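Summary

This paper proposes an adaptive learning approach for multiclass breast cancer classification from H&E-stained histopathology images. It consists of a misprediction risk analysis framework and an adaptive learning strategy. The risk analysis framework quantifies and ranks the likelihood that a classifier mislabels an image, and needs only a small number of labeled samples for training. The adaptive learning strategy fine-tunes classifiers to the characteristics of a given dataset to minimize misprediction risk. Experiments show that the approach identifies mispredictions more accurately than existing methods and improves the performance of state-of-the-art deep neural network classifiers.

Key Takeaways

Breast cancer remains one of the leading causes of cancer-related deaths worldwide, and early detection is crucial for improving patient outcomes.
Computer-aided diagnosis has significantly improved breast cancer detection but still struggles with multiclass classification.
The paper proposes a new adaptive learning approach for multiclass breast cancer classification.
A misprediction risk analysis framework quantifies and ranks the likelihood that an image is mislabeled.
The risk analysis framework needs only a small number of labeled samples for training, which makes it practical.
The adaptive learning strategy fine-tunes classifiers to the characteristics of a dataset and improves classification performance.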

Humanoids in Hospitals: A Technical Study of Humanoid Surrogates for Dexterous Medical Interventions

Authors:Soofiyan Atar, Xiao Liang, Calvin Joyce, Florian Richter, Wood Ricardo, Charles Goldberg, Preetham Suresh, Michael Yip

The increasing demand for healthcare workers, driven by aging populations and labor shortages, presents a significant challenge for hospitals. Humanoid robots have the potential to alleviate these pressures by leveraging their human-like dexterity and adaptability to assist in medical procedures. This work conducted an exploratory study on the feasibility of humanoid robots performing direct clinical tasks through teleoperation. A bimanual teleoperation system was developed for the Unitree G1 Humanoid Robot, integrating high-fidelity pose tracking, custom grasping configurations, and an impedance controller to safely and precisely manipulate medical tools. The system is evaluated in seven diverse medical procedures, including physical examinations, emergency interventions, and precision needle tasks. Our results demonstrate that humanoid robots can successfully replicate critical aspects of human medical assessments and interventions, with promising quantitative performance in ventilation and ultrasound-guided tasks. However, challenges remain, including limitations in force output for procedures requiring high strength and sensor sensitivity issues affecting clinical accuracy. This study highlights the potential and current limitations of humanoid robots in hospital settings and lays the groundwork for future research on robotic healthcare integration.

8 pages.

Summary

人口老龄化及劳动力短缺对医院提出了巨大挑战。人形机器人具备人类般的灵巧性和适应性，可协助执行医疗程序，缓解压力。本研究探索了通过遥控操作人形机器人执行直接临床任务的可能性。为Unitree G1人形机器人开发了一种双手动遥控操作系统，集成了高保真姿态追踪、自定义抓握配置和阻抗控制器，可安全精确地操作医疗工具。系统经过七种不同的医疗程序的评估，包括体检、紧急干预和精准针刺任务等。结果显示，人形机器人在关键医疗评估方面取得了显著成果，尤其是在通气和超声引导任务中的定量表现令人鼓舞。然而仍存在挑战，如力量输出受限影响需要高强度的程序执行及传感器灵敏度问题影响临床准确性等。本研究强调了人形机器人在医院环境中的潜力及当前局限性，为未来机器人医疗保健融合研究奠定了基础。

Key Takeaways

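Population aging and labor shortages pose a challenge for hospitals.
Humanoid robots offer human-like dexterity and adaptability and could assist in medical procedures.
The study explores whether a teleoperated humanoid robot can perform direct clinical tasks.
The bimanual teleoperation system built for the Unitree G1 humanoid robot integrates high-fidelity pose tracking, custom grasping configurations, and an impedance controller.
The system was evaluated on seven medical procedures, including physical examinations, emergency interventions, and precision needle tasks.
The robot reproduced critical aspects of human medical assessments and interventions, with promising quantitative results in ventilation and ultrasound-guided tasks; limited force output and sensor sensitivity remain open problems.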
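
The impedance controller named above can be illustrated with the textbook one-axis spring-damper law, F = K (x_des - x) + D (v_des - v), with the commanded force clamped for safety. This is a minimal sketch under stated assumptions, not the controller used in the paper: the gains, the force limit, and the names ImpedanceGains and impedance_force are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImpedanceGains:
    stiffness: float  # K in N/m (hypothetical value, chosen by the caller)
    damping: float    # D in N*s/m (hypothetical value, chosen by the caller)


def impedance_force(gains, x_des, x, v_des, v, f_max):
    """Spring-damper force toward the target, clamped to the range [-f_max, f_max]."""
    raw = gains.stiffness * (x_des - x) + gains.damping * (v_des - v)
    return max(-f_max, min(f_max, raw))


# Illustrative numbers only: 1 cm position error, tool already moving
# toward the target at 0.05 m/s, desired velocity zero, 5 N force limit.
gains = ImpedanceGains(stiffness=200.0, damping=20.0)
force = impedance_force(gains, x_des=0.10, x=0.09, v_des=0.0, v=0.05, f_max=5.0)
print(force)
```

The spring term pulls the tool toward the target while the damping term opposes its velocity, so contact forces stay bounded even when the operator's commanded pose overshoots.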

A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT

Authors:Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Dandan Zheng, Heng Guo, Puyang Wang, Ke Yan, Yirui Wang, Qinji Yu, Zi Li, Minfeng Xu, Jianfeng Zhang, Haoshen Li, Jia Ge, Tsung-Ying Ho, Bing-Shen Huang, Tashan Ai, Kuaile Zhao, Na Shen, Qifeng Wang, Yun Bian, Tingyu Wu, Peng Du, Hua Zhang, Feng-Ming Kong, Alan L. Yuille, Cher Heng Tan, Chunyan Miao, Perry J. Pickhardt, Senxiang Yan, Ronald M. Summers, Le Lu, Dakai Jin, Xianghua Ye

Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, there is no such fully annotated CT dataset with all anatomies delineated for training because of the exceptionally high manual cost, the need for specialized clinical expertise, and the time required to finish the task. To this end, we proposed a novel continual learning-driven CT model that can segment complete anatomies presented using dozens of previously partially labeled datasets, dynamically expanding its capacity to segment new ones without compromising previously learned organ knowledge. Existing multi-dataset approaches are not able to dynamically segment new anatomies without catastrophic forgetting and would encounter optimization difficulty or infeasibility when segmenting hundreds of anatomies across the whole range of body regions. Our single unified CT segmentation model, CL-Net, can highly accurately segment a clinically comprehensive set of 235 fine-grained whole-body anatomies. Composed of a universal encoder, multiple optimized and pruned decoders, CL-Net is developed using 13,952 CT scans from 20 public and 16 private high-quality partially labeled CT datasets of various vendors, different contrast phases, and pathologies. Extensive evaluation demonstrates that CL-Net consistently outperforms the upper limit of an ensemble of 36 specialist nnUNets trained per dataset with the complexity of 5% model size and significantly surpasses the segmentation accuracy of recent leading Segment Anything-style medical image foundation models by large margins. Our continual learning-driven CL-Net model would lay a solid foundation to facilitate many downstream tasks of oncology and chronic diseases using the most widely adopted CT imaging.

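Paper and project links

PDF

Summary

This paper proposes CL-Net, a continual learning-driven CT model that accurately segments a clinically comprehensive set of 235 fine-grained whole-body anatomies. Trained on dozens of partially labeled CT datasets, CL-Net can dynamically extend itself to segment new anatomies without forgetting previously learned organ knowledge. It outperforms an ensemble of 36 dataset-specific nnUNets at 5% of the model size and surpasses recent Segment Anything-style medical image foundation models by large margins.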

Key Takeaways

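CL-Net accurately segments a clinically comprehensive set of 235 fine-grained whole-body anatomies.
The model is trained on multiple partially labeled CT datasets and can dynamically extend to new anatomies.
Its continual learning design avoids forgetting previously learned organ knowledge when new anatomies are added.
CL-Net outperforms an ensemble of 36 specialist nnUNets trained per dataset and recent Segment Anything-style medical image foundation models.
CL-Net consists of a universal encoder and multiple optimized and pruned decoders.
It was developed on 13,952 CT scans from 20 public and 16 private high-quality partially labeled CT datasets.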

LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization

Authors:Alessio Spagnoletti, Jean Prost, Andrés Almansa, Nicolas Papadakis, Marcelo Pereyra

Text-to-image latent diffusion models (LDMs) have recently emerged as powerful generative models with great potential for solving inverse problems in imaging. However, leveraging such models in a Plug & Play (PnP), zero-shot manner remains challenging because it requires identifying a suitable text prompt for the unknown image of interest. Also, existing text-to-image PnP approaches are highly computationally expensive. We herein address these challenges by proposing a novel PnP inference paradigm specifically designed for embedding generative models within stochastic inverse solvers, with special attention to Latent Consistency Models (LCMs), which distill LDMs into fast generators. We leverage our framework to propose LAtent consisTency INverse sOlver (LATINO), the first zero-shot PnP framework to solve inverse problems with priors encoded by LCMs. Our conditioning mechanism avoids automatic differentiation and reaches SOTA quality in as little as 8 neural function evaluations. As a result, LATINO delivers remarkably accurate solutions and is significantly more memory and computationally efficient than previous approaches. We then embed LATINO within an empirical Bayesian framework that automatically calibrates the text prompt from the observed measurements by marginal maximum likelihood estimation. Extensive experiments show that prompt self-calibration greatly improves estimation, allowing LATINO with PRompt Optimization to define new SOTAs in image reconstruction quality and computational efficiency.

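Paper and project links

PDF 27 pages, 20 figures

Summary

This paper introduces a new Plug & Play (PnP) inference paradigm for solving imaging inverse problems with text-to-image latent diffusion models (LDMs). The paradigm embeds generative models within stochastic inverse solvers, with particular attention to Latent Consistency Models (LCMs), which distill LDMs into fast generators. On this basis the authors propose LAtent consisTency INverse sOlver (LATINO), a zero-shot PnP framework whose conditioning mechanism avoids automatic differentiation and reaches state-of-the-art quality in as few as 8 neural function evaluations. LATINO is then embedded in an empirical Bayesian framework that automatically calibrates the text prompt from the observed measurements by marginal maximum likelihood estimation. Prompt self-calibration further improves reconstruction quality and computational efficiency.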

Key Takeaways

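LDMs show great potential for solving imaging inverse problems, but using them in a zero-shot PnP manner is difficult because a suitable text prompt must be found for the unknown image.
A new PnP inference paradigm embeds generative models within stochastic inverse solvers, with particular attention to LCMs.
LATINO is the first zero-shot PnP framework to solve inverse problems with priors encoded by LCMs.
Its conditioning mechanism avoids automatic differentiation and reaches state-of-the-art quality in as few as 8 neural function evaluations.
LATINO is embedded in an empirical Bayesian framework that calibrates the text prompt automatically by marginal maximum likelihood estimation.
Prompt self-calibration markedly improves image reconstruction quality and computational efficiency.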

AstroSat-CZTI searches for hard X-ray prompt emission from Fast Radio Bursts

Authors:G. Waratkar, M. Dixit, S. P. Tendulkar, V. Bhalerao, D. Bhattacharya, S. Vadawale

Fast Radio Bursts (FRBs) are short-duration, highly-energetic extragalactic radio transients with unclear origins & emission mechanisms. Despite extensive multi-wavelength searches, no credible X-ray or other prompt electromagnetic counterparts have been found for extragalactic FRBs. We present results from a comprehensive search for such prompt X-ray counterparts using AstroSat-CZTI which has been actively detecting other high-energy fast transients like Gamma-ray bursts (GRBs). We undertook a systematic search in AstroSat-CZTI data for hard X-ray transients temporally & spatially coincident with 578 FRBs, and found no X-ray counterparts. We estimate flux upper limits for these events and convert them to upper limits on X-ray-to-radio fluence ratios. Further, we utilize the redshifts derived from the dispersion measures of these FRBs to compare their isotropic luminosities with those of GRBs, providing insights into potential similarities between these two classes of transients. Finally, we explore the prospects for X-ray counterpart detections using other current and upcoming X-ray monitors, including Fermi-GBM, Swift-BAT, SVOM-ECLAIRs, and Daksha, in the era of next-generation FRB detection facilities such as CHIME, DSA-2000, CHORD, and BURSTT. Our results highlight that highly sensitive X-ray monitors with large sky coverage, like Daksha, will provide the best opportunities to detect X-ray counterparts of bright FRBs.

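Paper and project links

PDF 13 pages, 5 figures, 3 tables. Submitted to Journal of Astrophysics and Astronomy. Comments welcome!

Summary

This paper reports a search for prompt hard X-ray counterparts of Fast Radio Bursts (FRBs), short and highly energetic extragalactic radio transients whose origins and emission mechanisms remain unclear. The authors systematically searched AstroSat-CZTI data for hard X-ray transients temporally and spatially coincident with 578 FRBs and found no counterparts. They estimate flux upper limits for these events and convert them into upper limits on the X-ray-to-radio fluence ratio. Using redshifts derived from the FRB dispersion measures, they compare the isotropic luminosities of FRBs with those of GRBs to probe possible similarities between the two classes of transients. Finally, they assess the detection prospects of other current and upcoming X-ray monitors (Fermi-GBM, Swift-BAT, SVOM-ECLAIRs, and Daksha) in the era of next-generation FRB facilities such as CHIME, DSA-2000, CHORD, and BURSTT, and conclude that highly sensitive X-ray monitors with large sky coverage, like Daksha, offer the best chance of detecting X-ray counterparts of bright FRBs.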

Key Takeaways

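FRBs are short, highly energetic radio transients whose origins and emission mechanisms remain unclear.
A search of AstroSat-CZTI data found no prompt hard X-ray counterparts for 578 FRBs.
Flux upper limits were estimated and converted into upper limits on the X-ray-to-radio fluence ratio.
Redshifts derived from dispersion measures were used to compare the isotropic luminosities of FRBs and GRBs.
Prospects for counterpart detection were assessed for other current and upcoming X-ray monitors, including Fermi-GBM, Swift-BAT, SVOM-ECLAIRs, and Daksha.
Highly sensitive X-ray monitors with large sky coverage, like Daksha, offer the best chance of detecting X-ray counterparts of bright FRBs.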
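
Turning an X-ray fluence limit into an X-ray-to-radio fluence ratio limit is mostly unit arithmetic. The sketch below is not the authors' pipeline: it assumes a flat radio spectrum across the receiver band, and the function names and every numeric input are hypothetical values chosen for illustration. The only fixed fact used is the unit definition 1 Jy = 1e-23 erg s^-1 cm^-2 Hz^-1.

```python
JY_TO_CGS = 1e-23  # 1 Jy = 1e-23 erg s^-1 cm^-2 Hz^-1


def radio_fluence_cgs(fluence_jy_ms, bandwidth_hz):
    """Band-integrated radio fluence in erg/cm^2 from a fluence in Jy ms.

    Assumes a flat spectrum over the band (an illustrative simplification).
    """
    fluence_jy_s = fluence_jy_ms * 1e-3  # ms -> s
    return fluence_jy_s * JY_TO_CGS * bandwidth_hz


def xray_to_radio_ratio_limit(xray_fluence_limit_cgs, fluence_jy_ms, bandwidth_hz):
    """Upper limit on the X-ray-to-radio fluence ratio (dimensionless)."""
    return xray_fluence_limit_cgs / radio_fluence_cgs(fluence_jy_ms, bandwidth_hz)


# Hypothetical burst: 10 Jy ms over a 400 MHz band, with a hypothetical
# X-ray fluence upper limit of 1e-7 erg/cm^2.
radio = radio_fluence_cgs(10.0, 400e6)
ratio = xray_to_radio_ratio_limit(1e-7, 10.0, 400e6)
print(radio, ratio)
```

Because the X-ray value is an upper limit and the radio fluence is a measurement, the resulting ratio is itself an upper limit.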

A Causality-Inspired Model for Intima-Media Thickening Assessment in Ultrasound Videos

Authors:Shuo Gao, Jingyang Zhang, Jun Xue, Meng Yang, Yang Chen, Guangquan Zhou

Carotid atherosclerosis represents a significant health risk, with its early diagnosis primarily dependent on ultrasound-based assessments of carotid intima-media thickening. However, during carotid ultrasound screening, significant view variations cause style shifts, impairing content cues related to thickening, such as lumen anatomy, which introduces spurious correlations that hinder assessment. Therefore, we propose a novel causal-inspired method for assessing carotid intima-media thickening in frame-wise ultrasound videos, which focuses on two aspects: eliminating spurious correlations caused by style and enhancing causal content correlations. Specifically, we introduce a novel Spurious Correlation Elimination (SCE) module to remove non-causal style effects by enforcing prediction invariance with style perturbations. Simultaneously, we propose a Causal Equivalence Consolidation (CEC) module to strengthen causal content correlation through adversarial optimization during content randomization. In addition, we design a Causal Transition Augmentation (CTA) module to ensure smooth causal flow by integrating an auxiliary pathway with text prompts and connecting it through contrastive learning. The experimental results on our in-house carotid ultrasound video dataset achieved an accuracy of 86.93%, demonstrating the superior performance of the proposed method. Code is available at https://github.com/xielaobanyy/causal-imt.

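Paper and project links

PDF 10 pages, 5 figures, conference

Summary

This paper presents a causality-inspired method for assessing carotid intima-media thickening in ultrasound videos frame by frame. By removing non-causal style effects, strengthening causal content correlations, and ensuring smooth causal flow, the method reaches 86.93% accuracy on the authors' in-house carotid ultrasound video dataset.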

Key Takeaways

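Early diagnosis of carotid atherosclerosis depends mainly on ultrasound-based assessment of carotid intima-media thickening.
During carotid ultrasound screening, large view variations cause style shifts that impair thickening-related content cues such as lumen anatomy, introducing spurious correlations that hinder assessment.
The proposed causality-inspired method comprises three modules that eliminate spurious correlations, strengthen causal content correlations, and ensure smooth causal flow.
The Spurious Correlation Elimination (SCE) module removes non-causal style effects by enforcing prediction invariance under style perturbations.
The Causal Equivalence Consolidation (CEC) module strengthens causal content correlation through adversarial optimization during content randomization.
The Causal Transition Augmentation (CTA) module ensures smooth causal flow by integrating an auxiliary pathway with text prompts and connecting it through contrastive learning.
Experiments on the in-house carotid ultrasound video dataset reach 86.93% accuracy.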

A Novel Double Pruning method for Imbalanced Data using Information Entropy and Roulette Wheel Selection for Breast Cancer Diagnosis

Authors:Soufiane Bacha, Huansheng Ning, Belarbi Mostefa, Doreen Sebastian Sarwatt, Sahraoui Dhelim

Accurate illness diagnosis is vital for effective treatment and patient safety. Machine learning models are widely used for cancer diagnosis based on historical medical data. However, data imbalance remains a major challenge, hindering classifier performance and reliability. The SMOTEBoost method addresses this issue by generating synthetic data to balance the dataset, but it may overlook crucial overlapping regions near the decision boundary and can produce noisy samples. This paper proposes RE-SMOTEBoost, an enhanced version of SMOTEBoost, designed to overcome these limitations. Firstly, RE-SMOTEBoost focuses on generating synthetic samples in overlapping regions to better capture the decision boundary using roulette wheel selection. Secondly, it incorporates a filtering mechanism based on information entropy to reduce noise and borderline cases and improve the quality of generated data. Thirdly, we introduce a double regularization penalty to control the synthetic samples' proximity to the decision boundary and avoid class overlap. These enhancements enable higher-quality oversampling of the minority class, resulting in a more balanced and effective training dataset. The proposed method outperforms existing state-of-the-art techniques when evaluated on imbalanced datasets. Compared to the top-performing sampling algorithms, RE-SMOTEBoost demonstrates a notable improvement of 3.22% in accuracy and a variance reduction of 88.8%. These results indicate that the proposed model offers a solid solution for medical settings, effectively overcoming data scarcity and severe imbalance caused by limited samples, data collection difficulties, and privacy constraints.

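Paper and project links

PDF

Summary

RE-SMOTEBoost is proposed to address data imbalance in machine learning-based medical diagnosis. It improves on SMOTEBoost in three ways: synthetic samples are generated in overlapping regions using roulette wheel selection, an information entropy-based filter reduces noise and borderline cases, and a double regularization penalty controls how close synthetic samples lie to the decision boundary. Together these yield higher-quality oversampling of the minority class and a more balanced, effective training set. On imbalanced datasets the method outperforms existing state-of-the-art techniques; compared with the top-performing sampling algorithms, accuracy improves by 3.22% and variance drops by 88.8%. The authors present it as a solution for medical settings affected by data scarcity and severe imbalance.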

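Key Takeaways

Data imbalance is a major challenge for machine learning models in disease diagnosis and hinders classifier performance and reliability.
SMOTEBoost balances the dataset by generating synthetic data, but it may overlook crucial overlapping regions near the decision boundary and can produce noisy samples.
RE-SMOTEBoost, an enhanced version of SMOTEBoost, uses roulette wheel selection to generate synthetic samples in overlapping regions and better capture the decision boundary.
It adds an information entropy-based filtering mechanism that reduces noise and borderline cases and improves the quality of the generated data.
A double regularization penalty controls the proximity of synthetic samples to the decision boundary and avoids class overlap.
On imbalanced datasets RE-SMOTEBoost outperforms existing techniques, with a 3.22% accuracy gain and an 88.8% variance reduction over the top-performing sampling algorithms.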
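
Two generic building blocks mentioned above, roulette wheel selection and SMOTE-style interpolation, can be sketched in a few lines. This is not the RE-SMOTEBoost algorithm itself: the entropy filter and the double regularization penalty are omitted, and the overlap scores, sample values, and function names are hypothetical, standing in for whatever measure the paper uses to favor samples in overlapping regions.

```python
import random


def roulette_wheel_pick(weights, rng):
    """Return an index with probability proportional to its non-negative weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total
    running = 0.0
    for index, weight in enumerate(weights):
        running += weight
        if threshold < running:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def smote_interpolate(sample, neighbor, rng):
    """Create a synthetic point on the segment between a sample and its neighbor."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]


rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]  # hypothetical minority samples
overlap_scores = [0.0, 1.0, 3.0]                 # hypothetical: higher = closer to the boundary
seed = minority[roulette_wheel_pick(overlap_scores, rng)]
synthetic = smote_interpolate(seed, minority[0], rng)
print(seed, synthetic)
```

Samples with a higher score are chosen as seeds more often, so synthetic points concentrate where the classes overlap, while a sample with zero weight is never selected.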

Adaptive Label Correction for Robust Medical Image Segmentation with Noisy Labels

Authors:Chengxuan Qian, Kai Han, Siqi Ma, Chongwen Lyu, Zhenlong Yuan, Jun Chen, Zhe Liu

Deep learning has shown remarkable success in medical image analysis, but its reliance on large volumes of high-quality labeled data limits its applicability. While noisy labeled data are easier to obtain, directly incorporating them into training can degrade model performance. To address this challenge, we propose a Mean Teacher-based Adaptive Label Correction (ALC) self-ensemble framework for robust medical image segmentation with noisy labels. The framework leverages the Mean Teacher architecture to ensure consistent learning under noise perturbations. It includes an adaptive label refinement mechanism that dynamically captures and weights differences across multiple disturbance versions to enhance the quality of noisy labels. Additionally, a sample-level uncertainty-based label selection algorithm is introduced to prioritize high-confidence samples for network updates, mitigating the impact of noisy annotations. Consistency learning is integrated to align the predictions of the student and teacher networks, further enhancing model robustness. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed framework, showing significant improvements in segmentation performance. By fully exploiting the strengths of the Mean Teacher structure, the ALC framework effectively processes noisy labels, adapts to challenging scenarios, and achieves competitive results compared to state-of-the-art methods.

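Paper and project links

PDF

Summary

Deep learning has been remarkably successful in medical image analysis, but its dependence on large volumes of high-quality labeled data limits its applicability, and training directly on noisy labels degrades model performance. The paper proposes a Mean Teacher-based Adaptive Label Correction (ALC) self-ensemble framework for robust medical image segmentation with noisy labels. The Mean Teacher architecture keeps learning consistent under noise perturbations, and an adaptive label refinement mechanism dynamically captures and weights differences across multiple perturbed versions to improve the quality of noisy labels. A sample-level uncertainty-based label selection algorithm prioritizes high-confidence samples for network updates, reducing the impact of noisy annotations. Consistency learning aligns the predictions of the student and teacher networks and further improves robustness. Experiments on two public datasets show significant gains in segmentation performance and results competitive with state-of-the-art methods.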

Key Takeaways

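Deep learning for medical image analysis is limited by its need for large volumes of high-quality labeled data.
Training directly on noisy labels degrades model performance.
A Mean Teacher-based Adaptive Label Correction (ALC) self-ensemble framework is proposed to address this.
The framework uses the Mean Teacher architecture to keep learning consistent under noise perturbations.
An adaptive label refinement mechanism improves the quality of noisy labels.
A sample-level uncertainty-based label selection algorithm prioritizes high-confidence samples for network updates.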
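
The Mean Teacher idea behind the framework is that the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency loss pulls the two networks' predictions together. The sketch below shows only these two generic pieces on plain lists of floats; it is not the ALC implementation, and the decay value, the squared-error consistency loss, and the function names are assumptions made for illustration.

```python
def ema_update(teacher, student, decay=0.99):
    """Return teacher weights moved toward the student by an exponential moving average."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_pred, teacher_pred):
    """Mean squared error between student and teacher predictions."""
    squared = [(s - t) ** 2 for s, t in zip(student_pred, teacher_pred)]
    return sum(squared) / len(squared)


# Hypothetical weights and predictions, for illustration only.
teacher_weights = [0.0, 1.0]
student_weights = [1.0, 1.0]
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)
loss = consistency_loss([1.0, 0.0], [0.0, 0.0])
print(teacher_weights, loss)
```

A decay close to 1 makes the teacher change slowly, which is what lets it act as a stable reference when individual training labels are noisy.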

Breaking the Box: Enhancing Remote Sensing Image Segmentation with Freehand Sketches

Authors:Ying Zang, Yuncan Gao, Jiangi Zhang, Yuangi Hu, Runlong Cao, Lanyun Zhu, Qi Zhu, Deyi Ji, Renjun Xu, Tianrun Chen

This work advances zero-shot interactive segmentation for remote sensing imagery through three key contributions. First, we propose a novel sketch-based prompting method, enabling users to intuitively outline objects, surpassing traditional point or box prompts. Second, we introduce LTL-Sensing, the first dataset pairing human sketches with remote sensing imagery, setting a benchmark for future research. Third, we present LTL-Net, a model featuring a multi-input prompting transport module tailored for freehand sketches. Extensive experiments show our approach significantly improves segmentation accuracy and robustness over state-of-the-art methods like SAM, fostering more intuitive human-AI collaboration in remote sensing analysis and enhancing its applications.

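Paper and project links

PDF

Summary

This paper advances zero-shot interactive segmentation for remote sensing imagery with three contributions: a sketch-based prompting method that lets users intuitively outline objects, going beyond traditional point or box prompts; LTL-Sensing, the first dataset pairing human sketches with remote sensing imagery, which sets a benchmark for future research; and LTL-Net, a model with a multi-input prompting transport module tailored to freehand sketches. Experiments show significant improvements in segmentation accuracy and robustness over state-of-the-art methods such as SAM, supporting more intuitive human-AI collaboration in remote sensing analysis.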

Key Takeaways

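A sketch-based prompting method lets users intuitively outline objects.
LTL-Sensing is the first dataset pairing human sketches with remote sensing imagery and sets a benchmark for future research.
LTL-Net features a multi-input prompting transport module tailored to freehand sketches.
The approach significantly improves segmentation accuracy and robustness over state-of-the-art methods such as SAM.
It supports more intuitive human-AI collaboration in remote sensing analysis.
It broadens the applications of remote sensing analysis.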

E-SAM: Training-Free Segment Every Entity Model

Authors:Weiming Zhang, Dingwen Xiao, Lei Chen, Lin Wang

Entity Segmentation (ES) aims at identifying and segmenting distinct entities within an image without the need for predefined class labels. This characteristic makes ES well-suited to open-world applications with adaptation to diverse and dynamically changing environments, where new and previously unseen entities may appear frequently. Existing ES methods either require large annotated datasets or high training costs, limiting their scalability and adaptability. Recently, the Segment Anything Model (SAM), especially in its Automatic Mask Generation (AMG) mode, has shown potential for holistic image segmentation. However, it struggles with over-segmentation and under-segmentation, making it less effective for ES. In this paper, we introduce E-SAM, a novel training-free framework that exhibits exceptional ES capability. Specifically, we first propose Multi-level Mask Generation (MMG) that hierarchically processes SAM’s AMG outputs to generate reliable object-level masks while preserving fine details at other levels. Entity-level Mask Refinement (EMR) then refines these object-level masks into accurate entity-level masks. That is, it separates overlapping masks to address the redundancy issues inherent in SAM’s outputs and merges similar masks by evaluating entity-level consistency. Lastly, Under-Segmentation Refinement (USR) addresses under-segmentation by generating additional high-confidence masks fused with EMR outputs to produce the final ES map. These three modules are seamlessly optimized to achieve the best ES without additional training overhead. Extensive experiments demonstrate that E-SAM achieves state-of-the-art performance compared to prior ES methods, demonstrating a significant improvement by +30.1 on benchmark metrics.

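Paper and project links

PDF Under review

Summary

This paper proposes E-SAM, a training-free framework for Entity Segmentation (ES). Three modules, Multi-level Mask Generation (MMG), Entity-level Mask Refinement (EMR), and Under-Segmentation Refinement (USR), address the over-segmentation and under-segmentation that SAM's Automatic Mask Generation (AMG) mode exhibits on ES. Experiments show state-of-the-art performance, with a +30.1 improvement on benchmark metrics over prior ES methods.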

Key Takeaways