⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only — use with caution.
🔴 Note: never use these for serious academic purposes; they are only for an initial screening before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-20
Uncertainty-Supervised Interpretable and Robust Evidential Segmentation
Authors:Yuzhu Li, An Sui, Fuping Wu, Xiahai Zhuang
Uncertainty estimation has been widely studied in medical image segmentation as a tool to provide reliability, particularly in deep learning approaches. However, previous methods generally lack effective supervision in uncertainty estimation, leading to low interpretability and robustness of the predictions. In this work, we propose a self-supervised approach to guide the learning of uncertainty. Specifically, we introduce three principles about the relationships between the uncertainty and the image gradients around boundaries and noise. Based on these principles, two uncertainty supervision losses are designed. These losses enhance the alignment between model predictions and human interpretation. Accordingly, we introduce novel quantitative metrics for evaluating the interpretability and robustness of uncertainty. Experimental results demonstrate that compared to state-of-the-art approaches, the proposed method can achieve competitive segmentation performance and superior results in out-of-distribution (OOD) scenarios while significantly improving the interpretability and robustness of uncertainty estimation. Code is available via https://github.com/suiannaius/SURE.
Paper and project links
Summary
This paper proposes a self-supervised approach for guiding uncertainty learning in medical image segmentation. It introduces three principles on the relationships between uncertainty and image gradients around boundaries and noise, and designs two uncertainty supervision losses based on them to improve the alignment between model predictions and human interpretation. Novel quantitative metrics are also introduced to evaluate the interpretability and robustness of uncertainty. Experiments show that, compared with state-of-the-art methods, the approach achieves competitive segmentation performance, superior results in out-of-distribution (OOD) scenarios, and significantly improved interpretability and robustness of uncertainty estimation. Code is available at https://github.com/suiannaius/SURE.
Key Takeaways
- The work focuses on uncertainty estimation in medical image segmentation, aiming to improve the reliability and robustness of predictions.
- A self-supervised method is proposed, guiding uncertainty learning via principles relating uncertainty to image gradients around boundaries and noise.
- Two uncertainty supervision losses improve the alignment between model predictions and human interpretation.
- Novel quantitative metrics are introduced to evaluate the interpretability and robustness of uncertainty estimation.
- Experiments show competitive segmentation performance and superior results in out-of-distribution scenarios.
- The code is released for researchers and developers.
Click here to view paper screenshots
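The boundary principle above — uncertainty should be high where image gradients are high — can be sketched as a toy supervision loss (a minimal illustration, not the paper's actual SURE losses; the gradient normalization and squared-error form are assumptions):

```python
import numpy as np

def gradient_magnitude(img):
    """Finite-difference gradient magnitude of a 2-D image."""
    gy, gx = np.gradient(img.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)

def uncertainty_alignment_loss(uncertainty, img):
    """Penalize mismatch between a predicted uncertainty map and the
    normalized image-gradient magnitude (high gradients ~ boundaries,
    where uncertainty is expected to be high)."""
    g = gradient_magnitude(img)
    g = g / (g.max() + 1e-8)  # normalize to [0, 1]
    return float(np.mean((uncertainty - g) ** 2))
```

For a step-edge image, an uncertainty map that peaks exactly on the edge drives this toy loss to zero, while a uniformly confident (all-zero) map is penalized.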



Cryo-RL: automating prostate cancer cryoablation planning with reinforcement learning
Authors:Trixia Simangan, Ahmed Nadeem Abbasi, Yipeng Hu, Shaheer U. Saeed
Cryoablation is a minimally invasive localised treatment for prostate cancer that destroys malignant tissue during de-freezing, while sparing surrounding healthy structures. Its success depends on accurate preoperative planning of cryoprobe placements to fully cover the tumour and avoid critical anatomy. This planning is currently manual, expertise-dependent, and time-consuming, leading to variability in treatment quality and limited scalability. In this work, we introduce Cryo-RL, a reinforcement learning framework that models cryoablation planning as a Markov decision process and learns an optimal policy for cryoprobe placement. Within a simulated environment that models clinical constraints and stochastic intraoperative variability, an agent sequentially selects cryoprobe positions and ice sphere diameters. Guided by a reward function based on tumour coverage, this agent learns a cryoablation strategy that leads to optimal cryoprobe placements without the need for any manually-designed plans. Evaluated on 583 retrospective prostate cancer cases, Cryo-RL achieved over 8 percentage-point Dice improvements compared with the best automated baselines, based on geometric optimisation, and matched human expert performance while requiring substantially less planning time. These results highlight the potential of reinforcement learning to deliver clinically viable, reproducible, and efficient cryoablation plans.
Paper and project links
PDF Accepted at MICAD (Medical Imaging and Computer-Aided Diagnosis) 2025
Summary
This paper applies the Cryo-RL reinforcement learning framework to prostate cancer cryoablation planning. The framework simulates the surgical environment and autonomously plans cryoprobe positions and ice sphere diameters, improving tumour coverage and shortening planning time. In a retrospective study of 583 prostate cancer cases, Cryo-RL achieved over 8 percentage-point Dice improvements compared with the best automated baselines, based on geometric optimisation, and matched human expert performance.
Key Takeaways
- Cryoablation is a minimally invasive localised treatment for prostate cancer that destroys malignant tissue during de-freezing while sparing surrounding healthy structures.
- Current cryoablation planning is manual and expertise-dependent, making it time-consuming, variable in quality, and hard to scale.
- Cryo-RL is a reinforcement learning framework that models cryoablation planning as a Markov decision process and autonomously plans cryoprobe positions and ice sphere diameters in a simulated environment.
- Guided by a tumour-coverage reward, the agent learns an ablation strategy without any manually designed plans.
- In the retrospective evaluation, Cryo-RL achieved significant Dice improvements over automated baselines such as geometric optimisation.
- Cryo-RL matches human expert performance while requiring substantially less planning time.
Click here to view paper screenshots
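The sequential probe-placement idea can be illustrated with a greedy stand-in for the learned policy (a toy 2-D sketch, not Cryo-RL itself; the grid tumour mask, fixed sphere radius, and greedy coverage maximisation are assumptions):

```python
import numpy as np

def coverage(tumour, centres, radius):
    """Fraction of tumour pixels covered by the placed ice spheres."""
    if not centres:
        return 0.0
    idx = np.argwhere(tumour)
    covered = np.zeros(len(idx), dtype=bool)
    for c in centres:
        covered |= np.linalg.norm(idx - c, axis=1) <= radius
    return float(covered.mean())

def greedy_plan(tumour, radius, n_probes):
    """Sequentially place probes, each time choosing the position that
    maximises the coverage reward given earlier placements."""
    centres = []
    candidates = np.argwhere(tumour)
    for _ in range(n_probes):
        gains = [coverage(tumour, centres + [c], radius) for c in candidates]
        centres.append(candidates[int(np.argmax(gains))])
    return centres, coverage(tumour, centres, radius)
```

On a 3×3 tumour block, a single probe of radius 2 placed at the centre already reaches full coverage.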


MedDINOv3: How to adapt vision foundation models for medical image segmentation?
Authors:Yuheng Li, Yizhou Wu, Yuxiang Lai, Mingzhe Hu, Xiaofeng Yang
Accurate segmentation of organs and tumors in CT and MRI scans is essential for diagnosis, treatment planning, and disease monitoring. While deep learning has advanced automated segmentation, most models remain task-specific, lacking generalizability across modalities and institutions. Vision foundation models (FMs) pretrained on billion-scale natural images offer powerful and transferable representations. However, adapting them to medical imaging faces two key challenges: (1) the ViT backbone of most foundation models still underperforms specialized CNNs on medical image segmentation, and (2) the large domain gap between natural and medical images limits transferability. We introduce MedDINOv3, a simple and effective framework for adapting DINOv3 to medical segmentation. We first revisit plain ViTs and design a simple and effective architecture with multi-scale token aggregation. Then, we perform domain-adaptive pretraining on CT-3M, a curated collection of 3.87M axial CT slices, using a multi-stage DINOv3 recipe to learn robust dense features. MedDINOv3 matches or exceeds state-of-the-art performance across four segmentation benchmarks, demonstrating the potential of vision foundation models as unified backbones for medical image segmentation. The code is available at https://github.com/ricklisz/MedDINOv3.
Paper and project links
Summary
This paper highlights the importance of accurate organ and tumour segmentation in CT and MRI and notes that most deep learning models lack cross-modality and cross-institution generalizability. It proposes adapting vision foundation models (FMs) pretrained on billion-scale natural images to medical imaging via the MedDINOv3 framework, which performs domain-adaptive pretraining on the CT-3M dataset with a multi-stage DINOv3 recipe to learn robust dense features. MedDINOv3 matches or exceeds state-of-the-art performance across four segmentation benchmarks, demonstrating the potential of vision foundation models as unified backbones for medical image segmentation.
Key Takeaways
- Accurate segmentation of organs and tumours in CT and MRI is essential for diagnosis, treatment planning, and disease monitoring.
- Most existing deep learning segmentation models lack generalizability across modalities and institutions.
- Vision foundation models (FMs) offer powerful, transferable representations learned from natural images that can benefit medical imaging.
- MedDINOv3 is a simple and effective framework for adapting DINOv3 to medical segmentation.
- MedDINOv3 combines a plain-ViT architecture with multi-scale token aggregation and domain-adaptive pretraining on CT-3M using a multi-stage DINOv3 recipe.
- MedDINOv3 achieves strong performance on four segmentation benchmarks, demonstrating its potential.
Click here to view paper screenshots
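Multi-scale token aggregation can be sketched as upsampling token grids from several ViT stages to a common resolution and concatenating them along channels (an illustrative NumPy sketch; nearest-neighbour upsampling and channel concatenation are assumptions, not the paper's exact design):

```python
import numpy as np

def aggregate_tokens(feature_maps):
    """Upsample (H, W, C) token grids from several stages to the finest
    resolution with nearest-neighbour repetition, then concatenate the
    channel dimensions into one dense feature map."""
    h = max(f.shape[0] for f in feature_maps)
    w = max(f.shape[1] for f in feature_maps)
    ups = []
    for f in feature_maps:
        ry, rx = h // f.shape[0], w // f.shape[1]
        ups.append(np.repeat(np.repeat(f, ry, axis=0), rx, axis=1))
    return np.concatenate(ups, axis=-1)
```

A 4×4×2 grid and a coarser 2×2×3 grid aggregate into a single 4×4×5 feature map.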

KonfAI: A Modular and Fully Configurable Framework for Deep Learning in Medical Imaging
Authors:Valentin Boussot, Jean-Louis Dillenseger
KonfAI is a modular, extensible, and fully configurable deep learning framework specifically designed for medical imaging tasks. It enables users to define complete training, inference, and evaluation workflows through structured YAML configuration files, without modifying the underlying code. This declarative approach enhances reproducibility, transparency, and experimental traceability while reducing development time. Beyond the capabilities of standard pipelines, KonfAI provides native abstractions for advanced strategies including patch-based learning, test-time augmentation, model ensembling, and direct access to intermediate feature representations for deep supervision. It also supports complex multi-model training setups such as generative adversarial architectures. Thanks to its modular and extensible architecture, KonfAI can easily accommodate custom models, loss functions, and data processing components. The framework has been successfully applied to segmentation, registration, and image synthesis tasks, and has contributed to top-ranking results in several international medical imaging challenges. KonfAI is open source and available at https://github.com/vboussot/KonfAI.
Paper and project links
PDF https://github.com/vboussot/KonfAI
Summary
KonfAI is a modular, extensible, and fully configurable deep learning framework designed for medical imaging tasks. With a declarative approach, complete training, inference, and evaluation workflows are defined through structured YAML configuration files without modifying the underlying code. The framework supports advanced strategy abstractions, complex multi-model training setups, and flexible integration of custom models, loss functions, and data processing components. KonfAI has been successfully applied to segmentation, registration, and image synthesis tasks and has contributed to top-ranking results in several international medical imaging challenges.
Key Takeaways
- KonfAI is a deep learning framework designed specifically for medical imaging tasks.
- Its declarative approach, based on structured YAML configuration files, improves reproducibility, transparency, and experimental traceability.
- Its modular, extensible architecture provides native abstractions for advanced strategies such as patch-based learning, test-time augmentation, model ensembling, and direct access to intermediate features for deep supervision.
- It supports complex multi-model training setups, including generative adversarial architectures.
- Custom models, loss functions, and data processing components can be integrated easily.
- The framework has been successfully applied to segmentation, registration, and image synthesis in medical imaging.
Click here to view paper screenshots
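The declarative-workflow idea can be illustrated with a hypothetical YAML configuration (the keys below are illustrative only and are not KonfAI's actual schema):

```yaml
# Hypothetical declarative workflow — illustrative keys only,
# not KonfAI's actual configuration schema.
experiment: liver_segmentation
model:
  name: CustomUNet          # a user-registered module
  deep_supervision: true
train:
  loss: DiceCE
  patch_size: [128, 128, 128]   # patch-based learning
  ensembling: 5                 # number of models to ensemble
inference:
  test_time_augmentation: true
```

Changing the experiment then means editing this file, not the code — which is what makes runs reproducible and traceable.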



Boosting Generic Semi-Supervised Medical Image Segmentation via Diverse Teaching and Label Propagation
Authors:Wei Li, Pengcheng Zhou, Linye Ma, Wenyi Zhao, Huihua Yang
Both limited annotation and domain shift are significant challenges frequently encountered in medical image segmentation, leading to derivative scenarios like semi-supervised medical image segmentation (SSMIS), semi-supervised medical domain generalization (Semi-MDG) and unsupervised medical domain adaptation (UMDA). Conventional methods are generally tailored to specific tasks in isolation; error accumulation hinders the effective utilization of unlabeled data and limits further improvements, resulting in suboptimal performance when these issues occur. In this paper, we aim to develop a generic framework that masters all three tasks. We found that the key to solving the problem lies in how to generate reliable pseudo labels for the unlabeled data in the presence of domain shift with labeled data and in increasing the diversity of the model. To tackle this issue, we employ a Diverse Teaching and Label Propagation Network (DTLP-Net) to boost generic semi-supervised medical image segmentation. Our DTLP-Net involves a single student model and two diverse teacher models, which can generate reliable pseudo-labels for the student model. The first teacher model decouples the training process for labeled and unlabeled data; the second teacher is momentum-updated periodically, thus generating reliable yet diverse pseudo-labels. To fully utilize the information within the data, we adopt inter-sample and intra-sample data augmentation to learn the global and local knowledge. In addition, to further capture the voxel-level correlations, we propose label propagation to enhance model robustness. We evaluate our proposed framework on five benchmark datasets for SSMIS, UMDA, and Semi-MDG tasks. The results showcase notable improvements compared to state-of-the-art methods across all five settings, indicating the potential of our framework to tackle more challenging SSL scenarios.
Paper and project links
PDF Under Review
Summary
This paper addresses two major challenges in medical image segmentation, limited annotation and domain shift, and their derived scenarios: semi-supervised medical image segmentation (SSMIS), semi-supervised medical domain generalization (Semi-MDG), and unsupervised medical domain adaptation (UMDA). Existing methods accumulate errors and fail to exploit unlabeled data effectively, so the paper proposes a generic framework that masters all three tasks. To generate reliable pseudo labels for unlabeled data under domain shift, a Diverse Teaching and Label Propagation Network (DTLP-Net) is employed, comprising a single student model and two diverse teacher models that produce reliable pseudo labels for the student. Inter-sample and intra-sample data augmentation are adopted to learn global and local knowledge, and label propagation is proposed to capture voxel-level correlations and enhance robustness. Evaluations on five benchmark datasets show notable improvements over state-of-the-art methods across all five settings, indicating the framework's potential for more challenging SSL scenarios.
Key Takeaways
- Medical image segmentation faces limited annotation and domain shift, giving rise to the SSMIS, Semi-MDG, and UMDA scenarios.
- Existing methods suffer from error accumulation and cannot effectively utilize unlabeled data.
- A generic framework is proposed to handle all three tasks, emphasizing reliable pseudo-label generation and model diversity.
- DTLP-Net comprises one student model and two teacher models to generate reliable pseudo labels.
- Inter-sample and intra-sample data augmentation exploit the information within the data to learn global and local knowledge.
- Label propagation is proposed to capture voxel-level correlations and enhance model robustness.
Click here to view paper screenshots
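The momentum-updated teacher and confidence-filtered pseudo-labels can be sketched as follows (a minimal illustration; the EMA update and the confidence threshold are standard choices assumed here, not DTLP-Net's exact recipe):

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Momentum (EMA) update of teacher weights from the student:
    teacher <- m * teacher + (1 - m) * student, per parameter."""
    return {k: momentum * teacher[k] + (1 - momentum) * student[k]
            for k in teacher}

def pseudo_labels(probs, thresh=0.9):
    """Hard pseudo-labels from teacher softmax outputs, keeping only
    predictions whose confidence exceeds the threshold."""
    conf = probs.max(axis=-1)
    return probs.argmax(axis=-1), conf >= thresh
```

With momentum 0.9, a teacher weight of 0 and a student weight of 1 yield an updated teacher weight of 0.1; low-confidence voxels are masked out of the pseudo-label loss.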



MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis
Authors:Ning Zhu, Xiaochuan Ma, Shaoting Zhang, Guotai Wang
Cold-Start Active Learning (CSAL) aims to select informative samples for annotation without prior knowledge, which is important for improving annotation efficiency and model performance under a limited annotation budget in medical image analysis. Most existing CSAL methods rely on Self-Supervised Learning (SSL) on the target dataset for feature extraction, which is inefficient and limited by insufficient feature representation. Recently, pre-trained Foundation Models (FMs) have shown powerful feature extraction ability with a potential for better CSAL. However, this paradigm has been rarely investigated, with a lack of benchmarks for comparison of FMs in CSAL tasks. To this end, we propose MedCAL-Bench, the first systematic FM-based CSAL benchmark for medical image analysis. We evaluate 14 FMs and 7 CSAL strategies across 7 datasets under different annotation budgets, covering classification and segmentation tasks from diverse medical modalities. It is also the first CSAL benchmark that evaluates both the feature extraction and sample selection stages. Our experimental results reveal that: 1) Most FMs are effective feature extractors for CSAL, with DINO family performing the best in segmentation; 2) The performance differences of these FMs are large in segmentation tasks, while small for classification; 3) Different sample selection strategies should be considered in CSAL on different datasets, with Active Learning by Processing Surprisal (ALPS) performing the best in segmentation while RepDiv leading for classification. The code is available at https://github.com/HiLab-git/MedCAL-Bench.
Paper and project links
PDF 23 pages, 6 figures, 10 tables
Summary
This paper addresses Cold-Start Active Learning (CSAL) in medical image analysis, where informative samples must be selected for annotation without prior knowledge. Traditional CSAL methods rely on Self-Supervised Learning (SSL) on the target dataset for feature extraction, which is inefficient and limited by insufficient feature representation. Pre-trained Foundation Models (FMs) offer powerful feature extraction with potential for better CSAL, but this paradigm has rarely been investigated and lacks benchmarks. The paper therefore proposes MedCAL-Bench, the first systematic FM-based CSAL benchmark for medical image analysis, evaluating 14 FMs and 7 CSAL strategies across 7 datasets under different annotation budgets. The results show that performance differences among FMs are large for segmentation but small for classification, and that different sample selection strategies should be considered for different datasets.
Key Takeaways
- Cold-Start Active Learning (CSAL) is important for improving annotation efficiency and model performance under a limited annotation budget in medical image analysis.
- Most existing CSAL methods rely on Self-Supervised Learning (SSL) on the target dataset, which is inefficient and limited by insufficient feature representation.
- Pre-trained Foundation Models (FMs) show powerful feature extraction ability with the potential to improve CSAL.
- FM-based CSAL has rarely been investigated and lacked benchmarks, motivating MedCAL-Bench.
- The benchmark evaluates multiple FMs and CSAL strategies across datasets and annotation budgets, covering classification and segmentation tasks, and assesses both the feature extraction and sample selection stages.
- Experiments show large performance differences among FMs in segmentation tasks but small differences in classification.
Click here to view paper screenshots
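A diversity-driven cold-start selection on FM features can be sketched with the classic k-center greedy heuristic (illustrative only — a generic diversity strategy, not necessarily one of the seven benchmarked methods):

```python
import numpy as np

def k_center_greedy(features, k):
    """Pick k diverse samples from an (N, D) feature matrix: each new
    pick is the point farthest from the current selection."""
    selected = [0]  # seed with the first sample
    dists = np.linalg.norm(features - features[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))
        selected.append(nxt)
        # distance of every point to its nearest selected sample
        dists = np.minimum(dists,
                           np.linalg.norm(features - features[nxt], axis=1))
    return selected
```

Given two tight clusters of FM embeddings, the heuristic picks one representative from each rather than two near-duplicates.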

EMedNeXt: An Enhanced Brain Tumor Segmentation Framework for Sub-Saharan Africa using MedNeXt V2 with Deep Supervision
Authors:Ahmed Jaheen, Abdelrahman Elsayed, Damir Kim, Daniil Tikhonov, Matheus Scatolin, Mohor Banerjee, Qiankun Ji, Mostafa Salem, Hu Wang, Sarim Hashmi, Mohammad Yaqub
Brain cancer affects millions worldwide, and in nearly every clinical setting, doctors rely on magnetic resonance imaging (MRI) to diagnose and monitor gliomas. However, the current standard for tumor quantification through manual segmentation of multi-parametric MRI is time-consuming, requires expert radiologists, and is often infeasible in under-resourced healthcare systems. This problem is especially pronounced in low-income regions, where MRI scanners are of lower quality and radiology expertise is scarce, leading to incorrect segmentation and quantification. In addition, the number of acquired MRI scans in Africa is typically small. To address these challenges, the BraTS-Lighthouse 2025 Challenge focuses on robust tumor segmentation in sub-Saharan Africa (SSA), where resource constraints and image quality degradation introduce significant shifts. In this study, we present EMedNeXt – an enhanced brain tumor segmentation framework based on MedNeXt V2 with deep supervision and optimized post-processing pipelines tailored for SSA. EMedNeXt introduces three key contributions: a larger region of interest, an improved nnU-Net v2-based architectural skeleton, and a robust model ensembling system. Evaluated on the hidden validation set, our solution achieved an average LesionWise DSC of 0.897 with an average LesionWise NSD of 0.541 and 0.84 at a tolerance of 0.5 mm and 1.0 mm, respectively.
Paper and project links
PDF Won Third Place Award at Challenge 5 at BraTS-Lighthouse 2025 Challenge (MICCAI 2025)
Summary
Targeting the scarce medical resources and degraded MRI quality of sub-Saharan Africa, the EMedNeXt framework delivers accurate brain glioma segmentation. Built on MedNeXt V2 with deep supervision and an optimized post-processing pipeline, it achieved high segmentation accuracy on the hidden validation set.
Key Takeaways
- MRI is the standard tool for diagnosing and monitoring gliomas in nearly every clinical setting, including low-resource environments.
- The current standard of manual segmentation of multi-parametric MRI is time-consuming, expert-dependent, and often infeasible in under-resourced healthcare systems.
- In low-income regions, particularly sub-Saharan Africa, lower-quality MRI scanners and scarce radiology expertise lead to incorrect segmentation and quantification.
- The BraTS-Lighthouse 2025 Challenge focuses on robust tumour segmentation in sub-Saharan Africa (SSA).
- EMedNeXt is built on MedNeXt V2 with deep supervision and an optimized post-processing pipeline, contributing a larger region of interest, an improved nnU-Net v2-based architectural skeleton, and a robust model ensembling system.
- EMedNeXt achieved an average LesionWise DSC of 0.897 on the hidden validation set.
Click here to view paper screenshots
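The Dice similarity coefficient (DSC) reported above is, for a pair of binary masks, twice the overlap divided by the total mask size:

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice similarity coefficient between two binary masks:
    2 * |pred ∩ gt| / (|pred| + |gt|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
```

Identical masks score 1.0; two equal-sized masks sharing half their voxels score 0.5. (The LesionWise variant used in the challenge averages such scores per lesion rather than globally.)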

MedVKAN: Efficient Feature Extraction with Mamba and KAN for Medical Image Segmentation
Authors:Hancan Zhu, Jinhao Chen, Guanghua He
Medical image segmentation has traditionally relied on convolutional neural networks (CNNs) and Transformer-based models. CNNs, however, are constrained by limited receptive fields, while Transformers face scalability challenges due to quadratic computational complexity. To overcome these issues, recent studies have explored alternative architectures. The Mamba model, a selective state-space design, achieves near-linear complexity and effectively captures long-range dependencies. Its vision-oriented variant, the Visual State Space (VSS) model, extends these strengths to image feature learning. In parallel, the Kolmogorov-Arnold Network (KAN) enhances nonlinear expressiveness by replacing fixed activation functions with learnable ones. Motivated by these advances, we propose the VSS-Enhanced KAN (VKAN) module, which integrates VSS with the Expanded Field Convolutional KAN (EFC-KAN) as a replacement for Transformer modules, thereby strengthening feature extraction. We further embed VKAN into a U-Net framework, resulting in MedVKAN, an efficient medical image segmentation model. Extensive experiments on five public datasets demonstrate that MedVKAN achieves state-of-the-art performance on four datasets and ranks second on the remaining one. These results underscore the effectiveness of combining Mamba and KAN while introducing a novel and computationally efficient feature extraction framework. The source code is available at: https://github.com/beginner-cjh/MedVKAN.
Paper and project links
PDF This preprint has been published in Biomedical Signal Processing and Control, Volume 112, 2026, Article 108821
Summary
This paper presents MedVKAN, a new model for medical image segmentation that combines the strengths of the Mamba and KAN architectures. The VKAN module integrates the Visual State Space (VSS) model with the Expanded Field Convolutional KAN (EFC-KAN) as a replacement for Transformer modules, strengthening feature extraction. Embedded in a U-Net framework, MedVKAN achieves state-of-the-art performance on four of five public datasets and ranks second on the remaining one, offering a computationally efficient feature extraction framework for medical image segmentation.
Key Takeaways
- Medical image segmentation has traditionally relied on convolutional neural networks (CNNs) and Transformer-based models.
- CNNs are constrained by limited receptive fields, while Transformers face scalability challenges due to quadratic computational complexity.
- The Mamba model and its Visual State Space (VSS) variant offer an alternative architecture with near-linear complexity and effective long-range dependency modelling.
- The Kolmogorov-Arnold Network (KAN) enhances nonlinear expressiveness by replacing fixed activation functions with learnable ones.
- The proposed VKAN module integrates VSS with the Expanded Field Convolutional KAN (EFC-KAN) as a replacement for Transformer modules.
- MedVKAN embeds VKAN into a U-Net framework for efficient medical image segmentation.
Click here to view paper screenshots
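A KAN edge replaces a fixed activation with a learnable 1-D function; one common parameterization is a weighted sum of basis functions (Gaussian bases here are an illustrative stand-in for the B-splines typically used, and the width is an assumption):

```python
import numpy as np

def kan_edge(x, coeffs, centres, width=1.0):
    """One KAN edge: a learnable scalar function expressed as a weighted
    sum of Gaussian basis functions. `coeffs` are the trainable
    parameters that shape the activation."""
    basis = np.exp(-(((x[..., None] - centres) / width) ** 2))
    return basis @ coeffs
```

Training adjusts `coeffs`, so the activation itself is learned per edge instead of being fixed (as ReLU or GELU would be).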

Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge results
Authors:Meritxell Riera-Marin, Sikha O K, Julia Rodriguez-Comas, Matthias Stefan May, Zhaohong Pan, Xiang Zhou, Xiaokun Liang, Franciskus Xaverius Erick, Andrea Prenner, Cedric Hemon, Valentin Boussot, Jean-Louis Dillenseger, Jean-Claude Nunes, Abdul Qayyum, Moona Mazher, Steven A Niederer, Kaisar Kushibar, Carlos Martin-Isla, Petia Radeva, Karim Lekadir, Theodore Barfoot, Luis C. Garcia Peraza Herrera, Ben Glocker, Tom Vercauteren, Lucas Gago, Justin Englemann, Joy-Marie Kleiss, Anton Aubanell, Andreu Antolin, Javier Garcia-Lopez, Miguel A. Gonzalez Ballester, Adrian Galdran
Deep learning (DL) has become the dominant approach for medical image segmentation, yet ensuring the reliability and clinical applicability of these models requires addressing key challenges such as annotation variability, calibration, and uncertainty estimation. This is why we created the Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS), which highlights the critical role of multiple annotators in establishing a more comprehensive ground truth, emphasizing that segmentation is inherently subjective and that leveraging inter-annotator variability is essential for robust model evaluation. Seven teams participated in the challenge, submitting a variety of DL models evaluated using metrics such as Dice Similarity Coefficient (DSC), Expected Calibration Error (ECE), and Continuous Ranked Probability Score (CRPS). By incorporating consensus and dissensus ground truth, we assess how DL models handle uncertainty and whether their confidence estimates align with true segmentation performance. Our findings reinforce the importance of well-calibrated models, as better calibration is strongly correlated with the quality of the results. Furthermore, we demonstrate that segmentation models trained on diverse datasets and enriched with pre-trained knowledge exhibit greater robustness, particularly in cases deviating from standard anatomical structures. Notably, the best-performing models achieved high DSC and well-calibrated uncertainty estimates. This work underscores the need for multi-annotator ground truth, thorough calibration assessments, and uncertainty-aware evaluations to develop trustworthy and clinically reliable DL-based medical image segmentation models.
Paper and project links
PDF This challenge was hosted in MICCAI 2024
Summary
This paper reports on the CURVAS challenge, created to address key obstacles to the reliability and clinical applicability of deep learning segmentation models: annotation variability, calibration, and uncertainty estimation. The challenge highlights the critical role of multiple annotators in establishing a more comprehensive ground truth, emphasizing that segmentation is inherently subjective and that leveraging inter-annotator variability is essential for robust model evaluation. The findings show that better calibration is strongly correlated with result quality, that models trained on diverse datasets and enriched with pre-trained knowledge are more robust, and that the best-performing models achieved high DSC with well-calibrated uncertainty estimates. The work underscores the need for multi-annotator ground truth, thorough calibration assessment, and uncertainty-aware evaluation to develop trustworthy, clinically reliable DL-based medical image segmentation models.
Key Takeaways
- Deep learning dominates medical image segmentation, but ensuring reliability and clinical applicability remains challenging.
- The CURVAS challenge highlights the importance of multiple annotators for establishing a comprehensive ground truth.
- Segmentation is inherently subjective; leveraging inter-annotator variability is essential for robust model evaluation.
- Better-calibrated models are strongly correlated with higher-quality results.
- Models trained on diverse datasets and enriched with pre-trained knowledge exhibit greater robustness, particularly for cases deviating from standard anatomy.
- The best-performing models achieved high DSC and well-calibrated uncertainty estimates.
Click here to view paper screenshots
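Expected Calibration Error (ECE), one of the challenge metrics, bins predictions by confidence and averages the gap between mean confidence and accuracy per bin (a minimal binary-case sketch; equal-width bins are assumed):

```python
import numpy as np

def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error: bin predictions by confidence and
    sum the per-bin |mean confidence - accuracy| gaps, weighted by the
    fraction of samples falling in each bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (confidences > lo) & (confidences <= hi)
        if m.any():
            total += m.mean() * abs(confidences[m].mean() - correct[m].mean())
    return float(total)
```

A model that predicts everything with 0.95 confidence and is always right has an ECE of 0.05: it is slightly under-confident.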

SAIP-Net: Enhancing Remote Sensing Image Segmentation via Spectral Adaptive Information Propagation
Authors:Zhongtao Wang, Xizhe Cao, Yisong Chen, Guoping Wang
Semantic segmentation of remote sensing imagery demands precise spatial boundaries and robust intra-class consistency, challenging conventional hierarchical models. To address limitations arising from spatial domain feature fusion and insufficient receptive fields, this paper introduces SAIP-Net, a novel frequency-aware segmentation framework that leverages Spectral Adaptive Information Propagation. SAIP-Net employs adaptive frequency filtering and multi-scale receptive field enhancement to effectively suppress intra-class feature inconsistencies and sharpen boundary lines. Comprehensive experiments demonstrate significant performance improvements over state-of-the-art methods, highlighting the effectiveness of spectral-adaptive strategies combined with expanded receptive fields for remote sensing image segmentation.
Paper and project links
Summary
SAIP-Net is a novel frequency-aware framework for semantic segmentation of remote sensing imagery built on Spectral Adaptive Information Propagation. Through adaptive frequency filtering and multi-scale receptive field enhancement, it suppresses intra-class feature inconsistencies and sharpens boundaries, yielding significant performance gains over state-of-the-art methods.
Key Takeaways
- SAIP-Net is a new framework for semantic segmentation of remote sensing imagery.
- It uses spectral adaptive information propagation to address the limitations of spatial-domain feature fusion and insufficient receptive fields in conventional models.
- Adaptive frequency filtering suppresses intra-class feature inconsistencies.
- Multi-scale receptive field enhancement sharpens boundary lines.
- Combining spectral-adaptive strategies with expanded receptive fields yields significant improvements over state-of-the-art methods.
- The model has potential application value in remote sensing image segmentation.
Click here to view paper screenshots
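Frequency filtering of a feature map can be illustrated with a fixed FFT low-pass (a toy sketch — SAIP-Net's filters are adaptive and learned, which this fixed circular mask does not model):

```python
import numpy as np

def lowpass(feature, cutoff):
    """Keep only frequencies within `cutoff` of the spectrum centre:
    FFT -> circular mask around DC -> inverse FFT."""
    F = np.fft.fftshift(np.fft.fft2(feature))
    h, w = feature.shape
    yy, xx = np.mgrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= cutoff ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

Low-pass filtering smooths within-region variation (one way to suppress intra-class inconsistency), while the complementary high-pass band carries the boundary information.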



X-ray Polarimetry in the Low Statistics Regime using the Bayesian Approach Reveals Polarization Angle Variations
Authors:Hong Li, Qing-Chang Zhao, Hua Feng, Lian Tao, Sergey S. Tsygankov
X-ray polarimetry of accreting compact objects has revealed fast time variations in the polarization angle (PA), suggesting that the geometry and/or optical depth of the Comptonization region is changing rapidly. This prompts investigations into how fast such variability can be. Conventionally, the data are often binned to examine the time variability such that the measurement in each bin is above the minimum detectable polarization (MDP). Here we demonstrate that this is unnecessary, and even below the MDP, one can infer the posterior distribution of PA reliably using the Bayesian approach and still be able to place useful constraints on the physics in many cases, due to small relative uncertainties on PA (e.g., $\Delta$PA $\approx$ 10–30$^\circ$ compared with a dynamical range of 180$^\circ$). With this approach, we discovered that the PA variation in one of the Imaging X-ray Polarimetry Explorer (IXPE) observations of GX 13+1 is not following a linear rotation mode as suggested previously. Instead, the PA swings between two discrete angles, suggesting that there are two emitting components, e.g., the boundary layer and the spreading layer, competing with each other. In XTE J1701-462, we confirmed previous results for a variable PA in the normal branch, and furthermore, revealed that the variation timescale could be as short as 1.5 hours. During the IXPE observation of Sco X-1, a hint is found for the PA in the highest flux level to be different from the average but consistent with previous measurement results with PolarLight and OSO-8.
Paper and project links
PDF 9 pages, 7 figures. Accepted for publication in ApJ
Summary
X-ray polarimetry of accreting compact objects has revealed fast time variations in the polarization angle (PA), suggesting rapid changes in the geometry and/or optical depth of the Comptonization region. Conventionally, data are binned so that each measurement exceeds the minimum detectable polarization (MDP). This work shows that this is unnecessary: even below the MDP, the posterior distribution of PA can be inferred reliably with a Bayesian approach, still placing useful constraints on the physics thanks to small relative uncertainties on PA. With this approach, the PA variation in one IXPE observation of GX 13+1 is found not to follow the previously suggested linear rotation mode; instead, the PA swings between two discrete angles, suggesting two competing emitting components such as the boundary layer and the spreading layer. In XTE J1701-462, the variable PA in the normal branch is confirmed, with a variation timescale as short as 1.5 hours. In the IXPE observation of Sco X-1, the PA at the highest flux level hints at a difference from the average while remaining consistent with previous PolarLight and OSO-8 measurements. Overall, the results show that Bayesian analysis of polarimetric data can reveal complex physics in accreting compact objects.
Key Takeaways
- X-ray polarimetry reveals fast variations in the polarization angle (PA) of accreting compact objects.
- The Bayesian approach allows reliable inference of the PA posterior distribution even below the minimum detectable polarization.
- In GX 13+1, the PA variation does not follow a linear rotation mode but swings between two discrete angles, suggesting two competing emitting components.
- In XTE J1701-462, a variable PA is confirmed with a variation timescale as short as 1.5 hours.
- In Sco X-1, the PA at the highest flux level is consistent with previous measurement results.
- Bayesian analysis is essential for revealing complex physics in compact objects from low-statistics polarimetric data.
Click here to view paper screenshots
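The Bayesian inference of PA can be sketched as a grid posterior over the angle ψ, assuming Gaussian noise on the normalized Stokes parameters q = p·cos 2ψ and u = p·sin 2ψ with a known polarization degree p (a simplified stand-in for the paper's full analysis, with a flat prior assumed):

```python
import numpy as np

def pa_posterior(q_obs, u_obs, sigma, p=0.05, n_grid=360):
    """Grid posterior over the polarization angle psi (degrees, [0, 180)),
    assuming Gaussian measurement noise of width sigma on normalized
    Stokes q, u and a known polarization degree p."""
    psi = np.linspace(0.0, 180.0, n_grid, endpoint=False)
    qm = p * np.cos(2 * np.radians(psi))   # model q at each grid angle
    um = p * np.sin(2 * np.radians(psi))   # model u at each grid angle
    logL = -((q_obs - qm) ** 2 + (u_obs - um) ** 2) / (2 * sigma ** 2)
    post = np.exp(logL - logL.max())       # flat prior; normalize below
    return psi, post / post.sum()
```

Even when p is below the MDP, the posterior over ψ can remain narrow relative to its 180° range, which is what makes the constraints useful.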

A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis
Authors:Asifullah Khan, Laiba Asmatullah, Anza Malik, Shahzaib Khan, Hamna Asif
Self-supervised learning is a machine learning approach that generates implicit labels by learning underlying patterns and extracting discriminative features from unlabeled data without manual labelling. Contrastive learning introduces the concept of “positive” and “negative” samples, where positive pairs (e.g., variations of the same image/object) are brought together in the embedding space, and negative pairs (e.g., views from different images/objects) are pushed farther away. This methodology has shown significant improvements in image understanding and image-text analysis without much reliance on labeled data. In this paper, we comprehensively discuss the terminology, recent developments and applications of contrastive learning with respect to text-image models. Specifically, we provide an overview of the approaches of contrastive learning in text-image models in recent years. Secondly, we categorize the approaches based on different model structures. Thirdly, we further introduce and discuss the latest advances of the techniques used in the process such as pretext tasks for both images and text, architectural structures, and key trends. Lastly, we discuss the recent state-of-the-art applications of self-supervised contrastive learning in text-image based models.
Paper and project links
PDF 38 pages, 8 figures, survey paper
Summary
Self-supervised learning generates implicit labels by learning underlying patterns and extracting discriminative features from unlabeled data, without manual labelling. Contrastive learning introduces "positive" and "negative" samples, pulling positive pairs together in the embedding space and pushing negative pairs apart, which has markedly improved image understanding and image-text analysis with little reliance on labeled data. This survey comprehensively discusses the terminology, recent developments, and applications of contrastive learning for text-image models, categorizes approaches by model structure, and reviews the latest techniques and trends.
Key Takeaways
- Self-supervised learning learns underlying patterns and extracts features from unlabeled data without manual annotation.
- Contrastive learning introduces the concepts of positive and negative samples.
- Positive pairs are brought together in the embedding space while negative pairs are pushed apart.
- Contrastive methods perform well in image understanding and text analysis, reducing reliance on labeled data.
- Contrastive learning for text-image models has developed rapidly in recent years.
- The survey categorizes contrastive learning approaches by model structure.
Click here to view paper screenshots
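The pull-together/push-apart objective is commonly formalized as the InfoNCE loss (a minimal NumPy sketch with in-batch negatives; the temperature value is an assumption):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for matched pairs: row i of z1 is the positive for
    row i of z2; all other rows in the batch act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature           # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

The loss is low when matched embeddings (e.g., an image and its caption) are the most similar rows, and high when they are misaligned.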


Convergence of Ray- and Pixel-Driven Discretization Frameworks in the Strong Operator Topology
Authors:Richard Huber
Tomography is a central tool in medical applications, allowing doctors to investigate patients’ interior features. The Radon transform (in two dimensions) is commonly used to model the measurement process in parallel-beam CT. Suitable discretization of the Radon transform and its adjoint (called the backprojection) is crucial. The most commonly used discretization approach combines what we refer to as the ray-driven Radon transform with what we refer to as the pixel-driven backprojection, as anecdotal reports describe these as showing the best approximation performance. However, there is little rigorous understanding of induced approximation errors. These methods involve three discretization parameters: the spatial-, detector-, and angular resolutions. Most commonly, balanced resolutions are used, i.e., the same (or similar) spatial- and detector resolutions are employed. We present an interpretation of ray- and pixel-driven discretizations as `convolutional methods’, a special class of finite-rank operators. This allows for a structured analysis that can explain observed behavior. In particular, we prove convergence in the strong operator topology of the ray-driven Radon transform and the pixel-driven backprojection under balanced resolutions, thus theoretically justifying this approach. In particular, with high enough resolutions one can approximate the Radon transform arbitrarily well. Numerical experiments corroborate these theoretical findings.
Paper and Project Links
PDF 38 pages, 14 figures, Preprint was substantially updated with inclusion of section 4.2.2 concerning numerical experiments for the backprojection, as well as improvements in all sections
Summary
This paper addresses the discretization of the Radon transform in medical tomography. Interpreting the ray-driven and pixel-driven discretizations as convolutional methods enables a structured analysis that proves convergence in the strong operator topology under balanced resolutions; numerical experiments support these theoretical findings.
Key Takeaways
- The Radon transform models the measurement process in tomography.
- Discretizing the Radon transform and its adjoint, the backprojection, is a crucial step.
- Ray-driven and pixel-driven methods are the most common discretizations and are shown to perform well under suitable conditions.
- Rigorous understanding of the induced approximation errors has been lacking.
- The paper interprets ray- and pixel-driven discretizations as "convolutional methods", a special class of finite-rank operators.
- Under balanced resolutions, the ray-driven Radon transform and the pixel-driven backprojection converge in the strong operator topology.
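To make the terminology concrete, here is a generic textbook-style sketch of a pixel-driven backprojection in NumPy: every pixel center is projected onto the detector at each angle, and the sinogram value is read off by linear interpolation. This illustrates only the general scheme, not the paper's exact discretization or convergence setting:

```python
import numpy as np

def pixel_driven_backprojection(sinogram, angles, grid_n):
    """Pixel-driven discrete backprojection.

    sinogram: (n_angles, n_det) array sampled on a detector over [-1, 1].
    Returns a grid_n x grid_n image on [-1, 1]^2.
    """
    n_angles, n_det = sinogram.shape
    det = np.linspace(-1.0, 1.0, n_det)
    xs = np.linspace(-1.0, 1.0, grid_n)
    X, Y = np.meshgrid(xs, xs)
    img = np.zeros((grid_n, grid_n))
    for k, theta in enumerate(angles):
        # Project each pixel center onto the detector for this angle.
        s = X * np.cos(theta) + Y * np.sin(theta)
        img += np.interp(s, det, sinogram[k], left=0.0, right=0.0)
    return img * (np.pi / n_angles)   # quadrature weight over angles

# Backprojecting a sinogram that is 1 everywhere: pixels near the
# center receive the same contribution from every angle.
angles = np.linspace(0.0, np.pi, 8, endpoint=False)
sino = np.ones((8, 64))
img = pixel_driven_backprojection(sino, angles, grid_n=16)
```

The ray-driven Radon transform is the dual construction: instead of looping over pixels, one loops over detector bins and integrates the image along each ray.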


RadVLM: A Multitask Conversational Vision-Language Model for Radiology
Authors:Nicolas Deperrois, Hidetoshi Matsuo, Samuel Ruipérez-Campillo, Moritz Vandenhirtz, Sonia Laguna, Alain Ryser, Koji Fujimoto, Mizuho Nishio, Thomas M. Sutter, Julia E. Vogt, Jonas Kluckert, Thomas Frauenfelder, Christian Blüthgen, Farhad Nooralahzadeh, Michael Krauthammer
The widespread use of chest X-rays (CXRs), coupled with a shortage of radiologists, has driven growing interest in automated CXR analysis and AI-assisted reporting. While existing vision-language models (VLMs) show promise in specific tasks such as report generation or abnormality detection, they often lack support for interactive diagnostic capabilities. In this work we present RadVLM, a compact, multitask conversational foundation model designed for CXR interpretation. To this end, we curate a large-scale instruction dataset comprising over 1 million image-instruction pairs containing both single-turn tasks – such as report generation, abnormality classification, and visual grounding – and multi-turn, multi-task conversational interactions. After fine-tuning RadVLM on this instruction dataset, we evaluate it across different tasks along with re-implemented baseline VLMs. Our results show that RadVLM achieves state-of-the-art performance in conversational capabilities and visual grounding while remaining competitive in other radiology tasks. Ablation studies further highlight the benefit of joint training across multiple tasks, particularly for scenarios with limited annotated data. Together, these findings highlight the potential of RadVLM as a clinically relevant AI assistant, providing structured CXR interpretation and conversational capabilities to support more effective and accessible diagnostic workflows.
Paper and Project Links
PDF 21 pages, 15 figures
Summary
This work presents RadVLM, a compact multitask conversational foundation model for chest X-ray (CXR) interpretation, motivated by the wide use of CXRs and the shortage of radiologists. Fine-tuned on a large-scale instruction dataset covering report generation, abnormality classification, visual grounding, and multi-turn conversations, RadVLM achieves state-of-the-art performance in conversational capabilities and visual grounding, benefits from joint multi-task training when annotated data is limited, and has the potential to serve as a clinically relevant AI assistant providing structured interpretation and more accessible diagnostic workflows.
Key Takeaways
- RadVLM is a multitask conversational foundation model for CXR interpretation.
- The model is trained on a large-scale instruction dataset containing both single-turn tasks and multi-turn conversational interactions.
- RadVLM achieves state-of-the-art performance in conversational capabilities and visual grounding.
- Joint training across multiple tasks is beneficial, particularly when annotated data is limited.
- RadVLM has the potential to serve as a clinically relevant AI assistant supporting more effective and accessible diagnostic workflows.
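The distinction between single-turn tasks and multi-turn conversations can be pictured as record shapes in an instruction dataset. The field names and contents below are assumptions for illustration, not RadVLM's actual schema:

```python
# Hypothetical record shapes for an image-instruction dataset that
# mixes single-turn tasks with multi-turn conversations.

single_turn = {
    "image": "cxr_000123.png",
    "task": "report_generation",
    "conversation": [
        {"role": "user", "content": "Generate a report for this chest X-ray."},
        {"role": "assistant", "content": "The lungs are clear. No effusion."},
    ],
}

multi_turn = {
    "image": "cxr_000456.png",
    "task": "conversation",
    "conversation": [
        {"role": "user", "content": "Any abnormality?"},
        {"role": "assistant", "content": "Possible right lower lobe opacity."},
        {"role": "user", "content": "Where exactly? Give a bounding box."},
        {"role": "assistant", "content": "[0.55, 0.60, 0.85, 0.90]"},
    ],
}

def n_turns(record):
    """Number of user/assistant exchange rounds in a record."""
    return sum(1 for m in record["conversation"] if m["role"] == "user")
```

Training on both shapes jointly is what gives such a model its interactive capability on top of the single-task skills.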




TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation
Authors:Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinrui Liu, Andrew F. Laine, Jia Guo
Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propose TABSurfer, a novel 3D patch-based CNN-Transformer hybrid deep learning model designed for superior subcortical segmentation compared to existing state-of-the-art tools. To evaluate, we first demonstrate TABSurfer’s consistent performance across various T1w MRI datasets with significantly shorter processing times compared to FreeSurfer. Then, we validate against manual segmentations, where TABSurfer outperforms FreeSurfer based on the manual ground truth. In each test, we also establish TABSurfer’s advantage over a leading deep learning benchmark, FastSurferVINN. Together, these studies highlight TABSurfer’s utility as a powerful tool for fully automated subcortical segmentation with high fidelity.
Paper and Project Links
PDF 5 pages, 3 figures, 2 tables
Summary
This paper introduces TABSurfer, a novel deep learning method for subcortical segmentation. Compared with the traditional automated tool FreeSurfer, TABSurfer delivers consistent performance across various T1w MRI datasets with markedly shorter processing times, and it outperforms FreeSurfer when validated against manual ground-truth segmentations. Overall, TABSurfer is an efficient and accurate tool for fully automated subcortical segmentation.
Key Takeaways
- TABSurfer is a novel 3D patch-based CNN-Transformer hybrid deep learning model for subcortical segmentation.
- Compared with existing state-of-the-art tools, TABSurfer is more efficient when processing large datasets.
- TABSurfer performs consistently across various T1w MRI datasets.
- TABSurfer's processing times are significantly shorter than FreeSurfer's.
- Validated against manual ground truth, TABSurfer outperforms FreeSurfer.
- TABSurfer also holds an advantage over the leading deep learning benchmark FastSurferVINN.
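The "3D patch-based" part of such a pipeline means the volume is tiled into overlapping subvolumes, each is segmented by the network, and the patch predictions are stitched back together. A minimal NumPy sketch of that tiling step, with illustrative patch sizes that are assumptions rather than TABSurfer's actual configuration:

```python
import numpy as np

def extract_patches(volume, patch=32, stride=16):
    """Extract overlapping 3D patches from a volume, as patch-based
    segmentation pipelines do before feeding a network."""
    D, H, W = volume.shape
    coords, patches = [], []
    for z in range(0, D - patch + 1, stride):
        for y in range(0, H - patch + 1, stride):
            for x in range(0, W - patch + 1, stride):
                coords.append((z, y, x))
                patches.append(volume[z:z+patch, y:y+patch, x:x+patch])
    return np.stack(patches), coords

def stitch(pred_patches, coords, shape, patch=32):
    """Average overlapping patch predictions back into a full volume."""
    out = np.zeros(shape)
    count = np.zeros(shape)
    for p, (z, y, x) in zip(pred_patches, coords):
        out[z:z+patch, y:y+patch, x:x+patch] += p
        count[z:z+patch, y:y+patch, x:x+patch] += 1
    return out / np.maximum(count, 1)

vol = np.random.default_rng(0).normal(size=(64, 64, 64))
patches, coords = extract_patches(vol)
# With an identity "model", stitching the raw patches back recovers
# the volume exactly, since every voxel is covered here.
recon = stitch(patches, coords, vol.shape)
```

In the real pipeline, each patch would pass through the CNN-Transformer hybrid instead of the identity, and the averaging in `stitch` smooths predictions in overlapping regions.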




