
医学图像


⚠️ 以下所有内容总结都来自于 大语言模型的能力,如有错误,仅供参考,谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ,还请您给我们一些鼓励!⭐️ HuggingFace免费体验

2025-10-11 更新

AI-Driven Radiology Report Generation for Traumatic Brain Injuries

Authors:Riadh Bouslimi, Houda Trabelsi, Wahiba Ben Abdssalem Karaa, Hana Hedhli

Traumatic brain injuries present significant diagnostic challenges in emergency medicine, where the timely interpretation of medical images is crucial for patient outcomes. In this paper, we propose a novel AI-based approach for automatic radiology report generation tailored to cranial trauma cases. Our model integrates an AC-BiFPN with a Transformer architecture to capture and process complex medical imaging data such as CT and MRI scans. The AC-BiFPN extracts multi-scale features, enabling the detection of intricate anomalies like intracranial hemorrhages, while the Transformer generates coherent, contextually relevant diagnostic reports by modeling long-range dependencies. We evaluate the performance of our model on the RSNA Intracranial Hemorrhage Detection dataset, where it outperforms traditional CNN-based models in both diagnostic accuracy and report generation. This solution not only supports radiologists in high-pressure environments but also provides a powerful educational tool for trainee physicians, offering real-time feedback and enhancing their learning experience. Our findings demonstrate the potential of combining advanced feature extraction with transformer-based text generation to improve clinical decision-making in the diagnosis of traumatic brain injuries.

创伤性脑损伤在急诊医学中构成了重大的诊断挑战,及时解读医学图像对患者预后至关重要。在本文中,我们提出了一种新颖的基于人工智能的方法,针对颅脑创伤病例自动生成放射学报告。我们的模型将AC-BiFPN与Transformer架构相结合,用于捕获和处理CT和MRI扫描等复杂的医学影像数据。AC-BiFPN提取多尺度特征,能够检测颅内出血等复杂异常,而Transformer通过建模长距离依赖关系生成连贯且符合上下文的诊断报告。我们在RSNA颅内出血检测数据集上评估了模型的性能,其在诊断准确性和报告生成方面均优于传统的基于CNN的模型。该解决方案不仅能支持处于高压环境中的放射科医生,还为实习医生提供了强大的教育工具,提供实时反馈并增强其学习体验。我们的研究结果表明,将先进的特征提取与基于Transformer的文本生成相结合,有望改善创伤性脑损伤诊断中的临床决策。
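
下面给出一个极简的 PyTorch 草图,示意“多尺度特征提取 + Transformer 解码器生成报告文本”的整体数据流;其中的卷积骨干、特征融合方式、维度与词表大小均为假设的简化替代(并非论文中 AC-BiFPN 的实现),仅用于帮助理解这类报告生成模型的结构。

```python
import torch
import torch.nn as nn

class ToyReportGenerator(nn.Module):
    """极简示意:多尺度特征提取 + Transformer 解码器生成报告 token。"""
    def __init__(self, vocab_size=3000, d_model=256):
        super().__init__()
        # 假设的简化骨干:用两级卷积模拟“多尺度”特征(并非论文中的 AC-BiFPN)
        self.stage1 = nn.Sequential(nn.Conv2d(1, d_model, 3, stride=4, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(d_model, d_model, 3, stride=2, padding=1), nn.ReLU())
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image, report_tokens):
        f1 = self.stage1(image)                     # 较高分辨率特征
        f2 = self.stage2(f1)                        # 较低分辨率特征
        # 简化的多尺度融合:上采样后逐元素相加(论文中为 AC-BiFPN 的双向融合)
        f1 = f1 + nn.functional.interpolate(f2, size=f1.shape[-2:], mode="nearest")
        memory = f1.flatten(2).transpose(1, 2)      # (B, H*W, d_model) 作为解码器的记忆
        tgt = self.embed(report_tokens)             # (B, T, d_model)
        T = report_tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                    # 每个位置的词表 logits

# 用法示意:随机的单通道切片与报告前缀,预测下一个 token 的分布
model = ToyReportGenerator()
logits = model(torch.randn(2, 1, 256, 256), torch.randint(0, 3000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 3000])
```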

论文及项目相关链接

PDF

Summary

本文提出了一种基于AI的自动放射学报告生成方法,用于颅脑外伤病例。该方法结合了AC-BiFPN和Transformer架构,能够处理复杂的医学成像数据,如CT和MRI扫描。AC-BiFPN提取多尺度特征,检测颅内出血等细微异常,而Transformer生成连贯、上下文相关的诊断报告,通过建模远程依赖关系。在RSNA颅内出血检测数据集上评估,该模型在诊断准确性和报告生成方面都优于传统的CNN模型。此解决方案不仅支持放射科医生在高压力环境下工作,还为医学生提供强大的教育工具,提供实时反馈,增强学习体验。

Key Takeaways

  1. 论文介绍了在紧急医学中诊断颅脑损伤的挑战性,强调医学图像解读的及时性对患者结果至关重要。
  2. 提出了一种基于AI的自动放射学报告生成方法,专门用于颅脑损伤病例。
  3. 方法结合了AC-BiFPN和Transformer架构,能有效处理复杂的医学成像数据。
  4. AC-BiFPN能够提取多尺度特征,检测颅内出血等细微异常。
  5. Transformer能够生成连贯、上下文相关的诊断报告,建模远程依赖关系。
  6. 在RSNA颅内出血检测数据集上的评估表明,该模型在诊断准确性和报告生成方面优于传统CNN模型。

Cool Papers

点此查看论文截图

Robust Source-Free Domain Adaptation for Medical Image Segmentation based on Curriculum Learning

Authors:Ziqi Zhang, Yuexiang Li, Yawen Huang, Nanjun He, Tao Xu, Liwei Lin, Yefeng Zheng, Shaoxin Li, Feiyue Huang

Recent studies have uncovered a new research line, namely source-free domain adaptation, which adapts a model to target domains without using the source data. Such a setting can address the concerns on data privacy and security issues of medical images. However, current source-free domain adaptation frameworks mainly focus on the pseudo label refinement for target data without the consideration of learning procedure. Indeed, a progressive learning process from source to target domain will benefit the knowledge transfer during model adaptation. To this end, we propose a curriculum-based framework, namely learning from curriculum (LFC), for source-free domain adaptation, which consists of easy-to-hard and source-to-target curricula. Concretely, the former curriculum enables the framework to start learning with `easy’ samples and gradually tune the optimization direction of model adaption by increasing the sample difficulty. While, the latter can stablize the adaptation process, which ensures smooth transfer of the model from the source domain to the target. We evaluate the proposed source-free domain adaptation approach on the public cross-domain datasets for fundus segmentation and polyp segmentation. The extensive experimental results show that our framework surpasses the existing approaches and achieves a new state-of-the-art.

最近的研究揭示了一条新的研究方向,即无源域自适应(Source-Free Domain Adaptation),它能够在不使用源数据的情况下使模型适应目标域。这种设置可以缓解医学图像的数据隐私与安全问题。然而,当前的无源域自适应框架主要关注目标数据的伪标签优化,而没有考虑学习过程本身。实际上,从源域到目标域的渐进学习过程将有助于模型自适应中的知识迁移。为此,我们提出了一种基于课程的框架,即从课程中学习(Learning from Curriculum, LFC),用于无源域自适应,该框架包括由易到难和由源到目标两类课程。具体来说,前者使框架能够从“容易”的样本开始学习,并通过逐步增加样本难度来调整模型自适应的优化方向;后者则可以稳定自适应过程,确保模型从源域到目标域的平稳迁移。我们在公开的眼底分割和息肉分割跨域数据集上评估了所提出的无源域自适应方法。大量实验结果表明,我们的框架超越了现有方法,达到了新的最先进水平。
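
下面用一个简短的 NumPy 草图示意“由易到难”课程的一种常见做法:按当前模型预测的熵给目标域样本排序,并逐轮扩大参与自适应训练的样本集合。其中以熵作为难度度量、按比例逐轮扩充等细节均为假设,并非 LFC 的原始实现。

```python
import numpy as np

def entropy(probs, eps=1e-8):
    """按预测熵衡量样本难度:熵越低越“容易”。"""
    return -(probs * np.log(probs + eps)).sum(axis=1)

def easy_to_hard_schedule(probs, num_rounds=4):
    """由易到难课程:每一轮返回参与自适应训练的样本下标。

    probs: (N, C) 当前模型对目标域样本的类别概率(此处用软最大输出示意)。
    """
    order = np.argsort(entropy(probs))          # 从易到难排序
    n = len(order)
    for r in range(1, num_rounds + 1):
        k = int(np.ceil(n * r / num_rounds))    # 逐轮扩大样本集合
        yield order[:k]

# 用法示意:随机生成 10 个样本的 3 类概率,打印每轮使用的样本数
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
for r, idx in enumerate(easy_to_hard_schedule(probs), 1):
    print(f"round {r}: {len(idx)} samples")
```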

论文及项目相关链接

PDF

Summary
近期研究提出了一种新的研究思路——无源域自适应,该技术无需使用源数据即可使模型适应目标域,从而缓解医疗图像的数据隐私和安全问题。然而,当前的无源域自适应框架主要关注目标数据的伪标签优化,而忽略了学习过程本身;从源域到目标域的渐进学习有助于模型自适应过程中的知识迁移。为此,我们提出了一种基于课程的学习框架——从课程中学习(LFC)进行无源域自适应,其中包括由易到难和由源到目标的课程:前者从“简单”样本开始学习,通过增加样本难度逐渐调整优化方向;后者确保模型从源域到目标域的平稳过渡。我们在公开的跨域数据集上评估了所提出的方法,用于眼底分割和息肉分割。大量实验结果表明,我们的框架超越了现有方法,达到了新的最先进水平。

Key Takeaways

  1. 研究提出了一个新的概念——“无源域自适应”,旨在解决医疗图像等数据隐私和安全问题。
  2. 当前的无源域自适应框架主要关注目标数据的伪标签优化,但对学习过程考虑不足。
  3. 新的基于课程的学习框架——从课程中学习(LFC),被提出来实现从源域到目标域的渐进学习。
  4. 该框架包括从易到难的课程,帮助模型从简单样本开始学习,逐渐增加样本难度,调整优化方向。
  5. 还包括从源到目标的课程,确保模型平稳地从源域过渡到目标域。
  6. 在公共跨域数据集上进行评估,该方法的眼底分割和息肉分割性能超越现有方法,达到新的先进水平。

Cool Papers

点此查看论文截图

Random Window Augmentations for Deep Learning Robustness in CT and Liver Tumor Segmentation

Authors:Eirik A. Østmo, Kristoffer K. Wickstrøm, Keyur Radiya, Michael C. Kampffmeyer, Karl Øyvind Mikalsen, Robert Jenssen

Contrast-enhanced Computed Tomography (CT) is important for diagnosis and treatment planning for various medical conditions. Deep learning (DL) based segmentation models may enable automated medical image analysis for detecting and delineating tumors in CT images, thereby reducing clinicians’ workload. Achieving generalization capabilities in limited data domains, such as radiology, requires modern DL models to be trained with image augmentation. However, naively applying augmentation methods developed for natural images to CT scans often disregards the nature of the CT modality, where the intensities measure Hounsfield Units (HU) and have important physical meaning. This paper challenges the use of such intensity augmentations for CT imaging and shows that they may lead to artifacts and poor generalization. To mitigate this, we propose a CT-specific augmentation technique, called Random windowing, that exploits the available HU distribution of intensities in CT images. Random windowing encourages robustness to contrast-enhancement and significantly increases model performance on challenging images with poor contrast or timing. We perform ablations and analysis of our method on multiple datasets, and compare to, and outperform, state-of-the-art alternatives, while focusing on the challenge of liver tumor segmentation.

对比增强计算机断层扫描(CT)对于多种疾病的诊断和治疗计划非常重要。基于深度学习的分割模型有望实现医学图像的自动分析,检测并勾画CT图像中的肿瘤,从而减轻临床医生的工作量。要在放射学这类数据有限的领域获得泛化能力,需要借助图像增强来训练现代深度学习模型。然而,将针对自然图像开发的增强方法直接套用到CT扫描上,往往忽略了CT模态的特性:其强度以亨氏单位(Hounsfield Unit, HU)度量,具有重要的物理意义。本文质疑在CT成像中使用此类强度增强的做法,并表明它们可能导致伪影和较差的泛化性能。为缓解这一问题,我们提出了一种CT专用的增强技术,称为随机开窗(Random windowing),它利用CT图像中固有的HU强度分布。随机开窗提高了模型对对比增强的稳健性,并在对比度不足或对比剂时相不佳的困难图像上显著提升模型性能。我们在多个数据集上对方法进行了消融和分析,并与最先进的替代方法进行比较且优于它们,研究重点为肝脏肿瘤分割这一挑战。
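
下面是随机开窗这一思路的一个极简 NumPy 草图:在一定范围内随机采样窗位与窗宽,对 HU 值截断并线性归一化;其中窗位、窗宽的取值范围均为演示用的假设值,并非论文给出的设定。

```python
import numpy as np

def random_window(ct_hu, level_range=(30, 150), width_range=(150, 500), rng=None):
    """随机窗宽窗位增强的极简示意(参数范围为假设值,并非论文设定)。

    ct_hu: 以 Hounsfield Unit 表示的 CT 体素数组。
    返回归一化到 [0, 1] 的图像。
    """
    rng = rng or np.random.default_rng()
    level = rng.uniform(*level_range)            # 随机窗位
    width = rng.uniform(*width_range)            # 随机窗宽
    lo, hi = level - width / 2, level + width / 2
    img = np.clip(ct_hu, lo, hi)                 # 截断到窗内
    return (img - lo) / (hi - lo)                # 线性归一化

# 用法示意:对一张模拟的 CT 切片做两次不同的随机开窗
ct = np.random.default_rng(1).uniform(-1000, 400, size=(512, 512))
print(random_window(ct).min(), random_window(ct).max())
```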

论文及项目相关链接

PDF 10 pages, 9 figures. This work has been submitted to the IEEE for possible publication

Summary

本文探讨了对比增强计算机断层扫描(CT)在医学诊断和治疗计划中的重要性。深度学习(DL)在CT图像的肿瘤检测和分割方面具有潜力,可减轻医生的工作量。为了在有限的医学图像数据领域实现模型的泛化能力,训练深度模型需要使用图像增强技术。然而,针对自然图像开发的增强方法直接应用于CT扫描时,忽略了CT的特性:其强度以亨氏单位(HU)度量,具有物理意义。本文质疑了这种强度增强方法在CT影像中的应用,并指出其可能导致伪影和泛化性能不佳。为此,提出了一种针对CT的特定增强技术——随机开窗,该技术利用CT图像中可用的HU强度分布,可以提高模型对对比增强的稳健性,并在对比度不足或对比剂时相不佳的图像上显著提高模型性能。我们在多个数据集上对方法进行了消融与分析,性能超越了其他最先进的方法,研究重点为肝脏肿瘤分割。

Key Takeaways

  1. 对比增强计算机断层扫描(CT)在医学诊断和治疗计划中起重要作用。
  2. 深度学习在自动化医学图像分析中有潜力,特别是在CT图像的肿瘤检测和分割方面。
  3. 在有限的医学图像数据领域实现模型泛化需要训练深度模型并使用图像增强技术。
  4. 应用于CT扫描的增强方法需要考虑CT的特性,尤其是其强度以亨氏单位(HU)度量。
  5. 通用的强度增强方法可能导致CT影像中的伪影和模型泛化性能不佳。
  6. 提出了针对CT的特定增强技术——随机开窗,该技术利用CT图像中的HU分布。

Cool Papers

点此查看论文截图

RASALoRE: Region Aware Spatial Attention with Location-based Random Embeddings for Weakly Supervised Anomaly Detection in Brain MRI Scans

Authors:Bheeshm Sharma, Karthikeyan Jaganathan, Balamurugan Palaniappan

Weakly Supervised Anomaly detection (WSAD) in brain MRI scans is an important challenge useful to obtain quick and accurate detection of brain anomalies when precise pixel-level anomaly annotations are unavailable and only weak labels (e.g., slice-level) are available. In this work, we propose RASALoRE: Region Aware Spatial Attention with Location-based Random Embeddings, a novel two-stage WSAD framework. In the first stage, we introduce a Discriminative Dual Prompt Tuning (DDPT) mechanism that generates high-quality pseudo weak masks based on slice-level labels, serving as coarse localization cues. In the second stage, we propose a segmentation network with a region-aware spatial attention mechanism that relies on fixed location-based random embeddings. This design enables the model to effectively focus on anomalous regions. Our approach achieves state-of-the-art anomaly detection performance, significantly outperforming existing WSAD methods while utilizing less than 8 million parameters. Extensive evaluations on the BraTS20, BraTS21, BraTS23, and MSD datasets demonstrate a substantial performance improvement coupled with a significant reduction in computational complexity. Code is available at: https://github.com/BheeshmSharma/RASALoRE-BMVC-2025/.

在脑部MRI扫描中进行弱监督异常检测(WSAD)是一项重要挑战:当缺乏精确的像素级异常标注、仅有弱标签(如切片级标签)可用时,它有助于快速而准确地检测脑部异常。在这项工作中,我们提出了RASALoRE(Region Aware Spatial Attention with Location-based Random Embeddings,区域感知空间注意力与基于位置的随机嵌入),一种新颖的两阶段WSAD框架。在第一阶段,我们引入判别式双重提示调优(DDPT)机制,基于切片级标签生成高质量的伪弱掩模,作为粗略定位线索。在第二阶段,我们提出一种带有区域感知空间注意力机制的分割网络,该机制依赖固定的基于位置的随机嵌入。这种设计使模型能够有效地关注异常区域。我们的方法达到了最先进的异常检测性能,显著优于现有的WSAD方法,同时使用的参数少于800万个。在BraTS20、BraTS21、BraTS23和MSD数据集上的广泛评估表明,其性能显著提升,计算复杂度也显著降低。代码可在以下网址找到:https://github.com/BheeshmSharma/RASALoRE-BMVC-2025/
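
下面给出一个示意性的 PyTorch 模块,展示“固定的基于位置的随机嵌入 + 空间注意力”这一思路:将按固定随机种子生成、不参与训练的位置嵌入与特征拼接,产生空间注意力图以加权特征。模块结构与维度均为假设的简化版,并非 RASALoRE 的原始实现。

```python
import torch
import torch.nn as nn

class LocationRandomSpatialAttention(nn.Module):
    """基于固定位置随机嵌入的空间注意力示意(结构为假设的简化版)。"""
    def __init__(self, channels=64, height=32, width=32, embed_dim=16, seed=0):
        super().__init__()
        g = torch.Generator().manual_seed(seed)
        # 每个空间位置一个固定(不参与训练)的随机嵌入
        loc = torch.randn(embed_dim, height, width, generator=g)
        self.register_buffer("loc_embed", loc)
        self.attn = nn.Sequential(
            nn.Conv2d(channels + embed_dim, channels, kernel_size=1), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, feat):
        b = feat.size(0)
        loc = self.loc_embed.unsqueeze(0).expand(b, -1, -1, -1)
        a = self.attn(torch.cat([feat, loc], dim=1))   # (B, 1, H, W) 空间注意力图
        return feat * a                                # 加权后聚焦可能的异常区域

feat = torch.randn(2, 64, 32, 32)
out = LocationRandomSpatialAttention()(feat)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```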

论文及项目相关链接

PDF Accepted in BMVC-2025

Summary

本摘要针对大脑MRI扫描的弱监督异常检测(WSAD)问题,提出了一种名为RASALoRE的新型两阶段WSAD框架。第一阶段通过判别式双提示调谐(DDPT)机制生成高质量伪弱掩膜,作为粗略定位线索。第二阶段采用具有基于固定位置随机嵌入的区域感知空间注意力机制的分割网络,实现对异常区域的有效关注。该方法实现了先进的异常检测性能,显著优于现有WSAD方法,同时参数使用量少于八百万。在BraTS20、BraTS21、BraTS23和MSD数据集上的广泛评估表明,该方法在性能上取得了实质性改进,同时大大降低了计算复杂度。

Key Takeaways

以下是该文本的主要见解:

  • 针对大脑MRI扫描的弱监督异常检测(WSAD)问题提出了一种新型框架RASALoRE。
  • 该框架包含两个阶段:第一阶段生成伪弱掩膜作为粗略定位线索,第二阶段采用具有区域感知空间注意力机制的分割网络。
  • 利用判别式双提示调谐(DDPT)机制生成高质量伪弱掩膜。
  • 基于固定位置随机嵌入的区域感知空间注意力机制有助于模型有效关注异常区域。
  • 该方法实现了先进的异常检测性能,显著优于现有WSAD方法。
  • 该方法参数使用量少于八百万,同时在多个数据集上实现了性能提升和计算复杂度的降低。

Cool Papers

点此查看论文截图

MRI-derived quantification of hepatic vessel-to-volume ratios in chronic liver disease using a deep learning approach

Authors:Alexander Herold, Daniel Sobotka, Lucian Beer, Nina Bastati, Sarah Poetter-Lang, Michael Weber, Thomas Reiberger, Mattias Mandorfer, Georg Semmler, Benedikt Simbrunner, Barbara D. Wichtmann, Sami A. Ba-Ssalamah, Michael Trauner, Ahmed Ba-Ssalamah, Georg Langs

Background: We aimed to quantify hepatic vessel volumes across chronic liver disease stages and healthy controls using deep learning-based magnetic resonance imaging (MRI) analysis, and assess correlations with biomarkers for liver (dys)function and fibrosis/portal hypertension. Methods: We assessed retrospectively healthy controls, non-advanced and advanced chronic liver disease (ACLD) patients using a 3D U-Net model for hepatic vessel segmentation on portal venous phase gadoxetic acid-enhanced 3-T MRI. Total (TVVR), hepatic (HVVR), and intrahepatic portal vein-to-volume ratios (PVVR) were compared between groups and correlated with: albumin-bilirubin (ALBI) and model for end-stage liver disease-sodium (MELD-Na) score, and fibrosis/portal hypertension (Fibrosis-4 [FIB-4] score, liver stiffness measurement [LSM], hepatic venous pressure gradient [HVPG], platelet count [PLT], and spleen volume). Results: We included 197 subjects, aged 54.9 $\pm$ 13.8 years (mean $\pm$ standard deviation), 111 males (56.3%): 35 healthy controls, 44 non-ACLD, and 118 ACLD patients. TVVR and HVVR were highest in controls (3.9; 2.1), intermediate in non-ACLD (2.8; 1.7), and lowest in ACLD patients (2.3; 1.0) ($p \leq 0.001$). PVVR was reduced in both non-ACLD and ACLD patients (both 1.2) compared to controls (1.7) ($p \leq 0.001$), but showed no difference between CLD groups ($p = 0.999$). HVVR significantly correlated indirectly with FIB-4, ALBI, MELD-Na, LSM, and spleen volume ($\rho$ ranging from -0.27 to -0.40), and directly with PLT ($\rho = 0.36$). TVVR and PVVR showed similar but weaker correlations. Conclusions: Deep learning-based hepatic vessel volumetry demonstrated differences between healthy liver and chronic liver disease stages and shows correlations with established markers of disease severity.

背景:本研究旨在利用基于深度学习的磁共振成像(MRI)分析来量化不同慢性肝病阶段和正常对照者的肝脏血管体积,并评估其与肝脏(异常)功能及纤维化/门静脉高压的生物标志物之间的相关性。

方法:我们回顾性地评估了健康对照、非进展期慢性肝病患者和进展期慢性肝病(ACLD)患者,在门静脉期钆塞酸增强3T MRI上采用三维U-Net模型进行肝脏血管分割。比较各组之间的总血管体积比(TVVR)、肝静脉体积比(HVVR)和肝内门静脉体积比(PVVR),并与以下指标进行相关性分析:白蛋白-胆红素(ALBI)评分和终末期肝病模型-钠(MELD-Na)评分,以及纤维化/门静脉高压指标(Fibrosis-4 [FIB-4]评分、肝脏硬度测量值[LSM]、肝静脉压力梯度[HVPG]、血小板计数[PLT]和脾脏体积)。

结果:共纳入197名受试者,年龄54.9 ± 13.8岁(平均值±标准差),其中男性111名(56.3%):健康对照35名、非ACLD患者44名、ACLD患者118名。TVVR和HVVR在对照组中最高(分别为3.9和2.1),在非ACLD患者中居中(分别为2.8和1.7),在ACLD患者中最低(分别为2.3和1.0)(p ≤ 0.001)。与对照组(1.7)相比,非ACLD和ACLD患者的PVVR均降低(两组均为1.2)(p ≤ 0.001),但两个慢性肝病组之间无显著差异(p = 0.999)。HVVR与FIB-4、ALBI、MELD-Na、LSM和脾脏体积呈显著负相关(ρ为-0.27至-0.40),与PLT呈显著正相关(ρ = 0.36)。TVVR和PVVR表现出相似但较弱的相关性。

结论:基于深度学习的肝脏血管体积测量显示出健康肝脏与慢性肝病各阶段之间的差异,并与既有的疾病严重程度指标相关。
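
下面用一个简短的 NumPy 草图示意“血管体积与肝脏体积之比”这类指标的计算方式:由分割得到的二值掩模统计体素数并乘以体素体积,再求比值。掩模来源、体素间距以及是否以百分比表示等细节均为假设,仅用于说明指标的量纲与计算流程。

```python
import numpy as np

def vessel_to_volume_ratio(vessel_mask, liver_mask, spacing_mm=(1.0, 1.0, 1.0)):
    """由二值分割计算“血管体积 / 肝脏体积”比值的极简示意。

    vessel_mask, liver_mask: 3D 布尔数组(例如来自 3D U-Net 的分割结果)。
    spacing_mm: 体素间距,用于换算为真实体积;此处取值仅作演示。
    """
    voxel_mm3 = float(np.prod(spacing_mm))
    vessel_vol = vessel_mask.sum() * voxel_mm3
    liver_vol = liver_mask.sum() * voxel_mm3
    return 100.0 * vessel_vol / liver_vol        # 以百分比表示仅为假设的归一化方式

# 用法示意:随机伪造的肝脏与肝静脉掩模
rng = np.random.default_rng(0)
liver = rng.random((64, 64, 64)) < 0.30
hepatic_vein = liver & (rng.random((64, 64, 64)) < 0.02)
print(round(vessel_to_volume_ratio(hepatic_vein, liver, (1.5, 0.8, 0.8)), 2))
```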

论文及项目相关链接

PDF ^Alexander Herold and Daniel Sobotka share first-authorship

Summary

基于深度学习技术的肝脏血管体积测量显示,健康人群与不同阶段的慢性肝病患者之间存在差异,并与疾病严重程度的标志存在相关性。

Key Takeaways

  1. 研究旨在利用深度学习技术进行肝脏血管体积的量化分析,评估慢性肝病阶段与健康对照者的差异。
  2. 采用3D U-Net模型对肝脏血管进行分割,利用磁共振成像(MRI)数据进行分析。
  3. 总血管体积比(TVVR)、肝静脉体积比(HVVR)和肝内门静脉体积比(PVVR)在不同组间存在差异。
  4. HVVR与肝纤维化、肝功能指标存在显著负相关,与血小板计数存在正相关。
  5. TVVR和PVVR与疾病严重程度指标的相关性相对较弱。
  6. 深度学习技术在肝脏血管体积测量方面的应用有助于理解慢性肝病的病理生理机制。

Cool Papers

点此查看论文截图

Demystifying Deep Learning-based Brain Tumor Segmentation with 3D UNets and Explainable AI (XAI): A Comparative Analysis

Authors:Ming Jie Ong, Sze Yinn Ung, Sim Kuan Goh, Jimmy Y. Zhong

The current study investigated the use of Explainable Artificial Intelligence (XAI) to improve the accuracy of brain tumor segmentation in MRI images, with the goal of assisting physicians in clinical decision-making. The study focused on applying UNet models for brain tumor segmentation and using the XAI techniques of Gradient-weighted Class Activation Mapping (Grad-CAM) and attention-based visualization to enhance the understanding of these models. Three deep learning models - UNet, Residual UNet (ResUNet), and Attention UNet (AttUNet) - were evaluated to identify the best-performing model. XAI was employed with the aims of clarifying model decisions and increasing physicians’ trust in these models. We compared the performance of two UNet variants (ResUNet and AttUNet) with the conventional UNet in segmenting brain tumors from the BraTS2020 public dataset and analyzed model predictions with Grad-CAM and attention-based visualization. Using the latest computer hardware, we trained and validated each model using the Adam optimizer and assessed their performance with respect to: (i) training, validation, and inference times, (ii) segmentation similarity coefficients and loss functions, and (iii) classification performance. Notably, during the final testing phase, ResUNet outperformed the other models with respect to Dice and Jaccard similarity scores, as well as accuracy, recall, and F1 scores. Grad-CAM provided visuospatial insights into the tumor subregions each UNet model focused on while attention-based visualization provided valuable insights into the working mechanisms of AttUNet’s attention modules. These results demonstrated ResUNet as the best-performing model and we conclude by recommending its use for automated brain tumor segmentation in future clinical assessments. Our source code and checkpoint are available at https://github.com/ethanong98/MultiModel-XAI-Brats2020

当前研究旨在利用可解释人工智能(XAI)提高MRI图像中脑肿瘤分割的准确性,从而帮助医生进行临床决策。该研究重点是将UNet模型应用于脑肿瘤分割,并使用梯度加权类激活映射(Grad-CAM)和基于注意力的可视化等XAI技术,以提高对这些模型的理解。我们评估了三种深度学习模型——UNet、残差UNet(ResUNet)和注意力UNet(AttUNet)——以确定表现最佳的模型。使用XAI的目的是阐明模型决策,增加医生对这些模型的信任。我们比较了两种UNet变体(ResUNet和AttUNet)与传统UNet在BraTS2020公共数据集上分割脑肿瘤的性能,并使用Grad-CAM和基于注意力的可视化分析模型预测。我们使用最新的计算机硬件,使用Adam优化器训练和验证每个模型,并就其以下方面评估其性能:(i)训练、验证和推理时间;(ii)分割相似度系数和损失函数;(iii)分类性能。值得注意的是,在最终测试阶段,ResUNet在Dice和Jaccard相似度得分以及准确性、召回率和F1得分方面表现出比其他模型更优秀的性能。Grad-CAM提供了关于每个UNet模型关注的肿瘤亚区的视觉空间见解,而基于注意力的可视化则提供了有关AttUNet注意力模块工作机制的宝贵见解。这些结果证明了ResUNet是表现最佳的模型,因此我们建议未来在临床评估中使用ResUNet进行自动脑肿瘤分割。我们的源代码和检查点可在https://github.com/ethanong98/MultiModel-XAI-Brats2020找到。
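
下面给出一个 Grad-CAM 的极简 PyTorch 草图:对选定卷积层的特征图按梯度的空间平均进行加权求和,得到类别相关的空间热力图。示例中的小型 CNN 与所选层仅为演示,并非论文中的 UNet 系列模型。

```python
import torch
import torch.nn as nn

def grad_cam(model, target_layer, image, class_idx):
    """Grad-CAM 的极简示意:对 target_layer 的特征图按梯度加权求和。"""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))
    try:
        logits = model(image)
        logits[:, class_idx].sum().backward()
        w = grads["v"].mean(dim=(2, 3), keepdim=True)        # 通道权重 = 梯度的空间平均
        cam = torch.relu((w * acts["v"]).sum(dim=1))          # (B, H, W)
        cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
        return cam
    finally:
        h1.remove(); h2.remove()

# 用法示意:对一个玩具 CNN 的最后一个卷积层可视化类别 1 的关注区域
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)
cam = grad_cam(model, model[2], torch.randn(1, 1, 64, 64), class_idx=1)
print(cam.shape)  # torch.Size([1, 64, 64])
```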

论文及项目相关链接

PDF

Summary
本研究运用可解释人工智能(XAI)技术来提升MRI图像中脑肿瘤分割的准确性,旨在帮助医生进行临床决策。研究聚焦于使用UNet模型进行脑肿瘤分割,并结合Grad-CAM和注意力可视化等XAI技术,以增强对模型的理解。研究对比了UNet、ResUNet和AttUNet三种深度学习模型的性能;引入XAI旨在澄清模型决策,增加医生对模型的信任。最终测试阶段,ResUNet在Dice和Jaccard相似度得分、准确率、召回率和F1分数上表现最佳。

Key Takeaways
1. 本研究使用Explainable Artificial Intelligence (XAI)以提高MRI图像中脑肿瘤分割的准确性。
2. 研究聚焦于应用UNet模型进行脑肿瘤分割,并使用Grad-CAM和注意力可视化技术以增强对模型的理解。
3. 对比了三种深度学习模型(UNet、ResUNet和AttUNet)的性能。
4. XAI的引入旨在帮助医生更好地理解模型决策,并增加他们对模型的信任。
5. 最终测试表明,ResUNet在各项评估指标上表现最佳。
6. Grad-CAM提供了关于肿瘤子区域的信息,这些区域是每个UNet模型所关注的重点。

Cool Papers

点此查看论文截图

TCIP: Threshold-Controlled Iterative Pyramid Network for Deformable Medical Image Registration

Authors:Heming Wu, Di Wang, Tai Ma, Peng Zhao, Yubin Xiao, Zhongke Wu, Xing-Ce Wang, Chuang Li, Xuan Wu, You Zhou

Although pyramid networks have demonstrated superior performance in deformable medical image registration, their decoder architectures are inherently prone to propagating and accumulating anatomical structure misalignments. Moreover, most existing models do not adaptively determine the number of iterations for optimization under varying deformation requirements across images, resulting in either premature termination or excessive iterations that degrades registration accuracy. To effectively mitigate the accumulation of anatomical misalignments, we propose the Feature-Enhanced Residual Module (FERM) as the core component of each decoding layer in the pyramid network. FERM comprises three sequential blocks that extract anatomical semantic features, learn to suppress irrelevant features, and estimate the final deformation field, respectively. To adaptively determine the number of iterations for varying images, we propose the dual-stage Threshold-Controlled Iterative (TCI) strategy. In the first stage, TCI assesses registration stability and with asserted stability, it continues with the second stage to evaluate convergence. We coin the model that integrates FERM and TCI as Threshold-Controlled Iterative Pyramid (TCIP). Extensive experiments on three public brain MRI datasets and one abdomen CT dataset demonstrate that TCIP outperforms the state-of-the-art (SOTA) registration networks in terms of accuracy, while maintaining comparable inference speed and a compact model parameter size. Finally, we assess the generalizability of FERM and TCI by integrating them with existing registration networks and further conduct ablation studies to validate the effectiveness of these two proposed methods.

尽管金字塔网络在可变形医学图像配准中表现出了卓越的性能,但其解码器架构本身容易传播并累积解剖结构的错位。此外,大多数现有模型不能根据不同图像各异的形变需求自适应地确定优化迭代次数,导致过早终止或过度迭代,从而降低配准精度。为了有效缓解解剖结构错位的累积,我们提出了特征增强残差模块(FERM),作为金字塔网络中每个解码层的核心组件。FERM包含三个顺序块,分别用于提取解剖语义特征、学习抑制无关特征以及估计最终形变场。为了自适应地确定不同图像的迭代次数,我们提出了双阶段阈值控制迭代(TCI)策略:在第一阶段,TCI评估配准稳定性;在确认稳定后,进入第二阶段评估收敛性。我们将集成了FERM和TCI的模型称为阈值控制迭代金字塔(TCIP)。在三个公开脑部MRI数据集和一个腹部CT数据集上的广泛实验表明,TCIP在精度上超越了最先进的配准网络,同时保持了相当的推理速度和紧凑的模型参数规模。最后,我们通过将FERM和TCI集成到现有配准网络中来评估它们的通用性,并进一步进行消融研究以验证这两种方法的有效性。
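
下面用一个与具体配准模型无关的 Python 草图,示意“双阶段阈值控制迭代”这一停止策略:以相邻两次形变场的平均变化量为判据,先判定稳定、再判定收敛后提前停止。其中的判据与阈值取值均为假设,并非 TCI 的原始定义。

```python
import numpy as np

def threshold_controlled_iterations(update_fn, field, max_iters=20,
                                    stability_tau=1e-2, converge_tau=1e-3):
    """双阶段阈值控制迭代的示意:先判“稳定”,再判“收敛”(阈值为假设值)。

    update_fn(field) -> new_field:一次金字塔式配准迭代(此处用任意可调用对象代替)。
    """
    stable = False
    for it in range(1, max_iters + 1):
        new_field = update_fn(field)
        delta = np.abs(new_field - field).mean()     # 相邻两次形变场的平均变化量
        field = new_field
        if not stable:
            if delta < stability_tau:                # 第一阶段:变化足够小,认为已稳定
                stable = True
        elif delta < converge_tau:                   # 第二阶段:进一步判定收敛,提前停止
            return field, it
    return field, max_iters

# 用法示意:用一个逐轮衰减的“伪更新”演示自适应停止
field0 = np.zeros((2, 8, 8))
state = {"k": 0}
def fake_update(f):
    state["k"] += 1
    return f + 0.5 ** state["k"]                      # 变化量逐轮减半
print(threshold_controlled_iterations(fake_update, field0)[1])
```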

论文及项目相关链接

PDF

Summary

本文提出一种用于可变形医学图像配准的改进方法,包括用于金字塔网络解码层的特征增强残差模块(FERM)和双阶段阈值控制迭代(TCI)策略。FERM用于减轻解剖结构错位累积的问题,而TCI策略则能自适应确定不同图像优化所需的迭代次数。新方法在多个公开数据集上的实验结果表明,其在保持推理速度和模型参数规模的同时,提高了配准网络的准确性。

Key Takeaways

  1. 金字塔网络在可变形医学图像配准中表现出卓越性能,但其解码架构容易传播和累积解剖结构错位。
  2. 现有模型大多不能自适应确定不同图像优化所需的迭代次数,导致过早终止或过度迭代,影响配准准确性。
  3. 提出特征增强残差模块(FERM),作为金字塔网络每个解码层的核心组件,用于减轻解剖结构错位累积的问题。
  4. FERM包含三个顺序块,分别提取解剖语义特征、学习抑制无关特征和估计最终形变场。
  5. 提出双阶段阈值控制迭代(TCI)策略,自适应确定不同图像的迭代次数。
  6. TCI策略包括两个阶段:第一阶段评估配准稳定性,第二阶段评估收敛性。
  7. 实验结果表明,结合FERM和TCI的阈值控制迭代金字塔(TCIP)模型在准确性方面优于现有最先进的配准网络,同时保持相当的推理速度和模型参数规模。

Cool Papers

点此查看论文截图

A Denoising Framework for Real-World Ultra-Low Dose Lung CT Images Based on an Image Purification Strategy

Authors:Guoliang Gong, Man Yu

Ultra-low dose CT (uLDCT) significantly reduces radiation exposure but introduces severe noise and artifacts. It also leads to substantial spatial misalignment between uLDCT and normal dose CT (NDCT) image pairs. This poses challenges for directly applying existing denoising networks trained on synthetic noise or aligned data. To address this core challenge in uLDCT denoising, this paper proposes an innovative denoising framework based on an Image Purification (IP) strategy. First, we construct a real clinical uLDCT lung dataset. Then, we propose an Image Purification strategy that generates structurally aligned uLDCT-NDCT image pairs, providing a high-quality data foundation for network training. Building upon this, we propose a Frequency-domain Flow Matching (FFM) model, which works synergistically with the IP strategy to excellently preserve the anatomical structure integrity of denoised images. Experiments on the real clinical dataset demonstrate that our IP strategy significantly enhances the performance of multiple mainstream denoising models on the uLDCT task. Notably, our proposed FFM model combined with the IP strategy achieves state-of-the-art (SOTA) results in anatomical structure preservation. This study provides an effective solution to the data mismatch problem in real-world uLDCT denoising. Code and dataset are available at https://github.com/MonkeyDadLufy/flow-matching.

超低剂量CT(uLDCT)显著降低了辐射暴露,但引入了严重的噪声和伪影。此外,它还导致uLDCT与常规剂量CT(NDCT)图像对之间出现较大的空间错位。这给直接应用现有合成噪声或对齐数据训练的降噪网络带来了挑战。针对uLDCT降噪中的这一核心挑战,本文提出了基于图像净化(IP)策略的创新降噪框架。首先,我们构建了真实的临床uLDCT肺部数据集。然后,我们提出了一种图像净化策略,生成结构对齐的uLDCT-NDCT图像对,为网络训练提供了高质量的数据基础。在此基础上,我们提出了频域流匹配(FFM)模型,该模型与IP策略协同工作,出色地保留了去噪图像的解剖结构完整性。在真实临床数据集上的实验表明,我们的IP策略显著提高了多个主流降噪模型在uLDCT任务上的性能。值得注意的是,我们提出的FFM模型与IP策略相结合,在解剖结构保留方面达到了最新水平(SOTA)。本研究为解决真实世界uLDCT降噪中的数据不匹配问题提供了有效解决方案。代码和数据集可通过https://github.com/MonkeyDadLufy/flow-matching获取。

论文及项目相关链接

PDF

Summary

本文提出一种基于图像净化策略的降噪框架,以解决超低剂量CT(uLDCT)图像中的噪声和伪影问题。通过构建真实临床uLDCT肺部数据集,并采用图像净化策略生成结构对齐的uLDCT-NDCT图像对,为网络训练提供高质量数据基础。在此基础上,提出频域流匹配模型,与图像净化策略协同工作,出色地保留了去噪图像的解剖结构完整性。实验证明,该策略显著提高主流降噪模型在uLDCT任务上的性能,尤其是频域流匹配模型结合图像净化策略在解剖结构保留方面达到最佳效果。

Key Takeaways

  1. uLDCT显著减少辐射暴露,但引入严重噪声和伪影。
  2. uLDCT与NDCT图像对之间存在空间不对准,给现有降噪网络的直接应用带来挑战。
  3. 提出基于图像净化策略的降噪框架,构建真实临床uLDCT肺部数据集。
  4. 引入频域流匹配模型,与图像净化策略协同,出色地保留去噪图像的解剖结构完整性。
  5. 实验证明,图像净化策略显著提高主流降噪模型在uLDCT任务上的性能。
  6. 结合图像净化策略和频域流匹配模型达到最佳效果,在解剖结构保留方面为行业树立了新标准。

Cool Papers

点此查看论文截图

How We Won BraTS-SSA 2025: Brain Tumor Segmentation in the Sub-Saharan African Population Using Segmentation-Aware Data Augmentation and Model Ensembling

Authors:Claudia Takyi Ankomah, Livingstone Eli Ayivor, Ireneaus Nyame, Leslie Wambo, Patrick Yeboah Bonsu, Aondona Moses Iorumbur, Raymond Confidence, Toufiq Musah

Brain tumors, particularly gliomas, pose significant chall-enges due to their complex growth patterns, infiltrative nature, and the variability in brain structure across individuals, which makes accurate diagnosis and monitoring difficult. Deep learning models have been developed to accurately delineate these tumors. However, most of these models were trained on relatively homogenous high-resource datasets, limiting their robustness when deployed in underserved regions. In this study, we performed segmentation-aware offline data augmentation on the BraTS-Africa dataset to increase the data sample size and diversity to enhance generalization. We further constructed an ensemble of three distinct architectures, MedNeXt, SegMamba, and Residual-Encoder U-Net, to leverage their complementary strengths. Our best-performing model, MedNeXt, was trained on 1000 epochs and achieved the highest average lesion-wise dice and normalized surface distance scores of 0.86 and 0.81 respectively. However, the ensemble model trained for 500 epochs produced the most balanced segmentation performance across the tumour subregions. This work demonstrates that a combination of advanced augmentation and model ensembling can improve segmentation accuracy and robustness on diverse and underrepresented datasets. Code available at: https://github.com/SPARK-Academy-2025/SPARK-2025/tree/main/SPARK2025_BraTs_MODELS/SPARK_NeuroAshanti

脑肿瘤,特别是胶质瘤,由于其复杂的生长模式、浸润性以及个体间脑结构的差异,给准确诊断和监测带来了巨大挑战。人们已经开发了深度学习模型来精准地勾画这些肿瘤。然而,这些模型大多是在相对同质的高资源数据集上训练的,部署到资源匮乏地区时稳健性受限。在这项研究中,我们对BraTS-Africa数据集进行了分割感知的离线数据增强,以增加数据样本量和多样性,从而提高模型的泛化能力。我们进一步构建了由MedNeXt、SegMamba和Residual-Encoder U-Net三种不同架构组成的集成模型,以利用它们的互补优势。表现最佳的单模型MedNeXt经过1000个周期的训练,获得了最高的平均病灶级Dice系数(0.86)和归一化表面距离得分(0.81)。而训练500个周期的集成模型在各肿瘤亚区之间的分割性能最为均衡。这项工作表明,先进的数据增强与模型集成相结合,可以在多样化且代表性不足的数据集上提高分割精度和稳健性。代码可在以下网址找到:https://github.com/SPARK-Academy-2025/SPARK-2025/tree/main/SPARK2025_BraTs_MODELS/SPARK_NeuroAshanti
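
下面给出一个模型集成的极简 PyTorch 草图:对多个分割网络的类别概率做加权平均后取 argmax。此处的等权平均只是常见做法之一,并非论文所用的具体融合策略。

```python
import torch
import torch.nn as nn

def ensemble_segmentation(models, volume, weights=None):
    """多模型概率平均集成的极简示意(加权方式为假设)。

    models: 若干已训练分割网络,输出 (B, C, ...) 的 logits。
    """
    weights = weights or [1.0 / len(models)] * len(models)
    probs = None
    with torch.no_grad():
        for m, w in zip(models, weights):
            p = torch.softmax(m(volume), dim=1) * w   # 对各类别取概率后加权
            probs = p if probs is None else probs + p
    return probs.argmax(dim=1)                        # 每个体素的最终类别

# 用法示意:用两个随机初始化的 1x1 三维卷积“模型”演示接口
toy_models = [nn.Conv3d(4, 4, 1), nn.Conv3d(4, 4, 1)]
seg = ensemble_segmentation(toy_models, torch.randn(1, 4, 16, 16, 16))
print(seg.shape)  # torch.Size([1, 16, 16, 16])
```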

论文及项目相关链接

PDF Brain Tumor Segmentation Challenge, International Medical Image Computing and Computer Assisted Intervention (MICCAI) Conference, 11 Pages, 2 Figures, 2 Tables

Summary
本研究对BraTS-Africa数据集进行离线数据增强,增加样本数量和多样性以提高模型泛化能力,并结合三种不同架构的模型进行集成,实现脑肿瘤的精准分割。其中MedNeXt模型表现最佳,平均病灶级Dice系数和归一化表面距离得分分别为0.86和0.81;模型集成则在各肿瘤亚区之间取得了最均衡的分割性能。研究证明,先进的数据增强与模型集成相结合,能在多样性和代表性不足的数据集上提高分割精度和稳健性。

Key Takeaways

  1. 脑肿瘤的复杂生长模式、浸润性和个体差异使得准确诊断和监测具有挑战性。
  2. Deep learning模型已被开发用于精确界定这些肿瘤。
  3. 研究使用BraTS-Africa数据集进行线下数据增强,以提高样本多样性和模型泛化能力。
  4. 结合三种不同架构的模型,包括MedNeXt、SegMamba和Residual-Encoder U-Net,以提高分割准确性。
  5. MedNeXt模型在训练了1000个周期后表现最佳,达到较高的Dice系数和归一化表面距离得分。
  6. 通过模型集成,在肿瘤各亚区之间取得了最均衡的分割性能。

Cool Papers

点此查看论文截图

ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification

Authors:Alvaro Lopez Pellicer, Andre Mariucci, Plamen Angelov, Marwan Bukhari, Jemma G. Kerns

Bone health studies are crucial in medical practice for the early detection and treatment of Osteopenia and Osteoporosis. Clinicians usually make a diagnosis based on densitometry (DEXA scans) and patient history. The applications of AI in this field are ongoing research. Most successful methods rely on deep learning models that use vision alone (DEXA/X-ray imagery) and focus on prediction accuracy, while explainability is often disregarded and left to post hoc assessments of input contributions. We propose ProtoMedX, a multi-modal (multimodal) model that uses both DEXA scans of the lumbar spine and patient records. ProtoMedX’s prototype-based architecture is explainable by design, which is crucial for medical applications, especially in the context of the upcoming EU AI Act, as it allows explicit analysis of model decisions, including incorrect ones. ProtoMedX demonstrates state-of-the-art performance in bone health classification while also providing explanations that can be visually understood by clinicians. Using a dataset of 4,160 real NHS patients, the proposed ProtoMedX achieves 87.58% accuracy in vision-only tasks and 89.8% in its multi-modal variant, both surpassing existing published methods.

骨骼健康研究在医学实践中对于骨量减少和骨质疏松症的早期发现和治疗至关重要。临床医生通常基于骨密度测量(DEXA扫描)和患者病史进行诊断。人工智能在该领域的应用仍是正在进行的研究。大多数成功的方法依赖仅使用视觉信息(DEXA/X射线影像)的深度学习模型,侧重于预测准确性,而可解释性往往被忽视,只能依靠事后对输入贡献的归因分析。我们提出了ProtoMedX,一个同时使用腰椎DEXA扫描和患者记录的多模态模型。ProtoMedX基于原型的架构在设计上即具备可解释性,这对医疗应用至关重要,特别是在即将实施的欧盟人工智能法案的背景下,因为它允许对模型决策(包括错误决策)进行显式分析。ProtoMedX在骨骼健康分类方面表现出最先进的性能,同时能提供临床医生可以直观理解的解释。在包含4160名真实NHS患者的数据集上,所提出的ProtoMedX在纯视觉任务中达到87.58%的准确率,多模态版本达到89.8%,均超过现有已发表的方法。
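
下面用一个简化的 PyTorch 模块示意“基于原型的多模态分类”这一思路:将影像特征与病历特征融合为嵌入,再按其与各类可学习原型的距离打分,距离本身即可作为解释依据。网络结构、维度与原型数量均为假设,并非 ProtoMedX 的原始实现。

```python
import torch
import torch.nn as nn

class ToyPrototypeClassifier(nn.Module):
    """基于原型的多模态分类示意:融合影像与病历特征后,按与各类原型的距离打分。"""
    def __init__(self, img_dim=128, tab_dim=16, embed_dim=32, num_classes=3, protos_per_class=2):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.tab_proj = nn.Linear(tab_dim, embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)
        # 每个类别若干个可学习原型
        self.prototypes = nn.Parameter(torch.randn(num_classes, protos_per_class, embed_dim))

    def forward(self, img_feat, tab_feat):
        z = self.fuse(torch.cat([self.img_proj(img_feat), self.tab_proj(tab_feat)], dim=-1))
        protos = self.prototypes.flatten(0, 1)                     # (K*P, E)
        d = ((z.unsqueeze(1) - protos.unsqueeze(0)) ** 2).sum(-1).sqrt()   # (B, K*P)
        d = d.view(z.size(0), *self.prototypes.shape[:2])          # (B, K, P)
        return -d.min(dim=-1).values        # 距某类最近原型越近,该类得分越高

model = ToyPrototypeClassifier()
scores = model(torch.randn(4, 128), torch.randn(4, 16))
print(scores.argmax(dim=1))                 # 每位患者的预测类别
```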

论文及项目相关链接

PDF ICCV 2025 (PHAROS-AFE-AIMI: Adaptation, Fairness, and Explainability in Medical Imaging). 8 pages, 5 figures, 4 tables. Keywords: multi-modal, multimodal, prototype learning, explainable AI, interpretable models, case-based reasoning, medical imaging, DEXA, bone health, osteoporosis, osteopenia, diagnosis, classification, clustering

Summary
医学实践中的骨骼健康研究对于早期发现和治疗骨质疏松及骨质减少至关重要。研究人员正在研究人工智能在该领域的应用,目前最成功的方法主要依赖于使用深度学习的视觉模型(DEXA或X射线影像),并关注预测的准确性。本研究提出了ProtoMedX多模态模型,结合了DEXA扫描的腰椎影像与患者记录。其原型设计具有可解释性,对于医疗应用至关重要,特别是在即将到来的欧盟人工智能法案背景下,该模型允许对决策进行明确分析,包括错误的决策。ProtoMedX在骨骼健康分类方面表现出卓越的性能,同时提供了临床医生可以理解的解释。在包含4,160名真实NHS患者的数据集上,ProtoMedX在仅使用视觉任务的准确率为87.58%,多模态版本的准确率为89.8%,均超过了已发布的方法。

Key Takeaways

  1. 医学实践中的骨骼健康研究有助于早期发现和治疗骨质疏松及骨质减少。
  2. 目前AI在骨骼健康领域的研究主要依赖于深度学习的视觉模型。
  3. ProtoMedX模型结合了DEXA扫描的腰椎影像与患者记录,具有多模态特性。
  4. ProtoMedX的原型设计具有可解释性,这对医疗应用至关重要。
  5. 欧盟人工智能法案强调模型决策的可解释性。
  6. ProtoMedX在骨骼健康分类方面表现出卓越性能,准确率超过现有方法。

Cool Papers

点此查看论文截图

FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification

Authors:Prajit Sengupta, Islem Rekik

Medical image classification requires not only high predictive performance but also interpretability to ensure clinical trust and adoption. Graph Neural Networks (GNNs) offer a powerful framework for modeling relational structures within datasets; however, standard GNNs often operate as black boxes, limiting transparency and usability, particularly in clinical settings. In this work, we present an interpretable graph-based learning framework named FireGNN that integrates trainable fuzzy rules into GNNs for medical image classification. These rules embed topological descriptors - node degree, clustering coefficient, and label agreement - using learnable thresholds and sharpness parameters to enable intrinsic symbolic reasoning. Additionally, we explore auxiliary self-supervised tasks (e.g., homophily prediction, similarity entropy) as a benchmark to evaluate the contribution of topological learning. Our fuzzy-rule-enhanced model achieves strong performance across five MedMNIST benchmarks and the synthetic dataset MorphoMNIST, while also generating interpretable rule-based explanations. To our knowledge, this is the first integration of trainable fuzzy rules within a GNN. Source Code: https://github.com/basiralab/FireGNN

医学图像分类不仅需要高预测性能,还需要可解释性,以确保临床信任和采用。图神经网络(GNN)为建模数据集内的关系结构提供了强大的框架;然而,标准GNN通常作为黑箱运行,限制了透明度和可用性,特别是在临床环境中。在这项工作中,我们提出了一个可解释的基于图的学习框架,名为FireGNN,首次将可训练的模糊规则集成到GNN中,用于医学图像分类。这些规则借助可学习的阈值和锐度参数嵌入拓扑描述符(节点度、聚类系数和标签一致性),以实现内在的符号推理。此外,我们探索了辅助自监督任务(例如同质性预测、相似性熵)作为基准,以评估拓扑学习的贡献。我们的模糊规则增强模型在五个MedMNIST基准和合成数据集MorphoMNIST上均取得了强劲的性能,同时能生成基于规则的可解释说明。据我们所知,这是首次在GNN中集成可训练的模糊规则。源代码:https://github.com/basiralab/FireGNN
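
下面给出一个可训练模糊规则的极简 PyTorch 草图:对节点度、聚类系数、标签一致性三个拓扑描述符,各自学习一个阈值与锐度参数,用 sigmoid 得到软隶属度,再以乘积作为规则激活。具体的规则组合与使用方式为假设的简化版,并非 FireGNN 的原始实现。

```python
import torch
import torch.nn as nn

class FuzzyTopologyRule(nn.Module):
    """可训练模糊规则的示意:软隶属度 = sigmoid(s * (x - tau)),规则激活取乘积。"""
    def __init__(self, num_descriptors=3):
        super().__init__()
        self.tau = nn.Parameter(torch.zeros(num_descriptors))       # 可学习阈值
        self.log_s = nn.Parameter(torch.zeros(num_descriptors))     # 锐度(取指数保证为正)

    def forward(self, descriptors):
        # descriptors: (N, 3) = [节点度, 聚类系数, 标签一致性](建议先做标准化)
        membership = torch.sigmoid(self.log_s.exp() * (descriptors - self.tau))
        return membership.prod(dim=-1)    # (N,) 每个节点的规则激活强度

rule = FuzzyTopologyRule()
x = torch.randn(5, 3)
print(rule(x))   # 可与 GNN 的输出相乘或相加,作为符号化的先验
```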

论文及项目相关链接

PDF Accepted at NeurIPS 2025 Conference (Workshop Track), San Diego, USA

Summary

本文介绍了一种名为FireGNN的可解释的基于图的学习框架,该框架将可训练的模糊规则集成到图神经网络(GNNs)中,用于医学图像分类。该框架使用拓扑描述符(如节点度、聚类系数和标签一致性)和可学习的阈值和尖锐度参数,实现了内在符号推理。此外,该研究还探讨了辅助自监督任务作为评估拓扑学习贡献的基准。该模糊规则增强的模型在五个MedMNIST基准测试和合成数据集MorphoMNIST上表现强劲,同时产生可解释的基于规则的解释。

Key Takeaways

  1. 医学图像分类需要高预测性能和可解释性,以确保临床信任和应用。
  2. Graph Neural Networks (GNNs) 为数据集内的关系结构提供了强大的建模框架。
  3. 标准GNNs常常作为黑箱操作,限制了透明度和可用性,特别是在临床环境中。
  4. FireGNN框架将可训练的模糊规则集成到GNNs中,实现医学图像分类。
  5. 该框架使用拓扑描述符进行符号推理,包括节点度、聚类系数和标签一致性。
  6. 辅助自监督任务被用作评估拓扑学习贡献的基准。

Cool Papers

点此查看论文截图

VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones

Authors:Lefei Shen, Mouxiang Chen, Xu Liu, Han Fu, Xiaoxue Ren, Jianling Sun, Zhuo Li, Chenghao Liu

Recent studies have indicated that vision models pre-trained on images can serve as time series foundation models (TSFMs) by reformulating time series forecasting (TSF) as image reconstruction. However, effective cross-modal transfer from vision to time series remains challenging due to three discrepancies: (1) the data-modality gap between structured, bounded image data and unbounded, heterogeneous time series; (2) the multivariate-forecasting gap between fixed RGB-three-channel vision models and time series with arbitrary numbers of variates; and (3) the probabilistic-forecasting gap between the deterministic outputs of vision models and the requirement for uncertainty-aware probabilistic predictions. To bridge these gaps, we propose VisonTS++, a TSFM based on continual pre-training of a vision model on large-scale time series. Our approach introduces three key innovations: (1) vision-model-based filtering to identify high-quality sequences to stabilize pre-training and mitigate modality gap; (2) colorized multivariate conversion, encoding multivariate series as multi-subfigure RGB images to enhance cross-variate modeling; (3) multi-quantile forecasting, using parallel reconstruction heads to generate quantile forecasts without parametric assumptions. Experiments show that VisionTS++ achieves state-of-the-art performance in both in-distribution and out-of-distribution forecasting, outperforming specialized TSFMs by 6%-44% in MSE reduction and ranking first in GIFT-Eval benchmark which comprises 23 datasets across 7 domains. Our work demonstrates that with appropriate adaptation, vision models can effectively generalize to TSF, thus advancing the pursuit of universal TSFMs. Code is available at https://github.com/HALF111/VisionTSpp.

近期研究表明,通过将时间序列预测(TSF)重新表述为图像重建,在图像上预训练的视觉模型可以充当时间序列基础模型(TSFM)。然而,由于三方面的差异,从视觉到时间序列的有效跨模态迁移仍然具有挑战性:(1)结构化、有界的图像数据与无界、异质的时间序列之间的数据模态差距;(2)固定RGB三通道的视觉模型与具有任意变量数的时间序列之间的多元预测差距;(3)视觉模型的确定性输出与不确定性感知的概率预测需求之间的概率预测差距。为了弥合这些差距,我们提出了VisionTS++,一种在大规模时间序列上对视觉模型持续预训练得到的时间序列基础模型。我们的方法引入了三个关键创新:(1)基于视觉模型的过滤,识别高质量序列以稳定预训练并缓解模态差距;(2)彩色化多元转换,将多元序列编码为多子图RGB图像,以增强跨变量建模;(3)多分位预测,使用并行重建头生成分位数预测,无需参数化假设。实验表明,VisionTS++在分布内和分布外预测上均达到了最先进的性能,在MSE降低方面比专门的时间序列基础模型高出6%-44%,并在由7个领域23个数据集组成的GIFT-Eval基准测试中排名第一。我们的工作证明,经过适当的适配,视觉模型可以有效地推广到TSF,从而推进通用TSFM的探索。代码可在https://github.com/HALF111/VisionTSpp上找到。
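
下面用一个简化的 PyTorch 草图示意“多分位预测”:多个并行输出头分别预测不同分位数,并用分位(pinball)损失训练,从而在不做参数化分布假设的情况下给出不确定性区间。此处以线性层代替论文中的视觉重建骨干,仅用于说明损失与输出形式。

```python
import torch
import torch.nn as nn

class MultiQuantileHead(nn.Module):
    """多分位预测头的示意:若干并行线性头分别输出不同分位数。"""
    def __init__(self, context_len=96, horizon=24, quantiles=(0.1, 0.5, 0.9)):
        super().__init__()
        self.quantiles = quantiles
        self.heads = nn.ModuleList([nn.Linear(context_len, horizon) for _ in quantiles])

    def forward(self, x):                       # x: (B, context_len)
        return torch.stack([h(x) for h in self.heads], dim=1)   # (B, Q, horizon)

def pinball_loss(pred, target, quantiles):
    """分位损失:低估与高估按 q 与 1-q 非对称加权。"""
    loss = 0.0
    for i, q in enumerate(quantiles):
        err = target - pred[:, i]
        loss = loss + torch.maximum(q * err, (q - 1) * err).mean()
    return loss / len(quantiles)

model = MultiQuantileHead()
x, y = torch.randn(8, 96), torch.randn(8, 24)
print(pinball_loss(model(x), y, model.quantiles))
```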

论文及项目相关链接

PDF 19 pages

Summary

基于图像预训练的视觉模型通过改革时间序列预测为图像重建,可作为时间序列基础模型(TSFM)。然而,由于数据模态差距、多元预测差距和概率预测差距,从视觉到时间序列的有效跨模态迁移仍然具有挑战性。为缩小这些差距,我们提出VisionTS++模型,该模型基于视觉模型的大规模时间序列持续预训练,引入三个关键创新点:基于视觉模型的过滤以稳定预训练并缩小模态差距;彩色多元转换,将多元序列编码为多子图RGB图像,以增强跨变量建模;多分位预测,使用并行重建头生成分位预测,无需参数假设。实验表明,VisionTS++在分布内和分布外预测方面均达到最新性能水平,在MSE降低方面优于专业TSFM达6%-44%,在涵盖7个领域的23个数据集的GIFT-Eval基准测试中排名第一。我们的工作证明,经过适当的适应,视觉模型可以有效地推广到TSF,从而推动通用TSFM的追求。

Key Takeaways

  1. 研究表明,基于图像预训练的视觉模型可以作为时间序列基础模型(TSFM)。
  2. 存在从视觉到时间序列的跨模态迁移挑战,主要由于数据模态、多元预测和概率预测的差距。
  3. VisionTS++模型通过三个关键创新来解决这些问题:基于视觉模型的过滤、彩色多元转换和多分位预测。
  4. VisionTS++在多种数据集上实现最先进的性能,表现出优秀的泛化能力。
  5. 该模型在分布内和分布外预测均表现优越,与专业TSFM相比,MSE降低达6%-44%。
  6. VisionTS++在GIFT-Eval基准测试中排名第一,涵盖7个领域的23个数据集。

Cool Papers

点此查看论文截图

AutoMiSeg: Automatic Medical Image Segmentation via Test-Time Adaptation of Foundation Models

Authors:Xingjian Li, Qifeng Wu, Adithya S. Ubaradka, Yiran Ding, Colleen Que, Runmin Jiang, Jianhua Xing, Tianyang Wang, Min Xu

Medical image segmentation is vital for clinical diagnosis, yet current deep learning methods often demand extensive expert effort, i.e., either through annotating large training datasets or providing prompts at inference time for each new case. This paper introduces a zero-shot and automatic segmentation pipeline that combines off-the-shelf vision-language and segmentation foundation models. Given a medical image and a task definition (e.g., “segment the optic disc in an eye fundus image”), our method uses a grounding model to generate an initial bounding box, followed by a visual prompt boosting module that enhance the prompts, which are then processed by a promptable segmentation model to produce the final mask. To address the challenges of domain gap and result verification, we introduce a test-time adaptation framework featuring a set of learnable adaptors that align the medical inputs with foundation model representations. Its hyperparameters are optimized via Bayesian Optimization, guided by a proxy validation model without requiring ground-truth labels. Our pipeline offers an annotation-efficient and scalable solution for zero-shot medical image segmentation across diverse tasks. Our pipeline is evaluated on seven diverse medical imaging datasets and shows promising results. By proper decomposition and test-time adaptation, our fully automatic pipeline not only substantially surpasses the previously best-performing method, yielding a 69% relative improvement in accuracy (Dice Score from 42.53 to 71.81), but also performs competitively with weakly-prompted interactive foundation models.

医学图像分割对临床诊断至关重要,然而,当前的深度学习方法往往需要大量的专家精力,例如通过标注大量训练数据集或在推理时间为每个新病例提供提示。本文介绍了一种零样本自动分割管道,它结合了现成的视觉语言模型和分割基础模型。给定医学图像和任务定义(例如,“在眼底图像中分割视盘”),我们的方法使用定位模型生成初始边界框,然后通过视觉提示增强模块增强提示,最后由可提示的分割模型处理以产生最终掩码。为了解决领域差距和结果验证的挑战,我们引入了一个测试时间适应框架,该框架具有一组可学习的适配器,用于将医学输入与基础模型表示进行对齐。其超参数通过贝叶斯优化进行优化,由代理验证模型指导,无需真实标签。我们的管道为跨不同任务的零样本医学图像分割提供了一种高效且可扩展的解决方案。我们的管道在七个不同的医学成像数据集上进行了评估,并显示出有希望的结果。通过适当的分解和测试时间适应,我们的全自动管道不仅显著超越了之前性能最佳的方法,准确度相对提高了69%(Dice得分从42.53提高到71.81),而且与弱提示交互式基础模型的性能相当。

论文及项目相关链接

PDF

Summary

本文介绍了一种零样本自动分割管道,结合了现成的视觉语言与分割基础模型。该方法只需医学图像和任务定义(如“在眼底图像中分割视盘”),无需大量专家标注数据或每个新病例的推理提示。通过初始边界框生成、视觉提示增强模块和可提示的分割模型处理,生成最终分割掩膜。为解决领域差距和结果验证问题,引入测试时自适应框架,通过优化超参数和代理验证模型,无需真实标签即可对齐医学输入与基础模型表示。该管道为跨不同任务的零样本医学图像分割提供了标注效率且可扩展的解决方案,并在七个不同的医学成像数据集上进行了评估,结果令人鼓舞。与传统方法相比,该管道不仅准确度相对提高了69%(Dice得分从42.53提高到71.81),而且与弱提示交互基础模型的性能具有竞争力。

Key Takeaways

  1. 医学图像分割对临床诊断至关重要,但现有深度学习方法需要大量专家努力。
  2. 本文提出了一种零样本自动分割管道,结合了视觉语言和分割基础模型。
  3. 该方法通过生成初始边界框、增强视觉提示并处理可提示的分割模型来生成最终分割掩膜。
  4. 引入测试时自适应框架,通过优化超参数和代理验证模型解决领域差距和结果验证问题。
  5. 该管道在七个医学成像数据集上进行了评估,并实现了显著的准确性提高。
  6. 与传统方法相比,该管道的准确度相对提高了69%,并显示出与弱提示交互基础模型的竞争力。
  7. 该方法为实现全自动、高效的医学图像分割提供了新的可能性。

Cool Papers

点此查看论文截图

Robust Frequency Domain Full-Waveform Inversion via HV-Geometry

Authors:Zhijun Zeng, Matej Neumann, Yunan Yang

Conventional frequency-domain full-waveform inversion (FWI) is typically implemented with an $L^2$ misfit function, which suffers from challenges such as cycle skipping and sensitivity to noise. While the Wasserstein metric has proven effective in addressing these issues in time-domain FWI, its applicability in frequency-domain FWI is limited due to the complex-valued nature of the data and reduced transport-like dependency on wave speed. To mitigate these challenges, we introduce the HV metric ($d_{\text{HV}}$), inspired by optimal transport theory, which compares signals based on horizontal and vertical changes without requiring the normalization of data. We implement $d_{\text{HV}}$ as the misfit function in frequency-domain FWI and evaluate its performance on synthetic and real-world datasets from seismic imaging and ultrasound computed tomography (USCT). Numerical experiments demonstrate that $d_{\text{HV}}$ outperforms the $L^2$ and Wasserstein metrics in scenarios with limited prior model information and high noise while robustly improving inversion results on clinical USCT data.

传统的频域全波形反演(FWI)通常采用$L^2$失配函数,面临周期跳跃(cycle skipping)和对噪声敏感等挑战。虽然Wasserstein度量在时域FWI中已被证明能有效解决这些问题,但由于数据为复值、且对波速的类输运依赖性减弱,其在频域FWI中的适用性有限。为缓解这些挑战,我们引入了受最优传输理论启发的HV度量($d_{\text{HV}}$),它基于水平与垂直方向的变化来比较信号,且无需对数据进行归一化。我们将$d_{\text{HV}}$作为频域FWI的失配函数,并在地震成像和超声计算机断层扫描(USCT)的合成数据集与真实数据集上评估其性能。数值实验表明,在先验模型信息有限且噪声较高的场景中,$d_{\text{HV}}$的表现优于$L^2$和Wasserstein度量,并在临床USCT数据上稳健地改进了反演结果。

论文及项目相关链接

PDF

Summary
传统频域全波形反演(FWI)通常使用L²失配函数,面临周期跳跃和噪声敏感等问题。受最优传输理论启发,我们引入HV度量(dHV),该度量基于水平与垂直方向的变化比较信号,无需数据归一化。我们在频域FWI中将dHV用作失配函数,并在地震成像和超声计算机断层扫描(USCT)的合成数据集与真实数据集上评估其性能。数值实验表明,在先验模型信息有限和噪声较高的情况下,dHV优于L²和Wasserstein度量,并能在临床USCT数据上稳健地改进反演结果。

Key Takeaways

  1. 传统频域全波形反演使用L²失配函数,面临周期跳跃和噪声敏感等挑战。
  2. 由于数据为复值且对波速的类输运依赖性减弱,Wasserstein度量在频域FWI中的适用性有限。
  3. 引入受最优传输理论启发的HV度量(dHV),无需数据归一化即可比较信号。
  4. dHV基于水平垂直变化比较信号。
  5. 在合成和真实世界数据集上评估了dHV的性能,包括地震成像和超声计算机断层扫描(USCT)。
  6. 数值实验表明,在有限先验模型信息和高噪声情况下,dHV优于L²和Wasserstein度量。

Cool Papers

点此查看论文截图

From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation

Authors:Jingkun Chen, Haoran Duan, Xiao Zhang, Boyan Gao, Vicente Grau, Jungong Han

Medical image segmentation remains challenging due to the high cost of pixel-level annotations for training. In the context of weak supervision, clinician gaze data captures regions of diagnostic interest; however, its sparsity limits its use for segmentation. In contrast, vision-language models (VLMs) provide semantic context through textual descriptions but lack the explanation precision required. Recognizing that neither source alone suffices, we propose a teacher-student framework that integrates both gaze and language supervision, leveraging their complementary strengths. Our key insight is that gaze data indicates where clinicians focus during diagnosis, while VLMs explain why those regions are significant. To implement this, the teacher model first learns from gaze points enhanced by VLM-generated descriptions of lesion morphology, establishing a foundation for guiding the student model. The teacher then directs the student through three strategies: (1) Multi-scale feature alignment to fuse visual cues with textual semantics; (2) Confidence-weighted consistency constraints to focus on reliable predictions; (3) Adaptive masking to limit error propagation in uncertain areas. Experiments on the Kvasir-SEG, NCI-ISBI, and ISIC datasets show that our method achieves Dice scores of 80.78%, 80.53%, and 84.22%, respectively-improving 3-5% over gaze baselines without increasing the annotation burden. By preserving correlations among predictions, gaze data, and lesion descriptions, our framework also maintains clinical interpretability. This work illustrates how integrating human visual attention with AI-generated semantic context can effectively overcome the limitations of individual weak supervision signals, thereby advancing the development of deployable, annotation-efficient medical AI systems. Code is available at: https://github.com/jingkunchen/FGI.

由于训练所需的像素级标注成本高昂,医学图像分割仍然充满挑战。在弱监督的背景下,临床医生的注视数据能够捕捉诊断时的感兴趣区域,但其稀疏性限制了其在分割中的应用;相比之下,视觉语言模型(VLM)通过文本描述提供语义上下文,却缺乏所需的解释精度。我们认识到单一来源都不足以解决问题,因此提出一个整合注视和语言监督的教师-学生框架,发挥两者的互补优势。我们的关键见解是:注视数据指示医生在诊断时关注的位置,而VLM则解释这些区域为何重要。为实现这一点,教师模型首先从由VLM生成的病变形态描述所增强的注视点中学习,为指导学生模型奠定基础;随后,教师通过三种策略指导学生:(1)多尺度特征对齐,融合视觉线索与文本语义;(2)置信度加权一致性约束,聚焦于可靠的预测;(3)自适应掩码,限制不确定区域中的错误传播。在Kvasir-SEG、NCI-ISBI和ISIC数据集上的实验表明,我们的方法分别取得了80.78%、80.53%和84.22%的Dice得分,在不增加标注负担的情况下较注视基线提高了3-5%。通过保留预测、注视数据和病变描述之间的相关性,我们的框架还保持了临床可解释性。这项工作说明了将人类视觉注意与AI生成的语义上下文相结合,可以有效克服单一弱监督信号的局限性,从而推动可部署、标注高效的医疗AI系统的发展。相关代码可访问:https://github.com/jingkunchen/FGI。
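
下面给出“置信度加权一致性约束”的一个极简 PyTorch 草图:以教师预测的最大类别概率作为逐像素权重,对学生与教师的概率差异加权求平均。该权重定义是常见做法之一,并非论文的精确公式。

```python
import torch

def confidence_weighted_consistency(student_logits, teacher_logits):
    """置信度加权一致性损失的示意:教师越确信的像素,一致性损失权重越高。"""
    t_prob = torch.softmax(teacher_logits, dim=1).detach()
    s_prob = torch.softmax(student_logits, dim=1)
    conf = t_prob.max(dim=1).values              # (B, H, W) 教师置信度
    per_pixel = ((s_prob - t_prob) ** 2).sum(dim=1)
    return (conf * per_pixel).mean()

# 用法示意:随机的二分类分割 logits
student = torch.randn(2, 2, 64, 64, requires_grad=True)
teacher = torch.randn(2, 2, 64, 64)
loss = confidence_weighted_consistency(student, teacher)
loss.backward()
print(float(loss))
```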

论文及项目相关链接

PDF 11 pages, 4 figures

Summary

本研究针对医学图像分割面临的挑战,提出了一种结合医生注视数据和视觉语言模型(VLM)的教师-学生框架。该框架利用两者互补的优势,通过教师模型学习医生注视点和VLM生成的病变形态描述,指导学生模型进行训练。通过多尺度特征对齐、置信度加权一致性约束和自适应掩模等策略,实现医学图像分割的精准预测。实验结果表明,该方法在Kvasir-SEG、NCI-ISBI和ISIC数据集上的Dice得分有所提高,同时保持临床可解释性。

Key Takeaways

  1. 医学图像分割面临训练像素级注释的高成本挑战。
  2. 医生注视数据能捕捉诊断时的感兴趣区域,但其稀疏性限制了其在分割中的应用。
  3. 视觉语言模型(VLM)通过文本描述提供语义上下文,但缺乏解释精度。
  4. 提出了一种教师-学生框架,结合了注视和语言的监督,利用它们的互补优势。
  5. 教师模型通过结合医生注视点和VLM生成的描述建立基础,指导学生模型训练。
  6. 通过多尺度特征对齐等策略实现精准预测,并在多个数据集上取得改进结果。
  7. 该方法保持临床可解释性,通过整合人类视觉注意和AI生成的语义上下文,克服单一弱监督信号的局限性。

Cool Papers

点此查看论文截图

Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification

Authors:Daniel G. P. Petrini, Hae Yong Kim

Mammography, an X-ray-based imaging technique, remains central to the early detection of breast cancer. Recent advances in artificial intelligence have enabled increasingly sophisticated computer-aided diagnostic methods, evolving from patch-based classifiers to whole-image approaches and then to multi-view architectures that jointly analyze complementary projections. Despite this progress, several critical questions remain unanswered. In this study, we systematically investigate these issues by addressing five key research questions: (1) the role of patch classifiers in performance, (2) the transferability of natural-image-trained backbones, (3) the advantages of learn-to-resize over conventional downscaling, (4) the contribution of multi-view integration, and (5) the robustness of findings across varying image quality. Beyond benchmarking, our experiments demonstrate clear performance gains over prior work. For the CBIS-DDSM dataset, we improved single-view AUC from 0.8153 to 0.8343, and multiple-view AUC from 0.8483 to 0.8658. Using a new comparative method, we also observed a 0.0217 AUC increase when extending from single to multiple-view analysis. On the complete VinDr-Mammo dataset, the multiple-view approach further improved results, achieving a 0.0492 AUC increase over single view and reaching 0.8511 AUC overall. These results establish new state-of-the-art benchmarks, providing clear evidence of the advantages of multi-view architectures for mammogram interpretation. Beyond performance, our analysis offers principled insights into model design and transfer learning strategies, contributing to the development of more accurate and reliable breast cancer screening tools. The inference code and trained models are publicly available at https://github.com/dpetrini/multiple-view.

乳腺X线摄影(一种基于X射线的成像技术)仍是早期发现乳腺癌的核心手段。人工智能的最新进展催生了日益复杂的计算机辅助诊断方法,从基于补丁的分类器发展到全图像方法,再到联合分析互补投影的多视图架构。尽管取得了这些进展,仍有若干关键问题悬而未决。在本研究中,我们通过回答五个关键研究问题来系统地考察这些议题:(1)补丁分类器对性能的作用;(2)在自然图像上训练的骨干网络的可迁移性;(3)学习式缩放(learn-to-resize)相对于传统降采样的优势;(4)多视图整合的贡献;(5)研究结论在不同图像质量下的稳健性。除基准测试外,我们的实验相对以往工作展现了明确的性能提升。在CBIS-DDSM数据集上,我们将单视图AUC从0.8153提高到0.8343,多视图AUC从0.8483提高到0.8658。使用一种新的比较方法,我们还观察到从单视图扩展到多视图分析时AUC提升了0.0217。在完整的VinDr-Mammo数据集上,多视图方法进一步改善了结果,较单视图提升0.0492的AUC,总体达到0.8511。这些结果确立了新的最先进基准,清晰地证明了多视图架构在乳腺X线片判读中的优势。除性能之外,我们的分析还为模型设计和迁移学习策略提供了原则性的见解,有助于开发更准确、更可靠的乳腺癌筛查工具。推理代码和训练好的模型可在https://github.com/dpetrini/multiple-view公开获取。
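
下面用一个玩具级的 PyTorch 草图示意“多视图联合判读”的基本形式:CC 与 MLO 两个投影共享同一骨干网络,将两路特征拼接后联合分类。骨干结构与维度均为演示用的假设,与论文所用网络无关。

```python
import torch
import torch.nn as nn

class ToyMultiViewClassifier(nn.Module):
    """多视图分类的示意:两个投影共享骨干,拼接特征后联合判读。"""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(2 * feat_dim, 1)   # 恶性概率的 logit

    def forward(self, cc_view, mlo_view):
        f = torch.cat([self.backbone(cc_view), self.backbone(mlo_view)], dim=1)
        return self.classifier(f)

model = ToyMultiViewClassifier()
logit = model(torch.randn(2, 1, 224, 224), torch.randn(2, 1, 224, 224))
print(torch.sigmoid(logit).squeeze(1))
```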

论文及项目相关链接

PDF 31 pages

Summary

本文探讨了人工智能在乳腺癌检测中的应用,特别是多视角架构在乳腺X光图像分析中的优势。研究通过系统实验回答了五个关键问题,并在CBIS-DDSM和VinDr-Mammo数据集上取得了新的先进性能。多视角分析方法的引入有效提升了诊断准确性,为乳腺癌筛查工具的开发提供了理论见解和实践指导。

Key Takeaways

  1. 人工智能在乳腺癌检测中的最新进展,特别是多视角架构的应用,能联合分析互补投影,提高诊断准确性。
  2. 研究通过解决五个关键问题进行了系统的实验考察,包括补丁分类器的作用、在自然图像上训练的骨干网络的可迁移性、学习式缩放(learn-to-resize)相对于传统降采样的优势、多视图整合的贡献,以及结论在不同图像质量下的稳健性。
  3. 在CBIS-DDSM数据集上,单视图AUC从0.8153提升至0.8343,多视图AUC从0.8483提升至0.8658。
  4. 在VinDr-Mammo数据集上,多视图方法较单视图提升了0.0492的AUC,总体达到0.8511。
  5. 研究结果确立了新的性能基准,并提供了模型设计和迁移学习策略的见解,有助于开发更准确可靠的乳腺癌筛查工具。
  6. 公开可用的推理代码和训练模型有助于进一步研究和应用。

Cool Papers

点此查看论文截图

Submillimeter-Accurate 3D Lumbar Spine Reconstruction from Biplanar X-Ray Images: Incorporating a Multi-Task Network and Landmark-Weighted Loss

Authors:Wanxin Yu, Zhemin Zhu, Cong Wang, Yihang Bao, Chunjie Xia, Rongshan Cheng, Yan Yu, Tsung-Yuan Tsai

To meet the clinical demand for accurate 3D lumbar spine assessment in a weight-bearing position, this study presents a novel, fully automatic framework for high-precision 3D reconstruction from biplanar X-ray images, overcoming the limitations of existing methods. The core of this method involves a novel multi-task deep learning network that simultaneously performs lumbar decomposition and landmark detection on the original biplanar radiographs. The decomposition effectively eliminates interference from surrounding tissues, simplifying subsequent image registration, while the landmark detection provides an initial pose estimation for the Statistical Shape Model (SSM), enhancing the efficiency and robustness of the registration process. Building on this, we introduce a landmark-weighted 2D-3D registration strategy. By assigning higher weights to complex posterior structures like the transverse and spinous processes during optimization, this strategy significantly enhances the reconstruction accuracy of the posterior arch. Our method was validated against a gold standard derived from registering CT segmentations to the biplanar X-rays. It sets a new benchmark by achieving sub-millimeter accuracy and completes the full reconstruction and measurement workflow in under 20 seconds, establishing a state-of-the-art combination of precision and speed. This fast and low-dose pipeline provides a powerful automated tool for diagnosing lumbar conditions such as spondylolisthesis and scoliosis in their functional, weight-bearing state.

为满足临床上在负重状态下进行精确三维腰椎评估的需求,本研究提出了一种新型全自动框架,用于从双平面X射线图像进行高精度三维重建,克服了现有方法的局限性。该方法的核心是一种新型多任务深度学习网络,可在原始双平面X射线片上同时执行腰椎分解和地标检测。分解有效地消除了周围组织的干扰,简化了后续的图像配准;而地标检测为统计形状模型(SSM)提供了初始姿态估计,提高了配准过程的效率和稳健性。在此基础上,我们引入了一种地标加权的2D-3D配准策略:在优化过程中为横突、棘突等复杂后方结构赋予更高的权重,从而显著提高了椎弓后部的重建精度。我们的方法以将CT分割配准到双平面X射线所得到的金标准进行验证,达到了亚毫米级精度,并在20秒内完成完整的重建与测量流程,确立了精度与速度兼备的最新水平。这一快速、低剂量的流程为在功能性负重状态下诊断腰椎滑脱和脊柱侧弯等腰椎疾病提供了强大的自动化工具。
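
下面给出“地标加权损失”的一个简短 PyTorch 草图:对属于横突、棘突等后方结构的地标赋予更高权重,再对各地标的欧氏误差做加权平均。权重取值与地标划分均为假设,仅说明“按解剖结构加权”的思路。

```python
import torch

def landmark_weighted_loss(pred_pts, gt_pts, posterior_idx, posterior_weight=2.0):
    """地标加权损失的示意:后方结构的地标权重更高(权重取值为假设)。

    pred_pts, gt_pts: (N, 3) 地标坐标;posterior_idx: 属于后方结构的地标下标。
    """
    w = torch.ones(pred_pts.size(0))
    w[posterior_idx] = posterior_weight
    dist = torch.linalg.norm(pred_pts - gt_pts, dim=1)   # 每个地标的欧氏误差
    return (w * dist).sum() / w.sum()

# 用法示意:随机地标,其中后 4 个假设属于后方结构
pred = torch.randn(10, 3)
gt = pred + 0.1 * torch.randn(10, 3)
print(float(landmark_weighted_loss(pred, gt, posterior_idx=[6, 7, 8, 9])))
```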

论文及项目相关链接

PDF 27 pages, 16 figures, 9 tables

Summary

本研究提出了一种新型全自动框架,用于从双平面X射线图像进行高精度3D腰椎重建,以满足临床对负重状态下准确3D腰椎评估的需求。其核心是一种多任务深度学习网络,可同时执行腰椎分解和地标检测,从而消除周围组织的干扰,简化图像配准,并提供初始姿态估计,增强统计形状模型的效率和稳健性。此外,引入地标加权的2D-3D配准策略,在优化过程中给横突和棘突等复杂后方结构赋予更高的权重,显著提高椎弓后部重建的准确性。该研究实现了亚毫米级精度,并在20秒内完成完整的重建和测量流程,为诊断椎体滑脱和脊柱侧凸等腰椎疾病提供了强大的自动化工具。

Key Takeaways

  1. 研究提出了一种全自动框架,从双平面X射线图像进行高精度3D腰椎重建。
  2. 多任务深度学习网络同时执行腰椎分解和地标检测,消除周围组织干扰,简化图像配准。
  3. 地标检测提供初始姿态估计,增强统计形状模型的效率和稳健性。
  4. 引入地标加权的2D-3D配准策略,优化复杂后方结构的重建准确性。
  5. 研究实现了亚毫米级精度,快速完成重建和测量流程。
  6. 该方法可用于诊断腰椎疾病,如椎体滑脱和脊柱侧凸。

Cool Papers

点此查看论文截图

A Graph-Based Framework for Interpretable Whole Slide Image Analysis

Authors:Alexander Weers, Alexander H. Berger, Laurin Lux, Peter Schüffler, Daniel Rueckert, Johannes C. Paetzold

The histopathological analysis of whole-slide images (WSIs) is fundamental to cancer diagnosis but is a time-consuming and expert-driven process. While deep learning methods show promising results, dominant patch-based methods artificially fragment tissue, ignore biological boundaries, and produce black-box predictions. We overcome these limitations with a novel framework that transforms gigapixel WSIs into biologically-informed graph representations and is interpretable by design. Our approach builds graph nodes from tissue regions that respect natural structures, not arbitrary grids. We introduce an adaptive graph coarsening technique, guided by learned embeddings, to efficiently merge homogeneous regions while preserving diagnostically critical details in heterogeneous areas. Each node is enriched with a compact, interpretable feature set capturing clinically-motivated priors. A graph attention network then performs diagnosis on this compact representation. We demonstrate strong performance on challenging cancer staging and survival prediction tasks. Crucially, our resource-efficient model ($>$13x fewer parameters and $>$300x less data) achieves results competitive with a massive foundation model, while offering full interpretability through feature attribution. Our code is publicly available at https://github.com/HistoGraph31/pix2pathology.

全切片图像(WSI)的组织病理学分析是癌症诊断的基础,但这一过程耗时且依赖专家。虽然深度学习方法显示出有前景的结果,但主流的基于补丁的方法人为地切割组织、忽略生物学边界,并给出黑箱式的预测。我们通过一个新颖的框架克服了这些局限:它将十亿像素级的WSI转换为蕴含生物学信息的图表示,并且在设计上即具有可解释性。我们的方法依据自然的组织结构而非任意网格来构建图节点。我们引入了一种由学习到的嵌入引导的自适应图粗化技术,高效地合并同质区域,同时保留异质区域中对诊断至关重要的细节。每个节点都配有一组紧凑、可解释的特征,刻画具有临床动机的先验。随后,图注意力网络在这一紧凑表示上进行诊断。我们在具有挑战性的癌症分期和生存预测任务上展示了强劲的性能。关键的是,我们的资源高效模型(参数少13倍以上、数据少300倍以上)取得了与大型基础模型相当的结果,同时通过特征归因提供完全的可解释性。代码公开于 https://github.com/HistoGraph31/pix2pathology 。
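
下面用一个稠密实现的玩具级 PyTorch 草图示意“在区域节点图上做图注意力并全图池化分类”的流程;打分函数、维度与池化方式均为假设的简化版,仅与论文方法在思路上对应。

```python
import torch
import torch.nn as nn

class DenseGraphAttention(nn.Module):
    """区域级图注意力的极简示意:节点为组织区域特征,按邻接矩阵做注意力聚合后全图池化分类。"""
    def __init__(self, in_dim=32, hid_dim=64, num_classes=4):
        super().__init__()
        self.proj = nn.Linear(in_dim, hid_dim)
        self.score = nn.Linear(2 * hid_dim, 1)
        self.cls = nn.Linear(hid_dim, num_classes)

    def forward(self, x, adj):
        # x: (N, in_dim) 区域节点特征;adj: (N, N) 0/1 邻接矩阵(含自环)
        h = self.proj(x)
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = torch.nn.functional.leaky_relu(self.score(pair).squeeze(-1))   # (N, N) 注意力打分
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)
        h = torch.relu(alpha @ h)                      # 邻域加权聚合
        return self.cls(h.mean(dim=0, keepdim=True))   # 全图平均池化后给出分期/预后 logits

# 用法示意:6 个区域节点、随机对称邻接矩阵
x = torch.randn(6, 32)
adj = (torch.rand(6, 6) > 0.5).float()
adj = ((adj + adj.t() + torch.eye(6)) > 0).float()     # 对称化并加自环
print(DenseGraphAttention()(x, adj).shape)             # torch.Size([1, 4])
```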

论文及项目相关链接

PDF 15 pages, 5 figures

Summary

本文介绍了一种用于全切片图像(WSI)病理分析的全新框架,能够将十亿像素级的WSI转化为具有生物学信息的图表示,通过构建尊重自然组织结构的图节点,实现更准确的癌症诊断。该框架引入了一种自适应的图粗化技术,能够在保留诊断细节的同时有效地合并同质区域。此外,每个节点具有紧凑、可解释的特征集,并通过图注意力网络进行诊断。该模型在癌症分期和生存预测任务上表现出强大的性能,且资源利用率高,可通过特征归因实现完全的可解释性。

Key Takeaways

  1. 全切片图像(WSI)的病理分析对癌症诊断至关重要,但过程耗时且依赖专家。
  2. 当前深度学习方法在病理分析中存在局限性,如基于补丁的方法可能会人为地破坏组织,忽视生物边界,并产生黑箱预测。
  3. 提出了一种新的框架,将巨像素WSI转化为具有生物学信息的图形表示,尊重自然结构,提高诊断准确性。
  4. 引入自适应图简化技术,通过学习的嵌入进行引导,有效合并同质区域,同时保留异质区域的诊断细节。
  5. 每个节点都包含紧凑、可解释的特征集,这些特征集捕捉了临床动机的先验知识。
  6. 使用图注意力网络进行诊断,并在具有挑战性的癌症分期和生存预测任务上表现出强大的性能。

Cool Papers

点此查看论文截图

H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging

Authors:Zhen Huang, Tao Tang, Ronghao Xu, Yangbo Wei, Wenkai Yang, Suhua Wang, Xiaoxin Sun, Han Li, Qingsong Yao

3D landmark detection is a critical task in medical image analysis, and accurately detecting anatomical landmarks is essential for subsequent medical imaging tasks. However, mainstream deep learning methods in this field struggle to simultaneously capture fine-grained local features and model global spatial relationships, while maintaining a balance between accuracy and computational efficiency. Local feature extraction requires capturing fine-grained anatomical details, while global modeling requires understanding the spatial relationships within complex anatomical structures. The high-dimensional nature of 3D volume further exacerbates these challenges, as landmarks are sparsely distributed, leading to significant computational costs. Therefore, achieving efficient and precise 3D landmark detection remains a pressing challenge in medical image analysis. In this work, We propose a \textbf{H}ybrid \textbf{3}D \textbf{DE}tection \textbf{Net}(H3DE-Net), a novel framework that combines CNNs for local feature extraction with a lightweight attention mechanism designed to efficiently capture global dependencies in 3D volumetric data. This mechanism employs a hierarchical routing strategy to reduce computational cost while maintaining global context modeling. To our knowledge, H3DE-Net is the first 3D landmark detection model that integrates such a lightweight attention mechanism with CNNs. Additionally, integrating multi-scale feature fusion further enhances detection accuracy and robustness. Experimental results on a public CT dataset demonstrate that H3DE-Net achieves state-of-the-art(SOTA) performance, significantly improving accuracy and robustness, particularly in scenarios with missing landmarks or complex anatomical variations. We aready open-source our project, including code, data and model weights.

三维(3D)标志点检测是医学图像分析中的一项关键任务,准确检测解剖标志点对于后续医学成像任务至关重要。然而,该领域的主流深度学习方法难以在精细局部特征捕捉、全局空间关系建模、准确性和计算效率之间保持平衡。局部特征提取需要捕捉精细的解剖细节,而全局建模则需要理解复杂解剖结构内的空间关系。此外,由于标志点稀疏分布,三维体积的高维性质进一步加剧了这些挑战,导致计算成本较高。因此,实现高效且精确的3D标志点检测仍然是医学图像分析中的一个紧迫挑战。在本研究中,我们提出了一种混合三维检测网络(Hybrid 3D Detection Net,简称H3DE-Net)。这是一个新颖框架,结合了卷积神经网络(CNN)用于局部特征提取和一个轻量级注意力机制,旨在高效捕捉三维体积数据中的全局依赖关系。该机制采用分层路由策略来降低计算成本,同时保持全局上下文建模。据我们所知,H3DE-Net是第一个将此类轻量级注意力机制与CNN相结合的3D标志点检测模型。此外,通过整合多尺度特征融合进一步提高了检测精度和稳健性。在公共CT数据集上的实验结果表明,H3DE-Net达到了最先进的性能,特别是在缺失标志点或复杂解剖结构变异的情况下,显著提高了准确性和稳健性。我们已经开源了我们的项目,包括代码、数据和模型权重。

论文及项目相关链接

PDF

Summary

本文提出一种混合三维检测网络(H3DE-Net),结合卷积神经网络(CNN)进行局部特征提取,并采用轻量级注意力机制高效捕捉三维体积数据中的全局依赖关系。该方法采用分层路由策略,降低计算成本的同时保持全局上下文建模。实验结果表明,H3DE-Net在公开CT数据集上达到最佳性能,尤其在缺失地标或复杂解剖变异情况下,提高准确性和鲁棒性。

Key Takeaways

  1. 3D landmark检测在医学图像分析中至关重要,但主流深度学习方法难以平衡精细局部特征与全局空间关系的捕捉。
  2. 局部特征提取需要捕捉细致的解剖细节,而全局建模需要理解复杂解剖结构内的空间关系。
  3. H3DE-Net结合CNN和轻量级注意力机制,高效捕捉三维体积数据中的全局依赖。
  4. 分层路由策略降低计算成本,同时保持全局上下文建模。
  5. H3DE-Net是首个结合轻量级注意力机制和CNN的3D landmark检测模型。
  6. 多尺度特征融合进一步提高检测准确性和鲁棒性。

Cool Papers

点此查看论文截图

Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention

Authors:Omid Nejati Manzari, Hojat Asgariandehkordi, Taha Koleilat, Yiming Xiao, Hassan Rivaz

Convolutional networks, transformers, hybrid models, and Mamba-based architectures have demonstrated strong performance across various medical image classification tasks. However, these methods were primarily designed to classify clean images using labeled data. In contrast, real-world clinical data often involve image corruptions that are unique to multi-center studies and stem from variations in imaging equipment across manufacturers. In this paper, we introduce the Medical Vision Transformer (MedViTV2), a novel architecture incorporating Kolmogorov-Arnold Network (KAN) layers into the transformer architecture for the first time, aiming for generalized medical image classification. We have developed an efficient KAN block to reduce computational load while enhancing the accuracy of the original MedViT. Additionally, to counteract the fragility of our MedViT when scaled up, we propose an enhanced Dilated Neighborhood Attention (DiNA), an adaptation of the efficient fused dot-product attention kernel capable of capturing global context and expanding receptive fields to scale the model effectively and addressing feature collapse issues. Moreover, a hierarchical hybrid strategy is introduced to stack our Local Feature Perception and Global Feature Perception blocks in an efficient manner, which balances local and global feature perceptions to boost performance. Extensive experiments on 17 medical image classification datasets and 12 corrupted medical image datasets demonstrate that MedViTV2 achieved state-of-the-art results in 27 out of 29 experiments with reduced computational complexity. MedViTV2 is 44% more computationally efficient than the previous version and significantly enhances accuracy, achieving improvements of 4.6% on MedMNIST, 5.8% on NonMNIST, and 13.4% on the MedMNIST-C benchmark.

卷积网络、Transformer、混合模型和基于Mamba的架构已在各种医学图像分类任务中表现出强大的性能。然而,这些方法主要是为使用带标签数据对干净图像进行分类而设计的。相比之下,真实世界的临床数据常常存在多中心研究特有的图像损坏,这些损坏源于不同厂商成像设备之间的差异。在本文中,我们提出了医学视觉Transformer(MedViTV2),这是一种首次将Kolmogorov-Arnold网络(KAN)层引入Transformer架构的新型模型,旨在实现通用的医学图像分类。我们开发了一个高效的KAN块,在减少计算负载的同时提高原始MedViT的准确性。此外,为了克服MedViT在规模扩大时的脆弱性,我们提出了增强的扩张邻域注意力(DiNA),这是对高效融合点积注意力核的一种改进,能够捕获全局上下文并扩大感受野,从而有效扩展模型并解决特征塌缩问题。此外,我们引入了一种分层混合策略,以高效的方式堆叠局部特征感知块和全局特征感知块,在局部与全局特征感知之间取得平衡以提升性能。在17个医学图像分类数据集和12个损坏医学图像数据集上的大量实验表明,MedViTV2在29项实验中的27项取得了最先进的结果,且计算复杂度更低。MedViTV2的计算效率比前一版本高44%,并显著提升了准确率:在MedMNIST上提高4.6%,在NonMNIST上提高5.8%,在MedMNIST-C基准上提高13.4%。

论文及项目相关链接

PDF

摘要
本文介绍了Medical Vision Transformer V2(MedViTV2)架构,该架构首次将Kolmogorov-Arnold网络(KAN)层融入Transformer架构中,旨在实现通用的医学图像分类。通过开发高效的KAN块,减少了计算负载,提高了原始MedViT的准确性。为应对模型放大后的脆弱性,提出了增强的扩张邻域注意力(DiNA),其能捕捉全局上下文并扩展感受野,有效扩展模型并解决特征塌缩问题。此外,还引入了分层混合策略,以平衡局部和全局特征感知,提高性能。在17个医学图像分类数据集和12个损坏医学图像数据集上的大量实验表明,MedViTV2在29项实验中的27项取得了最佳结果,且计算复杂度更低。与前一版本相比,MedViTV2的计算效率提高了44%,并且在MedMNIST、NonMNIST和MedMNIST-C基准测试上的准确率分别提高了4.6%、5.8%和13.4%。

关键见解

  1. MedViTV2结合了卷积网络、transformer、混合模型和基于Mamba的架构的优势,用于医学图像分类任务。
  2. 引入Kolmogorov-Arnold网络(KAN)层以提高模型的准确性和效率。
  3. 提出Dilated Neighborhood Attention(DiNA)以增强模型的健壮性并扩大感受野。
  4. 采用分层混合策略平衡本地和全局特征感知,进一步提升性能。
  5. 在多个医学图像分类数据集上进行广泛实验,证明MedViTV2具有卓越的性能和计算效率。
  6. 与先前的模型相比,MedViTV2在计算效率和准确率方面均有显著提高。

Cool Papers

点此查看论文截图


文章作者: Kedreamix
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 Kedreamix !