Published: 2025-10-18

Updated: 2025-11-27

Word count: 915

Reading time: 3 min


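⚠️ All summaries below are generated by a large language model. They may contain errors, so treat them as a reference only and use them with caution.
🔴 Note: never use these summaries in serious academic work. They are meant only for screening papers before you read them!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace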

2025-10-18 Update

Evaluating the Explainability of Vision Transformers in Medical Imaging

Authors: Leili Barekatain, Ben Glocker

Understanding model decisions is crucial in medical imaging, where interpretability directly impacts clinical trust and adoption. Vision Transformers (ViTs) have demonstrated state-of-the-art performance in diagnostic imaging; however, their complex attention mechanisms pose challenges to explainability. This study evaluates the explainability of different Vision Transformer architectures and pre-training strategies - ViT, DeiT, DINO, and Swin Transformer - using Gradient Attention Rollout and Grad-CAM. We conduct both quantitative and qualitative analyses on two medical imaging tasks: peripheral blood cell classification and breast ultrasound image classification. Our findings indicate that DINO combined with Grad-CAM offers the most faithful and localized explanations across datasets. Grad-CAM consistently produces class-discriminative and spatially precise heatmaps, while Gradient Attention Rollout yields more scattered activations. Even in misclassification cases, DINO with Grad-CAM highlights clinically relevant morphological features that appear to have misled the model. By improving model transparency, this research supports the reliable and explainable integration of ViTs into critical medical diagnostic workflows.
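For readers unfamiliar with Gradient Attention Rollout, here is a minimal pure-Python sketch of one common formulation. It is an assumption, not the authors' code, and the paper's exact variant may differ (for example in how heads are fused or whether low-attention entries are discarded). For each layer, the attention map is multiplied element-wise by its gradient with respect to the target class score, averaged over heads, and clipped at zero. The identity is then added to account for the residual connection, each row is normalised, and the per-layer matrices are multiplied from the first layer to the last. Row 0 (the CLS token) of the result gives one relevance score per patch. The function names `fuse_heads` and `gradient_attention_rollout` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # square token-by-token matrix, token 0 = CLS


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple loop; adequate for the small token counts of a toy example.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def fuse_heads(attn: List[Matrix], grad: List[Matrix]) -> Matrix:
    """Average (gradient * attention) over heads and keep only positive values."""
    heads, n = len(attn), len(attn[0])
    fused = [[0.0] * n for _ in range(n)]
    for h in range(heads):
        for i in range(n):
            for j in range(n):
                fused[i][j] += attn[h][i][j] * grad[h][i][j] / heads
    return [[max(v, 0.0) for v in row] for row in fused]


def gradient_attention_rollout(attn_layers: List[List[Matrix]],
                               grad_layers: List[List[Matrix]]) -> List[float]:
    """attn_layers[l][h] and grad_layers[l][h] are one layer's per-head maps."""
    n = len(attn_layers[0][0])
    eye = identity(n)
    result = identity(n)
    for attn, grad in zip(attn_layers, grad_layers):
        fused = fuse_heads(attn, grad)
        # Add the identity to model the residual path; every row sum is >= 1.
        a = [[fused[i][j] + eye[i][j] for j in range(n)] for i in range(n)]
        a = [[v / sum(row) for v in row] for row in a]  # row-normalise
        result = matmul(a, result)  # compose with the earlier layers
    # CLS row, minus its self-entry: one relevance value per patch token.
    return result[0][1:]
```

The output is a flat list of patch relevances; reshaping it to the patch grid and upsampling to the image size gives the heatmap. Because each layer's matrix is row-stochastic, the full CLS row always sums to 1, so the patch scores sum to at most 1.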


Paper and project links

Accepted at the Workshop on Interpretability of Machine Intelligence in Medical Image Computing at MICCAI 2025.

Summary

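This paper studies the explainability of different Vision Transformer architectures and pre-training strategies (ViT, DeiT, DINO, and Swin Transformer) in medical imaging. Using Gradient Attention Rollout and Grad-CAM for quantitative and qualitative analysis, it finds that DINO combined with Grad-CAM provides the most faithful and localized explanations across the datasets studied. The results improve model transparency and support the reliable integration of ViTs into critical medical diagnostic workflows.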

Key Takeaways

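Vision Transformers (ViTs) perform strongly in medical imaging, but their complex attention mechanisms make them hard to explain.
This study evaluates the explainability of different ViT architectures and pre-training strategies using Gradient Attention Rollout and Grad-CAM.
On peripheral blood cell classification and breast ultrasound image classification, DINO combined with Grad-CAM gives the most faithful and localized explanations.
Grad-CAM produces class-discriminative, spatially precise heatmaps, whereas Gradient Attention Rollout yields more scattered activations.
Even for misclassified images, DINO with Grad-CAM highlights clinically relevant morphological features that appear to have misled the model.
The findings improve model transparency and support the reliable integration of ViTs into medical diagnostic workflows.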
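To make the Grad-CAM side of that comparison concrete, the sketch below shows how Grad-CAM is commonly adapted to ViT-style models. It is an assumption, not the authors' implementation: it takes the patch-token activations of one chosen block and their gradients with respect to the target class score (which block the paper uses is not stated here), averages the gradients over tokens to get one weight per channel, forms a weighted sum per token, applies ReLU, and reshapes the tokens back to the patch grid. For ViT, DeiT, and DINO the CLS token is assumed to be removed beforehand; Swin has no CLS token. The function name `vit_grad_cam` is hypothetical.

```python
from typing import List


def vit_grad_cam(activations: List[List[float]],
                 gradients: List[List[float]],
                 grid: int) -> List[List[float]]:
    """activations/gradients: [patch_token][channel], CLS token already removed."""
    num_tokens = len(activations)
    if num_tokens != grid * grid:
        raise ValueError("number of patch tokens must equal grid * grid")
    channels = len(activations[0])
    # Channel weights: gradients averaged over all patch tokens
    # (the token-sequence analogue of global average pooling in CNN Grad-CAM).
    weights = [sum(gradients[t][c] for t in range(num_tokens)) / num_tokens
               for c in range(channels)]
    # Weighted channel sum per token; ReLU keeps only positive class evidence.
    cam = [max(0.0, sum(w * a for w, a in zip(weights, activations[t])))
           for t in range(num_tokens)]
    # Scale to [0, 1] when there is any positive evidence.
    peak = max(cam)
    if peak > 0:
        cam = [v / peak for v in cam]
    # Reshape the token sequence back to the 2D patch grid (row-major order).
    return [cam[r * grid:(r + 1) * grid] for r in range(grid)]
```

Unlike rollout, this map depends on a single block and on class-specific gradient weights, which is one plausible reason it tends to look more class-discriminative; the paper's own analysis should be consulted for the actual explanation.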


Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-10-18/%E5%8C%BB%E5%AD%A6%E5%BD%B1%E5%83%8F_Breast%20Ultrasound/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix as the source when reposting!

Medical Imaging/Breast Ultrasound
