
Medical Imaging


⚠️ All of the summaries below are generated by a large language model; they may contain errors, are for reference only, and should be used with caution.
🔴 Note: never rely on these summaries in serious academic settings; use them only as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-10-02

Comparative study of Wavelet transform and Fourier domain filtering for medical image denoising

Authors:M. Ali Saif, Bassam M. Mughalles, Ibrahim G. H. Loqman

Denoising of images is a crucial preprocessing step in medical imaging, essential for improving diagnostic clarity. While deep learning methods offer state-of-the-art performance, their computational complexity and data requirements can be prohibitive. In this study, we present a comprehensive comparative analysis of two classical, computationally efficient transform-domain techniques: Discrete Wavelet Transform (DWT) and Discrete Fourier Cosine Transform (DFCT) filtering. We evaluated their efficacy in denoising medical images corrupted by Gaussian, Uniform, Poisson, and Salt-and-Pepper noise. Contrary to the common hypothesis favoring wavelets for their multi-resolution capabilities, our results demonstrate that a block-based DFCT approach consistently and significantly outperforms a global DWT approach across all noise types and performance metrics (SNR, PSNR, IM). We attribute DFCT's superior performance to its localized processing strategy, which better preserves fine details by operating on small image blocks, effectively adapting to local statistics without introducing global artifacts. This finding underscores the importance of algorithmic selection based on processing methodology, not just transform properties, and positions DFCT as a highly effective and efficient denoising tool for practical medical imaging applications.

图像去噪在医学成像中是一个关键的预处理步骤,对于提高诊断清晰度至关重要。虽然深度学习方法提供了最先进的性能,但其计算复杂性和数据要求可能是禁止性的。在这项研究中,我们对两种经典且计算效率高的变换域技术进行了全面的比较分析:离散小波变换(DWT)和离散余弦变换(DFCT)滤波。我们评估了它们在去除医学图像中由高斯、均匀、泊松和椒盐噪声引起的噪声方面的有效性。与普遍假设小波具有多分辨率能力相反,我们的结果表明,基于块的DFCT方法在所有这些噪声类型和性能指标(SNR、PSNR、IM)上始终显著优于全局DWT方法。我们将DFCT的优异性能归因于其局部处理策略,该策略通过在较小的图像块上进行操作来更好地保留细节,有效地适应局部统计信息而不会引入全局伪影。这一发现强调了基于处理方法的算法选择的重要性,而不仅仅是转换属性,并将DFCT定位为实际医学成像应用中高效且高效的去噪工具。
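
For readers who want to see what the block-based transform-domain strategy looks like in practice, below is a minimal sketch (not the authors' code) of block-wise DCT denoising with hard thresholding; the 8x8 block size and the threshold value are illustrative assumptions, and SciPy is used only for the DCT.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_denoise_blockwise(img, block=8, thresh=30.0):
    """Zero out small DCT coefficients inside each block, then invert."""
    h, w = img.shape
    out = img.astype(np.float64).copy()          # edge remainders keep their original values
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            patch = img[i:i + block, j:j + block].astype(np.float64)
            coeffs = dctn(patch, norm="ortho")   # local 2-D DCT of the block
            coeffs[np.abs(coeffs) < thresh] = 0.0  # hard-threshold small (mostly noise) coefficients
            out[i:i + block, j:j + block] = idctn(coeffs, norm="ortho")
    return np.clip(out, 0.0, 255.0)

# toy usage: a flat image corrupted by Gaussian noise
rng = np.random.default_rng(0)
noisy = np.clip(128.0 + rng.normal(0.0, 20.0, size=(64, 64)), 0, 255)
denoised = dct_denoise_blockwise(noisy, block=8, thresh=40.0)
```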

论文及项目相关链接

PDF 21 pages, 11 figures

Summary
医学图像去噪是医学成像中的关键预处理步骤,能提高诊断清晰度。本研究对离散余弦变换(DFCT)和离散小波变换(DWT)两种传统变换域技术进行了全面的比较分析,用于评估它们在去噪医学图像中的效果,这些图像受到高斯、均匀、泊松和椒盐噪声的破坏。研究发现,基于块的DFCT方法在所有噪声类型和性能指标(信噪比、峰值信噪比、图像度量)上均显著优于全局DWT方法。这主要归因于DFCT的局部处理策略,能够更好地保留细节,通过处理小图像块来适应局部统计信息,而不引入全局伪影。

Key Takeaways

  1. 医学图像去噪是提升诊断清晰度的关键步骤。
  2. 研究对比了离散余弦变换(DFCT)和离散小波变换(DWT)在去噪医学图像中的应用。
  3. DFCT方法在所有噪声类型和性能指标上表现更优秀。
  4. DFCT的局部处理策略可以更好地保留图像的细节。
  5. DFCT通过处理小图像块来适应局部统计信息,减少了全局伪影的产生。
  6. 算法的选择不仅取决于变换属性,还取决于处理策略。

Cool Papers

点此查看论文截图

Automated and Scalable SEM Image Analysis of Perovskite Solar Cell Materials via a Deep Segmentation Framework

Authors:Jian Guo Pan, Lin Wang, Xia Cai

Scanning Electron Microscopy (SEM) is indispensable for characterizing the microstructure of thin films during perovskite solar cell fabrication. Accurate identification and quantification of lead iodide and perovskite phases are critical because residual lead iodide strongly influences crystallization pathways and defect formation, while the morphology of perovskite grains governs carrier transport and device stability. Yet current SEM image analysis is still largely manual, limiting throughput and consistency. Here, we present an automated deep learning-based framework for SEM image segmentation that enables precise and efficient identification of lead iodide, perovskite and defect domains across diverse morphologies. Built upon an improved YOLOv8x architecture, our model named PerovSegNet incorporates two novel modules: (i) Adaptive Shuffle Dilated Convolution Block, which enhances multi-scale and fine-grained feature extraction through group convolutions and channel mixing; and (ii) Separable Adaptive Downsampling module, which jointly preserves fine-scale textures and large-scale structures for more robust boundary recognition. Trained on an augmented dataset of 10,994 SEM images, PerovSegNet achieves a mean Average Precision of 87.25% with 265.4 Giga Floating Point Operations, outperforming the baseline YOLOv8x-seg by 4.08%, while reducing model size and computational load by 24.43% and 25.22%, respectively. Beyond segmentation, the framework provides quantitative grain-level metrics, such as lead iodide/perovskite area and count, which can serve as reliable indicators of crystallization efficiency and microstructural quality. These capabilities establish PerovSegNet as a scalable tool for real-time process monitoring and data-driven optimization of perovskite thin-film fabrication.The source code is available at:https://github.com/wlyyj/PerovSegNet/tree/master.

扫描电子显微镜(SEM)在钙钛矿太阳能电池制备过程中表征薄膜微观结构时必不可少。准确识别和量化碘化铅和钙钛矿相至关重要,因为残留的碘化铅会严重影响结晶途径和缺陷的形成,而钙钛矿晶粒的形态则控制着载流子的传输和设备的稳定性。然而,当前的SEM图像分析仍然是大部分手动操作,限制了处理速度和一致性。在这里,我们提出了一种基于深度学习的自动化SEM图像分割框架,可以精确有效地识别碘化铅、钙钛矿和缺陷域的各种形态。我们的模型PerovSegNet建立在改进的YOLOv8x架构之上,并融入了两个新模块:(i)自适应洗牌膨胀卷积块,它通过分组卷积和通道混合增强多尺度和细粒度特征提取;(ii)可分离自适应下采样模块,它联合保留细纹理和大规模结构,以实现更稳健的边界识别。PerovSegNet在10994张SEM图像的增强数据集上进行训练,以265.4 Giga浮点运算达到87.25%的平均精度,超越了基线YOLOv8x-seg 4.08%,同时分别减小了模型大小和计算负载的24.43%和25.22%。除了分割功能外,该框架还提供定量晶粒级指标,如碘化铅/钙钛矿面积和计数,这些指标可以作为结晶效率和微观结构质量的可靠指标。这些功能使PerovSegNet成为用于实时监控和数据驱动优化钙钛矿薄膜制造的可扩展工具。源代码可在:https://github.com/wlyyj/PerovSegNet/tree/master获取。
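
The grain-level statistics mentioned above (phase area and count) can be derived from any segmentation mask with standard connected-component analysis. The sketch below is an assumption-level illustration, not the released PerovSegNet post-processing; the class ids and pixel area are hypothetical.

```python
import numpy as np
from scipy import ndimage

def grain_metrics(mask, class_id, pixel_area=1.0):
    """Count connected grains of one phase and return their individual areas."""
    binary = (mask == class_id)
    labeled, n_grains = ndimage.label(binary)          # connected-component labelling
    if n_grains == 0:
        return 0, np.array([])
    areas = ndimage.sum(binary, labeled, index=np.arange(1, n_grains + 1))
    return n_grains, areas * pixel_area

# toy mask: 0 = background, 1 = perovskite, 2 = lead iodide
mask = np.array([[1, 1, 0, 2],
                 [1, 0, 0, 2],
                 [0, 0, 1, 0],
                 [2, 2, 1, 0]])
n_pbi2, pbi2_areas = grain_metrics(mask, class_id=2)   # 2 grains, areas [2, 2]
```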

论文及项目相关链接

PDF

Summary

基于扫描电子显微镜(SEM)图像的深度学习分割框架PerovSegNet,能精确有效地识别铅碘化物、钙钛矿和缺陷域。该模型通过改进YOLOv8x架构,融入自适应混合卷积模块和可分离自适应降采样模块,实现多尺度特征提取,并能在大量SEM图像中识别细微差异。训练数据集增强至10,994张SEM图像,模型平均精度达到87.25%,性能优越。此外,该框架还提供定量晶粒级指标,可用于评估钙钛矿薄膜的结晶效率和微结构质量。

Key Takeaways

  • PerovSegNet是一个基于深度学习的自动化框架,用于SEM图像分割。
  • 框架能够精确识别铅碘化物、钙钛矿和缺陷域。
  • 通过改进YOLOv8x架构,融入两个新模块以提高性能。
  • 训练数据集增强,模型平均精度达到87.25%。
  • 框架提供定量晶粒级指标,用于评估钙钛矿薄膜的结晶效率和微结构质量。
  • 模型性能优越,较基线YOLOv8x-seg高出4.08%,同时减小了模型大小和计算负载。

Cool Papers

点此查看论文截图

MoSe2 and WSe2 shell morphology control via temperature optimization during two-step growth of ZnSe-based core-shell nanowires

Authors:Luize Dipane, Liora Kotlara, Viktors Vibornijs, Katrina Laganovska, Aleksejs Zolotarjovs, Eriks Dipans, Jevgenijs Gabrusenoks, Boris Polyakov, Edgars Butanovs

Achieving uniform and controlled transition metal dichalcogenide (TMD) shell growth on nanowires (NWs) remains a key challenge, limiting the development of high-quality core-shell heterostructures for optoelectronic and photocatalytic applications. In this work, the fabrication of ZnSe-MoSe2 and ZnSe-WSe2 core-shell NWs was successfully demonstrated. ZnSe NWs were grown via the vapor-liquid-solid growth mechanism, while TMD (MoSe2 or WSe2) shells were formed through a two-step process of sacrificial oxide layer deposition via magnetron sputtering followed by selenization process in a chemical vapor transport reactor. As-grown nanostructures were characterized using X-ray diffraction, transmission electron microscopy, X-ray photoelectron spectroscopy, Raman spectroscopy and photoluminescence spectroscopy. It was observed that the TMD shell morphology can be controlled through the selenization process temperature optimization, which arises due to different growth mechanisms discussed here. The studied trends could be further extended to other semiconductor NW and TMD core-shell heterostructure growth, offering promising avenues for advanced nanoscale applications.

在纳米线上实现均匀可控的过渡金属二卤化物(TMD)外壳生长仍然是一个关键挑战,这限制了用于光电子和光催化应用的高质量核壳异质结构的发展。在这项工作中,成功展示了ZnSe-MoSe2和ZnSe-WSe2核壳纳米线的制备。ZnSe纳米线是通过气液固生长机制生长的,而TMD(MoSe2或WSe2)外壳则是通过磁控溅射沉积牺牲氧化层后,再经化学气相传输反应器进行硒化过程的两步工艺形成的。对所生长的纳米结构进行了X射线衍射、透射电子显微镜、X射线光电子光谱、拉曼光谱和光致发光光谱表征。观察发现,通过优化硒化过程温度,可以控制TMD外壳的形态,这是由于这里讨论的不同生长机制所导致的。所研究的趋势可进一步扩展到其他半导体纳米线和TMD核壳异质结构的生长,为先进的纳米应用提供了有前景的途径。

论文及项目相关链接

PDF

Summary

The study demonstrates the successful fabrication of ZnSe-MoSe2 and ZnSe-WSe2 core-shell nanowires: ZnSe nanowire cores are grown by the vapor-liquid-solid mechanism, and the transition metal dichalcogenide (TMD) shells are formed in two steps, magnetron sputtering of a sacrificial oxide layer followed by selenization in a chemical vapor transport reactor. The as-grown structures were characterized by X-ray diffraction, transmission electron microscopy, X-ray photoelectron spectroscopy, Raman spectroscopy, and photoluminescence spectroscopy. The key finding is that the TMD shell morphology can be controlled by optimizing the selenization temperature, an effect attributed to the different growth mechanisms discussed in the paper; the trends should extend to other semiconductor nanowire and TMD core-shell heterostructures relevant to optoelectronic and photocatalytic applications.

Key Takeaways

  1. ZnSe-MoSe2 and ZnSe-WSe2 core-shell nanowires were successfully grown: vapor-liquid-solid ZnSe cores, then sputtered sacrificial oxide layers selenized in a chemical vapor transport reactor to form the TMD shells.
  2. Optimizing the selenization temperature controls the shell morphology, which is explained by the different growth mechanisms at play; this offers guidance for growing other semiconductor nanowire and TMD core-shell heterostructures.

Cool Papers

点此查看论文截图

Multi-modal Liver Segmentation and Fibrosis Staging Using Real-world MRI Images

Authors:Yang Zhou, Kunhao Yuan, Ye Wei, Jishizhan Chen

Liver fibrosis represents the accumulation of excessive extracellular matrix caused by sustained hepatic injury. It disrupts normal lobular architecture and function, increasing the chances of cirrhosis and liver failure. Precise staging of fibrosis for early diagnosis and intervention is often invasive, which carries risks and complications. To address this challenge, recent advances in artificial intelligence-based liver segmentation and fibrosis staging offer a non-invasive alternative. As a result, the CARE 2025 Challenge aimed for automated methods to quantify and analyse liver fibrosis in real-world scenarios, using multi-centre, multi-modal, and multi-phase MRI data. This challenge included tasks of precise liver segmentation (LiSeg) and fibrosis staging (LiFS). In this study, we developed an automated pipeline for both tasks across all the provided MRI modalities. This pipeline integrates pseudo-labelling based on multi-modal co-registration, liver segmentation using deep neural networks, and liver fibrosis staging based on shape, textural, appearance, and directional (STAD) features derived from segmentation masks and MRI images. By solely using the released data with limited annotations, our proposed pipeline demonstrated excellent generalisability for all MRI modalities, achieving top-tier performance across all competition subtasks. This approach provides a rapid and reproducible framework for quantitative MRI-based liver fibrosis assessment, supporting early diagnosis and clinical decision-making. Code is available at https://github.com/YangForever/care2025_liver_biodreamer.

肝脏纤维化是由持续的肝损伤引起的过量细胞外基质积累。它会破坏正常的肝小叶结构和功能,增加肝硬化和肝衰竭的风险。对纤维化进行精确分期以进行早期诊断和治疗通常具有侵入性,这带来了风险和并发症。为了应对这一挑战,人工智能在肝脏分割和纤维化分期方面的最新进展提供了一种非侵入性的替代方案。因此,CARE 2025挑战赛旨在利用多中心、多模态和多阶段的MRI数据,开发自动化方法来量化并分析实际场景中的肝纤维化。该挑战包括精确的肝脏分割(LiSeg)和纤维化分期(LiFS)任务。在这项研究中,我们针对所有提供的MRI模式开发了一个用于这两个任务的自动化管道。该管道集成了基于多模态配准的伪标记、使用深度神经网络进行肝脏分割以及基于分割掩膜和MRI图像的形状、纹理、外观和方向(STAD)特征进行肝脏纤维化分期。仅使用有限标注的发布数据,我们提出的管道在所有MRI模式上表现出良好的通用性,在所有竞赛子任务中均取得了顶尖的表现。这种方法提供了一个快速且可重复的量化的MRI肝脏纤维化评估框架,支持早期诊断和临床决策。代码可在https://github.com/YangForever/care2025_liver_biodreamer上找到。
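
As a rough illustration of how shape, textural, appearance, and directional (STAD) descriptors can feed a staging model, here is a hedged sketch. It assumes a non-empty binary liver mask per MRI slice, and the specific features, the RandomForestClassifier choice, and the function names are illustrative rather than the challenge pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stad_like_features(image, mask):
    """A few shape / appearance / directional descriptors from a liver mask and MRI slice."""
    region = image[mask > 0].astype(float)
    ys, xs = np.nonzero(mask)
    area = float((mask > 0).sum())
    bbox = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    extent = area / bbox                                        # how much of the bounding box is liver
    gy, gx = np.gradient(image.astype(float))
    direction = float(np.mean(np.arctan2(gy, gx)[mask > 0]))    # crude dominant gradient direction
    return np.array([area, extent, region.mean(), region.std(), direction])

def fit_stager(samples):
    """samples: iterable of (mri_slice, liver_mask, fibrosis_stage) triples."""
    X = np.stack([stad_like_features(img, m) for img, m, _ in samples])
    y = np.array([stage for _, _, stage in samples])
    return RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
```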

论文及项目相关链接

PDF

Summary
利用人工智能技术实现无创肝脏纤维化分期评估。基于多模态MRI数据的伪标记和多模态融合,建立自动化流程实现肝脏分割和纤维化分期评估。代码公开可供研究使用。

Key Takeaways

  1. 肝脏纤维化是由持续肝损伤引起的过量细胞外基质积累导致的。
  2. 纤维化会破坏正常的肝小叶结构和功能,增加肝硬化和肝衰竭的风险。
  3. 目前精确的纤维化分期方法通常具有侵入性,存在风险和并发症。
  4. 人工智能在肝脏分割和纤维化分期方面的进展为解决这一问题提供了无创的替代方案。
  5. CARE 2025挑战旨在使用多中心、多模态和多阶段的MRI数据,实现肝脏纤维化的自动化量化和分析。
  6. 研究中开发的自动化流程包括基于多模态配准的伪标记、使用深度神经网络进行肝脏分割以及基于分割掩膜和MRI图像的形状、纹理、外观和方向特征进行肝脏纤维化分期。

Cool Papers

点此查看论文截图

Causally Guided Gaussian Perturbations for Out-Of-Distribution Generalization in Medical Imaging

Authors:Haoran Pei, Yuguang Yang, Kexin Liu, Baochang Zhang

Out-of-distribution (OOD) generalization remains a central challenge in deploying deep learning models to real-world scenarios, particularly in domains such as biomedical images, where distribution shifts are both subtle and pervasive. While existing methods often pursue domain invariance through complex generative models or adversarial training, these approaches may overlook the underlying causal mechanisms of generalization. In this work, we propose Causally-Guided Gaussian Perturbations (CGP), a lightweight framework that enhances OOD generalization by injecting spatially varying noise into input images, guided by soft causal masks derived from Vision Transformers. By applying stronger perturbations to background regions and weaker ones to foreground areas, CGP encourages the model to rely on causally relevant features rather than spurious correlations. Experimental results on the challenging WILDS benchmark Camelyon17 demonstrate consistent performance gains over state-of-the-art OOD baselines, highlighting the potential of causal perturbation as a tool for reliable and interpretable generalization.

将深度学习模型部署到真实场景时,尤其是处理生物图像等领域时,面临着分布外(OOD)泛化的核心挑战,因为这里的分布变化既微妙又普遍。虽然现有方法通常通过复杂的生成模型或对抗训练追求域不变性,但这些方法可能会忽略泛化的潜在因果机制。在这项工作中,我们提出了受因果引导的高斯扰动(CGP)——一个增强型框架,它通过向输入图像注入空间变化的噪声来提高OOD泛化能力,这些噪声由基于视觉变压器的软因果掩膜引导。通过对背景区域应用更强的扰动,对前景区域应用较弱的扰动,CGP鼓励模型依赖于因果相关特征而不是偶然相关性。在具有挑战性的WILDS基准数据集Camelyon17上的实验结果表明,与最先进的OOD基线相比,CGP具有持续的性能优势,这突显了因果扰动作为可靠和可解释泛化工具的可能性。
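
To make the core augmentation concrete, the following is a minimal sketch of spatially varying Gaussian perturbation driven by a soft mask, under the assumption that higher mask values mean "causally relevant"; the noise scales sigma_fg / sigma_bg and the tensor shapes are illustrative, not the paper's exact settings.

```python
import torch

def causally_guided_perturbation(images, causal_mask, sigma_fg=0.05, sigma_bg=0.30):
    """images: (B, C, H, W) in [0, 1]; causal_mask: (B, 1, H, W) in [0, 1], 1 = causally relevant."""
    # interpolate the per-pixel noise scale: weak on foreground, strong on background
    sigma = sigma_bg + (sigma_fg - sigma_bg) * causal_mask
    noise = torch.randn_like(images) * sigma      # broadcasts the (B, 1, H, W) scale over channels
    return (images + noise).clamp(0.0, 1.0)

# toy usage with a stand-in mask (in the paper it comes from ViT-derived soft causal masks)
x = torch.rand(2, 3, 224, 224)
soft_mask = torch.rand(2, 1, 224, 224)
x_aug = causally_guided_perturbation(x, soft_mask)
```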

论文及项目相关链接

PDF

Summary

本文提出了一种名为Causally-Guided Gaussian Perturbations(CGP)的轻量级框架,用于增强深度学习的模型在真实世界场景中的泛化能力。该框架通过向输入图像注入受视觉转换器引导的空间变化噪声,提高模型的稳健性。通过在背景区域应用更强的扰动,并在前景区域应用较弱的扰动,CGP鼓励模型依赖于因果相关特征,而不是偶然的相关性。在Camelyon17等挑战性数据集上的实验结果表明,与最新的OOD基线相比,该框架具有持续的性能提升潜力。

Key Takeaways

  1. 深度学习的模型在部署到真实世界场景时面临OOD(Out-of-Distribution)泛化挑战。
  2. 在生物医学图像等领域,分布转移是微妙且普遍的。
  3. 现有方法通常通过复杂的生成模型或对抗性训练追求域不变性,但可能忽略了泛化的底层因果机制。
  4. CGP框架通过向输入图像注入受视觉转换器引导的空间变化噪声,提高模型的稳健性。
  5. CGP鼓励模型依赖于因果相关特征,而不是偶然的相关性,通过在背景区域应用更强的扰动,并在前景区域应用较弱的扰动来实现。
  6. 在Camelyon17等挑战性数据集上的实验结果表明CGP框架的有效性。

Cool Papers

点此查看论文截图

A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

Authors:Arvind Murari Vepa, Yannan Yu, Jingru Gan, Anthony Cuturrufo, Weikai Li, Wei Wang, Fabien Scalzo, Yizhou Sun

We introduce mpLLM, a prompt-conditioned hierarchical mixture-of-experts (MoE) architecture for visual question answering over multi-parametric 3D brain MRI (mpMRI). mpLLM routes across modality-level and token-level projection experts to fuse multiple interrelated 3D modalities, enabling efficient training without image–report pretraining. To address limited image-text paired supervision, mpLLM integrates a synthetic visual question answering (VQA) protocol that generates medically relevant VQA from segmentation annotations, and we collaborate with medical experts for clinical validation. mpLLM outperforms strong medical VLM baselines by 5.3% on average across multiple mpMRI datasets. Our study features three main contributions: (1) the first clinically validated VQA dataset for 3D brain mpMRI, (2) a novel multimodal LLM that handles multiple interrelated 3D modalities, and (3) strong empirical results that demonstrate the medical utility of our methodology. Ablations highlight the importance of modality-level and token-level experts and prompt-conditioned routing. We have included our source code in the supplementary materials and will release our dataset upon publication.

我们介绍了mpLLM,这是一种针对多参数3D脑MRI(mpMRI)的视觉问答的提示条件分层专家混合(MoE)架构。mpLLM在模态级别和令牌级别投影专家之间进行路由,以融合多个相互关联的3D模态,实现高效的训练,无需图像报告预训练。为了解决有限的图像文本配对监督问题,mpLLM集成了一个合成视觉问答(VQA)协议,该协议从分割注释中生成医学相关的VQA,并与医学专家进行合作进行临床验证。在多个mpMRI数据集上,mpLLM平均比强大的医疗VLM基准测试高出5.3%。我们的研究有三个主要贡献:(1)首个经过临床验证的3D脑mpMRI的VQA数据集,(2)一种处理多个相互关联的3D模态的新型多模态LLM,以及(3)强有力的实证结果证明了我们的方法医学实用性。消融实验突显了模态级别和令牌级别专家以及提示条件路由的重要性。我们的源代码已包含在补充材料中,并在发布时公布我们的数据集。

论文及项目相关链接

PDF 23 pages, 3 figures

Summary

本文介绍了mpLLM,这是一种用于多参数三维脑MRI(mpMRI)的视觉问答的提示条件分层混合专家(MoE)架构。该架构实现了跨模态和标记级的投影专家路由,融合了多个相关的三维模态,无需图像报告进行预训练,就能进行高效训练。为解决图像文本配对监督有限的问题,mpLLM集成了一种合成视觉问答(VQA)协议,该协议可从分割注释生成医学相关的VQA,并与医学专家合作进行临床验证。在多个mpMRI数据集上,mpLLM平均比强大的医学VLM基线高出5.3%。本研究的主要贡献包括:(1)首个经过临床验证的用于三维脑mpMRI的视觉问答数据集,(2)一种处理多个相关三维模态的新型多模态大型语言模型,(3)强有力的实证结果证明了我们方法论的医学实用性。分离研究突出了模态级和标记级专家以及提示条件路由的重要性。

Key Takeaways

  1. mpLLM是一种用于多参数三维脑MRI的视觉问答的分层混合专家架构。
  2. 该架构实现了跨模态和标记级的投影专家路由,融合了多个相关的三维模态。
  3. mpLLM通过集成合成视觉问答协议,生成医学相关的问答并进行临床验证。
  4. mpLLM在多个mpMRI数据集上的表现优于其他医学VLM基线,平均高出5.3%。
  5. 本研究贡献包括首个临床验证的VQA数据集、新型多模态大型语言模型和实证结果。
  6. 分离研究证明了模态级和标记级专家以及提示条件路由的重要性。

Cool Papers

点此查看论文截图

Dolphin v1.0 Technical Report

Authors:Taohan Weng, Chi zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun zheng, Anjie Le, Hongcheng Guo

Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, over twice the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.

超声在现代医学中至关重要,但面临着操作者依赖、图像噪声和实时扫描等挑战,阻碍了人工智能的整合。虽然大型多模式模型在其他医学成像领域表现卓越,但在应对超声的复杂性方面却遇到困扰。为了解决这个问题,我们推出了Dolphin v1.0(V1)及其增强推理版本Dolphin R1。作为首个大规模的多模式超声基础模型,Dolphin能在单一视觉语言框架下统一完成多样化的临床任务。为了应对超声的变性和噪声问题,我们筛选了一个规模达2百万的多模式数据集,结合了教科书知识、公开数据、合成样本和一般语料库。这确保了稳健的感知、通用性和临床适应性。Dolphin系列采用三阶段训练策略:领域专业化预训练、指令驱动对齐和基于增强的细化。Dolphin v1.0在分类、检测、回归和报告生成方面表现出可靠的性能。而Dolphin R1通过强化学习并使用超声特定奖励增强了诊断推理的透明度。在U2-Bench上的八项超声任务评估中,Dolphin R1的U2分数为0.5835,是第二名模型(0.2968)的两倍多,创造了新的技术记录。Dolphin v1.0也表现出强大的竞争力,验证了其统一框架的有效性。对比结果显示,经过增强推理的训练显著提高了诊断的准确性、一致性和可解释性,突显其在高风险医疗人工智能中的重要性。

论文及项目相关链接

PDF

Summary

本文介绍了超声在现代医学中的重要性及其所面临的挑战,如操作依赖性、图像噪声和实时扫描等。为了解决这些问题,文章提出了Dolphin v1.0及其增强版Dolphin R1,这是首个大规模的多模式超声基础模型,在一个统一的视觉语言框架内融合了多种临床任务。通过采用三阶段训练策略和数据集组合,Dolphin系列模型在分类、检测、回归和报告生成等方面表现出可靠性能,而Dolphin R1通过强化学习与超声特定奖励增强诊断推理、透明度和解释性。评估结果显示,Dolphin R1在U2-Bench上的表现达到新的最佳水平,显著提高了诊断准确性、一致性和解释性。

Key Takeaways

  1. 超声在现代医学中至关重要,但面临操作依赖性、图像噪声和实时扫描等挑战。
  2. 多模式大型模型在医学成像方面表现出卓越性能,但在处理超声复杂性时遇到困难。
  3. Dolphin v1.0及其增强版Dolphin R1是首个统一的多模式超声基础模型,融合多种临床任务。
  4. Dolphin系列采用三阶段训练策略,确保稳健感知、通用性和临床适应性。
  5. Dolphin R1通过强化学习提高诊断推理、透明度和解释性。
  6. Dolphin R1在U2-Bench上的表现达到新的最佳水平,超过现有模型。

Cool Papers

点此查看论文截图

K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model

Authors:Bangwei Guo, Yunhe Gao, Meng Ye, Difei Gu, Yang Zhou, Leon Axel, Dimitris Metaxas

Medical image segmentation is fundamental to clinical decision-making, yet existing models remain fragmented. They are usually trained on single knowledge sources and specific to individual tasks, modalities, or organs. This fragmentation contrasts sharply with clinical practice, where experts seamlessly integrate diverse knowledge: anatomical priors from training, exemplar-based reasoning from reference cases, and iterative refinement through real-time interaction. We present $\textbf{K-Prism}$, a unified segmentation framework that mirrors this clinical flexibility by systematically integrating three knowledge paradigms: (i) $\textit{semantic priors}$ learned from annotated datasets, (ii) $\textit{in-context knowledge}$ from few-shot reference examples, and (iii) $\textit{interactive feedback}$ from user inputs like clicks or scribbles. Our key insight is that these heterogeneous knowledge sources can be encoded into a dual-prompt representation: 1-D sparse prompts defining $\textit{what}$ to segment and 2-D dense prompts indicating $\textit{where}$ to attend, which are then dynamically routed through a Mixture-of-Experts (MoE) decoder. This design enables flexible switching between paradigms and joint training across diverse tasks without architectural modifications. Comprehensive experiments on 18 public datasets spanning diverse modalities (CT, MRI, X-ray, pathology, ultrasound, etc.) demonstrate that K-Prism achieves state-of-the-art performance across semantic, in-context, and interactive segmentation settings. Code will be released upon publication.

医学图像分割对于临床决策至关重要,但现有的模型仍然零散。它们通常基于单一的知识源进行训练,并针对特定的任务、模态或器官。这与临床实践形成了鲜明的对比,专家在实践中能够无缝地整合各种知识:来自训练的解剖先验知识、基于参考案例的示例推理以及通过实时互动进行的迭代优化。我们提出了K-Prism,这是一个统一的分割框架,它通过系统地整合三种知识范式来反映这种临床灵活性:(i)从注释数据集中学习的语义先验,(ii)来自少量参考示例的上下文知识,以及(iii)来自用户输入(如点击或涂鸦)的交互式反馈。我们的关键见解是,这些异质的知识源可以编码成双重提示表示:定义什么需要分割的1-D稀疏提示和指示在哪里需要注意的2-D密集提示,然后通过混合专家(MoE)解码器进行动态路由。这种设计实现了范式之间的灵活切换以及在各种任务之间的联合训练,而无需进行架构修改。在涵盖多种模态(CT、MRI、X射线、病理学、超声波等)的18个公共数据集上进行的综合实验表明,K-Prism在语义、上下文和交互式分割环境中均达到了最新技术性能。代码将在出版时发布。

论文及项目相关链接

PDF

摘要
医学图像分割对临床决策至关重要,但现有模型仍然碎片化。它们通常仅在单一知识源上进行训练,并针对特定任务、模态或器官。这与临床实践形成鲜明对比,专家在实践中能无缝整合多样化知识:来自训练的解剖先验知识、基于参考案例的范例推理和通过实时互动的迭代优化。本文提出$\textbf{K-Prism}$,一个统一的分割框架,通过系统地整合三种知识范式来反映这种临床灵活性:从标注数据集中学习的$\textit{语义先验}$、从少数样本参考例子中的$\textit{上下文知识}$、以及来自用户输入(如点击或涂鸦)的$\textit{交互式反馈}$。我们的关键见解是,这些异质的知识来源可以编码成一种双重提示表示:定义$\textit{什么}$要分割的1-D稀疏提示和指示$\textit{在哪里}$要注意的2-D密集提示,然后通过混合专家解码器进行动态路由。这种设计实现了范式之间的灵活切换和不同任务之间的联合训练,无需进行架构修改。在涵盖多种模态(CT、MRI、X光、病理学、超声等)的18个公共数据集上进行的综合实验表明,K-Prism在语义、上下文和交互式分割环境中均达到最新技术水平。代码将在发表时公开。

要点提炼

  1. 医学图像分割在临床决策中的重要性及其现有模型的碎片化问题。
  2. 现有模型通常局限于单一知识源和特定任务、模态或器官。
  3. 临床实践中专家无缝整合多样化知识。
  4. K-Prism框架的提出,通过整合语义先验、上下文知识和交互式反馈来反映临床灵活性。
  5. K-Prism通过双重提示表示和混合专家解码器实现知识范式的灵活切换和联合训练。
  6. K-Prism在多种公共数据集上达到最新技术水平。

Cool Papers

点此查看论文截图

GenVarFormer: Predicting gene expression from long-range mutations in cancer

Authors:David Laub, Ethan Armand, Arda Pekis, Zekai Chen, Irsyad Adam, Shaun Porwal, Bing Ren, Kevin Brown, Hannah Carter

Distinguishing the rare “driver” mutations that fuel cancer progression from the vast background of “passenger” mutations in the non-coding genome is a fundamental challenge in cancer biology. A primary mechanism that non-coding driver mutations contribute to cancer is by affecting gene expression, potentially from millions of nucleotides away. However, existing predictors of gene expression from mutations are unable to simultaneously handle interactions spanning millions of base pairs, the extreme sparsity of somatic mutations, and generalize to unseen genes. To overcome these limitations, we introduce GenVarFormer (GVF), a novel transformer-based architecture designed to learn mutation representations and their impact on gene expression. GVF efficiently predicts the effect of mutations up to 8 million base pairs away from a gene by only considering mutations and their local DNA context, while omitting the vast intermediate sequence. Using data from 864 breast cancer samples from The Cancer Genome Atlas, we demonstrate that GVF predicts gene expression with 26-fold higher correlation across samples than current models. In addition, GVF is the first model of its kind to generalize to unseen genes and samples simultaneously. Finally, we find that GVF patient embeddings are more informative than ground-truth gene expression for predicting overall patient survival in the most prevalent breast cancer subtype, luminal A. GVF embeddings and gene expression yielded concordance indices of $0.706^{\pm0.136}$ and $0.573^{\pm0.234}$, respectively. Our work establishes a new state-of-the-art for modeling the functional impact of non-coding mutations in cancer and provides a powerful new tool for identifying potential driver events and prognostic biomarkers.

区分促进癌症发展的罕见“驱动”突变与非编码基因组中大量存在的“过客”突变是一个在癌症生物学中的基本挑战。非编码驱动突变促进癌症的主要机制是通过影响基因表达,这种影响可能来自数百万个核苷酸之外。然而,现有的基于突变的基因表达预测器无法同时处理跨越数百万碱基的交互、体细胞突变的极端稀疏性,并且无法推广到未见过的基因。为了克服这些限制,我们引入了GenVarFormer(GVF),这是一种基于transformer的新型架构,旨在学习突变的表示及其对基因表达的影响。GVF仅通过考虑突变及其局部DNA上下文,就能有效地预测距离基因高达8百万碱基对的突变影响,同时省略了大量的中间序列。我们使用来自癌症基因组图谱的864个乳腺癌样本的数据证明,GVF在样本之间的预测基因表达与当前模型相比具有26倍的高相关性。此外,GVF是首个能够同时推广到未见过的基因和样本的模型。最后,我们发现GVF的患者嵌入信息比最常见的乳腺癌亚型luminal A中的基因表达更能预测患者的总体存活情况。GVF嵌入和基因表达的契合指数分别为0.706±0.136和0.573±0.234。我们的工作建立了非编码突变功能影响建模的最新技术,并提供了一种强大的新工具,用于识别潜在的驱动事件和预后生物标志物。

论文及项目相关链接

PDF

摘要

识别推动癌症发展的罕见“驱动”突变与大量非编码基因中的“乘客”突变之间的区分是癌症生物学中的一项基本挑战。非编码驱动突变主要通过影响基因表达来促进癌症发展,这种影响可能来自数百万个核苷酸的距离。然而,现有的基因表达突变预测模型无法同时处理跨越数百万碱基的相互作用、体细胞突变的极端稀疏性以及对未见基因的概括性。为了克服这些局限性,我们引入了GenVarFormer(GVF),这是一种基于transformer的新型架构,旨在学习突变表示及其对基因表达的影响。GVF可以有效地预测距离基因长达8百万碱基处的突变的影响,仅考虑突变和其局部DNA上下文,同时省略大量中间序列。通过使用来自癌症基因组图谱的864个乳腺癌样本的数据,我们证明了GVF在样本之间的预测基因表达与当前模型相比具有26倍更高的相关性。此外,GVF是第一个能够同时概括未见基因和样本的模型。最后,我们发现GVF的患者嵌入信息比真实的基因表达对最常见的乳腺癌亚型luminal A的总体患者生存期的预测更具参考价值。GVF嵌入和基因表达的契合指数分别为±0.136的0.706和±0.234的0.573。我们的工作为建立非编码突变功能影响的最新标准提供了有力的工具,为识别潜在的驱动事件和预后生物标志物提供了强大的新工具。

关键见解

  1. 区分推动癌症发展的罕见“驱动”突变与非编码基因中的大量“乘客”突变是癌症生物学的基本挑战之一。
  2. 非编码驱动突变通过影响基因表达促进癌症发展,这种影响可能远离基因数百万个核苷酸。
  3. 现有预测模型无法同时处理跨越大量碱基的相互作用、突变的极端稀疏性以及对未见基因的概括性。
  4. GenVarFormer(GVF)是一种新型基于transformer的架构,能有效预测突变对基因表达的影响,即使突变位置距离基因长达8百万碱基处。
  5. GVF在乳腺癌样本数据上表现出比现有模型更高的基因表达预测相关性。
  6. GVF能够同时概括未见基因和样本,是首个具备此能力的模型。

Cool Papers

点此查看论文截图

Radiology’s Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology

Authors:Suvrankar Datta, Divya Buchireddygari, Lakshmi Vennela Chowdary Kaza, Mrudula Bhalke, Kautik Singh, Ayush Pandey, Sonit Sai Vasipalli, Upasana Karnwal, Hakikat Bir Singh Bhatti, Bhavya Ratan Maroo, Sanjana Hebbar, Rahul Joseph, Gurkawal Kaur, Devyani Singh, Akhil V, Dheeksha Devasya Shama Prasad, Nishtha Mahajan, Ayinaparthi Arisha, Rajesh Vanagundi, Reet Nandy, Kartik Vuthoo, Snigdhaa Rajvanshi, Nikhileswar Kondaveeti, Suyash Gunjal, Rishabh Jain, Rajat Jain, Anurag Agrawal

Generalist multimodal AI systems such as large language models (LLMs) and vision language models (VLMs) are increasingly accessed by clinicians and patients alike for medical image interpretation through widely available consumer-facing chatbots. Most evaluations claiming expert level performance are on public datasets containing common pathologies. Rigorous evaluation of frontier models on difficult diagnostic cases remains limited. We developed a pilot benchmark of 50 expert-level “spot diagnosis” cases across multiple imaging modalities to evaluate the performance of frontier AI models against board-certified radiologists and radiology trainees. To mirror real-world usage, the reasoning modes of five popular frontier AI models were tested through their native web interfaces, viz. OpenAI o3, OpenAI GPT-5, Gemini 2.5 Pro, Grok-4, and Claude Opus 4.1. Accuracy was scored by blinded experts, and reproducibility was assessed across three independent runs. GPT-5 was additionally evaluated across various reasoning modes. Reasoning quality errors were assessed and a taxonomy of visual reasoning errors was defined. Board-certified radiologists achieved the highest diagnostic accuracy (83%), outperforming trainees (45%) and all AI models (best performance shown by GPT-5: 30%). Reliability was substantial for GPT-5 and o3, moderate for Gemini 2.5 Pro and Grok-4, and poor for Claude Opus 4.1. These findings demonstrate that advanced frontier models fall far short of radiologists in challenging diagnostic cases. Our benchmark highlights the present limitations of generalist AI in medical imaging and cautions against unsupervised clinical use. We also provide a qualitative analysis of reasoning traces and propose a practical taxonomy of visual reasoning errors by AI models for better understanding their failure modes, informing evaluation standards and guiding more robust model development.

通用多模态人工智能系统,如大型语言模型(LLMs)和视觉语言模型(VLMs),正越来越多地被临床医生和患者用于通过广泛可用的面向消费者的聊天机器人进行医学图像解读。大多数声称专家级性能的评估都是在包含常见病理的公共数据集上进行的。对于前沿模型在困难诊断病例上的严格评估仍然有限。我们开发了一个包含50个专家级“即时诊断”病例的试点基准测试,涉及多种成像模式,以评估前沿人工智能模型与董事会认证放射科医师和放射学实习生的表现。为了反映真实世界的使用情况,我们测试了五个受欢迎的前沿人工智能模型的推理模式,这些模型包括通过其原生网络接口进行的OpenAI o3、OpenAI GPT-5、Gemini 2.5 Pro、Grok-4和Claude Opus 4.1。准确性由盲专家评分,并在三次独立运行中进行了可重复性的评估。GPT-5还在各种推理模式下进行了评估。评估了推理质量错误,并定义了视觉推理错误的分类。董事会认证的放射科医师的诊断准确性最高(83%),优于实习生(45%)和所有AI模型(GPT-5表现最佳:30%)。GPT-5和o3的可靠性相当高,Gemini 2.5 Pro和Grok-4的可靠性属中等水平,而Claude Opus 4.1的可靠性较差。这些研究结果表明,在具有挑战性的诊断病例中,先进的前沿模型与放射科医生相比仍有很大差距。我们的基准测试突显了通用人工智能在医学成像方面的当前局限性,并警告不要在没有监督的情况下进行临床使用。我们还提供了对推理轨迹的定性分析,并提出了一个实用的视觉推理错误分类,以更好地了解AI模型的失败模式,为评估标准提供信息并指导更稳健的模型开发。


论文及项目相关链接

PDF 29 pages, 7 figures, 7 tables, includes Annexure (1). Part of the work accepted at RSNA 2025 (Cutting Edge Oral Presentation)

Summary
通用多模态人工智能系统(如大型语言模型和视觉语言模型)通过面向公众的聊天机器人被广泛用于医学图像解读,但其在复杂诊断案例中的表现仍待严格评估。本研究开发了一个包含50个专家级“即时诊断”案例的试点基准测试,以评估前沿AI模型在多种成像模态下的表现,并与认证放射学家和放射学实习生进行比较。结果显示,GPT-5在AI模型中的表现最佳,但放射学专家仍表现出最高的诊断准确性。这表明在复杂诊断案例中,先进的前沿模型与放射师相比仍有很大差距。本研究警告说,这些AI模型在未经监督的临床使用方面仍存在局限性。同时提供了人工智能模型推理痕迹的定性分析并提出了一个实用的视觉推理错误分类法,旨在更好地了解其失败模式,为评估标准提供信息并指导更稳健的模型开发。

Key Takeaways

  1. 多模态AI系统通过消费者聊天机器人广泛应用于医学图像解读。
  2. 在复杂诊断案例中,前沿AI模型的表现仍待严格评估。
  3. 本研究建立了一个包含多种成像模态的专家级诊断案例的试点基准测试。
  4. GPT-5在前沿AI模型中的表现最佳,但仍落后于认证放射师的诊断准确性。
  5. AI模型在推理质量方面存在错误,本研究提出了一个视觉推理错误的分类法以更好地理解其失败模式。
  6. 警告说,这些AI模型在未经监督的临床使用方面存在局限性。

Cool Papers

点此查看论文截图

Evaluating Foundation Models with Pathological Concept Learning for Kidney Cancer

Authors:Shangqi Gao, Sihan Wang, Yibo Gao, Boming Wang, Xiahai Zhuang, Anne Warren, Grant Stewart, James Jones, Mireia Crispin-Ortuzar

To evaluate the translational capabilities of foundation models, we develop a pathological concept learning approach focused on kidney cancer. By leveraging TNM staging guidelines and pathology reports, we build comprehensive pathological concepts for kidney cancer. Then, we extract deep features from whole slide images using foundation models, construct pathological graphs to capture spatial correlations, and train graph neural networks to identify these concepts. Finally, we demonstrate the effectiveness of this approach in kidney cancer survival analysis, highlighting its explainability and fairness in identifying low- and high-risk patients. The source code has been released at https://github.com/shangqigao/RadioPath.

为了评估基础模型的翻译能力,我们开发了一种针对肾癌的病理性概念学习方法。我们借助TNM分期指南和病理报告,为肾癌构建了全面的病理性概念。然后,我们使用基础模型从全幻灯片图像中提取深度特征,构建病理性图表以捕获空间相关性,并训练图神经网络以识别这些概念。最后,我们通过肾癌生存分析证明了该方法的有效性,并重点强调了其在识别低危和高危患者时的可解释性和公平性。源代码已发布在https://github.com/shangqigao/RadioPath。

论文及项目相关链接

PDF Best Paper Award at MICCAI AMAI 2025

Summary
基于肾脏癌的TNM分期指南和病理报告,该研究提出一种针对基础模型的病理性概念学习评价策略。该研究使用全片图像进行深度特征提取,建立病理性图谱以捕捉空间关联,并通过训练图神经网络来识别这些概念。在肾脏癌生存分析中验证了该策略的有效性,其在鉴别低、高风险患者时展现了解释性和公平性。研究相关源代码可通过特定链接访问。

Key Takeaways

  • 研究采用病理性概念学习评估基础模型的翻译能力。
  • 基于肾脏癌的TNM分期指南和病理报告,构建了全面的病理性概念。
  • 利用全片图像提取深度特征并使用图神经网络进行识别。
  • 通过建立病理性图谱捕捉空间关联。
  • 在肾脏癌生存分析中验证了该策略的有效性。
  • 该策略具备解释性和公平性,在鉴别低、高风险患者时表现出优势。

Cool Papers

点此查看论文截图

RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement

Authors:Libo Zhu, Zihan Zhou, Xiaoyang Liu, Weihang Zhang, Keyu Shi, Yifan Fu, Yulun Zhang

Capturing screens is now routine in our everyday lives. But the photographs of emissive displays are often influenced by the flicker-banding (FB), which appears as alternating bright-dark stripes that arise from temporal aliasing between a camera's rolling-shutter readout and the display's brightness modulation. Unlike moire degradation, which has been extensively studied, the FB remains underexplored despite its frequent and severe impact on readability and perceived quality. We formulate FB removal as a dedicated restoration task and introduce Removal of Image Flicker-Banding via Latent Diffusion Enhancement, RIFLE, a diffusion-based framework designed to remove FB while preserving fine details. We propose the flicker-banding prior estimator (FPE) that predicts key banding attributes and injects them into the restoration network. Additionally, Masked Loss (ML) is proposed to concentrate supervision on banded regions without sacrificing global fidelity. To overcome data scarcity, we provide a simulation pipeline that synthesizes FB in the luminance domain with stochastic jitter in banding angle, banding spacing, and banding width. Feathered boundaries and sensor noise are also applied for a more realistic simulation. For evaluation, we collect a paired real-world FB dataset with pixel-aligned banding-free references captured via long exposure. Across quantitative metrics and visual comparisons on our real-world dataset, RIFLE consistently outperforms recent image reconstruction baselines from mild to severe flicker-banding. To the best of our knowledge, it is the first work to research the simulation and removal of FB. Our work establishes a great foundation for subsequent research in both the dataset construction and the removal model design. Our dataset and code will be released soon.

屏幕截图现在已经成为我们日常生活中的常规操作。然而,发光显示屏的照片往往会受到频闪条纹(FB)的影响,频闪条纹是由于相机滚动快门读出与显示屏亮度调制之间的时间混叠而产生的明暗交替条纹。与摩尔纹退化(已被广泛研究)不同,尽管频闪条纹经常对可读性和感知质量产生严重的影响,但它仍然被探索得不够充分。我们将频闪条纹移除制定为一个专门的恢复任务,并引入了通过潜在扩散增强去除图像频闪条纹(RIFLE),这是一个基于扩散的框架,旨在去除频闪条纹同时保留细节。我们提出了频闪条纹先验估计器(FPE),它可以预测关键的条纹属性并将其注入恢复网络。此外,还提出了掩膜损失(ML),以将监督集中在带状区域上,而不牺牲全局保真度。为了克服数据稀缺的问题,我们提供了一个合成频闪条纹的模拟管道,该管道在亮度域中合成频闪条纹,并带有条纹角度、条纹间距和条纹宽度中的随机抖动。我们还应用了渐变的边界和传感器噪声,以进行更现实的模拟。为了评估,我们收集了一个配对的真实世界频闪条纹数据集,通过长时间曝光捕获了具有像素对齐的无条纹参考。在我们的真实世界数据集上,无论是在定量指标还是视觉比较方面,RIFLE在轻微至严重的频闪条纹情况下均优于最近的图像重建基线。据我们所知,它是第一项研究频闪条纹模拟和去除的工作。我们的工作为后续研究在数据集构建和去除模型设计方面奠定了坚实的基础。我们的数据集和代码将很快发布。
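
A toy version of the kind of luminance-domain simulation described above (random banding angle, spacing, and width, feathered edges, sensor noise) might look like the sketch below; all the ranges and the 0.35 darkening strength are made-up parameters for illustration, not the authors' pipeline.

```python
import numpy as np

def add_flicker_banding(lum, rng=None):
    """lum: luminance image in [0, 1]; returns a copy with synthetic banding plus sensor noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = lum.shape
    angle = np.deg2rad(rng.uniform(-15.0, 15.0))   # random banding angle
    spacing = rng.uniform(20.0, 60.0)              # pixels between consecutive bands
    duty = rng.uniform(0.3, 0.6)                   # fraction of each period that is darkened
    yy, xx = np.mgrid[0:h, 0:w]
    phase = (xx * np.sin(angle) + yy * np.cos(angle)) / spacing
    dark = (np.mod(phase, 1.0) < duty).astype(float)
    gain = 1.0 - 0.35 * dark                       # darken the banded stripes
    kernel = np.ones(5) / 5.0                      # feather stripe boundaries with a small 1-D box blur
    gain = np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, gain)
    banded = lum * gain + rng.normal(0.0, 0.01, size=lum.shape)
    return np.clip(banded, 0.0, 1.0)

banded = add_flicker_banding(np.full((128, 128), 0.7))
```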

论文及项目相关链接

PDF

摘要
本文研究了屏幕截图中的闪烁条纹(FB)问题,提出一种基于扩散的框架RIFLE,用于去除FB同时保留细节。文章介绍了闪烁条纹先验估计器(FPE)和Masked Loss(ML),并提供了合成FB的仿真管道。RIFLE在真实世界数据集上表现优异,为后续的数据集构建和去除模型设计研究奠定了基础。

关键见解

  1. 闪烁条纹(FB)是屏幕截图常见的问题,对可读性和感知质量产生严重影响。
  2. RIFLE框架旨在去除FB,同时保留细节,是一种基于扩散的方法。
  3. 引入闪烁条纹先验估计器(FPE)预测关键条纹属性并注入恢复网络。
  4. 提出Masked Loss(ML)集中监督带状区域,而不牺牲全局保真度。
  5. 提供合成FB的仿真管道,包括亮度域合成和随机抖动带状角度、带状间距和带状宽度。
  6. 收集真实世界FB数据集,并通过定量指标和视觉比较评估RIFLE性能。
  7. RIFLE在轻微至严重的闪烁条纹情况下均表现优异,为相关研究奠定基础。

Cool Papers

点此查看论文截图

EWC-Guided Diffusion Replay for Exemplar-Free Continual Learning in Medical Imaging

Authors:Anoushka Harit, William Prew, Zhongtian Sun, Florian Markowetz

Medical imaging foundation models must adapt over time, yet full retraining is often blocked by privacy constraints and cost. We present a continual learning framework that avoids storing patient exemplars by pairing class conditional diffusion replay with Elastic Weight Consolidation. Using a compact Vision Transformer backbone, we evaluate across eight MedMNIST v2 tasks and CheXpert. On CheXpert our approach attains 0.851 AUROC, reduces forgetting by more than 30% relative to DER++, and approaches joint training at 0.869 AUROC, while remaining efficient and privacy preserving. Analyses connect forgetting to two measurable factors: fidelity of replay and Fisher weighted parameter drift, highlighting the complementary roles of replay diffusion and synaptic stability. The results indicate a practical route for scalable, privacy aware continual adaptation of clinical imaging models.

医学影像基础模型必须随时间进行适应,然而由于隐私约束和成本考虑,全面重新训练通常会被阻止。我们提出了一种持续学习框架,通过结合类别条件扩散回放和弹性权重整合,避免了存储患者样本。使用紧凑的愿景转换器主干网,我们在八个MedMNIST v2任务和CheXpert上进行了评估。在CheXpert上,我们的方法达到了0.851的AUROC,与DER++相比,减少了超过30%的遗忘,并接近联合训练的0.869 AUROC,同时保持高效和隐私保护。分析将遗忘与两个可衡量的因素联系起来:回放的保真度和Fisher加权参数漂移,突出了回放扩散和突触稳定性的互补作用。结果表明,这是一条针对临床影像模型的可扩展、注重隐私的持续适应的实际途径。
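
The Elastic Weight Consolidation term that anchors the backbone's weights is standard and easy to sketch. Below is an assumption-level version in PyTorch, where `fisher` and `old_params` are dictionaries computed after the previous task and `lam` is a hypothetical regularization strength; it is not the paper's exact training code.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=100.0):
    """Elastic Weight Consolidation: lam/2 * sum_i F_i * (theta_i - theta_i_old)^2."""
    loss = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# during continual training (hypothetical names), the total objective would combine the task
# loss on real data, the loss on class-conditional diffusion replay samples, and this penalty:
#   total = task_loss(real_batch) + task_loss(replay_batch) + ewc_penalty(model, fisher, old_params)
```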

论文及项目相关链接

PDF Accepted at AI That Keeps Up: NeurIPS 2025 Workshop on Continual and Compatible Foundation Model Updates

Summary
医学成像基础模型需要随时间适应,但受隐私约束和成本限制,全面再训练不可行。我们提出了一种持续学习框架,通过配对类别条件扩散回放和弹性权重整合,避免存储患者样本。使用紧凑的Vision Transformer主干网,我们在八个MedMNIST v2任务和CheXpert上进行了评估。在CheXpert上,我们的方法达到了0.851的AUROC,与DER++相比减少了超过30%的遗忘,并接近联合训练的0.869 AUROC,同时保持高效和隐私保护。分析将遗忘与回放保真度和Fisher加权参数漂移两个可测因素联系起来,突出了回放扩散和突触稳定性的互补作用。结果指示了临床成像模型的可扩展、隐私意识持续适应的实际途径。

Key Takeaways

  1. 医学成像基础模型需要适应变化,但全面再训练受限。
  2. 提出的持续学习框架避免了存储患者样本。
  3. 框架结合了类别条件扩散回放和弹性权重整合。
  4. 在多个任务上评估表现良好,尤其在CheXpert上AUROC达到0.851。
  5. 与DER++相比,减少了超过30%的遗忘。
  6. 分析表明遗忘与回放保真度和参数漂移有关。

Cool Papers

点此查看论文截图

HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation

Authors:Cong Chen, Ziyuan Huang, Cheng Zou, Muzhi Zhu, Kaixiang Ji, Jiajia Liu, Jingdong Chen, Hao Chen, Chunhua Shen

In this work, we present HieraTok, a novel multi-scale Vision Transformer (ViT)-based tokenizer that overcomes the inherent limitation of modeling single-scale representations. This is realized through two key designs: (1) multi-scale downsampling applied to the token map generated by the tokenizer encoder, producing a sequence of multi-scale tokens, and (2) a scale-causal attention mechanism that enables the progressive flow of information from low-resolution global semantic features to high-resolution structural details. Coupling these designs, HieraTok achieves significant improvements in both image reconstruction and generation tasks. Under identical settings, the multi-scale visual tokenizer outperforms its single-scale counterpart by a 27.2% improvement in rFID ($1.47 \rightarrow 1.07$). When integrated into downstream generation frameworks, it achieves a $1.38\times$ faster convergence rate and an 18.9% boost in gFID ($16.4 \rightarrow 13.3$), which may be attributed to the smoother and more uniformly distributed latent space. Furthermore, by scaling up the tokenizer’s training, we demonstrate its potential by a sota rFID of 0.45 and a gFID of 1.82 among ViT tokenizers. To the best of our knowledge, we are the first to introduce multi-scale ViT-based tokenizer in image reconstruction and image generation. We hope our findings and designs advance the ViT-based tokenizers in visual generation tasks.

在这项工作中,我们提出了HieraTok,这是一种基于多尺度Vision Transformer(ViT)的新型分词器,克服了建模单尺度表示的固有局限性。这是通过两个关键设计实现的:(1)对分词器编码器生成的令牌图进行多尺度降采样,生成一系列多尺度令牌;(2)一种规模因果注意力机制,它允许从低分辨率全局语义特征到高分辨率结构细节的信息逐步流动。通过结合这些设计,HieraTok在图像重建和生成任务中都取得了显著的改进。在相同设置下,多尺度视觉分词器在rFID指标上优于单尺度对应物(从1.47提升至1.07,提高了27.2%)。当集成到下游生成框架中时,它实现了1.38倍更快的收敛率,并在gFID上提升了18.9%(从16.4降至13.3),这可能是由于更平滑且分布更均匀的潜在空间。此外,通过扩大分词器的训练规模,我们在ViT分词器中实现了最佳rFID为0.45和gFID为1.82。据我们所知,我们首次在图像重建和图像生成中引入了多尺度ViT基础的分词器。我们希望我们的发现和设计能推动基于ViT的分词器在视觉生成任务中的应用。
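
One way to picture the scale-causal attention described above is as a block-triangular attention mask over the concatenated multi-scale token sequence; the sketch below is an assumption about the general mechanism (the token counts per scale are hypothetical), not the HieraTok implementation.

```python
import torch

def scale_causal_mask(tokens_per_scale):
    """Boolean (N, N) mask, True = attention allowed; scale 0 is the coarsest."""
    scale_id = torch.cat([torch.full((n,), s, dtype=torch.long)
                          for s, n in enumerate(tokens_per_scale)])
    # a query at scale q may attend to a key at scale k only if k <= q,
    # so information flows from coarse (global semantics) to fine (structural detail)
    return scale_id.unsqueeze(1) >= scale_id.unsqueeze(0)

mask = scale_causal_mask([1, 4, 16])   # e.g. one global token, then a 2x2 and a 4x4 grid
```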

论文及项目相关链接

PDF

Summary
本文提出一种基于Vision Transformer(ViT)的多尺度图像生成和重建技术——HieraTok。通过设计两个关键技术点,包括利用多尺度下采样生成多尺度token序列以及采用尺度因果注意力机制,实现了从低分辨率全局语义特征到高分辨率结构细节的信息渐进流动。该技术显著提高了图像重建和生成任务的效果,实现了更平滑且均匀分布的潜在空间,并在大规模数据集上展示了强大的性能表现。该技术是首个在图像重建和生成领域引入的多尺度ViT技术。

Key Takeaways

  • HieraTok是一种基于Vision Transformer的多尺度图像生成和重建方法。
  • 通过多尺度下采样和尺度因果注意力机制,实现了从全局到局部的信息渐进流动。
  • HieraTok在图像重建和生成任务上表现出显著优势,相比单尺度模型提升了27.2%的rFID得分。
  • 集成到下游生成框架后,实现了更快的收敛速度和更高的gFID得分提升。
  • 通过扩大训练规模,展示了HieraTok在ViT令牌化器中的潜力。
  • 该技术首次将多尺度ViT应用于图像重建和生成领域。

Cool Papers

点此查看论文截图

MSD-KMamba: Bidirectional Spatial-Aware Multi-Modal 3D Brain Segmentation via Multi-scale Self-Distilled Fusion Strategy

Authors:Dayu Tan, Ziwei Zhang, Yansan Su, Xin Peng, Yike Dai, Chunhou Zheng, Weimin Zhong

Numerous CNN-Transformer hybrid models rely on high-complexity global attention mechanisms to capture long-range dependencies, which introduces non-linear computational complexity and leads to significant resource consumption. Although knowledge distillation and sparse attention mechanisms can improve efficiency, they often fall short of delivering the high segmentation accuracy necessary for complex tasks. Balancing model performance with computational efficiency remains a critical challenge. In this work, we propose a novel 3D multi-modal image segmentation framework, termed MSD-KMamba, which integrates bidirectional spatial perception with multi-scale self-distillation. The bidirectional spatial aware branch effectively captures long-range spatial context dependencies across brain regions, while also incorporating a powerful nonlinear feature extraction mechanism that further enhances the model’s ability to learn complex and heterogeneous patterns. In addition, the proposed multi-scale self-distilled fusion strategy strengthens hierarchical feature representations and improves the transfer of semantic information at different resolution levels. By jointly leveraging the bidirectional spatial perception branch and the multi-scale self-distilled fusion strategy, our framework effectively mitigates the bottleneck of quadratic computational complexity in volumetric segmentation, while simultaneously addressing the limitation of insufficient global perception. Extensive experiments on multiple standard benchmark datasets demonstrate that MSD-KMamba consistently outperforms state-of-the-art methods in segmentation accuracy, robustness, and generalization, while maintaining high computational efficiency and favorable scalability. The source code of MSD-KMamba is publicly available at https://github.com/daimao-zhang/MSD-KMamba.

许多CNN-Transformer混合模型依赖于高复杂度的全局注意力机制来捕捉长距离依赖关系,这引入了非线性计算复杂度并导致资源消耗显著增加。尽管知识蒸馏和稀疏注意力机制可以提高效率,但它们通常难以实现复杂任务所需的高分割精度。平衡模型性能与计算效率仍然是一个关键挑战。

在这项工作中,我们提出了一种新的3D多模态图像分割框架,称为MSD-KMamba,它结合了双向空间感知和多尺度自蒸馏。双向空间感知分支有效地捕捉了脑区之间的长距离空间上下文依赖关系,同时采用强大的非线性特征提取机制,进一步增强了模型学习复杂和异质模式的能力。此外,提出的多尺度自蒸馏融合策略加强了分层特征表示,并改善了不同分辨率层次的语义信息传输。通过联合利用双向空间感知分支和多尺度自蒸馏融合策略,我们的框架有效地缓解了体积分割中二次计算复杂性的瓶颈,同时解决了全局感知不足的局限性。在多个标准基准数据集上的广泛实验表明,MSD-KMamba在分割精度、稳健性和泛化方面均优于最新方法,同时保持高计算效率和良好的可扩展性。MSD-KMamba的源代码可公开访问https://github.com/daimao-zhang/MSD-KMamba。

论文及项目相关链接

PDF

Summary

本文提出一种名为MSD-KMamba的3D多模态图像分割框架,它结合了双向空间感知和多尺度自蒸馏技术。该框架能有效捕捉长距离空间上下文依赖关系,增强模型学习复杂和异质模式的能力。通过利用双向空间感知分支和多尺度自蒸馏融合策略,MSD-KMamba在体积分割中有效缓解了二次计算复杂度瓶颈,并解决了全局感知不足的问题。在多个标准数据集上的实验表明,MSD-KMamba在分割精度、鲁棒性和泛化能力方面均优于最新方法,同时保持了较高的计算效率和良好的可扩展性。

Key Takeaways

  1. MSD-KMamba框架结合了双向空间感知和多尺度自蒸馏技术,旨在解决CNN-Transformer混合模型在计算效率和性能上的挑战。
  2. 双向空间感知分支能有效捕捉长距离空间上下文依赖关系,并增强模型学习复杂和异质模式的能力。
  3. 多尺度自蒸馏融合策略强化了层次特征表示,并改善了不同分辨率级别的语义信息传输。
  4. MSD-KMamba通过结合上述技术,有效缓解了体积分割中的二次计算复杂度瓶颈和全局感知不足的问题。
  5. 在多个标准数据集上的实验表明,MSD-KMamba在分割精度、鲁棒性和泛化能力方面均优于现有最新方法。
  6. MSD-KMamba框架的源代码已公开可访问,便于其他研究者使用和改进。

Cool Papers

点此查看论文截图

S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network

Authors:Md. Saiful Bari Siddiqui, Mohammed Imamul Hassan Bhuiyan

Convolutional Neural Networks have become a cornerstone of medical image analysis due to their proficiency in learning hierarchical spatial features. However, this focus on a single domain is inefficient at capturing global, holistic patterns and fails to explicitly model an image’s frequency-domain characteristics. To address these challenges, we propose the Spatial-Spectral Summarizer Fusion Network (S$^3$F-Net), a dual-branch framework that learns from both spatial and spectral representations simultaneously. The S$^3$F-Net performs a fusion of a deep spatial CNN with our proposed shallow spectral encoder, SpectraNet. SpectraNet features the proposed SpectralFilter layer, which leverages the Convolution Theorem by applying a bank of learnable filters directly to an image’s full Fourier spectrum via a computation-efficient element-wise multiplication. This allows the SpectralFilter layer to attain a global receptive field instantaneously, with its output being distilled by a lightweight summarizer network. We evaluate S$^3$F-Net across four medical imaging datasets spanning different modalities to validate its efficacy and generalizability. Our framework consistently and significantly outperforms its strong spatial-only baseline in all cases, with accuracy improvements of up to 5.13%. With a powerful Bilinear Fusion, S$^3$F-Net achieves a SOTA competitive accuracy of 98.76% on the BRISC2025 dataset. Concatenation Fusion performs better on the texture-dominant Chest X-Ray Pneumonia dataset, achieving 93.11% accuracy, surpassing many top-performing, much deeper models. Our explainability analysis also reveals that the S$^3$F-Net learns to dynamically adjust its reliance on each branch based on the input pathology. These results verify that our dual-domain approach is a powerful and generalizable paradigm for medical image analysis.

卷积神经网络因其在学习层次化空间特征方面的专长,已成为医学图像分析的核心。然而,对单一领域的关注在捕捉全局整体模式方面效率低下,并且未能显式地建模图像的频域特性。为了应对这些挑战,我们提出了空间-光谱综合融合网络(S$^3$F-Net),这是一个双分支框架,可以同时从空间和光谱表示中学习。S$^3$F-Net将深度空间CNN与我们提出的浅层光谱编码器SpectraNet相融合。SpectraNet具有我们提出的SpectralFilter层,它利用卷积定理,通过元素级乘法直接将一组可学习的滤波器应用于图像的全傅立叶谱,从而实现高效计算。这使得SpectralFilter层能够瞬间获得全局感受野,其输出通过轻量级摘要网络进行提炼。我们在四种不同模态的医学成像数据集上评估了S$^3$F-Net的有效性泛化性。我们的框架在所有情况下都始终显著优于仅基于空间的强大基线,准确率提高了高达5.13%。通过强大的双线性融合,S$^3$F-Net在BRISC2025数据集上实现了竞争性的最高准确率98.76%。在纹理主导的胸部X光肺炎数据集上,串联融合表现更好,达到了93.11%的准确率,超越了许多表现优异的更深层次模型。我们的解释性分析还表明,S$^3$F-Net学会了根据输入的病理学动态调整对每个分支的依赖。这些结果验证了我们的双域方法在医学图像分析中的强大和通用性。
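
The element-wise frequency-domain filtering idea (the Convolution Theorem in action) can be sketched in a few lines of PyTorch. This is an illustrative stand-in rather than the paper's SpectralFilter layer; the filter count, the use of rfft2, and the initialization scale are assumptions.

```python
import torch
import torch.nn as nn

class SpectralFilterSketch(nn.Module):
    """Learnable complex filters applied element-wise to the image's full Fourier spectrum."""

    def __init__(self, num_filters, height, width):
        super().__init__()
        # one learnable complex filter per output channel over the rfft2 half-spectrum
        self.weight = nn.Parameter(torch.randn(num_filters, height, width // 2 + 1, 2) * 0.02)

    def forward(self, x):                                  # x: (B, 1, H, W) grayscale input
        spec = torch.fft.rfft2(x)                          # (B, 1, H, W//2 + 1), complex
        filt = torch.view_as_complex(self.weight)          # (F, H, W//2 + 1)
        out = spec * filt.unsqueeze(0)                     # broadcast: every pixel sees the whole image
        return torch.fft.irfft2(out, s=x.shape[-2:])       # back to (B, F, H, W) spatial maps

features = SpectralFilterSketch(num_filters=8, height=224, width=224)(torch.rand(2, 1, 224, 224))
```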

论文及项目相关链接

PDF Submitted to IEEE Journal of Biomedical and Health Informatics (JBHI). This preprint includes few additional details not present in the journal submission

Summary

卷积神经网络在医学图像分析领域具有卓越表现,但专注于单一空间域存在局限。为克服此挑战,提出空间-频谱摘要融合网络(S^3F-Net),结合空间与频谱表示学习。S^3F-Net融合深度空间CNN与浅谱编码器SpectraNet,采用频谱滤波器层处理图像全频谱。该网络在四个医学成像数据集上表现优异,较基线方法显著提高精度,最高提升5.13%。通过双线性融合与拼接融合,在BRISC2025与Chest X-Ray Pneumonia数据集上分别达98.76%与93.11%准确率。网络能动态调整对两分支的依赖,展现强大且通用的医学图像分析范式。

Key Takeaways

  1. 卷积神经网络在医学图像分析中的重要作用及其专注于单一空间域的挑战。
  2. S^3F-Net的出现,作为双分支框架,能够同时学习空间与频谱表示。
  3. S^3F-Net融合了深度空间CNN与浅谱编码器SpectraNet,其中SpectraNet包含频谱滤波器层,可直接应用于图像全频谱。
  4. S^3F-Net在多个医学成像数据集上表现优异,较基线方法有显著提高。
  5. 双线性融合与拼接融合策略在不同数据集上的效果差异。
  6. S^3F-Net能动态调整对空间与频谱分支的依赖,提供强大的医学图像分析能力。

Cool Papers

点此查看论文截图

CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding

Authors:Xi Zhang, Zaiqiao Meng, Jake Lever, Edmond S. L. Ho

Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.

多模态大型语言模型(MLLMs)最近通过整合视觉感知与自然语言理解在放射学领域取得了显著的进步。然而,它们往往会生成不受临床支持的描述,被称为医学幻觉,这在需要准确性和图像基础输出的医学应用中带来了严重的风险。通过实证分析,我们发现提示诱导的幻觉在放射学MLLMs中仍然普遍存在,很大程度上是由于对临床部分的过度敏感。为了解决这一问题,我们引入了临床对比编码(CCD),这是一种无需训练和检索的推理框架,它集成了来自特定任务放射学专家模型的结构化临床信号。CCD引入了一种双阶段对比机制,用于在生成过程中细化令牌级别的逻辑,从而提高临床保真度,同时不修改基础MLLM。在三个数据集和多个模型上的实验表明,CCD在放射学报告生成(RRG)方面始终提高了总体性能。在MIMIC-CXR数据集上,当应用于最先进的RRG模型时,它在RadGraph-F1上提高了高达17%。我们的方法提供了一种轻量级和通用的解决方案,用于缓解医学幻觉,有效地桥接了专家模型和MLLMs在放射学领域的应用。
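
As background, generic contrastive decoding refines the token distribution at every generation step by playing two forward passes against each other. The sketch below shows that general recipe with a hypothetical mixing weight `alpha`; it is not the exact dual-stage CCD formulation.

```python
import torch

def contrastive_logits(logits_clinical, logits_plain, alpha=0.5):
    """Both inputs: (batch, vocab) next-token logits from the same MLLM,
    with and without the structured clinical signal in the context."""
    return (1.0 + alpha) * logits_clinical - alpha * logits_plain

# greedy decoding step with the adjusted distribution (toy tensors)
next_token = torch.argmax(
    contrastive_logits(torch.randn(1, 32000), torch.randn(1, 32000)), dim=-1)
```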

论文及项目相关链接

PDF Preprint

Summary

本文主要介绍了多模态大型语言模型(MLLMs)在放射学领域的最新进展,通过整合视觉感知与自然语言理解取得了显著成效。然而,这类模型常产生医学上未经证实描述的“医学幻觉”,这在需要精确性和图像支撑输出的医学应用中带来了风险。为解决这一问题,本文提出了临床对比编码(CCD)方法,这是一种无需训练和检索的推理框架,能够整合来自特定任务放射学专家模型的结构化临床信号。CCD通过双重对比机制优化了生成过程中的标记级逻辑,在不改变基础MLLM的前提下提高了临床准确性。实验证明,该方法在放射学报告生成任务上表现优异,特别是在MIMIC-CXR数据集上,相较于最先进的放射学报告生成模型,其RadGraph-F1得分提高了高达17%。本文方法为缓解医学幻觉问题提供了轻便且通用的解决方案,有效地将专家模型和MLLMs桥接起来。

Key Takeaways

  1. 多模态大型语言模型(MLLMs)在放射学领域通过结合视觉感知和自然语言理解取得了进步。
  2. MLLMs会产生医学上未经证实的描述,称为“医学幻觉”,这在医学应用中带来风险。
  3. 临床对比编码(CCD)是一种无需训练和检索的推理框架,旨在解决MLLMs中的医学幻觉问题。
  4. CCD通过整合结构化临床信号和双重对比机制优化生成过程。
  5. CCD提高了临床准确性,且在不改变基础MLLM的前提下实现了这一目标。
  6. 实验证明,CCD在放射学报告生成任务上表现优异,特别是在MIMIC-CXR数据集上显著提高性能。

Cool Papers

点此查看论文截图

Transfer Learning and Machine Learning for Training Five Year Survival Prognostic Models in Early Breast Cancer

Authors:Lisa Pilgram, Kai Yang, Ana-Alicia Beltran-Bless, Gregory R. Pond, Lisa Vandermeer, John Hilton, Marie-France Savard, Andréanne Leblanc, Lois Sheperd, Bingshu E. Chen, John M. S. Bartlett, Karen J. Taylor, Jane Bayani, Sarah L. Barker, Melanie Spears, Cornelis J. H. van der Velde, Elma Meershoek-Klein Kranenbarg, Luc Dirix, Elizabeth Mallon, Annette Hasenburg, Christos Markopoulos, Lamin Juwara, Fida K. Dankar, Mark Clemons, Khaled El Emam

Prognostic information is essential for decision-making in breast cancer management. Recently trials have predominantly focused on genomic prognostication tools, even though clinicopathological prognostication is less costly and more widely accessible. Machine learning (ML), transfer learning and ensemble integration offer opportunities to build robust prognostication frameworks. We evaluate this potential to improve survival prognostication in breast cancer by comparing de-novo ML, transfer learning from a pre-trained prognostic tool and ensemble integration. Data from the MA.27 trial was used for model training, with external validation on the TEAM trial and a SEER cohort. Transfer learning was applied by fine-tuning the pre-trained prognostic tool PREDICT v3, de-novo ML included Random Survival Forests and Extreme Gradient Boosting, and ensemble integration was realized through a weighted sum of model predictions. Transfer learning, de-novo RSF, and ensemble integration improved calibration in MA.27 over the pre-trained model (ICI reduced from 0.042 in PREDICT v3 to <=0.007) while discrimination remained comparable (AUC increased from 0.738 in PREDICT v3 to 0.744-0.799). Invalid PREDICT v3 predictions were observed in 23.8-25.8% of MA.27 individuals due to missing information. In contrast, ML models and ensemble integration could predict survival regardless of missing information. Across all models, patient age, nodal status, pathological grading and tumor size had the highest SHAP values, indicating their importance for survival prognostication. External validation in SEER, but not in TEAM, confirmed the benefits of transfer learning, RSF and ensemble integration. This study demonstrates that transfer learning, de-novo RSF, and ensemble integration can improve prognostication in situations where relevant information for PREDICT v3 is lacking or where a dataset shift is likely.

预后信息对于乳腺癌管理中的决策制定至关重要。尽管临床病理预后相对成本更低且更易于获取,但最近的研究主要集中于基因组预后工具。机器学习(ML)、迁移学习和集成方法为构建稳健的预后框架提供了机会。我们通过比较全新机器学习、从预训练预后工具进行迁移学习和集成方法,评估了提高乳腺癌生存预后的潜力。使用MA.27试验的数据进行模型训练,并在TEAM试验和SEER队列中进行外部验证。通过微调预训练的预后工具PREDICT v3应用迁移学习,全新机器学习包括随机生存森林和极端梯度提升,而集成方法则是通过模型预测加权和来实现。在MA.27中,迁移学习、全新随机森林和集成改进了相对于预训练模型的校准(ICI从PREDICT v3中的0.042减少到<=0.007),同时鉴别能力保持相当(AUC从PREDICT v3中的0.738增加到0.744-0.799)。由于信息缺失,PREDICT v3的预测在MA.27个体中的23.8-25.8%被判定为无效。相比之下,机器模型和集成方法能够预测生存情况,无论信息是否缺失。在所有模型中,患者年龄、节点状态、病理分级和肿瘤大小具有最高的SHAP值,这表明它们在生存预后中的重要性。在SEER中的外部验证,而非TEAM,证实了迁移学习、随机森林和集成的优势。本研究表明,在缺乏用于PREDICT v3的相关信息或数据集可能发生变动的情况下,迁移学习、全新随机森林和集成方法可以改进预后预测。
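
The ensemble integration described above is a weighted sum of model predictions; a minimal sketch is shown below, where the model names, example probabilities, and weights are all hypothetical (in practice the weights would be tuned on a validation split).

```python
import numpy as np

def ensemble_survival(probs_by_model, weights):
    """Weighted sum of per-model 5-year survival probabilities (weights need not sum to 1)."""
    total = sum(weights[name] for name in probs_by_model)
    blended = sum(weights[name] * probs_by_model[name] for name in probs_by_model)
    return blended / total

pred = ensemble_survival(
    {"predict_v3_finetuned": np.array([0.91, 0.62]),
     "random_survival_forest": np.array([0.88, 0.70])},
    {"predict_v3_finetuned": 0.6, "random_survival_forest": 0.4},
)
```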

论文及项目相关链接

PDF

Summary

本文探讨了乳腺癌预后预测的重要性,并评估了机器学习、迁移学习和集成方法在改善乳腺癌生存预后预测方面的潜力。研究使用MA.27试验数据进行模型训练,并通过TEAM试验和SEER队列进行外部验证。结果显示,迁移学习、全新随机森林和集成方法改进了MA.27中的校准,且判别能力保持良好。在缺失信息的情况下,机器学习模型和集成方法仍然能够预测生存。患者年龄、结节状态、病理分级和肿瘤大小对生存预后预测最为重要。在SEER中进行的外部验证证实了迁移学习、随机森林和集成方法的好处,但在TEAM中未得到验证。

Key Takeaways

  1. 乳腺癌预后信息对决策至关重要,近期研究主要关注基因组预后工具,但临床病理预后工具具有较低成本和更广泛的可及性。
  2. 机器学习、迁移学习和集成方法为提高乳腺癌预后预测的准确性提供了机会。
  3. 在MA.27试验中,迁移学习、全新随机森林和集成方法改进了预训练模型的校准,同时保持较好的判别能力。
  4. 机器学习模型和集成方法能够在缺失信息的情况下进行生存预测。
  5. 患者年龄、结节状态、病理分级和肿瘤大小是生存预后预测的关键因素。
  6. 在SEER进行的外部验证支持了新方法(迁移学习、随机森林和集成方法)的优势。

Cool Papers

点此查看论文截图

Untangling Vascular Trees for Surgery and Interventional Radiology

Authors:Guillaume Houry, Tom Boeken, Stéphanie Allassonnière, Jean Feydy

The diffusion of minimally invasive, endovascular interventions motivates the development of visualization methods for complex vascular networks. We propose a planar representation of blood vessel trees which preserves the properties that are most relevant to catheter navigation: topology, length and curvature. Taking as input a three-dimensional digital angiography, our algorithm produces a faithful two-dimensional map of the patient’s vessels within a few seconds. To this end, we propose optimized implementations of standard morphological filters and a new recursive embedding algorithm that preserves the global orientation of the vascular network. We showcase our method on peroperative images of the brain, pelvic and knee artery networks. On the clinical side, our method simplifies the choice of devices prior to and during the intervention. This lowers the risk of failure during navigation or device deployment and may help to reduce the gap between expert and common intervention centers. From a research perspective, our method simulates the cadaveric display of artery trees from anatomical dissections. This opens the door to large population studies on the branching patterns and tortuosity of fine human blood vessels. Our code is released under the permissive MIT license as part of the scikit-shapes Python library (https://scikit-shapes.github.io ).

微创血管内干预的普及促进了复杂血管网络可视化方法的发展。我们提出了一种血管树的平面表示方法,这种方法保留了对导管导航至关重要的属性:拓扑结构、长度和曲率。以三维数字血管造影为输入,我们的算法在几秒内生成患者血管的忠实二维地图。为此,我们对标准形态学滤波器进行了优化实现,并提出了一种新的递归嵌入算法,该算法保留了血管网络的全局方向。我们在大脑的术中图像、盆腔和膝关节动脉网络上展示了我们的方法。在临床方面,我们的方法简化了干预前后设备的选择。这降低了导航或设备部署过程中的失败风险,并有助于缩小专家与普通干预中心之间的差距。从研究角度来看,我们的方法模拟了解剖解剖中动脉树的尸体显示。这为研究人类精细血管的分支模式和扭曲性提供了大规模人群研究的机会。我们的代码作为scikit-shapes Python库的一部分,在许可的MIT许可下发布(https://scikit-shapes.github.io)。
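
Length is one of the properties the planar map preserves. A crude way to estimate it from a binary vessel mask is to skeletonize and count centreline pixels, as in the sketch below; this is an illustration only (it ignores diagonal steps and curvature), is not part of the scikit-shapes implementation, and the voxel spacing is a hypothetical parameter.

```python
import numpy as np
from skimage.morphology import skeletonize

def centreline_length(vessel_mask, spacing_mm=1.0):
    """Very rough centreline length: one skeleton pixel counts as one step of `spacing_mm`."""
    skeleton = skeletonize(vessel_mask.astype(bool))
    return float(skeleton.sum()) * spacing_mm

mask = np.zeros((64, 64), dtype=bool)
mask[30:34, 5:60] = True                    # a synthetic straight vessel segment
length_mm = centreline_length(mask)         # roughly the segment's length in pixels
```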

论文及项目相关链接

PDF

Summary

本算法针对复杂的血管网络开发了一种可视化方法,提出了血管树的平面表示方式,并保留了导管导航最相关的属性:拓扑、长度和曲率。通过三维数字血管造影术输入,算法可在几秒内生成患者血管的忠实二维地图。该方法简化了手术过程中的设备选择,降低了导航或设备部署过程中的失败风险,并有助于缩小专家与普通介入中心之间的差距。此外,该方法还模拟了解剖学研究中动脉树的展示,为研究人类血管分支模式和弯曲度提供了机会。

Key Takeaways

  1. 提出了针对复杂血管网络的可视化方法,强调导管导航相关的拓扑、长度和曲率属性的保留。
  2. 通过三维数字血管造影术输入,快速生成二维血管地图。
  3. 方法简化了手术中的设备选择,降低了手术风险。
  4. 有助于缩小专家与普通介入中心的差距。
  5. 模拟了解剖学中的动脉树展示,为血管研究提供机会。
  6. 可用于研究人类血管的分支模式和弯曲度。

Cool Papers

点此查看论文截图

Mask What Matters: Controllable Text-Guided Masking for Self-Supervised Medical Image Analysis

Authors:Ruilang Wang, Shuotong Xu, Bowen Liu, Runlin Huang, Donglong Chen, Weifeng Su

The scarcity of annotated data in specialized domains such as medical imaging presents significant challenges to training robust vision models. While self-supervised masked image modeling (MIM) offers a promising solution, existing approaches largely rely on random high-ratio masking, leading to inefficiency and poor semantic alignment. Moreover, region-aware variants typically depend on reconstruction heuristics or supervised signals, limiting their adaptability across tasks and modalities. We propose Mask What Matters, a controllable text-guided masking framework for self-supervised medical image analysis. By leveraging vision-language models for prompt-based region localization, our method flexibly applies differentiated masking to emphasize diagnostically relevant regions while reducing redundancy in background areas. This controllable design enables better semantic alignment, improved representation learning, and stronger cross-task generalizability. Comprehensive evaluation across multiple medical imaging modalities, including brain MRI, chest CT, and lung X-ray, shows that Mask What Matters consistently outperforms existing MIM methods (e.g., SparK), achieving gains of up to +3.1 percentage points in classification accuracy, +1.3 in box average precision (BoxAP), and +1.1 in mask average precision (MaskAP) for detection. Notably, it achieves these improvements with substantially lower overall masking ratios (e.g., 40% vs. 70%). This work demonstrates that controllable, text-driven masking can enable semantically aligned self-supervised learning, advancing the development of robust vision models for medical image analysis.

在医学成像等特定领域,标注数据的稀缺性为训练稳健的视觉模型带来了重大挑战。虽然自监督的掩码图像建模(MIM)提供了有前途的解决方案,但现有方法大多依赖于随机的高比例掩码,导致效率低下和语义对齐不佳。此外,区域感知变体通常依赖于重建启发式或监督信号,这限制了它们在任务和模态之间的适应性。我们提出了“Mask What Matters”(掩码关键信息),这是一种用于自监督医学图像分析的可控文本引导掩码框架。通过利用视觉语言模型进行基于提示的区域定位,我们的方法可以灵活地应用差异化掩码,以强调与诊断相关的区域,同时减少背景区域的冗余。这种可控的设计实现了更好的语义对齐、改进了表示学习和更强的跨任务泛化能力。在多种医学成像模态上的综合评估,包括脑MRI、胸部CT和肺部X射线,表明Mask What Matters始终优于现有的MIM方法(例如SparK),在分类精度上提高了高达+3.1个百分点,框平均精度(BoxAP)提高了+1.3,掩膜平均精度(MaskAP)在检测方面提高了+1.1。值得注意的是,它在实现这些改进的同时,整体掩码比例大大降低(例如,40%对70%)。这项工作表明,可控的、文本驱动的掩码可以实现语义对齐的自监督学习,推动医学图像分析中的稳健视觉模型的发展。
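
The controllable, text-guided masking boils down to assigning each patch a masking probability that depends on its prompt-derived relevance. The sketch below illustrates that idea with made-up probabilities and a random stand-in for the vision-language relevance map; it is not the paper's actual masking schedule.

```python
import torch

def differentiated_mask(relevance, p_relevant=0.25, p_background=0.55):
    """relevance: (num_patches,) in [0, 1], from prompt-based region localization.
    Returns a boolean mask; True = the patch is hidden for masked image modeling."""
    p = p_background + (p_relevant - p_background) * relevance   # higher relevance -> lower mask prob
    return torch.bernoulli(p).bool()

rel = torch.rand(196)                      # hypothetical 14 x 14 patch relevance map
patch_mask = differentiated_mask(rel)
```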

论文及项目相关链接

PDF

Summary
针对医学成像等特定领域标注数据稀缺的问题,现有自监督遮挡图像建模(MIM)方法存在随机高比例遮挡,导致效率低下和语义对齐不佳。本文提出Mask What Matters,一个可控文本引导的遮挡框架,用于自监督医学图像分析。该方法利用视觉语言模型进行基于提示的区域定位,灵活应用差异化遮挡,强调诊断相关区域,减少背景区域的冗余。此可控设计实现了更好的语义对齐、改进了表征学习和增强了跨任务泛化能力。在多种医学成像模态上的综合评估表明,Mask What Matters在分类精度、框平均精度和掩膜平均精度方面均优于现有MIM方法,并实现了较低的整体遮挡比例。该工作证明可控的文本驱动遮挡可实现语义对齐的自监督学习,为医学图像分析的稳健视觉模型开发提供了先进的方法。

Key Takeaways

  1. 医学成像领域面临标注数据稀缺的挑战。
  2. 自监督遮挡图像建模(MIM)是一种有前景的解决方案。
  3. 现有MIM方法主要依赖随机高比例遮挡,导致效率和语义对齐问题。
  4. Mask What Matters采用可控文本引导的遮挡框架,用于自监督医学图像分析。
  5. 该方法利用视觉语言模型进行区域定位,实现差异化遮挡,强调诊断相关区域。
  6. Mask What Matters在多个医学成像模态上的表现优于现有MIM方法。

Cool Papers

点此查看论文截图


文章作者: Kedreamix
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 Kedreamix !