⚠️ 以下所有内容总结都来自于 大语言模型的能力,如有错误,仅供参考,谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ,还请您给我们一些鼓励!⭐️ HuggingFace免费体验
2025-09-12 更新
Delving into the depths of NGC 3783 with XRISM II. Cross-calibration of X-ray instruments used in the large, multi-mission observational campaign
Authors: XRISM collaboration
Accurate X-ray spectroscopic measurements are fundamental for deriving basic physical parameters of the most abundant baryon components in the Universe. The plethora of X-ray observatories currently operational enables a panchromatic view of the high-energy emission of celestial sources. However, uncertainties in the energy-dependent calibration of the instrument transfer functions (e.g. the effective area, energy redistribution, or gain) can limit - and historically, did limit - the accuracy of X-ray spectroscopic measurements. We revised the status of the cross-calibration among the scientific payload on board four operation missions: Chandra, NuSTAR, XMM-Newton, and the recently launched XRISM. XRISM carries the micro-calorimeter Resolve, which yields the best energy resolution at energies above 2 keV. For this purpose, we used the data from a 10-day-long observational campaign targeting the nearby active galactic nucleus NGC 3783, carried out in July 2024. We present a novel model-independent method for assessing the cross-calibration status that is based on a multi-node spline of the spectra with the highest-resolving power (XRISM/Resolve in our campaign). We also estimated the impact of the intrinsic variability of NGC 3783 on the cross-calibration status due to the different time coverages of participating observatories and performed an empirical reassessment of the Resolve throughput at low energies. Based on this analysis, we derived a set of energy-dependent correction factors of the observed responses, enabling a statistically robust analysis of the whole spectral dataset. They will be employed in subsequent papers describing the astrophysical results of the campaign.
准确的X射线光谱测量对于推导宇宙中丰度最高的重子成分的基本物理参数至关重要。当前在役的众多X射线天文台能够对天体源的高能辐射进行全波段观测。然而,仪器传递函数(例如有效面积、能量再分布或增益)随能量变化的校准不确定性可能会限制(历史上也确实限制过)X射线光谱测量的准确性。我们重新审视了四个在役任务的科学载荷之间的交叉校准状况:Chandra、NuSTAR、XMM-Newton和最近发射的XRISM。XRISM携带微量热计Resolve,在高于2 keV的能段具有最佳能量分辨率。为此,我们使用了2024年7月针对邻近活动星系核NGC 3783开展的为期10天的观测活动的数据。我们提出了一种新颖的、与模型无关的交叉校准评估方法,该方法基于对分辨本领最高的光谱(本次活动中为XRISM/Resolve)进行多节点样条拟合。我们还估计了由于各参与天文台时间覆盖不同,NGC 3783的内禀变化对交叉校准状况的影响,并对Resolve在低能端的吞吐量进行了经验性重新评估。基于这一分析,我们得出了一组针对观测响应的能量相关校正因子,从而能够对整个光谱数据集进行统计上稳健的分析。这些校正因子将用于后续描述本次活动天体物理结果的论文中。
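As a rough illustration of the model-independent idea described above, the sketch below fits a smooth spline to a high-resolution reference spectrum and reads off energy-dependent correction factors for another instrument at a set of nodes. It is a minimal sketch assuming simple binned `(energy, flux)` arrays sorted by energy; the function name, node placement, and smoothing are illustrative choices, not the authors' pipeline.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical inputs: binned spectra as (energy_keV, flux) arrays, sorted by energy.
# resolve_e / resolve_f play the role of the high-resolution reference (XRISM/Resolve);
# inst_e / inst_f is the spectrum of the instrument being cross-calibrated.
def cross_calibration_factors(resolve_e, resolve_f, inst_e, inst_f, n_nodes=10):
    """Multiplicative correction factors at a set of energy nodes.

    A smooth spline through the reference spectrum acts as the common,
    model-independent baseline; the other instrument is compared to it node by node.
    """
    baseline = UnivariateSpline(resolve_e, resolve_f, k=3, s=len(resolve_e))
    nodes = np.logspace(np.log10(inst_e.min()), np.log10(inst_e.max()), n_nodes)
    flux_at_nodes = np.interp(nodes, inst_e, inst_f)
    return nodes, flux_at_nodes / baseline(nodes)
```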
论文及项目相关链接
PDF 12 pages, 12 figures, Astronomy & Astrophysics, accepted for publication
摘要
X射线光谱测量的准确性对于推导宇宙中丰度最高的重子成分的基本物理参数至关重要。当前在役的众多X射线天文台能够对天体源的高能辐射进行全波段观测。然而,仪器传递函数(如有效面积、能量再分布或增益)随能量变化的校准不确定性可能会限制X射线光谱测量的准确性。本文回顾了四个在役任务的科学载荷之间的交叉校准状态:Chandra、NuSTAR、XMM-Newton以及最近发射的XRISM。XRISM携带微量热计Resolve,可在高于2 keV的能段实现最佳能量分辨率。为此,我们使用了针对邻近活动星系核NGC 3783的为期10天观测活动的数据。我们提出了一种新颖的、与模型无关的交叉校准评估方法,该方法基于对分辨本领最高的光谱(本次活动中为XRISM/Resolve)进行多节点样条拟合。我们还估计了由于各参与天文台时间覆盖不同,NGC 3783的内禀变化对交叉校准状态的影响,并对Resolve在低能端的吞吐量进行了经验性再评估。基于这项分析,我们得到了一组能量相关的观测响应校正因子,从而实现对整个光谱数据集的统计稳健分析。这些校正因子将在后续描述活动科学成果的论文中使用。
关键见解
- X射线光谱测量的准确性对于推导宇宙基本物理参数至关重要。
- 当前在役的X射线天文台实现了对天体高能辐射的全波段观测。
- 仪器传递函数的能量相关校准不确定性可能影响X射线光谱测量的准确性。
- XRISM和其高分辨率微量热量计Resolve在X射线光谱测量中表现出卓越性能。
- 使用新型独立于模型的方法评估交叉校准状态,以更准确地测量X射线光谱。
- NGC 3783的内在变化对交叉校准状态有影响,尤其是在不同时间覆盖的观测活动中。
点此查看论文截图





RoentMod: A Synthetic Chest X-Ray Modification Model to Identify and Correct Image Interpretation Model Shortcuts
Authors:Lauren H. Cooke, Matthias Jung, Jan M. Brendel, Nora M. Kerkovits, Borek Foldyna, Michael T. Lu, Vineet K. Raghu
Chest radiographs (CXRs) are among the most common tests in medicine. Automated image interpretation may reduce radiologists' workload and expand access to diagnostic expertise. Deep learning multi-task and foundation models have shown strong performance for CXR interpretation but are vulnerable to shortcut learning, where models rely on spurious and off-target correlations rather than clinically relevant features to make decisions. We introduce RoentMod, a counterfactual image editing framework that generates anatomically realistic CXRs with user-specified, synthetic pathology while preserving unrelated anatomical features of the original scan. RoentMod combines an open-source medical image generator (RoentGen) with an image-to-image modification model without requiring retraining. In reader studies with board-certified radiologists and radiology residents, RoentMod-produced images appeared realistic in 93% of cases, correctly incorporated the specified finding in 89-99% of cases, and preserved native anatomy comparable to real follow-up CXRs. Using RoentMod, we demonstrate that state-of-the-art multi-task and foundation models frequently exploit off-target pathology as shortcuts, limiting their specificity. Incorporating RoentMod-generated counterfactual images during training mitigated this vulnerability, improving model discrimination across multiple pathologies by 3-19% AUC in internal validation and by 1-11% for 5 out of 6 tested pathologies in external testing. These findings establish RoentMod as a broadly applicable tool for probing and correcting shortcut learning in medical AI. By enabling controlled counterfactual interventions, RoentMod enhances the robustness and interpretability of CXR interpretation models and provides a generalizable strategy for improving foundation models in medical imaging.
胸部X射线(CXR)是医学中最常见的检查之一。自动图像解读可以减少放射科医师的工作量,并扩大诊断专业知识的可及性。深度学习多任务模型和基础模型在CXR解读方面表现出强大的性能,但容易陷入捷径学习,即模型依赖于虚假的、偏离目标的相关性,而不是临床相关的特征来做出决策。我们引入了RoentMod,这是一种反事实图像编辑框架,能够生成具有用户指定合成病理、解剖结构真实的CXR图像,同时保留原始扫描中与病理无关的解剖特征。RoentMod结合了开源医学图像生成器(RoentGen)和一个图像到图像的修改模型,无需重新训练。在由认证放射科医师和放射科住院医师参与的读者研究中,RoentMod生成的图像在93%的情况下看起来真实,在89-99%的情况下正确纳入了指定的影像学发现,并且对原有解剖结构的保留程度与真实的随访CXR相当。我们使用RoentMod证明,最先进的多任务模型和基础模型经常利用偏离目标的病理作为捷径,这限制了它们的特异性。在训练过程中融入RoentMod生成的反事实图像缓解了这一弱点:在内部验证中,模型对多种病理的判别能力(AUC)提高了3-19%;在外部测试中,6种受试病理中有5种提高了1-11%。这些发现确立了RoentMod作为一种可广泛用于探查并纠正医疗人工智能中捷径学习的工具。通过实现受控的反事实干预,RoentMod增强了CXR解读模型的稳健性和可解释性,并为改进医学成像中的基础模型提供了一种可推广的策略。
论文及项目相关链接
PDF 25 + 8 pages, 4 + 7 figures
摘要
本文介绍了一种名为RoentMod的医学图像编辑框架,用于生成具有用户指定合成病理的解剖学真实胸片。该框架结合了开源医学图像生成器RoentGen和一个图像到图像的修改模型,无需重新训练。研究结果显示,RoentMod生成的图像在大多数病例中看起来非常真实,能正确融入指定的发现,并保留与原扫描无关的原生解剖结构。此外,该框架还揭示了最先进的多任务和基础模型经常利用非目标病理作为捷径,限制了其特异性。在训练中融入RoentMod生成的反事实图像缓解了这一漏洞,提高了模型对多种病理的辨别能力。总的来说,RoentMod作为一种广泛应用于医学人工智能探测和纠正捷径学习的工具,通过实现受控的反事实干预,提高了胸片解读模型的稳健性和可解释性,并为改善医学成像基础模型提供了可推广的策略。
关键要点
- RoentMod是一个用于生成具有用户指定合成病理的解剖学真实胸片的框架。
- RoentMod结合了RoentGen医学图像生成器和无需重新训练的图像修改模型。
- RoentMod生成的图像在读者研究中被认证为真实,并正确融入了指定的发现。
- 最先进的多任务和基础模型在医学图像解读中存在利用非目标病理作为捷径的问题。
- 融入RoentMod生成的反事实图像训练缓解了模型对捷径的依赖,提高了模型的辨别能力。
- RoentMod增强了模型的稳健性和可解释性,为改善医学成像基础模型提供了策略。
点此查看论文截图

Implicit Shape-Prior for Few-Shot Assisted 3D Segmentation
Authors:Mathilde Monvoisin, Louise Piecuch, Blanche Texier, Cédric Hémon, Anaïs Barateau, Jérémie Huet, Antoine Nordez, Anne-Sophie Boureau, Jean-Claude Nunes, Diana Mateus
The objective of this paper is to significantly reduce the manual workload required from medical professionals in complex 3D segmentation tasks that cannot be yet fully automated. For instance, in radiotherapy planning, organs at risk must be accurately identified in computed tomography (CT) or magnetic resonance imaging (MRI) scans to ensure they are spared from harmful radiation. Similarly, diagnosing age-related degenerative diseases such as sarcopenia, which involve progressive muscle volume loss and strength, is commonly based on muscular mass measurements often obtained from manual segmentation of medical volumes. To alleviate the manual-segmentation burden, this paper introduces an implicit shape prior to segment volumes from sparse slice manual annotations generalized to the multi-organ case, along with a simple framework for automatically selecting the most informative slices to guide and minimize the next interactions. The experimental validation shows the method’s effectiveness on two medical use cases: assisted segmentation in the context of at risks organs for brain cancer patients, and acceleration of the creation of a new database with unseen muscle shapes for patients with sarcopenia.
本文的目标是为了显著减少在复杂的3D分割任务中,医学专业人士所需的手动工作量,这些任务目前还不能完全自动化。例如,在放射治疗计划中,必须在计算机断层扫描(CT)或磁共振成像(MRI)扫描中准确识别出有风险的器官,以确保这些器官免受有害辐射的影响。同样,诊断与肌肉体积减少和力量减弱有关的年龄相关性退化疾病,如肌少症,通常基于从手动分割医学体积所获得的肌肉质量测量。为了减轻手动分割的负担,本文引入了一种隐式形状先验,从稀疏切片手动注释中分割体积,并将其推广到多器官情况,以及一个简单框架,用于自动选择信息量最大的切片来指导和最小化下一次交互。实验验证显示该方法在两种医学用例中的有效性:辅助分割脑癌患者风险器官的情境,以及加速为肌少症患者创建包含未见肌肉形状的新数据库。
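The abstract does not spell out the slice-selection criterion, so the sketch below uses prediction entropy as one plausible measure of informativeness: the next slice proposed to the expert is the unannotated one whose current segmentation probabilities are most uncertain. Function and variable names here are hypothetical, not from the paper.

```python
import numpy as np

def select_next_slice(prob_volume, annotated):
    """Propose the unannotated slice with the highest mean voxel-wise entropy.

    prob_volume: (num_slices, num_classes, H, W) softmax outputs of the current model.
    annotated:   iterable of slice indices already labelled by the expert.
    """
    eps = 1e-8
    entropy = -(prob_volume * np.log(prob_volume + eps)).sum(axis=1)  # (S, H, W)
    scores = entropy.mean(axis=(1, 2))                                # one score per slice
    scores[list(annotated)] = -np.inf                                 # never re-propose labelled slices
    return int(np.argmax(scores))
```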
论文及项目相关链接
PDF Both first Authors contributed equally to this work, lastnames in alphabetical order. This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution will be published in a Springer Nature Computer Science book series (CCIS, LNAI, LNBI, LNBIP, LNCS) and the doi will soon be released
Summary
本文旨在显著降低医学专业人士在尚无法完全自动化的复杂三维分割任务中的手动工作量。论文引入一种隐式形状先验,可基于稀疏切片的手动标注完成体积分割,并推广到多器官场景;同时构建了一个简单的框架,自动选择最具信息量的切片来指导并减少后续交互。实验验证表明,该方法在危及器官辅助分割和肌少症患者肌肉形状数据库创建等医学应用场景中有效。
Key Takeaways
- 该论文旨在减少医学专业人士在复杂三维分割任务中的工作量。
- 引入隐式形状先验,在尚无法完全自动化的任务中以少量人工交互辅助完成分割。
- 提出一种基于稀疏切片手动注释的分割方法,适用于多器官分割。
- 构建了一个简单的框架自动选择最具信息量的切片来指导并最小化后续交互。
- 实验验证显示该方法在风险器官分割和肌肉形状数据库创建等医学应用场景中有效。
- 该方法可用于辅助放疗计划中的风险器官分割。
点此查看论文截图



Vision-Language Semantic Aggregation Leveraging Foundation Model for Generalizable Medical Image Segmentation
Authors:Wenjun Yu, Yinchen Zhou, Jia-Xuan Jiang, Shubin Zeng, Yuee Li, Zhong Wang
Multimodal models have achieved remarkable success in natural image segmentation, yet they often underperform when applied to the medical domain. Through extensive study, we attribute this performance gap to the challenges of multimodal fusion, primarily the significant semantic gap between abstract textual prompts and fine-grained medical visual features, as well as the resulting feature dispersion. To address these issues, we revisit the problem from the perspective of semantic aggregation. Specifically, we propose an Expectation-Maximization (EM) Aggregation mechanism and a Text-Guided Pixel Decoder. The former mitigates feature dispersion by dynamically clustering features into compact semantic centers to enhance cross-modal correspondence. The latter is designed to bridge the semantic gap by leveraging domain-invariant textual knowledge to effectively guide deep visual representations. The synergy between these two mechanisms significantly improves the model’s generalization ability. Extensive experiments on public cardiac and fundus datasets demonstrate that our method consistently outperforms existing SOTA approaches across multiple domain generalization benchmarks.
多模态模型在自然图像分割方面取得了显著的成功,但当应用于医学领域时,其性能往往不佳。通过深入研究,我们将这种性能差距归因于多模态融合的挑战,主要是抽象的文本提示和精细的医学视觉特征之间的语义差距,以及由此导致的特征分散。为了解决这些问题,我们从语义聚合的角度重新审视问题。具体来说,我们提出了一种期望最大化(EM)聚合机制和文本引导像素解码器。前者通过动态地将特征聚类到紧凑的语义中心,增强跨模态对应性,从而缓解特征分散问题。后者则利用领域不变的文本知识来有效地引导深度视觉表征,从而缩小语义差距。这两种机制的协同作用显著提高了模型的泛化能力。在公共心脏和眼底数据集上的大量实验表明,我们的方法在多个域泛化基准测试上始终优于现有的最先进方法。
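A minimal sketch of what an EM-style feature aggregation can look like, in the spirit of the mechanism described above: dense features are softly assigned to a small set of bases (E-step) and the bases are re-estimated as assignment-weighted means (M-step), yielding compact semantic centers. This follows the generic EM-attention recipe under assumed shapes, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def em_aggregate(feats, n_bases=64, n_iter=3):
    """EM-style aggregation of dense features into compact semantic centers.

    feats: (B, C, H, W) feature map. Returns (bases, reconstructed features).
    """
    b, c, h, w = feats.shape
    x = feats.flatten(2)                                   # (B, C, N) with N = H*W
    mu = F.normalize(torch.randn(b, c, n_bases, device=feats.device), dim=1)
    for _ in range(n_iter):
        # E-step: soft assignment of every pixel feature to each basis.
        z = F.softmax(torch.einsum('bcn,bck->bnk', x, mu), dim=-1)   # (B, N, K)
        # M-step: bases become the assignment-weighted mean of the features.
        mu = F.normalize(torch.einsum('bcn,bnk->bck', x, z), dim=1)  # (B, C, K)
    recon = torch.einsum('bck,bnk->bcn', mu, z).view(b, c, h, w)
    return mu, recon
```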
论文及项目相关链接
PDF 29 pages and 8 figures
Summary
多模态模型在自然图像分割领域取得了显著的成功,但当应用于医学领域时常常表现不佳。研究指出,这主要是因为面临多模态融合的挑战,尤其是抽象文本提示和精细医学视觉特征之间的语义鸿沟以及由此产生的特征分散问题。为解决这些问题,研究提出了一种基于期望最大化(EM)的聚合机制和文本引导像素解码器。前者通过动态将特征聚类到紧凑的语义中心来缓解特征分散问题,从而提高跨模态的对应关系。后者旨在利用领域不变的文本知识来有效地引导深度视觉表征,从而缩小语义鸿沟。这两种机制的协同作用显著提高了模型的泛化能力。在公共心脏和眼底数据集上的实验表明,该方法在多域泛化基准测试中始终优于现有最先进的方案。
Key Takeaways
- 多模态模型在医学图像分割中表现欠佳。
- 主要挑战在于多模态融合问题,特别是语义鸿沟和特征分散。
- 研究提出了期望最大化(EM)聚合机制以缓解特征分散问题。
- 研究引入了文本引导像素解码器以缩小语义鸿沟并增强模型泛化能力。
- 该方法通过动态聚类特征和利用文本知识来提高模型的性能。
- 在公共心脏和眼底数据集上的实验表明,该方法在多域泛化测试中表现优越。
点此查看论文截图


Color-Blind Image Sensors: Towards Digital Twin of Human Retina
Authors:Yushan Meng, Bryce Widdicombe, Dechuan Sun, Paul Beckett, Peter van Wijngaarden, Efstratios Skafidas, Ampalavanapillai Nirmalathas, Ranjith Unnithan
The human retina contains a complex arrangement of photoreceptors that convert light into visual information. Conventional image sensors mimic the trichromacy of the retina using periodic filter mosaics responsive to three primary colors. However, this is, at best, an approximation, as an actual retina exhibits a quasi-random spatial distribution of light-sensitive rod and cone photoreceptors, where the ratio of rods to cones and their concentrations vary across the retina. Hence, the periodic mosaics are limited to accurately simulate the properties of the eye. Here, we present an image sensor with similar distribution, spacing, ratios and spectral characteristics of an actual foveal mosaic for emulating eye-like sampling and mimicking color blindness. To perform image reconstruction, we use a fully convolutional U-Net neural network adopting the concept of receptive fields in the retinal circuitry. Our research will enable the development of digital twin of a retina to further understand color vision deficiencies.
人类视网膜包含复杂的感光细胞排列,这些细胞将光转化为视觉信息。传统的图像传感器通过响应三原色的周期性滤光片马赛克来模仿视网膜的三色视觉。然而,这最多只是一种近似,因为实际的视网膜呈现出感光的视杆细胞和视锥细胞的准随机空间分布,且视杆与视锥的比例及其密度在视网膜不同位置各不相同。因此,周期性马赛克在准确模拟眼睛特性方面存在局限。在这里,我们提出了一种在分布、间距、比例和光谱特性上与实际中央凹感光细胞镶嵌相似的图像传感器,用于模拟类眼采样并模仿色盲。为了进行图像重建,我们采用了借鉴视网膜神经环路中感受野概念的全卷积U-Net神经网络。我们的研究将有助于构建视网膜的数字孪生,以进一步理解色觉缺陷。
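To make the "quasi-random photoreceptor mosaic" idea concrete, here is a toy generator that assigns each pixel a rod or one of three cone types according to specified fractions. The fractions and the uniform random layout are illustrative placeholders; the paper matches the distribution, spacing, and ratios of an actual foveal mosaic.

```python
import numpy as np

def foveal_mosaic(height, width, cone_fraction=0.9, lms_ratio=(0.6, 0.3, 0.1), seed=0):
    """Quasi-random photoreceptor layout: 0 = rod, 1 = L-cone, 2 = M-cone, 3 = S-cone.

    The fractions are illustrative placeholders, not the paper's measured values.
    """
    rng = np.random.default_rng(seed)
    probs = [1.0 - cone_fraction] + [cone_fraction * r for r in lms_ratio]
    return rng.choice(4, size=(height, width), p=probs)
```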
论文及项目相关链接
PDF Journal
Summary
本文介绍了一种仿视网膜的图像传感器,其模拟人眼视网膜中感光细胞的分布、间距、比例和光谱特性,以模拟人眼的采样和色盲现象。该研究使用全卷积U-Net神经网络并借鉴视网膜神经环路中的感受野概念进行图像重建,有望构建视网膜的数字孪生,以深入了解色觉缺陷。
Key Takeaways
- 视网膜包含复杂的光感受器排列,将光转化为视觉信息。
- 传统图像传感器模仿视网膜的三色视觉,但与实际视网膜的光感受器分布存在差异。
- 人眼视网膜中的光感受器比例和浓度在视网膜上有所不同。
- 研究提出了一种模拟实际视网膜特性的图像传感器。
- 该研究使用全卷积U-Net神经网络和视网膜电路中的感受野概念进行图像重建。
- 此项研究可助力开发视网膜的数字孪生。
点此查看论文截图




LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations
Authors:Payal Varshney, Adriano Lucieri, Christoph Balada, Sheraz Ahmed, Andreas Dengel
Video-based AI systems are increasingly adopted in safety-critical domains such as autonomous driving and healthcare. However, interpreting their decisions remains challenging due to the inherent spatiotemporal complexity of video data and the opacity of deep learning models. Existing explanation techniques often suffer from limited temporal coherence, insufficient robustness, and a lack of actionable causal insights. Current counterfactual explanation methods typically do not incorporate guidance from the target model, reducing semantic fidelity and practical utility. We introduce Latent Diffusion for Video Counterfactual Explanations (LD-ViCE), a novel framework designed to explain the behavior of video-based AI models. Compared to previous approaches, LD-ViCE reduces the computational costs of generating explanations by operating in latent space using a state-of-the-art diffusion model, while producing realistic and interpretable counterfactuals through an additional refinement step. Our experiments demonstrate the effectiveness of LD-ViCE across three diverse video datasets, including EchoNet-Dynamic (cardiac ultrasound), FERV39k (facial expression), and Something-Something V2 (action recognition). LD-ViCE outperforms a recent state-of-the-art method, achieving an increase in R2 score of up to 68% while reducing inference time by half. Qualitative analysis confirms that LD-ViCE generates semantically meaningful and temporally coherent explanations, offering valuable insights into the target model behavior. LD-ViCE represents a valuable step toward the trustworthy deployment of AI in safety-critical domains.
基于视频的AI系统越来越多地被应用在自动驾驶和医疗等安全关键领域。然而,由于视频数据固有的时空复杂性和深度学习模型的不透明性,解释它们的决策仍然是一个挑战。现有的解释技术通常存在时间连贯性有限、鲁棒性不足和缺乏可操作的因果洞察等问题。当前的反事实解释方法通常不结合目标模型的指导,降低了语义保真度和实用性。我们引入了用于视频反事实解释的潜在扩散(Latent Diffusion for Video Counterfactual Explanations,LD-ViCE),这是一个旨在解释基于视频的AI模型行为的新型框架。与以前的方法相比,LD-ViCE通过在潜在空间使用最先进的扩散模型来生成解释,从而降低了生成解释的计算成本,同时通过额外的细化步骤生成真实且可解释的反事实。我们的实验表明,LD-ViCE在三个不同的视频数据集上非常有效,包括EchoNet-Dynamic(心脏超声)、FERV39k(面部表情)和Something-Something V2(动作识别)。LD-ViCE优于最近的最先进方法,R2分数提高了高达68%,同时将推理时间减少了一半。定性分析证实,LD-ViCE生成的解释在语义上有意义、在时间上连贯,为目标模型的行为提供了有价值的见解。LD-ViCE朝着在安全关键领域可信部署AI的方向迈出了宝贵的一步。
论文及项目相关链接
PDF 30 pages
Summary
针对视频基AI系统决策解释的难题,提出了一种名为Latent Diffusion for Video Counterfactual Explanations(LD-ViCE)的新型框架。该框架通过操作潜在空间降低生成解释的计算成本,同时产生真实且可解读的反事实解释。实验证明,LD-ViCE在三个不同视频数据集上的表现优于现有方法,能生成语义上连贯且时间上连贯的解释。
Key Takeaways
- 视频基AI系统在安全关键领域的应用越来越广泛,但解释其决策仍然具有挑战性。
- 现有解释技术存在时间连贯性有限、稳健性不足和缺乏可操作性的因果见解的问题。
- LD-ViCE框架旨在解释视频基AI模型的行为,通过操作潜在空间降低生成解释的计算成本。
- LD-ViCE采用先进的扩散模型和细化步骤,产生真实且可解读的反事实解释。
- LD-ViCE在三个不同视频数据集上的表现优于现有方法,包括EchoNet-Dynamic(心脏超声)、FERV39k(面部表情)和Something-Something V2(动作识别)。
- LD-ViCE的R²得分提高幅度高达68%,同时推理时间减少一半。
- LD-ViCE生成的解释语义上连贯且时间上连贯,为理解目标模型行为提供了有价值的见解。
点此查看论文截图





SimCroP: Radiograph Representation Learning with Similarity-driven Cross-granularity Pre-training
Authors:Rongsheng Wang, Fenghe Tang, Qingsong Yao, Rui Yan, Xu Zhang, Zhen Huang, Haoran Lai, Zhiyang He, Xiaodong Tao, Zihang Jiang, Shaohua Kevin Zhou
Medical vision-language pre-training shows great potential in learning representative features from massive paired radiographs and reports. However, in computed tomography (CT) scans, the distribution of lesions which contain intricate structures is characterized by spatial sparsity. Besides, the complex and implicit relationships between different pathological descriptions in each sentence of the report and their corresponding sub-regions in radiographs pose additional challenges. In this paper, we propose a Similarity-Driven Cross-Granularity Pre-training (SimCroP) framework on chest CTs, which combines similarity-driven alignment and cross-granularity fusion to improve radiograph interpretation. We first leverage multi-modal masked modeling to optimize the encoder for understanding precise low-level semantics from radiographs. Then, similarity-driven alignment is designed to pre-train the encoder to adaptively select and align the correct patches corresponding to each sentence in reports. The cross-granularity fusion module integrates multimodal information across instance level and word-patch level, which helps the model better capture key pathology structures in sparse radiographs, resulting in improved performance for multi-scale downstream tasks. SimCroP is pre-trained on a large-scale paired CT-reports dataset and validated on image classification and segmentation tasks across five public datasets. Experimental results demonstrate that SimCroP outperforms both cutting-edge medical self-supervised learning methods and medical vision-language pre-training methods. Codes and models are available at https://github.com/ToniChopp/SimCroP.
医学视觉语言预训练显示出从大量配对的放射图像和报告中学习代表性特征的巨大潜力。然而,在计算机断层扫描(CT)中,包含复杂结构的病灶的空间稀疏性分布是一个特点。此外,报告中的每个句子中不同的病理描述与其在放射图像中相应子区域之间复杂而隐晦的关系带来了额外的挑战。在本文中,我们提出了基于胸部CT的Similarity-Driven Cross-Granularity Pre-training(SimCroP)框架,该框架结合了相似性驱动对齐和跨粒度融合,以改进放射图像的解释。我们首先利用多模式遮挡建模来优化编码器,以理解来自放射图像的精确低级语义。然后,设计相似性驱动对齐来预训练编码器,使其能够自适应地选择和对齐与报告中每个句子相对应的正确斑块。跨粒度融合模块融合了实例级别和单词斑块级别的多模式信息,有助于模型在稀疏的放射图像中更好地捕捉关键病理结构,从而提高多尺度下游任务的性能。SimCroP是在大规模配对CT报告数据集上进行预训练的,并在五个公共数据集上的图像分类和分割任务中进行了验证。实验结果表明,SimCroP在先进医学自监督学习方法和医学视觉语言预训练方法的性能上表现更优越。代码和模型可通过https://github.com/ToniChopp/SimCroP获取。
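A small sketch of the similarity-driven alignment idea: each report-sentence embedding selects its top-k most similar patch embeddings and summarizes them with similarity weights. The shapes, the top-k choice, and the softmax weighting are illustrative assumptions rather than the paper's exact pre-training objective.

```python
import torch
import torch.nn.functional as F

def align_sentences_to_patches(sent_emb, patch_emb, top_k=16):
    """For each report sentence, select the k most similar image patches.

    sent_emb:  (S, D) sentence embeddings; patch_emb: (P, D) patch embeddings.
    Returns similarity-weighted patch summaries, one per sentence: (S, D).
    """
    s = F.normalize(sent_emb, dim=-1)
    p = F.normalize(patch_emb, dim=-1)
    sim = s @ p.t()                            # (S, P) cosine similarities
    vals, idx = sim.topk(top_k, dim=-1)        # keep the best-matching patches
    weights = F.softmax(vals, dim=-1)          # (S, k)
    selected = patch_emb[idx]                  # (S, k, D)
    return (weights.unsqueeze(-1) * selected).sum(dim=1)
```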
论文及项目相关链接
PDF Accepted by MICCAI 2025
摘要
医学视觉与语言预训练技术在从大量配对放射影像与报告中学习代表性特征方面显示出巨大潜力。在CT扫描中,由于病灶内含复杂结构且具有空间稀疏性分布特征,其识别面临挑战。此外,报告中每一句话对不同病理描述与其在放射影像中对应子区域之间复杂而隐含的关系也增加了难度。本文提出一种针对胸部CT的Similarity-Driven Cross-Granularity Pre-training(SimCroP)框架,结合相似性驱动对齐和跨粒度融合,以改善放射影像解读。首先,我们利用多模态掩模建模优化编码器,以理解放射影像中的精确低级语义。接着,设计相似性驱动对齐来预训练编码器,使其能够自适应选择和对齐报告中每句话的正确斑块。跨粒度融合模块整合实例级别和词-斑块级别的多模态信息,帮助模型在稀疏放射影像中更好地捕捉关键病理结构,对多尺度下游任务性能有所提升。SimCroP框架在大型配对CT报告数据集上进行预训练,并在五个公共数据集上进行图像分类和分割任务验证。实验结果表明,SimCroP在先进的医学自监督学习方法和医学视觉语言预训练方法的测试中表现优异。相关代码和模型可通过https://github.com/ToniChopp/SimCroP获取。
要点概括
- 医学视觉-语言预训练在理解大量放射影像与报告中的代表性特征方面具有潜力。
- CT扫描中的病灶具有空间稀疏性和复杂结构特征,增加了识别难度。
- 报告中病理描述与其在放射影像中对应子区域的关系复杂且隐含。
- SimCroP框架通过相似性驱动对齐和跨粒度融合来提升放射影像解读。
- 利用多模态掩模建模优化编码器对低级语义的理解。
- SimCroP框架在多个数据集上进行了预训练并在图像分类和分割任务中表现优越。
点此查看论文截图




RepViT-CXR: A Channel Replication Strategy for Vision Transformers in Chest X-ray Tuberculosis and Pneumonia Classification
Authors:Faisal Ahmed
Chest X-ray (CXR) imaging remains one of the most widely used diagnostic tools for detecting pulmonary diseases such as tuberculosis (TB) and pneumonia. Recent advances in deep learning, particularly Vision Transformers (ViTs), have shown strong potential for automated medical image analysis. However, most ViT architectures are pretrained on natural images and require three-channel inputs, while CXR scans are inherently grayscale. To address this gap, we propose RepViT-CXR, a channel replication strategy that adapts single-channel CXR images into a ViT-compatible format without introducing additional information loss. We evaluate RepViT-CXR on three benchmark datasets. On the TB-CXR dataset,our method achieved an accuracy of 99.9% and an AUC of 99.9%, surpassing prior state-of-the-art methods such as Topo-CXR (99.3% accuracy, 99.8% AUC). For the Pediatric Pneumonia dataset, RepViT-CXR obtained 99.0% accuracy, with 99.2% recall, 99.3% precision, and an AUC of 99.0%, outperforming strong baselines including DCNN and VGG16. On the Shenzhen TB dataset, our approach achieved 91.1% accuracy and an AUC of 91.2%, marking a performance improvement over previously reported CNN-based methods. These results demonstrate that a simple yet effective channel replication strategy allows ViTs to fully leverage their representational power on grayscale medical imaging tasks. RepViT-CXR establishes a new state of the art for TB and pneumonia detection from chest X-rays, showing strong potential for deployment in real-world clinical screening systems.
胸部X射线(CXR)成像仍然是检测肺结核(TB)和肺炎等肺部疾病的最广泛使用的诊断工具之一。最近深度学习,尤其是视觉转换器(ViTs)的进展,已显示出在自动化医学图像分析方面的强大潜力。然而,大多数ViT架构都是在自然图像上进行预训练的,需要三通道输入,而CXR扫描本质上是灰度的。为了解决这一差距,我们提出了RepViT-CXR,这是一种通道复制策略,它可以将单通道CXR图像适应为ViT兼容格式,而不会引入额外的信息损失。我们在三个基准数据集上评估了RepViT-CXR。在TB-CXR数据集上,我们的方法达到了99.9%的准确率和99.9%的AUC,超越了先前的最新方法,如Topo-CXR(准确率为99.3%,AUC为99.8%)。对于小儿肺炎数据集,RepViT-CXR达到了99.0%的准确率,召回率为99.2%,精确度为99.3%,AUC为99.0%,优于DCNN和VGG16等强大的基线。在深圳的TB数据集上,我们的方法达到了91.1%的准确率和91.2%的AUC,相较于之前报道的基于CNN的方法,性能有所提高。这些结果表明,简单而有效的通道复制策略允许ViTs在灰度医学成像任务上充分利用其表征能力。RepViT-CXR为肺结核和肺炎的胸部X射线检测建立了新的技术标杆,显示出在真实世界临床筛查系统中部署的强大潜力。
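The channel replication itself is a one-liner; a minimal PyTorch sketch (assuming a `(B, 1, H, W)` grayscale tensor) is shown below. Whether the copy is materialized (`repeat`) or broadcast (`expand`), and how ImageNet normalization is applied afterwards, are implementation details not specified here.

```python
import torch

def replicate_channels(cxr):
    """Turn a single-channel CXR tensor (B, 1, H, W) into the three-channel
    input expected by ImageNet-pretrained ViTs, without altering pixel values."""
    return cxr.repeat(1, 3, 1, 1)   # or cxr.expand(-1, 3, -1, -1) to avoid a copy
```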
论文及项目相关链接
PDF 10 pages, 5 figures
摘要
本文提出一种名为RepViT-CXR的通道复制策略,能将单通道的Chest X-ray(CXR)图像转化为适合ViT模型分析的格式,且不引入额外的信息损失。该策略在肺结核(TB)和肺炎的CXR图像诊断上表现优异,取得了新的领先水平,展现出在实际临床筛查系统中部署的潜力。
关键见解
- Chest X-ray (CXR) 仍然是检测肺部疾病如肺结核和肺炎的最常用的诊断工具之一。
- 近期深度学习和Vision Transformers(ViTs)在医学图像分析方面展现出强大的潜力。
- 大多数ViT架构都是在自然图像上进行预训练的,需要三通道输入,而CXR扫描是灰度的。
- 提出RepViT-CXR的通道复制策略,将单通道CXR图像转化为ViT兼容格式。
- RepViT-CXR在三个基准数据集上的评估表现优秀:在TB-CXR数据集上准确率和AUC均达到99.9%,超过先前的方法。
- 在小儿肺炎数据集上,RepViT-CXR获得99.0%的准确率、99.2%的召回率、99.3%的精确度和99.0%的AUC,优于DCNN和VGG16等强基线。
- RepViT-CXR方法在深圳市肺结核数据集上达到91.1%的准确率和91.2%的AUC,相比之前报道的CNN方法有所提升。
点此查看论文截图




The WISSH quasar project. XII. X-ray view of the most luminous quasi-stellar objects at Cosmic Noon
Authors:C. Degli Agosti, C. Vignali, E. Piconcelli, L. Zappacosta, E. Bertola, R. Middei, I. Saccheo, G. Vietri, F. Vito, A. Bongiorno, M. Bischetti, G. Bruni, S. Carniani, G. Cresci, C. Feruglio, F. Salvestrini, A. Travascio, M. Gaspari, E. Glikman, E. Kammoun, G. Lanzuisi, M. Laurenti, G. Miniutti, C. Pinto, V. Testa, F. Tombesi, A. Tortosa, F. Fiore
To improve our knowledge of nuclear emission in luminous QSOs at Cosmic Noon, we studied the X-ray emission of the WISE/SDSS-selected hyper-luminous (WISSH) QSO sample: 85 broad-line AGN with $L_{\rm bol} > \mathrm{few} \times 10^{47}\,\mathrm{erg\,s^{-1}}$ at $z \sim 2-4$. Our aim is to characterise their X-ray spectra and explore relations between X-ray luminosity and other bands, comparing powerful QSOs with the general AGN population. We performed spectral analysis for about half of the sample; 16 sources were analysed via their hardness ratio; for the others we estimated their intrinsic luminosity $L_{2-10\,\mathrm{keV}}$. Only 8 sources are undetected. We report a large dispersion in $L_{2-10\,\mathrm{keV}}$ despite the narrow distribution of $L_{\rm bol}$, $L_{2500\,\text{\AA}}$ and $\lambda L_{6\,\mu\mathrm{m}}$ (about one-third of the sources classified as X-ray weak). This suggests differences in X-ray corona and accretion flow physics between hyper-luminous and less powerful AGN. X-ray photon index distribution is consistent with that of lower-$z$, lower-$L_{\rm bol}$ AGN, and does not depend on the Eddington ratio ($\lambda_{\rm Edd}$) or X-ray weakness. Most WISSH QSOs with intrinsic absorption estimates show little to no obscuration ($N_{\rm H} \le 5\times 10^{22}\,\mathrm{cm^{-2}}$). Among the obscured sources we find blue QSOs without broad absorption lines within the "forbidden region" of the $\log(N_{\rm H})-\log(\lambda_{\rm Edd})$ plane, typically occupied by dust-reddened QSOs and associated with intense feedback. We confirm a correlation between $L_{2-10\,\mathrm{keV}}$ and CIV line blueshift, a tracer of nuclear ionized outflows. Multi-wavelength data and complete X-ray coverage enabled the investigation of the disk-corona interplay at the highest luminosity regimes. The broad distribution of bolometric correction and X-ray-to-optical index suggest caution when using $L_{\rm bol}$, $L_{2500\,\text{\AA}}$ or $L_{6\,\mu\mathrm{m}}$ as direct X-ray proxy for individual luminous QSOs.
为了加深对宇宙正午时期高光度类星体(QSO)核区辐射的认识,我们研究了由WISE/SDSS选出的超高光度(WISSH)QSO样本的X射线辐射。该样本包括85个宽线活动星系核(AGN),其热光度 $L_{\rm bol} > \mathrm{few} \times 10^{47}\,\mathrm{erg\,s^{-1}}$,红移位于 $z \sim 2-4$。我们的目标是刻画它们的X射线光谱,探索X射线光度与其他波段之间的关系,并将这些强大的QSO与一般AGN群体进行比较。我们对大约一半的样本进行了光谱分析;其中16个源通过硬度比进行分析;对于其余源,我们估计了其内禀光度 $L_{2-10\,\mathrm{keV}}$。仅有8个源未被探测到。尽管 $L_{\rm bol}$、$L_{2500\,\text{\AA}}$ 和 $\lambda L_{6\,\mu\mathrm{m}}$ 的分布较窄,我们仍发现 $L_{2-10\,\mathrm{keV}}$ 存在较大弥散(约三分之一的源被归类为X射线弱源)。这表明超高光度AGN与较低光度AGN在X射线冕和吸积流物理上存在差异。X射线光子指数的分布与较低红移、较低热光度的AGN一致,且不依赖于爱丁顿比($\lambda_{\rm Edd}$)或X射线弱度。大多数具有内禀吸收估计的WISSH QSO几乎没有遮蔽($N_{\rm H} \le 5\times 10^{22}\,\mathrm{cm^{-2}}$)。在被遮蔽的源中,我们发现一些没有宽吸收线的蓝色QSO位于 $\log(N_{\rm H})-\log(\lambda_{\rm Edd})$ 平面的"禁戒区域"内,该区域通常由尘埃红化的QSO占据,并与强烈的反馈相关。我们证实了 $L_{2-10\,\mathrm{keV}}$ 与CIV线蓝移之间的相关性,后者是核区电离外流的示踪量。多波长数据和完整的X射线覆盖使我们得以在最高光度区间研究吸积盘与冕的相互作用。热光度校正因子和X射线-光学指数的宽分布表明,将 $L_{\rm bol}$、$L_{2500\,\text{\AA}}$ 或 $L_{6\,\mu\mathrm{m}}$ 用作单个高光度QSO的直接X射线代理量时需要谨慎。
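Two quantities mentioned above can be written down directly: the hardness ratio, a crude spectral diagnostic used when counts are too few for full spectral fitting, and the conventional X-ray-to-optical index comparing rest-frame 2 keV and 2500 Å monochromatic luminosities. The sketch below uses the standard textbook definitions; the specific band choices and background treatment are left to the analysis and are not taken from the paper.

```python
import numpy as np

def hardness_ratio(soft_counts, hard_counts):
    """HR = (H - S) / (H + S) from background-subtracted counts in a soft and a
    hard band; a rough spectral-shape diagnostic for low-count sources."""
    return (hard_counts - soft_counts) / (hard_counts + soft_counts)

def alpha_ox(l_2kev, l_2500):
    """Standard X-ray-to-optical spectral index between rest-frame 2 keV and
    2500 Angstrom monochromatic luminosities (erg s^-1 Hz^-1)."""
    return 0.3838 * np.log10(l_2kev / l_2500)
```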
论文及项目相关链接
PDF Accepted for publication in Astronomy & Astrophysics
摘要
本文深入研究了宇宙正午时期高光度QSO的核区辐射,特别是其X射线辐射。通过对WISE/SDSS选定的超高光度(WISSH)QSO样本进行X射线光谱分析,本文刻画了其X射线光谱,并探讨了X射线光度与其他波段之间的关系。文中报道了约一半样本源的光谱分析结果,部分源通过硬度比进行分析,其余源则估算了内禀光度。结果发现超高光度QSO的X射线光度存在较大弥散,这暗示超高光度QSO与较低光度AGN在X射线冕和吸积流物理性质上存在差异。此外,大多数WISSH QSO的内禀吸收估计显示没有明显的遮蔽。最后,本文证实了CIV线蓝移与X射线光度之间的相关性,这可能与核区电离外流有关,并强调在使用 $L_{\rm bol}$、$L_{2500\,\text{\AA}}$ 或 $L_{6\,\mu\mathrm{m}}$ 作为单个高光度QSO的直接X射线代理时需谨慎。总的来说,本文加深了我们对宇宙正午时期高光度QSO核区辐射的理解。
关键见解
- 对宇宙中午期核发射进行了深入研究,特别是对高光度QSO的X射线发射进行了考察。
- 研究了超高光度QSO样本的X射线光谱特性。
- 发现超高光度QSO的X射线光度存在较大弥散,暗示其与较低光度AGN在X射线冕和吸积流物理性质上存在差异。
- 大部分WISSH QSOs的固有吸收估计显示没有明显的遮蔽现象。
- 发现CIV线蓝移与X射线光度之间存在相关性,可能与核电离流出有关。
- 使用Lbol、L 2500 A或L 6μm作为高光度QSO的直接X射线代理时存在不确定性,需谨慎使用。
点此查看论文截图





Expert-Guided Explainable Few-Shot Learning for Medical Image Diagnosis
Authors:Ifrat Ikhtear Uddin, Longwei Wang, KC Santosh
Medical image analysis often faces significant challenges due to limited expert-annotated data, hindering both model generalization and clinical adoption. We propose an expert-guided explainable few-shot learning framework that integrates radiologist-provided regions-of-interests (ROIs) into model training to simultaneously enhance classification performance and interpretability. Leveraging Grad-CAM for spatial attention supervision, we introduce an explanation loss based on Dice similarity to align model attention with diagnostically relevant regions during training. This explanation loss is jointly optimized with a standard prototypical network objective, encouraging the model to focus on clinically meaningful features even under limited data conditions. We evaluate our framework on two distinct datasets: BraTS (MRI) and VinDr-CXR (Chest X-ray), achieving significant accuracy improvements from 77.09% to 83.61% on BraTS and from 54.33% to 73.29% on VinDr-CXR compared to non-guided models. Grad-CAM visualizations further confirm that expert-guided training consistently aligns attention with diagnostic regions, improving both predictive reliability and clinical trustworthiness. Our findings demonstrate the effectiveness of incorporating expert-guided attention supervision to bridge the gap between performance and interpretability in few-shot medical image diagnosis.
医学图像分析常常因为缺乏专家标注的数据而面临重大挑战,这不仅影响了模型的泛化能力,也阻碍了其在临床上的实际应用。我们提出了一种专家引导的可解释的少量学习框架,该框架将放射科医生提供的感兴趣区域(ROI)整合到模型训练中,以同时提高分类性能和解释性。我们借助Grad-CAM进行空间注意力监督,并引入基于Dice相似度的解释损失,以在训练过程中使模型注意力与诊断相关区域对齐。这种解释损失与标准原型网络目标联合优化,鼓励模型即使在数据有限的情况下也关注临床上具有重要意义的特点。我们在两个不同的数据集BraTS(MRI)和VinDr-CXR(胸部X光片)上评估了我们的框架,与无导向模型相比,BraTS的准确性从77.09%提高到83.61%,VinDr-CXR从54.33%提高到73.29%。Grad-CAM可视化进一步证实,专家引导的训练能使注意力始终与诊断区域对齐,提高了预测可靠性和临床可信度。我们的研究结果表明,在少量医学图像诊断中融入专家引导的注意力监督,能有效缩小性能与解释性之间的差距。
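A minimal sketch of the Dice-based explanation loss described above: the Grad-CAM attention map is compared to the radiologist-provided ROI with a soft Dice term, which in training would be added to the prototypical-network objective with some weight. The tensor shapes and the weighting factor `lam` are assumptions for illustration, not the paper's exact settings.

```python
import torch

def explanation_loss(attention, roi_mask, eps=1e-6):
    """Soft Dice loss pushing the model's attention map to overlap the expert ROI.

    attention, roi_mask: (B, H, W) maps with values in [0, 1].
    """
    attention = attention.flatten(1)
    roi_mask = roi_mask.flatten(1)
    inter = (attention * roi_mask).sum(dim=1)
    dice = (2 * inter + eps) / (attention.sum(dim=1) + roi_mask.sum(dim=1) + eps)
    return 1.0 - dice.mean()

# In training this would be combined with the few-shot classification objective,
# e.g. total_loss = proto_loss + lam * explanation_loss(grad_cam_map, roi_mask).
```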
论文及项目相关链接
PDF Accepted for publication in the proceedings of MICCAI Workshop on Data Engineering in Medical Imaging 2025
Summary
本文提出一种专家引导的可解释的少样本学习框架,通过整合放射科医生提供的感兴趣区域(ROIs)到模型训练中,同时提高分类性能和解释性。利用Grad-CAM进行空间注意力监督,并引入基于Dice相似度的解释损失,使模型注意力与诊断相关区域对齐。该解释损失与标准原型网络目标联合优化,鼓励模型在有限数据条件下关注临床有意义的特征。在BraTS和VinDr-CXR两个数据集上的评估显示,与无引导模型相比,准确率分别从77.09%提高到83.61%和从54.33%提高到73.29%。Grad-CAM可视化进一步证实,专家引导的训练能使注意力始终与诊断区域对齐,提高预测可靠性和临床可信度。
Key Takeaways
- 医学图像分析面临专家标注数据有限的挑战,影响模型通用化和临床应用。
- 提出一种专家引导的可解释的少样本学习框架,将放射科医生提供的感兴趣区域(ROIs)融入模型训练。
- 利用Grad-CAM进行空间注意力监督,并引入基于Dice相似度的解释损失。
- 解释损失与原型网络目标联合优化,提高模型在有限数据下对临床有意义特征的关注。
- 在BraTS和VinDr-CXR数据集上的评估显示,该方法显著提高分类准确率。
- Grad-CAM可视化证实专家引导训练能使模型注意力与诊断区域对齐。
- 专家引导的方法提高了预测可靠性和临床可信度。
点此查看论文截图


EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Authors:Haokai Zhu, Bo Qu, Si-Yuan Cao, Runmin Zhang, Shujie Chen, Bailin Yang, Hui-Liang Shen
Previous deep image registration methods that employ single homography, multi-grid homography, or thin-plate spline often struggle with real scenes containing depth disparities due to their inherent limitations. To address this, we propose an Exponential-Decay Free-Form Deformation Network (EDFFDNet), which employs free-form deformation with an exponential-decay basis function. This design achieves higher efficiency and performs well in scenes with depth disparities, benefiting from its inherent locality. We also introduce an Adaptive Sparse Motion Aggregator (ASMA), which replaces the MLP motion aggregator used in previous methods. By transforming dense interactions into sparse ones, ASMA reduces parameters and improves accuracy. Additionally, we propose a progressive correlation refinement strategy that leverages global-local correlation patterns for coarse-to-fine motion estimation, further enhancing efficiency and accuracy. Experiments demonstrate that EDFFDNet reduces parameters, memory, and total runtime by 70.5%, 32.6%, and 33.7%, respectively, while achieving a 0.5 dB PSNR gain over the state-of-the-art method. With an additional local refinement stage,EDFFDNet-2 further improves PSNR by 1.06 dB while maintaining lower computational costs. Our method also demonstrates strong generalization ability across datasets, outperforming previous deep learning methods.
先前采用单应性、多网格单应性或薄板样条的深度图像配准方法,由于固有局限性,在处理包含深度差异的真实场景时常常遇到困难。为解决这一问题,我们提出了指数衰减自由形态变形网络(EDFFDNet),该网络采用带指数衰减基函数的自由形态变形。得益于其固有的局部性,这种设计实现了更高的效率,并在存在深度差异的场景中表现良好。我们还引入了自适应稀疏运动聚合器(ASMA),取代了先前方法中使用的MLP(多层感知机)运动聚合器。ASMA通过将密集交互转换为稀疏交互,减少了参数并提高了精度。此外,我们提出了一种渐进式相关性细化策略,利用全局-局部相关性模式进行从粗到细的运动估计,进一步提高效率和准确性。实验表明,EDFFDNet在参数、内存和总运行时间方面分别减少了70.5%、32.6%和33.7%,同时相比最先进的方法实现了0.5 dB的PSNR增益。通过额外的局部细化阶段,EDFFDNet-2进一步将PSNR提高了1.06 dB,同时保持较低的计算成本。我们的方法还表现出强大的跨数据集泛化能力,优于先前的深度学习方法。
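To illustrate what an exponential-decay free-form deformation can look like, the sketch below interpolates sparse control-point displacements into a dense displacement field with an exponentially decaying basis, whose fast falloff provides the locality mentioned above. The exact basis, normalization, and control-grid layout in EDFFDNet may differ; this is only a schematic with hypothetical names.

```python
import torch

def exp_decay_deformation(coords, control_pts, control_disp, sigma=0.1):
    """Dense displacement field from sparse control-point displacements using an
    exponentially decaying (hence local) basis function.

    coords:       (N, 2) query pixel coordinates, normalised to [0, 1].
    control_pts:  (K, 2) control-point locations.
    control_disp: (K, 2) displacements predicted at the control points.
    """
    dist = torch.cdist(coords, control_pts)            # (N, K) Euclidean distances
    basis = torch.exp(-dist / sigma)                    # decays quickly -> local influence
    basis = basis / (basis.sum(dim=1, keepdim=True) + 1e-8)
    return basis @ control_disp                         # (N, 2) interpolated displacements
```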
论文及项目相关链接
Summary
本文提出一种基于自由形态变形和指数衰减基函数的Exponential-Decay Free-Form Deformation Network(EDFFDNet)模型,用于解决传统深度图像配准方法在面对具有深度差异的真实场景时的局限性问题。同时引入自适应稀疏运动聚合器(ASMA)和渐进式相关性优化策略,提高效率和准确性。实验表明,EDFFDNet在减少参数、内存和总运行时间的同时,较现有方法提高了图像质量。EDFFDNet-2在局部细化阶段进一步提升了性能。
Key Takeaways
- EDFFDNet模型采用自由形态变形和指数衰减基函数,提高了处理具有深度差异场景的效率。
- 自适应稀疏运动聚合器(ASMA)替换传统方法中的MLP运动聚合器,降低参数并提高准确性。
- 提出渐进相关性优化策略,利用全局和局部相关性模式进行从粗到细的运动估计,进一步提高效率和准确性。
- 实验结果显示,EDFFDNet较现有方法减少了参数、内存和运行时间,并提高了PSNR值。
- EDFFDNet-2在局部细化阶段进一步提升了图像质量。
- 该方法在不同数据集上表现出强大的泛化能力。
- 该方法在深度图像配准领域具有潜在的应用价值。
点此查看论文截图




HU-based Foreground Masking for 3D Medical Masked Image Modeling
Authors:Jin Lee, Vu Dang, Gwang-Hyun Yu, Anh Le, Zahid Rahman, Jin-Ho Jang, Heonzoo Lee, Kun-Yung Kim, Jin-Sul Kim, Jin-Young Kim
While Masked Image Modeling (MIM) has revolutionized fields of computer vision, its adoption in 3D medical image computing has been limited by the use of random masking, which overlooks the density of anatomical objects. To address this limitation, we enhance the pretext task with a simple yet effective masking strategy. Leveraging Hounsfield Unit (HU) measurements, we implement an HU-based Foreground Masking, which focuses on the intensity distribution of visceral organs and excludes non-tissue regions, such as air and fluid, that lack diagnostically meaningful features. Extensive experiments on five public 3D medical imaging datasets demonstrate that our masking consistently improves performance, both in quality of segmentation and Dice score (BTCV:84.64%, Flare22:92.43%, MM-WHS:90.67%, Amos22:88.64%, BraTS:~78.55%). These results underscore the importance of domain-centric MIM and suggest a promising direction for representation learning in medical image segmentation. Implementation is available at github.com/AISeedHub/SubFore/.
虽然Masked Image Modeling(MIM)在计算机视觉领域带来了革命性进展,但其在三维医学图像计算中的应用受限于随机掩码策略,该策略忽视了解剖结构的密度信息。为了解决这一局限,我们用一种简单而有效的掩码策略来增强前置(pretext)任务。利用Hounsfield Unit(HU)测量值,我们实现了基于HU的前景掩码,该策略聚焦于内脏器官的强度分布,并排除缺乏诊断意义特征的非组织区域,如空气和液体。在五个公开的三维医学图像数据集上进行的大量实验表明,我们的掩码策略在分割质量和Dice分数方面均带来一致的提升(BTCV:84.64%,Flare22:92.43%,MM-WHS:90.67%,Amos22:88.64%,BraTS:~78.55%)。这些结果强调了以领域为中心的MIM的重要性,并为医学图像分割的表示学习指明了有前景的方向。具体实现可访问github.com/AISeedHub/SubFore/。
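A simple sketch of HU-based foreground masking for MIM-style pre-training: random patch masking is restricted to patches whose voxels mostly fall inside a tissue HU window, so air and fluid background are never masked. The HU thresholds, patch size, foreground fraction, and mask ratio here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hu_foreground_mask(ct_volume, hu_min=-500, hu_max=1000, mask_ratio=0.6, patch=16, seed=0):
    """Boolean patch-level mask for MIM, limited to tissue regions of a CT volume.

    A patch is eligible for masking only if most of its voxels fall inside a
    soft-tissue/bone HU window; eligible patches are then masked at random.
    """
    rng = np.random.default_rng(seed)
    d, h, w = ct_volume.shape
    fg = (ct_volume > hu_min) & (ct_volume < hu_max)
    mask = np.zeros((d // patch, h // patch, w // patch), dtype=bool)
    for idx in np.ndindex(*mask.shape):
        z, y, x = (i * patch for i in idx)
        block = fg[z:z + patch, y:y + patch, x:x + patch]
        if block.mean() > 0.5 and rng.random() < mask_ratio:
            mask[idx] = True
    return mask
```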
论文及项目相关链接
PDF Accepted by MICCAI AMAI Workshop 2025
Summary
本文提出一种基于Hounsfield单位(HU)测量的前景掩模策略,用于改进在三维医学图像计算中的Masked Image Modeling(MIM)。新策略关注内脏器官的强度分布,并排除缺乏诊断特征的非组织区域,如空气和流体。在五个公共三维医学成像数据集上的广泛实验表明,这种掩模策略在分割质量和Dice分数方面持续提高了性能。
Key Takeaways
- Masked Image Modeling(MIM)在医学图像分割中的应用受到限制,主要由于随机掩模忽略了解剖对象的密度。
- 提出了一种基于Hounsfield单位(HU)的前景掩模策略,专注于内脏器官的强度分布。
- 新策略排除非组织区域,如空气和流体,这些区域缺乏诊断特征。
- 在五个公共三维医学成像数据集上进行的广泛实验表明,新的掩模策略提高了分割质量和Dice分数。
- 实验结果强调领域特定的MIM的重要性,并为医学图像分割中的表示学习提供了有前景的方向。
- 公开了实施细节,可在github.com/AISeedHub/SubFore/找到。
点此查看论文截图





Advanced Brain Tumor Segmentation Using EMCAD: Efficient Multi-scale Convolutional Attention Decoding
Authors:GodsGift Uzor, Tania-Amanda Nkoyo Fredrick Eneye, Chukwuebuka Ijezue
Brain tumor segmentation is a critical pre-processing step in the medical image analysis pipeline that involves precise delineation of tumor regions from healthy brain tissue in medical imaging data, particularly MRI scans. An efficient and effective decoding mechanism is crucial in brain tumor segmentation especially in scenarios with limited computational resources. However these decoding mechanisms usually come with high computational costs. To address this concern EMCAD a new efficient multi-scale convolutional attention decoder designed was utilized to optimize both performance and computational efficiency for brain tumor segmentation on the BraTs2020 dataset consisting of MRI scans from 369 brain tumor patients. The preliminary result obtained by the model achieved a best Dice score of 0.31 and maintained a stable mean Dice score of 0.285 plus/minus 0.015 throughout the training process which is moderate. The initial model maintained consistent performance across the validation set without showing signs of over-fitting.
脑肿瘤分割是医学图像分析流程中的关键预处理步骤,需要在医学影像数据(尤其是MRI扫描)中将肿瘤区域与健康脑组织精确区分开来。在脑肿瘤分割中,特别是在计算资源有限的情况下,高效且有效的解码机制至关重要,然而此类解码机制通常伴随较高的计算成本。为解决这一问题,本文采用了一种新型高效多尺度卷积注意力解码器(EMCAD),以在BraTS2020数据集(包含369名脑肿瘤患者的MRI扫描)上同时优化脑肿瘤分割的性能和计算效率。该模型取得的初步最佳Dice系数为0.31,并在整个训练过程中维持了稳定的平均Dice系数0.285±0.015,属于中等水平。初始模型在验证集上的表现保持一致,没有出现过拟合的迹象。
论文及项目相关链接
Summary
针对医学图像分析中的脑肿瘤分割问题,特别是在计算资源有限的场景下,高效解码机制尤为重要。为了解决这个问题,研究采用了新型的多尺度卷积注意力解码器(EMCAD),在BraTs2020数据集上进行脑肿瘤分割,实现了性能和计算效率的优化。初步实验结果显示,模型的最佳Dice系数为0.31,并在训练过程中保持稳定,平均Dice系数为0.285±0.015,表现中等。初步模型在验证集上表现一致,没有过拟合的迹象。
Key Takeaways
- 脑肿瘤分割是医学图像分析中的重要预处理步骤,需要从医学成像数据中精确划分肿瘤区域和健康脑组织。
- 在计算资源有限的情况下,高效的解码机制在脑肿瘤分割中至关重要。
- EMCAD是一种新型的多尺度卷积注意力解码器,旨在优化脑肿瘤分割的性能和计算效率。
- 在BraTs2020数据集上进行的初步实验显示,该模型的最佳Dice系数为0.31,表现中等。
- 模型在训练过程中保持稳定的平均Dice系数,为0.285±0.015。
- 初步模型在验证集上的表现一致,没有过拟合的迹象。
点此查看论文截图







Systematic Integration of Attention Modules into CNNs for Accurate and Generalizable Medical Image Diagnosis
Authors:Zahid Ullah, Minki Hong, Tahir Mahmood, Jihie Kim
Deep learning has become a powerful tool for medical image analysis; however, conventional Convolutional Neural Networks (CNNs) often fail to capture the fine-grained and complex features critical for accurate diagnosis. To address this limitation, we systematically integrate attention mechanisms into five widely adopted CNN architectures, namely, VGG16, ResNet18, InceptionV3, DenseNet121, and EfficientNetB5, to enhance their ability to focus on salient regions and improve discriminative performance. Specifically, each baseline model is augmented with either a Squeeze and Excitation block or a hybrid Convolutional Block Attention Module, allowing adaptive recalibration of channel and spatial feature representations. The proposed models are evaluated on two distinct medical imaging datasets, a brain tumor MRI dataset comprising multiple tumor subtypes, and a Products of Conception histopathological dataset containing four tissue categories. Experimental results demonstrate that attention augmented CNNs consistently outperform baseline architectures across all metrics. In particular, EfficientNetB5 with hybrid attention achieves the highest overall performance, delivering substantial gains on both datasets. Beyond improved classification accuracy, attention mechanisms enhance feature localization, leading to better generalization across heterogeneous imaging modalities. This work contributes a systematic comparative framework for embedding attention modules in diverse CNN architectures and rigorously assesses their impact across multiple medical imaging tasks. The findings provide practical insights for the development of robust, interpretable, and clinically applicable deep learning based decision support systems.
深度学习已成为医学图像分析的有力工具;然而,传统的卷积神经网络(CNN)往往无法捕获对准确诊断至关重要的精细且复杂的特征。为了解决这个问题,我们系统地将在五种广泛采用的CNN架构(即VGG16、ResNet18、InceptionV3、DenseNet121和EfficientNetB5)中集成注意力机制,以增强其关注显著区域的能力并提高鉴别性能。具体来说,每个基准模型都增加了Squeeze和Excitation块或混合卷积块注意力模块,实现对通道和空间特征表示的自适应重新校准。所提出的模型在两个不同的医学成像数据集上进行了评估,其中包括多种肿瘤亚型的脑肿瘤MRI数据集和包含四种组织类别的妊娠产物组织病理学数据集。实验结果表明,增强注意力的CNN在所有指标上始终优于基准架构。特别是,具有混合注意力的EfficientNetB5获得了最高的总体性能,并在两个数据集上都实现了显著的收益。除了提高分类精度外,注意力机制还增强了特征定位,从而在各种异构成像模态之间实现了更好的泛化。这项工作为在多种CNN架构中嵌入注意力模块提供了一个系统的比较框架,并严格评估了它们在多个医学成像任务中的影响。研究结果为开发实用、可解释和临床上适用的基于深度学习的决策支持系统提供了实际见解。
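For reference, the standard Squeeze-and-Excitation block mentioned above can be written in a few lines of PyTorch: each channel is "squeezed" by global average pooling and then adaptively re-weighted by a small bottleneck MLP. Where exactly the paper inserts such blocks (or the hybrid CBAM variant) in each backbone is not shown here; this is only the generic SE building block.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool each channel ("squeeze"),
    pass through a small bottleneck MLP ("excitation"), and rescale channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # channel-wise recalibration
```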
论文及项目相关链接
Summary
深度学习在医学图像分析中具有强大的能力,但传统卷积神经网络(CNNs)在捕捉精细粒度和复杂特征方面存在局限性,这对于准确诊断至关重要。为解决这个问题,本文系统地将在五种广泛采用的CNN架构中整合注意力机制,包括VGG16、ResNet18、InceptionV3、DenseNet121和EfficientNetB5,以增强其聚焦显著区域的能力并提高辨别性能。实验结果表明,增强注意力的CNN在各种指标上始终优于基础架构。特别是,带有混合注意力的EfficientNetB5实现最高整体性能,在多个数据集上实现显著改进。除提高分类精度外,注意力机制还提高了特征定位能力,在多种异质成像模态之间实现了更好的泛化。本研究为嵌入注意力模块的系统比较框架及其在多种医学成像任务中的影响评估做出了贡献,为开发稳健、可解释和临床适用的深度学习决策支持系统提供了实际见解。
Key Takeaways
- 深度学习在医学图像分析中具有强大能力,但传统CNN存在捕捉精细粒度和复杂特征的局限性。
- 为提高诊断准确性,研究者在五种CNN架构中整合了注意力机制。
- 注意力增强型CNN在各种指标上优于基础架构。
- EfficientNetB5结合混合注意力机制表现最佳,在多个数据集上实现显著性能提升。
- 注意力机制不仅提高了分类精度,还增强了特征定位能力,提高了模型在不同成像模态之间的泛化性能。
- 本研究为嵌入注意力模块的系统比较框架提供了贡献。
点此查看论文截图



Hessian-Based Lightweight Neural Network HessNet for State-of-the-Art Brain Vessel Segmentation on a Minimal Training Dataset
Authors:Alexandra Bernadotte, Elfimov Nikita, Mikhail Shutov, Ivan Menshikov
Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image segmentation, but their development depends on well-annotated training datasets. However, there is a notable lack of publicly available MRA datasets with detailed brain vessel annotations. To address this gap, we propose a novel semi-supervised learning lightweight neural network with Hessian matrices on board for 3D segmentation of complex structures such as tubular structures, which we named HessNet. The solution is a Hessian-based neural network with only 6000 parameters. HessNet can run on the CPU and significantly reduces the resource requirements for training neural networks. The accuracy of vessel segmentation on a minimal training dataset reaches state-of-the-art results. It helps us create a large, semi-manually annotated brain vessel dataset of brain MRA images based on the IXI dataset (annotated 200 images). Annotation was performed by three experts under the supervision of three neurovascular surgeons after applying HessNet. It provides high accuracy of vessel segmentation and allows experts to focus only on the most complex important cases. The dataset is available at https://git.scinalytics.com/terilat/VesselDatasetPartly.
在脑部磁共振血管造影(MRA)中,对血管进行精确分割对于手术成功至关重要,如动脉瘤修复或搭桥手术。目前,主要通过手动分割或经典方法(如Frangi滤波器)进行标注,但这种方法往往缺乏足够的准确性。神经网络已作为医学图像分割的强大工具出现,但其发展取决于经过良好注释的训练数据集。然而,缺乏带有详细脑血管注释的公开MRA数据集。为了弥补这一空白,我们提出了一种新型半监督学习轻量化神经网络,该网络内置Hessian矩阵,用于对管状结构等复杂结构进行3D分割,我们称之为HessNet。该解决方案是一个基于Hessian的神经网络,仅有6000个参数。HessNet可在CPU上运行,可显著降低训练神经网络所需的资源要求。在最小训练数据集上的血管分割精度达到了最新结果。它帮助我们基于IXI数据集创建了一个大规模的半手动注释的脑部MRA图像脑血管数据集(注释了200张图像)。在神经血管外科医生的监督下,三名专家应用HessNet进行了标注。它提供了高精度的血管分割,使专家能够专注于最复杂的关键病例。数据集可在https://git.scinalytics.com/terilat/VesselDatasetPartly找到。
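HessNet feeds Hessian information into a tiny network; the classical ingredient it builds on is the per-voxel Hessian of a Gaussian-smoothed volume, whose eigenvalues characterize tubular (vessel-like) structures, as in the Frangi filter mentioned above. The sketch below computes only that ingredient and is not the HessNet architecture itself.

```python
import numpy as np
from scipy import ndimage

def hessian_eigenvalues(volume, sigma=1.0):
    """Per-voxel eigenvalues of the Gaussian-smoothed 3D Hessian, the raw
    ingredient of tubular-structure (vessel) filters such as Frangi's.
    Returns an array of shape volume.shape + (3,), eigenvalues sorted ascending."""
    pairs = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
    H = np.empty(volume.shape + (3, 3))
    for i, j in pairs:
        order = [0, 0, 0]
        order[i] += 1
        order[j] += 1                                  # second derivative along axes i, j
        d = ndimage.gaussian_filter(volume, sigma=sigma, order=order)
        H[..., i, j] = d
        H[..., j, i] = d
    return np.linalg.eigvalsh(H)
```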
论文及项目相关链接
PDF 11 pages, 2 figures
摘要
在脑部磁共振血管造影(MRA)中准确分割血管对于动脉瘤修复或搭桥等手术的成功至关重要。目前标注主要依靠手动分割或经典方法(如Frangi滤波器),但准确性往往不足;神经网络为医学图像分割提供了强有力的工具,但其发展依赖于标注良好的训练数据,而带有详细脑血管标注的公开MRA数据集十分缺乏。为弥补这一空白,我们提出了一种新型的半监督学习轻量化神经网络HessNet,其内置Hessian矩阵,用于管状结构等复杂结构的3D分割。该方案是一个基于Hessian的神经网络,仅有6000个参数,可在CPU上运行,显著降低了训练神经网络所需的资源。在极小的训练数据集上,其血管分割精度达到了业界领先水平。借助HessNet,我们基于IXI数据集创建了一个大规模的半手动标注脑血管数据集(标注了200张图像):标注工作由三名专家在三位神经血管外科医生的监督下、在应用HessNet之后完成。这既保证了高精度的血管分割,又使专家能够专注于最复杂、最重要的病例。数据集可在https://git.scinalytics.com/terilat/VesselDatasetPartly获取。
关键见解
- 神经网络技术在医学图像分割中有广泛应用前景,特别是在脑部磁共振血管造影(MRA)中。
- 目前手动分割和经典方法(如Frangi滤波器)在MRA血管标注方面存在准确性问题。
- HessNet是一种新型的半监督学习轻量化神经网络,利用Hessian矩阵进行复杂结构(如管状结构)的3D分割。
- HessNet具有超高的参数效率,仅有6000个参数,且可在CPU上运行,降低资源需求。
- 在有限训练数据集上,HessNet实现了高精度的血管分割,达到业界领先水平。
- 基于HessNet,创建了一个大规模的半手动标注的脑部血管数据集。
点此查看论文截图


Identifying actionable driver mutations in lung cancer using an efficient Asymmetric Transformer Decoder
Authors:Biagio Brattoli, Jack Shi, Jongchan Park, Taebum Lee, Donggeun Yoo, Sergio Pereira
Identifying actionable driver mutations in non-small cell lung cancer (NSCLC) can impact treatment decisions and significantly improve patient outcomes. Despite guideline recommendations, broader adoption of genetic testing remains challenging due to limited availability and lengthy turnaround times. Machine Learning (ML) methods for Computational Pathology (CPath) offer a potential solution; however, research often focuses on only one or two common mutations, limiting the clinical value of these tools and the pool of patients who can benefit from them. This study evaluates various Multiple Instance Learning (MIL) techniques to detect six key actionable NSCLC driver mutations: ALK, BRAF, EGFR, ERBB2, KRAS, and MET ex14. Additionally, we introduce an Asymmetric Transformer Decoder model that employs queries and key-values of varying dimensions to maintain a low query dimensionality. This approach efficiently extracts information from patch embeddings and minimizes overfitting risks, proving highly adaptable to the MIL setting. Moreover, we present a method to directly utilize tissue type in the model, addressing a typical MIL limitation where either all regions or only some specific regions are analyzed, neglecting biological relevance. Our method outperforms top MIL models by an average of 3%, and over 4% when predicting rare mutations such as ERBB2 and BRAF, moving ML-based tests closer to being practical alternatives to standard genetic testing.
识别非小细胞肺癌(NSCLC)中的可操作驱动基因突变可以影响治疗决策,并显著改善患者预后。尽管有指南推荐,但由于可及性有限和检测周期漫长,基因检测的更广泛应用仍然具有挑战性。计算病理学(CPath)中的机器学习(ML)方法提供了潜在的解决方案;然而,相关研究通常只关注一两种常见突变,这限制了这些工具的临床价值以及可从中受益的患者群体。本研究评估了多种多实例学习(MIL)技术,用于检测六种关键的可操作NSCLC驱动基因突变:ALK、BRAF、EGFR、ERBB2、KRAS和MET ex14。此外,我们引入了一种非对称Transformer解码器模型,该模型采用不同维度的查询(query)与键值(key-value),以保持较低的查询维度。这种方法能有效地从图像块(patch)嵌入中提取信息,并将过拟合风险降到最低,非常适应MIL场景。此外,我们提出了一种在模型中直接利用组织类型的方法,解决了典型MIL方法的一个局限,即要么分析所有区域、要么只分析某些特定区域,而忽略生物学相关性。我们的方法平均优于顶级MIL模型约3%,在预测ERBB2和BRAF等罕见突变时超过4%,使基于ML的检测更接近成为标准基因检测的实用替代方案。
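A rough sketch of the asymmetric query/key-value idea in a MIL setting: a few learnable low-dimensional queries cross-attend over patch embeddings (keys projected down to the query dimension, values kept at the patch dimension) to produce a slide-level summary. The dimensions and the single-head form are simplifying assumptions, not the paper's exact decoder.

```python
import torch
import torch.nn as nn

class LowDimQueryPooling(nn.Module):
    """Cross-attention MIL pooling with a few learnable low-dimensional queries.

    Patch embeddings of dimension d_kv are attended by n_q learnable queries of a
    smaller dimension d_q: keys are projected down to d_q, values stay at d_kv.
    """
    def __init__(self, d_kv=768, d_q=128, n_q=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_q, d_q))
        self.key_proj = nn.Linear(d_kv, d_q)
        self.scale = d_q ** -0.5

    def forward(self, patches):                           # patches: (B, N, d_kv)
        k = self.key_proj(patches)                        # (B, N, d_q)
        attn = torch.softmax(self.queries @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, n_q, N)
        return attn @ patches                              # (B, n_q, d_kv) slide-level summary
```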
论文及项目相关链接
PDF Accepted at MICCAI 2025 Workshop COMPAYL
Summary
本文研究利用机器学习技术检测非小细胞肺癌(NSCLC)的六种关键可操作驱动基因突变,包括ALK、BRAF、EGFR、ERBB2、KRAS和MET ex14。研究评估了多种多实例学习(MIL)技术,并引入非对称Transformer解码器模型,该方法能够高效提取信息、降低过拟合风险,并直接利用组织类型,解决了典型的MIL局限。该方法优于顶级MIL模型,尤其在预测ERBB2和BRAF等罕见突变时,使基于ML的检测更有可能成为标准基因检测的实用替代方案。
Key Takeaways
- 机器学习技术在非小细胞肺癌基因突变检测中的应用可以提高治疗决策和患者预后。
- 遗传测试的更广泛采用受到限制,因为可用性和周转时间有限。
- 计算病理学(CPath)的机器学习(ML)方法提供潜在解决方案。
- 研究重点集中在六种关键可操作的非小细胞肺癌(NSCLC)驱动基因突变上。
- 引入非对称Transformer解码器模型,可高效提取信息并降低过拟合风险。
- 该方法直接利用组织类型,解决了典型的MIL限制。
点此查看论文截图





DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model
Authors:Han Zhang, Xiangde Luo, Yong Chen, Kang Li
Annotation variability remains a substantial challenge in medical image segmentation, stemming from ambiguous imaging boundaries and diverse clinical expertise. Traditional deep learning methods producing single deterministic segmentation predictions often fail to capture these annotator biases. Although recent studies have explored multi-rater segmentation, existing methods typically focus on a single perspective – either generating a probabilistic ``gold standard’’ consensus or preserving expert-specific preferences – thus struggling to provide a more omni view. In this study, we propose DiffOSeg, a two-stage diffusion-based framework, which aims to simultaneously achieve both consensus-driven (combining all experts’ opinions) and preference-driven (reflecting experts’ individual assessments) segmentation. Stage I establishes population consensus through a probabilistic consensus strategy, while Stage II captures expert-specific preference via adaptive prompts. Demonstrated on two public datasets (LIDC-IDRI and NPC-170), our model outperforms existing state-of-the-art methods across all evaluated metrics. Source code is available at https://github.com/string-ellipses/DiffOSeg .
标注差异性在医学图像分割中仍然是一个巨大挑战,其根源在于成像边界模糊和临床专业知识多样。传统产生单一确定性分割预测的深度学习方法通常无法捕捉这些标注者偏见。尽管最近有研究表明多评价者分割方法,但现有方法通常只关注单一视角,要么生成概率“金标准”共识,要么保留专家特定偏好,因此很难提供更全面的视图。在这项研究中,我们提出了DiffOSeg,这是一个基于两阶段扩散的框架,旨在同时实现共识驱动(结合所有专家的意见)和偏好驱动(反映专家的个人评估)的分割。第一阶段通过概率共识策略建立人群共识,而第二阶段通过自适应提示捕捉专家特定偏好。在LIDC-IDRI和NPC-170两个公开数据集上的演示表明,我们的模型在所有评估指标上均优于现有最先进的模型。源代码可在https://github.com/string-ellipses/DiffOSeg上找到。
论文及项目相关链接
Summary
医学图像分割中仍存在标注差异性挑战,源于成像边界模糊和临床专家意见多样性。传统深度学习方法产生的单一确定性分割预测往往无法捕捉标注者偏差。本研究提出DiffOSeg框架,旨在同时实现共识驱动和偏好驱动的分割。该框架分为两个阶段,第一阶段通过概率共识策略建立群体共识,第二阶段通过自适应提示捕捉专家特定偏好。在公共数据集上的表现优于现有最先进方法。
Key Takeaways
- 医学图像分割面临标注差异性挑战,成因于成像边界模糊和临床专家意见差异。
- 传统深度学习方法无法有效捕捉标注者偏差。
- DiffOSeg框架分为两个阶段,旨在同时实现共识驱动和偏好驱动的分割。
- 第一阶段通过概率共识策略建立群体共识。
- 第二阶段通过自适应提示捕捉专家特定偏好。
- DiffOSeg在公共数据集上的表现优于现有最先进方法。
点此查看论文截图



A versatile foundation model for cine cardiac magnetic resonance image analysis tasks
Authors:Yunguan Fu, Wenjia Bai, Weixi Yi, Charlotte Manisty, Anish N Bhuva, Thomas A Treibel, James C Moon, Matthew J Clarkson, Rhodri Huw Davies, Yipeng Hu
Here we present a versatile foundation model that can perform a range of clinically-relevant image analysis tasks, including segmentation, landmark localisation, diagnosis, and prognostication. A multi-view convolution-transformer masked autoencoder, named as CineMA, was trained on 15 million cine images from 74,916 subjects. The model was validated on multiple image analysis tasks and compared to existing models on >4,500 images from eight independent datasets with diverse population characteristics, representing the largest benchmark study for cine CMR so far. CineMA consistently outperformed conventional convolutional neural networks (CNNs) in delineating ventricular boundaries and estimating ejection fraction, a key measure of cardiac function. The improved performance was preserved, even when the model only used half of fine-tuning data. CineMA also surpassed CNNs in disease detection and matched their performance in long-axis function measurement. Interestingly, we found that CineMA can also detect cardiac changes in systemic diseases, such as diabetes, hypertension and cancer, and can also predict mortality. Finally, we assessed model fairness and demonstrated consistent model performance across demographic subgroups. These findings highlight CineMA’s accuracy, learning efficiency, adaptability, and fairness, underscoring its potential as a foundation model for automated cardiac image analysis to support clinical workflow and cardiovascular research. All training and inference code and models are made publicly available at https://github.com/mathpluscode/CineMA.
在这里,我们展示了一个通用基础模型,可以执行一系列与临床相关的图像分析任务,包括分割、地标定位、诊断和预后。该模型采用多视角卷积转换器掩码自编码器,命名为CineMA,在来自74916个主体的1500万电影图像上进行训练。该模型在多个图像分析任务上进行了验证,并在来自八个独立数据集的超过4500张图像上与现有模型进行了比较,代表了迄今为止最大的电影CMR基准测试。CineMA在描绘心室边界和估计射血分数方面始终优于传统的卷积神经网络(CNN),射血分数是心脏功能的关键指标。即使模型只使用一半的精调数据,其性能也得以保持。CineMA在疾病检测方面也超越了CNN,并在长轴功能测量方面与CNN性能相匹配。有趣的是,我们发现CineMA还可以检测糖尿病、高血压和癌症等系统性疾病的心脏变化,并可以进行死亡预测。最后,我们对模型公平性进行了评估,并证明了其在各人口亚组中的模型性能表现一致。这些发现强调了CineMA在准确性、学习效率、适应性和公平性方面的优势,突显了其作为自动化心脏图像分析基础模型的潜力,可支持临床工作流程和心血管研究。所有训练和推理代码和模型都公开可在https://github.com/mathpluscode/CineMA获取。
论文及项目相关链接
Summary
本文介绍了一种多功能基础模型CineMA,该模型可在临床相关的图像分析任务中表现出卓越的性能,包括分割、地标定位、诊断和预后预测。CineMA在大量心脏电影图像数据上进行训练,并在多个独立数据集上进行了验证,表现出较高的准确性和泛化能力。此外,CineMA还能检测全身性疾病中的心脏变化并进行死亡预测。研究结果强调了CineMA的准确性、学习效率、适应性和公平性,可作为自动化心脏图像分析的基础模型,支持临床工作和心血管研究。
Key Takeaways
- CineMA是一种多功能的医学图像分析模型,能够执行包括分割、地标定位、诊断和预后预测等任务。
- CineMA在大量心脏电影图像数据上进行训练,并表现出卓越的性能。
- 与现有模型相比,CineMA在心室边界描绘和射血分数估计方面表现更优秀。
- CineMA在疾病检测方面优于CNN,并在长轴功能测量方面与CNN表现相当。
- CineMA能够检测全身性疾病中的心脏变化,并进行死亡预测。
- CineMA具有准确性、学习效率和适应性强的特点。
点此查看论文截图

