Published: 2025-09-24

Updated: 2025-11-27

Word count: 20k

Reading time: 82 min


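⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not rely on them in serious academic settings. They are meant only for preliminary screening before reading the papers.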

Updated 2025-09-24

Towards Seeing Bones at Radio Frequency

Authors:Yiwen Song, Hongyang Li, Kuang Yuan, Ran Bi, Swarun Kumar

Wireless sensing literature has long aspired to achieve X-ray-like vision at radio frequencies. Yet, state-of-the-art wireless sensing literature has yet to generate the archetypal X-ray image: one of the bones beneath flesh. In this paper, we explore MCT, a penetration-based RF-imaging system for imaging bones at mm-resolution, one that significantly exceeds prior penetration-based RF imaging literature. Indeed the long wavelength, significant attenuation and complex diffraction that occur as RF propagates through flesh, have long limited imaging resolution (to several centimeters at best). We address these concerns through a novel penetration-based synthetic aperture algorithm, coupled with a learning-based pipeline to correct for diffraction-induced artifacts. A detailed evaluation of meat models demonstrates a resolution improvement from sub-decimeter to sub-centimeter over prior art in RF penetrative imaging.


Paper and project links

PDF

Summary

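This paper explores MCT, a penetration-based RF imaging system that images bones at millimeter resolution, significantly exceeding prior penetration-based RF imaging work. To address the long wavelength, heavy attenuation, and complex diffraction that RF signals experience while propagating through flesh, the authors propose a novel penetration-based synthetic aperture algorithm combined with a learning-based pipeline that corrects diffraction-induced artifacts. A detailed evaluation on meat models shows resolution improving from sub-decimeter to sub-centimeter over prior RF penetrative imaging.

Key Takeaways

The paper pursues X-ray-like vision for wireless sensing and explores a new RF imaging system, MCT.
MCT images bones at millimeter resolution, significantly exceeding existing penetration-based RF imaging.
RF signals propagating through flesh suffer from long wavelength, heavy attenuation, and complex diffraction.
To address these problems, the paper proposes a novel penetration-based synthetic aperture algorithm.
The algorithm is coupled with a learning-based pipeline that corrects diffraction-induced image artifacts.
A detailed evaluation on meat models shows resolution improving from sub-decimeter to sub-centimeter over prior RF penetrative imaging.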
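
The synthetic aperture idea mentioned above can be illustrated with a generic backprojection sketch. This is not the MCT algorithm: it assumes a reflection-mode, single-frequency, free-space setup with an ideal point target, and it ignores the attenuation and diffraction in flesh that the paper targets. The wavelength, antenna layout, and function names are all illustrative assumptions.

```python
import cmath
import math

WAVELENGTH = 0.05  # metres; illustrative value, not taken from the paper

def echo(antenna, target):
    """Ideal unit-amplitude round-trip phase from one antenna position to a point target."""
    d = math.dist(antenna, target)
    return cmath.exp(-1j * 4.0 * math.pi * d / WAVELENGTH)

def backproject(antennas, echoes, pixel):
    """Undo the round-trip phase expected for `pixel` and sum coherently."""
    total = 0j
    for antenna, s in zip(antennas, echoes):
        d = math.dist(antenna, pixel)
        total += s * cmath.exp(1j * 4.0 * math.pi * d / WAVELENGTH)
    return abs(total)

# 41 antenna positions along the x axis form the synthetic aperture.
antennas = [(0.01 * i, 0.0) for i in range(-20, 21)]
target = (0.0, 0.30)
echoes = [echo(a, target) for a in antennas]

at_target = backproject(antennas, echoes, target)          # phases align
off_target = backproject(antennas, echoes, (0.05, 0.30))   # phases disagree
```

At the true target position every term has zero residual phase, so the sum reaches its maximum (the number of antenna positions); elsewhere the terms partly cancel.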


SmaRT: Style-Modulated Robust Test-Time Adaptation for Cross-Domain Brain Tumor Segmentation in MRI

Authors:Yuanhan Wang, Yifei Chen, Shuo Jiang, Wenjing Yu, Mingxuan Liu, Beining Wu, Jinying Zong, Feiwei Qin, Changmiao Wang, Qiyuan Tian

Reliable brain tumor segmentation in MRI is indispensable for treatment planning and outcome monitoring, yet models trained on curated benchmarks often fail under domain shifts arising from scanner and protocol variability as well as population heterogeneity. Such gaps are especially severe in low-resource and pediatric cohorts, where conventional test-time or source-free adaptation strategies often suffer from instability and structural inconsistency. We propose SmaRT, a style-modulated robust test-time adaptation framework that enables source-free cross-domain generalization. SmaRT integrates style-aware augmentation to mitigate appearance discrepancies, a dual-branch momentum strategy for stable pseudo-label refinement, and structural priors enforcing consistency, integrity, and connectivity. This synergy ensures both adaptation stability and anatomical fidelity under extreme domain shifts. Extensive evaluations on sub-Saharan Africa and pediatric glioma datasets show that SmaRT consistently outperforms state-of-the-art methods, with notable gains in Dice accuracy and boundary precision. Overall, SmaRT bridges the gap between algorithmic advances and equitable clinical applicability, supporting robust deployment of MRI-based neuro-oncology tools in diverse clinical environments. Our source code is available at https://github.com/baiyou1234/SmaRT.


Paper and project links

PDF 11 pages, 6 figures

Summary

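This paper proposes SmaRT, a style-modulated robust test-time adaptation framework that enables source-free cross-domain generalization. By integrating style-aware augmentation, a dual-branch momentum strategy, and structural priors, it maintains adaptation stability and anatomical fidelity under extreme domain shifts. Extensive evaluations on sub-Saharan Africa and pediatric glioma datasets show that SmaRT consistently outperforms state-of-the-art methods in Dice accuracy and boundary precision, narrowing the gap between algorithmic advances and equitable clinical applicability and supporting robust deployment of MRI-based neuro-oncology tools in diverse clinical environments.

Key Takeaways

Reliable brain tumor segmentation in MRI is indispensable for treatment planning and outcome monitoring, but in practice models face domain shifts from scanner and protocol variability and population heterogeneity.
In low-resource and pediatric cohorts, conventional test-time or source-free adaptation strategies suffer from instability and structural inconsistency.
SmaRT uses style-aware augmentation to mitigate appearance discrepancies.
SmaRT uses a dual-branch momentum strategy for stable pseudo-label refinement.
Structural priors enforce consistency, integrity, and connectivity, preserving anatomical fidelity.
On sub-Saharan Africa and pediatric glioma datasets, SmaRT outperforms state-of-the-art methods in Dice accuracy and boundary precision.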
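
The abstract does not say how the momentum branch is updated. One common way to realize a momentum branch in test-time adaptation is an exponential moving average (EMA) of the weights, sketched below. Treat it as an assumption rather than SmaRT's actual update rule; `ema_update` and the toy weight lists are hypothetical.

```python
def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights: momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

teacher = [1.0, 0.0]   # slowly moving branch (hypothetical weights)
student = [0.0, 1.0]   # branch updated directly on test data
teacher = ema_update(teacher, student, momentum=0.9)  # approximately [0.9, 0.1]
```

A high momentum makes the slow branch change gradually, which is why such a branch is often used as the source of pseudo-labels.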


Monte Carlo parameter study for Seyfert AGN-starburst composite galaxies NGC1068 and NGC7469

Authors:Silvia Salvatore, Björn Eichmann, Giacomo Sommani, Santiago del Palacio, Patrik M. Veres, Julia Becker Tjus

Seyfert-starburst composite galaxies host two promising phenomena of non-thermal high-energy radiation. In this regard the IceCube observation of high-energy neutrinos from the direction of the Seyfert-starburst composite galaxy NGC 1068 is not surprising. More recently, another Seyfert-starburst composite galaxy, NGC 7469, has shown hints of neutrino emission at even higher energies. Theoretical investigations could clarify that their so-called AGN corona is the most likely origin of these neutrinos due to the need of being partially $\gamma$-ray opaque. In this work, we present an updated version of our Seyfert-starburst composite model from 2022, that accounts for a proper treatment of the stochastic acceleration processes in the AGN corona and the secondary electrons and positrons from leptonic radiation processes. Moreover, we use a Markov Chain Monte Carlo (MCMC) approach to study the parameter space of these two potential high-energy neutrino sources under consideration of the given prior knowledge. In the case of NGC 1068, we can successfully explain its non-thermal observational features, where both its AGN corona and starburst ring are needed to account for the observations at high energies. In the case of NGC 7469, the high-energy signatures can only be explained assuming a small coronal radius and including external $\gamma\gamma$-pair attenuation. In general, both sources exhibit a strong influence of the $\gamma$-ray opaqueness on the results, highlighting the need for an accurate treatment of the intrinsic coronal X-ray field and the spatial extent of the $\gamma$-ray production site.


Paper and project links

PDF

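Summary

Seyfert-starburst composite galaxies host two promising phenomena of non-thermal high-energy radiation, so the IceCube observation of high-energy neutrinos from the direction of NGC 1068 is not surprising, and NGC 7469 has more recently shown hints of neutrino emission at even higher energies. Theory points to the AGN corona as the most likely origin, because the source must be partially γ-ray opaque. This work updates the authors' Seyfert-starburst composite model from 2022 with a proper treatment of stochastic acceleration in the AGN corona and of the secondary electrons and positrons from leptonic radiation processes, and uses a Markov Chain Monte Carlo (MCMC) approach to study the parameter space of both sources given the available prior knowledge. For NGC 1068, the model explains the non-thermal observational features, with both the AGN corona and the starburst ring needed at high energies. For NGC 7469, the high-energy signatures can only be explained by assuming a small coronal radius and including external γγ pair attenuation. In both sources the γ-ray opaqueness strongly influences the results, which calls for an accurate treatment of the intrinsic coronal X-ray field and the spatial extent of the γ-ray production site.

Key Takeaways

Seyfert-starburst composite galaxies host two promising phenomena of non-thermal high-energy radiation.
IceCube has observed high-energy neutrinos from the direction of NGC 1068, and NGC 7469 shows hints of neutrino emission at even higher energies.
Theoretical investigations point to the AGN corona as the most likely origin of these neutrinos, because the source must be partially γ-ray opaque.
The updated Seyfert-starburst composite model treats stochastic acceleration in the AGN corona and the secondary electrons and positrons from leptonic radiation processes.
An MCMC approach is used to study the parameter space of the two potential neutrino sources.
For NGC 1068, both the AGN corona and the starburst ring are needed to explain the high-energy observations.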
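
To make the MCMC step concrete, here is a minimal random-walk Metropolis sketch on a toy one-dimensional target. It is not the paper's sampler or physical model: the Gaussian `log_posterior`, the step size, and the chain length are illustrative assumptions standing in for a real likelihood times prior.

```python
import math
import random

def log_posterior(x, mean=2.0, sigma=0.5):
    """Toy target: unnormalized log-density of a Gaussian."""
    return -0.5 * ((x - mean) / sigma) ** 2

def metropolis(log_p, start, steps, step_size, seed=0):
    """Random-walk Metropolis sampler returning the full chain."""
    rng = random.Random(seed)
    x, lp = start, log_p(start)
    chain = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        chain.append(x)
    return chain

chain = metropolis(log_posterior, start=0.0, steps=20000, step_size=0.5)
burned = chain[2000:]                      # discard burn-in samples
posterior_mean = sum(burned) / len(burned)
```

After burn-in, the sample mean should sit close to the target mean of 2.0; in a real parameter study each chain entry would be a vector of model parameters.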


Automated Labeling of Intracranial Arteries with Uncertainty Quantification Using Deep Learning

Authors:Javier Bisbal, Patrick Winter, Sebastian Jofre, Aaron Ponce, Sameer A. Ansari, Ramez Abdalla, Michael Markl, Oliver Welin Odeback, Sergio Uribe, Cristian Tejos, Julio Sotelo, Susanne Schnell, David Marlevi

Accurate anatomical labeling of intracranial arteries is essential for cerebrovascular diagnosis and hemodynamic analysis but remains time-consuming and subject to interoperator variability. We present a deep learning-based framework for automated artery labeling from 3D Time-of-Flight Magnetic Resonance Angiography (3D ToF-MRA) segmentations (n=35), incorporating uncertainty quantification to enhance interpretability and reliability. We evaluated three convolutional neural network architectures: (1) a UNet with residual encoder blocks, reflecting commonly used baselines in vascular labeling; (2) CS-Net, an attention-augmented UNet incorporating channel and spatial attention mechanisms for enhanced curvilinear structure recognition; and (3) nnUNet, a self-configuring framework that automates preprocessing, training, and architectural adaptation based on dataset characteristics. Among these, nnUNet achieved the highest labeling performance (average Dice score: 0.922; average surface distance: 0.387 mm), with improved robustness in anatomically complex vessels. To assess predictive confidence, we implemented test-time augmentation (TTA) and introduced a novel coordinate-guided strategy to reduce interpolation errors during augmented inference. The resulting uncertainty maps reliably indicated regions of anatomical ambiguity, pathological variation, or manual labeling inconsistency. We further validated clinical utility by comparing flow velocities derived from automated and manual labels in co-registered 4D Flow MRI datasets, observing close agreement with no statistically significant differences. Our framework offers a scalable, accurate, and uncertainty-aware solution for automated cerebrovascular labeling, supporting downstream hemodynamic analysis and facilitating clinical integration.
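
The Dice score quoted above measures the overlap between a predicted label mask and a reference mask. A minimal sketch, assuming binary masks stored as flat lists of 0/1 values; `dice_score` is an illustrative helper, not code from the paper.

```python
def dice_score(pred, ref):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred = [1, 1, 0, 0, 1]
ref = [1, 0, 0, 0, 1]
score = dice_score(pred, ref)  # 2 * 2 / (3 + 2) = 0.8
```

For multi-label artery annotation, a score like this would typically be computed per label and then averaged.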


Paper and project links

PDF 16 pages, 6 figures

Summary

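This paper presents a deep learning-based framework for automated labeling of intracranial arteries from 3D Time-of-Flight Magnetic Resonance Angiography (3D ToF-MRA) segmentations, with uncertainty quantification to improve interpretability and reliability. Of the three convolutional neural network architectures evaluated, nnUNet achieved the highest labeling performance (average Dice score 0.922, average surface distance 0.387 mm), with improved robustness in anatomically complex vessels. Using test-time augmentation and a novel coordinate-guided strategy, the framework produces uncertainty maps that reliably indicate regions of anatomical ambiguity, pathological variation, or manual labeling inconsistency. Clinical utility was validated by comparing flow velocities derived from automated and manual labels in co-registered 4D Flow MRI datasets, which agreed closely with no statistically significant differences. The framework offers a scalable, accurate, and uncertainty-aware solution that supports downstream hemodynamic analysis and clinical integration.

Key Takeaways

Accurate anatomical labeling of intracranial arteries is essential for cerebrovascular diagnosis and hemodynamic analysis.
The paper presents a deep learning-based framework for automated artery labeling from 3D ToF-MRA segmentations.
Uncertainty quantification is incorporated to improve interpretability and reliability.
Of three convolutional neural network architectures evaluated, nnUNet performed best, with improved robustness in anatomically complex vessels.
Test-time augmentation with a coordinate-guided strategy yields uncertainty maps that indicate ambiguous or inconsistently labeled regions.
Clinical utility was validated by comparing flow velocities derived from automated and manual labels.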
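
Test-time augmentation (TTA) uncertainty can be sketched as voting across augmented predictions. The example below assumes each augmented prediction has already been mapped back to the original voxel order. It uses plain majority voting and a disagreement fraction, which is a simplification and does not reproduce the paper's coordinate-guided strategy. `tta_vote` is a hypothetical helper.

```python
from collections import Counter

def tta_vote(predictions):
    """Combine per-augmentation label lists into majority labels and an uncertainty score.

    predictions: one list of voxel labels per augmented inference pass.
    Uncertainty is the fraction of passes that disagree with the majority label.
    """
    n_aug = len(predictions)
    labels, uncertainty = [], []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        labels.append(label)
        uncertainty.append(1.0 - count / n_aug)
    return labels, uncertainty

# Three augmented passes over three voxels (toy artery label IDs).
passes = [[1, 2, 3], [1, 2, 2], [1, 3, 2]]
labels, uncertainty = tta_vote(passes)
```

Voxels where all passes agree get an uncertainty of 0, and voxels with split votes get higher values, which is the kind of signal an uncertainty map visualizes.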


MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson’s Disease with Limited 3D MR Data

Authors:Ding Shaodong, Liu Ziyang, Zhou Yijun, Liu Tao

The automatic diagnosis of Parkinson’s disease is in high clinical demand due to its prevalence and the importance of targeted treatment. Current clinical practice often relies on diagnostic biomarkers in QSM and NM-MRI images. However, the lack of large, high-quality datasets makes training diagnostic models from scratch prone to overfitting. Adapting pre-trained 3D medical models is also challenging, as the diversity of medical imaging leads to mismatches in voxel spacing and modality between pre-training and fine-tuning data. In this paper, we address these challenges by leveraging 2D vision foundation models (VFMs). Specifically, we crop multiple key ROIs from NM and QSM images, process each ROI through separate branches to compress the ROI into a token, and then combine these tokens into a unified patient representation for classification. Within each branch, we use 2D VFMs to encode axial slices of the 3D ROI volume and fuse them into the ROI token, guided by an auxiliary segmentation head that steers the feature extraction toward specific brain nuclei. Additionally, we introduce multi-ROI supervised contrastive learning, which improves diagnostic performance by pulling together representations of patients from the same class while pushing away those from different classes. Our approach achieved first place in the MICCAI 2025 PDCADxFoundation challenge, with an accuracy of 86.0% trained on a dataset of only 300 labeled QSM and NM-MRI scans, outperforming the second-place method by 5.5%. These results highlight the potential of 2D VFMs for clinical analysis of 3D MR images.


Paper and project links

PDF First-place solution of the classification track for MICCAI’2025 PDCADxFoundation Challenge

Summary
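This paper uses 2D vision foundation models (VFMs) to address the challenges of automatic Parkinson's disease diagnosis. Multiple key ROIs are cropped from NM and QSM images, each ROI is compressed into a token by its own branch, and the tokens are combined into a unified patient representation for classification. Multi-ROI supervised contrastive learning further improves diagnostic performance. The approach took first place in the MICCAI 2025 PDCADxFoundation challenge with an accuracy of 86.0%, trained on only 300 labeled QSM and NM-MRI scans and outperforming the second-place method by 5.5%, which highlights the potential of 2D VFMs for clinical analysis of 3D MR images.

Key Takeaways

Automatic diagnosis of Parkinson's disease is in high clinical demand, and current practice relies on diagnostic biomarkers in QSM and NM-MRI images.
The lack of large, high-quality datasets makes diagnostic models trained from scratch prone to overfitting.
2D vision foundation models (VFMs) are leveraged because adapting pre-trained 3D medical models is hampered by mismatches in voxel spacing and modality.
Key ROIs (regions of interest) are cropped and processed into tokens that are combined into a unified patient representation for classification.
Multi-ROI supervised contrastive learning improves diagnostic performance by pulling together representations of patients from the same class and pushing apart those from different classes.
The method took first place in the MICCAI 2025 PDCADxFoundation challenge with 86.0% accuracy on a limited dataset, showing the potential of 2D VFMs for analyzing 3D MR images.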
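
The "pull together, push apart" behaviour of supervised contrastive learning can be sketched with the standard supervised contrastive loss on a handful of embeddings. This is a generic single-view version in plain Python. The multi-ROI variant used in the paper is not specified in the abstract, so the function name, temperature, and toy embeddings are assumptions.

```python
import math

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over non-zero embedding vectors (lists of floats)."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner are skipped
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

patients = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy patient representations
loss_matching = supcon_loss(patients, [0, 0, 1, 1])     # classes match the clusters
loss_mismatched = supcon_loss(patients, [0, 1, 0, 1])   # classes cut across the clusters
```

The loss is low when same-class patients already sit close together and high when they do not, so minimizing it drives the representation toward class-wise clusters.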


Multimodal Medical Image Classification via Synergistic Learning Pre-training

Authors:Qinghua Lin, Guang-Hai Liu, Zuoyong Li, Yang Li, Yuting Jiang, Xiang Wu

Multimodal pathological images are usually used in clinical diagnosis, but computer vision-based multimodal image-assisted diagnosis faces challenges with modality fusion, especially in the absence of expert-annotated data. To achieve the modality fusion in multimodal images with label scarcity, we propose a novel "pretraining + fine-tuning" framework for multimodal semi-supervised medical image classification. Specifically, we propose a synergistic learning pretraining framework of consistency, reconstructive, and aligned learning. By treating one modality as an augmented sample of another modality, we implement self-supervised pre-training, enhancing the baseline model’s feature representation capability. Then, we design a fine-tuning method for multimodal fusion. During the fine-tuning stage, we set different encoders to extract features from the original modalities and provide a multimodal fusion encoder for the fused modality. In addition, we propose a distribution shift method for multimodal fusion features, which alleviates the prediction uncertainty and overfitting risks caused by the lack of labeled samples. We conduct extensive experiments on the publicly available gastroscopy image datasets Kvasir and Kvasirv2. Quantitative and qualitative results demonstrate that the proposed method outperforms the current state-of-the-art classification methods. The code will be released at: https://github.com/LQH89757/MICS.


Paper and project links

PDF

Summary

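Multimodal pathological images are common in clinical diagnosis, but multimodal fusion is challenging when expert-annotated data are scarce. The paper proposes a novel "pretraining + fine-tuning" framework for multimodal semi-supervised medical image classification. A synergistic pretraining framework combining consistency, reconstructive, and aligned learning strengthens the baseline model's feature representation, and a fine-tuning method then performs multimodal fusion. Experiments on the publicly available gastroscopy image datasets Kvasir and Kvasirv2 show that the method outperforms current state-of-the-art classification methods.

Key Takeaways

Multimodal pathological images are common in clinical diagnosis, but modality fusion is challenging without expert-annotated data.
A "pretraining + fine-tuning" framework is proposed for multimodal semi-supervised medical image classification.
The synergistic pretraining framework combines consistency, reconstructive, and aligned learning to improve feature representation.
The fine-tuning stage uses separate encoders for the original modalities and a multimodal fusion encoder for the fused modality.
A distribution shift method for multimodal fusion features alleviates prediction uncertainty and overfitting risk.
Extensive experiments on the public Kvasir and Kvasirv2 datasets show that the method outperforms current state-of-the-art classification methods.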


Medical AI Consensus: A Multi-Agent Framework for Radiology Report Generation and Evaluation

Authors:Ahmed T. Elboardy, Ghada Khoriba, Essam A. Rashed

Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. We introduce a multi-agent reinforcement learning framework that serves as both a benchmark and evaluation environment for multimodal clinical reasoning in the radiology ecosystem. The proposed framework integrates large language models (LLMs) and large vision models (LVMs) within a modular architecture composed of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. This design enables fine-grained assessment at both the agent level (e.g., detection and segmentation accuracy) and the consensus level (e.g., report quality and clinical relevance). We demonstrate an implementation using chatGPT-4o on public radiology datasets, where LLMs act as evaluators alongside medical radiologist feedback. By aligning evaluation protocols with the LLM development lifecycle, including pretraining, finetuning, alignment, and deployment, the proposed benchmark establishes a path toward trustworthy deviance-based radiology report generation.


Paper and project links

PDF NeurIPS2025 Workshop: Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling

Summary
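Automating radiology report generation poses a dual challenge: building clinically reliable systems and designing rigorous evaluation protocols. The paper introduces a multi-agent reinforcement learning framework that serves as both a benchmark and an evaluation environment for multimodal clinical reasoning in the radiology ecosystem. It integrates large language models (LLMs) and large vision models (LVMs) in a modular architecture of ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation. The design enables fine-grained assessment at the agent level (for example, detection and segmentation accuracy) and at the consensus level (for example, report quality and clinical relevance). In an implementation using ChatGPT-4o on public radiology datasets, LLMs act as evaluators alongside feedback from medical radiologists. By aligning evaluation protocols with the LLM development lifecycle (pretraining, finetuning, alignment, and deployment), the benchmark establishes a path toward trustworthy deviance-based radiology report generation.

Key Takeaways

Automated radiology report generation requires both clinically reliable systems and rigorous evaluation protocols.
A multi-agent reinforcement learning framework serves as both a benchmark and an evaluation environment for radiology.
The framework integrates large language models and large vision models in a modular architecture.
It comprises ten specialized agents responsible for image analysis, feature extraction, report generation, review, and evaluation.
Fine-grained assessment is possible at both the agent level and the consensus level.
The framework is implemented with ChatGPT-4o on public radiology datasets.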


Viscoelastic properties of tumor spheroids revealed by a microfluidic compression device and a modified power law model

Authors:Mrinal Pandey, Bangguo Zhu, Kaitlyn Roach, Young Joon Suh, Jeffrey E Segall, Chung-Yuen Hui, Mingming Wu

Clinically, palpation is one of the important diagnostic methods to assess tumor malignancy. In laboratory research, it is well accepted that the bulk stiffness of the tumor and the surrounding tissue is closely correlated with the malignant state of the tumor. Here, we postulate that, in addition to tumor stiffness, tumor viscoelasticity - the fact that tumor tissue takes time to bounce back after compression, can also be used to evaluate the tumor malignancy state. In this work, we characterized the viscoelastic properties of breast tumor spheroids using a recently developed microfluidic compression device and a theoretical power law model. Breast tumor cells at varying malignant levels; a non-tumorigenic epithelial (MCF10A), moderately malignant tumor (MCF7) and triple negative metastatic tumor (MDA-MB-231) cells were used. Spheroids embedded within a 3D extracellular matrix were periodically compressed, and their strain responses were recorded using microscopic imaging. Our results revealed that the measured strain relaxation curves can be successfully described by a modified power law model, demonstrated that non-tumorigenic tumor spheroids were more elastic, exhibited shorter relaxation time and less plasticity than those of tumorigenic spheroids. Together, these results highlight that viscoelastic properties in addition to bulk stiffness of the tumor spheroids can serve as a complementary mechanical biomarker of tumor malignancy and demonstrate the validity of a modified power law model for the mechanical characterization of a living tissue.


Paper and project links

PDF This manuscript contains 14 pages, 4 figures, along with a Supplementary Information

Summary
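This study examines how the viscoelastic properties of tumor spheroids relate to tumor malignancy. Using a microfluidic compression device and a theoretical power law model, spheroids of breast cells at three malignancy levels, embedded in a 3D extracellular matrix, were periodically compressed and their strain responses recorded. Non-tumorigenic spheroids were more elastic, with shorter relaxation times and less plasticity than tumorigenic spheroids. The results indicate that viscoelastic properties can serve as a complementary mechanical biomarker of tumor malignancy alongside bulk stiffness, and they support the validity of a modified power law model for the mechanical characterization of living tissue.

Key Takeaways

The authors postulate that, in addition to bulk stiffness, tumor viscoelasticity (the time tumor tissue takes to bounce back after compression) can be used to evaluate malignancy.
The viscoelastic properties of breast tumor spheroids were characterized with a microfluidic compression device and a theoretical power law model.
Three cell lines of varying malignancy were studied: non-tumorigenic epithelial (MCF10A), moderately malignant tumor (MCF7), and triple-negative metastatic tumor (MDA-MB-231) cells.
Spheroids embedded in a 3D extracellular matrix were periodically compressed and their strain responses recorded by microscopic imaging.
Non-tumorigenic spheroids were more elastic, with shorter relaxation times and less plasticity than tumorigenic spheroids.
Viscoelastic properties of tumor spheroids can serve as a complementary mechanical biomarker of tumor malignancy.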
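
Fitting a power law to a relaxation curve can be sketched as a straight-line fit in log-log space. The abstract does not give the form of the modified power law model, so this example assumes the plain form strain = A * t^(-beta) and uses synthetic data rather than measurements. `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(times, strains):
    """Least-squares fit of strain = A * t**(-beta) via a line in log-log space."""
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in strains]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (A, beta)

times = [1.0, 2.0, 4.0, 8.0, 16.0]
strains = [0.2 * t ** -0.3 for t in times]  # synthetic curve with A = 0.2, beta = 0.3
A, beta = fit_power_law(times, strains)
```

On noise-free synthetic data the fit recovers the generating parameters. With real strain recordings, a larger exponent would correspond to faster relaxation under this simplified model.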


eROSITA-selection of new period-bounce Cataclysmic Variables: First follow-up confirmation using TESS and SDSS

Authors:Daniela Muñoz-Giraldo, Beate Stelzer, Axel Schwope, Santiago Hernández-Díaz, Scott F. Anderson, Sebastian Demasi

Between 40% and 80% of cataclysmic variables (CVs) are expected to have evolved past the period-minimum and contain a degenerate donor. However, observational surveys for CVs have only been able to detect a few of these highly evolved “period-bouncers”, most likely due to the intrinsic faintness associated with their predicted low mass accretion rates. We have produced an initial selection of 137 high-likelihood period-bounce candidates from WD catalog based on our multiwavelength period-bouncer scorecard and selection cuts including X-ray data from the extended ROentgen Survey with an Imaging Telescope Array (eROSITA) on board the Spektrum-Roentgen-Gamma spacecraft (SRG). We have laid out three main requirements (classification as a CV, determination of an orbital period and detection of a very late-type donor) that should result in the confirmation of several of these candidates. Our path for confirming these candidates has already produced its first successful result with the confirmation of GALEX J125751.4-283015 as a new period-bouncer. Several other candidates have already fulfilled at least one of our three requirements making their future confirmation likely. Our search for period-bouncers using the X-ray eROSITA emission of objects in optical WD catalogs has led to the confirmation of six new period-bouncers identified from the Gaia DR3 WD catalog (five previously known CVs and one WD candidate), an 18% increase that brings the present population to 39 systems. Both the selection method for period-bounce candidates and the confirmation path that we outlined will aid in future searches for new period-bounce candidates, contributing to the goal of resolving the discrepancy between the predicted high number of period-bouncers and the low number of these systems successfully observed to date.


Paper and project links

PDF 15 pages, 12 figures. Submitted to A&A

Summary

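This paper studies period-bounce candidates among cataclysmic variables (CVs). Between 40% and 80% of CVs are expected to have evolved past the period minimum and to contain a degenerate donor, yet surveys have detected only a few of these highly evolved "period-bouncers", most likely because of the intrinsic faintness associated with their predicted low mass accretion rates. Based on a multiwavelength period-bouncer scorecard and selection cuts that include eROSITA X-ray data from the SRG spacecraft, the authors selected 137 high-likelihood period-bounce candidates from a WD catalog. Their search has confirmed six new period-bouncers from the Gaia DR3 WD catalog (five previously known CVs and one WD candidate), an 18% increase that brings the known population to 39 systems. The selection method and confirmation path will aid future searches and help resolve the discrepancy between the predicted and observed numbers of period-bouncers.

Key Takeaways

Between 40% and 80% of cataclysmic variables (CVs) are expected to have evolved past the period minimum and to contain a degenerate donor.
Surveys have detected only a few "period-bouncers", most likely because of the intrinsic faintness associated with their predicted low mass accretion rates.
High-likelihood period-bounce candidates were selected from a WD catalog using a multiwavelength period-bouncer scorecard and selection cuts.
Using eROSITA X-ray data from the SRG spacecraft, six new period-bouncers were confirmed.
The selection method and confirmation path will aid future searches for new period-bounce candidates.
The work contributes to resolving the discrepancy between the predicted and observed numbers of period-bouncers.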
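
The 18% figure in the abstract follows from the two numbers it reports: six new confirmations and a resulting population of 39. The short check below infers the earlier population as 39 - 6 = 33. That value is derived here, not stated in the abstract.

```python
new_confirmed = 6       # new period-bouncers reported in the abstract
population_after = 39   # present population reported in the abstract

population_before = population_after - new_confirmed            # inferred: 33
increase_percent = 100.0 * new_confirmed / population_before    # about 18.2
```

Rounded to the nearest percent this gives the 18% increase quoted by the authors.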


Echo-Path: Pathology-Conditioned Echo Video Generation

Authors:Kabir Hamzah Muhammad, Marawan Elbatel, Yi Qin, Xiaomeng Li

Cardiovascular diseases (CVDs) remain the leading cause of mortality globally, and echocardiography is critical for diagnosis of both common and congenital cardiac conditions. However, echocardiographic data for certain pathologies are scarce, hindering the development of robust automated diagnosis models. In this work, we propose Echo-Path, a novel generative framework to produce echocardiogram videos conditioned on specific cardiac pathologies. Echo-Path can synthesize realistic ultrasound video sequences that exhibit targeted abnormalities, focusing here on atrial septal defect (ASD) and pulmonary arterial hypertension (PAH). Our approach introduces a pathology-conditioning mechanism into a state-of-the-art echo video generator, allowing the model to learn and control disease-specific structural and motion patterns in the heart. Quantitative evaluation demonstrates that the synthetic videos achieve low distribution distances, indicating high visual fidelity. Clinically, the generated echoes exhibit plausible pathology markers. Furthermore, classifiers trained on our synthetic data generalize well to real data and, when used to augment real training sets, it improves downstream diagnosis of ASD and PAH by 7% and 8% respectively. Code, weights and dataset are available here https://github.com/Marshall-mk/EchoPathv1


Paper and project links

PDF 10 pages, 3 figures, MICCAI-AMAI2025 Workshop

Summary

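This paper proposes Echo-Path, a novel generative framework that produces echocardiogram videos conditioned on specific cardiac pathologies. It synthesizes realistic ultrasound video sequences exhibiting targeted abnormalities, here atrial septal defect (ASD) and pulmonary arterial hypertension (PAH). A pathology-conditioning mechanism is introduced into a state-of-the-art echo video generator so that the model can learn and control disease-specific structural and motion patterns in the heart. The synthetic videos achieve low distribution distances, indicating high visual fidelity, and exhibit plausible pathology markers. Classifiers trained on the synthetic data generalize well to real data, and augmenting real training sets with it improves downstream diagnosis of ASD and PAH by 7% and 8% respectively.

Key Takeaways

Echo-Path is a generative framework that produces echocardiogram videos conditioned on specific cardiac pathologies.
It synthesizes realistic ultrasound video sequences exhibiting targeted abnormalities such as atrial septal defect and pulmonary arterial hypertension.
Echo-Path introduces a pathology-conditioning mechanism into a state-of-the-art echo video generator.
The model can learn and control disease-specific structural and motion patterns in the heart.
The synthetic videos have high visual fidelity and exhibit plausible pathology markers.
Classifiers trained on the synthetic data generalize well to real data.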


Ambiguous Medical Image Segmentation Using Diffusion Schrödinger Bridge

Authors:Lalith Bharadwaj Baru, Kamalaker Dadi, Tapabrata Chakraborti, Raju S. Bapi

Accurate segmentation of medical images is challenging due to unclear lesion boundaries and mask variability. We introduce Segmentation Schrödinger Bridge (SSB), the first application of Schrödinger Bridge for ambiguous medical image segmentation, modelling joint image-mask dynamics to enhance performance. SSB preserves structural integrity, delineates unclear boundaries without additional guidance, and maintains diversity using a novel loss function. We further propose the Diversity Divergence Index ($D_{DDI}$) to quantify inter-rater variability, capturing both diversity and consensus. SSB achieves state-of-the-art performance on LIDC-IDRI, COCA, and RACER (in-house) datasets.


Paper and project links

PDF MICCAI 2025 (11 pages, 2 figures, 1 table, and 26 references)

Summary

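Medical image segmentation is challenging because of unclear lesion boundaries and mask variability. The paper introduces Segmentation Schrödinger Bridge (SSB), the first application of the Schrödinger Bridge to ambiguous medical image segmentation, which models joint image-mask dynamics to enhance performance. SSB preserves structural integrity, delineates unclear boundaries without additional guidance, and maintains diversity using a novel loss function. The authors also propose the Diversity Divergence Index ($D_{DDI}$) to quantify inter-rater variability, capturing both diversity and consensus. SSB achieves state-of-the-art performance on the LIDC-IDRI, COCA, and RACER (in-house) datasets.

Key Takeaways

Medical image segmentation is challenging because of unclear lesion boundaries and mask variability.
Segmentation Schrödinger Bridge (SSB) is the first application of the Schrödinger Bridge to ambiguous medical image segmentation.
SSB models joint image-mask dynamics to enhance segmentation performance.
SSB delineates unclear boundaries without additional guidance.
SSB maintains diversity using a novel loss function.
The Diversity Divergence Index ($D_{DDI}$) is proposed to quantify inter-rater variability.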


Accurate Thyroid Cancer Classification using a Novel Binary Pattern Driven Local Discrete Cosine Transform Descriptor

Authors:Saurabh Saini, Kapil Ahuja, Marc C. Steinbach, Thomas Wick

In this study, we develop a new CAD system for accurate thyroid cancer classification with emphasis on feature extraction. Prior studies have shown that thyroid texture is important for segregating the thyroid ultrasound images into different classes. Based upon our experience with breast cancer classification, we first conjecture that the Discrete Cosine Transform (DCT) is the best descriptor for capturing textural features. Thyroid ultrasound images are particularly challenging as the gland is surrounded by multiple complex anatomical structures leading to variations in tissue density. Hence, we second conjecture the importance of localization and propose that the Local DCT (LDCT) descriptor captures the textural features best in this context. Another disadvantage of complex anatomy around the thyroid gland is scattering of ultrasound waves resulting in noisy and unclear textures. Hence, we third conjecture that one image descriptor is not enough to fully capture the textural features and propose the integration of another popular texture capturing descriptor (Improved Local Binary Pattern, ILBP) with LDCT. ILBP is known to be noise resilient as well. We term our novel descriptor as Binary Pattern Driven Local Discrete Cosine Transform (BPD-LDCT). Final classification is carried out using a non-linear SVM. The proposed CAD system is evaluated on the only two publicly available thyroid cancer datasets, namely TDID and AUITD. The evaluation is conducted in two stages. In Stage I, thyroid nodules are categorized as benign or malignant. In Stage II, the malignant cases are further sub-classified into TI-RADS (4) and TI-RADS (5). For Stage I classification, our proposed model demonstrates exceptional performance of nearly 100% on TDID and 97% on AUITD. In Stage II classification, the proposed model again attains excellent classification of close to 100% on TDID and 99% on AUITD.


Paper and project links

PDF 15 Pages, 7 Figures, 5 Tables

Summary

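This study develops a new CAD system for accurate thyroid cancer classification with an emphasis on feature extraction. It proposes a novel descriptor, Binary Pattern Driven Local Discrete Cosine Transform (BPD-LDCT), which integrates the Local DCT (LDCT) descriptor with the Improved Local Binary Pattern (ILBP) to capture textural features in thyroid ultrasound images. Final classification uses a non-linear SVM. On the public TDID and AUITD datasets, the system performs strongly in both stages: nearly 100% on TDID and 97% on AUITD for benign versus malignant classification, and close to 100% on TDID and 99% on AUITD for sub-classifying malignant cases into TI-RADS (4) and TI-RADS (5).

Key Takeaways

The study develops a new CAD system for thyroid cancer classification.
The emphasis is on feature extraction: the Discrete Cosine Transform (DCT) is applied locally as the Local DCT (LDCT) descriptor to capture textural features.
Because the complex anatomy around the thyroid gland scatters ultrasound waves and produces noisy, unclear textures, the noise-resilient Improved Local Binary Pattern (ILBP) is integrated with LDCT.
Final classification is carried out using a non-linear SVM.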
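
The two texture ingredients named above can be sketched in plain Python: a tile-wise (local) 2-D DCT and a basic 8-neighbor local binary pattern. This is only an illustration. The paper uses the Improved LBP variant and a specific way of combining it with LDCT, neither of which is described in the abstract, so the functions below are hypothetical stand-ins.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for x in range(n):
                for y in range(n):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * acc
    return out

def local_dct(image, size):
    """Apply dct2 to each non-overlapping size x size tile, row by row."""
    tiles = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            tiles.append(dct2([row[c:c + size] for row in image[r:r + size]]))
    return tiles

def lbp_code(patch):
    """Basic LBP of a 3x3 patch: one bit per neighbor that is >= the center."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left corner.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code

image = [
    [3, 3, 9, 0],
    [3, 3, 0, 0],
    [1, 2, 5, 5],
    [3, 4, 5, 5],
]
tiles = local_dct(image, 2)                      # four 2x2 coefficient blocks
code = lbp_code([row[0:3] for row in image[0:3]])  # LBP of the top-left 3x3 patch
```

A flat tile puts all of its energy in the first (DC) coefficient, while textured tiles spread energy into the higher-frequency coefficients, which is what makes DCT coefficients usable as texture features.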


TF-DWGNet: A Directed Weighted Graph Neural Network with Tensor Fusion for Multi-Omics Cancer Subtype Classification

Authors:Tiantian Yang, Zhiqian Chen

Integration and analysis of multi-omics data provide valuable insights for cancer subtype classification. However, such data are inherently heterogeneous, high-dimensional, and exhibit complex intra- and inter-modality dependencies. Recent advances in graph neural networks (GNNs) offer powerful tools for modeling such structure. Yet, most existing methods rely on prior knowledge or predefined similarity networks to construct graphs, which are often undirected or unweighted, failing to capture the directionality and strength of biological interactions. Interpretability at both the modality and feature levels also remains limited. To address these challenges, we propose TF-DWGNet, a novel Graph Neural Network framework that combines tree-based Directed Weighted graph construction with Tensor Fusion for multiclass cancer subtype classification. TF-DWGNet introduces two key innovations: a supervised tree-based approach for constructing directed, weighted graphs tailored to each omics modality, and a tensor fusion mechanism that captures unimodal, bimodal, and trimodal interactions using low-rank decomposition for efficiency. TF-DWGNet enables modality-specific representation learning, joint embedding fusion, and interpretable subtype prediction. Experiments on real-world cancer datasets show that TF-DWGNet consistently outperforms state-of-the-art baselines across multiple metrics and statistical tests. Moreover, it provides biologically meaningful insights by ranking influential features and modalities. These results highlight TF-DWGNet’s potential for effective and interpretable multi-omics integration in cancer research.

Paper and project links

PDF 9 pages, 4 figures, 4 tables

Summary

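This paper proposes TF-DWGNet, a new graph neural network framework that combines tree-based directed weighted graph construction with tensor fusion for multiclass cancer subtype classification. TF-DWGNet addresses the challenges of multi-omics integration with two components: a supervised, tree-based method that builds a directed, weighted graph for each omics modality, and a tensor fusion mechanism that captures unimodal, bimodal, and trimodal interactions. Experiments on real-world cancer datasets show that TF-DWGNet outperforms state-of-the-art baselines and yields biologically meaningful insights.

Key Takeaways

TF-DWGNet is a new graph neural network framework for multiclass cancer subtype classification.
It combines tree-based directed weighted graph construction with tensor fusion to handle the heterogeneity, high dimensionality, and complex intra- and inter-modality dependencies of multi-omics data.
By building a directed, weighted graph for each omics modality with a supervised tree-based method, TF-DWGNet avoids the reliance on prior knowledge or predefined similarity networks found in most existing methods.
Its tensor fusion mechanism captures unimodal, bimodal, and trimodal interactions and uses low-rank decomposition for efficiency.
TF-DWGNet enables modality-specific representation learning, joint embedding fusion, and interpretable subtype prediction.
Experiments on real-world cancer datasets show that TF-DWGNet outperforms state-of-the-art baselines.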
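To see why one tensor can hold unimodal, bimodal, and trimodal terms at once, here is a minimal sketch of the full (not low-rank) outer-product form of tensor fusion. It assumes the common trick of prefixing each modality embedding with a constant 1; the function name `tensor_fusion` and the toy one-dimensional embeddings are hypothetical, and the paper's low-rank decomposition is not reproduced here.

```python
def tensor_fusion(a, b, c):
    """Outer product of three embeddings, each prefixed with a constant 1.

    Index 0 along an axis selects the constant, so entries with one, two,
    or three non-zero indices are unimodal, bimodal, or trimodal terms.
    """
    a1, b1, c1 = [1.0] + list(a), [1.0] + list(b), [1.0] + list(c)
    return [[[x * y * z for z in c1] for y in b1] for x in a1]

# Toy one-dimensional embeddings for three omics modalities (illustrative values).
fused = tensor_fusion([2.0], [3.0], [5.0])
unimodal_a = fused[1][0][0]   # 2.0
bimodal_ab = fused[1][1][0]   # 2.0 * 3.0
trimodal = fused[1][1][1]     # 2.0 * 3.0 * 5.0
```

The full tensor grows as the product of the three embedding sizes, which is the cost a low-rank factorisation is meant to avoid.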

R-Net: A Reliable and Resource-Efficient CNN for Colorectal Cancer Detection with XAI Integration

Authors:Rokonozzaman Ayon, Md Taimur Ahad, Bo Song, Yan Li

State-of-the-art (SOTA) Convolutional Neural Networks (CNNs) are criticized for their extensive computational power, long training times, and large datasets. To overcome this limitation, we propose a reasonable network (R-Net), a lightweight CNN only to detect and classify colorectal cancer (CRC) using the Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset (EBHI). Furthermore, six SOTA CNNs, including Multipath-based CNNs (DenseNet121, ResNet50), Depth-based CNNs (InceptionV3), width-based multi-connection CNNs (Xception), depth-wise separable convolutions (MobileNetV2), spatial exploitation-based CNNs (VGG16), Transfer learning, and two ensemble models are also tested on the same dataset. The ensemble models are a multipath-depth-width combination (DenseNet121-InceptionV3-Xception) and a multipath-depth-spatial combination (ResNet18-InceptionV3-VGG16). However, the proposed R-Net lightweight achieved 99.37% accuracy, outperforming MobileNet (95.83%) and ResNet50 (96.94%). Most importantly, to understand the decision-making of R-Net, Explainable AI such as SHAP, LIME, and Grad-CAM are integrated to visualize which parts of the EBHI image contribute to the detection and classification process of R-Net. The main novelty of this research lies in building a reliable, lightweight CNN R-Net that requires fewer computing resources yet maintains strong prediction results. SOTA CNNs, transfer learning, and ensemble models also extend our knowledge on CRC classification and detection. XAI functionality and the impact of pixel intensity on correct and incorrect classification images are also some novelties in CRC detection and classification.

Paper and project links

PDF

Summary

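This study addresses the drawbacks of existing convolutional neural networks (CNNs) for colorectal cancer (CRC) detection and classification by proposing R-Net, a lightweight CNN. On the same dataset, R-Net reaches 99.37% accuracy, higher than MobileNet (95.83%) and ResNet50 (96.94%). The study also integrates Explainable AI (XAI) techniques to analyze how R-Net makes its decisions. Its main novelty is a reliable, lightweight CNN that needs fewer computing resources while keeping strong predictive performance. The comparison with other models also broadens what is known about CRC detection and classification.

Key Takeaways

To counter the heavy computation, long training times, and large dataset requirements of existing CNNs for colorectal cancer detection and classification, the study proposes R-Net, a lightweight CNN.
On the Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset (EBHI), R-Net reaches 99.37% accuracy, above MobileNet (95.83%) and ResNet50 (96.94%).
Explainable AI techniques (SHAP, LIME, and Grad-CAM) are used to analyze R-Net's decisions and show which image regions the model relies on.
The main novelty is a CNN that is both reliable and lightweight, lowering compute requirements while keeping high predictive performance.
The study also evaluates several SOTA CNNs, transfer learning, and ensemble models on CRC detection and classification, extending knowledge in this area.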
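The summary mentions ensembles of three CNNs but does not say how their outputs are combined. One common option is soft voting, sketched below as an assumption rather than as the paper's method; the function name `soft_vote` and the probability values are hypothetical.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities from several models and pick the argmax.

    prob_lists holds one probability vector per model, all the same length.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    mean = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean[c])
    return best, mean

# Three hypothetical models scoring one image over two classes.
label, mean_probs = soft_vote([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
```

Here the first model prefers class 0, but the averaged probabilities favour class 1, which is the kind of disagreement an ensemble is meant to smooth out.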

A study on Deep Convolutional Neural Networks, transfer learning, and Mnet model for Cervical Cancer Detection

Authors:Saifuddin Sagor, Md Taimur Ahad, Faruk Ahmed, Rokonozzaman Ayon, Sanzida Parvin

Early and accurate detection through Pap smear analysis is critical to improving patient outcomes and reducing cervical cancer mortality. State-of-the-art (SOTA) Convolutional Neural Networks (CNNs) require substantial computational resources, extended training time, and large datasets. In this study, a lightweight CNN model, S-Net (Simple Net), is developed specifically for cervical cancer detection and classification using Pap smear images to address these limitations. Alongside S-Net, six SOTA CNNs were evaluated using transfer learning, including multi-path (DenseNet201, ResNet152), depth-based (Serasnet152), width-based multi-connection (Xception), depth-wise separable convolutions (MobileNetV2), and spatial exploitation-based (VGG19). All models, including S-Net, achieved comparable accuracy, with S-Net reaching 99.99%. However, S-Net significantly outperforms the SOTA CNNs in terms of computational efficiency and inference time, making it a more practical choice for real-time and resource-constrained applications. A major limitation in CNN-based medical diagnosis remains the lack of transparency in the decision-making process. To address this, Explainable AI (XAI) techniques, such as SHAP, LIME, and Grad-CAM, were employed to visualize and interpret the key image regions influencing model predictions. The novelty of this study lies in the development of a highly accurate yet computationally lightweight model (S-Net) capable of rapid inference while maintaining interpretability through XAI integration. Furthermore, this work analyzes the behavior of SOTA CNNs, investigates the effects of negative transfer learning on Pap smear images, and examines pixel intensity patterns in correctly and incorrectly classified samples.

Paper and project links

PDF

Summary
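This study develops S-Net, a lightweight CNN for cervical cancer detection and classification. Compared with other SOTA CNNs on Pap smear images, S-Net is more computationally efficient and faster at inference while keeping high accuracy. The study also uses explainable AI techniques to identify the key image regions behind the model's decisions.

Key Takeaways

The study stresses that early, accurate detection through Pap smear analysis is critical to improving patient outcomes and reducing cervical cancer mortality.
It proposes S-Net, a lightweight CNN built specifically for cervical cancer detection and classification, to address the heavy compute, long training time, and large dataset needs of existing CNNs.
S-Net reaches 99.99% accuracy on Pap smear images, comparable to the other models, with better computational efficiency and faster inference than the SOTA CNNs.
Explainable AI techniques (SHAP, LIME, and Grad-CAM) visualize the key image regions behind the model's predictions, making its decisions more transparent.
The study analyzes the behavior of several SOTA CNNs on Pap smear images, investigates the effects of negative transfer learning, and examines pixel intensity patterns in correctly and incorrectly classified samples.
Combining high accuracy, a light computational footprint, and fast inference, S-Net suits real-time and resource-constrained applications.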
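Much of the cost difference between heavy and lightweight CNNs comes from how each convolution layer is parameterised. The sketch below compares the weight count of a standard convolution with that of a depthwise separable one, the building block the abstract associates with MobileNetV2. The layer sizes are illustrative assumptions and say nothing about S-Net's actual architecture, which this summary does not describe.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
ratio = standard / separable
```

For this layer the separable form needs roughly an eighth of the weights, which is the kind of saving that shortens training and inference on constrained hardware.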

ENSAM: an efficient foundation model for interactive segmentation of 3D medical images

Authors:Elias Stenhede, Agnar Martin Bjørnstad, Arian Ranjbar

We present ENSAM (Equivariant, Normalized, Segment Anything Model), a lightweight and promptable model for universal 3D medical image segmentation. ENSAM combines a SegResNet-based encoder with a prompt encoder and mask decoder in a U-Net-style architecture, using latent cross-attention, relative positional encoding, normalized attention, and the Muon optimizer for training. ENSAM is designed to achieve good performance under limited data and computational budgets, and is trained from scratch on under 5,000 volumes from multiple modalities (CT, MRI, PET, ultrasound, microscopy) on a single 32 GB GPU in 6 hours. As part of the CVPR 2025 Foundation Models for Interactive 3D Biomedical Image Segmentation Challenge, ENSAM was evaluated on a hidden test set with multimodal 3D medical images, obtaining a DSC AUC of 2.404, NSD AUC of 2.266, final DSC of 0.627, and final NSD of 0.597, outperforming two previously published baseline models (VISTA3D, SAM-Med3D) and matching the third (SegVol), surpassing its performance in final DSC but trailing behind in the other three metrics. In the coreset track of the challenge, ENSAM ranks 5th of 10 overall and best among the approaches not utilizing pretrained weights. Ablation studies confirm that our use of relative positional encodings and the Muon optimizer each substantially speed up convergence and improve segmentation quality.

Paper and project links

PDF

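Summary

This paper presents ENSAM, a lightweight and promptable model for universal 3D medical image segmentation. ENSAM combines a SegResNet-based encoder with a prompt encoder and mask decoder in a U-Net-style architecture, and is trained with latent cross-attention, relative positional encoding, normalized attention, and the Muon optimizer. It is designed to perform well under limited data and compute budgets: it is trained from scratch on under 5,000 volumes from multiple modalities (CT, MRI, PET, ultrasound, microscopy) on a single 32 GB GPU in 6 hours. In the CVPR 2025 Foundation Models for Interactive 3D Biomedical Image Segmentation Challenge, ENSAM was evaluated on a hidden test set of multimodal 3D medical images and obtained a DSC AUC of 2.404, an NSD AUC of 2.266, a final DSC of 0.627, and a final NSD of 0.597. It outperforms two previously published baselines (VISTA3D, SAM-Med3D) and is on par with a third (SegVol), surpassing SegVol in final DSC but trailing it in the other three metrics. In the coreset track, ENSAM ranks 5th of 10 overall and best among approaches that do not use pretrained weights. Ablation studies confirm that relative positional encodings and the Muon optimizer each substantially speed up convergence and improve segmentation quality.

Key Takeaways

ENSAM is a lightweight, promptable model for universal 3D medical image segmentation.
It combines a SegResNet-based encoder, a prompt encoder, and a mask decoder with latent cross-attention, relative positional encoding, and normalized attention.
ENSAM is designed to perform well under limited data and compute budgets.
On the hidden test set of the CVPR 2025 challenge, ENSAM outperforms VISTA3D and SAM-Med3D, and beats SegVol in final DSC while trailing it in the other three metrics.
ENSAM ranks 5th of 10 in the coreset track and best among approaches that do not use pretrained weights.
Ablation studies confirm that relative positional encodings and the Muon optimizer each speed up convergence and improve segmentation quality.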
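The DSC figures above are Dice similarity coefficients. A minimal sketch of the metric on flattened binary masks follows; the function name `dice` is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something the challenge rules are known to specify.

```python
def dice(pred, target):
    """Dice similarity coefficient for two flat binary masks of equal length.

    Assumption: two empty masks count as a perfect match (score 1.0).
    """
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * inter / total

# One overlapping voxel out of two predicted and two true voxels.
score = dice([1, 1, 0, 0], [1, 0, 1, 0])
```

A 3D volume can be scored the same way after flattening it voxel by voxel.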

Uncertainty-Gated Deformable Network for Breast Tumor Segmentation in MR Images

Authors:Yue Zhang, Jiahua Dong, Chengtao Peng, Qiuli Wang, Dan Song, Guiduo Duan

Accurate segmentation of breast tumors in magnetic resonance images (MRI) is essential for breast cancer diagnosis, yet existing methods face challenges in capturing irregular tumor shapes and effectively integrating local and global features. To address these limitations, we propose an uncertainty-gated deformable network to leverage the complementary information from CNN and Transformers. Specifically, we incorporate deformable feature modeling into both convolution and attention modules, enabling adaptive receptive fields for irregular tumor contours. We also design an Uncertainty-Gated Enhancing Module (U-GEM) to selectively exchange complementary features between CNN and Transformer based on pixel-wise uncertainty, enhancing both local and global representations. Additionally, a Boundary-sensitive Deep Supervision Loss is introduced to further improve tumor boundary delineation. Comprehensive experiments on two clinical breast MRI datasets demonstrate that our method achieves superior segmentation performance compared with state-of-the-art methods, highlighting its clinical potential for accurate breast tumor delineation.

Paper and project links

PDF 5 pages, 2 figures

Summary
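Accurate segmentation of breast tumors in magnetic resonance images (MRI) is essential for diagnosis. Existing methods struggle to capture irregular tumor shapes and to integrate local and global features effectively. The paper proposes an uncertainty-gated deformable network that draws on the complementary information of convolutional neural networks (CNNs) and Transformers. With deformable feature modeling and an Uncertainty-Gated Enhancing Module (U-GEM), the network adapts to irregular tumor boundaries and strengthens both local and global representations. A Boundary-sensitive Deep Supervision Loss further improves tumor boundary delineation. Experiments on two clinical breast MRI datasets show better segmentation performance than state-of-the-art methods, pointing to clinical potential for accurate breast tumor delineation.

Key Takeaways

Accurate tumor segmentation in MRI is important for breast cancer diagnosis.
Existing segmentation methods struggle with irregular tumor shapes and with integrating local and global features.
The paper proposes an uncertainty-gated deformable network that combines the strengths of CNNs and Transformers.
Deformable feature modeling in both the convolution and attention modules gives adaptive receptive fields for irregular tumor contours.
The U-GEM module selectively exchanges complementary features between the CNN and the Transformer based on pixel-wise uncertainty.
A Boundary-sensitive Deep Supervision Loss improves tumor boundary delineation.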
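The idea of exchanging features according to pixel-wise uncertainty can be illustrated with a small sketch. The formula used by U-GEM is not given in this summary, so everything below is an assumption: uncertainty is taken to be the binary entropy of a branch's foreground probability, and a branch borrows more from the other branch where its own prediction is uncertain. The names `binary_entropy` and `gated_exchange` are hypothetical.

```python
import math

def binary_entropy(p, eps=1e-12):
    """Entropy (in bits) of a foreground probability; 1.0 at p = 0.5."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def gated_exchange(own_feat, other_feat, own_prob):
    """Per pixel, add the other branch's feature weighted by own uncertainty.

    Assumption: an additive gate; the paper's actual gating may differ.
    """
    return [f + binary_entropy(p) * g
            for f, g, p in zip(own_feat, other_feat, own_prob)]

# Pixel 0 is maximally uncertain (p = 0.5); pixel 1 is confident (p = 1.0).
enhanced = gated_exchange([1.0, 1.0], [2.0, 2.0], [0.5, 1.0])
```

The uncertain pixel receives the full contribution of the other branch, while the confident pixel is left essentially unchanged.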

pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation

Authors:Tong Wang, Xingyue Zhao, Linghao Zhuang, Haoyu Zhao, Jiayi Yin, Yuyang He, Gang Yu, Bo Lin

Medical image segmentation is crucial for computer-aided diagnosis, yet privacy constraints hinder data sharing across institutions. Federated learning addresses this limitation, but existing approaches often rely on lightweight architectures that struggle with complex, heterogeneous data. Recently, the Segment Anything Model (SAM) has shown outstanding segmentation capabilities; however, its massive encoder poses significant challenges in federated settings. In this work, we present the first personalized federated SAM framework tailored for heterogeneous data scenarios in medical image segmentation. Our framework integrates two key innovations: (1) a personalized strategy that aggregates only the global parameters to capture cross-client commonalities while retaining the designed L-MoE (Localized Mixture-of-Experts) component to preserve domain-specific features; and (2) a decoupled global-local fine-tuning mechanism that leverages a teacher-student paradigm via knowledge distillation to bridge the gap between the global shared model and the personalized local models, thereby mitigating overgeneralization. Extensive experiments on two public datasets validate that our approach significantly improves segmentation performance, achieves robust cross-domain adaptation, and reduces communication overhead.

Paper and project links

PDF 5 pages

Summary
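Medical image segmentation is crucial for computer-aided diagnosis, but privacy constraints limit data sharing across institutions. Federated learning addresses this limitation, yet existing approaches often rely on lightweight architectures that struggle with complex, heterogeneous data. The Segment Anything Model (SAM) has recently shown outstanding segmentation ability, but its massive encoder is a challenge in federated settings. This work presents the first personalized federated SAM framework for heterogeneous data scenarios in medical image segmentation. The framework has two key innovations. The first is a personalized strategy that aggregates only the global parameters to capture cross-client commonalities while keeping the designed L-MoE (Localized Mixture-of-Experts) component local to preserve domain-specific features. The second is a decoupled global-local fine-tuning mechanism that uses a teacher-student paradigm with knowledge distillation to bridge the gap between the global shared model and the personalized local models, which mitigates overgeneralization. Experiments on two public datasets show that the approach significantly improves segmentation performance, achieves robust cross-domain adaptation, and reduces communication overhead.

Key Takeaways

Medical image segmentation matters for computer-aided diagnosis, but privacy constraints hinder data sharing.
Federated learning offers a way around the cross-institution data sharing problem.
Existing federated approaches rely on lightweight architectures that struggle with complex, heterogeneous data.
The Segment Anything Model (SAM) segments well, but its massive encoder is hard to use in federated settings.
The paper proposes a personalized federated SAM framework tailored to heterogeneous data scenarios.
Its two key innovations are a personalized aggregation strategy with a local L-MoE component and a decoupled global-local fine-tuning mechanism based on knowledge distillation.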
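The personalized aggregation strategy, in which only shared parameters are averaged and the L-MoE parameters stay on each client, can be sketched as follows. This is a simplification under stated assumptions: scalars stand in for weight tensors, the mean is unweighted, and the parameter names (`adapter.w`, `lmoe.expert0`) and the function `aggregate_shared` are hypothetical.

```python
def aggregate_shared(client_states, is_local):
    """Average shared parameters across clients; local ones stay per client.

    client_states: list of dicts mapping parameter name -> value.
    is_local: predicate marking parameters that must not be aggregated.
    """
    shared_keys = [k for k in client_states[0] if not is_local(k)]
    n = len(client_states)
    avg = {k: sum(state[k] for state in client_states) / n for k in shared_keys}
    # Each client receives the averaged shared weights but keeps its own local ones.
    return [{**state, **avg} for state in client_states]

clients = [
    {"adapter.w": 1.0, "lmoe.expert0": 10.0},
    {"adapter.w": 3.0, "lmoe.expert0": 20.0},
]
updated = aggregate_shared(clients, is_local=lambda name: name.startswith("lmoe."))
```

Only the shared entries need to travel between clients and server in this scheme, which is consistent with the reduced communication overhead the summary reports.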

VocSegMRI: Multimodal Learning for Precise Vocal Tract Segmentation in Real-time MRI

Authors:Daiqi Liu, Tomás Arias-Vergara, Johannes Enk, Fangxu Xing, Maureen Stone, Jerry L. Prince, Jana Hutter, Andreas Maier, Jonghye Woo, Paula Andrea Pérez-Toro

Accurately segmenting articulatory structures in real-time magnetic resonance imaging (rtMRI) remains challenging, as most existing methods rely almost entirely on visual cues. Yet synchronized acoustic and phonological signals provide complementary context that can enrich visual information and improve precision. In this paper, we introduce VocSegMRI, a multimodal framework that integrates video, audio, and phonological inputs through cross-attention fusion for dynamic feature alignment. To further enhance cross-modal representation, we incorporate a contrastive learning objective that improves segmentation performance even when the audio modality is unavailable at inference. Evaluated on a sub-set of USC-75 rtMRI dataset, our approach achieves state-of-the-art performance, with a Dice score of 0.95 and a 95th percentile Hausdorff Distance (HD_95) of 4.20 mm, outperforming both unimodal and multimodal baselines. Ablation studies confirm the contributions of cross-attention and contrastive learning to segmentation precision and robustness. These results highlight the value of integrative multimodal modeling for accurate vocal tract analysis.

Paper and project links

PDF Preprint submitted to ICASSP

Summary

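The paper proposes VocSegMRI, a multimodal framework that integrates video, audio, and phonological inputs through cross-attention fusion for dynamic feature alignment, with the goal of segmenting articulatory structures more accurately in real-time magnetic resonance imaging (rtMRI). A contrastive learning objective improves segmentation performance even when the audio modality is unavailable at inference. Evaluated on a subset of the USC-75 rtMRI dataset, the method achieves state-of-the-art performance, with a Dice score of 0.95 and a 95th percentile Hausdorff Distance (HD_95) of 4.20 mm.

Key Takeaways

VocSegMRI is a multimodal framework that combines video, audio, and phonological inputs.
Cross-attention fusion aligns features dynamically to segment articulatory structures in rtMRI more accurately.
A contrastive learning objective improves segmentation performance even when the audio modality is unavailable at inference.
Evaluated on a subset of the USC-75 rtMRI dataset, the method outperforms both unimodal and multimodal baselines.
The Dice score of 0.95 indicates a high overlap with the reference segmentation.
The 95th percentile Hausdorff Distance (HD_95) of 4.20 mm measures how closely the predicted boundaries follow the reference boundaries.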
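HD_95 replaces the maximum in the Hausdorff distance with the 95th percentile, which makes it less sensitive to a few outlying boundary points. The sketch below is one common formulation, assumed here rather than taken from the paper: boundary points are given as coordinate tuples, the percentile uses linear interpolation, and the result is the larger of the two directed values. Implementations differ in these details, and the function names are hypothetical.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def hd95(points_a, points_b):
    """Symmetric 95th-percentile Hausdorff distance between two point sets."""
    def directed(src, dst):
        # Distance from each source point to its nearest destination point.
        return [min(math.dist(p, q) for q in dst) for p in src]
    return max(percentile(directed(points_a, points_b), 95),
               percentile(directed(points_b, points_a), 95))

# Distances are in coordinate units; multiply by pixel spacing to get mm.
d = hd95([(0.0, 0.0)], [(3.0, 4.0)])
```

With one point on each boundary the result is simply the Euclidean distance between them.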

PPORLD-EDNetLDCT: A Proximal Policy Optimization-Based Reinforcement Learning Framework for Adaptive Low-Dose CT Denoising

Authors:Debopom Sutradhar, Ripon Kumar Debnath, Mohaimenul Azam Khan Raiaan, Yan Zhang, Reem E. Mohamed, Sami Azam

Low-dose computed tomography (LDCT) is critical for minimizing radiation exposure, but it often leads to increased noise and reduced image quality. Traditional denoising methods, such as iterative optimization or supervised learning, often fail to preserve image quality. To address these challenges, we introduce PPORLD-EDNetLDCT, a reinforcement learning-based (RL) approach with Encoder-Decoder for LDCT. Our method utilizes a dynamic RL-based approach in which an advanced posterior policy optimization (PPO) algorithm is used to optimize denoising policies in real time, based on image quality feedback, trained via a custom gym environment. The experimental results on the low dose CT image and projection dataset demonstrate that the proposed PPORLD-EDNetLDCT model outperforms traditional denoising techniques and other DL-based methods, achieving a peak signal-to-noise ratio of 41.87, a structural similarity index measure of 0.9814 and a root mean squared error of 0.00236. Moreover, in NIH-AAPM-Mayo Clinic Low Dose CT Challenge dataset our method achieved a PSNR of 41.52, SSIM of 0.9723 and RMSE of 0.0051. Furthermore, we validated the quality of denoising using a classification task in the COVID-19 LDCT dataset, where the images processed by our method improved the classification accuracy to 94%, achieving 4% higher accuracy compared to denoising without RL-based denoising.

Paper and project links

PDF 20 pages, 5 figures, 5 tables

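Summary

This study tackles noise in low-dose computed tomography (LDCT) with a reinforcement learning (RL) approach and proposes the PPORLD-EDNetLDCT model. The model uses a dynamic RL strategy that optimizes denoising policies in real time from image quality feedback, and is trained in a custom gym environment. On the low dose CT image and projection dataset, PPORLD-EDNetLDCT outperforms traditional denoising techniques and other deep learning (DL) methods, with a peak signal-to-noise ratio (PSNR) of 41.87, a structural similarity index measure (SSIM) of 0.9814, and a root mean squared error (RMSE) of 0.00236. On the NIH-AAPM-Mayo Clinic Low Dose CT Challenge dataset it reaches a PSNR of 41.52, an SSIM of 0.9723, and an RMSE of 0.0051. In a classification task on the COVID-19 LDCT dataset, images processed by the method raise classification accuracy to 94%, 4% higher than without RL-based denoising.

Key Takeaways

PPORLD-EDNetLDCT uses reinforcement learning (RL) to address noise in LDCT.
The model optimizes its denoising policy dynamically, adjusting in real time from image quality feedback.
It denoises better than traditional techniques and other deep learning (DL) methods in the reported experiments.
Its performance is validated on the low dose CT image and projection dataset and on the NIH-AAPM-Mayo Clinic Low Dose CT Challenge dataset.
Images denoised by the model raise classification accuracy on the COVID-19 LDCT dataset to 94%.
RL-based denoising helps improve image quality.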
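The paper's title names Proximal Policy Optimization (PPO) as the RL algorithm. The core of standard PPO is the clipped surrogate objective, sketched below in its textbook form. How the paper defines its rewards, states, and actions for denoising is not covered in this summary, so the ratios and advantages here are plain illustrative numbers and the function name `ppo_clip_objective` is hypothetical.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximised).

    ratios: new-policy / old-policy action probabilities per sample.
    advantages: advantage estimates per sample.
    eps: clipping range; 0.2 is a commonly used default, assumed here.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes the incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction that helps.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)

# A ratio of 1.5 with a positive advantage is capped at 1.2.
capped = ppo_clip_objective([1.5], [1.0])
```

In a denoising setting the advantage would be derived from an image quality reward, but that mapping is specific to the paper and is not reproduced here.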

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-09-24/%E5%8C%BB%E5%AD%A6%E5%9B%BE%E5%83%8F/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source, Kedreamix, when reposting!
