
医学图像


⚠️ 以下所有内容总结都来自于 大语言模型的能力,如有错误,仅供参考,谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ,还请您给我们一些鼓励!⭐️ HuggingFace免费体验

2025-11-08 更新

MedSapiens: Taking a Pose to Rethink Medical Imaging Landmark Detection

Authors:Marawan Elbatel, Anbang Wang, Keyuan Liu, Kaouther Mouheb, Enrique Almar-Munoz, Lizhuo Lin, Yanqi Yang, Karim Lekadir, Xiaomeng Li

This paper does not introduce a novel architecture; instead, it revisits a fundamental yet overlooked baseline: adapting human-centric foundation models for anatomical landmark detection in medical imaging. While landmark detection has traditionally relied on domain-specific models, the emergence of large-scale pre-trained vision models presents new opportunities. In this study, we investigate the adaptation of Sapiens, a human-centric foundation model designed for pose estimation, to medical imaging through multi-dataset pretraining, establishing a new state of the art across multiple datasets. Our proposed model, MedSapiens, demonstrates that human-centric foundation models, inherently optimized for spatial pose localization, provide strong priors for anatomical landmark detection, yet this potential has remained largely untapped. We benchmark MedSapiens against existing state-of-the-art models, achieving up to 5.26% improvement over generalist models and up to 21.81% improvement over specialist models in the average success detection rate (SDR). To further assess MedSapiens adaptability to novel downstream tasks with few annotations, we evaluate its performance in limited-data settings, achieving 2.69% improvement over the few-shot state of the art in SDR. Code and model weights are available at https://github.com/xmed-lab/MedSapiens .

本文并未提出新型架构,而是重新审视一个基础却被忽视的基线:将以人为中心的基础模型迁移用于医学影像中的解剖地标检测。虽然地标检测传统上依赖领域专用模型,但大规模预训练视觉模型的出现带来了新的机遇。在这项研究中,我们考察了为姿态估计而设计的、以人为中心的基础模型Sapiens在医学影像中的适配:通过多数据集预训练,在多个数据集上建立了新的最先进水平。我们提出的模型MedSapiens表明,天然针对空间姿态定位进行优化的以人为中心基础模型,为解剖地标检测提供了强大的先验,但这一潜力此前在很大程度上尚未被挖掘。我们将MedSapiens与现有最先进模型进行基准比较:在平均成功检测率(SDR)上,相比通用模型最多提升5.26%,相比专用模型最多提升21.81%。为进一步评估MedSapiens在标注稀少的新下游任务上的适应性,我们在有限数据设置下评估了其性能,在SDR上相比少样本最先进方法提升2.69%。代码和模型权重可在https://github.com/xmed-lab/MedSapiens获取。
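
成功检测率(SDR)的常见计算方式是:统计预测关键点与真实关键点之间的物理距离落在给定半径阈值(如2 mm、2.5 mm等)之内的比例。下面给出一个按这一常见定义实现的最小示例,阈值、像素间距和随机数据均为演示用的假设值,并非论文官方评测代码。

```python
import numpy as np

def success_detection_rate(pred, gt, spacing_mm, thresholds_mm=(2.0, 2.5, 3.0, 4.0)):
    """计算各半径阈值下的成功检测率(SDR)。
    pred, gt: (N, K, 2) 像素坐标;spacing_mm: 每像素对应的毫米数。"""
    dist_mm = np.linalg.norm((pred - gt) * spacing_mm, axis=-1)      # (N, K) 径向误差(mm)
    return {t: float((dist_mm <= t).mean()) for t in thresholds_mm}

# 合成示例:100 张图像、19 个关键点,仅演示接口
rng = np.random.default_rng(0)
gt = rng.uniform(0, 512, size=(100, 19, 2))
pred = gt + rng.normal(0, 5, size=gt.shape)          # 模拟带噪声的预测
print(success_detection_rate(pred, gt, spacing_mm=0.1))
```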

论文及项目相关链接

PDF

Summary
本文不引入新型架构,而是重新审视一个基础却被忽视的基线:将以人为中心的基础模型迁移用于医学成像中的解剖地标检测。本研究考察了Sapiens模型在医学成像中的适配情况,通过多数据集预训练建立了新的最先进水平。所提出的MedSapiens模型证明,针对空间姿态定位优化的以人为中心基础模型能为解剖地标检测提供强有力的先验知识。在平均成功检测率(SDR)上,MedSapiens相比通用模型最多提升5.26%,相比专用模型最多提升21.81%。此外,该模型在仅有少量标注的新下游任务上也表现出良好的适应性。

Key Takeaways

  1. 该研究重新关注以人为中心的基础模型在医学成像中的潜力,特别是用于解剖地标检测。
  2. MedSapiens模型通过多数据集预训练构建,在多个数据集上建立了新的最先进水平。
  3. MedSapiens利用人体姿势估计进行优化,为解剖地标检测提供强有力的先验知识。
  4. 与现有模型相比,MedSapiens在平均成功检测率(SDR)上有显著提高。
  5. MedSapiens在数据标注有限的情况下展现出良好的性能。
  6. 该研究强调了利用大型预训练视觉模型的机会,尤其是将其适应于医学成像领域的机会。

Cool Papers

点此查看论文截图

Hadronic Processes in Advection-Dominated Accretion Flow as the Origin of TeV Excesses in BL Lac Objects

Authors:Ji-Shun Lian, Ze-Rui Wang, Jin Zhang

The spectral energy distributions (SEDs) of certain BL Lac objects (BL Lacs) exhibit an additional hard $\gamma$-ray component in the TeV energy range that surpasses the predictions of the one-zone leptonic jet model. The origin of this excess emission remains unclear. In this study, we selected five BL Lacs whose SEDs display a very hard intrinsic spectrum in the TeV band and successfully reproduced their broadband SEDs using a two-zone lepto-hadronic model. Within this framework, the emission observed in the optical, X-ray, GeV $\gamma$-ray, and sub-TeV $\gamma$-ray bands is modeled using the synchrotron and synchrotron self-Compton radiation processes of the relativistic electrons in the jets. Meanwhile, the TeV excess is attributed to $\gamma$-ray emission resulting from the photomeson ($p\gamma$) process via $\pi^0$ decay occurring within advection-dominated accretion flows (ADAFs). This scenario requires a hard proton spectrum with a spectral index of $p \sim 1.6-1.7$ and a cutoff energy ranging from 30 to 90 TeV, as well as a relatively large ADAF radius. Such hard proton spectra suggest that the dominant acceleration mechanisms are likely magnetic reconnection and/or stochastic acceleration processes within ADAFs. Additionally, the emission from the cascaded electrons results in a bump in the keV–MeV band; however, it is overwhelmed by the jet emission. Although the hadronuclear ($pp$) process cannot be entirely ruled out, it would necessitate an even harder proton spectrum and a higher cutoff energy compared to the $p\gamma$ process, making it a less favorable explanation for the observed TeV excess.

某些BL Lac天体(BL Lacs)的谱能量分布(SEDs)在TeV能段表现出额外的硬γ射线成分,超出了单区轻子喷流模型的预测。这种过量发射的起源仍不清楚。在这项研究中,我们选择了五个在TeV波段具有非常硬固有谱的BL Lac天体,并成功地使用双区轻子-强子模型再现了它们的宽带SEDs。在此框架内,光学、X射线、GeV γ射线以及亚TeV γ射线波段观测到的发射由喷流中相对论性电子的同步辐射和同步自康普顿辐射过程建模。同时,TeV过剩被归因于发生在advection-dominated accretion flows(ADAFs,平流主导吸积流)内、经由π⁰衰变的光介子(pγ)过程所产生的γ射线发射。这一图景要求质子谱很硬:谱指数p~1.6-1.7、截止能量在30到90 TeV之间,并且需要相对较大的ADAF半径。如此硬的质子谱表明,主要加速机制很可能是ADAFs中的磁重联和/或随机加速过程。此外,级联电子的辐射会在keV-MeV波段产生一个鼓包,但被喷流辐射所淹没。虽然强子核(pp)过程不能完全排除,但与pγ过程相比,它需要更硬的质子谱和更高的截止能量,因此对观测到的TeV过剩而言是一个不太有利的解释。
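
摘要中的硬质子谱通常可以写成带指数截断的幂律 dN/dE ∝ E^(−p)·exp(−E/E_cut)。下面的小段代码按摘要给出的参数范围(p≈1.6–1.7、截止能量30–90 TeV)打印该谱形的数值,归一化为任意单位,仅作形状示意,具体取值为演示用假设。

```python
import numpy as np

def proton_spectrum(E_TeV, p=1.65, E_cut_TeV=60.0):
    """带指数截断的幂律质子谱 dN/dE ∝ E^(-p) * exp(-E/E_cut),任意归一化。"""
    return E_TeV ** (-p) * np.exp(-E_TeV / E_cut_TeV)

E = np.logspace(-1, 3, 9)                 # 0.1 TeV 到 1000 TeV
for e, n in zip(E, proton_spectrum(E)):
    print(f"E = {e:9.2f} TeV   dN/dE ∝ {n:.3e}")
```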

论文及项目相关链接

PDF 13 pages, 6 Figures, 1 Table. Accepted for publication in ApJ

Summary
某些BL Lac天体(BL Lacs)的谱能量分布(SEDs)在TeV能段表现出超出单区轻子喷流模型预测的硬γ射线成分。本研究选取五个在TeV波段具有非常硬固有谱的BL Lacs,并使用双区轻子-强子模型成功复现了其宽带SEDs。其中,光学、X射线、GeV γ射线和亚TeV γ射线波段的发射由喷流中相对论性电子的同步辐射和同步自康普顿辐射过程建模,而TeV过剩则归因于发生在ADAFs中、经由π⁰衰变的光介子(pγ)过程产生的γ射线发射。这一图景需要谱指数p~1.6-1.7、截断能量在30到90 TeV之间的硬质子谱,以及相对较大的ADAF半径。此外,级联电子的发射会在keV-MeV波段产生一个鼓包,但被喷流发射所掩盖。尽管强子核(pp)过程不能完全排除,但它需要更硬的质子谱和更高的截断能量,因此相比pγ过程,它对观测到的TeV过剩的解释不太有利。

Key Takeaways

  1. BL Lac天体的谱能量分布在TeV能段表现出硬γ射线成分,超出单区轻子喷流模型的预测。
  2. 双区轻子-强子模型成功复现了具有硬固有谱的BL Lacs的宽带SEDs。
  3. 光学、X射线、GeV γ射线和亚TeV γ射线波段的发射可通过同步辐射和同步自康普顿辐射过程建模。
  4. TeV过剩归因于ADAFs中的光介子(pγ)过程产生的γ射线发射。
  5. 此情景需要具有特定谱指数和截断能量的硬质子谱,以及较大的ADAF半径。
  6. 级联电子在keV-MeV波段的发射表现为凸起,但被喷射发射所掩盖。

Cool Papers

点此查看论文截图

Covariance Descriptors Meet General Vision Encoders: Riemannian Deep Learning for Medical Image Classification

Authors:Josef Mayr, Anna Reithmeir, Maxime Di Folco, Julia A. Schnabel

Covariance descriptors capture second-order statistics of image features. They have shown strong performance in general computer vision tasks, but remain underexplored in medical imaging. We investigate their effectiveness for both conventional and learning-based medical image classification, with a particular focus on SPDNet, a classification network specifically designed for symmetric positive definite (SPD) matrices. We propose constructing covariance descriptors from features extracted by pre-trained general vision encoders (GVEs) and comparing them with handcrafted descriptors. Two GVEs - DINOv2 and MedSAM - are evaluated across eleven binary and multi-class datasets from the MedMNIST benchmark. Our results show that covariance descriptors derived from GVE features consistently outperform those derived from handcrafted features. Moreover, SPDNet yields superior performance to state-of-the-art methods when combined with DINOv2 features. Our findings highlight the potential of combining covariance descriptors with powerful pretrained vision encoders for medical image analysis.

协方差描述符能够捕捉图像特征的二阶统计信息。它们在一般计算机视觉任务中表现出强大的性能,但在医学影像中仍较少被研究。我们研究了它们在传统方法和基于学习的医学图像分类中的有效性,并特别关注专为对称正定(SPD)矩阵设计的分类网络SPDNet。我们提出从预训练通用视觉编码器(GVE)提取的特征构建协方差描述符,并与手工设计的描述符进行比较。我们在来自MedMNIST基准的11个二分类和多分类数据集上评估了DINOv2和MedSAM两种GVE。结果表明,由GVE特征得到的协方差描述符始终优于由手工特征得到的描述符。此外,当与DINOv2特征结合时,SPDNet的性能优于最先进方法。我们的研究结果凸显了将协方差描述符与强大的预训练视觉编码器相结合用于医学图像分析的潜力。
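
协方差描述符的通用构建方式是:把编码器输出的局部特征(如ViT的patch token)视为d维样本,计算其协方差矩阵,再经矩阵对数(log-Euclidean)映射,使SPD矩阵可被经典分类器使用。下面是一个与具体编码器无关的简化示意,特征用随机数代替,正则项eps为假设值,并非论文原实现。

```python
import numpy as np

def covariance_descriptor(features, eps=1e-5):
    """features: (N, d) 的局部特征(如 ViT 的 patch token)。
    返回 log-Euclidean 映射后的 (d, d) 对称矩阵,可展平后交给经典分类器或 SPDNet。"""
    X = features - features.mean(axis=0, keepdims=True)
    cov = X.T @ X / (len(features) - 1) + eps * np.eye(features.shape[1])   # 加 eps 保证正定
    w, V = np.linalg.eigh(cov)                                              # SPD 矩阵特征分解
    return (V * np.log(w)) @ V.T                                            # 矩阵对数 log(C)

# 演示:256 个 patch、64 维特征(真实流程中来自 DINOv2 / MedSAM 等编码器)
feats = np.random.default_rng(0).normal(size=(256, 64))
desc = covariance_descriptor(feats)
print(desc.shape, bool(np.allclose(desc, desc.T)))
```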

论文及项目相关链接

PDF Preprint. Submitted to the IEEE International Symposium on Biomedical Imaging (ISBI) 2026

Summary
协方差描述符能捕捉图像特征的二阶统计信息,在计算机视觉任务中表现优异,但在医学成像中尚未得到充分研究。本研究探讨其在传统和基于学习的医学图像分类中的应用,特别是为对称正定矩阵设计的SPDNet。研究提出从预训练通用视觉编码器特征构建协方差描述符,并与手工描述符进行比较。在MedMNIST基准的多个二分类和多分类数据集上评估了DINOv2和MedSAM两种编码器,结果显示来源于编码器特征的协方差描述符优于手工特征,且SPDNet结合DINOv2特征的性能超过现有最先进方法。这凸显了将协方差描述符与强大预训练视觉编码器结合用于医学图像分析的潜力。

Key Takeaways

  1. 协方差描述符在计算机视觉任务中表现优异,但在医学成像中应用较少。
  2. 研究探讨了协方差描述符在医学图像分类中的有效性,包括传统和基于学习的方法。
  3. SPDNet是对对称正定矩阵设计的分类网络,研究中对其进行了评估。
  4. 提出利用预训练的通用视觉编码器(GVE)构建协方差描述符。
  5. 与手工描述符相比,来源于预训练通用视觉编码器的协方差描述符表现更佳。
  6. 在多个医学图像数据集上评估了DINOv2和MedSAM两种编码器,证明了其有效性。

Cool Papers

点此查看论文截图

Systematic Evaluation of Preprocessing Techniques for Accurate Image Registration in Digital Pathology

Authors:Fatemehzahra Darzi, Rodrigo Escobar Diaz Guerrero, Thomas Bocklitz

Image registration refers to the process of spatially aligning two or more images by mapping them into a common coordinate system, so that corresponding anatomical or tissue structures are matched across images. In digital pathology, registration enables direct comparison and integration of information from different stains or imaging modalities, supporting applications such as biomarker analysis and tissue reconstruction. Accurate registration of images from different modalities is an essential step in digital pathology. In this study, we investigated how various color transformation techniques affect image registration between hematoxylin and eosin (H&E) stained images and non-linear multimodal images. We used a dataset of 20 tissue sample pairs, with each pair undergoing several preprocessing steps, including different color transformation (CycleGAN, Macenko, Reinhard, Vahadane), inversion, contrast adjustment, intensity normalization, and denoising. All images were registered using the VALIS registration method, which first applies rigid registration and then performs non-rigid registration in two steps on both low and high-resolution images. Registration performance was evaluated using the relative Target Registration Error (rTRE). We reported the median of median rTRE values (MMrTRE) and the average of median rTRE values (AMrTRE) for each method. In addition, we performed a custom point-based evaluation using ten manually selected key points. Registration was done separately for two scenarios, using either the original or inverted multimodal images. In both scenarios, CycleGAN color transformation achieved the lowest registration errors, while the other methods showed higher errors. These findings show that applying color transformation before registration improves alignment between images from different modalities and supports more reliable analysis in digital pathology.

图像配准是指通过映射到同一坐标系,将两幅或多幅图像在空间上进行对齐的过程,从而使相应的解剖或组织结构在图像之间匹配。在数字病理学中,配准能够直接比较并整合来自不同染色或成像模式的信息,支持生物标志物分析和组织重建等应用。不同模态图像的准确配准是数字病理学中的一个基本步骤。在这项研究中,我们研究了各种颜色变换技术如何影响苏木精-伊红(H&E)染色图像与非线性多模态图像之间的配准。我们使用包含20个组织样本对的数据集,每个样本对都经过若干预处理步骤,包括不同的颜色变换(CycleGAN、Macenko、Reinhard、Vahadane)、反转、对比度调整、强度归一化和去噪。所有图像均使用VALIS方法进行配准,该方法先进行刚性配准,再在低分辨率和高分辨率图像上分两步执行非刚性配准。配准性能使用相对目标配准误差(rTRE)评估。我们报告了每种方法的中位rTRE值的中位数(MMrTRE)和中位rTRE值的平均值(AMrTRE)。此外,我们还基于十个手动选取的关键点进行了自定义的点评估。配准针对两种情景分别进行:使用原始的或反转后的多模态图像。在这两种情景下,CycleGAN颜色变换都取得了最低的配准误差,而其他方法误差更高。这些结果表明,在配准前应用颜色变换可改善不同模态图像之间的对齐,支持数字病理学中更可靠的分析。
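
rTRE通常定义为目标配准误差除以图像对角线长度;MMrTRE与AMrTRE分别是各样本对中位rTRE的中位数与平均值。下面按这一常见定义给出最小实现,数据为合成示例,并非论文官方评测代码。

```python
import numpy as np

def relative_tre(moved_pts, fixed_pts, image_shape):
    """rTRE = ||配准后关键点 - 目标关键点|| / 图像对角线长度。"""
    diag = np.linalg.norm(image_shape)
    return np.linalg.norm(moved_pts - fixed_pts, axis=-1) / diag

def mm_am_rtre(rtre_per_pair):
    """rtre_per_pair: 每个样本对的 rTRE 数组列表,返回 (MMrTRE, AMrTRE)。"""
    medians = np.array([np.median(r) for r in rtre_per_pair])
    return float(np.median(medians)), float(medians.mean())

rng = np.random.default_rng(1)
fixed = [rng.uniform(0, 2000, size=(10, 2)) for _ in range(20)]      # 20 个样本对、各 10 个关键点
pairs = [relative_tre(f + rng.normal(0, 5, f.shape), f, (2048, 2048)) for f in fixed]
mm, am = mm_am_rtre(pairs)
print(f"MMrTRE={mm:.5f}, AMrTRE={am:.5f}")
```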

论文及项目相关链接

PDF 14 pages, 7 Figures

Summary
图像配准是通过将两幅或多幅图像映射到同一坐标系中,实现空间对齐的过程,从而使不同图像中的相应解剖或组织结构相匹配。在数字病理学中,配准能够直接比较和整合不同染色或成像方式的信息,支持生物标志物分析和组织重建等应用。本研究探讨了不同的色彩转换技术如何影响苏木精和伊红染色图像与非线性多模态图像之间的图像配准。使用包含20个组织样本对的数据库,经过多种预处理步骤和色彩转换技术,所有图像均使用VALIS配准方法进行配准。性能评估采用目标配准误差相对值(rTRE)。结果显示,CycleGAN色彩转换在配准误差方面表现最佳,而其他方法误差较高。这表明在配准之前应用色彩转换可以改进不同模态图像之间的对齐,并支持更可靠的分析。

Key Takeaways

  1. 图像配准是数字病理学中至关重要的步骤,它允许不同染色或成像方式的信息进行比较和整合。
  2. 本研究探讨了色彩转换技术对图像配准的影响。
  3. 使用包含20个组织样本对的数据库进行实验。
  4. 所有的图像都使用VALIS配准方法进行配准,该方法首先应用刚性配准,然后在高低分辨率图像上进行两步非刚性配准。
  5. 性能评估采用相对目标配准误差(rTRE)。
  6. CycleGAN色彩转换在配准过程中表现最佳。

Cool Papers

点此查看论文截图

When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation

Authors:Nishchal Sapkota, Haoyan Shi, Yejia Zhang, Xianshi Ma, Bofang Zheng, Danny Z. Chen

Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction, but struggle with modeling long-range dependencies. Transformers, on the other hand, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net like architecture that integrates rational-function based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a very small increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST

医学图像分割对于准确诊断和制定治疗方案至关重要,但由于解剖结构复杂且标注训练数据有限,它仍然具有挑战性。基于CNN的分割方法在局部特征提取方面表现出色,但难以建模长距离依赖关系;而Transformer能更有效地捕捉全局上下文,但本质上需要大量数据且计算开销高。在这项工作中,我们提出了UKAST,一种类U-Net架构,它将基于有理函数的Kolmogorov-Arnold网络(KANs)集成到Swin Transformer编码器中。通过利用Kolmogorov-Arnold Transformer(KAT)中的有理基函数和分组有理KANs(GR-KANs),我们的架构解决了普通基于样条的KANs效率低下的问题,得到一个更具表达力、数据效率更高的框架:与SwinUNETR相比,FLOPs更少,参数量仅略有增加。UKAST在四个不同的2D和3D医学图像分割基准上取得了最先进的性能,持续超越基于CNN和基于Transformer的基线。值得注意的是,它在数据稀缺的设置下取得了更高的准确率,缓解了标准视觉Transformer对数据的依赖。这些结果展示了KAN增强型Transformer在数据高效医学图像分割方面的潜力。代码见:https://github.com/nsapkota417/UKAST
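
有理基函数KAN的核心是用有理函数R(x)=P(x)/Q(x)替换样条基,并让每组通道共享一套系数。下面用PyTorch给出一个极简的分组有理激活示意:阶数、分组数以及分母取绝对值避免极点的写法都是为演示所做的假设,并非KAT/UKAST的官方实现。

```python
import torch
import torch.nn as nn

class GroupRationalActivation(nn.Module):
    """分组有理激活示意:每组通道共享一套 P/Q 系数,
    R(x) = P(x) / (1 + |Q(x)|),分母取绝对值以避免极点(简化写法,非官方实现)。"""
    def __init__(self, channels, groups=8, p_order=5, q_order=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.p = nn.Parameter(torch.randn(groups, p_order + 1) * 0.1)   # 分子多项式系数
        self.q = nn.Parameter(torch.randn(groups, q_order) * 0.1)       # 分母多项式系数(不含常数项)

    def forward(self, x):                                   # x: (B, N, C)
        B, N, C = x.shape
        xg = x.reshape(B, N, self.groups, C // self.groups)
        pow_p = torch.stack([xg ** i for i in range(self.p.shape[1])], dim=-1)
        num = (pow_p * self.p.view(1, 1, self.groups, 1, -1)).sum(-1)
        pow_q = torch.stack([xg ** (i + 1) for i in range(self.q.shape[1])], dim=-1)
        den = 1.0 + (pow_q * self.q.view(1, 1, self.groups, 1, -1)).sum(-1).abs()
        return (num / den).reshape(B, N, C)

x = torch.randn(2, 196, 64)                                  # 例如 14x14 个 token、64 维
print(GroupRationalActivation(64)(x).shape)                  # torch.Size([2, 196, 64])
```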

论文及项目相关链接

PDF

Summary

本文介绍了医学图像分割的重要性和挑战,包括复杂的解剖结构和有限的标注训练数据。文章提出了一种新型的U-Net架构UKAST,它结合了Kolmogorov-Arnold Networks(KANs)和Swin Transformer编码器。UKAST具有更好的表达能力和数据效率,降低了计算量并增加了较少的参数。它在多个医学图像分割基准测试中达到了领先水平,尤其是在数据稀缺的环境下表现出优越的准确性。这为推动数据高效医学图像分割的发展提供了潜力。

Key Takeaways

  • 医学图像分割对准确诊断和治疗计划至关重要,但面临复杂解剖结构和有限训练数据的挑战。
  • CNN和Transformer在医学图像分割中各有优势与不足。CNN擅长局部特征提取,但难以建模长距离依赖关系;而Transformer能更有效地捕捉全局上下文,但数据需求量大且计算成本高。
  • UKAST是一个新型的U-Net架构,结合了Kolmogorov-Arnold Networks(KANs)和Swin Transformer编码器。
  • UKAST通过利用有理基函数和分组有理KANs(GR-KANs),解决了常规基于样条的KANs的效率问题。
  • UKAST在四个不同的医学图像分割基准测试中表现优异,尤其在数据稀缺的环境下表现出更高的准确性。
  • UKAST代码已公开,为进一步研究和应用提供了便利。

Cool Papers

点此查看论文截图

Adversarial and Score-Based CT Denoising: CycleGAN vs Noise2Score

Authors:Abu Hanif Muhammad Syarubany

We study CT image denoising in the unpaired and self-supervised regimes by evaluating two strong, training-data-efficient paradigms: a CycleGAN-based residual translator and a Noise2Score (N2S) score-matching denoiser. Under a common evaluation protocol, a configuration sweep identifies a simple standard U-Net backbone within CycleGAN (lambda_cycle = 30, lambda_iden = 2, ngf = ndf = 64) as the most reliable setting; we then train it to convergence with a longer schedule. The selected CycleGAN improves the noisy input from 34.66 dB / 0.9234 SSIM to 38.913 dB / 0.971 SSIM and attains an estimated score of 1.9441 and an unseen-set (Kaggle leaderboard) score of 1.9343. Noise2Score, while slightly behind in absolute PSNR / SSIM, achieves large gains over very noisy inputs, highlighting its utility when clean pairs are unavailable. Overall, CycleGAN offers the strongest final image quality, whereas Noise2Score provides a robust pair-free alternative with competitive performance. Source code is available at https://github.com/hanifsyarubany/CT-Scan-Image-Denoising-using-CycleGAN-and-Noise2Score.

我们研究了在无配对和自监督环境下的CT图像去噪,通过评估两种强大且训练数据高效的范式:基于CycleGAN的残差翻译器和Noise2Score(N2S)得分匹配去噪器。在一个常见的评估协议下,配置扫描确定了CycleGAN中一个简单的标准U-Net主干(lambda_cycle = 30,lambda_iden = 2,ngf = ndf = 64)是最可靠的设置;然后我们使用更长的计划将其训练到收敛。所选的CycleGAN将噪声输入从34.66 dB / 0.9234 SSIM提高到38.913 dB / 0.971 SSIM,估计得分为1.9441,未见集(Kaggle排行榜)得分为1.9343。Noise2Score虽然在绝对的PSNR / SSIM上略逊一筹,但在处理非常嘈杂的输入时取得了很大的收益,这突出了当没有干净的配对时它的实用性。总的来说,CycleGAN提供了最强的最终图像质量,而Noise2Score则提供了一种无需配对的稳健替代方案,表现具有竞争力。源代码可在https://github.com/hanifsyarubany/CT-Scan-Image-Denoising-using-CycleGAN-and-Noise2Score找到。
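
摘要中的λ_cycle=30、λ_iden=2对应CycleGAN生成器总损失中循环一致性项和恒等项的权重。下面是这一损失组合方式的极简示意:各子损失以已算好的标量占位,数值仅作演示,并非原仓库代码。

```python
def cyclegan_generator_loss(adv_ab, adv_ba, cyc_a, cyc_b, idt_a, idt_b,
                            lambda_cycle=30.0, lambda_iden=2.0):
    """组合 CycleGAN 生成器损失:对抗项 + λ_cycle·循环一致性项 + λ_iden·恒等项。
    各参数为已经由 GAN / L1 损失算好的标量(示意,不含判别器部分)。"""
    adversarial = adv_ab + adv_ba
    cycle = cyc_a + cyc_b
    identity = idt_a + idt_b
    return adversarial + lambda_cycle * cycle + lambda_iden * identity

# 例:假设各子损失已经算出(数值随意,仅演示权重的作用)
print(cyclegan_generator_loss(0.9, 1.1, 0.05, 0.04, 0.02, 0.03))
```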

论文及项目相关链接

PDF

Summary

基于CycleGAN的残差翻译器和Noise2Score(N2S)得分匹配去噪器在无配对和自我监督下对CT图像进行去噪研究。通过对两种高效、训练数据效率高的范式进行评估,确定CycleGAN的最可靠配置为简单的标准U-Net骨干网,并将其训练至收敛。CycleGAN提高了噪声输入的图像质量,而Noise2Score在绝对PSNR/SSIM上略逊一筹,但在非常嘈杂的输入上取得了很大进步。总体而言,CycleGAN提供了最强的最终图像质量,而Noise2Score提供了具有竞争力的无配对替代方案。

Key Takeaways

  1. 研究了无配对和自我监督下的CT图像去噪。
  2. 通过评估两种训练数据效率高的范式:基于CycleGAN的残差翻译器和Noise2Score得分匹配去噪器。
  3. CycleGAN的最可靠配置为具有简单标准U-Net骨干网,并通过延长训练时间达到收敛。
  4. CycleGAN显著提高了噪声输入的图像质量。
  5. Noise2Score在嘈杂输入上取得了进步,尽管在绝对PSNR/SSIM上略逊于CycleGAN。
  6. CycleGAN最终图像质量最强。

Cool Papers

点此查看论文截图

Improving the Performance of Radiology Report De-identification with Large-Scale Training and Benchmarking Against Cloud Vendor Methods

Authors:Eva Prakash, Maayane Attias, Pierre Chambon, Justin Xu, Steven Truong, Jean-Benoit Delbrouck, Tessa Cook, Curtis Langlotz

Objective: To enhance automated de-identification of radiology reports by scaling transformer-based models through extensive training datasets and benchmarking performance against commercial cloud vendor systems for protected health information (PHI) detection. Materials and Methods: In this retrospective study, we built upon a state-of-the-art, transformer-based, PHI de-identification pipeline by fine-tuning on two large annotated radiology corpora from Stanford University, encompassing chest X-ray, chest CT, abdomen/pelvis CT, and brain MR reports and introducing an additional PHI category (AGE) into the architecture. Model performance was evaluated on test sets from Stanford and the University of Pennsylvania (Penn) for token-level PHI detection. We further assessed (1) the stability of synthetic PHI generation using a “hide-in-plain-sight” method and (2) performance against commercial systems. Precision, recall, and F1 scores were computed across all PHI categories. Results: Our model achieved overall F1 scores of 0.973 on the Penn dataset and 0.996 on the Stanford dataset, outperforming or maintaining the previous state-of-the-art model performance. Synthetic PHI evaluation showed consistent detectability (overall F1: 0.959 [0.958-0.960]) across 50 independently de-identified Penn datasets. Our model outperformed all vendor systems on synthetic Penn reports (overall F1: 0.960 vs. 0.632-0.754). Discussion: Large-scale, multimodal training improved cross-institutional generalization and robustness. Synthetic PHI generation preserved data utility while ensuring privacy. Conclusion: A transformer-based de-identification model trained on diverse radiology datasets outperforms prior academic and commercial systems in PHI detection and establishes a new benchmark for secure clinical text processing.

目标:通过在大规模训练数据集上扩展基于Transformer的模型,并与商业云服务商系统在受保护健康信息(PHI)检测上进行基准比较,从而提升放射学报告自动去标识化的性能。材料与方法:在这项回顾性研究中,我们在最先进的基于Transformer的PHI去标识化流程基础上,使用来自斯坦福大学的两个大型标注放射学语料库(涵盖胸部X射线、胸部CT、腹部/盆腔CT和脑部MR报告)进行微调,并在架构中引入了一个新的PHI类别(年龄)。我们在斯坦福大学和宾夕法尼亚大学(宾大)的测试集上评估了模型的令牌级PHI检测性能,并进一步评估了:(1)采用"隐于显处"(hide-in-plain-sight)方法生成合成PHI的稳定性;(2)与商业系统的性能对比。我们计算了所有PHI类别的精确率、召回率和F1分数。结果:我们的模型在宾大数据集上的总体F1分数为0.973,在斯坦福数据集上为0.996,优于或持平于先前最先进模型的性能。合成PHI评估显示,在50个独立去标识化的宾大数据集中检测能力保持一致(总体F1:0.959 [0.958-0.960])。我们的模型在合成宾大报告上优于所有供应商系统(总体F1:0.960 vs. 0.632-0.754)。讨论:大规模、多模态训练提升了跨机构泛化能力和稳健性;合成PHI生成在保留数据效用的同时确保了隐私。结论:在多样化放射学数据集上训练的基于Transformer的去标识化模型,在PHI检测上优于先前的学术和商业系统,为安全的临床文本处理建立了新的基准。
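
令牌级PHI检测的精确率、召回率和F1通常通过逐token比较预测标签与标注标签、统计TP/FP/FN得到。下面给出一个通用的micro统计示意,其中标签集合与示例序列均为假设,并非论文使用的具体类别定义。

```python
def token_prf(pred_tags, gold_tags, phi_labels):
    """逐 token 统计 PHI 类别的 TP/FP/FN,返回 micro 精确率、召回率和 F1。"""
    tp = sum(p == g and g in phi_labels for p, g in zip(pred_tags, gold_tags))
    fp = sum(p in phi_labels and p != g for p, g in zip(pred_tags, gold_tags))
    fn = sum(g in phi_labels and p != g for p, g in zip(pred_tags, gold_tags))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# 假设的标签序列:O 表示非 PHI;NAME/DATE/AGE/ID 为 PHI 类别(示例)
pred = ["O", "NAME", "O", "DATE", "AGE", "O"]
gold = ["O", "NAME", "O", "DATE", "O", "O"]
print(token_prf(pred, gold, {"NAME", "DATE", "AGE", "ID"}))
```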

论文及项目相关链接

PDF In submission to JAMIA

Summary
本研究通过在大规模数据集上训练基于Transformer的模型,提高了放射学报告自动去标识化的性能,并与商业云服务商系统的受保护健康信息(PHI)检测性能进行了比较。该模型在斯坦福和宾夕法尼亚大学数据集上的PHI检测F1得分较高,超过或持平于先前最新模型的表现。合成PHI评估显示,该模型在独立去标识化的宾夕法尼亚数据集上具有稳定的可检测性。此外,该模型在合成宾夕法尼亚报告上优于所有供应商系统。结果表明,大规模多模态训练提高了跨机构泛化能力和稳健性,合成PHI生成在确保隐私的同时保留了数据效用。本研究建立的基于Transformer的去标识化模型在PHI检测方面表现出色,为安全临床文本处理建立了新的基准。

Key Takeaways

  1. 研究使用基于Transformer的模型进行放射学报告自动去标识化。
  2. 通过大规模数据集进行训练以提高性能。
  3. 模型在斯坦福和宾夕法尼亚大学数据集上进行了PHI检测的评估,表现优异。
  4. 合成PHI评估显示了模型的稳定性和有效性。
  5. 模型在跨机构推广和稳健性方面表现出优势。
  6. 合成PHI生成技术确保了数据的隐私性和使用性。

Cool Papers

点此查看论文截图

MedDChest: A Content-Aware Multimodal Foundational Vision Model for Thoracic Imaging

Authors:Mahmoud Soliman, Islam Osman, Mohamed S. Shehata, Rasika Rajapakshe

The performance of vision models in medical imaging is often hindered by the prevailing paradigm of fine-tuning backbones pre-trained on out-of-domain natural images. To address this fundamental domain gap, we propose MedDChest, a new foundational Vision Transformer (ViT) model optimized specifically for thoracic imaging. We pre-trained MedDChest from scratch on a massive, curated, multimodal dataset of over 1.2 million images, encompassing different modalities including Chest X-ray and Computed Tomography (CT) compiled from 10 public sources. A core technical contribution of our work is Guided Random Resized Crops, a novel content-aware data augmentation strategy that biases sampling towards anatomically relevant regions, overcoming the inefficiency of standard cropping techniques on medical scans. We validate our model’s effectiveness by fine-tuning it on a diverse set of downstream diagnostic tasks. Comprehensive experiments empirically demonstrate that MedDChest significantly outperforms strong, publicly available ImageNet-pretrained models. By establishing the superiority of large-scale, in-domain pre-training combined with domain-specific data augmentation, MedDChest provides a powerful and robust feature extractor that serves as a significantly better starting point for a wide array of thoracic diagnostic tasks. The model weights will be made publicly available to foster future research and applications.

医学影像中视觉模型的性能常常受制于主流范式:先在域外自然图像上预训练骨干网络,再进行微调。为弥合这一根本性的领域差距,我们提出了MedDChest,一个专为胸部影像优化的新型基础视觉Transformer(ViT)模型。我们在一个大规模、经筛选的多模态数据集上从零开始预训练MedDChest,该数据集包含来自10个公开来源的超过120万张图像,涵盖胸部X射线和计算机断层扫描(CT)等多种模态。我们工作的核心技术贡献是引导随机调整裁剪(Guided Random Resized Crops),这是一种内容感知的数据增强策略,使采样偏向解剖学相关区域,克服了标准裁剪技术在医学扫描上的低效问题。我们通过在一系列下游诊断任务上微调来验证模型的有效性。全面的实验表明,MedDChest显著优于强大的、公开可用的ImageNet预训练模型。通过确立"大规模领域内预训练结合领域特定数据增强"的优势,MedDChest提供了一个强大且稳健的特征提取器,可作为各类胸部诊断任务明显更好的起点。模型权重将公开发布,以促进未来的研究与应用。
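
"引导随机调整裁剪"的思路是让随机裁剪的中心偏向解剖学相关区域。下面给出一个基于前景掩膜采样裁剪中心的简化示意:掩膜的来源(此处用简单阈值代替)、裁剪尺寸等均为演示用假设,并非论文原实现。

```python
import numpy as np

def guided_random_crop(image, foreground_mask, crop_size=224, rng=None):
    """从前景掩膜为 True 的像素中采样裁剪中心,再截取固定尺寸的窗口。
    image: (H, W);foreground_mask: 同尺寸布尔数组(例如由简单阈值得到)。"""
    rng = rng or np.random.default_rng()
    ys, xs = np.nonzero(foreground_mask)
    i = rng.integers(len(ys))                       # 裁剪中心偏向解剖学相关区域
    cy, cx = ys[i], xs[i]
    half = crop_size // 2
    H, W = image.shape
    y0 = int(np.clip(cy - half, 0, H - crop_size))
    x0 = int(np.clip(cx - half, 0, W - crop_size))
    return image[y0:y0 + crop_size, x0:x0 + crop_size]

img = np.random.default_rng(0).normal(size=(512, 512))
mask = img > 0.5                                    # 假设的前景(身体)掩膜
print(guided_random_crop(img, mask).shape)          # (224, 224)
```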

论文及项目相关链接

PDF 10 pages, 2 figures

Summary

在域外自然图像上预训练再微调的主流做法,限制了视觉模型在医学成像中的表现。为解决这一问题,提出了一种针对胸部影像的新型Vision Transformer模型——MedDChest。该模型在大量精选的多模态数据集上从零开始预训练,数据包含超过120万张涵盖不同模态(如X光胸片与计算机断层扫描)的图像。此外,还提出了一种新型的内容感知数据增强策略——导向随机尺寸裁剪(Guided Random Resized Crops),能更有效地在医学扫描图像上采样解剖学相关区域。实验验证表明,MedDChest在多种下游诊断任务上的表现均显著优于公开的ImageNet预训练模型。这表明大规模领域内预训练结合领域特定数据增强策略能提供更强大、更稳健的特征提取能力,为广泛的胸部诊断任务提供更好的起点。

Key Takeaways

  1. MedDChest是一种针对胸部影像优化的新Vision Transformer模型。
  2. 该模型在超过1.2百万张图像的大规模、精选、多模态数据集上进行预训练。
  3. 提出了导向随机尺寸裁剪(Guided Random Resized Crops)这一新型内容感知数据增强策略。
  4. MedDChest在多种下游诊断任务上的表现均显著优于ImageNet预训练模型。
  5. 大规模领域内预训练结合领域特定数据增强策略能提升特征提取能力。
  6. MedDChest模型为广泛的胸部诊断任务提供更优秀的起点。

Cool Papers

点此查看论文截图

CORE - A Cell-Level Coarse-to-Fine Image Registration Engine for Multi-stain Image Alignment

Authors:Esha Sadia Nasir, Behnaz Elhaminia, Mark Eastwood, Catherine King, Owen Cain, Lorraine Harper, Paul Moss, Dimitrios Chanouzas, David Snead, Nasir Rajpoot, Adam Shephard, Shan E Ahmed Raza

Accurate and efficient registration of whole slide images (WSIs) is essential for high-resolution, nuclei-level analysis in multi-stained tissue slides. We propose a novel coarse-to-fine framework CORE for accurate nuclei-level registration across diverse multimodal whole-slide image (WSI) datasets. The coarse registration stage leverages prompt-based tissue mask extraction to effectively filter out artefacts and non-tissue regions, followed by global alignment using tissue morphology and accelerated dense feature matching with a pre-trained feature extractor. From the coarsely aligned slides, nuclei centroids are detected and subjected to fine-grained rigid registration using a custom, shape-aware point-set registration model. Finally, non-rigid alignment at the cellular level is achieved by estimating a non-linear displacement field using Coherent Point Drift (CPD). Our approach benefits from automatically generated nuclei that enhance the accuracy of deformable registration and ensure precise nuclei-level correspondence across modalities. The proposed model is evaluated on three publicly available WSI registration datasets, and two private datasets. We show that CORE outperforms current state-of-the-art methods in terms of generalisability, precision, and robustness in bright-field and immunofluorescence microscopy WSIs.

在多重染色组织切片中进行高分辨率、细胞核级别的分析时,全切片图像(WSI)的准确高效配准至关重要。我们提出了一种新型由粗到细的配准框架CORE,用于在多种多模态全切片图像(WSI)数据集上进行准确的细胞核级配准。粗配准阶段利用基于提示的组织掩膜提取,有效滤除伪影和非组织区域,随后利用组织形态进行全局对齐,并借助预训练特征提取器进行加速的密集特征匹配。在粗对齐后的切片上检测细胞核质心,并使用自定义的、具备形状感知能力的点集配准模型进行细粒度刚性配准。最后,通过一致点漂移(Coherent Point Drift,CPD)估计非线性位移场,实现细胞级的非刚性对齐。我们的方法受益于自动生成的细胞核,提高了可变形配准的准确性,并确保跨模态的精确细胞核级对应。所提出的模型在三个公开的WSI配准数据集和两个私有数据集上进行了评估。结果表明,在明场和免疫荧光显微镜WSI上,CORE在泛化性、精度和稳健性方面均优于当前最先进的方法。
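
细配准阶段需要对两组细胞核质心做刚性点集配准。下面用经典的Kabsch/Procrustes闭式解演示"已知一一对应关系"时的刚性对齐原理;真实流程中对应关系未知,论文使用的是形状感知点集配准模型与CPD,此处仅作原理示意,数据为合成。

```python
import numpy as np

def rigid_align(src, dst):
    """已知一一对应的两组 2D 点,求使 ||R·src + t - dst|| 最小的旋转 R 与平移 t(Kabsch 算法)。"""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # 防止出现反射
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

rng = np.random.default_rng(0)
nuclei = rng.uniform(0, 1000, size=(200, 2))        # 模拟一组细胞核质心
theta = np.deg2rad(12)
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = nuclei @ R_true.T + np.array([30.0, -12.0]) + rng.normal(0, 0.5, nuclei.shape)
R, t = rigid_align(nuclei, moved)
print(np.round(R, 3), np.round(t, 2))               # 应接近 R_true 与 (30, -12)
```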

论文及项目相关链接

PDF

Summary

一种新型的粗到细框架CORE被提出来实现多模态全切片图像(WSI)的细胞核级别精确配准。该框架包括粗配准阶段和精细配准阶段。粗配准阶段基于提示进行组织掩膜提取,有效过滤出伪影和非组织区域,然后利用组织形态和预训练的特征提取器进行全局对齐和加速密集特征匹配。精细配准阶段则基于检测到的细胞核质心,利用自定义的形状感知点集配准模型进行精细配准,并通过估计非线性位移场实现细胞级别的非刚性对齐。该方法自动生成的细胞核提高了可变形配准的准确性,并在多种公开和私有数据集上验证了其优于当前最先进方法的泛化能力、精确度和稳健性。

Key Takeaways

  1. 提出了新型的粗到细框架CORE,用于多模态全切片图像(WSI)的细胞核级别精确配准。
  2. 粗配准阶段通过提示进行组织掩膜提取,有效过滤伪影和非组织区域。
  3. 利用组织形态和预训练的特征提取器进行全局对齐和加速密集特征匹配。
  4. 精细配准阶段基于检测到的细胞核质心进行点集配准。
  5. 通过估计非线性位移场实现细胞级别的非刚性对齐。
  6. 自动生成的细胞核提高了可变形配准的准确性。

Cool Papers

点此查看论文截图

Possibility of ferro-octupolar order in Ba$_2$CaOsO$_6$ assessed by X-ray magnetic dichroism measurements

Authors:Goro Shibata, Naomi Kawamura, Jun Okamoto, Arata Tanaka, Hiroaki Hayashi, Kazunari Yamaura, Hsiao-Yu Huang, Amol Singh, Chien-Te Chen, Di-Jing Huang, Sergey V. Streltsov, Atsushi Fujimori

Localized $5d^2$ electrons in a cubic crystal field possess multipoles such as electric quadrupoles and magnetic octupoles. We studied the cubic double perovskite Ba$_2$CaOsO$_6$ containing the Os$^{6+}$ ($5d^2$) ions, which exhibits a phase transition to a 'hidden order' below $T^* \sim$ 50 K, by X-ray absorption spectroscopy (XAS) and X-ray magnetic circular dichroism (XMCD) at the Os $L_{2,3}$ edge. The cubic ligand-field splitting between the $t_{2g}$ and $e_g$ levels of Os $5d$ was deduced by XAS to be $\sim$4 eV. The temperature dependence of the XMCD spectra was consistent with a $\sim$18 meV residual cubic splitting of the lowest $J_{\rm eff} =$ 2 multiplet state into the non-Kramers $E_g$ doublet ground state and the $T_{2g}$ triplet excited state. Ligand-field (LF) multiplet calculation under fictitious strong magnetic fields indicated that the exchange interaction between nearest-neighbor octupoles should be as strong as $\sim$1.5 meV if a ferro-octupole order is stabilized in the 'hidden-ordered' state, consistent with the exchange interaction of $\sim$1 meV previously predicted theoretically using model and density functional theory calculations.

处于立方晶体场中的局域$5d^2$电子具有电四极矩、磁八极矩等多极矩。我们通过Os $L_{2,3}$吸收边的X射线吸收谱(XAS)和X射线磁圆二色谱(XMCD),研究了含Os$^{6+}$($5d^2$)离子的立方双钙钛矿Ba$_2$CaOsO$_6$,该材料在$T^* \sim$ 50 K以下发生向"隐藏序"的相变。由XAS推断,Os $5d$的$t_{2g}$与$e_g$能级之间的立方配位场劈裂约为4 eV。XMCD谱的温度依赖性与如下图像一致:最低的$J_{\rm eff}=2$多重态存在约18 meV的残余立方劈裂,分裂为非Kramers $E_g$二重态基态和$T_{2g}$三重态激发态。在假想强磁场下的配位场多重态计算表明,若"隐藏序"态中稳定了铁八极序,则最近邻八极矩之间的交换相互作用需达到约1.5 meV,这与此前利用模型和密度泛函理论计算预言的约1 meV的交换相互作用相符。

论文及项目相关链接

PDF 6 pages, 4 figures

Summary

本文研究了立方双钙钛矿Ba$_2$CaOsO$_6$中的Os$^{6+}$($5d^2$)离子,发现其在低温下存在一种“隐藏序”。通过X射线吸收光谱(XAS)和X射线磁圆二色性(XMCD)研究,发现该离子在立方晶体场中具有多重极,如电四极和磁八极。研究结果表明,在隐藏序状态下可能存在铁磁性八极序,且交换相互作用预计相当强。

Key Takeaways

  1. Os$^{6+}$离子在立方双钙钛矿Ba$_2$CaOsO$_6$中的电子结构表现出多重极特性。
  2. 通过XAS研究确定了Os $5d$的$t_{2g}$和$e_g$能级之间的立方配体场分裂约为4 eV。
  3. XMCD谱的温度依赖性表明存在约18 meV的残余立方分裂。
  4. 研究结果支持了最低$J_{\rm eff} =$ 2多重态的非Kramers $E_g$基态和$T_{2g}$激发态的存在。
  5. 在假想的强磁场下进行的配体场多重态计算表明,如果“隐藏序”状态下稳定了铁磁性八极序,则近邻八极之间的交换相互作用可能达到约1.5 meV。
  6. 该结果与之前使用模型和密度泛函理论计算得到的约1 meV的交换相互作用一致。

Cool Papers

点此查看论文截图

Cross-modal Causal Intervention for Alzheimer’s Disease Prediction

Authors:Yutao Jin, Haowen Xiao, Junyong Zhai, Yuxiao Li, Jielei Chu, Fengmao Lv, Yuxiao Li

Mild Cognitive Impairment (MCI) serves as a prodromal stage of Alzheimer’s Disease (AD), where early identification and intervention can effectively slow the progression to dementia. However, diagnosing AD remains a significant challenge in neurology due to the confounders caused mainly by the selection bias of multi-modal data and the complex relationships between variables. To address these issues, we propose a novel visual-language causality-inspired framework named Cross-modal Causal Intervention with Mediator for Alzheimer’s Disease Diagnosis (MediAD) for diagnostic assistance. Our MediAD employs Large Language Models (LLMs) to summarize clinical data under strict templates, therefore enriching textual inputs. The MediAD model utilizes Magnetic Resonance Imaging (MRI), clinical data, and textual data enriched by LLMs to classify participants into Cognitively Normal (CN), MCI, and AD categories. Because of the presence of confounders, such as cerebral vascular lesions and age-related biomarkers, non-causal models are likely to capture spurious input-output correlations, generating less reliable results. Our framework implicitly mitigates the effect of both observable and unobservable confounders through a unified causal intervention method. Experimental results demonstrate the outstanding performance of our method in distinguishing CN/MCI/AD cases, outperforming other methods in most evaluation metrics. The study showcases the potential of integrating causal reasoning with multi-modal learning for neurological disease diagnosis.

轻度认知障碍(MCI)是阿尔茨海默病(AD)的前驱阶段,早期识别和干预可以有效减缓向痴呆的进展。然而,由于多模态数据的选择偏倚以及变量之间的复杂关系所带来的混杂因素,AD的诊断在神经病学领域仍然是一个巨大的挑战。为了解决这些问题,我们提出了一种新型的受因果启发的视觉-语言框架"基于中介的跨模态因果干预阿尔茨海默病诊断"(MediAD),用于辅助诊断。我们的MediAD采用大语言模型(LLM)按严格模板对临床数据进行总结,从而丰富文本输入。MediAD模型利用磁共振成像(MRI)、临床数据以及由LLM丰富的文本数据,将参与者分类为认知正常(CN)、MCI和AD。由于存在脑血管病变和年龄相关生物标志物等混杂因素,非因果模型可能会捕捉到虚假的输入-输出相关性,从而产生不太可靠的结果。我们的框架通过统一的因果干预方法,隐式地减轻了可观测和不可观测混杂因素的影响。实验结果表明,我们的方法在区分CN/MCI/AD病例方面表现出色,在大多数评估指标上优于其他方法。该研究展示了将因果推理与多模态学习相结合用于神经系统疾病诊断的潜力。

论文及项目相关链接

PDF

Summary

本文介绍了针对阿尔茨海默病(AD)诊断的挑战,提出了一种基于视觉语言因果理论的新型框架——Cross-modal Causal Intervention with Mediator(MediAD)。该框架通过大型语言模型(LLMs)对临床数据进行总结并丰富文本输入,利用磁共振成像(MRI)、临床数据和文本数据将参与者分为认知正常(CN)、轻度认知障碍(MCI)和AD三类。实验结果表明,该方法在区分CN/MCI/AD病例方面表现出卓越性能,并在大多数评估指标上优于其他方法。该研究展示了将因果推理与多模态学习相结合在神经系统疾病诊断中的潜力。

Key Takeaways

  1. 轻度认知障碍(MCI)是阿尔茨海默病(AD)的先兆阶段,早期识别和干预可有效减缓向痴呆的进展。
  2. AD诊断在神经学中仍具挑战,主要由于多模态数据的选择偏见和变量间复杂关系导致的混淆因素。
  3. 提出了一种新型视觉语言因果灵感框架——MediAD,用于阿尔茨海默病的诊断。
  4. MediAD框架利用大型语言模型(LLMs)对临床数据进行总结并丰富文本输入。
  5. MediAD使用MRI、临床数据和文本数据对参与者进行分类。
  6. 因果推理方法能有效减轻可观测和不可观测混淆因素的影响。

Cool Papers

点此查看论文截图

DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging

Authors:Felix Wagner, Pramit Saha, Harry Anthony, J. Alison Noble, Konstantinos Kamnitsas

Safe deployment of machine learning (ML) models in safety-critical domains such as medical imaging requires detecting inputs with characteristics not seen during training, known as out-of-distribution (OOD) detection, to prevent unreliable predictions. Effective OOD detection after deployment could benefit from access to the training data, enabling direct comparison between test samples and the training data distribution to identify differences. State-of-the-art OOD detection methods, however, either discard the training data after deployment or assume that test samples and training data are centrally stored together, an assumption that rarely holds in real-world settings. This is because shipping the training data with the deployed model is usually impossible due to the size of training databases, as well as proprietary or privacy constraints. We introduce the Isolation Network, an OOD detection framework that quantifies the difficulty of separating a target test sample from the training data by solving a binary classification task. We then propose Decentralized Isolation Networks (DIsoN), which enables the comparison of training and test data when data-sharing is impossible, by exchanging only model parameters between the remote computational nodes of training and deployment. We further extend DIsoN with class-conditioning, comparing a target sample solely with training data of its predicted class. We evaluate DIsoN on four medical imaging datasets (dermatology, chest X-ray, breast ultrasound, histopathology) across 12 OOD detection tasks. DIsoN performs favorably against existing methods while respecting data-privacy. This decentralized OOD detection framework opens the way for a new type of service that ML developers could provide along with their models: providing remote, secure utilization of their training data for OOD detection services. Code: https://github.com/FelixWag/DIsoN

在医学成像等安全关键领域安全地部署机器学习(ML)模型,需要检测具有训练期间未见特征的输入,即分布外(OOD)检测,以避免不可靠的预测。部署后的有效OOD检测若能访问训练数据将大为受益:可以直接比较测试样本与训练数据分布以识别差异。然而,最先进的OOD检测方法要么在部署后丢弃训练数据,要么假设测试样本与训练数据集中存储在一起,而这一假设在现实场景中很少成立,因为受训练数据库规模以及专有或隐私约束的限制,通常无法将训练数据与部署模型一起分发。我们提出了隔离网络(Isolation Network),这是一种OOD检测框架,通过求解一个二分类任务来量化将目标测试样本与训练数据分开的难度。随后我们提出了去中心化隔离网络(DIsoN):当数据无法共享时,仅在训练端与部署端的远程计算节点之间交换模型参数,即可完成训练数据与测试数据的比较。我们进一步将DIsoN扩展为类条件版本,仅将目标样本与其预测类别的训练数据进行比较。我们在四个医学成像数据集(皮肤科、胸部X射线、乳腺超声、组织病理)共12个OOD检测任务上评估了DIsoN。DIsoN在尊重数据隐私的同时,表现优于现有方法。这一去中心化的OOD检测框架为ML开发者开辟了一类可随模型一同提供的新服务:以远程、安全的方式利用其训练数据提供OOD检测服务。代码地址:https://github.com/FelixWag/DIsoN
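
Isolation Network的核心想法是:训练一个二分类器把单个测试样本与训练数据分开,分开得越容易,说明该样本离训练分布越远、越可能是OOD。下面用scikit-learn的逻辑回归在特征空间里给出一个单机版的最小示意(特征维度、样本复制份数等均为假设,去中心化的参数交换流程不在此展示,亦非论文官方实现)。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def isolation_score(train_feats, test_feat, n_copies=32):
    """把测试样本复制为少量正例、训练特征作负例,训练二分类器;
    用训练集上的可分性(准确率)作为 OOD 分数:越容易分开 → 越可能是 OOD。"""
    X = np.vstack([train_feats, np.repeat(test_feat[None], n_copies, axis=0)])
    y = np.concatenate([np.zeros(len(train_feats)), np.ones(n_copies)])
    clf = LogisticRegression(max_iter=500).fit(X, y)
    return float(clf.score(X, y))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, size=(500, 64))
in_dist = rng.normal(0, 1, size=64)                  # 与训练分布相近的样本
ood = rng.normal(4, 1, size=64)                      # 明显偏离训练分布的样本
print(isolation_score(train, in_dist), isolation_score(train, ood))
```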

论文及项目相关链接

PDF Accepted at NeurIPS 2025

Summary

本文介绍了一种面向医学影像等安全关键领域的机器学习模型安全部署方法。针对部署后的模型输入,通过隔离网络(Isolation Network)进行分布外(OOD)检测,以识别与训练数据分布不同的测试样本。同时,提出了去中心化隔离网络(DIsoN),能够在无法共享数据的情况下、仅通过交换模型参数来比较训练数据和测试数据。该方法在四个医疗影像数据集上表现良好,同时保护了数据隐私,并开辟了一类新服务:模型开发者可在不转移训练数据的前提下,远程、安全地利用训练数据提供OOD检测服务。

Key Takeaways

Cool Papers

点此查看论文截图

CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists’ Diagnostic Logic

Authors:Yuxuan Sun, Yixuan Si, Chenglu Zhu, Kai Zhang, Zhongyi Shui, Bowen Ding, Tao Lin, Lin Yang

Recent advances in computational pathology have led to the emergence of numerous foundation models. These models typically rely on general-purpose encoders with multi-instance learning for whole slide image (WSI) classification or apply multimodal approaches to generate reports directly from images. However, these models cannot emulate the diagnostic approach of pathologists, who systematically examine slides at low magnification to obtain an overview before progressively zooming in on suspicious regions to formulate comprehensive diagnoses. Instead, existing models directly output final diagnoses without revealing the underlying reasoning process. To address this gap, we introduce CPathAgent, an innovative agent-based approach that mimics pathologists’ diagnostic workflow by autonomously navigating across WSI based on observed visual features, thereby generating substantially more transparent and interpretable diagnostic summaries. To achieve this, we develop a multi-stage training strategy that unifies patch-level, region-level, and WSI-level capabilities within a single model, which is essential for replicating how pathologists understand and reason across diverse image scales. Additionally, we construct PathMMU-HR2, the first expert-validated benchmark for large region analysis. This represents a critical intermediate scale between patches and whole slides, reflecting a key clinical reality where pathologists typically examine several key large regions rather than entire slides at once. Extensive experiments demonstrate that CPathAgent consistently outperforms existing approaches across benchmarks at three different image scales, validating the effectiveness of our agent-based diagnostic approach and highlighting a promising direction for computational pathology.

近年来,计算病理学的进展催生了众多基础模型。这些模型通常依赖配合多实例学习的通用编码器进行全切片图像(WSI)分类,或采用多模态方法直接从图像生成报告。然而,这些模型无法模拟病理医生的诊断方式:病理医生会先在低倍镜下系统地浏览切片以获得整体印象,再逐步放大可疑区域,最终形成全面诊断;而现有模型直接输出最终诊断,不展示背后的推理过程。为弥补这一差距,我们提出了CPathAgent,一种创新的基于智能体的方法,它模仿病理医生的诊断流程,根据观察到的视觉特征在WSI上自主导航,从而生成更加透明、可解释的诊断摘要。为此,我们设计了多阶段训练策略,将图像块级、区域级和全切片级能力统一到单个模型中,这对于复现病理医生跨越不同图像尺度的理解与推理方式至关重要。此外,我们构建了PathMMU-HR2,这是首个经专家验证的大区域分析基准。大区域是介于图像块与全切片之间的关键中间尺度,反映了一个重要的临床现实:病理医生通常一次检查几个关键的大区域,而不是整张切片。大量实验表明,CPathAgent在三种不同图像尺度的基准上始终优于现有方法,验证了基于智能体的诊断方法的有效性,并为计算病理学指出了一个有前景的方向。

论文及项目相关链接

PDF 52 pages, 34 figures

Summary
计算病理学领域的最新进展催生了众多基础模型的出现。这些模型通常采用通用编码器与多实例学习技术,用于全切片图像分类,或采用多模态方法直接从图像生成报告。然而,这些模型无法模拟病理医生的诊断过程,即系统性地低倍观察切片以获得概览,再逐步放大到可疑区域以做出全面诊断。为了解决这一差距,我们推出了CPathAgent,这是一种模仿病理医生诊断工作流的基于代理的方法,可自主在全切片图像中进行导航,根据观察到的视觉特征生成更加透明和可解释的诊断摘要。为实现这一目标,我们开发了一种多阶段训练策略,将补丁级别、区域级别和全切片级别的能力在一个单一模型中统一起来,这对于复制病理医生如何在不同图像尺度上进行理解和推理至关重要。此外,我们构建了PathMMU-HR2,这是一个经过专家验证的大型区域分析的基准测试,它反映了病理医生通常一次查看的几个关键大型区域,而不是整个切片,这代表了临床实际情况中的关键中间尺度。实验表明,CPathAgent在三个不同图像尺度的基准测试中始终优于现有方法,验证了我们的基于代理的诊断方法的有效性,并指出了计算病理学领域的一个有前途的方向。

Key Takeaways

  1. 计算病理学领域出现多个基础模型,通常采用通用编码器与多实例学习或多模态方法进行图像分析和报告生成。
  2. 当前模型无法模拟病理医生的诊断流程,缺乏透明度与可解释性。
  3. CPathAgent被引入以模仿病理医生的诊断流程,通过自主导航全切片图像并基于视觉特征生成诊断摘要。
  4. CPathAgent采用多阶段训练策略来统一不同图像尺度下的能力。
  5. PathMMU-HR2是首个针对大型区域分析的专家验证基准测试,反映病理医生实际工作中的关键中间尺度。
  6. CPathAgent在多个基准测试中表现优异,验证了其有效性。

Cool Papers

点此查看论文截图

NMCSE: Noise-Robust Multi-Modal Coupling Signal Estimation Method via Optimal Transport for Cardiovascular Disease Detection

Authors:Peihong Zhang, Zhixin Li, Rui Sang, Yuxuan Liu, Yiqiang Cai, Yizhou Tan, Shengchen Li

The coupling signal refers to a latent physiological signal that characterizes the transformation from cardiac electrical excitation, captured by the electrocardiogram (ECG), to mechanical contraction, recorded by the phonocardiogram (PCG). By encoding the temporal and functional interplay between electrophysiological and hemodynamic events, it serves as an intrinsic link between modalities and offers a unified representation of cardiac function, with strong potential to enhance multi-modal cardiovascular disease (CVD) detection. However, existing coupling signal estimation methods remain highly vulnerable to noise, particularly in real-world clinical and physiological settings, which undermines their robustness and limits practical value. In this study, we propose Noise-Robust Multi-Modal Coupling Signal Estimation (NMCSE), which reformulates coupling signal estimation as a distribution matching problem solved via optimal transport. By jointly aligning amplitude and timing, NMCSE avoids noise amplification and enables stable signal estimation. When integrated into a Temporal-Spatial Feature Extraction (TSFE) network, the estimated coupling signal effectively enhances multi-modal fusion for more accurate CVD detection. To evaluate robustness under real-world conditions, we design two complementary experiments targeting distinct sources of noise. The first uses the PhysioNet 2016 dataset with simulated hospital noise to assess the resilience of NMCSE to clinical interference. The second leverages the EPHNOGRAM dataset with motion-induced physiological noise to evaluate intra-state estimation stability across activity levels. Experimental results show that NMCSE consistently outperforms existing methods under both clinical and physiological noise, highlighting it as a noise-robust estimation approach that enables reliable multi-modal cardiac detection in real-world conditions.

耦合信号指的是一种潜在的生理信号,它描述了心电图(ECG)捕捉的心脏电兴奋到由心音图(PCG)记录的机械收缩的转换过程。通过编码电生理和血流动力学事件之间的时间性和功能性相互作用,它成为不同模态之间的固有联系,并为心脏功能提供了统一的表示,具有增强多模态心血管疾病(CVD)检测的潜力。然而,现有的耦合信号估计方法仍然高度容易受到噪声的影响,特别是在现实世界的临床和生理环境中,这削弱了它们的稳健性并限制了实用价值。在研究中,我们提出了噪声鲁棒多模态耦合信号估计(NMCSE),它将耦合信号估计重新定义为通过最优传输解决的分布匹配问题。通过联合对齐振幅和时间,NMCSE避免了噪声放大,并实现了稳定的信号估计。当整合到时空特征提取(TSFE)网络时,估计的耦合信号有效地增强了多模态融合,从而实现了更准确的CVD检测。为了评估在现实条件下的稳健性,我们设计了两个互补的实验,针对不同的噪声来源。第一个实验使用PhysioNet 2016数据集和模拟的医院噪声来评估NMCSE对临床干扰的抗性。第二个实验利用EPHNOGRAM数据集和运动引起的生理噪声来评估不同活动水平下的状态内估计稳定性。实验结果表明,NMCSE在临床和生理噪声下始终优于现有方法,凸显出它是一种噪声鲁棒的估计方法,能够在现实条件下实现可靠的多模态心脏检测。
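
NMCSE将耦合信号估计表述为经由最优传输求解的分布匹配问题。下面给出熵正则化最优传输(Sinkhorn迭代)的通用numpy实现,并用两条一维"事件分布"演示同时惩罚时间差与幅度差的代价矩阵思路;代价定义和正则系数均为演示用假设,并非论文的具体公式。

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=200):
    """熵正则化最优传输:返回耦合矩阵 P,使 <P, C> 最小且行、列边缘分布分别为 a、b。"""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# 两个一维"事件分布"(例如 ECG 与 PCG 的包络峰),代价同时惩罚时间差与幅度差
t = np.linspace(0, 1, 100)
ecg = np.exp(-((t - 0.30) / 0.05) ** 2); ecg /= ecg.sum()
pcg = np.exp(-((t - 0.42) / 0.07) ** 2); pcg /= pcg.sum()
C = (t[:, None] - t[None, :]) ** 2 + 0.1 * (ecg[:, None] - pcg[None, :]) ** 2
P = sinkhorn(ecg, pcg, C)
print(P.shape, round(float(P.sum()), 4))        # (100, 100),总质量约为 1
```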

论文及项目相关链接

PDF

Summary
该研究提出一种噪声鲁棒的多模态耦合信号估计方法(NMCSE),用于在心电图和心音图之间建立联系。通过将振幅和时序联合对齐,避免噪声放大并实现稳定信号估计。该研究在真实世界条件下测试了该方法的鲁棒性,显示其在噪声条件下表现出优于现有方法的性能。这为多模态心血管疾病的准确检测提供了新的视角。

Key Takeaways

  1. 耦合信号是心电图和心音图之间表征心脏电兴奋转化为机械收缩的潜在生理信号。
  2. 耦合信号是不同模态之间的固有联系,为心脏功能提供统一表示,有助于多模态心血管疾病检测。
  3. 现有耦合信号估计方法易受噪声影响,特别是在现实世界的临床和生理环境中。
  4. 研究提出噪声鲁棒的多模态耦合信号估计方法(NMCSE),旨在解决耦合信号的估计问题。
  5. NMCSE通过将振幅和时序联合对齐,避免噪声放大并实现稳定信号估计。
  6. 集成到时空特征提取网络后,估计的耦合信号可提高多模态融合,为更准确的心血管疾病检测提供有效支持。

Cool Papers

点此查看论文截图

Multimodal Cancer Modeling in the Age of Foundation Model Embeddings

Authors:Steven Song, Morgan Borjigin-Wang, Irene Madejski, Robert L. Grossman

The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference dataset in cancer through its harmonized genomics, clinical, and imaging data. Numerous prior studies have developed bespoke deep learning models over TCGA for tasks such as cancer survival prediction. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) to derive feature embeddings agnostic to a specific modeling task. Biomedical text especially has seen growing development of FMs. While TCGA contains free-text data as pathology reports, these have been historically underutilized. Here, we investigate the ability to train classical machine learning models over multimodal, zero-shot FM embeddings of cancer data. We demonstrate the ease and additive effect of multimodal fusion, outperforming unimodal models. Further, we show the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we propose an embedding-centric approach to multimodal cancer modeling.

癌症基因组图谱(TCGA)凭借其统一整理的基因组学、临床和影像数据,促成了许多新发现,并成为癌症研究的大规模参考数据集。许多先前研究基于TCGA开发了定制的深度学习模型,用于癌症生存预测等任务。生物医学深度学习的一个现代范式是开发基础模型(FM),以获得与具体建模任务无关的特征嵌入,其中生物医学文本领域的基础模型发展尤为迅速。TCGA虽然包含病理报告这类自由文本数据,但这些数据在历史上一直未被充分利用。在本文中,我们研究了在癌症数据的多模态、零样本FM嵌入之上训练经典机器学习模型的能力。我们展示了多模态融合的易行性及其叠加增益,其表现优于单模态模型。此外,我们证明了纳入病理报告文本的好处,并严格评估了基于模型的文本摘要与幻觉的影响。总体而言,我们提出了一种以嵌入为中心的多模态癌症建模方法。
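
"以嵌入为中心"的多模态建模在工程上可以很直接:把各模态基础模型产生的嵌入拼接起来,再训练经典分类器。下面用随机向量代替真实的FM嵌入,演示单模态与拼接融合的对比流程(数据完全为合成,信号强度等均为假设,仅展示流程)。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
# 假设的各模态 FM 嵌入:影像、病理报告文本、其他组学,各自带入少量类别信号
img = rng.normal(size=(n, 64));   img[:, 0] += 0.8 * y
txt = rng.normal(size=(n, 32));   txt[:, 0] += 0.6 * y
omics = rng.normal(size=(n, 48)); omics[:, 0] += 0.5 * y

def cv_auc(X, y):
    clf = LogisticRegression(max_iter=500)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

print("影像单模态 AUC:", round(cv_auc(img, y), 3))
print("三模态拼接 AUC:", round(cv_auc(np.hstack([img, txt, omics]), y), 3))
```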

论文及项目相关链接

PDF camera ready version for ML4H 2025

Summary

TCGA数据库通过其统一的基因组学、临床和成像数据为癌症研究提供了新的发现,并作为大规模参考数据集服务于癌症研究。本文研究了利用多模态、零样本基础模型嵌入训练经典机器学习模型的能力,并验证了多模态融合的优势。同时,文章还探讨了病理报告文本的作用,并严格评估了基于模型的文本摘要和幻觉的影响。总体而言,本文提出了以嵌入为中心的多模态癌症建模方法。

Key Takeaways

  1. TCGA数据库作为癌症研究的大规模参考数据集,通过其统一的基因组学、临床和成像数据促进了癌症研究的新发现。
  2. 经典机器学习模型在多模态数据下的训练能力得到了研究验证。
  3. 多模态融合具有优势,能够提升模型性能。
  4. 病理报告文本在癌症建模中的作用得到了重视和探讨。
  5. 严格评估了基于模型的文本摘要和幻觉对结果的影响。
  6. 本文提出了一种以嵌入为中心的多模态癌症建模方法,旨在充分利用各种数据源的优势。

Cool Papers

点此查看论文截图

Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2

Authors:Yuwen Chen, Zafer Yildiz, Qihang Li, Yaqian Chen, Haoyu Dong, Hanxue Gu, Nicholas Konz, Maciej A. Mazurowski

Manual annotation of volumetric medical images, such as magnetic resonance imaging (MRI) and computed tomography (CT), is a labor-intensive and time-consuming process. Recent advancements in foundation models for video object segmentation, such as Segment Anything Model 2 (SAM 2), offer a potential opportunity to significantly speed up the annotation process by manually annotating one or a few slices and then propagating target masks across the entire volume. However, the performance of SAM 2 in this context varies. Our experiments show that relying on a single memory bank and attention module is prone to error propagation, particularly at boundary regions where the target is present in the previous slice but absent in the current one. To address this problem, we propose Short-Long Memory SAM 2 (SLM-SAM 2), a novel architecture that integrates distinct short-term and long-term memory banks with separate attention modules to improve segmentation accuracy. We evaluate SLM-SAM 2 on four public datasets covering organs, bones, and muscles across MRI, CT, and ultrasound videos. We show that the proposed method markedly outperforms the default SAM 2, achieving an average Dice Similarity Coefficient improvement of 0.14 and 0.10 in the scenarios when 5 volumes and 1 volume are available for the initial adaptation, respectively. SLM-SAM 2 also exhibits stronger resistance to over-propagation, reducing the time required to correct propagated masks by 60.575% per volume compared to SAM 2, making a notable step toward more accurate automated annotation of medical images for segmentation model development.

对磁共振成像(MRI)和计算机断层扫描(CT)等体数据医学图像进行手动标注是一个劳动密集且耗时的过程。视频目标分割基础模型(如Segment Anything Model 2,SAM 2)的最新进展提供了显著加速标注的可能:只需手动标注一张或几张切片,然后将目标掩膜传播到整个体数据。然而,SAM 2在这一场景下的表现并不稳定。我们的实验表明,依赖单一记忆库和注意力模块容易出现误差传播,尤其是在目标出现在上一张切片、却在当前切片中消失的边界区域。为了解决这一问题,我们提出了Short-Long Memory SAM 2(SLM-SAM 2),这是一种新型架构,结合了相互独立的短期与长期记忆库以及各自的注意力模块,以提高分割精度。我们在涵盖MRI、CT和超声视频中器官、骨骼与肌肉的四个公开数据集上评估了SLM-SAM 2。结果显示,所提方法显著优于默认的SAM 2:在初始适配分别可用5个和1个标注体数据的情况下,平均Dice相似系数分别提升0.14和0.10。SLM-SAM 2还表现出更强的抗过度传播能力,与SAM 2相比,每个体数据校正传播掩膜所需的时间减少了60.575%,朝着为分割模型开发提供更准确的医学图像自动标注迈出了重要一步。
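
摘要中的Dice相似系数(DSC)衡量传播得到的掩膜与人工标注在体素级的重叠程度,其标准定义为 2|A∩B| / (|A|+|B|)。下面是按该定义的最小实现,示例体数据为合成。

```python
import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-7):
    """DSC = 2|A∩B| / (|A| + |B|),输入为同形状的布尔体数据。"""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

vol_gt = np.zeros((8, 64, 64), dtype=bool); vol_gt[:, 20:44, 20:44] = True
vol_pred = np.zeros_like(vol_gt);           vol_pred[:, 24:48, 22:46] = True
print(round(float(dice_coefficient(vol_pred, vol_gt)), 3))
```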

论文及项目相关链接

PDF Accepted for publication in IEEE Transactions on Medical Imaging (IEEE TMI)

Summary

本文介绍了医学图像手动标注的繁琐和耗时问题,并探讨了使用Segment Anything Model 2(SAM 2)进行自动标注的可能性。然而,实验中发现了SAM 2在边界区域容易出现误差传播的问题。为解决这个问题,本文提出了Short-Long Memory SAM 2(SLM-SAM 2),通过引入短期和长期记忆库以及独立的注意力模块,提高了分割精度。在MRI、CT和超声视频等公共数据集上的评估表明,SLM-SAM 2在初始适应时较SAM 2有明显提升,Dice相似系数平均提高了0.14和0.10。此外,SLM-SAM 2还具有较强的抗过度传播能力,减少了修正传播掩膜所需的时间。

Key Takeaways

  1. 手动标注医学图像是一个劳动密集且耗时的过程。
  2. Segment Anything Model 2 (SAM 2) 可用于加速标注过程。
  3. SAM 2在边界区域存在误差传播问题。
  4. Short-Long Memory SAM 2 (SLM-SAM 2) 通过结合短期和长期记忆库及独立注意力模块,提高了医学图像分割精度。
  5. SLM-SAM 2在多个公共数据集上的表现优于SAM 2,Dice相似系数有显著提高。
  6. SLM-SAM 2具有较强的抗过度传播能力。

Cool Papers

点此查看论文截图

RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability

Authors:Jonggwon Park, Byungmu Yoon, Soobum Kim, Kyoyun Choi

Recent advancements in multimodal models have significantly improved vision-language (VL) alignment in radiology. However, existing approaches struggle to effectively utilize complex radiology reports for learning and offer limited interpretability through attention probability visualizations. To address these challenges, we introduce $\textbf{RadZero}$, a novel framework for VL alignment in chest X-ray with zero-shot multi-task capability. A key component of our approach is $\textbf{VL-CABS}$ ($\textbf{V}$ision-$\textbf{L}$anguage $\textbf{C}$ross-$\textbf{A}$ttention $\textbf{B}$ased on $\textbf{S}$imilarity), which aligns text embeddings with local image features for interpretable, fine-grained VL reasoning. RadZero leverages large language models to extract concise semantic sentences from radiology reports and employs multi-positive contrastive training to effectively capture relationships between images and multiple relevant textual descriptions. It uses a pre-trained vision encoder with additional trainable Transformer layers, allowing efficient high-resolution image processing. By computing similarity between text embeddings and local image patch features, VL-CABS enables zero-shot inference with similarity probability for classification, and pixel-level VL similarity maps for grounding and segmentation. Experimental results on public chest radiograph benchmarks show that RadZero outperforms state-of-the-art methods in zero-shot classification, grounding, and segmentation. Furthermore, VL similarity map analysis highlights the potential of VL-CABS for improving explainability in VL alignment. Additionally, qualitative evaluation demonstrates RadZero’s capability for open-vocabulary semantic segmentation, further validating its effectiveness in medical imaging. Code is available at $\href{https://github.com/deepnoid-ai/RadZero}{https://github.com/deepnoid-ai/RadZero}$.

近期多模态模型的进展显著提升了放射学中的视觉-语言(VL)对齐能力。然而,现有方法难以有效利用复杂的放射学报告进行学习,且只能通过注意力概率可视化提供有限的可解释性。为应对这些挑战,我们提出了RadZero,一个具备零样本多任务能力的胸部X光视觉-语言对齐新框架。我们方法的关键组件是VL-CABS(基于相似度的视觉-语言交叉注意力),它将文本嵌入与局部图像特征对齐,以实现可解释的细粒度VL推理。RadZero利用大语言模型从放射学报告中提取简洁的语义句子,并采用多正样本对比训练,有效捕捉图像与多条相关文本描述之间的关系。它使用预训练视觉编码器并附加可训练的Transformer层,从而高效处理高分辨率图像。通过计算文本嵌入与局部图像块特征之间的相似度,VL-CABS可利用相似度概率实现零样本分类,并生成像素级VL相似度图用于定位(grounding)和分割。在公开胸片基准上的实验结果表明,RadZero在零样本分类、定位和分割任务上均优于最先进方法。此外,VL相似度图分析凸显了VL-CABS在提升VL对齐可解释性方面的潜力。定性评估也展示了RadZero的开放词汇语义分割能力,进一步验证了其在医学影像中的有效性。代码见 https://github.com/deepnoid-ai/RadZero 。
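
VL-CABS的关键操作是计算文本嵌入与每个图像patch局部特征之间的相似度,得到既可用于零样本分类、又可上采样为像素级定位图的相似度图。下面是这一思路的张量级示意:特征维度、温度系数以及用最大值池化得到图像级分数等选择均为演示用假设,并非官方实现。

```python
import torch
import torch.nn.functional as F

def vl_similarity_map(patch_feats, text_emb, temperature=0.07):
    """patch_feats: (B, H*W, D) 局部图像特征;text_emb: (D,) 某一文本(如某种病征描述)的嵌入。
    返回 (B, H, W) 的相似度概率图,以及一个图像级分类分数(此处用最大值池化,属示意性选择)。"""
    B, N, D = patch_feats.shape
    H = W = int(N ** 0.5)
    patches = F.normalize(patch_feats, dim=-1)
    text = F.normalize(text_emb, dim=-1)
    sim = patches @ text / temperature                # (B, N) 余弦相似度 / 温度
    prob_map = torch.sigmoid(sim).view(B, H, W)       # patch 级相似度概率,可上采样为像素级
    image_score = prob_map.flatten(1).max(dim=1).values
    return prob_map, image_score

feats = torch.randn(2, 14 * 14, 512)
text = torch.randn(512)
pmap, score = vl_similarity_map(feats, text)
print(pmap.shape, score.shape)                        # (2, 14, 14), (2,)
```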

论文及项目相关链接

PDF NeurIPS 2025

Summary
本文提出一种名为RadZero的新框架,用于胸部X光片的视觉语言对齐。它借助多模态模型的技术进步,提升了理解和解释放射学报告的能力。RadZero采用VL-CABS技术对齐文本嵌入和局部图像特征,并借助大型语言模型进行精细化推理。此外,RadZero采用多阳性对比训练来捕捉图像和多个相关文本描述之间的关系,并通过预训练的视觉编码器和可训练的Transformer层实现高效的高分辨率图像处理。实验结果表明,RadZero在零样本分类、定位、分割等方面优于现有技术。代码已公开于GitHub上。

Key Takeaways

  1. RadZero是一个用于胸部X光片视觉语言对齐的新框架,具有零样本多任务能力。
  2. RadZero通过VL-CABS技术实现文本嵌入和局部图像特征的对齐,提高了理解和解释放射学报告的能力。
  3. RadZero借助大型语言模型进行精细化推理,并采用多阳性对比训练来捕捉图像和文本之间的关系。
  4. RadZero使用预训练的视觉编码器和可训练的Transformer层进行高效的高分辨率图像处理。
  5. RadZero通过计算文本嵌入和局部图像补丁特征之间的相似性,实现了零样本推断和像素级视觉语言相似性地图。
  6. 实验结果表明,RadZero在零样本分类、定位、分割等方面具有出色的性能,且其视觉语言相似性地图分析有助于提高解释性。

Cool Papers

点此查看论文截图

Augmented Reality-based Guidance with Deformable Registration in Head and Neck Tumor Resection

Authors:Qingyun Yang, Fangjie Li, Jiayi Xu, Zixuan Liu, Sindhura Sridhar, Whitney Jin, Jennifer Du, Jon Heiselman, Michael Miga, Michael Topf, Jie Ying Wu

Head and neck squamous cell carcinoma (HNSCC) has one of the highest rates of recurrence cases among solid malignancies. Recurrence rates can be reduced by improving positive margins localization. Frozen section analysis (FSA) of resected specimens is the gold standard for intraoperative margin assessment. However, because of the complex 3D anatomy and the significant shrinkage of resected specimens, accurate margin relocation from specimen back onto the resection site based on FSA results remains challenging. We propose a novel deformable registration framework that uses both the pre-resection upper surface and the post-resection site of the specimen to incorporate thickness information into the registration process. The proposed method significantly improves target registration error (TRE), demonstrating enhanced adaptability to thicker specimens. In tongue specimens, the proposed framework improved TRE by up to 33% as compared to prior deformable registration. Notably, tongue specimens exhibit complex 3D anatomies and hold the highest clinical significance compared to other head and neck specimens from the buccal and skin. We analyzed distinct deformation behaviors in different specimens, highlighting the need for tailored deformation strategies. To further aid intraoperative visualization, we also integrated this framework with an augmented reality-based auto-alignment system. The combined system can accurately and automatically overlay the deformed 3D specimen mesh with positive margin annotation onto the resection site. With a pilot study of the AR guided framework involving two surgeons, the integrated system improved the surgeons’ average target relocation error from 9.8 cm to 4.8 cm.

头颈部鳞状细胞癌(HNSCC)在实体恶性肿瘤中复发率较高。通过改善阳性边缘定位可以降低复发率。冰冻切片分析(FSA)是切除标本的术中边缘评估的金标准。然而,由于复杂的3D解剖结构和切除标本的显著收缩,根据FSA结果准确地将边缘重新定位到切除部位仍然具有挑战性。我们提出了一种新型的可变形注册框架,该框架使用切除前的上表面和切除后的标本部位,将厚度信息纳入注册过程中。所提出的方法显著提高了目标注册误差(TRE),并表现出对较厚标本的适应性增强。在舌标本中,与之前的可变形注册相比,所提出框架的TRE提高了高达33%。值得注意的是,舌标本具有复杂的3D解剖结构,相较于来自颊部和皮肤的其它头颈部标本,其临床意义最高。我们分析了不同标本中不同的变形行为,强调了需要定制变形策略。为了进一步辅助术中可视化,我们还将此框架与基于增强现实的自动对齐系统进行了集成。该综合系统可以准确、自动地将变形的3D标本网格与阳性边缘注释叠加到切除部位。通过涉及两名外科医生的AR引导框架试点研究,集成系统使医生的目标平均重新定位误差从9.8厘米减少到4.8厘米。

论文及项目相关链接

PDF Accepted at MICCAI 2025

Summary

本文提出一种新型的变形注册框架,利用术前和术后的表面信息,结合厚度信息,对手术切除标本进行准确的边缘定位。该框架在头颈部鳞状细胞癌的手术中有显著优势,特别是在处理具有复杂三维结构的舌标本时。与先前的方法相比,新框架能提高目标注册误差的准确度,并集成增强现实技术,实现自动对齐,辅助术中可视化。

Key Takeaways

  1. 头颈部鳞状细胞癌复发率高,改善阳性边缘定位能降低复发率。
  2. 冰冻切片分析是术中边缘评估的金标准,但标本的3D复杂结构和收缩性使准确边缘定位具有挑战性。
  3. 新型变形注册框架结合术前和术后的表面信息,考虑厚度因素,显著提高目标注册误差的准确度。
  4. 在舌标本上,新框架比传统方法能提高目标注册误差的准确度达33%。
  5. 舌标本具有复杂的3D结构,在临床中具有重要意义。
  6. 需要针对不同标本的变形行为制定定制化的变形策略。

Cool Papers

点此查看论文截图

Dual-Input Dynamic Convolution for Positron Range Correction in PET Image Reconstruction

Authors:Youness Mellak, Alexandre Bousse, Thibaut Merlin, Élise Émond, Mikko Hakulinen, Dimitris Visvikis

Positron range (PR) blurring degrades positron emission tomography (PET) image resolution, particularly for high-energy emitters like gallium-68 (68Ga). We introduce Dual-input Dynamic Convolution (DDConv), a novel computationally efficient approach trained with voxel-specific PR point spread functions (PSFs) from Monte Carlo (MC) simulations and designed to be utilized within an iterative reconstruction algorithm to perform PR correction (PRC). By dynamically inferring local blurring kernels through a trained convolutional neural network (CNN), DDConv captures complex tissue interfaces more accurately than prior methods. Crucially, it also computes the transpose of the PR operator, ensuring consistency within iterative PET reconstruction. Comparisons with a state-of-the-art, tissue-dependent correction confirm the advantages of DDConv in recovering higher-resolution details in heterogeneous regions, including bone-soft tissue and lung-soft tissue boundaries. Experiments across digital phantoms and MC-simulated data show that DDConv offers near-MC accuracy, and outperforms the state-of-the-art technique, namely spatially-variant and tissue-dependent (SVTD), especially in areas with complex material interfaces. Results from physical phantom experiments further confirmed DDConv’s robustness and practical applicability: while both DDConv and SVTD performed similarly in homogeneous soft-tissue regions, DDConv provided more accurate activity recovery and sharper delineation at heterogeneous lung-soft tissue interfaces.

正电子射程(PR)模糊会降低正电子发射断层扫描(PET)图像的分辨率,对镓-68(68Ga)等高能发射体尤为明显。我们提出了双输入动态卷积(DDConv),这是一种计算高效的新方法,利用蒙特卡洛(MC)模拟得到的逐体素PR点扩散函数(PSF)进行训练,并设计为嵌入迭代重建算法中执行PR校正(PRC)。DDConv通过训练好的卷积神经网络(CNN)动态推断局部模糊核,比先前方法更准确地刻画复杂的组织界面。关键的是,它还能计算PR算子的转置,从而保证迭代PET重建中的一致性。与最先进的组织相关校正方法的比较证实,DDConv在骨-软组织、肺-软组织边界等异质区域能恢复更高分辨率的细节。在数字体模和MC模拟数据上的实验表明,DDConv可达到接近MC的精度,并优于最先进的空间可变且组织相关(SVTD)方法,在材料界面复杂的区域尤为明显。物理体模实验的结果进一步证实了DDConv的稳健性和实用性:在均匀软组织区域,DDConv与SVTD表现相近,而在异质的肺-软组织界面处,DDConv提供了更准确的活度恢复和更清晰的边界刻画。
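
DDConv所处理的正电子射程模糊可以理解为一种空间可变卷积:每个体素按自己的局部核把活度散布到邻域,而网络的职责是推断这些核。下面用numpy写一个最朴素的二维空间可变模糊示意,核由随位置变化的高斯规则生成以代替CNN输出,仅说明该算子的形式(及其"散射/聚集"互为转置的关系),并非论文实现。

```python
import numpy as np

def spatially_variant_blur(image, kernels):
    """image: (H, W);kernels: (H, W, k, k),每个像素一套归一化的局部模糊核。
    朴素"散射"实现:把每个像素的活度按其核散布到邻域;对应的"聚集"形式即该算子的转置。"""
    H, W = image.shape
    k = kernels.shape[-1]; r = k // 2
    padded = np.zeros((H + 2 * r, W + 2 * r))
    for y in range(H):
        for x in range(W):
            padded[y:y + k, x:x + k] += image[y, x] * kernels[y, x]
    return padded[r:H + r, r:W + r]

rng = np.random.default_rng(0)
img = np.zeros((32, 32)); img[16, 16] = 1.0                     # 点源
sigma = 0.5 + 1.5 * np.arange(32) / 31                          # 模糊强度随行位置变化(示意)
yy, xx = np.mgrid[-2:3, -2:3]
kernels = np.zeros((32, 32, 5, 5))
for row in range(32):
    g = np.exp(-(yy ** 2 + xx ** 2) / (2 * sigma[row] ** 2))
    kernels[row] = g / g.sum()                                   # 同一行像素共用一套核
print(spatially_variant_blur(img, kernels).sum())                # 点源总活度守恒,约为 1.0
```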

论文及项目相关链接

PDF 11 pages, 10 figures, 2 tables

Summary
正电子射程(PR)模糊会降低正电子发射断层扫描(PET)图像的分辨率,特别是对于镓-68(68Ga)等高能量发射体。研究引入了双输入动态卷积(DDConv)这一计算高效的新方法,利用蒙特卡洛(MC)模拟的体素级PR点扩散函数(PSF)进行训练,用于迭代重建算法中的PR校正(PRC)。DDConv通过训练卷积神经网络动态推断局部模糊核,更准确地捕捉复杂组织界面;同时,它计算PR算子的转置,确保迭代PET重建过程中的一致性。与最新的组织相关校正技术相比,DDConv在骨-软组织、肺-软组织边界等异质区域恢复高分辨率细节方面表现出优势。数字体模和蒙特卡洛模拟数据显示,DDConv接近蒙特卡洛精度,特别是在材料界面复杂的区域优于空间可变且组织相关(SVTD)技术。物理体模实验的结果进一步证实了DDConv的稳健性和实用性。

Key Takeaways

  1. Positron range (PR) blurring是影响正电子发射断层扫描(PET)图像分辨率的一个重要问题,特别是在使用高能量发射体时。
  2. 双输入动态卷积(DDConv)是一种新型的PR校正方法,通过训练卷积神经网络来动态推断局部模糊核,以改善PET图像的分辨率。
  3. DDConv能够更准确地捕捉复杂组织界面,相较于现有方法具有优势。
  4. DDConv通过计算PR算子的转置,确保迭代PET重建过程中的一致性。
  5. DDConv在恢复异质区域的较高分辨率细节方面表现出优势,特别是在骨-软组织、肺-软组织边界等区域。
  6. DDConv在数字和蒙特卡洛模拟数据中表现出近MC精度,优于现有的空间可变和组织依赖性(SVTD)技术。

Cool Papers

点此查看论文截图

BEN: Using Confidence-Guided Matting for Dichotomous Image Segmentation

Authors:Maxwell Meyer, Jack Spruyt

Current approaches to dichotomous image segmentation (DIS) treat image matting and object segmentation as fundamentally different tasks. As improvements in image segmentation become increasingly challenging to achieve, combining image matting and grayscale segmentation techniques offers promising new directions for architectural innovation. Inspired by the possibility of aligning these two model tasks, we propose a new architectural approach for DIS called Confidence-Guided Matting (CGM). We created the first CGM model called Background Erase Network (BEN). BEN consists of two components: BEN Base for initial segmentation and BEN Refiner for confidence-based refinement. Our approach achieves substantial improvements over current state-of-the-art methods on the DIS5K validation dataset, demonstrating that matting-based refinement can significantly enhance segmentation quality. This work introduces a new paradigm for integrating matting and segmentation techniques, improving fine-grained object boundary prediction in computer vision.

当前二分图像分割(DIS)的方法将图像抠图和目标分割视为本质不同的任务。随着图像分割的进一步提升越来越困难,结合图像抠图与灰度分割技术为架构创新提供了有前景的新方向。受对齐这两类模型任务可能性的启发,我们提出了一种新的DIS架构方法,称为置信度引导抠图(CGM)。我们构建了第一个CGM模型——背景擦除网络(BEN)。BEN由两部分组成:用于初始分割的BEN Base和基于置信度进行精细化的BEN Refiner。我们的方法在DIS5K验证数据集上相比当前最先进方法取得了显著提升,证明了基于抠图的精细化可以显著提高分割质量。这项工作为整合抠图与分割技术引入了新范式,提高了计算机视觉中细粒度目标边界预测的能力。
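
"置信度引导"的细化可以这样理解:基础网络输出分割概率,概率接近0.5的低置信区域交给抠图式细化模块重算,再按置信度融合。下面是这一思路的最小示意,其中置信度定义、阈值与融合方式均为假设,并非BEN的官方实现。

```python
import numpy as np

def confidence_guided_fusion(base_prob, refined_alpha, conf_threshold=0.2):
    """base_prob: 基础分割概率图 (H, W);refined_alpha: 细化模块给出的 alpha 蒙版。
    置信度取 |p - 0.5| * 2;高置信处保留基础结果,低置信处替换为细化结果。"""
    confidence = np.abs(base_prob - 0.5) * 2.0
    low_conf = confidence < conf_threshold
    fused = np.where(low_conf, refined_alpha, base_prob)
    return fused, low_conf

rng = np.random.default_rng(0)
base = np.clip(rng.normal(0.5, 0.3, size=(64, 64)), 0, 1)        # 模拟基础分割概率
refined = np.clip(base + rng.normal(0, 0.05, size=base.shape), 0, 1)
fused, mask = confidence_guided_fusion(base, refined)
print(fused.shape, f"低置信像素占比 {mask.mean():.2%}")
```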

论文及项目相关链接

PDF 6 pages, 2 figures, 3 tables, and 1 algorithms

Summary

本文介绍了将图像抠图与灰度分割技术结合,为二分图像分割(DIS)提出的一种新架构方法——置信度引导抠图(CGM)。文章提出了首个基于置信度引导抠图的模型,名为背景擦除网络(BEN),由负责初始分割的BEN Base和基于置信度进行精细化的BEN Refiner构成。该方法在DIS5K验证数据集上相较于现有前沿方法实现了显著提升,证明了基于抠图的精细化能显著提高分割质量,为计算机视觉中抠图与分割技术的融合提供了新的范例。

Key Takeaways

  1. 当前二分图像分割(DIS)方法将图像抠图和目标分割视为本质不同的任务。
  2. 结合图像抠图和灰度分割技术为DIS提供了新的研究方向。
  3. 提出了置信度引导抠图(CGM)的新架构方法。
  4. 首个基于置信度引导抠图的模型——背景擦除网络(BEN)。
  5. BEN包含两个组件:用于初始分割的BEN Base和基于置信度进行精细化的BEN Refiner。
  6. 该方法在DIS5K验证数据集上实现了显著提升,证明了抠图技术对提高分割质量的重要性。

Cool Papers

点此查看论文截图


文章作者: Kedreamix
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 Kedreamix !