⚠️ All of the summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: never use these for serious academic purposes; they are only for pre-screening papers before reading!
💗 If you find our project ChatPaperFree helpful, we would appreciate your encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-23
TreeFedDG: Alleviating Global Drift in Federated Domain Generalization for Medical Image Segmentation
Authors:Yucheng Song, Chenxi Li, Haokang Ding, Zhining Liao, Zhifang Liao
In medical image segmentation tasks, Domain Generalization (DG) under the Federated Learning (FL) framework is crucial for addressing challenges related to privacy protection and data heterogeneity. However, traditional federated learning methods fail to account for the imbalance in information aggregation across clients in cross-domain scenarios, leading to the Global Drift (GD) problem and a consequent decline in model generalization performance. This motivates us to delve deeper and define a new critical issue: global drift in federated domain generalization for medical imaging (FedDG-GD). In this paper, we propose a novel tree topology framework called TreeFedDG. First, starting from the distributed characteristics of medical images, we design a hierarchical parameter aggregation method based on a tree-structured topology to suppress deviations in the global model direction. Second, we introduce a parameter difference-based style mixing method (FedStyle), which enforces mixing among clients with maximum parameter differences to enhance robustness against drift. Third, we develop a progressive personalized fusion strategy during model distribution, ensuring a balance between knowledge transfer and personalized features. Finally, during the inference phase, we use feature similarity to guide the retrieval of the most relevant model chain from the tree structure for ensemble decision-making, thereby fully leveraging the advantages of hierarchical knowledge. We conducted extensive experiments on two publicly available datasets. The results demonstrate that our method outperforms other state-of-the-art domain generalization approaches in these challenging tasks and achieves better balance in cross-domain performance.
Paper and project links
Summary
This paper studies domain generalization under the federated learning framework for medical image segmentation. To address the Global Drift problem caused by imbalanced information aggregation across clients in cross-domain scenarios, it proposes TreeFedDG, a novel tree-topology framework. Hierarchical parameter aggregation, parameter-difference-based style mixing, and a progressive personalized fusion strategy improve the model's generalization performance and robustness. Experiments show that the method outperforms other state-of-the-art domain generalization approaches on these challenging tasks and achieves a better balance in cross-domain performance.
Key Takeaways
- In medical image segmentation, domain generalization (DG) under the federated learning framework is crucial, especially for addressing privacy protection and data heterogeneity.
- Traditional federated learning methods fail to balance information aggregation in cross-domain scenarios, causing the Global Drift (GD) problem and degrading generalization performance.
- The novel tree-topology framework TreeFedDG addresses this by suppressing deviations in the global model direction via hierarchical parameter aggregation.
- A parameter-difference-based style mixing method (FedStyle) enhances robustness against drift by mixing the clients with the largest parameter differences.
- A progressive personalized fusion strategy balances knowledge transfer and personalized features during model distribution.
- At inference time, feature similarity guides retrieval of the most relevant model chain from the tree structure, fully exploiting hierarchical knowledge.
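The hierarchical aggregation idea can be sketched as pairwise averaging of client parameters up a binary tree. This is a minimal illustration of tree-structured aggregation, not TreeFedDG's actual rule (the paper's topology construction, weights, and merge order are not specified in this summary):

```python
import numpy as np

def tree_aggregate(client_params):
    """Aggregate client parameter vectors up a binary tree:
    neighbouring nodes are averaged pairwise, level by level,
    until a single (root) parameter vector remains."""
    level = [np.asarray(p, dtype=float) for p in client_params]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append((level[i] + level[i + 1]) / 2.0)  # merge siblings
        if len(level) % 2 == 1:          # odd node is carried up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Four clients, each holding a 3-parameter model
clients = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [4.0, 4.0, 4.0], [6.0, 6.0, 6.0]]
root = tree_aggregate(clients)
print(root)  # [3. 3. 3.] — equals the flat average for a balanced tree
```

For a balanced tree with equal weights this reduces to plain FedAvg; the interesting cases are unbalanced trees and level-dependent weighting, which the paper's method presumably controls.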
Click here to view paper screenshots
EMA-SAM: Exponential Moving-average for SAM-based PTMC Segmentation
Authors:Maryam Dialameh, Hossein Rajabzadeh, Jung Suk Sim, Hyock Ju Kwon
Papillary thyroid microcarcinoma (PTMC) is increasingly managed with radio-frequency ablation (RFA), yet accurate lesion segmentation in ultrasound videos remains difficult due to low contrast, probe-induced motion, and heat-related artifacts. The recent Segment Anything Model 2 (SAM-2) generalizes well to static images, but its frame-independent design yields unstable predictions and temporal drift in interventional ultrasound. We introduce EMA-SAM, a lightweight extension of SAM-2 that incorporates a confidence-weighted exponential moving average pointer into the memory bank, providing a stable latent prototype of the tumour across frames. This design preserves temporal coherence through probe pressure and bubble occlusion while rapidly adapting once clear evidence reappears. On our curated PTMC-RFA dataset (124 minutes, 13 patients), EMA-SAM improves maxDice from 0.82 (SAM-2) to 0.86 and maxIoU from 0.72 to 0.76, while reducing false positives by 29%. On external benchmarks, including VTUS and colonoscopy video polyp datasets, EMA-SAM achieves consistent gains of 2–5 Dice points over SAM-2. Importantly, the EMA pointer adds <0.1% FLOPs, preserving real-time throughput of $\sim$30 FPS on a single A100 GPU. These results establish EMA-SAM as a robust and efficient framework for stable tumour tracking, bridging the gap between foundation models and the stringent demands of interventional ultrasound. Code is available at https://github.com/mdialameh/EMA-SAM.
Paper and project links
Summary
The paper addresses the difficulty of lesion segmentation in ultrasound videos during radio-frequency ablation (RFA) of papillary thyroid microcarcinoma (PTMC). It proposes EMA-SAM, which adds a confidence-weighted exponential moving average pointer to the memory bank to track the tumour stably, improving segmentation accuracy and reducing false positives. On the PTMC-RFA dataset, EMA-SAM improves over SAM-2, and it achieves consistent gains on external benchmarks. The EMA pointer adds almost no computational overhead and sustains real-time processing on a single A100 GPU.
Key Takeaways
- Lesion segmentation in ultrasound videos during RFA of PTMC is difficult, mainly because of low contrast, probe-induced motion, and heat-related artifacts.
- Segment Anything Model 2 (SAM-2) performs well on static images but suffers from unstable predictions and temporal drift in interventional ultrasound.
- EMA-SAM, a lightweight extension of SAM-2, adds a confidence-weighted exponential moving average pointer to the memory bank, improving accuracy and reducing false positives.
- On the PTMC-RFA dataset, EMA-SAM improves maxDice and maxIoU and reduces false positives.
- On external benchmarks, EMA-SAM gains 2–5 Dice points over SAM-2.
- The EMA pointer adds almost no computational cost and preserves real-time processing.
- EMA-SAM provides a robust and efficient framework for stable tumour tracking, bridging foundation models and the stringent demands of interventional ultrasound.
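The confidence-weighted EMA pointer can be illustrated with a toy update rule. This is a hedged sketch, not EMA-SAM's actual implementation; the class name, the base decay `alpha`, and the way confidence scales the step are all assumptions:

```python
import numpy as np

class EmaPointer:
    """Maintain a confidence-weighted EMA of per-frame tumour embeddings.
    High-confidence frames pull the pointer strongly; low-confidence
    frames (e.g. probe pressure, bubble occlusion) barely move it."""
    def __init__(self, dim, alpha=0.9):
        self.alpha = alpha                # base EMA decay
        self.pointer = np.zeros(dim)
        self.initialized = False

    def update(self, embedding, confidence):
        """confidence in [0, 1]: 0 leaves the pointer unchanged,
        1 applies the full EMA step."""
        emb = np.asarray(embedding, dtype=float)
        if not self.initialized:
            self.pointer = emb.copy()
            self.initialized = True
        else:
            w = (1.0 - self.alpha) * float(confidence)
            self.pointer = (1.0 - w) * self.pointer + w * emb
        return self.pointer

p = EmaPointer(2)
p.update(np.array([1.0, 1.0]), confidence=1.0)   # clear frame: adopt it
p.update(np.array([9.0, 9.0]), confidence=0.0)   # occluded frame: ignored
print(p.pointer)  # [1. 1.] — the pointer stayed stable through occlusion
```

The design intent mirrors the summary: the prototype stays coherent through occlusions and re-adapts quickly once confident evidence reappears.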
Click here to view paper screenshots
Spherical Radiomics – A Novel Approach to Glioblastoma Radiogenomic Analysis of Heterogeneity
Authors:Haotian Feng, Ke Sheng
We develop and validate a novel spherical radiomics framework for predicting key molecular biomarkers using multiparametric MRI. Conventional Cartesian radiomics extract tumor features on orthogonal grids, which do not fully capture the tumor’s radial growth patterns and can be insensitive to evolving molecular signatures. In this study, we analyzed GBM radiomic features on concentric 2D shells, which were then mapped onto 2D planes for radiomics analysis. Radiomic features were extracted using PyRadiomics from four different regions in GBM. Feature selection was performed using ANOVA F-statistics. Classification was conducted with multiple machine-learning models. Model interpretability was evaluated through SHAP analysis, clustering analysis, feature significance profiling, and comparison between radiomic patterns and underlying biological processes. Spherical radiomics consistently outperformed conventional 2D and 3D Cartesian radiomics across all prediction tasks. The best framework reached an AUC of 0.85 for MGMT, 0.80 for EGFR, 0.80 for PTEN, and 0.83 for survival prediction. GLCM-derived features were identified as the most informative predictors. Radial transition analysis using the Mann-Whitney U-test demonstrates that transition slopes between T1-weighted contrast-enhancing and T2/FLAIR hyperintense lesion regions, as well as between T2 intense lesion and a 2 cm peritumoral expansion region, are significantly associated with biomarker status. Furthermore, the observed radiomic changes along the radial direction closely reflected known biological characteristics. Radiomic features extracted on the spherical surfaces at varying radial distances to the GBM tumor centroid are better correlated with important tumor molecular markers and patient survival than the conventional Cartesian analysis.
Paper and project links
Summary
This paper develops and validates a novel spherical radiomics framework for predicting key molecular biomarkers from multiparametric MRI. GBM radiomic features are analyzed on concentric 2D shells, which are then mapped onto 2D planes for radiomics analysis. Features are extracted with PyRadiomics from four different regions of the GBM, selected with ANOVA F-statistics, and classified with multiple machine-learning models. Interpretability is assessed through SHAP analysis, clustering analysis, feature significance profiling, and comparison of radiomic patterns with underlying biological processes. Spherical radiomics consistently outperforms conventional 2D and 3D Cartesian radiomics across all prediction tasks; the best framework reaches an AUC of 0.85 for MGMT, 0.80 for both EGFR and PTEN, and 0.83 for survival prediction. GLCM-derived features are the most informative predictors. Radial transition analysis with the Mann-Whitney U-test shows that transition slopes between the T1-weighted contrast-enhancing and T2/FLAIR hyperintense lesion regions, and between the T2 hyperintense lesion and the peritumoral expansion region, are significantly associated with biomarker status. The observed radial radiomic changes closely reflect known biological characteristics, and features extracted on spherical surfaces at varying radial distances from the tumor centroid correlate better with key molecular markers and patient survival than conventional Cartesian analysis.
Key Takeaways
- Develops and validates a novel spherical radiomics framework for predicting key molecular biomarkers from multiparametric MRI.
- Spherical radiomics outperforms conventional Cartesian radiomics on the prediction tasks.
- The best framework achieves high AUCs, particularly for MGMT, EGFR, PTEN, and survival prediction.
- GLCM-derived features are the most informative predictors.
- Radial transition analysis reveals significant associations between inter-region transition slopes and biomarker status.
- The radial radiomic changes closely reflect known biological characteristics.
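The feature-selection step above uses ANOVA F-statistics. A minimal numpy sketch of per-feature one-way ANOVA scoring on synthetic data (the dataset and the choice of keeping the top 3 features are illustrative, not from the paper) is:

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F-statistic per feature: ratio of between-class
    variance to within-class variance (here, two classes)."""
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand = X.mean(axis=0)
    ss_between = sum(
        (y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2 for c in classes
    )
    ss_within = sum(
        ((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)                # 40 cases, two biomarker classes
X = rng.normal(size=(40, 10))            # 10 radiomic features
X[:, 0] += 3.0 * y                       # only feature 0 differs by class
scores = anova_f_scores(X, y)
top3 = np.argsort(scores)[::-1][:3]      # keep the 3 highest-F features
print(top3)
```

In practice the same scoring is available via scikit-learn's `f_classif`; the point here is just the ranking criterion.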
Click here to view paper screenshots
GeoCAD: Local Geometry-Controllable CAD Generation with Large Language Models
Authors:Zhanwei Zhang, Kaiyuan Liu, Junjie Liu, Wenxiao Wang, Binbin Lin, Liang Xie, Chen Shen, Deng Cai
Local geometry-controllable computer-aided design (CAD) generation aims to modify local parts of CAD models automatically, enhancing design efficiency. It also ensures that the shapes of newly generated local parts follow user-specific geometric instructions (e.g., an isosceles right triangle or a rectangle with one corner cut off). However, existing methods encounter challenges in achieving this goal. Specifically, they either lack the ability to follow textual instructions or are unable to focus on the local parts. To address this limitation, we introduce GeoCAD, a user-friendly and local geometry-controllable CAD generation method. Specifically, we first propose a complementary captioning strategy to generate geometric instructions for local parts. This strategy involves vertex-based and VLLM-based captioning for systematically annotating simple and complex parts, respectively. In this way, we caption $\sim$221k different local parts in total. In the training stage, given a CAD model, we randomly mask a local part. Then, using its geometric instruction and the remaining parts as input, we prompt large language models (LLMs) to predict the masked part. During inference, users can specify any local part for modification while adhering to a variety of predefined geometric instructions. Extensive experiments demonstrate the effectiveness of GeoCAD in generation quality, validity and text-to-CAD consistency. Code will be available at https://github.com/Zhanwei-Z/GeoCAD.
Paper and project links
PDF Accepted by NeurIPS 2025
Summary
The text introduces GeoCAD, a user-friendly, local geometry-controllable CAD generation method. A complementary captioning strategy generates geometric instructions for local parts, with vertex-based and VLLM-based captioning annotating simple and complex parts, respectively. During training, a local part of a CAD model is randomly masked, and a large language model is prompted to predict it from its geometric instruction and the remaining parts. At inference, users can specify any local part for modification while following a variety of predefined geometric instructions. The method performs well in generation quality, validity, and text-to-CAD consistency.
Key Takeaways
- GeoCAD is a user-friendly, local geometry-controllable CAD generation method.
- GeoCAD uses a complementary captioning strategy to generate geometric instructions for local parts.
- Vertex-based and VLLM-based captioning annotate simple and complex parts, respectively.
- During training, GeoCAD randomly masks a local part of a CAD model and prompts a large language model to predict it.
- Users can specify any local part for modification while following a variety of predefined geometric instructions.
- GeoCAD performs well in generation quality, validity, and text-to-CAD consistency.
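The training-time masking step can be sketched as hiding one part of a CAD sequence and assembling an LLM prompt from the geometric instruction plus the remaining parts. The prompt wording and part representation below are invented for illustration; GeoCAD's actual prompt format is not given in this summary:

```python
import random

def build_mask_prompt(parts, instruction, seed=None):
    """Randomly mask one local part of a CAD sequence and build an
    LLM prompt asking to reconstruct it from the geometric instruction
    and the unmasked context."""
    rng = random.Random(seed)
    i = rng.randrange(len(parts))
    context = [p if j != i else "<MASK>" for j, p in enumerate(parts)]
    prompt = (
        "Complete the masked CAD part.\n"
        f"Geometric instruction: {instruction}\n"
        "Model parts: " + "; ".join(context)
    )
    return i, prompt

parts = ["sketch circle r=5", "extrude 10", "fillet edges r=1"]
idx, prompt = build_mask_prompt(parts, "an isosceles right triangle", seed=0)
print(prompt)
```

At inference the masked slot would instead be chosen by the user, with the instruction taken from a predefined set.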
Click here to view paper screenshots
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Authors:Yandong Guan, Xilin Wang, Ximing Xing, Jing Zhang, Dong Xu, Qian Yu
In this work, we introduce CAD-Coder, a novel framework that reformulates text-to-CAD as the generation of CadQuery scripts - a Python-based, parametric CAD language. This representation enables direct geometric validation, a richer modeling vocabulary, and seamless integration with existing LLMs. To further enhance code validity and geometric fidelity, we propose a two-stage learning pipeline: (1) supervised fine-tuning on paired text-CadQuery data, and (2) reinforcement learning with Group Reward Policy Optimization (GRPO), guided by a CAD-specific reward comprising both a geometric reward (Chamfer Distance) and a format reward. We also introduce a chain-of-thought (CoT) planning process to improve model reasoning, and construct a large-scale, high-quality dataset of 110K text-CadQuery-3D model triplets and 1.5K CoT samples via an automated pipeline. Extensive experiments demonstrate that CAD-Coder enables LLMs to generate diverse, valid, and complex CAD models directly from natural language, advancing the state of the art of text-to-CAD generation and geometric reasoning.
Paper and project links
Summary
CAD-Coder reformulates text-to-CAD as the generation of CadQuery scripts. The framework supports direct geometric validation, a richer modeling vocabulary, and seamless integration with existing large language models. To improve code validity and geometric fidelity, it uses a two-stage learning pipeline: supervised fine-tuning on paired text-CadQuery data, followed by reinforcement learning with Group Reward Policy Optimization (GRPO) guided by a CAD-specific reward combining a geometric reward (Chamfer Distance) and a format reward. A chain-of-thought planning process improves model reasoning, and a large-scale, high-quality dataset is constructed via an automated pipeline. Experiments show that CAD-Coder enables LLMs to generate diverse, valid, and complex CAD models directly from natural language, advancing text-to-CAD generation and geometric reasoning.
Key Takeaways
- CAD-Coder reformulates text-to-CAD as the generation of CadQuery scripts, enabling direct geometric validation and a richer modeling vocabulary.
- A two-stage pipeline of supervised fine-tuning and reinforcement learning improves code validity and geometric fidelity.
- A chain-of-thought (CoT) planning process improves model reasoning.
- A large, high-quality dataset of 110K text-CadQuery-3D model triplets and 1.5K CoT samples is constructed.
- CAD-Coder generates diverse, valid, and complex CAD models.
- The framework advances the state of the art in text-to-CAD generation.
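The geometric reward is based on Chamfer Distance between generated and reference shapes. A minimal numpy sketch of symmetric (squared) Chamfer Distance between point clouds follows; the paper's exact point sampling and normalization are unknown, so this is a generic definition:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (n,3) and b (m,3):
    mean squared distance from each point to its nearest neighbour in the
    other set, summed over both directions."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # (n, m) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(p, p))        # identical clouds → 0.0
q = p + np.array([0.0, 0.5, 0.0])    # copy shifted by 0.5 in y
print(chamfer_distance(p, q))        # 0.25 + 0.25 = 0.5
```

A reward would then typically be a decreasing function of this distance, combined here with the format reward described above.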
Click here to view paper screenshots
Sampling Kantorovich operators for speckle noise reduction using a Down-Up scaling approach and gap filling in remote sensing images
Authors:Danilo Costarelli, Mariarosaria Natale
In the literature, several approaches have been proposed for restoring and enhancing remote sensing images, including methods based on interpolation, filtering, and deep learning. In this paper, we investigate the application of multivariate sampling Kantorovich (SK) operators for image reconstruction, with a particular focus on gap filling and speckle noise reduction. To understand the accuracy performances of the proposed algorithms, we first derive a quantitative estimate in $C(\mathbb{R}^n)$ for the error of approximation using the Euler-Maclaurin summation formula, under weak regularity conditions. We also establish a convergence result and a quantitative estimate with respect to the dissimilarity index measured by the continuous SSIM for functions in Lebesgue spaces. Additionally, we prove a multidimensional linear prediction result, which is used to design a new SK-based reconstruction algorithm to handle missing data, that we call LP-SK algorithm. To address speckle noise, we integrate SK operators into a newly proposed Down-Up scaling approach. Numerical tests are presented on synthetic and real SAR images to validate the proposed methods. Performance is assessed using similarity metrics such as SSIM and PSNR, along with speckle-specific indexes. Comparative analysis with state-of-the-art techniques highlights the effectiveness of the proposed approaches.
Paper and project links
Summary
Many approaches based on interpolation, filtering, and deep learning have been proposed for restoring and enhancing remote sensing images. This paper studies multivariate sampling Kantorovich (SK) operators for image reconstruction, focusing on gap filling and speckle noise reduction. It derives a quantitative estimate of the approximation error using the Euler-Maclaurin summation formula, and establishes a convergence result and a quantitative estimate with respect to the dissimilarity index measured by the continuous SSIM. A multidimensional linear prediction result is proved and used to design a new SK-based reconstruction algorithm for missing data (LP-SK). To address speckle noise, SK operators are integrated into a newly proposed Down-Up scaling approach. Numerical tests on synthetic and real SAR images, assessed with SSIM, PSNR, and speckle-specific indexes, and comparisons with state-of-the-art techniques confirm the effectiveness of the proposed methods.
Key Takeaways
- Studies the application of multivariate sampling Kantorovich (SK) operators to remote sensing image reconstruction.
- Derives a quantitative estimate of the approximation error.
- Establishes convergence and a quantitative estimate with respect to the continuous SSIM.
- Proves a multidimensional linear prediction result used to design the SK-based LP-SK algorithm for missing data.
- Integrates SK operators into a novel Down-Up scaling approach to reduce speckle noise.
- Validates the methods with numerical tests on synthetic and real SAR images.
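Reconstruction quality is assessed with metrics such as PSNR. A minimal implementation of PSNR for 8-bit images (the standard formula, not code from the paper) is:

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 100.0)
noisy = ref + 10.0                      # constant error of 10 → MSE = 100
print(psnr(ref, noisy))                 # 10*log10(255^2/100) ≈ 28.13 dB
```

SSIM and the speckle-specific indexes mentioned above are structurally more involved; PSNR is shown here only as the simplest of the reported metrics.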
Click here to view paper screenshots
Regression is all you need for medical image translation
Authors:Sebastian Rassmann, David Kügler, Christian Ewert, Martin Reuter
While Generative Adversarial Nets (GANs) and Diffusion Models (DMs) have achieved impressive results in natural image synthesis, their core strengths - creativity and realism - can be detrimental in medical applications, where accuracy and fidelity are paramount. These models instead risk introducing hallucinations and replication of unwanted acquisition noise. Here, we propose YODA (You Only Denoise once - or Average), a 2.5D diffusion-based framework for medical image translation (MIT). Consistent with DM theory, we find that conventional diffusion sampling stochastically replicates noise. To mitigate this, we draw and average multiple samples, akin to physical signal averaging. As this effectively approximates the DM’s expected value, we term this Expectation-Approximation (ExpA) sampling. We additionally propose regression sampling YODA, which retains the initial DM prediction and omits iterative refinement to produce noise-free images in a single step. Across five diverse multi-modal datasets - including multi-contrast brain MRI and pelvic MRI-CT - we demonstrate that regression sampling is not only substantially more efficient but also matches or exceeds image quality of full diffusion sampling even with ExpA. Our results reveal that iterative refinement solely enhances perceptual realism without benefiting information translation, which we confirm in relevant downstream tasks. YODA outperforms eight state-of-the-art DMs and GANs and challenges the presumed superiority of DMs and GANs over computationally cheap regression models for high-quality MIT. Furthermore, we show that YODA-translated images are interchangeable with, or even superior to, physical acquisitions for several medical applications.
Paper and project links
Summary
YODA is a 2.5D diffusion-based framework for medical image translation (MIT) that addresses the noise replication and hallucinations that GANs and diffusion models can introduce in medical applications. YODA draws and averages multiple samples to improve image quality (Expectation-Approximation, ExpA, sampling) and also introduces regression sampling, which keeps the initial diffusion prediction and skips iterative refinement. Across multiple multi-modal datasets, regression sampling is substantially more efficient and matches or exceeds the image quality of full diffusion sampling even with ExpA. YODA outperforms eight state-of-the-art DMs and GANs, challenges their presumed superiority over computationally cheap regression models, and produces translated images that are interchangeable with, or even superior to, physical acquisitions for several medical applications.
Key Takeaways
- YODA is a 2.5D diffusion-based framework for medical image translation designed to avoid the noise replication and hallucinations that GANs and DMs can introduce in medical applications.
- YODA averages multiple samples to improve image quality, which approximates the diffusion model's expected value (Expectation-Approximation, ExpA, sampling).
- Regression sampling keeps the initial DM prediction and omits iterative refinement, producing noise-free images in a single step.
- Experiments on multiple multi-modal datasets show that regression sampling is substantially more efficient and at least matches the image quality of full diffusion sampling.
- YODA challenges the presumed superiority of DMs and GANs over computationally cheap regression models.
- YODA outperforms eight state-of-the-art DMs and GANs in medical image translation.
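ExpA sampling is essentially physical signal averaging: drawing several stochastic samples and averaging them approximates the model's expected value while cancelling replicated noise. A toy sketch with a stand-in sampler (the "diffusion output" here is just a clean target plus Gaussian noise, invented for illustration) is:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 1000)          # stand-in for the true image

def draw_sample():
    """Stand-in for one stochastic diffusion sample: clean target
    plus replicated acquisition-like noise."""
    return clean + rng.normal(scale=0.2, size=clean.shape)

def expa_sample(n_samples):
    """Expectation-Approximation: average n independent samples."""
    return np.mean([draw_sample() for _ in range(n_samples)], axis=0)

for n in (1, 4, 16):
    err = np.sqrt(np.mean((expa_sample(n) - clean) ** 2))
    print(f"n={n:2d}  RMSE={err:.3f}")       # noise shrinks ~ 1/sqrt(n)
```

Regression sampling avoids this averaging cost entirely by keeping the model's first (expected-value-like) prediction, which is why it is so much cheaper.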
Click here to view paper screenshots
Monitoring morphometric drift in lifelong learning segmentation of the spinal cord
Authors:Enamundram Naga Karthik, Sandrine Bédard, Jan Valošek, Christoph S. Aigner, Elise Bannier, Josef Bednařík, Virginie Callot, Anna Combes, Armin Curt, Gergely David, Falk Eippert, Lynn Farner, Michael G Fehlings, Patrick Freund, Tobias Granberg, Cristina Granziera, RHSCIR Network Imaging Group, Ulrike Horn, Tomáš Horák, Suzanne Humphreys, Markus Hupp, Anne Kerbrat, Nawal Kinany, Shannon Kolind, Petr Kudlička, Anna Lebret, Lisa Eunyoung Lee, Caterina Mainero, Allan R. Martin, Megan McGrath, Govind Nair, Kristin P. O’Grady, Jiwon Oh, Russell Ouellette, Nikolai Pfender, Dario Pfyffer, Pierre-François Pradat, Alexandre Prat, Emanuele Pravatà, Daniel S. Reich, Ilaria Ricchi, Naama Rotem-Kohavi, Simon Schading-Sassenhausen, Maryam Seif, Andrew Smith, Seth A Smith, Grace Sweeney, Roger Tam, Anthony Traboulsee, Constantina Andrada Treaba, Charidimos Tsagkas, Zachary Vavasour, Dimitri Van De Ville, Kenneth Arnold Weber II, Sarath Chandar, Julien Cohen-Adad
Morphometric measures derived from spinal cord segmentations can serve as diagnostic and prognostic biomarkers in neurological diseases and injuries affecting the spinal cord. While robust, automatic segmentation methods to a wide variety of contrasts and pathologies have been developed over the past few years, whether their predictions are stable as the model is updated using new datasets has not been assessed. This is particularly important for deriving normative values from healthy participants. In this study, we present a spinal cord segmentation model trained on a multisite $(n=75)$ dataset, including 9 different MRI contrasts and several spinal cord pathologies. We also introduce a lifelong learning framework to automatically monitor the morphometric drift as the model is updated using additional datasets. The framework is triggered by an automatic GitHub Actions workflow every time a new model is created, recording the morphometric values derived from the model’s predictions over time. As a real-world application of the proposed framework, we employed the spinal cord segmentation model to update a recently-introduced normative database of healthy participants containing commonly used measures of spinal cord morphometry. Results showed that: (i) our model outperforms previous versions and pathology-specific models on challenging lumbar spinal cord cases, achieving an average Dice score of $0.95 \pm 0.03$; (ii) the automatic workflow for monitoring morphometric drift provides a quick feedback loop for developing future segmentation models; and (iii) the scaling factor required to update the database of morphometric measures is nearly constant among slices across the given vertebral levels, showing minimum drift between the current and previous versions of the model monitored by the framework. The code and model are open-source and accessible via Spinal Cord Toolbox v7.0.
Paper and project links
PDF Under review (after 1st round of revision) at Imaging Neuroscience journal
Summary
Morphometric measures derived from spinal cord segmentations can serve as diagnostic and prognostic biomarkers for neurological diseases and injuries affecting the spinal cord. Although robust automatic segmentation methods for many contrasts and pathologies have been developed, the stability of their predictions as models are updated with new datasets has not been assessed, which matters especially when deriving normative values from healthy participants. This study presents a spinal cord segmentation model trained on a multisite (n=75) dataset covering 9 MRI contrasts and several spinal cord pathologies, together with a lifelong learning framework that automatically monitors morphometric drift: a GitHub Actions workflow is triggered each time a new model is created and records the morphometric values derived from the model's predictions over time. As a real-world application, the model was used to update a recently introduced normative database of spinal cord morphometry in healthy participants. Results show that (i) the model outperforms previous versions and pathology-specific models on challenging lumbar cases, with an average Dice score of 0.95 ± 0.03; (ii) the automatic drift-monitoring workflow provides a quick feedback loop for developing future segmentation models; and (iii) the scaling factor required to update the database is nearly constant across slices at the given vertebral levels, indicating minimal drift between model versions. The code and model are open-source and accessible via Spinal Cord Toolbox v7.0.
Key Takeaways
- Morphometric measures from spinal cord segmentation can serve as diagnostic and prognostic biomarkers for neurological diseases and injuries.
- Despite advances in automatic segmentation, the stability of predictions as models are updated with new datasets had not been assessed.
- A new spinal cord segmentation model and a lifelong learning framework are introduced; the framework automatically monitors morphometric drift across model versions.
- The model performs well, especially on challenging lumbar spinal cord cases.
- The automatic workflow provides a quick feedback loop for developing future segmentation models.
- The scaling factor for updating the morphometric database is nearly constant across vertebral levels, indicating the model's stability and minimal drift.
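Segmentation performance above is reported as a Dice score. A minimal implementation for binary masks (the standard definition, not the Spinal Cord Toolbox code) is:

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.zeros((4, 4), int); a[1:3, 1:3] = 1   # 4-pixel square
b = np.zeros((4, 4), int); b[1:3, 1:4] = 1   # 6-pixel rectangle, overlap 4
print(dice_score(a, b))                      # 2*4 / (4+6) = 0.8
```

Tracking such per-version metrics over time is exactly what the GitHub Actions drift-monitoring workflow automates.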
Click here to view paper screenshots
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
Authors:Jiyuan Wang, Chunyu Lin, Cheng Guan, Lang Nie, Jing He, Haodong Li, Kang Liao, Yao Zhao
In this paper, we propose Jasmine, the first Stable Diffusion (SD)-based self-supervised framework for monocular depth estimation, which effectively harnesses SD’s visual priors to enhance the sharpness and generalization of unsupervised prediction. Previous SD-based methods are all supervised since adapting diffusion models for dense prediction requires high-precision supervision. In contrast, self-supervised reprojection suffers from inherent challenges (e.g., occlusions, texture-less regions, illumination variance), and the predictions exhibit blurs and artifacts that severely compromise SD’s latent priors. To resolve this, we construct a novel surrogate task of hybrid image reconstruction. Without any additional supervision, it preserves the detail priors of SD models by reconstructing the images themselves while preventing depth estimation from degradation. Furthermore, to address the inherent misalignment between SD’s scale and shift invariant estimation and self-supervised scale-invariant depth estimation, we build the Scale-Shift GRU. It not only bridges this distribution gap but also isolates the fine-grained texture of SD output against the interference of reprojection loss. Extensive experiments demonstrate that Jasmine achieves SoTA performance on the KITTI benchmark and exhibits superior zero-shot generalization across multiple datasets.
Paper and project links
PDF Accepted to NeurIPS 2025. 23 pages, with the appendix
Summary
This paper proposes Jasmine, the first Stable Diffusion (SD)-based self-supervised framework for monocular depth estimation, which exploits SD's visual priors to improve the sharpness and generalization of unsupervised prediction. Adapting diffusion models for dense prediction normally requires high-precision supervision, and self-supervised reprojection suffers from inherent challenges (occlusions, texture-less regions, illumination variance) whose blurs and artifacts corrupt SD's latent priors. To resolve this, the authors construct a surrogate task of hybrid image reconstruction that preserves SD's detail priors without any extra supervision while preventing depth estimation from degrading. A Scale-Shift GRU bridges the distribution gap between SD's scale- and shift-invariant estimation and self-supervised scale-invariant depth estimation, and isolates the fine-grained texture of SD's output from the interference of the reprojection loss. Experiments show state-of-the-art performance on the KITTI benchmark and superior zero-shot generalization across multiple datasets.
Key Takeaways
- Jasmine is the first Stable Diffusion (SD)-based self-supervised monocular depth estimation framework, using SD's visual priors to sharpen predictions.
- A surrogate task of hybrid image reconstruction preserves SD's detail priors without extra supervision and prevents depth-estimation degradation.
- It removes the need for the high-precision supervision that adapting diffusion models to dense prediction normally requires.
- A Scale-Shift GRU resolves the inherent misalignment between SD's scale- and shift-invariant estimation and self-supervised scale-invariant depth estimation.
- Jasmine achieves state-of-the-art performance on the KITTI benchmark.
- Jasmine exhibits strong zero-shot generalization.
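The scale/shift mismatch mentioned above is commonly handled by closed-form least-squares alignment of a prediction to a reference before comparison. The sketch below shows that standard affine-invariant alignment, not Jasmine's Scale-Shift GRU, which learns the bridging instead:

```python
import numpy as np

def align_scale_shift(pred, ref):
    """Solve min_{s,t} ||s*pred + t - ref||^2 in closed form and return
    the aligned prediction plus the fitted scale s and shift t."""
    pred = np.asarray(pred, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    A = np.stack([pred, np.ones_like(pred)], axis=1)   # design matrix [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return s * pred + t, s, t

ref = np.array([1.0, 2.0, 3.0, 4.0])
pred = 0.5 * ref + 2.0              # same depths, different scale and shift
aligned, s, t = align_scale_shift(pred, ref)
print(s, t)                         # ≈ 2.0, -4.0 recovers the inverse mapping
```

This is the usual evaluation trick for scale-and-shift-invariant depth; a self-supervised pipeline cannot apply it at training time (no reference), which is the gap the Scale-Shift GRU is designed to close.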
Click here to view paper screenshots
Adaptive Label Correction for Robust Medical Image Segmentation with Noisy Labels
Authors:Chengxuan Qian, Kai Han, Jianxia Ding, Chongwen Lyu, Zhenlong Yuan, Jun Chen, Zhe Liu
Deep learning has shown remarkable success in medical image analysis, but its reliance on large volumes of high-quality labeled data limits its applicability. While noisy labeled data are easier to obtain, directly incorporating them into training can degrade model performance. To address this challenge, we propose a Mean Teacher-based Adaptive Label Correction (ALC) self-ensemble framework for robust medical image segmentation with noisy labels. The framework leverages the Mean Teacher architecture to ensure consistent learning under noise perturbations. It includes an adaptive label refinement mechanism that dynamically captures and weights differences across multiple disturbance versions to enhance the quality of noisy labels. Additionally, a sample-level uncertainty-based label selection algorithm is introduced to prioritize high-confidence samples for network updates, mitigating the impact of noisy annotations. Consistency learning is integrated to align the predictions of the student and teacher networks, further enhancing model robustness. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed framework, showing significant improvements in segmentation performance. By fully exploiting the strengths of the Mean Teacher structure, the ALC framework effectively processes noisy labels, adapts to challenging scenarios, and achieves competitive results compared to state-of-the-art methods.
Paper and project links
Summary
Deep learning has achieved remarkable success in medical image analysis, but its reliance on large volumes of high-quality labeled data limits its applicability. To handle noisy labels, this paper proposes a Mean Teacher-based Adaptive Label Correction (ALC) self-ensemble framework for robust medical image segmentation. The Mean Teacher architecture ensures consistent learning under noise perturbations; an adaptive label refinement mechanism dynamically captures and weights differences across multiple disturbance versions to improve the quality of noisy labels; and a sample-level uncertainty-based label selection algorithm prioritizes high-confidence samples for network updates, reducing the impact of noisy annotations. Consistency learning aligns the predictions of the student and teacher networks, further improving robustness. Extensive experiments on two public datasets show significant improvements in segmentation performance.
Key Takeaways
- Deep learning is highly successful in medical image analysis, but its reliance on large volumes of high-quality labeled data limits its applicability.
- The proposed Mean Teacher-based Adaptive Label Correction (ALC) framework targets training with noisy labels.
- The ALC framework uses the Mean Teacher architecture to ensure consistent learning under noise perturbations.
- An adaptive label refinement mechanism dynamically captures and weights differences across disturbance versions to improve noisy-label quality.
- A sample-level uncertainty-based label selection algorithm prioritizes high-confidence samples for network updates.
- Consistency learning aligns the predictions of the student and teacher networks, improving model robustness.
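In a Mean Teacher setup, the teacher's weights are an exponential moving average of the student's. A minimal sketch of that standard update (the decay value is illustrative, not the paper's hyperparameter) is:

```python
import numpy as np

def update_teacher(teacher, student, decay=0.99):
    """EMA update applied parameter-by-parameter:
    teacher <- decay*teacher + (1-decay)*student."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(100):                 # the teacher slowly tracks the student
    teacher = update_teacher(teacher, student)
print(teacher["w"])                  # ≈ 1 - 0.99**100 ≈ 0.634 per entry
```

The slow-moving teacher provides the stable targets against which the consistency loss and label refinement described above are computed.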
Click here to view paper screenshots
Geodesic Diffusion Models for Efficient Medical Image Enhancement
Authors:Teng Zhang, Hongxu Jiang, Kuang Gong, Wei Shao
Diffusion models generate data by learning to reverse a forward process, where samples are progressively perturbed with Gaussian noise according to a predefined noise schedule. From a geometric perspective, each noise schedule corresponds to a unique trajectory in probability space from the data distribution to a Gaussian prior. However, prior diffusion models rely on empirically chosen schedules that may not be optimal. This inefficiency necessitates many intermediate time steps, resulting in high computational costs during both training and sampling. To address this, we derive a family of geodesic noise schedules corresponding to the shortest paths in probability space under the Fisher-Rao metric. Based on these schedules, we propose Geodesic Diffusion Models (GDMs), which significantly improve training and sampling efficiency by minimizing the energy required to transform between probability distributions. This efficiency further enables sampling to start from an intermediate distribution in conditional image generation, achieving state-of-the-art results with as few as 6 steps. We evaluated GDM on two medical image enhancement tasks: CT image denoising and MRI image super-resolution. Experimental results show that GDM achieved state-of-the-art performance while reducing training time by 20- to 30-fold compared to Denoising Diffusion Probabilistic Models (DDPMs) and 4- to 6-fold compared to Fast-DDPM, and accelerating sampling by 160- to 170-fold and 1.6-fold, respectively. These gains support the use of GDM for efficient model development and real-time clinical applications. Our code is publicly available at: https://github.com/mirthAI/GDM-VE.
Paper and project links
Summary
Based on the Fisher-Rao metric, this paper derives a family of geodesic noise schedules corresponding to the shortest paths in probability space, and proposes Geodesic Diffusion Models (GDMs). By minimizing the energy required to transform between probability distributions, GDMs significantly improve training and sampling efficiency. On medical image enhancement tasks, GDM achieves state-of-the-art performance while drastically reducing both training and sampling time.
Key Takeaways
- Diffusion models generate data by learning to reverse a forward process that progressively perturbs samples with Gaussian noise according to a predefined noise schedule.
- Prior diffusion models rely on empirically chosen schedules that may not be optimal, making training and sampling computationally expensive.
- This paper derives geodesic noise schedules corresponding to the shortest paths in probability space under the Fisher-Rao metric.
- Geodesic Diffusion Models (GDMs) built on these schedules improve training and sampling efficiency.
- GDM can start sampling from an intermediate distribution in conditional image generation, reaching state-of-the-art results in as few as 6 steps.
- On medical image enhancement tasks, GDM achieves state-of-the-art performance while drastically reducing training time.
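Whatever schedule is chosen, it controls the standard closed-form forward perturbation x_t = sqrt(abar_t)*x_0 + sqrt(1-abar_t)*eps. A minimal sketch of that forward step (a generic variance-preserving formulation, not the paper's geodesic schedule) is:

```python
import numpy as np

def forward_diffuse(x0, alpha_bar_t, rng):
    """Closed-form forward process of a variance-preserving diffusion:
    x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps, with eps ~ N(0, I)."""
    eps = rng.normal(size=np.shape(x0))
    return np.sqrt(alpha_bar_t) * np.asarray(x0, float) \
        + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
x0 = np.ones(5)
print(forward_diffuse(x0, 1.0, rng))   # abar=1: no noise, returns x0
print(forward_diffuse(x0, 0.0, rng))   # abar=0: pure Gaussian noise
```

A noise schedule is just the function t → abar_t; the paper's contribution is choosing that function as a Fisher-Rao geodesic rather than an empirical curve.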
Click here to view paper screenshots
Physics motivated models of pulsar X-ray hotspots: off-center dipole configurations
Authors:Chun Huang, Alexander Y. Chen
Recently, it was proposed that an off-center dipole magnetic configuration, together with a non-trivial temperature profile, may be the best model to explain the X-ray light curve of PSR J0030+0451 observed by the Neutron Star Interior Composition Explorer (\emph{NICER}). Using a theoretical model for the electric current density in a force-free pulsar magnetosphere, we compute from first principles the distribution of electric current over the polar cap associated with an off-center magnetic dipole. We then use a simple prescription to compute the resulting temperature distribution, which allows us to derive the observed X-ray light curve. We investigate the role of the volumetric return current region in the polar cap and find that although it does not make a big difference in an aligned dipole case, the difference can be bigger in the case of an off-center dipole. Finally, we apply Markov Chain Monte Carlo (MCMC) fitting to the X-ray light curves of pulsars PSR J0030+0451 and PSR J0437–4715 with and without the volumetric return current, and find that our model can reasonably recover the observed X-ray light curves.
Paper and Project Links
PDF: Accepted for publication in ApJ
Summary
The paper proposes that an off-center dipole magnetic configuration combined with a non-trivial temperature profile may best explain the X-ray light curve of PSR J0030+0451 observed by NICER. Starting from a force-free pulsar magnetosphere model, the authors compute the electric current distribution over the polar cap from first principles, then use a simple prescription for the temperature distribution to derive the X-ray light curve. They find that the volumetric return current region matters little for an aligned dipole but can have a larger effect for an off-center dipole. Finally, MCMC fitting of the light curves of PSR J0030+0451 and PSR J0437-4715, with and without the volumetric return current, shows that the model reasonably recovers the observations.
Key Takeaways
- The X-ray light curve of PSR J0030+0451 can be explained by an off-center dipole magnetic configuration with a non-trivial temperature profile.
- The polar-cap electric current distribution is computed from first principles using a force-free magnetosphere model.
- The volumetric return current region has little effect for an aligned dipole but a larger effect for an off-center dipole.
- A simple temperature prescription is used to derive the X-ray light curve.
- MCMC fitting is applied to the X-ray light curves of PSR J0030+0451 and PSR J0437-4715.
- The model reasonably recovers the observed X-ray light curves.
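The MCMC fitting step can be illustrated with a generic random-walk Metropolis sampler applied to a toy sinusoidal "light curve". This is purely illustrative: the likelihood, parameters, and data below are hypothetical stand-ins, not the paper's hotspot model:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps=4000, step=0.02, rng=None):
    """Generic random-walk Metropolis sampler over a log-posterior."""
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.array(chain)

# Toy data: modulated flux with amplitude 1.0 and phase offset 0.5.
rng = np.random.default_rng(0)
phase = np.linspace(0, 2 * np.pi, 50)
flux = 1.0 * (1 + 0.3 * np.cos(phase - 0.5)) + 0.01 * rng.standard_normal(50)

def log_post(theta):
    """Gaussian log-likelihood (flat prior) for amplitude, phase offset."""
    model = theta[0] * (1 + 0.3 * np.cos(phase - theta[1]))
    return -0.5 * np.sum((flux - model) ** 2) / 0.01 ** 2

chain = metropolis_hastings(log_post, [0.8, 0.0], rng=1)
est = chain[2000:].mean(axis=0)   # posterior mean after burn-in
```

The same recipe — propose, evaluate the likelihood of the model light curve, accept or reject — underlies the paper's fits, with the model light curve there generated from the off-center dipole polar-cap temperature map.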
H3DE-Net: Efficient and Accurate 3D Landmark Detection in Medical Imaging
Authors:Zhen Huang, Tao Tang, Ronghao Xu, Yangbo Wei, Wenkai Yang, Suhua Wang, Xiaoxin Sun, Han Li, Qingsong Yao
3D landmark detection is a critical task in medical image analysis, and accurately detecting anatomical landmarks is essential for subsequent medical imaging tasks. However, mainstream deep learning methods in this field struggle to simultaneously capture fine-grained local features and model global spatial relationships, while maintaining a balance between accuracy and computational efficiency. Local feature extraction requires capturing fine-grained anatomical details, while global modeling requires understanding the spatial relationships within complex anatomical structures. The high-dimensional nature of 3D volume further exacerbates these challenges, as landmarks are sparsely distributed, leading to significant computational costs. Therefore, achieving efficient and precise 3D landmark detection remains a pressing challenge in medical image analysis. In this work, we propose a \textbf{H}ybrid \textbf{3}D \textbf{DE}tection \textbf{Net} (H3DE-Net), a novel framework that combines CNNs for local feature extraction with a lightweight attention mechanism designed to efficiently capture global dependencies in 3D volumetric data. This mechanism employs a hierarchical routing strategy to reduce computational cost while maintaining global context modeling. To our knowledge, H3DE-Net is the first 3D landmark detection model that integrates such a lightweight attention mechanism with CNNs. Additionally, integrating multi-scale feature fusion further enhances detection accuracy and robustness. Experimental results on a public CT dataset demonstrate that H3DE-Net achieves state-of-the-art (SOTA) performance, significantly improving accuracy and robustness, particularly in scenarios with missing landmarks or complex anatomical variations. We have open-sourced our project, including code, data and model weights.
Paper and Project Links
Summary
This paper proposes a Hybrid 3D Detection Network (H3DE-Net) that combines CNNs for local feature extraction with a lightweight attention mechanism to efficiently capture global dependencies in 3D volumetric data. A hierarchical routing strategy reduces computational cost while preserving global context modeling. Experiments on a public CT dataset show that H3DE-Net achieves state-of-the-art performance, improving accuracy and robustness, especially under missing landmarks or complex anatomical variations.
Key Takeaways
- 3D landmark detection is a key task in medical image analysis, requiring both fine-grained local features and global spatial relationships.
- Mainstream deep learning methods struggle to balance accuracy, computational efficiency, and model complexity on this task.
- H3DE-Net combines CNN-based local feature extraction with a lightweight attention mechanism for global dependencies.
- A hierarchical routing strategy lowers computational cost while retaining global context modeling.
- H3DE-Net is the first 3D landmark detection model to integrate such a lightweight attention mechanism with CNNs.
- Multi-scale feature fusion further improves detection accuracy and robustness.
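The cost the hierarchical routing strategy avoids is that of full global self-attention over every voxel token. A minimal NumPy sketch of that baseline (dense scaled dot-product attention over a flattened 3D feature volume; the paper's routing, which prunes these pairwise interactions, is not reproduced here, and all names are illustrative):

```python
import numpy as np

def global_attention_3d(feat, rng=None):
    """Dense self-attention over a (D, H, W, C) feature volume.

    Flattens the volume into N = D*H*W tokens and computes full N x N
    attention -- the quadratic cost that motivates lightweight variants.
    """
    D, H, W, C = feat.shape
    rng = np.random.default_rng(rng)
    tokens = feat.reshape(-1, C)                       # (N, C) voxel tokens
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(C)                      # (N, N) pairwise affinities
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)           # row-wise softmax
    return (attn @ v).reshape(D, H, W, C)
```

For a realistic CT volume N runs into the millions, so the (N, N) score matrix is infeasible; hierarchical routing keeps the global receptive field while attending only to a selected subset of token pairs.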
A Synthetic Data-Driven Radiology Foundation Model for Pan-tumor Clinical Diagnosis
Authors:Wenhui Lei, Hanyu Chen, Zitian Zhang, Luyang Luo, Qiong Xiao, Yannian Gu, Peng Gao, Yankai Jiang, Ci Wang, Guangtao Wu, Tongjia Xu, Yingjie Zhang, Pranav Rajpurkar, Xiaofan Zhang, Shaoting Zhang, Zhenning Wang
AI-assisted imaging has made substantial advances in tumor diagnosis and management. However, a major barrier to developing robust oncology foundation models is the scarcity of large-scale, high-quality annotated datasets, which are limited by privacy restrictions and the high cost of manual labeling. To address this gap, we present PASTA, a pan-tumor radiology foundation model built on PASTA-Gen, a synthetic data framework that generated 30,000 3D CT scans with pixel-level lesion masks and structured reports of tumors across ten organ systems. Leveraging this resource, PASTA achieves state-of-the-art performance on 45 of 46 oncology tasks, including non-contrast CT tumor screening, lesion segmentation, structured reporting, tumor staging, survival prediction, and MRI-modality transfer. To assess clinical applicability, we developed PASTA-AID, a clinical decision support system, and ran a retrospective simulated clinical trial across two scenarios. For pan-tumor screening on plain CT with fixed reading time, PASTA-AID increased radiologists’ throughput by 11.1-25.1% and improved sensitivity by 17.0-31.4% and precision by 10.5-24.9%; additionally, in a diagnosis-aid workflow, it reduced segmentation time by up to 78.2% and reporting time by up to 36.5%. Beyond gains in accuracy and efficiency, PASTA-AID narrowed the expertise gap, enabling less-experienced radiologists to approach expert-level performance. Together, this work establishes an end-to-end, synthetic data-driven pipeline spanning data generation, model development, and clinical validation, thereby demonstrating substantial potential for pan-tumor research and clinical translation.
Paper and Project Links
PDF: 63 pages, 7 figures
Summary
PASTA is a pan-tumor radiology foundation model built on PASTA-Gen, a synthetic data framework that generated 30,000 3D CT scans with pixel-level lesion masks and structured reports covering ten organ systems. PASTA achieves state-of-the-art performance on 45 of 46 oncology tasks. A retrospective simulated clinical trial across two scenarios shows that the companion clinical decision support system, PASTA-AID, improves radiologists' throughput, sensitivity, and precision, and narrows the expertise gap so that less-experienced radiologists approach expert-level performance. The work establishes an end-to-end synthetic data-driven pipeline spanning data generation, model development, and clinical validation, with substantial potential for pan-tumor research and clinical translation.
Key Takeaways
- AI-assisted imaging has made substantial advances in tumor diagnosis and management.
- The scarcity of large-scale, high-quality annotated datasets is a major barrier to developing robust oncology foundation models.
- PASTA is built on the synthetic data framework PASTA-Gen, which generated a pan-tumor dataset covering ten organ systems.
- PASTA achieves state-of-the-art performance on 45 of 46 oncology tasks.
- In a simulated clinical trial across two scenarios, PASTA-AID improved radiologists' throughput, sensitivity, and precision.
- PASTA-AID narrows the expertise gap, enabling less-experienced radiologists to approach expert-level performance.
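The sensitivity and precision gains reported in the abstract follow the standard screening definitions over true/false positives and false negatives. A minimal sketch (the counts below are hypothetical, for illustration only):

```python
def screening_metrics(tp, fp, fn):
    """Sensitivity (recall) and precision from screening counts.

    sensitivity = TP / (TP + FN)  -- fraction of true lesions that were found
    precision   = TP / (TP + FP)  -- fraction of flagged lesions that are real
    """
    return tp / (tp + fn), tp / (tp + fp)

# Hypothetical unaided vs. AI-aided reads over the same 100 true lesions.
base = screening_metrics(tp=60, fp=30, fn=40)
aided = screening_metrics(tp=80, fp=20, fn=20)
```

Under fixed reading time, raising both metrics at once (as PASTA-AID reportedly does) means the assistant is surfacing more true lesions without adding false alarms.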