⚠️ All of the summaries below were generated by a large language model and may contain errors; they are for reference only, use with caution.
🔴 Please note: never use these summaries in serious academic settings; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
2025-11-05 Update
Navigated hepatic tumor resection using intraoperative ultrasound imaging
Authors:Karin Olthof, Theo Ruers, Tiziano Natali, Lisanne Venix, Jasper Smit, Anne den Hartor, Niels Kok, Matteo Fusaglia, Koert Kuhlmann
Purpose: This proof-of-concept study evaluates feasibility and accuracy of an ultrasound-based navigation system for open liver surgery. Unlike most conventional systems that rely on registration to preoperative imaging, the proposed system provides navigation-guided resection using 3D models generated from intraoperative ultrasound. Methods: A pilot study was conducted in 25 patients undergoing resection of liver metastases. The first five cases served to optimize the workflow. Intraoperatively, an electromagnetic sensor compensated for organ motion, after which an ultrasound volume was acquired. Vasculature was segmented automatically and tumors semi-automatically using region-growing (n=15) or a deep learning algorithm (n=5). The resulting 3D model was visualized alongside tracked surgical instruments. Accuracy was assessed by comparing the distance between surgical clips and tumors in the navigation software with the same distance on a postoperative CT of the resected specimen. Results: Navigation was successfully established in all 20 patients. However, four cases were excluded from accuracy assessment due to intraoperative sensor detachment (n=3) or incorrect data recording (n=1). The complete navigation workflow was operational within 5-10 minutes. In 16 evaluable patients, 78 clip-to-tumor distances were analyzed. The median navigation accuracy was 3.2 mm [IQR: 2.8-4.8 mm], and an R0 resection was achieved in 15/16 (93.8%) patients and one patient had an R1 vascular resection. Conclusion: Navigation based solely on intra-operative ultrasound is feasible and accurate for liver surgery. This registration-free approach paves the way for simpler and more accurate image guidance systems.
Paper & Project Links
Summary
This proof-of-concept study evaluates the feasibility and accuracy of an ultrasound-based navigation system for open liver surgery. Instead of registering to preoperative imaging, the system performs navigation-guided resection using 3D models generated from intraoperative ultrasound. The results show high navigation accuracy and a fast, simple workflow, opening new possibilities for image-guided liver surgery.
Key Takeaways
- The study evaluates an ultrasound-based navigation system for open liver surgery that needs no registration to preoperative imaging and instead navigates on 3D models built from intraoperative ultrasound.
- A pilot study was conducted in 25 patients undergoing resection of liver metastases, with the first five cases used to optimize the workflow.
- Intraoperatively, an electromagnetic sensor compensated for organ motion and an ultrasound volume was acquired; vasculature was segmented automatically and tumors semi-automatically using region growing or a deep learning algorithm.
- Navigation was successfully established in all 20 patients; four cases were excluded from the accuracy assessment because of intraoperative sensor detachment or incorrect data recording.
- In the 16 evaluable patients, the median navigation accuracy was 3.2 mm, and the R0 resection rate was high at 93.8%.
- The complete navigation workflow was operational within 5-10 minutes, indicating good practicality and ease of use.
Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation
Authors:Elena Mulero Ayllón, Linlin Shen, Pierangelo Veltri, Fabrizia Gelardi, Arturo Chiti, Paolo Soda, Matteo Tortora
Accurate lung tumor segmentation is vital for improving diagnosis and treatment planning, and effectively combining anatomical and functional information from PET and CT remains a major challenge. In this study, we propose vMambaX, a lightweight multimodal framework integrating PET and CT scan images through a Context-Gated Cross-Modal Perception Module (CGM). Built on the Visual Mamba architecture, vMambaX adaptively enhances inter-modality feature interaction, emphasizing informative regions while suppressing noise. Evaluated on the PCLT20K dataset, the model outperforms baseline models while maintaining lower computational complexity. These results highlight the effectiveness of adaptive cross-modal gating for multimodal tumor segmentation and demonstrate the potential of vMambaX as an efficient and scalable framework for advanced lung cancer analysis. The code is available at https://github.com/arco-group/vMambaX.
Paper & Project Links
Summary
This work proposes vMambaX, a lightweight multimodal framework that fuses PET and CT scans through a Context-Gated Cross-Modal Perception Module (CGM). Built on the Visual Mamba architecture, vMambaX adaptively strengthens cross-modal feature interaction, emphasizing informative regions while suppressing noise. Evaluation on the PCLT20K dataset shows that the model outperforms baseline models at lower computational complexity, demonstrating the effectiveness of adaptive cross-modal gating for multimodal tumor segmentation and the potential of vMambaX as an efficient, scalable framework for advanced lung cancer analysis.
Key Takeaways
- vMambaX is a multimodal framework built on the Visual Mamba architecture for fusing PET and CT scans.
- Through the Context-Gated Cross-Modal Perception Module (CGM), vMambaX adaptively enhances cross-modal feature interaction.
- vMambaX emphasizes informative regions while suppressing noise to improve lung tumor segmentation accuracy.
- Evaluation on the PCLT20K dataset shows that vMambaX outperforms baseline models.
- vMambaX keeps computational complexity low, making it efficient.
- Adaptive cross-modal gating proves effective for multimodal tumor segmentation.
- vMambaX has potential as an efficient and scalable framework for lung cancer analysis.
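The abstract does not specify the internals of the CGM. As a rough illustration only, the sketch below shows a generic context-gated fusion of PET and CT feature maps in PyTorch, where a learned per-pixel gate decides how much of each modality to pass through; module names and shapes are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ContextGatedFusion(nn.Module):
    """Toy context-gated fusion of two modality feature maps (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        # The gate is predicted from the concatenated PET/CT context.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_ct: torch.Tensor, feat_pet: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([feat_ct, feat_pet], dim=1))  # per-pixel gate in [0, 1]
        # Emphasize the informative modality at each location, suppress the other.
        return g * feat_pet + (1.0 - g) * feat_ct

fused = ContextGatedFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```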
CoMViT: An Efficient Vision Backbone for Supervised Classification in Medical Imaging
Authors:Aon Safdar, Mohamed Saadeldin
Vision Transformers (ViTs) have demonstrated strong potential in medical imaging; however, their high computational demands and tendency to overfit on small datasets limit their applicability in real-world clinical scenarios. In this paper, we present CoMViT, a compact and generalizable Vision Transformer architecture optimized for resource-constrained medical image analysis. CoMViT integrates a convolutional tokenizer, diagonal masking, dynamic temperature scaling, and pooling-based sequence aggregation to improve performance and generalization. Through systematic architectural optimization, CoMViT achieves robust performance across twelve MedMNIST datasets while maintaining a lightweight design with only ~4.5M parameters. It matches or outperforms deeper CNN and ViT variants, offering up to 5-20x parameter reduction without sacrificing accuracy. Qualitative Grad-CAM analyses show that CoMViT consistently attends to clinically relevant regions despite its compact size. These results highlight the potential of principled ViT redesign for developing efficient and interpretable models in low-resource medical imaging settings.
Paper & Project Links
PDF Preprint (submitted manuscript). Accepted at the MICCAI 2025 MIRASOL Workshop; to appear in the Springer proceedings volume. This is the pre-review version (not the Version of Record). DOI will be added after publication. [Optional: 8 pages, 4 figures, 4 tables.]
Summary
ViT models in medical imaging suffer from high computational cost and a tendency to overfit on small datasets, limiting their use in real clinical settings. This paper presents CoMViT, a compact and generalizable Vision Transformer architecture optimized for resource-constrained medical image analysis. By integrating a convolutional tokenizer, diagonal masking, dynamic temperature scaling, and pooling-based sequence aggregation, and through systematic architectural optimization, CoMViT achieves robust performance across twelve MedMNIST datasets while keeping a lightweight design of only ~4.5M parameters. It matches or outperforms deeper CNN and ViT variants, offering up to a 5-20x parameter reduction without sacrificing accuracy. Qualitative Grad-CAM analyses show that, despite its compact size, CoMViT consistently attends to clinically relevant regions.
Key Takeaways
- CoMViT is a compact, generalizable Vision Transformer architecture optimized for resource-constrained medical image analysis.
- CoMViT improves performance and generalization by integrating a convolutional tokenizer, diagonal masking, dynamic temperature scaling, and pooling-based sequence aggregation.
- CoMViT performs robustly across twelve MedMNIST datasets with high accuracy.
- CoMViT has a lightweight design with only ~4.5M parameters.
- Compared with deeper CNN and ViT variants, CoMViT reduces parameters without sacrificing accuracy.
- Qualitative Grad-CAM analyses confirm that CoMViT attends to clinically relevant regions.
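The abstract names diagonal masking and dynamic temperature scaling without defining them. The snippet below is one plausible reading, masking each token's attention to itself and dividing the scores by a temperature; it is only an illustrative sketch of the idea, not CoMViT's actual code.

```python
import torch
import torch.nn.functional as F

def diagonally_masked_attention(q, k, v, temperature: float = 1.0):
    # Scaled dot-product attention with each token's self-to-self score masked out.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5 * temperature)
    n = scores.size(-1)
    diag = torch.eye(n, dtype=torch.bool, device=scores.device)
    scores = scores.masked_fill(diag, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 4, 16, 32)  # (batch, heads, tokens, head_dim)
print(diagonally_masked_attention(q, k, v).shape)  # torch.Size([2, 4, 16, 32])
```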
Who Does Your Algorithm Fail? Investigating Age and Ethnic Bias in the MAMA-MIA Dataset
Authors:Aditya Parikh, Sneha Das, Aasa Feragen
Deep learning models aim to improve diagnostic workflows, but fairness evaluation remains underexplored beyond classification, e.g., in image segmentation. Unaddressed segmentation bias can lead to disparities in the quality of care for certain populations, potentially compounded across clinical decision points and amplified through iterative model development. Here, we audit the fairness of the automated segmentation labels provided in the breast cancer tumor segmentation dataset MAMA-MIA. We evaluate automated segmentation quality across age, ethnicity, and data source. Our analysis reveals an intrinsic age-related bias against younger patients that continues to persist even after controlling for confounding factors, such as data source. We hypothesize that this bias may be linked to physiological factors, a known challenge for both radiologists and automated systems. Finally, we show how aggregating data from multiple data sources influences site-specific ethnic biases, underscoring the necessity of investigating data at a granular level.
Paper & Project Links
PDF Medical Imaging Meets EurIPS (NeurIPS-endorsed workshop) - MedEurIPS
Summary
Deep learning models are increasingly used in diagnostic workflows, yet fairness evaluation remains underexplored beyond classification, for example in image segmentation. Unaddressed segmentation bias can lead to disparities in quality of care for certain populations, potentially compounding across clinical decision points and amplifying through iterative model development. This study audits the fairness of the automated segmentation labels provided with the breast cancer tumor segmentation dataset MAMA-MIA, evaluating automated segmentation quality across age, ethnicity, and data source. The analysis reveals an age-related bias against younger patients that persists even after controlling for confounders such as data source; the authors hypothesize that this bias may be linked to physiological factors, a known challenge for both radiologists and automated systems. Finally, the effect of aggregating data from multiple sources on site-specific ethnic biases underscores the need to investigate data at a granular level.
Key Takeaways
- Although deep learning models are widely used in medical image diagnostic workflows, fairness evaluation beyond classification, such as in image segmentation, remains underexplored.
- Unaddressed segmentation bias can lead to disparities in the quality of care received by different populations.
- The fairness of the automated segmentation labels in the breast cancer tumor segmentation dataset MAMA-MIA was audited.
- Automated segmentation quality was evaluated across age, ethnicity, and data source.
- The analysis reveals an age-related bias against younger patients that persists even after controlling for other factors.
- This bias is hypothesized to be linked to physiological factors, a challenge shared by radiologists and automated systems.
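An audit of segmentation labels like the one described comes down to stratifying a per-case quality metric by demographic metadata. The toy sketch below uses hypothetical records and pandas/NumPy to illustrate the kind of subgroup Dice summary such an analysis produces; it is not the authors' code or data.

```python
import numpy as np
import pandas as pd

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    # Dice overlap between a binary prediction and a reference mask.
    inter = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0

print(dice(np.array([1, 1, 0]), np.array([1, 0, 0])))  # ~0.667

# Hypothetical per-case records: Dice of the automated label vs. an expert mask,
# plus the metadata a fairness audit needs (age group, ethnicity, acquisition site).
records = pd.DataFrame({
    "dice":      [0.91, 0.72, 0.88, 0.65, 0.90, 0.70],
    "age_group": ["60+", "<40", "60+", "<40", "40-60", "<40"],
    "ethnicity": ["A", "B", "A", "A", "B", "B"],
    "site":      ["s1", "s1", "s2", "s2", "s1", "s2"],
})

# Stratified quality summary: a systematic gap between subgroups signals segmentation bias.
print(records.groupby("age_group")["dice"].agg(["mean", "count"]))
print(records.groupby(["site", "ethnicity"])["dice"].mean())
```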
MeisenMeister: A Simple Two Stage Pipeline for Breast Cancer Classification on MRI
Authors:Benjamin Hamm, Yannick Kirchhoff, Maximilian Rokuss, Klaus Maier-Hein
The ODELIA Breast MRI Challenge 2025 addresses a critical issue in breast cancer screening: improving early detection through more efficient and accurate interpretation of breast MRI scans. Even though methods for general-purpose whole-body lesion segmentation as well as multi-time-point analysis exist, breast cancer detection remains highly challenging, largely due to the limited availability of high-quality segmentation labels. Therefore, developing robust classification-based approaches is crucial for the future of early breast cancer detection, particularly in applications such as large-scale screening. In this write-up, we provide a comprehensive overview of our approach to the challenge. We begin by detailing the underlying concept and foundational assumptions that guided our work. We then describe the iterative development process, highlighting the key stages of experimentation, evaluation, and refinement that shaped the evolution of our solution. Finally, we present the reasoning and evidence that informed the design choices behind our final submission, with a focus on performance, robustness, and clinical relevance. We release our full implementation publicly at https://github.com/MIC-DKFZ/MeisenMeister
Paper & Project Links
PDF Winning Solution of the MICCAI 2025 ODELIA Breast MRI Classification Challenge
Summary
Automated analysis of breast MRI is key to improving early breast cancer detection, particularly for large-scale screening. This write-up gives a comprehensive overview of the authors' approach to the ODELIA Breast MRI Challenge, with a focus on performance, robustness, and clinical relevance, and the full implementation and results are shared publicly. Such advances promise to improve the accuracy and efficiency of MRI interpretation and raise the level of early breast cancer diagnosis; see https://github.com/MIC-DKFZ/MeisenMeister for the full implementation.
Key Takeaways
- The ODELIA Breast MRI Challenge aims to improve the efficiency and accuracy of early breast cancer detection through better interpretation of breast MRI scans.
- The main obstacle for breast cancer detection at present is the limited availability of high-quality segmentation labels.
- Developing robust classification-based approaches is crucial for future early breast cancer detection, particularly in large-scale screening applications.
- The write-up details the team's approach, including the foundational assumptions, the iterative development process, and the key stages of experimentation, evaluation, and refinement that shaped the solution.
- Performance, robustness, and clinical relevance were the core criteria behind the design choices.
- The team released its full implementation publicly so that other researchers can build on and improve it.
Versatile and Efficient Medical Image Super-Resolution Via Frequency-Gated Mamba
Authors:Wenfeng Huang, Xiangyun Liao, Wei Cao, Wenjing Jia, Weixin Si
Medical image super-resolution (SR) is essential for enhancing diagnostic accuracy while reducing acquisition cost and scanning time. However, modeling both long-range anatomical structures and fine-grained frequency details with low computational overhead remains challenging. We propose FGMamba, a novel frequency-aware gated state-space model that unifies global dependency modeling and fine-detail enhancement into a lightweight architecture. Our method introduces two key innovations: a Gated Attention-enhanced State-Space Module (GASM) that integrates efficient state-space modeling with dual-branch spatial and channel attention, and a Pyramid Frequency Fusion Module (PFFM) that captures high-frequency details across multiple resolutions via FFT-guided fusion. Extensive evaluations across five medical imaging modalities (Ultrasound, OCT, MRI, CT, and Endoscopic) demonstrate that FGMamba achieves superior PSNR/SSIM while maintaining a compact parameter footprint ($<$0.75M), outperforming CNN-based and Transformer-based SOTAs. Our results validate the effectiveness of frequency-aware state-space modeling for scalable and accurate medical image enhancement.
Paper & Project Links
Summary
Medical image super-resolution matters for improving diagnostic accuracy while reducing acquisition cost and scanning time. The paper proposes FGMamba, a frequency-aware gated state-space model that unifies global dependency modeling and fine-detail enhancement in a lightweight architecture. The method has two key components: a Gated Attention-enhanced State-Space Module (GASM), which combines efficient state-space modeling with dual-branch spatial and channel attention, and a Pyramid Frequency Fusion Module (PFFM), which captures high-frequency details across multiple resolutions via FFT-guided fusion. Extensive evaluations on five medical imaging modalities (Ultrasound, OCT, MRI, CT, and Endoscopic) show that FGMamba achieves superior PSNR/SSIM while keeping a compact parameter footprint (<0.75M), outperforming CNN- and Transformer-based SOTAs and validating frequency-aware state-space modeling for scalable, accurate medical image enhancement.
Key Takeaways
- Medical image super-resolution is essential for improving diagnostic accuracy, lowering cost, and shortening scan time.
- FGMamba is a frequency-aware gated state-space model that unifies global dependency modeling and fine-detail enhancement.
- FGMamba has two key components: GASM and PFFM.
- GASM combines state-space modeling with attention mechanisms to boost performance.
- PFFM captures high-frequency details across multiple resolutions via FFT-guided fusion, improving image quality.
- FGMamba performs strongly across multiple medical imaging modalities, outperforming existing CNN and Transformer models.
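The PFFM is described only as FFT-guided fusion of high-frequency details across resolutions. As an assumption-laden illustration, the snippet below shows how a high-frequency component of a feature map can be isolated with a Fourier-domain mask in PyTorch; the real module fuses such components across a multi-resolution pyramid and its exact filtering may differ.

```python
import torch

def high_frequency_component(x: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    # Keep only high spatial frequencies of a feature map via a Fourier-domain mask.
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    keep_high = ((yy ** 2 + xx ** 2).sqrt() >= cutoff).float()  # suppress the low-frequency disc
    freq = freq * keep_high
    return torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1))).real

img = torch.randn(1, 1, 64, 64)
print(high_frequency_component(img).shape)  # torch.Size([1, 1, 64, 64])
```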
T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis
Authors:Raza Imam, Hu Wang, Dwarikanath Mahapatra, Mohammad Yaqub
In medical imaging, vision-language models face a critical duality: pretrained networks offer broad robustness but lack subtle, modality-specific characteristics, while fine-tuned expert models achieve high in-distribution accuracy yet falter under modality shift. Existing model-merging techniques, designed for natural-image benchmarks, are simple and efficient but fail to deliver consistent gains across diverse medical modalities; their static interpolation limits reliability in varied clinical tasks. To address this, we introduce Test-Time Task adaptive merging (T^3), a backpropagation-free framework that computes per-sample interpolation coefficients via the Jensen-Shannon divergence between the two models’ output distributions. T^3 dynamically preserves local precision when models agree and defers to generalist robustness under drift. To overcome the inference costs of sample-wise merging, we further propose a batch-wise extension, T^3_B, that computes a merging coefficient across a batch of samples, dramatically reducing computational bottleneck. Recognizing the lack of a standardized medical-merging benchmark, we present a rigorous cross-evaluation protocol spanning in-domain, base-to-novel, and corruptions across four modalities. Empirically, T^3 sets new state-of-the-art in Top-1 accuracy and error reduction, outperforming strong baselines while maintaining efficiency, paving the way for adaptive MVLM deployment in clinical settings. Our code is available at https://github.com/Razaimam45/TCube.
Paper & Project Links
PDF Main: 11 pages, Supplementary: 9 pages 10 tables, 10 figures
Summary
This paper proposes Test-Time Task adaptive merging (T^3), a backpropagation-free framework for medical imaging that addresses the limitations of pretrained networks and fine-tuned expert models. The framework computes per-sample merging coefficients from the Jensen-Shannon divergence between the two models' output distributions, preserving local precision when the models agree and deferring to generalist robustness under drift. A batch-wise extension, T^3_B, computes the coefficient over a batch of samples to reduce inference cost. Evaluated with a rigorous cross-domain, cross-modality protocol, T^3 sets new state-of-the-art Top-1 accuracy and error reduction across four modalities while remaining efficient, paving the way for adaptive deployment of medical vision-language models in clinical settings.
Key Takeaways
- The T^3 framework addresses the limitations of pretrained networks and fine-tuned expert models in medical imaging.
- T^3 computes the merging coefficient dynamically from the Jensen-Shannon divergence between the two models' output distributions.
- T^3 preserves local precision when the models agree and falls back on generalist robustness under model drift.
- T^3_B, the batch-wise extension, reduces the computational cost of sample-wise merging.
- The lack of a standardized medical model-merging benchmark motivates a rigorous cross-evaluation protocol.
- T^3 achieves state-of-the-art performance across four modalities, spanning in-domain, base-to-novel, and corruption settings.
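At its core, the abstract describes computing a per-sample interpolation coefficient from the Jensen-Shannon divergence between the expert's and the generalist's output distributions, then merging the weights accordingly. The sketch below illustrates that idea with a toy mapping from divergence to coefficient (the exact mapping and any scaling used in T^3 are not given in the abstract); averaging the coefficient over a batch loosely mirrors the T^3_B variant.

```python
import torch
import torch.nn.functional as F

def js_divergence(p_logits: torch.Tensor, q_logits: torch.Tensor) -> torch.Tensor:
    # Jensen-Shannon divergence between two predictive distributions, one value per sample.
    p, q = F.softmax(p_logits, dim=-1), F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.clamp_min(1e-8).log() - b.clamp_min(1e-8).log())).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def merge_state_dicts(expert, generalist, alpha: float):
    # Linear weight interpolation: alpha toward the expert, (1 - alpha) toward the generalist.
    return {k: alpha * expert[k] + (1.0 - alpha) * generalist[k] for k in expert}

# Toy coefficient: high divergence (distribution drift) -> lean more on the generalist.
logits_expert, logits_general = torch.randn(4, 10), torch.randn(4, 10)
alpha = 1.0 - js_divergence(logits_expert, logits_general).clamp(0.0, 1.0)  # per-sample
expert_sd = torch.nn.Linear(10, 10).state_dict()
general_sd = torch.nn.Linear(10, 10).state_dict()
merged = merge_state_dicts(expert_sd, general_sd, alpha=float(alpha.mean()))  # batch-wise, T^3_B-like
print(alpha, merged["weight"].shape)
```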
Fusion of Heterogeneous Pathology Foundation Models for Whole Slide Image Analysis
Authors:Zhidong Yang, Xiuhui Shi, Wei Ba, Zhigang Song, Haijing Luan, Taiyuan Hu, Senlin Lin, Jiguang Wang, Shaohua Kevin Zhou, Rui Yan
Whole slide image (WSI) analysis has emerged as an increasingly essential technique in computational pathology. Recent advances in the pathological foundation models (FMs) have demonstrated significant advantages in deriving meaningful patch-level or slide-level feature representations from WSIs. However, current pathological FMs have exhibited substantial heterogeneity caused by diverse private training datasets and different network architectures. This heterogeneity introduces performance variability when we utilize the extracted features from different FMs in the downstream tasks. To fully explore the advantage of multiple FMs effectively, in this work, we propose a novel framework for the fusion of heterogeneous pathological FMs, called FuseCPath, yielding a model with a superior ensemble performance. The main contributions of our framework can be summarized as follows: (i) To guarantee the representativeness of the training patches, we propose a multi-view clustering-based method to filter out the discriminative patches via multiple FMs’ embeddings. (ii) To effectively fuse the heterogeneous patch-level FMs, we devise a cluster-level re-embedding strategy to online capture patch-level local features. (iii) To effectively fuse the heterogeneous slide-level FMs, we devise a collaborative distillation strategy to explore the connections between slide-level FMs. Extensive experiments conducted on lung cancer, bladder cancer, and colorectal cancer datasets from The Cancer Genome Atlas (TCGA) have demonstrated that the proposed FuseCPath achieves state-of-the-art performance across multiple tasks on these public datasets.
Paper & Project Links
PDF 22 pages, 9 figures
Summary
Whole slide image (WSI) analysis is an essential technique in computational pathology, and recent pathology foundation models (FMs) extract meaningful patch-level and slide-level feature representations from WSIs. However, because of diverse private training datasets and different network architectures, current pathology FMs are highly heterogeneous, which causes performance variability in downstream tasks. To exploit multiple FMs effectively, this work proposes FuseCPath, a framework for fusing heterogeneous pathology foundation models that yields a model with superior ensemble performance.
Key Takeaways
- Whole slide image analysis is an essential technique in computational pathology.
- Pathology foundation models can derive meaningful patch-level and slide-level feature representations from WSIs.
- Pathology foundation models are heterogeneous, which affects their performance in downstream tasks.
- FuseCPath is proposed as a framework for fusing heterogeneous pathology foundation models.
- FuseCPath filters discriminative patches with a multi-view clustering-based method to guarantee the representativeness of the training patches.
- FuseCPath fuses heterogeneous patch-level FMs through a cluster-level re-embedding strategy that captures patch-level local features online.
- Extensive experiments on lung, bladder, and colorectal cancer datasets from TCGA show that FuseCPath achieves state-of-the-art performance across multiple tasks.
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
Authors:Haonan Wang, Jingyu Lu, Hongrui Li, Xiaomeng Li
Recent advances in neural decoding have enabled the reconstruction of visual experiences from brain activity, positioning fMRI-to-image reconstruction as a promising bridge between neuroscience and computer vision. However, current methods predominantly rely on subject-specific models or require subject-specific fine-tuning, limiting their scalability and real-world applicability. In this work, we introduce ZEBRA, the first zero-shot brain visual decoding framework that eliminates the need for subject-specific adaptation. ZEBRA is built on the key insight that fMRI representations can be decomposed into subject-related and semantic-related components. By leveraging adversarial training, our method explicitly disentangles these components to isolate subject-invariant, semantic-specific representations. This disentanglement allows ZEBRA to generalize to unseen subjects without any additional fMRI data or retraining. Extensive experiments show that ZEBRA significantly outperforms zero-shot baselines and achieves performance comparable to fully finetuned models on several metrics. Our work represents a scalable and practical step toward universal neural decoding. Code and model weights are available at: https://github.com/xmed-lab/ZEBRA.
Paper & Project Links
PDF Accepted by NeurIPS 2025
Summary
Recent advances in neural decoding make it possible to reconstruct visual experiences from brain activity, positioning fMRI-to-image reconstruction as a promising bridge between neuroscience and computer vision. Existing methods, however, rely on subject-specific models or subject-specific fine-tuning, limiting scalability and practical use. This work introduces ZEBRA, the first zero-shot brain visual decoding framework that requires no subject-specific adaptation. ZEBRA builds on the insight that fMRI representations can be decomposed into subject-related and semantic-related components; adversarial training explicitly disentangles them to isolate subject-invariant, semantic-specific representations, allowing ZEBRA to generalize to unseen subjects without additional fMRI data or retraining. Extensive experiments show that ZEBRA significantly outperforms zero-shot baselines and matches fully fine-tuned models on several metrics, a scalable and practical step toward universal neural decoding.
Key Takeaways
- Neural decoding can reconstruct visual experiences from brain activity, bridging neuroscience and computer vision.
- Current methods depend on subject-specific models or fine-tuning, which limits their scope and practicality.
- ZEBRA is a zero-shot brain visual decoding framework that requires no subject-specific adaptation.
- ZEBRA uses adversarial training to disentangle the components of fMRI representations, isolating subject-invariant, semantic-specific representations.
- ZEBRA generalizes to unseen subjects without extra data or retraining.
- Experiments show ZEBRA significantly outperforms zero-shot baselines and approaches fully fine-tuned models on several metrics.
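The abstract states that adversarial training is used to strip subject identity from the fMRI representation. A common way to implement such adversarial disentanglement is a gradient-reversal layer in front of a subject classifier, shown below as a generic sketch; this is the standard DANN-style construction, not necessarily ZEBRA's exact mechanism.

```python
import torch
import torch.nn.functional as F
from torch.autograd import Function

class GradReverse(Function):
    # Identity in the forward pass; gradient is sign-flipped (and scaled) in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

features = torch.randn(8, 128, requires_grad=True)   # stand-in for fMRI encoder output
subject_head = torch.nn.Linear(128, 4)                # adversary: tries to identify the subject
logits = subject_head(GradReverse.apply(features, 1.0))
loss = F.cross_entropy(logits, torch.randint(0, 4, (8,)))
loss.backward()   # reversed gradients push the encoder toward subject-invariant features
print(features.grad.shape)  # torch.Size([8, 128])
```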
UP2D: Uncertainty-aware Progressive Pseudo-label Denoising for Source-Free Domain Adaptive Medical Image Segmentation
Authors:Quang-Khai Bui-Tran, Thanh-Huy Nguyen, Manh D. Ho, Thinh B. Lam, Vi Vu, Hoang-Thien Nguyen, Phat Huynh, Ulas Bagci
Medical image segmentation models face severe performance drops under domain shifts, especially when data sharing constraints prevent access to source images. We present a novel Uncertainty-aware Progressive Pseudo-label Denoising (UP2D) framework for source-free domain adaptation (SFDA), designed to mitigate noisy pseudo-labels and class imbalance during adaptation. UP2D integrates three key components: (i) a Refined Prototype Filtering module that suppresses uninformative regions and constructs reliable class prototypes to denoise pseudo-labels, (ii) an Uncertainty-Guided EMA (UG-EMA) strategy that selectively updates the teacher model based on spatially weighted boundary uncertainty, and (iii) a quantile-based entropy minimization scheme that focuses learning on ambiguous regions while avoiding overconfidence on easy pixels. This single-stage student-teacher framework progressively improves pseudo-label quality and reduces confirmation bias. Extensive experiments on three challenging retinal fundus benchmarks demonstrate that UP2D achieves state-of-the-art performance across both standard and open-domain settings, outperforming prior UDA and SFDA approaches while maintaining superior boundary precision.
Paper & Project Links
Summary
Medical image segmentation models suffer severe performance drops under domain shift, especially when data-sharing constraints prevent access to source images. This paper presents UP2D, an Uncertainty-aware Progressive Pseudo-label Denoising framework for source-free domain adaptation (SFDA) that mitigates noisy pseudo-labels and class imbalance. UP2D integrates three key components: a Refined Prototype Filtering module, an Uncertainty-Guided EMA (UG-EMA) strategy, and a quantile-based entropy minimization scheme. The single-stage student-teacher framework progressively improves pseudo-label quality and reduces confirmation bias. Experiments on three challenging retinal fundus benchmarks show state-of-the-art performance in both standard and open-domain settings, outperforming prior UDA and SFDA methods while maintaining superior boundary precision.
Key Takeaways
- Medical image segmentation models face performance degradation under domain shift, particularly when data sharing is restricted.
- The UP2D framework targets noisy pseudo-labels and class imbalance in source-free domain adaptation (SFDA).
- UP2D integrates three key components: Refined Prototype Filtering, an Uncertainty-Guided EMA strategy, and quantile-based entropy minimization.
- The single-stage student-teacher framework progressively improves pseudo-label quality and reduces confirmation bias.
- UP2D achieves state-of-the-art results on retinal fundus benchmarks.
- UP2D outperforms prior domain adaptation and SFDA methods.
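Of the three components, the quantile-based entropy minimization is the easiest to sketch: entropy is minimized only on the most ambiguous pixels, so easy, already-confident pixels are not pushed toward overconfidence. The toy PyTorch version below makes the idea concrete; the threshold choice and weighting in UP2D may differ.

```python
import torch
import torch.nn.functional as F

def quantile_entropy_loss(logits: torch.Tensor, q: float = 0.8) -> torch.Tensor:
    # Minimize prediction entropy only on the most ambiguous pixels
    # (the top (1 - q) of the per-pixel entropy distribution).
    probs = F.softmax(logits, dim=1)                              # (B, C, H, W)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)   # (B, H, W)
    threshold = torch.quantile(entropy.flatten(), q)
    ambiguous = entropy >= threshold
    return entropy[ambiguous].mean()

print(quantile_entropy_loss(torch.randn(2, 3, 64, 64)))
```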
MORE: Multi-Organ Medical Image REconstruction Dataset
Authors:Shaokai Wu, Yapan Guo, Yanbiao Ji, Jing Tong, Yuxiang Lu, Mei Li, Suizhi Huang, Yue Ding, Hongtao Lu
CT reconstruction provides radiologists with images for diagnosis and treatment, yet current deep learning methods are typically limited to specific anatomies and datasets, hindering generalization ability to unseen anatomies and lesions. To address this, we introduce the Multi-Organ medical image REconstruction (MORE) dataset, comprising CT scans across 9 diverse anatomies with 15 lesion types. This dataset serves two key purposes: (1) enabling robust training of deep learning models on extensive, heterogeneous data, and (2) facilitating rigorous evaluation of model generalization for CT reconstruction. We further establish a strong baseline solution that outperforms prior approaches under these challenging conditions. Our results demonstrate that: (1) a comprehensive dataset helps improve the generalization capability of models, and (2) optimization-based methods offer enhanced robustness for unseen anatomies. The MORE dataset is freely accessible under CC-BY-NC 4.0 at our project page https://more-med.github.io/
Paper & Project Links
PDF Accepted to ACMMM 2025
Summary
This paper introduces the Multi-Organ medical image REconstruction (MORE) dataset for CT reconstruction, comprising CT scans across 9 anatomies with 15 lesion types. The dataset is intended both to enable robust training of deep learning models on extensive, heterogeneous data and to support rigorous evaluation of model generalization to unseen anatomies and lesions. Using this dataset, the authors establish a strong baseline that outperforms prior approaches under these challenging conditions. In short, a comprehensive dataset improves model generalization, and optimization-based methods offer enhanced robustness for unseen anatomies.
Key Takeaways
- The Multi-Organ medical image REconstruction (MORE) dataset contains CT scans across 9 anatomies with 15 lesion types.
- The MORE dataset is designed for training deep learning models on extensive, heterogeneous data and for evaluating generalization to unseen anatomies and lesions.
- A strong baseline solution was established using the MORE dataset.
- A comprehensive dataset helps improve the generalization capability of models.
- Optimization-based methods offer enhanced robustness for unseen anatomies.
- The MORE dataset is freely available under the CC-BY-NC 4.0 license.
- The project page is https://more-med.github.io/.
Simultaneous optimization of non-coplanar beam orientations and cumulative EQD2 distribution for high-dose reirradiation of locoregionally recurrent non-small cell lung cancer
Authors:Nathan Torelli, Jonas Willmann, Katja Daehler, Madalyne Day, Nicolaus Andratschke, Jan Unkelbach
Background and Purpose: Reirradiation for non-small cell lung cancer (NSCLC) is commonly delivered using coplanar techniques. In this study, we developed a beam orientation optimization algorithm for reirradiation planning to investigate whether the selection of favorable non-coplanar beam orientations may limit cumulative doses to critical organs-at-risk (OARs) and thus improve the therapeutic window. Materials and Methods: Fifteen cases of challenging high-dose reirradiation for locoregionally recurrent NSCLC were included in this in-silico study. For each patient, the dose distribution from the previous treatment was first mapped to the reirradiation planning CT using rigid dose registration, and subsequently converted to equivalent dose in 2 Gy fractions (EQD2). A 2-arc non-coplanar reirradiation plan, combining dynamic gantry and couch rotation, was then generated using an EQD2-based direct aperture optimization algorithm, which allows for the simultaneous optimization of the dynamic gantry-couch path and the cumulative EQD2 distribution. Non-coplanar reirradiation plans were benchmarked against 2-arc coplanar VMAT plans, which mimic state-of-the-art practice for reirradiation of NSCLC. Results: Non-coplanar reirradiation plans could reduce the maximum cumulative EQD2 to critical OARs such as bronchial tree, esophagus, thoracic wall and trachea by at least 5 Gy2 for 6 out of 15 patients compared to coplanar reirradiation plans. At the same time, target coverage and lung EQD2 metrics were comparable for both methods. Conclusions: The automated selection of favorable non-coplanar beam orientations may reduce the maximum cumulative EQD2 to critical OARs in challenging thoracic reirradiation cases. This allows to explore either better OAR sparing or dose-escalation in future clinical studies.
Paper & Project Links
Summary
Reirradiation plans optimized with non-coplanar beam orientations can improve outcomes for patients with locoregionally recurrent non-small cell lung cancer. The plans use non-coplanar arcs to minimize the cumulative dose to sensitive organs-at-risk without compromising target coverage or lung EQD2 metrics. Overall, the study shows that beam-orientation optimization for NSCLC reirradiation can reduce the exposure of organs such as the bronchial tree, esophagus, thoracic wall, and trachea to high cumulative doses, improving the therapeutic window and clinical feasibility.
Key Takeaways
Key insights on optimizing the reirradiation plan:
- Reirradiation for non-small cell lung cancer (NSCLC) is commonly delivered with coplanar techniques, but challenging cases may require more advanced planning to improve outcomes.
- The study optimizes reirradiation plans with non-coplanar arcs to lower the cumulative exposure of sensitive organs to high doses; specifically, the non-coplanar plans reduced the maximum cumulative equivalent dose in 2 Gy fractions (EQD2) to critical organs such as the bronchial tree and esophagus, opening the way to better organ sparing or dose escalation in future clinical studies.
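For reference, the conversion of a physical dose to equivalent dose in 2 Gy fractions, used when accumulating the two treatment courses, follows the standard linear-quadratic relation (the tissue-specific α/β values chosen in the study are not stated in the abstract):

```latex
\mathrm{EQD2} \;=\; D \cdot \frac{d + \alpha/\beta}{2\,\mathrm{Gy} + \alpha/\beta}
```

where D is the total physical dose delivered in fractions of size d, and α/β is the tissue-specific linear-quadratic parameter.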
MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction
Authors:Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba
Large annotated datasets are essential for training robust Computer-Aided Diagnosis (CAD) models for breast cancer detection or risk prediction. However, acquiring such datasets with fine-detailed annotation is both costly and time-consuming. Vision-Language Models (VLMs), such as CLIP, which are pre-trained on large image-text pairs, offer a promising solution by enhancing robustness and data efficiency in medical imaging tasks. This paper introduces a novel Multi-View Mammography and Language Model for breast cancer classification and risk prediction, trained on a dataset of paired mammogram images and synthetic radiology reports. Our MV-MLM leverages multi-view supervision to learn rich representations from extensive radiology data by employing cross-modal self-supervision across image-text pairs. This includes multiple views and the corresponding pseudo-radiology reports. We propose a novel joint visual-textual learning strategy to enhance generalization and accuracy performance over different data types and tasks to distinguish breast tissues or cancer characteristics(calcification, mass) and utilize these patterns to understand mammography images and predict cancer risk. We evaluated our method on both private and publicly available datasets, demonstrating that the proposed model achieves state-of-the-art performance in three classification tasks: (1) malignancy classification, (2) subtype classification, and (3) image-based cancer risk prediction. Furthermore, the model exhibits strong data efficiency, outperforming existing fully supervised or VLM baselines while trained on synthetic text reports and without the need for actual radiology reports.
Paper & Project Links
PDF Accepted to Computer Vision for Automated Medical Diagnosis (CVAMD) Workshop at ICCV 2025
Summary
This paper introduces MV-MLM, a Multi-View Mammography and Language Model for breast cancer classification and risk prediction. Building on large-scale pretrained vision-language models such as CLIP, it is trained on a dataset of paired mammogram images and synthetic radiology reports. Through multi-view supervision and cross-modal self-supervision, MV-MLM learns rich representations from radiology data and generalizes accurately across data types and tasks. Experiments on private and public datasets show state-of-the-art performance in malignancy classification, subtype classification, and image-based cancer risk prediction, with strong data efficiency: trained on synthetic text reports, it outperforms fully supervised and VLM baselines without requiring actual radiology reports.
Key Takeaways
- Large annotated datasets are essential for training computer-aided diagnosis (CAD) models for breast cancer detection or risk prediction.
- Acquiring datasets with fine-detailed annotation is both costly and time-consuming.
- Vision-language models (VLMs) such as CLIP improve robustness and data efficiency in medical imaging tasks.
- The paper proposes MV-MLM, a new multi-view mammography and language model for breast cancer classification and risk prediction.
- MV-MLM uses multi-view supervision to learn rich representations from extensive radiology data.
- MV-MLM achieves state-of-the-art performance across multiple classification tasks, including malignancy classification, subtype classification, and image-based cancer risk prediction.
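The abstract describes cross-modal self-supervision over paired mammograms and (pseudo-)reports without spelling out the loss. The generic CLIP-style symmetric contrastive objective that such vision-language pretraining typically builds on is sketched below for orientation; the paper's joint visual-textual strategy may differ.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, tau: float = 0.07):
    # Symmetric InfoNCE over a batch of paired image / report embeddings.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / tau                               # (B, B) cosine-similarity matrix
    targets = torch.arange(img.size(0), device=img.device)     # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

print(clip_style_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256)))
```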
FlexICL: A Flexible Visual In-context Learning Framework for Elbow and Wrist Ultrasound Segmentation
Authors:Yuyue Zhou, Jessica Knight, Shrimanti Ghosh, Banafshe Felfeliyan, Jacob L. Jaremko, Abhilash R. Hareendranathan
Elbow and wrist fractures are the most common fractures in pediatric populations. Automatic segmentation of musculoskeletal structures in ultrasound (US) can improve diagnostic accuracy and treatment planning. Fractures appear as cortical defects but require expert interpretation. Deep learning (DL) can provide real-time feedback and highlight key structures, helping lightly trained users perform exams more confidently. However, pixel-wise expert annotations for training remain time-consuming and costly. To address this challenge, we propose FlexICL, a novel and flexible in-context learning (ICL) framework for segmenting bony regions in US images. We apply it to an intra-video segmentation setting, where experts annotate only a small subset of frames, and the model segments unseen frames. We systematically investigate various image concatenation techniques and training strategies for visual ICL and introduce novel concatenation methods that significantly enhance model performance with limited labeled data. By integrating multiple augmentation strategies, FlexICL achieves robust segmentation performance across four wrist and elbow US datasets while requiring only 5% of the training images. It outperforms state-of-the-art visual ICL models like Painter, MAE-VQGAN, and conventional segmentation models like U-Net and TransUNet by 1-27% Dice coefficient on 1,252 US sweeps. These initial results highlight the potential of FlexICL as an efficient and scalable solution for US image segmentation well suited for medical imaging use cases where labeled data is scarce.
Paper & Project Links
Summary
This paper presents FlexICL, a flexible visual in-context learning (ICL) framework for segmenting bony regions in ultrasound images. Applied to an intra-video segmentation setting, experts annotate only a small subset of frames and the model segments the unseen frames. By exploring various image concatenation techniques and training strategies and introducing new concatenation methods, FlexICL achieves strong segmentation performance with limited labeled data. On four wrist and elbow ultrasound datasets it delivers robust segmentation while using only 5% of the training images, outperforming state-of-the-art visual ICL models and conventional segmentation models.
Key Takeaways
- FlexICL is a novel in-context learning framework for segmenting bony regions in ultrasound images.
- The framework targets intra-video segmentation, where experts annotate only a small subset of frames.
- FlexICL improves model performance by exploring multiple image concatenation techniques and training strategies.
- The newly introduced concatenation methods significantly improve performance with limited labeled data.
- FlexICL achieves robust segmentation on wrist and elbow ultrasound data.
- The framework outperforms state-of-the-art visual in-context learning models and conventional segmentation models.
Fine-tuning Segment Anything for Real-Time Tumor Tracking in Cine-MRI
Authors:Valentin Boussot, Cédric Hémon, Jean-Claude Nunes, Jean-Louis Dillenseger
In this work, we address the TrackRAD2025 challenge of real-time tumor tracking in cine-MRI sequences of the thoracic and abdominal regions under strong data scarcity constraints. Two complementary strategies were explored: (i) unsupervised registration with the IMPACT similarity metric and (ii) foundation model-based segmentation leveraging SAM 2.1 and its recent variants through prompt-based interaction. Due to the one-second runtime constraint, the SAM-based method was ultimately selected. The final configuration used SAM2.1 b+ with mask-based prompts from the first annotated slice, fine-tuned solely on the small labeled subset from TrackRAD2025. Training was configured to minimize overfitting, using 1024x1024 patches (batch size 1), standard augmentations, and a balanced Dice + IoU loss. A low uniform learning rate (0.0001) was applied to all modules (prompt encoder, decoder, Hiera backbone) to preserve generalization while adapting to annotator-specific styles. Training lasted 300 epochs (~12h on RTX A6000, 48GB). The same inference strategy was consistently applied across all anatomical sites and MRI field strengths. Test-time augmentation was considered but ultimately discarded due to negligible performance gains. The final model was selected based on the highest Dice Similarity Coefficient achieved on the validation set after fine-tuning. On the hidden test set, the model reached a Dice score of 0.8794, ranking 6th overall in the TrackRAD2025 challenge. These results highlight the strong potential of foundation models for accurate and real-time tumor tracking in MRI-guided radiotherapy.
Paper & Project Links
PDF Paper for the Trackrad2025 challenge, Team BreizhTrack
Summary
This work addresses the TrackRAD2025 challenge of real-time tumor tracking in cine-MRI sequences of the thoracic and abdominal regions under strong data scarcity. Two complementary strategies were explored: unsupervised registration with the IMPACT similarity metric, and foundation-model segmentation with SAM 2.1 and its variants via prompt-based interaction. The SAM-based method was selected and fine-tuned on the small labeled subset of TrackRAD2025, yielding efficient tumor tracking. The model reached a high Dice similarity coefficient on the validation set and ranked 6th on the hidden test set, highlighting the potential of foundation models for accurate, real-time tumor tracking in MRI-guided radiotherapy.
Key Takeaways
- The work targets real-time tumor tracking for the TrackRAD2025 challenge.
- Two strategies were explored: unsupervised registration and foundation-model-based segmentation.
- The SAM-based method was ultimately chosen to meet the one-second runtime constraint.
- SAM2.1 b+ was fine-tuned with mask-based prompts taken from the first annotated slice.
- Training was configured to minimize overfitting, using 1024x1024 patches and a balanced Dice + IoU loss.
- A low uniform learning rate was applied to all modules to preserve generalization while adapting to annotator-specific styles.
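The training recipe mentions a balanced Dice + IoU loss. A common formulation of that combination for binary masks is sketched below; the exact weighting used by the team is not given, so the equal 50/50 split here is an assumption.

```python
import torch

def dice_iou_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Equal-weight soft Dice + IoU loss for binary segmentation (one common formulation).
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(-2, -1))
    p_sum, t_sum = prob.sum(dim=(-2, -1)), target.sum(dim=(-2, -1))
    dice = (2.0 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (p_sum + t_sum - inter + eps)
    return 1.0 - 0.5 * (dice.mean() + iou.mean())

logits = torch.randn(1, 1, 256, 256)                  # smaller than 1024x1024 just for the demo
target = (torch.rand(1, 1, 256, 256) > 0.5).float()
print(dice_iou_loss(logits, target))
```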
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Authors:Roman Beliy, Amit Zalcher, Jonathan Kogman, Navve Wasserman, Michal Irani
Reconstructing images seen by people from their fMRI brain recordings provides a non-invasive window into the human brain. Despite recent progress enabled by diffusion models, current methods often lack faithfulness to the actual seen images. We present “Brain-IT”, a brain-inspired approach that addresses this challenge through a Brain Interaction Transformer (BIT), allowing effective interactions between clusters of functionally-similar brain-voxels. These functional-clusters are shared by all subjects, serving as building blocks for integrating information both within and across brains. All model components are shared by all clusters & subjects, allowing efficient training with a limited amount of data. To guide the image reconstruction, BIT predicts two complementary localized patch-level image features: (i)high-level semantic features which steer the diffusion model toward the correct semantic content of the image; and (ii)low-level structural features which help to initialize the diffusion process with the correct coarse layout of the image. BIT’s design enables direct flow of information from brain-voxel clusters to localized image features. Through these principles, our method achieves image reconstructions from fMRI that faithfully reconstruct the seen images, and surpass current SotA approaches both visually and by standard objective metrics. Moreover, with only 1-hour of fMRI data from a new subject, we achieve results comparable to current methods trained on full 40-hour recordings.
Paper & Project Links
Summary
Reconstructing images seen by people from their fMRI recordings offers a non-invasive window into the human brain, but despite progress enabled by diffusion models, current methods often fail to stay faithful to the actually seen images. This work proposes Brain-IT, a brain-inspired approach built around a Brain Interaction Transformer (BIT) that enables effective interactions between clusters of functionally similar brain voxels. These functional clusters are shared by all subjects and serve as building blocks for integrating information within and across brains; all model components are shared across clusters and subjects, allowing efficient training with limited data. To guide reconstruction, BIT predicts two complementary localized patch-level image features: high-level semantic features that steer the diffusion model toward the correct semantic content, and low-level structural features that initialize the diffusion process with the correct coarse layout. The design allows direct information flow from voxel clusters to localized image features, and the method reconstructs seen images faithfully, surpassing current state-of-the-art approaches both visually and on standard objective metrics. With only one hour of fMRI data from a new subject, results are comparable to methods trained on full 40-hour recordings.
Key Takeaways
- Reconstructing images from fMRI recordings provides a non-invasive way to study the brain.
- Current methods often lack faithfulness to the actually seen images.
- Brain-IT addresses this with a Brain Interaction Transformer (BIT) that enables effective interaction between clusters of functionally similar brain voxels.
- The functional clusters are shared across subjects and serve as building blocks for integrating information.
- BIT predicts high-level semantic features and low-level structural features to guide image reconstruction.
- The Brain-IT design enables direct information flow from brain-voxel clusters to localized image features.
Physics-Guided Conditional Diffusion Networks for Microwave Image Reconstruction
Authors:Shirin Chehelgami, Joe LoVetri, Vahab Khoshdel
A conditional latent-diffusion based framework for solving the electromagnetic inverse scattering problem associated with microwave imaging is introduced. This generative machine-learning model explicitly mirrors the non-uniqueness of the ill-posed inverse problem. Unlike existing inverse solvers utilizing deterministic machine learning techniques that produce a single reconstruction, the proposed latent-diffusion model generates multiple plausible permittivity maps conditioned on measured scattered-field data, thereby generating several potential instances in the range-space of the non-unique inverse mapping. A forward electromagnetic solver is integrated into the reconstruction pipeline as a physics-based evaluation mechanism. The space of candidate reconstructions form a distribution of possibilities consistent with the conditioning data and the member of this space yielding the lowest scattered-field data discrepancy between the predicted and measured scattered fields is reported as the final solution. Synthetic and experimental labeled datasets are used for training and evaluation of the model. An innovative labeled synthetic dataset is created that exemplifies a varied set of scattering features. Training of the model using this new dataset produces high quality permittivity reconstructions achieving improved generalization with excellent fidelity to shape recognition. The results highlight the potential of hybrid generative physics frameworks as a promising direction for robust, data-driven microwave imaging.
Paper & Project Links
Summary
This paper introduces a conditional latent-diffusion framework for the electromagnetic inverse scattering problem in microwave imaging. The generative model explicitly mirrors the non-uniqueness of the ill-posed inverse problem: unlike deterministic learned inverse solvers that produce a single reconstruction, the latent-diffusion model generates multiple plausible permittivity maps conditioned on the measured scattered-field data. A forward electromagnetic solver is integrated into the pipeline as a physics-based evaluation mechanism; the candidate reconstructions form a distribution consistent with the conditioning data, and the member with the lowest discrepancy between predicted and measured scattered fields is reported as the final solution. Synthetic and experimental labeled datasets are used for training and evaluation, including a new labeled synthetic dataset with varied scattering features; training on it yields high-quality permittivity reconstructions with improved generalization and excellent fidelity to shape. The results highlight hybrid generative physics frameworks as a promising direction for robust, data-driven microwave imaging.
Key Takeaways
- A conditional latent-diffusion framework is introduced for the electromagnetic inverse scattering problem in microwave imaging.
- The framework is a generative machine-learning model that handles the non-uniqueness of the inverse problem.
- Unlike deterministic learned inverse solvers, the model generates multiple plausible permittivity maps.
- A forward electromagnetic solver is integrated as a physics-based evaluation mechanism.
- The candidate reconstructions form a distribution of possibilities consistent with the conditioning data.
- Synthetic and experimental labeled datasets are used for training and evaluation.
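The physics-based selection step, keeping the candidate permittivity map whose simulated scattered field best matches the measurement, can be sketched independently of the diffusion model. Below, a toy linear operator stands in for the forward electromagnetic solver and random perturbations stand in for diffusion samples; both are placeholders, not the paper's solver or sampler.

```python
import numpy as np

def select_best_candidate(candidates, measured_field, forward_solver):
    # Physics-based selection: keep the candidate whose simulated scattered field
    # has the lowest L2 discrepancy with the measurement.
    residuals = [np.linalg.norm(forward_solver(c) - measured_field) for c in candidates]
    best = int(np.argmin(residuals))
    return candidates[best], residuals[best]

# Hypothetical stand-ins: a linear operator plays the forward EM solver,
# and noisy perturbations of a "true" map play the diffusion samples.
rng = np.random.default_rng(0)
A = rng.normal(size=(32, 64))
forward = lambda eps_map: A @ eps_map
true_eps = rng.normal(size=64)
measured = forward(true_eps) + 0.01 * rng.normal(size=32)
candidates = [true_eps + 0.1 * rng.normal(size=64) for _ in range(8)]
best_map, residual = select_best_candidate(candidates, measured, forward)
print(residual)
```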
Comparative Study of UNet-based Architectures for Liver Tumor Segmentation in Multi-Phase Contrast-Enhanced Computed Tomography
Authors:Doan-Van-Anh Ly, Thi-Thu-Hien Pham, Thanh-Hai Le
Segmentation of liver structures in multi-phase contrast-enhanced computed tomography (CECT) plays a crucial role in computer-aided diagnosis and treatment planning for liver diseases, including tumor detection. In this study, we investigate the performance of UNet-based architectures for liver tumor segmentation, starting from the original UNet and extending to UNet3+ with various backbone networks. We evaluate ResNet, Transformer-based, and State-space (Mamba) backbones, all initialized with pretrained weights. Surprisingly, despite the advances in modern architecture, ResNet-based models consistently outperform Transformer- and Mamba-based alternatives across multiple evaluation metrics. To further improve segmentation quality, we introduce attention mechanisms into the backbone and observe that incorporating the Convolutional Block Attention Module (CBAM) yields the best performance. ResNetUNet3+ with CBAM module not only produced the best overlap metrics with a Dice score of 0.755 and IoU of 0.662, but also achieved the most precise boundary delineation, evidenced by the lowest HD95 distance of 77.911. The model’s superiority was further cemented by its leading overall accuracy of 0.925 and specificity of 0.926, showcasing its robust capability in accurately identifying both lesion and healthy tissue. To further enhance interpretability, Grad-CAM visualizations were employed to highlight the region’s most influential predictions, providing insights into its decision-making process. These findings demonstrate that classical ResNet architecture, when combined with modern attention modules, remain highly competitive for medical image segmentation tasks, offering a promising direction for liver tumor detection in clinical practice.
Paper & Project Links
PDF 27 pages, 8 figures
Summary
This study investigates UNet-based architectures for liver tumor segmentation in multi-phase contrast-enhanced CT, from the original UNet to UNet3+ with various backbones. Despite advances in modern architectures, ResNet-based models consistently outperform Transformer- and Mamba-based alternatives across multiple evaluation metrics. Adding attention mechanisms helps further: ResNetUNet3+ with the Convolutional Block Attention Module (CBAM) performs best, with the highest overlap metrics (Dice 0.755, IoU 0.662), the most precise boundary delineation (lowest HD95 of 77.911), and leading overall accuracy (0.925) and specificity (0.926). Grad-CAM visualizations highlight the most influential regions behind the predictions, giving insight into the decision-making process. The findings show that classical ResNet architectures combined with modern attention modules remain highly competitive for medical image segmentation, offering a promising direction for liver tumor detection in clinical practice.
Key Takeaways
- UNet-based architectures play a key role in liver tumor segmentation.
- Among the backbone networks tested, ResNet-based models perform best.
- Introducing attention mechanisms improves segmentation quality, especially the CBAM module.
- ResNetUNet3+ with CBAM performs best on overlap metrics, boundary delineation, overall accuracy, and specificity.
- Grad-CAM visualizations enhance the interpretability of the model.
- Classical ResNet architectures combined with modern attention modules remain competitive for medical image segmentation.
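For readers unfamiliar with CBAM, the module applies channel attention followed by spatial attention to a feature map. The sketch below follows the standard formulation (Woo et al., 2018); how exactly it is wired into the ResNetUNet3+ backbone in this study may differ.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    # Convolutional Block Attention Module: channel attention followed by spatial attention.
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        chan = torch.sigmoid(self.mlp(x.mean(dim=(-2, -1))) + self.mlp(x.amax(dim=(-2, -1))))
        x = x * chan.view(b, c, 1, 1)                                  # channel attention
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(pooled))                 # spatial attention

print(CBAM(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```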
Aligning What You Separate: Denoised Patch Mixing for Source-Free Domain Adaptation in Medical Image Segmentation
Authors:Quang-Khai Bui-Tran, Thanh-Huy Nguyen, Hoang-Thien Nguyen, Ba-Thinh Lam, Nguyen Lan Vi Vu, Phat K. Huynh, Ulas Bagci, Min Xu
Source-Free Domain Adaptation (SFDA) is emerging as a compelling solution for medical image segmentation under privacy constraints, yet current approaches often ignore sample difficulty and struggle with noisy supervision under domain shift. We present a new SFDA framework that leverages Hard Sample Selection and Denoised Patch Mixing to progressively align target distributions. First, unlabeled images are partitioned into reliable and unreliable subsets through entropy-similarity analysis, allowing adaptation to start from easy samples and gradually incorporate harder ones. Next, pseudo-labels are refined via Monte Carlo-based denoising masks, which suppress unreliable pixels and stabilize training. Finally, intra- and inter-domain objectives mix patches between subsets, transferring reliable semantics while mitigating noise. Experiments on benchmark datasets show consistent gains over prior SFDA and UDA methods, delivering more accurate boundary delineation and achieving state-of-the-art Dice and ASSD scores. Our study highlights the importance of progressive adaptation and denoised supervision for robust segmentation under domain shift.
Paper & Project Links
PDF 5 pages, 3 figures
Summary
Source-Free Domain Adaptation (SFDA) is a compelling option for medical image segmentation under privacy constraints, but existing methods ignore sample difficulty and struggle with noisy supervision under domain shift. This work proposes a new SFDA framework that combines Hard Sample Selection with Denoised Patch Mixing to progressively align target distributions. Unlabeled images are partitioned into reliable and unreliable subsets through entropy-similarity analysis, so adaptation starts from easy samples and gradually incorporates harder ones. Pseudo-labels are refined with Monte Carlo-based denoising masks that suppress unreliable pixels and stabilize training, and intra- and inter-domain objectives mix patches between subsets to transfer reliable semantics while mitigating noise. On benchmark datasets the method delivers consistent gains over prior SFDA and UDA approaches, with more accurate boundary delineation and state-of-the-art Dice and ASSD scores. The study highlights the importance of progressive adaptation and denoised supervision for robust segmentation under domain shift.
Key Takeaways
- Source-free domain adaptation (SFDA) is promising for medical image segmentation under privacy constraints.
- Current SFDA methods ignore sample difficulty and noisy supervision under domain shift.
- The proposed SFDA framework combines Hard Sample Selection and Denoised Patch Mixing.
- Unlabeled images are partitioned into reliable and unreliable subsets via entropy-similarity analysis, enabling progressive adaptation.
- Pseudo-labels are refined with Monte Carlo-based denoising masks, stabilizing training.
- Experiments show consistent gains over existing methods and state-of-the-art Dice and ASSD scores.
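The reliable/unreliable split can be illustrated with a simple entropy-only stand-in: images whose predictions have low mean entropy are treated as reliable starting points for adaptation. The paper's entropy-similarity analysis additionally uses feature similarity, so the sketch below is a simplification, not the authors' code.

```python
import torch
import torch.nn.functional as F

def partition_by_entropy(logits_per_image, quantile: float = 0.5):
    # Split unlabeled target images into a "reliable" subset (low mean prediction entropy)
    # and an "unreliable" subset (high mean entropy).
    scores = []
    for logits in logits_per_image:                         # each tensor is (C, H, W)
        p = F.softmax(logits, dim=0)
        scores.append(-(p * p.clamp_min(1e-8).log()).sum(dim=0).mean())
    scores = torch.stack(scores)
    threshold = torch.quantile(scores, quantile)
    reliable = [i for i, s in enumerate(scores) if s <= threshold]
    unreliable = [i for i, s in enumerate(scores) if s > threshold]
    return reliable, unreliable

images = [torch.randn(2, 64, 64) for _ in range(6)]         # dummy per-image segmentation logits
print(partition_by_entropy(images))
```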
Transformers in Medicine: Improving Vision-Language Alignment for Medical Image Captioning
Authors:Yogesh Thakku Suresh, Vishwajeet Shivaji Hogale, Luca-Alexandru Zamfira, Anandavardhana Hegde
We present a transformer-based multimodal framework for generating clinically relevant captions for MRI scans. Our system combines a DEiT-Small vision transformer as an image encoder, MediCareBERT for caption embedding, and a custom LSTM-based decoder. The architecture is designed to semantically align image and textual embeddings, using hybrid cosine-MSE loss and contrastive inference via vector similarity. We benchmark our method on the MultiCaRe dataset, comparing performance on filtered brain-only MRIs versus general MRI images against state-of-the-art medical image captioning methods including BLIP, R2GenGPT, and recent transformer-based approaches. Results show that focusing on domain-specific data improves caption accuracy and semantic alignment. Our work proposes a scalable, interpretable solution for automated medical image reporting.
Paper & Project Links
PDF This work is to appear in the Proceedings of MICAD 2025, the 6th International Conference on Medical Imaging and Computer-Aided Diagnosis
Summary
A transformer-based multimodal framework for generating clinically relevant captions for MRI scans. The system combines an image encoder, a caption embedding model, and a decoder to semantically align image and text embeddings. Benchmarked on the MultiCaRe dataset, it outperforms state-of-the-art medical image captioning methods, and focusing on domain-specific data improves caption accuracy and semantic alignment. The work proposes a scalable, interpretable solution for automated medical image reporting.
Key Takeaways
- The study presents a transformer-based multimodal framework for generating clinically relevant captions for MRI scans.
- The system combines an image encoder (a DEiT-Small vision transformer), a caption embedder (MediCareBERT), and a custom LSTM-based decoder.
- The architecture semantically aligns image and text embeddings using a hybrid cosine-MSE loss and contrastive inference via vector similarity.
- Performance was benchmarked on the MultiCaRe dataset, comparing filtered brain-only MRIs with general MRI images and against state-of-the-art medical image captioning methods.
- Results show that focusing on domain-specific data improves caption accuracy and semantic alignment.
- The approach offers a scalable, interpretable solution for automated medical image reporting.
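The hybrid cosine-MSE alignment loss mentioned above can be written in a few lines; the relative weighting of the two terms is not given in the abstract, so the 50/50 mix below is an assumption.

```python
import torch
import torch.nn.functional as F

def hybrid_cosine_mse_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, w: float = 0.5):
    # Weighted sum of (1 - cosine similarity) and mean-squared error between paired embeddings.
    cosine_term = 1.0 - F.cosine_similarity(img_emb, txt_emb, dim=-1).mean()
    mse_term = F.mse_loss(img_emb, txt_emb)
    return w * cosine_term + (1.0 - w) * mse_term

print(hybrid_cosine_mse_loss(torch.randn(8, 768), torch.randn(8, 768)))
```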