⚠️ 以下所有内容总结都来自于 大语言模型的能力,如有错误,仅供参考,谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目 ChatPaperFree 对您有帮助,还请您给我们一些鼓励!⭐️ HuggingFace 免费体验
2025-11-06 更新
Resource-efficient Automatic Refinement of Segmentations via Weak Supervision from Light Feedback
Authors:Alix de Langlais, Benjamin Billot, Théo Aguilar Vidal, Marc-Olivier Gauci, Hervé Delingette
Delineating anatomical regions is a key task in medical image analysis. Manual segmentation achieves high accuracy but is labor-intensive and prone to variability, thus prompting the development of automated approaches. Recently, a breadth of foundation models has enabled automated segmentations across diverse anatomies and imaging modalities, but these may not always meet the clinical accuracy standards. While segmentation refinement strategies can improve performance, current methods depend on heavy user interactions or require fully supervised segmentations for training. Here, we present SCORE (Segmentation COrrection from Regional Evaluations), a weakly supervised framework that learns to refine mask predictions only using light feedback during training. Specifically, instead of relying on dense training image annotations, SCORE introduces a novel loss that leverages region-wise quality scores and over/under-segmentation error labels. We demonstrate SCORE on humerus CT scans, where it considerably improves initial predictions from TotalSegmentator, and achieves performance on par with existing refinement methods, while greatly reducing their supervision requirements and annotation time. Our code is available at: https://gitlab.inria.fr/adelangl/SCORE.
在医学图像分析中,划定解剖区域是一项关键任务。手动分割可以实现较高的准确性,但劳动强度大且易出现差异,从而推动了自动化方法的发展。最近,一系列基础模型已经实现了跨不同解剖结构和成像方式的自动分割,但其结果并不总能达到临床精度标准。虽然分割精修策略可以提高性能,但当前的方法要么依赖大量的用户交互,要么在训练时需要完全监督的分割标注。在这里,我们提出了SCORE(Segmentation COrrection from Regional Evaluations,基于区域评估的分割校正),这是一个弱监督框架,仅利用训练期间的轻量级反馈来学习精修掩膜预测。具体来说,SCORE不依赖密集的训练图像标注,而是引入了一种新型损失函数,利用区域级质量分数和过分割/欠分割误差标签。我们在肱骨CT扫描上展示了SCORE,它显著改进了TotalSegmentator的初始预测,达到了与现有精修方法相当的性能,同时大幅降低了监督需求和标注时间。我们的代码可在以下网址获取:https://gitlab.inria.fr/adelangl/SCORE。
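下面给出一个极简的 Python 草图,示意"利用区域级质量分数与过分割/欠分割标签构造弱监督损失"这一思路可能的形式;其中的函数名、张量形状与加权方式均为本文的假设,并非论文的官方实现(官方代码见上方仓库链接)。

```python
# 假设性草图:基于区域质量分数与过/欠分割标签的弱监督精修损失(非官方实现)
import torch

def regional_weak_loss(pred_logits, region_masks, quality_scores, error_labels):
    """
    pred_logits:    (B, 1, D, H, W) 精修网络输出的前景 logits
    region_masks:   (B, R, D, H, W) R 个评估区域的二值掩膜
    quality_scores: (B, R) 标注者给出的区域质量分数,范围 [0, 1],越高越好
    error_labels:   (B, R) 误差标签:+1 过分割,-1 欠分割,0 无明显误差
    """
    probs = torch.sigmoid(pred_logits)
    # 每个区域内的平均前景概率,作为该区域"分割量"的代理指标
    region_mean = (probs * region_masks).sum(dim=(2, 3, 4)) / (
        region_masks.sum(dim=(2, 3, 4)) + 1e-6)
    over = (error_labels > 0).float()    # 过分割区域:希望前景概率降低
    under = (error_labels < 0).float()   # 欠分割区域:希望前景概率升高
    penalty = over * region_mean + under * (1.0 - region_mean)
    # 质量分数越低,对应区域的惩罚权重越大
    return ((1.0 - quality_scores) * penalty).mean()
```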
论文及项目相关链接
Summary
本文介绍了一种名为SCORE的弱监督框架,用于精修医学图像分割结果。该框架在训练期间仅利用轻量级反馈学习校正分割掩膜:不依赖密集标注,而是引入一种利用区域质量分数和过分割/欠分割误差标签的新型损失函数。在肱骨CT扫描上的实验表明,SCORE能显著改进TotalSegmentator的初始预测,达到与现有精修方法相当的性能,同时大大降低监督要求和标注时间。
Key Takeaways
- SCORE框架使用弱监督学习方法对医学图像分割进行精细化处理。
- 该框架引入了新的损失函数,利用区域质量分数和分割误差标签来提升分割精度。
- SCORE框架在肱骨CT扫描上的实验验证了其有效性,显著提高了初始预测的准确性。
- 与现有修正方法相比,SCORE框架降低了监督要求和标注时间。
- SCORE框架具有广泛的应用潜力,可应用于多种解剖部位和成像模态的自动化分割。
- 该框架的源代码已公开发布,便于其他研究者使用和改进。
- 此方法解决了医学图像分析中手动分割劳动强度大、易出错的问题,为医学图像分析提供了有效的自动化工具。
点此查看论文截图
Forecasting Future Anatomies: Longitudinal Brain MRI-to-MRI Prediction
Authors:Ali Farki, Elaheh Moradi, Deepika Koundal, Jussi Tohka
Predicting future brain state from a baseline magnetic resonance image (MRI) is a central challenge in neuroimaging and has important implications for studying neurodegenerative diseases such as Alzheimer’s disease (AD). Most existing approaches predict future cognitive scores or clinical outcomes, such as conversion from mild cognitive impairment to dementia. Instead, here we investigate longitudinal MRI image-to-image prediction that forecasts a participant’s entire brain MRI several years into the future, intrinsically modeling complex, spatially distributed neurodegenerative patterns. We implement and evaluate five deep learning architectures (UNet, U2-Net, UNETR, Time-Embedding UNet, and ODE-UNet) on two longitudinal cohorts (ADNI and AIBL). Predicted follow-up MRIs are directly compared with the actual follow-up scans using metrics that capture global similarity and local differences. The best performing models achieve high-fidelity predictions, and all models generalize well to an independent external dataset, demonstrating robust cross-cohort performance. Our results indicate that deep learning can reliably predict participant-specific brain MRI at the voxel level, offering new opportunities for individualized prognosis.
从基线磁共振图像(MRI)预测未来的脑状态是神经影像学中的核心挑战,对于研究阿尔茨海默病(AD)等神经退行性疾病具有重要意义。大多数现有方法预测的是未来的认知得分或临床结果,例如从轻度认知障碍转变为痴呆。与此不同,我们研究纵向MRI图像到图像的预测,即预测参与者数年后的完整脑部MRI,从而内在地建模复杂的、空间分布的神经退行性模式。我们在两个纵向队列(ADNI和AIBL)上实现并评估了五种深度学习架构(UNet、U2-Net、UNETR、时间嵌入UNet和ODE-UNet),并使用兼顾全局相似性与局部差异的指标,将预测的随访MRI与实际随访扫描直接进行比较。表现最佳的模型可实现高保真预测,且所有模型在独立外部数据集上均有良好的泛化能力,显示出稳健的跨队列性能。我们的结果表明,深度学习可以在体素级别可靠地预测特定参与者的脑部MRI,为个体化预后提供了新的机会。
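作为补充说明,下面给出一个简短的 Python 草图,示意"同时刻画全局相似性与局部差异"的随访 MRI 对比可以如何实现;这里以 3D SSIM 与逐体素绝对误差为示例指标,数据为随机生成的占位体积,并非论文中的具体评估流程。

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def compare_followup(pred_vol, true_vol):
    """pred_vol / true_vol: 已配准并归一化到 [0, 1] 的 3D 体积(预测与实际随访 MRI)。"""
    global_ssim = ssim(true_vol, pred_vol, data_range=1.0)   # 全局相似性
    diff_map = np.abs(pred_vol - true_vol)                   # 局部差异(逐体素)
    return global_ssim, diff_map.mean(), diff_map

# 用法示意(随机占位数据)
pred = np.random.rand(96, 112, 96).astype(np.float32)
true = np.random.rand(96, 112, 96).astype(np.float32)
s, mae, _ = compare_followup(pred, true)
print(f"SSIM={s:.3f}, 平均绝对误差={mae:.3f}")
```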
论文及项目相关链接
Summary
该研究利用深度学习方法,从基线MRI直接预测参与者数年后的完整脑部MRI,从而内在地建模复杂的神经退行性模式。预测的随访MRI与实际扫描直接对比,表现最佳的模型可实现高保真预测,且所有模型在独立外部数据集上具有良好的泛化性能。此研究为个体化预后提供了新的机会。
Key Takeaways
- 该研究针对从基线磁共振图像(MRI)预测未来大脑状态这一神经影像学核心挑战,这对研究阿尔茨海默病等神经退行性疾病具有重要意义。
- 研究采用深度学习方法进行长期纵向MRI图像预测,内在模拟复杂的神经退化模式。
- 研究评估了五种深度学习架构(UNet、U2-Net、UNETR、Time-Embedding UNet和ODE-UNet)在两个纵向队列(ADNI和AIBL)中的表现。
- 最佳性能的模型能够实现高保真度的预测。
- 所有模型都能在独立外部数据集上良好地泛化,表现出跨队列的稳定性。
- 研究结果证明深度学习方法能够可靠地预测特定参与者的MRI图像,达到体素级别。
点此查看论文截图
Adapting General-Purpose Foundation Models for X-ray Ptychography in Low-Data Regimes
Authors:Robinson Umeike, Neil Getty, Yin Xiangyu, Yi Jiang
The automation of workflows in advanced microscopy is a key goal where foundation models like Language Models (LLMs) and Vision-Language Models (VLMs) show great potential. However, adapting these general-purpose models for specialized scientific tasks is critical, and the optimal domain adaptation strategy is often unclear. To address this, we introduce PtychoBench, a new multi-modal, multi-task benchmark for ptychographic analysis. Using this benchmark, we systematically compare two specialization strategies: Supervised Fine-Tuning (SFT) and In-Context Learning (ICL). We evaluate these strategies on a visual artifact detection task with VLMs and a textual parameter recommendation task with LLMs in a data-scarce regime. Our findings reveal that the optimal specialization pathway is task-dependent. For the visual task, SFT and ICL are highly complementary, with a fine-tuned model guided by context-aware examples achieving the highest mean performance (Micro-F1 of 0.728). Conversely, for the textual task, ICL on a large base model is the superior strategy, reaching a peak Micro-F1 of 0.847 and outperforming a powerful “super-expert” SFT model (0-shot Micro-F1 of 0.839). We also confirm the superiority of context-aware prompting and identify a consistent contextual interference phenomenon in fine-tuned models. These results, benchmarked against strong baselines including GPT-4o and a DINOv3-based classifier, offer key observations for AI in science: the optimal specialization path in our benchmark is dependent on the task modality, offering a clear framework for developing more effective science-based agentic systems.
在先进显微技术中,工作流程自动化是一个关键目标,而语言模型(LLM)和视觉语言模型(VLM)等基础模型在这方面显示出巨大潜力。然而,将这些通用模型适配到特定科学任务至关重要,且最佳的领域适配策略往往并不明确。为了解决这个问题,我们引入了PtychoBench,一个面向叠层成像(ptychography)分析的全新多模态、多任务基准。利用这一基准,我们系统地比较了两种专业化策略:有监督微调(SFT)和上下文学习(ICL)。我们在数据稀缺的情况下,分别以VLM的视觉伪影检测任务和LLM的文本参数推荐任务评估了这些策略。我们的研究发现,最佳的专业化途径取决于任务。对于视觉任务,SFT和ICL高度互补,由上下文感知示例引导的微调模型取得了最高平均性能(Micro-F1为0.728)。相反,对于文本任务,在大型基础模型上进行ICL是更优策略,峰值Micro-F1达到0.847,超越了强大的"超级专家"SFT模型(零样本Micro-F1为0.839)。我们还证实了上下文感知提示的优势,并在微调模型中发现了一致的上下文干扰现象。这些结果与包括GPT-4o和基于DINOv3的分类器在内的强基线进行了对比,为科学领域的人工智能提供了关键观察:在我们的基准中,最佳专业化路径取决于任务模态,为开发更有效的面向科学的智能体系统提供了清晰框架。
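下面是一个用 scikit-learn 计算 Micro-F1 的最小示例,用于说明文中比较 SFT 与 ICL 所用指标的含义;其中的类别名称与预测结果均为虚构的占位数据(假设为单标签多分类),仅作演示。

```python
from sklearn.metrics import f1_score

# 占位数据:伪影检测的真实标签与两种策略的预测(类别名为假设)
y_true     = ["scan_noise", "probe_artifact", "none", "scan_noise"]
y_pred_sft = ["scan_noise", "none",           "none", "scan_noise"]
y_pred_icl = ["scan_noise", "probe_artifact", "none", "none"]

print("SFT Micro-F1:", f1_score(y_true, y_pred_sft, average="micro"))
print("ICL Micro-F1:", f1_score(y_true, y_pred_icl, average="micro"))
```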
论文及项目相关链接
摘要
本文介绍了一个名为PtychoBench的多模态多任务基准测试平台,旨在解决先进显微镜工作流程自动化中基础模型的领域适配问题。通过该平台,本文对比了两种专业化策略:监督微调(SFT)和上下文学习(ICL)。在视觉伪影检测任务和文本参数推荐任务上的实验结果表明,最佳的专业化路径依赖于任务特性。视觉任务中,SFT和ICL相互补充,通过上下文感知示例引导的微调模型取得最高平均性能(Micro-F1为0.728)。而文本任务中,在大型基础模型上进行ICL是更优策略,峰值Micro-F1达到0.847,超越了强大的"超级专家"SFT模型(零样本Micro-F1为0.839)。研究还确认了上下文感知提示的优势,并发现了微调模型中一致的上下文干扰现象。上述结果与GPT-4o和基于DINOv3的分类器等强基线进行了对比,为科学人工智能提供了关键观察:最佳的专业化路径依赖于任务模态,为开发更有效的面向科学的智能体系统提供了清晰框架。
关键见解
- 自动化先进显微镜的工作流程是一个关键目标,其中语言模型和视觉语言模型等基础模型具有巨大潜力。
- 适应这些通用模型于特定科学任务是至关重要的,但最佳域适应策略尚不清楚。
- 引入PtychoBench作为多模态多任务基准测试平台用于解决该问题。
- 对比了两种专业化策略:监督微调(SFT)和上下文学习(ICL)。
- 视觉伪影检测任务中,SFT和ICL结合表现最佳(Micro-F1为0.728)。
- 文本参数推荐任务中,大模型的ICL策略更优越(Micro-F1峰值达到0.847)。
点此查看论文截图
Wavelet-Optimized Motion Artifact Correction in 3D MRI Using Pre-trained 2D Score Priors
Authors:Genyuan Zhang, Xuyang Duan, Songtao Zhu, Ao Wang, Fenglin Liu
Motion artifacts in magnetic resonance imaging (MRI) remain a major challenge, as they degrade image quality and compromise diagnostic reliability. Score-based generative models (SGMs) have recently shown promise for artifact removal. However, existing 3D SGM-based approaches are limited in two key aspects: (1) their strong dependence on known forward operators makes them ineffective for correcting MRI motion artifacts, and (2) their slow inference speed hinders clinical translation. To overcome these challenges, we propose a wavelet-optimized end-to-end framework for 3D MRI motion correction using pre-trained 2D score priors (3D-WMoCo). Specifically, two orthogonal 2D score priors are leveraged to guide the 3D distribution prior, while a mean-reverting stochastic differential equation (SDE) is employed to model the restoration process of motion-corrupted 3D volumes to motion-free 3D distribution. Furthermore, wavelet diffusion is introduced to accelerate inference, and wavelet convolution is applied to enhance feature extraction. We validate the effectiveness of our approach through both simulated motion artifact experiments and real-world clinical motion artifact correction tests. The proposed method achieves robust performance improvements over existing techniques. Implementation details and source code are available at: https://github.com/ZG-yuan/3D-WMoCo.
磁共振成像(MRI)中的运动伪影仍然是一个主要挑战,因为它们会降低图像质量并影响诊断的可靠性。基于分数的生成模型(SGM)最近在伪影去除方面显示出潜力。然而,现有的3D SGM方法存在两个主要局限:(1)它们强烈依赖已知的前向算子,因而无法有效校正MRI运动伪影;(2)推理速度缓慢,阻碍了临床转化。为了克服这些挑战,我们提出了一种利用预训练2D分数先验进行3D MRI运动校正的小波优化端到端框架(3D-WMoCo)。具体来说,利用两个正交的2D分数先验来引导3D分布先验,并采用均值回归随机微分方程(SDE)对运动污染的3D体积向无运动3D分布恢复的过程进行建模。此外,引入小波扩散以加速推理,并应用小波卷积以增强特征提取。我们通过模拟运动伪影实验和真实临床运动伪影校正测试验证了该方法的有效性。与现有技术相比,该方法实现了稳健的性能提升。实现细节和源代码可在以下网址获取:https://github.com/ZG-yuan/3D-WMoCo。
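为帮助理解"均值回归 SDE 建模恢复过程"这一表述,下面给出前向过程的一个 Euler-Maruyama 离散化草图:干净体积在噪声驱动下逐步回归到以运动污染体积为均值的分布,网络学习的则是其逆过程。参数取值与张量尺寸均为示例假设,并非论文设置。

```python
import torch

def mean_reverting_forward(x0, mu, theta=1.5, sigma=0.3, n_steps=100):
    """前向均值回归 SDE 的离散模拟:dx = theta * (mu - x) dt + sigma dW。
    x0: 无伪影体积;mu: 运动伪影污染的体积,作为回归均值。"""
    dt = 1.0 / n_steps
    x = x0.clone()
    for _ in range(n_steps):
        x = x + theta * (mu - x) * dt + sigma * (dt ** 0.5) * torch.randn_like(x)
    return x  # 近似落在以 mu 为中心的平稳分布附近

x0 = torch.rand(1, 1, 32, 64, 64)   # 示例体积
mu = torch.rand_like(x0)            # 示例"退化"体积
xT = mean_reverting_forward(x0, mu)
```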
论文及项目相关链接
PDF 11 pages, 5 figures
Summary
本文提出一种小波优化的端到端框架(3D-WMoCo),利用预训练的二维分数先验校正三维MRI运动伪影。该方法通过两个正交的二维分数先验引导三维分布先验,并采用均值回归随机微分方程(SDE)对运动污染的三维体积的恢复过程进行建模。此外,引入小波扩散以加速推理,并采用小波卷积增强特征提取。模拟实验与真实临床运动伪影校正测试均表明,该方法取得了显著的改进效果。
Key Takeaways
- 文中提出了一种新的框架(3D-WMoCo)用于校正三维MRI中的运动伪影。
- 该方法利用预训练的二维分数先验来引导三维分布先验的建模。
- 采用均值回归随机微分方程(SDE)对运动污染的三维体积的恢复过程进行建模。
- 小波扩散被引入以加速推理过程,提高临床应用的实时性。
- 小波卷积增强了特征提取能力,提高了运动伪影校正的准确性。
- 该方法通过模拟和真实临床测试验证其有效性。
- 文章提供了详细的实现细节和源代码,可供进一步研究使用。
点此查看论文截图
Language-Enhanced Generative Modeling for PET Synthesis from MRI and Blood Biomarkers
Authors:Zhengjie Zhang, Xiaoxie Mao, Qihao Guo, Shaoting Zhang, Qi Huang, Mu Zhou, Fang Xie, Mianxin Liu
Background: Alzheimer’s disease (AD) diagnosis heavily relies on amyloid-beta positron emission tomography (Abeta-PET), which is limited by high cost and limited accessibility. This study explores whether Abeta-PET spatial patterns can be predicted from blood-based biomarkers (BBMs) and MRI scans. Methods: We collected Abeta-PET images, T1-weighted MRI scans, and BBMs from 566 participants. A language-enhanced generative model, driven by a large language model (LLM) and multimodal information fusion, was developed to synthesize PET images. Synthesized images were evaluated for image quality, diagnostic consistency, and clinical applicability within a fully automated diagnostic pipeline. Findings: The synthetic PET images closely resemble real PET scans in both structural details (SSIM = 0.920 +/- 0.003) and regional patterns (Pearson’s r = 0.955 +/- 0.007). Diagnostic outcomes using synthetic PET show high agreement with real PET-based diagnoses (accuracy = 0.80). Using synthetic PET, we developed a fully automatic AD diagnostic pipeline integrating PET synthesis and classification. The synthetic PET-based model (AUC = 0.78) outperforms T1-based (AUC = 0.68) and BBM-based (AUC = 0.73) models, while combining synthetic PET and BBMs further improved performance (AUC = 0.79). Ablation analysis supports the advantages of LLM integration and prompt engineering. Interpretation: Our language-enhanced generative model synthesizes realistic PET images, enhancing the utility of MRI and BBMs for Abeta spatial pattern assessment and improving the diagnostic workflow for Alzheimer’s disease.
背景:阿尔茨海默病(AD)的诊断严重依赖于淀粉样蛋白β正电子发射断层扫描(Abeta-PET),但其成本高昂且可及性有限。本研究探究能否通过血液生物标志物(BBMs)和MRI扫描预测Abeta-PET的空间模式。方法:我们从566名参与者中收集了Abeta-PET图像、T1加权MRI扫描和BBMs数据,并开发了一种由大型语言模型(LLM)驱动、融合多模态信息的语言增强生成模型,用于合成PET图像。我们在全自动诊断流程中评估了合成图像的图像质量、诊断一致性和临床适用性。结果:合成PET图像在结构细节(SSIM = 0.920 ± 0.003)和区域模式(Pearson r = 0.955 ± 0.007)上均与真实PET扫描高度接近。使用合成PET得到的诊断结果与基于真实PET的诊断高度一致(准确度 = 0.80)。基于合成PET,我们开发了一个集成PET合成与分类的全自动AD诊断流程。基于合成PET的模型(AUC = 0.78)优于基于T1(AUC = 0.68)和基于BBMs(AUC = 0.73)的模型,而结合合成PET与BBMs可进一步提升性能(AUC = 0.79)。消融分析支持了LLM集成与提示工程(prompt engineering)的优势。解读:我们的语言增强生成模型能合成逼真的PET图像,提升了MRI和BBMs在评估Abeta空间模式方面的实用性,并改善了阿尔茨海默病的诊断流程。
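下面用一个简短的 Python 片段示意摘要中几个评估量(区域模式的 Pearson 相关、诊断 AUC)的计算方式;数据为随机生成的占位数组,脑区数量、受试者数量等均为假设,仅用于说明指标本身,并非论文的评估代码。

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
real_regional  = rng.random((50, 90))                                   # 假设 50 名受试者、90 个脑区的真实 PET 区域均值
synth_regional = real_regional + 0.05 * rng.standard_normal((50, 90))   # 合成 PET 的对应区域均值

r_values = [pearsonr(real_regional[i], synth_regional[i])[0] for i in range(50)]
print("区域模式 Pearson r(均值):", float(np.mean(r_values)))

labels = rng.integers(0, 2, size=50)   # 1 = Abeta 阳性(占位标签)
scores = rng.random(50)                # 分类器输出的阳性概率(占位)
print("诊断 AUC:", roc_auc_score(labels, scores))
```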
论文及项目相关链接
PDF 31 pages, 8 figures
Summary
本研究探索了基于血液生物标志物(BBMs)和MRI扫描预测淀粉样蛋白β正电子发射断层扫描(Abeta-PET)空间模式的可能性。开发了一种由大型语言模型(LLM)和多模态信息融合驱动的生成模型,合成PET图像,并进行了图像质量、诊断一致性和临床适用性的评估。合成PET图像与真实PET扫描在结构细节和区域模式上高度相似,使用合成PET的诊断结果与真实PET的诊断结果高度一致。此外,研究还开发了一个全自动的基于合成PET的阿尔茨海默病诊断流程,其性能优于基于T1和BBMs的模型,且结合合成PET和BBMs可进一步提高性能。
Key Takeaways
- 研究旨在解决阿尔茨海默病诊断中Abeta-PET成本高昂、可及性有限的问题。
- 通过大型语言模型和多模态信息融合开发了一种生成模型来合成PET图像。
- 合成PET图像在图像质量、诊断一致性和临床适用性方面表现出良好的性能。
- 合成PET图像与真实PET扫描在结构和区域模式上具有高度的相似性。
- 使用合成PET的诊断结果与真实PET的诊断结果高度一致。
- 基于合成PET的自动诊断流程在阿尔茨海默病诊断中具有优越性,优于基于T1和BBMs的模型。
点此查看论文截图
Beyond Spin Coating: Homogeneous All-Inorganic Perovskite Films via High-Pressure Recrystallization
Authors:Trong Tam Nguyen, José Penuelas, Aziz Benamrouche, Céline Chevalier, Thi Kim Anh Hoang, Gaëlle Trippé-Allard, Elsa Cassette, Brice Devif, Emmanuel Drouard, Emmanuelle Deleporte, Hong Hanh Mai, Abdelaziz Bouazizi, Christian Seassal, Hai Son Nguyen
Metal halide perovskites are promising materials for optoelectronic applications owing to their outstanding optical and electronic properties. Among them, all-inorganic perovskites such as CsPbBr$_3$ offer superior thermal and chemical stability. However, obtaining high-quality CsPbBr$_3$ thin films via solution processing remains challenging due to the precursor’s low solubility, and current additive or solvent engineering strategies are often complex and poorly reproducible. High-pressure recrystallization has recently emerged as a promising route to improve film quality, yet its impact on film properties remains insufficiently explored. Here, we systematically investigate the morphological, structural, and optical properties of CsPbBr$_3$ thin films prepared by high-pressure recrystallization, in comparison with standard non-recrystallized films. Optimized recrystallization at 300 bar produces smooth, pinhole-free, single-phase 3D perovskite layers with sub-nanometer roughness, while the film thickness is precisely tunable via precursor concentration. The process enhances both grain and crystallite sizes, leading to amplified spontaneous emission with a reduced excitation threshold and improved photostability. Temperature-dependent X-ray diffraction further reveals the orthorhombic–tetragonal–cubic phase transition, consistent with single-crystal behavior. This study provides fundamental insights into pressure-driven recrystallization and establishes a reproducible, scalable approach for fabricating high-quality CsPbBr$_3$ films for optoelectronic devices.
金属卤化物钙钛矿因其卓越的光学和电子特性而成为光电子应用中有前途的材料。其中,CsPbBr3等全无机钙钛矿具有出色的热稳定性和化学稳定性。然而,由于前驱体溶解度低,通过溶液法制备高质量的CsPbBr3薄膜仍然具有挑战性,而当前的添加剂或溶剂工程策略通常复杂且重现性差。高压再结晶最近成为提高薄膜质量的一条有前途的途径,但其对薄膜性能的影响尚未得到充分探索。在这里,我们系统地研究了通过高压再结晶制备的CsPbBr3薄膜的形貌、结构和光学性能,并与标准的非再结晶薄膜进行了比较。在300 bar下优化的再结晶产生了光滑、无针孔、单相的3D钙钛矿层,粗糙度在亚纳米范围内,同时薄膜厚度可通过前驱体浓度精确调节。该过程增大了晶粒和微晶尺寸,从而增强了放大自发发射,降低了激发阈值并改善了光稳定性。温度依赖的X射线衍射进一步揭示了正交-四方-立方相变,与单晶行为一致。这项研究为压力驱动的再结晶提供了基本见解,并建立了一种可重复、可扩展的方法,用于制备面向光电子器件的高质量CsPbBr3薄膜。
论文及项目相关链接
摘要
金属卤化物钙钛矿因其优异的光学和电子性能而成为光电子应用中有前途的材料。其中,CsPbBr3等全无机钙钛矿具有优越的热和化学稳定性。然而,由于前驱体溶解度低,通过溶液法制备高质量的CsPbBr3薄膜仍然具有挑战性,且目前的添加剂或溶剂工程策略通常复杂且重现性差。高压再结晶最近被证明是提高薄膜质量的有前途的途径,但其对薄膜性能的影响仍探索不足。本文系统地研究了通过高压再结晶制备的CsPbBr3薄膜的形貌、结构和光学性能,并与标准的非再结晶薄膜进行了比较。在300 bar下优化的再结晶产生了光滑、无针孔、单相的3D钙钛矿层,粗糙度达亚纳米级,而薄膜厚度可通过前驱体浓度精确调节。该过程增大了晶粒和微晶尺寸,使放大自发发射增强、激发阈值降低、光稳定性提高。温度依赖的X射线衍射进一步揭示了正交-四方-立方相变,与单晶行为一致。本研究为压力驱动再结晶提供了基础见解,并建立了一种可重复、可规模化的方法来制造用于光电子器件的高质量CsPbBr3薄膜。
关键见解
- 金属卤化物钙钛矿,如CsPbBr3,在光电子应用方面展现出巨大的潜力,但其溶液处理过程中面临前驱体溶解度低和制备高质量薄膜的挑战。
- 高压再结晶被证明是提高CsPbBr3薄膜质量的有效途径。
- 在300bar下优化的再结晶产生的薄膜表现出平滑、无针孔、单相的3D钙钛矿层特性,且薄膜厚度可通过前驱体浓度调节。
- 高压再结晶增大了晶粒和微晶尺寸,改善了薄膜的光学性能,表现为放大自发发射增强、激发阈值降低和光稳定性提升。
- 温度依赖的X射线衍射研究揭示了CsPbBr3薄膜的正交-四方-立方相变行为,与单晶性质相符。
- 本研究为压力驱动再结晶提供了深入理解。
点此查看论文截图
MicroAUNet: Boundary-Enhanced Multi-scale Fusion with Knowledge Distillation for Colonoscopy Polyp Image Segmentation
Authors:Ziyi Wang, Yuanmei Zhang, Dorna Esrafilzadeh, Ali R. Jalili, Suncheng Xiang
Early and accurate segmentation of colorectal polyps is critical for reducing colorectal cancer mortality, which has been extensively explored by academia and industry. However, current deep learning-based polyp segmentation models either compromise clinical decision-making by providing ambiguous polyp margins in segmentation outputs or rely on heavy architectures with high computational complexity, resulting in insufficient inference speeds for real-time colorectal endoscopic applications. To address this problem, we propose MicroAUNet, a light-weighted attention-based segmentation network that combines depthwise-separable dilated convolutions with a single-path, parameter-shared channel-spatial attention block to strengthen multi-scale boundary features. On the basis of it, a progressive two-stage knowledge-distillation scheme is introduced to transfer semantic and boundary cues from a high-capacity teacher. Extensive experiments on benchmarks also demonstrate the state-of-the-art accuracy under extremely low model complexity, indicating that MicroAUNet is suitable for real-time clinical polyp segmentation. The code is publicly available at https://github.com/JeremyXSC/MicroAUNet.
早期且准确的结直肠息肉分割对于降低结直肠癌死亡率至关重要,学术界和工业界对此进行了广泛探索。然而,当前基于深度学习的息肉分割模型要么在分割输出中给出模糊的息肉边界、影响临床决策,要么依赖计算复杂度高的大型架构,导致推理速度不足以支持实时结直肠内镜应用。为了解决这个问题,我们提出了MicroAUNet,一个轻量级的基于注意力的分割网络,它将深度可分离膨胀卷积与单路径、参数共享的通道-空间注意力块相结合,以强化多尺度边界特征。在此基础上,引入了一种渐进式两阶段知识蒸馏方案,从高容量教师模型中迁移语义和边界线索。在基准数据集上的大量实验也表明,该方法在极低的模型复杂度下达到了最先进的精度,表明MicroAUNet适用于实时临床息肉分割。代码公开于 https://github.com/JeremyXSC/MicroAUNet。
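下面给出摘要中两个构件的 PyTorch 草图:深度可分离膨胀卷积,以及单路径、参数共享的通道-空间注意力。这只是按文字描述做出的一种假设性实现,通道数、膨胀率等均为示例取值,具体结构请以官方仓库为准。

```python
import torch
import torch.nn as nn

class DWDilatedConv(nn.Module):
    """深度可分离膨胀卷积:先按通道做膨胀卷积,再用 1x1 卷积做逐点融合。"""
    def __init__(self, ch, dilation=2):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation, groups=ch)
        self.pointwise = nn.Conv2d(ch, ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class SharedChannelSpatialAttention(nn.Module):
    """单路径、参数共享的通道-空间注意力草图:同一个 1x1 卷积的输出同时用于产生通道权重与空间权重。"""
    def __init__(self, ch):
        super().__init__()
        self.shared = nn.Conv2d(ch, ch, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        f = self.shared(x)
        channel_w = torch.sigmoid(self.pool(f))                  # (B, C, 1, 1)
        spatial_w = torch.sigmoid(f.mean(dim=1, keepdim=True))   # (B, 1, H, W)
        return x * channel_w * spatial_w

x = torch.rand(2, 32, 64, 64)
block = nn.Sequential(DWDilatedConv(32), SharedChannelSpatialAttention(32))
print(block(x).shape)  # torch.Size([2, 32, 64, 64])
```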
论文及项目相关链接
PDF Work in progress
Summary
本文介绍了针对实时临床结肠息肉分割的问题,提出一种轻量级注意力基础的分割网络MicroAUNet。它通过结合深度可分离膨胀卷积和单路径参数共享通道空间注意力块,强化了多尺度边界特征。同时引入了一种渐进的两阶段知识蒸馏方案,从高性能的教师模型中转移语义和边界线索。在基准测试上的实验结果表明,MicroAUNet在模型复杂度极低的情况下达到了最先进的准确性,适合用于实时临床息肉分割。
Key Takeaways
- MicroAUNet解决了实时临床结直肠息肉分割中的关键问题,包括分割输出中息肉边界模糊和模型计算复杂度过高的问题。
- MicroAUNet结合了深度可分离膨胀卷积和单路径参数共享通道空间注意力块技术,强化了多尺度边界特征。
- 该网络通过渐进的两阶段知识蒸馏方案,从高性能的教师模型中转移语义和边界线索。
- MicroAUNet在基准测试上表现出卓越的性能,达到了最先进的准确性,并且模型复杂度极低。
- MicroAUNet适用于实时临床息肉分割,有助于降低结直肠癌死亡率。
- 该模型的代码已公开可用,便于其他研究者使用和改进。
点此查看论文截图
Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports
Authors:Yeawon Lee, Christopher C. Yang, Chia-Hsuan Chang, Grace Lu-Yao
Cancer staging is critical for patient prognosis and treatment planning, yet extracting pathologic TNM staging from unstructured pathology reports poses a persistent challenge. Existing natural language processing (NLP) and machine learning (ML) strategies often depend on large annotated datasets, limiting their scalability and adaptability. In this study, we introduce two Knowledge Elicitation methods designed to overcome these limitations by enabling large language models (LLMs) to induce and apply domain-specific rules for cancer staging. The first, Knowledge Elicitation with Long-Term Memory (KEwLTM), uses an iterative prompting strategy to derive staging rules directly from unannotated pathology reports, without requiring ground-truth labels. The second, Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG), employs a variation of RAG where rules are pre-extracted from relevant guidelines in a single step and then applied, enhancing interpretability and avoiding repeated retrieval overhead. We leverage the ability of LLMs to apply broad knowledge learned during pre-training to new tasks. Using breast cancer pathology reports from the TCGA dataset, we evaluate their performance in identifying T and N stages, comparing them against various baseline approaches on two open-source LLMs. Our results indicate that KEwLTM outperforms KEwRAG when Zero-Shot Chain-of-Thought (ZSCOT) inference is effective, whereas KEwRAG achieves better performance when ZSCOT inference is less effective. Both methods offer transparent, interpretable interfaces by making the induced rules explicit. These findings highlight the promise of our Knowledge Elicitation methods as scalable, high-performing solutions for automated cancer staging with enhanced interpretability, particularly in clinical settings with limited annotated data.
癌症分期对患者的预后和治疗计划至关重要,但从非结构化病理报告中提取病理TNM分期是一个长期存在的挑战。现有的自然语言处理(NLP)和机器学习(ML)策略通常依赖大量标注数据集,限制了它们的可扩展性和适应性。在这项研究中,我们提出了两种知识提取方法,旨在克服这些限制,使大型语言模型(LLM)能够归纳并应用癌症分期的领域特定规则。第一种是基于长期记忆的知识提取(KEwLTM),它使用迭代提示策略直接从未标注的病理报告中推导分期规则,无需真实标签。第二种是基于检索增强生成的知识提取(KEwRAG),它采用RAG的一种变体,一次性从相关指南中预先提取规则并加以应用,从而提高可解释性并避免重复检索的开销。我们利用LLM将预训练中学到的广泛知识应用于新任务的能力。基于TCGA数据集中的乳腺癌病理报告,我们评估了两种方法识别T分期和N分期的性能,并在两个开源LLM上与多种基线方法进行比较。结果表明,当零样本思维链(Zero-Shot Chain-of-Thought,ZSCOT)推理有效时,KEwLTM优于KEwRAG;而当ZSCOT推理效果较差时,KEwRAG表现更好。两种方法都通过显式给出归纳出的规则,提供了透明、可解释的接口。这些发现突显了我们的知识提取方法作为可扩展、高性能的自动癌症分期解决方案的潜力,尤其是在标注数据有限的临床环境中,并具有更强的可解释性。
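下面用一个流程草图说明 KEwLTM 式"迭代提示、跨报告累积规则、再应用规则"的基本形态;其中的 call_llm 为占位函数,提示词、轮数与输出格式均为本文假设,并非论文的实际提示模板。

```python
# 假设性草图:迭代提示诱导分期规则,并用规则进行推断(非论文官方实现)
def call_llm(prompt: str) -> str:
    """占位函数:实际使用时替换为所选开源 LLM 的推理调用。"""
    raise NotImplementedError

def elicit_staging_rules(reports, n_rounds=3):
    rules = ""  # "长期记忆":跨报告累积的分期规则文本
    for _ in range(n_rounds):
        for report in reports:
            prompt = (
                "已知的 T/N 分期规则:\n" + (rules or "(暂无)") + "\n\n"
                "请阅读以下病理报告,补充或修正上述规则,只输出更新后的规则列表:\n" + report
            )
            rules = call_llm(prompt)
    return rules

def apply_rules(rules, report):
    prompt = ("依据以下规则,为病理报告给出 T 分期与 N 分期:\n" + rules +
              "\n\n报告:\n" + report + "\n只输出如 'T2, N1' 的结果。")
    return call_llm(prompt)
```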
论文及项目相关链接
Summary
本文介绍了两种知识提取方法——基于长期记忆的知识提取(KEwLTM)和基于检索增强生成的知识提取(KEwRAG),用于从非结构化病理报告中提取癌症TNM分期信息。这两种方法克服了现有自然语言处理和机器学习策略的局限性,使大型语言模型能够归纳并应用癌症分期的特定规则。研究结果表明,在零样本思维链(ZSCOT)推理有效时,KEwLTM表现优于KEwRAG,而在ZSCOT推理不那么有效时,KEwRAG表现更好。两种方法都提供了透明、可解释的接口,使归纳出的规则变得明确。这为自动化癌症分期提供了可扩展、高性能的解决方案,特别是在缺乏标注数据的临床环境中。
Key Takeaways
- 癌症分期对于患者预后和治疗计划至关重要。
- 从非结构化病理报告中提取TNM分期信息是一个挑战。
- 知识提取方法如KEwLTM和KEwRAG能够克服现有NLP和ML策略的局限性。
- LLMs能够通过诱导和应用特定规则进行癌症分期。
- KEwLTM使用迭代提示策略直接从未注释的病理报告中推导分期规则,无需真实标签。
- KEwRAG采用从相关指南中提取规则的方法,增强了可解释性并避免了重复检索开销。
点此查看论文截图
OmniBrainBench: A Comprehensive Multimodal Benchmark for Brain Imaging Analysis Across Multi-stage Clinical Tasks
Authors:Zhihao Peng, Cheng Wang, Shengyuan Liu, Zhiying Liang, Yixuan Yuan
Brain imaging analysis is vital for diagnosing and treating brain disorders, and multimodal large language models (MLLMs) are increasingly assisting in that analysis. However, current brain-oriented visual question-answering (VQA) benchmarks either cover a few imaging modalities or are limited to coarse-grained pathological descriptions, hindering a comprehensive assessment of MLLMs throughout the full clinical continuum. To address these, we introduce OmniBrainBench, the first comprehensive multimodal VQA benchmark specifically designed to assess the multimodal comprehension capabilities of MLLMs in brain imaging analysis. OmniBrainBench consists of 15 distinct brain imaging modalities collected from 30 verified medical sources, yielding 9,527 validated VQA pairs and 31,706 images. It simulates clinical workflows and encompasses 15 multi-stage clinical tasks rigorously validated by a professional radiologist. Evaluation of 24 state-of-the-art models, including open-source, medical, and proprietary MLLMs, highlights the substantial challenges posed by OmniBrainBench. Our experiments reveal: (1) proprietary MLLMs (e.g., GPT-5) beat open-source and medical models but lag physicians; (2) medical MLLMs vary widely in performance; (3) open-source MLLMs trail overall but excel in specific tasks; (4) MLLMs underperform sharply in complex preoperative tasks, revealing a visual-to-clinical reasoning gap. OmniBrainBench sets a new standard for evaluating and advancing MLLMs in brain imaging analysis, highlighting gaps compared to expert clinical reasoning. We release it at benchmark & code.
脑成像分析在诊断与治疗脑部疾病中扮演着至关重要的角色,而多模态大型语言模型(MLLMs)正日益为这一分析提供辅助。然而,现有的面向大脑的视觉问答(VQA)基准测试要么涵盖少数成像模态,要么仅限于粗粒度的病理描述,阻碍了对MLLMs在整个临床过程中的全面评估。为了解决这个问题,我们引入了OmniBrainBench,这是专门为评估MLLMs在多模态脑成像分析中的理解能力而设计的首个全面多模态VQA基准测试。OmniBrainBench包含了从30个经过验证的医疗来源收集的15种不同的脑成像模态,产生了9527个经过验证的VQA对和31706张图像。它模拟了临床工作流程,包含了由专业放射科医生严格验证的15个多阶段临床任务。对24个最新模型的评价,包括开源、医疗和专有MLLMs,突显了OmniBrainBench带来的巨大挑战。我们的实验表明:(1)专有MLLMs(如GPT-5)击败了开源和医疗模型,但仍落后于医生;(2)医疗MLLMs的性能差异很大;(3)开源MLLMs总体上表现不佳,但在特定任务上表现优异;(4)MLLMs在复杂的术前任务中表现不佳,显示出视觉到临床推理的差距。OmniBrainBench为评估和推进脑成像分析中的MLLMs设定了新的标准,并突显了与专家临床推理相比的差距。我们已在基准测试和代码平台上发布。
论文及项目相关链接
Summary
针对多模态大脑影像分析的多模态大型语言模型(MLLMs)存在挑战,缺乏全面评估标准。OmniBrainBench应运而生,它是首个专门用于评估MLLM在多模态大脑影像分析中的理解能力的综合VQA基准测试。OmniBrainBench涵盖15种独特的脑部成像方式,模拟真实临床流程,并进行了严格的验证。对比评估多种最新模型后发现,尽管大型语言模型在某些任务中表现优异,但在复杂术前任务中表现欠佳,与专家临床推理相比存在差距。此基准测试为评估和改进大脑影像分析中的MLLMs提供了新的标准。
Key Takeaways
- 多模态大脑影像分析在诊断与治疗脑疾病中至关重要,而现有的视觉问答(VQA)基准测试在多模态大脑影像分析方面存在局限性。
- OmniBrainBench是首个专门用于评估多模态语言模型在多模态大脑影像分析中的综合VQA基准测试,涵盖了多种成像方式和临床任务。
- OmniBrainBench模拟真实临床流程,并经过专业放射科医生严格验证。
- 评估结果显示,专有大型语言模型(如GPT-5)胜过开源和医疗模型,但与医生相比仍有差距。
- 医疗大型语言模型性能差异较大,而开源大型语言模型总体表现平平,但在特定任务中表现突出。
- 大型语言模型在复杂术前任务中表现欠佳,存在视觉到临床推理的差距。
点此查看论文截图
Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing
Authors:Zhihui Chen, Mengling Feng
Recent advances in multimodal large language models have enabled remarkable medical image editing capabilities. However, the research community’s progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built specifically for medical image editing with strict anatomical and clinical constraints. We introduce Med-Banana-50K, a comprehensive 50K-image dataset for instruction-based medical image editing spanning three modalities (chest X-ray, brain MRI, fundus photography) and 23 disease types. Our dataset is constructed by leveraging Gemini-2.5-Flash-Image to generate bidirectional edits (lesion addition and removal) from real medical images. What distinguishes Med-Banana-50K from general-domain editing datasets is our systematic approach to medical quality control: we employ LLM-as-Judge with a medically grounded rubric (instruction compliance, structural plausibility, realism, and fidelity preservation) and history-aware iterative refinement up to five rounds. Beyond single-turn editing, Med-Banana-50K includes 37K failed attempts with full conversation logs for preference learning and alignment research. By providing this large-scale, medically validated, and fully documented resource, Med-Banana-50K establishes a foundation for training and evaluating the next generation of medical image editing models.Our dataset and code are publicly available at [https://github.com/richardChenzhihui/med-banana-50k].
近期多模态大型语言模型的进步为医学图像编辑提供了显著的能力。然而,研究领域的进展仍受到缺乏大规模、高质量、公开可访问的医学图像编辑数据集的制约,这些数据集需要严格的解剖和临床约束。我们介绍了Med-Banana-50K,这是一个基于指令的医学图像编辑的综合性5万张图像数据集,涵盖三种模态(胸部X射线、脑部MRI、眼底摄影)和23种疾病类型。我们的数据集通过利用Gemini-2.5-Flash-Image生成真实医学图像的双向编辑(病灶增加和移除)来构建。Med-Banana-50K与一般领域编辑数据集的区别在于我们的医疗质量控制系统方法:我们采用LLM-as-Judge,使用基于医学的评分标准(指令合规性、结构可行性、真实性和保真度保留),并进行最多五轮的历史感知迭代改进。除了单回合编辑,Med-Banana-50K还包括3.7万次失败尝试及完整对话记录,可用于偏好学习和对齐研究。通过提供大规模、经过医学验证和完整记录的这一资源,Med-Banana-50K为培训和评估下一代医学图像编辑模型奠定了基础。我们的数据集和代码可在[https://github.com/richardChenzhihui/med-banana-50k]公开获取。
论文及项目相关链接
Summary
本文介绍了一个名为Med-Banana-50K的医学图像编辑数据集,包含5万张图像,涉及三种模态和23种疾病类型。该数据集通过生成双向编辑(病变增加和移除)从真实医学图像中构建,采用LLM-as-Judge进行医学质量控制,并包含失败尝试和完整对话日志,为下一代医学图像编辑模型提供训练和评估基础。数据集和代码已公开。
Key Takeaways
- Med-Banana-50K是一个综合性的医学图像编辑数据集,包含50K张图像,覆盖三种模态和23种疾病类型。
- 数据集通过真实医学图像生成双向编辑构建。
- 采用LLM-as-Judge进行医学质量控制,确保数据质量。
- 包含了全面的失败尝试和对话日志,有助于偏好学习和对齐研究。
- Med-Banana-50K为训练和评估医学图像编辑模型提供了基础。
- 数据集和代码已公开发布,便于研究和利用。
点此查看论文截图
Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images
Authors:Alberto Di Biase
Doctors and researchers routinely use diffusion tensor imaging (DTI) and tractography to visualize the fibrous structure of tissues in the human body. This paper explores the connection of these techniques to the painterly rendering of images. Using a tractography algorithm the presented method can place brush strokes that mimic the painting process of human artists, analogously to how fibres are tracked in DTI. The analogue to the diffusion tensor for image orientation is the structural tensor, which can provide better local orientation information than the gradient alone. I demonstrate this technique in portraits and general images, and discuss the parallels between fibre tracking and brush stroke placement, and frame it in the language of tractography. This work presents an exploratory investigation into the cross-domain application of diffusion tensor imaging techniques to painterly rendering of images. All the code is available at https://github.com/tito21/st-python
医生和研究者通常使用扩散张量成像(DTI)和纤维追踪技术来可视化人体内的纤维结构。本文探讨了这些技术与图像绘画渲染之间的联系。通过使用纤维追踪算法,所提出的方法可以放置模仿人类艺术家绘画过程的笔触,这与DTI中如何追踪纤维类似。用于图像方向的扩散张量的类似物是结构张量,它可以提供比单纯梯度更好的局部方向信息。我在肖像和通用图像中展示了这项技术,并讨论了纤维追踪与笔触放置之间的相似之处,并用纤维追踪的语言来表述。这项工作是对扩散张量成像技术跨领域应用于图像绘画渲染的探索性研究。所有代码均可在https://github.com/tito21/st-python找到。
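下面给出结构张量估计局部方向的一个简短 numpy/scipy 草图,对应文中"结构张量比单独的梯度提供更好的局部方向信息"的说法;平滑尺度等参数为示例取值,具体实现请参考上面的官方代码仓库。

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def stroke_orientation(image, sigma=2.0):
    """由结构张量估计每个像素处的笔触方向(与局部梯度主方向正交,即沿"纤维"方向)。"""
    Ix = sobel(image, axis=1)
    Iy = sobel(image, axis=0)
    # 结构张量分量,经高斯平滑以聚合邻域信息
    Jxx = gaussian_filter(Ix * Ix, sigma)
    Jxy = gaussian_filter(Ix * Iy, sigma)
    Jyy = gaussian_filter(Iy * Iy, sigma)
    theta = 0.5 * np.arctan2(2 * Jxy, Jxx - Jyy)  # 灰度变化最大的主方向
    return theta + np.pi / 2                       # 旋转 90 度,得到沿等灰度线的笔触方向

img = np.random.rand(128, 128)   # 占位图像
angles = stroke_orientation(img)
```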
论文及项目相关链接
PDF Exploratory investigation applying medical imaging tractography techniques to painterly image rendering. Code available at https://github.com/tito21/st-python
摘要
医学图像领域中,医生和研究者经常使用扩散张量成像(DTI)和纤维跟踪技术来可视化人体中的纤维结构。本文探讨了这些技术与图像绘画渲染之间的联系。通过使用纤维跟踪算法,所提出的方法可以放置模仿人类艺术家绘画过程的笔触,类似于DTI中纤维的追踪方式。图像方向的扩散张量的类似物是结构张量,它可以提供比梯度更好的局部方向信息。作者在肖像和一般图像中展示了这项技术,并讨论了纤维跟踪与笔触放置之间的平行性,并用纤维跟踪的语言进行描述。这项工作是对扩散张量成像技术在图像绘画渲染中的跨域应用进行的探索性研究。所有代码均可在https://github.com/tito21/st-python上找到。
要点摘要
- 医生和研究者使用扩散张量成像(DTI)和纤维跟踪技术来可视化人体纤维结构。
- 本文探索了医学成像技术与图像绘画渲染之间的连接。
- 通过纤维跟踪算法,模拟人类艺术家的绘画过程,将纤维追踪与图像笔触放置进行类比。
- 结构张量作为扩散张量的类似物,能提供比单纯梯度更丰富的局部方向信息。
- 作者在肖像和一般图像中展示了融合医学成像技术与绘画技术的效果。
- 探讨了纤维跟踪与图像笔触之间的平行性,并运用纤维跟踪语言进行描述。
点此查看论文截图
Towards Reliable Pediatric Brain Tumor Segmentation: Task-Specific nnU-Net Enhancements
Authors:Xiaolong Li, Zhi-Qin John Xu, Yan Ren, Tianming Qiu, Xiaowen Wang
Accurate segmentation of pediatric brain tumors in multi-parametric magnetic resonance imaging (mpMRI) is critical for diagnosis, treatment planning, and monitoring, yet faces unique challenges due to limited data, high anatomical variability, and heterogeneous imaging across institutions. In this work, we present an advanced nnU-Net framework tailored for BraTS 2025 Task-6 (PED), the largest public dataset of pre-treatment pediatric high-grade gliomas. Our contributions include: (1) a widened residual encoder with squeeze-and-excitation (SE) attention; (2) 3D depthwise separable convolutions; (3) a specificity-driven regularization term; and (4) small-scale Gaussian weight initialization. We further refine predictions with two postprocessing steps. Our models achieved first place on the Task-6 validation leaderboard, attaining lesion-wise Dice scores of 0.759 (CC), 0.967 (ED), 0.826 (ET), 0.910 (NET), 0.928 (TC) and 0.928 (WT).
在多参数磁共振成像(mpMRI)中对儿童脑肿瘤进行精确分割,对于诊断、治疗规划和监测至关重要;然而,由于数据有限、解剖变异性高以及各机构间成像的异质性,这一任务面临独特的挑战。在这项工作中,我们针对BraTS 2025 Task-6(PED)——目前规模最大的治疗前儿童高级别胶质瘤公开数据集——提出了一个定制的先进nnU-Net框架。我们的贡献包括:(1)带有挤压-激励(SE)注意力的加宽残差编码器;(2)3D深度可分离卷积;(3)特异性驱动的正则化项;(4)小尺度高斯权重初始化。我们还通过两个后处理步骤进一步精修预测结果。我们的模型在Task-6验证排行榜上获得第一名,病灶级Dice分数分别为:0.759(CC)、0.967(ED)、0.826(ET)、0.910(NET)、0.928(TC)和0.928(WT)。
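作为参考,下面给出一个 3D 挤压-激励(SE)注意力块的 PyTorch 草图,对应贡献 (1) 中提到的 SE 注意力;通道数、压缩比等均为示例取值,并非论文的实际配置。

```python
import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    """3D Squeeze-and-Excitation 注意力块草图:全局平均池化 -> 两层瓶颈 MLP -> 通道重加权。"""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1, 1)
        return x * w

x = torch.rand(1, 32, 16, 32, 32)
print(SEBlock3D(32)(x).shape)  # torch.Size([1, 32, 16, 32, 32])
```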
论文及项目相关链接
Summary
基于多参数磁共振成像(mpMRI)对小儿脑肿瘤进行精确分割对于诊断、治疗规划和监测至关重要。本研究针对BraTS 2025 Task-6(PED)定制了先进的nnU-Net框架,其创新包括带挤压-激励(SE)注意力的加宽残差编码器、3D深度可分离卷积、特异性驱动的正则化项以及小尺度高斯权重初始化。在通过两个后处理步骤进一步优化预测结果后,模型在Task-6验证排行榜上获得第一名,各病灶Dice得分表现优异。
Key Takeaways
- 研究关注多参数磁共振成像(mpMRI)在小儿脑肿瘤分割中的应用,这是诊断、治疗规划和监测的关键环节。
- 研究采用先进的nnU-Net框架,针对BraTS 2025 Task-6(PED)进行定制化设计,应对有限数据、高解剖变异和机构间成像差异等挑战。
- 模型的贡献包括宽残差编码器与挤压激发(SE)注意力机制、3D深度可分离卷积等创新技术。
- 模型引入特异性驱动的正则化项和小尺度高斯权重初始化,进一步提高分割准确性。
- 通过两个后处理步骤进一步优化预测结果。
- 模型在Task-6验证排行榜上获得第一名,表明其优越性能。
点此查看论文截图
VisionCAD: An Integration-Free Radiology Copilot Framework
Authors:Jiaming Li, Junlei Wu, Sheng Wang, Honglin Xiong, Jiangdong Cai, Zihao Zhao, Yitao Zhu, Yuan Yin, Dinggang Shen, Qian Wang
Widespread clinical deployment of computer-aided diagnosis (CAD) systems is hindered by the challenge of integrating with existing hospital IT infrastructure. Here, we introduce VisionCAD, a vision-based radiological assistance framework that circumvents this barrier by capturing medical images directly from displays using a camera system. The framework operates through an automated pipeline that detects, restores, and analyzes on-screen medical images, transforming camera-captured visual data into diagnostic-quality images suitable for automated analysis and report generation. We validated VisionCAD across diverse medical imaging datasets, demonstrating that our modular architecture can flexibly utilize state-of-the-art diagnostic models for specific tasks. The system achieves diagnostic performance comparable to conventional CAD systems operating on original digital images, with an F1-score degradation typically less than 2% across classification tasks, while natural language generation metrics for automated reports remain within 1% of those derived from original images. By requiring only a camera device and standard computing resources, VisionCAD offers an accessible approach for AI-assisted diagnosis, enabling the deployment of diagnostic capabilities in diverse clinical settings without modifications to existing infrastructure.
计算机辅助诊断(CAD)系统的广泛临床应用受到了与现有医院IT基础设施集成挑战的限制。在这里,我们介绍了VisionCAD,这是一个基于视觉的放射辅助框架,它通过相机系统直接从显示屏捕获医疗图像,从而绕过了这一障碍。该框架通过一个自动化管道运行,该管道能够检测、恢复和分析屏幕上的医疗图像,将摄像头捕获的视觉数据转换为适合自动化分析和报告生成的诊断级图像。我们在不同的医学成像数据集上验证了VisionCAD,表明我们的模块化架构可以灵活地利用最先进的诊断模型来完成特定任务。系统的诊断性能与在原始数字图像上运行的常规CAD系统相当,分类任务的F1分数下降通常不到2%,而自动报告的自然语言生成指标与从原始图像中得出的指标相比仍保持在1%以内。VisionCAD仅需摄像头设备和标准计算资源,为人工智能辅助诊断提供了可访问的途径,能够在不修改现有基础设施的情况下,在不同的临床环境中部署诊断能力。
论文及项目相关链接
Summary
VisionCAD框架解决了CAD系统在临床中的部署问题。它通过相机系统直接捕获屏幕上的医学图像,将其转化为适合自动化分析和报告生成的诊断级图像。该框架具有模块化架构,可灵活利用最新诊断模型进行特定任务。其诊断性能与在原始数字图像上运行的常规CAD系统相当,分类任务的F1得分下降通常不到2%,自动生成报告的自然语言生成指标与原始图像之间的差异在1%以内。此外,它只需相机设备和标准计算资源,为在多样化临床环境中部署AI辅助诊断提供了可行的解决方案。无需对现有基础设施进行任何改动即可使用此技术,这使得VisionCAD得以广泛应用。
Key Takeaways
- VisionCAD是一个基于视觉的放射辅助诊断框架,通过相机系统直接捕获屏幕上的医学图像。
- 该框架通过自动化管道处理捕获的图像,将其转化为适合自动化分析和报告生成的诊断级图像。
- VisionCAD具有模块化架构,能够灵活利用最新的诊断模型进行特定任务的处理。
- VisionCAD的诊断性能与常规CAD系统相当,F1得分下降通常不到2%。
- 自动报告生成中的自然语言生成指标与原始图像差异在1%以内。
- VisionCAD对硬件要求低,只需相机设备和标准计算资源。
点此查看论文截图
Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD Generation
Authors:Wenhao Zheng, Chenwei Sun, Wenbo Zhang, Jiancheng Lv, Xianggen Liu
Deep generative models, such as diffusion models, have shown promising progress in image generation and audio generation via simplified continuity assumptions. However, the development of generative modeling techniques for generating multi-modal data, such as parametric CAD sequences, still lags behind due to the challenges in addressing long-range constraints and parameter sensitivity. In this work, we propose a novel framework for quantitatively constrained CAD generation, termed Target-Guided Bayesian Flow Network (TGBFN). For the first time, TGBFN handles the multi-modality of CAD sequences (i.e., discrete commands and continuous parameters) in a unified continuous and differentiable parameter space rather than in the discrete data space. In addition, TGBFN penetrates the parameter update kernel and introduces a guided Bayesian flow to control the CAD properties. To evaluate TGBFN, we construct a new dataset for quantitatively constrained CAD generation. Extensive comparisons across single-condition and multi-condition constrained generation tasks demonstrate that TGBFN achieves state-of-the-art performance in generating high-fidelity, condition-aware CAD sequences. The code is available at https://github.com/scu-zwh/TGBFN.
深度生成模型(如扩散模型)通过简化的连续性假设,在图像生成和音频生成方面取得了可喜的进展。然而,由于难以处理长程约束和参数敏感性,面向参数化CAD序列等多模态数据的生成建模技术发展仍然滞后。在这项工作中,我们提出了一个用于定量约束CAD生成的新型框架,称为目标引导贝叶斯流网络(TGBFN)。TGBFN首次在统一、连续且可微的参数空间(而非离散数据空间)中处理CAD序列的多模态性(即离散命令和连续参数)。此外,TGBFN深入参数更新内核,引入引导式贝叶斯流来控制CAD属性。为了评估TGBFN,我们构建了一个新的定量约束CAD生成数据集。在单条件和多条件约束生成任务上的大量比较表明,TGBFN在生成高保真、条件感知的CAD序列方面达到了最先进的性能。代码可在 https://github.com/scu-zwh/TGBFN 获取。
论文及项目相关链接
Summary
尽管扩散模型等深度生成模型在图像和音频生成方面取得了显著进展,但针对CAD序列等多模态数据的生成建模技术仍面临长程约束和参数敏感性的挑战。本研究提出一种名为目标引导贝叶斯流网络(TGBFN)的定量约束CAD生成新框架,该框架首次在统一、连续、可微分的参数空间内处理CAD序列的多模态性(即离散命令和连续参数),而非离散数据空间。此外,TGBFN深入参数更新内核并引入引导式贝叶斯流以控制CAD属性。通过构建新的定量约束CAD生成数据集对TGBFN进行评估,在单条件和多条件约束生成任务上的广泛对比表明,TGBFN在生成高保真、条件感知的CAD序列方面达到最新技术水平。
Key Takeaways
- 深度生成模型如扩散模型在图像和音频生成方面取得显著进展。
- 针对CAD序列等多模态数据生成,仍存在长程约束和参数敏感性的挑战。
- 提出一种新型框架TGBFN,用于定量约束的CAD生成。
- TGBFN首次在统一、连续、可微分的参数空间处理CAD序列的多模态性。
- TGBFN引入参数更新内核和导向贝叶斯流以控制CAD属性。
- 为评估TGBFN,构建了新的定量约束CAD生成数据集。
点此查看论文截图
Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
Authors:Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba
Breast cancer remains the most commonly diagnosed malignancy among women in the developed world. Early detection through mammography screening plays a pivotal role in reducing mortality rates. While computer-aided diagnosis (CAD) systems have shown promise in assisting radiologists, existing approaches face critical limitations in clinical deployment - particularly in handling the nuanced interpretation of multi-modal data and feasibility due to the requirement of prior clinical history. This study introduces a novel framework that synergistically combines visual features from 2D mammograms with structured textual descriptors derived from easily accessible clinical metadata and synthesized radiological reports through innovative tokenization modules. Our proposed methods in this study demonstrate that strategic integration of convolutional neural networks (ConvNets) with language representations achieves superior performance to vision transformer-based models while handling high-resolution images and enabling practical deployment across diverse populations. By evaluating it on multi-national cohort screening mammograms, our multi-modal approach achieves superior performance in cancer detection and calcification identification compared to unimodal baselines, with particular improvements. The proposed method establishes a new paradigm for developing clinically viable VLM-based CAD systems that effectively leverage imaging data and contextual patient information through effective fusion mechanisms.
乳腺癌仍是发达国家女性中最常被诊断出的恶性肿瘤。通过乳腺X光(钼靶)筛查进行早期检测在降低死亡率方面起着关键作用。尽管计算机辅助诊断(CAD)系统在辅助放射科医生方面已显示出潜力,但现有方法在临床部署上仍面临重大局限,尤其是在处理多模态数据的细致解读方面,以及因需要既往临床病史而带来的可行性问题。本研究提出了一种新框架,通过创新的令牌化模块,将二维乳腺钼靶图像的视觉特征与来自易获取的临床元数据及合成放射学报告的结构化文本描述符协同结合。我们提出的方法表明,将卷积神经网络(ConvNets)与语言表示进行策略性融合,在处理高分辨率图像的同时,性能优于基于视觉Transformer的模型,并能够在不同人群中实现实际部署。在多国队列筛查钼靶数据上的评估显示,我们的多模态方法在癌症检测和钙化识别方面优于单模态基线,并有显著提升。所提方法为开发临床可行的基于VLM的CAD系统建立了新范式,通过有效的融合机制充分利用影像数据和患者上下文信息。
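下面用一个极简的 PyTorch 草图示意"图像特征与文本描述符融合后分类"的一种常见做法(特征拼接 + MLP);特征维度与结构均为假设,仅说明融合机制的基本形态,并非论文的实际模型。

```python
import torch
import torch.nn as nn

class ImageTextFusionHead(nn.Module):
    """将 ConvNet 图像特征与文本描述符嵌入拼接后分类的简单融合头(示意)。"""
    def __init__(self, img_dim=1024, txt_dim=256, hidden=512, n_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        return self.mlp(torch.cat([img_feat, txt_feat], dim=1))

head = ImageTextFusionHead()
logits = head(torch.rand(4, 1024), torch.rand(4, 256))  # 4 个样本的占位特征
print(logits.shape)  # torch.Size([4, 2])
```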
论文及项目相关链接
PDF Accepted to Computer Vision for Automated Medical Diagnosis (CVAMD) Workshop at ICCV 2025
Summary
乳腺癌是发达国家女性中最常被诊断出的恶性肿瘤,早期通过乳腺X光(钼靶)筛查对降低死亡率至关重要。计算机辅助诊断(CAD)系统在辅助放射科医生方面显示出前景,但现有方法在临床部署中存在处理多模态数据困难和依赖既往临床病史等局限。本研究提出了一个新框架,将二维钼靶图像的视觉特征与从易获取的临床元数据和合成放射学报告中得到的结构化文本描述符协同结合。结果表明,将卷积神经网络(ConvNets)与语言表示进行策略性融合,在处理高分辨率图像的同时,性能优于基于视觉Transformer的模型,并能在不同人群中实际部署。在多国队列筛查钼靶数据上,该多模态方法在癌症检测和钙化识别方面优于单模态基线,且改进显著。该方法为开发临床可行的基于VLM的CAD系统建立了新范式,通过有效的融合机制充分利用影像数据和患者上下文信息。
Key Takeaways
- 乳腺癌在发达国家女性中仍然是最常见的恶性肿瘤。
- 早期通过乳腺X光摄影筛查对降低死亡率至关重要。
- 计算机辅助诊断(CAD)系统在临床部署中存在处理多模式数据和可行性方面的局限性。
- 本研究结合二维乳腺钼靶的视觉特征与从临床元数据和放射学报告中派生的结构化文本描述符。
- 集成卷积神经网络(ConvNets)与语言表示的方法在处理高分辨率图像时表现出优越性能。
- 该多模式方法相较于单模式基线在癌症检测和钙化识别方面具有卓越性能。
点此查看论文截图
When are radiology reports useful for training medical image classifiers?
Authors:Herman Bergström, Zhongqi Yue, Fredrik D. Johansson
Medical images used to train machine learning models are often accompanied by radiology reports containing rich expert annotations. However, relying on these reports as inputs for clinical prediction requires the timely manual work of a trained radiologist. This raises a natural question: when can radiology reports be leveraged during training to improve image-only classification? Prior works are limited to evaluating pre-trained image representations by fine-tuning them to predict diagnostic labels, often extracted from reports, ignoring tasks with labels that are weakly associated with the text. To address this gap, we conduct a systematic study of how radiology reports can be used during both pre-training and fine-tuning, across diagnostic and prognostic tasks (e.g., 12-month readmission), and under varying training set sizes. Our findings reveal that: (1) Leveraging reports during pre-training is beneficial for downstream classification tasks where the label is well-represented in the text; however, pre-training through explicit image-text alignment can be detrimental in settings where it’s not; (2) Fine-tuning with reports can lead to significant improvements and even have a larger impact than the pre-training method in certain settings. These results provide actionable insights into when and how to leverage privileged text data to train medical image classifiers while highlighting gaps in current research.
用于训练机器学习模型的医学图像通常伴随有包含丰富专家注释的放射学报告。然而,依赖这些报告作为临床预测的输入,需要训练有素的放射科医生及时进行手动工作。这就提出了一个自然的问题:何时可以在训练期间利用放射学报告来改善仅使用图像的分类?先前的工作仅限于通过微调预训练的图像表示来预测(通常从报告中提取的)诊断标签,从而评估这些表示,忽略了标签与文本关联较弱的任务。为了填补这一空白,我们系统地研究了放射学报告在预训练和微调两个阶段的使用方式,涵盖诊断和预后任务(例如12个月再入院),并考察不同大小的训练集。我们的研究发现:(1)当标签在文本中得到良好体现时,在预训练阶段利用报告对下游分类任务是有益的;然而,在并非如此的场景中,通过显式图像-文本对齐进行预训练可能是有害的;(2)使用报告进行微调可以带来显著改进,在某些设置中其影响甚至大于预训练方法。这些结果为何时以及如何利用特权文本数据训练医学图像分类器提供了可操作的见解,同时也突显了当前研究的空白。
论文及项目相关链接
Summary
医学图像训练机器学习模型时,常伴随有包含丰富专家注释的放射学报告。然而,依赖这些报告作为临床预测输入需要训练有素的放射科医生进行及时的手动工作。本文探讨了在训练过程中如何利用放射学报告改进仅基于图像的分类问题。研究发现,在预训练阶段利用报告对下游分类任务有益,但在标签在文本中代表性不足的情况下,通过明确的图像文本对齐进行预训练可能是有害的。此外,用报告进行微调可能导致显著改进,甚至在某些情况下,其影响大于预训练的方法。
Key Takeaways
- 医学图像训练机器学习模型时,放射学报告可作为重要资源。
- 报告中的信息可以在预训练阶段用于改进图像分类。
- 当标签在报告中的代表性足够时,利用报告进行预训练对下游任务有益。
- 在某些情况下,通过图像文本明确对齐进行预训练可能有风险。
- 使用报告进行微调可以带来显著改进。
- 在某些设置下,微调的影响可能大于预训练。
点此查看论文截图
MiCADangelo: Fine-Grained Reconstruction of Constrained CAD Models from 3D Scans
Authors:Ahmet Serdar Karadeniz, Dimitrios Mallis, Danila Rukhovich, Kseniya Cherenkova, Anis Kacem, Djamila Aouada
Computer-Aided Design (CAD) plays a foundational role in modern manufacturing and product development, often requiring designers to modify or build upon existing models. Converting 3D scans into parametric CAD representations–a process known as CAD reverse engineering–remains a significant challenge due to the high precision and structural complexity of CAD models. Existing deep learning-based approaches typically fall into two categories: bottom-up, geometry-driven methods, which often fail to produce fully parametric outputs, and top-down strategies, which tend to overlook fine-grained geometric details. Moreover, current methods neglect an essential aspect of CAD modeling: sketch-level constraints. In this work, we introduce a novel approach to CAD reverse engineering inspired by how human designers manually perform the task. Our method leverages multi-plane cross-sections to extract 2D patterns and capture fine parametric details more effectively. It enables the reconstruction of detailed and editable CAD models, outperforming state-of-the-art methods and, for the first time, incorporating sketch constraints directly into the reconstruction process.
计算机辅助设计(CAD)在现代制造和产品开发中发挥基础作用,通常需要设计师修改或基于现有模型进行创建。将3D扫描转化为参数化CAD表示的过程,即所谓的CAD逆向工程,仍然是一个重大挑战,因为CAD模型具有高精度和结构复杂性。现有的基于深度学习的方法主要分为两类:自下而上的几何驱动方法,往往无法产生完全参数化的输出;自上而下的策略,往往忽略细粒度几何细节。此外,当前的方法忽略了CAD建模的一个基本方面:草图级别的约束。在这项工作中,我们受到人类设计师手动执行任务的启发,介绍了一种新的CAD逆向工程方法。我们的方法利用多平面截面提取二维图案,更有效地捕捉精细参数细节。它能够实现详细的可编辑CAD模型的重建,优于最先进的方法,并且首次将草图约束直接纳入重建过程中。
论文及项目相关链接
PDF Accepted at NeurIPS 2025
Summary
该文介绍了计算机辅助设计(CAD)在现代制造业和产品开发中的重要性,以及将3D扫描转换为参数化CAD表示(即CAD逆向工程)的挑战。现有深度学习方法存在不足,无法完全满足需求。本文提出了一种受人类设计师手动操作启发的新方法,利用多平面截面提取2D模式,更有效地捕捉精细参数细节,并首次将草图约束直接纳入重建过程,能够重建详细且可编辑的CAD模型。
Key Takeaways
- 计算机辅助设计(CAD)在现代制造业和产品开发中扮演重要角色,3D扫描转参数化CAD表示(CAD逆向工程)是一大挑战。
- 现有深度学习方法分为自下而上的几何驱动方法和自上而下的策略,前者难以产生完全参数化输出,后者忽略细节。
- 当前方法忽视了CAD建模的重要方面:草图级约束。
- 本文提出了一种新的CAD逆向工程方法,受人类设计师操作启发,利用多平面截面提取2D模式。
- 该方法更有效地捕捉精细参数细节,并首次将草图约束纳入重建过程。
- 所提方法能够重建详细且可编辑的CAD模型。
点此查看论文截图
Progressive Growing of Patch Size: Curriculum Learning for Accelerated and Improved Medical Image Segmentation
Authors:Stefan M. Fischer, Johannes Kiechle, Laura Daza, Lina Felsner, Richard Osuala, Daniel M. Lang, Karim Lekadir, Jan C. Peeken, Julia A. Schnabel
In this work, we introduce Progressive Growing of Patch Size, an automatic curriculum learning approach for 3D medical image segmentation. Our approach progressively increases the patch size during model training, resulting in an improved class balance for smaller patch sizes and accelerated convergence of the training process. We evaluate our curriculum approach in two settings: a resource-efficient mode and a performance mode, both regarding Dice score performance and computational costs across 15 diverse and popular 3D medical image segmentation tasks. The resource-efficient mode matches the Dice score performance of the conventional constant patch size sampling baseline with a notable reduction in training time to only 44%. The performance mode improves upon constant patch size segmentation results, achieving a statistically significant relative mean performance gain of 1.28% in Dice Score. Remarkably, across all 15 tasks, our proposed performance mode manages to surpass the constant patch size baseline in Dice Score performance, while simultaneously reducing training time to only 89%. The benefits are particularly pronounced for highly imbalanced tasks such as lesion segmentation tasks. Rigorous experiments demonstrate that our performance mode not only improves mean segmentation performance but also reduces performance variance, yielding more trustworthy model comparison. Furthermore, our findings reveal that the proposed curriculum sampling is not tied to a specific architecture but represents a broadly applicable strategy that consistently boosts performance across diverse segmentation models, including UNet, UNETR, and SwinUNETR. In summary, we show that this simple yet elegant transformation on input data substantially improves both Dice Score performance and training runtime, while being compatible across diverse segmentation backbones.
在这项工作中,我们引入了渐进式增大补丁尺寸(Progressive Growing of Patch Size),一种用于三维医学图像分割的自动课程学习方法。该方法在模型训练过程中逐步增大补丁尺寸,从而在较小补丁尺寸下获得更好的类别平衡,并加速训练过程的收敛。我们在两种设置下评估了这一课程学习方法:资源高效模式和性能模式,并在15项多样且常用的三维医学图像分割任务上同时考察Dice分数表现与计算成本。资源高效模式在Dice分数上与传统的恒定补丁尺寸采样基线持平,同时将训练时间显著缩短至仅44%。性能模式则优于恒定补丁尺寸的分割结果,在Dice分数上取得了具有统计学意义的1.28%的相对平均性能提升。值得注意的是,在全部15项任务中,我们提出的性能模式在Dice分数上均超过恒定补丁尺寸基线,同时将训练时间缩短至仅89%。对于高度不平衡的任务(如病灶分割任务),收益尤为显著。严格的实验表明,性能模式不仅提升了平均分割性能,还降低了性能方差,使模型比较更加可信。此外,我们的研究结果表明,所提出的课程采样并不依赖于特定架构,而是一种广泛适用的策略,能在包括UNet、UNETR和SwinUNETR在内的多种分割模型上持续提升性能。总之,我们证明了这种对输入数据简单而优雅的变换能同时显著改善Dice分数表现和训练时间,并兼容多种分割主干网络。
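下面给出一个渐进式增大补丁尺寸的调度函数草图,示意"随训练进程从小补丁过渡到大补丁"可能的实现方式;起止尺寸、对齐步长与线性插值策略均为本文假设,并非官方实现。

```python
def patch_size_schedule(epoch, total_epochs,
                        min_size=(32, 32, 32), max_size=(128, 128, 128), step=16):
    """随训练进程线性增大采样补丁尺寸,并对齐到 step 的整数倍(假设性调度)。"""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    size = []
    for lo, hi in zip(min_size, max_size):
        s = lo + frac * (hi - lo)
        size.append(int(round(s / step) * step))
    return tuple(size)

# 用法示意:训练 1000 个 epoch 时,不同阶段采样的补丁尺寸
for e in [0, 250, 500, 750, 999]:
    print(e, patch_size_schedule(e, 1000))
```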
论文及项目相关链接
PDF Journal Extension of “Progressive Growing of Patch Size: Resource-Efficient Curriculum Learning for Dense Prediction Tasks” (MICCAI2024) submitted to MedIA
Summary
本文提出一种名为渐进式增长补丁大小的自动课程学习方法,用于3D医学图像分割。该方法在模型训练过程中逐步增加补丁大小,提高了较小补丁大小的类别平衡,并加速了训练过程的收敛。通过资源高效模式和性能模式两种设置验证了该方法的有效性。资源高效模式在保持与常规恒定补丁大小采样基准相同的Dice得分性能的同时,将训练时间缩短至44%。性能模式则实现了对恒定补丁大小分割结果的改进,相对平均性能提升1.28%。在所有15项任务中,性能模式在Dice得分性能上超越了恒定补丁大小基准线,并将训练时间缩短至89%。该方法对高度不平衡的任务,如病变分割任务,具有特别显著的优势。此外,该研究还发现,所提出的课程采样策略并非局限于特定的架构,而是一种广泛适用的策略,可在各种分割模型(包括UNet、UNETR和SwinUNETR)中不断提高性能。总体而言,该研究证明这种简单而优雅的输入数据转换可显著提高Dice得分性能和训练时间效率。
Key Takeaways
- 引入了一种名为渐进式增长补丁大小的自动课程学习方法,用于改善3D医学图像分割。
- 通过两种模式(资源高效模式和性能模式)验证了该方法的有效性。
- 资源高效模式在保持Dice得分性能的同时大幅缩短训练时间。
- 性能模式在多项任务中超越了常规恒定补丁大小分割方法的性能。
- 该方法对于高度不平衡的任务具有显著优势。
- 所提出的课程采样策略适用于多种分割模型,包括UNet、UNETR和SwinUNETR。
点此查看论文截图
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Authors:Qi Chen, Xinze Zhou, Chen Liu, Hao Chen, Wenxuan Li, Zekun Jiang, Ziyan Huang, Yuxuan Zhao, Dexin Yu, Junjun He, Yefeng Zheng, Ling Shao, Alan Yuille, Zongwei Zhou
AI for tumor segmentation is limited by the lack of large, voxel-wise annotated datasets, which are hard to create and require medical experts. In our proprietary JHH dataset of 3,000 annotated pancreatic tumor scans, we found that AI performance stopped improving after 1,500 scans. With synthetic data, we reached the same performance using only 500 real scans. This finding suggests that synthetic data can steepen data scaling laws, enabling more efficient model training than real data alone. Motivated by these lessons, we created AbdomenAtlas 2.0–a dataset of 10,135 CT scans with a total of 15,130 tumor instances per-voxel manually annotated in six organs (pancreas, liver, kidney, colon, esophagus, and uterus) and 5,893 control scans. Annotated by 23 expert radiologists, it is several orders of magnitude larger than existing public tumor datasets. While we continue expanding the dataset, the current version of AbdomenAtlas 2.0 already provides a strong foundation–based on lessons from the JHH dataset–for training AI to segment tumors in six organs. It achieves notable improvements over public datasets, with a +7% DSC gain on in-distribution tests and +16% on out-of-distribution tests.
用于肿瘤分割的人工智能受限于缺乏大规模、逐体素标注的数据集,此类数据集创建困难且需要医学专家参与。在我们包含3000例标注胰腺肿瘤扫描的自有JHH数据集中,我们发现AI性能在1500例扫描之后便不再提升。借助合成数据,我们仅用500例真实扫描就达到了同样的性能。这一发现表明,合成数据可以使数据缩放规律变得更陡峭,从而比仅用真实数据更高效地训练模型。基于这些经验,我们构建了AbdomenAtlas 2.0:一个包含10,135例CT扫描的数据集,其中在六个器官(胰腺、肝脏、肾脏、结肠、食管和子宫)中共有15,130个肿瘤实例经过逐体素人工标注,另有5,893例对照扫描。该数据集由23位放射科专家标注,规模比现有公开肿瘤数据集大几个数量级。虽然我们仍在继续扩充数据集,但基于JHH数据集的经验,AbdomenAtlas 2.0的当前版本已经为训练可分割六个器官肿瘤的AI提供了坚实基础。相较于公开数据集,它取得了显著改进:在分布内测试中DSC提升+7%,在分布外测试中提升+16%。
论文及项目相关链接
PDF ICCV 2025
Summary
肿瘤分割人工智能受限于大规模、逐体素标注数据集的缺乏,此类数据集创建困难且需要医学专家。在自有的JHH胰腺肿瘤扫描数据集(含3000例标注)中发现,AI性能在1500例扫描后不再提升;而借助合成数据,仅用500例真实扫描即可达到相同性能,说明合成数据可使数据缩放规律更陡峭,比单独使用真实数据更高效地训练模型。基于这些经验,作者构建了AbdomenAtlas 2.0数据集,包含10,135例CT扫描、六个器官(胰腺、肝脏、肾脏、结肠、食管和子宫)共15,130个逐体素标注的肿瘤实例以及5,893例对照扫描,由23位放射科专家标注,规模比现有公开肿瘤数据集大几个数量级。当前版本已为训练分割六个器官肿瘤的AI提供了坚实基础:相较于公开数据集,分布内测试DSC提升7%,分布外测试提升16%。
Key Takeaways
1. AI在肿瘤分割领域受限于缺乏大规模、逐体素标注的数据集。
2. 创建这种数据集既困难又需要医学专家参与。
3. 在自有JHH数据集中发现AI性能在达到一定扫描数量后不再提升。
4. 合成数据能够优化数据缩放法则,并可实现仅使用较少真实数据就达到高效的模型训练效果。
5. AbdomenAtlas 2.0数据集由多个器官的肿瘤实例组成,包括胰腺、肝脏等六个器官,且数据集规模远超现有公共肿瘤数据集。
6. AbdomenAtlas 2.0数据集的创建得益于从JHH数据集中汲取的经验教训。
点此查看论文截图
WeCKD: Weakly-supervised Chained Distillation Network for Efficient Multimodal Medical Imaging
Authors:Md. Abdur Rahman, Mohaimenul Azam Khan Raiaan, Sami Azam, Asif Karim, Jemima Beissbarth, Amanda Leach
Knowledge distillation (KD) has traditionally relied on a static teacher-student framework, where a large, well-trained teacher transfers knowledge to a single student model. However, these approaches often suffer from knowledge degradation, inefficient supervision, and reliance on either a very strong teacher model or large labeled datasets. To address these, we present the first-ever Weakly-supervised Chain-based KD network (WeCKD) that redefines knowledge transfer through a structured sequence of interconnected models. Unlike conventional KD, it forms a progressive distillation chain, where each model not only learns from its predecessor but also refines the knowledge before passing it forward. This structured knowledge transfer further enhances feature learning and addresses the limitations of one-step KD. Each model in the chain is trained on only a fraction of the dataset and shows that effective learning can be achieved with minimal supervision. Extensive evaluation on six imaging datasets across otoscopic, microscopic, and magnetic resonance imaging modalities shows that it generalizes and outperforms existing methods. Furthermore, the proposed distillation chain resulted in cumulative accuracy gains of up to +23% over a single backbone trained on the same limited data, which highlights its potential for real-world adoption.
知识蒸馏(KD)传统上依赖静态的教师-学生框架,由一个大型、训练良好的教师模型向单个学生模型传递知识。然而,这类方法往往存在知识退化、监督效率低下的问题,并依赖非常强大的教师模型或大量标注数据。为了解决这些问题,我们提出了首个弱监督链式知识蒸馏网络(WeCKD),它通过一系列相互连接的模型构成的结构化序列重新定义知识传递。与传统KD不同,它形成了一条渐进式蒸馏链:链中的每个模型不仅向前一个模型学习,还会在向后传递之前对知识进行精炼。这种结构化的知识传递进一步增强了特征学习,并克服了单步KD的局限。链中的每个模型只在数据集的一小部分上训练,表明在极少监督下也能实现有效学习。在涵盖耳镜、显微镜和磁共振成像模态的六个影像数据集上的广泛评估显示,该方法具有良好的泛化性并优于现有方法。此外,与在相同有限数据上训练的单一主干网络相比,所提出的蒸馏链带来了高达+23%的累计精度增益,凸显了其实际应用潜力。
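下面给出链式蒸馏中"单环"训练步骤的 PyTorch 草图:学生同时拟合教师的软标签与真实标签,训练完成后充当下一环的教师;温度、损失权重等超参数均为示例假设,并非论文设置。

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, loader, optimizer, T=4.0, alpha=0.7, device="cpu"):
    """链式蒸馏中的一环:student 同时拟合 teacher 的软标签与真实标签。"""
    teacher.eval()
    student.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = student(x)
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t_logits / T, dim=1),
                      reduction="batchmean") * (T * T)   # 软标签蒸馏损失
        ce = F.cross_entropy(s_logits, y)                 # 真实标签监督
        loss = alpha * kd + (1 - alpha) * ce
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return student

# 链式传递示意:每个模型只看数据集的一个子集,学完后成为下一环的教师
# teacher = models[0]
# for student, loader in zip(models[1:], subset_loaders[1:]):
#     opt = torch.optim.Adam(student.parameters(), lr=1e-4)
#     teacher = distill_step(teacher, student, loader, opt)
```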
论文及项目相关链接
Summary
本文介绍了一种新型的弱监督链式知识蒸馏网络(WeCKD),该网络通过一系列互联模型进行结构化知识转移,解决了传统知识蒸馏中的知识退化、监督效率低下以及对强教师模型或大量标记数据集的依赖问题。WeCKD形成了一种渐进式的蒸馏链,每个模型不仅从前一个模型学习,还精炼知识后传递给下一个模型。这种结构化的知识转移提高了特征学习能力,并克服了一步知识蒸馏的局限性。在六个成像数据集上的广泛评估表明,该方法在耳镜、显微镜和磁共振成像等多种模态下具有通用性并优于现有方法。此外,与在相同有限数据上训练的单一主干相比,蒸馏链的累积准确率提高了高达+23%,突显了其在实际应用中的潜力。
Key Takeaways
- 传统知识蒸馏依赖于静态的教师-学生框架,存在知识退化、监督效率低下和对强教师模型的依赖问题。
- WeCKD网络是一种新型的弱监督链式知识蒸馏方法,通过一系列互联模型进行结构化知识转移。
- WeCKD形成渐进式的蒸馏链,每个模型不仅学习前序模型的知识,还精炼后传递。
- 结构化的知识转移提高了特征学习能力,并解决了单步知识蒸馏的局限性。
- WeCKD在多种成像模态下具有通用性,并在六个数据集上的评估中优于现有方法。
- 与单一主干相比,蒸馏链的累积准确率显著提高。
- WeCKD在实际应用中具有可观的落地潜力,尤其是在标注数据有限的场景下。
点此查看论文截图