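⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use them in serious academic settings; they are intended only for initial screening before reading a paper!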
Updated 2025-10-04
Noisy Timing Behavior is a Feature of Central Compact Object Pulsars
Authors:K. I. Perez, E. V. Gotthelf, J. P. Halpern
We present a timing study of the three known central compact object (CCO) pulsars, isolated cooling neutron stars in supernova remnants, using Chandra, XMM-Newton and NICER observations spanning two decades. Relative to canonical young pulsars, CCOs are spinning down at a very slow rate $|\dot f| <10^{-15}$ s$^{-2}$, implying a surface dipole magnetic field strength $B_s < 10^{11}$ G that is too weak to account for their X-ray emitting hot spots. Two CCO pulsars with sufficiently long monitoring, 1E 1207.4$-$5209 and PSR J0821$-$4300, are seen to deviate from steady spin-down; their timing residuals can be modeled by one or more glitches in $f$ and $\dot f$, or alternatively by extreme timing noise. For the third CCO pulsar, PSR J1852+0400, the sparse temporal coverage was insufficient to detect such effects. Glitch activity and timing noise in large samples of rotation-powered pulsars correlate best with $\dot f$, while the timing irregularities of the first two CCOs are extreme compared to pulsars of the same $\dot f$. Nevertheless, timing activity in CCOs may arise from properties that they share with other young but more energetic pulsars: high internal temperature, strong buried magnetic field and superfluid behavior. Alternatively, continuing low-level accretion of supernova debris is not ruled out as a source of timing noise in CCOs.
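The step from a spin-down rate to a surface dipole field in the abstract above can be illustrated with the standard vacuum-dipole estimate, $B_s \approx 3.2\times10^{19}\sqrt{P\dot P}$ G, which assumes a canonical neutron-star radius and moment of inertia. The sketch below is not taken from the paper; the spin frequency and derivative are hypothetical values chosen only to show the order of magnitude.

```python
import math

def dipole_field_gauss(f_hz: float, fdot_hz_per_s: float) -> float:
    """Characteristic surface dipole field, B_s ~ 3.2e19 * sqrt(P * Pdot) gauss."""
    period = 1.0 / f_hz                      # P = 1/f, in seconds
    period_dot = -fdot_hz_per_s / f_hz ** 2  # Pdot = -fdot / f^2 (dimensionless)
    return 3.2e19 * math.sqrt(period * period_dot)

# Hypothetical spin parameters (assumptions, not measurements from the paper).
b_s = dipole_field_gauss(f_hz=9.0, fdot_hz_per_s=-1.0e-15)
print(f"B_s ~ {b_s:.1e} G")
```

For these assumed values the estimate falls below $10^{11}$ G, in line with the bound quoted in the abstract; slower rotators need a correspondingly smaller $|\dot f|$ to stay under it.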
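Paper and project links
PDF 14 pages, 6 figures, submitted to ApJ
Summary
This paper uses Chandra, XMM-Newton and NICER observations spanning two decades to study the timing of the three known central compact object (CCO) pulsars. Relative to canonical young pulsars, CCOs spin down very slowly, and the implied surface dipole magnetic field is too weak to account for their X-ray emitting hot spots. The two CCO pulsars with sufficiently long monitoring, 1E 1207.4-5209 and PSR J0821-4300, deviate from steady spin-down; their timing residuals can be modeled by one or more glitches in the spin frequency and its derivative, or alternatively by extreme timing noise. For the third, PSR J1852+0400, the sparse temporal coverage was insufficient to detect such effects. In large samples of rotation-powered pulsars, glitch activity and timing noise correlate best with the spin-down rate, and the irregularities of the first two CCOs are extreme compared with pulsars of the same spin-down rate. The timing activity may nevertheless arise from properties CCOs share with other young but more energetic pulsars: high internal temperature, strong buried magnetic field and superfluid behavior. Continuing low-level accretion of supernova debris is also not ruled out as a source of timing noise.
Key Takeaways
- CCO pulsars spin down very slowly, implying a weak surface dipole magnetic field.
- Two CCO pulsars (1E 1207.4-5209 and PSR J0821-4300) deviate from steady spin-down, which can be modeled by glitches or by extreme timing noise.
- For the third CCO pulsar, PSR J1852+0400, the sparse temporal coverage was insufficient to detect such effects.
- The timing activity of CCOs may arise from high internal temperature, a strong buried magnetic field and superfluid behavior.
- Continuing low-level accretion of supernova debris is not ruled out as a source of timing noise in CCOs.
- In large samples of rotation-powered pulsars, glitch activity and timing noise correlate best with the spin-down rate.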

SpurBreast: A Curated Dataset for Investigating Spurious Correlations in Real-world Breast MRI Classification
Authors:Jong Bum Won, Wesley De Neve, Joris Vankerschaver, Utku Ozbulak
Deep neural networks (DNNs) have demonstrated remarkable success in medical imaging, yet their real-world deployment remains challenging due to spurious correlations, where models can learn non-clinical features instead of meaningful medical patterns. Existing medical imaging datasets are not designed to systematically study this issue, largely due to restrictive licensing and limited supplementary patient data. To address this gap, we introduce SpurBreast, a curated breast MRI dataset that intentionally incorporates spurious correlations to evaluate their impact on model performance. Analyzing over 100 features involving patient, device, and imaging protocol, we identify two dominant spurious signals: magnetic field strength (a global feature influencing the entire image) and image orientation (a local feature affecting spatial alignment). Through controlled dataset splits, we demonstrate that DNNs can exploit these non-clinical signals, achieving high validation accuracy while failing to generalize to unbiased test data. Alongside these two datasets containing spurious correlations, we also provide benchmark datasets without spurious correlations, allowing researchers to systematically investigate clinically relevant and irrelevant features, uncertainty estimation, adversarial robustness, and generalization strategies. Models and datasets are available at https://github.com/utkuozbulak/spurbreast.
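The controlled-split idea in the abstract above can be sketched with a toy simulation: a "shortcut" predictor that looks only at a spurious attribute scores well on a split where that attribute tracks the label, and near chance on an unbiased split. The labels, the binary field-strength flag and the correlation levels below are hypothetical and are not the SpurBreast splits.

```python
import random

def make_split(n, p_spurious, rng):
    """Return (label, high_field) pairs; the flag matches the label with probability p_spurious."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)                      # hypothetical binary class label
        aligned = rng.random() < p_spurious
        high_field = label if aligned else 1 - label   # hypothetical binary field-strength flag
        samples.append((label, high_field))
    return samples

def shortcut_accuracy(samples):
    """Accuracy of a predictor that uses the field-strength flag alone."""
    return sum(label == flag for label, flag in samples) / len(samples)

rng = random.Random(0)
biased_val = make_split(2000, p_spurious=0.95, rng=rng)     # spurious signal present
unbiased_test = make_split(2000, p_spurious=0.50, rng=rng)  # spurious signal removed
val_acc = shortcut_accuracy(biased_val)
test_acc = shortcut_accuracy(unbiased_test)
print(f"validation accuracy: {val_acc:.2f}, unbiased test accuracy: {test_acc:.2f}")
```

A real network is of course not forced to use the shortcut; the sketch only shows why high validation accuracy on a biased split says little about generalization.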
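Paper and project links
PDF Accepted for publication in the 28th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2025
Summary
This paper introduces SpurBreast, a curated breast MRI dataset designed for studying spurious correlations in deep neural networks for medical imaging. The dataset deliberately incorporates two dominant non-clinical signals, magnetic field strength and image orientation. Controlled dataset splits show that DNNs can exploit these signals, reaching high validation accuracy while failing to generalize to unbiased test data. Benchmark datasets without spurious correlations are provided alongside, so that researchers can systematically study clinically relevant and irrelevant features. Models and datasets are publicly available.
Key Takeaways
- SpurBreast is a curated breast MRI dataset built to evaluate the impact of spurious correlations on deep neural networks in medical imaging.
- The dataset deliberately incorporates two dominant non-clinical signals: magnetic field strength and image orientation.
- Controlled splits show that DNNs can rely on these non-clinical features, which harms generalization to unbiased test data.
- Benchmark datasets without spurious correlations are also provided for studying clinically relevant features and related topics.
- The dataset supports research on uncertainty estimation, adversarial robustness and generalization strategies.
- Models and datasets are publicly available.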

VGDM: Vision-Guided Diffusion Model for Brain Tumor Detection and Segmentation
Authors:Arman Behnam
Accurate detection and segmentation of brain tumors from magnetic resonance imaging (MRI) are essential for diagnosis, treatment planning, and clinical monitoring. While convolutional architectures such as U-Net have long been the backbone of medical image segmentation, their limited capacity to capture long-range dependencies constrains performance on complex tumor structures. Recent advances in diffusion models have demonstrated strong potential for generating high-fidelity medical images and refining segmentation boundaries. In this work, we propose VGDM: Vision-Guided Diffusion Model for Brain Tumor Detection and Segmentation framework, a transformer-driven diffusion framework for brain tumor detection and segmentation. By embedding a vision transformer at the core of the diffusion process, the model leverages global contextual reasoning together with iterative denoising to enhance both volumetric accuracy and boundary precision. The transformer backbone enables more effective modeling of spatial relationships across entire MRI volumes, while diffusion refinement mitigates voxel-level errors and recovers fine-grained tumor details. This hybrid design provides a pathway toward improved robustness and scalability in neuro-oncology, moving beyond conventional U-Net baselines. Experimental validation on MRI brain tumor datasets demonstrates consistent gains in Dice similarity and Hausdorff distance, underscoring the potential of transformer-guided diffusion models to advance the state of the art in tumor segmentation.
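The two metrics named in the abstract above, Dice similarity and Hausdorff distance, can be sketched on small voxel sets as follows. This is a generic illustration rather than the paper's evaluation code: masks are represented as sets of integer coordinates, and the Hausdorff distance is computed over all mask points instead of extracted surfaces, which is a simplification.

```python
import math

def dice(a: set, b: set) -> float:
    """Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def hausdorff(a: set, b: set) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))

# Toy 2D masks (hypothetical): the prediction is shifted by one pixel.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
dice_score = dice(pred, truth)
hd = hausdorff(pred, truth)
print(dice_score, hd)
```

Higher Dice and lower Hausdorff distance are better, so "gains" in the two metrics point in opposite numerical directions.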
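Paper and project links
Summary
Accurate brain tumor detection and segmentation from MRI is essential, and convolutional architectures commonly used for medical image segmentation are limited on complex tumor structures. This work proposes VGDM, a vision-guided diffusion framework for brain tumor detection and segmentation that places a vision transformer at the core of the diffusion process, combining global contextual reasoning with iterative denoising to improve volumetric accuracy and boundary precision. The transformer backbone models spatial relationships across entire MRI volumes, while diffusion refinement mitigates voxel-level errors and recovers fine-grained tumor details. Experiments on MRI brain tumor datasets report consistent gains in Dice similarity and Hausdorff distance.
Key Takeaways
- Limits of convolutional architectures: although architectures such as U-Net have long been the backbone of medical image segmentation, their limited capacity to capture long-range dependencies constrains performance on complex tumor structures.
- The VGDM model: the paper proposes a vision-guided diffusion framework that combines a diffusion model with a vision transformer for brain tumor detection and segmentation.
- Role of the vision transformer: embedding a vision transformer lets the model capture spatial relationships across entire MRI volumes more effectively.
- Role of diffusion refinement: iterative denoising mitigates voxel-level errors and recovers fine-grained tumor details.
- Experimental results: validation on MRI brain tumor datasets shows consistent gains in Dice similarity and Hausdorff distance.
- Outlook: the hybrid design is presented as a pathway toward improved robustness and scalability in neuro-oncology, beyond conventional U-Net baselines.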

Pulsed-laser induced gold microparticle fragmentation by thermal strain
Authors:Yogesh Pokhrel, Meike Tack, Sven Reichenberger, Matteo Levantino, Anton Plech
Laser fragmentation of suspended microparticles is an upcoming alternative to laser ablation in liquid (LAL) that allows streamlining the delivery process and optimizing the irradiation conditions for best efficiency. Yet, the structural basis of this process is not well understood to date. Herein we employed ultrafast x-ray scattering upon picosecond laser excitation of a gold microparticle suspension in order to understand the thermal kinetics as well as structure evolution after fragmentation. The experiments are complemented by simulations according to the two-temperature model to verify the spatiotemporal temperature distribution. It is found that above a fluence threshold of 750 J/m$^2$ the microparticles are fragmented within a nanosecond into several large pieces where the driving force is the strain due to a strongly inhomogeneous heat distribution on the one hand and stress confinement due to the ultrafast heating compared to stress propagation on the other hand. The additional limited formation of small clusters is attributed to photothermal decomposition on the front side of the microparticles at the fluence of 2700 J/m$^2$.
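The stress-confinement argument in the abstract above compares the heating time with the time a stress wave needs to cross the particle. A minimal order-of-magnitude sketch follows; the particle diameter and heating time are hypothetical, and the sound speed is an approximate textbook value for bulk gold, none of which are taken from the paper.

```python
SOUND_SPEED_GOLD = 3240.0  # m/s, approximate longitudinal sound speed in bulk gold

def acoustic_transit_time(diameter_m: float, sound_speed: float = SOUND_SPEED_GOLD) -> float:
    """Time for a stress wave to cross a particle of the given diameter."""
    return diameter_m / sound_speed

diameter = 1.0e-6      # hypothetical 1 micrometre particle
heating_time = 10e-12  # hypothetical 10 ps heating time after a picosecond pulse

transit = acoustic_transit_time(diameter)
stress_confined = heating_time < transit  # heating finishes before the stress can relax
print(f"transit time ~ {transit * 1e9:.2f} ns, stress confined: {stress_confined}")
```

Under these assumptions the transit time is a fraction of a nanosecond, far longer than the heating time, which is the regime in which thermal stress builds up faster than it can propagate away.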
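Paper and project links
Summary
Laser fragmentation of suspended microparticles is an emerging alternative to laser ablation in liquid (LAL) that can streamline delivery and optimize irradiation conditions for efficiency, but its structural basis is not well understood. This study applies ultrafast X-ray scattering to a gold microparticle suspension excited by picosecond laser pulses to follow the thermal kinetics and the structure evolution after fragmentation, complemented by two-temperature-model simulations that verify the spatiotemporal temperature distribution. Above a fluence threshold of 750 J/m² the microparticles fragment within a nanosecond into several large pieces, driven by strain from a strongly inhomogeneous heat distribution and by stress confinement, since heating is ultrafast compared with stress propagation. At a fluence of 2700 J/m², photothermal decomposition on the front side of the microparticles additionally forms a limited number of small clusters.
Key Takeaways
- Laser fragmentation of suspended microparticles is an emerging alternative to laser ablation in liquid.
- The study uses ultrafast X-ray scattering to probe the thermal kinetics and structure evolution of a laser-excited gold microparticle suspension.
- Above a fluence threshold of 750 J/m², the microparticles fragment within a nanosecond into several large pieces.
- Fragmentation is driven by strain from a strongly inhomogeneous heat distribution and by stress confinement due to heating that is ultrafast compared with stress propagation.
- Two-temperature-model simulations are used to verify the spatiotemporal temperature distribution.
- At 2700 J/m², photothermal decomposition on the front side of the microparticles forms a limited number of small clusters.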

Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Authors:Giusy Spacone, Sebastian Frey, Mattia Orlandi, Pierangelo Maria Rapa, Victor Kartsch, Simone Benatti, Luca Benini, Andrea Cossettini
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and a R$^2$ score of $0.61\pm0.1$, with $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
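The two figures of merit reported in the abstract above, root mean squared error and the R$^2$ score, can be computed as in the generic sketch below. The joint-angle values are made up for illustration and have no connection to the paper's data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical joint angles in degrees (ground truth from a glove vs. a model estimate).
angles_true = [0.0, 10.0, 20.0, 30.0, 40.0]
angles_pred = [2.0, 8.0, 23.0, 27.0, 40.0]
err = rmse(angles_true, angles_pred)
r2 = r2_score(angles_true, angles_pred)
print(f"RMSE = {err:.2f} deg, R^2 = {r2:.3f}")
```

RMSE is expressed in the units of the target (degrees here), while R$^2$ is unitless and compares the residual error with the variance of the ground truth, so the two can rank methods differently.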
Paper and project links
Summary
Biosignal-based hand gesture recognition has strong potential for intuitive human-machine interaction, and sensor fusion can overcome the limits of individual sensing modalities. The fusion of surface electromyography (EMG) and A-mode ultrasound (US) is promising, but prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. This work presents an ultra-low-power (sub-50 mW) system that concurrently acquires 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. It proposes a framework for continuous tracking of 23 degrees of freedom (20 for the hand, 3 for the wrist), with a kinematic glove providing ground-truth labels and lightweight encoder-decoder architectures with multi-task learning estimating hand and wrist joint angles simultaneously. Under realistic sensor repositioning, EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared with $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and an R$^2$ score of $0.61\pm0.1$, compared with $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
Key Takeaways
- Biosignal-based hand gesture recognition has strong potential for intuitive human-machine interaction that mimics natural human behavior.
- Sensor fusion combines complementary information and overcomes the limitations of individual sensing modalities.
- The proposed system concurrently acquires 8-channel EMG and 4-channel A-mode US at ultra-low power (sub-50 mW) in fully wearable, dry-contact armbands.
- The framework continuously tracks 20 degrees of freedom of the hand and 3 of the wrist.
- Under realistic sensor repositioning, fusion lowers the root mean squared error relative to either single modality.
- Fusion also yields the highest R$^2$ score ($0.61\pm0.1$, versus $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US).

Flow-Matching Guided Deep Unfolding for Hyperspectral Image Reconstruction
Authors:Yi Ai, Yuanhao Cai, Yulun Zhang, Xiaokang Yang
Hyperspectral imaging (HSI) provides rich spatial-spectral information but remains costly to acquire due to hardware limitations and the difficulty of reconstructing three-dimensional data from compressed measurements. Although compressive sensing systems such as CASSI improve efficiency, accurate reconstruction is still challenged by severe degradation and loss of fine spectral details. We propose the Flow-Matching-guided Unfolding network (FMU), which, to our knowledge, is the first to integrate flow matching into HSI reconstruction by embedding its generative prior within a deep unfolding framework. To further strengthen the learned dynamics, we introduce a mean velocity loss that enforces global consistency of the flow, leading to a more robust and accurate reconstruction. This hybrid design leverages the interpretability of optimization-based methods and the generative capacity of flow matching. Extensive experiments on both simulated and real datasets show that FMU significantly outperforms existing approaches in reconstruction quality. Code and models will be available at https://github.com/YiAi03/FMU.
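For readers unfamiliar with flow matching, the sketch below shows the commonly used linear-path formulation: a point on the path between a source sample and a data sample, the constant target velocity along that path, and a squared-error loss on a predicted velocity. This is a generic illustration only. The abstract does not define the paper's mean velocity loss or how the prior is embedded in the unfolding stages, so neither is reproduced here, and all function names are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (source sample) to x1 (data sample) at time t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path, constant in t: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predicted_velocity, x0, x1):
    """Mean squared error between a predicted velocity and the straight-path target."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_velocity, target)) / len(target)

# Toy two-dimensional example with hypothetical values.
x0, x1 = [0.0, 0.0], [1.0, 2.0]
x_half = interpolate(x0, x1, 0.5)
perfect_loss = flow_matching_loss([1.0, 2.0], x0, x1)
print(x_half, perfect_loss)
```

In practice the predicted velocity comes from a network evaluated at the interpolated point and time, and the loss is averaged over random samples and times.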
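Paper and project links
Summary
This paper introduces the Flow-Matching-guided Unfolding network (FMU) for hyperspectral image (HSI) reconstruction. FMU embeds the generative prior of flow matching within a deep unfolding framework to improve reconstruction accuracy in compressive imaging systems. A mean velocity loss strengthens the learned dynamics by enforcing global consistency of the flow, yielding more robust and accurate reconstructions. The design combines the interpretability of optimization-based methods with the generative capacity of flow matching, and experiments on simulated and real datasets show better reconstruction quality than existing approaches.
Key Takeaways
- Hyperspectral imaging (HSI) provides rich spatial-spectral information but remains costly to acquire because of hardware limitations and the difficulty of reconstructing three-dimensional data from compressed measurements.
- Compressive sensing systems such as CASSI improve efficiency, but accurate reconstruction is still challenged by severe degradation and loss of fine spectral details.
- To the authors' knowledge, FMU is the first network to integrate flow matching into HSI reconstruction, embedding its generative prior within a deep unfolding framework.
- A mean velocity loss enforces global consistency of the flow, leading to more robust and accurate reconstruction.
- FMU combines the interpretability of optimization-based methods with the generative capacity of flow matching.
- Experiments on simulated and real datasets show that FMU significantly outperforms existing approaches in reconstruction quality.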

MedQ-Bench: Evaluating and Exploring Medical Image Quality Assessment Abilities in MLLMs
Authors:Jiyao Liu, Jinjie Wei, Wanying Qu, Chenglong Ma, Junzhi Ning, Yunheng Li, Ying Chen, Xinzhe Luo, Pengcheng Chen, Xin Gao, Ming Hu, Huihui Xu, Xin Wang, Shujian Gao, Dingkang Yang, Zhongying Deng, Jin Ye, Lihao Liu, Junjun He, Ningsheng Xu
Medical Image Quality Assessment (IQA) serves as the first-mile safety gate for clinical AI, yet existing approaches remain constrained by scalar, score-based metrics and fail to reflect the descriptive, human-like reasoning process central to expert evaluation. To address this gap, we introduce MedQ-Bench, a comprehensive benchmark that establishes a perception-reasoning paradigm for language-based evaluation of medical image quality with Multi-modal Large Language Models (MLLMs). MedQ-Bench defines two complementary tasks: (1) MedQ-Perception, which probes low-level perceptual capability via human-curated questions on fundamental visual attributes; and (2) MedQ-Reasoning, encompassing both no-reference and comparison reasoning tasks, aligning model evaluation with human-like reasoning on image quality. The benchmark spans five imaging modalities and over forty quality attributes, totaling 2,600 perceptual queries and 708 reasoning assessments, covering diverse image sources including authentic clinical acquisitions, images with simulated degradations via physics-based reconstructions, and AI-generated images. To evaluate reasoning ability, we propose a multi-dimensional judging protocol that assesses model outputs along four complementary axes. We further conduct rigorous human-AI alignment validation by comparing LLM-based judgement with radiologists. Our evaluation of 14 state-of-the-art MLLMs demonstrates that models exhibit preliminary but unstable perceptual and reasoning skills, with insufficient accuracy for reliable clinical use. These findings highlight the need for targeted optimization of MLLMs in medical IQA. We hope that MedQ-Bench will catalyze further exploration and unlock the untapped potential of MLLMs for medical image quality evaluation.
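Paper and project links
PDF 26 pages, 13 figures
Summary
Medical image quality assessment (IQA) is the first-mile safety gate for clinical AI, yet existing approaches are constrained by scalar, score-based metrics and do not reflect the descriptive, human-like reasoning of expert evaluation. MedQ-Bench establishes a perception-reasoning paradigm for language-based evaluation of medical image quality with multi-modal large language models (MLLMs). It defines two complementary tasks: MedQ-Perception, which probes low-level perceptual capability through human-curated questions on fundamental visual attributes, and MedQ-Reasoning, which covers no-reference and comparison reasoning tasks. The benchmark spans five imaging modalities and over forty quality attributes, with 2,600 perceptual queries and 708 reasoning assessments drawn from authentic clinical acquisitions, images with simulated degradations via physics-based reconstructions, and AI-generated images. Reasoning is scored with a multi-dimensional judging protocol along four complementary axes, and LLM-based judgement is validated against radiologists. An evaluation of 14 state-of-the-art MLLMs shows preliminary but unstable perceptual and reasoning skills, with accuracy insufficient for reliable clinical use, highlighting the need for targeted optimization of MLLMs in medical IQA.
Key Takeaways
- MedQ-Bench is a comprehensive benchmark for language-based evaluation of medical image quality with MLLMs, addressing the gap between score-based metrics and expert reasoning.
- It combines perception and reasoning tasks to reflect a human-like evaluation process.
- The benchmark spans five imaging modalities and over forty quality attributes, with 2,600 perceptual queries and 708 reasoning assessments.
- Evaluating 14 state-of-the-art MLLMs reveals preliminary but unstable perceptual and reasoning skills, insufficient for reliable clinical use.
- LLM-based judgements are validated against radiologists in a human-AI alignment study.
- The findings highlight the need for targeted optimization of MLLMs for medical IQA.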

Touching the tumor boundary: A pilot study on ultrasound based virtual fixtures for breast-conserving surgery
Authors:Laura Connolly, Tamas Ungi, Adnan Munawar, Anton Deguet, Chris Yeung, Russell H. Taylor, Parvin Mousavi, Gabor Fichtinger, Keyvan Hashtrudi-Zaad
Purpose: Delineating tumor boundaries during breast-conserving surgery is challenging as tumors are often highly mobile, non-palpable, and have irregularly shaped borders. To address these challenges, we introduce a cooperative robotic guidance system that applies haptic feedback for tumor localization. In this pilot study, we aim to assess if and how this system can be successfully integrated into breast cancer care. Methods: A small haptic robot is retrofitted with an electrocautery blade to operate as a cooperatively controlled surgical tool. Ultrasound and electromagnetic navigation are used to identify the tumor boundaries and position. A forbidden region virtual fixture is imposed when the surgical tool collides with the tumor boundary. We conducted a study where users were asked to resect tumors from breast simulants both with and without the haptic guidance. We then assess the results of these simulated resections both qualitatively and quantitatively. Results: Virtual fixture guidance is shown to improve resection margins. On average, users find the task to be less mentally demanding, frustrating, and effort intensive when haptic feedback is available. We also discovered some unanticipated impacts on surgical workflow that will guide design adjustments and training protocol moving forward. Conclusion: Our results suggest that virtual fixtures can help localize tumor boundaries in simulated breast-conserving surgery. Future work will include an extensive user study to further validate these results and fine-tune our guidance system.
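A forbidden-region virtual fixture of the kind described in the abstract above can be sketched as a spring-like force that pushes the tool back out once it crosses a boundary. The sketch assumes a spherical tumor and a simple linear stiffness; both are simplifications introduced here, since the paper derives the boundary from ultrasound and electromagnetic navigation and does not state its control law in the abstract.

```python
import math

def fixture_force(tip, center, radius, stiffness):
    """Force pushing the tool tip out of a spherical forbidden region.

    Returns a zero vector while the tip is outside the sphere; inside, the force
    grows linearly with penetration depth along the outward normal.
    """
    offset = [t - c for t, c in zip(tip, center)]
    distance = math.sqrt(sum(o * o for o in offset))
    penetration = radius - distance
    if penetration <= 0 or distance == 0:
        # Outside the region, or exactly at the center where the normal is undefined.
        return [0.0, 0.0, 0.0]
    normal = [o / distance for o in offset]
    return [stiffness * penetration * n for n in normal]

# Hypothetical geometry in millimetres and stiffness in N/mm.
tumor_center, tumor_radius, k = (0.0, 0.0, 0.0), 10.0, 0.5
force_outside = fixture_force((0.0, 0.0, 20.0), tumor_center, tumor_radius, k)
force_inside = fixture_force((0.0, 0.0, 8.0), tumor_center, tumor_radius, k)
print(force_outside, force_inside)
```

In a cooperative setup the user still moves the tool freely elsewhere; the fixture only resists motion into the protected region.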
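Paper and project links
Summary
This pilot study introduces a cooperative robotic guidance system that applies haptic feedback for tumor localization, addressing the difficulty of delineating tumor boundaries during breast-conserving surgery. The system is evaluated through simulated resections on breast simulants, with and without haptic guidance. Virtual fixture guidance improves resection margins, and users on average find the task less mentally demanding, frustrating and effort intensive. Some unanticipated impacts on surgical workflow were also observed and will guide design adjustments and training protocols. Future work will include an extensive user study to validate the results and fine-tune the guidance system.
Key Takeaways
- The study addresses the difficulty of delineating tumor boundaries during breast-conserving surgery.
- A small haptic robot is retrofitted with an electrocautery blade to act as a cooperatively controlled surgical tool.
- Ultrasound and electromagnetic navigation are used to identify the tumor boundaries and position.
- A forbidden-region virtual fixture is imposed when the surgical tool collides with the tumor boundary.
- In simulated resections, virtual fixture guidance improves resection margins.
- With haptic feedback, users on average find the task less mentally demanding, frustrating and effort intensive.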

PhraseStereo: The First Open-Vocabulary Stereo Image Segmentation Dataset
Authors:Thomas Campagnolo, Ezio Malis, Philippe Martinet, Gaetan Bahl
Understanding how natural language phrases correspond to specific regions in images is a key challenge in multimodal semantic segmentation. Recent advances in phrase grounding are largely limited to single-view images, neglecting the rich geometric cues available in stereo vision. For this, we introduce PhraseStereo, the first novel dataset that brings phrase-region segmentation to stereo image pairs. PhraseStereo builds upon the PhraseCut dataset by leveraging GenStereo to generate accurate right-view images from existing single-view data, enabling the extension of phrase grounding into the stereo domain. This new setting introduces unique challenges and opportunities for multimodal learning, particularly in leveraging depth cues for more precise and context-aware grounding. By providing stereo image pairs with aligned segmentation masks and phrase annotations, PhraseStereo lays the foundation for future research at the intersection of language, vision, and 3D perception, encouraging the development of models that can reason jointly over semantics and geometry. The PhraseStereo dataset will be released online upon acceptance of this work.
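Paper and project links
PDF Accepted to X-Sense Ego-Exo Sensing for Smart Mobility Workshop at ICCV 2025 Conference
Summary
This paper introduces PhraseStereo, described as the first dataset to bring phrase-region segmentation to stereo image pairs. It builds on the PhraseCut dataset and uses GenStereo to generate right-view images from existing single-view data, extending phrase grounding into the stereo domain. By providing stereo pairs with aligned segmentation masks and phrase annotations, PhraseStereo lays a foundation for research at the intersection of language, vision and 3D perception and encourages models that reason jointly over semantics and geometry. The dataset will be released online upon acceptance of the work.
Key Takeaways
- PhraseStereo is presented as the first dataset to bring phrase-region segmentation to stereo image pairs.
- It uses GenStereo to generate right-view images from single-view PhraseCut data, extending phrase grounding into the stereo domain.
- The stereo setting introduces unique challenges and opportunities for multimodal learning.
- The dataset emphasizes depth cues for more precise and context-aware grounding.
- It provides stereo image pairs with aligned segmentation masks and phrase annotations, supporting models that reason jointly over semantics and geometry.
- The dataset will be released online upon acceptance of the work.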

ProtoMask: Segmentation-Guided Prototype Learning
Authors:Steffen Meinert, Philipp Schlinge, Nils Strodthoff, Martin Atzmueller
XAI gained considerable importance in recent years. Methods based on prototypical case-based reasoning have shown a promising improvement in explainability. However, these methods typically rely on additional post-hoc saliency techniques to explain the semantics of learned prototypes. Multiple critiques have been raised about the reliability and quality of such techniques. For this reason, we study the use of prominent image segmentation foundation models to improve the truthfulness of the mapping between embedding and input space. We aim to restrict the computation area of the saliency map to a predefined semantic image patch to reduce the uncertainty of such visualizations. To perceive the information of an entire image, we use the bounding box from each generated segmentation mask to crop the image. Each mask results in an individual input in our novel model architecture named ProtoMask. We conduct experiments on three popular fine-grained classification datasets with a wide set of metrics, providing a detailed overview on explainability characteristics. The comparison with other popular models demonstrates competitive performance and unique explainability features of our model. https://github.com/uos-sis/quanproto
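The cropping step described in the abstract above, using the bounding box of each segmentation mask to cut out an image patch, can be sketched as follows. The sketch works on nested Python lists to stay dependency-free and is not the authors' implementation; the toy image and mask are hypothetical.

```python
def bounding_box(mask):
    """Return (top, bottom, left, right) of the nonzero region, with exclusive upper bounds."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # empty mask: nothing to crop
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def crop(image, box):
    """Cut the rectangular patch described by box out of a 2D image."""
    top, bottom, left, right = box
    return [row[left:right] for row in image[top:bottom]]

# Toy 4x4 "image" whose pixel value encodes its position, and one hypothetical mask.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = bounding_box(mask)
patch = crop(image, box)
print(box, patch)
```

Each mask yields one patch, so an image with several masks turns into several separate inputs, matching the abstract's description of one input per mask.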
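Paper and project links
Summary
Methods based on prototypical case-based reasoning have shown promising improvements in explainability, but they typically rely on additional post-hoc saliency techniques to explain the semantics of learned prototypes, and the reliability of those techniques has been criticized. This work studies the use of prominent image segmentation foundation models to improve the truthfulness of the mapping between embedding and input space. The computation area of the saliency map is restricted to a predefined semantic image patch to reduce the uncertainty of such visualizations. To cover the whole image, the bounding box of each generated segmentation mask is used to crop the image, and each mask becomes an individual input to the ProtoMask architecture. Experiments on three fine-grained classification datasets show competitive performance and unique explainability features. See https://github.com/uos-sis/quanproto.
Key Takeaways
- XAI has gained importance in recent years, and prototype-based case reasoning has improved model explainability.
- Existing methods rely on additional post-hoc saliency techniques to explain prototype semantics, whose reliability and quality have been questioned.
- The study uses prominent image segmentation foundation models to improve the truthfulness of the mapping between embedding and input space.
- Restricting the saliency map to a predefined semantic image patch reduces the uncertainty of the visualizations.
- The bounding box of each generated segmentation mask is used to crop the image, so that the information of the entire image is covered.
- Experiments on three fine-grained classification datasets show competitive performance and unique explainability features.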

Beyond one-hot encoding? Journey into compact encoding for large multi-class segmentation
Authors:Aaron Kujawa, Thomas Booth, Tom Vercauteren
This work presents novel methods to reduce computational and memory requirements for medical image segmentation with a large number of classes. We curiously observe challenges in maintaining state-of-the-art segmentation performance with all of the explored options. Standard learning-based methods typically employ one-hot encoding of class labels. The computational complexity and memory requirements thus increase linearly with the number of classes. We propose a family of binary encoding approaches instead of one-hot encoding to reduce the computational complexity and memory requirements to logarithmic in the number of classes. In addition to vanilla binary encoding, we investigate the effects of error-correcting output codes (ECOCs), class weighting, hard/soft decoding, class-to-codeword assignment, and label embedding trees. We apply the methods to the use case of whole brain parcellation with 108 classes based on 3D MRI images. While binary encodings have proven efficient in so-called extreme classification problems in computer vision, we faced challenges in reaching state-of-the-art segmentation quality with binary encodings. Compared to one-hot encoding (Dice Similarity Coefficient (DSC) = 82.4 (2.8)), we report reduced segmentation performance with the binary segmentation approaches, achieving DSCs in the range from 39.3 to 73.8. Informative negative results all too often go unpublished. We hope that this work inspires future research of compact encoding strategies for large multi-class segmentation tasks.
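The scaling argument in the abstract above (output channels growing linearly with the number of classes for one-hot encoding, but logarithmically for binary encoding) can be made concrete with a minimal sketch of vanilla binary encoding and hard decoding. The function names are hypothetical, and the paper's ECOC, weighting and embedding-tree variants are not covered.

```python
def num_bits(num_classes: int) -> int:
    """Number of binary output channels needed to index num_classes labels."""
    return max(1, (num_classes - 1).bit_length())

def encode(class_index: int, bits: int):
    """Binary codeword for a class index, most significant bit first."""
    return [(class_index >> k) & 1 for k in reversed(range(bits))]

def hard_decode(probabilities):
    """Threshold each channel at 0.5 and read the bits back as an integer."""
    value = 0
    for p in probabilities:
        value = value * 2 + (1 if p >= 0.5 else 0)
    return value

bits = num_bits(108)            # 108 classes, as in the whole brain parcellation use case
codeword = encode(107, bits)    # codeword of the last class index
decoded = hard_decode([0.9, 0.8, 0.2, 0.7, 0.1, 0.6, 0.95])  # hypothetical soft outputs
print(bits, codeword, decoded)
```

With 7 bits there are 128 codewords for 108 classes, so hard decoding can return an index that corresponds to no class; how such cases are handled is one of the design choices a compact encoding has to make.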
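Paper and project links
PDF Presented at EMA4MICCAI 2025 Workshop
Summary
This paper proposes methods to reduce the computational and memory requirements of medical image segmentation with a large number of classes. It replaces conventional one-hot encoding with binary encoding strategies and applies them to whole brain parcellation with 108 classes. Although binary encodings have proven efficient in extreme classification problems, reaching state-of-the-art segmentation quality with them remained challenging: one-hot encoding achieves a Dice Similarity Coefficient of 82.4 (2.8), while the binary approaches range from 39.3 to 73.8.
Key Takeaways
- Segmentation with a large number of classes faces computational complexity and memory requirements that grow linearly with the number of classes under one-hot encoding.
- Binary encoding is proposed as a replacement, reducing these requirements to logarithmic in the number of classes.
- The paper compares binary encodings with one-hot encoding for medical image segmentation.
- Binary encodings are efficient in extreme classification problems, but reaching state-of-the-art segmentation quality with them is challenging.
- Several variants are investigated, including error-correcting output codes (ECOCs), class weighting, hard/soft decoding, class-to-codeword assignment and label embedding trees.
- The methods are applied to whole brain parcellation with 108 classes on 3D MRI images, and the results are reported as informative negative results.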

Multi-Domain Brain Vessel Segmentation Through Feature Disentanglement
Authors:Francesco Galati, Daniele Falcetta, Rosa Cortese, Ferran Prados, Ninon Burgos, Maria A. Zuluaga
The intricate morphology of brain vessels poses significant challenges for automatic segmentation models, which usually focus on a single imaging modality. However, accurately treating brain-related conditions requires a comprehensive understanding of the cerebrovascular tree, regardless of the specific acquisition procedure. Our framework effectively segments brain arteries and veins in various datasets through image-to-image translation while avoiding domain-specific model design and data harmonization between the source and the target domain. This is accomplished by employing disentanglement techniques to independently manipulate different image properties, allowing them to move from one domain to another in a label-preserving manner. Specifically, we focus on manipulating vessel appearances during adaptation while preserving spatial information, such as shapes and locations, which are crucial for correct segmentation. Our evaluation effectively bridges large and varied domain gaps across medical centers, image modalities, and vessel types. Additionally, we conduct ablation studies on the optimal number of required annotations and other architectural choices. The results highlight our framework’s robustness and versatility, demonstrating the potential of domain adaptation methodologies to perform cerebrovascular image segmentation in multiple scenarios accurately. Our code is available at https://github.com/i-vesseg/MultiVesSeg.
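Paper and project links
PDF 19 pages, 7 figures, 3 tables. Joint first authors: Francesco Galati and Daniele Falcetta. Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2025:021. Code available at https://github.com/i-vesseg/MultiVesSeg
Summary
This paper presents a framework that segments brain arteries and veins across various datasets through image-to-image translation, without domain-specific model design or data harmonization between the source and target domains. Disentanglement techniques manipulate different image properties independently, so that images can move from one domain to another in a label-preserving manner. During adaptation the framework manipulates vessel appearance while preserving spatial information such as shapes and locations, which is crucial for correct segmentation. The evaluation bridges large and varied domain gaps across medical centers, image modalities and vessel types.
Key Takeaways
- The framework segments brain arteries and veins across different datasets.
- Segmentation is achieved through image-to-image translation, without domain-specific model design or data harmonization.
- Disentanglement techniques manipulate image properties independently, enabling label-preserving transfer between domains.
- Adaptation focuses on vessel appearance while preserving key spatial information such as shapes and locations.
- The evaluation bridges domain gaps across medical centers, image modalities and vessel types.
- Ablation studies cover the number of required annotations and other architectural choices, and the results highlight the framework's robustness and versatility.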

U-DFA: A Unified DINOv2-Unet with Dual Fusion Attention for Multi-Dataset Medical Segmentation
Authors:Zulkaif Sajjad, Furqan Shaukat, Junaid Mir
Accurate medical image segmentation plays a crucial role in overall diagnosis and is one of the most essential tasks in the diagnostic pipeline. CNN-based models, despite their extensive use, suffer from a local receptive field and fail to capture the global context. A common approach that combines CNNs with transformers attempts to bridge this gap but fails to effectively fuse the local and global features. With the recent emergence of VLMs and foundation models, they have been adapted for downstream medical imaging tasks; however, they suffer from an inherent domain gap and high computational cost. To this end, we propose U-DFA, a unified DINOv2-Unet encoder-decoder architecture that integrates a novel Local-Global Fusion Adapter (LGFA) to enhance segmentation performance. LGFA modules inject spatial features from a CNN-based Spatial Pattern Adapter (SPA) module into frozen DINOv2 blocks at multiple stages, enabling effective fusion of high-level semantic and spatial features. Our method achieves state-of-the-art performance on the Synapse and ACDC datasets with only 33% of the trainable model parameters. These results demonstrate that U-DFA is a robust and scalable framework for medical image segmentation across multiple modalities.
Paper and project links
Summary
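Accurate medical image segmentation is crucial to diagnosis and is one of the most essential tasks in the diagnostic pipeline. CNN-based models are widely used but have a local receptive field and fail to capture global context. Common hybrids of CNNs and transformers try to bridge this gap but do not fuse local and global features effectively. VLMs and foundation models have been adapted to medical imaging tasks, yet they suffer from a domain gap and high computational cost. The authors propose U-DFA, a unified DINOv2-Unet encoder-decoder architecture that integrates a novel Local-Global Fusion Adapter (LGFA) to improve segmentation performance. LGFA modules inject spatial features from a CNN-based Spatial Pattern Adapter (SPA) module into frozen DINOv2 blocks at multiple stages, fusing high-level semantic and spatial features. The method achieves state-of-the-art performance on the Synapse and ACDC datasets with only 33% of the trainable model parameters, which the authors take as evidence that U-DFA is a robust and scalable framework for medical image segmentation across multiple modalities.
Key Takeaways
- Medical image segmentation is crucial to diagnosis and is one of the essential tasks in the diagnostic pipeline.
- CNN-based models have a local receptive field and fail to capture global context.
- Approaches that combine CNNs with transformers do not fuse local and global features effectively.
- VLMs and foundation models adapted to medical imaging suffer from a domain gap and high computational cost.
- U-DFA is a DINOv2-Unet encoder-decoder architecture that integrates LGFA modules to improve segmentation performance.
- LGFA modules fuse high-level semantic features with spatial features.
- U-DFA achieves state-of-the-art performance on the Synapse and ACDC datasets.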

Automated Structured Radiology Report Generation with Rich Clinical Context
Authors:Seongjae Kang, Dong Bok Lee, Juho Jung, Dongseop Kim, Won Hwa Kim, Sunghoon Joo
Automated structured radiology report generation (SRRG) from chest X-ray images offers significant potential to reduce workload of radiologists by generating reports in structured formats that ensure clarity, consistency, and adherence to clinical reporting standards. While radiologists effectively utilize available clinical contexts in their diagnostic reasoning, existing SRRG systems overlook these essential elements. This fundamental gap leads to critical problems including temporal hallucinations when referencing non-existent clinical contexts. To address these limitations, we propose contextualized SRRG (C-SRRG) that comprehensively incorporates rich clinical context for SRRG. We curate C-SRRG dataset by integrating comprehensive clinical context encompassing 1) multi-view X-ray images, 2) clinical indication, 3) imaging techniques, and 4) prior studies with corresponding comparisons based on patient histories. Through extensive benchmarking with state-of-the-art multimodal large language models, we demonstrate that incorporating clinical context with the proposed C-SRRG significantly improves report generation quality. We publicly release dataset, code, and checkpoints to facilitate future research for clinically-aligned automated RRG at https://github.com/vuno/contextualized-srrg.
Paper and project links
PDF 34 pages, 30 figures, preprint
Summary
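Automated structured radiology report generation (SRRG) from chest X-ray images can reduce radiologists' workload by producing reports in structured formats that ensure clarity, consistency, and adherence to clinical reporting standards. Existing SRRG systems overlook the clinical context that radiologists rely on, which leads to problems such as temporal hallucinations, where a report refers to clinical context that does not exist. To address these limitations, the authors propose contextualized SRRG (C-SRRG), which incorporates rich clinical context. They curate the C-SRRG dataset by integrating multi-view X-ray images, the clinical indication, imaging techniques, and prior studies with corresponding comparisons based on patient histories. Extensive benchmarking with state-of-the-art multimodal large language models shows that incorporating clinical context with C-SRRG significantly improves report generation quality. The dataset, code, and checkpoints are publicly released to support further research on clinically aligned automated RRG.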
Key Takeaways
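- Automated SRRG from chest X-ray images can reduce radiologists' workload while improving the clarity, consistency, and standards compliance of reports.
- Existing SRRG systems overlook clinical context, which limits report quality and can cause temporal hallucinations.
- Contextualized SRRG (C-SRRG) addresses this by comprehensively incorporating rich clinical context.
- The C-SRRG dataset integrates multi-view X-ray images, the clinical indication, imaging techniques, and prior studies with comparisons based on patient histories.
- Benchmarking with state-of-the-art multimodal large language models shows that incorporating clinical context significantly improves report generation quality.
- The dataset, code, and checkpoints are publicly released to support further research.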
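The four kinds of clinical context listed above can be pictured as a single record per study. The following is a minimal sketch only, not the C-SRRG data format or the authors' code: the class names `StudyContext` and `PriorStudy`, the field names, and the `build_prompt` helper are all hypothetical and chosen for illustration. It shows one way to state explicitly when no prior study exists, which is the situation in which temporal hallucinations arise.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PriorStudy:
    """A previous examination of the same patient (hypothetical structure)."""
    study_date: str
    report_text: str


@dataclass
class StudyContext:
    """Clinical context for one chest X-ray study (hypothetical structure)."""
    image_paths: List[str]                 # 1) multi-view X-ray images
    indication: Optional[str] = None       # 2) clinical indication
    technique: Optional[str] = None        # 3) imaging technique
    prior_studies: List[PriorStudy] = field(default_factory=list)  # 4) priors


def build_prompt(ctx: StudyContext) -> str:
    """Render the context as text, stating explicitly what is missing."""
    lines = [f"Views: {len(ctx.image_paths)} image(s)"]
    lines.append(f"Indication: {ctx.indication or 'not provided'}")
    lines.append(f"Technique: {ctx.technique or 'not provided'}")
    if ctx.prior_studies:
        for prior in ctx.prior_studies:
            lines.append(f"Prior study ({prior.study_date}): {prior.report_text}")
    else:
        # Without a prior study, comparisons such as "unchanged" have no basis.
        lines.append("Prior studies: none available; do not describe interval change.")
    return "\n".join(lines)


first_visit = StudyContext(image_paths=["frontal.png", "lateral.png"],
                           indication="cough for two weeks")
print(build_prompt(first_visit))
```

Making the absence of prior studies explicit in the input is a design choice of this sketch; the paper's abstract does not say how missing context is encoded.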

Improving Virtual Contrast Enhancement using Longitudinal Data
Authors:Pierre Fayolle, Alexandre Bône, Noëlie Debs, Philippe Robert, Pascal Bourdon, Remy Guillevin, David Helbert
Gadolinium-based contrast agents (GBCAs) are widely used in magnetic resonance imaging (MRI) to enhance lesion detection and characterisation, particularly in the field of neuro-oncology. Nevertheless, concerns regarding gadolinium retention and accumulation in brain and body tissues, most notably for diseases that require close monitoring and frequent GBCA injection, have led to the need for strategies to reduce dosage. In this study, a deep learning framework is proposed for the virtual contrast enhancement of full-dose post-contrast T1-weighted MRI images from corresponding low-dose acquisitions. The contribution of the presented model is its utilisation of longitudinal information, which is achieved by incorporating a prior full-dose MRI examination from the same patient. A comparative evaluation against a non-longitudinal single session model demonstrated that the longitudinal approach significantly improves image quality across multiple reconstruction metrics. Furthermore, experiments with varying simulated contrast doses confirmed the robustness of the proposed method. These results emphasize the potential of integrating prior imaging history into deep learning-based virtual contrast enhancement pipelines to reduce GBCA usage without compromising diagnostic utility, thus paving the way for safer, more sustainable longitudinal monitoring in clinical MRI practice.
Paper and project links
PDF 11 pages, 4 figures, Workshop MICCAI 2025 - Learning with Longitudinal Medical Images and Data
Summary
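The paper presents a deep learning approach that uses a patient's longitudinal information to perform virtual contrast enhancement of low-dose MRI images, with the goal of reducing the use of gadolinium-based contrast agents. Experiments show that the method significantly improves image quality and remains robust across different simulated contrast doses, paving the way for safer, more sustainable longitudinal monitoring in clinical MRI practice.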
Key Takeaways
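- Gadolinium-based contrast agents (GBCAs) are widely used in MRI, but concerns about gadolinium retention and accumulation in brain and body tissues have created a need to reduce dosage.
- The study proposes a deep learning framework that produces full-dose post-contrast T1-weighted MRI images from the corresponding low-dose acquisitions through virtual contrast enhancement.
- The model exploits longitudinal information by incorporating a prior full-dose MRI examination from the same patient.
- Compared with a non-longitudinal single-session model, the longitudinal approach significantly improves image quality across multiple reconstruction metrics.
- Experiments with varying simulated contrast doses confirm the robustness of the method.
- Integrating prior imaging history into deep learning-based virtual contrast enhancement pipelines could reduce GBCA usage without compromising diagnostic utility.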

Domain-Specialized Interactive Segmentation Framework for Meningioma Radiotherapy Planning
Authors:Junhyeok Lee, Han Jang, Kyu Sung Choi
Precise delineation of meningiomas is crucial for effective radiotherapy (RT) planning, directly influencing treatment efficacy and preservation of adjacent healthy tissues. While automated deep learning approaches have demonstrated considerable potential, achieving consistently accurate clinical segmentation remains challenging due to tumor heterogeneity. Interactive Medical Image Segmentation (IMIS) addresses this challenge by integrating advanced AI techniques with clinical input. However, generic segmentation tools, despite widespread applicability, often lack the specificity required for clinically critical and disease-specific tasks like meningioma RT planning. To overcome these limitations, we introduce Interactive-MEN-RT, a dedicated IMIS tool specifically developed for clinician-assisted 3D meningioma segmentation in RT workflows. The system incorporates multiple clinically relevant interaction methods, including point annotations, bounding boxes, lasso tools, and scribbles, enhancing usability and clinical precision. In our evaluation involving 500 contrast-enhanced T1-weighted MRI scans from the BraTS 2025 Meningioma RT Segmentation Challenge, Interactive-MEN-RT demonstrated substantial improvement compared to other segmentation methods, achieving Dice similarity coefficients of up to 77.6% and Intersection over Union scores of 64.8%. These results emphasize the need for clinically tailored segmentation solutions in critical applications such as meningioma RT planning. The code is publicly available at: https://github.com/snuh-rad-aicon/Interactive-MEN-RT
Paper and project links
PDF Clinical Image-Based Procedures (CLIP 2025), MICCAI 2025 Workshop
Summary
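The paper introduces Interactive-MEN-RT, an interactive medical image segmentation (IMIS) tool developed specifically for clinician-assisted 3D meningioma segmentation in radiotherapy (RT) planning. By integrating advanced AI techniques with clinical input, it addresses the lack of disease-specific precision in generic segmentation tools. In an evaluation on 500 contrast-enhanced T1-weighted MRI scans from the BraTS 2025 Meningioma RT Segmentation Challenge, Interactive-MEN-RT showed substantial improvement over other segmentation methods, achieving Dice similarity coefficients of up to 77.6% and Intersection over Union scores of 64.8%. These results underline the need for clinically tailored segmentation solutions in applications such as meningioma RT planning.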
Key Takeaways
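- Precise meningioma delineation is crucial for radiotherapy planning and affects both treatment efficacy and the preservation of adjacent healthy tissue.
- Automated deep learning approaches show potential for medical image segmentation, but consistently accurate clinical segmentation remains challenging, mainly because of tumor heterogeneity.
- Interactive Medical Image Segmentation (IMIS) addresses this challenge by integrating advanced AI techniques with clinical input.
- Generic segmentation tools lack the specificity required for disease-specific tasks such as meningioma RT planning.
- Interactive-MEN-RT is a dedicated IMIS tool for clinician-assisted 3D meningioma segmentation in RT workflows.
- The system combines several clinically relevant interaction methods (point annotations, bounding boxes, lasso tools, and scribbles) to improve usability and clinical precision.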
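The two reported metrics, the Dice similarity coefficient and Intersection over Union (IoU), are both overlap ratios between a predicted mask and a reference mask. The sketch below shows their standard definitions on binary volumes using NumPy; the function name `dice_and_iou`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for illustration and are not taken from the Interactive-MEN-RT code.

```python
import numpy as np


def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / total   # 2|A∩B| / (|A| + |B|)
    iou = intersection / union          # |A∩B| / |A∪B|
    return float(dice), float(iou)


# Toy 3D example: an 8-voxel prediction fully inside a 12-voxel reference.
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:4] = True

dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

For a single pair of masks the two metrics are tied by IoU = Dice / (2 - Dice). Scores averaged over many scans need not satisfy this identity exactly, which may be why a Dice of 77.6% and an IoU of 64.8% can be reported together.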

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
Authors:Lei Tong, Zhihua Liu, Chaochao Lu, Dino Oglic, Tom Diethe, Philip Teare, Sotirios A. Tsaftaris, Chen Jin
We present Causal-Adapter, a modular framework that adapts frozen text-to-image diffusion backbones for counterfactual image generation. Our method enables causal interventions on target attributes, consistently propagating their effects to causal dependents without altering the core identity of the image. In contrast to prior approaches that rely on prompt engineering without explicit causal structure, Causal-Adapter leverages structural causal modeling augmented with two attribute regularization strategies: prompt-aligned injection, which aligns causal attributes with textual embeddings for precise semantic control, and a conditioned token contrastive loss to disentangle attribute factors and reduce spurious correlations. Causal-Adapter achieves state-of-the-art performance on both synthetic and real-world datasets, with up to 91% MAE reduction on Pendulum for accurate attribute control and 87% FID reduction on ADNI for high-fidelity MRI image generation. These results show that our approach enables robust, generalizable counterfactual editing with faithful attribute modification and strong identity preservation.
Paper and project links
PDF 9 pages, 26 figures
Summary
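Causal-Adapter is a modular framework that adapts frozen text-to-image diffusion backbones for counterfactual image generation. By applying causal interventions to target attributes, it propagates their effects consistently to causal dependents without altering the core identity of the image. Unlike prior approaches that rely on prompt engineering without explicit causal structure, Causal-Adapter combines structural causal modeling with two attribute regularization strategies: prompt-aligned injection, which aligns causal attributes with textual embeddings for precise semantic control, and a conditioned token contrastive loss, which disentangles attribute factors and reduces spurious correlations. Causal-Adapter achieves state-of-the-art performance on both synthetic and real-world datasets, with accurate attribute control and strong identity preservation.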
Key Takeaways
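- Causal-Adapter is a modular framework that adapts frozen text-to-image diffusion models for counterfactual image generation.
- Causal interventions on target attributes propagate consistently to causal dependents while the core identity of the image stays unchanged.
- Unlike approaches that rely on prompt engineering without explicit causal structure, Causal-Adapter builds on structural causal modeling.
- It uses two attribute regularization strategies: prompt-aligned injection and a conditioned token contrastive loss.
- The approach achieves accurate attribute control: attributes are modified faithfully while image identity is preserved.
- Causal-Adapter performs strongly on both synthetic and real-world datasets, particularly in attribute control and identity preservation.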

RIFLE: Removal of Image Flicker-Banding via Latent Diffusion Enhancement
Authors:Libo Zhu, Zihan Zhou, Xiaoyang Liu, Weihang Zhang, Keyu Shi, Yifan Fu, Yulun Zhang
Capturing screens is now routine in our everyday lives. But the photographs of emissive displays are often influenced by the flicker-banding (FB), which is alternating bright–dark stripes that arise from temporal aliasing between a camera’s rolling-shutter readout and the display’s brightness modulation. Unlike moire degradation, which has been extensively studied, the FB remains underexplored despite its frequent and severe impact on readability and perceived quality. We formulate FB removal as a dedicated restoration task and introduce Removal of Image Flicker-Banding via Latent Diffusion Enhancement, RIFLE, a diffusion-based framework designed to remove FB while preserving fine details. We propose the flicker-banding prior estimator (FPE) that predicts key banding attributes and injects it into the restoration network. Additionally, Masked Loss (ML) is proposed to concentrate supervision on banded regions without sacrificing global fidelity. To overcome data scarcity, we provide a simulation pipeline that synthesizes FB in the luminance domain with stochastic jitter in banding angle, banding spacing, and banding width. Feathered boundaries and sensor noise are also applied for a more realistic simulation. For evaluation, we collect a paired real-world FB dataset with pixel-aligned banding-free references captured via long exposure. Across quantitative metrics and visual comparisons on our real-world dataset, RIFLE consistently outperforms recent image reconstruction baselines from mild to severe flicker-banding. To the best of our knowledge, it is the first work to research the simulation and removal of FB. Our work establishes a great foundation for subsequent research in both the dataset construction and the removal model design. Our dataset and code will be released soon.
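Paper and project links
Summary
The paper studies flicker-banding (FB) in photographs of screens and proposes RIFLE, a diffusion-based framework that removes FB while preserving fine details. It introduces a flicker-banding prior estimator (FPE) and a Masked Loss (ML) to improve removal quality. To overcome data scarcity, it provides a simulation pipeline that synthesizes FB. The authors also collect a paired real-world FB dataset and use it for evaluation. The work lays a foundation for handling FB in photographs of displays.
Key Takeaways
- Flicker-banding (FB) is a common artifact in photographs of emissive displays and degrades readability and perceived quality.
- RIFLE is a diffusion-based framework for FB removal that preserves fine details; to the best of the authors' knowledge, it is the first work on the simulation and removal of FB.
- The flicker-banding prior estimator (FPE) predicts key banding attributes and injects them into the restoration network.
- Masked Loss (ML) concentrates supervision on banded regions without sacrificing global fidelity.
- A simulation pipeline synthesizes FB with stochastic jitter in banding angle, spacing, and width for a more realistic result.
- On a paired real-world FB dataset, RIFLE outperforms recent image reconstruction baselines in both quantitative metrics and visual comparisons.
- The work provides a foundation for subsequent research on dataset construction and removal model design.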
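The simulation idea described above (bands synthesized in the luminance domain, with jitter in angle, spacing, and width, feathered edges, and sensor noise) can be illustrated with a short NumPy sketch. This is not the RIFLE pipeline: the function names `simulate_flicker_banding` and `masked_l1`, every default value, the linear feathering profile, and the `1 + band_weight * mask` weighting are assumptions made for illustration, since the abstract gives no formulas.

```python
import numpy as np


def simulate_flicker_banding(luma, rng, angle_deg=0.0, spacing=40.0,
                             width=12.0, depth=0.4, feather=3.0,
                             noise_std=0.01):
    """Darken periodic stripes in a luminance image (H x W, values in [0, 1]).

    Returns the banded image and the soft band mask (1 inside a band).
    All parameters and their jitter ranges are illustrative assumptions.
    """
    h, w = luma.shape
    # Stochastic jitter of band angle, spacing, and width.
    angle = np.deg2rad(angle_deg + rng.uniform(-3.0, 3.0))
    spacing = spacing * rng.uniform(0.9, 1.1)
    width = width * rng.uniform(0.9, 1.1)

    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Coordinate measured across the bands; angle 0 gives horizontal bands.
    t = xx * np.sin(angle) + yy * np.cos(angle)
    phase = rng.uniform(0.0, spacing)
    # Distance from each pixel to the nearest band centre.
    dist = np.abs(((t + phase) % spacing) - spacing / 2.0)
    # Feathered boundary: the mask falls linearly from 1 to 0 over `feather` px.
    mask = np.clip((width / 2.0 - dist) / feather + 0.5, 0.0, 1.0)

    banded = luma * (1.0 - depth * mask)                 # darken the bands
    banded = banded + rng.normal(0.0, noise_std, size=luma.shape)  # sensor noise
    return np.clip(banded, 0.0, 1.0), mask


def masked_l1(pred, target, mask, band_weight=4.0):
    """Weighted L1 error: every pixel counts, banded pixels count more."""
    weights = 1.0 + band_weight * mask
    return float(np.sum(weights * np.abs(pred - target)) / np.sum(weights))


rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.8)          # flat grey luminance patch
banded, mask = simulate_flicker_banding(clean, rng)
print("masked L1 of degraded vs. clean:", masked_l1(banded, clean, mask))
```

Keeping a nonzero weight outside the bands is one way to read "without sacrificing global fidelity": the loss still penalizes changes to regions that were never degraded.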

UltraUPConvNet: A UPerNet- and ConvNeXt-Based Multi-Task Network for Ultrasound Tissue Segmentation and Disease Prediction
Authors:Zhi Chen, Le Zhang
Ultrasound imaging is widely used in clinical practice due to its cost-effectiveness, mobility, and safety. However, current AI research often treats disease prediction and tissue segmentation as two separate tasks and their model requires substantial computational overhead. In such a situation, we introduce UltraUPConvNet, a computationally efficient universal framework designed for both ultrasound image classification and segmentation. Trained on a large-scale dataset containing more than 9,700 annotations across seven different anatomical regions, our model achieves state-of-the-art performance on certain datasets with lower computational overhead. Our model weights and codes are available at https://github.com/yyxl123/UltraUPConvNet
Paper and project links
PDF 8 pages
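Summary
Ultrasound imaging is widely used in clinical practice because of its cost-effectiveness, mobility, and safety. However, current AI research often treats disease prediction and tissue segmentation as two separate tasks, and the resulting models require substantial computational resources. To address this, the authors propose UltraUPConvNet, a computationally efficient universal framework for both ultrasound image classification and segmentation. Trained on a large-scale dataset with more than 9,700 annotations across seven anatomical regions, the model achieves state-of-the-art performance on certain datasets with lower computational overhead.
Key Takeaways
- Ultrasound imaging is widely used in clinical practice because of its cost-effectiveness, mobility, and safety.
- Current AI research on ultrasound images often treats disease prediction and tissue segmentation as two separate tasks.
- UltraUPConvNet is a computationally efficient universal framework for ultrasound image classification and segmentation.
- The model is trained on a large-scale dataset with more than 9,700 annotations across seven anatomical regions.
- It achieves state-of-the-art performance on certain datasets with lower computational overhead.
- The model weights and code are publicly available on GitHub.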

Enhancing Corpus Callosum Segmentation in Fetal MRI via Pathology-Informed Domain Randomization
Authors:Marina Grifell i Plana, Vladyslav Zalevskyi, Léa Schmidt, Yvan Gomez, Thomas Sanchez, Vincent Dunet, Mériam Koob, Vanessa Siffredi, Meritxell Bach Cuadra
Accurate fetal brain segmentation is crucial for extracting biomarkers and assessing neurodevelopment, especially in conditions such as corpus callosum dysgenesis (CCD), which can induce drastic anatomical changes. However, the rarity of CCD severely limits annotated data, hindering the generalization of deep learning models. To address this, we propose a pathology-informed domain randomization strategy that embeds prior knowledge of CCD manifestations into a synthetic data generation pipeline. By simulating diverse brain alterations from healthy data alone, our approach enables robust segmentation without requiring pathological annotations. We validate our method on a cohort comprising 248 healthy fetuses, 26 with CCD, and 47 with other brain pathologies, achieving substantial improvements on CCD cases while maintaining performance on both healthy fetuses and those with other pathologies. From the predicted segmentations, we derive clinically relevant biomarkers, such as corpus callosum length (LCC) and volume, and show their utility in distinguishing CCD subtypes. Our pathology-informed augmentation reduces the LCC estimation error from 1.89 mm to 0.80 mm in healthy cases and from 10.9 mm to 0.7 mm in CCD cases. Beyond these quantitative gains, our approach yields segmentations with improved topological consistency relative to available ground truth, enabling more reliable shape-based analyses. Overall, this work demonstrates that incorporating domain-specific anatomical priors into synthetic data pipelines can effectively mitigate data scarcity and enhance analysis of rare but clinically significant malformations.
Paper and project links
PDF Presented at the PIPPI Workshop of MICCAI 2025
Summary
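The paper proposes a pathology-informed domain randomization strategy for fetal brain segmentation. It embeds prior knowledge of corpus callosum dysgenesis (CCD) manifestations into a synthetic data generation pipeline and simulates diverse brain alterations from healthy data alone, so that robust segmentation of CCD cases is possible without pathological annotations. On a cohort of healthy fetuses, fetuses with CCD, and fetuses with other brain pathologies, the method achieves substantial improvements on CCD cases while maintaining performance on the other two groups. Clinically relevant biomarkers such as corpus callosum length and volume can be derived from the predicted segmentations. The strategy mitigates the scarcity of data for rare conditions and strengthens the analysis of rare but clinically significant malformations.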
Key Takeaways
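- A pathology-informed domain randomization strategy is proposed to address the scarcity of annotated data for rare conditions.
- Diverse brain alterations are simulated from healthy data for training, which enables robust segmentation of rare cases.
- The method achieves substantial improvements on CCD cases while maintaining performance on healthy fetuses and on those with other brain pathologies.
- Clinically relevant biomarkers such as corpus callosum length and volume can be derived from the predicted segmentations to support diagnosis.
- The segmentations show improved topological consistency, which makes shape-based analyses of rare cases more reliable.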
