⚠️ 以下所有内容总结均由大语言模型生成,可能存在错误,仅供参考,请谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目 ChatPaperFree 对您有帮助,还请您给我们一些鼓励!⭐️ HuggingFace免费体验
2025-09-13 更新
Multiwavelength observations of a new black-widow millisecond pulsar PSR J1544-2555
Authors:Sergio Belmonte Diaz, Tinn Thingmeearkom, Adipol Phosrisom, Rene Breton, Marta Burgay, Colin Clark, Lars Nieder, Martin Mayer, Werner Becker, Ewann Barr, Sarah Buchner, Kaustav Kashyap Das, Vik Dhillon, Oliver Dodge, Elizabeth Ferrara, Jean-Mathias Griessmeier, Ramesh Karuppusamy, Mark Kennedy, Michael Kramer, Prajwal Padmanabh, John Paice, Antonio Rodriguez, Ben Stappers
We report the discovery of a new black-widow millisecond pulsar, PSR J1544-2555, associated with the Fermi-LAT source 4FGL J1544.2-2554. Optical, radio, and gamma-ray observations confirmed its nature as a compact spider binary system. Optical photometry from ULTRACAM revealed a ~2.7-hour orbital period, guiding MeerKAT observations that detected ~2.4-ms radio pulsations. Subsequent timing campaigns using the Murriyang Parkes Telescope, the Effelsberg 100-m Radio Telescope, and the Nançay Radio Telescope allowed us to obtain a preliminary timing solution, which enabled us to find gamma-ray pulsations. The final timing solution, spanning 16 years of Fermi-LAT gamma-ray data, also displays orbital period variations typical of spider pulsars. X-ray observations from eROSITA indicate non-thermal emission, but the relatively low count rate prohibits the search for X-ray pulsations. Optical light curve modelling using Icarus suggests the asymmetry is best explained by a spot model, where uneven heating creates localised temperature variations on the companion. While the optical spectra we obtained are compatible with the physical properties we infer for the companion star, they were not of sufficient signal-to-noise to allow for radial velocity measurements, thus limiting constraints on the neutron star’s mass. The observed bluer colour near the light curve minimum suggests possible non-thermal emission from intra-binary shocks, supported by the presence of an X-ray source. This discovery exemplifies the proven capability of the Fermi-LAT catalogue in identifying millisecond pulsar candidates and highlights the role of optical surveys in detecting variable sources suitable for radio follow-up.
我们报告发现了一颗新的黑寡妇毫秒脉冲星PSR J1544-2555,与费米大面积望远镜(Fermi-LAT)源4FGL J1544.2-2554成协。光学、射电和伽马射线观测确认其为一个致密的蜘蛛双星系统。ULTRACAM的光学测光揭示了约2.7小时的轨道周期,并引导MeerKAT观测探测到约2.4毫秒的射电脉冲。随后利用Murriyang Parkes望远镜、埃费尔斯贝格100米射电望远镜和南赛射电望远镜开展的计时观测使我们获得了初步计时解,进而找到了伽马射线脉冲。跨越16年Fermi-LAT伽马射线数据的最终计时解还显示出蜘蛛脉冲星典型的轨道周期变化。eROSITA的X射线观测表明存在非热辐射,但计数率相对较低,无法开展X射线脉冲搜索。使用Icarus进行的光学光变曲线建模表明,不对称性最好用斑点模型解释,即不均匀加热在伴星表面产生局部温度变化。虽然我们获得的光学光谱与推断出的伴星物理性质相符,但其信噪比不足以进行视向速度测量,从而限制了对中子星质量的约束。在光变曲线极小值附近观测到的偏蓝颜色暗示可能存在来自双星内激波的非热辐射,X射线源的存在也支持这一点。这一发现体现了Fermi-LAT星表在识别毫秒脉冲星候选体方面的可靠能力,并突显了光学巡天在发现适合射电后续观测的变源方面的作用。
论文及项目相关链接
PDF Accepted for publication in Monthly Notices of the Royal Astronomical Society. 16 pages. 11 figures
摘要
我们发现了一颗新的黑寡妇毫秒脉冲星PSR J1544-2555,与Fermi-LAT源4FGL J1544.2-2554成协。通过光学、射电和伽马射线观测确认其为致密的蜘蛛双星系统。ULTRACAM光学测光揭示了约2.7小时的轨道周期,MeerKAT观测探测到约2.4毫秒的射电脉冲。利用Murriyang Parkes望远镜、埃费尔斯贝格100米射电望远镜和南赛射电望远镜的计时观测得到初步计时解,并据此找到伽马射线脉冲。跨越16年Fermi-LAT伽马射线数据的最终计时解同样显示出蜘蛛脉冲星典型的轨道周期变化。eROSITA的X射线观测表明存在非热辐射,但计数率相对较低,无法搜索X射线脉冲。使用Icarus对光学光变曲线建模的结果表明,不对称性最好用斑点模型解释,即不均匀加热在伴星上造成局部温度变化。虽然获得的光学光谱与我们推断的伴星物理性质相符,但信噪比不足以进行视向速度测量,因此对中子星质量的约束有限。光变曲线极小值附近的偏蓝颜色可能表明存在来自双星内激波的非热辐射,X射线源的存在也支持这一观点。这一发现证明了Fermi-LAT星表在识别毫秒脉冲星候选体方面的能力,并强调了光学巡天在发现适合射电后续观测的变源中的作用。
要点摘要
- 发现了一个新的黑寡妇毫秒脉冲星PSR J1544-2555,与费米-LAT源相关联。
- 通过多种波段观测确认了其为紧凑蜘蛛双星系统。
- ULTRACAM揭示了约2.7小时的轨道周期。
- 通过一系列射电望远镜的定时观测,检测到了毫秒级的射电脉冲。
- 初步和最终的定时解决方案揭示了蜘蛛脉冲星的典型轨道周期变化。
- X射线观测表明存在非热发射,但受限于较低的计数率,无法确定X射线脉冲。
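下面给出一个从不等间隔光学测光数据中用Lomb-Scargle周期图搜索轨道周期的极简示意(使用astropy;数据为随机生成,周期取约2.7小时仅作演示,并非论文的原始数据处理流程):

```python
# 极简示意:用Lomb-Scargle周期图在光学测光数据中搜索轨道周期(假设性示例)
import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(0)
p_true = 2.7 / 24.0                      # 假设的轨道周期(单位:天)
t = np.sort(rng.uniform(0, 3, 500))      # 不等间隔的观测时刻(天)
flux = 1.0 + 0.3 * np.sin(2 * np.pi * t / p_true) + 0.05 * rng.normal(size=t.size)

# 在合理的频率范围内计算周期图(频率单位:1/天)
freq, power = LombScargle(t, flux).autopower(minimum_frequency=1.0,
                                             maximum_frequency=30.0)
best_period_hours = 24.0 / freq[np.argmax(power)]
print(f"周期图峰值对应周期 ≈ {best_period_hours:.2f} 小时")
```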
点此查看论文截图





Invisible Attributes, Visible Biases: Exploring Demographic Shortcuts in MRI-based Alzheimer’s Disease Classification
Authors:Akshit Achara, Esther Puyol Anton, Alexander Hammers, Andrew P. King
Magnetic resonance imaging (MRI) is the gold standard for brain imaging. Deep learning (DL) algorithms have been proposed to aid in the diagnosis of diseases such as Alzheimer’s disease (AD) from MRI scans. However, DL algorithms can suffer from shortcut learning, in which spurious features, not directly related to the output label, are used for prediction. When these features are related to protected attributes, they can lead to performance bias against underrepresented protected groups, such as those defined by race and sex. In this work, we explore the potential for shortcut learning and demographic bias in DL based AD diagnosis from MRI. We first investigate if DL algorithms can identify race or sex from 3D brain MRI scans to establish the presence or otherwise of race and sex based distributional shifts. Next, we investigate whether training set imbalance by race or sex can cause a drop in model performance, indicating shortcut learning and bias. Finally, we conduct a quantitative and qualitative analysis of feature attributions in different brain regions for both the protected attribute and AD classification tasks. Through these experiments, and using multiple datasets and DL models (ResNet and SwinTransformer), we demonstrate the existence of both race and sex based shortcut learning and bias in DL based AD classification. Our work lays the foundation for fairer DL diagnostic tools in brain MRI. The code is provided at https://github.com/acharaakshit/ShortMR
磁共振成像(MRI)是脑部成像的金标准。深度学习(DL)算法已被提出用于辅助从MRI扫描中诊断阿尔茨海默病(AD)等疾病。然而,DL算法可能出现捷径学习(shortcut learning),即利用与输出标签没有直接关联的虚假特征进行预测。当这些特征与受保护属性相关时,可能导致对代表性不足的受保护群体(如按种族和性别划分的群体)的性能偏差。在这项工作中,我们探讨了基于MRI的AD诊断中出现捷径学习和人口统计学偏见的可能性。我们首先考察DL算法能否从3D脑MRI扫描中识别种族或性别,以确定是否存在基于种族和性别的分布差异。接着,我们考察训练集中种族或性别的不平衡是否会导致模型性能下降,从而表明存在捷径学习和偏见。最后,我们针对受保护属性分类和AD分类两类任务,对不同脑区的特征归因进行了定量和定性分析。通过这些实验,并使用多个数据集和DL模型(ResNet和SwinTransformer),我们证明了基于DL的AD分类中同时存在基于种族和基于性别的捷径学习与偏见。我们的工作为构建更公平的脑MRI深度学习诊断工具奠定了基础。代码可通过 https://github.com/acharaakshit/ShortMR 获取。
论文及项目相关链接
PDF FAIMI @ MICCAI 2025
Summary
MRI是脑成像的金标准,深度学习算法已被用于辅助诊断阿尔茨海默病等疾病。但深度学习算法可能受捷径学习影响,利用与输出标签无直接关系的特征进行预测,从而导致对代表性不足群体的性能偏差。本文探索了基于深度学习的阿尔茨海默病MRI诊断中的捷径学习和人口统计偏见。通过在多个数据集和深度学习模型(ResNet和SwinTransformer)上的实验,证明存在基于种族和性别的捷径学习与偏见。本文工作为开发更公平的脑MRI深度学习诊断工具奠定了基础。
Key Takeaways
- 深度学习算法在MRI扫描诊断阿尔茨海默病中有应用潜力。
- 深度学习算法可能遭受捷径学习的影响,利用与输出标签无关的特征进行预测。
- 捷径学习可能导致对代表性不足的群体的性能偏见,如种族和性别。
- 本文探索了基于种族和性别的捷径学习和偏见在阿尔茨海默病MRI诊断中的存在性。
- 通过实验和多个数据集及深度学习模型的验证,证明了基于种族和性别的捷径学习和偏见的存在。
- 本文工作为开发更公平的深度学习诊断工具奠定了基础。
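作为上述偏见分析思路的补充说明,下面给出一个按受保护属性分组计算准确率差距的极简示意(列名与数据均为虚构,仅演示这类子群体性能差距的常见度量方式,并非论文的评测代码):

```python
# 极简示意:按受保护属性(如性别、种族)分组评估模型准确率差距(数据为虚构)
import pandas as pd

df = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 0, 1, 0],
    "y_pred": [1, 0, 0, 1, 0, 1, 1, 0],
    "sex":    ["F", "F", "F", "M", "M", "M", "M", "F"],
})

per_group = (df.assign(correct=df.y_true == df.y_pred)
               .groupby("sex")["correct"].mean())
print(per_group)                                  # 各组准确率
print("组间差距:", per_group.max() - per_group.min())
```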
点此查看论文截图



DualTrack: Sensorless 3D Ultrasound needs Local and Global Context
Authors:Paul F. R. Wilson, Matteo Ronchetti, Rüdiger Göbl, Viktoria Markova, Sebastian Rosenzweig, Raphael Prevost, Parvin Mousavi, Oliver Zettinig
Three-dimensional ultrasound (US) offers many clinical advantages over conventional 2D imaging, yet its widespread adoption is limited by the cost and complexity of traditional 3D systems. Sensorless 3D US, which uses deep learning to estimate a 3D probe trajectory from a sequence of 2D US images, is a promising alternative. Local features, such as speckle patterns, can help predict frame-to-frame motion, while global features, such as coarse shapes and anatomical structures, can situate the scan relative to anatomy and help predict its general shape. In prior approaches, global features are either ignored or tightly coupled with local feature extraction, restricting the ability to robustly model these two complementary aspects. We propose DualTrack, a novel dual-encoder architecture that leverages decoupled local and global encoders specialized for their respective scales of feature extraction. The local encoder uses dense spatiotemporal convolutions to capture fine-grained features, while the global encoder utilizes an image backbone (e.g., a 2D CNN or foundation model) and temporal attention layers to embed high-level anatomical features and long-range dependencies. A lightweight fusion module then combines these features to estimate the trajectory. Experimental results on a large public benchmark show that DualTrack achieves state-of-the-art accuracy and globally consistent 3D reconstructions, outperforming previous methods and yielding an average reconstruction error below 5 mm.
三维超声(US)相较于传统二维成像在临床上具有许多优势,但其广泛应用受限于传统三维系统的成本和复杂性。无传感器三维超声利用深度学习从一系列二维超声图像中估计三维探头轨迹,是一种很有前景的替代方案。局部特征(如斑点模式)有助于预测帧间运动,而全局特征(如粗略形状和解剖结构)可以将扫描定位到相应解剖部位并帮助预测其整体形状。在以往方法中,全局特征要么被忽略,要么与局部特征提取紧密耦合,限制了对这两个互补方面进行稳健建模的能力。我们提出DualTrack,一种新型双编码器架构,采用解耦的局部与全局编码器,分别针对各自的特征尺度进行专门化提取。局部编码器使用密集的时空卷积来捕捉精细特征,而全局编码器则利用图像主干(例如二维卷积神经网络或基础模型)和时间注意力层来嵌入高级解剖特征和长程依赖关系。随后,一个轻量级融合模块将这些特征结合起来以估计轨迹。在大型公共基准上的实验结果表明,DualTrack达到了最先进的精度并实现了全局一致的三维重建,优于以往方法,平均重建误差低于5毫米。
论文及项目相关链接
Summary
本文介绍了三维超声(US)相较于传统二维成像的临床优势,但其广泛应用受限于传统三维系统的成本和复杂性。文章提出了一种基于深度学习的无传感器三维超声技术——DualTrack,该技术利用解耦的局部和全局编码器,分别提取不同尺度的特征,实现了精细特征与高级解剖特征的融合,从而在公共数据集上取得了最新的轨迹估计和三维重建效果。
Key Takeaways
- 三维超声相比传统二维成像具有多种临床优势,但受限于成本和系统复杂性。
- 无传感器三维超声利用深度学习从一系列二维超声图像估计三维探头轨迹。
- 局部特征(如斑点模式)有助于预测帧间运动,而全局特征(如粗略形状和解剖结构)有助于将扫描定位在解剖结构并预测其总体形状。
- DualTrack是一种新型的双编码器架构,旨在提取局部和全局特征,实现精细特征与高级解剖特征的分离提取与融合。
- 局部编码器通过密集时空卷积捕获精细特征。
- 全局编码器利用图像主干(如二维卷积神经网络或基础模型)和时间注意力层来嵌入高级解剖特征和长程依赖关系(结构示意见下方代码)。
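基于上面对双编码器结构的描述,下面给出一个PyTorch极简示意:局部时空卷积编码器与"逐帧2D骨干+时间注意力"的全局编码器并行,再由轻量融合头回归逐帧6自由度运动。各层维度与层数均为假设,仅用于说明信息流,并非论文原实现:

```python
# 极简示意:解耦的局部/全局双编码器 + 融合回归逐帧6自由度运动(假设性结构)
import torch
import torch.nn as nn

class DualTrackSketch(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        # 局部编码器:密集时空3D卷积,捕捉相邻帧间的斑点纹理变化
        self.local = nn.Sequential(
            nn.Conv3d(1, d, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(d, d, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),        # 每帧池化成一个向量
        )
        # 全局编码器:逐帧2D骨干(此处用小CNN代替)+ 时间注意力建模长程依赖
        self.backbone2d = nn.Sequential(
            nn.Conv2d(1, d, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
            num_layers=2)
        # 轻量融合模块:拼接两路特征后回归逐帧6自由度位姿
        self.head = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 6))

    def forward(self, frames):                 # frames: (B, T, H, W)
        B, T, H, W = frames.shape
        loc = self.local(frames.unsqueeze(1))  # (B, d, T, 1, 1)
        loc = loc.squeeze(-1).squeeze(-1).transpose(1, 2)          # (B, T, d)
        glob = self.backbone2d(frames.reshape(B * T, 1, H, W))     # (B*T, d)
        glob = self.temporal(glob.reshape(B, T, -1))               # (B, T, d)
        return self.head(torch.cat([loc, glob], dim=-1))           # (B, T, 6)

poses = DualTrackSketch()(torch.randn(2, 8, 64, 64))
print(poses.shape)   # torch.Size([2, 8, 6])
```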
点此查看论文截图



FlexiD-Fuse: Flexible number of inputs multi-modal medical image fusion based on diffusion model
Authors:Yushen Xu, Xiaosong Li, Yuchun Wang, Xiaoqi Cheng, Huafeng Li, Haishu Tan
Different modalities of medical images provide unique physiological and anatomical information for diseases. Multi-modal medical image fusion integrates useful information from different complementary medical images with different modalities, producing a fused image that comprehensively and objectively reflects lesion characteristics to assist doctors in clinical diagnosis. However, existing fusion methods can only handle a fixed number of modality inputs, such as accepting only two-modal or tri-modal inputs, and cannot directly process varying input quantities, which hinders their application in clinical settings. To tackle this issue, we introduce FlexiD-Fuse, a diffusion-based image fusion network designed to accommodate flexible quantities of input modalities. It can end-to-end process two-modal and tri-modal medical image fusion under the same weight. FlexiD-Fuse transforms the diffusion fusion problem, which supports only fixed-condition inputs, into a maximum likelihood estimation problem based on the diffusion process and hierarchical Bayesian modeling. By incorporating the Expectation-Maximization algorithm into the diffusion sampling iteration process, FlexiD-Fuse can generate high-quality fused images with cross-modal information from source images, independently of the number of input images. We compared the latest two and tri-modal medical image fusion methods, tested them on Harvard datasets, and evaluated them using nine popular metrics. The experimental results show that our method achieves the best performance in medical image fusion with varying inputs. Meanwhile, we conducted extensive extension experiments on infrared-visible, multi-exposure, and multi-focus image fusion tasks with arbitrary numbers, and compared them with the perspective SOTA methods. The results of the extension experiments consistently demonstrate the effectiveness and superiority of our method.
不同模态的医学图像为疾病提供了独特的生理和解剖信息。多模态医学图像融合从不同模态的互补医学图像中整合有用信息,生成能够全面、客观反映病变特征的融合图像,以辅助医生进行临床诊断。然而,现有融合方法只能处理固定数量的模态输入,例如仅接受二模态或三模态输入,无法直接处理数量可变的输入,这阻碍了它们在临床环境中的应用。针对这一问题,我们提出FlexiD-Fuse,一个基于扩散模型的图像融合网络,旨在适应数量灵活的输入模态,可以在同一套权重下端到端地处理二模态和三模态医学图像融合。FlexiD-Fuse将仅支持固定条件输入的扩散融合问题,转化为基于扩散过程和分层贝叶斯建模的最大似然估计问题。通过将期望最大化(EM)算法融入扩散采样迭代过程,FlexiD-Fuse可以生成包含源图像跨模态信息的高质量融合图像,且不受输入图像数量的限制。我们比较了最新的二模态和三模态医学图像融合方法,在哈佛数据集上进行了测试,并使用九个常用指标进行了评估。实验结果表明,我们的方法在输入数量可变的医学图像融合中取得了最佳性能。同时,我们在红外-可见光、多曝光和多焦点图像融合任务上开展了输入数量任意的大量扩展实验,并与相应的SOTA方法进行了比较。扩展实验的结果一致证明了我们方法的有效性和优越性。
论文及项目相关链接
Summary
引入FlexiD-Fuse网络,支持灵活多变模态输入数量的医学图像融合。该方法基于扩散过程和分层贝叶斯建模,将固定条件的融合问题转化为最大似然估计问题,能够生成高质量融合图像,且能独立处理不同数量的输入图像。实验结果表明,该方法在医学图像融合任务上取得了最佳性能。
Key Takeaways
- 不同模态的医学图像为疾病提供独特的生理和解剖信息。
- 多模态医学图像融合能够集成不同模态的医学图像中的有用信息,生成一个全面客观地反映病灶特征的融合图像,辅助医生进行临床诊断。
- 现有融合方法只能处理固定数量的模态输入,无法直接处理变化的输入量,限制了其在临床环境中的应用。
- FlexiD-Fuse网络被设计用来解决这一难题,它支持灵活多变的输入模态数量,并能在同一权重下进行两模态和三模态医学图像融合。
- FlexiD-Fuse将扩散融合问题转化为基于扩散过程和分层贝叶斯建模的最大似然估计问题。
- 通过在扩散采样迭代过程中融入期望最大化算法,FlexiD-Fuse可以生成高质量的融合图像,包含跨模态的源图像信息,且独立于输入图像的数量。
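围绕"把任意数量的条件输入通过EM式加权融合"这一思想,下面给出一个高度简化的numpy示意:E步根据各模态与当前融合估计的残差计算权重,M步按权重加权平均更新估计。它只演示可变数量输入的处理方式,并非论文在扩散采样中的具体公式或实现:

```python
# 极简示意:EM风格地把任意数量的条件图像融合为一个估计(假设性演示)
import numpy as np

def em_fuse(conditions, n_iter=10, sigma=0.1):
    """conditions: 长度任意的列表,每项为同尺寸的源图像数组。"""
    x = np.mean(conditions, axis=0)              # 初始融合估计
    for _ in range(n_iter):
        # E步:残差越小的模态,对当前估计的"责任"越大
        resid = np.array([np.mean((c - x) ** 2) for c in conditions])
        w = np.exp(-resid / (2 * sigma ** 2))
        w = w / w.sum()
        # M步:按责任加权平均,更新融合估计
        x = np.tensordot(w, np.stack(conditions), axes=1)
    return x

imgs = [np.random.rand(64, 64) for _ in range(3)]   # 三模态;换成2个或4个同样适用
fused = em_fuse(imgs)
print(fused.shape)
```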
点此查看论文截图




Modality-Agnostic Input Channels Enable Segmentation of Brain lesions in Multimodal MRI with Sequences Unavailable During Training
Authors:Anthony P. Addison, Felix Wagner, Wentian Xu, Natalie Voets, Konstantinos Kamnitsas
Segmentation models are important tools for the detection and analysis of lesions in brain MRI. Depending on the type of brain pathology that is imaged, MRI scanners can acquire multiple, different image modalities (contrasts). Most segmentation models for multimodal brain MRI are restricted to fixed modalities and cannot effectively process new ones at inference. Some models generalize to unseen modalities but may lose discriminative modality-specific information. This work aims to develop a model that can perform inference on data that contain image modalities unseen during training, previously seen modalities, and heterogeneous combinations of both, thus allowing a user to utilize any available imaging modalities. We demonstrate this is possible with a simple, thus practical alteration to the U-net architecture, by integrating a modality-agnostic input channel or pathway, alongside modality-specific input channels. To train this modality-agnostic component, we develop an image augmentation scheme that synthesizes artificial MRI modalities. Augmentations differentially alter the appearance of pathological and healthy brain tissue to create artificial contrasts between them while maintaining realistic anatomical integrity. We evaluate the method using 8 MRI databases that include 5 types of pathologies (stroke, tumours, traumatic brain injury, multiple sclerosis and white matter hyperintensities) and 8 modalities (T1, T1+contrast, T2, PD, SWI, DWI, ADC and FLAIR). The results demonstrate that the approach preserves the ability to effectively process MRI modalities encountered during training, while being able to process new, unseen modalities to improve its segmentation. Project code: https://github.com/Anthony-P-Addison/AGN-MOD-SEG
分割模型是脑MRI中检测和分析病灶的重要工具。根据所成像的脑部病变类型,MRI扫描仪可以采集多种不同的图像模态(对比度)。大多数面向多模态脑MRI的分割模型局限于固定的模态组合,无法在推理时有效处理新的模态。一些模型可以泛化到未见过的模态,但可能丢失具有判别力的模态特异信息。这项工作旨在开发一种模型,使其能够对包含训练中未见过的模态、已见过的模态以及两者异质组合的数据进行推理,从而允许用户利用任何可用的成像模态。我们通过对U-net架构进行简单而实用的改动来实现这一点:在模态特异输入通道之外,集成一个模态无关的输入通道(或通路)。为了训练这个模态无关组件,我们开发了一种可以合成人工MRI模态的图像增广方案。增广以不同方式改变病变组织与健康脑组织的外观,在二者之间人为制造对比,同时保持真实的解剖完整性。我们使用8个MRI数据库评估该方法,涵盖5类病变(中风、肿瘤、创伤性脑损伤、多发性硬化和白质高信号)以及8种模态(T1、T1+对比剂、T2、PD、SWI、DWI、ADC和FLAIR)。结果表明,该方法在保留有效处理训练中出现过的MRI模态能力的同时,还能处理新的未见模态以改进分割。项目代码:https://github.com/Anthony-P-Addison/AGN-MOD-SEG
论文及项目相关链接
PDF Accepted to MICCAI 2025, for the following workshop: ML-CDS 2025: Multimodal Learning and Fusion Across Scales for Clinical Decision Support
Summary
本文介绍了一种针对脑MRI图像中病灶检测和分割的模型。该模型具有处理训练期间未见过的模态数据的能力,通过添加一种通用的输入通道或路径来实现这一点。同时,通过图像增强技术训练该通用组件,合成人工MRI模态,以模拟不同病理和正常脑组织之间的对比。经过在包含多种病理类型和模态的MRI数据库上的评估,证明该方法在训练期间遇到的模态和新未见过的模态上都能实现有效处理,提高了分割性能。
Key Takeaways
- 介绍了针对脑MRI图像中病灶检测和分割的模型的重要性。
- 大多数多模态脑MRI分割模型受限于固定的模态,无法有效处理新模态数据。
- 该模型旨在处理训练期间未见过的模态数据,并允许用户利用任何可用的成像模态。
- 通过向U-net架构中添加通用的输入通道或路径,实现了模型的灵活性。
- 通过图像增强技术训练通用组件,合成人工MRI模态以增强模型的适应能力。
- 该模型在多个MRI数据库上进行了评估,包括多种病理类型和模态,证明了其有效性。
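针对上文提到的"合成人工模态"的增广思路,下面给出一个极简示意:借助病灶掩码,对病灶与健康组织施加不同的随机强度重映射,从而人为制造新的对比而不改变解剖结构。具体变换形式与参数均为假设,并非论文原实现:

```python
# 极简示意:利用病灶掩码,对病灶与健康组织施加不同的随机强度重映射,
# 合成一种"人工模态",解剖结构保持不变(假设性实现)
import numpy as np

def synth_modality(image, lesion_mask, rng=None):
    rng = rng or np.random.default_rng()
    gamma_healthy = rng.uniform(0.5, 2.0)       # 健康组织的随机gamma变换
    gamma_lesion  = rng.uniform(0.5, 2.0)       # 病灶区域使用不同的gamma
    sign = rng.choice([-1.0, 1.0])              # 随机决定病灶相对变亮或变暗
    img = np.clip(image, 0, 1)
    out = np.where(lesion_mask > 0,
                   sign * (img ** gamma_lesion) + (sign < 0),   # sign为负时整体反转
                   img ** gamma_healthy)
    return np.clip(out, 0, 1).astype(np.float32)

vol = np.random.rand(32, 64, 64)                 # 假设的归一化3D图像
mask = (np.random.rand(32, 64, 64) > 0.97)       # 假设的病灶掩码
aug = synth_modality(vol, mask)
print(aug.shape, aug.min(), aug.max())
```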
点此查看论文截图




Unified Start, Personalized End: Progressive Pruning for Efficient 3D Medical Image Segmentation
Authors:Linhao Li, Yiwen Ye, Ziyang Chen, Yong Xia
3D medical image segmentation often faces heavy resource and time consumption, limiting its scalability and rapid deployment in clinical environments. Existing efficient segmentation models are typically static and manually designed prior to training, which restricts their adaptability across diverse tasks and makes it difficult to balance performance with resource efficiency. In this paper, we propose PSP-Seg, a progressive pruning framework that enables dynamic and efficient 3D segmentation. PSP-Seg begins with a redundant model and iteratively prunes redundant modules through a combination of block-wise pruning and a functional decoupling loss. We evaluate PSP-Seg on five public datasets, benchmarking it against seven state-of-the-art models and six efficient segmentation models. Results demonstrate that the lightweight variant, PSP-Seg-S, achieves performance on par with nnU-Net while reducing GPU memory usage by 42-45%, training time by 29-48%, and parameter number by 83-87% across all datasets. These findings underscore PSP-Seg’s potential as a cost-effective yet high-performing alternative for widespread clinical application.
三维医学图像分割往往面临资源和时间消耗大的问题,限制了其在临床环境中的可扩展性和快速部署。现有的高效分割模型通常在训练前静态、人工地设计好,这限制了它们在不同任务间的适应性,也难以在性能与资源效率之间取得平衡。在本文中,我们提出PSP-Seg,一种实现动态、高效三维分割的渐进式剪枝框架。PSP-Seg从一个冗余模型出发,通过块级剪枝与功能解耦损失相结合,迭代地剪除冗余模块。我们在五个公共数据集上评估了PSP-Seg,并与七个最新模型和六个高效分割模型进行了对比。结果表明,轻量级变体PSP-Seg-S的性能与nnU-Net相当,同时在所有数据集上将GPU内存使用量降低42-45%、训练时间缩短29-48%、参数数量减少83-87%。这些发现突显了PSP-Seg作为一种经济高效且性能优异的方案在广泛临床应用中的潜力。
论文及项目相关链接
PDF 15 pages, 8 figures
Summary
本文提出一种名为PSP-Seg的渐进式裁剪框架,用于实现动态和高效的3D医学图像分割。该框架从冗余模型开始,通过块级裁剪和功能解耦损失的结合,迭代地裁剪冗余模块。在五个公共数据集上的评估结果表明,PSP-Seg的轻量级变体PSP-Seg-S在性能上与nnU-Net相当,同时降低了GPU内存使用率42-45%,训练时间缩短29-48%,参数数量减少83-87%。这突显了PSP-Seg作为经济实惠且高性能的替代方案在临床应用中的潜力。
Key Takeaways
- 3D医学图像分割面临资源和时间消耗大的问题,限制了其在临床环境中的可扩展性和快速部署。
- 现有高效分割模型通常是静态的,在训练前手动设计,这限制了它们在不同任务中的适应性。
- PSP-Seg是一种渐进式裁剪框架,能够实现动态和高效的3D分割。
- PSP-Seg通过块级裁剪和功能解耦损失的结合,迭代地裁剪冗余模块。
- 在多个公共数据集上的评估表明,PSP-Seg的轻量级变体PSP-Seg-S性能优异,与nnU-Net相当。
- PSP-Seg-S在GPU内存使用、训练时间和参数数量方面有明显优化。
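下面用一个与具体网络无关的示意勾勒渐进式块级剪枝的外层循环:每轮评估移除各候选块造成的性能损失,剪掉影响最小的块并短暂微调,直至达到资源预算。其中块的打分方式、without_block等接口均为假设,且未包含论文中的功能解耦损失:

```python
# 极简示意:渐进式块级剪枝的外层循环(假设性接口,仅示意流程)
def progressive_block_pruning(model, blocks, evaluate, finetune,
                              budget_params, param_count):
    """blocks: 可剪枝模块名列表;evaluate: 返回验证集Dice;finetune: 少量迭代微调。"""
    baseline = evaluate(model)
    while param_count(model) > budget_params and blocks:
        # 逐一试删每个候选块,记录性能下降
        drops = {}
        for b in blocks:
            pruned = model.without_block(b)      # 假设的接口:返回去掉该块的模型副本
            drops[b] = baseline - evaluate(pruned)
        victim = min(drops, key=drops.get)       # 移除影响最小的块
        model = model.without_block(victim)
        blocks.remove(victim)
        finetune(model)                          # 剪枝后短暂微调以恢复性能
        baseline = evaluate(model)
    return model
```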
点此查看论文截图






Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
Authors:Jing Hao, Yuxuan Fan, Yanpeng Sun, Kaixin Guo, Lizhuo Lin, Jinrong Yang, Qi Yong H. Ai, Lun M. Wong, Hao Tang, Kuo Feng Hung
Recent advances in large vision-language models (LVLMs) have demonstrated strong performance on general-purpose medical tasks. However, their effectiveness in specialized domains such as dentistry remains underexplored. In particular, panoramic X-rays, a widely used imaging modality in oral radiology, pose interpretative challenges due to dense anatomical structures and subtle pathological cues, which are not captured by existing medical benchmarks or instruction datasets. To this end, we introduce MMOral, the first large-scale multimodal instruction dataset and benchmark tailored for panoramic X-ray interpretation. MMOral consists of 20,563 annotated images paired with 1.3 million instruction-following instances across diverse task types, including attribute extraction, report generation, visual question answering, and image-grounded dialogue. In addition, we present MMOral-Bench, a comprehensive evaluation suite covering five key diagnostic dimensions in dentistry. We evaluate 64 LVLMs on MMOral-Bench and find that even the best-performing model, i.e., GPT-4o, only achieves 41.45% accuracy, revealing significant limitations of current models in this domain. To promote the progress of this specific domain, we also propose OralGPT, which conducts supervised fine-tuning (SFT) upon Qwen2.5-VL-7B with our meticulously curated MMOral instruction dataset. Remarkably, a single epoch of SFT yields substantial performance enhancements for LVLMs, e.g., OralGPT demonstrates a 24.73% improvement. Both MMOral and OralGPT hold significant potential as a critical foundation for intelligent dentistry and enable more clinically impactful multimodal AI systems in the dental field. The dataset, model, benchmark, and evaluation suite are available at https://github.com/isbrycee/OralGPT.
大型视觉语言模型(LVLMs)的最新进展在通用医疗任务上表现出了强大的性能。然而,它们在牙科等专业领域的有效性仍缺乏充分探索。特别是全景X光片这一口腔放射学中广泛使用的成像方式,由于解剖结构密集、病理线索细微,解读难度较大,而这些并未被现有的医疗基准或指令数据集所覆盖。为此,我们推出MMOral,首个专门面向全景X光片解读的大规模多模态指令数据集和基准。MMOral包含20,563张带标注的图像,并配有涵盖多种任务类型的130万条指令跟随实例,包括属性提取、报告生成、视觉问答和基于图像的对话等。此外,我们还提出MMOral-Bench,一个覆盖牙科五个关键诊断维度的综合评估套件。我们在MMOral-Bench上评估了64个LVLM,发现即使表现最佳的模型GPT-4o也仅达到41.45%的准确率,揭示了当前模型在该领域的显著局限。为了推动这一特定领域的进步,我们还提出了OralGPT,即利用精心构建的MMOral指令数据集对Qwen2.5-VL-7B进行有监督微调(SFT)得到的模型。值得注意的是,仅一个epoch的SFT就能为LVLM带来显著的性能提升,例如OralGPT提高了24.73%。MMOral和OralGPT有望成为智能牙科的重要基础,并推动牙科领域出现更具临床影响力的多模态AI系统。数据集、模型、基准和评估套件可在 https://github.com/isbrycee/OralGPT 获取。
论文及项目相关链接
PDF 40 pages, 26 figures, 9 tables
Summary
本文介绍了针对牙科全景X射线解读的大规模多模态指令数据集MMOral及其评估基准MMOral-Bench。现有大型视觉语言模型(LVLMs)在该领域表现有限,需要MMOral数据集进行训练和提升。研究提出OralGPT模型,通过监督微调(SFT)在MMOral数据集上进行训练,显著提高LVLMs的性能。MMOral和OralGPT为智能牙科提供了关键基础,促进了牙科领域的多模态人工智能系统的发展。
Key Takeaways
- LVLMs在一般医疗任务上表现出强大的性能,但在牙科等专业领域的应用仍需探索。
- 牙科全景X射线解读存在挑战,需要专门的数据集和评估基准。
- 引入MMOral数据集,包含20,563张注释图像和130万条指令实例。
- 提出MMOral-Bench评估基准,涵盖牙科的五个关键诊断维度。
- 当前最佳模型GPT-4o在MMOral-Bench上的准确率仅为41.45%,显示现有模型的局限性。
- 提出OralGPT模型,通过监督微调在MMOral数据集上进行训练,显著提高性能。
点此查看论文截图





Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement
Authors:Jiesi Hu, Jianfeng Cao, Yanwu Yang, Chenfei Ye, Yixuan Zhang, Hanyang Peng, Ting Ma
In-context learning (ICL) offers a promising paradigm for universal medical image analysis, enabling models to perform diverse image processing tasks without retraining. However, current ICL models for medical imaging remain limited in two critical aspects: they cannot simultaneously achieve high-fidelity predictions and global anatomical understanding, and there is no unified model trained across diverse medical imaging tasks (e.g., segmentation and enhancement) and anatomical regions. As a result, the full potential of ICL in medical imaging remains underexplored. Thus, we present \textbf{Medverse}, a universal ICL model for 3D medical imaging, trained on 22 datasets covering diverse tasks in universal image segmentation, transformation, and enhancement across multiple organs, imaging modalities, and clinical centers. Medverse employs a next-scale autoregressive in-context learning framework that progressively refines predictions from coarse to fine, generating consistent, full-resolution volumetric outputs and enabling multi-scale anatomical awareness. We further propose a blockwise cross-attention module that facilitates long-range interactions between context and target inputs while preserving computational efficiency through spatial sparsity. Medverse is extensively evaluated on a broad collection of held-out datasets covering previously unseen clinical centers, organs, species, and imaging modalities. Results demonstrate that Medverse substantially outperforms existing ICL baselines and establishes a novel paradigm for in-context learning. Code and model weights will be made publicly available. Our model are publicly available at https://github.com/jiesihu/Medverse.
上下文学习(ICL)为通用医学图像分析提供了一个有前景的范式,使模型无需再训练即可执行多种图像处理任务。然而,当前面向医学成像的ICL模型在两个关键方面仍受限:它们无法同时实现高保真预测和全局解剖理解;也不存在一个在多种医学成像任务(如分割和增强)及解剖区域上统一训练的模型。因此,ICL在医学成像中的潜力尚未得到充分挖掘。为此,我们提出Medverse,一个面向3D医学成像的通用ICL模型,在22个数据集上训练,涵盖多个器官、成像模态和临床中心的通用图像分割、变换与增强任务。Medverse采用逐尺度(next-scale)自回归的上下文学习框架,从粗到细逐步细化预测,生成一致的全分辨率体数据输出,并实现多尺度解剖感知。我们进一步提出一种块级交叉注意力模块,在通过空间稀疏性保持计算效率的同时,促进上下文输入与目标输入之间的长程交互。Medverse在大量留出数据集上进行了广泛评估,覆盖此前未见过的临床中心、器官、物种和成像模态。结果表明,Medverse显著优于现有ICL基线,为上下文学习建立了新的范式。代码和模型权重将公开发布,模型已开源于 https://github.com/jiesihu/Medverse 。
论文及项目相关链接
Summary
本文介绍了面向医学图像分析的上下文学习(ICL)新模型——Medverse。该模型在涵盖多个器官、成像模态和临床中心的多种任务(包括图像分割、变换和增强等)上进行训练。Medverse采用从粗到细逐步细化预测的方式,生成全分辨率的三维输出,并具备多尺度解剖感知能力。此外,它还包含块级交叉注意力模块,在通过空间稀疏性保持计算效率的同时,促进上下文与目标输入之间的长程交互。经过广泛评估,Medverse显著优于现有的ICL基线,为上下文学习提供了新的范式。
Key Takeaways
- Medverse是一个针对医学图像分析的通用ICL模型,能在多种任务上进行训练,如图像分割、转换和增强等。
- Medverse采用渐进式精细化预测,生成高分辨率的三维输出,实现多尺度解剖意识。
- 该模型通过块级交叉注意力模块促进上下文与目标输入之间的长程交互。
- Medverse能够在多种不同的数据集上表现优越,包括来自未见过的临床中心、器官、物种和成像方式的数据。
- Medverse显著优于现有的ICL基线,为上下文学习在医学图像分析领域提供了新的范例。
- Medverse模型公开可用,并提供了代码和模型权重。
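针对上文提到的块级交叉注意力,下面给出一个PyTorch极简示意:把目标与上下文token按空间块配对,仅在对应块内做交叉注意力,以避免全局注意力的平方级开销。块划分与维度均为假设,并非论文的具体结构:

```python
# 极简示意:块级交叉注意力——目标token只与同一空间块内的上下文token交互(假设性实现)
import torch
import torch.nn as nn

class BlockwiseCrossAttention(nn.Module):
    def __init__(self, dim=64, heads=4, block=16):
        super().__init__()
        self.block = block
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, target, context):      # 形状均为 (B, N, dim),N 可被 block 整除
        B, N, D = target.shape
        nb = N // self.block
        t = target.reshape(B * nb, self.block, D)    # 按空间块切分
        c = context.reshape(B * nb, self.block, D)
        out, _ = self.attn(t, c, c)                  # 每个块内独立做交叉注意力
        return out.reshape(B, N, D)

x = torch.randn(2, 64, 64)       # 目标体数据的token
ctx = torch.randn(2, 64, 64)     # 上下文(示例图像对)的token
print(BlockwiseCrossAttention()(x, ctx).shape)     # torch.Size([2, 64, 64])
```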
点此查看论文截图







Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
Authors:Xuefeng Wang, Lei Zhang, Henglin Pu, Ahmed H. Qureshi, Husheng Li
Existing reinforcement learning (RL) methods struggle with complex dynamical systems that demand interactions at high frequencies or irregular time intervals. Continuous-time RL (CTRL) has emerged as a promising alternative by replacing discrete-time Bellman recursion with differential value functions defined as viscosity solutions of the Hamilton–Jacobi–Bellman (HJB) equation. While CTRL has shown promise, its applications have been largely limited to the single-agent domain. This limitation stems from two key challenges: (i) conventional solution methods for HJB equations suffer from the curse of dimensionality (CoD), making them intractable in high-dimensional systems; and (ii) even with HJB-based learning approaches, accurately approximating centralized value functions in multi-agent settings remains difficult, which in turn destabilizes policy training. In this paper, we propose a CT-MARL framework that uses physics-informed neural networks (PINNs) to approximate HJB-based value functions at scale. To ensure the value is consistent with its differential structure, we align value learning with value-gradient learning by introducing a Value Gradient Iteration (VGI) module that iteratively refines value gradients along trajectories. This improves gradient fidelity, in turn yielding more accurate values and stronger policy learning. We evaluate our method using continuous-time variants of standard benchmarks, including multi-agent particle environment (MPE) and multi-agent MuJoCo. Our results demonstrate that our approach consistently outperforms existing continuous-time RL baselines and scales to complex multi-agent dynamics.
现有的强化学习(RL)方法在应对需要高频或不规则时间间隔交互的复杂动力系统时面临困难。连续时间强化学习(CTRL)作为一种有前景的替代方案应运而生,它用定义为哈密顿-雅可比-贝尔曼(HJB)方程粘性解的微分值函数取代了离散时间的贝尔曼递归。虽然CTRL已展现出潜力,但其应用大多局限于单智能体领域。这一局限源于两个关键挑战:(i)求解HJB方程的传统方法受维数灾难(CoD)影响,在高维系统中难以处理;(ii)即使采用基于HJB的学习方法,在多智能体环境中准确逼近集中式值函数仍然困难,这反过来会使策略训练不稳定。在本文中,我们提出了一种CT-MARL框架,使用物理信息神经网络(PINNs)大规模逼近基于HJB的值函数。为确保值函数与其微分结构一致,我们引入值梯度迭代(VGI)模块,沿轨迹迭代地细化值梯度,使值学习与值梯度学习对齐。这提高了梯度的保真度,进而带来更准确的值估计和更强的策略学习。我们在标准基准的连续时间变体上评估了该方法,包括多智能体粒子环境(MPE)和多智能体MuJoCo。结果表明,我们的方法始终优于现有的连续时间RL基线,并能扩展到复杂的多智能体动力学。
论文及项目相关链接
PDF 19 pages, 10 figures
Summary
本文提出一种基于物理信息神经网络(PINNs)的连续时间多智能体强化学习框架(CT-MARL)。该框架利用PINNs近似汉密尔顿-雅可比-贝尔曼(HJB)方程的价值函数,解决了传统解决HJB方程方法面临的维度诅咒(CoD)问题。同时,引入价值梯度迭代(VGI)模块,确保价值与微分结构的一致性,提高梯度准确性,进而实现更精确的价值估计和更强的策略学习。在标准连续时间基准测试环境中,包括多智能体粒子环境(MPE)和多智能体MuJoCo,该方法表现优异。
Key Takeaways
- 连续时间强化学习(CTRL)通过解决汉密尔顿-雅可比-贝尔曼(HJB)方程处理复杂动态系统。
- 传统解决HJB方程方法面临维度诅咒(CoD)问题,难以应用于高维系统。
- 物理信息神经网络(PINNs)被用于近似HJB方程的价值函数,解决上述问题。
- 引入价值梯度迭代(VGI)模块,确保价值与微分结构的一致性,提高梯度准确性。
- CT-MARL框架能在复杂多智能体动态环境中实现更精确的价值估计和更强的策略学习。
- 在连续时间基准测试环境中,CT-MARL框架表现优于现有连续时间RL基线。
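为说明"用PINN最小化HJB残差来学习值函数"的基本构造,下面给出一个单智能体、折扣无穷时域设定下的极简示意:最优值函数满足 ρV(x) = max_a [ r(x,a) + ∇V(x)·f(x,a) ],用自动微分计算∇V并最小化该残差。动力学、奖励与离散动作集均为假设,仅演示损失构造,并非论文的多智能体算法(VGI等模块未包含):

```python
# 极简示意:用自动微分构造HJB残差损失(单智能体、折扣无穷时域,设定均为假设)
import torch
import torch.nn as nn

V = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))   # 值网络 V(x)
actions = torch.tensor([[-1.0], [0.0], [1.0]])                      # 假设的离散动作集
rho = 0.1                                                            # 折扣率

def f(x, a):                       # 假设的已知动力学 dx/dt = f(x, a)
    return torch.cat([x[:, 1:2], a.expand(x.size(0), 1)], dim=1)

def r(x, a):                       # 假设的奖励函数
    return -(x ** 2).sum(dim=1, keepdim=True) - 0.1 * a.expand(x.size(0), 1) ** 2

def hjb_residual(x):
    x = x.requires_grad_(True)
    v = V(x)
    grad_v = torch.autograd.grad(v.sum(), x, create_graph=True)[0]   # ∇V(x)
    # 对每个候选动作计算 r + ∇V·f,再取最大值(对应HJB中的max_a)
    q = torch.stack([r(x, a) + (grad_v * f(x, a)).sum(dim=1, keepdim=True)
                     for a in actions], dim=0).max(dim=0).values
    return ((rho * v - q) ** 2).mean()            # 残差 ρV - max_a[...] 的均方

x_batch = torch.randn(128, 2)
loss = hjb_residual(x_batch)
loss.backward()
print(float(loss))
```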
点此查看论文截图


Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models
Authors:Qiuhui Chen, Xuancheng Yao, Huping Ye, Yi Hong
Understanding 3D medical image volumes is critical in the medical field, yet existing 3D medical convolution and transformer-based self-supervised learning (SSL) methods often lack deep semantic comprehension. Recent advancements in multimodal large language models (MLLMs) provide a promising approach to enhance image understanding through text descriptions. To leverage these 2D MLLMs for improved 3D medical image understanding, we propose Med3DInsight, a novel pretraining framework that integrates 3D image encoders with 2D MLLMs via a specially designed plane-slice-aware transformer module. Additionally, our model employs a partial optimal transport based alignment, demonstrating greater tolerance to noise introduced by potential noises in LLM-generated content. Med3DInsight introduces a new paradigm for scalable multimodal 3D medical representation learning without requiring human annotations. Extensive experiments demonstrate our state-of-the-art performance on two downstream tasks, i.e., segmentation and classification, across various public datasets with CT and MRI modalities, outperforming current SSL methods. Med3DInsight can be seamlessly integrated into existing 3D medical image understanding networks, potentially enhancing their performance. Our source code, generated datasets, and pre-trained models will be available at https://github.com/Qybc/Med3DInsight.
理解三维医学图像体数据在医学领域至关重要,然而现有基于卷积和Transformer的三维医学自监督学习(SSL)方法往往缺乏深层语义理解。多模态大语言模型(MLLM)的最新进展提供了一种通过文本描述增强图像理解的有前景的途径。为了利用这些二维MLLM改进三维医学图像理解,我们提出Med3DInsight,一种新型预训练框架,通过专门设计的平面-切片感知Transformer模块,将三维图像编码器与二维MLLM集成在一起。此外,我们的模型采用基于部分最优传输的对齐方式,对LLM生成内容中可能引入的噪声表现出更强的容忍度。Med3DInsight为无需人工标注的可扩展多模态三维医学表征学习提供了新的范式。大量实验表明,我们在分割和分类两个下游任务上均取得了最先进的性能,在涵盖CT和MRI模态的多个公共数据集上超越了现有SSL方法。Med3DInsight可以无缝集成到现有的三维医学图像理解网络中,有望提升其性能。我们的源代码、生成的数据集和预训练模型将在 https://github.com/Qybc/Med3DInsight 提供。
论文及项目相关链接
PDF Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI)
Summary
本文提出了一种新的预训练框架Med3DInsight,用于增强对三维医学图像的理解。该框架结合了三维图像编码器和二维多模态大型语言模型,通过专门的平面切片感知转换器模块实现。Med3DInsight采用部分最优传输对齐,对语言模型生成内容中的潜在噪声表现出较强的耐受性。该框架在多个公共数据集上的分割和分类任务中表现出卓越性能,且可无缝集成到现有的三维医学图像理解网络中。
Key Takeaways
- Med3DInsight是一个新的预训练框架,旨在增强对三维医学图像的理解。
- 框架结合了三维图像编码器和二维多模态大型语言模型。
- 通过专门的平面切片感知转换器模块实现图像与文本的融合。
- 采用部分最优传输对齐,提高了对语言模型生成内容中噪声的耐受性。
- 在多个公共数据集的分割和分类任务中表现出卓越性能,优于现有的自监督学习方法。
- Med3DInsight可无缝集成到现有的三维医学图像理解网络中。
点此查看论文截图




USEANet: Ultrasound-Specific Edge-Aware Multi-Branch Network for Lightweight Medical Image Segmentation
Authors:Jingyi Gao, Di Wu, Baha lhnaini
Ultrasound image segmentation faces unique challenges including speckle noise, low contrast, and ambiguous boundaries, while clinical deployment demands computationally efficient models. We propose USEANet, an ultrasound-specific edge-aware multi-branch network that achieves optimal performance-efficiency balance through four key innovations: (1) ultrasound-specific multi-branch processing with specialized modules for noise reduction, edge enhancement, and contrast improvement; (2) edge-aware attention mechanisms that focus on boundary information with minimal computational overhead; (3) hierarchical feature aggregation with adaptive weight learning; and (4) ultrasound-aware decoder enhancement for optimal segmentation refinement. Built on an ultra-lightweight PVT-B0 backbone, USEANet significantly outperforms existing methods across five ultrasound datasets while using only 3.64M parameters and 0.79G FLOPs. Experimental results demonstrate superior segmentation accuracy with 67.01 IoU on BUSI dataset, representing substantial improvements over traditional approaches while maintaining exceptional computational efficiency suitable for real-time clinical applications. Code is available at https://github.com/chouheiwa/USEANet.
超声图像分割面临着斑点噪声、低对比度和模糊边界等独特挑战,而临床部署则要求计算效率高的模型。我们提出了USEANet,这是一个超声专用的边缘感知多分支网络,通过四个关键创新实现了性能与效率的平衡:(1)超声专用多分支处理,具有用于降噪、边缘增强和对比度改进的专用模块;(2)边缘感知注意力机制,以最小的计算开销关注边界信息;(3)具有自适应权重学习的分层特征聚合;(4)用于最佳分割细化的超声感知解码器增强。USEANet建立在超轻量级PVT-B0主干网络上,在五个超声数据集上的性能显著优于现有方法,同时仅使用364万个参数和0.79GFLOPs。实验结果表明,在BUSI数据集上达到了67.01的IoU分割精度,相对于传统方法有了很大的改进,同时保持了适合实时临床应用的出色计算效率。代码可在https://github.com/chouheiwa/USEANet找到。
论文及项目相关链接
PDF This work has been submitted to the IEEE for possible publication
Summary
本文提出一种名为USEANet的超声专用边缘感知多分支网络,用于解决超声图像分割面临的挑战,包括斑点噪声、低对比度和模糊边界。该网络通过四个关键创新实现了性能与效率的平衡,包括超声专用多分支处理、边缘感知注意力机制、分层特征聚合和超声感知解码器增强。USEANet在五个超声数据集上的表现优于现有方法,参数仅为3.64M,浮点运算量为0.79G,同时保持了出色的计算效率,适合实时临床应用。
Key Takeaways
- USEANet是一种专门针对超声图像分割设计的网络。
- 该网络通过四个关键创新实现性能与效率的平衡。
- USEANet包括超声专用多分支处理,用于减少噪声、增强边缘和提高对比度。
- 网络采用边缘感知注意力机制,重点关注边界信息,同时减少计算开销。
- USEANet通过分层特征聚合和自适应权重学习来提高性能。
- 超声感知解码器增强可实现更精确的分割细化。
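围绕上文"多分支(降噪/边缘/对比度)+边缘感知"的组织方式,下面给出一个PyTorch极简示意:三条并行分支分别做平滑卷积、基于固定Sobel算子的边缘特征提取和通道注意力式的对比度门控,最后用1x1卷积融合。结构与通道数均为假设,并非论文原实现:

```python
# 极简示意:面向超声的多分支模块(平滑/边缘/对比度)+ 1x1卷积融合(假设性结构)
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchBlock(nn.Module):
    def __init__(self, c_in=1, c=16):
        super().__init__()
        self.denoise = nn.Sequential(nn.Conv2d(c_in, c, 5, padding=2), nn.ReLU())
        sobel = torch.tensor([[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]])
        self.register_buffer("sobel", sobel.unsqueeze(0))      # 固定的边缘算子
        self.edge = nn.Sequential(nn.Conv2d(1, c, 3, padding=1), nn.ReLU())
        self.contrast = nn.Sequential(                          # 简单的通道注意力
            nn.Conv2d(c_in, c, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(c_in, c, 3, padding=1)
        self.fuse = nn.Conv2d(2 * c, c, 1)

    def forward(self, x):
        edge_map = F.conv2d(x, self.sobel, padding=1)
        b1 = self.denoise(x)                                     # 降噪/平滑分支
        b2 = self.edge(edge_map)                                 # 边缘增强分支
        b3 = self.proj(x) * self.contrast(x)                     # 对比度分支作为门控
        return self.fuse(torch.cat([b1, b2 + b3], dim=1))

print(MultiBranchBlock()(torch.randn(2, 1, 64, 64)).shape)      # (2, 16, 64, 64)
```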
点此查看论文截图





Expert-Guided Explainable Few-Shot Learning for Medical Image Diagnosis
Authors:Ifrat Ikhtear Uddin, Longwei Wang, KC Santosh
Medical image analysis often faces significant challenges due to limited expert-annotated data, hindering both model generalization and clinical adoption. We propose an expert-guided explainable few-shot learning framework that integrates radiologist-provided regions of interest (ROIs) into model training to simultaneously enhance classification performance and interpretability. Leveraging Grad-CAM for spatial attention supervision, we introduce an explanation loss based on Dice similarity to align model attention with diagnostically relevant regions during training. This explanation loss is jointly optimized with a standard prototypical network objective, encouraging the model to focus on clinically meaningful features even under limited data conditions. We evaluate our framework on two distinct datasets: BraTS (MRI) and VinDr-CXR (Chest X-ray), achieving significant accuracy improvements from 77.09% to 83.61% on BraTS and from 54.33% to 73.29% on VinDr-CXR compared to non-guided models. Grad-CAM visualizations further confirm that expert-guided training consistently aligns attention with diagnostic regions, improving both predictive reliability and clinical trustworthiness. Our findings demonstrate the effectiveness of incorporating expert-guided attention supervision to bridge the gap between performance and interpretability in few-shot medical image diagnosis.
医学图像分析常因专家标注数据有限而面临重大挑战,这既阻碍了模型泛化,也阻碍了临床落地。我们提出一种专家引导、可解释的小样本学习框架,将放射科医生提供的感兴趣区域(ROI)融入模型训练,以同时提升分类性能和可解释性。我们利用Grad-CAM进行空间注意力监督,并引入基于Dice相似度的解释损失,使模型注意力在训练中与诊断相关区域对齐。该解释损失与标准原型网络目标联合优化,促使模型即使在数据有限的情况下也聚焦于具有临床意义的特征。我们在两个不同的数据集上评估了该框架:BraTS(MRI)和VinDr-CXR(胸片)。与无引导模型相比,准确率在BraTS上从77.09%提升到83.61%,在VinDr-CXR上从54.33%提升到73.29%。Grad-CAM可视化进一步证实,专家引导的训练能使注意力始终与诊断区域对齐,提高了预测可靠性和临床可信度。我们的结果表明,在小样本医学图像诊断中引入专家引导的注意力监督,能够有效弥合性能与可解释性之间的差距。
论文及项目相关链接
PDF Accepted for publication in the proceedings of MICCAI Workshop on Data Engineering in Medical Imaging 2025
Summary
本文提出一种专家引导的可解释的少样本学习框架,该框架将放射科医生提供的感兴趣区域(ROIs)集成到模型训练中,同时提高分类性能和解释性。通过利用Grad-CAM进行空间注意力监督,引入基于Dice相似度的解释损失,使模型注意力与诊断相关区域对齐。该解释损失与标准原型网络目标联合优化,鼓励模型在有限数据条件下关注临床有意义的特征。在BraTS和VinDr-CXR两个数据集上的评估结果表明,与无引导模型相比,该框架的准确率分别从77.09%提高到83.61%和从54.33%提高到73.29%。Grad-CAM可视化进一步证实,专家引导的训练能使模型注意力始终与诊断区域对齐,提高预测可靠性和临床可信度。
Key Takeaways
- 医学图像分析面临专家标注数据有限的挑战,影响模型通用化和临床采用。
- 提出一种专家引导的可解释的少样本学习框架,结合放射科医生提供的感兴趣区域(ROIs)进行模型训练。
- 利用Grad-CAM进行空间注意力监督,引入基于Dice相似度的解释损失,提高模型诊断相关区域的注意力。
- 解释损失与标准原型网络目标联合优化,提高模型在有限数据下的临床有意义特征关注能力。
- 在BraTS和VinDr-CXR数据集上的评估表明,该框架显著提高分类准确率。
- Grad-CAM可视化证实专家引导的训练能提高模型预测可靠性和临床可信度。
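上文的核心做法是用Dice相似度把Grad-CAM注意力图与专家ROI对齐,并与分类目标联合优化;下面给出该解释损失的极简示意(其中注意力图与ROI用占位张量代替,实际应由Grad-CAM和放射科医生标注给出,权重系数为假设):

```python
# 极简示意:基于Dice的解释损失,使注意力图对齐专家ROI,并与分类损失联合优化
# (假设性实现;cam应由Grad-CAM对目标类别计算得到)
import torch

def dice_explanation_loss(cam, roi_mask, eps=1e-6):
    """cam: (B, H, W) 归一化到[0,1]的注意力图;roi_mask: (B, H, W) 0/1专家标注。"""
    cam = cam.flatten(1)
    roi = roi_mask.flatten(1).float()
    inter = (cam * roi).sum(dim=1)
    dice = (2 * inter + eps) / (cam.sum(dim=1) + roi.sum(dim=1) + eps)
    return 1.0 - dice.mean()          # Dice越高,损失越小

cam = torch.rand(4, 32, 32)                       # 占位:应为Grad-CAM输出
roi = (torch.rand(4, 32, 32) > 0.8).float()       # 占位:专家ROI
cls_loss = torch.tensor(0.7)                      # 占位:原型网络分类损失
lam = 0.5                                         # 假设的权重系数
total = cls_loss + lam * dice_explanation_loss(cam, roi)
print(float(total))
```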
点此查看论文截图


Developing a Framework to Simulate Quantitative Ultrasound Flow and Tissue Motion for Ultrafast Doppler Ultrasound
Authors:Qiang Fu, Changhui Li
Ultrafast power Doppler imaging (uPDI) has achieved substantial progress and emerged as a key modality for both research and clinical applications. However, existing simulation tools are insufficient for generating three-dimensional (3D), quantitatively accurate flow fields with tissue motion that closely approximate in vivo conditions. In this study, we present an open-source framework, termed \emph{3D-Fully Quantitative Flow} (3D-FQFlow), designed to provide quantitative modeling of 3D vascular hemodynamics with physiologically realistic tissue motion for uPDI.The framework integrates a L-system-based vascular generator with hemodynamics modeling, a tissue motion simulator supporting user-defined or clinical-data-driven condition, an optimized ultrasound simulator, a GPU-accelerated image reconstruction module, and a quantitative analyzer (MSE/PSNR/SSIM). We demonstrate the workflow and performance of 3D-FQFlow using both synthetic vascular structures and clinical datasets. This framework provides ground-truth simulation models to support the development, validation, and benchmarking of uPDI techniques. The complete source code is freely available online at https://github.com/FortuneOU/3D-FQFlow.
超快功率多普勒成像(uPDI)已取得重要进展,并成为研究和临床应用的关键成像方式。然而,现有仿真工具难以生成带有组织运动、并能逼近在体条件的定量准确的三维(3D)血流场。在本研究中,我们提出一个开源框架,称为"3D-Fully Quantitative Flow"(3D-FQFlow),旨在为uPDI提供带有符合生理实际的组织运动的三维血管血流动力学定量建模。该框架集成了基于L系统的血管生成器与血流动力学建模、支持用户自定义或临床数据驱动条件的组织运动模拟器、优化的超声仿真器、GPU加速的图像重建模块以及定量分析模块(MSE/PSNR/SSIM)。我们使用合成血管结构和临床数据集展示了3D-FQFlow的工作流程和性能。该框架提供真值(ground-truth)仿真模型,以支持uPDI技术的开发、验证和基准测试。完整源代码可免费在线获取:https://github.com/FortuneOU/3D-FQFlow 。
论文及项目相关链接
Summary
本文介绍了名为"三维全定量流"(3D-FQFlow)的开源框架,该框架旨在对三维血管血流动力学进行定量建模,并模拟符合生理实际的组织运动,以支持超快功率多普勒成像(uPDI)。该框架集成了基于L系统的血管生成器、血流动力学建模、支持用户自定义或临床数据驱动条件的组织运动模拟器、优化的超声仿真器、GPU加速的图像重建模块和定量分析模块。通过合成血管结构和临床数据集,展示了该框架的工作流程和性能。它为uPDI技术的开发、验证和基准测试提供了真值仿真模型。
Key Takeaways
- 介绍了名为“三维全定量流”(3D-FQFlow)的开源框架。
- 该框架旨在提供定量建模三维血管血流动力学的方法。
- 该框架可以模拟生理现实条件下的组织运动。
- 该框架集成了多个模块,包括血管生成器、血流动力学建模等。
- 该框架支持超快速功率多普勒成像(uPDI)。
- 通过合成血管结构和临床数据集,展示了该框架的工作流程和性能。
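上文提到框架内置MSE/PSNR/SSIM定量分析;下面给出一个用scikit-image对重建结果与真值计算这三项指标的极简示意(数据为随机占位,仅演示指标的计算方式):

```python
# 极简示意:对重建图像与真值计算 MSE / PSNR / SSIM(数据为随机占位)
import numpy as np
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

gt = np.random.rand(128, 128).astype(np.float32)          # 真值血流图(占位)
recon = np.clip(gt + 0.05 * np.random.randn(128, 128), 0, 1).astype(np.float32)

print("MSE :", mean_squared_error(gt, recon))
print("PSNR:", peak_signal_noise_ratio(gt, recon, data_range=1.0))
print("SSIM:", structural_similarity(gt, recon, data_range=1.0))
```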
点此查看论文截图






Learning functions through Diffusion Maps
Authors:Alvaro Almeida Gomez
We propose a data-driven method for approximating real-valued functions on smooth manifolds, building on the Diffusion Maps framework under the manifold hypothesis. Given pointwise evaluations of a function, the method constructs a smooth extension to the ambient space by exploiting diffusion geometry and its connection to the heat equation and the Laplace-Beltrami operator. To address the computational challenges of high-dimensional data, we introduce a dimensionality reduction strategy based on the low-rank structure of the distance matrix, revealed via singular value decomposition (SVD). In addition, we develop an online updating mechanism that enables efficient incorporation of new data, thereby improving scalability and reducing computational cost. Numerical experiments, including applications to sparse CT reconstruction, demonstrate that the proposed methodology outperforms classical feedforward neural networks and interpolation methods in terms of both accuracy and efficiency.
我们提出了一种基于数据驱动的方法,用于在平滑流形上逼近实值函数。该方法建立在流形假设下的扩散映射框架之上。给定函数的点态评估,该方法通过利用扩散几何及其与热方程和Laplace-Beltrami算子的联系,在环境空间中构建平滑扩展。为了解决高维数据的计算挑战,我们提出了一种基于距离矩阵低秩结构的降维策略,该策略通过奇异值分解(SVD)揭示。此外,我们开发了一种在线更新机制,能够高效地融入新数据,从而提高可扩展性并降低计算成本。数值实验表明,包括在稀疏CT重建中的应用在内,所提出的方法在准确性和效率方面都优于经典的前馈神经网络和插值方法。
论文及项目相关链接
PDF Comments are welcome
Summary
本文提出一种基于数据驱动的方法,用于在平滑流形上近似实值函数。该方法利用扩散映射框架和流形假设,通过扩散几何及其与热方程和Laplace-Beltrami算子的联系,在环境空间中构建平滑扩展。针对高维数据的计算挑战,我们采用基于距离矩阵低秩结构的降维策略,通过奇异值分解(SVD)揭示。此外,我们开发了一种在线更新机制,能够高效整合新数据,从而提高可扩展性并降低计算成本。数值实验表明,该方法在精度和效率方面都优于传统的前馈神经网络和插值方法,包括在稀疏CT重建中的应用。
Key Takeaways
- 本文提出一种基于数据驱动的方法,用于在平滑流形上近似实值函数。
- 方法利用扩散映射框架和流形假设,构建平滑扩展至环境空间。
- 通过扩散几何和与热方程、Laplace-Beltrami算子的联系实现方法。
- 采用基于距离矩阵低秩结构的降维策略,通过奇异值分解(SVD)实现高维数据的处理。
- 引入在线更新机制,高效整合新数据,提高方法可扩展性并降低计算成本。
- 数值实验表明,该方法在精度和效率方面优于传统方法。
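上述方法的核心是利用热核/扩散几何把函数从采样点平滑延拓到新点;下面给出一个numpy极简示意:在训练点上构造高斯核并按行归一化得到扩散算子,再用同一核做Nyström式延拓。核宽度等参数为假设,且未包含论文中的SVD低秩降维与在线更新:

```python
# 极简示意:扩散映射式的核延拓——用训练点上的热核把函数值外推到新点
# (假设性实现,未包含论文中的低秩SVD与在线更新)
import numpy as np

def heat_kernel(X, Y, eps):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / eps)

def extend_function(X_train, f_train, X_new, eps=0.1):
    # 训练点之间的核,按行归一化得到马尔可夫(扩散)算子
    K = heat_kernel(X_train, X_train, eps)
    P = K / K.sum(axis=1, keepdims=True)
    f_smooth = P @ f_train                       # 在流形上做一次热核平滑
    # Nyström式延拓:新点与训练点的核权重,加权平均得到外推值
    K_new = heat_kernel(X_new, X_train, eps)
    P_new = K_new / K_new.sum(axis=1, keepdims=True)
    return P_new @ f_smooth

# 例子:单位圆(一维流形)上的函数 f(theta) = sin(theta)
theta = np.random.rand(300) * 2 * np.pi
X = np.c_[np.cos(theta), np.sin(theta)]
f = np.sin(theta)
theta_new = np.random.rand(50) * 2 * np.pi
X_new = np.c_[np.cos(theta_new), np.sin(theta_new)]
err = np.abs(extend_function(X, f, X_new) - np.sin(theta_new)).mean()
print("平均外推误差:", err)
```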
点此查看论文截图

Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings
Authors:Feiwei Qin, Shichao Lu, Junhao Hou, Changmiao Wang, Meie Fang, Ligang Liu
Computer-Aided Design (CAD) generative modeling is driving significant innovations across industrial applications. Recent works have shown remarkable progress in creating solid models from various inputs such as point clouds, meshes, and text descriptions. However, these methods fundamentally diverge from traditional industrial workflows that begin with 2D engineering drawings. The automatic generation of parametric CAD models from these 2D vector drawings remains underexplored despite being a critical step in engineering design. To address this gap, our key insight is to reframe CAD generation as a sequence-to-sequence learning problem where vector drawing primitives directly inform the generation of parametric CAD operations, preserving geometric precision and design intent throughout the transformation process. We propose Drawing2CAD, a framework with three key technical components: a network-friendly vector primitive representation that preserves precise geometric information, a dual-decoder transformer architecture that decouples command type and parameter generation while maintaining precise correspondence, and a soft target distribution loss function accommodating inherent flexibility in CAD parameters. To train and evaluate Drawing2CAD, we create CAD-VGDrawing, a dataset of paired engineering drawings and parametric CAD models, and conduct thorough experiments to demonstrate the effectiveness of our method. Code and dataset are available at https://github.com/lllssc/Drawing2CAD.
计算机辅助设计(CAD)生成建模正在推动工业应用的重大创新。近期研究在从点云、网格和文本描述等多种输入创建实体模型方面取得了显著进展。然而,这些方法从根本上偏离了以二维工程图纸为起点的传统工业工作流程。尽管从二维矢量图纸自动生成参数化CAD模型是工程设计中的关键一步,这一方向的探索仍然不足。为弥补这一空白,我们的核心思路是将CAD生成重新表述为序列到序列的学习问题:矢量图元直接指导参数化CAD操作的生成,在整个转换过程中保持几何精度和设计意图。我们提出Drawing2CAD框架,包含三个关键技术组件:便于网络处理且保留精确几何信息的矢量图元表示;在保持精确对应关系的同时将命令类型与参数生成解耦的双解码器Transformer架构;以及适应CAD参数固有容差的软目标分布损失函数。为了训练和评估Drawing2CAD,我们构建了CAD-VGDrawing数据集,包含成对的工程图纸与参数化CAD模型,并通过充分的实验证明了方法的有效性。代码和数据集可在 https://github.com/lllssc/Drawing2CAD 获取。
论文及项目相关链接
PDF Accepted to ACM MM 2025
Summary
基于计算机辅助设计(CAD)的生成建模正在推动工业应用的重大创新。最新研究已从点云、网格和文本描述等输入创建了令人印象深刻的实体模型。然而,这些方法与传统工业工作流程相悖,传统流程始于二维工程图纸。尽管从二维矢量图纸自动生成参数化CAD模型是工程设计中的关键步骤,但目前这一领域的自动探索仍较少。为解决这一差距,我们的见解是将CAD生成重新构建为序列到序列的学习问题,其中矢量绘图原语直接为参数化CAD操作生成提供信息,在转换过程中保留几何精度和设计意图。我们提出Drawing2CAD框架,包含三个关键技术组件:友好的网络矢量原始表示,保留精确的几何信息;双解码器转换器架构,在保持精确对应的同时解耦命令类型和参数生成;以及容纳CAD参数固有灵活性的软目标分布损失函数。为训练和评估Drawing2CAD,我们创建了CAD-VGDrawing数据集,包含配对的工程图纸和参数化CAD模型,并通过实验证明了我们方法的有效性。
Key Takeaways
- 计算机辅助设计(CAD)生成建模正在推动工业应用的创新。
- 最新方法主要从点云、网格和文本描述创建实体模型,但传统工业流程始于二维工程图纸。
- 从二维矢量图纸自动生成参数化CAD模型是工程设计中的关键步骤,但此领域的自动探索较少。
- Drawing2CAD框架将CAD生成重新构建为序列到序列的学习问题,利用矢量绘图原语直接生成参数化CAD操作。
- Drawing2CAD框架包含三个关键技术组件:便于网络处理的矢量图元表示、双解码器Transformer架构和软目标分布损失函数。
- 为训练和评估Drawing2CAD,创建了CAD-VGDrawing数据集。
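针对上文提到的软目标分布损失,下面给出一个极简示意:把连续的CAD参数量化为离散bin后,不用one-hot硬标签,而是以真值bin为中心构造高斯平滑的目标分布再计算交叉熵,以容忍参数的小幅偏差。bin数与平滑宽度均为示例值,并非论文配置:

```python
# 极简示意:CAD参数的软目标分布损失——以真值bin为中心的高斯平滑标签(假设性实现)
import torch
import torch.nn.functional as F

def soft_target_loss(logits, target_bins, num_bins=256, sigma=2.0):
    """logits: (B, num_bins) 参数分类头输出;target_bins: (B,) 量化后的真值bin索引。"""
    bins = torch.arange(num_bins, dtype=torch.float32, device=logits.device)
    # 目标分布:以真值bin为中心的高斯,容忍邻近bin的小偏差
    soft = torch.exp(-(bins[None, :] - target_bins[:, None].float()) ** 2
                     / (2 * sigma ** 2))
    soft = soft / soft.sum(dim=1, keepdim=True)
    return -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

logits = torch.randn(8, 256)
targets = torch.randint(0, 256, (8,))
print(float(soft_target_loss(logits, targets)))
```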
点此查看论文截图



Scaling Artificial Intelligence for Prostate Cancer Detection on MRI towards Organized Screening and Primary Diagnosis in a Global, Multiethnic Population (Study Protocol)
Authors:Anindo Saha, Joeran S. Bosma, Jasper J. Twilt, Alexander B. C. D. Ng, Aqua Asif, Kirti Magudia, Peder Larson, Qinglin Xie, Xiaodong Zhang, Chi Pham Minh, Samuel N. Gitau, Ivo G. Schoots, Martijn F. Boomsma, Renato Cuocolo, Nikolaos Papanikolaou, Daniele Regge, Derya Yakar, Mattijs Elschot, Jeroen Veltman, Baris Turkbey, Nancy A. Obuchowski, Jurgen J. Fütterer, Anwar R. Padhani, Hashim U. Ahmed, Tobias Nordström, Martin Eklund, Veeru Kasivisvanathan, Maarten de Rooij, Henkjan Huisman
In this intercontinental, confirmatory study, we include a retrospective cohort of 22,481 MRI examinations (21,288 patients; 46 cities in 22 countries) to train and externally validate the PI-CAI-2B model, i.e., an efficient, next-generation iteration of the state-of-the-art AI system that was developed for detecting Gleason grade group ≥2 prostate cancer on MRI during the PI-CAI study. Of these examinations, 20,471 cases (19,278 patients; 26 cities in 14 countries) from two EU Horizon projects (ProCAncer-I, COMFORT) and 12 independent centers based in Europe, North America, Asia and Africa, are used for training and internal testing. Additionally, 2010 cases (2010 patients; 20 external cities in 12 countries) from population-based screening (STHLM3-MRI, IP1-PROSTAGRAM trials) and primary diagnostic settings (PRIME trial) based in Europe, North and South Americas, Asia and Australia, are used for external testing. Primary endpoint is the proportion of AI-based assessments in agreement with the standard of care diagnoses (i.e., clinical assessments made by expert uropathologists on histopathology, if available, or at least two expert urogenital radiologists in consensus; with access to patient history and peer consultation) in the detection of Gleason grade group ≥2 prostate cancer within the external testing cohorts. Our statistical analysis plan is prespecified with a hypothesis of diagnostic interchangeability to the standard of care at the PI-RADS ≥3 (primary diagnosis) or ≥4 (screening) cut-off, considering an absolute margin of 0.05 and reader estimates derived from the PI-CAI observer study (62 radiologists reading 400 cases). Secondary measures comprise the area under the receiver operating characteristic curve (AUROC) of the AI system stratified by imaging quality, patient age and patient ethnicity to identify underlying biases (if any).
在这项跨洲际的确证性研究中,我们纳入了22,481例MRI检查(21,288名患者;来自22个国家的46座城市)的回顾性队列,用于训练并外部验证PI-CAI-2B模型,即在PI-CAI研究期间开发的、用于在MRI上检测Gleason分级分组≥2前列腺癌的最先进AI系统的高效新一代版本。其中,来自两个欧盟Horizon项目(ProCAncer-I、COMFORT)以及位于欧洲、北美、亚洲和非洲的12家独立中心的20,471例(19,278名患者;14个国家的26座城市)用于训练和内部测试。此外,来自基于人群的筛查试验(STHLM3-MRI、IP1-PROSTAGRAM)和初级诊断场景(PRIME试验)、覆盖欧洲、南北美洲、亚洲和澳大利亚的2010例(2010名患者;12个国家的20座外部城市)用于外部测试。主要终点是在外部测试队列中,AI评估在检测Gleason分级分组≥2前列腺癌时与标准诊疗诊断相一致的比例(标准诊疗诊断指专家泌尿病理学家基于组织病理学作出的临床评估;若无组织病理学结果,则由至少两名泌尿生殖放射学专家在可查阅患者病史并进行同行会诊的情况下共识确定)。统计分析计划为预先设定,假设在PI-RADS≥3(初级诊断)或≥4(筛查)阈值下与标准诊疗具有诊断可互换性,绝对边际取0.05,读者估计值来自PI-CAI观察者研究(62名放射科医生阅读400例)。次要指标包括按成像质量、患者年龄和患者种族分层的AI系统受试者工作特征曲线下面积(AUROC),以识别潜在偏倚(如有)。
论文及项目相关链接
Summary
本研究是一项跨洲际的确证性研究,利用来自全球多个地区的MRI检查数据训练并外部验证新一代前列腺癌AI检测模型PI-CAI-2B。该模型是PI-CAI研究期间开发的先进AI系统的高效新一代版本,用于在MRI上检测Gleason分级分组≥2的前列腺癌。研究使用了来自两个欧盟Horizon项目及多家独立中心的病例进行训练和内部测试,并使用基于人群筛查和初级诊断场景的病例进行外部测试。主要研究终点是评估AI诊断结果与标准诊疗诊断结果的一致程度,同时按成像质量、患者年龄和种族分层考察模型表现是否存在差异。
Key Takeaways
- 本研究采用全球多地域的MRI检查数据,旨在训练和验证新一代前列腺癌AI诊断模型PI-CAI-2B。
- 该模型是在已有先进AI系统PI-CAI基础上开发的新一代版本,用于检测Gleason分级分组≥2的前列腺癌。
- 数据来源于两大欧盟Horizon项目及多个独立研究中心,涵盖欧洲、北美、亚洲和非洲等不同地区。
- 主要研究目标是评估AI诊断结果与标准护理诊断结果的符合程度。
- 研究将考虑AI诊断在成像质量、患者年龄和种族等方面的差异表现。
- 研究采用预设的统计分析计划,包括假设检验和读者评估,以验证模型的诊断互换性。
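围绕"一致比例+0.05绝对边际"的主要终点,下面给出一个极简示意:计算AI与标准诊疗诊断的一致比例及其Wilson置信下限,并与"读者一致率减去边际"比较。其中例数、一致数与读者一致率均为虚构,仅演示统计量的构造,并非该研究预设统计分析计划的实现:

```python
# 极简示意:一致比例的Wilson置信下限与非劣效边际比较(数据为虚构)
import numpy as np
from scipy.stats import norm

agree = np.array([1]*1720 + [0]*290)        # 虚构:2010例中AI与标准诊疗一致1720例
p_hat, n = agree.mean(), agree.size
z = norm.ppf(0.975)
# Wilson置信区间下限
denom = 1 + z**2 / n
center = (p_hat + z**2 / (2 * n)) / denom
half = z * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
lower = center - half

reader_agreement = 0.88                      # 虚构的专家读者一致率
margin = 0.05
print(f"AI一致率 {p_hat:.3f},95%CI下限 {lower:.3f}")
print("满足非劣效(示意判据):", lower > reader_agreement - margin)
```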
点此查看论文截图






SV-DRR: High-Fidelity Novel View X-Ray Synthesis Using Diffusion Model
Authors:Chun Xie, Yuichi Yoshii, Itaru Kitahara
X-ray imaging is a rapid and cost-effective tool for visualizing internal human anatomy. While multi-view X-ray imaging provides complementary information that enhances diagnosis, intervention, and education, acquiring images from multiple angles increases radiation exposure and complicates clinical workflows. To address these challenges, we propose a novel view-conditioned diffusion model for synthesizing multi-view X-ray images from a single view. Unlike prior methods, which are limited in angular range, resolution, and image quality, our approach leverages the Diffusion Transformer to preserve fine details and employs a weak-to-strong training strategy for stable high-resolution image generation. Experimental results demonstrate that our method generates higher-resolution outputs with improved control over viewing angles. This capability has significant implications not only for clinical applications but also for medical education and data extension, enabling the creation of diverse, high-quality datasets for training and analysis. Our code is available at GitHub.
X射线成像是一种快速且经济的人体内部结构可视化工具。虽然多视角X射线成像能提供互补信息,从而增强诊断、介入和教学的效果,但从多个角度采集图像会增加辐射暴露并使临床流程复杂化。为应对这些挑战,我们提出一种新型的视角条件化扩散模型,用于从单一视角合成多视角X射线图像。与以往在角度范围、分辨率和图像质量上存在局限的方法不同,我们的方法利用Diffusion Transformer保留精细细节,并采用由弱到强的训练策略实现稳定的高分辨率图像生成。实验结果表明,我们的方法能生成分辨率更高的输出,并对观察角度有更好的控制。这一能力不仅对临床应用意义重大,也有助于医学教育和数据扩充,能够构建多样化、高质量的数据集用于训练和分析。代码已在GitHub上提供。
论文及项目相关链接
PDF Accepted by MICCAI2025
Summary
本研究提出一种基于扩散模型的视角条件化方法,可从单一视角合成多视角X射线图像。该方法利用Diffusion Transformer保留细节,采用由弱到强的训练策略实现稳定的高分辨率图像生成。实验结果表明,该方法生成的图像分辨率高,对视角的控制更加灵活。该研究不仅在临床应用中具有显著意义,还对医学教育和数据扩充有所助益,能为训练和分析创建多样、高质量的数据集。
Key Takeaways
- X射线成像是一种快速且经济实惠的内部人体结构可视化工具。
- 多视角X射线成像能够提供互补信息,增强诊断、治疗和教育的效果。
- 当前多视角成像面临的挑战包括辐射暴露增加和临床工作流程复杂化。
- 研究提出了一种基于视图条件的扩散模型,可从单一视角合成多视角X射线图像。
- 该方法利用Diffusion Transformer保留细节,并采用由弱到强的训练策略进行图像生成。
- 实验结果显示,该方法生成的高分辨率图像具有更好的视角控制能力。
- 该研究对临床应用、医学教育和数据扩展具有重要影响,能创建高质量、多样化的数据集,用于训练和深入分析。
点此查看论文截图



Generation of Indoor Open Street Maps for Robot Navigation from CAD Files
Authors:Jiajie Zhang, Shenrui Wu, Xu Ma, Sören Schwertfeger
The deployment of autonomous mobile robots is predicated on the availability of environmental maps, yet conventional generation via SLAM (Simultaneous Localization and Mapping) suffers from significant limitations in time, labor, and robustness, particularly in dynamic, large-scale indoor environments where map obsolescence can lead to critical localization failures. To address these challenges, this paper presents a complete and automated system for converting architectural Computer-Aided Design (CAD) files into a hierarchical topometric OpenStreetMap (OSM) representation, tailored for robust life-long robot navigation. Our core methodology involves a multi-stage pipeline that first isolates key structural layers from the raw CAD data and then employs an AreaGraph-based topological segmentation to partition the building layout into a hierarchical graph of navigable spaces. This process yields a comprehensive and semantically rich map, further enhanced by automatically associating textual labels from the CAD source and cohesively merging multiple building floors into a unified, topologically-correct model. By leveraging the permanent structural information inherent in CAD files, our system circumvents the inefficiencies and fragility of SLAM, offering a practical and scalable solution for deploying robots in complex indoor spaces. The software is encapsulated within an intuitive Graphical User Interface (GUI) to facilitate practical use. The code and dataset are available at https://github.com/jiajiezhang7/osmAG-from-cad.
自主移动机器人的部署以环境地图的可用性为前提,然而通过SLAM(同步定位与建图)生成地图的传统方式在时间、人力和稳健性方面存在显著局限,尤其是在动态、大规模的室内环境中,地图过时可能导致严重的定位失败。为应对这些挑战,本文提出一个完整的自动化系统,可将建筑计算机辅助设计(CAD)文件转换为分层的拓扑度量OpenStreetMap(OSM)表示,专为稳健的长期机器人导航而设计。我们的核心方法是一条多阶段流水线:首先从原始CAD数据中分离出关键结构图层,然后采用基于AreaGraph的拓扑分割,将建筑布局划分为由可通行空间构成的层次图。该过程生成一幅全面且语义丰富的地图,并通过自动关联CAD源文件中的文字标签、将多个楼层合并为统一且拓扑正确的模型而得到进一步增强。通过利用CAD文件中固有的永久性结构信息,我们的系统避开了SLAM的低效与脆弱,为在复杂室内空间部署机器人提供了实用且可扩展的解决方案。软件封装在直观的图形用户界面(GUI)中以便实际使用。代码和数据集可在 https://github.com/jiajiezhang7/osmAG-from-cad 获取。
论文及项目相关链接
PDF 8 pages, 8 figures
Summary
基于环境地图的自主移动机器人部署面临时间、劳动力和稳健性方面的挑战,特别是在动态、大规模室内环境中,地图失效可能导致关键定位失败。针对这些问题,本文提出一种将建筑计算机辅助设计(CAD)文件转换为分层拓扑OpenStreetMap(OSM)表示形式的完整自动化系统,用于实现稳健的终身机器人导航。
Key Takeaways
- 该方法使用多阶段管道处理,从原始CAD数据中隔离关键结构层。
- 采用AreaGraph基于拓扑的分割将建筑布局划分为可导航空间的层次图。
- 系统能够生成综合且语义丰富的地图,通过自动关联CAD源中的文本标签并整合多个楼层,形成统一、拓扑正确的模型。
- 利用CAD文件中固有的永久性结构信息,避免了SLAM方法的低效与脆弱。
- 系统提供了一个实用的室内空间机器人部署解决方案。
- 软件封装在直观的用户图形界面(GUI)内,便于实际应用。
点此查看论文截图








Uncertainty-aware Diffusion and Reinforcement Learning for Joint Plane Localization and Anomaly Diagnosis in 3D Ultrasound
Authors:Yuhao Huang, Yueyue Xu, Haoran Dou, Jiaxiao Deng, Xin Yang, Hongyu Zheng, Dong Ni
Congenital uterine anomalies (CUAs) can lead to infertility, miscarriage, preterm birth, and an increased risk of pregnancy complications. Compared to traditional 2D ultrasound (US), 3D US can reconstruct the coronal plane, providing a clear visualization of the uterine morphology for assessing CUAs accurately. In this paper, we propose an intelligent system for simultaneous automated plane localization and CUA diagnosis. Our highlights are: 1) we develop a denoising diffusion model with local (plane) and global (volume/text) guidance, using an adaptive weighting strategy to optimize attention allocation to different conditions; 2) we introduce a reinforcement learning-based framework with unsupervised rewards to extract the key slice summary from redundant sequences, fully integrating information across multiple planes to reduce learning difficulty; 3) we provide text-driven uncertainty modeling for coarse prediction, and leverage it to adjust the classification probability for overall performance improvement. Extensive experiments on a large 3D uterine US dataset show the efficacy of our method, in terms of plane localization and CUA diagnosis. Code is available at https://github.com/yuhoo0302/CUA-US.
先天性子宫异常(CUAs)可能导致不孕、流产、早产以及妊娠并发症风险增加。与传统二维超声(US)相比,三维超声能够重建冠状面,清晰呈现子宫形态,从而准确评估CUAs。在本文中,我们提出一个可同时实现自动平面定位与CUA诊断的智能系统。我们的要点如下:1)开发了一种带有局部(平面)和全局(体数据/文本)引导的去噪扩散模型,并使用自适应加权策略优化对不同条件的注意力分配;2)引入一种基于强化学习、采用无监督奖励的框架,从冗余序列中提取关键切片摘要,充分整合多个平面的信息以降低学习难度;3)提供文本驱动的不确定性建模用于粗预测,并利用其调整分类概率以提升整体性能。在大型三维子宫超声数据集上的大量实验表明,我们的方法在平面定位和CUA诊断方面均有效。代码可在 https://github.com/yuhoo0302/CUA-US 获取。
论文及项目相关链接
PDF Accepted by MICCAI 2025;10 pages, 3 figures
摘要
先天性子宫异常(CUAs)可能导致不孕、流产、早产和妊娠并发症风险增加。与传统二维超声(US)相比,三维超声能重建冠状面,更准确地评估子宫形态,从而诊断CUAs。本文提出了一种智能系统,用于同时进行平面定位和CUA诊断。主要亮点包括:1)开发了一种带有局部(平面)和全局(体数据/文本)引导的去噪扩散模型,采用自适应加权策略优化不同条件下的注意力分配;2)引入基于强化学习的框架和无监督奖励机制,从冗余序列中提取关键切片摘要,并整合多平面信息降低学习难度;3)建立文本驱动的不确定性模型进行粗预测,并利用其调整分类概率以提高整体性能。在大型三维子宫超声数据集上的广泛实验证明了该方法在平面定位和CUA诊断方面的有效性。相关代码可通过链接访问:https://github.com/yuhoo0302/CUA-US 。
关键见解
- 先天性子宫异常(CUAs)是多种妊娠问题的潜在原因,包括不孕、流产和妊娠并发症。
- 三维超声(US)技术比传统二维US技术更能准确评估子宫形态,有助于CUAs的诊断。
- 提出的智能系统结合了降噪扩散模型和强化学习框架,实现平面自动定位和CUAs诊断。
- 该系统通过局部和全局引导优化注意力分配,并从冗余序列中提取关键信息以提高效率。
- 通过文本驱动的不确定性建模进行粗略预测,提高诊断分类的准确性。
- 在大型三维子宫超声数据集上的实验证明了该系统的有效性。
点此查看论文截图



ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation
Authors:Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao
Accurate multi-modal medical image translation requires harmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for preserving modality-specific edge and texture details, and Mamba’s selective state-space modeling for efficient long- and short-range feature dependencies. Structurally, our dual-resolution framework leverages SAM2’s image encoder to capture organ-scale semantics from high-resolution inputs, while a parallel CNNs branch extracts fine-grained local features. The Robust Feature Fusion Network (RFFN) integrates these representations, and the Bidirectional Mamba Residual Network (BMRN) models spatial dependencies using spiral scanning and bidirectional state-space dynamics. A three-stage skip fusion decoder enhances edge and texture fidelity. We employ Efficient Low-Rank Adaptation (LoRA+) fine-tuning to enable precise domain specialization while maintaining the foundational capabilities of the pre-trained components. Extensive experimental validation on the SynthRAD2023 and BraTS2019 datasets demonstrates that ABS-Mamba outperforms state-of-the-art methods, delivering high-fidelity cross-modal synthesis that preserves anatomical semantics and structural details to enhance diagnostic accuracy in clinical applications. The code is available at https://github.com/gatina-yone/ABS-Mamba
精确的多模态医学图像翻译需要兼顾全局解剖语义与局部结构保真度,而模态间信息丢失和结构畸变使这一挑战更加复杂。我们提出ABS-Mamba,一种新型架构,集成了用于器官感知语义表征的Segment Anything Model 2(SAM2)、用于保留模态特异边缘和纹理细节的专用卷积神经网络(CNN),以及用于高效建模长短程特征依赖的Mamba选择性状态空间模型。在结构上,我们的双分辨率框架利用SAM2的图像编码器从高分辨率输入中捕获器官尺度的语义,而并行的CNN分支提取细粒度的局部特征。鲁棒特征融合网络(RFFN)整合这些表征,双向Mamba残差网络(BMRN)则利用螺旋扫描和双向状态空间动力学建模空间依赖关系。三阶段跳跃融合解码器进一步增强了边缘和纹理保真度。我们采用高效低秩适应(LoRA+)微调,在保持预训练组件基础能力的同时实现精确的领域专业化。在SynthRAD2023和BraTS2019数据集上的大量实验验证表明,ABS-Mamba优于最新方法,可实现保留解剖语义和结构细节的高保真跨模态合成,从而提高临床应用中的诊断准确性。代码可在 https://github.com/gatina-yone/ABS-Mamba 获取。
论文及项目相关链接
PDF MICCAI 2025(under view)
摘要
多模态医学图像翻译的准确性需要调和全局解剖语义和局部结构忠实度,这一挑战因模态间信息丢失和结构失真而复杂化。我们提出ABS-Mamba,一种结合SAM2进行器官感知语义表示、专业卷积神经网络(CNN)保留模态特定的边缘和纹理细节,以及Mamba的选择状态空间建模进行长短距离特征依赖的新架构。我们的双分辨率框架利用SAM2的图像编码器捕捉高分辨率输入的器官规模语义,而并行的CNN分支提取精细的局部特征。融合网络(RFFN)将这些表示集成在一起,双向Mamba残差网络(BMRN)使用螺旋扫描和双向状态空间动力学建模空间依赖关系。三阶段跳过融合解码器增强了边缘和纹理忠实度。我们在SynthRAD2023和BraTS2019数据集上进行的大量实验验证表明,ABS-Mamba优于最新方法,可实现高保真跨模态合成,保留解剖语义和结构细节,提高临床应用的诊断准确性。
关键见解
- ABS-Mamba架构集成了多种技术,包括SAM2用于器官感知语义表示、专业CNN保留模态特定的细节,以及Mamba的状态空间建模。
- 双分辨率框架能够捕捉器官规模语义和精细的局部特征。
- RFFN网络融合了不同表示,而BMRN则通过螺旋扫描和双向状态空间动力学建模空间依赖关系。
- 三阶段跳过融合解码器增强了边缘和纹理的忠实度。
- ABS-Mamba使用LoRA+微调方法,能够在精确域专业化的同时保持预训练组件的基础能力。
- 在SynthRAD2023和BraTS2019数据集上的实验表明,ABS-Mamba在跨模态合成方面表现出色,能够保留解剖语义和结构细节。
- ABS-Mamba架构有望提高医学图像诊断的准确性,并在临床应用中有广阔前景。
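上文提到用LoRA+微调实现领域专业化:LoRA在冻结的线性层上叠加低秩增量 ΔW = B·A,LoRA+进一步给B矩阵设置比A更大的学习率。下面是一个PyTorch极简示意(秩、缩放与学习率比值均为示例值,并非论文配置):

```python
# 极简示意:冻结原权重的LoRA线性层,并按LoRA+思想给B矩阵更大的学习率
# (假设性实现,秩/缩放/学习率比值均为示例值)
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # 冻结预训练权重
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
# LoRA+:B 的学习率设为 A 的若干倍(此处16倍仅为示例)
opt = torch.optim.AdamW([{"params": [layer.A], "lr": 1e-4},
                         {"params": [layer.B], "lr": 1.6e-3}])
out = layer(torch.randn(4, 512))
print(out.shape)
```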
点此查看论文截图

