
Medical Imaging


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never rely on them in serious academic settings; they are only meant for a first screen before actually reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-09-17

Deriving accurate galaxy cluster masses using X-ray thermodynamic profiles and graph neural networks

Authors:Asif Iqbal, Subhabrata Majumdar, Elena Rasia, Gabriel W. Pratt, Daniel de Andres, Jean-Baptiste Melin, Weiguang Cui

Precise determination of galaxy cluster masses is crucial for establishing reliable mass-observable scaling relations in cluster cosmology. We employ graph neural networks (GNNs) to estimate galaxy cluster masses from radially sampled profiles of the intra-cluster medium (ICM) inferred from X-ray observations. GNNs naturally handle inputs of variable length and resolution by representing each ICM profile as a graph, enabling accurate and flexible modeling across diverse observational conditions. We trained and tested the GNN model using state-of-the-art hydrodynamical simulations of galaxy clusters from The Three Hundred Project. The mass estimates using our method exhibit no systematic bias compared to the true cluster masses in the simulations. Additionally, we achieve a scatter in recovered mass versus true mass of about 6%, which is a factor of six smaller than obtained from a standard hydrostatic equilibrium approach. Our algorithm is robust to both data quality and cluster morphology, and it is capable of incorporating model uncertainties alongside observational uncertainties. Finally, we apply our technique to XMM-Newton observed galaxy cluster samples and compare the GNN derived mass estimates with those obtained with $Y_{\rm SZ}$-$M_{500}$ scaling relations. Our results provide strong evidence, at the 5$\sigma$ level, for a mass-dependent bias in SZ-derived masses, with higher mass clusters exhibiting a greater degree of deviation. Furthermore, we find the median bias to be $(1-b)=0.85_{-0.14}^{+0.34}$, albeit with significant dispersion due to its mass dependence. This work takes a significant step towards establishing unbiased observable mass scaling relations by integrating X-ray, SZ and optical datasets using deep learning techniques, thereby enhancing the role of galaxy clusters in precision cosmology.

Paper & Project Links

PDF 20 pages, 15 figures, 6 tables, resubmitted to A&A after revision, comments welcome

Summary

This study uses graph neural networks (GNNs) to estimate galaxy cluster masses from radially sampled profiles of the intra-cluster medium (ICM) inferred from X-ray observations. Representing each ICM profile as a graph lets the method handle inputs of variable length and resolution, enabling accurate and flexible modeling across diverse observational conditions. The GNN was trained and tested on state-of-the-art hydrodynamical simulations from The Three Hundred Project; the recovered masses show no systematic bias against the true simulated masses, with a scatter of about 6%, a factor of six smaller than that of a standard hydrostatic equilibrium approach. The algorithm is robust to data quality and cluster morphology and can incorporate model uncertainties alongside observational ones. Applied to XMM-Newton cluster samples and compared with $Y_{\rm SZ}$-$M_{500}$ scaling-relation masses, the results provide 5σ evidence for a mass-dependent bias in SZ-derived masses, with higher-mass clusters deviating more, and a median bias of $(1-b)=0.85_{-0.14}^{+0.34}$ with significant dispersion due to its mass dependence. By integrating X-ray, SZ, and optical datasets with deep learning, this work takes a significant step towards unbiased observable-mass scaling relations, strengthening the role of galaxy clusters in precision cosmology.

Key Insights

  1. Graph neural networks estimate galaxy cluster masses from X-ray-inferred radial ICM profiles, naturally accommodating inputs of variable length and resolution (see the sketch below).
  2. Trained and validated on state-of-the-art hydrodynamical simulations, the mass estimates show no systematic bias and only ~6% scatter.
  3. The algorithm is robust to data quality and cluster morphology, and propagates both model and observational uncertainties.
  4. Applied to real observations, the method reveals a mass-dependent bias in SZ-derived masses, strongest for high-mass clusters.

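To make the profile-as-graph idea concrete, here is a minimal, hypothetical PyTorch sketch: each radial ICM bin becomes a node, neighbouring bins are linked into a chain, and a small message-passing network pools node states into one mass estimate. The chain connectivity, feature choice, and layer sizes are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ChainGNN(nn.Module):
    """Minimal message-passing regressor over a radial-profile graph.

    Each node is one radial bin with features such as (log r, log n_e, log T);
    neighbouring bins are linked, so profiles of any length are handled
    naturally. Sizes and features are illustrative, not the paper's model.
    """
    def __init__(self, in_dim=3, hidden=64, layers=3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.msg = nn.ModuleList([
            nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
            for _ in range(layers)
        ])
        self.readout = nn.Linear(hidden, 1)   # predicts a log-mass

    def forward(self, x):                      # x: (num_bins, in_dim)
        h = torch.relu(self.embed(x))
        for layer in self.msg:
            # average the two chain neighbours (wrap-around kept for brevity)
            nbr = (torch.roll(h, 1, 0) + torch.roll(h, -1, 0)) / 2
            h = h + layer(torch.cat([h, nbr], dim=-1))   # residual update
        return self.readout(h.mean(dim=0))     # global mean pool -> scalar

profile = torch.randn(28, 3)    # a 28-bin ICM profile; any length works
print(ChainGNN()(profile))      # one log-mass estimate
```
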
Multi Anatomy X-Ray Foundation Model

Authors:Nishank Singla, Krisztian Koos, Farzin Haddadpour, Amin Honarmandi Shandiz, Lovish Chum, Xiaojian Xu, Qing Jin, Erhan Bas

X-ray imaging is ubiquitous in radiology, yet most existing AI foundation models are limited to chest anatomy and fail to generalize across broader clinical tasks. In this work, we introduce XR-0, a multi-anatomy X-ray foundation model trained using self-supervised learning on a large, private dataset of 1.15 million images spanning diverse anatomical regions and evaluated across 12 datasets and 20 downstream tasks, including classification, retrieval, segmentation, localization, visual grounding, and report generation. XR-0 achieves state-of-the-art performance on most multi-anatomy tasks and remains competitive on chest-specific benchmarks. Our results demonstrate that anatomical diversity and supervision are critical for building robust, general-purpose medical vision models, paving the way for scalable and adaptable AI systems in radiology.

Paper & Project Links

PDF This work has been submitted to the IEEE for possible publication

Summary

This paper presents XR-0, a multi-anatomy X-ray foundation model trained with self-supervised learning on a large private dataset covering diverse anatomical regions. The model is evaluated on downstream tasks including classification, retrieval, segmentation, localization, visual grounding, and report generation. XR-0 achieves state-of-the-art performance on most multi-anatomy tasks and remains competitive on chest-specific benchmarks. The results show that anatomical diversity and supervision are critical for building robust, general-purpose medical vision models, paving the way for scalable and adaptable AI systems in radiology.

Key Takeaways

  1. XR-0 is a multi-anatomy X-ray foundation model addressing the current limitation of radiology AI models to chest anatomy.
  2. It is trained with self-supervised learning on a large private dataset spanning diverse anatomical regions, improving generalization.
  3. XR-0 is evaluated on 20 downstream tasks across 12 datasets, including classification, retrieval, and segmentation, with excellent performance.
  4. It achieves state-of-the-art results on most multi-anatomy tasks while remaining competitive on chest-specific ones.
  5. Anatomical diversity and supervision are critical for robust, general-purpose medical vision models.
  6. The findings support the development of scalable and adaptable AI systems in radiology.

3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data

Authors:Nojod M. Alotaibi, Areej M. Alhothali, Manar S. Ali

Major depressive disorder (MDD) is a prevalent mental health condition that negatively impacts both individual well-being and global public health. Automated detection of MDD using structural magnetic resonance imaging (sMRI) and deep learning (DL) methods holds increasing promise for improving diagnostic accuracy and enabling early intervention. Most existing methods employ either voxel-level features or handcrafted regional representations built from predefined brain atlases, limiting their ability to capture complex brain patterns. This paper develops a unified pipeline that utilizes Vision Transformers (ViTs) for extracting 3D region embeddings from sMRI data and a Graph Neural Network (GNN) for classification. We explore two strategies for defining regions: (1) an atlas-based approach using predefined structural and functional brain atlases, and (2) a cube-based method by which ViTs are trained directly to identify regions from uniformly extracted 3D patches. Further, cosine similarity graphs are generated to model interregional relationships and guide GNN-based classification. Extensive experiments were conducted using the REST-meta-MDD dataset to demonstrate the effectiveness of our model. With stratified 10-fold cross-validation, the best model obtained 78.98% accuracy, 76.54% sensitivity, 81.58% specificity, 81.58% precision, and 78.98% F1-score. Further, atlas-based models consistently outperformed the cube-based approach, highlighting the importance of using domain-specific anatomical priors for MDD detection.

Paper & Project Links

PDF 14 pages, 1 figure, 7 tables

Summary

This paper presents an automated MDD detection model that uses Vision Transformers (ViTs) to extract 3D region embeddings from structural MRI (sMRI) data and a Graph Neural Network (GNN) for classification. Two region-definition strategies are explored, an atlas-based approach and a cube-based approach, and validated on the REST-meta-MDD dataset. The atlas-based models perform best, reaching 78.98% accuracy and highlighting the value of domain-specific anatomical priors for MDD detection.

Key Takeaways

  • MDD is a prevalent mental health condition that negatively impacts both individual well-being and public health.
  • Automated detection from structural MRI (sMRI) with deep learning promises better diagnostic accuracy and earlier intervention.
  • The proposed unified pipeline combines Vision Transformers (ViTs) for 3D region embeddings with a Graph Neural Network (GNN) for classification, linked by cosine-similarity graphs (see the sketch below).
  • Of the two region-definition strategies, atlas-based and cube-based, the atlas-based approach consistently performs better.
  • With stratified 10-fold cross-validation on REST-meta-MDD, the best model reaches 78.98% accuracy.

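As a small illustration of how such interregional graphs can be formed, the sketch below builds an adjacency matrix by thresholding pairwise cosine similarities between region embeddings; the threshold and sizes are made-up values, not the paper's configuration.

```python
import numpy as np

def cosine_similarity_graph(region_embeddings, threshold=0.5):
    """Build an interregional adjacency matrix from region embeddings.

    region_embeddings: (n_regions, dim) array, e.g. ViT embeddings of
    atlas-defined brain regions. Edges keep pairs whose cosine similarity
    exceeds `threshold` (an illustrative value, not the paper's setting).
    """
    x = region_embeddings / np.linalg.norm(region_embeddings, axis=1, keepdims=True)
    sim = x @ x.T                      # pairwise cosine similarities
    adj = (sim > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)         # no self-loops
    return adj, sim

emb = np.random.randn(90, 256)         # e.g. 90 atlas regions, 256-d embeddings
adj, sim = cosine_similarity_graph(emb)
print(adj.shape, int(adj.sum()))       # (90, 90) and the number of edges
```
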
End-to-End 4D Heart Mesh Recovery Across Full-Stack and Sparse Cardiac MRI

Authors:Yihong Chen, Jiancheng Yang, Deniz Sayin Mercadier, Hieu Le, Juerg Schwitter, Pascal Fua

Reconstructing cardiac motion from cine CMR sequences is critical for diagnosis, prediction, and intervention. Existing methods rely on complete CMR stacks to infer full heart motion, limiting their utility in intra-procedural scenarios where only sparse observations are available. We present TetHeart, the first end-to-end framework that unifies full 4D multi-structure heart mesh recovery from both offline full-stack acquisitions and intra-procedural sparse-slice observations. Our method leverages deep deformable tetrahedra, an explicit-implicit hybrid representation, to capture shape and motion in a coherent space shared across cardiac structures. It is initialized from high-quality pre-procedural or offline-acquired full stacks to build detailed, patient-specific heart meshes, which can then be updated using whatever slices are available, from full stacks down to a single slice. We further incorporate several key innovations: (i) an attentive mechanism for slice-adaptive 2D-3D feature assembly that dynamically integrates information from arbitrary numbers of slices at any position, combined with a distillation strategy from full-slice to sparse-slice settings to ensure accurate reconstruction under extreme sparsity; and (ii) a two-stage weakly supervised motion learning scheme requiring only keyframe (e.g., ED and ES) annotations. Trained and validated on three large public datasets and externally evaluated zero-shot on additional private interventional and public CMR datasets, TetHeart achieves state-of-the-art accuracy and strong generalization in both pre- and intra-procedural settings.

Paper & Project Links

PDF

Summary

This paper introduces TetHeart, an end-to-end framework that recovers full 4D multi-structure heart meshes from both offline full-stack cine CMR acquisitions and intra-procedural sparse-slice observations. It uses deep deformable tetrahedra, an explicit-implicit hybrid representation, to capture shape and motion in a coherent space shared across cardiac structures. Patient-specific meshes initialized from high-quality full stacks can then be updated from whatever slices are available, down to a single slice. A slice-adaptive 2D-3D attentive feature assembly with full-to-sparse distillation ensures accurate reconstruction under extreme sparsity, and a two-stage weakly supervised motion learning scheme requires only keyframe (e.g., ED and ES) annotations. TetHeart achieves state-of-the-art accuracy and strong generalization on public and private datasets, in both pre- and intra-procedural settings.

Key Takeaways

  • TetHeart is an end-to-end framework recovering full 4D multi-structure heart meshes from both offline full-stack acquisitions and intra-procedural sparse slices.
  • It represents cardiac shape and motion with deep deformable tetrahedra, an explicit-implicit hybrid representation.
  • The model adapts to arbitrary numbers of slices and can update meshes from a full stack down to a single slice (see the sketch below).
  • A slice-adaptive 2D-3D attentive feature assembly and a full-to-sparse distillation strategy ensure accurate reconstruction under extreme sparsity.
  • A two-stage weakly supervised motion learning scheme requires only keyframe annotations.

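The slice-adaptive assembly can be pictured as attention over a variable-length set of slice features. Below is a minimal, hypothetical PyTorch sketch in which one learned query cross-attends to however many slices are available; the dimensions and head count are illustrative, not TetHeart's actual module.

```python
import torch
import torch.nn as nn

class SliceAttentionPool(nn.Module):
    """Attentive pooling over an arbitrary number of slice features.

    A learned query cross-attends to per-slice tokens, so the same module
    accepts a full stack or a single slice. Sizes are illustrative.
    """
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, slice_feats):            # (batch, n_slices, dim)
        q = self.query.expand(slice_feats.size(0), -1, -1)
        fused, _ = self.attn(q, slice_feats, slice_feats)
        return fused.squeeze(1)                # (batch, dim)

pool = SliceAttentionPool()
print(pool(torch.randn(2, 12, 128)).shape)     # full stack: 12 slices
print(pool(torch.randn(2, 1, 128)).shape)      # intra-procedural: 1 slice
```
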
End-to-End Learning of Multi-Organ Implicit Surfaces from 3D Medical Imaging Data

Authors:Farahdiba Zarin, Nicolas Padoy, Jérémy Dana, Vinkle Srivastav

The fine-grained surface reconstruction of different organs from 3D medical imaging can provide advanced diagnostic support and improved surgical planning. However, the representation of the organs is often limited by the resolution, with higher resolution requiring a larger memory and compute footprint. Implicit representations of objects have been proposed to alleviate this problem in general computer vision by providing compact and differentiable functions to represent 3D object shapes. However, architectural and data-related differences prevent the direct application of these methods to medical images. This work introduces ImplMORe, an end-to-end deep learning method using implicit surface representations for multi-organ reconstruction from 3D medical images. ImplMORe incorporates local features using a 3D CNN encoder and performs multi-scale interpolation to learn the features in the continuous domain using occupancy functions. We apply our method to single and multiple organ reconstructions using the totalsegmentator dataset. By leveraging the continuous nature of occupancy functions, our approach outperforms the discrete explicit representation based surface reconstruction approaches, providing fine-grained surface details of the organ at a resolution higher than the given input image. The source code will be made publicly available at: https://github.com/CAMMA-public/ImplMORe

Paper & Project Links

PDF

Summary
Fine-grained surface reconstruction of organs from 3D medical images supports advanced diagnostics and surgical planning, but explicit representations are limited by resolution, with higher detail costing more memory and compute. ImplMORe is an end-to-end deep learning method that uses implicit surface representations for multi-organ reconstruction: a 3D CNN encoder extracts local features, and multi-scale interpolation learns occupancy functions in the continuous domain. Evaluated on the totalsegmentator dataset for single- and multi-organ reconstruction, it outperforms discrete explicit-representation approaches and recovers surface details at resolutions higher than the input image.

Key Takeaways

  1. Fine-grained organ surface reconstruction matters for diagnosis and surgical planning.
  2. Explicit representations are resolution-limited; higher detail requires more memory and compute.
  3. ImplMORe is an end-to-end multi-organ reconstruction method based on implicit surface representations.
  4. It combines local features from a 3D CNN encoder with multi-scale interpolation.
  5. Occupancy functions let it learn features in the continuous domain, so surfaces can be queried at arbitrary resolution (see the sketch below).
  6. It outperforms discrete explicit-representation surface reconstruction methods on totalsegmentator.

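The core occupancy-function idea can be sketched as follows: CNN features are interpolated at continuous query coordinates and a small MLP predicts occupancy, so the surface can be extracted at any resolution. This is a toy stand-in with made-up sizes, not ImplMORe's exact network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OccupancyHead(nn.Module):
    """Query an occupancy function at continuous 3D coordinates.

    Local CNN features are sampled with trilinear interpolation, so the
    shape can be evaluated above the resolution of the input image.
    """
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),            # occupancy logit
        )

    def forward(self, feat_vol, coords):
        # feat_vol: (B, C, D, H, W); coords: (B, N, 3) in [-1, 1]
        grid = coords.view(coords.size(0), -1, 1, 1, 3)
        f = F.grid_sample(feat_vol, grid, align_corners=True)  # (B, C, N, 1, 1)
        f = f.view(feat_vol.size(0), feat_vol.size(1), -1).transpose(1, 2)
        return self.mlp(torch.cat([f, coords], dim=-1)).squeeze(-1)

head = OccupancyHead()
occ = head(torch.randn(1, 32, 16, 16, 16), torch.rand(1, 1000, 3) * 2 - 1)
print(occ.shape)  # (1, 1000) occupancy logits at arbitrary query points
```
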
A Study of Spectral Variability between flaring and non-flaring state in M74 X-1

Authors:Aman Upadhyay, Tanuman Ghosh, Vikram Rana

We conducted an extensive long-term spectral and timing study of the ultraluminous X-ray source (ULX) M74 X-1, using data taken between 2001 and 2021 by Chandra and XMM-Newton X-ray observatories. Our analysis shows that flares are present in some observations, whereas they are absent in others. Flaring state exhibits two-component spectra at a lower average flux level, whereas the non-flaring state displays single-component spectra at a higher average flux level. The M74 X-1 spectra are best described by the combination of accretion disk and Comptonization components, a dual thermal disk blackbody model, and a modified multi-temperature disk blackbody model. Using the dual thermal disk blackbody model, we obtain cool and hot temperatures of $T_{in}$ (cool) $= 0.38^{+0.08}_{-0.06}$ keV and $T_{in}$ (hot) $= 1.67^{+0.18}_{-0.13}$ keV, respectively, suggesting two temperature emitting regions and indicating possible presence of outflowing wind along with the accretion disk. We found a Gaussian feature at $E_{line} = 0.96^{+0.05}_{-0.11}$ keV with $\sigma = 0.11^{+0.13}_{-0.06}$ keV in the spectra of the flaring state which can be interpreted as the unresolved wind feature in the system when compared to similar feature seen in other ULXs. Plotting the hardness luminosity diagram, we get a trend of increasing hardness with luminosity, suggesting the presence of geometrical beaming in a low-inclination system. Additionally, using the hot disk blackbody component from the dual thermal disk blackbody model, we estimate the mass of the compact object to be M = $7.1^{+1.4}_{-1.3}$ M$_\odot$, classifying it as a stellar-mass black hole and confirming super-Eddington accretion in the system.

Paper & Project Links

PDF Published in The Astrophysical Journal

Summary

This work presents a two-decade spectral and timing study of the ultraluminous X-ray source M74 X-1 using Chandra and XMM-Newton data from 2001-2021. The source shows flaring and non-flaring states with distinct spectra: two-component spectra at a lower average flux when flaring, single-component spectra at a higher flux otherwise. A dual thermal disk blackbody model yields two temperature-emitting regions, suggesting an outflowing wind alongside the accretion disk, and a Gaussian feature in the flaring-state spectra can be interpreted as an unresolved wind feature. The hardness-luminosity diagram shows hardness increasing with luminosity, consistent with geometrical beaming in a low-inclination system. From the hot disk blackbody component, the compact object mass is estimated at $7.1^{+1.4}_{-1.3}$ M$_\odot$, classifying it as a stellar-mass black hole undergoing super-Eddington accretion.

Key Insights

  1. A two-decade spectral and timing analysis of M74 X-1 was performed.
  2. The source alternates between flaring and non-flaring states with distinct spectral properties.
  3. A dual thermal disk blackbody model best describes the M74 X-1 spectra.
  4. Two temperature-emitting regions and a possible outflowing wind were identified.
  5. Hardness increases with luminosity, suggesting geometrical beaming.
  6. The compact object mass estimate classifies it as a stellar-mass black hole (see the back-of-envelope check below).

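As a rough plausibility check on the quoted mass (not the paper's actual fitting procedure, which involves the disk normalization, distance, and inclination), one can assume the hot disk truncates at the innermost stable circular orbit of a non-spinning black hole, $R_{\rm ISCO} = 6GM/c^2$, and invert for the mass. The inner radius below is a hypothetical value chosen for illustration.

```python
# Back-of-envelope mass check: if the hot disk component truncates at the
# ISCO of a non-spinning black hole, R_isco = 6 G M / c^2, then
# M = R_in c^2 / (6 G). R_in here is an assumed illustrative value.
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8            # speed of light, m/s
M_sun = 1.989e30       # solar mass, kg

R_in_km = 63.0         # hypothetical inner disk radius in km
M = (R_in_km * 1e3) * c**2 / (6 * G)
print(f"M ~ {M / M_sun:.1f} M_sun")  # ~7.1 M_sun for R_in ~ 63 km
```
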
Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing

Authors:Bingyu Li, Haocheng Dong, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and the domain gap between natural and RS images. To bridge these gaps, we first establish a standardized OVRSIS benchmark (OVRSISBench) based on widely-used RS segmentation datasets, enabling consistent evaluation across methods. Using this benchmark, we comprehensively evaluate several representative OVS/OVRSIS models and reveal their limitations when directly applied to remote sensing scenarios. Building on these insights, we propose RSKT-Seg, a novel open-vocabulary segmentation framework tailored for remote sensing. RSKT-Seg integrates three key components: (1) a Multi-Directional Cost Map Aggregation (RS-CMA) module that captures rotation-invariant visual cues by computing vision-language cosine similarities across multiple directions; (2) an Efficient Cost Map Fusion (RS-Fusion) transformer, which jointly models spatial and semantic dependencies with a lightweight dimensionality reduction strategy; and (3) a Remote Sensing Knowledge Transfer (RS-Transfer) module that injects pre-trained knowledge and facilitates domain adaptation via enhanced upsampling. Extensive experiments on the benchmark show that RSKT-Seg consistently outperforms strong OVS baselines by +3.8 mIoU and +5.9 mACC, while achieving 2x faster inference through efficient aggregation. Our code is available at https://github.com/LiBingyu01/RSKT-Seg.

Paper & Project Links

PDF

Summary
This work establishes OVRSISBench, a unified benchmark for Open-Vocabulary Remote Sensing Image Segmentation built on widely used RS segmentation datasets, comprehensively evaluates representative OVS/OVRSIS models on it, and, based on the limitations revealed, proposes RSKT-Seg, an open-vocabulary segmentation framework tailored to remote sensing with three key components.

Key Takeaways

  1. OVRSISBench, a standardized open-vocabulary remote sensing segmentation benchmark built on widely used RS datasets, enables consistent evaluation across methods.
  2. A comprehensive evaluation of representative OVS/OVRSIS models reveals their limitations in remote sensing scenarios.
  3. RSKT-Seg is a new open-vocabulary segmentation framework tailored to remote sensing.
  4. RSKT-Seg integrates three key components: a Multi-Directional Cost Map Aggregation (RS-CMA) module, an Efficient Cost Map Fusion (RS-Fusion) transformer, and a Remote Sensing Knowledge Transfer (RS-Transfer) module.
  5. RS-CMA captures rotation-invariant visual cues by computing vision-language cosine similarities across multiple directions (see the sketch below).
  6. RS-Fusion jointly models spatial and semantic dependencies with a lightweight dimensionality-reduction strategy.
  7. RS-Transfer injects pre-trained knowledge and facilitates domain adaptation via enhanced upsampling.

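A simplified stand-in for the multi-directional aggregation idea: compute vision-language cosine-similarity cost maps under a few rotations of the feature map and average them back in the original frame. Four 90-degree rotations, the tensor shapes, and the averaging rule are all illustrative assumptions, not the paper's RS-CMA module.

```python
import torch
import torch.nn.functional as F

def multidirectional_cost_map(feat, text_emb):
    """Rotation-aggregated vision-language cost maps (sketch).

    feat: (B, C, H, W) visual features; text_emb: (K, C) class embeddings.
    Cosine similarity is computed for four 90-degree rotations of the
    feature map and averaged after undoing each rotation.
    """
    text = F.normalize(text_emb, dim=-1)
    maps = []
    for k in range(4):
        f = F.normalize(torch.rot90(feat, k, dims=(2, 3)), dim=1)
        cost = torch.einsum("bchw,kc->bkhw", f, text)     # cosine cost map
        maps.append(torch.rot90(cost, -k, dims=(2, 3)))   # undo the rotation
    return torch.stack(maps, dim=0).mean(dim=0)           # (B, K, H, W)

cost = multidirectional_cost_map(torch.randn(1, 512, 32, 32), torch.randn(8, 512))
print(cost.shape)  # (1, 8, 32, 32): one map per open-vocabulary class
```
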
Very-low-field MRI scanners: from the ideal to the real permanent magnet array

Authors:Umberto Zanovello, Alessandro Arduino, Vittorio Basso, Luca Zilberti, Alessandro Sola, Andrea Agosto, Luca Toso, Oriano Bottauscio

Very-low-field MRIs are becoming increasingly popular due to their portability and adaptability to different environments. They are being successfully used for various clinical applications, leading to a paradigm shift in the way imaging care is typically performed. The development of low-cost MRI scanner prototypes began a few years ago, with some interesting and promising open-source projects emerging in both hardware and software design. Using permanent magnets (PMs) to generate the static magnetic field B0 can substantially reduce the manufacturing cost of low-field scanners while achieving satisfactory homogeneity. This article focuses on characterizing magnet performance in terms of B0 spatial homogeneity. Specifically, it investigates its sensitivity to various factors and explores the reasons for discrepancies between numerical expectations and actual measurements on fabricated magnets. The analysis also examines the consequences of using different numerical model approximations, revisiting concepts most frequently used in other design contexts. While these assumptions simplify the numerical model and may improve its performance in terms of computational time, this paper demonstrates that they also impact the reliability of the obtained results.

Paper & Project Links

PDF 12 pages, 7 figures

Summary
Very-low-field MRI scanners are increasingly popular thanks to their portability and adaptability, and are shifting how imaging care is delivered. This article characterizes permanent-magnet performance in terms of B0 spatial homogeneity: its sensitivity to various factors, the reasons numerical expectations deviate from measurements on fabricated magnets, and the consequences of common numerical-model approximations. While such assumptions simplify the model and reduce computation time, the paper shows they also affect the reliability of the results.

Key Takeaways

  1. Very-low-field MRI is increasingly popular for its portability and adaptability, and is already used successfully in clinical applications.
  2. Permanent magnets (PMs) can generate the static field B0 at a substantially lower manufacturing cost while achieving satisfactory homogeneity.
  3. The article characterizes magnet performance in terms of B0 spatial homogeneity and its sensitivity to various factors (see the sketch below).
  4. It investigates why numerical expectations differ from measurements on fabricated magnets.
  5. Different numerical-model approximations affect the reliability of the results, as shown by comparing measurements with simulations.
  6. Simplifying the numerical model speeds up computation but can introduce deviations and inaccuracies.

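A standard way to quantify the B0 homogeneity discussed here is the peak-to-peak field variation in parts per million over the region of interest. The sketch below computes it for a hypothetical 50 mT magnet; the field values are made up, not taken from the paper.

```python
import numpy as np

def b0_homogeneity_ppm(b0_samples):
    """Peak-to-peak B0 inhomogeneity in ppm over a region of interest.

    b0_samples: field magnitudes (tesla) sampled over the imaging volume,
    e.g. from a simulation of the permanent-magnet array or a measured
    field map. A common figure of merit, not code from the paper.
    """
    b0 = np.asarray(b0_samples, dtype=float)
    return (b0.max() - b0.min()) / b0.mean() * 1e6

# Hypothetical 50 mT magnet with ~5 uT peak-to-peak variation over the FOV
field = 0.050 + 5e-6 * np.random.uniform(-0.5, 0.5, size=10_000)
print(f"{b0_homogeneity_ppm(field):.0f} ppm")  # on the order of 100 ppm
```
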
A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications

Authors:Hongyuan Zhang, Yuheng Wu, Mingyang Zhao, Zhiwei Chen, Rebecca Li, Fei Zhu, Haohan Zhao, Xiaohua Yuan, Meng Yang, Chunli Qiu, Xiang Cong, Haiyan Chen, Lina Luan, Randolph H. L. Wong, Huai Liao, Colin A Graham, Shi Chang, Guowei Tao, Dong Yi, Zhen Lei, Nassir Navab, Sebastien Ourselin, Jiebo Luo, Hongbin Liu, Gaofeng Meng

Artificial intelligence (AI) that can effectively learn ultrasound representations by integrating multi-source data holds significant promise for advancing clinical care. However, the scarcity of large labeled datasets in real-world clinical environments and the limited generalizability of task-specific models have hindered the development of generalizable clinical AI models for ultrasound applications. In this study, we present EchoCare, a novel ultrasound foundation model for generalist clinical use, developed via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. EchoCareData comprises 4.5 million ultrasound images, sourced from over 23 countries across 5 continents and acquired via a diverse range of distinct imaging devices, thus encompassing global cohorts that are multi-center, multi-device, and multi-ethnic. Unlike prior studies that adopt off-the-shelf vision foundation model architectures, we introduce a hierarchical classifier into EchoCare to enable joint learning of pixel-level and representation-level features, capturing both global anatomical contexts and local ultrasound characteristics. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks of varying diagnostic difficulties, spanning disease diagnosis, lesion segmentation, organ detection, landmark prediction, quantitative regression, imaging enhancement and report generation. The code and pretrained model are publicly released, rendering EchoCare accessible for fine-tuning and local adaptation, supporting extensibility to additional applications. EchoCare provides a fully open and generalizable foundation model to boost the development of AI technologies for diverse clinical ultrasound applications.

Paper & Project Links

PDF

Summary

This study presents EchoCare, an ultrasound foundation model for generalist clinical use, developed via self-supervised learning on the curated, publicly available, large-scale dataset EchoCareData: 4.5 million ultrasound images from over 23 countries across 5 continents, acquired with a diverse range of imaging devices, forming multi-center, multi-device, multi-ethnic cohorts. Rather than adopting an off-the-shelf vision foundation architecture, EchoCare adds a hierarchical classifier for joint learning of pixel-level and representation-level features, capturing both global anatomical context and local ultrasound characteristics. With minimal training, EchoCare outperforms state-of-the-art models across 10 representative ultrasound benchmarks spanning disease diagnosis, lesion segmentation, organ detection, landmark prediction, quantitative regression, imaging enhancement, and report generation. The code and pretrained model are publicly released to support fine-tuning, local adaptation, and extension to additional applications.

Key Takeaways

  1. AI that learns ultrasound representations from multi-source data holds significant promise for clinical care.
  2. Scarce large labeled clinical datasets and the poor generalizability of task-specific models are the main obstacles for ultrasound AI.
  3. EchoCare is a generalist clinical ultrasound foundation model, self-supervised on multi-center, multi-device, multi-ethnic global cohorts.
  4. Its hierarchical classifier jointly learns pixel-level and representation-level features, improving performance (see the sketch below).
  5. EchoCare outperforms comparison models on 10 varied ultrasound benchmarks, from disease diagnosis to lesion segmentation.
  6. The code and pretrained model are publicly released, supporting fine-tuning and local adaptation.

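The joint pixel-level and representation-level learning can be pictured as one shared encoder with two heads, a global head over pooled features and a dense head over the feature map. The layer sizes and head designs below are placeholders for illustration, not EchoCare's actual architecture.

```python
import torch
import torch.nn as nn

class HierarchicalHeads(nn.Module):
    """Joint pixel-level and representation-level objectives (sketch).

    A shared encoder feeds (i) a global head over pooled features and
    (ii) a dense head over the feature map, so training signals cover both
    global anatomy and local ultrasound texture.
    """
    def __init__(self, channels=64, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.global_head = nn.Linear(channels, n_classes)   # representation level
        self.dense_head = nn.Conv2d(channels, 1, 1)          # pixel level

    def forward(self, x):
        h = self.encoder(x)
        logits = self.global_head(h.mean(dim=(2, 3)))        # pooled features
        dense = self.dense_head(h)                           # per-pixel output
        return logits, dense

logits, dense = HierarchicalHeads()(torch.randn(2, 1, 64, 64))
print(logits.shape, dense.shape)  # (2, 10) and (2, 1, 64, 64)
```
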
Multiple Instance Learning Framework with Masked Hard Instance Mining for Gigapixel Histopathology Image Analysis

Authors:Wenhao Tang, Sheng Huang, Heng Fang, Fengtao Zhou, Bo Liu, Qingshan Liu

Digitizing pathological images into gigapixel Whole Slide Images (WSIs) has opened new avenues for Computational Pathology (CPath). As positive tissue comprises only a small fraction of gigapixel WSIs, existing Multiple Instance Learning (MIL) methods typically focus on identifying salient instances via attention mechanisms. However, this leads to a bias towards easy-to-classify instances while neglecting challenging ones. Recent studies have shown that hard examples are crucial for accurately modeling discriminative boundaries. Applying such an idea at the instance level, we elaborate a novel MIL framework with masked hard instance mining (MHIM-MIL), which utilizes a Siamese structure with a consistency constraint to explore the hard instances. Using a class-aware instance probability, MHIM-MIL employs a momentum teacher to mask salient instances and implicitly mine hard instances for training the student model. To obtain diverse, non-redundant hard instances, we adopt large-scale random masking while utilizing a global recycle network to mitigate the risk of losing key features. Furthermore, the student updates the teacher using an exponential moving average, which identifies new hard instances for subsequent training iterations and stabilizes optimization. Experimental results on cancer diagnosis, subtyping, survival analysis tasks, and 12 benchmarks demonstrate that MHIM-MIL outperforms the latest methods in both performance and efficiency. The code is available at: https://github.com/DearCaat/MHIM-MIL.

Paper & Project Links

PDF 27 pages, 8 figures

Summary
Digitizing pathology into gigapixel Whole Slide Images has opened new avenues for computational pathology. Existing multiple instance learning (MIL) methods mostly attend to salient, easy-to-classify instances and neglect the challenging ones that matter for accurate decision boundaries. MHIM-MIL is a MIL framework with masked hard instance mining: a momentum teacher in a Siamese, consistency-constrained setup masks salient instances so the student implicitly trains on hard ones. Large-scale random masking with a global recycle network yields diverse, non-redundant hard instances without losing key features, and the student updates the teacher via an exponential moving average, stabilizing optimization. Across cancer diagnosis, subtyping, survival analysis, and 12 benchmarks, MHIM-MIL outperforms recent methods in both performance and efficiency.

Key Takeaways

  • Digitized gigapixel pathology images open new opportunities for computational pathology.
  • Existing methods focus on easy-to-classify instances, yet hard instances are crucial for accurately modeling discriminative boundaries.
  • MHIM-MIL is a new MIL framework that mines hard-to-classify instances for training.
  • It uses a Siamese structure with a consistency constraint to explore hard instances.
  • A class-aware instance probability and a momentum teacher drive the masking of salient instances (see the sketch below).
  • Large-scale random masking plus a global recycle network yields diverse, non-redundant hard instances.
  • Exponential-moving-average teacher updates stabilize training and surface new hard instances at each iteration.

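Two ingredients of the framework, masking the teacher's most salient instances and updating the teacher by exponential moving average, are easy to sketch. The masking ratio and momentum below are illustrative values, not the paper's settings.

```python
import torch

@torch.no_grad()
def mask_salient_instances(teacher_attn, mask_ratio=0.2):
    """Hide the most salient instances so the student trains on hard ones.

    teacher_attn: (n_instances,) attention scores from the momentum teacher.
    Returns a boolean keep-mask over the bag's instances.
    """
    n_mask = int(mask_ratio * teacher_attn.numel())
    keep = torch.ones_like(teacher_attn, dtype=torch.bool)
    keep[teacher_attn.topk(n_mask).indices] = False  # drop easy/salient patches
    return keep

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Exponential moving average update of the momentum teacher."""
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1 - momentum)

attn = torch.rand(5000)                   # toy attention over a WSI's patches
keep = mask_salient_instances(attn)
print(int(keep.sum()), "instances left for the student")  # 4000

teacher, student = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
ema_update(teacher, student)              # teacher drifts slowly toward student
```
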
ANROT-HELANet: Adverserially and Naturally Robust Attention-Based Aggregation Network via The Hellinger Distance for Few-Shot Classification

Authors:Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N. Duong

Few-Shot Learning (FSL), which involves learning to generalize using only a few data samples, has demonstrated promising performance, superior to ordinary CNN methods. While Bayesian based estimation approaches using Kullback-Leibler (KL) divergence have shown improvements, they remain vulnerable to adversarial attacks and natural noises. We introduce ANROT-HELANet, an Adversarially and Naturally RObusT Hellinger Aggregation Network that significantly advances the state-of-the-art in FSL robustness and performance. Our approach implements an adversarially and naturally robust Hellinger distance-based feature class aggregation scheme, demonstrating resilience to adversarial perturbations up to $\epsilon=0.30$ and Gaussian noise up to $\sigma=0.30$. The network achieves substantial improvements across benchmark datasets, including gains of 1.20% and 1.40% for 1-shot and 5-shot scenarios on miniImageNet respectively. We introduce a novel Hellinger Similarity contrastive loss function that generalizes cosine similarity contrastive loss for variational few-shot inference scenarios. Our approach also achieves superior image reconstruction quality with a FID score of 2.75, outperforming traditional VAE (3.43) and WAE (3.38) approaches. Extensive experiments conducted on four few-shot benchmarked datasets verify that ANROT-HELANet's combination of Hellinger distance-based feature aggregation, attention mechanisms, and our novel loss function establishes new state-of-the-art performance while maintaining robustness against both adversarial and natural perturbations. Our code repository will be available at https://github.com/GreedYLearner1146/ANROT-HELANet/tree/main.

Paper & Project Links

PDF Preprint version. The manuscript has been submitted to a journal. All changes will be transferred to the final version if accepted. Also an erratum: In Figure 10 and 11, the $\epsilon = 0.005$ value should be $\epsilon = 0.05$

Summary

This paper introduces ANROT-HELANet, a network that advances both the robustness and the performance of few-shot learning (FSL). Its adversarially and naturally robust Hellinger distance-based feature class aggregation withstands adversarial perturbations up to $\epsilon=0.30$ and Gaussian noise up to $\sigma=0.30$. The network improves results across benchmark datasets and delivers strong image reconstruction quality, and a new Hellinger Similarity contrastive loss generalizes cosine-similarity contrastive loss to variational few-shot inference. Combining Hellinger distance-based aggregation, attention mechanisms, and the new loss establishes state-of-the-art performance on four few-shot benchmarks while staying robust to both adversarial and natural perturbations.

Key Insights

  1. ANROT-HELANet demonstrates strong robustness and performance in few-shot learning (FSL).
  2. Its Hellinger distance-based feature class aggregation scheme improves resistance to adversarial attacks and natural noise (see the sketch below).
  3. It delivers clear gains on benchmarks, notably +1.20% (1-shot) and +1.40% (5-shot) on miniImageNet.
  4. A new Hellinger Similarity contrastive loss suits variational few-shot inference scenarios.
  5. Image reconstruction quality (FID 2.75) surpasses traditional VAE (3.43) and WAE (3.38) approaches.
  6. Extensive experiments on four few-shot benchmarks confirm state-of-the-art performance.

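The Hellinger distance at the heart of the aggregation scheme is simple to compute for discrete distributions; unlike KL divergence it is symmetric and bounded in [0, 1], which is part of why it tolerates perturbed inputs. A minimal NumPy sketch, with toy distributions rather than the paper's feature statistics:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    H(P, Q) = (1 / sqrt(2)) * || sqrt(p) - sqrt(q) ||_2, bounded in [0, 1],
    a noise-tolerant alternative to KL divergence for aggregating class
    distributions in few-shot inference.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0)

p = np.array([0.7, 0.2, 0.1])
print(hellinger(p, [0.6, 0.3, 0.1]))   # small: distributions are close
print(hellinger(p, [0.1, 0.2, 0.7]))   # large: distributions differ
print(hellinger(p, p))                 # 0.0: identical distributions
```
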
Automated Radiology Report Generation Based on Topic-Keyword Semantic Guidance

Authors:Jing Xiao, Hongfei Liu, Ruiqi Dong, Jimin Liu, Haoyong Yu

Automated radiology report generation is essential in clinical practice. However, diagnosing radiological images typically requires 5-10 minutes of a physician's time, consuming valuable healthcare resources. Existing studies have not fully leveraged knowledge from historical radiology reports, lacking sufficient and accurate prior information. To address this, we propose a Topic-Keyword Semantic Guidance (TKSG) framework. This framework uses BiomedCLIP to accurately retrieve historical similar cases. Supported by multimodal information, TKSG accurately detects topic words (disease classifications) and keywords (common symptoms) in diagnoses. The probabilities of topic terms are aggregated into a topic vector, serving as global information to guide the entire decoding process. Additionally, a semantic-guided attention module is designed to refine local decoding with keyword content, ensuring report accuracy and relevance. Experimental results show that our model achieves excellent performance on both IU X-Ray and MIMIC-CXR datasets. The code is available at https://github.com/SCNU203/TKSG.

Paper & Project Links

PDF

Summary
Automated radiology report generation is essential in clinical practice, but reading a radiological study costs physicians valuable time, and existing methods underuse the knowledge in historical reports. The Topic-Keyword Semantic Guidance (TKSG) framework uses BiomedCLIP to retrieve similar historical cases and, with multimodal support, detects topic words (disease classifications) and keywords (common symptoms) in diagnoses. Topic-term probabilities are aggregated into a topic vector that guides the whole decoding process, and a semantic-guided attention module refines local decoding with keyword content, improving report accuracy and relevance. The model performs strongly on both IU X-Ray and MIMIC-CXR.

Key Takeaways

  1. Automated radiology report generation helps optimize the use of clinical resources.
  2. Current diagnostic workflows cost physician time, calling for more efficient methods.
  3. The Topic-Keyword Semantic Guidance (TKSG) framework addresses this problem.
  4. TKSG retrieves similar historical cases accurately using BiomedCLIP.
  5. With multimodal support, TKSG detects topic words and keywords, whose probabilities form a guiding topic vector (see the sketch below).
  6. A semantic-guided attention module improves report accuracy.

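The topic-vector idea can be illustrated with a toy frequency estimate over retrieved reports; the term-matching rule below is a stand-in assumption, not TKSG's actual topic detector.

```python
import numpy as np

def topic_vector(retrieved_reports, topic_terms):
    """Aggregate topic-term probabilities from retrieved similar cases.

    retrieved_reports: list of reports (token lists) returned by a
    retriever such as BiomedCLIP; topic_terms: candidate disease terms.
    The empirical term frequencies form a global topic vector that can
    guide decoding. A toy frequency estimate for illustration only.
    """
    counts = np.array([
        sum(term in report for report in retrieved_reports)
        for term in topic_terms
    ], dtype=float)
    probs = counts / max(len(retrieved_reports), 1)
    return probs / max(probs.sum(), 1e-8)    # normalised topic vector

reports = [["effusion", "cardiomegaly"], ["effusion"], ["atelectasis"]]
print(topic_vector(reports, ["effusion", "cardiomegaly", "atelectasis", "nodule"]))
# -> [0.5, 0.25, 0.25, 0.0]: a global prior over disease topics
```
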
A Statistical 3D Stomach Shape Model for Anatomical Analysis

Authors:Erez Posner, Ore Shtalrid, Oded Erell, Daniel Noy, Moshe Bouhnik

Realistic and parameterized 3D models of human anatomy have become invaluable in research, diagnostics, and surgical planning. However, the development of detailed models for internal organs, such as the stomach, has been limited by data availability and methodological challenges. In this paper, we propose a novel pipeline for the generation of synthetic 3D stomach models, enabling the creation of anatomically diverse morphologies informed by established studies on stomach shape variability. Using this pipeline, we construct a dataset of synthetic stomachs. Building on this dataset, we develop a 3D statistical shape model of the stomach, trained to capture natural anatomical variability in a low-dimensional shape space. The model is further refined using CT meshes derived from publicly available datasets through a semi-supervised alignment process, enhancing its ability to generalize to unseen anatomical variations. We evaluated the model on a held-out test set of real stomach CT scans, demonstrating robust generalization and fit accuracy. We make the statistical shape model along with the synthetic dataset publicly available on GitLab: https://gitlab.com/Erez.Posner/stomach_pytorch to facilitate further research. This work introduces the first statistical 3D shape model of the stomach, with applications ranging from surgical simulation and pre-operative planning to medical education and computational modeling. By combining synthetic data generation, parametric modeling, and real-world validation, our approach represents a significant advancement in organ modeling and opens new possibilities for personalized healthcare solutions.

Paper & Project Links

PDF

Summary

This paper proposes a pipeline for generating synthetic 3D stomach models with anatomically diverse morphologies informed by established studies of stomach shape variability. From the resulting synthetic dataset, the authors build a 3D statistical shape model trained to capture natural anatomical variability in a low-dimensional shape space, then refine it with CT meshes from publicly available datasets through a semi-supervised alignment process so it generalizes to unseen anatomical variations. Evaluated on a held-out test set of real stomach CT scans, the model shows robust generalization and fit accuracy. The statistical shape model and synthetic dataset are publicly available on GitLab (https://gitlab.com/Erez.Posner/stomach_pytorch). This is the first statistical 3D shape model of the stomach, with applications ranging from surgical simulation and pre-operative planning to medical education and computational modeling.

Key Takeaways

  1. A new pipeline generates synthetic 3D stomach models with anatomically diverse morphologies.
  2. The synthetic dataset is informed by established studies on stomach shape variability.
  3. A 3D statistical shape model captures natural anatomical variability in a low-dimensional shape space (see the sketch below).
  4. Semi-supervised alignment with CT-derived meshes improves generalization to unseen anatomical variations.
  5. The model generalizes well and fits accurately on real stomach CT scans.
  6. The shape model and synthetic dataset are publicly available on GitLab for further research.

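Statistical shape models of this kind are commonly built with PCA over meshes in dense correspondence: flatten each mesh to a vector, learn the principal modes, and sample new shapes along them. The sketch below shows that general recipe on random stand-in data; the paper's model and data are richer than this.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy statistical shape model: meshes in dense correspondence are flattened
# to vectors, PCA captures anatomy in a low-dimensional shape space, and new
# stomachs are sampled along the principal modes. Random stand-in data here.
n_shapes, n_vertices = 200, 5000
shapes = np.random.randn(n_shapes, n_vertices * 3)      # (x, y, z) per vertex

pca = PCA(n_components=10)
coeffs = pca.fit_transform(shapes)                       # shape-space coordinates

# Sample a plausible new shape: mean + 1.5 std along the first mode
b = np.zeros(10)
b[0] = 1.5 * np.sqrt(pca.explained_variance_[0])
new_shape = pca.mean_ + b @ pca.components_
print(new_shape.reshape(n_vertices, 3).shape)            # (5000, 3) vertices
```
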
Comparing Conditional Diffusion Models for Synthesizing Contrast-Enhanced Breast MRI from Pre-Contrast Images

Authors:Sebastian Ibarra, Javier del Riego, Alessandro Catanese, Julian Cuba, Julian Cardona, Nataly Leon, Jonathan Infante, Karim Lekadir, Oliver Diaz, Richard Osuala

Dynamic contrast-enhanced (DCE) MRI is essential for breast cancer diagnosis and treatment. However, its reliance on contrast agents introduces safety concerns, contraindications, increased cost, and workflow complexity. To this end, we present pre-contrast conditioned denoising diffusion probabilistic models to synthesize DCE-MRI, introducing, evaluating, and comparing a total of 22 generative model variants in both single-breast and full breast settings. Towards enhancing lesion fidelity, we introduce both tumor-aware loss functions and explicit tumor segmentation mask conditioning. Using a public multicenter dataset and comparing to respective pre-contrast baselines, we observe that subtraction image-based models consistently outperform post-contrast-based models across five complementary evaluation metrics. Apart from assessing the entire image, we also separately evaluate the region of interest, where both tumor-aware losses and segmentation mask inputs improve evaluation metrics. The latter notably enhance qualitative results capturing contrast uptake, albeit assuming access to tumor localization inputs that are not guaranteed to be available in screening settings. A reader study involving 2 radiologists and 4 MRI technologists confirms the high realism of the synthetic images, indicating an emerging clinical potential of generative contrast-enhancement. We share our codebase at https://github.com/sebastibar/conditional-diffusion-breast-MRI.

Paper & Project Links

PDF 13 pages, 5 figures, submitted and accepted to MICCAI Deepbreath workshop 2025

Summary

Dynamic contrast-enhanced (DCE) MRI is essential for breast cancer diagnosis and treatment, but contrast agents bring safety concerns, contraindications, higher cost, and workflow complexity. This work presents pre-contrast-conditioned denoising diffusion probabilistic models that synthesize DCE-MRI, introducing and comparing 22 generative model variants in single-breast and full-breast settings. Tumor-aware loss functions and explicit tumor segmentation mask conditioning improve lesion fidelity. On a public multicenter dataset, subtraction-image-based models consistently outperform post-contrast-based models across five complementary metrics; in region-of-interest evaluation, both tumor-aware losses and mask inputs help, the latter markedly improving the rendering of contrast uptake, though it assumes tumor localization inputs that may be unavailable in screening. A reader study with 2 radiologists and 4 MRI technologists confirms the high realism of the synthetic images, indicating emerging clinical potential for generative contrast enhancement.

Key Takeaways

  1. DCE-MRI is central to breast cancer diagnosis and treatment, but its reliance on contrast agents causes multiple problems.
  2. Pre-contrast-conditioned denoising diffusion probabilistic models are proposed to synthesize DCE-MRI.
  3. A total of 22 generative model variants are evaluated in single-breast and full-breast settings.
  4. Tumor-aware losses and tumor segmentation mask conditioning improve lesion fidelity.
  5. Subtraction-image-based models outperform post-contrast-based models across five complementary metrics (see the sketch below).
  6. A reader study confirms the high realism of the synthetic images.

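The subtraction-image formulation has a simple recombination step: the generative model predicts only the contrast uptake, and the synthetic post-contrast image is recovered by adding back the acquired pre-contrast image. A toy illustration with random arrays standing in for real scans and model output:

```python
import numpy as np

# Subtraction-image-based synthesis (sketch): instead of generating the
# post-contrast image directly, the model predicts the subtraction image
# (contrast uptake only); the synthetic post-contrast scan is then
# recovered by adding back the acquired pre-contrast image.
pre = np.random.rand(256, 256).astype(np.float32)                     # acquired pre-contrast
pred_subtraction = 0.2 * np.random.rand(256, 256).astype(np.float32)  # stand-in model output

synthetic_post = pre + pred_subtraction      # reconstructed DCE frame
print(synthetic_post.shape, float(synthetic_post.max()))
```
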
A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology

Authors:Katelyn Morrison, Arpit Mathur, Aidan Bradshaw, Tom Wartmann, Steven Lundi, Afrooz Zandifar, Weichang Dai, Kayhan Batmanghelich, Motahhare Eslami, Adam Perer

As text-to-image generative models rapidly improve, AI researchers are making significant advances in developing domain-specific models capable of generating complex medical imagery from text prompts. Despite this, these technical advancements have overlooked whether and how medical professionals would benefit from and use text-to-image generative AI (GenAI) in practice. By developing domain-specific GenAI without involving stakeholders, we risk the potential of building models that are either not useful or even more harmful than helpful. In this paper, we adopt a human-centered approach to responsible model development by involving stakeholders in evaluating and reflecting on the promises, risks, and challenges of a novel text-to-CT Scan GenAI model. Through exploratory model prompting activities, we uncover the perspectives of medical students, radiology trainees, and radiologists on the role that text-to-CT Scan GenAI can play across medical education, training, and practice. This human-centered approach additionally enabled us to surface technical challenges and domain-specific risks of generating synthetic medical images. We conclude by reflecting on the implications of medical text-to-image GenAI.

Paper & Project Links

PDF 10 pages of main content, Appendix attached after references, accepted to AAAI/ACM AIES 2025

Summary
As text-to-image generative models improve rapidly, domain-specific models can now generate complex medical imagery from text prompts, yet this progress has overlooked whether and how medical professionals would actually benefit from and use such tools. This paper takes a human-centered approach to responsible model development, involving stakeholders in evaluating and reflecting on the promises, risks, and challenges of a novel text-to-CT-scan generative AI model. Exploratory model-prompting activities surface the perspectives of medical students, radiology trainees, and radiologists on its role across medical education, training, and practice, along with the technical challenges and domain-specific risks of generating synthetic medical images.

Key Takeaways

  1. AI researchers have made significant advances in generating complex medical images from text prompts.
  2. These technical advances have overlooked how medical professionals would benefit from and use such tools in practice.
  3. Involving stakeholders from medical education, training, and practice is essential for developing text-to-CT-scan generative AI.
  4. A human-centered approach surfaces the potential risks and challenges of medical image generation.
  5. The perspectives of medical professionals help guide the direction and refinement of these systems.
  6. Development must attend to technical challenges and domain-specific risks.

OSDM-MReg: Multimodal Image Registration based One Step Diffusion Model

Authors:Xiaochen Wei, Weiwei Guo, Wenxian Yu, Feiming Wei, Dongying Li

Multimodal remote sensing image registration aligns images from different sensors for data fusion and analysis. However, existing methods often struggle to extract modality-invariant features when faced with large nonlinear radiometric differences, such as those between SAR and optical images. To address these challenges, we propose OSDM-MReg, a novel multimodal image registration framework that bridges the modality gap through image-to-image translation. Specifically, we introduce a one-step unaligned target-guided conditional diffusion model (UTGOS-CDM) to translate source and target images into a unified representation domain. Unlike traditional conditional DDPM that require hundreds of iterative steps for inference, our model incorporates a novel inverse translation objective during training to enable direct prediction of the translated image in a single step at test time, significantly accelerating the registration process. After translation, we design a multimodal multiscale registration network (MM-Reg) that extracts and fuses both unimodal and translated multimodal images using the proposed multimodal fusion strategy, enhancing the robustness and precision of alignment across scales and modalities. Extensive experiments on the OSdataset demonstrate that OSDM-MReg achieves superior registration accuracy compared to state-of-the-art methods.

Paper & Project Links

PDF This version updates our previous submission. After rerunning the experiments, we found that the proposed high-frequency perceptual loss did not improve the overall performance of the model. Therefore, we removed this component, revised the corresponding ablation studies, and updated the contributions accordingly. This work has been submitted to the IEEE for possible publication

Summary

Multimodal remote sensing image registration aligns images from different sensors for fusion and analysis, but large nonlinear radiometric differences (e.g., SAR vs. optical) make modality-invariant features hard to extract. OSDM-MReg bridges the modality gap through image-to-image translation: a one-step unaligned target-guided conditional diffusion model (UTGOS-CDM) maps source and target images into a unified representation domain. Unlike conditional DDPMs that need hundreds of inference steps, an inverse translation objective during training lets the model predict the translated image in a single step at test time, greatly accelerating registration. A multimodal multiscale registration network (MM-Reg) then fuses unimodal and translated multimodal images with the proposed multimodal fusion strategy, improving robustness and precision across scales and modalities. On the OSdataset, OSDM-MReg achieves higher registration accuracy than state-of-the-art methods.

Key Takeaways

  1. Multimodal remote sensing image registration is key to data fusion and analysis.
  2. Existing methods struggle to extract modality-invariant features under large nonlinear radiometric differences.
  3. OSDM-MReg narrows the modality gap through image-to-image translation.
  4. The UTGOS-CDM model predicts the translated image directly in a single step (see the sketch below).
  5. The MM-Reg network performs multimodal, multiscale registration.
  6. The proposed multimodal fusion strategy improves robustness and precision.

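The one-step idea can be sketched as an x0-prediction training objective: the network sees a noised target plus the source-modality condition and is trained to output the clean translated image directly, so test-time inference is a single forward pass. Everything below (the model signature, the noise schedule) is an illustrative assumption, not OSDM-MReg's implementation.

```python
import torch

def one_step_translation_loss(model, src, tgt, T=1000):
    """One-step x0-prediction objective for image translation (sketch).

    Instead of denoising over hundreds of steps, the network is trained to
    predict the clean target image directly from a noised target and the
    source-modality condition. `model(x_t, src, t)` is a placeholder
    signature, not OSDM-MReg's API.
    """
    t = torch.randint(0, T, (tgt.size(0),), device=tgt.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / T) ** 2  # toy schedule
    a = alpha_bar.view(-1, 1, 1, 1)
    noise = torch.randn_like(tgt)
    x_t = a.sqrt() * tgt + (1 - a).sqrt() * noise    # forward diffusion
    x0_pred = model(x_t, src, t)                     # direct clean prediction
    return ((x0_pred - tgt) ** 2).mean()

model = lambda x_t, src, t: x_t * 0 + src            # trivial stand-in network
src, tgt = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
print(one_step_translation_loss(model, src, tgt))    # scalar training loss
```
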
Automatic quality control in multi-centric fetal brain MRI super-resolution reconstruction

Authors:Thomas Sanchez, Vladyslav Zalevskyi, Angeline Mihailov, Gerard Martí-Juan, Elisenda Eixarch, Andras Jakab, Vincent Dunet, Mériam Koob, Guillaume Auzias, Meritxell Bach Cuadra

Quality control (QC) has long been considered essential to guarantee the reliability of neuroimaging studies. It is particularly important for fetal brain MRI, where acquisitions and image processing techniques are less standardized than in adult imaging. In this work, we focus on automated quality control of super-resolution reconstruction (SRR) volumes of fetal brain MRI, an important processing step where multiple stacks of thick 2D slices are registered together and combined to build a single, isotropic and artifact-free T2 weighted volume. We propose FetMRQC$_{SR}$, a machine-learning method that extracts more than 100 image quality metrics to predict image quality scores using a random forest model. This approach is well suited to a problem that is high dimensional, with highly heterogeneous data and small datasets. We validate FetMRQC$_{SR}$ in an out-of-domain (OOD) setting and report high performance (ROC AUC = 0.89), even when faced with data from an unknown site or SRR method. We also investigate failure cases and show that they occur in 45% of the images due to ambiguous configurations for which the rating from the expert is arguable. These results are encouraging and illustrate how a non deep learning-based method like FetMRQC$_{SR}$ is well suited to this multifaceted problem. Our tool, along with all the code used to generate, train and evaluate the model are available at https://github.com/Medical-Image-Analysis-Laboratory/fetmrqc_sr/ .

Paper & Project Links

PDF 14 pages, 5 figures; accepted at the 2025 MICCAI Perinatal, Preterm and Paediatric Image Analysis (PIPPI) Workshop

Summary
This work targets automated quality control (QC) of super-resolution reconstruction (SRR) volumes in fetal brain MRI. FetMRQC$_{SR}$ extracts more than 100 image quality metrics and predicts quality scores with a random forest, a good fit for a high-dimensional problem with heterogeneous data and small datasets. Validated out-of-domain, it performs strongly (ROC AUC = 0.89) even on data from unknown sites or SRR methods. An analysis of failure cases shows that 45% stem from ambiguous configurations where even the expert rating is arguable. The tool and all code for generating, training, and evaluating the model are public.

Key Insights

  1. The study addresses automated QC of super-resolution reconstruction (SRR) volumes in fetal brain MRI, a step essential to reliable neuroimaging studies.
  2. FetMRQC$_{SR}$ is a machine-learning method that predicts image quality scores from image quality metrics (see the sketch below).
  3. The approach suits high-dimensional problems with highly heterogeneous data and small datasets.
  4. Out-of-domain validation shows high performance (ROC AUC = 0.89), demonstrating robustness to data from unknown sources.
  5. 45% of failure cases arise from ambiguous configurations where expert ratings are themselves debatable.
  6. A non-deep-learning method like FetMRQC$_{SR}$ is well suited to this multifaceted problem.

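The overall QC recipe, many image quality metrics feeding a random forest that predicts a quality label, can be sketched with scikit-learn on synthetic stand-in features; the real model uses more than 100 IQMs extracted from SRR volumes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Sketch: a random forest maps per-volume image quality metrics (IQMs)
# to a pass/fail quality label. Random features stand in for real IQMs.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                 # 8 toy IQMs for 300 SRR volumes
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)) > 0

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:200], y[:200])                     # train on 200 volumes
scores = clf.predict_proba(X[200:])[:, 1]     # QC scores for held-out data
print(f"ROC AUC = {roc_auc_score(y[200:], scores):.2f}")
```
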
Accelerating Low-field MRI: From Compressed Sensing to Deep Learning Reconstruction with CNNs and Transformers

Authors:Efrat Shimron, Shanshan Shan, James Grover, Neha Koonjoo, Sheng Shen, Thomas Boele, Annabel J. Sorby-Adams, John E. Kirsch, Matthew S. Rosen, David E. J. Waddington

Portable, low-field Magnetic Resonance Imaging (MRI) scanners are increasingly being deployed in clinical settings. However, key barriers to their widespread use include low signal-to-noise ratio (SNR), generally low image quality, and long scan durations. Hence, methods for accelerating acquisition and boosting image quality are critically important to enable clinically actionable, high-quality imaging in these systems. Despite the role that compressed sensing (CS) and deep learning (DL)-based methods have played in improving image quality for high-field MRI, their adoption for low-field imaging is still in its infancy, and it remains unclear how robust these methods are in low-SNR regimes. Here, we propose, investigate, and compare four reconstruction approaches: (i) L1-wavelet CS; (ii) a data-driven network; (iii) an unrolled network; and (iv) a Swin Transformer Cascade. We evaluate their performance across a range of SNR values using publicly available datasets and ultra-low field (6.5 mT) MRI data. Our results show that the unrolled network and Swin Transformer cascade outperform CS and data-driven models. While transformer-based models achieve the highest performance at high SNR, unrolled convolution-based networks are more robust in ultra-low SNR settings and often outperform transformers, indicating that simpler DL architectures may be better suited to low-field MRI. This work highlights both the potential and limitations of advanced reconstruction techniques in low-field MRI and pinpoints effective DL strategies for addressing SNR challenges.

Paper & Project Links

PDF

Summary

Portable low-field MRI scanners face low signal-to-noise ratio (SNR), generally low image quality, and long scan durations, so accelerating acquisition and boosting image quality are critical. This work proposes and compares four reconstruction approaches: L1-wavelet compressed sensing (CS), a data-driven network, an unrolled network, and a Swin Transformer cascade, evaluated across a range of SNR values on public datasets and ultra-low-field (6.5 mT) MRI data. The unrolled network and the Swin Transformer cascade outperform the CS and data-driven models; transformer-based models are best at high SNR, while unrolled convolution-based networks are more robust at ultra-low SNR and often beat the transformers, suggesting that simpler deep learning architectures may suit low-field MRI better.

Key Takeaways

  • Low-field MRI scanners suffer from low SNR, low image quality, and long scan times.
  • Accelerating acquisition and improving image quality are key to clinically actionable low-field MRI.
  • Four reconstruction approaches are compared, from L1-wavelet compressed sensing to CNN- and transformer-based deep learning (see the sketch below).
  • Unrolled convolutional networks are robust in ultra-low-SNR settings and sometimes outperform transformer-based models.
  • The results indicate that simpler deep learning architectures can be better suited to low-field MRI.

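The compressed-sensing baseline can be illustrated with a few lines of ISTA: alternate a gradient step on k-space data consistency with soft-thresholding for sparsity. For brevity this toy applies the L1 penalty in the image domain rather than a wavelet frame, so it is a simplified stand-in for the L1-wavelet method, with made-up parameters.

```python
import numpy as np

def ista_cs_recon(kspace, mask, lam=0.01, step=1.0, iters=50):
    """L1-regularised CS reconstruction from undersampled k-space (sketch)."""
    x = np.zeros(kspace.shape, dtype=complex)
    for _ in range(iters):
        # gradient step on || mask * FFT(x) - kspace ||^2 (data consistency)
        resid = mask * np.fft.fft2(x, norm="ortho") - kspace
        x = x - step * np.fft.ifft2(resid, norm="ortho")
        # complex soft-thresholding enforces sparsity (image domain here)
        mag = np.abs(x)
        x = x / np.maximum(mag, 1e-12) * np.maximum(mag - step * lam, 0)
    return x

img = np.zeros((64, 64)); img[24:40, 24:40] = 1.0   # toy phantom
mask = np.random.rand(64, 64) < 0.4                  # keep ~40% of k-space
kspace = mask * np.fft.fft2(img, norm="ortho")
rec = ista_cs_recon(kspace, mask)
print(float(np.abs(rec - img).mean()))               # mean reconstruction error
```
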
Bayesian Unsupervised Disentanglement of Anatomy and Geometry for Deep Groupwise Image Registration

Authors:Xinzhe Luo, Xin Wang, Linda Shapiro, Chun Yuan, Jianfeng Feng, Xiahai Zhuang

This article presents a general Bayesian learning framework for multi-modal groupwise image registration. The method builds on probabilistic modelling of the image generative process, where the underlying common anatomy and geometric variations of the observed images are explicitly disentangled as latent variables. Therefore, groupwise image registration is achieved via hierarchical Bayesian inference. We propose a novel hierarchical variational auto-encoding architecture to realise the inference procedure of the latent variables, where the registration parameters can be explicitly estimated in a mathematically interpretable fashion. Remarkably, this new paradigm learns groupwise image registration in an unsupervised closed-loop self-reconstruction process, sparing the burden of designing complex image-based similarity measures. The computationally efficient disentangled network architecture is also inherently scalable and flexible, allowing for groupwise registration on large-scale image groups with variable sizes. Furthermore, the inferred structural representations from multi-modal images via disentanglement learning are capable of capturing the latent anatomy of the observations with visual semantics. Extensive experiments were conducted to validate the proposed framework, including four different datasets from cardiac, brain, and abdominal medical images. The results have demonstrated the superiority of our method over conventional similarity-based approaches in terms of accuracy, efficiency, scalability, and interpretability.

Paper & Project Links

PDF

Summary
This article presents a general Bayesian learning framework for multimodal groupwise image registration. Built on probabilistic modeling of the image generative process, it explicitly disentangles the common anatomy and geometric variations of the observed images as latent variables, so groupwise registration is achieved via hierarchical Bayesian inference. A novel hierarchical variational auto-encoding architecture realizes the inference, with registration parameters estimated explicitly and in a mathematically interpretable fashion. The framework learns registration in an unsupervised closed-loop self-reconstruction process, sparing the burden of designing complex image-based similarity measures; its computationally efficient disentangled architecture scales to large image groups of variable size, and the inferred structural representations capture the latent anatomy with visual semantics. Experiments on four cardiac, brain, and abdominal datasets show superiority over conventional similarity-based approaches in accuracy, efficiency, scalability, and interpretability.

Key Takeaways

  1. A Bayesian learning framework is proposed for multimodal groupwise image registration.
  2. Probabilistic modeling of the image generative process disentangles the common anatomy and geometric variations as latent variables.
  3. A hierarchical variational auto-encoding architecture performs inference over the latent variables.
  4. Groupwise registration is achieved via hierarchical Bayesian inference.
  5. Learning is an unsupervised closed-loop self-reconstruction process with no hand-crafted similarity measures.
  6. The framework is computationally efficient, scalable, and handles large image groups of variable size.
  7. Disentanglement learning yields structural representations with visual semantics from multimodal images.

SRSNetwork: Siamese Reconstruction-Segmentation Networks based on Dynamic-Parameter Convolution

Authors:Bingkun Nian, Fenghe Tang, Jianrui Ding, Jie Yang, Zhonglong Zheng, Shaohua Kevin Zhou, Wei Liu

Dynamic convolution demonstrates outstanding representation capabilities, which are crucial for natural image segmentation. However, it fails when applied to medical image segmentation (MIS) and infrared small target segmentation (IRSTS) due to limited data and limited fitting capacity. In this paper, we propose a new type of dynamic convolution called dynamic parameter convolution (DPConv) which shows superior fitting capacity, and it can efficiently leverage features from deep layers of encoder in reconstruction tasks to generate DPConv kernels that adapt to input variations. Moreover, we observe that DPConv, built upon deep features derived from reconstruction tasks, significantly enhances downstream segmentation performance. We refer to the segmentation network integrated with DPConv generated from reconstruction network as the siamese reconstruction-segmentation network (SRS). We conduct extensive experiments on seven datasets including five medical datasets and two infrared datasets, and the experimental results demonstrate that our method can show superior performance over several recently proposed methods. Furthermore, the zero-shot segmentation under unseen modality demonstrates the generalization of DPConv. The code is available at: https://github.com/fidshu/SRSNet.

Paper & Project Links

PDF Accepted by IEEE Transactions on Image Processing (IEEE-TIP)

Summary

Dynamic convolution represents natural images well but struggles in medical image segmentation and infrared small target segmentation, where data and fitting capacity are limited. This paper proposes dynamic parameter convolution (DPConv), which offers superior fitting capacity and efficiently leverages deep encoder features from reconstruction tasks to generate kernels that adapt to input variations. Pairing the segmentation network with DPConv kernels generated by the reconstruction network yields the siamese reconstruction-segmentation network (SRS), which significantly boosts downstream segmentation. Across seven datasets (five medical, two infrared), the method outperforms several recently proposed approaches, and zero-shot segmentation on an unseen modality demonstrates DPConv's generalization.

Key Takeaways

  1. Dynamic convolution shows excellent representation capability in natural image segmentation.
  2. In medical image segmentation and infrared small target segmentation, it fails due to limited data and limited fitting capacity.
  3. The proposed dynamic parameter convolution (DPConv) has superior fitting capacity.
  4. DPConv generates kernels from deep encoder features of a reconstruction task, adapting to input variations (see the sketch below).
  5. DPConv significantly improves downstream segmentation performance.
  6. Experiments on seven datasets show better performance than recent methods.
  7. Zero-shot segmentation under an unseen modality demonstrates good generalization.

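The kernel-generation idea behind DPConv can be sketched with a grouped-convolution trick: a small generator maps deep reconstruction features to per-sample kernels, which are then applied depthwise. The sizes and the depthwise form are illustrative assumptions, not the paper's exact operator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DPConvSketch(nn.Module):
    """Dynamic-parameter convolution sketch: kernels from deep features.

    A linear generator maps a per-sample feature vector (e.g. from the
    reconstruction branch's deep encoder layers) to 3x3 depthwise kernels,
    applied via grouped convolution so each sample gets its own weights.
    """
    def __init__(self, channels=16, feat_dim=128, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        self.gen = nn.Linear(feat_dim, channels * k * k)  # kernel generator

    def forward(self, x, deep_feat):   # x: (B, C, H, W); deep_feat: (B, feat_dim)
        B, C, H, W = x.shape
        w = self.gen(deep_feat).view(B * C, 1, self.k, self.k)
        out = F.conv2d(x.reshape(1, B * C, H, W), w,
                       padding=self.k // 2, groups=B * C)  # per-sample kernels
        return out.view(B, C, H, W)

dp = DPConvSketch()
y = dp(torch.randn(2, 16, 32, 32), torch.randn(2, 128))
print(y.shape)  # (2, 16, 32, 32): same map, input-adaptive filtering
```
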
Author: Kedreamix
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!