⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them in serious academic settings; they are intended only for a first-pass screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-20
Seeing Beyond the Image: ECG and Anatomical Knowledge-Guided Myocardial Scar Segmentation from Late Gadolinium-Enhanced Images
Authors:Farheen Ramzan, Yusuf Kiberu, Nikesh Jathanna, Meryem Jabrane, Vicente Grau, Shahnaz Jamil-Copley, Richard H. Clayton, Chen Chen
Accurate segmentation of myocardial scar from late gadolinium enhanced (LGE) cardiac MRI is essential for evaluating tissue viability, yet remains challenging due to variable contrast and imaging artifacts. Electrocardiogram (ECG) signals provide complementary physiological information, as conduction abnormalities can help localize or suggest scarred myocardial regions. In this work, we propose a novel multimodal framework that integrates ECG-derived electrophysiological information with anatomical priors from the AHA-17 atlas for physiologically consistent LGE-based scar segmentation. As ECGs and LGE-MRIs are not acquired simultaneously, we introduce a Temporal Aware Feature Fusion (TAFF) mechanism that dynamically weights and fuses features based on their acquisition time difference. Our method was evaluated on a clinical dataset and achieved substantial gains over the state-of-the-art image-only baseline (nnU-Net), increasing the average Dice score for scars from 0.6149 to 0.8463 and achieving high performance in both precision (0.9115) and sensitivity (0.9043). These results show that integrating physiological and anatomical knowledge allows the model to “see beyond the image”, setting a new direction for robust and physiologically grounded cardiac scar segmentation.
Paper and project links
Summary
This study proposes a novel multimodal framework that combines ECG-derived physiological information with anatomical priors from the AHA-17 atlas for physiologically consistent scar segmentation from late gadolinium-enhanced (LGE) cardiac MRI. Because ECGs and LGE-MRIs are not acquired simultaneously, a Temporal Aware Feature Fusion (TAFF) mechanism dynamically weights and fuses features according to their acquisition time difference. Evaluated on a clinical dataset, the method substantially outperforms the state-of-the-art image-only baseline (nnU-Net), raising the average Dice score for scars from 0.6149 to 0.8463 while also achieving strong precision and sensitivity. This shows that integrating physiological and anatomical knowledge lets the model "see beyond the image", setting a new direction for robust, physiologically grounded cardiac scar segmentation.
Key Takeaways
- The study aims to segment myocardial scar accurately by combining electrocardiogram (ECG) signals with late gadolinium-enhanced (LGE) cardiac MRI data.
- A novel multimodal framework integrates ECG-derived physiological information with anatomical priors from the AHA-17 atlas.
- A Temporal Aware Feature Fusion (TAFF) mechanism fuses the non-simultaneously acquired data, dynamically weighting features by their acquisition time difference.
- Compared with image-only methods, the framework raises the average Dice score for scar segmentation from 0.6149 to 0.8463.
- The method also achieves strong precision and sensitivity, at 0.9115 and 0.9043 respectively.
- Integrating physiological and anatomical knowledge lets the model "see beyond the image", opening a new research direction for cardiac scar segmentation.
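The abstract does not specify TAFF's exact form; as a minimal sketch of the idea — discounting ECG features as the gap between ECG and MRI acquisition grows — one might use an exponential decay. Both the decay form and the time constant `tau` below are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def taff_fuse(img_feat, ecg_feat, delta_t_days, tau=30.0):
    """Fuse image and ECG feature vectors, discounting the ECG
    features by an exponential decay in the acquisition time gap.
    `tau` (days) is a hypothetical time constant, not from the paper."""
    w = np.exp(-abs(delta_t_days) / tau)  # weight in (0, 1]
    return img_feat + w * ecg_feat        # simple additive fusion

img = np.ones(4)
ecg = np.ones(4)
same_day = taff_fuse(img, ecg, delta_t_days=0.0)    # full ECG contribution
far_apart = taff_fuse(img, ecg, delta_t_days=90.0)  # heavily discounted ECG
```

A learned variant would let the network predict the weight from the time difference instead of fixing the decay.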
Click here to view paper screenshots
NERD: Network-Regularized Diffusion Sampling For 3D Computed Tomography
Authors:Shijun Liang, Ismail Alkhouri, Qing Qu, Rongrong Wang, Saiprasad Ravishankar
Numerous diffusion model (DM)-based methods have been proposed for solving inverse imaging problems. Among these, a recent line of work has demonstrated strong performance by formulating sampling as an optimization procedure that enforces measurement consistency, forward diffusion consistency, and both step-wise and backward diffusion consistency. However, these methods have only considered 2D reconstruction tasks and do not directly extend to 3D image reconstruction problems, such as in Computed Tomography (CT). To bridge this gap, we propose NEtwork-Regularized diffusion sampling for 3D CT (NERD) by incorporating an L1 regularization into the optimization objective. This regularizer encourages spatial continuity across adjacent slices, reducing inter-slice artifacts and promoting coherent volumetric reconstructions. Additionally, we introduce two efficient optimization strategies to solve the resulting objective: one based on the Alternating Direction Method of Multipliers (ADMM) and another based on the Primal-Dual Hybrid Gradient (PDHG) method. Experiments on medical 3D CT data demonstrate that our approach achieves either state-of-the-art or highly competitive results.
Paper and project links
Summary
This paper proposes NERD, a network-regularized diffusion sampling method for 3D CT reconstruction that adds an L1 regularization term to the sampling optimization objective. The regularizer encourages spatial continuity across adjacent slices, reducing inter-slice artifacts and promoting coherent volumetric reconstructions. Two efficient optimization strategies are introduced to solve the resulting objective, one based on the Alternating Direction Method of Multipliers (ADMM) and one on the Primal-Dual Hybrid Gradient (PDHG) method. Experiments on medical 3D CT data show state-of-the-art or highly competitive results.
Key Takeaways
- NERD, a network-regularized diffusion sampling method, is proposed for 3D CT reconstruction.
- It combines diffusion-model sampling with L1 regularization to improve spatial continuity across adjacent slices.
- The L1 regularizer helps reduce inter-slice artifacts and promotes coherent volumetric reconstruction.
- Two optimization strategies are introduced, based on ADMM and PDHG respectively.
- Experiments show NERD achieves state-of-the-art or highly competitive results on medical 3D CT data.
- The approach offers a new perspective on other 3D image reconstruction problems.
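As a toy illustration of the inter-slice L1 idea (not the paper's full objective, which also enforces measurement and diffusion consistency), the penalty and the soft-thresholding proximal step that an ADMM splitting of an L1 term typically uses can be sketched as:

```python
import numpy as np

def interslice_l1(vol):
    """L1 penalty on differences between adjacent slices (axis 0)."""
    return np.abs(np.diff(vol, axis=0)).sum()

def soft_threshold(x, lam):
    """Proximal operator of lam*||.||_1 — the closed-form update an
    ADMM splitting of the L1 term would use."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

vol = np.zeros((3, 2, 2))
vol[1] += 1.0                  # one slice jumps out of the volume
print(interslice_l1(vol))      # 8.0: two unit jumps over 4 pixels each
```

Minimizing this penalty pulls the outlier slice back toward its neighbors, which is exactly the inter-slice coherence the method targets.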
Click here to view paper screenshots
Improving segmentation of retinal arteries and veins using cardiac signal in doppler holograms
Authors:Marius Dubosc, Yann Fischer, Zacharie Auray, Nicolas Boutry, Edwin Carlinet, Michael Atlan, Thierry Geraud
Doppler holography is an emerging retinal imaging technique that captures the dynamic behavior of blood flow with high temporal resolution, enabling quantitative assessment of retinal hemodynamics. This requires accurate segmentation of retinal arteries and veins, but traditional segmentation methods focus solely on spatial information and overlook the temporal richness of holographic data. In this work, we propose a simple yet effective approach for artery-vein segmentation in temporal Doppler holograms using standard segmentation architectures. By incorporating features derived from a dedicated pulse analysis pipeline, our method allows conventional U-Nets to exploit temporal dynamics and achieve performance comparable to more complex attention- or iteration-based models. These findings demonstrate that time-resolved preprocessing can unlock the full potential of deep learning for Doppler holography, opening new perspectives for quantitative exploration of retinal hemodynamics. The dataset is publicly available at https://huggingface.co/datasets/DigitalHolography/
Paper and project links
PDF 5 pages, 3 figures, 1 table. Submitted to ISBI2026
Summary
Doppler holography is an emerging retinal imaging technique that captures the dynamic behavior of blood flow with high temporal resolution, enabling quantitative assessment of retinal hemodynamics. This work proposes a simple yet effective artery-vein segmentation method that feeds features from a dedicated pulse analysis pipeline into standard segmentation architectures, allowing conventional U-Nets to exploit temporal dynamics and match the performance of more complex attention- or iteration-based models. Time-resolved preprocessing thus unlocks the potential of deep learning for Doppler holography and opens new perspectives for quantitative study of retinal hemodynamics.
Key Takeaways
- Doppler holography is an emerging retinal imaging technique that captures blood-flow dynamics with high temporal resolution.
- Traditional segmentation methods focus on spatial information and overlook the temporal richness of holographic data.
- The proposed artery-vein segmentation method combines standard segmentation architectures with features from a pulse analysis pipeline.
- By exploiting temporal dynamics, it matches the performance of more complex attention- or iteration-based models.
- Time-resolved preprocessing can unlock the potential of deep learning for Doppler holography.
- The dataset is publicly released for use.
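The abstract does not detail the pulse analysis pipeline; one plausible time-resolved feature — per-pixel spectral power of the temporal signal in a cardiac frequency band, which could be stacked as an extra U-Net input channel — might be sketched as follows (the band limits and sampling rate are illustrative assumptions):

```python
import numpy as np

def cardiac_power_map(frames, fps, f_lo=0.8, f_hi=3.0):
    """Per-pixel power in a cardiac frequency band (Hz) of a
    (T, H, W) temporal stack. Band limits are illustrative."""
    T = frames.shape[0]
    spec = np.abs(np.fft.rfft(frames, axis=0)) ** 2   # power spectrum over time
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[band].sum(axis=0)                      # (H, W) feature map

fps, T = 30.0, 300
t = np.arange(T) / fps
frames = np.zeros((T, 2, 2))
frames[:, 0, 0] = np.sin(2 * np.pi * 1.2 * t)  # pulsatile pixel at 1.2 Hz
pmap = cardiac_power_map(frames, fps)           # large only at the pulsatile pixel
```

Pixels over pulsatile vessels then stand out against static background, giving the network an explicit temporal cue.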
Click here to view paper screenshots
RepAir: A Framework for Airway Segmentation and Discontinuity Correction in CT
Authors:John M. Oyer, Ali Namvar, Benjamin A. Hoff, Wassim W. Labaki, Ella A. Kazerooni, Charles R. Hatt, Fernando J. Martinez, MeiLan K. Han, Craig J. Galbán, Sundaresh Ram
Accurate airway segmentation from chest computed tomography (CT) scans is essential for quantitative lung analysis, yet manual annotation is impractical and many automated U-Net-based methods yield disconnected components that hinder reliable biomarker extraction. We present RepAir, a three-stage framework for robust 3D airway segmentation that combines an nnU-Net-based network with anatomically informed topology correction. The segmentation network produces an initial airway mask, after which a skeleton-based algorithm identifies potential discontinuities and proposes reconnections. A 1D convolutional classifier then determines which candidate links correspond to true anatomical branches versus false or obstructed paths. We evaluate RepAir on two distinct datasets: ATM’22, comprising annotated CT scans from predominantly healthy subjects and AeroPath, encompassing annotated scans with severe airway pathology. Across both datasets, RepAir outperforms existing 3D U-Net-based approaches such as Bronchinet and NaviAirway on both voxel-level and topological metrics, and produces more complete and anatomically consistent airway trees while maintaining high segmentation accuracy.
Paper and project links
PDF 4 pages, 3 figures, 1 table. Preprint submitted to SSIAI 2026 Conference on November 17, 2025
Summary
RepAir is a robust three-stage framework for 3D airway segmentation that couples an nnU-Net-based network with anatomically informed topology correction, addressing the impracticality of manual annotation and the disconnected components produced by existing U-Net-based methods. The segmentation network first produces an initial airway mask; a skeleton-based algorithm then identifies potential discontinuities and proposes reconnections; finally, a 1D convolutional classifier decides which candidate links correspond to true anatomical branches rather than false or obstructed paths. On the ATM'22 and AeroPath datasets, RepAir outperforms existing 3D U-Net-based approaches such as Bronchinet and NaviAirway on both voxel-level and topological metrics, producing more complete and anatomically consistent airway trees while maintaining high segmentation accuracy.
Key Takeaways
- RepAir is a robust three-stage framework for 3D airway segmentation that combines an nnU-Net-based network with anatomically informed topology correction.
- Airway segmentation is challenging: manual annotation is impractical, and existing U-Net-based methods produce disconnected components.
- The three stages are: generating an initial airway mask, skeleton-based identification of discontinuities with proposed reconnections, and a 1D convolutional classifier that separates true anatomical branches from false paths.
- Evaluation covers two datasets: ATM'22 (predominantly healthy subjects) and AeroPath (severe airway pathology).
- RepAir outperforms existing 3D U-Net-based methods (such as Bronchinet and NaviAirway) on both voxel-level and topological metrics.
- RepAir produces more complete and anatomically consistent airway trees.
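The skeleton and classifier stages are not specified beyond the abstract; a toy sketch of the reconnection-proposal idea — taking the closest pair of voxels between two disconnected skeleton components as a candidate link (the real method reasons over skeleton endpoints and branch geometry) — could be:

```python
import numpy as np

def propose_link(comp_a, comp_b):
    """Given two arrays of 3-D voxel coordinates from disconnected
    skeleton components, return the closest voxel pair as a candidate
    reconnection (a stand-in for the paper's skeleton-based step)."""
    d = np.linalg.norm(comp_a[:, None, :] - comp_b[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return comp_a[i], comp_b[j], d[i, j]

a = np.array([[0, 0, 0], [0, 0, 1]])   # one airway fragment
b = np.array([[0, 0, 4], [0, 0, 5]])   # another, separated by a gap
p, q, gap = propose_link(a, b)          # candidate bridge across the gap
```

Each candidate link would then be accepted or rejected by the 1D convolutional classifier described in the abstract.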
Click here to view paper screenshots
SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction
Authors:Meiying Gu, Jiawei Zhang, Jiahe Li, Xiaohan Yu, Haonan Luo, Jin Zheng, Xiao Bai
Recent advances in optimizing Gaussian Splatting for scene geometry have enabled efficient reconstruction of detailed surfaces from images. However, when input views are sparse, such optimization is prone to overfitting, leading to suboptimal reconstruction quality. Existing approaches address this challenge by employing flattened Gaussian primitives to better fit surface geometry, combined with depth regularization to alleviate geometric ambiguities under limited viewpoints. Nevertheless, the increased anisotropy inherent in flattened Gaussians exacerbates overfitting in sparse-view scenarios, hindering accurate surface fitting and degrading novel view synthesis performance. In this paper, we propose SparseSurf, a method that reconstructs more accurate and detailed surfaces while preserving high-quality novel view rendering. Our key insight is to introduce Stereo Geometry-Texture Alignment, which bridges rendering quality and geometry estimation, thereby jointly enhancing both surface reconstruction and view synthesis. In addition, we present a Pseudo-Feature Enhanced Geometry Consistency that enforces multi-view geometric consistency by incorporating both training and unseen views, effectively mitigating overfitting caused by sparse supervision. Extensive experiments on the DTU, BlendedMVS, and Mip-NeRF360 datasets demonstrate that our method achieves state-of-the-art performance.
Paper and project links
PDF Accepted at AAAI 2026. Project page: https://miya-oi.github.io/SparseSurf-project
Summary
Recent advances in optimizing Gaussian Splatting for scene geometry enable detailed surface reconstruction from images, but with sparse input views the optimization tends to overfit, degrading reconstruction quality. This paper proposes SparseSurf, which reconstructs more accurate, more detailed surfaces while preserving high-quality novel view rendering. Its key insight is a Stereo Geometry-Texture Alignment that bridges rendering quality and geometry estimation, jointly improving surface reconstruction and view synthesis. A Pseudo-Feature Enhanced Geometry Consistency further enforces multi-view geometric consistency across both training and unseen views, effectively mitigating overfitting caused by sparse supervision. Extensive experiments on the DTU, BlendedMVS, and Mip-NeRF360 datasets show state-of-the-art performance.
Key Takeaways
- Recent advances in Gaussian Splatting optimization enable detailed surface reconstruction from images.
- With sparse input views, existing techniques are prone to overfitting, which degrades reconstruction quality.
- SparseSurf reconstructs more accurate surfaces while preserving high-quality novel view rendering.
- Stereo Geometry-Texture Alignment bridges rendering quality and geometry estimation, jointly enhancing surface reconstruction and view synthesis.
- Pseudo-Feature Enhanced Geometry Consistency enforces multi-view consistency over training and unseen views, mitigating overfitting from sparse supervision.
- Experiments on multiple datasets show the method achieves state-of-the-art performance.
Click here to view paper screenshots
XAttn-BMD: Multimodal Deep Learning with Cross-Attention for Femoral Neck Bone Mineral Density Estimation
Authors:Yilin Zhang, Leo D. Westbury, Elaine M. Dennison, Nicholas C. Harvey, Nicholas R. Fuggle, Rahman Attar
Poor bone health is a significant public health concern, and low bone mineral density (BMD) leads to an increased fracture risk, a key feature of osteoporosis. We present XAttn-BMD (Cross-Attention BMD), a multimodal deep learning framework that predicts femoral neck BMD from hip X-ray images and structured clinical metadata. It utilizes a novel bidirectional cross-attention mechanism to dynamically integrate image and metadata features for cross-modal mutual reinforcement. A Weighted Smooth L1 loss is tailored to address BMD imbalance and prioritize clinically significant cases. Extensive experiments on the data from the Hertfordshire Cohort Study show that our model outperforms the baseline models in regression generalization and robustness. Ablation studies confirm the effectiveness of both cross-attention fusion and the customized loss function. Experimental results show that the integration of multimodal data via cross-attention outperforms naive feature concatenation without cross-attention, reducing MSE by 16.7%, MAE by 6.03%, and increasing the R2 score by 16.4%, highlighting the effectiveness of the approach for femoral neck BMD estimation. Furthermore, screening performance was evaluated using binary classification at clinically relevant femoral neck BMD thresholds, demonstrating the model’s potential in real-world scenarios.
Paper and project links
PDF 11 figures, 10 tables, 38 pages. Submitted to Artificial Intelligence in Medicine (currently with editor)
Summary
XAttn-BMD is a multimodal deep learning framework that predicts femoral neck bone mineral density (BMD) from hip X-ray images and structured clinical metadata. A bidirectional cross-attention mechanism dynamically fuses image and metadata features for cross-modal mutual reinforcement, and a Weighted Smooth L1 loss addresses BMD imbalance while prioritizing clinically significant cases. Extensive experiments on data from the Hertfordshire Cohort Study show that the model outperforms baseline models in regression generalization and robustness.
Key Takeaways
- XAttn-BMD is a multimodal deep learning framework for predicting femoral neck bone mineral density (BMD).
- It dynamically fuses hip X-ray images with structured clinical metadata.
- A bidirectional cross-attention mechanism reinforces the interaction between image and metadata features.
- A Weighted Smooth L1 loss handles BMD imbalance and prioritizes clinically significant cases.
- On data from the Hertfordshire Cohort Study, XAttn-BMD outperforms baseline models in regression generalization and robustness.
- Cross-attention fusion of multimodal data outperforms naive feature concatenation without cross-attention, improving MSE, MAE, and the R² score.
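The paper's exact weighting rule is not given in the abstract; a hedged sketch of a per-sample-weighted Smooth L1 (Huber-style) loss, where the weights up-weighting low-BMD, clinically significant cases are an assumed example, is:

```python
import numpy as np

def weighted_smooth_l1(pred, target, weights, beta=1.0):
    """Smooth L1 (Huber-style) regression loss with per-sample weights.
    The weighting below is illustrative, not the paper's scheme."""
    diff = np.abs(pred - target)
    per = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return float(np.average(per, weights=weights))

pred   = np.array([0.6, 0.9])
target = np.array([0.5, 0.9])
w_lo   = np.array([3.0, 1.0])   # e.g. up-weight a hypothetical low-BMD case
loss = weighted_smooth_l1(pred, target, w_lo)
```

Because the error falls on the up-weighted sample, this loss exceeds its unweighted counterpart, pushing the model to fit the rare low-BMD regime.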
Click here to view paper screenshots
MRI Embeddings Complement Clinical Predictors for Cognitive Decline Modeling in Alzheimer’s Disease Cohorts
Authors:Nathaniel Putera, Daniel Vilet Rodríguez, Noah Videcrantz, Julia Machnio, Mostafa Mehdipour Ghazi
Accurate modeling of cognitive decline in Alzheimer’s disease is essential for early stratification and personalized management. While tabular predictors provide robust markers of global risk, their ability to capture subtle brain changes remains limited. In this study, we evaluate the predictive contributions of tabular and imaging-based representations, with a focus on transformer-derived Magnetic Resonance Imaging (MRI) embeddings. We introduce a trajectory-aware labeling strategy based on Dynamic Time Warping clustering to capture heterogeneous patterns of cognitive change, and train a 3D Vision Transformer (ViT) via unsupervised reconstruction on harmonized and augmented MRI data to obtain anatomy-preserving embeddings without progression labels. The pretrained encoder embeddings are subsequently assessed using both traditional machine learning classifiers and deep learning heads, and compared against tabular representations and convolutional network baselines. Results highlight complementary strengths across modalities. Clinical and volumetric features achieved the highest AUCs of around 0.70 for predicting mild and severe progression, underscoring their utility in capturing global decline trajectories. In contrast, MRI embeddings from the ViT model were most effective in distinguishing cognitively stable individuals with an AUC of 0.71. However, all approaches struggled in the heterogeneous moderate group. These findings indicate that clinical features excel in identifying high-risk extremes, whereas transformer-based MRI embeddings are more sensitive to subtle markers of stability, motivating multimodal fusion strategies for AD progression modeling.
Paper and project links
PDF Accepted at SPIE - Medical Imaging Conference 2026
Summary
Accurate modeling of cognitive decline in Alzheimer's disease is essential for early stratification and personalized management. This study evaluates the predictive contributions of tabular and imaging-based representations, focusing on transformer-derived MRI embeddings. A trajectory-aware labeling strategy based on Dynamic Time Warping clustering captures heterogeneous patterns of cognitive change, and a 3D Vision Transformer (ViT) trained via unsupervised reconstruction on harmonized, augmented MRI data yields anatomy-preserving embeddings without progression labels. The results reveal complementary strengths: clinical and volumetric features achieve the highest AUCs (around 0.70) for predicting mild and severe progression, while the ViT's MRI embeddings best distinguish cognitively stable individuals (AUC 0.71). All approaches struggle on the heterogeneous moderate group.
Key Takeaways
- Accurate modeling of cognitive decline in Alzheimer's disease is essential for early stratification and personalized management.
- The study evaluates tabular predictors against imaging representations, particularly transformer-derived MRI embeddings.
- A trajectory-aware labeling strategy based on Dynamic Time Warping clustering captures heterogeneous patterns of cognitive change.
- A ViT trained via unsupervised reconstruction produces anatomy-preserving MRI embeddings.
- Clinical and volumetric features excel at predicting high-risk extremes, while transformer-based MRI embeddings are more sensitive to subtle markers of stability.
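As a reference point for the trajectory-aware labeling, the Dynamic Time Warping distance underlying the clustering can be computed with the classic dynamic program (the cognitive-score trajectories below are made up for illustration):

```python
import numpy as np

def dtw(a, b):
    """Dynamic Time Warping distance between two 1-D trajectories,
    the metric underlying the paper's trajectory clustering."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

fast = [30, 28, 24, 20]          # steep cognitive decline (toy scores)
slow = [30, 30, 29, 29]          # stable trajectory
shifted_fast = [30, 30, 28, 24]  # same decline pattern, delayed one visit
```

DTW tolerates the temporal shift, so the two decline trajectories group together while the stable one stays far away — exactly what clustering on DTW distances exploits.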
Click here to view paper screenshots
CCSD: Cross-Modal Compositional Self-Distillation for Robust Brain Tumor Segmentation with Missing Modalities
Authors:Dongqing Xie, Yonghuang Wu, Zisheng Ai, Jun Min, Zhencun Jiang, Shaojin Geng, Lei Wang
The accurate segmentation of brain tumors from multi-modal MRI is critical for clinical diagnosis and treatment planning. While integrating complementary information from various MRI sequences is a common practice, the frequent absence of one or more modalities in real-world clinical settings poses a significant challenge, severely compromising the performance and generalizability of deep learning-based segmentation models. To address this challenge, we propose a novel Cross-Modal Compositional Self-Distillation (CCSD) framework that can flexibly handle arbitrary combinations of input modalities. CCSD adopts a shared-specific encoder-decoder architecture and incorporates two self-distillation strategies: (i) a hierarchical modality self-distillation mechanism that transfers knowledge across modality hierarchies to reduce semantic discrepancies, and (ii) a progressive modality combination distillation approach that enhances robustness to missing modalities by simulating gradual modality dropout during training. Extensive experiments on public brain tumor segmentation benchmarks demonstrate that CCSD achieves state-of-the-art performance across various missing-modality scenarios, with strong generalization and stability.
Paper and project links
PDF 9 pages, 5 figures
Summary
To cope with the MRI modalities that are frequently missing in real clinical settings, this paper proposes a Cross-Modal Compositional Self-Distillation (CCSD) framework that flexibly handles arbitrary combinations of input modalities. CCSD adopts a shared-specific encoder-decoder architecture with two self-distillation strategies: hierarchical modality self-distillation, which transfers knowledge across modality hierarchies to reduce semantic discrepancies, and progressive modality combination distillation, which simulates gradual modality dropout during training to improve robustness. Experiments on public brain tumor segmentation benchmarks show state-of-the-art performance across missing-modality scenarios, with strong generalization and stability.
Key Takeaways
- Accurate multi-modal MRI brain tumor segmentation is critical for clinical diagnosis and treatment planning.
- In real clinical settings, one or more MRI modalities are frequently missing, which poses a major challenge.
- The proposed CCSD framework flexibly handles arbitrary combinations of input modalities.
- CCSD adopts a shared-specific encoder-decoder architecture.
- CCSD incorporates two self-distillation strategies: hierarchical modality self-distillation and progressive modality combination distillation.
- Hierarchical modality self-distillation reduces semantic discrepancies, while progressive modality combination distillation strengthens robustness to missing modalities.
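The abstract describes simulating gradual modality dropout during training; a toy sampling schedule (the linear ramp and probabilities are assumptions, not the paper's) could look like:

```python
import random

MODALITIES = ["T1", "T1ce", "T2", "FLAIR"]

def sample_modalities(epoch, max_epoch, rng=random):
    """Progressively drop modalities: the drop probability ramps up
    with training progress (the schedule is illustrative, not the
    paper's). At least one modality is always kept."""
    p_drop = 0.5 * epoch / max_epoch
    kept = [m for m in MODALITIES if rng.random() > p_drop]
    return kept if kept else [rng.choice(MODALITIES)]

early = sample_modalities(0, 100, random.Random(0))    # all modalities kept
late = sample_modalities(100, 100, random.Random(1))   # some likely dropped
```

Starting with full input and dropping more modalities later lets the student gradually learn to match the full-modality teacher under increasingly incomplete inputs.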
Click here to view paper screenshots
A comparison of time-dependent Cloudy astrophysical code simulations with experimental X-ray spectra from keV laser-generated argon plasmas
Authors:N. Rathee, F. P. Keenan, R. J. R. Williams, G. J. Ferland, S. J. Rose, S. White, D. Riley
We have generated strongly photoionized Ar plasmas in experiments designed to use primarily X-ray L-shell line emission generated from Ag foils irradiated by the VULCAN high-power laser at the UK Central Laser Facility. The principle of the experiment is that use of line emission rather than the usual sub-keV quasi-blackbody source allows keV radiation to play a more dominant role compared to softer X-rays and thus mimic the effect of a blackbody with a higher effective spectral temperature. Our aim is to reproduce in the laboratory the extreme photoionization conditions found in accretion-powered astrophysical sources. In this paper, we compare the experimental results on K-$β$ X-ray Ar spectra with modelling using the time-dependent version of the Cloudy astrophysical code. The results indicate that photoionized laboratory plasmas can be successfully modelled with codes such as Cloudy that have been developed for application to astrophysical sources. Our comparison of simulation and experiment shows that the flux of sub-keV photons that photoionize the outer-shell electrons can have a significant effect, and that detailed measurements of the X-ray drive spectrum across all photon energy ranges are crucial for accurate modelling of experiments.
Paper and project links
Summary
Strongly photoionized Ar plasmas were generated using X-ray L-shell line emission from Ag foils irradiated by the VULCAN high-power laser at the UK Central Laser Facility. Using line emission rather than the usual sub-keV quasi-blackbody source lets keV radiation play a more dominant role relative to softer X-rays, mimicking a blackbody with a higher effective spectral temperature; the aim is to reproduce in the laboratory the extreme photoionization conditions found in accretion-powered astrophysical sources. Experimental K-β X-ray Ar spectra are compared with modelling using the time-dependent version of the Cloudy astrophysical code. The comparison shows that photoionized laboratory plasmas can be successfully modelled with codes such as Cloudy, that the flux of sub-keV photons photoionizing outer-shell electrons can have a significant effect, and that detailed measurements of the X-ray drive spectrum across all photon energy ranges are crucial for accurate modelling of experiments.
Key Takeaways
- Strongly photoionized Ar plasmas were generated by irradiating Ag foils with the VULCAN high-power laser at the UK Central Laser Facility.
- The experiment uses line emission to mimic a blackbody with a higher effective spectral temperature, letting keV radiation play the dominant role.
- The goal is to reproduce in the laboratory the extreme photoionization conditions found in accretion-powered astrophysical sources.
- The experimental K-β X-ray Ar spectra compare well with time-dependent Cloudy modelling.
- The Cloudy code, developed for astrophysical sources, is successfully applied to modelling laboratory photoionized plasmas.
- The comparison of simulation and experiment shows that the flux of sub-keV photons significantly affects outer-shell photoionization.
Click here to view paper screenshots
D-PerceptCT: Deep Perceptual Enhancement for Low-Dose CT Images
Authors:Taifour Yousra Nabila, Azeddine Beghdadi, Marie Luong, Zuheng Ming, Habib Zaidi, Faouzi Alaya Cheikh
Low Dose Computed Tomography (LDCT) is widely used as an imaging solution to aid diagnosis and other clinical tasks. However, this comes at the price of a deterioration in image quality due to the low dose of radiation used to reduce the risk of secondary cancer development. While some efficient methods have been proposed to enhance LDCT quality, many overestimate noise and perform excessive smoothing, leading to a loss of critical details. In this paper, we introduce D-PerceptCT, a novel architecture inspired by key principles of the Human Visual System (HVS) to enhance LDCT images. The objective is to guide the model to enhance or preserve perceptually relevant features, thereby providing radiologists with CT images where critical anatomical structures and fine pathological details are perceptually visible. D-PerceptCT consists of two main blocks: (1) a Visual Dual-path Extractor (ViDex), which integrates semantic priors from a pretrained DINOv2 model with local spatial features, allowing the network to incorporate semantic-awareness during enhancement; (2) a Global-Local State-Space block that captures long-range information and multiscale features to preserve the important structures and fine details for diagnosis. In addition, we propose a novel deep perceptual loss, designated as the Deep Perceptual Relevancy Loss Function (DPRLF), which is inspired by human contrast sensitivity, to further emphasize perceptually important features. Extensive experiments on the Mayo2016 dataset demonstrate the effectiveness of the D-PerceptCT method for LDCT enhancement, showing better preservation of structural and textural information within LDCT images compared to SOTA methods.
Paper and project links
Summary
Low-dose CT (LDCT) reduces radiation risk at the cost of image quality, and existing enhancement methods often overestimate noise and over-smooth, losing critical details. D-PerceptCT, inspired by key principles of the Human Visual System (HVS), enhances LDCT images so that critical anatomical structures and fine pathological details remain perceptually visible to radiologists. It comprises a Visual Dual-path Extractor (ViDex), which combines semantic priors from a pretrained DINOv2 model with local spatial features for semantic-aware enhancement, and a Global-Local State-Space block that captures long-range information and multiscale features to preserve important structures and fine diagnostic details. A novel Deep Perceptual Relevancy Loss Function (DPRLF), inspired by human contrast sensitivity, further emphasizes perceptually important features. Experiments on the Mayo2016 dataset show better preservation of structural and textural information than state-of-the-art methods.
Key Takeaways
- LDCT is widely used because its low radiation dose reduces cancer risk, but this degrades image quality.
- Existing enhancement methods tend to overestimate noise and over-smooth, losing critical details.
- D-PerceptCT addresses this by drawing on principles of the Human Visual System to enhance LDCT images.
- D-PerceptCT has two main blocks, the Visual Dual-path Extractor (ViDex) and a Global-Local State-Space block, which enhance perceptually relevant features while preserving key anatomical structures and pathological details.
- A new Deep Perceptual Relevancy Loss Function (DPRLF), based on human contrast sensitivity, emphasizes perceptually important features.
- Experiments on the Mayo2016 dataset show that D-PerceptCT preserves structural and textural information in LDCT images better than existing methods.
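DPRLF itself is not specified in the abstract; to illustrate what "inspired by human contrast sensitivity" can mean, here is the classic Mannos-Sakrison contrast sensitivity function, which could in principle weight frequency components of an error map (using it inside DPRLF is an assumption, not the paper's formulation):

```python
import numpy as np

def csf(f):
    """Mannos-Sakrison contrast sensitivity model: relative response
    of the human visual system vs. spatial frequency f (cycles/degree).
    Peaks at mid frequencies, where the eye is most sensitive."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

freqs = np.array([0.5, 8.0, 30.0])   # low, mid, high spatial frequency
weights = csf(freqs)                  # mid frequency gets the largest weight
```

A CSF-weighted loss penalizes errors most where humans would actually notice them, which matches the paper's stated motivation.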
Click here to view paper screenshots
Segmentation-Aware Latent Diffusion for Satellite Image Super-Resolution: Enabling Smallholder Farm Boundary Delineation
Authors:Aditi Agarwal, Anjali Jain, Nikita Saxena, Ishan Deshpande, Michal Kazmierski, Abigail Annkah, Nadav Sherman, Karthikeyan Shanmugam, Alok Talekar, Vaibhav Rajan
Delineating farm boundaries through segmentation of satellite images is a fundamental step in many agricultural applications. The task is particularly challenging for smallholder farms, where accurate delineation requires the use of high resolution (HR) imagery which are available only at low revisit frequencies (e.g., annually). To support more frequent (sub-) seasonal monitoring, HR images could be combined as references (ref) with low resolution (LR) images – having higher revisit frequency (e.g., weekly) – using reference-based super-resolution (Ref-SR) methods. However, current Ref-SR methods optimize perceptual quality and smooth over crucial features needed for downstream tasks, and are unable to meet the large scale-factor requirements for this task. Further, previous two-step approaches of SR followed by segmentation do not effectively utilize diverse satellite sources as inputs. We address these problems through a new approach, $\textbf{SEED-SR}$, which uses a combination of conditional latent diffusion models and large-scale multi-spectral, multi-source geo-spatial foundation models. Our key innovation is to bypass the explicit SR task in the pixel space and instead perform SR in a segmentation-aware latent space. This unique approach enables us to generate segmentation maps at an unprecedented 20$\times$ scale factor, and rigorous experiments on two large, real datasets demonstrate up to $\textbf{25.5}$ and $\textbf{12.9}$ relative improvement in instance and semantic segmentation metrics respectively over approaches based on state-of-the-art Ref-SR methods.
Paper and project links
Summary
Delineating smallholder farm boundaries requires high-resolution (HR) imagery that is only available at low revisit frequency, motivating reference-based super-resolution (Ref-SR) that combines HR reference images with frequently revisited low-resolution images. Existing Ref-SR methods optimize perceptual quality, smooth over features needed by downstream tasks, and cannot reach the required large scale factors, while two-step SR-then-segmentation pipelines underuse diverse satellite sources. SEED-SR addresses this by combining conditional latent diffusion models with large-scale multi-spectral, multi-source geo-spatial foundation models, bypassing explicit super-resolution in pixel space and instead performing it in a segmentation-aware latent space. This enables segmentation maps at an unprecedented 20× scale factor, with up to 25.5% and 12.9% relative improvement in instance and semantic segmentation metrics over approaches based on state-of-the-art Ref-SR methods on two large, real datasets.
Key Takeaways
- Delineating farm boundaries by segmenting satellite images is a fundamental step in many agricultural applications; it is particularly challenging for smallholder farms, where accurate delineation requires high-resolution imagery that is available only at low revisit frequency (e.g., annually).
- SEED-SR combines high-resolution references with frequently revisited low-resolution images and performs super-resolution in a segmentation-aware latent space rather than in pixel space, enabling segmentation maps at a 20× scale factor and large gains over state-of-the-art Ref-SR baselines.
Click here to view paper screenshots
Cranio-ID: Graph-Based Craniofacial Identification via Automatic Landmark Annotation in 2D Multi-View X-rays
Authors:Ravi Shankar Prasad, Nandani Sharma, Dinesh Singh
In forensic craniofacial identification and in many biomedical applications, craniometric landmarks are important. Traditional methods for locating landmarks are time-consuming and require specialized knowledge and expertise. Current methods utilize superimposition and deep learning-based methods that employ automatic annotation of landmarks. However, these methods are not reliable due to insufficient large-scale validation studies. In this paper, we proposed a novel framework Cranio-ID: First, an automatic annotation of landmarks on 2D skulls (which are X-ray scans of faces) with their respective optical images using our trained YOLO-pose models. Second, cross-modal matching by formulating these landmarks into graph representations and then finding semantic correspondence between graphs of these two modalities using cross-attention and optimal transport framework. Our proposed framework is validated on the S2F and CUHK datasets (CUHK dataset resembles with S2F dataset). Extensive experiments have been conducted to evaluate the performance of our proposed framework, which demonstrates significant improvements in both reliability and accuracy, as well as its effectiveness in cross-domain skull-to-face and sketch-to-face matching in forensic science.
Paper and project links
PDF 11 pages, 6 figures
Summary
This paper proposes Cranio-ID, a novel framework for forensic craniofacial identification. Trained YOLO-pose models automatically annotate landmarks on 2D skull X-rays and their corresponding optical face images; the landmarks are then formulated as graph representations, and semantic correspondence between the two modalities' graphs is found using cross-attention and an optimal transport framework. Validated on the S2F and CUHK datasets, the framework shows significant improvements in reliability and accuracy and proves effective for cross-domain skull-to-face and sketch-to-face matching in forensic science.
Key Takeaways
- Craniometric landmarks are significant in forensic craniofacial identification and biomedical applications.
- Traditional landmark localization is time-consuming and requires specialized knowledge and expertise.
- Current methods use superimposition or deep learning-based automatic annotation, but insufficient large-scale validation makes them unreliable.
- The Cranio-ID framework automatically annotates landmarks on 2D skull X-rays using trained YOLO-pose models.
- Cross-modal matching is achieved through graph representations combined with cross-attention and an optimal transport framework.
- Experiments on the S2F and CUHK datasets show significant improvements in reliability and accuracy.
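The cross-attention and optimal transport matching is only named in the abstract; a minimal sketch of the optimal-transport half — entropy-regularized Sinkhorn iterations turning a landmark-to-landmark cost matrix into a soft correspondence — might be (the cost matrix and `eps` are illustrative):

```python
import numpy as np

def sinkhorn(cost, n_iter=200, eps=0.1):
    """Entropy-regularized optimal transport (Sinkhorn iterations)
    between two uniform landmark sets; returns a soft correspondence
    matrix whose entries sum to 1."""
    n, m = cost.shape
    K = np.exp(-cost / eps)
    v = np.ones(m) / m
    for _ in range(n_iter):
        u = (1.0 / n) / (K @ v)
        v = (1.0 / m) / (K.T @ u)
    return u[:, None] * K * v[None, :]

# toy cost: landmark i in one modality matches landmark i in the other
cost = np.array([[0.0, 1.0, 1.0],
                 [1.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
P = sinkhorn(cost)
print(P.argmax(axis=1))   # → [0 1 2]
```

In the paper's setting the cost would come from cross-attention features of the two landmark graphs rather than a hand-written matrix.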
Click here to view paper screenshots
The SRG/eROSITA all-sky survey: X-ray scaling relations of galaxy groups and clusters in the western Galactic hemisphere
Authors:M. E. Ramos-Ceja, L. Fiorino, E. Bulbul, V. Ghirardini, N. Clerc, A. Liu, J. S. Sanders, Y. E. Bahar, J. Dietl, M. Kluge, F. Pacaud, E. Artis, F. Balzer, J. Comparat, Z. Ding, N. Malavasi, A. Merloni, T. Mistele, K. Nandra, R. Seppi, S. Zelmer, X. Zhang
The soft X-ray telescope on board the Spectrum-Roentgen-Gamma (SRG) mission, eROSITA (extended ROentgen Survey with an Imaging Telescope Array), has produced the largest sample to date of galaxy groups and clusters detected via their intracluster/intragroup medium (ICM/IGrM) emission. Scaling relations between the intrinsic properties of these systems provide valuable insight into their formation and evolution. In this work, we investigate the scaling relations between key physical properties, such as soft band X-ray luminosity, temperature, gas mass, and the low-scatter mass proxy $Y_{\rm X}$, for the galaxy groups and clusters detected in the first eROSITA All-Sky Survey (eRASS1). Our analysis fully accounts for selection effects and the redshift evolution of the observable distributions. We construct a high-purity sample of $3061$ galaxy groups and clusters spanning the redshift range $0.05<z<1.07$ and mass range of $1.1\times10^{13}<M_{500}/$M${\odot}<1.6\times10^{15}$. This represents the largest sample to date used for scaling relation analysis. The selection function, derived from state-of-the-art simulations of the eROSITA sky, is rigorously incorporated into our modeling. We report best-fit parameters - normalization, slope, redshift evolution, and intrinsic scatter - for a set of scaling relations: $L{\mathrm{X}}-T$, $L_{\mathrm{X}}-M_{\rm gas}$, $L_{\mathrm{X}}-Y_{\rm X}$, as well as the $M_{\rm gas}-T$ relation. Our best-fit models indicate that the slopes of the scaling relations deviate significantly from self-similar expectations, while the redshift evolution remains consistent with the self-similar model. The fits exhibit small statistical uncertainties, likely owing to the large sample size. Our results are in good agreement with previous observational studies that account for selection effects, as well as with simulations that incorporate non-gravitational physics.
Paper and Project Links
PDF 13 pages, 18 figures. Submitted to A&A
Summary
The eROSITA telescope aboard the SRG mission has produced the largest sample to date of galaxy groups and clusters detected via their intracluster/intragroup medium emission. Scaling relations between the intrinsic properties of these systems provide valuable insight into their formation and evolution. The authors investigate scaling relations between key physical properties such as soft-band X-ray luminosity, temperature, and gas mass, as well as the low-scatter mass proxy $Y_{\rm X}$, for the groups and clusters detected in the first eROSITA All-Sky Survey (eRASS1). The sample spans the redshift range 0.05<z<1.07 and a wide mass range, representing the largest sample used for scaling-relation analysis to date. The selection function is incorporated into the modeling, and a set of best-fit parameters is reported. The slopes of the scaling relations deviate significantly from self-similar expectations, while the redshift evolution remains consistent with the self-similar model.
Key Takeaways
- eROSITA has observed the largest sample of galaxy groups and clusters to date.
- Scaling relations between the intrinsic properties of these groups and clusters shed light on their likely formation and evolution.
- Scaling relations among key physical properties such as soft X-ray luminosity, temperature, gas mass, and $Y_{\rm X}$ were studied.
- The sample covers wide ranges of redshift and mass, providing rich data for scaling-relation analysis.
- Selection effects are rigorously accounted for, with the selection function incorporated into the model.
- The slopes of the scaling relations deviate from self-similar expectations, while the redshift evolution is consistent with the self-similar model.
Step by Step Network
Authors:Dongchen Han, Tianzhu Ye, Zhuofan Xia, Kaiyi Chen, Yulin Wang, Hanting Chen, Gao Huang
Scaling up network depth is a fundamental pursuit in neural architecture design, as theory suggests that deeper models offer exponentially greater capability. Benefiting from the residual connections, modern neural networks can scale up to more than one hundred layers and enjoy wide success. However, as networks continue to deepen, current architectures often struggle to realize their theoretical capacity improvements, calling for more advanced designs to further unleash the potential of deeper networks. In this paper, we identify two key barriers that obstruct residual models from scaling deeper: shortcut degradation and limited width. Shortcut degradation hinders deep-layer learning, while the inherent depth-width trade-off imposes limited width. To mitigate these issues, we propose a generalized residual architecture dubbed Step by Step Network (StepsNet) to bridge the gap between theoretical potential and practical performance of deep models. Specifically, we separate features along the channel dimension and let the model learn progressively via stacking blocks with increasing width. The resulting method mitigates the two identified problems and serves as a versatile macro design applicable to various models. Extensive experiments show that our method consistently outperforms residual models across diverse tasks, including image classification, object detection, semantic segmentation, and language modeling. These results position StepsNet as a superior generalization of the widely adopted residual architecture.
Paper and Project Links
Summary
This paper examines the problem of scaling neural network depth, identifying two key barriers that prevent residual models from scaling deeper: shortcut degradation and limited width. To address these, the authors propose a generalized residual architecture, Step by Step Network (StepsNet), which separates features along the channel dimension and stacks blocks of increasing width so the model learns progressively, outperforming residual models across a variety of tasks.
Key Takeaways
- Scaling network depth is a fundamental pursuit in neural architecture design, yet current models struggle to realize the theoretical capability gains as depth grows.
- Residual models face two key barriers when scaling deeper: shortcut degradation and limited width.
- Shortcut degradation hinders deep-layer learning, while limited width results from the inherent depth-width trade-off.
- StepsNet is a generalized residual architecture designed to bridge the gap between the theoretical potential and practical performance of deep models.
- StepsNet separates features along the channel dimension and progressively increases width, letting the model learn step by step.
- StepsNet outperforms residual models across diverse tasks, including image classification, object detection, semantic segmentation, and language modeling.
Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs
Authors:Yiyi Miao, Taoyu Wu, Tong Chen, Ji Jiang, Zhe Tang, Zhengyong Jiang, Angelos Stefanidis, Limin Yu, Jionglong Su
Intraoral 3D reconstruction is fundamental to digital orthodontics, yet conventional methods like intraoral scanning are inaccessible for remote tele-orthodontics, which typically relies on sparse smartphone imagery. While 3D Gaussian Splatting (3DGS) shows promise for novel view synthesis, its application to the standard clinical triad of unposed anterior and bilateral buccal photographs is challenging. The large view baselines, inconsistent illumination, and specular surfaces common in intraoral settings can destabilize simultaneous pose and geometry estimation. Furthermore, sparse-view photometric supervision often induces a frequency bias, leading to over-smoothed reconstructions that lose critical diagnostic details. To address these limitations, we propose \textbf{Dental3R}, a pose-free, graph-guided pipeline for robust, high-fidelity reconstruction from sparse intraoral photographs. Our method first constructs a Geometry-Aware Pairing Strategy (GAPS) to intelligently select a compact subgraph of high-value image pairs. The GAPS focuses on correspondence matching, thereby improving the stability of the geometry initialization and reducing memory usage. Building on the recovered poses and point cloud, we train the 3DGS model with a wavelet-regularized objective. By enforcing band-limited fidelity using a discrete wavelet transform, our approach preserves fine enamel boundaries and interproximal edges while suppressing high-frequency artifacts. We validate our approach on a large-scale dataset of 950 clinical cases and an additional video-based test set of 195 cases. Experimental results demonstrate that Dental3R effectively handles sparse, unposed inputs and achieves superior novel view synthesis quality for dental occlusion visualization, outperforming state-of-the-art methods.
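The wavelet-regularized objective can be illustrated with a minimal sketch: compare a rendered and a target image in one-level Haar DWT sub-bands and up-weight the high-frequency bands so fine edges are not over-smoothed. The exact transform, band weights, and loss used by Dental3R are not specified here; everything below is an assumed toy formulation:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar DWT; img must have even height and width."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0        # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2.0        # vertical detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0           # low-low (coarse) band
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def wavelet_loss(pred, target, hf_weight=2.0):
    """L1 fidelity per sub-band, up-weighting the high-frequency bands so
    fine structures (e.g. enamel boundaries) are penalized when lost."""
    bands_p, bands_t = haar_dwt2(pred), haar_dwt2(target)
    weights = [1.0, hf_weight, hf_weight, hf_weight]
    return sum(w * np.abs(p - t).mean()
               for w, p, t in zip(weights, bands_p, bands_t))

rng = np.random.default_rng(0)
target = rng.random((8, 8))
loss_same = wavelet_loss(target, target)                      # exact match
loss_blur = wavelet_loss(np.full_like(target, target.mean()), target)
```

An over-smoothed prediction has near-zero detail bands, so the weighted detail terms keep its loss high even when the coarse band matches well.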
Paper and Project Links
Summary
This paper proposes Dental3R, a pose-free, graph-guided pipeline for robust, high-fidelity reconstruction from sparse intraoral photographs. By constructing a Geometry-Aware Pairing Strategy (GAPS) and enforcing band-limited fidelity with a discrete wavelet transform, the method improves the stability of pose and geometry recovery under inconsistent illumination and large view baselines. Validated on a large-scale clinical dataset, Dental3R handles sparse, unposed inputs, achieves superior novel view synthesis quality for dental occlusion visualization, and outperforms existing methods.
Key Takeaways
- Remote tele-orthodontics lacks accessible 3D reconstruction methods: intraoral scanning is unavailable, and only sparse smartphone imagery can be relied upon.
- Applying 3D Gaussian Splatting (3DGS) to the standard clinical triad of unposed intraoral photographs is challenging due to large view baselines, inconsistent illumination, and specular surfaces.
- The key components of Dental3R are the Geometry-Aware Pairing Strategy (GAPS) and a wavelet-regularized training objective.
- Experiments on a large-scale dataset of 950 clinical cases, plus a 195-case video-based test set, demonstrate robust, high-fidelity reconstruction from sparse, unposed inputs.
Iterative Diffusion-Refined Neural Attenuation Fields for Multi-Source Stationary CT Reconstruction: NAF Meets Diffusion Model
Authors:Jiancheng Fang, Shaoyu Wang, Junlin Wang, Weiwen Wu, Yikun Zhang, Qiegen Liu
Multi-source stationary computed tomography (CT) has recently attracted attention for its ability to achieve rapid image reconstruction, making it suitable for time-sensitive clinical and industrial applications. However, practical systems are often constrained by ultra-sparse-view sampling, which significantly degrades reconstruction quality. Traditional methods struggle under ultra-sparse-view settings, where interpolation becomes inaccurate and the resulting reconstructions are unsatisfactory. To address this challenge, this study proposes Diffusion-Refined Neural Attenuation Fields (Diff-NAF), an iterative framework tailored for multi-source stationary CT under ultra-sparse-view conditions. Diff-NAF combines a Neural Attenuation Field representation with a dual-branch conditional diffusion model. The process begins by training an initial NAF using ultra-sparse-view projections. New projections are then generated through an Angle-Prior Guided Projection Synthesis strategy that exploits inter view priors, and are subsequently refined by a Diffusion-driven Reuse Projection Refinement Module. The refined projections are incorporated as pseudo-labels into the training set for the next iteration. Through iterative refinement, Diff-NAF progressively enhances projection completeness and reconstruction fidelity under ultra-sparse-view conditions, ultimately yielding high-quality CT reconstructions. Experimental results on multiple simulated 3D CT volumes and real projection data demonstrate that Diff-NAF achieves the best performance under ultra-sparse-view conditions.
Paper and Project Links
Summary
This paper addresses multi-source stationary CT, where ultra-sparse-view sampling degrades reconstruction quality. The proposed Diffusion-Refined Neural Attenuation Fields (Diff-NAF) is an iterative framework combining a Neural Attenuation Field representation with a dual-branch conditional diffusion model, progressively improving projection completeness and reconstruction fidelity under ultra-sparse-view conditions. Experiments show that Diff-NAF achieves the best performance in this regime.
Key Takeaways
- Multi-source stationary CT is attracting attention for rapid image reconstruction, suiting time-sensitive clinical and industrial applications.
- Ultra-sparse-view sampling is a common practical constraint that significantly degrades reconstruction quality.
- Traditional methods perform poorly under ultra-sparse-view settings: interpolation becomes inaccurate and reconstructions are unsatisfactory.
- Diff-NAF combines a Neural Attenuation Field representation with a dual-branch conditional diffusion model, tailored to multi-source stationary CT under ultra-sparse-view conditions.
- Diff-NAF iterates via an Angle-Prior Guided Projection Synthesis strategy and a Diffusion-driven Reuse Projection Refinement Module.
- Experiments on multiple simulated 3D CT volumes and real projection data show that Diff-NAF performs best under ultra-sparse-view conditions.
SAM-Fed: SAM-Guided Federated Semi-Supervised Learning for Medical Image Segmentation
Authors:Sahar Nasirihaghighi, Negin Ghamsarian, Yiping Li, Marcel Breeuwer, Raphael Sznitman, Klaus Schoeffmann
Medical image segmentation is clinically important, yet data privacy and the cost of expert annotation limit the availability of labeled data. Federated semi-supervised learning (FSSL) offers a solution but faces two challenges: pseudo-label reliability depends on the strength of local models, and client devices often require compact or heterogeneous architectures due to limited computational resources. These constraints reduce the quality and stability of pseudo-labels, while large models, though more accurate, cannot be trained or used for routine inference on client devices. We propose SAM-Fed, a federated semi-supervised framework that leverages a high-capacity segmentation foundation model to guide lightweight clients during training. SAM-Fed combines dual knowledge distillation with an adaptive agreement mechanism to refine pixel-level supervision. Experiments on skin lesion and polyp segmentation across homogeneous and heterogeneous settings show that SAM-Fed consistently outperforms state-of-the-art FSSL methods.
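An agreement-based pseudo-label filter in the spirit of the adaptive agreement mechanism described above can be sketched as follows: supervise only pixels where the foundation-model teacher and the local client model agree confidently. The threshold, function names, and loss form are assumptions for illustration, not the authors' exact formulation:

```python
import numpy as np

def agreement_mask(p_teacher, p_client, tau=0.7):
    """Binary mask of pixels where both predictors are confident (above tau
    for foreground or below 1-tau for background) and agree on the class."""
    t_fg, c_fg = p_teacher > tau, p_client > tau
    t_bg, c_bg = p_teacher < 1 - tau, p_client < 1 - tau
    return (t_fg & c_fg) | (t_bg & c_bg)

def masked_pseudo_label_loss(p_client, p_teacher, tau=0.7):
    """Binary cross-entropy of client probabilities against teacher hard
    labels, averaged over agreed pixels only; 0 if no pixel is agreed."""
    mask = agreement_mask(p_teacher, p_client, tau)
    if not mask.any():
        return 0.0
    y = (p_teacher > 0.5).astype(float)            # teacher hard labels
    p = np.clip(p_client, 1e-7, 1 - 1e-7)
    ce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(ce[mask].mean())

p_t = np.array([[0.95, 0.9], [0.1, 0.5]])          # teacher probabilities
p_c = np.array([[0.85, 0.4], [0.05, 0.9]])         # client probabilities
mask = agreement_mask(p_t, p_c)
```

Pixels where the two models disagree or are uncertain contribute no gradient, which is the intuition behind filtering unreliable pseudo-labels on weak clients.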
Paper and Project Links
Summary
Medical image segmentation is clinically important, but data privacy and the cost of expert annotation limit labeled data. Federated semi-supervised learning (FSSL) offers a solution but faces two challenges: pseudo-label reliability and limited client-side computational resources. The authors propose SAM-Fed, which uses a high-capacity segmentation foundation model to guide lightweight clients during training, combining dual knowledge distillation with an adaptive agreement mechanism to refine pixel-level supervision. Experiments on skin lesion and polyp segmentation, in both homogeneous and heterogeneous settings, show that SAM-Fed consistently outperforms state-of-the-art FSSL methods.
Key Takeaways
- Medical image segmentation is clinically important yet challenging.
- Data privacy and the cost of expert annotation limit the availability of labeled data.
- Federated semi-supervised learning (FSSL) is one way to address this problem.
- FSSL faces the challenges of pseudo-label reliability and constrained client-side computational resources.
- The SAM-Fed framework leverages a high-capacity segmentation foundation model to guide client training.
- SAM-Fed refines pixel-level supervision through dual knowledge distillation and an adaptive agreement mechanism.
NeuralBoneReg: A Novel Self-Supervised Method for Robust and Accurate Multi-Modal Bone Surface Registration
Authors:Luohong Wu, Matthias Seibold, Nicola A. Cavalcanti, Yunke Ao, Roman Flepp, Aidana Massalimova, Lilian Calvet, Philipp Fürnstahl
In computer- and robot-assisted orthopedic surgery (CAOS), patient-specific surgical plans derived from preoperative imaging define target locations and implant trajectories. During surgery, these plans must be accurately transferred, relying on precise cross-registration between preoperative and intraoperative data. However, substantial modality heterogeneity across imaging modalities makes this registration challenging and error-prone. Robust, automatic, and modality-agnostic bone surface registration is therefore clinically important. We propose NeuralBoneReg, a self-supervised, surface-based framework that registers bone surfaces using 3D point clouds as a modality-agnostic representation. NeuralBoneReg includes two modules: an implicit neural unsigned distance field (UDF) that learns the preoperative bone model, and an MLP-based registration module that performs global initialization and local refinement by generating transformation hypotheses to align the intraoperative point cloud with the neural UDF. Unlike SOTA supervised methods, NeuralBoneReg operates in a self-supervised manner, without requiring inter-subject training data. We evaluated NeuralBoneReg against baseline methods on two publicly available multi-modal datasets: a CT-ultrasound dataset of the fibula and tibia (UltraBones100k) and a CT-RGB-D dataset of spinal vertebrae (SpineDepth). The evaluation also includes a newly introduced CT–ultrasound dataset of cadaveric subjects containing femur and pelvis (UltraBones-Hip), which will be made publicly available. NeuralBoneReg matches or surpasses existing methods across all datasets, achieving mean RRE/RTE of 1.68°/1.86 mm on UltraBones100k, 1.88°/1.89 mm on UltraBones-Hip, and 3.79°/2.45 mm on SpineDepth. These results demonstrate strong generalizability across anatomies and modalities, providing robust and accurate cross-modal alignment for CAOS.
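The hypothesis-scoring idea behind the registration module can be shown with a toy sketch: score candidate rigid transforms by the mean unsigned distance of the transformed intraoperative points and keep the best. Here the "UDF" is a discrete nearest-neighbor distance to sampled model points rather than a learned neural field, the anatomy is a 2D "L" shape, and the hypotheses are a coarse grid of rotations; everything is illustrative, not the authors' pipeline:

```python
import numpy as np

def udf(query, model_pts):
    """Stand-in for a learned UDF: distance from each query point to its
    nearest preoperative model point."""
    d = np.linalg.norm(query[:, None, :] - model_pts[None, :, :], axis=-1)
    return d.min(axis=1)

def rot2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# "Preoperative" model: an L-shaped point set (asymmetric, so the optimal
# rotation is unique). "Intraoperative" cloud: same shape, unknown rotation.
xs = np.linspace(0.0, 1.0, 50)
ys = np.linspace(0.0, 0.5, 25)
model_pts = np.vstack([np.column_stack([xs, np.zeros_like(xs)]),
                       np.column_stack([np.zeros_like(ys), ys])])
theta_true = 0.6
intraop = model_pts @ rot2d(theta_true).T

# Global initialization: evaluate rotation hypotheses and keep the one whose
# inverse brings the cloud closest to the UDF zero set (lowest mean UDF).
hypotheses = np.linspace(-np.pi, np.pi, 721)
scores = [udf(intraop @ rot2d(-h).T, model_pts).mean() for h in hypotheses]
theta_best = float(hypotheses[int(np.argmin(scores))])
```

A local refinement stage would then polish the winning hypothesis; the grid search here only illustrates the global-initialization step.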
Paper and Project Links
Summary
This paper addresses computer- and robot-assisted orthopedic surgery (CAOS), where patient-specific surgical plans derived from preoperative imaging must be accurately transferred during surgery. The authors propose NeuralBoneReg, a self-supervised, surface-based framework that registers bone surfaces using 3D point clouds as a modality-agnostic representation. It comprises two modules: an implicit neural unsigned distance field (UDF) that learns the preoperative bone model, and an MLP-based registration module that performs global initialization and local refinement by generating transformation hypotheses to align the intraoperative point cloud with the UDF. Evaluations on multiple public multi-modal datasets show strong generalizability and robust, accurate cross-modal alignment for CAOS.
Key Takeaways
- In CAOS, patient-specific plans derived from preoperative imaging define target locations and implant trajectories.
- These plans must be accurately transferred during surgery, relying on precise cross-registration between preoperative and intraoperative data.
- Substantial modality heterogeneity across imaging modalities makes this registration challenging and error-prone.
- NeuralBoneReg is a self-supervised, surface-based framework that registers bone surfaces using modality-agnostic 3D point clouds.
- NeuralBoneReg comprises two modules: an implicit neural unsigned distance field (UDF) and an MLP-based registration module.
- Evaluations across datasets demonstrate strong generalizability and accuracy across anatomies and modalities.
Unveiling the Sources of X-ray Luminosity in DESI Galaxy Groups: Insights from the SRG/eROSITA All-Sky Survey
Authors:YunLiang Zheng, Xiaohu Yang, Teng Liu, Shijiang Chen, Esra Bulbul, Ang Liu, Yi Zhang, Dawei Li, Xi Kang, Yizhou Gu, Yirong Wang, Qingyang Li, Jiaqi Wang
We use the first eROSITA all-sky survey (eRASS1) to investigate the contributions of AGN and extended gas to the total X-ray luminosity ($L_X$) of galaxy groups with different halo masses ($M_h$) at different redshifts. The presence of AGN in their central galaxies is identified using multi-wavelength catalogs, including the X-ray counterparts, the ASKAP radio catalog, and the DESI spectroscopic measurements. We apply the stacking method to obtain sufficient statistics for the X-ray surface brightness profile and the $L_X$ for groups with different central AGN properties. We find that the X-ray groups exhibit the highest $L_X$, followed by groups with QSO, radio, BPT-AGN, and non-AGN centrals. Moreover, the $L_X$ of the $M_h \lesssim 10^{13}h^{-1}M_\odot$ groups is dominated by the central AGN, while the X-ray emission from extended gas tends to be more prominent in the $M_h \gtrsim 10^{13}h^{-1}M_\odot$ groups. In groups where the AGN play a major role in X-ray emission, the contribution from extended gas is minor, resulting in significant uncertainties concerning the extended X-ray emission. When the subset containing the X-ray detected counterparts is excluded, the extended gas component becomes easier to obtain. A correlation has been identified between the X-ray luminosity of the central AGN and extended gas. However, once we account for the positional offset, their correlation becomes less prominent. Currently, the results are not conclusive enough to confirm whether there is a connection between the AGN feedback and extended gas. However, they provide a new perspective on the feedback processes in the history of group assembly.
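The stacking step used to obtain sufficient statistics can be illustrated in miniature: averaging noisy surface brightness profiles of many faint groups recovers the mean profile. The real analysis involves backgrounds, the PSF, and selection; the beta-model-like profile and all numbers below are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
r = np.linspace(0.05, 1.0, 20)                     # radii in units of R500

# A faint beta-model-like mean profile, buried in per-group noise.
true_profile = (1.0 + (r / 0.2) ** 2) ** (-1.5)
noisy = true_profile + rng.normal(0.0, 0.5, size=(200, r.size))

stacked = noisy.mean(axis=0)                       # stacked profile
residual = float(np.abs(stacked - true_profile).max())
```

With 200 groups the per-radius noise drops by a factor of about 14, which is why stacking can separate a central point-source (AGN) contribution from genuinely extended emission.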
Paper and Project Links
PDF 29 pages, 14 figures, ApJ accepted
Summary
Using the first eROSITA all-sky survey (eRASS1), the authors investigate the contributions of central AGN and extended gas to the total X-ray luminosity ($L_X$) of galaxy groups with different halo masses ($M_h$) at different redshifts. AGN in central galaxies are identified using multi-wavelength catalogs, including X-ray counterparts, the ASKAP radio catalog, and DESI spectroscopic measurements. A stacking method yields X-ray surface brightness profiles and $L_X$ for groups with different central AGN properties. X-ray groups exhibit the highest $L_X$, followed by groups with QSO, radio, BPT-AGN, and non-AGN centrals. For groups with $M_h \lesssim 10^{13}h^{-1}M_\odot$, the central AGN dominates $L_X$, whereas extended gas emission is more prominent for $M_h \gtrsim 10^{13}h^{-1}M_\odot$. Where the AGN dominates the X-ray emission, the extended-gas contribution is minor and carries significant uncertainty. A correlation is found between the X-ray luminosity of the central AGN and the extended gas, but it weakens once the positional offset is accounted for. The results are not yet conclusive on a connection between AGN feedback and extended gas, but they offer a new perspective on feedback during group assembly.
Key Takeaways
- The first eROSITA all-sky survey is used to study the contributions of central AGN and extended gas to the total X-ray luminosity of galaxy groups.
- AGN in central galaxies are identified using multi-wavelength catalogs.
- A stacking method provides X-ray surface brightness profiles and $L_X$ for groups with different central AGN properties.
- Group types differ in $L_X$, with X-ray groups exhibiting the highest values.
- In low-mass groups the central AGN dominates $L_X$, while extended gas emission is more prominent in high-mass groups.
- In AGN-dominated groups, the extended-gas contribution carries substantial uncertainty.
GCA-ResUNet: Image segmentation in medical images using grouped coordinate attention
Authors:Jun Ding, Shang Gao
Medical image segmentation underpins computer-aided diagnosis and therapy by supporting clinical diagnosis, preoperative planning, and disease monitoring. While U-Net style convolutional neural networks perform well due to their encoder-decoder structures with skip connections, they struggle to capture long-range dependencies. Transformer-based variants address global context but often require heavy computation and large training datasets. This paper proposes GCA-ResUNet, an efficient segmentation network that integrates Grouped Coordinate Attention (GCA) into ResNet-50 residual blocks. GCA uses grouped coordinate modeling to jointly encode global dependencies across channels and spatial locations, strengthening feature representation and boundary delineation while adding minimal parameter and FLOP overhead compared with self-attention. On the Synapse dataset, GCA-ResUNet achieves a Dice score of 86.11%, and on the ACDC dataset, it reaches 92.64%, surpassing several state-of-the-art baselines while maintaining fast inference and favorable computational efficiency. These results indicate that GCA offers a practical way to enhance convolutional architectures with global modeling capability, enabling high-accuracy and resource-efficient medical image segmentation.
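The grouped coordinate-attention step can be sketched in miniature: pool the feature map along each spatial axis per channel group, turn the pooled vectors into gates, and reweight the features. This follows the general coordinate-attention recipe; the paper's exact GCA block (convolutions, normalization, group count) is not reproduced, so treat the code as an assumed toy version:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grouped_coord_attention(x, groups=4):
    """x: (C, H, W). Reweight features with per-group gates that jointly
    encode channel and positional (H and W) information."""
    c, h, w = x.shape
    out = np.empty_like(x)
    for g in np.array_split(np.arange(c), groups):
        xg = x[g]                                  # one channel group
        ah = xg.mean(axis=2, keepdims=True)        # pool along W -> (cg, H, 1)
        aw = xg.mean(axis=1, keepdims=True)        # pool along H -> (cg, 1, W)
        out[g] = xg * sigmoid(ah) * sigmoid(aw)    # broadcast gating
    return out

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))
att = grouped_coord_attention(feat, groups=2)
```

Because the gates are built from axis-wise pooled statistics, each output position is modulated by global information along its row and column at the cost of only a few extra operations per group.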
Paper and Project Links
Summary
Medical image segmentation underpins computer-aided diagnosis and therapy, supporting clinical diagnosis, preoperative planning, and disease monitoring. This paper proposes GCA-ResUNet, an efficient segmentation network that integrates Grouped Coordinate Attention (GCA) into ResNet-50 residual blocks. GCA encodes global dependencies while strengthening feature representation and boundary delineation, with minimal parameter and FLOP overhead. Experiments on the Synapse and ACDC datasets show high segmentation accuracy and computational efficiency, offering a practical way to add global modeling capability to convolutional architectures.
Key Takeaways
- Medical image segmentation plays a key role in computer-aided diagnosis and therapy.
- U-Net style convolutional networks perform well in medical image segmentation but struggle to capture long-range dependencies.
- Transformer-based variants handle global context but require heavy computation and large training datasets.
- GCA-ResUNet is a segmentation network that combines Grouped Coordinate Attention (GCA) with ResNet-50.
- GCA jointly encodes global dependencies across channels and spatial locations while strengthening feature representation and boundary delineation with minimal overhead.
- GCA-ResUNet achieves high segmentation accuracy and computational efficiency on the Synapse and ACDC datasets.