
Medical Images


⚠️ All of the summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Please note: never rely on them in serious academic settings; use them only as a first-pass filter before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

2025-11-18 Update

LARM: A Large Articulated-Object Reconstruction Model

Authors:Sylvia Yuan, Ruoxi Shi, Xinyue Wei, Xiaoshuai Zhang, Hao Su, Minghua Liu

Modeling 3D articulated objects with realistic geometry, textures, and kinematics is essential for a wide range of applications. However, existing optimization-based reconstruction methods often require dense multi-view inputs and expensive per-instance optimization, limiting their scalability. Recent feedforward approaches offer faster alternatives but frequently produce coarse geometry, lack texture reconstruction, and rely on brittle, complex multi-stage pipelines. We introduce LARM, a unified feedforward framework that reconstructs 3D articulated objects from sparse-view images by jointly recovering detailed geometry, realistic textures, and accurate joint structures. LARM extends LVSM, a recent novel view synthesis (NVS) approach for static 3D objects, into the articulated setting by jointly reasoning over camera pose and articulation variation using a transformer-based architecture, enabling scalable and accurate novel view synthesis. In addition, LARM generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation. Our pipeline eliminates the need for dense supervision and supports high-fidelity reconstruction across diverse object categories. Extensive experiments demonstrate that LARM outperforms state-of-the-art methods in both novel view and state synthesis as well as 3D articulated object reconstruction, generating high-quality meshes that closely adhere to the input images. Project page: https://sylviayuan-sy.github.io/larm-site/

Paper & Project Links

PDF project page: https://sylviayuan-sy.github.io/larm-site/

Summary

This paper introduces LARM, a unified feedforward framework for reconstructing 3D articulated objects from sparse-view images. By jointly recovering detailed geometry, realistic textures, and accurate joint structures, LARM extends LVSM, a recent novel view synthesis method for static 3D objects, into the articulated setting. It uses a transformer-based architecture to jointly reason over camera pose and articulation variation, enabling scalable and accurate novel view synthesis. LARM also generates auxiliary outputs such as depth maps and part masks to facilitate explicit 3D mesh extraction and joint estimation. The pipeline requires no dense supervision and supports high-fidelity reconstruction across diverse object categories.

Key Takeaways

  1. LARM is a feedforward framework for reconstructing 3D articulated objects from sparse-view images.
  2. LARM extends LVSM, a view synthesis method for static 3D objects, to handle articulated objects.
  3. LARM provides high-quality 3D reconstruction by jointly recovering detailed geometry, realistic textures, and accurate joint structures.
  4. LARM uses a transformer-based architecture to jointly reason over camera pose and articulation.
  5. LARM supports scalable and accurate novel view synthesis from sparse-view images.
  6. LARM generates auxiliary outputs such as depth maps and part masks to facilitate 3D mesh extraction and joint estimation.

CVChess: A Deep Learning Framework for Converting Chessboard Images to Forsyth-Edwards Notation

Authors:Luthira Abeykoon, Ved Patel, Gawthaman Senthilvelan, Darshan Kasundra

Chess has experienced a large increase in viewership since the pandemic, driven largely by the accessibility of online learning platforms. However, no equivalent assistance exists for physical chess games, creating a divide between analog and digital chess experiences. This paper presents CVChess, a deep learning framework for converting chessboard images to Forsyth-Edwards Notation (FEN), which is then input into online chess engines to provide the best next move. Our approach employs a convolutional neural network (CNN) with residual layers to perform piece recognition from smartphone camera images. The system processes RGB images of a physical chess board through a multistep process: image preprocessing using the Hough Line Transform for edge detection, a projective transform to achieve top-down board alignment, segmentation into 64 individual squares, and piece classification into 13 classes (6 unique white pieces, 6 unique black pieces, and an empty square) using the residual CNN. Residual connections help retain low-level visual features while enabling deeper feature extraction, improving accuracy and stability during training. We train and evaluate our model using the Chess Recognition Dataset (ChessReD), containing 10,800 annotated smartphone images captured under diverse lighting conditions and angles. The resulting classifications are encoded as an FEN string, which can be fed into a chess engine to generate the optimal move.

Paper & Project Links

PDF

Summary

Chess viewership has surged since the pandemic, but no comparable assistance exists for physical chess games, creating a divide between analog and digital chess experiences. This paper presents CVChess, a deep learning framework that converts chessboard images into Forsyth-Edwards Notation (FEN), which can then be fed to an online chess engine to suggest the best next move. A convolutional neural network (CNN) with residual layers performs piece recognition from smartphone camera images. The system processes the image in several steps: edge detection with the Hough Line Transform, a projective transform for top-down board alignment, segmentation into 64 individual squares, and classification of each square into 13 classes with the residual CNN. Residual connections help retain low-level visual features while enabling deeper feature extraction, improving accuracy and stability during training. The model is trained and evaluated on the Chess Recognition Dataset (ChessReD), which contains 10,800 annotated smartphone images captured under diverse lighting conditions and angles. The resulting classifications are encoded as an FEN string that a chess engine can use to generate the optimal move.

Key Takeaways

  1. Chess viewership has grown since the pandemic, but physical chess games lack comparable assistance tools.
  2. The CVChess framework converts chessboard images into FEN notation (see the encoding sketch below).
  3. A CNN with residual layers performs piece recognition from smartphone camera images.
  4. The pipeline includes image preprocessing, board alignment, segmentation into squares, and piece classification.
  5. Residual connections improve the model's accuracy and stability.
  6. The model is trained and evaluated on the Chess Recognition Dataset (ChessReD).
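
To make the final encoding step concrete, here is a minimal sketch (not from the paper) of turning an 8×8 grid of predicted square labels into the FEN piece-placement field; the label names and 13-class mapping are illustrative.

```python
# Minimal sketch: encode an 8x8 grid of piece labels as a FEN placement string.
# Assumes rows are ordered rank 8 -> rank 1 (top to bottom), as FEN requires.

# Illustrative 13-class label set: 6 white pieces, 6 black pieces, empty.
LABEL_TO_FEN = {
    "white_king": "K", "white_queen": "Q", "white_rook": "R",
    "white_bishop": "B", "white_knight": "N", "white_pawn": "P",
    "black_king": "k", "black_queen": "q", "black_rook": "r",
    "black_bishop": "b", "black_knight": "n", "black_pawn": "p",
    "empty": None,
}

def grid_to_fen(grid):
    """grid: 8x8 list of labels (row 0 = rank 8). Returns the FEN piece field."""
    ranks = []
    for row in grid:
        fen_row, empty_run = "", 0
        for label in row:
            piece = LABEL_TO_FEN[label]
            if piece is None:
                empty_run += 1                # count consecutive empty squares
            else:
                if empty_run:
                    fen_row += str(empty_run)
                    empty_run = 0
                fen_row += piece
        if empty_run:
            fen_row += str(empty_run)
        ranks.append(fen_row)
    return "/".join(ranks)

# Starting position as a quick self-check:
start = (
    [["black_rook", "black_knight", "black_bishop", "black_queen",
      "black_king", "black_bishop", "black_knight", "black_rook"],
     ["black_pawn"] * 8]
    + [["empty"] * 8] * 4
    + [["white_pawn"] * 8,
       ["white_rook", "white_knight", "white_bishop", "white_queen",
        "white_king", "white_bishop", "white_knight", "white_rook"]]
)
assert grid_to_fen(start) == "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
```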

OpenUS: A Fully Open-Source Foundation Model for Ultrasound Image Analysis via Self-Adaptive Masked Contrastive Learning

Authors:Xiaoyu Zheng, Xu Chen, Awais Rauf, Qifan Fu, Benedetta Monosi, Felice Rivellese, Myles J. Lewis, Shaogang Gong, Gregory Slabaugh

Ultrasound (US) is one of the most widely used medical imaging modalities, thanks to its low cost, portability, real-time feedback, and absence of ionizing radiation. However, US image interpretation remains highly operator-dependent and varies significantly across anatomical regions, acquisition protocols, and device types. These variations, along with unique challenges such as speckle, low contrast, and limited standardized annotations, hinder the development of generalizable, label-efficient ultrasound AI models. In this paper, we propose OpenUS, the first reproducible, open-source ultrasound foundation model built on a large collection of public data. OpenUS employs a vision Mamba backbone, capturing both local and global long-range dependencies across the image. To extract rich features during pre-training, we introduce a novel self-adaptive masking framework that combines contrastive learning with masked image modeling. This strategy integrates the teacher’s attention map with student reconstruction loss, adaptively refining clinically-relevant masking to enhance pre-training effectiveness. OpenUS also applies a dynamic learning schedule to progressively adjust the difficulty of the pre-training process. To develop the foundation model, we compile the largest to-date public ultrasound dataset comprising over 308K images from 42 publicly available datasets, covering diverse anatomical regions, institutions, imaging devices, and disease types. Our pre-trained OpenUS model can be easily adapted to specific downstream tasks by serving as a backbone for label-efficient fine-tuning. Code is available at https://github.com/XZheng0427/OpenUS.

Paper & Project Links

PDF

Summary
This work proposes OpenUS, the first reproducible, open-source ultrasound foundation model built on large-scale public data. The model pairs a vision Mamba backbone with a self-adaptive masking framework that combines contrastive learning and masked image modeling to extract rich features during pre-training. A dynamic learning schedule progressively adjusts pre-training difficulty, and the pre-training corpus integrates diverse ultrasound images from many public datasets. OpenUS adapts easily to downstream tasks as a backbone for label-efficient fine-tuning. Code is available at the linked repository.

Key Takeaways

  • OpenUS is a new ultrasound foundation model built on a large collection of public datasets and released fully open source, easing AI development in this field.
  • OpenUS uses a vision Mamba backbone to capture both local and global long-range dependencies across the image.
  • A novel self-adaptive masking framework (combining contrastive learning with masked image modeling) adaptively refines clinically relevant masking to enhance pre-training (a simplified masking sketch follows this list).
  • A dynamic learning schedule progressively adjusts the difficulty of the pre-training process, improving adaptability.
  • The compiled corpus, covering diverse anatomical regions, institutions, imaging devices, and disease types, is the largest public ultrasound dataset to date with over 308K images, providing a data foundation for more generalizable models.
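
As a rough illustration of attention-guided masking, the sketch below masks the patches a teacher attends to most. The paper's actual self-adaptive rule integrates the teacher's attention map with the student reconstruction loss, so this is a simplified stand-in with illustrative names and ratios.

```python
import torch

def adaptive_mask(teacher_attn, mask_ratio=0.6):
    """teacher_attn: (B, N) attention score per patch from the teacher.
    Returns a boolean mask of shape (B, N); True = patch is masked.
    Patches with the highest teacher attention are masked first, so the
    student must reconstruct the most salient regions."""
    B, N = teacher_attn.shape
    n_mask = int(N * mask_ratio)
    idx = teacher_attn.argsort(dim=1, descending=True)[:, :n_mask]
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

attn = torch.rand(2, 196)          # e.g. a 14x14 patch grid
m = adaptive_mask(attn)
print(m.shape, m.float().mean())   # torch.Size([2, 196]), ~0.60
```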

Data-efficient U-Net for Segmentation of Carbide Microstructures in SEM Images of Steel Alloys

Authors:Alinda Ezgi Gerçek, Till Korten, Paul Chekhonin, Maleeha Hassan, Peter Steinbach

Understanding reactor-pressure-vessel steel microstructure is crucial for predicting mechanical properties, as carbide precipitates both strengthen the alloy and can initiate cracks. In scanning electron microscopy images, gray-value overlap between carbides and matrix makes simple thresholding ineffective. We present a data-efficient segmentation pipeline using a lightweight U-Net (30.7M parameters) trained on just 10 annotated scanning electron microscopy images. Despite limited data, our model achieves a Dice-Sørensen coefficient of 0.98, significantly outperforming the state-of-the-art in the field of metallurgy (classical image analysis: 0.85), while reducing annotation effort by one order of magnitude compared to the state-of-the-art data-efficient segmentation model. This approach enables rapid, automated carbide quantification for alloy design and generalizes to other steel types, demonstrating the potential of data-efficient deep learning in reactor-pressure-vessel steel analysis.

Paper & Project Links

PDF

Summary

This paper presents a technique for identifying and segmenting carbide microstructures in reactor-pressure-vessel steel. A data-efficient pipeline trains a lightweight U-Net on only a handful of annotated scanning electron microscopy images, achieving highly accurate carbide segmentation while greatly reducing annotation effort. The technique enables rapid, automated carbide quantification for alloy design and demonstrates the potential of data-efficient deep learning in reactor-pressure-vessel steel analysis.

Key Takeaways

  1. Carbide precipitates play a dual role in reactor-pressure-vessel steel: they strengthen the alloy but can also initiate cracks.
  2. Scanning electron microscopy image analysis is challenging because the gray values of carbides and matrix overlap.
  3. A lightweight U-Net (30.7M parameters) is trained on just 10 annotated scanning electron microscopy images.
  4. The model achieves a Dice-Sørensen coefficient of 0.98, outperforming the state of the art in metallurgy (classical image analysis: 0.85); a reference implementation of the metric follows this list.
  5. Annotation effort is reduced by an order of magnitude compared to the state-of-the-art data-efficient segmentation model.
  6. The approach enables rapid, automated carbide quantification for alloy design.
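
The headline metric here is the Dice-Sørensen coefficient; a minimal reference implementation for binary masks, for readers who want the formula concrete:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice-Sørensen coefficient for binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(a, b), 3))  # 2*2 / (3+3) = 0.667
```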

VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation

Authors:Maximilian Rokuss, Moritz Langenberg, Yannick Kirchhoff, Fabian Isensee, Benjamin Hamm, Constantin Ulrich, Sebastian Regnery, Lukas Bauer, Efthimios Katsigiannopulos, Tobias Norajitra, Klaus Maier-Hein

We introduce VoxTell, a vision-language model for text-prompted volumetric medical image segmentation. It maps free-form descriptions, from single words to full clinical sentences, to 3D masks. Trained on 62K+ CT, MRI, and PET volumes spanning over 1K anatomical and pathological classes, VoxTell uses multi-stage vision-language fusion across decoder layers to align textual and visual features at multiple scales. It achieves state-of-the-art zero-shot performance across modalities on unseen datasets, excelling on familiar concepts while generalizing to related unseen classes. Extensive experiments further demonstrate strong cross-modality transfer, robustness to linguistic variations and clinical language, as well as accurate instance-specific segmentation from real-world text. Code is available at: https://www.github.com/MIC-DKFZ/VoxTell

Paper & Project Links

PDF

Summary

VoxTell is a vision-language model for text-prompted volumetric medical image segmentation. It maps free-form descriptions, from single words to full clinical sentences, onto 3D masks. Trained on more than 62K CT, MRI, and PET volumes covering over 1K anatomical and pathological classes, VoxTell uses multi-stage vision-language fusion across decoder layers to align textual and visual features at multiple scales. It achieves state-of-the-art zero-shot performance across modalities on unseen datasets, excelling on familiar concepts while generalizing to related unseen classes. It also shows strong cross-modality transfer, robustness to linguistic variation and clinical language, and accurate instance-specific segmentation from real-world text.

Key Takeaways

  1. VoxTell is a vision-language model for text-prompted volumetric medical image segmentation.
  2. The model maps free-form text descriptions onto 3D masks.
  3. VoxTell is trained on diverse medical imaging data covering a wide range of anatomical and pathological classes.
  4. Multi-stage vision-language fusion aligns textual and visual features at multiple scales.
  5. It achieves state-of-the-art zero-shot performance across modalities and generalizes well to related unseen classes.
  6. VoxTell shows strong cross-modality transfer and robustness to linguistic variation and clinical language.

Hi-DREAM: Brain Inspired Hierarchical Diffusion for fMRI Reconstruction via ROI Encoder and visuAl Mapping

Authors:Guowei Zhang, Yun Zhao, Moein Khajehnejad, Adeel Razi, Levin Kuhlmann

Mapping human brain activity to natural images offers a new window into vision and cognition, yet current diffusion-based decoders face a core difficulty: most condition directly on fMRI features without analyzing how visual information is organized across the cortex. This overlooks the brain’s hierarchical processing and blurs the roles of early, middle, and late visual areas. We propose Hi-DREAM, a brain-inspired conditional diffusion framework that makes the cortical organization explicit. A region-of-interest (ROI) adapter groups fMRI into early/mid/late streams and converts them into a multi-scale cortical pyramid aligned with the U-Net depth (shallow scales preserve layout and edges; deeper scales emphasize objects and semantics). A lightweight, depth-matched ControlNet injects these scale-specific hints during denoising. The result is an efficient and interpretable decoder in which each signal plays a brain-like role, allowing the model not only to reconstruct images but also to illuminate functional contributions of different visual areas. Experiments on the Natural Scenes Dataset (NSD) show that Hi-DREAM attains state-of-the-art performance on high-level semantic metrics while maintaining competitive low-level fidelity. These findings suggest that structuring conditioning by cortical hierarchy is a powerful alternative to purely data-driven embeddings and provides a useful lens for studying the visual cortex.

Paper & Project Links

PDF

Summary

This paper proposes Hi-DREAM, a brain-inspired conditional diffusion framework that maps human brain activity to natural images, offering a new window into vision and cognition. The authors observe that current diffusion-based decoders mostly condition directly on fMRI features while ignoring how visual information is organized across the cortex; Hi-DREAM makes this cortical organization explicit. A region-of-interest (ROI) adapter groups fMRI signals into early/mid/late streams and converts them into a multi-scale cortical pyramid aligned with the U-Net depth. A lightweight, depth-matched ControlNet injects these scale-specific hints during denoising, yielding an efficient, interpretable decoder in which each signal plays a brain-like role, able both to reconstruct images and to illuminate the functional contributions of different visual areas. Experiments on the Natural Scenes Dataset show that Hi-DREAM attains state-of-the-art performance on high-level semantic metrics while maintaining competitive low-level fidelity.

Key Takeaways

  1. Current diffusion decoders that map brain activity to natural images overlook how visual information is organized across the cortex.
  2. Hi-DREAM makes the cortical organization explicit by grouping fMRI signals into early, mid, and late streams.
  3. Hi-DREAM builds a multi-scale cortical pyramid aligned with the U-Net depth, enabling efficient and interpretable decoding.
  4. A ControlNet injects scale-specific hints so that each signal plays a brain-like role, both reconstructing images and revealing the contributions of visual areas.
  5. Experimental results show Hi-DREAM performs strongly on high-level semantic metrics.
  6. The framework provides a powerful lens for studying the function and structure of the visual cortex.

Unsupervised Motion-Compensated Decomposition for Cardiac MRI Reconstruction via Neural Representation

Authors:Xuanyu Tian, Lixuan Chen, Qing Wu, Xiao Wang, Jie Feng, Yuyao Zhang, Hongjiang Wei

Cardiac magnetic resonance (CMR) imaging is widely used to characterize cardiac morphology and function. To accelerate CMR imaging, various methods have been proposed to recover high-quality spatiotemporal CMR images from highly undersampled k-t space data. However, current CMR reconstruction techniques either fail to achieve satisfactory image quality or are restricted by the scarcity of ground truth data, leading to limited applicability in clinical scenarios. In this work, we proposed MoCo-INR, a new unsupervised method that integrates implicit neural representations (INR) with the conventional motion-compensated (MoCo) framework. Using explicit motion modeling and the continuous prior of INRs, MoCo-INR can produce accurate cardiac motion decomposition and high-quality CMR reconstruction. Furthermore, we introduce a new INR network architecture tailored to the CMR problem, which significantly stabilizes model optimization. Experiments on retrospective (simulated) datasets demonstrate the superiority of MoCo-INR over state-of-the-art methods, achieving fast convergence and fine-detailed reconstructions at ultra-high acceleration factors (e.g., 20x in VISTA sampling). Additionally, evaluations on prospective (real-acquired) free-breathing CMR scans highlight the clinical practicality of MoCo-INR for real-time imaging. Several ablation studies further confirm the effectiveness of the critical components of MoCo-INR.

Paper & Project Links

PDF Accepted by AAAI-26

Summary
This paper introduces MoCo-INR, a new unsupervised method that integrates implicit neural representations (INRs) with the conventional motion-compensated (MoCo) framework to accelerate cardiac magnetic resonance (CMR) imaging. The method accurately decomposes cardiac motion, reconstructs high-quality CMR images, and performs strongly on both retrospective (simulated) datasets and prospective, real-acquired free-breathing CMR scans.

Key Takeaways

  1. MoCo-INR is a new unsupervised method combining implicit neural representations (INRs) with motion compensation (MoCo) to accelerate cardiac MR imaging (a toy sketch follows this list).
  2. MoCo-INR accurately decomposes cardiac motion and reconstructs high-quality CMR images.
  3. On retrospective (simulated) datasets, MoCo-INR outperforms state-of-the-art methods, with fast convergence and fine-detailed reconstructions even at ultra-high acceleration factors (e.g., 20x in VISTA sampling).
  4. Results on prospective, real-acquired free-breathing CMR scans highlight its clinical practicality for real-time imaging.
  5. A new INR network architecture tailored to the CMR problem significantly stabilizes model optimization.
  6. Ablation studies confirm the effectiveness of MoCo-INR's critical components.
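
A toy sketch of the motion-compensated decomposition idea, assuming (as an illustration, not the paper's architecture) a static template INR plus a motion INR that warps sampling coordinates per time point:

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=128, depth=4):
    layers, d = [], d_in
    for _ in range(depth - 1):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    return nn.Sequential(*layers, nn.Linear(d, d_out))

class MoCoINR(nn.Module):
    """Template INR f(x, y) -> intensity; motion INR g(x, y, t) -> (dx, dy).
    A frame at time t is the template sampled at motion-compensated coords."""
    def __init__(self):
        super().__init__()
        self.template = mlp(2, 1)
        self.motion = mlp(3, 2)

    def forward(self, coords, t):
        # coords: (N, 2) spatial points in [-1, 1]; t: (N, 1) time stamps
        disp = self.motion(torch.cat([coords, t], dim=-1))
        return self.template(coords + disp)

model = MoCoINR()
xy = torch.rand(1024, 2) * 2 - 1
t = torch.full((1024, 1), 0.3)
intensity = model(xy, t)   # (1024, 1) predicted image values at time t
# Training (not shown) would push a k-space / image-domain forward model of
# these intensities toward the undersampled acquisitions.
```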

Unsupervised Segmentation of Micro-CT Scans of Polyurethane Structures By Combining Hidden-Markov-Random Fields and a U-Net

Authors:Julian Grolig, Lars Griem, Michael Selzer, Hans-Ulrich Kauczor, Simon M. F. Triphan, Britta Nestler, Arnd Koeppe

Extracting digital material representations from images is a necessary prerequisite for a quantitative analysis of material properties. Different segmentation approaches have been extensively studied in the past to achieve this task, but were often lacking accuracy or speed. With the advent of machine learning, supervised convolutional neural networks (CNNs) have achieved state-of-the-art performance for different segmentation tasks. However, these models are often trained in a supervised manner, which requires large labeled datasets. Unsupervised approaches do not require ground-truth data for learning, but suffer from long segmentation times and often worse segmentation accuracy. Hidden Markov Random Fields (HMRF) are an unsupervised segmentation approach that incorporates concepts of neighborhood and class distributions. We present a method that integrates HMRF theory and CNN segmentation, leveraging the advantages of both areas: unsupervised learning and fast segmentation times. We investigate the contribution of different neighborhood terms and components for the unsupervised HMRF loss. We demonstrate that the HMRF-UNet enables high segmentation accuracy without ground truth on a Micro-Computed Tomography (μCT) image dataset of Polyurethane (PU) foam structures. Finally, we propose and demonstrate a pre-training strategy that considerably reduces the required amount of ground-truth data when training a segmentation model.

Paper & Project Links

PDF

Summary

Supervised convolutional neural networks (CNNs) achieve state-of-the-art segmentation performance but require large labeled datasets for training. This work integrates Hidden Markov Random Field (HMRF) theory with CNN segmentation, combining unsupervised learning with fast segmentation times. Without any ground truth, the HMRF-UNet achieves high segmentation accuracy on a micro-computed tomography (μCT) image dataset of polyurethane (PU) foam structures. The authors also propose a pre-training strategy that considerably reduces the amount of ground-truth data needed to train a segmentation model.

Key Takeaways

  1. Digital material representations extracted from images are a prerequisite for quantitative analysis of material properties; past segmentation approaches often lacked accuracy or speed.
  2. With the advent of machine learning, supervised convolutional neural networks (CNNs) have achieved state-of-the-art segmentation performance.
  3. Conventional supervised training requires large labeled datasets.
  4. Hidden Markov Random Fields (HMRF) are an unsupervised segmentation approach incorporating concepts of neighborhood and class distributions (a toy neighborhood term is sketched below).
  5. This work integrates HMRF theory with CNN segmentation, achieving unsupervised learning with fast segmentation.
  6. Without ground truth, the HMRF-UNet achieves high segmentation accuracy on a μCT image dataset of polyurethane foam structures.
  7. A proposed pre-training strategy considerably reduces the ground-truth data required to train a segmentation model.
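
The HMRF neighborhood concept can be made concrete with a simple smoothness term on class posteriors; this is one plausible form of such a term (a Potts-like prior), not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def neighborhood_loss(logits):
    """HMRF-style smoothness term on class posteriors.
    logits: (B, C, H, W). Penalizes disagreement between the posterior of
    each pixel and its right/down neighbours."""
    p = F.softmax(logits, dim=1)
    dx = (p[..., :, 1:] - p[..., :, :-1]).pow(2).mean()   # horizontal pairs
    dy = (p[..., 1:, :] - p[..., :-1, :]).pow(2).mean()   # vertical pairs
    return dx + dy

logits = torch.randn(2, 3, 64, 64, requires_grad=True)
loss = neighborhood_loss(logits)   # combine with a class-likelihood term
loss.backward()
```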

Toward Scalable Early Cancer Detection: Evaluating EHR-Based Predictive Models Against Traditional Screening Criteria

Authors:Jiheum Park, Chao Pang, Tristan Y. Lee, Jeong Yun Yang, Jacob Berkowitz, Alexander Z. Wei, Nicholas Tatonetti

Current cancer screening guidelines cover only a few cancer types and rely on narrowly defined criteria, such as age or a single risk factor like smoking history, to identify high-risk individuals. Predictive models using electronic health records (EHRs), which capture large-scale longitudinal patient-level health information, may provide a more effective tool for identifying high-risk groups by detecting subtle prediagnostic signals of cancer. Recent advances in large language and foundation models have further expanded this potential, yet evidence remains limited on how useful EHR-based models are compared with traditional risk factors currently used in screening guidelines. We systematically evaluated the clinical utility of EHR-based predictive models against traditional risk factors, including gene mutations and family history of cancer, for identifying high-risk individuals across eight major cancers (breast, lung, colorectal, prostate, ovarian, liver, pancreatic, and stomach), using data from the All of Us Research Program, which integrates EHR, genomic, and survey data from over 865,000 participants. Even with a baseline modeling approach, EHR-based models achieved a 3- to 6-fold higher enrichment of true cancer cases among individuals identified as high risk compared with traditional risk factors alone, whether used as a standalone or complementary tool. The EHR foundation model, a state-of-the-art approach trained on comprehensive patient trajectories, further improved predictive performance across 26 cancer types, demonstrating the clinical potential of EHR-based predictive modeling to support more precise and scalable early detection strategies.

Paper & Project Links

PDF

Summary

Current cancer screening guidelines cover only a few cancer types and rely on narrowly defined criteria, such as age or a single risk factor like smoking history, to identify high-risk individuals. Predictive models built on electronic health records (EHRs), which capture large-scale longitudinal patient-level health information, can detect subtle prediagnostic signals of cancer. This study systematically evaluates EHR-based models against traditional risk factors for identifying high-risk individuals across eight major cancers. Even with a baseline modeling approach, EHR-based models achieve a 3- to 6-fold higher enrichment of true cancer cases among individuals flagged as high risk than traditional risk factors alone. A state-of-the-art EHR foundation model trained on comprehensive patient trajectories further improves predictive performance across 26 cancer types, demonstrating the potential of EHR-based modeling to support more precise and scalable early detection strategies.

Key Takeaways

  1. Current cancer screening guidelines have limited coverage and rely mainly on age and single risk factors to identify high-risk individuals.
  2. EHR-based predictive models capture large-scale longitudinal patient health information and offer greater potential for identifying individuals at high risk of cancer.
  3. EHR-based models outperform traditional risk factors (such as gene mutations and family history of cancer) in identifying high-risk individuals across eight major cancers.
  4. EHR-based models achieve a 3- to 6-fold higher enrichment of true cancer cases among individuals flagged as high risk than traditional risk factors (the enrichment computation is sketched below).
  5. A state-of-the-art EHR foundation model performs well across 26 cancer types, further demonstrating this potential.
  6. EHR-based predictive models can be used standalone or as a complement to improve the precision of cancer detection.
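
The enrichment figure is simply the prevalence of true cases among the top-scored individuals relative to the overall prevalence; a minimal sketch of that computation (function name and the 5% cutoff are illustrative):

```python
import numpy as np

def enrichment(risk_scores, labels, top_frac=0.05):
    """Fold-enrichment of true cases in the predicted high-risk group:
    prevalence among the top `top_frac` scored individuals divided by
    the overall prevalence."""
    n_top = max(1, int(len(risk_scores) * top_frac))
    top = np.argsort(risk_scores)[::-1][:n_top]
    return labels[top].mean() / labels.mean()

rng = np.random.default_rng(0)
labels = (rng.random(100_000) < 0.01).astype(float)   # 1% base rate
scores = rng.random(100_000) + 2.0 * labels           # informative scores
print(round(enrichment(scores, labels), 1))           # >> 1, i.e. enriched
```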

Coordinative Learning with Ordinal and Relational Priors for Volumetric Medical Image Segmentation

Authors:Haoyi Wang

Volumetric medical image segmentation presents unique challenges due to the inherent anatomical structure and limited availability of annotations. While recent methods have shown promise by contrasting spatial relationships between slices, they rely on hard binary thresholds to define positive and negative samples, thereby discarding valuable continuous information about anatomical similarity. Moreover, these methods overlook the global directional consistency of anatomical progression, resulting in distorted feature spaces that fail to capture the canonical anatomical manifold shared across patients. To address these limitations, we propose Coordinative Ordinal-Relational Anatomical Learning (CORAL) to capture both local and global structure in volumetric images. First, CORAL employs a contrastive ranking objective to leverage continuous anatomical similarity, ensuring relational feature distances between slices are proportional to their anatomical position differences. In addition, CORAL incorporates an ordinal objective to enforce global directional consistency, aligning the learned feature distribution with the canonical anatomical progression across patients. Learning these inter-slice relationships produces anatomically informed representations that benefit the downstream segmentation task. Through this coordinative learning framework, CORAL achieves state-of-the-art performance on benchmark datasets under limited-annotation settings while learning representations with meaningful anatomical structure. Code is available at https://github.com/haoyiwang25/CORAL.

Paper & Project Links

PDF

Summary

Volumetric medical image segmentation is challenging because of inherent anatomical structure and limited annotations. Existing methods that contrast spatial relationships between slices rely on hard binary thresholds to define positive and negative samples, discarding valuable continuous information about anatomical similarity, and they overlook the global directional consistency of anatomical progression, distorting the feature space so that it fails to capture the canonical anatomical manifold shared across patients. The proposed Coordinative Ordinal-Relational Anatomical Learning (CORAL) captures both local and global structure in volumetric images. A contrastive ranking objective leverages continuous anatomical similarity, making relational feature distances between slices proportional to their anatomical position differences, while an ordinal objective enforces global directional consistency, aligning the learned feature distribution with the canonical anatomical progression across patients. Through this coordinative learning framework, CORAL achieves state-of-the-art performance under limited-annotation settings while learning representations with meaningful anatomical structure.

Key Takeaways

  1. Volumetric medical image segmentation faces limited annotations and distinctive anatomical structure.
  2. Existing methods define positive and negative samples with hard binary thresholds, discarding continuous information about anatomical similarity.
  3. CORAL captures both local and global structure in volumetric images via a contrastive ranking objective and an ordinal objective.
  4. CORAL leverages continuous anatomical similarity so that relational feature distances between slices are proportional to their anatomical position differences (an illustrative surrogate loss is sketched below).
  5. CORAL enforces global directional consistency, aligning the learned feature distribution with the canonical anatomical progression across patients.
  6. Through this coordinative learning framework, CORAL achieves state-of-the-art performance under limited-annotation settings.
  7. The representations CORAL learns carry meaningful anatomical structure.
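
One plausible reading of the contrastive ranking idea, feature distances between slice embeddings proportional to anatomical position differences, sketched as an illustrative surrogate loss rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def ordinal_relational_loss(feats, positions, eps=1e-8):
    """feats: (N, D) slice embeddings from one volume; positions: (N,)
    normalized anatomical slice positions in [0, 1]. Encourages pairwise
    feature distances to be proportional to anatomical position
    differences."""
    diff = feats[:, None, :] - feats[None, :, :]
    fdist = (diff.pow(2).sum(-1) + eps).sqrt()            # (N, N), NaN-safe
    pdist = (positions[:, None] - positions[None, :]).abs()
    fdist = fdist / (fdist.mean() + eps)                  # scale-invariant
    pdist = pdist / (pdist.mean() + eps)
    return F.mse_loss(fdist, pdist)

feats = torch.randn(16, 128, requires_grad=True)
pos = torch.linspace(0, 1, 16)        # 16 slices spanning the volume
ordinal_relational_loss(feats, pos).backward()
```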

MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI

Authors:Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky, Mohammad Yaqub, Numan Saeed

Foundational models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes pre-training for every domain, modality, or task challenging. Instead of building separate models, we propose MAFM^3 (Modular Adaptation of Foundation Models for Multi-Modal Medical AI), a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities through lightweight modular components. These components serve as specialized skill sets that allow the system to flexibly activate the appropriate capability at the inference time, depending on the input type or clinical objective. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation. Empirically, we validate our approach by adapting a chest CT foundation model initially trained for classification into prognosis and segmentation modules. Our results show improved performance on both tasks. Furthermore, by incorporating PET scans, MAFM^3 achieved an improvement in the Dice score 5% compared to the respective baselines. These findings establish that foundation models, when equipped with modular components, are not inherently constrained to their initial training scope but can evolve into multitask, multimodality systems for medical imaging. The code implementation of this work can be found at https://github.com/Areeb2735/CTscan_prognosis_VLM

Paper & Project Links

PDF 2 figures, 3 tables

Summary
This work proposes MAFM^3, a framework that lets a single foundation model expand into diverse domains, tasks, and modalities through lightweight modular components. The components act as specialized skill sets, allowing the system to flexibly activate the appropriate capability at inference time depending on the input type or clinical objective. Adapting a chest CT foundation model originally trained for classification into prognosis and segmentation modules improves performance on both tasks, and incorporating PET scans further improves the Dice score by 5% over the respective baselines.

Key Takeaways

  1. Foundation models in medical imaging face data scarcity, making pre-training for every domain, modality, or task challenging.
  2. MAFM^3 enables a single foundation model to expand into diverse domains, tasks, and modalities.
  3. Lightweight modular components act as specialized skill sets, letting the system flexibly activate the right capability for each task or modality (a skeleton follows this list).
  4. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified, expandable framework for efficient multitask and multimodality adaptation.
  5. Adapting a chest CT classification foundation model into prognosis and segmentation modules validates the approach.
  6. Incorporating PET scans, MAFM^3 improves the Dice score by 5% over the respective baselines.
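
A skeleton of the modular-adaptation idea, a frozen backbone plus per-skill lightweight heads selected at inference time; module names and dimensions are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ModularFoundation(nn.Module):
    """Frozen backbone plus lightweight modules activated per request."""
    def __init__(self, backbone, feat_dim=512):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # keep the foundation frozen
        self.skills = nn.ModuleDict({        # trainable skill set
            "classify": nn.Linear(feat_dim, 10),
            "prognosis": nn.Linear(feat_dim, 1),
            "segment": nn.Linear(feat_dim, 2 * 16 * 16),  # toy seg head
        })

    def forward(self, x, skill):
        feats = self.backbone(x)
        return self.skills[skill](feats)      # activate the chosen module

backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 512), nn.ReLU())
model = ModularFoundation(backbone)
x = torch.randn(4, 1, 32, 32)
print(model(x, "prognosis").shape)            # torch.Size([4, 1])
```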

Machine-Learning Based Detection of Coronary Artery Calcification Using Synthetic Chest X-Rays

Authors:Dylan Saeed, Ramtin Gharleghi, Susann Bier, Sonit Singh

Coronary artery calcification (CAC) is a strong predictor of cardiovascular events, with CT-based Agatston scoring widely regarded as the clinical gold standard. However, CT is costly and impractical for large-scale screening, while chest X-rays (CXRs) are inexpensive but lack reliable ground truth labels, constraining deep learning development. Digitally reconstructed radiographs (DRRs) offer a scalable alternative by projecting CT volumes into CXR-like images while inheriting precise labels. In this work, we provide the first systematic evaluation of DRRs as a surrogate training domain for CAC detection. Using 667 CT scans from the COCA dataset, we generate synthetic DRRs and assess model capacity, super-resolution fidelity enhancement, preprocessing, and training strategies. Lightweight CNNs trained from scratch outperform large pretrained networks; pairing super-resolution with contrast enhancement yields significant gains; and curriculum learning stabilises training under weak supervision. Our best configuration achieves a mean AUC of 0.754, comparable to or exceeding prior CXR-based studies. These results establish DRRs as a scalable, label-rich foundation for CAC detection, while laying the foundation for future transfer learning and domain adaptation to real CXRs.

Paper & Project Links

PDF 10 pages, 5 figures. Under review for MIDL 2026

Summary

This work evaluates digitally reconstructed radiographs (DRRs) as a surrogate training domain for coronary artery calcification (CAC) detection. Synthetic DRRs are generated from 667 CT scans of the COCA dataset, and model capacity, super-resolution fidelity enhancement, preprocessing, and training strategies are assessed. Lightweight CNNs trained from scratch outperform large pretrained networks; pairing super-resolution with contrast enhancement yields significant gains; and curriculum learning stabilizes training under weak supervision. The best configuration reaches a mean AUC of 0.754, comparable to or exceeding prior CXR-based studies. DRRs are thus established as a scalable, label-rich foundation for CAC detection, laying the groundwork for transfer learning and domain adaptation to real CXRs.

Key Takeaways

  1. Coronary artery calcification (CAC) is a strong predictor of cardiovascular events, and CT-based Agatston scoring is the clinical gold standard.
  2. CT is costly and impractical for large-scale screening, while chest X-rays (CXRs) are inexpensive but lack reliable ground-truth labels, constraining deep learning development.
  3. Digitally reconstructed radiographs (DRRs) project CT volumes into CXR-like images while inheriting precise labels, offering a scalable alternative (a toy projection is sketched below).
  4. This work provides the first systematic evaluation of DRRs as a surrogate training domain for CAC detection.
  5. Lightweight CNNs trained from scratch outperform large pretrained networks.
  6. Pairing super-resolution with contrast enhancement yields significant gains.
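
A minimal parallel-beam sketch of how a DRR can be formed from a CT volume; real DRR generation uses perspective ray casting and calibrated attenuation models, so treat the constants here as placeholders:

```python
import numpy as np

def simple_drr(ct_hu, axis=1, mu_water=0.2):
    """Parallel-beam DRR from a CT volume in Hounsfield units.
    Converts HU to linear attenuation, integrates along one axis, and
    applies a Beer-Lambert-style detector response."""
    mu = mu_water * (1.0 + ct_hu / 1000.0)   # HU -> attenuation coefficient
    mu = np.clip(mu, 0, None)
    path = mu.sum(axis=axis)                 # line integral per ray
    drr = 1.0 - np.exp(-0.01 * path)         # detector response, toy scale
    return (drr - drr.min()) / (np.ptp(drr) + 1e-8)

ct = np.random.randint(-1000, 1500, size=(64, 64, 64)).astype(np.float32)
image = simple_drr(ct)                        # (64, 64) CXR-like image
```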

Highly Polarized Intrinsic Emission and its Orthogonal Counterpart in Vela X-1

Authors:WanYun Wu, Fei Xie, Long Ji, Mingyu Ge, Fabio La Monaca

Vela X-1 is one of the most archetypal wind-fed X-ray pulsars (XRPs), and the emergence of its orthogonal polarization states reveals distinctive polarimetric properties. Using data from Imaging X-ray Polarimetry Explorer (IXPE) observations, we perform a polarization analysis of Vela X-1 with a triple power-law spectral model absorbed by varying column densities, successfully isolating two physically distinct orthogonal polarized components. The first polarized component corresponds to emission from the accretion mound surface that is not obscured by the wind clumps, with its polarization degree (PD) exceeding 30%. In specific phase intervals, the PD reaches (50.9 ± 10.7)%. This marks the first detection of such highly polarized neutron star emission in an XRP. The second polarized component likely originates from complex physical processes within or near the accretion mound, with its PD showing a potential negative correlation with column density. Furthermore, by rotating the predicted polarization angle (PA) of the first polarized component by 90°, we successfully achieve separate fitting and simultaneous fitting of the two orthogonal polarization states using the rotating vector model (RVM).

Paper & Project Links

PDF 14 pages, 8 figures. Accepted for publication in APJ

Summary

This work studies the polarimetric properties of Vela X-1, an archetypal wind-fed X-ray pulsar (XRP). Using IXPE observations and a triple power-law spectral model absorbed by varying column densities, two physically distinct orthogonal polarized components are isolated. The first corresponds to emission from the accretion mound surface not obscured by wind clumps, with a polarization degree (PD) above 30%, reaching (50.9 ± 10.7)% in specific phase intervals, the first detection of such highly polarized neutron star emission in an XRP. The second likely originates from complex physical processes within or near the accretion mound, with its PD showing a potential negative correlation with column density. Rotating the predicted polarization angle (PA) of the first component by 90° enables separate and simultaneous fitting of the two orthogonal polarization states with the rotating vector model (RVM).

Key Takeaways

  1. Vela X-1 is an archetypal wind-fed X-ray pulsar with distinctive polarimetric properties.
  2. IXPE observations reveal two physically distinct orthogonal polarized components.
  3. The first component comes from the accretion mound surface, with a polarization degree above 30%, reaching (50.9 ± 10.7)% in specific phase intervals.
  4. This is the first detection of such highly polarized neutron star emission in an XRP.
  5. The second component likely originates from complex physical processes, with its polarization degree showing a potential negative correlation with column density.
  6. Rotating the predicted polarization angle of the first component by 90° allows separate and simultaneous fitting of the two orthogonal polarization states with the rotating vector model.

S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

Authors:Jiechao Gao, Chang Liu, Yuangang Li

Radiology Report Generation (RRG) aims to automatically generate diagnostic reports from radiology images. To achieve this, existing methods have leveraged the powerful cross-modal generation capabilities of Multimodal Large Language Models (MLLMs), primarily focusing on optimizing cross-modal alignment between radiographs and reports through Supervised Fine-Tuning (SFT). However, by only performing instance-level alignment with the image-text pairs, the standard SFT paradigm fails to establish anatomically-grounded alignment, where the templated nature of reports often leads to sub-optimal generation quality. To address this, we propose S2D-Align, a novel SFT paradigm that establishes anatomically-grounded alignment by leveraging auxiliary signals of varying granularities. S2D-Align implements a shallow-to-deep strategy, progressively enriching the alignment process: it begins with the coarse radiograph-report pairing, then introduces reference reports for instance-level guidance, and ultimately utilizes key phrases to ground the generation in specific anatomical details. To bridge the different alignment stages, we introduce a memory-based adapter that empowers feature sharing, thereby integrating coarse and fine-grained guidance. For evaluation, we conduct experiments on the public MIMIC-CXR and IU X-Ray benchmarks, where S2D-Align achieves state-of-the-art performance compared to existing methods. Ablation studies validate the effectiveness of our multi-stage, auxiliary-guided approach, highlighting a promising direction for enhancing grounding capabilities in complex, multi-modal generation tasks.

Paper & Project Links

PDF

Summary
This work proposes S2D-Align, a new supervised fine-tuning paradigm for radiology report generation that establishes anatomically-grounded alignment between radiographs and reports. Through a shallow-to-deep strategy that aligns images with reports and anatomical details at multiple granularities, it improves the accuracy of automatically generated diagnostic reports. Experiments on public benchmarks show that S2D-Align achieves state-of-the-art performance compared to existing methods.

Key Takeaways

  1. Radiology Report Generation (RRG) aims to automatically generate diagnostic reports from radiology images.
  2. Current methods leverage the cross-modal generation capabilities of multimodal large language models, optimizing radiograph-report alignment through supervised fine-tuning (SFT).
  3. The standard SFT paradigm only performs instance-level alignment and fails to establish anatomically-grounded alignment; the templated nature of reports often leads to sub-optimal generation quality.
  4. S2D-Align is a novel SFT paradigm that establishes anatomically-grounded alignment by leveraging auxiliary signals of varying granularities.
  5. S2D-Align follows a shallow-to-deep strategy: coarse radiograph-report pairing, then reference reports for instance-level guidance, and finally key phrases that ground generation in specific anatomical details.
  6. A memory-based adapter bridges the alignment stages, enabling feature sharing that integrates coarse- and fine-grained guidance.

PINGS-X: Physics-Informed Normalized Gaussian Splatting with Axes Alignment for Efficient Super-Resolution of 4D Flow MRI

Authors:Sun Jo, Seok Young Hong, JinHyun Kim, Seungmin Kang, Ahjin Choi, Don-Gwan An, Simon Song, Je Hyeong Hong

4D flow magnetic resonance imaging (MRI) is a reliable, non-invasive approach for estimating blood flow velocities, vital for cardiovascular diagnostics. Unlike conventional MRI focused on anatomical structures, 4D flow MRI requires high spatiotemporal resolution for early detection of critical conditions such as stenosis or aneurysms. However, achieving such resolution typically results in prolonged scan times, creating a trade-off between acquisition speed and prediction accuracy. Recent studies have leveraged physics-informed neural networks (PINNs) for super-resolution of MRI data, but their practical applicability is limited as the prohibitively slow training process must be performed for each patient. To overcome this limitation, we propose PINGS-X, a novel framework modeling high-resolution flow velocities using axes-aligned spatiotemporal Gaussian representations. Inspired by the effectiveness of 3D Gaussian splatting (3DGS) in novel view synthesis, PINGS-X extends this concept through several non-trivial novel innovations: (i) normalized Gaussian splatting with a formal convergence guarantee, (ii) axes-aligned Gaussians that simplify training for high-dimensional data while preserving accuracy and the convergence guarantee, and (iii) a Gaussian merging procedure to prevent degenerate solutions and boost computational efficiency. Experimental results on computational fluid dynamics (CFD) and real 4D flow MRI datasets demonstrate that PINGS-X substantially reduces training time while achieving superior super-resolution accuracy. Our code and datasets are available at https://github.com/SpatialAILab/PINGS-X.

Paper & Project Links

PDF Accepted at AAAI 2026. Supplementary material included after references. 27 pages, 21 figures, 11 tables

Summary

4D flow MRI is a reliable, non-invasive approach for estimating blood flow velocities, vital for cardiovascular diagnostics, but the high spatiotemporal resolution needed for early detection of conditions such as stenosis or aneurysms leads to prolonged scan times. This paper proposes PINGS-X, a framework that models high-resolution flow velocities with axes-aligned spatiotemporal Gaussian representations. Experiments on computational fluid dynamics and real 4D flow MRI datasets show that PINGS-X substantially reduces training time while achieving superior super-resolution accuracy.

Key Takeaways

  1. 4D flow MRI is vital for cardiovascular diagnostics, reliably estimating blood flow velocities.
  2. Achieving high spatiotemporal resolution, needed for early detection of conditions such as stenosis or aneurysms, is the key challenge.
  3. Physics-informed neural networks (PINNs) have been used for MRI super-resolution, but the prohibitively slow per-patient training limits practical applicability.
  4. PINGS-X models high-resolution flow velocities with axes-aligned spatiotemporal Gaussian representations, the core innovation of this work (a minimal evaluation sketch follows this list).
  5. PINGS-X features normalized Gaussian splatting with a formal convergence guarantee, axes-aligned Gaussians that simplify training for high-dimensional data, and a Gaussian merging procedure that prevents degenerate solutions and boosts computational efficiency.
  6. Experiments show that PINGS-X reduces training time while achieving superior super-resolution accuracy.
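
Axes alignment makes each Gaussian separable per dimension; here is a numpy sketch of evaluating such a normalized mixture at spatiotemporal query points (softmax weight normalization is an assumption of this sketch, not taken from the paper):

```python
import numpy as np

def splat(points, means, sigmas, weights):
    """Evaluate a field as a normalized sum of axes-aligned Gaussians.
    points: (N, D) query coords; means/sigmas: (G, D); weights: (G,).
    With no cross-axis covariance, each Gaussian factorizes per dimension."""
    w = np.exp(weights) / np.exp(weights).sum()       # normalized weights
    diff = points[:, None, :] - means[None, :, :]     # (N, G, D)
    logpdf = -0.5 * (diff / sigmas[None]) ** 2        # separable per axis
    return (w * np.exp(logpdf.sum(-1))).sum(-1)       # (N,)

rng = np.random.default_rng(0)
pts = rng.random((100, 4))                            # (x, y, z, t) queries
vals = splat(pts, rng.random((32, 4)),
             0.1 + rng.random((32, 4)), rng.standard_normal(32))
```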

Algorithms Trained on Normal Chest X-rays Can Predict Health Insurance Types

Authors:Chi-Yu Chen, Rawan Abulibdeh, Arash Asgari, Leo Anthony Celi, Deirdre Goode, Hassan Hamidi, Laleh Seyyed-Kalantari, Po-Chih Kuo, Ned McCague, Thomas Sounack

Artificial intelligence is revealing what medicine never intended to encode. Deep vision models, trained on chest X-rays, can now detect not only disease but also invisible traces of social inequality. In this study, we show that state-of-the-art architectures (DenseNet121, SwinV2-B, MedMamba) can predict a patient’s health insurance type, a strong proxy for socioeconomic status, from normal chest X-rays with significant accuracy (AUC around 0.67 on MIMIC-CXR-JPG, 0.68 on CheXpert). The signal persists even when age, race, and sex are controlled for, and remains detectable when the model is trained exclusively on a single racial group. Patch-based occlusion reveals that the signal is diffuse rather than localized, embedded in the upper and mid-thoracic regions. This suggests that deep networks may be internalizing subtle traces of clinical environments, equipment differences, or care pathways; learning socioeconomic segregation itself. These findings challenge the assumption that medical images are neutral biological data. By uncovering how models perceive and exploit these hidden social signatures, this work reframes fairness in medical AI: the goal is no longer only to balance datasets or adjust thresholds, but to interrogate and disentangle the social fingerprints embedded in clinical data itself.

Paper & Project Links

PDF Submitting to MIDL 2026

Summary

Deep vision models trained on chest X-rays can detect not only disease but also invisible traces of social inequality. This study shows that state-of-the-art architectures can predict a patient's health insurance type, a strong proxy for socioeconomic status, from normal chest X-rays with significant accuracy. The signal persists when age, race, and sex are controlled for, and remains detectable when models are trained on a single racial group. These findings challenge the assumption that medical images are neutral biological data: the images carry embedded social context. By uncovering how models perceive and exploit these hidden social signatures, the work reframes fairness in medical AI; the goal is no longer only to balance datasets or adjust thresholds, but to interrogate and disentangle the social fingerprints embedded in clinical data itself.

Key Takeaways

  1. Artificial intelligence can detect traces of social inequality in chest X-rays.
  2. State-of-the-art architectures can predict a patient's health insurance type, a proxy for socioeconomic status.
  3. The prediction is significantly accurate (AUC around 0.67 on MIMIC-CXR-JPG, 0.68 on CheXpert) and persists after controlling for age, race, and sex.
  4. Hidden social signatures, such as clinical environments, equipment differences, or care pathways, may be embedded diffusely in medical images (a patch-occlusion sketch follows this list).
  5. Medical images are not neutral biological data; they carry rich social context.
  6. Understanding how models perceive and exploit these hidden social signatures reframes the fairness challenge in medical AI.
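
The diffuse-signal finding comes from patch-based occlusion; the technique can be sketched as follows: slide an occluding patch over the image and record how the model's score drops (a minimal illustrative version, with a toy stand-in model):

```python
import torch

@torch.no_grad()
def occlusion_map(model, image, patch=16, fill=0.0):
    """Patch-based occlusion sensitivity for a single-logit classifier.
    image: (1, C, H, W). Returns an (H//patch, W//patch) map of how much
    the predicted score drops when each patch is occluded."""
    base = model(image).squeeze()
    H, W = image.shape[-2:]
    heat = torch.zeros(H // patch, W // patch)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            occluded = image.clone()
            occluded[..., i:i + patch, j:j + patch] = fill
            heat[i // patch, j // patch] = base - model(occluded).squeeze()
    return heat

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(224 * 224, 1))
x = torch.randn(1, 1, 224, 224)
print(occlusion_map(model, x).shape)   # torch.Size([14, 14])
```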

ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization

Authors:Anzhe Cheng, Shukai Duan, Shixuan Li, Chenzhong Yin, Mingxi Cheng, Heng Ping, Tamoghna Chattopadhyay, Sophia I Thomopoulos, Shahin Nazarian, Paul Thompson, Paul Bogdan

Mixture-of-Experts (MoE) architectures expand model capacity by sparsely activating experts but face two core challenges: misalignment between router logits and each expert’s internal structure leads to unstable routing and expert underutilization, and load imbalances create straggler bottlenecks. Standard solutions, such as auxiliary load-balancing losses, can reduce load disparities but often weaken expert specialization and hurt downstream performance. To address these issues, we propose ERMoE, a sparse MoE transformer that reparameterizes each expert in a learned orthonormal eigenbasis and replaces learned gating logits with an “Eigenbasis Score”, defined as the cosine similarity between input features and an expert’s basis. This content-aware routing ties token assignments directly to experts’ representation spaces, stabilizing utilization and promoting interpretable specialization without sacrificing sparsity. Crucially, ERMoE removes the need for explicit balancing losses and avoids the interfering gradients they introduce. We show that ERMoE achieves state-of-the-art accuracy on ImageNet classification and cross-modal image-text retrieval benchmarks (e.g., COCO, Flickr30K), while naturally producing flatter expert load distributions. Moreover, a 3D MRI variant (ERMoE-ba) improves brain age prediction accuracy by more than 7% and yields anatomically interpretable expert specializations. ERMoE thus introduces a new architectural principle for sparse expert models that directly addresses routing instabilities and enables improved performance with scalable, interpretable specialization.

Paper & Project Links

PDF

Summary

To address routing instability and expert underutilization in Mixture-of-Experts (MoE) architectures, ERMoE reparameterizes each expert in a learned orthonormal eigenbasis and replaces learned gating logits with an "Eigenbasis Score", the cosine similarity between input features and an expert's basis. This content-aware routing ties token assignments to the experts' representation spaces, stabilizing utilization and promoting interpretable specialization without sacrificing sparsity. ERMoE removes the need for explicit balancing losses and the interfering gradients they introduce. It achieves state-of-the-art accuracy on ImageNet classification and cross-modal image-text retrieval benchmarks while naturally producing flatter expert load distributions, and a 3D MRI variant (ERMoE-ba) improves brain age prediction accuracy by more than 7% with anatomically interpretable expert specializations.

Key Takeaways

  • ERMoE addresses routing instability and expert underutilization in MoE architectures.
  • ERMoE reparameterizes each expert in a learned orthonormal eigenbasis, replacing learned gating logits.
  • An Eigenbasis Score, the cosine similarity between input features and an expert's basis, drives content-aware routing (sketched below).
  • ERMoE stabilizes expert utilization and promotes interpretable specialization while preserving sparsity.
  • ERMoE removes the need for explicit balancing losses, avoiding the interfering gradients they introduce.
  • ERMoE achieves state-of-the-art results on image classification and cross-modal image-text retrieval benchmarks.
  • ERMoE improves brain age prediction accuracy and shows the potential of interpretable expert specialization for medical image analysis.
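
One way to read the Eigenbasis Score, the cosine similarity between a token and its projection onto an expert's orthonormal basis, is sketched here with random orthonormal bases; this is an illustrative interpretation, not the paper's implementation:

```python
import torch

def eigenbasis_scores(x, bases):
    """x: (B, D) token features; bases: (E, D, K) per-expert orthonormal
    columns (K basis vectors per expert). The score of expert e is
    ||U_e^T x|| / ||x||, i.e. the cosine between x and its projection
    onto expert e's subspace."""
    x = x / x.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    proj = torch.einsum("bd,edk->bek", x, bases)   # coords in each basis
    return proj.norm(dim=-1)                       # (B, E), values in [0, 1]

B, D, E, K = 4, 64, 8, 16
bases = torch.linalg.qr(torch.randn(E, D, K)).Q   # orthonormalize columns
scores = eigenbasis_scores(torch.randn(B, D), bases)
topk = scores.topk(2, dim=-1).indices             # route to top-2 experts
```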

Divide, Conquer and Unite: Hierarchical Style-Recalibrated Prototype Alignment for Federated Medical Image Segmentation

Authors:Xingyue Zhao, Wenke Huang, Xingguang Wang, Haoyu Zhao, Linghao Zhuang, Anwen Jiang, Guancheng Wan, Mang Ye

Federated learning enables multiple medical institutions to train a global model without sharing data, yet feature heterogeneity from diverse scanners or protocols remains a major challenge. Many existing works attempt to address this issue by leveraging model representations (e.g., mean feature vectors) to correct local training; however, they often face two key limitations: 1) Incomplete Contextual Representation Learning: Current approaches primarily focus on final-layer features, overlooking critical multi-level cues and thus diluting essential context for accurate segmentation. 2) Layerwise Style Bias Accumulation: Although utilizing representations can partially align global features, these methods neglect domain-specific biases within intermediate layers, allowing style discrepancies to build up and reduce model robustness. To address these challenges, we propose FedBCS to bridge feature representation gaps via domain-invariant contextual prototypes alignment. Specifically, we introduce a frequency-domain adaptive style recalibration into prototype construction that not only decouples content-style representations but also learns optimal style parameters, enabling more robust domain-invariant prototypes. Furthermore, we design a context-aware dual-level prototype alignment method that extracts domain-invariant prototypes from different layers of both encoder and decoder and fuses them with contextual information for finer-grained representation alignment. Extensive experiments on two public datasets demonstrate that our method exhibits remarkable performance.

Paper & Project Links

PDF Accepted at AAAI-26

Summary

This paper proposes FedBCS to address feature heterogeneity in federated medical image segmentation. By introducing domain-invariant contextual prototype alignment together with frequency-domain adaptive style recalibration, it improves model robustness and accuracy.

Key Takeaways

  1. Federated learning in medical imaging faces the challenge of feature heterogeneity across scanners and protocols.
  2. Existing methods correct local training with model representations but suffer from incomplete contextual representation learning and layerwise style bias accumulation.
  3. FedBCS bridges feature representation gaps via domain-invariant contextual prototype alignment.
  4. A frequency-domain adaptive style recalibration decouples content-style representations and learns optimal style parameters (a toy recalibration is sketched below).
  5. A context-aware dual-level prototype alignment extracts domain-invariant prototypes from different layers of both encoder and decoder and fuses them with contextual information for finer-grained alignment.
  6. Extensive experiments on two public datasets demonstrate remarkable performance.
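
Frequency-domain content-style decoupling commonly treats the phase spectrum as content and the amplitude spectrum as style; below is a sketch of recalibrating the amplitude with a learned scale, one plausible form of such recalibration rather than the paper's exact module:

```python
import torch

def recalibrate_style(feat, gamma):
    """feat: (B, C, H, W) feature map; gamma: broadcastable scale for the
    amplitude spectrum (style), while the phase spectrum (content) is
    kept intact."""
    spec = torch.fft.fft2(feat)
    amp, phase = spec.abs(), spec.angle()
    amp = amp * gamma                        # learned style recalibration
    recal = amp * torch.exp(1j * phase)      # recombine style and content
    return torch.fft.ifft2(recal).real

feat = torch.randn(2, 8, 32, 32)
gamma = torch.ones(1, 8, 1, 1) * 0.5         # e.g. a learnable parameter
out = recalibrate_style(feat, gamma)
```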

FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis

Authors:Tianming Sha, Zechuan Chen, Zhan Cheng, Haotian Zhai, Xuwei Ding, Keze Wang

Stroke is an acute cerebrovascular disease, and timely diagnosis significantly improves patient survival. However, existing automated diagnosis methods suffer from fairness issues across demographic groups, potentially exacerbating healthcare disparities. In this work we propose FAST-CAD, a theoretically grounded framework that combines domain-adversarial training (DAT) with group distributionally robust optimization (Group-DRO) for fair and accurate non-contact stroke diagnosis. Our approach is built on domain adaptation and minimax fairness theory and provides convergence guarantees and fairness bounds. We curate a multimodal dataset covering 12 demographic subgroups defined by age, gender, and posture. FAST-CAD employs self-supervised encoders with adversarial domain discrimination to learn demographic-invariant representations, while Group-DRO optimizes worst-group risk to ensure robust performance across all subgroups. Extensive experiments show that our method achieves superior diagnostic performance while maintaining fairness across demographic groups, and our theoretical analysis supports the effectiveness of the unified DAT + Group-DRO framework. This work provides both practical advances and theoretical insights for fair medical AI systems.

Paper & Project Links

PDF Accepted for oral presentation at the AAAI Conference on Artificial Intelligence 2026 (AAAI 2026)

Summary

Stroke is an acute cerebrovascular disease, and timely diagnosis significantly improves patient survival, yet existing automated diagnosis methods suffer from fairness issues across demographic groups, potentially exacerbating healthcare disparities. This work proposes FAST-CAD, a theoretically grounded framework combining domain-adversarial training (DAT) with group distributionally robust optimization (Group-DRO) for fair and accurate non-contact stroke diagnosis. Built on domain adaptation and minimax fairness theory, it provides convergence guarantees and fairness bounds. Experiments on a curated multimodal dataset covering 12 demographic subgroups defined by age, gender, and posture show that FAST-CAD achieves superior diagnostic performance while maintaining fairness. The work offers both practical advances and theoretical insights for fair medical AI systems.

Key Takeaways

  1. Stroke is an acute cerebrovascular disease; timely diagnosis significantly improves patient survival.
  2. Existing automated diagnosis methods suffer from fairness issues across demographic groups.
  3. FAST-CAD combines domain-adversarial training (DAT) with group distributionally robust optimization (Group-DRO) for fair and accurate non-contact stroke diagnosis.
  4. FAST-CAD is built on domain adaptation and minimax fairness theory, ensuring convergence guarantees and fairness bounds.
  5. The study validates the approach on a multimodal dataset covering 12 demographic subgroups defined by age, gender, and posture.
  6. FAST-CAD learns demographic-invariant representations with self-supervised encoders and adversarial domain discrimination, while Group-DRO optimizes worst-group risk to ensure robust performance across all subgroups (a minimal Group-DRO step is sketched below).
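
Group-DRO typically up-weights the worst-performing groups via an exponentiated-gradient update over group weights (in the style of Sagawa et al.); a minimal sketch of one such step, with hyperparameters and names illustrative:

```python
import torch

def group_dro_loss(losses, group_ids, q, eta=0.01):
    """One Group-DRO step. losses: (B,) per-sample losses; group_ids: (B,)
    ints in [0, G); q: (G,) current group weights on the simplex.
    Returns (weighted loss, updated q)."""
    G = q.numel()
    group_loss = torch.zeros(G)
    for g in range(G):
        sel = group_ids == g
        if sel.any():
            group_loss[g] = losses[sel].mean()
    q = q * torch.exp(eta * group_loss.detach())   # exponentiated gradient
    q = q / q.sum()                                # project back to simplex
    return (q * group_loss).sum(), q

losses = torch.rand(32, requires_grad=True)
groups = torch.randint(0, 12, (32,))               # 12 demographic subgroups
q = torch.full((12,), 1 / 12)
loss, q = group_dro_loss(losses, groups, q)
loss.backward()
```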

UCDSC: Open Set UnCertainty aware Deep Simplex Classifier for Medical Image Datasets

Authors:Arnav Aditya, Nitin Kumar, Saurabh Shigwan

Driven by advancements in deep learning, computer-aided diagnoses have made remarkable progress. However, outside controlled laboratory settings, algorithms may encounter several challenges. In the medical domain, these difficulties often stem from limited data availability due to ethical and legal restrictions, as well as the high cost and time required for expert annotations-especially in the face of emerging or rare diseases. In this context, open-set recognition plays a vital role by identifying whether a sample belongs to one of the known classes seen during training or should be rejected as an unknown. Recent studies have shown that features learned in the later stages of deep neural networks are observed to cluster around their class means, which themselves are arranged as individual vertices of a regular simplex [32]. The proposed method introduces a loss function designed to reject samples of unknown classes effectively by penalizing open space regions using auxiliary datasets. This approach achieves significant performance gain across four MedMNIST datasets-BloodMNIST, OCTMNIST, DermaMNIST, TissueMNIST and a publicly available skin dataset [29] outperforming state-of-the-art techniques.

Paper & Project Links

PDF 10 pages, Accepted at IEEE/CVF WACV 2026, Source code is available at this URL https://github.com/Arnavadi19/UCDSC

Summary
Computer-aided diagnosis has made remarkable progress with deep learning, but outside controlled laboratory settings algorithms face challenges such as limited data availability in medicine and the high cost and time of expert annotation. Open-set recognition addresses this by identifying whether a sample belongs to one of the known classes seen during training or should be rejected as unknown. Recent studies observe that features learned in the later stages of deep networks cluster around their class means, which are themselves arranged as vertices of a regular simplex. The proposed method introduces a loss function that effectively rejects samples of unknown classes by penalizing open-space regions using auxiliary datasets. It achieves significant performance gains on four MedMNIST datasets and a publicly available skin dataset, outperforming state-of-the-art techniques.

Key Takeaways

  1. Deep learning has driven remarkable progress in computer-aided diagnosis.
  2. In practice, algorithms face challenges such as limited data availability and costly, time-consuming annotation.
  3. Open-set recognition identifies whether a sample belongs to a known class or should be rejected as unknown.
  4. Features learned in the later stages of deep networks cluster around class means arranged as vertices of a regular simplex (constructed in the sketch below).
  5. The proposed loss function effectively rejects samples of unknown classes by penalizing open-space regions with auxiliary datasets.
  6. The method performs strongly on multiple MedMNIST datasets and a public skin dataset.
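
The regular-simplex geometry referenced above is easy to construct: K unit-norm class anchors with pairwise cosine −1/(K−1). A minimal sketch (fixed anchors like these can serve as classifier targets, though the paper's classifier is its own design):

```python
import numpy as np

def regular_simplex(num_classes):
    """Vertices of a regular simplex: K unit-norm, mutually equidistant
    class anchors living in a (K-1)-dim subspace of R^K."""
    K = num_classes
    verts = np.eye(K) - np.full((K, K), 1.0 / K)   # center the basis vectors
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    return verts

V = regular_simplex(4)
gram = V @ V.T
print(np.allclose(np.diag(gram), 1.0))                     # unit norm
print(np.allclose(gram[~np.eye(4, dtype=bool)], -1 / 3))   # cos = -1/(K-1)
```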
