
Medical Imaging


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on them in serious academic settings; they are only intended as a first-pass screen before actually reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-09-20

Semi-Supervised 3D Medical Segmentation from 2D Natural Images Pretrained Model

Authors:Pak-Hei Yeung, Jayroop Ramesh, Pengfei Lyu, Ana Namburete, Jagath Rajapakse

This paper explores the transfer of knowledge from general vision models pretrained on 2D natural images to improve 3D medical image segmentation. We focus on the semi-supervised setting, where only a few labeled 3D medical images are available, along with a large set of unlabeled images. To tackle this, we propose a model-agnostic framework that progressively distills knowledge from a 2D pretrained model to a 3D segmentation model trained from scratch. Our approach, M&N, involves iterative co-training of the two models using pseudo-masks generated by each other, along with our proposed learning rate guided sampling that adaptively adjusts the proportion of labeled and unlabeled data in each training batch to align with the models’ prediction accuracy and stability, minimizing the adverse effect caused by inaccurate pseudo-masks. Extensive experiments on multiple publicly available datasets demonstrate that M&N achieves state-of-the-art performance, outperforming thirteen existing semi-supervised segmentation approaches under all different settings. Importantly, ablation studies show that M&N remains model-agnostic, allowing seamless integration with different architectures. This ensures its adaptability as more advanced models emerge. The code is available at https://github.com/pakheiyeung/M-N.


Paper and Project Links

PDF Machine Learning in Medical Imaging (MLMI) 2025 Oral

Summary

This paper studies how general vision models pretrained on 2D natural images can improve 3D medical image segmentation, focusing on the semi-supervised setting where only a few labeled 3D medical images are available alongside a large number of unlabeled ones. The authors propose a model-agnostic framework, M&N, that progressively distills knowledge from a pretrained 2D model into a 3D segmentation model trained from scratch. The two models are iteratively co-trained on pseudo-masks generated for each other, and the proposed learning-rate-guided sampling adaptively adjusts the proportion of labeled and unlabeled data in each training batch to match the models' prediction accuracy and stability, reducing the negative impact of inaccurate pseudo-masks. Experiments on multiple public datasets show that M&N achieves state-of-the-art performance, outperforming thirteen existing semi-supervised segmentation methods under all settings. M&N is also model-agnostic and integrates seamlessly with different architectures, allowing it to adapt as more advanced models emerge. The code is available at the linked repository.

Key Takeaways

  1. The work focuses on applying general vision models pretrained on 2D natural images to 3D medical image segmentation.
  2. A model-agnostic framework, M&N, is proposed to transfer knowledge from 2D pretrained models to 3D medical image segmentation.
  3. M&N distills and applies knowledge through iterative co-training and pseudo-mask generation.
  4. Learning-rate-guided sampling adjusts the proportion of labeled and unlabeled training data to match the models' prediction accuracy and stability.
  5. M&N outperforms existing semi-supervised segmentation methods on multiple datasets.
  6. M&N is model-agnostic and integrates seamlessly with different architectures.
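
To make the learning-rate-guided sampling idea from the abstract more concrete, here is a minimal, hypothetical sketch of how the labeled/unlabeled share of a batch could be tied to the current learning rate. The function names and the exact schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def labeled_fraction(current_lr: float, max_lr: float,
                     min_frac: float = 0.25, max_frac: float = 1.0) -> float:
    """Hypothetical schedule: early in training (high LR, unstable pseudo-masks)
    batches are dominated by labeled data; as the LR decays and predictions
    stabilize, more unlabeled data with pseudo-masks is mixed in."""
    ratio = current_lr / max_lr                  # 1.0 at the start, -> 0 as LR decays
    return min_frac + (max_frac - min_frac) * ratio

def sample_batch(labeled_ids, unlabeled_ids, batch_size, current_lr, max_lr, rng):
    """Draw a mixed batch whose labeled share follows the schedule above."""
    frac = labeled_fraction(current_lr, max_lr)
    n_lab = max(1, int(round(frac * batch_size)))
    n_unl = batch_size - n_lab
    lab = rng.choice(labeled_ids, size=n_lab, replace=True)
    unl = rng.choice(unlabeled_ids, size=n_unl, replace=True) if n_unl > 0 else np.array([], dtype=int)
    return lab, unl

# usage sketch
rng = np.random.default_rng(0)
lab, unl = sample_batch(np.arange(10), np.arange(10, 500), batch_size=8,
                        current_lr=5e-4, max_lr=1e-3, rng=rng)
```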


No Modality Left Behind: Adapting to Missing Modalities via Knowledge Distillation for Brain Tumor Segmentation

Authors:Shenghao Zhu, Yifei Chen, Weihong Chen, Shuo Jiang, Guanyu Zhou, Yuanhan Wang, Feiwei Qin, Changmiao Wang, Qiyuan Tian

Accurate brain tumor segmentation is essential for preoperative evaluation and personalized treatment. Multi-modal MRI is widely used due to its ability to capture complementary tumor features across different sequences. However, in clinical practice, missing modalities are common, limiting the robustness and generalizability of existing deep learning methods that rely on complete inputs, especially under non-dominant modality combinations. To address this, we propose AdaMM, a multi-modal brain tumor segmentation framework tailored for missing-modality scenarios, centered on knowledge distillation and composed of three synergistic modules. The Graph-guided Adaptive Refinement Module explicitly models semantic associations between generalizable and modality-specific features, enhancing adaptability to modality absence. The Bi-Bottleneck Distillation Module transfers structural and textural knowledge from teacher to student models via global style matching and adversarial feature alignment. The Lesion-Presence-Guided Reliability Module predicts prior probabilities of lesion types through an auxiliary classification task, effectively suppressing false positives under incomplete inputs. Extensive experiments on the BraTS 2018 and 2024 datasets demonstrate that AdaMM consistently outperforms existing methods, exhibiting superior segmentation accuracy and robustness, particularly in single-modality and weak-modality configurations. In addition, we conduct a systematic evaluation of six categories of missing-modality strategies, confirming the superiority of knowledge distillation and offering practical guidance for method selection and future research. Our source code is available at https://github.com/Quanato607/AdaMM.


Paper and Project Links

PDF 38 pages, 9 figures

Summary

The paper proposes AdaMM, a multi-modal brain tumor segmentation framework designed for missing-modality scenarios. The framework consists of three synergistic modules and uses knowledge distillation to improve adaptability when modalities are absent. Experiments on the BraTS 2018 and 2024 datasets show that AdaMM delivers superior segmentation accuracy and robustness, particularly in single-modality and weak-modality configurations.

Key Takeaways

  1. AdaMM is a multi-modal brain tumor segmentation framework designed for missing-modality scenarios.
  2. The framework consists of three synergistic modules and uses knowledge distillation to improve adaptability to missing modalities.
  3. AdaMM performs strongly on the BraTS 2018 and 2024 datasets.
  4. Its segmentation accuracy and robustness stand out in single-modality and weak-modality configurations.
  5. A systematic evaluation of six categories of missing-modality strategies confirms the superiority of knowledge distillation.
  6. The AdaMM source code is publicly available for research use.
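
As a rough illustration of the distillation idea (transferring knowledge from a full-modality teacher to a missing-modality student), here is a minimal, hypothetical PyTorch-style sketch. The assumed model interface, loss weighting, and feature-alignment term are illustrative only, not the AdaMM modules themselves.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, full_inputs, missing_inputs, target_mask,
                      alpha: float = 0.5):
    """One hypothetical training step: the student sees the incomplete modality set,
    the frozen teacher sees all modalities, and the student is pulled toward the
    teacher's logits and features in addition to the supervised segmentation loss.
    Both models are assumed to return (logits, features)."""
    with torch.no_grad():
        t_logits, t_feat = teacher(full_inputs)
    s_logits, s_feat = student(missing_inputs)

    seg_loss = F.cross_entropy(s_logits, target_mask)            # supervised term
    kd_logits = F.kl_div(F.log_softmax(s_logits, dim=1),
                         F.softmax(t_logits, dim=1), reduction="batchmean")
    kd_feat = F.mse_loss(s_feat, t_feat)                         # simple feature alignment
    return seg_loss + alpha * (kd_logits + kd_feat)
```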


EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence

Authors:Chaoyin She, Ruifang Lu, Lida Chen, Wei Wang, Qinghua Huang

Ultrasound imaging has become the preferred imaging modality for early cancer screening due to its advantages of non-ionizing radiation, low cost, and real-time imaging capabilities. However, conventional ultrasound diagnosis heavily relies on physician expertise, presenting challenges of high subjectivity and low diagnostic efficiency. Vision-language models (VLMs) offer promising solutions for this issue, but existing general-purpose models demonstrate limited knowledge in ultrasound medical tasks, with poor generalization in multi-organ lesion recognition and low efficiency across multi-task diagnostics. To address these limitations, we propose EchoVLM, a vision-language model specifically designed for ultrasound medical imaging. The model employs a Mixture of Experts (MoE) architecture trained on data spanning seven anatomical regions. This design enables the model to perform multiple tasks, including ultrasound report generation, diagnosis and visual question-answering (VQA). The experimental results demonstrated that EchoVLM achieved significant improvements of 10.15 and 4.77 points in BLEU-1 scores and ROUGE-1 scores respectively compared to Qwen2-VL on the ultrasound report generation task. These findings suggest that EchoVLM has substantial potential to enhance diagnostic accuracy in ultrasound imaging, thereby providing a viable technical solution for future clinical applications. Source code and model weights are available at https://github.com/Asunatan/EchoVLM.


Paper and Project Links

PDF

Summary

Ultrasound imaging, with its non-ionizing radiation, low cost, and real-time capability, has become the preferred imaging technique for early cancer screening, yet conventional ultrasound diagnosis depends heavily on physician expertise and suffers from high subjectivity and low efficiency. This paper proposes EchoVLM, a vision-language model tailored to ultrasound imaging, built on a Mixture-of-Experts architecture and trained on data covering seven anatomical regions. The model can perform ultrasound report generation, diagnosis, and visual question answering, and it improves BLEU-1 and ROUGE-1 scores significantly over Qwen2-VL on report generation. These results indicate that EchoVLM has substantial potential to improve diagnostic accuracy in ultrasound imaging and offers a viable technical solution for future clinical applications.

Key Takeaways

  1. Ultrasound has become the preferred modality for early cancer screening, but conventional diagnosis is subjective and inefficient.
  2. Vision-language models (VLMs) offer a promising way to address this problem.
  3. Existing general-purpose VLMs have limited ultrasound knowledge and generalize poorly to multi-organ lesion recognition.
  4. EchoVLM is designed specifically for ultrasound imaging, adopts a Mixture-of-Experts architecture, and handles multiple tasks.
  5. EchoVLM improves markedly over Qwen2-VL on ultrasound report generation.
  6. EchoVLM has the potential to improve the accuracy of ultrasound diagnosis.
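
For background on the Mixture-of-Experts design mentioned above, here is a minimal, generic top-1 gating sketch in PyTorch. It only illustrates the routing mechanism; the layer sizes and routing rule are assumptions, not EchoVLM's actual architecture.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Generic sparse MoE layer: a router picks one expert per token."""
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (batch, tokens, dim)
        gate = self.router(x).softmax(dim=-1)   # routing probabilities
        top_p, top_i = gate.max(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        return out

# usage sketch: route 16 tokens of width 32
moe = TinyMoE(dim=32)
y = moe(torch.randn(2, 16, 32))
```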


ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification

Authors:Alvaro Lopez Pellicer, Andre Mariucci, Plamen Angelov, Marwan Bukhari, Jemma G. Kerns

Bone health studies are crucial in medical practice for the early detection and treatment of Osteopenia and Osteoporosis. Clinicians usually make a diagnosis based on densitometry (DEXA scans) and patient history. The applications of AI in this field are ongoing research. Most successful methods rely on deep learning models that use vision alone (DEXA/X-ray imagery) and focus on prediction accuracy, while explainability is often disregarded and left to post hoc assessments of input contributions. We propose ProtoMedX, a multi-modal model that uses both DEXA scans of the lumbar spine and patient records. ProtoMedX’s prototype-based architecture is explainable by design, which is crucial for medical applications, especially in the context of the upcoming EU AI Act, as it allows explicit analysis of model decisions, including incorrect ones. ProtoMedX demonstrates state-of-the-art performance in bone health classification while also providing explanations that can be visually understood by clinicians. Using a dataset of 4,160 real NHS patients, the proposed ProtoMedX achieves 87.58% accuracy in vision-only tasks and 89.8% in its multi-modal variant, both surpassing existing published methods.


Paper and Project Links

PDF Accepted ICCV 2025. Adaptation, Fairness, Explainability in AI Medical Imaging (PHAROS-AFE-AIMI Workshop). 8 pages, 5 figures, 4 tables

Summary
ProtoMedX is a multi-modal model that combines lumbar-spine DEXA scans with patient records for bone health diagnosis. Its prototype-based architecture is explainable by design and allows explicit analysis of model decisions, including incorrect ones. The model performs strongly on bone health classification, with high accuracy and visual explanations that clinicians can readily understand.

Key Takeaways

  1. Bone health studies are crucial in medical practice for the early detection and treatment of diseases such as osteoporosis.
  2. Clinicians currently diagnose using densitometry (DEXA scans) and patient history.
  3. AI applications in this field are ongoing; most successful methods rely on vision-only deep learning (DEXA/X-ray imagery) and focus on prediction accuracy.
  4. Explainability is often disregarded in existing models and left to post hoc assessments.
  5. ProtoMedX is a multi-modal model that combines DEXA scans with patient records to address this gap.
  6. ProtoMedX's prototype-based architecture is explainable by design, supporting analysis of model decisions, including incorrect ones.
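
To illustrate the general idea of prototype-based classification that ProtoMedX builds on, here is a minimal, hypothetical sketch: class scores come from similarity to learned prototype vectors, which is what makes each decision inspectable. This is a generic sketch with assumed dimensions and class names, not the ProtoMedX architecture.

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    """Generic prototype layer: each class owns a few learnable prototypes, and the
    class score is the negative distance to the closest one, so every decision can
    be traced back to a specific prototype."""
    def __init__(self, feat_dim: int, num_classes: int, protos_per_class: int = 3):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, protos_per_class, feat_dim))

    def forward(self, z):                                    # z: (batch, feat_dim)
        d = torch.cdist(z, self.prototypes.flatten(0, 1))    # (batch, classes * protos)
        d = d.view(z.size(0), self.prototypes.size(0), -1)   # (batch, classes, protos)
        closest = d.min(dim=-1).values                       # distance to nearest prototype per class
        return -closest                                      # higher score = closer prototype

# usage sketch: e.g. normal / osteopenia / osteoporosis as hypothetical classes
head = PrototypeHead(feat_dim=64, num_classes=3)
scores = head(torch.randn(8, 64))
```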


Radiology Report Conditional 3D CT Generation with Multi Encoder Latent diffusion Model

Authors:Sina Amirrajab, Zohaib Salahuddin, Sheng Kuang, Henry C. Woodruff, Philippe Lambin

Text to image latent diffusion models have recently advanced medical image synthesis, but applications to 3D CT generation remain limited. Existing approaches rely on simplified prompts, neglecting the rich semantic detail in full radiology reports, which reduces text image alignment and clinical fidelity. We propose Report2CT, a radiology report conditional latent diffusion framework for synthesizing 3D chest CT volumes directly from free text radiology reports, incorporating both findings and impression sections using multiple text encoder. Report2CT integrates three pretrained medical text encoders (BiomedVLP CXR BERT, MedEmbed, and ClinicalBERT) to capture nuanced clinical context. Radiology reports and voxel spacing information condition a 3D latent diffusion model trained on 20000 CT volumes from the CT RATE dataset. Model performance was evaluated using Frechet Inception Distance (FID) for real synthetic distributional similarity and CLIP based metrics for semantic alignment, with additional qualitative and quantitative comparisons against GenerateCT model. Report2CT generated anatomically consistent CT volumes with excellent visual quality and text image alignment. Multi encoder conditioning improved CLIP scores, indicating stronger preservation of fine grained clinical details in the free text radiology reports. Classifier free guidance further enhanced alignment with only a minor trade off in FID. We ranked first in the VLM3D Challenge at MICCAI 2025 on Text Conditional CT Generation and achieved state of the art performance across all evaluation metrics. By leveraging complete radiology reports and multi encoder text conditioning, Report2CT advances 3D CT synthesis, producing clinically faithful and high quality synthetic data.


Paper and Project Links

PDF

Summary

This work introduces Report2CT, a model for medical image synthesis that generates 3D chest CT volumes directly from free-text radiology reports by combining complete reports with multi-encoder text conditioning. Trained on the CT RATE dataset, the model is evaluated with multiple metrics, including Frechet Inception Distance (FID) and CLIP-based scores, and shows excellent text-image alignment and anatomical consistency. Report2CT ranked first in the VLM3D Challenge on text-conditional CT generation at MICCAI 2025.

Key Takeaways

  1. Report2CT is a radiology-report-conditioned latent diffusion framework that generates 3D chest CT volumes directly from free-text reports.
  2. The model uses both the findings and impression sections, with multiple text encoders capturing nuanced clinical context.
  3. Report2CT integrates three pretrained medical text encoders: BiomedVLP CXR BERT, MedEmbed, and ClinicalBERT.
  4. The model is trained on 20,000 CT volumes from the CT RATE dataset.
  5. Performance is evaluated with Frechet Inception Distance (FID) and CLIP-based metrics.
  6. The generated CT volumes show strong anatomical consistency, visual quality, and text-image alignment.
  7. Multi-encoder conditioning improves CLIP scores, indicating better preservation of fine-grained clinical details from the free-text reports.
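
To show what "multi-encoder text conditioning" can look like in practice, here is a minimal, hypothetical sketch that projects three separate report embeddings into a shared width and concatenates them into one conditioning sequence for cross-attention. The dimensions and module names are assumptions, not the Report2CT code.

```python
import torch
import torch.nn as nn

class MultiEncoderConditioner(nn.Module):
    """Fuse embeddings from several text encoders into one conditioning sequence."""
    def __init__(self, encoder_dims=(768, 768, 768), cond_dim: int = 512):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, cond_dim) for d in encoder_dims)

    def forward(self, embeddings):
        # embeddings: list of (batch, tokens_i, dim_i) tensors, one per text encoder
        projected = [p(e) for p, e in zip(self.proj, embeddings)]
        return torch.cat(projected, dim=1)    # (batch, sum(tokens_i), cond_dim)

# usage sketch with three hypothetical report encodings
cond = MultiEncoderConditioner()(
    [torch.randn(1, 128, 768), torch.randn(1, 96, 768), torch.randn(1, 64, 768)]
)
```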


A Noninvasive and Dispersive Framework for Estimating Nonuniform Conductivity of Brain Tumor in Patient-Specific Head Models

Authors:Yoshiki Kubota, Yosuke Nagata, Manabu Tamura, Akimasa Hirata

We propose a noninvasive and dispersive framework for estimating the spatially nonuniform conductivity of brain tumors using MR images. The method consists of two components: (i) voxel-wise assignment of tumor conductivity based on reference values fitted to the Cole-Cole model using empirical data from the literature and (ii) fine-tuning of a deep learning model pretrained on healthy participants. A total of 67 cases, comprising both healthy participants and tumor patients and including 9,806 paired T1- and T2-weighted MR images, were used for training and evaluation. The proposed method successfully estimated patient-specific conductivity maps, exhibiting smooth spatial variations that reflected tissue characteristics, such as edema, necrosis, and rim-associated intensity gradients observed in T1- and T2-weighted MR images. At 10 kHz, case-wise mean conductivity values varied across patients, ranging from 0.132 to 0.512 S/m in the rim (defined as the region within 2 mm of the tumor boundary), from 0.132 to 0.608 S/m in the core (the area inside the rim), and from 0.141 to 0.542 S/m in the entire tumor. Electromagnetic simulations for transcranial magnetic stimulation in individualized head models showed substantial differences in intratumoral field distributions between uniform assignments and the proposed nonuniform maps. Furthermore, this framework demonstrated voxel-wise dispersive mapping at 10 kHz, 1 MHz, and 100 MHz. This framework supports accurate whole-brain conductivity estimation by incorporating both individual anatomical structures and tumor-specific characteristics. Collectively, these results advance patient-specific EM modeling for tumor-bearing brains and lay the groundwork for subsequent microwave-band validation.


Paper and Project Links

PDF This work has been submitted to the IEEE for possible publication

Summary
This paper proposes a noninvasive, dispersive framework for estimating the spatially nonuniform conductivity of brain tumors from MR images. The method has two components: voxel-wise assignment of tumor conductivity based on reference values fitted to the Cole-Cole model using empirical data from the literature, and fine-tuning of a deep learning model pretrained on healthy participants. Using 67 cases covering healthy participants and tumor patients, with 9,806 paired T1- and T2-weighted MR images, for training and evaluation, the method successfully estimates patient-specific conductivity maps that reflect tissue characteristics such as edema, necrosis, and rim-associated intensity gradients. The framework supports accurate whole-brain conductivity estimation by incorporating both individual anatomical structures and tumor-specific characteristics, laying the groundwork for patient-specific electromagnetic modeling of tumor-bearing brains.

Key Takeaways

  1. A noninvasive, dispersive framework is proposed for estimating the spatially nonuniform conductivity of brain tumors from MR images.
  2. The method combines voxel-wise assignment of tumor conductivity with fine-tuning of a deep learning model.
  3. Cases from both healthy participants and tumor patients are used for training and evaluation.
  4. The estimated patient-specific conductivity maps reflect tissue characteristics such as edema and necrosis.
  5. The framework supports accurate whole-brain conductivity estimation by combining individual anatomy with tumor-specific characteristics.
  6. Electromagnetic simulations show substantial differences in intratumoral field distributions between uniform assignments and the proposed nonuniform maps.
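
For context on the Cole-Cole model mentioned above, a commonly used multi-pole Cole-Cole parameterization of tissue dielectric properties (a standard form from the dispersion literature, not a formula taken from this paper) is:

```latex
% Complex relative permittivity of tissue (multi-pole Cole-Cole form)
\hat{\varepsilon}(\omega) = \varepsilon_{\infty}
  + \sum_{n}\frac{\Delta\varepsilon_{n}}{1 + (\mathrm{j}\omega\tau_{n})^{\,1-\alpha_{n}}}
  + \frac{\sigma_{i}}{\mathrm{j}\omega\varepsilon_{0}},
\qquad
\sigma(\omega) = -\,\omega\,\varepsilon_{0}\,\operatorname{Im}\hat{\varepsilon}(\omega)
```

Here $\varepsilon_{\infty}$ is the high-frequency permittivity, each pole is described by $\Delta\varepsilon_{n}$, $\tau_{n}$, and $\alpha_{n}$, and $\sigma_{i}$ is the static ionic conductivity; evaluating $\sigma(\omega)$ at 10 kHz, 1 MHz, and 100 MHz gives the dispersive, frequency-dependent values discussed in the abstract.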


HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation

Authors:Weitong Wu, Zhaohu Xing, Jing Gong, Qin Peng, Lei Zhu

In the domain of 3D biomedical image segmentation, Mamba exhibits the superior performance for it addresses the limitations in modeling long-range dependencies inherent to CNNs and mitigates the abundant computational overhead associated with Transformer-based frameworks when processing high-resolution medical volumes. However, attaching undue importance to global context modeling may inadvertently compromise critical local structural information, thus leading to boundary ambiguity and regional distortion in segmentation outputs. Therefore, we propose the HybridMamba, an architecture employing dual complementary mechanisms: 1) a feature scanning strategy that progressively integrates representations both axial-traversal and local-adaptive pathways to harmonize the relationship between local and global representations, and 2) a gated module combining spatial-frequency analysis for comprehensive contextual modeling. Besides, we collect a multi-center CT dataset related to lung cancer. Experiments on MRI and CT datasets demonstrate that HybridMamba significantly outperforms the state-of-the-art methods in 3D medical image segmentation.


Paper and Project Links

PDF

Summary

In 3D biomedical image segmentation, Mamba performs well because it addresses the limited long-range dependency modeling of CNNs and avoids the heavy computational overhead of Transformer-based frameworks on high-resolution medical volumes. However, over-emphasizing global context modeling can inadvertently discard critical local structural information, leading to boundary ambiguity and regional distortion in the segmentation output. The proposed HybridMamba architecture therefore adopts two complementary mechanisms: a feature scanning strategy that progressively integrates representations from axial-traversal and local-adaptive pathways so that local and global representations are balanced, and a gated module combining spatial-frequency analysis for comprehensive contextual modeling. In addition, the authors collect a multi-center CT dataset related to lung cancer. Experiments on MRI and CT datasets show that HybridMamba significantly outperforms state-of-the-art 3D medical image segmentation methods.

Key Takeaways

  • Mamba addresses CNNs' limited long-range dependency modeling and reduces the computational overhead of processing high-resolution medical volumes.
  • Over-emphasizing global context modeling can lose critical local structural information.
  • The HybridMamba architecture uses dual complementary mechanisms to balance local and global information.
  • The feature scanning strategy progressively integrates representations from axial-traversal and local-adaptive pathways.
  • The gated module combines spatial-frequency analysis for comprehensive contextual modeling.
  • A multi-center CT dataset related to lung cancer was collected.
  • Experiments on MRI and CT datasets show HybridMamba significantly outperforms state-of-the-art methods.
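
To make the "gated module combining spatial-frequency analysis" easier to picture, here is a small, hypothetical PyTorch sketch that fuses a spatial convolution branch with a Fourier-domain branch through a learned gate. The module name, the per-channel spectral weighting, and the fusion rule are assumptions for illustration, not the HybridMamba design.

```python
import torch
import torch.nn as nn

class GatedSpatialFrequencyBlock(nn.Module):
    """Hypothetical gated fusion of a spatial branch and a frequency branch."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.freq_weight = nn.Parameter(torch.ones(channels, 1, 1, 1))   # per-channel spectral scaling
        self.gate = nn.Sequential(nn.Conv3d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, x):                                  # x: (batch, channels, D, H, W)
        s = self.spatial(x)
        # frequency branch: filter features in the Fourier domain
        f = torch.fft.rfftn(x, dim=(-3, -2, -1))
        f = f * self.freq_weight
        f = torch.fft.irfftn(f, s=x.shape[-3:], dim=(-3, -2, -1))
        g = self.gate(torch.cat([s, f], dim=1))            # learned gate blends the two branches
        return g * s + (1 - g) * f

# usage sketch
block = GatedSpatialFrequencyBlock(channels=8)
y = block(torch.randn(1, 8, 16, 16, 16))
```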


The Stochastic Dissipation Model for the Steady State Neutrino and Multi-Wavelength Emission of TXS 0506+056

Authors:Zhen-Jie Wang, Ruo-Yu Liu, Xiang-Yu Wang

The blazar TXS 0506+056 has been suggested to be a potential high-energy neutrino source thanks to the observations of IceCube, which found outburst-like neutrino emissions during 2014-2015 and 2017 in the transient emission search, and a $3.5\sigma$ local significance in a 10-year time-integrated search. The conventional one-zone jet model cannot explain the observed neutrino flux during outbursts due to the constraint from the X-ray flux, leading to proposals of multi-zone models (e.g. two-zone model) with multiple radiation zones. In literature, it has been shown that multi-zone models may consistently explain the high-state neutrino emission and the multi-wavelength emission of TXS 0506+056, while the quasi-steady-state long-term emission has not been well studied. In this work, we investigate a physically based model for the quasi-steady-state neutrino and electromagnetic radiation under the same framework, and successfully reproduce the multi-messenger emission of TXS 0506+056.


Paper and Project Links

PDF 10 pages, 3 figures, accepted for publication in PRD

Summary
The blazar TXS 0506+056 has been suggested as a potential high-energy neutrino source: IceCube found outburst-like neutrino emission during 2014-2015 and 2017 in its transient search, and a 3.5σ local significance in a 10-year time-integrated search. The conventional one-zone jet model cannot explain the neutrino flux observed during outbursts because of the constraint from the X-ray flux, which has motivated multi-zone models (e.g., two-zone models) with multiple radiation zones. Previous work has shown that multi-zone models can consistently explain the high-state neutrino and multi-wavelength emission of TXS 0506+056, but the quasi-steady-state long-term emission has not been well studied. This work investigates a physically based model for the quasi-steady-state neutrino and electromagnetic emission within the same framework and successfully reproduces the multi-messenger emission of TXS 0506+056.

Key Takeaways

  1. The blazar TXS 0506+056 is considered a potential high-energy neutrino source.
  2. IceCube observed outburst-like neutrino emission from this source during 2014-2015 and 2017.
  3. The conventional one-zone jet model cannot explain the observed neutrino outbursts because of the constraint from the X-ray flux.
  4. Multi-zone models (e.g., two-zone models) have been proposed to explain the high-state neutrino and multi-wavelength emission of TXS 0506+056.
  5. Multi-zone models can consistently explain both the high-state neutrino emission and the multi-wavelength emission of TXS 0506+056.
  6. The quasi-steady-state long-term emission of TXS 0506+056 had not been well studied before this work.


HQCNN: A Hybrid Quantum-Classical Neural Network for Medical Image Classification

Authors: Shahjalal, Jahid Karim Fahim, Pintu Chandra Paul, Md Robin Hossain, Md. Tofael Ahmed, Dulal Chakraborty

Classification of medical images plays a vital role in medical image analysis; however, it remains challenging due to the limited availability of labeled data, class imbalances, and the complexity of medical patterns. To overcome these challenges, we propose a novel Hybrid Quantum-Classical Neural Network (HQCNN) for both binary and multi-class classification. The architecture of HQCNN integrates a five-layer classical convolutional backbone with a 4-qubit variational quantum circuit that incorporates quantum state encoding, superpositional entanglement, and a Fourier-inspired quantum attention mechanism. We evaluate the model on six MedMNIST v2 benchmark datasets. The HQCNN consistently outperforms classical and quantum baselines, achieving up to 99.91% accuracy and 100.00% AUC on PathMNIST (binary) and 99.95% accuracy on OrganAMNIST (multi-class) with strong robustness on noisy datasets like BreastMNIST (87.18% accuracy). The model demonstrates superior generalization capability and computational efficiency, accomplished with significantly fewer trainable parameters, making it suitable for data-scarce scenarios. Our findings provide strong empirical evidence that hybrid quantum-classical models can advance medical imaging tasks.


Paper and Project Links

PDF 21 pages, 8 figures. Submitted to Quantum Journal. Corresponding author: Pintu Chandra Paul (pintu@cou.ac.bd)

Summary
Medical image classification is essential in medical image analysis but remains challenging because of limited labeled data, class imbalance, and complex medical patterns. The paper proposes a Hybrid Quantum-Classical Neural Network (HQCNN) for binary and multi-class classification. The architecture combines a five-layer classical convolutional backbone with a 4-qubit variational quantum circuit that incorporates quantum state encoding, superpositional entanglement, and a Fourier-inspired quantum attention mechanism. Evaluated on six MedMNIST v2 benchmark datasets, HQCNN consistently outperforms classical and quantum baselines, reaching up to 99.91% accuracy and 100.00% AUC on PathMNIST (binary) and 99.95% accuracy on OrganAMNIST (multi-class), and remains robust on noisier datasets such as BreastMNIST (87.18% accuracy). The model generalizes well, is computationally efficient, and needs significantly fewer trainable parameters, making it suitable for data-scarce scenarios and providing empirical evidence that hybrid quantum-classical models can advance medical imaging tasks.

Key Takeaways

  • Medical image classification is essential in medical image analysis but is limited by scarce labeled data, class imbalance, and complex medical patterns.
  • A novel Hybrid Quantum-Classical Neural Network (HQCNN) is proposed to address these problems for binary and multi-class classification.
  • HQCNN combines a classical convolutional backbone with quantum components, including quantum state encoding, superpositional entanglement, and a Fourier-inspired quantum attention mechanism.
  • Evaluated on multiple benchmark datasets, HQCNN achieves high accuracy and robustness.
  • HQCNN generalizes well and is computationally efficient, making it especially suitable for data-scarce scenarios.
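
For readers unfamiliar with variational quantum circuits, here is a generic 4-qubit sketch in PennyLane showing the pattern of angle encoding, entanglement, and trainable rotations. This is a toy illustration under that assumption, not the paper's circuit (which additionally includes a Fourier-inspired quantum attention mechanism).

```python
import numpy as np
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def variational_circuit(inputs, weights):
    """Toy 4-qubit variational block: angle-encode 4 classical features,
    entangle the qubits, apply trainable rotations, and read out Pauli-Z
    expectation values that a classical head can consume."""
    for i in range(n_qubits):
        qml.RY(inputs[i], wires=i)            # quantum state encoding
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])            # entangling layer
    for i in range(n_qubits):
        qml.RZ(weights[0, i], wires=i)        # trainable rotations
        qml.RY(weights[1, i], wires=i)
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

# usage sketch: 4 pooled CNN features in, 4 quantum expectation values out
features = np.array([0.1, 0.7, -0.3, 0.5])
params = np.zeros((2, n_qubits))
print(variational_circuit(features, params))
```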


Microlocal analysis of non-linear operators arising in Compton CT

Authors:James W. Webber, Sean Holman

We present a novel microlocal analysis of a non-linear ray transform, $\mathcal{R}$, arising in Compton Scattering Tomography (CST). Due to attenuation effects in CST, the integral weights depend on the reconstruction target, $f$, which has singularities. Thus, standard linear Fourier Integral Operator (FIO) theory does not apply as the weights are non-smooth. The V-line (or broken ray) transform, $\mathcal{V}$, can be used to model the attenuation of incoming and outgoing rays. Through novel analysis of $\mathcal{V}$, we characterize the location and strength of the singularities of the ray transform weights. In conjunction, we provide new results which quantify the strength of the singularities of distributional products based on the Sobolev order of the individual components. By combining this new theory, our analysis of $\mathcal{V}$, and classical linear FIO theory, we determine the Sobolev order of the singularities of $\mathcal{R}f$. The strongest (lowest Sobolev order) singularities of $\mathcal{R}f$ are shown to correspond to the wavefront set elements of the classical Radon transform applied to $f$, and we use this idea and known results on the Radon transform to prove injectivity results for $\mathcal{R}$. In addition, we present novel reconstruction methods based on our theory, and we validate our results using simulated image reconstructions.


Paper and Project Links

PDF 27 pages, 8 figures

Summary

This paper presents a novel microlocal analysis of a non-linear ray transform $\mathcal{R}$ arising in Compton Scattering Tomography (CST). Because of attenuation effects in CST, the integral weights depend on the reconstruction target $f$, which has singularities, so standard linear Fourier Integral Operator (FIO) theory does not apply since the weights are non-smooth. A novel analysis of the V-line (broken-ray) transform $\mathcal{V}$ characterizes the location and strength of the singularities of the ray-transform weights. Combined with new results quantifying the strength of singularities of distributional products in terms of the Sobolev order of their factors, and with classical linear FIO theory, this determines the Sobolev order of the singularities of $\mathcal{R}f$. The strongest (lowest Sobolev order) singularities of $\mathcal{R}f$ correspond to the wavefront set of the classical Radon transform applied to $f$, and this observation, together with known results on the Radon transform, yields injectivity results for $\mathcal{R}$. The paper also proposes reconstruction methods based on this theory and validates them with simulated image reconstructions.

Key Takeaways

  1. A microlocal analysis is presented of the non-linear ray transform $\mathcal{R}$ arising in Compton Scattering Tomography (CST).
  2. Because of attenuation effects, the integral weights in CST are singular, so standard linear Fourier Integral Operator theory does not apply.
  3. Analysis of the V-line (broken-ray) transform $\mathcal{V}$ characterizes the location and strength of the singularities of the ray-transform weights.
  4. New results quantify the strength of singularities of distributional products based on the Sobolev order of the individual factors.
  5. The Sobolev order of the singularities of $\mathcal{R}f$ is determined.
  6. The strongest singularities of $\mathcal{R}f$ correspond to the wavefront set elements of the classical Radon transform applied to $f$.
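
For reference, the classical Radon transform appearing in takeaway 6 is the standard line-integral transform (a textbook definition, written here as $R_{\mathrm{cl}}$ to avoid confusion with the paper's non-linear $\mathcal{R}$); its wavefront set is the benchmark against which the strongest singularities of $\mathcal{R}f$ are compared:

```latex
% Classical 2D Radon transform: integrate f over the line x . theta = s
R_{\mathrm{cl}}f(s,\theta) = \int_{\{x \in \mathbb{R}^{2} \,:\, x \cdot \theta = s\}} f(x)\,\mathrm{d}x,
\qquad s \in \mathbb{R},\ \theta \in S^{1}
```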


DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut

Authors:Paul Couairon, Mustafa Shukor, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome

Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. While prior works have addressed unsupervised image segmentation, they significantly lag behind supervised models. In this paper, we use a diffusion UNet encoder as a foundation vision encoder and introduce DiffCut, an unsupervised zero-shot segmentation method that solely harnesses the output features from the final self-attention block. Through extensive experimentation, we demonstrate that the utilization of these diffusion features in a graph based segmentation algorithm, significantly outperforms previous state-of-the-art methods on zero-shot segmentation. Specifically, we leverage a recursive Normalized Cut algorithm that softly regulates the granularity of detected objects and produces well-defined segmentation maps that precisely capture intricate image details. Our work highlights the remarkably accurate semantic knowledge embedded within diffusion UNet encoders that could then serve as foundation vision encoders for downstream tasks. Project page at https://diffcut-segmentation.github.io


Paper and Project Links

PDF NeurIPS 2024. Project page at https://diffcut-segmentation.github.io. Code at https://github.com/PaulCouairon/DiffCut

Summary

DiffCut is an unsupervised zero-shot segmentation method that uses a diffusion UNet encoder as a foundation vision encoder and relies solely on the output features of its final self-attention block. Feeding these diffusion features into a graph-based segmentation algorithm with a recursive Normalized Cut, which softly regulates the granularity of detected objects, produces well-defined segmentation maps that capture fine image details and significantly outperforms previous zero-shot methods. The work highlights the remarkably accurate semantic knowledge embedded in diffusion UNet encoders, which can serve as foundation vision encoders for downstream tasks.

Key Takeaways

  1. A diffusion UNet encoder serves as a powerful foundation vision encoder for image segmentation.
  2. DiffCut is a new unsupervised zero-shot segmentation method that exploits diffusion features for segmentation.
  3. DiffCut produces precise segmentations via a graph-based algorithm with a recursive Normalized Cut.
  4. DiffCut significantly outperforms existing state-of-the-art zero-shot segmentation methods.
  5. The semantic knowledge embedded in diffusion UNet encoders is crucial for segmentation tasks.
  6. The method captures fine image details and produces well-defined segmentation maps.
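
The recursive Normalized Cut at the heart of DiffCut builds on the classical Shi-Malik spectral formulation. The sketch below shows a single bipartition step on patch features in NumPy; the cosine affinity, temperature, and median threshold are assumptions for illustration, not the DiffCut code, which additionally recurses and controls segment granularity.

```python
import numpy as np

def normalized_cut_bipartition(features: np.ndarray, tau: float = 0.1):
    """Single level of a Normalized Cut on patch features (generic sketch).
    features: (num_patches, dim) array, e.g. final self-attention features."""
    # cosine-similarity affinity matrix
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    W = np.exp((f @ f.T - 1.0) / tau)                       # affinities in (0, 1]
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt    # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    fiedler = eigvecs[:, 1]                                  # second-smallest eigenvector
    return fiedler > np.median(fiedler)                      # boolean mask = one side of the cut

# usage sketch: recursively re-applying this to each side (until a stopping
# criterion) yields the hierarchy of segments that a recursive Normalized Cut builds
labels = normalized_cut_bipartition(np.random.rand(196, 64))
```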



Author: Kedreamix
Copyright: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!