
医学图像


⚠️ 以下所有内容总结均由大语言模型生成,可能存在错误,仅供参考,请谨慎使用
🔴 请注意:千万不要用于严肃的学术场景,只能用于论文阅读前的初筛!
💗 如果您觉得我们的项目 ChatPaperFree 对您有帮助,还请您给我们一些鼓励!⭐️ HuggingFace 免费体验

2025-10-03 更新

A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI

Authors:Arvind Murari Vepa, Yannan Yu, Jingru Gan, Anthony Cuturrufo, Weikai Li, Wei Wang, Fabien Scalzo, Yizhou Sun

We introduce mpLLM, a prompt-conditioned hierarchical mixture-of-experts (MoE) architecture for visual question answering over multi-parametric 3D brain MRI (mpMRI). mpLLM routes across modality-level and token-level projection experts to fuse multiple interrelated 3D modalities, enabling efficient training without image-report pretraining. To address limited image-text paired supervision, mpLLM integrates a synthetic visual question answering (VQA) protocol that generates medically relevant VQA from segmentation annotations, and we collaborate with medical experts for clinical validation. mpLLM outperforms strong medical VLM baselines by 5.3% on average across multiple mpMRI datasets. Our study features three main contributions: (1) the first clinically validated VQA dataset for 3D brain mpMRI, (2) a novel multimodal LLM that handles multiple interrelated 3D modalities, and (3) strong empirical results that demonstrate the medical utility of our methodology. Ablations highlight the importance of modality-level and token-level experts and prompt-conditioned routing.

我们介绍了mpLLM,这是一种面向多参数3D脑MRI(mpMRI)视觉问答的提示条件分层专家混合(MoE)架构。mpLLM在模态级和令牌级投影专家之间进行路由,以融合多个相互关联的3D模态,从而无需图像-报告预训练即可高效训练。为了解决图像-文本配对监督有限的问题,mpLLM集成了一个合成视觉问答(VQA)协议,从分割标注生成与医学相关的VQA,并与医学专家合作进行临床验证。mpLLM在多个mpMRI数据集上的平均表现比强大的医学VLM基线高出5.3%。我们的研究有三个主要贡献:(1)首个经过临床验证的3D脑mpMRI VQA数据集;(2)一种能处理多个相互关联3D模态的新型多模态LLM;(3)证明我们方法具有医学实用性的有力实证结果。消融研究突出了模态级与令牌级专家以及提示条件路由的重要性。
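
为便于理解“提示条件路由 + 模态级/令牌级投影专家”这一设计,下面给出一个极简的 PyTorch 示意草图;其中的特征维度、专家数量与路由方式均为假设,并非论文的官方实现,仅用于说明大致形态。

```python
# 示意性草图:提示条件的模态级 / 令牌级投影专家路由(维度与专家数均为假设)
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptConditionedMoE(nn.Module):
    def __init__(self, dim=256, num_modalities=4, num_token_experts=4):
        super().__init__()
        # 模态级投影专家:每个 3D 模态(如 T1、T2、FLAIR 等)一个投影
        self.modality_experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_modalities))
        # 令牌级投影专家:对融合后的视觉令牌做更细粒度的投影
        self.token_experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_token_experts))
        # 路由器:由问题提示的文本嵌入产生专家权重
        self.modality_router = nn.Linear(dim, num_modalities)
        self.token_router = nn.Linear(dim, num_token_experts)

    def forward(self, modality_tokens, prompt_emb):
        # modality_tokens: [B, M, N, D](M 个模态、每模态 N 个令牌);prompt_emb: [B, D]
        w_mod = F.softmax(self.modality_router(prompt_emb), dim=-1)   # [B, M]
        w_tok = F.softmax(self.token_router(prompt_emb), dim=-1)      # [B, E]
        fused = sum(w_mod[:, m, None, None] * expert(modality_tokens[:, m])
                    for m, expert in enumerate(self.modality_experts))   # [B, N, D]
        out = sum(w_tok[:, e, None, None] * expert(fused)
                  for e, expert in enumerate(self.token_experts))         # [B, N, D]
        return out  # 作为视觉前缀令牌送入 LLM

moe = PromptConditionedMoE()
tokens = torch.randn(2, 4, 128, 256)   # batch=2、4 个模态、128 个令牌、256 维
prompt = torch.randn(2, 256)
print(moe(tokens, prompt).shape)        # torch.Size([2, 128, 256])
```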

论文及项目相关链接

PDF 23 pages, 3 figures

Summary
本文介绍了mpLLM,一种用于多参数三维脑MRI(mpMRI)视觉问答的提示条件分层混合专家(MoE)架构。mpLLM通过模态级和令牌级投影专家之间的路由融合多个相关的三维模态,能够在没有图像-报告预训练的情况下高效训练。针对图像-文本配对监督有限的问题,mpLLM集成了一种合成视觉问答(VQA)协议,可从分割标注生成医学相关的VQA,并与医学专家合作进行临床验证。mpLLM在多个mpMRI数据集上的平均表现比强大的医学VLM基线模型高出5.3%。本研究的主要贡献包括:(1)首个经过临床验证的三维脑mpMRI视觉问答数据集,(2)一种处理多个相关三维模态的新型多模态大型语言模型,(3)强有力的实证结果证明了我们的方法具有医学应用价值。消融实验突出了模态级和令牌级专家以及提示条件路由的重要性。

Key Takeaways

  • mpLLM是一个用于多参数三维脑MRI的视觉问答的分层混合专家架构。
  • 它通过跨模态和令牌级别的投影专家路由融合多个相关三维模态。
  • mpLLM在没有图像报告预训练的情况下进行高效训练。
  • 集成合成视觉问答(VQA)协议以生成医学相关的VQA并进行临床验证。
  • mpLLM在多个数据集上的表现优于现有医学VLM模型。
  • 研究的主要贡献包括临床验证的VQA数据集、处理多模态的大型语言模型和实证结果证明其医学价值。

Cool Papers

点此查看论文截图

Dolphin v1.0 Technical Report

Authors:Taohan Weng, Chi zhang, Chaoran Yan, Siya Liu, Xiaoyang Liu, Yalun Wu, Boyang Wang, Boyan Wang, Jiren Ren, Kaiwen Yan, Jinze Yu, Kaibing Hu, Henan Liu, Haoyun Zheng, Zhenyu Liu, Duo Zhang, Xiaoqing Guo, Anjie Le, Hongcheng Guo

Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound’s complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, over twice the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.

超声在现代医学中至关重要,但面临操作者依赖、图像噪声和实时扫描等挑战,阻碍了人工智能的整合。虽然大型多模态模型在其他医学成像领域表现出色,但在应对超声的复杂性方面却遇到了困难。为了解决这一问题,我们推出了Dolphin v1.0(V1)及其推理增强版Dolphin R1,这是首个大规模多模态超声基础模型,在单一视觉语言框架下统一了多种临床任务。为了应对超声的变异性和噪声问题,我们构建了一个200万规模的多模态数据集,结合了教科书知识、公开数据、合成样本和通用语料库,从而确保了稳健的感知能力、泛化能力和临床适应性。Dolphin系列采用三阶段训练策略:领域专业化预训练、指令驱动对齐和基于强化学习的精炼。Dolphin v1.0在分类、检测、回归和报告生成方面表现出可靠的性能。Dolphin R1通过结合超声特定奖励的强化学习,增强了诊断推理、推理透明度和可解释性。在U2-Bench的八个超声任务上,Dolphin R1取得了0.5835的U2分数,是第二名模型(0.2968)的两倍多,创造了新的最佳水平。Dolphin v1.0也表现出竞争力,验证了统一框架的有效性。对比结果显示,推理增强训练显著提高了诊断的准确性、一致性和可解释性,突显其在高风险医疗人工智能中的重要性。

论文及项目相关链接

PDF

Summary

本文介绍了超声在现代医学中的重要性及其面临的挑战,如操作者依赖、图像噪声和实时扫描等。为了解决这些问题,文章提出了Dolphin v1.0及其推理增强版Dolphin R1,这是首个大规模多模态超声基础模型,在一个统一的视觉语言框架内融合了不同的临床任务。通过采用三阶段训练策略和大规模多模态数据集,Dolphin系列模型在分类、检测、回归和报告生成等方面表现出可靠性能;Dolphin R1在U2-Bench上的表现尤为突出,取得了业界领先的结果。

Key Takeaways

  1. 超声波在现代医学中至关重要,但面临操作依赖性、图像噪声和实时扫描等挑战。
  2. 大型多模态模型在其他医学成像领域表现出色,但难以应对超声的复杂性。
  3. Dolphin v1.0及其推理增强版Dolphin R1是首个大规模多模态超声基础模型,统一了不同的临床任务。
  4. 通过采用三阶段训练策略和200万规模的多模态数据集,Dolphin系列模型展现出优秀性能。
  5. Dolphin R1在U2-Bench上的表现突出(U2分数0.5835,超过第二名的两倍),实现了业界领先的结果。
  6. 推理增强训练显著提高了诊断的准确性、一致性和可解释性。

Cool Papers

点此查看论文截图

U-MAN: U-Net with Multi-scale Adaptive KAN Network for Medical Image Segmentation

Authors:Bohan Huang, Qianyun Bao, Haoyuan Ma

Medical image segmentation faces significant challenges in preserving fine-grained details and precise boundaries due to complex anatomical structures and pathological regions. These challenges primarily stem from two key limitations of conventional U-Net architectures: (1) their simple skip connections ignore the encoder-decoder semantic gap between various features, and (2) they lack the capability for multi-scale feature extraction in deep layers. To address these challenges, we propose the U-Net with Multi-scale Adaptive KAN (U-MAN), a novel architecture that enhances the emerging Kolmogorov-Arnold Network (KAN) with two specialized modules: Progressive Attention-Guided Feature Fusion (PAGF) and the Multi-scale Adaptive KAN (MAN). Our PAGF module replaces the simple skip connection, using attention to fuse features from the encoder and decoder. The MAN module enables the network to adaptively process features at multiple scales, improving its ability to segment objects of various sizes. Experiments on three public datasets (BUSI, GLAS, and CVC) show that U-MAN outperforms state-of-the-art methods, particularly in defining accurate boundaries and preserving fine details.

医学图像分割由于复杂的解剖结构和病理区域,在保留细粒度细节和精确边界方面面临重大挑战。这些挑战主要源于传统U-Net架构的两个关键局限:(1)其简单的跳跃连接忽略了编码器与解码器各类特征之间的语义差距;(2)其深层缺乏多尺度特征提取能力。为了解决这些挑战,我们提出了带有多尺度自适应KAN的U-Net(U-MAN),这是一种新型架构,通过两个专用模块增强了新兴的Kolmogorov-Arnold网络(KAN):渐进注意力引导特征融合(PAGF)和多尺度自适应KAN(MAN)。我们的PAGF模块使用注意力融合编码器和解码器的特征,以替代简单的跳跃连接。MAN模块使网络能够自适应地处理多尺度特征,提高了其分割各种尺寸目标的能力。在BUSI、GLAS和CVC三个公共数据集上的实验表明,U-MAN优于最先进的方法,特别是在刻画精确边界和保留细节方面。
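
摘要中的 PAGF 用注意力来融合编码器与解码器特征,以替代简单的跳跃连接。下面是一个注意力门控跳跃连接的简化 PyTorch 草图(与 Attention U-Net 的门控思路类似);PAGF 的具体结构摘要未给出,此处的通道数与门控形式均为假设。

```python
# 示意性草图:注意力门控的跳跃连接融合(并非 PAGF 的官方实现)
import torch
import torch.nn as nn

class AttentionGatedSkip(nn.Module):
    def __init__(self, enc_ch, dec_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(enc_ch, inter_ch, kernel_size=1)   # 编码器特征变换
        self.phi = nn.Conv2d(dec_ch, inter_ch, kernel_size=1)     # 解码器特征作为门控信号
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, enc_feat, dec_feat):
        # enc_feat: [B, Ce, H, W], dec_feat: [B, Cd, H, W](假设已上采样到同一分辨率)
        attn = torch.sigmoid(self.psi(self.act(self.theta(enc_feat) + self.phi(dec_feat))))
        gated_enc = enc_feat * attn                      # 抑制与解码器语义不一致的编码器区域
        return torch.cat([gated_enc, dec_feat], dim=1)   # 融合后送入解码器下一层

fusion = AttentionGatedSkip(64, 32, 16)
print(fusion(torch.randn(1, 64, 56, 56), torch.randn(1, 32, 56, 56)).shape)  # [1, 96, 56, 56]
```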

论文及项目相关链接

PDF

Summary

医学图像分割在保留精细粒度和精确边界方面面临挑战,源于传统U-Net架构的两个主要局限:一是简单的跳跃连接忽略了编码器与解码器之间的语义差距,二是缺乏深层多尺度特征提取能力。为解决这些问题,我们提出了带有多尺度自适应KAN(U-MAN)的新型U-Net架构,通过两个专门模块——渐进式注意力引导特征融合(PAGF)和多尺度自适应KAN(MAN)来增强Kolmogorov-Arnold网络(KAN)。PAGF模块用注意力融合编码器和解码器的特征,替代了简单的跳跃连接。MAN模块使网络能够自适应地处理多尺度特征,提高了分割各种尺寸物体的能力。在三个公共数据集上的实验表明,U-MAN优于最新方法,尤其在定义精确边界和保留细节方面。

Key Takeaways

  1. 医学图像分割面临保留精细粒度和精确边界的挑战。
  2. 传统U-Net架构存在两个主要局限:忽略编码器和解码器之间的语义差距,以及缺乏深层多尺度特征提取能力。
  3. 为解决这些挑战,提出了新型U-Net架构U-MAN(带有多尺度自适应KAN的U-Net)。
  4. U-MAN包括两个专门模块:PAGF和MAN。
  5. PAGF模块用注意力融合特征,替代了简单的跳跃连接。
  6. MAN模块使网络能够自适应处理多尺度特征,提高分割各种尺寸物体的能力。

Cool Papers

点此查看论文截图

Diffusion Bridge Variational Inference for Deep Gaussian Processes

Authors:Jian Xu, Qibin Zhao, John Paisley, Delu Zeng

Deep Gaussian processes (DGPs) enable expressive hierarchical Bayesian modeling but pose substantial challenges for posterior inference, especially over inducing variables. Denoising diffusion variational inference (DDVI) addresses this by modeling the posterior as a time-reversed diffusion from a simple Gaussian prior. However, DDVI’s fixed unconditional starting distribution remains far from the complex true posterior, resulting in inefficient inference trajectories and slow convergence. In this work, we propose Diffusion Bridge Variational Inference (DBVI), a principled extension of DDVI that initiates the reverse diffusion from a learnable, data-dependent initial distribution. This initialization is parameterized via an amortized neural network and progressively adapted using gradients from the ELBO objective, reducing the posterior gap and improving sample efficiency. To enable scalable amortization, we design the network to operate on the inducing inputs, which serve as structured, low-dimensional summaries of the dataset and naturally align with the inducing variables’ shape. DBVI retains the mathematical elegance of DDVI, including Girsanov-based ELBOs and reverse-time SDEs,while reinterpreting the prior via a Doob-bridged diffusion process. We derive a tractable training objective under this formulation and implement DBVI for scalable inference in large-scale DGPs. Across regression, classification, and image reconstruction tasks, DBVI consistently outperforms DDVI and other variational baselines in predictive accuracy, convergence speed, and posterior quality.

深度高斯过程(DGPs)能够实现表达性层次化的贝叶斯建模,但给后验推断带来了实质性的挑战,特别是在诱导变量方面。降噪扩散变分推断(DDVI)通过模拟后验作为从简单高斯先验的时间反转扩散来解决这一问题。然而,DDVI的固定无条件起始分布与复杂的真实后验相差甚远,导致推理轨迹效率低下,收敛缓慢。

在这项工作中,我们提出了扩散桥变分推断(DBVI),这是DDVI的一种原则性扩展,它从可学习的、数据依赖的初始分布开始反向扩散。该初始分布通过一个摊销(amortized)神经网络参数化,并利用ELBO目标的梯度逐步调整,从而缩小与后验的差距并提高样本效率。为了实现可扩展的摊销,我们将该网络设计为作用于诱导输入,这些输入是数据集的结构化低维摘要,并且自然地与诱导变量的形状对齐。DBVI保留了DDVI的数学优雅性,包括基于Girsanov的ELBO和反向时间SDE,同时通过Doob桥扩散过程重新解释先验。我们在此表述下推导了一个易于处理的训练目标,并实现了DBVI以用于大规模DGP的可扩展推断。在回归、分类和图像重建任务中,DBVI在预测精度、收敛速度和后验质量方面始终优于DDVI和其他变分基线。

论文及项目相关链接

PDF

Summary
扩散桥变分推理(DBVI)是去噪扩散变分推理(DDVI)在深度高斯过程(DGPs)后验推断中的扩展,它从可学习的、数据依赖的初始分布开始反向扩散,从而提高了推理轨迹的效率和收敛速度。DBVI采用摊销神经网络进行参数化,并使用ELBO目标的梯度逐步调整,缩小了后验差距并提高了样本效率。该网络作用于诱导输入,这些输入是数据集的结构化低维摘要,自然地与诱导变量的形状对齐。DBVI在回归、分类和图像重建任务中表现出色,在预测准确性、收敛速度和后验质量方面均优于DDVI和其他变分基线方法。

Key Takeaways

  1. Deep Gaussian Processes (DGPs) 提供了表达丰富的层次化贝叶斯建模,但后验推理存在挑战。
  2. Denoising Diffusion Variational Inference (DDVI) 通过将后验建模为从简单高斯先验开始的逆向扩散过程来解决这个问题。
  3. DDVI的无条件起始分布固定,与复杂的真实后验相差较远,导致推理轨迹不够高效且收敛缓慢。
  4. Diffusion Bridge Variational Inference (DBVI) 是DDVI的扩展,它通过从可学习的、数据依赖的初始分布开始反向扩散,缩小了后验差距并提高了样本效率。
  5. DBVI使用摊销神经网络进行参数化,并通过ELBO目标的梯度进行逐步适应。
  6. DBVI对诱导输入进行操作,利用结构化、低维的数据摘要,提高了方法的可扩展性。

Cool Papers

点此查看论文截图

Physics-Guided Null-Space Diffusion with Sparse Masking for Corrective Sparse-View CT Reconstruction

Authors:Zekun Zhou, Yanru Gong, Liu Shi, Qiegen Liu

Diffusion models have demonstrated remarkable generative capabilities in image processing tasks. We propose a Sparse condition Temporal Reweighted Integrated Distribution Estimation guided diffusion model (STRIDE) for sparse-view CT reconstruction. Specifically, we design a joint training mechanism guided by sparse conditional probabilities to facilitate the model's effective learning of missing projection view completion and global information modeling. Based on systematic theoretical analysis, we propose a temporally varying sparse condition reweighting guidance strategy that dynamically adjusts weights during the progressive denoising process from pure noise to the real image, enabling the model to progressively perceive sparse-view information. Linear regression is employed to correct distributional shifts between known and generated data, mitigating inconsistencies arising during the guidance process. Furthermore, we construct a dual-network parallel architecture to perform global correction and optimization across multiple sub-frequency components, thereby effectively improving the model's capability in both detail restoration and structural preservation, ultimately achieving high-quality image reconstruction. Experimental results on both public and real datasets demonstrate that the proposed method achieves the best improvement of 2.58 dB in PSNR, an increase of 2.37% in SSIM, and a reduction of 0.236 in MSE compared to the best-performing baseline methods. The reconstructed images exhibit excellent generalization and robustness in terms of structural consistency, detail restoration, and artifact suppression.

扩散模型在图像处理任务中表现出了显著的生成能力。我们提出了一种用于稀疏视图CT重建的稀疏条件时间加权积分分布估计引导扩散模型(STRIDE)。具体来说,我们设计了一种由稀疏条件概率引导的联合训练机制,以促进模型有效地学习缺失投影视图的补全和全局信息建模。基于系统的理论分析,我们提出了一种时间变化的稀疏条件重加权引导策略,在从纯噪声到真实图像的渐进去噪过程中动态调整权重,使模型能够逐步感知稀疏视图信息。采用线性回归来校正已知数据和生成数据之间的分布偏移,减轻指导过程中产生的不一致性。此外,我们构建了一个双网络并行架构,以在多个子频率分量上执行全局校正和优化,从而有效地提高了模型在细节恢复和结构保持方面的能力,最终实现了高质量图像重建。在公共和真实数据集上的实验结果表明,与表现最佳的基线方法相比,该方法在PSNR上实现了2.58 dB的最佳提升,SSIM增加了2.37%,MSE减少了0.236。重建的图像在结构一致性、细节恢复和伪影抑制方面具有良好的通用性和鲁棒性。
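
摘要中有两处可以用极简代码说明:随去噪时间步变化的稀疏条件重加权,以及用线性回归校正已知数据与生成数据之间的分布偏移。下面的 NumPy 草图中,权重调度取线性形式、回归为一元最小二乘,均为假设,并非论文的实际公式。

```python
# 示意性草图:时间变化的条件重加权 + 线性回归分布校正(调度形式与参数均为假设)
import numpy as np

def condition_weight(t, T):
    """去噪从 t=T(纯噪声)走到 t=0(图像):早期弱、后期强地注入稀疏视角条件(假设线性调度)。"""
    return 1.0 - t / T

def linear_shift_correction(known, generated):
    """一元最小二乘拟合 known ≈ a*generated + b,把生成数据对齐到已知数据的分布。"""
    a, b = np.polyfit(generated, known, deg=1)
    return a * generated + b

T = 1000
for t in (1000, 500, 10):
    print(f"t={t:4d}  条件权重 w(t)={condition_weight(t, T):.3f}")

known = np.array([0.1, 0.4, 0.5, 0.9])
generated = 2.0 * known + 0.3                      # 人为构造的整体偏移与缩放
print(linear_shift_correction(known, generated))   # 校正后 ≈ [0.1, 0.4, 0.5, 0.9]
```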

论文及项目相关链接

PDF

Summary

本文提出一种基于稀疏条件时间加权积分分布估计的扩散模型(STRIDE),用于稀疏视图CT重建。通过联合训练机制和稀疏条件概率指导,模型能有效学习缺失投影视图的补全和全局信息建模。采用动态调整权重的时间变化稀疏条件重加权指导策略,在从纯噪声到真实图像的逐步去噪过程中逐步感知稀疏视图信息。通过线性回归校正已知和生成数据之间的分布偏移,减轻指导过程中的不一致性。此外,构建双网络并行架构,实现多子频率组件的全局校正和优化,提高模型在细节恢复和结构保持方面的能力,最终实现高质量图像重建。

Key Takeaways

  1. 提出的STRIDE模型利用扩散模型在图像处理任务中的生成能力,特别适用于稀疏视图CT重建。
  2. 通过联合训练机制和稀疏条件概率指导,模型能有效学习缺失投影视图的补全和全局信息建模。
  3. 采用时间变化的稀疏条件重加权策略,在去噪过程中动态调整权重,逐步感知稀疏视图信息。
  4. 通过线性回归校正分布偏移,减轻指导过程中的不一致性。
  5. 构建双网络并行架构,实现多子频率组件的全局校正和优化。
  6. 实验结果展示,与最佳基线方法相比,所提方法在PSNR上提高了2.58dB,在SSIM上增加了2.37%,在MSE上减少了0.236。

Cool Papers

点此查看论文截图

Imagining Alternatives: Towards High-Resolution 3D Counterfactual Medical Image Generation via Language Guidance

Authors:Mohamed Mohamed, Brennan Nichyporuk, Douglas L. Arnold, Tal Arbel

Vision-language models have demonstrated impressive capabilities in generating 2D images under various conditions; however, the success of these models is largely enabled by extensive, readily available pretrained foundation models. Critically, comparable pretrained models do not exist for 3D, significantly limiting progress. As a result, the potential of vision-language models to produce high-resolution 3D counterfactual medical images conditioned solely on natural language remains unexplored. Addressing this gap would enable powerful clinical and research applications, such as personalized counterfactual explanations, simulation of disease progression, and enhanced medical training by visualizing hypothetical conditions in realistic detail. Our work takes a step toward this challenge by introducing a framework capable of generating high-resolution 3D counterfactual medical images of synthesized patients guided by free-form language prompts. We adapt state-of-the-art 3D diffusion models with enhancements from Simple Diffusion and incorporate augmented conditioning to improve text alignment and image quality. To our knowledge, this is the first demonstration of a language-guided native-3D diffusion model applied to neurological imaging, where faithful three-dimensional modeling is essential. On two neurological MRI datasets, our framework simulates varying counterfactual lesion loads in Multiple Sclerosis and cognitive states in Alzheimer’s disease, generating high-quality images while preserving subject fidelity. Our results lay the groundwork for prompt-driven disease progression analysis in 3D medical imaging. Project link - https://lesupermomo.github.io/imagining-alternatives/.

视觉语言模型在各种条件下生成2D图像的能力令人印象深刻;然而,这些模型的成功在很大程度上得益于大量现成的预训练基础模型。关键问题在于,3D领域缺乏可比的预训练模型,这极大地限制了进展。因此,视觉语言模型仅以自然语言为条件生成高分辨率3D反事实医学图像的潜力尚未被探索。弥补这一空白将带来强大的临床和研究应用,如个性化的反事实解释、疾病进展模拟,以及通过以逼真的细节可视化假设情形来增强医学培训。我们的工作朝着这一挑战迈出了一步:我们提出了一个框架,能够在自由形式语言提示的引导下,生成合成患者的高分辨率3D反事实医学图像。我们改进了最先进的3D扩散模型,融入了Simple Diffusion的增强技术,并引入增强的条件机制,以提高文本对齐程度和图像质量。据我们所知,这是语言引导的原生3D扩散模型首次应用于神经影像,而忠实的三维建模在该领域至关重要。在两个神经MRI数据集上,我们的框架模拟了多发性硬化症中不同的反事实病灶负荷以及阿尔茨海默病中的认知状态,在生成高质量图像的同时保持了受试者保真度。我们的研究为3D医学影像中由提示驱动的疾病进展分析奠定了基础。项目链接:https://lesupermomo.github.io/imagining-alternatives/

论文及项目相关链接

PDF Accepted to the 2025 MICCAI ELAMI Workshop

Summary

本文指出,视觉语言模型在生成二维图像方面表现出色,但由于缺乏相应的三维预训练基础模型,其在三维图像生成领域的应用仍受限,生成高分辨率三维反事实医学图像的潜力尚未被探索。本文引入了一个框架,能够通过自由形式的文字提示生成合成患者的高分辨率三维反事实医学图像,为临床和研究应用提供了可能,如个性化反事实解释、疾病进展模拟和增强医学培训等。该框架首次将语言引导的原生三维扩散模型应用于神经成像,并在两个神经MRI数据集上进行了验证。

Key Takeaways

  1. 视觉语言模型在生成二维图像方面表现出色,但在三维图像生成上受预训练模型缺乏的限制。
  2. 缺乏相应的三维预训练模型限制了视觉语言模型生成高分辨率三维反事实医学图像的能力。
  3. 本文引入了一个框架,能够通过自由形式的文字提示生成高分辨率的三维反事实医学图像。
  4. 该框架为临床和研究应用提供了可能,如个性化反事实解释、疾病进展模拟和增强医疗训练。
  5. 该框架首次将语言引导的三维扩散模型应用于神经成像。
  6. 框架在两个神经MRI数据集上进行了模拟测试,生成了高质量的图像,并保持了受试者保真度。

Cool Papers

点此查看论文截图

Can General-Purpose Omnimodels Compete with Specialists? A Case Study in Medical Image Segmentation

Authors:Yizhe Zhang, Qiang Chen, Tao Zhou

The emergence of powerful, general-purpose omnimodels capable of processing diverse data modalities has raised a critical question: can these "jack-of-all-trades" systems perform on par with highly specialized models in knowledge-intensive domains? This work investigates this question within the high-stakes field of medical image segmentation. We conduct a comparative study analyzing the zero-shot performance of a state-of-the-art omnimodel (Gemini, the "Nano Banana" model) against domain-specific deep learning models on three distinct tasks: polyp (endoscopy), retinal vessel (fundus), and breast tumor segmentation (ultrasound). Our study focuses on performance at the extremes by curating subsets of the "easiest" and "hardest" cases based on the specialist models' accuracy. Our findings reveal a nuanced and task-dependent landscape. For polyp and breast tumor segmentation, specialist models excel on easy samples, but the omnimodel demonstrates greater robustness on hard samples where specialists fail catastrophically. Conversely, for the fine-grained task of retinal vessel segmentation, the specialist model maintains superior performance across both easy and hard cases. Intriguingly, qualitative analysis suggests omnimodels may possess higher sensitivity, identifying subtle anatomical features missed by human annotators. Our results indicate that while current omnimodels are not yet a universal replacement for specialists, their unique strengths suggest a potential complementary role with specialist models, particularly in enhancing robustness on challenging edge cases.

能够处理多种数据模态的强大通用全能模型(omnimodel)的出现提出了一个关键问题:这些“样样通”的系统能否在知识密集型领域与高度专业化的模型表现相当?本研究在医学图像分割这一高风险领域探讨了这一问题。我们进行了对比研究,分析了最先进的全能模型Gemini(“Nano Banana”模型)在三项不同任务上的零样本性能,并与特定领域的深度学习模型进行比较:息肉分割(内窥镜)、视网膜血管分割(眼底)和乳腺肿瘤分割(超声)。我们的研究侧重于极端情况下的表现,即根据专科模型的准确率筛选出“最容易”和“最困难”的案例子集。研究结果呈现出微妙且依赖于具体任务的图景。在息肉和乳腺肿瘤分割方面,专科模型在简单样本上表现出色,但在专科模型灾难性失败的困难样本上,全能模型表现出更强的稳健性。相反,对于精细的视网膜血管分割任务,专科模型在简单和困难案例中都保持了更优的性能。有趣的是,定性分析表明全能模型可能具有更高的敏感性,能够识别出人工标注者遗漏的细微解剖特征。我们的结果表明,虽然当前的全能模型还不能全面取代专科模型,但其独特的优势表明它们可以与专科模型形成互补,特别是在提高对具有挑战性的边缘案例的稳健性方面。
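
研究按专科模型的逐例精度来筛选“最容易”与“最困难”的子集。下面用 Dice 系数给出这一筛选流程的简化草图;数据为随机构造,筛选比例等参数均为假设。

```python
# 示意性草图:按专科模型的逐例 Dice 分数筛选 easy / hard 子集(数据与比例均为假设)
import numpy as np

def dice(pred, gt, eps=1e-7):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def split_easy_hard(preds, gts, frac=0.2):
    """按专科模型的 Dice 排序,取两端 frac 比例作为 easy / hard 子集索引。"""
    scores = np.array([dice(p, g) for p, g in zip(preds, gts)])
    order = np.argsort(scores)                      # 分数从低到高
    k = max(1, int(len(scores) * frac))
    return order[-k:], order[:k], scores

rng = np.random.default_rng(0)
gts = [rng.random((64, 64)) > 0.5 for _ in range(20)]
preds = [g ^ (rng.random((64, 64)) > 1 - 0.05 * i) for i, g in enumerate(gts)]  # 误差逐例增大
easy, hard, _ = split_easy_hard(preds, gts)
print("easy 索引:", easy, " hard 索引:", hard)
```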

论文及项目相关链接

PDF 15 pages, 7 figures

Summary

本文探讨了通用全能模型(omnimodel)在医学图像分割领域的表现。研究对比了最先进的全能模型(Gemini,“Nano Banana”模型)与特定领域的深度学习模型在三种不同任务上的零样本性能,发现其在极端情况下的表现呈现出微妙的任务依赖性。在息肉和乳腺肿瘤分割方面,专科模型在简单样本上表现优秀,而全能模型在困难样本上表现出更大的稳健性;在视网膜血管分割等精细任务上,专科模型在简单和困难案例中均表现出卓越性能。此外,定性分析表明全能模型可能具有更高的敏感性,能够识别出人工标注者遗漏的细微解剖特征。因此,尽管全能模型尚未成为专科模型的全面替代者,但其独特优势表明它们可能与专科模型互补,特别是在提高对挑战性边缘案例的稳健性方面。

Key Takeaways

  1. 全功能omnimodel在医学图像分割领域内的表现被研究并和特定领域的深度学习模型进行了对比。
  2. 在息肉和乳腺肿瘤分割方面,特定任务的模型在简单样本上表现优秀,而omnimodel在困难样本上展现出更大的稳健性。
  3. 在视网膜血管分割等精细任务上,特定任务模型表现出卓越性能,omnimodel相对较弱。
  4. 定性分析表明omnimodel可能具有更高的敏感性,能够识别出人工标注者遗漏的细微解剖特征。
  5. omnimodel尚未成为专业模型的全面替代者,但在某些情况下可以与其互补。
  6. 在处理挑战性边缘案例时,omnimodel的稳健性潜力得到了突显。

Cool Papers

点此查看论文截图

CADDesigner: Conceptual Design of CAD Models Based on General-Purpose Agent

Authors:Jingzhe Ni, Xiaolong Yin, Xingyu Lu, Xintong Li, Ji Wei, Ruofeng Tong, Min Tang, Peng Du

Computer-Aided Design (CAD) plays a pivotal role in industrial manufacturing but typically requires a high level of expertise from designers. To lower the entry barrier and improve design efficiency, we present an agent for CAD conceptual design powered by large language models (LLMs). The agent accepts both abstract textual descriptions and freehand sketches as input, engaging in interactive dialogue with users to refine and clarify design requirements through comprehensive requirement analysis. Built upon a novel Context-Independent Imperative Paradigm (CIP), the agent generates high-quality CAD modeling code. During the generation process, the agent incorporates iterative visual feedback to improve model quality. Generated design cases are stored in a structured knowledge base, enabling continuous improvement of the agent’s code generation capabilities. Experimental results demonstrate that our method achieves state-of-the-art performance in CAD code generation.

计算机辅助设计(CAD)在工业制造中扮演着至关重要的角色,通常需要设计师具备高水平的专业知识。为了降低入门门槛并提高设计效率,我们提出了一种由大型语言模型(LLM)驱动的CAD概念设计代理。该代理接受抽象的文本描述和手绘草图作为输入,通过全面的需求分析与用户进行互动对话,以细化和明确设计要求。代理建立在新型上下文独立指令范式(CIP)之上,可生成高质量的CAD建模代码。在生成过程中,代理会结合迭代视觉反馈来提高模型质量。生成的设计案例存储在结构化知识库中,能够持续改善代理的代码生成能力。实验结果证明,我们的方法在CAD代码生成方面达到了最新技术水平。

论文及项目相关链接

PDF The theoretical proof of Context-Independent Imperative Paradigm is flawed; I request withdrawal of the manuscript

Summary

基于计算机辅助设计(CAD)在工业制造中的重要性,以及设计师所需的高水平专业知识,我们提出了一种由大型语言模型(LLM)驱动的CAD概念设计代理。该代理接受抽象文本描述和手绘草图作为输入,通过全面的需求分析与用户进行交互对话,以澄清和细化设计要求。它采用新颖的Context-Independent Imperative Paradigm(CIP)生成高质量的CAD建模代码。在生成过程中,代理会结合迭代视觉反馈来提高模型质量。生成的案例被存储在结构化知识库中,使代理的代码生成能力得以持续改进。实验结果表明,我们的方法在CAD代码生成方面达到了最新技术水平。

Key Takeaways

  1. 该代理使用大型语言模型(LLM)技术为计算机辅助设计(CAD)提供支持。
  2. 代理接受抽象文本描述和手绘草图作为输入,便于用户操作。
  3. 通过全面的需求分析,代理能与用户进行交互对话以澄清和细化设计要求。
  4. 采用新颖的Context-Independent Imperative Paradigm(CIP)生成高质量的CAD建模代码。
  5. 在生成CAD建模代码的过程中,代理会结合迭代视觉反馈来提高模型质量。
  6. 生成的案例被存储在结构化知识库中,使代理能够持续改进其代码生成能力。

Cool Papers

点此查看论文截图

MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis

Authors:José Morano, Botond Fazekas, Emese Sükei, Ronald Fecso, Taha Emre, Markus Gumpinger, Georg Faustmann, Marzieh Oghbaie, Ursula Schmidt-Erfurth, Hrvoje Bogunović

Artificial intelligence (AI) has become a fundamental tool for assisting clinicians in analyzing ophthalmic images, such as optical coherence tomography (OCT). However, developing AI models often requires extensive annotation, and existing models tend to underperform on independent, unseen data. Foundation models (FMs), large AI models trained on vast unlabeled datasets, have shown promise in overcoming these challenges. Nonetheless, available FMs for ophthalmology lack extensive validation, especially for segmentation tasks, and focus on a single imaging modality. In this context, we propose MIRAGE, a novel multimodal FM for the analysis of OCT and scanning laser ophthalmoscopy (SLO) images. Additionally, we propose a new evaluation benchmark with OCT/SLO classification and segmentation tasks. The comparison with general and specialized FMs and segmentation methods shows the superiority of MIRAGE in both types of tasks, highlighting its suitability as a basis for the development of robust AI systems for retinal OCT image analysis. Both MIRAGE and the evaluation benchmark are publicly available: https://github.com/j-morano/MIRAGE.

人工智能(AI)已经成为协助临床医生分析眼科图像(如光学相干断层扫描,OCT)的重要工具。然而,开发AI模型通常需要大量标注,而现有模型在独立的、未见过的数据上的表现往往不佳。基础模型(FMs)是在大量无标签数据上训练的大型AI模型,在克服这些挑战方面显示出潜力。然而,现有的眼科基础模型缺乏广泛的验证,尤其是在分割任务方面,而且通常只关注单一的成像模态。在此背景下,我们提出了MIRAGE,一种用于分析OCT和扫描激光检眼镜(SLO)图像的新型多模态基础模型。此外,我们还提出了一个包含OCT/SLO分类和分割任务的新评估基准。与通用和专用基础模型以及分割方法的比较表明,MIRAGE在这两类任务中均表现出优越性,证明其适合作为开发稳健的视网膜OCT图像分析AI系统的基础。MIRAGE和评估基准均已公开:https://github.com/j-morano/MIRAGE。

论文及项目相关链接

PDF Accepted for publication in npj Digital Medicine

Summary

AI在眼科图像分析(尤其是光学相干断层扫描,OCT)中的应用日益普及。然而,开发AI模型需要大量标注数据,且现有模型在新数据上表现不佳。在大量无标签数据上训练的基础模型(FMs)有望缓解这些问题,但现有的眼科基础模型验证不足,特别是在分割任务上,且主要关注单一成像模态。为此,研究者提出了MIRAGE这一新型多模态基础模型,用于分析OCT和扫描激光检眼镜(SLO)图像,并建立了一个包含OCT/SLO分类和分割任务的新评估基准。与通用和专用基础模型及分割方法的比较显示,MIRAGE在两类任务中都表现出卓越性能,适合作为开发稳健的视网膜OCT图像分析AI系统的基础。MIRAGE和评估基准均已公开可用。

Key Takeaways

  1. AI已成为眼科图像分析的重要工具,特别是在OCT方面。
  2. 开发AI模型需要大量标注数据,现有模型在新数据上表现不稳定。
  3. 基础模型(FMs)在解决这些问题方面显示出潜力。
  4. 现有的眼科基础模型在验证方面存在不足,特别是在多模式图像分析方面。
  5. MIRAGE是一种新型的多模式基础模型,可用于分析OCT和SLO图像。
  6. MIRAGE在分类和分割任务中都表现出卓越性能。

Cool Papers

点此查看论文截图

Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining

Authors:Daniele Molino, Camillo Maria Caruso, Filippo Ruffini, Paolo Soda, Valerio Guarrasi

Objective: While recent advances in text-conditioned generative models have enabled the synthesis of realistic medical images, progress has been largely confined to 2D modalities such as chest X-rays. Extending text-to-image generation to volumetric CT remains a significant challenge, due to its high dimensionality, anatomical complexity, and the absence of robust frameworks that align vision-language data in 3D medical imaging. Methods: We introduce a novel architecture for Text-to-CT generation that combines a latent diffusion model with a 3D contrastive vision-language pretraining scheme. Our approach leverages a dual-encoder CLIP-style model trained on paired CT volumes and radiology reports to establish a shared embedding space, which serves as the conditioning input for generation. CT volumes are compressed into a low-dimensional latent space via a pretrained volumetric VAE, enabling efficient 3D denoising diffusion without requiring external super-resolution stages. Results: We evaluate our method on the CT-RATE dataset and conduct a comprehensive assessment of image fidelity, clinical relevance, and semantic alignment. Our model achieves competitive performance across all tasks, significantly outperforming prior baselines for text-to-CT generation. Moreover, we demonstrate that CT scans synthesized by our framework can effectively augment real data, improving downstream diagnostic performance. Conclusion: Our results show that modality-specific vision-language alignment is a key component for high-quality 3D medical image generation. By integrating contrastive pretraining and volumetric diffusion, our method offers a scalable and controllable solution for synthesizing clinically meaningful CT volumes from text, paving the way for new applications in data augmentation, medical education, and automated clinical simulation. Code at https://github.com/cosbidev/Text2CT.

目标:尽管近期文本条件生成模型取得了进展,使得合成逼真的医学图像成为可能,但进展主要局限于如胸部X射线等2D模式。将文本到图像的生成扩展到体积CT仍然是一个重大挑战,这是由于其高维度、解剖结构复杂,以及缺乏能够在3D医学图像中对齐视觉语言数据的稳健框架。方法:我们介绍了一种用于文本到CT生成的新型架构,该架构结合了潜在扩散模型与3D对比视觉语言预训练方案。我们的方法利用在配对CT体积和放射学报告上训练的双重编码器CLIP风格模型,建立一个共享嵌入空间,作为生成的条件输入。CT体积通过预训练的体积VAE压缩到低维潜在空间,从而实现高效的3D去噪扩散,而无需外部超分辨率阶段。结果:我们在CT-RATE数据集上评估了我们的方法,并对图像保真度、临床相关性和语义对齐进行了全面评估。我们的模型在所有任务上都表现出竞争力,尤其是在文本到CT生成方面显著超越了先前的基准测试。此外,我们证明了我们框架合成的CT扫描可以有效地增强真实数据,提高下游诊断性能。结论:我们的结果表明,特定模态的视觉语言对齐是高质量3D医学图像生成的关键组成部分。通过集成对比预训练和体积扩散,我们的方法提供了一种可扩展和可控的解决方案,可以根据文本合成具有临床意义的CT体积,为数据增强、医学教育和自动化临床模拟等领域开辟新的应用途径。代码地址:https://github.com/cosbidev/Text2CT。
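
论文明确采用 CLIP 风格的双编码器,在配对的 CT 体数据与放射报告上做对比预训练以建立共享嵌入空间。下面给出这种对称对比损失(InfoNCE)的通用写法草图;编码器在此用随机特征代替,温度等超参数为常见默认值而非论文设置。

```python
# 示意性草图:CLIP 风格的对称对比损失(体数据 / 报告编码器此处用随机特征代替)
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # [B, B] 相似度矩阵,对角线为配对样本
    targets = torch.arange(img_emb.size(0))
    loss_i2t = F.cross_entropy(logits, targets)       # CT 体数据 -> 报告
    loss_t2i = F.cross_entropy(logits.t(), targets)   # 报告 -> CT 体数据
    return (loss_i2t + loss_t2i) / 2

ct_emb = torch.randn(8, 512)       # 假设:体数据编码器的输出
report_emb = torch.randn(8, 512)   # 假设:报告文本编码器的输出
print(clip_style_loss(ct_emb, report_emb).item())
```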

论文及项目相关链接

PDF

Summary

本文介绍了一种将文本转化为CT图像的新方法,结合了潜在扩散模型和三维对比视觉语言预训练方案。该方法使用在CT体积和放射学报告上训练的双重编码器CLIP风格模型,建立共享嵌入空间,作为生成的条件输入。通过预训练的体积VAE将CT体积压缩到低维潜在空间,实现了高效的3D降噪扩散,无需外部超分辨率阶段。该方法在CT-RATE数据集上表现优异,图像保真度、临床相关性和语义对齐均优于先前基线。合成的CT扫描可有效地扩充真实数据,提高下游诊断性能。本研究为高质量的三维医学图像生成提供了一种可扩展和可控的解决方案,具有数据增强、医学教育和自动化临床模拟等潜在应用。

Key Takeaways

  1. 文本转化为CT图像的方法结合了潜在扩散模型和三维对比视觉语言预训练方案,应对高维度和复杂的解剖结构挑战。
  2. 使用双重编码器CLIP风格模型,基于配对的CT体积和放射学报告建立共享嵌入空间,为生成提供条件输入。
  3. 通过预训练的体积VAE压缩CT体积至低维潜在空间,实现高效的3D降噪扩散。
  4. 在CT-RATE数据集上表现优异,图像保真度、临床相关性和语义对齐均优于先前方法。
  5. 合成的CT扫描可扩充真实数据,提高下游诊断性能。
  6. 方法具有可扩展性和可控性,为高质量的三维医学图像生成提供了解决方案。

Cool Papers

点此查看论文截图

GOUHFI: a novel contrast- and resolution-agnostic segmentation tool for Ultra-High Field MRI

Authors:Marc-Antoine Fortin, Anne Louise Kristoffersen, Michael Staff Larsen, Laurent Lamalle, Ruediger Stirnberg, Paal Erik Goa

Recently, Ultra-High Field MRI (UHF-MRI) has become more available and one of the best tools to study the brain. One common step in quantitative neuroimaging is to segment the brain into several regions, which has been done using software packages like FreeSurfer, FastSurferVINN or SynthSeg. However, the differences between UHF-MRI and 1.5T or 3T images are such that the automatic segmentation techniques optimized at these field strengths usually produce unsatisfactory segmentation results for UHF images. Thus, it has been particularly challenging to perform region-based quantitative analyses as typically done with 1.5-3T data, underscoring the crucial need for developing new automatic segmentation techniques designed to handle UHF images. Hence, we propose a novel Deep Learning (DL)-based segmentation technique called GOUHFI: Generalized and Optimized segmentation tool for Ultra-High Field Images, designed to segment UHF images of various contrasts and resolutions. For training, we used a total of 206 label maps from datasets acquired at 3T, 7T and 9.4T. In contrast to most DL strategies, we used a domain randomization approach, where synthetic images were used to train a 3D U-Net. GOUHFI was tested on seven different datasets and compared to existing techniques like FastSurferVINN, SynthSeg and CEREBRUM-7T. GOUHFI was able to segment the six contrasts and seven resolutions tested at 3T, 7T and 9.4T. Average Dice scores of 0.90, 0.90 and 0.93 were computed against the ground truth segmentations at 3T, 7T and 9.4T, respectively. Ultimately, GOUHFI is a promising new segmentation tool, being the first of its kind proposing a contrast- and resolution-agnostic alternative for UHF-MRI without requiring fine-tuning or retraining, making it the forthcoming alternative for neuroscientists working with UHF-MRI or even lower field strengths.

最近,超高场磁共振成像(UHF-MRI)越来越普及,成为研究大脑的最佳工具之一。定量神经成像中的一个常见步骤是将大脑分割成几个区域,这已通过软件包如FreeSurfer、FastSurferVINN或SynthSeg完成。然而,UHF-MRI与1.5T或3T图像之间的差异使得在这些场强下优化的自动分割技术通常会产生令人不满意的UHF图像分割结果。因此,像通常使用1.5-3T数据那样进行基于区域的定量分析具有特别大的挑战性,这强调了对开发专门处理UHF图像的新型自动分割技术的迫切需求。因此,我们提出了一种基于深度学习(DL)的新的分割技术,称为GOUHFI:用于超高场图像的通用和优化分割工具,旨在分割具有各种对比度和分辨率的UHF图像。为了训练,我们使用了从3T、7T和9.4T采集的数据集的总共206个标签图。与大多数深度学习策略不同,我们采用了领域随机化方法,使用合成图像来训练3D U-Net。GOUHFI在七个不同的数据集上进行了测试,并与现有的技术如FastSurferVINN、SynthSeg和CEREBRUM-7T进行了比较。GOUHFI能够在3T、7T和9.4T测试的六个对比度和七个分辨率上进行分割。相对于在3T、7T和9.4T的基准分割,其平均Dice得分分别为0.90、0.90和0.93。最终,GOUHFI是一个很有前途的新分割工具,它是第一个提出一种对比度和分辨率不变的替代方案,适用于UHF-MRI而无需微调或重新训练,使其成为从事UHF-MRI或甚至较低场强的神经科学家们的未来首选工具。
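
GOUHFI 的训练采用领域随机化:不直接使用真实 MRI,而是从标签图合成对比度随机的图像(与 SynthSeg 一脉相承的做法)。下面是这一合成过程的极简 NumPy 草图;强度分布与噪声参数均为假设,真实流程还包含形变、偏置场、模糊等更多随机化环节。

```python
# 示意性草图:从标签图生成随机对比度的合成训练图像(领域随机化,参数均为假设)
import numpy as np

def synthesize_from_labels(label_map, rng):
    """每个解剖标签采样一个随机平均强度并叠加高斯噪声,得到对比度各异的训练图像。"""
    image = np.zeros(label_map.shape, dtype=np.float32)
    for lab in np.unique(label_map):
        mean = rng.uniform(0.0, 1.0)        # 随机的组织平均强度 -> 随机对比度
        std = rng.uniform(0.01, 0.1)
        mask = label_map == lab
        image[mask] = rng.normal(mean, std, size=int(mask.sum()))
    return image

rng = np.random.default_rng(42)
labels = rng.integers(0, 4, size=(32, 32, 32))   # 假设的 3D 标签图(4 个结构)
for _ in range(3):                                # 同一标签图可生成多幅对比度不同的合成图
    img = synthesize_from_labels(labels, rng)
    print(f"强度范围: [{img.min():.3f}, {img.max():.3f}]")
```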

论文及项目相关链接

PDF 51 pages, 10 Figures, 7 Tables, Accepted for publication to Imaging Neuroscience after being peer-reviewed on 29-09-25

Summary

GOUHFI是一种全新的基于深度学习的超高场MRI图像分割工具,设计用于分割具有不同对比度和分辨率的超高场图像。该方法通过合成图像训练3D U-Net网络,对对比度和分辨率不敏感(agnostic),无需微调或重新训练。在多个数据集上的测试表明其分割效果良好,3T、7T和9.4T下的平均Dice得分分别为0.90、0.90和0.93。

Key Takeaways

  1. UHF-MRI已成为研究大脑的最佳工具之一,但现有自动分割技术在超高场图像上的应用具有挑战性。
  2. 提出了一种新型的基于深度学习的分割工具GOUHFI,适用于超高场图像,可处理不同对比度和分辨率的图像。
  3. GOUHFI采用领域随机化方法,使用合成图像训练3D U-Net网络。
  4. GOUHFI在七个不同的数据集上进行了测试,与现有技术相比表现出良好的分割效果。
  5. GOUHFI具有广泛的应用前景,为神经科学家提供了一种新的超高场MRI图像分割工具。
  6. GOUHFI对对比度和分辨率不敏感,无需针对特定数据集进行微调或重新训练。

Cool Papers

点此查看论文截图

First Results on the Search for Lepton Number Violating Neutrinoless Double Beta Decay with the LEGEND-200 Experiment

Authors:H. Acharya, N. Ackermann, M. Agostini, A. Alexander, C. Andreoiu, G. R. Araujo, F. T. Avignone III, M. Babicz, W. Bae, A. Bakalyarov, M. Balata, A. S. Barabash, P. S. Barbeau, C. J. Barton, L. Baudis, C. Bauer, E. Bernieri, L. Bezrukov, K. H. Bhimani, V. Biancacci, E. Blalock, S. J. Borden, G. Borghi, F. Borra, B. Bos, A. Boston, V. Bothe, R. Bouabid, R. Brugnera, N. Burlac, M. Busch, S. Calgaro, L. Canonica, S. Capra, M. Carminati, R. M. D. Carney, C. Cattadori, R. Cesarano, Y. -D. Chan, J. R. Chapman, A. Chernogorov, P. -J. Chiu, C. D. Christofferson, M. L. Clark, A. I. Colon-Rivera, T. Comellato, V. D’Andrea, R. Deckert, J. A. Detwiler, A. Di Giacinto, N. Di Marco, T. Dixon, K. -M. Dong, A. Drobizhev, G. Duran, Yu. Efremenko, S. R. Elliott, C. H. J. Emmanuel, E. Engelhardt, E. Esch, M. T. Febbraro, F. Ferella, D. E. Fields, C. Fiorini, M. Fomina, N. Fuad, R. Gala, A. Galindo-Uribarri, A. Gangapshev, A. Garfagnini, S. Gazzana, A. Geraci, L. Gessler, C. Ghiano, A. Gieb, S. Giri, M. Gold, C. Gooch, G. Grünauer, M. P. Green, J. Gruszko, I. Guinn, V. E. Guiseppe, V. Gurentsov, Y. Gurov, K. Gusev, B. Hackett, F. Hagemann, M. Haranczyk, F. Henkes, R. Henning, J. Herrera, D. Hervas Aguilar, J. Hinton, R. Hodák, H. F. R. Hoffmann, M. A. Howe, M. Huber, M. Hult, A. Ianni, K. Jędrzejczak, J. Jochum, R. W. L. Jones, D. S. Judson, M. Junker, J. Kaizer, V. Kazalov, M. F. Kidd, T. Kihm, K. Kilgus, A. Klimenko, K. T. Knöpfle, I. Kochanek, O. Kochetov, I. Kontul, L. L. Kormos, V. N. Kornoukhov, P. Krause, H. Krishnamoorthy, V. V. Kuzminov, K. Lang, M. Laubenstein, N. N. P. N. Lay, E. León, A. Leder, B. Lehnert, A. Leonhardt, N. Levashko, L. Y. Li, A. Li, Y. -R. Lin, M. Lindner, I. Lippi, A. Love, A. Lubashevskiy, B. Lubsandorzhiev, N. Lusardi, C. Macolino, B. Majorovits, F. Mamedov, L. Manzanillas, G. G. Marshall, R. D. Martin, E. L. Martin, R. Massarczyk, A. Mazumdar, G. McDowell, D. -M. Mei, S. P. Meireles, M. Menzel, S. Mertens, E. Miller, I. Mirza, M. Misiaszek, M. Morella, B. Morgan, T. Mroz, D. Muenstermann, C. J. Nave, I. Nemchenok, M. Neuberger, N. O’Briant, F. Paissan, L. Papp, L. S. Paudel, K. Pelczar, L. Pertoldi, W. Pettus, F. Piastra, M. Pichotta, P. Piseri, A. W. P. Poon, P. P. Povinec, M. Pruckner, A. Pullia, W. S. Quinn, D. C. Radford, Y. A. Ramachers, A. Razeto, M. Redchuk, A. L. Reine, S. Riboldi, K. Rielage, C. Romo-Luque, N. Rossi, S. Rozov, T. J. Ruland, N. Rumyantseva, J. Runge, R. Saakyan, S. Sailer, G. Salamanna, F. Salamida, G. Saleh, V. Sandukovsky, C. Savarese, S. Schönert, A. -K. Schütz, D. C. Schaper, L. Schlüter, S. J. Schleich, O. Schulz, M. Schwarz, B. Schwingenheuer, C. Seibt, O. Selivanenko, G. Senatore, A. Serafini, K. Shakhov, E. Shevchik, M. Shirchenko, Y. Shitov, H. Simgen, F. Šimkovic, S. Simonaitis-Boyd, M. Skorokhvatov, M. Slavíčková, A. Smolnikov, J. A. Solomon, G. Song, A. C. Sousa, A. R. Sreekala, L. Steinhart, I. Štekl, T. Sterr, M. Stommel, S. A. Sullivan, R. R. Sumathi, K. Szczepaniec, L. Taffarello, D. Tagnani, D. J. Tedeschi, T. N. Thorpe, V. Tretyak, M. Turqueti, E. E. Van Nieuwenhuizen, L. J. Varriano, S. Vasilyev, A. Veresnikova, C. Vignoli, C. Vogl, K. von Sturm, A. Warren, D. Waters, S. L. Watkins, C. Wiesinger, J. F. Wilkerson, M. Willers, C. Wiseman, M. Wojcik, D. Xu, W. Xu, E. Yakushev, T. Ye, C. -H. Yu, V. Yumatov, D. Zinatulina, K. Zuber, G. Zuzel

The LEGEND collaboration is searching for neutrinoless double beta ($0\nu\beta\beta$) decay by operating high-purity germanium detectors enriched in $^{76}$Ge in a low-background liquid argon environment. Building on key technological innovations from GERDA and the MAJORANA DEMONSTRATOR, LEGEND-200 has performed a first $0\nu\beta\beta$ decay search based on 61.0 kg yr of data. Over half of this exposure comes from our highest performing detectors, including newly developed inverted-coaxial detectors, and is characterized by an estimated background level of $0.5^{+0.3}_{-0.2}$ cts/(keV kg yr) in the $0\nu\beta\beta$ decay signal region. A combined analysis of data from GERDA, the MAJORANA DEMONSTRATOR, and LEGEND-200, characterized by a 90% confidence level exclusion sensitivity of $2.8 \times 10^{26}$ yr on the half-life of $0\nu\beta\beta$ decay, reveals no evidence for a signal and sets a new observed lower limit at $T^{0\nu}_{1/2} > 1.9 \times 10^{26}$ yr (90% confidence level). Assuming the decay is mediated by Majorana neutrinos, this corresponds to an upper limit on the effective Majorana mass in the range $m_{\beta\beta} < 75-200$ meV, depending on the adopted nuclear matrix element.

LEGEND合作组通过在低本底液氩环境中运行富集$^{76}$Ge的高纯锗探测器,来寻找无中微子双β衰变($0\nu\beta\beta$)。基于GERDA和MAJORANA DEMONSTRATOR的关键技术创新,LEGEND-200依托61.0 kg·yr的数据完成了首次$0\nu\beta\beta$衰变搜索。其中超过一半的曝光量来自性能最好的探测器(包括新研制的倒置同轴探测器),其在$0\nu\beta\beta$衰变信号区域的本底水平估计为$0.5^{+0.3}_{-0.2}$ cts/(keV kg yr)。对GERDA、MAJORANA DEMONSTRATOR和LEGEND-200数据的联合分析(其对$0\nu\beta\beta$衰变半衰期的90%置信水平排除灵敏度为$2.8 \times 10^{26}$年)未发现信号证据,并给出了新的观测下限$T^{0\nu}_{1/2} > 1.9 \times 10^{26}$年(90%置信水平)。假设该衰变由马约拉纳中微子介导,这对应于有效马约拉纳质量的上限$m_{\beta\beta} < 75-200$ meV,具体数值取决于所采用的核矩阵元。
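
摘要最后一步(由半衰期下限推出有效马约拉纳质量上限)可借助轻马约拉纳中微子交换机制下的标准衰变率公式来理解;以下为教科书式关系而非论文原文,其中 $G^{0\nu}$ 为相空间因子、$M^{0\nu}$ 为核矩阵元、$m_e$ 为电子质量:

$$\left(T_{1/2}^{0\nu}\right)^{-1} = G^{0\nu}\,\bigl|M^{0\nu}\bigr|^{2}\,\frac{\langle m_{\beta\beta}\rangle^{2}}{m_e^{2}} \;\;\Longrightarrow\;\; \langle m_{\beta\beta}\rangle = \frac{m_e}{\sqrt{G^{0\nu}}\,\bigl|M^{0\nu}\bigr|\,\sqrt{T_{1/2}^{0\nu}}}$$

因此 $T_{1/2}^{0\nu}$ 的下限越高,$\langle m_{\beta\beta}\rangle$ 的上限越低;摘要中 75-200 meV 的区间正来自不同核矩阵元 $M^{0\nu}$ 的取值差异。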

论文及项目相关链接

PDF

摘要
LEGEND合作组在低本底液氩环境中运行富集$^{76}$Ge的高纯锗探测器,以寻找无中微子双β衰变($0\nu\beta\beta$)。基于GERDA和MAJORANA DEMONSTRATOR的关键技术创新,LEGEND-200依托61.0 kg·yr的数据完成了首次$0\nu\beta\beta$衰变搜索。新研制的倒置同轴探测器等高性能探测器贡献了超过一半的曝光量,其在$0\nu\beta\beta$衰变信号区域的本底水平估计为$0.5^{+0.3}_{-0.2}$ cts/(keV kg yr)。综合分析GERDA、MAJORANA DEMONSTRATOR和LEGEND-200的数据,在90%置信水平下对半衰期的排除灵敏度为$2.8 \times 10^{26}$年,未见信号迹象,并设置了新的观测下限$T^{0\nu}_{1/2} > 1.9 \times 10^{26}$年。假设衰变由马约拉纳中微子介导,这对应于有效马约拉纳质量上限$m_{\beta\beta} < 75-200$ meV,具体数值取决于所采用的核矩阵元。

关键见解

  1. LEGEND合作使用高纯度锗探测器寻找无中微子双β衰变($0νββ$)。
  2. LEGEND-200基于GERDA和MAJORANA DEMONSTRATOR的技术创新进行搜索。
  3. 数据分析揭示了无信号证据,并设定了新的观测下限。
  4. 在无中微子双β衰变区域,背景水平估计较低。
  5. 高性能探测器如倒置同轴探测器对实验贡献显著。
  6. 综合分析数据来自GERDA、MAJORANA DEMONSTRATOR和LEGEND-200。

Cool Papers

点此查看论文截图

Robustness and sex differences in skin cancer detection: logistic regression vs CNNs

Authors:Nikolette Pedersen, Regitze Sydendal, Andreas Wulff, Ralf Raumanns, Eike Petersen, Veronika Cheplygina

Deep learning has been reported to achieve high performances in the detection of skin cancer, yet many challenges regarding the reproducibility of results and biases remain. This study is a replication (different data, same analysis) of a previous study on Alzheimer's disease detection, which studied the robustness of logistic regression (LR) and convolutional neural networks (CNN) across patient sexes. We explore sex bias in skin cancer detection, using the PAD-UFES-20 dataset with LR trained on handcrafted features reflecting dermatological guidelines (ABCDE and the 7-point checklist), and a pre-trained ResNet-50 model. We evaluate these models in alignment with the replicated study: across multiple training datasets with varied sex composition to determine their robustness. Our results show that both the LR and the CNN were robust to the sex distribution, but the results also revealed that the CNN had a significantly higher accuracy (ACC) and area under the receiver operating characteristics (AUROC) for male patients compared to female patients. The data and relevant scripts to reproduce our results are publicly available (https://github.com/nikodice4/Skin-cancer-detection-sex-bias).

深度学习在皮肤癌检测方面已表现出卓越性能,但仍存在关于结果可重复性和偏见方面的诸多挑战。本研究是对一项关于阿尔茨海默病检测研究的复现(不同数据,相同分析),该研究探讨了逻辑回归(LR)和卷积神经网络(CNN)在不同患者性别间的稳健性。我们利用PAD-UFES-20数据集,探索皮肤癌检测中的性别偏见,使用逻辑回归对手工特征进行训练,这些特征反映了皮肤科指南(ABCDE和7点清单),以及预训练的ResNet-50模型。我们与复现的研究相一致,对多个训练数据集进行了评估,这些数据集具有不同的性别组成,以确定模型的稳健性。我们的结果表明,逻辑回归和卷积神经网络对性别分布具有稳健性,但结果还显示,卷积神经网络对男性患者的准确率(ACC)和受试者操作特性曲线下面积(AUROC)显著高于女性患者。有关复现我们结果的数据和相关脚本可公开获取(https://github.com/nikodice4/Skin-cancer-detection-sex-bias)。
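
复现研究的核心评估是按患者性别分组比较 ACC 与 AUROC。下面用 scikit-learn 给出这种分组评估流程的草图;特征与标签为随机构造,仅用于演示流程,不代表论文的数据或结果。

```python
# 示意性草图:按性别子组计算 ACC 与 AUROC(数据随机构造,仅演示评估流程)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 7))                                  # 假设:7 个手工特征(ABCDE、7点清单派生)
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)     # 假设的良/恶性标签
sex = rng.choice(["F", "M"], size=400)

clf = LogisticRegression(max_iter=1000).fit(X[:300], y[:300])  # 前 300 例训练
prob = clf.predict_proba(X[300:])[:, 1]                        # 后 100 例测试
pred = (prob >= 0.5).astype(int)
y_te, sex_te = y[300:], sex[300:]

for s in ("F", "M"):
    m = sex_te == s
    print(f"{s}: ACC={accuracy_score(y_te[m], pred[m]):.3f}  AUROC={roc_auc_score(y_te[m], prob[m]):.3f}")
```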

论文及项目相关链接

PDF 10 pages, 1 figure, published at FAIMI workshop at the MICCAI 2025 conference

Summary
深度学习在皮肤癌检测中表现出高性能,但仍存在结果可重复性和偏见等多方面的挑战。本研究是对一项阿尔茨海默病检测研究的复现(不同数据,相同分析),探讨了逻辑回归(LR)和卷积神经网络(CNN)在患者性别间的稳健性。研究使用PAD-UFES-20数据集,采用基于皮肤科指南(ABCDE和7点清单)的手工特征训练的LR和预训练的ResNet-50模型,评估模型在性别构成不同的多个训练数据集上的稳健性。结果表明,LR和CNN对性别分布具有稳健性,但CNN对男性患者的准确率(ACC)和受试者工作特征曲线下面积(AUROC)显著高于女性患者。相关数据及脚本已公开,可供复现研究。

Key Takeaways

  1. 深度学习在皮肤癌检测中展现出高性能,但结果的可重复性和偏见问题仍需关注。
  2. 本研究是对阿尔茨海默病检测研究的复制,旨在探讨逻辑回归和卷积神经网络在不同患者性别间的稳健性。
  3. 使用PAD-UFES-20数据集进行皮肤癌检测研究,采用基于皮肤科指南的手工特征训练的LR模型和预训练的ResNet-50模型。
  4. 评估模型在多个性别分布不同的训练数据集上的表现,显示模型对性别分布具有稳健性。
  5. CNN模型对男性患者的检测准确率(ACC)和受试者工作特征曲线下面积(AUROC)显著高于女性患者。
  6. 研究数据和相关脚本已公开,方便其他研究者进行复制和进一步分析。

Cool Papers

点此查看论文截图

Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions

Authors:Mohammad Almansoori, Komal Kumar, Hisham Cholakkal

In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature, blood pressure, ECG) and imaging results (e.g., MRI, X-ray) from a measurement agent to mimic the real-world diagnostic process. Additionally, we incorporate self-improvement mechanisms that allow models to iteratively refine their diagnostic strategies. We enhance LLM performance in our simulated setting by integrating multi-agent discussions, chain-of-thought reasoning, and experience-based knowledge retrieval, facilitating progressive learning as doctor agents interact with more patients. We also introduce an evaluation benchmark for assessing the LLM’s ability to engage in dynamic, context-aware diagnostic interactions. While MedAgentSim is fully automated, it also supports a user-controlled mode, enabling human interaction with either the doctor or patient agent. Comprehensive evaluations in various simulated diagnostic scenarios demonstrate the effectiveness of our approach. Our code, simulation tool, and benchmark are available at https://medagentsim.netlify.app/.

在这项工作中,我们介绍了MedAgentSim,这是一个开放源码的模拟临床环境,其中包含医生、患者和测量代理,旨在评估和提高大型语言模型在动态诊断环境中的性能。与传统的框架不同,我们的框架要求医生代理通过多轮对话积极地与患者互动,请求相关的医学检查(如体温、血压、心电图)和来自测量代理的成像结果(如磁共振成像、X射线),以模拟现实世界的诊断过程。此外,我们引入了自我改进机制,允许模型通过不断与更多患者互动来逐步优化其诊断策略。我们通过整合多代理讨论、链式思维推理和基于经验的知识检索,增强大型语言模型在模拟环境中的性能,促进医生代理在互动中的渐进学习。我们还引入了一个评估基准,以评估大型语言模型参与动态、基于上下文诊断交互的能力。虽然MedAgentSim完全自动化,但它也支持用户控制模式,允许人类与医生或患者代理进行互动。在各种模拟诊断场景中的综合评估证明了我们方法的有效性。我们的代码、仿真工具和基准测试平台可在[https://medagentsim.netlify.app/]上找到。
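
框架的核心是医生代理在多轮对话中向患者代理提问、向测量代理请求检查结果,最后给出诊断。下面是这一交互循环的极简草图;消息格式、检查项目与决策逻辑均为假设,真实系统中各代理均由 LLM 驱动。

```python
# 示意性草图:医生代理 <-> 患者 / 测量代理的多轮诊断循环(协议与逻辑均为假设)
MEASUREMENTS = {"体温": "38.7 C", "血压": "128/84 mmHg", "ECG": "窦性心动过速"}

def patient_agent(question):
    return "我从昨晚开始发烧、心悸。"                 # 假设:真实系统中由 LLM 扮演患者

def measurement_agent(request):
    return MEASUREMENTS.get(request, "该检查不可用")   # 返回被请求的检查结果

def doctor_agent(history):
    # 假设:真实系统中此处调用 LLM,根据对话历史决定下一步动作
    if len(history) == 0:
        return "提问", "您有哪些症状?"
    if len(history) == 1:
        return "检查", "体温"
    if len(history) == 2:
        return "检查", "ECG"
    return "诊断", "初步考虑感染相关发热,建议血常规进一步检查"

history = []
for _ in range(4):
    action, content = doctor_agent(history)
    if action == "诊断":
        print("最终诊断:", content)
        break
    reply = patient_agent(content) if action == "提问" else measurement_agent(content)
    history.append((action, content, reply))
    print(f"[医生] {action}: {content}  ->  [回复] {reply}")
```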

论文及项目相关链接

PDF 14 page, 4 figures, 61 references, presented in MICCAI (Oral)

Summary
医学图像领域的研究人员推出了一款名为MedAgentSim的开源模拟临床环境,该环境通过医生、患者和测量代理模拟动态诊断场景,以评估并提升大型语言模型(LLM)的性能。MedAgentSim要求医生代理通过多轮对话积极与患者互动,并从测量代理请求相关的医学检查和影像结果,以模拟真实世界的诊断过程。该框架还允许模型通过自我改进机制不断完善其诊断策略。该模拟环境支持多代理讨论、链式思维推理和基于经验的知识检索等功能,以促进医生代理与患者互动中的渐进学习。此外,还推出了评估基准,以评估LLM在动态、语境感知的诊断交互中的能力。MedAgentSim既支持全自动模式,也支持用户控制模式。

Key Takeaways

  1. MedAgentSim是一个模拟临床环境的开源框架,针对动态诊断场景评估和提升大型语言模型(LLM)的性能。
  2. 通过医生、患者和测量代理的互动来模拟真实世界的诊断过程。
  3. 框架引入了多轮对话,医生代理需积极与患者互动,并请求医学检查和医学图像结果。
  4. 框架包含自我改进机制,允许模型不断完善其诊断策略。
  5. 支持多代理讨论、链式思维推理及基于经验的知识检索等功能,促进渐进学习。
  6. 提供了一个评估基准,用于评估LLM在动态、语境感知的诊断交互中的能力。

Cool Papers

点此查看论文截图

Efficient Self-Supervised Adaptation for Medical Image Analysis

Authors:Moein Sorkhei, Emir Konuk, Jingyu Guo, Chanjuan Meng, Christos Matsoukas, Kevin Smith

Self-supervised adaptation (SSA) improves foundation model transfer to medical domains but is computationally prohibitive. Although parameter efficient fine-tuning methods such as LoRA have been explored for supervised adaptation, their effectiveness for SSA remains unknown. In this work, we introduce efficient self-supervised adaptation (ESSA), a framework that applies parameter-efficient fine-tuning techniques to SSA with the aim of reducing computational cost and improving adaptation performance. Among the methods tested, Attention Projection Layer Adaptation (APLA) sets a new state-of-the-art, consistently surpassing full-parameter SSA and supervised fine-tuning across diverse medical tasks, while reducing GPU memory by up to 40.1% and increasing training throughput by 25.2%, all while maintaining inference efficiency.

自我监督适应(SSA)改善了基础模型在医学领域的迁移,但计算成本很高。虽然参数高效的微调方法(如LoRA)已被探索用于监督适应,但它们在SSA中的有效性尚不清楚。在这项工作中,我们引入了高效自我监督适应(ESSA)框架,该框架将参数高效的微调技术应用于SSA,旨在降低计算成本并提高适应性能。在测试的方法中,注意力投影层适应(APLA)树立了新的技术标杆,它不断超越全参数SSA和监督微调,在各种医学任务中表现优异。同时,它减少了高达40.1%的GPU内存,提高了25.2%的训练效率,同时保持了推理效率。
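
摘要未给出 APLA 的具体结构;下面用通用的 LoRA 式低秩适配器作用于注意力投影层,示意“只训练投影层上的少量新增参数”这一参数高效思路。该草图是一种常见的替代写法,并非 APLA 的官方实现,秩 r、缩放系数等均为假设。

```python
# 示意性草图:在注意力投影层上加 LoRA 式低秩适配器(并非 APLA 官方实现,超参为假设)
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                              # 冻结原投影权重
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r)) # 初始为 0,训练前不改变输出
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

proj = nn.Linear(768, 768)            # 假设:ViT 注意力中的某个投影层(如 q/k/v 或输出投影)
adapted = LoRALinear(proj)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(f"可训练参数占比: {trainable / total:.2%}")   # 低秩分支只占极小一部分
```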

论文及项目相关链接

PDF Accepted to ICCV CVAMD 2025

Summary

自监督适应(SSA)能够改善基础模型向医学领域的迁移效果,但其计算成本较高。本研究引入高效自监督适应(ESSA)框架,将参数高效微调技术应用于SSA,旨在降低计算成本并提高适应性能。在所测试的方法中,注意力投影层适应(APLA)表现卓越,在多种医学任务上均超越全参数SSA和监督微调方法,同时最多减少40.1%的GPU显存占用、提高25.2%的训练吞吐量,并保持推理效率。

Key Takeaways

  1. 自监督适应(SSA)在医学领域模型迁移中有良好表现,但计算成本较高。
  2. 参数高效微调技术(如LoRA)已在监督适应中得到探索,但其在SSA中的有效性此前未知。
  3. 高效自监督适应(ESSA)框架旨在降低SSA的计算成本并提高适应性。
  4. 注意力投影层适应(APLA)在多种医学任务上表现优异,超越全参数SSA及监督微调。
  5. APLA能显著降低GPU内存使用并提高训练效率。
  6. APLA能维持推理效率。

Cool Papers

点此查看论文截图

A Survey on Self-supervised Contrastive Learning for Multimodal Text-Image Analysis

Authors:Asifullah Khan, Laiba Asmatullah, Anza Malik, Shahzaib Khan, Hamna Asif

Self-supervised learning is a machine learning approach that generates implicit labels by learning underlying patterns and extracting discriminative features from unlabeled data without manual labelling. Contrastive learning introduces the concept of “positive” and “negative” samples, where positive pairs (e.g., variations of the same image/object) are brought together in the embedding space, and negative pairs (e.g., views from different images/objects) are pushed farther away. This methodology has shown significant improvements in image understanding and image-text analysis without much reliance on labeled data. In this paper, we comprehensively discuss the terminologies, recent developments and applications of contrastive learning with respect to text-image models. Specifically, we provide an overview of the approaches of contrastive learning in text-image models in recent years. Secondly, we categorize the approaches based on different model structures. Thirdly, we further introduce and discuss the latest advances of the techniques used in the process such as pretext tasks for both images and text, architectural structures, and key trends. Lastly, we discuss the recent state-of-the-art applications of self-supervised contrastive learning in text-image based models.

自监督学习是一种机器学习的方法,它通过学习潜在的模式并从无标签数据中提取判别特征,从而生成隐式标签,而无需手动标注。对比学习引入了“正样本”和“负样本”的概念,其中正样本对(例如,同一图像/对象的变体)在嵌入空间中聚集在一起,而负样本对(例如,来自不同图像/对象的视图)则被推开。这种方法在图像理解和图像文本分析方面取得了显著的改进,而且不需要依赖大量的标注数据。在本文中,我们全面讨论了与文本-图像模型相关的对比学习的术语、最新发展以及应用。具体地,我们概述了近年来文本-图像模型中对比学习的方法。其次,我们根据不同的模型结构对这些方法进行了分类。再次,我们进一步介绍了过程中使用的最新技术,如图像和文本的预训练任务、架构结构和关键趋势。最后,我们讨论了基于文本-图像的最新先进的自监督对比学习的应用。
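
“把正样本对拉近、负样本对推远”在实现上通常对应 InfoNCE / NT-Xent 损失。下面给出 SimCLR 风格 NT-Xent 的简化实现,其中同一图像的两个增强视图互为正样本;温度等超参数为常见默认值,嵌入用随机张量代替。

```python
# 示意性草图:SimCLR 风格的 NT-Xent 对比损失(同一图像的两个增强视图互为正样本)
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)    # [2N, D]
    sim = z @ z.t() / temperature                           # 两两相似度
    n = z1.size(0)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # 屏蔽自身
    # 第 i 个样本的正样本是它在另一视图中的对应样本
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1 = torch.randn(16, 128)   # 假设:视图 1 经编码器 + 投影头后的嵌入
z2 = torch.randn(16, 128)   # 假设:视图 2 的嵌入
print(nt_xent(z1, z2).item())
```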

论文及项目相关链接

PDF 38 pages, 8 figures, survey paper

Summary
自监督学习通过从非标记数据中学习潜在模式和提取判别特征,生成隐式标签,无需人工标注。对比学习引入了“正样本”和“负样本”的概念,将正样本对拉近嵌入空间,将负样本对推开。此方法在图像理解和文本分析方面表现出显著改进,对标注数据的依赖较小。本文全面探讨了文本图像模型中对比学习的术语、最新发展及应用,介绍了最新的技术和趋势。

Key Takeaways

  1. 自监督学习是通过学习潜在模式和提取非标记数据的判别特征来生成隐式标签。
  2. 对比学习通过引入正样本和负样本的概念,在嵌入空间中区分不同的输入。
  3. 正样本对在嵌入空间中相互接近,而负样本对被推开。
  4. 对比学习方法在图像理解和文本分析方面取得了显著进展,减少对标注数据的依赖。
  5. 文本图像模型中对比学习的最新发展和应用被详细讨论。
  6. 文章介绍了基于不同模型结构的对比学习方法分类。

Cool Papers

点此查看论文截图

High-Precision Dichotomous Image Segmentation via Depth Integrity-Prior and Fine-Grained Patch Strategy

Authors:Xianjie Liu, Keren Fu, Qijun Zhao

High-precision dichotomous image segmentation (DIS) is a task of extracting fine-grained objects from high-resolution images. Existing methods face a dilemma: non-diffusion methods work efficiently but suffer from false or missed detections due to weak semantics and less robust spatial priors; diffusion methods, using strong generative priors, have high accuracy but encounter high computational burdens. As a solution, we find pseudo depth information from monocular depth estimation models can provide essential semantic understanding that quickly reveals spatial differences across target objects and backgrounds. Inspired by this phenomenon, we discover a novel insight we term the depth integrity-prior: in pseudo depth maps, foreground objects consistently convey stable depth values with much lower variances than chaotic background patterns. To exploit such a prior, we propose a Prior of Depth Fusion Network (PDFNet). Specifically, our network establishes multimodal interactive modeling to achieve depth-guided structural perception by deeply fusing RGB and pseudo depth features. We further introduce a novel depth integrity-prior loss to explicitly enforce depth consistency in segmentation results. Additionally, we design a fine-grained perception enhancement module with adaptive patch selection to perform boundary-sensitive detail refinement. Notably, PDFNet achieves state-of-the-art performance with only 94M parameters (<11% of those diffusion-based models), outperforming all non-diffusion methods and surpassing some diffusion methods. Code is provided in the supplementary materials.

高精度二分图像分割(dichotomous image segmentation, DIS)是从高分辨率图像中提取细粒度目标的任务。现有方法面临一个两难:非扩散方法效率高,但由于语义较弱、空间先验不够稳健,容易出现误检或漏检;而利用强生成先验的扩散方法精度高,但计算负担大。为解决这一问题,我们发现来自单目深度估计模型的伪深度信息能够提供关键的语义理解,快速揭示目标物体与背景之间的空间差异。受此现象启发,我们提出了称为“深度完整性先验”的新见解:在伪深度图中,前景物体的深度值稳定,其方差远低于混乱的背景模式。为了利用这一先验,我们提出了深度先验融合网络(Prior of Depth Fusion Network, PDFNet)。具体而言,我们的网络建立多模态交互建模,通过深度融合RGB与伪深度特征实现深度引导的结构感知。我们还引入了一种新的深度完整性先验损失,以显式约束分割结果中的深度一致性。此外,我们设计了一个具有自适应补丁选择的细粒度感知增强模块,以执行对边界敏感的细节优化。值得注意的是,PDFNet仅使用94M参数(不到扩散模型参数量的11%)便达到了最先进的性能,不仅优于所有非扩散方法,还超越了部分扩散方法。代码已包含在补充材料中。
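
“深度完整性先验”指前景在伪深度图上的方差远低于背景。把这一先验写成损失的一种直接方式,是最小化预测前景区域内伪深度的加权方差;以下草图仅按摘要的文字描述示意,并非论文中深度完整性先验损失的实际形式。

```python
# 示意性草图:鼓励预测前景内部伪深度方差尽量小的一致性损失(非论文实际实现)
import torch

def depth_integrity_loss(pred_mask, pseudo_depth, eps=1e-6):
    """pred_mask: [B,1,H,W] 前景概率;pseudo_depth: [B,1,H,W] 单目深度估计得到的伪深度。"""
    w = pred_mask
    mean_fg = (w * pseudo_depth).sum(dim=(2, 3)) / (w.sum(dim=(2, 3)) + eps)
    var_fg = (w * (pseudo_depth - mean_fg[..., None, None]) ** 2).sum(dim=(2, 3)) \
             / (w.sum(dim=(2, 3)) + eps)
    return var_fg.mean()          # 前景加权方差越小,前景深度越"完整"一致

mask = torch.sigmoid(torch.randn(2, 1, 64, 64))
depth = torch.rand(2, 1, 64, 64)
print(depth_integrity_loss(mask, depth).item())
```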

论文及项目相关链接

PDF

Summary

本文提出了一种基于伪深度信息的高精度二值图像分割方法。该方法通过融合RGB和伪深度特征,建立多模态交互模型,实现深度引导的结构感知,并引入深度完整性先验损失来增强分割结果的深度一致性。此外,还设计了一个具有自适应补丁选择的精细感知增强模块,以执行边界敏感的细节优化。该方法参数少,性能优异,优于所有非扩散方法,并超越了一些扩散方法。

Key Takeaways

  1. 高精度二值图像分割(DIS)是从高分辨率图像中提取细粒度对象的任务。
  2. 现有方法面临非扩散和扩散方法的权衡:非扩散方法效率高但存在误检或漏检,扩散方法准确率高但计算负担大。
  3. 伪深度信息可以提供对目标对象和背景空间差异的关键语义理解。
  4. 提出了深度完整性先验(DIP),指出前景对象在伪深度图中具有稳定的深度值,方差较低。
  5. 介绍了基于深度完整性先验的PDFNet网络,通过融合RGB和伪深度特征实现深度引导的结构感知。
  6. PDFNet引入了一种新的深度完整性先验损失,以明确加强分割结果的深度一致性。

Cool Papers

点此查看论文截图

Toward a Robust R2D2 Paradigm for Radio-interferometric Imaging: Revisiting Deep Neural Network Training and Architecture

Authors:Amir Aghabiglou, Chung San Chu, Chao Tang, Arwa Dabbech, Yves Wiaux

The R2D2 Deep Neural Network (DNN) series was recently introduced for image formation in radio interferometry. It can be understood as a learned version of CLEAN, whose minor cycles are substituted with DNNs. We revisit R2D2 on the grounds of series convergence, training methodology, and DNN architecture, improving its robustness in terms of generalizability beyond training conditions, capability to deliver high data fidelity, and epistemic uncertainty. First, while still focusing on telescope-specific training, we enhance the learning process by randomizing Fourier sampling integration times, incorporating multiscan multinoise configurations, and varying imaging settings, including pixel resolution and visibility-weighting scheme. Second, we introduce a convergence criterion whereby the reconstruction process stops when the data residual is compatible with noise, rather than simply using all available DNNs. This not only increases the reconstruction efficiency by reducing its computational cost, but also refines training by pruning out the data/image pairs for which optimal data fidelity is reached before training the next DNN. Third, we substitute R2D2’s early U-Net DNN with a novel architecture (U-WDSR) combining U-Net and WDSR, which leverages wide activation, dense skip connections, weight normalization, and low-rank convolution to improve feature reuse and reconstruction precision. As previously, R2D2 was trained for monochromatic intensity imaging with the Very Large Array at fixed $512 \times 512$ image size. Simulations on a wide range of inverse problems and a case study on real data reveal that the new R2D2 model consistently outperforms its earlier version in image reconstruction quality, data fidelity, and epistemic uncertainty.

R2D2深度神经网络(DNN)系列最近被引入射电干涉成像。它可以被理解为CLEAN的学习版本,其小循环(minor cycles)被DNN所取代。我们从序列收敛性、训练方法和DNN架构等方面重新审视R2D2,提升了其在训练条件之外的泛化能力、提供高数据保真度的能力以及认知不确定性方面的稳健性。首先,在仍针对特定望远镜训练的前提下,我们通过随机化傅里叶采样积分时间、引入多扫描多噪声配置以及改变成像设置(包括像素分辨率和可见度加权方案)来增强学习过程。其次,我们引入了一个收敛准则:当数据残差与噪声兼容时即停止重建过程,而不是简单地用完所有可用的DNN。这不仅通过降低计算成本提高了重建效率,还通过在训练下一个DNN之前剔除已达到最佳数据保真度的数据/图像对来优化训练。第三,我们用一种结合U-Net和WDSR的新架构(U-WDSR)取代了R2D2早期的U-Net DNN,该架构利用宽激活、密集跳跃连接、权重归一化和低秩卷积来提高特征复用和重建精度。与之前一样,R2D2针对甚大阵(Very Large Array)的单色强度成像进行训练,图像尺寸固定为$512 \times 512$。在大量逆问题上的仿真和真实数据的案例研究表明,新的R2D2模型在图像重建质量、数据保真度和认知不确定性方面均优于早期版本。
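
新的收敛准则是:一旦数据残差与噪声水平兼容,就停止调用后续 DNN。下面用 NumPy 给出这一控制逻辑的简化草图;测量算子与 DNN 均以占位函数代替,阈值形式与数值均为假设。

```python
# 示意性草图:"残差与噪声兼容即提前停止"的迭代控制逻辑(算子与网络均为占位假设)
import numpy as np

def forward_op(x):
    return 0.8 * x                    # 占位:测量算子(真实场景为非均匀傅里叶采样)

def dnn_step(x, residual):
    return residual                   # 占位:第 i 个 DNN 给出的图像更新量(真实实现由训练得到)

def r2d2_reconstruct(y, noise_level, max_dnns=8):
    x = np.zeros_like(y)
    for i in range(max_dnns):
        residual = y - forward_op(x)
        if np.linalg.norm(residual) <= noise_level:        # 残差已与噪声兼容 -> 提前停止
            print(f"在使用 {i} 个 DNN 后收敛,跳过其余网络")
            break
        x = x + dnn_step(x, residual)
    return x

rng = np.random.default_rng(0)
y = np.ones(1000) + 0.01 * rng.normal(size=1000)
x_hat = r2d2_reconstruct(y, noise_level=0.01 * np.sqrt(1000))
print("重建均值 ≈", round(float(x_hat.mean()), 3))
```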

论文及项目相关链接

PDF 18 pages, 6 figures

摘要

R2D2深度神经网络(DNN)系列最近被引入射电干涉仪图像形成中。可理解为对CLEAN的深度学习版本,其小周期被DNN替代。本文从序列收敛、训练方法和DNN架构等方面重新审视R2D2,提高了其在超越训练条件的一般化能力、高数据保真度和认知不确定性方面的稳健性。首先,在望远镜特定训练的基础上,通过随机傅里叶采样积分时间、引入多扫描多噪声配置和变化成像设置(包括像素分辨率和可见性加权方案)来增强学习过程。其次,我们引入了一个收敛准则,即当数据残差与噪声兼容时,重建过程就会停止,而不是简单地使用所有可用的DNNs。这不仅提高了重建效率并降低了计算成本,而且还通过剔除那些在达到最佳数据保真度之前就已训练好的下一个DNN的数据/图像对,从而优化了训练。第三,我们用一种新型架构U-WDSR替代了R2D2早期的U-Net DNN,该架构结合了U-Net和WDSR,利用宽激活、密集跳跃连接、权重归一化和低秩卷积来提高特征复用和重建精度。模拟广泛反问题和真实数据的案例研究表明,新R2D2模型在图像重建质量、数据保真度和认知不确定性方面均优于早期版本。

关键见解

  1. R2D2系列网络在射电干涉仪图像形成中引入深度神经网络(DNN),是对CLEAN算法的深度学习改进。
  2. 通过随机傅里叶采样积分时间、多扫描多噪声配置和变化的成像设置增强了R2D2的稳健性。
  3. 引入新的收敛准则,基于数据残差与噪声的兼容性来停止重建过程,提高重建效率和计算成本效益。
  4. 用新型架构U-WDSR替代R2D2早期的U-Net DNN,结合了U-Net和WDSR的优点,提高了特征复用和重建精度。
  5. 新R2D2模型在图像重建质量、数据保真度和认知不确定性方面表现优越。
  6. 训练过程针对广泛的逆问题和真实数据进行了模拟和实证研究。

Cool Papers

点此查看论文截图

IM360: Large-scale Indoor Mapping with 360 Cameras

Authors:Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha

We present a novel 3D mapping pipeline for large-scale indoor environments. To address the significant challenges in large-scale indoor scenes, such as prevalent occlusions and textureless regions, we propose IM360, a novel approach that leverages the wide field of view of omnidirectional images and integrates the spherical camera model into the Structure-from-Motion (SfM) pipeline. Our SfM utilizes dense matching features specifically designed for 360 images, demonstrating superior capability in image registration. Furthermore, with the aid of mesh-based neural rendering techniques, we introduce a texture optimization method that refines texture maps and accurately captures view-dependent properties by combining diffuse and specular components. We evaluate our pipeline on large-scale indoor scenes, demonstrating its effectiveness in real-world scenarios. In practice, IM360 demonstrates superior performance, achieving a 3.5 PSNR increase in textured mesh reconstruction. We attain state-of-the-art performance in terms of camera localization and registration on Matterport3D and Stanford2D3D.

我们针对大规模室内环境提出了一种新型三维建图流程。为了解决大规模室内场景中的重大挑战,如普遍存在的遮挡和无纹理区域,我们提出了IM360:该方法利用全向图像的宽视场,并将球面相机模型融入运动恢复结构(SfM)流程。我们的SfM利用专为360度图像设计的密集匹配特征,在图像配准方面表现出卓越能力。此外,借助基于网格的神经渲染技术,我们引入了一种纹理优化方法,通过结合漫反射和镜面反射分量来细化纹理贴图并准确捕捉视角相关属性。我们在大规模室内场景上评估了该流程,证明了其在真实场景中的有效性。实践中,IM360表现出优越的性能,在带纹理网格重建上实现了3.5的PSNR提升,并在Matterport3D和Stanford2D3D上的相机定位与配准方面达到了业界领先水平。
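
将球面相机模型纳入 SfM 的基础一步,是把等距柱状(equirectangular)全景图的像素坐标映射为单位球上的光线方向。下面给出这一标准映射的草图;坐标系约定(x 轴朝前、经度向右增大)为此处假设,具体约定可能与论文实现不同。

```python
# 示意性草图:等距柱状全景像素 -> 单位光线方向(球面相机模型,坐标约定为假设)
import numpy as np

def equirect_pixel_to_ray(u, v, width, height):
    """u、v 为像素坐标,返回单位球上的 3D 方向向量。"""
    lon = (u / width - 0.5) * 2.0 * np.pi     # 经度: [-pi, pi]
    lat = (0.5 - v / height) * np.pi          # 纬度: [-pi/2, pi/2]
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.array([x, y, z])

W, H = 2048, 1024
print(equirect_pixel_to_ray(W / 2, H / 2, W, H))   # 图像中心 -> 正前方,约为 [1, 0, 0]
print(equirect_pixel_to_ray(0, H / 2, W, H))       # 最左侧   -> 正后方,约为 [-1, 0, 0]
```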

论文及项目相关链接

PDF

Summary

本文介绍了一种用于大规模室内环境的新型三维映射管道IM360,它利用全景图像的广视角并将球形相机模型融入SfM(从运动中恢复结构)管道,以应对大规模室内场景中的遮挡和纹理缺失等挑战。通过利用专为全景图像设计的密集匹配特征,IM360在图像注册方面表现出卓越的能力。此外,借助基于网格的神经渲染技术,引入了一种纹理优化方法,可优化纹理映射并准确捕捉与视图相关的属性,通过结合漫反射和镜面成分实现高质量的纹理重建。在大型室内场景的实践中,IM360表现出卓越的性能,实现了纹理网格重建的PSNR值增加3.5。在Matterport3D和Stanford2D3D上的相机定位和注册方面达到了最先进的性能。

Key Takeaways

  1. 提出了新型三维映射管道IM360,用于大规模室内环境。
  2. IM360利用全景图像的广视角和球形相机模型应对室内场景的遮挡和纹理缺失挑战。
  3. IM360在SfM管道中采用密集匹配特征,增强图像注册能力。
  4. 引入基于网格的神经渲染技术的纹理优化方法,优化纹理映射并捕捉视图相关属性。
  5. IM360实现了高质量的纹理重建,PSNR值增加3.5。
  6. 在Matterport3D和Stanford2D3D上的相机定位和注册方面达到最先进性能。

Cool Papers

点此查看论文截图

LEAD: Large Foundation Model for EEG-Based Alzheimer’s Disease Detection

Authors:Yihe Wang, Nan Huang, Nadia Mammone, Marco Cecchi, Xiang Zhang

Electroencephalography (EEG) provides a non-invasive, highly accessible, and cost-effective approach for detecting Alzheimer’s disease (AD). However, existing methods, whether based on handcrafted feature engineering or standard deep learning, face two major challenges: 1) the lack of large-scale EEG-AD datasets for robust representation learning, and 2) the absence of a dedicated deep learning pipeline for subject-level detection, which is more clinically meaningful than the commonly used sample-level detection. To address these gaps, we have curated the world’s largest EEG-AD corpus to date, comprising 2,255 subjects. Leveraging this unique data corpus, we propose LEAD, the first large-scale foundation model for EEG analysis in dementia. Our approach provides an innovative framework for subject-level AD detection, including: 1) a comprehensive preprocessing pipeline such as artifact removal, resampling, and filtering, and a newly proposed multi-scale segmentation strategy, 2) a subject-regularized spatio-temporal transformer trained with a novel subject-level cross-entropy loss and an indices group-shuffling algorithm, and 3) AD-guided contrastive pre-training. We pre-train on 12 datasets (3 AD-related and 9 non-AD) and fine-tune/test on 4 AD datasets. Compared with 10 baselines, LEAD consistently obtains superior subject-level detection performance under the challenging subject-independent cross-validation protocol. On the benchmark ADFTD dataset, our model achieves an impressive subject-level Sensitivity of 90.91% under the leave-one-subject-out (LOSO) setting. These results strongly validate the effectiveness of our method for real-world EEG-based AD detection. Source code: https://github.com/DL4mHealth/LEAD

脑电图(EEG)为检测阿尔茨海默病(AD)提供了一种非侵入性、易于获取且具有成本效益的方法。然而,现有方法,无论是基于手工特征工程还是标准深度学习,都面临两大挑战:1)缺乏用于稳健表示学习的大规模EEG-AD数据集;2)缺乏面向受试者级检测的专用深度学习流程,而受试者级检测比常用的样本级检测更具临床意义。为了弥补这些差距,我们整理了迄今为止世界上最大的EEG-AD语料库,包含2,255名受试者。利用这一独特的数据集,我们提出了LEAD,首个用于痴呆症脑电分析的大规模基础模型。我们的方法为受试者级AD检测提供了一个创新框架,包括:1)完整的预处理流程(如伪迹去除、重采样和滤波)以及新提出的多尺度分段策略;2)使用新型受试者级交叉熵损失和索引分组打乱算法训练的受试者正则化时空Transformer;3)AD引导的对比预训练。我们在12个数据集(3个与AD相关,9个与AD无关)上进行预训练,并在4个AD数据集上进行微调/测试。与10个基线相比,在具有挑战性的受试者独立交叉验证协议下,LEAD始终获得更优的受试者级检测性能。在基准ADFTD数据集上,我们的模型在留一受试者(LOSO)设置下达到了90.91%的受试者级灵敏度。这些结果有力地验证了我们的方法在真实世界基于EEG的AD检测中的有效性。源代码:https://github.com/DL4mHealth/LEAD
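
“受试者级检测”通常把同一受试者多段 EEG 样本的预测聚合为一个诊断结论(例如平均概率或多数投票)。下面给出按受试者平均概率聚合的简化草图;这里的聚合方式只是常见做法,论文中的受试者级交叉熵与索引分组打乱等细节并未在此实现。

```python
# 示意性草图:把样本级预测聚合为受试者级诊断(平均概率;聚合方式为常见做法而非论文细节)
import numpy as np
from collections import defaultdict

def subject_level_predictions(sample_probs, subject_ids, threshold=0.5):
    """sample_probs: 各 EEG 片段的 AD 概率;subject_ids: 对应的受试者编号。"""
    by_subject = defaultdict(list)
    for p, sid in zip(sample_probs, subject_ids):
        by_subject[sid].append(p)
    return {sid: (float(np.mean(ps)), int(np.mean(ps) >= threshold))
            for sid, ps in by_subject.items()}

probs = np.array([0.9, 0.8, 0.4, 0.2, 0.3, 0.7])   # 假设:6 个片段的样本级概率
sids = ["S1", "S1", "S1", "S2", "S2", "S2"]
print(subject_level_predictions(probs, sids))
# 约为 {'S1': (0.70, 1), 'S2': (0.40, 0)}:给出受试者级结论,而非逐片段结论
```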

论文及项目相关链接

PDF

Summary

本文介绍了利用脑电图(EEG)进行阿尔茨海默病(AD)检测的非侵入性、高可及性和成本效益方法。针对现有方法的挑战,研究团队构建了迄今为止世界上最大的EEG-AD语料库,并提出了LEAD模型,该模型为痴呆症脑电图分析提供了大规模框架,用于主体层面的AD检测。该研究采用了一系列技术,包括数据预处理、主体正则化时空变换器训练、以及AD引导对比预训练。在多个数据集上的实验表明,LEAD模型在主体层面检测AD的性能优于其他基线模型。

Key Takeaways

  1. EEG为阿尔茨海默病检测提供了非侵入性、高可及性和成本效益的方法。
  2. 现有EEG-AD方法面临数据集规模不足和缺乏专用深度学习管道的挑战。
  3. 研究团队构建了世界上最大的EEG-AD语料库,包含2,255个主体。
  4. 提出了LEAD模型,一个用于痴呆症脑电图分析的大规模框架。
  5. LEAD模型采用了一系列技术,包括数据预处理、主体正则化时空变换器训练、以及AD引导对比预训练。
  6. LEAD模型在多个数据集上的实验表现优于其他基线模型。

Cool Papers

点此查看论文截图


文章作者: Kedreamix
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 Kedreamix !