⚠️ All of the summaries below were generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Please note: never use these summaries in serious academic settings; they are only meant as an initial screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-22
Towards Explainable Skin Cancer Classification: A Dual-Network Attention Model with Lesion Segmentation and Clinical Metadata Fusion
Authors:Md. Enamul Atiq, Shaikh Anowarul Fattah
Skin cancer is a life-threatening disease where early detection significantly improves patient outcomes. Automated diagnosis from dermoscopic images is challenging due to high intra-class variability and subtle inter-class differences. Many deep learning models operate as “black boxes,” limiting clinical trust. In this work, we propose a dual-encoder attention-based framework that leverages both segmented lesions and clinical metadata to enhance skin lesion classification in terms of both accuracy and interpretability. A novel Deep-UNet architecture with Dual Attention Gates (DAG) and Atrous Spatial Pyramid Pooling (ASPP) is first employed to segment lesions. The classification stage uses two DenseNet201 encoders: one on the original image and another on the segmented lesion, whose features are fused via multi-head cross-attention. This dual-input design guides the model to focus on salient pathological regions. In addition, a transformer-based module incorporates patient metadata (age, sex, lesion site) into the prediction. We evaluate our approach on the HAM10000 dataset and the ISIC 2018 and 2019 challenges. The proposed method achieves state-of-the-art segmentation performance and significantly improves classification accuracy and average AUC compared to baseline models. To validate our model’s reliability, we use Gradient-weighted Class Activation Mapping (Grad-CAM) to generate heatmaps. These visualizations confirm that our model’s predictions are based on the lesion area, unlike models that rely on spurious background features. These results demonstrate that integrating precise lesion segmentation and clinical data with attention-based fusion leads to a more accurate and interpretable skin cancer classification model.
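The core of the classification stage is the fusion of two DenseNet201 feature streams through multi-head cross-attention. Below is a minimal PyTorch sketch of that fusion pattern; the projection dimension, pooling, and classifier head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): fusing two DenseNet201 feature streams
# with multi-head cross-attention, as described in the abstract.
import torch
import torch.nn as nn
import torchvision.models as tvm

class DualEncoderFusion(nn.Module):
    def __init__(self, num_classes=7, dim=512, heads=8):
        super().__init__()
        # Two DenseNet201 backbones: one for the full image, one for the segmented lesion.
        self.enc_full = tvm.densenet201(weights=None).features
        self.enc_lesion = tvm.densenet201(weights=None).features
        self.proj_full = nn.Linear(1920, dim)      # DenseNet201 final feature channels
        self.proj_lesion = nn.Linear(1920, dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def _tokens(self, encoder, proj, x):
        f = encoder(x)                              # (B, 1920, H', W')
        f = f.flatten(2).transpose(1, 2)            # (B, H'*W', 1920) spatial tokens
        return proj(f)                              # (B, N, dim)

    def forward(self, image, lesion):
        q = self._tokens(self.enc_full, self.proj_full, image)
        kv = self._tokens(self.enc_lesion, self.proj_lesion, lesion)
        fused, _ = self.cross_attn(q, kv, kv)       # full-image tokens attend to lesion tokens
        return self.head(fused.mean(dim=1))         # pooled logits

logits = DualEncoderFusion()(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
```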
Paper & Project Links
PDF 15 pages, 7 Figures, 3 Tables
Summary
This paper proposes a dual-encoder attention framework that combines segmented lesions with clinical metadata to improve both the accuracy and the interpretability of skin lesion classification. A novel Deep-UNet architecture with Dual Attention Gates (DAG) and Atrous Spatial Pyramid Pooling (ASPP) performs lesion segmentation. The classification stage uses two DenseNet201 encoders, one on the original image and one on the segmented lesion, whose features are fused via multi-head cross-attention; this dual-input design guides the model toward salient pathological regions. A transformer-based module additionally incorporates patient metadata (age, sex, lesion site) into the prediction. Evaluated on the HAM10000 dataset and the ISIC 2018 and 2019 challenges, the method achieves state-of-the-art segmentation performance and significantly improves classification accuracy and average AUC over baseline models. Grad-CAM heatmaps confirm that the predictions are based on the lesion area rather than spurious background features, showing that combining precise lesion segmentation and clinical data with attention-based fusion yields a more accurate and interpretable skin cancer classifier.
Key Takeaways
- Early detection of skin cancer is critical for patient outcomes, yet automated diagnosis faces many challenges.
- A dual-encoder attention framework fuses segmented lesions with clinical metadata to improve classification accuracy and interpretability.
- A novel Deep-UNet architecture is introduced for precise lesion segmentation.
- The classification stage uses a dual-input design with multi-head cross-attention fusion, guiding the model to salient pathological regions.
- Patient metadata (age, sex, lesion site) is incorporated into the prediction, improving the model's completeness and reliability.
- Evaluations on multiple datasets demonstrate the accuracy and state-of-the-art performance of the proposed method.
View paper screenshots here


Improving Cross-Patient Generalization in Parkinson’s Disease Detection through Chunk-Based Analysis of Hand-Drawn Patterns
Authors:Mhd Adnan Albani, Riad Sonbol
Parkinson’s disease (PD) is a neurodegenerative disease affecting about 1% of people over the age of 60, causing motor impairments that impede hand coordination activities such as writing and drawing. Many approaches have tried to support early detection of Parkinson’s disease based on hand-drawn images; however, we identified two major limitations in the related works: (1) the lack of sufficient datasets, and (2) limited robustness when dealing with unseen patient data. In this paper, we propose a new approach to detect Parkinson’s disease that consists of two stages: the first stage classifies images by their drawing type (circle, meander, spiral), and the second stage extracts the required features from the images and detects Parkinson’s disease. We overcame the previous two limitations by applying a chunking strategy where we divide each image into 2x2 chunks. Each chunk is processed separately when extracting features and recognizing Parkinson’s disease indicators. To make the final classification, an ensemble method is used to merge the decisions made from each chunk. Our evaluation shows that our proposed approach outperforms the top-performing state-of-the-art approaches, in particular on unseen patients. On the NewHandPD dataset, our approach achieved 97.08% accuracy for seen patients and 94.91% for unseen patients, maintaining a gap of only 2.17 percentage points compared to the 4.76-point drop observed in prior work.
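The chunking-plus-ensemble idea from the abstract can be sketched as follows: each image is split into a 2x2 grid, each chunk is scored by a per-chunk classifier, and a simple majority vote merges the decisions. The classifier is left abstract here, and the function names are hypothetical.

```python
# Minimal sketch of 2x2 chunking with majority-vote fusion (illustrative, not the authors' code).
import numpy as np

def split_into_chunks(image: np.ndarray, grid: int = 2):
    """Split an HxW(xC) image into grid x grid equal chunks."""
    h, w = image.shape[0] // grid, image.shape[1] // grid
    return [image[i * h:(i + 1) * h, j * w:(j + 1) * w]
            for i in range(grid) for j in range(grid)]

def ensemble_predict(image, chunk_classifier):
    """chunk_classifier: any callable mapping a chunk to a class label (0 = healthy, 1 = PD)."""
    votes = [chunk_classifier(chunk) for chunk in split_into_chunks(image)]
    return int(np.round(np.mean(votes)))  # simple majority vote over the four chunk decisions

# Example with a dummy classifier:
label = ensemble_predict(np.zeros((128, 128)), lambda chunk: 0)
```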
Paper & Project Links
PDF 19 pages, 2 figures, 9 tables
Summary
This paper proposes a new two-stage approach for early Parkinson's disease detection: the first stage classifies hand-drawn images by drawing type (circle, meander, spiral), and the second extracts features from the images and detects Parkinson's disease. To overcome the shortage of datasets and the lack of robustness on unseen patients, a chunking strategy divides each image into 2x2 chunks that are processed separately, and an ensemble method merges the per-chunk decisions. Experiments show the proposed method outperforms state-of-the-art approaches, especially on unseen patients, reaching 97.08% accuracy for seen patients and 94.91% for unseen patients on the NewHandPD dataset.
Key Takeaways
- Parkinson's disease affects about 1% of people over 60 and impairs hand-coordination activities such as writing and drawing.
- Two major challenges in prior work are insufficient datasets and limited robustness on unseen patient data.
- A new two-stage detection method is proposed: classification by drawing type, followed by feature extraction and detection.
- A chunking strategy divides each image into chunks that are processed separately to address these challenges.
- An ensemble method merges the per-chunk decisions into the final classification.
- Experiments on NewHandPD show the new method outperforms other state-of-the-art approaches.
View paper screenshots here



Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model
Authors:Xinwei Zhang, Hu Chen, Zhe Yuan, Sukun Tian, Peng Feng
Foundation models for medical image segmentation have achieved remarkable performance. Adaptive fine-tuning of natural image segmentation foundation models is crucial for medical image segmentation tasks. However, some limitations exist in existing fine-tuning methods: 1) insufficient representation of high-level features and 2) the fine-tuning process disrupts the structural integrity of pretrained weights. Inspired by these critical problems, we propose an intelligent communication mixture-of-experts boosted-medical image segmentation foundation model, named IC-MoE, with twofold ideas: 1) We construct basic experts, semantic experts, and adaptive experts. Moreover, we implement a pixel probability adaptive voting strategy, which enables expert selection and fusion through label consistency and load balancing. This approach preliminarily enhances the representation capability of high-level features while preserving the structural integrity of pretrained weights. 2) We propose a semantic-guided contrastive learning method to address the issue of weak supervision in contrastive learning. This method further enhances the representation capability of high-level features while preserving the structural integrity of pretrained weights. Extensive experiments across three public medical image segmentation datasets demonstrate that the IC-MoE outperforms other SOTA models. Consequently, the proposed IC-MoE effectively supplements foundational medical image segmentation models with high-level features and pretrained structural integrity. We also validate the superior generalizability of the IC-MoE across diverse medical image segmentation scenarios.
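For context on the expert-fusion idea, here is a generic token-level mixture-of-experts layer in PyTorch. This is an assumption-level sketch of the general MoE pattern only; the paper's pixel probability adaptive voting with label consistency and load balancing is not reproduced here.

```python
# Generic soft mixture-of-experts layer (illustrative context, not the IC-MoE voting strategy).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftMoE(nn.Module):
    def __init__(self, dim=256, num_experts=3, hidden=512):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)          # per-token routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, tokens):                            # tokens: (B, N, dim)
        weights = F.softmax(self.gate(tokens), dim=-1)    # (B, N, E) routing weights
        outs = torch.stack([e(tokens) for e in self.experts], dim=-1)  # (B, N, dim, E)
        return (outs * weights.unsqueeze(2)).sum(-1)      # weighted fusion of expert outputs

y = SoftMoE()(torch.randn(2, 196, 256))
```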
Paper & Project Links
Summary
Adaptive fine-tuning of natural image segmentation foundation models is crucial for medical image segmentation, but existing fine-tuning methods suffer from insufficient representation of high-level features and from disrupting the structural integrity of pretrained weights. This paper proposes IC-MoE, an intelligent communication mixture-of-experts boosted medical image segmentation foundation model. It constructs basic, semantic, and adaptive experts and uses a pixel probability adaptive voting strategy, based on label consistency and load balancing, for expert selection and fusion, strengthening high-level feature representation while preserving the pretrained weight structure. A semantic-guided contrastive learning method further addresses weak supervision in contrastive learning. Experiments on three public medical image segmentation datasets show that IC-MoE outperforms other state-of-the-art models and generalizes well across diverse medical image segmentation scenarios.
Key Takeaways
- Foundation models for medical image segmentation have achieved remarkable performance.
- Adaptive fine-tuning of natural image segmentation foundation models is crucial for medical image segmentation.
- Existing fine-tuning methods suffer from insufficient high-level feature representation and from disrupting pretrained weight structure.
- IC-MoE constructs experts and uses a pixel probability adaptive voting strategy to improve high-level feature representation.
- A semantic-guided contrastive learning method addresses the weak-supervision issue in contrastive learning.
- IC-MoE outperforms other state-of-the-art models on multiple medical image segmentation datasets.
View paper screenshots here








ZACH-ViT: A Zero-Token Vision Transformer with ShuffleStrides Data Augmentation for Robust Lung Ultrasound Classification
Authors:Athanasios Angelakis, Amne Mousa, Micah L. A. Heldeweg, Laurens A. Biesheuvel, Mark A. Haaksma, Jasper M. Smit, Pieter R. Tuinman, Paul W. G. Elbers
Differentiating cardiogenic pulmonary oedema (CPE) from non-cardiogenic and structurally normal lungs in lung ultrasound (LUS) videos remains challenging due to the high visual variability of non-cardiogenic inflammatory patterns (NCIP/ARDS-like), interstitial lung disease, and healthy lungs. This heterogeneity complicates automated classification as overlapping B-lines and pleural artefacts are common. We introduce ZACH-ViT (Zero-token Adaptive Compact Hierarchical Vision Transformer), a 0.25 M-parameter Vision Transformer variant that removes both positional embeddings and the [CLS] token, making it fully permutation-invariant and suitable for unordered medical image data. To enhance generalization, we propose ShuffleStrides Data Augmentation (SSDA), which permutes probe-view sequences and frame orders while preserving anatomical validity. ZACH-ViT was evaluated on 380 LUS videos from 95 critically ill patients against nine state-of-the-art baselines. Despite the heterogeneity of the non-cardiogenic group, ZACH-ViT achieved the highest validation and test ROC-AUC (0.80 and 0.79) with balanced sensitivity (0.60) and specificity (0.91), while all competing models collapsed to trivial classification. It trains 1.35x faster than Minimal ViT (0.62M parameters) with 2.5x fewer parameters, supporting real-time clinical deployment. These results show that aligning architectural design with data structure can outperform scale in small-data medical imaging.
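The abstract's key architectural idea, dropping both positional embeddings and the [CLS] token so the encoder is permutation-invariant over tokens, can be sketched with standard PyTorch pieces. This is a minimal illustration, not the released ZACH-ViT code; dimensions and depth are assumptions.

```python
# Minimal sketch: a transformer encoder with no positional embeddings and no [CLS] token.
# Mean pooling over tokens makes the classifier permutation-invariant to token order.
import torch
import torch.nn as nn

class ZeroTokenEncoder(nn.Module):
    def __init__(self, in_dim=768, dim=128, depth=4, heads=4, num_classes=2):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)               # patch/frame features -> model dim
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, tokens):                             # tokens: (B, N, in_dim), order-free
        x = self.encoder(self.embed(tokens))               # no positional encoding is added
        return self.head(x.mean(dim=1))                    # permutation-invariant pooling

logits = ZeroTokenEncoder()(torch.randn(2, 32, 768))
```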
Paper & Project Links
PDF 14 pages, 6 figures, 2 tables. Primary subject: cs.LG (Machine Learning) Cross-listed to: cs.CV (Computer Vision and Pattern Recognition), eess.IV (Image and Video Processing). Code available at: https://github.com/Bluesman79/ZACH-ViT Installation: pip install zachvit Paper licensed under CC BY-NC-ND 4.0. Code released under Apache 2.0 License
Summary
Distinguishing cardiogenic pulmonary oedema (CPE) from non-cardiogenic and structurally normal lungs in lung ultrasound videos remains challenging. The paper introduces ZACH-ViT, a 0.25M-parameter Vision Transformer variant that removes positional embeddings and the [CLS] token so it can handle unordered medical image data, together with ShuffleStrides Data Augmentation (SSDA) to improve generalization. Evaluated on 380 lung ultrasound videos from 95 critically ill patients, ZACH-ViT achieved the best validation and test ROC-AUC despite the heterogeneity of the non-cardiogenic group, while training faster with fewer parameters, making it suitable for real-time clinical deployment.
Key Takeaways
- Differentiating CPE from non-cardiogenic and normal lungs in lung ultrasound videos is challenging.
- ZACH-ViT is a new Vision Transformer variant designed to handle unordered medical image data.
- The proposed ShuffleStrides data augmentation improves model generalization.
- ZACH-ViT achieved the best ROC-AUC despite the heterogeneity of the non-cardiogenic group.
- ZACH-ViT trains quickly with few parameters, supporting real-time clinical deployment.
View paper screenshots here




CEPerFed: Communication-Efficient Personalized Federated Learning for Multi-Pulse MRI Classification
Authors:Ludi Li, Junbin Mao, Hanhe Lin, Xu Tian, Fang-Xiang Wu, Jin Liu
Multi-pulse magnetic resonance imaging (MRI) is widely utilized for clinical practice such as Alzheimer’s disease diagnosis. To train a robust model for multi-pulse MRI classification, it requires large and diverse data from various medical institutions while protecting privacy by preventing raw data sharing across institutions. Although federated learning (FL) is a feasible solution to address this issue, it poses challenges of model convergence due to the effect of data heterogeneity and substantial communication overhead due to large numbers of parameters transmitted within the model. To address these challenges, we propose CEPerFed, a communication-efficient personalized FL method. It mitigates the effect of data heterogeneity by incorporating client-side historical risk gradients and historical mean gradients to coordinate local and global optimization. The former is used to weight the contributions from other clients, enhancing the reliability of local updates, while the latter enforces consistency between local updates and the global optimization direction to ensure stable convergence across heterogeneous data distributions. To address the high communication overhead, we propose a hierarchical SVD (HSVD) strategy that transmits only the most critical information required for model updates. Experiments on five classification tasks demonstrate the effectiveness of the CEPerFed method. The code will be released upon acceptance at https://github.com/LD0416/CEPerFed.
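To make the communication-saving idea concrete, here is a small sketch of transmitting a low-rank (truncated SVD) approximation of a weight update instead of the full matrix. The rank choice and reconstruction are illustrative assumptions, not the paper's hierarchical SVD (HSVD) procedure.

```python
# Sketch: compress a weight update with a truncated SVD before transmission (illustrative only).
import torch

def compress_update(delta_w: torch.Tensor, rank: int):
    """Return the top-`rank` SVD factors of a 2-D update matrix."""
    u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
    return u[:, :rank], s[:rank], vh[:rank, :]             # what the client would send

def decompress_update(u, s, vh):
    """Server-side reconstruction of the low-rank update."""
    return u @ torch.diag(s) @ vh

delta = torch.randn(512, 256)
u, s, vh = compress_update(delta, rank=16)
approx = decompress_update(u, s, vh)
sent = u.numel() + s.numel() + vh.numel()                   # ~9.4% of the 512*256 full update
```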
Paper & Project Links
Summary
Multi-pulse MRI is widely used in clinical practice, for example in Alzheimer's disease diagnosis, and training robust classification models requires large, diverse data from many institutions while protecting privacy. Federated learning is a feasible solution, but data heterogeneity and the large number of transmitted parameters challenge model convergence and communication efficiency. This paper proposes CEPerFed, a communication-efficient personalized federated learning method that incorporates client-side historical risk gradients and historical mean gradients to coordinate local and global optimization, mitigating the effect of data heterogeneity. A hierarchical SVD (HSVD) strategy transmits only the most critical information required for model updates, reducing communication overhead. Experiments show that CEPerFed performs well on five classification tasks.
Key Takeaways
- Multi-pulse MRI is widely used in clinical practice, particularly for diagnosis of diseases such as Alzheimer's.
- Training multi-pulse MRI classification models requires cross-institutional data while protecting privacy.
- Federated learning is a feasible solution but faces data heterogeneity and communication-efficiency challenges.
- CEPerFed mitigates data heterogeneity by incorporating historical risk gradients and historical mean gradients.
- CEPerFed uses a hierarchical SVD strategy to reduce communication overhead, transmitting only the most critical update information.
- Experiments demonstrate the effectiveness of CEPerFed on five classification tasks.
View paper screenshots here





MambaX-Net: Dual-Input Mamba-Enhanced Cross-Attention Network for Longitudinal MRI Segmentation
Authors:Yovin Yahathugoda, Davide Prezzi, Piyalitt Ittichaiwong, Vicky Goh, Sebastien Ourselin, Michela Antonelli
Active Surveillance (AS) is a treatment option for managing low- and intermediate-risk prostate cancer (PCa), aiming to avoid overtreatment while monitoring disease progression through serial MRI and clinical follow-up. Accurate prostate segmentation is an important preliminary step for automating this process, enabling automated detection and diagnosis of PCa. However, existing deep-learning segmentation models are often trained on single-time-point and expertly annotated datasets, making them unsuitable for longitudinal AS analysis, where multiple time points and a scarcity of expert labels hinder their effective fine-tuning. To address these challenges, we propose MambaX-Net, a novel semi-supervised, dual-scan 3D segmentation architecture that computes the segmentation for time point t by leveraging the MRI and the corresponding segmentation mask from the previous time point. We introduce two new components: (i) a Mamba-enhanced Cross-Attention Module, which integrates the Mamba block into cross attention to efficiently capture temporal evolution and long-range spatial dependencies, and (ii) a Shape Extractor Module that encodes the previous segmentation mask into a latent anatomical representation for refined zone delineation. Moreover, we introduce a semi-supervised self-training strategy that leverages pseudo-labels generated from a pre-trained nnU-Net, enabling effective learning without expert annotations. MambaX-Net was evaluated on a longitudinal AS dataset, and results showed that it significantly outperforms state-of-the-art U-Net and Transformer-based models, achieving superior prostate zone segmentation even when trained on limited and noisy data.
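The semi-supervised self-training step, generating pseudo-labels with a frozen pre-trained teacher (the paper uses an nnU-Net) and training the student on them, can be sketched as below. The loss choice and confidence threshold are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of pseudo-label self-training for segmentation (illustrative; not the authors' pipeline).
import torch
import torch.nn.functional as F

def self_training_step(student, teacher, optimizer, unlabeled_batch, conf_thresh=0.9):
    """One update of the student on teacher pseudo-labels for an unlabeled 3D batch."""
    with torch.no_grad():
        probs = torch.softmax(teacher(unlabeled_batch), dim=1)   # frozen teacher predictions
        conf, pseudo = probs.max(dim=1)                           # per-voxel confidence and label
        mask = conf > conf_thresh                                 # keep only confident voxels
    logits = student(unlabeled_batch)
    loss = F.cross_entropy(logits, pseudo, reduction="none")      # per-voxel losses
    loss = (loss * mask).sum() / mask.sum().clamp(min=1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```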
Paper & Project Links
Summary
Accurate prostate segmentation is an important preliminary step for automating active surveillance (AS) of low- and intermediate-risk prostate cancer, but existing deep-learning models trained on single-time-point, expertly annotated data are ill-suited to longitudinal AS analysis. The paper proposes MambaX-Net, a semi-supervised, dual-scan 3D segmentation architecture that computes the segmentation for time point t from the MRI and the segmentation mask of the previous time point. It introduces a Mamba-enhanced Cross-Attention Module to capture temporal evolution and long-range spatial dependencies, and a Shape Extractor Module that encodes the previous mask into a latent anatomical representation for refined zone delineation. A semi-supervised self-training strategy uses pseudo-labels from a pre-trained nnU-Net, so no expert annotations are needed. On a longitudinal AS dataset, MambaX-Net significantly outperforms state-of-the-art U-Net and Transformer-based models, even when trained on limited and noisy data.
Key Takeaways
- Active surveillance (AS) is a management option for low- and intermediate-risk prostate cancer, and accurate prostate segmentation is an important step toward automating it.
- Existing deep-learning models struggle with longitudinal AS analysis, where multiple time points and scarce expert labels are the norm.
- MambaX-Net is a semi-supervised, dual-scan 3D segmentation architecture designed for this setting, with two new components: a Mamba-enhanced cross-attention module and a shape extractor module.
- MambaX-Net captures temporal evolution and long-range spatial dependencies and uses the previous segmentation mask for refined zone delineation.
- A semi-supervised self-training strategy with pseudo-labels enables effective learning without expert annotations.
- On a longitudinal AS dataset, MambaX-Net outperforms state-of-the-art models such as U-Net and Transformer-based models.
View paper screenshots here



Segmenting infant brains across magnetic fields: Domain randomization and annotation curation in ultra-low field MRI
Authors:Vladyslav Zalevskyi, Dondu-Busra Bulut, Thomas Sanchez, Meritxell Bach Cuadra
Early identification of neurodevelopmental disorders relies on accurate segmentation of brain structures in infancy, a task complicated by rapid brain growth, poor tissue contrast, and motion artifacts in pediatric MRI. These challenges are further exacerbated in ultra-low-field (ULF, 0.064 T) MRI, which, despite its lower image quality, offers an affordable, portable, and sedation-free alternative for use in low-resource settings. In this work, we propose a domain randomization (DR) framework to bridge the domain gap between high-field (HF) and ULF MRI in the context of the hippocampi and basal ganglia segmentation in the LISA challenge. We show that pre-training on whole-brain HF segmentations using DR significantly improves generalization to ULF data, and that careful curation of training labels, by removing misregistered HF-to-ULF annotations from training, further boosts performance. By fusing the predictions of several models through majority voting, we are able to achieve competitive performance. Our results demonstrate that combining robust augmentation with annotation quality control can enable accurate segmentation in ULF data. Our code is available at https://github.com/Medical-Image-Analysis-Laboratory/lisasegm
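Domain randomization here amounts to aggressively randomizing image appearance during HF pre-training so the model tolerates ULF contrast and noise. A minimal sketch of such appearance randomization follows; the specific transforms and ranges are assumptions, not the authors' exact recipe.

```python
# Sketch: simple appearance randomization for MRI volumes (illustrative domain randomization).
import torch

def randomize_appearance(volume: torch.Tensor) -> torch.Tensor:
    """volume: (1, D, H, W) intensity-normalized MRI. Returns a randomly degraded copy."""
    v = volume.clone()
    gamma = torch.empty(1).uniform_(0.7, 1.5)               # random gamma (contrast) change
    v = v.clamp(min=0) ** gamma
    bias = torch.empty(1).uniform_(-0.1, 0.1)                # global intensity shift
    v = v + bias
    noise_std = torch.empty(1).uniform_(0.0, 0.1)            # random Gaussian noise level
    v = v + noise_std * torch.randn_like(v)
    if torch.rand(1) < 0.5:                                   # random resolution degradation
        d, h, w = v.shape[1:]
        low = torch.nn.functional.interpolate(v[None], scale_factor=0.5, mode="trilinear",
                                              align_corners=False)
        v = torch.nn.functional.interpolate(low, size=(d, h, w), mode="trilinear",
                                            align_corners=False)[0]
    return v
```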
Paper & Project Links
PDF 1st place (hippocampus) and 3rd place (basal ganglia) in the Low field pediatric brain magnetic resonance Image Segmentation and quality Assurance Challenge (LISA) 2025
Summary
Accurate segmentation of infant brain structures is key to early identification of neurodevelopmental disorders and is especially hard in ultra-low-field (ULF) MRI, which trades image quality for an affordable, portable, sedation-free option in low-resource settings. This work proposes a domain randomization (DR) framework to bridge the gap between high-field (HF) and ULF MRI for hippocampus and basal ganglia segmentation in the LISA challenge. Pre-training on whole-brain HF segmentations with DR significantly improves generalization to ULF data, and curating the training labels by removing misregistered HF-to-ULF annotations further boosts performance. Fusing several models' predictions by majority voting yields competitive results, showing that robust augmentation combined with annotation quality control enables accurate segmentation on ULF data.
Key Takeaways
- Early diagnosis of neurodevelopmental disorders relies on accurate segmentation of infant brain structures.
- Ultra-low-field (ULF) MRI has lower image quality but is an affordable, portable, sedation-free option valuable in low-resource settings.
- A domain randomization (DR) framework is used to bridge the gap between high-field (HF) and ULF MRI for brain structure segmentation.
- Pre-training on whole-brain HF segmentations with DR significantly improves generalization to ULF data.
- Careful curation of training labels and removal of misregistered annotations are crucial for boosting performance.
- Fusing the predictions of several models by majority voting enables accurate segmentation on ULF data.
View paper screenshots here



GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image
Authors:Yinghui Wang, Xinyu Zhang, Peng Du
Generating editable, parametric CAD models from a single image holds great potential to lower the barriers of industrial concept design. However, current multi-modal large language models (MLLMs) still struggle with accurately inferring 3D geometry from 2D images due to limited spatial reasoning capabilities. We address this limitation by introducing GACO-CAD, a novel two-stage post-training framework. It is designed to achieve a joint objective: simultaneously improving the geometric accuracy of the generated CAD models and encouraging the use of more concise modeling procedures. First, during supervised fine-tuning, we leverage depth and surface normal maps as dense geometric priors, combining them with the RGB image to form a multi-channel input. In the context of single-view reconstruction, these priors provide complementary spatial cues that help the MLLM more reliably recover 3D geometry from 2D observations. Second, during reinforcement learning, we introduce a group length reward that, while preserving high geometric fidelity, promotes the generation of more compact and less redundant parametric modeling sequences. A simple dynamic weighting strategy is adopted to stabilize training. Experiments on the DeepCAD and Fusion360 datasets show that GACO-CAD achieves state-of-the-art performance under the same MLLM backbone, consistently outperforming existing methods in terms of code validity, geometric accuracy, and modeling conciseness.
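The reinforcement-learning objective combines a geometric-fidelity term with a conciseness (sequence-length) term under a dynamic weight. Below is a toy sketch of such a combined reward; the exact reward definition and weighting schedule in the paper may differ, and all names here are illustrative.

```python
# Toy sketch of a fidelity + conciseness reward with a dynamic weight (illustrative only).
def cad_reward(geometric_fidelity: float, seq_len: int, group_mean_len: float,
               step: int, total_steps: int) -> float:
    """geometric_fidelity in [0, 1]; sequences shorter than the group average get a bonus."""
    length_bonus = (group_mean_len - seq_len) / max(group_mean_len, 1.0)  # > 0 if more concise
    w = min(1.0, step / max(total_steps, 1)) * 0.3   # conciseness weight ramps up during training
    return geometric_fidelity + w * length_bonus

# Example: the same verbose-but-accurate program scores lower late in training.
early = cad_reward(0.9, seq_len=120, group_mean_len=100.0, step=100, total_steps=10000)
late = cad_reward(0.9, seq_len=120, group_mean_len=100.0, step=9000, total_steps=10000)
```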
Paper & Project Links
Summary
GACO-CAD is a two-stage post-training framework for generating editable, parametric CAD models from a single image, jointly improving geometric accuracy and encouraging concise modeling procedures. During supervised fine-tuning, depth and surface normal maps are used as dense geometric priors and combined with the RGB image into a multi-channel input, helping the MLLM recover 3D geometry from 2D observations. During reinforcement learning, a group length reward promotes compact, less redundant parametric modeling sequences while preserving geometric fidelity, with a simple dynamic weighting strategy to stabilize training. On the DeepCAD and Fusion360 datasets, GACO-CAD achieves state-of-the-art performance under the same MLLM backbone, outperforming existing methods in code validity, geometric accuracy, and modeling conciseness.
Key Takeaways
- GACO-CAD aims to improve the geometric accuracy and modeling conciseness of parametric CAD models generated from a single image.
- It introduces a two-stage post-training approach: supervised fine-tuning with depth and surface normal maps as geometric priors, followed by reinforcement learning.
- In the supervised fine-tuning stage, the multi-channel input improves the model's ability to infer 3D geometry from 2D images.
- The group length reward introduced in the reinforcement learning stage promotes concise parametric modeling sequences while maintaining high geometric fidelity.
- A simple dynamic weighting strategy is used to stabilize training.
- Experiments on the DeepCAD and Fusion360 datasets show GACO-CAD achieves state-of-the-art performance under the same MLLM backbone.
View paper screenshots here






PorousGen: An Efficient Algorithm for Generating Porous Structures with Accurate Porosity and Uniform Density Distribution
Authors:Shota Arai, Takashi Yoshidome
This work presents a novel algorithm for generating porous structures as an alternative to the PoreSpy program suite. Unlike PoreSpy, which often produces structures whose porosity deviates from the target value, our proposed algorithm generates structures whose porosity closely matches the specified input, within a defined error margin. Furthermore, parallel computation enables efficient generation of large-scale structures, while memory usage is reduced compared to PoreSpy. To evaluate performance, structures were generated using both PoreSpy and the proposed method with parameters corresponding to X-ray ptychography experiments. The porosity mismatch in PoreSpy led to a relative error exceeding 20% in the computed gas diffusion coefficients, whereas our method reproduced the experimental values within 5%. These results demonstrate that the proposed method provides an efficient, high-precision approach for generating porous structures and supports reliable prediction of material properties. The program called PorousGen is publicly available under the MIT License from https://github.com/YoshidomeGroup-Hydration/PorousGen.
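One simple way to hit a target porosity closely, smoothing a random field and thresholding it at the quantile corresponding to the desired void fraction, is sketched below. This is a generic illustration of the porosity-matching idea, not the PorousGen algorithm.

```python
# Sketch: generate a binary porous structure whose porosity matches a target value
# by thresholding a smoothed random field at the corresponding quantile (illustrative only).
import numpy as np
from scipy.ndimage import gaussian_filter

def porous_structure(shape=(128, 128, 128), porosity=0.35, blob_size=3.0, seed=0):
    rng = np.random.default_rng(seed)
    field = gaussian_filter(rng.standard_normal(shape), sigma=blob_size)  # correlated noise
    threshold = np.quantile(field, porosity)          # fraction `porosity` of voxels lies below it
    return field < threshold                           # True = pore (void), False = solid

structure = porous_structure(porosity=0.35)
print(structure.mean())                                # measured porosity, ~0.35 by construction
```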
Paper & Project Links
PDF 15 pages, 5 figures
Summary
This work presents a new algorithm for generating porous structures as an alternative to the PoreSpy program suite. Unlike PoreSpy, whose generated porosity can deviate from the target value, the proposed algorithm produces structures whose porosity closely matches the specified input within a defined error margin. Parallel computation enables efficient generation of large-scale structures with lower memory use than PoreSpy. In comparison experiments with parameters corresponding to X-ray ptychography measurements, the porosity mismatch in PoreSpy led to relative errors exceeding 20% in the computed gas diffusion coefficients, whereas the new method reproduced the experimental values within 5%, supporting reliable prediction of material properties. The program, PorousGen, is publicly available under the MIT License at https://github.com/YoshidomeGroup-Hydration/PorousGen.
Key Takeaways
- The new algorithm fixes the deviation between generated and target porosity seen in PoreSpy.
- It generates large-scale structures efficiently using parallel computation and with reduced memory use.
- The generated structures match the specified porosity within a defined error margin.
- Comparison experiments show high accuracy in the computed gas diffusion coefficients.
- The method supports reliable prediction of material properties.
- The program, PorousGen, is publicly available on GitHub.
View paper screenshots here



Click, Predict, Trust: Clinician-in-the-Loop AI Segmentation for Lung Cancer CT-Based Prognosis within the Knowledge-to-Action Framework
Authors:Mohammad R. Salmanpour, Sonya Falahati, Amir Hossein Pouria, Amin Mousavi, Somayeh Sadat Mehrnia, Morteza Alizadeh, Arman Gorji, Zeinab Farsangi, Alireza Safarian, Mehdi Maghsudi, Carlos Uribe, Arman Rahmim, Ren Yuan
Lung cancer remains the leading cause of cancer mortality, with CT imaging central to screening, prognosis, and treatment. Manual segmentation is variable and time-intensive, while deep learning (DL) offers automation but faces barriers to clinical adoption. Guided by the Knowledge-to-Action framework, this study develops a clinician-in-the-loop DL pipeline to enhance reproducibility, prognostic accuracy, and clinical trust. Multi-center CT data from 999 patients across 12 public datasets were analyzed using five DL models (3D Attention U-Net, ResUNet, VNet, ReconNet, SAM-Med3D), benchmarked against expert contours on whole and click-point cropped images. Segmentation reproducibility was assessed using 497 PySERA-extracted radiomic features via Spearman correlation, ICC, Wilcoxon tests, and MANOVA, while prognostic modeling compared supervised (SL) and semi-supervised learning (SSL) across 38 dimensionality reduction strategies and 24 classifiers. Six physicians qualitatively evaluated masks across seven domains, including clinical meaningfulness, boundary quality, prognostic value, trust, and workflow integration. VNet achieved the best performance (Dice = 0.83, IoU = 0.71), radiomic stability (mean correlation = 0.76, ICC = 0.65), and predictive accuracy under SSL (accuracy = 0.88, F1 = 0.83). SSL consistently outperformed SL across models. Radiologists favored VNet for peritumoral representation and smoother boundaries, preferring AI-generated initial masks for refinement rather than replacement. These results demonstrate that integrating VNet with SSL yields accurate, reproducible, and clinically trusted CT-based lung cancer prognosis, highlighting a feasible path toward physician-centered AI translation.
Paper & Project Links
PDF 13 pages, 2 figures, and 2 tables
Summary
This study develops a clinician-in-the-loop deep-learning pipeline for CT-based lung cancer segmentation and prognosis, guided by the Knowledge-to-Action framework. Multi-center CT data from 999 patients were used to benchmark five deep-learning models against expert contours, with segmentation reproducibility assessed through radiomic features and prognostic modeling comparing supervised and semi-supervised learning; physicians also evaluated the masks qualitatively. VNet achieved the best segmentation performance, radiomic stability, and predictive accuracy under semi-supervised learning, and radiologists preferred its masks as initial contours to refine rather than replace, showing that combining VNet with semi-supervised learning yields accurate, reproducible, and clinically trusted CT-based lung cancer prognosis.
Key Takeaways
- Lung cancer remains the leading cause of cancer mortality, and CT imaging is central to screening, prognosis, and treatment.
- Deep learning automates lung CT segmentation, but barriers to clinical adoption remain.
- Guided by the Knowledge-to-Action framework, the study builds a clinician-in-the-loop deep-learning pipeline to improve reproducibility, prognostic accuracy, and clinical trust.
- Benchmarking five deep-learning models on multi-center CT data shows VNet performs best for segmentation.
- Reproducibility analysis shows VNet's radiomic features are stable and its predictions accurate.
- Semi-supervised learning outperforms supervised learning for prognostic modeling.
View paper screenshots here


ReclAIm: A multi-agent framework for degradation-aware performance tuning of medical imaging AI
Authors:Eleftherios Tzanis, Michail E. Klontzas
Ensuring the long-term reliability of AI models in clinical practice requires continuous performance monitoring and corrective actions when degradation occurs. Addressing this need, this manuscript presents ReclAIm, a multi-agent framework capable of autonomously monitoring, evaluating, and fine-tuning medical image classification models. The system, built on a large language model core, operates entirely through natural language interaction, eliminating the need for programming expertise. ReclAIm successfully trains, evaluates, and maintains consistent performance of models across MRI, CT, and X-ray datasets. Once ReclAIm detects significant performance degradation, it autonomously executes state-of-the-art fine-tuning procedures that substantially reduce the performance gap. In cases with performance drops of up to -41.1% (MRI InceptionV3), ReclAIm managed to readjust performance metrics within 1.5% of the initial model results. ReclAIm enables automated, continuous maintenance of medical imaging AI models in a user-friendly and adaptable manner that facilitates broader adoption in both research and clinical environments.
Paper & Project Links
PDF 25 pages, 4 figures
Summary
ReclAIm is a multi-agent framework that autonomously monitors, evaluates, and fine-tunes medical image classification models, helping ensure their long-term reliability in clinical practice. Built on a large language model core, it operates entirely through natural language interaction, so no programming expertise is needed. It successfully trains, evaluates, and maintains consistent model performance across MRI, CT, and X-ray datasets, and when it detects significant performance degradation it autonomously runs state-of-the-art fine-tuning procedures that substantially close the performance gap.
Key Takeaways
- ReclAIm is a multi-agent framework for autonomously monitoring, evaluating, and fine-tuning the performance of medical image classification models.
- The framework improves the long-term reliability of AI models in clinical practice.
- ReclAIm operates through natural language interaction, removing the need for programming expertise.
- It trains, evaluates, and maintains model performance across medical imaging datasets such as MRI, CT, and X-ray.
- When significant performance degradation is detected, ReclAIm autonomously executes fine-tuning procedures.
- In degradation scenarios, ReclAIm is able to close the performance gap.
- ReclAIm enables automated, continuous maintenance of medical imaging AI models in a user-friendly, adaptable manner, supporting broader adoption in research and clinical environments.
View paper screenshots here


Class-N-Diff: Classification-Induced Diffusion Model Can Make Fair Skin Cancer Diagnosis
Authors:Nusrat Munia, Abdullah Imran
Generative models, especially Diffusion Models, have demonstrated remarkable capability in generating high-quality synthetic data, including medical images. However, traditional class-conditioned generative models often struggle to generate images that accurately represent specific medical categories, limiting their usefulness for applications such as skin cancer diagnosis. To address this problem, we propose a classification-induced diffusion model, namely, Class-N-Diff, to simultaneously generate and classify dermoscopic images. Our Class-N-Diff model integrates a classifier within a diffusion model to guide image generation based on its class conditions. Thus, the model has better control over class-conditioned image synthesis, resulting in more realistic and diverse images. Additionally, the classifier demonstrates improved performance, highlighting its effectiveness for downstream diagnostic tasks. This unique integration in our Class-N-Diff makes it a robust tool for enhancing the quality and utility of diffusion model-based synthetic dermoscopic image generation. Our code is available at https://github.com/Munia03/Class-N-Diff.
Paper & Project Links
PDF EMBC 2025
Summary
This paper proposes Class-N-Diff, a classification-induced diffusion model that integrates a classifier into the diffusion model so that class conditions guide dermoscopic image generation. The model gains better control over class-conditioned synthesis, producing more realistic and diverse images, while the classifier's improved performance makes it effective for downstream diagnostic tasks, making Class-N-Diff a robust tool for improving the quality and utility of diffusion-based synthetic dermoscopic image generation.
Key Takeaways
- Diffusion models can generate high-quality synthetic data, including medical images.
- Traditional class-conditioned generative models struggle to generate images that accurately represent specific medical categories.
- Class-N-Diff addresses this by integrating a classifier within the diffusion model to guide image generation by class condition.
- Class-N-Diff offers better control over class-conditioned synthesis, generating more realistic and diverse images.
- The classifier in Class-N-Diff shows improved performance, indicating effectiveness for downstream diagnostic tasks.
- This unique integration makes Class-N-Diff a strong tool for improving the quality and utility of diffusion-based synthetic dermoscopic images.
View paper screenshots here





BARL: Bilateral Alignment in Representation and Label Spaces for Semi-Supervised Volumetric Medical Image Segmentation
Authors:Shujian Gao, Yuan Wang, Zekuan Yu
Semi-supervised medical image segmentation (SSMIS) seeks to match fully supervised performance while sharply reducing annotation cost. Mainstream SSMIS methods rely on label-space consistency, yet they overlook the equally critical representation-space alignment. Without harmonizing latent features, models struggle to learn representations that are both discriminative and spatially coherent. To this end, we introduce Bilateral Alignment in Representation and Label spaces (BARL), a unified framework that couples two collaborative branches and enforces alignment in both spaces. For label-space alignment, inspired by co-training and multi-scale decoding, we devise Dual-Path Regularization (DPR) and Progressively Cognitive Bias Correction (PCBC) to impose fine-grained cross-branch consistency while mitigating error accumulation from coarse to fine scales. For representation-space alignment, we conduct region-level and lesion-instance matching between branches, explicitly capturing the fragmented, complex pathological patterns common in medical imagery. Extensive experiments on four public benchmarks and a proprietary CBCT dataset demonstrate that BARL consistently surpasses state-of-the-art SSMIS methods. Ablative studies further validate the contribution of each component. Code will be released soon.
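The cross-branch idea, two branches that supervise each other in label space while their latent features are pulled together in representation space, can be written as a small pair of losses. This is a generic sketch under simple assumptions, not BARL's DPR/PCBC or its region- and instance-level matching modules.

```python
# Sketch of dual-branch consistency: label-space agreement (symmetric KL) plus
# representation-space alignment (cosine distance). Illustrative only.
import torch
import torch.nn.functional as F

def label_space_consistency(logits_a, logits_b):
    """Symmetric KL divergence between the two branches' per-pixel predictions."""
    pa, pb = F.log_softmax(logits_a, dim=1), F.log_softmax(logits_b, dim=1)
    kl_ab = F.kl_div(pa, pb.exp(), reduction="batchmean")
    kl_ba = F.kl_div(pb, pa.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

def representation_alignment(feat_a, feat_b):
    """1 - cosine similarity between pooled latent features of the two branches."""
    za = F.normalize(feat_a.flatten(1), dim=1)
    zb = F.normalize(feat_b.flatten(1), dim=1)
    return (1.0 - (za * zb).sum(dim=1)).mean()

loss = label_space_consistency(torch.randn(2, 4, 64, 64), torch.randn(2, 4, 64, 64)) \
       + 0.1 * representation_alignment(torch.randn(2, 256, 8, 8), torch.randn(2, 256, 8, 8))
```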
Paper & Project Links
PDF 14 pages, 5 figures
Summary
Semi-supervised medical image segmentation (SSMIS) aims to match fully supervised performance while greatly reducing annotation cost. Current mainstream methods rely on label-space consistency but neglect representation-space alignment, which is equally important. This paper proposes BARL (Bilateral Alignment in Representation and Label spaces), a framework with two collaborative branches that enforces alignment in both spaces. For label-space alignment, inspired by co-training and multi-scale decoding, it designs Dual-Path Regularization and Progressively Cognitive Bias Correction to impose fine-grained cross-branch consistency and reduce error accumulation from coarse to fine scales. For representation-space alignment, it performs region-level and lesion-instance matching between branches, explicitly capturing the fragmented, complex pathological patterns common in medical images. Extensive experiments on multiple public benchmarks and a proprietary CBCT dataset show BARL outperforms existing SSMIS methods.
Key Takeaways
- SSMIS aims to reach fully supervised performance while reducing annotation cost.
- Current methods focus on label-space consistency, but representation-space alignment is just as important.
- The BARL framework enforces bilateral alignment in both representation and label spaces.
- For label-space alignment, Dual-Path Regularization and Progressively Cognitive Bias Correction are designed.
- For representation-space alignment, region-level and lesion-instance matching is performed between branches.
- BARL outperforms existing SSMIS methods in extensive experiments.
View paper screenshots here




EMRRG: Efficient Fine-Tuning Pre-trained X-ray Mamba Networks for Radiology Report Generation
Authors:Mingzheng Zhang, Jinfeng Gao, Dan Xu, Jiangrui Yu, Yuhan Qiao, Lan Chen, Jin Tang, Xiao Wang
X-ray image-based medical report generation (MRG) is a pivotal area in artificial intelligence that can significantly reduce diagnostic burdens for clinicians and patient wait times. Existing MRG models predominantly rely on Large Language Models (LLMs) to improve report generation, with limited exploration of pre-trained vision foundation models or advanced fine-tuning techniques. Mainstream frameworks either avoid fine-tuning or utilize simplistic methods like LoRA, often neglecting the potential of enhancing cross-attention mechanisms. Additionally, while Transformer-based models dominate vision-language tasks, non-Transformer architectures, such as the Mamba network, remain underexplored for medical report generation, presenting a promising avenue for future research. In this paper, we propose EMRRG, a novel X-ray report generation framework that fine-tunes pre-trained Mamba networks using parameter-efficient methods. Specifically, X-ray images are divided into patches, tokenized, and processed by an SSM-based vision backbone for feature extraction, with Partial LoRA yielding optimal performance. An LLM with a hybrid decoder generates the medical report, enabling end-to-end training and achieving strong results on benchmark datasets. Extensive experiments on three widely used benchmark datasets fully validated the effectiveness of our proposed strategies for the X-ray MRG. The source code of this paper will be released on https://github.com/Event-AHU/Medical_Image_Analysis.
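The parameter-efficient fine-tuning in the abstract builds on LoRA, which freezes a pretrained weight matrix and learns a low-rank update beside it; "Partial LoRA" applies this to only a subset of layers. A minimal LoRA linear layer looks like the sketch below (a generic illustration, not the released EMRRG code).

```python
# Minimal LoRA linear layer: frozen pretrained weight plus a trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # pretrained weights stay frozen
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # start as an identity-preserving update
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 16, 768))
```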
Paper & Project Links
Summary
X-ray image-based medical report generation (MRG) can significantly reduce clinicians' diagnostic burden and patients' waiting time. Existing MRG models mainly rely on large language models, with little exploration of pre-trained vision foundation models or advanced fine-tuning techniques. This paper proposes EMRRG, an X-ray report generation framework that fine-tunes pre-trained Mamba networks with parameter-efficient methods. X-ray images are divided into patches, tokenized, and processed by an SSM-based vision backbone for feature extraction, with Partial LoRA yielding the best performance, and an LLM with a hybrid decoder generates the medical report, enabling end-to-end training and strong results on benchmark datasets.
Key Takeaways
- X-ray medical report generation is an important AI application that can reduce clinicians' diagnostic burden and shorten patient waiting time.
- Current MRG models rely mainly on large language models; pre-trained vision foundation models and advanced fine-tuning techniques remain underexplored.
- EMRRG is a new X-ray report generation framework that combines parameter-efficient fine-tuning with a Mamba network for medical report generation.
- EMRRG processes X-ray images via patching, tokenization, and feature extraction, with Partial LoRA achieving the best performance.
- An LLM with a hybrid decoder generates the report, enabling end-to-end training with strong results.
- Extensive experiments on three widely used benchmark datasets validate the effectiveness of the EMRRG framework.
View paper screenshots here



Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
Authors:Zheng Huang, Enpei Zhang, Yinghao Cai, Weikang Qiu, Carl Yang, Elynn Chen, Xiang Zhang, Rex Ying, Dawei Zhou, Yujun Yan
Understanding how the brain encodes visual information is a central challenge in neuroscience and machine learning. A promising approach is to reconstruct visual stimuli, essentially images, from functional Magnetic Resonance Imaging (fMRI) signals. This involves two stages: transforming fMRI signals into a latent space and then using a pretrained generative model to reconstruct images. The reconstruction quality depends on how similar the latent space is to the structure of neural activity and how well the generative model produces images from that space. Yet, it remains unclear which type of latent space best supports this transformation and how it should be organized to represent visual stimuli effectively. We present two key findings. First, fMRI signals are more similar to the text space of a language model than to either a vision based space or a joint text image space. Second, text representations and the generative model should be adapted to capture the compositional nature of visual stimuli, including objects, their detailed attributes, and relationships. Building on these insights, we propose PRISM, a model that Projects fMRI sIgnals into a Structured text space as an interMediate representation for visual stimuli reconstruction. It includes an object centric diffusion module that generates images by composing individual objects to reduce object detection errors, and an attribute relationship search module that automatically identifies key attributes and relationships that best align with the neural activity. Extensive experiments on real world datasets demonstrate that our framework outperforms existing methods, achieving up to an 8% reduction in perceptual loss. These results highlight the importance of using structured text as the intermediate space to bridge fMRI signals and image reconstruction.
Paper & Project Links
Summary
This paper studies reconstructing visual stimuli (images) from fMRI signals and reports two key findings: fMRI signals are more similar to the text space of a language model than to a vision-based space or a joint text-image space, and text representations and the generative model should be adapted to capture the compositional nature of visual stimuli, including objects, detailed attributes, and relationships. Building on this, the authors propose PRISM, which projects fMRI signals into a structured text space as an intermediate representation for reconstruction. Experiments show the framework outperforms existing methods, reducing perceptual loss by up to 8% and underscoring the value of structured text as the bridge between fMRI signals and image reconstruction.
Key Takeaways
- Reconstructing visual stimuli from fMRI signals is a central challenge in neuroscience and machine learning.
- fMRI signals are more similar to the text space of a language model than to vision-based or joint text-image spaces.
- Text representations and the generative model need to capture the compositional nature of visual stimuli: objects, detailed attributes, and relationships.
- PRISM projects fMRI signals into a structured text space as the intermediate representation for visual stimulus reconstruction.
- PRISM includes an object-centric diffusion module that composes individual objects to reduce object detection errors.
- PRISM also includes an attribute-relationship search module that automatically identifies the key attributes and relationships best aligned with the neural activity.
View paper screenshots here



DuetMatch: Harmonizing Semi-Supervised Brain MRI Segmentation via Decoupled Branch Optimization
Authors:Thanh-Huy Nguyen, Hoang-Thien Nguyen, Vi Vu, Ba-Thinh Lam, Phat Huynh, Tianyang Wang, Xingjian Li, Ulas Bagci, Min Xu
The limited availability of annotated data in medical imaging makes semi-supervised learning increasingly appealing for its ability to learn from imperfect supervision. Recently, teacher-student frameworks have gained popularity for their training benefits and robust performance. However, jointly optimizing the entire network can hinder convergence and stability, especially in challenging scenarios. To address this for medical image segmentation, we propose DuetMatch, a novel dual-branch semi-supervised framework with asynchronous optimization, where each branch optimizes either the encoder or decoder while keeping the other frozen. To improve consistency under noisy conditions, we introduce Decoupled Dropout Perturbation, enforcing regularization across branches. We also design Pair-wise CutMix Cross-Guidance to enhance model diversity by exchanging pseudo-labels through augmented input pairs. To mitigate confirmation bias from noisy pseudo-labels, we propose Consistency Matching, refining labels using stable predictions from frozen teacher models. Extensive experiments on benchmark brain MRI segmentation datasets, including ISLES2022 and BraTS, show that DuetMatch consistently outperforms state-of-the-art methods, demonstrating its effectiveness and robustness across diverse semi-supervised segmentation scenarios.
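The CutMix-style cross-guidance described here pastes a rectangular region from one unlabeled image (and the corresponding region of the other branch's pseudo-label) into another, so each branch trains on mixed inputs labeled by its peer. A simplified 2D sketch of that mixing step follows; the actual method operates on 3D volumes with additional details not shown.

```python
# Simplified CutMix of an image pair and the corresponding pseudo-label pair (illustrative only).
import torch

def cutmix_pair(img_a, img_b, plabel_a, plabel_b, ratio=0.5):
    """img_*: (C, H, W); plabel_*: (H, W). Paste a random box from sample B into sample A."""
    _, h, w = img_a.shape
    bh, bw = int(h * ratio), int(w * ratio)
    top = torch.randint(0, h - bh + 1, (1,)).item()
    left = torch.randint(0, w - bw + 1, (1,)).item()
    mixed_img, mixed_lbl = img_a.clone(), plabel_a.clone()
    mixed_img[:, top:top + bh, left:left + bw] = img_b[:, top:top + bh, left:left + bw]
    mixed_lbl[top:top + bh, left:left + bw] = plabel_b[top:top + bh, left:left + bw]
    return mixed_img, mixed_lbl

img, lbl = cutmix_pair(torch.randn(1, 64, 64), torch.randn(1, 64, 64),
                       torch.zeros(64, 64, dtype=torch.long),
                       torch.ones(64, 64, dtype=torch.long))
```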
Paper & Project Links
PDF The paper is under review at CMIG
Summary
This paper proposes DuetMatch, a dual-branch semi-supervised framework with asynchronous optimization for brain MRI segmentation, where each branch optimizes either the encoder or the decoder while keeping the other frozen to improve convergence and stability. Decoupled Dropout Perturbation enforces cross-branch regularization under noisy conditions, Pair-wise CutMix Cross-Guidance exchanges pseudo-labels through augmented input pairs to increase model diversity, and Consistency Matching refines noisy pseudo-labels using stable predictions from frozen teacher models. Experiments on benchmark brain MRI segmentation datasets such as ISLES2022 and BraTS show DuetMatch consistently outperforms state-of-the-art methods across diverse semi-supervised segmentation scenarios.
Key Takeaways
- Annotated medical imaging data is limited, which makes semi-supervised learning attractive because it can learn from imperfect supervision.
- Teacher-student frameworks are popular for their training benefits and robustness, but jointly optimizing the whole network can hinder convergence and stability.
- DuetMatch is a novel dual-branch semi-supervised framework with asynchronous optimization that updates either the encoder or the decoder in each branch.
- Decoupled Dropout Perturbation improves consistency under noisy conditions.
- Pair-wise CutMix Cross-Guidance increases model diversity by exchanging pseudo-labels through augmented input pairs.
- Consistency Matching mitigates the confirmation bias introduced by noisy pseudo-labels.
- Experiments on multiple benchmark datasets show DuetMatch outperforms existing methods across diverse semi-supervised segmentation scenarios.
View paper screenshots here

Identifying multi-omics interactions for lung cancer drug targets discovery using Kernel Machine Regression
Authors:Md. Imtyaz Ahmed, Md. Delwar Hossain, Md Mostafizer Rahman, Md. Ahsan Habib, Md. Mamunur Rashid, Md. Selim Reza, Md Ashad Alam
Cancer exhibits diverse and complex phenotypes driven by multifaceted molecular interactions. Recent biomedical research has emphasized the comprehensive study of such diseases by integrating multi-omics datasets (genome, proteome, transcriptome, epigenome). This approach provides an efficient method for identifying genetic variants associated with cancer and offers a deeper understanding of how the disease develops and spreads. However, it is challenging to comprehend complex interactions among the features of multi-omics datasets compared to single omics. In this paper, we analyze lung cancer multi-omics datasets from The Cancer Genome Atlas (TCGA). Using four statistical methods, LIMMA, the T test, Canonical Correlation Analysis (CCA), and the Wilcoxon test, we identified differentially expressed genes across gene expression, DNA methylation, and miRNA expression data. We then integrated these multi-omics data using the Kernel Machine Regression (KMR) approach. Our findings reveal significant interactions among the three omics: gene expression, miRNA expression, and DNA methylation in lung cancer. From our data analysis, we identified 38 genes significantly associated with lung cancer. Among these, eight genes of highest ranking (PDGFRB, PDGFRA, SNAI1, ID1, FGF11, TNXB, ITGB1, ZIC1) were highlighted by rigorous statistical analysis. Furthermore, in silico studies identified three top-ranked potential candidate drugs (Selinexor, Orapred, and Capmatinib) that could play a crucial role in the treatment of lung cancer. These proposed drugs are also supported by the findings of other independent studies, which underscore their potential efficacy in the fight against lung cancer.
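Kernel machine regression combines per-omics similarity kernels into one model; a common simplification is kernel ridge regression on a sum of RBF kernels, one per omics block. Below is a hedged scikit-learn sketch of that general idea, not the authors' exact KMR formulation; the data and outcome are synthetic placeholders.

```python
# Sketch: multi-omics kernel machine regression as kernel ridge on a sum of per-omics RBF kernels.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(omics_blocks, gamma=0.01):
    """omics_blocks: list of (n_samples, n_features_i) arrays; returns the summed RBF kernel."""
    return sum(rbf_kernel(x, x, gamma=gamma) for x in omics_blocks)

rng = np.random.default_rng(0)
expr = rng.normal(size=(100, 500))     # gene expression block (synthetic)
meth = rng.normal(size=(100, 300))     # DNA methylation block (synthetic)
mirna = rng.normal(size=(100, 50))     # miRNA expression block (synthetic)
y = rng.normal(size=100)               # e.g., a clinical outcome score (synthetic)

K = combined_kernel([expr, meth, mirna])
model = KernelRidge(alpha=1.0, kernel="precomputed").fit(K, y)
predictions = model.predict(K)         # in-sample predictions on the same precomputed kernel
```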
Paper & Project Links
Summary
This paper analyzes lung cancer multi-omics datasets from The Cancer Genome Atlas (TCGA), covering gene expression, DNA methylation, and miRNA expression. Differentially expressed genes are identified using four statistical methods (LIMMA, the t-test, canonical correlation analysis, and the Wilcoxon test), and the multi-omics data are integrated with Kernel Machine Regression (KMR), revealing significant interactions among the three omics. The analysis identifies 38 genes significantly associated with lung cancer, with eight top-ranked genes highlighted, and in silico studies point to three top-ranked candidate drugs (Selinexor, Orapred, and Capmatinib) that could play a role in lung cancer treatment.
Key Takeaways
- Cancer exhibits diverse and complex phenotypes driven by multifaceted molecular interactions.
- Integrating multi-omics datasets (genome, proteome, transcriptome, epigenome) is essential for comprehensive study of such diseases.
- Analyzing lung cancer multi-omics data can identify cancer-associated genetic variants and deepen understanding of disease development and spread.
- Statistical methods such as LIMMA, the t-test, canonical correlation analysis, and the Wilcoxon test identify differentially expressed genes.
- Kernel Machine Regression integrates the multi-omics data and reveals significant interactions among gene expression, miRNA expression, and DNA methylation.
- The study identifies 38 genes significantly associated with lung cancer, eight of which rank highest.
View paper screenshots here


CARDIUM: Congenital Anomaly Recognition with Diagnostic Images and Unified Medical records
Authors:Daniela Vega, Hannah V. Ceballos, Javier S. Vera, Santiago Rodriguez, Alejandra Perez, Angela Castillo, Maria Escobar, Dario Londoño, Luis A. Sarmiento, Camila I. Castro, Nadiezhda Rodriguez, Juan C. Briceño, Pablo Arbeláez
Prenatal diagnosis of Congenital Heart Diseases (CHDs) holds great potential for Artificial Intelligence (AI)-driven solutions. However, collecting high-quality diagnostic data remains difficult due to the rarity of these conditions, resulting in imbalanced and low-quality datasets that hinder model performance. Moreover, no public efforts have been made to integrate multiple sources of information, such as imaging and clinical data, further limiting the ability of AI models to support and enhance clinical decision-making. To overcome these challenges, we introduce the Congenital Anomaly Recognition with Diagnostic Images and Unified Medical records (CARDIUM) dataset, the first publicly available multimodal dataset consolidating fetal ultrasound and echocardiographic images along with maternal clinical records for prenatal CHD detection. Furthermore, we propose a robust multimodal transformer architecture that incorporates a cross-attention mechanism to fuse feature representations from image and tabular data, improving CHD detection by 11% and 50% over image and tabular single-modality approaches, respectively, and achieving an F1 score of 79.8 ± 4.8% in the CARDIUM dataset. We will publicly release our dataset and code to encourage further research on this unexplored field. Our dataset and code are available at https://github.com/BCV-Uniandes/Cardium, and at the project website https://bcv-uniandes.github.io/CardiumPage/
Paper & Project Links
PDF Accepted to CVAMD Workshop, ICCV 2025
Summary
Prenatal diagnosis of congenital heart diseases (CHDs) has great potential for AI-driven solutions, but the rarity of these conditions makes high-quality diagnostic data hard to collect, leading to imbalanced, low-quality datasets, and no public effort has integrated imaging with clinical data. To address this, the authors introduce CARDIUM, the first publicly available multimodal dataset combining fetal ultrasound and echocardiographic images with maternal clinical records for prenatal CHD detection, and propose a robust multimodal transformer with a cross-attention mechanism that fuses image and tabular features, improving CHD detection accuracy.
Key Takeaways
- Prenatal diagnosis of congenital heart diseases has great potential for AI-driven solutions.
- Collecting high-quality diagnostic data for CHDs is difficult because the conditions are rare, leaving datasets imbalanced and of low quality.
- No previous public effort has integrated multiple information sources, such as imaging and clinical data, to support AI models for CHDs.
- CARDIUM is the first publicly available multimodal dataset for prenatal CHD detection, combining fetal ultrasound and echocardiographic images with maternal clinical records.
- A robust multimodal transformer architecture fuses image and tabular feature representations via cross-attention, improving CHD detection accuracy.
- The model achieves an F1 score of 79.8 ± 4.8% on the CARDIUM dataset.
View paper screenshots here




Dolphin v1.0 Technical Report
Authors:Taohan Weng, Kaibing Hu, Henan Liu, Siya Liu, Xiaoyang Liu, Zhenyu Liu, Jiren Ren, Boyan Wang, Boyang Wang, Yiyu Wang, Yalun Wu, Chaoran Yan, Kaiwen Yan, Jinze Yu, Chi Zhang, Duo Zhang, Haoyun Zheng, Xiaoqing Guo, Jacques Souquet, Hongcheng Guo, Anjie Le
Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound’s complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1, the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework. To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability. The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards. Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835, over twice the second-best model (0.2968), setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.
Paper & Project Links
Summary
Ultrasound is crucial in modern medicine but faces challenges such as operator dependence, image noise, and real-time scanning that hinder AI integration, and large multimodal models that excel elsewhere struggle with its complexity. This report introduces Dolphin v1.0 and its reasoning-augmented version Dolphin R1, the first large-scale multimodal ultrasound foundation models to unify diverse clinical tasks in a single vision-language framework. Built on a 2-million-scale multimodal dataset and a three-stage training strategy, the Dolphin series delivers reliable performance in classification, detection, regression, and report generation, and Dolphin R1 sets a new state of the art on the eight ultrasound tasks of U2-Bench.
Key Takeaways
- Ultrasound is essential in modern medicine but faces operator dependence, image noise, and real-time scanning challenges.
- Large multimodal models perform well in other medical imaging areas but struggle with the complexities of ultrasound.
- Dolphin v1.0 and its enhanced version Dolphin R1 are introduced as the first large-scale multimodal ultrasound foundation models, unifying diverse clinical tasks in one vision-language framework.
- To handle ultrasound variability and noise, the team curated a 2-million-scale multimodal dataset that ensures robust perception, generalization, and clinical adaptability.
- The Dolphin series uses a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement.
- Dolphin R1 achieves state-of-the-art results on the eight ultrasound tasks of U2-Bench, with improved diagnostic inference, reasoning transparency, and interpretability.
View paper screenshots here




Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
Authors:Raphaël Bourgade, Guillaume Balezo, Hana Feki, Lily Monier, Matthieu Blons, Alice Blondel, Delphine Loussouarn, Anne Vincent-Salomon, Thomas Walter
Mitotic figures represent a key histoprognostic feature in tumor pathology, providing crucial insights into tumor aggressiveness and proliferation. However, their identification remains challenging, subject to significant inter-observer variability, even among experienced pathologists. To address this issue, the MItosis DOmain Generalization (MIDOG) 2025 challenge marks the third edition of an international competition aiming to develop robust mitosis detection algorithms. In this paper, we present a mitotic figure detection approach based on the state-of-the-art YOLOv12 object detection architecture. Our method achieved an F1-score of 0.801 on the preliminary test set (hotspots only) and ranked second on the final test leaderboard with an F1-score of 0.7216 across complex and heterogeneous whole-slide regions, without relying on external data.
Paper & Project Links
Summary
Mitotic figures are a key histoprognostic feature in tumor pathology, but their identification suffers from significant inter-observer variability even among experienced pathologists, which motivates the MIDOG 2025 challenge on robust mitosis detection. This paper presents a mitotic figure detection approach based on the state-of-the-art YOLOv12 object detection architecture. It achieved an F1-score of 0.801 on the preliminary test set (hotspots only) and ranked second on the final test leaderboard with an F1-score of 0.7216 across complex and heterogeneous whole-slide regions, without relying on external data.
Key Takeaways
- Mitotic figures are a key prognostic indicator of tumor aggressiveness and proliferation, yet identifying them consistently is difficult and varies across observers, which the MIDOG challenge aims to address by driving the development of robust mitosis detection algorithms.
View paper screenshots here


