⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: do not use these summaries for serious academic work; they are only for an initial screen before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-11-09
Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis
Authors:Eva Prakash, Jeya Maria Jose Valanarasu, Zhihong Chen, Eduardo Pontes Reis, Andrew Johnston, Anuj Pareek, Christian Bluethgen, Sergios Gatidis, Cameron Olsen, Akshay Chaudhari, Andrew Ng, Curtis Langlotz
Purpose: To explore best-practice approaches for generating synthetic chest X-ray images and augmenting medical imaging datasets to optimize the performance of deep learning models in downstream tasks like classification and segmentation. Materials and Methods: We utilized a latent diffusion model to condition the generation of synthetic chest X-rays on text prompts and/or segmentation masks. We explored methods like using a proxy model and using radiologist feedback to improve the quality of synthetic data. These synthetic images were then generated from relevant disease information or geometrically transformed segmentation masks and added to ground truth training set images from the CheXpert, CANDID-PTX, SIIM, and RSNA Pneumonia datasets to measure improvements in classification and segmentation model performance on the test sets. F1 and Dice scores were used to evaluate classification and segmentation respectively. One-tailed t-tests with Bonferroni correction assessed the statistical significance of performance improvements with synthetic data. Results: Across all experiments, the synthetic data we generated resulted in a maximum mean classification F1 score improvement of 0.150453 (CI: 0.099108-0.201798; P=0.0031) compared to using only real data. For segmentation, the maximum Dice score improvement was 0.14575 (CI: 0.108267-0.183233; P=0.0064). Conclusion: Best practices for generating synthetic chest X-ray images for downstream tasks include conditioning on single-disease labels or geometrically transformed segmentation masks, as well as potentially using proxy modeling for fine-tuning such generations.
Paper & Project Links
Summary
This work explores best practices for generating synthetic chest X-rays and augmenting medical imaging datasets to optimize deep learning performance on downstream classification and segmentation tasks. A latent diffusion model generates synthetic images conditioned on text prompts and/or segmentation masks, with proxy models and radiologist feedback used to improve synthetic-data quality. Images generated from relevant disease information or geometrically transformed segmentation masks are added to real training images from CheXpert, CANDID-PTX, SIIM, and RSNA Pneumonia, and test-set gains are measured with F1 (classification) and Dice (segmentation); statistical significance is assessed with Bonferroni-corrected one-tailed t-tests. Compared with real data alone, synthetic data yields a maximum mean classification F1 improvement of 0.150453 (CI: 0.099108-0.201798; P=0.0031) and a maximum Dice improvement of 0.14575 (CI: 0.108267-0.183233; P=0.0064). The recommended practices are conditioning on single-disease labels or geometrically transformed segmentation masks, and optionally fine-tuning with a proxy model.
Key Takeaways
- A latent diffusion model generates synthetic chest X-rays, conditioned on text prompts and/or segmentation masks.
- Proxy models and radiologist feedback are used to improve the quality of the synthetic data.
- Synthetic images augment medical imaging datasets: images generated from relevant disease information or geometrically transformed segmentation masks are added to the real training set to improve deep learning model performance.
- For classification, synthetic data improved the F1 score by up to 0.150453.
- For segmentation, synthetic data improved the Dice score by up to 0.14575.
- Best practices include conditioning generation on single-disease labels or geometrically transformed segmentation masks, and optionally fine-tuning with a proxy model; the significance of the gains was assessed with Bonferroni-corrected one-tailed t-tests (see the sketch after this list).
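As a concrete illustration of the evaluation protocol, here is a minimal sketch of a Bonferroni-corrected one-tailed t-test in Python, assuming paired per-run F1 scores; the scores and the number of comparisons are placeholders, not values from the paper.

```python
# Minimal sketch: one-tailed paired t-test with Bonferroni correction,
# as used to compare models trained with vs. without synthetic data.
# The per-run F1 scores below are illustrative placeholders, not paper data.
from scipy import stats

real_only = [0.61, 0.63, 0.60, 0.62, 0.64]      # F1 per run, real data only
real_plus_syn = [0.74, 0.77, 0.73, 0.76, 0.78]  # F1 per run, real + synthetic

n_comparisons = 8             # hypothetical number of experiments sharing alpha
alpha = 0.05 / n_comparisons  # Bonferroni-corrected significance threshold

# One-tailed test: does synthetic augmentation *increase* the score?
t, p = stats.ttest_rel(real_plus_syn, real_only, alternative="greater")
print(f"t={t:.3f}, p={p:.4f}, significant={p < alpha}")
```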
A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift
Authors:Sanath Budakegowdanadoddi Nagaraju, Brian Bernhard Moser, Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel
Transformer-based architectures have recently advanced the image reconstruction quality of super-resolution (SR) models. Yet, their scalability remains limited by quadratic attention costs and coarse patch embeddings that weaken pixel-level fidelity. We propose TaylorIR, a plug-and-play framework that enforces 1x1 patch embeddings for true pixel-wise reasoning and replaces conventional self-attention with TaylorShift, a Taylor-series-based attention mechanism enabling full token interactions with near-linear complexity. Across multiple SR benchmarks, TaylorIR delivers state-of-the-art performance while reducing memory consumption by up to 60%, effectively bridging the gap between fine-grained detail restoration and efficient transformer scaling.
Paper & Project Links
Summary
Transformer-based architectures have improved the image reconstruction quality of super-resolution (SR) models, but their scalability is limited by quadratic attention cost and coarse patch embeddings that weaken pixel-level fidelity. TaylorIR is a plug-and-play framework that adopts 1x1 patch embeddings for true pixel-wise reasoning and replaces conventional self-attention with TaylorShift, a Taylor-series-based attention mechanism that enables full token interactions at near-linear complexity. Across multiple SR benchmarks, TaylorIR delivers state-of-the-art performance while cutting memory consumption by up to 60%, bridging the gap between fine-grained detail restoration and efficient transformer scaling.
Key Takeaways
- Transformer architectures perform strongly at image reconstruction for super-resolution (SR).
- The scalability of existing transformer SR models is limited by quadratic attention cost and coarse patch embeddings.
- TaylorIR enforces 1x1 patch embeddings, enabling true pixel-wise reasoning.
- TaylorIR replaces conventional self-attention with TaylorShift, a Taylor-series-based attention mechanism (the sketch after this list illustrates the general Taylor-linearization idea).
- TaylorShift enables full token interactions at near-linear complexity, improving model efficiency.
- TaylorIR reaches state-of-the-art performance across multiple SR benchmarks while reducing memory consumption by up to 60%.
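To make the near-linear-complexity claim concrete, here is a minimal sketch of the generic Taylor-linearization trick behind such attention mechanisms: approximating exp(q·k) by its first-order expansion lets K^T V be computed once and reused for every query, so the N x N attention matrix is never formed. This illustrates the general idea only, not the authors' actual TaylorShift implementation.

```python
# Minimal sketch of Taylor-linearized attention (generic idea only).
# exp(q.k) ~ 1 + q.k (first-order Taylor), so the output can be computed
# in O(N d^2) instead of O(N^2 d). Real designs add higher-order terms or
# constraints to keep the attention weights positive.
import numpy as np

def taylor_linear_attention(Q, K, V):
    """Q, K, V: (N, d) arrays. Returns the (N, d) attention output."""
    N, d = Q.shape
    Q = Q / np.sqrt(d)                    # standard scaling
    kv = K.T @ V                          # (d, d), computed once for all queries
    k_sum = K.sum(axis=0)                 # (d,)
    numerator = V.sum(axis=0) + Q @ kv    # (N, d): sum_j (1 + q.k_j) v_j
    denominator = N + Q @ k_sum           # (N,):   sum_j (1 + q.k_j)
    return numerator / denominator[:, None]

Q = np.random.randn(1024, 64)
K = np.random.randn(1024, 64)
V = np.random.randn(1024, 64)
out = taylor_linear_attention(Q, K, V)    # no N x N attention matrix is built
print(out.shape)  # (1024, 64)
```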
LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation
Authors:Juntao Jiang, Mengmeng Wang, Huizhong Tian, Lingbo Cheng, Yong Liu
While large models have achieved significant progress in computer vision, challenges such as optimization complexity, the intricacy of transformer architectures, computational constraints, and practical application demands highlight the importance of simpler model designs in medical image segmentation. This need is particularly pronounced in mobile medical devices, which require lightweight, deployable models with real-time performance. However, existing lightweight models often suffer from poor robustness across datasets, limiting their widespread adoption. To address these challenges, this paper introduces LV-UNet, a lightweight and vanilla model that leverages pre-trained MobileNetv3-Large backbones and incorporates fusible modules. LV-UNet employs an enhanced deep training strategy and switches to a deployment mode during inference by re-parametrization, significantly reducing parameter count and computational overhead. Experimental results on ISIC 2016, BUSI, CVC-ClinicDB, CVC-ColonDB, and Kvasir-SEG datasets demonstrate a better trade-off between performance and the computational load. The code will be released at https://github.com/juntaoJianggavin/LV-UNet.
Paper & Project Links
PDF Accepted by the IEEE BIBM 2024 ML4BMI workshop
Summary
While large models have advanced medical image segmentation, optimization complexity, intricate transformer architectures, computational constraints, and practical deployment demands favor simpler designs, especially for mobile medical devices that need lightweight models with real-time performance. LV-UNet is a lightweight, vanilla model built on a pre-trained MobileNetv3-Large backbone with fusible modules; it uses an enhanced deep training strategy and switches to a deployment mode at inference via re-parameterization, reducing parameter count and computational overhead. Experiments demonstrate a better trade-off between performance and computational load. The code will be released at https://github.com/juntaoJianggavin/LV-UNet.
Key Takeaways
- Large models have made progress in medical image segmentation but still face optimization complexity and practical deployment challenges.
- Mobile medical devices require lightweight, deployable models with real-time performance.
- LV-UNet combines a pre-trained MobileNetv3-Large backbone with fusible modules to keep the architecture simple.
- An enhanced deep training strategy improves performance.
- At inference the model switches to a deployment mode via re-parameterization, reducing parameter count and computational overhead (a conv+BN fusion sketch follows this list).
- Experiments show LV-UNet achieves a good balance between performance and computational load.
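The classic example of such re-parameterization is folding a BatchNorm layer into the preceding convolution at deployment time. Below is a minimal PyTorch sketch of that fusion; LV-UNet's fusible modules may fold more structure than this, so treat it as the general technique rather than the paper's exact procedure.

```python
# Minimal sketch: fuse a Conv2d + BatchNorm2d pair into a single conv for
# deployment, eliminating the BN layer's parameters and runtime cost.
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride,
                      conv.padding, bias=True)
    # BN(x) = gamma * (x - mean) / sqrt(var + eps) + beta
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std  # gamma / std, one factor per output channel
    fused.weight.data = conv.weight * scale[:, None, None, None]
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias
    return fused

conv = nn.Conv2d(16, 32, 3, padding=1)
bn = nn.BatchNorm2d(32).eval()  # fusion uses the frozen running statistics
x = torch.randn(1, 16, 64, 64)
with torch.no_grad():
    y_ref = bn(conv(x))
    y_fused = fuse_conv_bn(conv, bn)(x)
print(torch.allclose(y_ref, y_fused, atol=1e-5))  # True
```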
X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
Authors:Emmanuelle Bourigault, Abdullah Hamdi, Amir Jamaludin
Magnetic Resonance Imaging (MRI) is a crucial diagnostic tool, but high-resolution scans are often slow and expensive due to extensive data acquisition requirements. Traditional MRI reconstruction methods aim to expedite this process by filling in missing frequency components in the K-space, performing 3D-to-3D reconstructions that demand full 3D scans. In contrast, we introduce X-Diffusion, a novel cross-sectional diffusion model that reconstructs detailed 3D MRI volumes from extremely sparse spatial-domain inputs, achieving 2D-to-3D reconstruction from as little as a single 2D MRI slice or few slices. A key aspect of X-Diffusion is that it models MRI data as holistic 3D volumes during the cross-sectional training and inference, unlike previous learning approaches that treat MRI scans as collections of 2D slices in standard planes (coronal, axial, sagittal). We evaluated X-Diffusion on brain tumor MRIs from the BRATS dataset and full-body MRIs from the UK Biobank dataset. Our results demonstrate that X-Diffusion not only surpasses state-of-the-art methods in quantitative accuracy (PSNR) on unseen data but also preserves critical anatomical features such as tumor profiles, spine curvature, and brain volume. Remarkably, the model generalizes beyond the training domain, successfully reconstructing knee MRIs despite being trained exclusively on brain data. Medical expert evaluations further confirm the clinical relevance and fidelity of the generated images. To our knowledge, X-Diffusion is the first method capable of producing detailed 3D MRIs from highly limited 2D input data, potentially accelerating MRI acquisition and reducing associated costs. The code is available on the project website https://emmanuelleb985.github.io/XDiffusion/.
Paper & Project Links
PDF accepted at the ICCV 2025 GAIA workshop: https://era-ai-biomed.github.io/GAIA/ ; project website: https://emmanuelleb985.github.io/XDiffusion/
Summary
X-Diffusion is a cross-sectional diffusion model that reconstructs detailed 3D MRI volumes from as little as a single 2D slice, with the potential to accelerate MRI acquisition and reduce cost. Unlike traditional K-space reconstruction methods that require full 3D scans, it models MRI data as holistic 3D volumes during cross-sectional training and inference. It surpasses state-of-the-art methods in quantitative accuracy (PSNR) on unseen data, preserves critical anatomical features, and generalizes beyond its training domain. The code is available on the project website: https://emmanuelleb985.github.io/XDiffusion/.
Key Takeaways
- X-Diffusion reconstructs detailed 3D MRI volumes from as little as a single 2D MRI slice.
- Traditional MRI reconstruction fills in missing frequency components in K-space; X-Diffusion instead reconstructs from extremely sparse spatial-domain inputs with a diffusion model.
- X-Diffusion performs 2D-to-3D reconstruction, removing the need for full 3D scans.
- Modeling MRI data as holistic 3D volumes helps preserve anatomical features such as tumor profiles, spine curvature, and brain volume.
- The model surpasses state-of-the-art methods in quantitative accuracy (PSNR) on unseen data (a sketch of the metric follows this list).
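For reference, here is a minimal sketch of the PSNR metric used in the paper's quantitative evaluation, assuming volumes normalized to [0, 1]; the paper's exact data range and normalization may differ.

```python
# Minimal sketch of PSNR (peak signal-to-noise ratio) for 3D volumes.
# Assumes intensities normalized to [0, 1]; adjust max_val otherwise.
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_val: float = 1.0) -> float:
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical volumes
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.random.rand(64, 64, 64)  # stand-in 3D volume, not real MRI data
rec = np.clip(ref + 0.05 * np.random.randn(*ref.shape), 0, 1)
print(f"PSNR: {psnr(ref, rec):.2f} dB")
```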
BoxCell: Leveraging SAM for Cell Segmentation with Box Supervision
Authors:Aayush Kumar Tyagi, Vaibhav Mishra, Prathosh A. P., Mausam
Cell segmentation in histopathological images is vital for the diagnosis and treatment of several diseases. Annotating data is tedious and requires medical expertise, making it difficult to employ supervised learning. Instead, we study a weakly supervised setting, where only bounding box supervision is available, and present the use of Segment Anything (SAM) for this without any finetuning, i.e., directly utilizing the pre-trained model. We propose BoxCell, a cell segmentation framework that utilizes SAM's capability to interpret bounding boxes as prompts, both at train and test times. At train time, gold bounding boxes given to SAM produce (pseudo-)masks, which are used to train a standalone segmenter. At test time, BoxCell generates two segmentation masks: (1) generated by this standalone segmenter, and (2) a trained object detector outputs bounding boxes, which are given as prompts to SAM to produce another mask. Recognizing complementary strengths, we reconcile the two segmentation masks using a novel integer programming formulation with intensity and spatial constraints. We experiment on three publicly available cell segmentation datasets, namely CoNSep, MoNuSeg, and TNBC, and find that BoxCell significantly outperforms existing box supervised image segmentation models, obtaining 6-10 point Dice gains.
Paper & Project Links
Summary
BoxCell is a weakly supervised cell segmentation framework built on the pre-trained Segment Anything (SAM) model without any fine-tuning. At train time, gold bounding boxes prompt SAM to produce pseudo-masks that train a standalone segmenter; at test time, the segmenter's mask and a second mask from SAM prompted with a trained detector's boxes are reconciled via an integer programming formulation with intensity and spatial constraints. On three public cell segmentation datasets, BoxCell significantly outperforms existing box-supervised segmentation models.
Key Takeaways
- Cell segmentation in histopathological images is vital for disease diagnosis and treatment.
- Annotation is tedious and requires medical expertise, making fully supervised learning hard to apply.
- The study targets a weakly supervised setting with only bounding-box supervision.
- The pre-trained Segment Anything (SAM) model performs cell segmentation without any fine-tuning (a box-prompting sketch follows this list).
- At train and test times, BoxCell exploits SAM's ability to interpret bounding boxes as prompts; gold boxes yield pseudo-masks that train a standalone segmenter.
- BoxCell generates two segmentation masks at test time and reconciles their complementary strengths with a novel integer programming formulation, gaining 6-10 Dice points over existing box-supervised models.
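To show the core operation BoxCell builds on, here is a minimal sketch of prompting SAM with a bounding box using the public segment-anything package; the checkpoint path, the blank image, and the box coordinates are placeholders, and the integer-programming reconciliation step is not shown.

```python
# Minimal sketch: prompt the pre-trained SAM with a bounding box to get a
# cell mask, the operation BoxCell uses at both train and test time.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

image = np.zeros((256, 256, 3), dtype=np.uint8)  # stand-in H x W x 3 RGB image
predictor.set_image(image)

# One cell's bounding box as (x0, y0, x1, y1); SAM returns a binary mask,
# which BoxCell would use as a pseudo-mask to train its standalone segmenter.
box = np.array([50, 60, 120, 140])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape)  # (1, 256, 256) boolean mask
```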