Published: 2025-05-31

Updated: 2025-06-24

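⚠️ All summaries below were generated by a large language model. They may contain errors, so treat them as reference only and use them with caution.
🔴 Note: do not use these summaries in serious academic settings. They are intended only for initial screening before reading a paper.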

2025-05-31 Update

PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation

Authors: Christian Schmidt, Heinrich Martin Overhoff

In medical image segmentation, limited external validity remains a critical obstacle when models are deployed across unseen datasets, an issue particularly pronounced in the ultrasound image domain. Existing solutions, such as domain adaptation and GAN-based style transfer, while promising, often fall short in the medical domain where datasets are typically small and diverse. This paper presents a novel application of principal component analysis (PCA) to address this limitation. PCA preprocessing reduces noise and emphasizes essential features by retaining approximately 90% of the dataset variance. We evaluate our approach across six diverse breast tumor ultrasound datasets comprising 3,983 B-mode images and corresponding expert tumor segmentation masks. For each dataset, a corresponding dimensionality-reduced PCA dataset is created and U-Net-based segmentation models are trained on each of the twelve datasets. Each model trained on an original dataset was inferenced on the remaining five out-of-domain original datasets (baseline results), while each model trained on a PCA dataset was inferenced on five out-of-domain PCA datasets. Our experimental results indicate that using PCA reconstructed datasets, instead of original images, improves the model's recall and Dice scores, particularly for model-dataset pairs where baseline performance was lowest, achieving statistically significant gains in recall (0.57 $\pm$ 0.07 vs. 0.70 $\pm$ 0.05, $p = 0.0004$) and Dice scores (0.50 $\pm$ 0.06 vs. 0.58 $\pm$ 0.06, $p = 0.03$). Our method reduced the decline in recall values due to external validation by $33\%$. These findings underscore the potential of PCA reconstruction as a safeguard to mitigate declines in segmentation performance, especially in challenging cases, with implications for enhancing external validity in real-world medical applications.
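
The abstract does not spell out how the PCA datasets are built, so the following is only a minimal sketch of one plausible reading: flatten every image of a dataset into a vector, fit PCA on that dataset, keep the smallest number of components whose cumulative explained variance reaches about 90%, and map the projections back to image space. The function name `pca_reconstruct`, the per-dataset fit, and the NumPy SVD route are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def pca_reconstruct(images, variance_to_keep=0.90):
    """Reconstruct images from their leading principal components.

    images: array of shape (n_images, height, width).
    Returns (reconstructed_images, number_of_components_kept).
    Hypothetical helper; the paper's exact procedure may differ.
    """
    n, h, w = images.shape
    X = images.reshape(n, h * w).astype(np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on mean-centered data

    # Thin SVD of the centered data: rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    cumulative = np.cumsum(S**2) / np.sum(S**2)

    # Smallest k whose cumulative explained variance reaches the threshold.
    k = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(S))

    components = Vt[:k]
    recon = (Xc @ components.T) @ components + mean
    return recon.reshape(n, h, w), k

# Usage on synthetic data (random pixels stand in for B-mode images).
rng = np.random.default_rng(0)
demo_images = rng.random((20, 8, 8))
demo_recon, demo_k = pca_reconstruct(demo_images, variance_to_keep=0.90)
print(demo_recon.shape, demo_k)
```

In this reading, a segmentation model would be trained and evaluated on the reconstructed images rather than on the originals, while the masks stay unchanged.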


Paper and project links

PDF

Summary

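This paper addresses the limited external validity of medical image segmentation models, a problem that is especially pronounced for ultrasound images. Existing remedies such as domain adaptation and GAN-based style transfer are promising but often fall short in the medical domain, where datasets are typically small and diverse. The authors propose a new application of principal component analysis (PCA): PCA preprocessing retains approximately 90% of the dataset variance, which reduces noise and emphasizes essential features. The approach is evaluated on six breast tumor ultrasound datasets comprising 3,983 B-mode images with corresponding expert tumor segmentation masks. For each dataset a dimensionality-reduced PCA dataset is created, and a U-Net-based segmentation model is trained on each of the resulting twelve datasets. Each model trained on an original dataset is evaluated on the other five out-of-domain original datasets (the baseline), while each model trained on a PCA dataset is evaluated on the five out-of-domain PCA datasets. Using PCA-reconstructed images instead of the originals improves recall and Dice scores, particularly for the model-dataset pairs with the lowest baseline performance, where the gains are statistically significant: recall 0.57 ± 0.07 vs. 0.70 ± 0.05 (p = 0.0004) and Dice 0.50 ± 0.06 vs. 0.58 ± 0.06 (p = 0.03). The method reduces the decline in recall caused by external validation by 33%. These findings highlight the potential of PCA reconstruction as a safeguard against declines in segmentation performance, especially in challenging cases, with implications for external validity in real-world medical applications.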
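
The two metrics reported above have standard definitions for binary masks: Dice is 2|P ∩ T| / (|P| + |T|) and recall is |P ∩ T| / |T|, where P is the predicted mask and T the expert mask. The sketch below computes both with NumPy. The function names and the handling of empty masks (Dice of 1.0 when both masks are empty, NaN recall when the expert mask has no tumor pixels) are assumptions for illustration; the paper does not state its conventions.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks (hypothetical helper)."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / total)

def recall_score(pred, truth):
    """Fraction of true tumor pixels that the prediction recovers."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    positives = truth.sum()
    if positives == 0:
        return float("nan")  # assumed convention: undefined without positives
    return float(np.logical_and(pred, truth).sum() / positives)

# Toy example: 3 true tumor pixels, 3 predicted pixels, 2 of them overlapping.
truth_mask = np.array([[1, 1, 0],
                       [0, 1, 0]])
pred_mask = np.array([[1, 0, 0],
                      [0, 1, 1]])
print(f"Dice {dice_score(pred_mask, truth_mask):.3f}, "
      f"recall {recall_score(pred_mask, truth_mask):.3f}")
# prints "Dice 0.667, recall 0.667"
```

Recall ignores false positives, so a model can raise recall by over-segmenting; reading it alongside Dice, as the summary does, guards against that.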

Key Takeaways