GAN

Published: 2025-04-26

Updated: 2025-05-14

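⚠️ All summaries below were generated by a large language model and may contain errors; treat them as reference only and use them with caution.
🔴 Note: do not use these summaries in serious academic contexts. They are intended only for initial screening before reading a paper.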

Updated 2025-04-26

PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation

Authors: Xinqi Xiong, Andrea Dunn Beltran, Jun Myeong Choi, Marc Niethammer, Roni Sengupta

Accurate depth estimation enhances endoscopy navigation and diagnostics, but obtaining ground-truth depth in clinical settings is challenging. Synthetic datasets are often used for training, yet the domain gap limits generalization to real data. We propose a novel image-to-image translation framework that preserves structure while generating realistic textures from clinical data. Our key innovation integrates Stable Diffusion with ControlNet, conditioned on a latent representation extracted from a Per-Pixel Shading (PPS) map. PPS captures surface lighting effects, providing a stronger structural constraint than depth maps. Experiments show our approach produces more realistic translations and improves depth estimation over GAN-based MI-CycleGAN. Our code is publicly accessible at https://github.com/anaxqx/PPS-Ctrl.

Paper and project links

PDF

Summary

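This paper proposes an image-to-image translation framework that combines Stable Diffusion with ControlNet, conditioned on a latent representation extracted from a Per-Pixel Shading (PPS) map. The framework translates synthetic colonoscopy images into realistic-looking ones: it preserves the underlying structure while generating textures learned from clinical data. The goal is to narrow the domain gap that keeps depth estimators trained on synthetic data from generalizing to real endoscopy images, where ground-truth depth is hard to obtain. According to the abstract, the translations are more realistic and yield better depth estimation than the GAN-based MI-CycleGAN. The code is publicly available.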
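To make the idea of a shading map concrete, here is a minimal sketch of how a PPS-like map could be computed from a depth map. It is not taken from the paper or its repository: it assumes a pinhole camera, a single point light located at the camera center, a Lambertian cosine term, and inverse-square falloff. The function name `per_pixel_shading` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration.

```python
import numpy as np


def per_pixel_shading(depth, fx, fy, cx, cy, eps=1e-12):
    """Illustrative PPS-style map (assumed formulation, not the paper's code).

    Combines a Lambertian cosine term with inverse-square falloff for a
    point light assumed to sit at the camera center.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Back-project every pixel to a 3D point in camera space (pinhole model).
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    points = np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

    # Estimate surface normals from finite differences of the point map.
    d_row = np.gradient(points, axis=0)  # change along image rows (v)
    d_col = np.gradient(points, axis=1)  # change along image columns (u)
    normals = np.cross(d_row, d_col)     # this order points toward the camera
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + eps

    # Unit direction from each surface point to the light at the origin.
    dist = np.linalg.norm(points, axis=-1)
    light_dir = -points / (dist[..., None] + eps)

    # Cosine between normal and light direction, clipped to [0, 1],
    # then attenuated by the squared distance to the light.
    cos_term = np.clip(np.sum(normals * light_dir, axis=-1), 0.0, 1.0)
    return cos_term / (dist ** 2 + eps)
```

For a flat wall facing the camera, the map is brightest at the image center and fades toward the edges, because both the distance and the angle to the light grow. This shows how shading carries information about surface orientation as well as distance, which is the intuition behind the abstract's claim that PPS is a stronger structural constraint than a depth map alone.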

Key Takeaways