
GAN


⚠️ All of the summaries below are produced by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Please note: never rely on them in serious academic settings; they are only meant as a first-pass screen before actually reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-26

Three-Dimensional Anatomical Data Generation Based on Artificial Neural Networks

Authors:Ann-Sophia Müller, Moonkwang Jeong, Meng Zhang, Jiyuan Tian, Arkadiusz Miernik, Stefanie Speidel, Tian Qiu

Surgical planning and training based on machine learning requires a large amount of 3D anatomical models reconstructed from medical imaging, which is currently one of the major bottlenecks. Obtaining these data from real patients and during surgery is very demanding, if even possible, due to legal, ethical, and technical challenges. It is especially difficult for soft tissue organs with poor imaging contrast, such as the prostate. To overcome these challenges, we present a novel workflow for automated 3D anatomical data generation using data obtained from physical organ models. We additionally use a 3D Generative Adversarial Network (GAN) to obtain a manifold of 3D models useful for other downstream machine learning tasks that rely on 3D data. We demonstrate our workflow using an artificial prostate model made of biomimetic hydrogels with imaging contrast in multiple zones. This is used to physically simulate endoscopic surgery. For evaluation and 3D data generation, we place it into a customized ultrasound scanner that records the prostate before and after the procedure. A neural network is trained to segment the recorded ultrasound images, which outperforms conventional, non-learning-based computer vision techniques in terms of intersection over union (IoU). Based on the segmentations, a 3D mesh model is reconstructed, and performance feedback is provided.
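
The segmentation network is evaluated against conventional, non-learning-based computer vision baselines using intersection over union (IoU). As a minimal illustration of that metric (not the authors' code; the binary-mask convention and mask shapes are assumptions), a NumPy sketch:

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union between two binary masks (e.g. prostate zone vs. background)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:                 # both masks empty: define IoU as perfect agreement
        return 1.0
    return float(np.logical_and(pred, target).sum()) / float(union)

if __name__ == "__main__":
    # Toy masks standing in for one segmented ultrasound slice.
    pred = np.zeros((64, 64), dtype=np.uint8)
    gt = np.zeros((64, 64), dtype=np.uint8)
    pred[16:48, 16:48] = 1
    gt[20:52, 20:52] = 1
    print(f"IoU = {iou(pred, gt):.3f}")
```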


Paper and project links

PDF 6 pages, 4 figures, 1 table, IEEE International Conference on Intelligent Robots and Systems (IROS)

Summary

Machine-learning-based surgical planning and training require large numbers of 3D anatomical models reconstructed from medical imaging, which is currently one of the major bottlenecks. Collecting such data from real patients and during surgery raises legal, ethical, and technical challenges, especially for soft-tissue organs with poor imaging contrast such as the prostate. To overcome these challenges, the authors propose a new workflow that automatically generates 3D anatomical data from physical organ models and use a 3D Generative Adversarial Network (GAN) to obtain a family of 3D models for other downstream machine-learning tasks that rely on 3D data. The workflow is demonstrated with an artificial prostate model made of biomimetic hydrogels with imaging contrast in multiple zones, which is used to physically simulate endoscopic surgery. For evaluation and 3D data generation, the model is placed in a customized ultrasound scanner that records the prostate before and after the procedure. A neural network trained to segment the recorded ultrasound images outperforms conventional, non-learning-based computer vision techniques in terms of intersection over union (IoU). A 3D mesh model is reconstructed from the segmentations and performance feedback is provided.

Key Takeaways

  1. Applying machine learning and 3D anatomical models to surgical planning and training faces a data-collection bottleneck.
  2. Collecting data from real patients and during surgery raises legal, ethical, and technical challenges.
  3. Data collection is especially difficult for soft-tissue organs with poor imaging contrast, such as the prostate.
  4. A new workflow is proposed that automatically generates 3D anatomical data from physical organ models.
  5. A 3D Generative Adversarial Network (GAN) produces a manifold of 3D models to support other downstream tasks that rely on 3D data.
  6. The workflow is demonstrated with an artificial prostate model, recorded and evaluated in a customized ultrasound scanner.

Cool Papers

Click here to view paper screenshots

When Semantics Regulate: Rethinking Patch Shuffle and Internal Bias for Generated Image Detection with CLIP

Authors:Beilin Chu, Weike You, Mengtao Li, Tingting Zheng, Kehan Zhao, Xuan Xu, Zhigao Lu, Jia Song, Moxuan Xu, Linna Zhou

The rapid progress of GANs and Diffusion Models poses new challenges for detecting AI-generated images. Although CLIP-based detectors exhibit promising generalization, they often rely on semantic cues rather than generator artifacts, leading to brittle performance under distribution shifts. In this work, we revisit the nature of semantic bias and uncover that Patch Shuffle provides an unusually strong benefit for CLIP, that disrupts global semantic continuity while preserving local artifact cues, which reduces semantic entropy and homogenizes feature distributions between natural and synthetic images. Through a detailed layer-wise analysis, we further show that CLIP’s deep semantic structure functions as a regulator that stabilizes cross-domain representations once semantic bias is suppressed. Guided by these findings, we propose SemAnti, a semantic-antagonistic fine-tuning paradigm that freezes the semantic subspace and adapts only artifact-sensitive layers under shuffled semantics. Despite its simplicity, SemAnti achieves state-of-the-art cross-domain generalization on AIGCDetectBenchmark and GenImage, demonstrating that regulating semantics is key to unlocking CLIP’s full potential for robust AI-generated image detection.
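
A minimal sketch of the Patch Shuffle operation the abstract builds on: split the image into non-overlapping patches and randomly permute them, which breaks the global semantic layout while keeping patch-local artifact cues intact. The patch size and tensor layout below are illustrative assumptions, not the paper's exact settings.

```python
import torch

def patch_shuffle(images: torch.Tensor, patch_size: int = 32) -> torch.Tensor:
    """images: (B, C, H, W) with H and W divisible by patch_size."""
    b, c, h, w = images.shape
    ph, pw = h // patch_size, w // patch_size
    # Cut into non-overlapping patches: (B, ph*pw, C, patch, patch).
    patches = (images
               .reshape(b, c, ph, patch_size, pw, patch_size)
               .permute(0, 2, 4, 1, 3, 5)
               .reshape(b, ph * pw, c, patch_size, patch_size))
    # Randomly permute the patch order (same permutation for the whole batch here).
    perm = torch.randperm(ph * pw, device=images.device)
    patches = patches[:, perm]
    # Reassemble the shuffled patches into an image grid.
    return (patches
            .reshape(b, ph, pw, c, patch_size, patch_size)
            .permute(0, 3, 1, 4, 2, 5)
            .reshape(b, c, h, w))

if __name__ == "__main__":
    x = torch.randn(2, 3, 224, 224)
    print(patch_shuffle(x).shape)  # torch.Size([2, 3, 224, 224])
```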


Paper and project links

PDF 14 pages, 7 figures and 7 tables

Summary

The rapid progress of GANs and diffusion models poses new challenges for detecting AI-generated images. This work revisits the nature of semantic bias and finds that Patch Shuffle benefits CLIP remarkably: it disrupts global semantic continuity while preserving local artifact cues, reducing semantic entropy and homogenizing the feature distributions of natural and synthetic images. A detailed layer-wise analysis further shows that CLIP's deep semantic structure acts as a regulator that stabilizes cross-domain representations once semantic bias is suppressed. Building on this, the authors propose SemAnti, a semantic-antagonistic fine-tuning paradigm that freezes the semantic subspace and adapts only the artifact-sensitive layers under shuffled semantics. Despite its simplicity, SemAnti achieves state-of-the-art cross-domain generalization on AIGCDetectBenchmark and GenImage, showing that regulating semantics is key to unlocking CLIP's potential for robust AI-generated image detection.

Key Takeaways

  1. Progress in GANs and diffusion models makes detecting AI-generated images more challenging.
  2. CLIP-based detectors rely on semantic cues rather than generator artifacts, leading to brittle performance.
  3. Patch Shuffle benefits CLIP markedly: it disrupts global semantic continuity while preserving local artifact cues.
  4. CLIP's deep semantic structure stabilizes cross-domain representations once semantic bias is suppressed.
  5. The proposed SemAnti improves CLIP by freezing the semantic subspace and adapting only the artifact-sensitive layers.
  6. SemAnti achieves state-of-the-art cross-domain generalization, showing that regulating semantics is key to unlocking CLIP's potential.

Cool Papers

Click here to view paper screenshots

VAOT: Vessel-Aware Optimal Transport for Retinal Fundus Enhancement

Authors:Xuanzhao Dong, Wenhui Zhu, Yujian Xiong, Xiwen Chen, Hao Wang, Xin Li, Jiajun Cheng, Zhipeng Wang, Shao Tang, Oana Dumitrascu, Yalin Wang

Color fundus photography (CFP) is central to diagnosing and monitoring retinal disease, yet its acquisition variability (e.g., illumination changes) often degrades image quality, which motivates robust enhancement methods. Unpaired enhancement pipelines are typically GAN-based, however, they can distort clinically critical vasculature, altering vessel topology and endpoint integrity. Motivated by these structural alterations, we propose Vessel-Aware Optimal Transport (\textbf{VAOT}), a framework that combines an optimal-transport objective with two structure-preserving regularizers: (i) a skeleton-based loss to maintain global vascular connectivity and (ii) an endpoint-aware loss to stabilize local termini. These constraints guide learning in the unpaired setting, reducing noise while preserving vessel structure. Experimental results on synthetic degradation benchmark and downstream evaluations in vessel and lesion segmentation demonstrate the superiority of the proposed methods against several state-of-the art baselines. The code is available at https://github.com/Retinal-Research/VAOT
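
To make the two regularizers concrete, here is an illustrative (non-differentiable) sketch, not the authors' implementation, that measures how well the vessel skeleton and its endpoints are preserved between an input vessel mask and an enhanced image's vessel mask. The actual VAOT terms are training-time losses; the helper names and toy masks here are hypothetical.

```python
import numpy as np
from skimage.morphology import skeletonize  # scikit-image is assumed to be installed

def skeleton_endpoints(skel: np.ndarray) -> np.ndarray:
    """Endpoints of a binary skeleton: skeleton pixels with exactly one 8-neighbour."""
    padded = np.pad(skel.astype(np.uint8), 1)
    neighbours = sum(np.roll(np.roll(padded, dy, 0), dx, 1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return ((padded == 1) & (neighbours == 1))[1:-1, 1:-1]

def vessel_structure_scores(mask_in: np.ndarray, mask_out: np.ndarray) -> dict:
    """Proxies for the two VAOT concerns: global connectivity and local termini."""
    s_in, s_out = skeletonize(mask_in > 0), skeletonize(mask_out > 0)
    e_in, e_out = skeleton_endpoints(s_in), skeleton_endpoints(s_out)
    return {
        "skeleton_recall": float((s_in & s_out).sum()) / max(int(s_in.sum()), 1),
        "endpoint_recall": float((e_in & e_out).sum()) / max(int(e_in.sum()), 1),
    }

if __name__ == "__main__":
    mask = np.zeros((64, 64), dtype=np.uint8)
    mask[32, 8:56] = 1                               # a toy "vessel"
    degraded = mask.copy()
    degraded[32, 40:56] = 0                          # a broken vessel segment
    print(vessel_structure_scores(mask, degraded))   # both scores drop below 1.0
```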


Paper and project links

PDF

Summary

Color fundus photography is central to the diagnosis and monitoring of retinal disease, yet acquisition variability such as illumination changes often degrades image quality. The study proposes Vessel-Aware Optimal Transport (VAOT), a framework that combines an optimal-transport objective with two structure-preserving regularizers, reducing noise while preserving vessel structure and thereby improving unpaired enhancement pipelines that would otherwise distort vasculature. Experiments on a synthetic degradation benchmark and on downstream vessel and lesion segmentation demonstrate the method's superiority.

Key Takeaways

  1. Color fundus photography plays a key role in diagnosing and monitoring retinal disease.
  2. Acquisition variability, such as illumination changes, can degrade image quality.
  3. Unpaired enhancement pipelines are typically GAN-based but can distort clinically critical vasculature.
  4. The VAOT framework uses an optimal-transport objective to guide learning in the unpaired setting.
  5. VAOT adds two structure-preserving regularizers: a skeleton-based loss to maintain global vascular connectivity and an endpoint-aware loss to stabilize local termini.
  6. Experiments show VAOT performs strongly on a synthetic degradation benchmark and in downstream evaluations.

Cool Papers

Click here to view paper screenshots

CoD: A Diffusion Foundation Model for Image Compression

Authors:Zhaoyang Jia, Zihan Zheng, Naifu Xue, Jiahao Li, Bin Li, Zongyu Guo, Xiaoyi Zhang, Houqiang Li, Yan Lu

Existing diffusion codecs typically build on text-to-image diffusion foundation models like Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, hindering the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address it, we introduce \textbf{CoD}, the first \textbf{Co}mpression-oriented \textbf{D}iffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed for various diffusion-based codecs. It offers several advantages: \textbf{High compression efficiency}, replacing Stable Diffusion with CoD in downstream codecs like DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp); \textbf{Low-cost and reproducible training}, 300$\times$ faster training than Stable Diffusion ($\sim$ 20 vs. $\sim$ 6,250 A100 GPU days) on entirely open image-only datasets; \textbf{Providing new insights}, e.g., We find pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Codes will be released.
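
For readers unfamiliar with the bitrate unit quoted above: bits per pixel (bpp) is simply the size of the compressed bitstream divided by the pixel count. The byte count and resolution in this small sketch are illustrative, chosen only to land near the 0.0039 bpp regime mentioned in the abstract.

```python
def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """bpp = total compressed bits / number of image pixels."""
    return num_bytes * 8 / (height * width)

if __name__ == "__main__":
    # e.g. a 512x768 image compressed to ~192 bytes sits at ~0.0039 bpp,
    # the ultra-low-bitrate regime the abstract refers to.
    print(f"{bits_per_pixel(192, 512, 768):.4f} bpp")
```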


Paper and project links

PDF

Summary

This paper introduces CoD, a compression-oriented diffusion foundation model trained from scratch to enable end-to-end optimization of both compression and generation. Compared with text-to-image diffusion models such as Stable Diffusion, CoD offers higher compression efficiency and far lower, reproducible training cost, and it achieves state-of-the-art results in downstream diffusion codecs, especially at ultra-low bitrates. CoD also provides new insights into pixel-space diffusion and lays a foundation for future diffusion codec research.

Key Takeaways

  1. CoD, a compression-oriented diffusion foundation model, is introduced to enable end-to-end optimization of compression and generation.
  2. CoD offers higher compression efficiency than text-to-image diffusion models such as Stable Diffusion, especially at ultra-low bitrates.
  3. CoD is a general foundation model designed for various diffusion-based codecs rather than a fixed codec.
  4. Training is low-cost and reproducible, roughly 300x faster than Stable Diffusion (~20 vs. ~6,250 A100 GPU days).
  5. Replacing Stable Diffusion with CoD in downstream codecs such as DiffC achieves state-of-the-art results.
  6. CoD yields new insights into pixel-space diffusion, which can reach VTM-level PSNR with high perceptual quality.

Cool Papers

Click here to view paper screenshots

Uni-DAD: Unified Distillation and Adaptation of Diffusion Models for Few-step Few-shot Image Generation

Authors:Yara Bahram, Melodie Desbos, Mohammadhadi Shateri, Eric Granger

Diffusion models (DMs) produce high-quality images, yet their sampling remains costly when adapted to new domains. Distilled DMs are faster but typically remain confined within their teacher’s domain. Thus, fast and high-quality generation for novel domains relies on two-stage training pipelines: Adapt-then-Distill or Distill-then-Adapt. However, both add design complexity and suffer from degraded quality or diversity. We introduce Uni-DAD, a single-stage pipeline that unifies distillation and adaptation of DMs. It couples two signals during training: (i) a dual-domain distribution-matching distillation objective that guides the student toward the distributions of the source teacher and a target teacher, and (ii) a multi-head generative adversarial network (GAN) loss that encourages target realism across multiple feature scales. The source domain distillation preserves diverse source knowledge, while the multi-head GAN stabilizes training and reduces overfitting, especially in few-shot regimes. The inclusion of a target teacher facilitates adaptation to more structurally distant domains. We perform evaluations on a variety of datasets for few-shot image generation (FSIG) and subject-driven personalization (SDP). Uni-DAD delivers higher quality than state-of-the-art (SoTA) adaptation methods even with less than 4 sampling steps, and outperforms two-stage training pipelines in both quality and diversity.
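
A rough sketch of how the two training signals described above could be combined for the student. This is an illustrative stand-in (a simple MSE proxy for distribution matching and a non-saturating GAN term summed over heads), not the paper's actual objective; all function and weight names are hypothetical.

```python
import torch
import torch.nn.functional as F

def uni_dad_student_loss(student_pred, source_teacher_pred, target_teacher_pred,
                         disc_head_logits, w_src=1.0, w_tgt=1.0, w_gan=0.1):
    """All *_pred tensors share one shape; disc_head_logits is a list of
    discriminator-head outputs on the student's samples."""
    # (i) Dual-domain distribution-matching distillation (MSE proxy here).
    distill = (w_src * F.mse_loss(student_pred, source_teacher_pred)
               + w_tgt * F.mse_loss(student_pred, target_teacher_pred))
    # (ii) Non-saturating generator loss aggregated over multiple discriminator heads.
    gan = sum(F.softplus(-logits).mean() for logits in disc_head_logits)
    return distill + w_gan * gan

if __name__ == "__main__":
    s = torch.randn(4, 16)
    loss = uni_dad_student_loss(s, torch.randn(4, 16), torch.randn(4, 16),
                                [torch.randn(4, 1) for _ in range(3)])
    print(loss.item())
```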


Paper and project links

PDF Under review paper at CVPR 2026

Summary

Diffusion models (DMs) produce high-quality images but are costly to sample when adapted to new domains, and distilled DMs typically remain confined to their teacher's domain. The paper proposes Uni-DAD, a single-stage pipeline that unifies distillation and adaptation of DMs by coupling a dual-domain distribution-matching distillation objective with a multi-head generative adversarial network (GAN) loss. The approach enables fast, high-quality, and diverse generation across source and target domains; evaluations on few-shot image generation and subject-driven personalization show it outperforms state-of-the-art adaptation methods and two-stage training pipelines.

Key Takeaways

  1. Diffusion models (DMs) are costly to sample when generating high-quality images in new domains.
  2. Distilled DMs are currently confined to their teacher's domain and adapt poorly to new domains.
  3. Uni-DAD is proposed to unify distillation and adaptation of DMs, combining a dual-domain distribution-matching distillation objective with a multi-head GAN loss.
  4. Uni-DAD is a single-stage pipeline that delivers fast, high-quality, and diverse generation in new domains.
  5. The source-domain distillation preserves diverse source knowledge, while the multi-head GAN stabilizes training and reduces overfitting, especially in few-shot regimes.
  6. Including a target teacher makes it easier to adapt to structurally distant domains.

Cool Papers

Click here to view paper screenshots

OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution

Authors:Zhiqiang Wu, Zhaomang Sun, Tong Zhou, Bingtao Fu, Ji Cong, Yitong Dong, Huaqi Zhang, Xuan Tang, Mingsong Chen, Xian Wei

Denoising Diffusion Probabilistic Models (DDPMs) show promising potential in one-step Real-World Image Super-Resolution (Real-ISR). Current one-step Real-ISR methods typically inject the low-quality (LQ) image latent representation at the start or end timestep of the DDPM scheduler. Recent studies have begun to note that the LQ image latent and the pre-trained noisy latent representations are intuitively closer at a mid-timestep. However, a quantitative analysis of these latent representations remains lacking. Considering these latent representations can be decomposed into signal and noise, we propose a method based on the Signal-to-Noise Ratio (SNR) to pre-compute an average optimal mid-timestep for injection. To better approximate the pre-trained noisy latent representation, we further introduce the Latent Representation Refinement (LRR) loss via a LoRA-enhanced VAE encoder. We also fine-tune the backbone of the DDPM-based generative model using LoRA to perform one-step denoising at the average optimal mid-timestep. Based on these components, we present OMGSR, a GAN-based Real-ISR framework that employs a DDPM-based generative model as the generator and a DINOv3-ConvNeXt model with multi-level discriminator heads as the discriminator. We also propose the DINOv3-ConvNeXt DISTS (Dv3CD) loss, which is enhanced for structural perception at varying resolutions. Within the OMGSR framework, we develop OMGSR-S based on SD2.1-base. An ablation study confirms that our pre-computation strategy and LRR loss significantly improve the baseline. Comparative studies demonstrate that OMGSR-S achieves state-of-the-art performance across multiple metrics. Code is available at \hyperlink{Github}{https://github.com/wuer5/OMGSR}.
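
The mid-timestep pre-computation hinges on the per-timestep signal-to-noise ratio (SNR) of the diffusion schedule. Below is a minimal sketch of that quantity and of one possible selection rule (matching a measured latent SNR in log space); the linear beta schedule and the matching rule are assumptions for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def ddpm_snr(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Per-timestep SNR_t = alpha_bar_t / (1 - alpha_bar_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    return alpha_bar / (1.0 - alpha_bar)

def closest_timestep(target_snr: float) -> int:
    """Pick the timestep whose schedule SNR is closest (in log space) to a measured SNR."""
    snr = ddpm_snr()
    return int(np.argmin(np.abs(np.log(snr) - np.log(target_snr))))

if __name__ == "__main__":
    # A mid-timestep where signal and noise power are roughly balanced (SNR ~= 1).
    print(closest_timestep(1.0))
```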


Paper and project links

PDF

Summary

DDPMs show promise for one-step real-world image super-resolution (Real-ISR). This paper proposes an SNR-based method to pre-compute an average optimal mid-timestep for injecting the low-quality latent, and introduces a Latent Representation Refinement (LRR) loss with a LoRA-enhanced VAE encoder to improve the DDPM-based model. It also presents OMGSR, a GAN-based Real-ISR framework whose discriminator is a DINOv3-ConvNeXt model with multi-level heads, together with an enhanced structure-aware loss. The OMGSR-S variant achieves state-of-the-art performance across multiple metrics. Code is publicly available.

Key Takeaways

  1. DDPMs show potential for one-step Real-ISR.
  2. An SNR-based method pre-computes an average optimal mid-timestep for injecting the LQ latent.
  3. A Latent Representation Refinement (LRR) loss is introduced to improve the DDPM-based model.
  4. A LoRA-enhanced VAE encoder better approximates the pre-trained noisy latent representation.
  5. OMGSR is a GAN-based Real-ISR framework built around a DINOv3-ConvNeXt discriminator and an enhanced structure-aware (Dv3CD) loss.
  6. The OMGSR-S variant achieves the best performance across multiple metrics.

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!