
GAN


⚠️ All of the summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: never rely on these for serious academic work; they are only meant as a first-pass filter before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-02-27

IG-CFAT: An Improved GAN-Based Framework for Effectively Exploiting Transformers in Real-World Image Super-Resolution

Authors:Alireza Aghelan, Ali Amiryan, Abolfazl Zarghani, Modjtaba Rouhani

In the field of single image super-resolution (SISR), transformer-based models have demonstrated significant advancements. However, the potential and efficiency of these models in applied fields such as real-world image super-resolution have received less attention, and there are substantial opportunities for improvement. Recently, the composite fusion attention transformer (CFAT) outperformed previous state-of-the-art (SOTA) models in classic image super-resolution. In this paper, we propose a novel GAN-based framework that incorporates the CFAT model to effectively exploit the performance of transformers in real-world image super-resolution. In our proposed approach, we integrate a semantic-aware discriminator to reconstruct fine details more accurately and employ an adaptive degradation model to better simulate real-world degradations. Moreover, we introduce a new combination of loss functions by adding a wavelet loss to the loss functions of GAN-based models to better recover high-frequency details. Empirical results demonstrate that IG-CFAT significantly outperforms existing SOTA models in both quantitative and qualitative metrics. Our proposed model revolutionizes the field of real-world image super-resolution and demonstrates substantially better performance in recovering fine details and generating realistic textures. The introduction of IG-CFAT offers a robust and adaptable solution for real-world image super-resolution tasks.


Paper and Project Links

PDF

Summary
In single image super-resolution (SISR), transformer-based models have made notable progress, but their potential and efficiency in applied settings such as real-world image super-resolution remain under-explored, leaving substantial room for improvement. This paper proposes a novel GAN-based framework that incorporates the composite fusion attention transformer (CFAT) to effectively exploit transformers for real-world image super-resolution. By introducing a semantic-aware discriminator, an adaptive degradation model, and a new loss combination that includes a wavelet loss, the framework reconstructs fine details more accurately, simulates real-world degradations more faithfully, and significantly outperforms existing state-of-the-art models on both quantitative and qualitative metrics.
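
To make the degradation side concrete, below is a hedged sketch of what such an adaptive degradation pipeline could look like, in the spirit of common real-world SR practice (blur, downsampling, noise, JPEG compression with randomized strengths). The function name, parameter ranges, and degradation order are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch of a randomized real-world degradation pipeline;
# all parameter ranges here are assumptions, not the paper's settings.
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(hr: Image.Image, scale: int = 4) -> Image.Image:
    """Blur -> downsample -> Gaussian noise -> JPEG, with random strengths."""
    img = hr.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.2, 2.0)))
    img = img.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0, random.uniform(1, 10), arr.shape)  # sensor-like noise
    img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=random.randint(40, 95))  # compression artifacts
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```

Randomizing the strengths per sample exposes the generator to a broad degradation distribution rather than a single fixed corruption, which is the usual motivation for adaptive degradation models.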

Key Takeaways

  1. Transformer-based models have made significant progress in single image super-resolution.
  2. Real-world image super-resolution still leaves substantial room for improvement, where transformer-based models hold great potential.
  3. A GAN-based framework incorporating the CFAT model is proposed to improve real-world super-resolution performance.
  4. A semantic-aware discriminator reconstructs fine details more accurately.
  5. An adaptive degradation model better simulates real-world degradations.
  6. A loss combination that adds a wavelet loss helps recover high-frequency details (see the sketch after this list).
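
As referenced in takeaway 6, here is a minimal sketch of a wavelet loss, assuming a single-level Haar transform and an L1 distance per sub-band. The paper does not specify its exact wavelet basis or weighting scheme, so `haar_dwt`, `wavelet_loss`, and `hf_weight` are illustrative names and choices.

```python
# A minimal wavelet-loss sketch, assuming a single-level Haar transform;
# the paper's actual basis and band weights may differ.
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level Haar decomposition of an NCHW tensor into
    LL, LH, HL, HH sub-bands (up to a constant scale factor)."""
    a, b = x[..., ::2, ::2], x[..., ::2, 1::2]
    c, d = x[..., 1::2, ::2], x[..., 1::2, 1::2]
    ll, lh = (a + b + c + d) / 4, (a + b - c - d) / 4
    hl, hh = (a - b + c - d) / 4, (a - b - c + d) / 4
    return ll, lh, hl, hh

def wavelet_loss(sr, hr, hf_weight=1.0):
    """L1 distance between Haar sub-bands of the super-resolved (sr)
    and ground-truth (hr) images, up-weighting high-frequency bands."""
    weights = (1.0, hf_weight, hf_weight, hf_weight)  # LL, LH, HL, HH
    return sum(w * F.l1_loss(s, h)
               for w, s, h in zip(weights, haar_dwt(sr), haar_dwt(hr)))

# Usage: add this term to the usual GAN + pixel + perceptual losses.
sr, hr = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
print(wavelet_loss(sr, hr, hf_weight=2.0))
```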

Cool Papers

Click here to view paper screenshots

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Authors:Yihong Luo, Xiaolong Chen, Xinghua Qu, Tianyang Hu, Jing Tang

Recently, some works have tried to combine diffusion and Generative Adversarial Networks (GANs) to alleviate the computational cost of the iterative denoising inference in Diffusion Models (DMs). However, existing works in this line suffer from either training instability and mode collapse or subpar one-step generation learning efficiency. To address these issues, we introduce YOSO, a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis with high training stability and mode coverage. Specifically, we smooth the adversarial divergence by the denoising generator itself, performing self-cooperative learning. We show that our method can serve as a one-step generation model trained from scratch with competitive performance. Moreover, we extend our YOSO to one-step text-to-image generation based on pre-trained models via several effective training techniques (i.e., latent perceptual loss and latent discriminator for efficient training along with the latent DMs; the informative prior initialization (IPI), and the quick adaption stage for fixing the flawed noise scheduler). Experimental results show that YOSO achieves the state-of-the-art one-step generation performance even with Low-Rank Adaptation (LoRA) fine-tuning. In particular, we show that the YOSO-PixArt-$\alpha$ can generate images in one step trained on 512 resolution, with the capability of adapting to 1024 resolution without extra explicit training, requiring only ~10 A800 days for fine-tuning. Our code is provided at https://github.com/Luo-Yihong/YOSO.


Paper and Project Links

PDF ICLR 2025

Summary

To reduce the computational cost of iterative denoising inference in diffusion models, recent work has explored combining diffusion with generative adversarial networks (GANs). This paper introduces YOSO, a novel generative model for fast, scalable, and high-fidelity one-step image synthesis. YOSO smooths the adversarial divergence with the denoising generator itself, performing self-cooperative learning that improves training stability and mode coverage. It can serve as a competitive one-step generation model trained from scratch, and also extends to one-step text-to-image generation built on pre-trained models. Experiments show that YOSO achieves state-of-the-art one-step generation performance even with Low-Rank Adaptation (LoRA) fine-tuning, and can adapt to 1024-resolution generation without extra explicit training. The code is publicly available on GitHub.
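
The abstract's one-sentence description of "smoothing the adversarial divergence by the denoising generator itself" suggests a training step roughly like the conceptual sketch below. Everything here, including the function names, the non-saturating losses, and the use of the generator's own prediction from a less-noisy input as the "real" side, is inferred from the abstract, not taken from the authors' code.

```python
# Conceptual sketch of self-cooperative adversarial training, inferred
# from the abstract only; G, D, and the loss form are assumptions.
import torch
import torch.nn.functional as F

def add_noise(x0, noise, alpha_bar):
    """Standard diffusion forward process: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps."""
    return alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise

def self_cooperative_losses(G, D, x0, ab_t, ab_s):
    """One hypothetical step: the generator's own prediction from a *less*
    noisy input (timestep s < t) acts as the 'real' side of the GAN loss,
    smoothing the adversarial divergence (self-cooperative learning)."""
    eps = torch.randn_like(x0)
    x_t = add_noise(x0, eps, ab_t)   # harder, more corrupted input
    x_s = add_noise(x0, eps, ab_s)   # easier, less corrupted input

    fake = G(x_t)                    # one-step prediction from x_t
    with torch.no_grad():
        target = G(x_s)              # generator's own cleaner prediction

    # Non-saturating GAN losses against the self-provided target.
    d_loss = F.softplus(D(fake.detach())).mean() + F.softplus(-D(target)).mean()
    g_loss = F.softplus(-D(fake)).mean()
    return d_loss, g_loss
```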

Key Takeaways

  1. YOSO combines diffusion models and generative adversarial networks (GANs) to reduce the computational cost of iterative denoising inference.
  2. Self-cooperative learning improves YOSO's training stability and mode coverage.
  3. YOSO can be trained from scratch as a competitive one-step generation model.
  4. YOSO extends to one-step text-to-image generation based on pre-trained models.
  5. YOSO achieves state-of-the-art one-step generation performance with Low-Rank Adaptation (LoRA) fine-tuning (a minimal LoRA sketch follows this list).
  6. YOSO adapts to 1024-resolution generation without extra explicit training.
  7. The code is publicly available on GitHub.
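
As noted in takeaway 5, LoRA fine-tunes a pre-trained model by learning a low-rank update to frozen weights. Below is a minimal, self-contained PyTorch sketch of the technique itself, independent of YOSO's actual codebase; the rank and scaling values are illustrative.

```python
# A minimal LoRA sketch: y = W x + (alpha / r) * B(A(x)), with W frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # pre-trained weights stay frozen
        self.down = nn.Linear(base.in_features, r, bias=False)
        self.up = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)           # start as an identity update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Usage: wrap, e.g., an attention projection of a pre-trained generator.
layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)
```

Only the small `down`/`up` matrices receive gradients, which is why LoRA fine-tuning is far cheaper than full fine-tuning of the base model.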

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!