GAN

发布日期: 2025-10-21

更新日期: 2025-11-27

文章字数: 2.3k

阅读时长: 9 分

阅读次数:

⚠️ 以下所有内容总结都来自于大语言模型的能力，如有错误，仅供参考，谨慎使用
🔴 请注意：千万不要用于严肃的学术场景，只能用于论文阅读前的初筛！
💗 如果您觉得我们的项目对您有帮助 ChatPaperFree ，还请您给我们一些鼓励！⭐️ HuggingFace免费体验

2025-10-21 更新

Dataset Distillation for Super-Resolution without Class Labels and Pre-trained Models

Authors:Sunwoo Cho, Yejin Jung, Nam Ik Cho, Jae Woong Soh

Training deep neural networks has become increasingly demanding, requiring large datasets and significant computational resources, especially as model complexity advances. Data distillation methods, which aim to improve data efficiency, have emerged as promising solutions to this challenge. In the field of single image super-resolution (SISR), the reliance on large training datasets highlights the importance of these techniques. Recently, a generative adversarial network (GAN) inversion-based data distillation framework for SR was proposed, showing potential for better data utilization. However, the current method depends heavily on pre-trained SR networks and class-specific information, limiting its generalizability and applicability. To address these issues, we introduce a new data distillation approach for image SR that does not need class labels or pre-trained SR models. In particular, we first extract high-gradient patches and categorize images based on CLIP features, then fine-tune a diffusion model on the selected patches to learn their distribution and synthesize distilled training images. Experimental results show that our method achieves state-of-the-art performance while using significantly less training data and requiring less computational time. Specifically, when we train a baseline Transformer model for SR with only 0.68% of the original dataset, the performance drop is just 0.3 dB. In this case, diffusion model fine-tuning takes 4 hours, and SR model training completes within 1 hour, much shorter than the 11-hour training time with the full dataset.

训练深度神经网络的要求越来越高，需要大量的数据集和重要的计算资源，尤其是随着模型复杂性的提高。旨在提高数据效率的数据蒸馏方法已成为应对这一挑战的有前途的解决方案。在单图像超分辨率（SISR）领域，对大规模训练数据集的依赖突显了这些技术的重要性。最近，提出了一种基于生成对抗网络（GAN）反演的用于SR的数据蒸馏框架，显示出其在数据利用方面的潜力。然而，当前的方法严重依赖于预训练的SR网络和特定类别的信息，这限制了其通用性和适用性。为了解决这些问题，我们介绍了一种新的用于图像SR的数据蒸馏方法，该方法不需要类标签或预训练的SR模型。具体来说，我们首先提取高梯度补丁并根据CLIP特征对图像进行分类，然后对选定的补丁微调扩散模型，以学习其分布并合成蒸馏的训练图像。实验结果表明，我们的方法在使用显著更少训练数据和更短计算时间的情况下实现了最先进的性能。具体来说，当我们仅使用原始数据集的0.68%来训练一个基准Transformer模型进行SR时，性能下降仅为0.3 dB。在这种情况下，扩散模型微调需要4小时，SR模型训练在1小时内完成，远远短于使用完整数据集的11小时训练时间。

Summary
深度学习训练对大数据集和计算资源的需求日益增长，尤其在模型复杂性提升的情况下。数据蒸馏方法旨在提高数据效率，作为解决这一挑战的有前途的解决方案。针对单图像超分辨率（SISR）领域，新的基于GAN反转的蒸馏框架展示了更好的数据利用潜力。然而，当前方法严重依赖于预训练的SR网络和类别特定信息，限制了其通用性和适用性。为解决这些问题，我们提出了一种新的图像SR数据蒸馏方法，无需类标签或预训练的SR模型。我们通过提取高梯度补丁、基于CLIP特征分类图像，然后微调扩散模型来合成蒸馏训练图像。实验表明，该方法在减少训练数据和计算时间的同时，达到了最先进的性能。

Key Takeaways

深度学习训练需要大量数据集和计算资源，尤其在模型复杂性提升时。
数据蒸馏方法能提高数据效率，是应对这一挑战的有前途的解决方案。
当前GAN反转蒸馏框架在单图像超分辨率（SISR）领域有应用潜力，但依赖预训练模型和类别信息，限制了其通用性。
新的数据蒸馏方法无需类标签或预训练SR模型，通过提取高梯度补丁和基于CLIP特征分类图像来合成蒸馏训练图像。
实验结果显示，新方法在减少训练数据和计算时间的同时达到了最先进的性能。
对比实验显示，使用仅0.68%的原始数据集训练基准Transformer模型时，性能下降仅为0.3 dB。

Cool Papers

点此查看论文截图

Methods and Trends in Detecting AI-Generated Images: A Comprehensive Review

Authors:Arpan Mahara, Naphtali Rishe

The proliferation of generative models, such as Generative Adversarial Networks (GANs), Diffusion Models, and Variational Autoencoders (VAEs), has enabled the synthesis of high-quality multimedia data. However, these advancements have also raised significant concerns regarding adversarial attacks, unethical usage, and societal harm. Recognizing these challenges, researchers have increasingly focused on developing methodologies to detect synthesized data effectively, aiming to mitigate potential risks. Prior reviews have predominantly focused on deepfake detection and often overlook recent advancements in synthetic image forensics, particularly approaches that incorporate multimodal frameworks, reasoning-based detection, and training-free methodologies. To bridge this gap, this survey provides a comprehensive and up-to-date review of state-of-the-art techniques for detecting and classifying synthetic images generated by advanced generative AI models. The review systematically examines core detection paradigms, categorizes them into spatial-domain, frequency-domain, fingerprint-based, patch-based, training-free, and multimodal reasoning-based frameworks, and offers concise descriptions of their underlying principles. We further provide detailed comparative analyses of these methods on publicly available datasets to assess their generalizability, robustness, and interpretability. Finally, the survey highlights open challenges and future directions, emphasizing the potential of hybrid frameworks that combine the efficiency of training-free approaches with the semantic reasoning of multimodal models to advance trustworthy and explainable synthetic image forensics.

生成模型（如生成对抗网络（GANs）、扩散模型和变分自编码器（VAEs））的激增已经能够实现高质量多媒体数据的合成。然而，这些进展也引发了关于对抗性攻击、不道德使用和社会危害的严重关注。认识到这些挑战，研究人员越来越专注于开发有效检测合成数据的方法，旨在减轻潜在风险。之前的审查主要集中在深度伪造检测上，往往忽视了合成图像取证领域的最新进展，特别是采用多模态框架、基于推理的检测和免训练方法。为了填补这一空白，这篇综述提供了检测和分类由先进生成式人工智能模型生成的合成图像的最先进技术的全面和最新概述。该综述系统地检验了核心检测范式，将它们分类为基于空间域、频率域、指纹、补丁、免训练和多模态推理的框架，并简洁地描述了它们的基本原理。我们进一步在公开数据集上对这些方法进行了详细的比较分析，以评估它们的通用性、稳健性和可解释性。最后，综述强调了开放挑战和未来方向，强调混合框架的潜力，该框架结合了免训练方法的效率和多模态模型的语义推理，以推动可信和可解释的合成图像取证。

论文及项目相关链接

PDF 34 pages, 4 Figures, 10 Tables

Summary

本文综述了当前用于检测与分类由先进生成式人工智能模型生成的合成图像的最新技术。文章系统梳理了核心检测范式，包括空间域、频率域、基于指纹、基于补丁、无需训练和多模态推理框架等，并提供了它们的基本原理的简洁描述。此外，文章还在公开数据集上对这些方法进行了详细的比较分析，以评估它们的泛化性、鲁棒性和可解释性。最后，文章强调了混合框架的潜力，该框架结合了无需训练的效率和多模态模型的语义推理，有望推动可信和可解释的合成图像取证技术的发展。

Key Takeaways