GAN

Published: 2025-11-06

Updated: 2025-11-27

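⚠️ All summaries below were generated by a large language model; they may contain errors and are for reference only, so use them with caution.
🔴 Note: do not use them in serious academic settings; they are intended only as a first-pass screen before reading the papers.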

Updated 2025-11-06

AI-Generated Image Detection: An Empirical Study and Future Research Directions

Authors: Nusrat Tasnim, Kutub Uddin, Khalid Mahmood Malik

The threats posed by AI-generated media, particularly deepfakes, are now raising significant challenges for multimedia forensics, misinformation detection, and biometric systems, resulting in erosion of public trust in the legal system and a significant increase in fraud and social engineering attacks. Although several forensic methods have been proposed, they suffer from three critical gaps: (i) use of non-standardized benchmarks with GAN- or diffusion-generated images, (ii) inconsistent training protocols (e.g., scratch, frozen, fine-tuning), and (iii) limited evaluation metrics that fail to capture generalization and explainability. These limitations hinder fair comparison, obscure true robustness, and restrict deployment in security-critical applications. This paper introduces a unified benchmarking framework for systematic evaluation of forensic methods under controlled and reproducible conditions. We benchmark ten SoTA forensic methods (scratch, frozen, and fine-tuned) and seven publicly available datasets (GAN and diffusion) to perform extensive and systematic evaluations. We evaluate performance using multiple metrics, including accuracy, average precision, ROC-AUC, error rate, and class-wise sensitivity. We further analyze model interpretability using confidence curves and Grad-CAM heatmaps. Our evaluations demonstrate substantial variability in generalization, with certain methods exhibiting strong in-distribution performance but degraded cross-model transferability. This study aims to guide the research community toward a deeper understanding of the strengths and limitations of current forensic approaches, and to inspire the development of more robust, generalizable, and explainable solutions.

Summary
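AI-generated media, deepfakes in particular, pose major challenges for multimedia forensics, misinformation detection, and biometric systems, eroding public trust in the legal system and driving a marked increase in fraud and social engineering attacks. Existing forensic methods share three critical gaps: non-standardized benchmarks, inconsistent training protocols, and limited evaluation metrics. This study introduces a unified benchmarking framework for evaluating forensic methods systematically. It benchmarks ten state-of-the-art forensic methods on seven public datasets using multiple evaluation metrics. The evaluation shows substantial variability in generalization. The study aims to give the research community a deeper understanding of the strengths and limitations of current forensic methods and to encourage more robust, generalizable, and explainable solutions.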

Key Takeaways

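AI-generated media, deepfakes in particular, pose a major threat to multimedia forensics, misinformation detection, and biometric systems.
Existing forensic methods have three main gaps: non-standardized benchmarks, inconsistent training protocols, and limited evaluation metrics.
The study introduces a unified benchmarking framework for systematic evaluation of forensic methods.
Ten state-of-the-art forensic methods are benchmarked on seven public datasets.
Evaluation metrics include accuracy, average precision, ROC-AUC, error rate, and class-wise sensitivity.
Model interpretability is further analyzed with confidence curves and Grad-CAM heatmaps.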
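
The metrics listed above can be computed from a detector's output scores in a few lines. The sketch below is a minimal NumPy illustration, not the paper's benchmarking code: the function name `detection_metrics`, the 0.5 decision threshold, the label convention (1 = generated, 0 = real), and the toy scores are all assumptions made for the example.

```python
import numpy as np

def detection_metrics(y_true, scores, threshold=0.5):
    """Toy metrics for a real-vs-generated detector (1 = generated, 0 = real).

    Assumes both classes are present in y_true.
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    y_pred = (scores >= threshold).astype(int)

    accuracy = float(np.mean(y_pred == y_true))

    # Class-wise sensitivity: recall computed separately for each class.
    sens_generated = float(np.mean(y_pred[y_true == 1] == 1))
    sens_real = float(np.mean(y_pred[y_true == 0] == 0))

    # ROC-AUC as the probability that a random generated image outscores
    # a random real one (ties count as half).
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    roc_auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    # Average precision: mean precision at the rank of each generated image,
    # walking down the list sorted by descending score.
    hits = y_true[np.argsort(-scores, kind="stable")]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    average_precision = float(np.sum(precision_at_k * hits) / hits.sum())

    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity_generated": sens_generated,
        "sensitivity_real": sens_real,
        "roc_auc": roc_auc,
        "average_precision": average_precision,
    }

# Hypothetical scores for six images; higher means "more likely generated".
metrics = detection_metrics([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.4, 0.2])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy, error rate, and sensitivity depend on the chosen threshold, whereas ROC-AUC and average precision depend only on how the scores rank the images. Reporting both kinds side by side is one way to separate a detector's ranking quality from its calibration when it is moved to an unseen generator.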

Deep Generative Models for Enhanced Vitreous OCT Imaging

Authors: Simone Sarrocco, Philippe C. Cattin, Peter M. Maloca, Paul Friedrich, Philippe Valmaggia

Purpose: To evaluate deep learning (DL) models for enhancing vitreous optical coherence tomography (OCT) image quality and reducing acquisition time. Methods: Conditional Denoising Diffusion Probabilistic Models (cDDPMs), Brownian Bridge Diffusion Models (BBDMs), U-Net, Pix2Pix, and Vector-Quantised Generative Adversarial Network (VQ-GAN) were used to generate high-quality spectral-domain (SD) vitreous OCT images. Inputs were SD ART10 images, and outputs were compared to pseudoART100 images obtained by averaging ten ART10 images per eye location. Model performance was assessed using image quality metrics and Visual Turing Tests, where ophthalmologists ranked generated images and evaluated anatomical fidelity. The best model’s performance was further tested within the manually segmented vitreous on newly acquired data. Results: U-Net achieved the highest Peak Signal-to-Noise Ratio (PSNR: 30.230) and Structural Similarity Index Measure (SSIM: 0.820), followed by cDDPM. For Learned Perceptual Image Patch Similarity (LPIPS), Pix2Pix (0.697) and cDDPM (0.753) performed best. In the first Visual Turing Test, cDDPM ranked highest (3.07); in the second (best model only), cDDPM achieved a 32.9% fool rate and 85.7% anatomical preservation. On newly acquired data, cDDPM generated vitreous regions more similar in PSNR to the ART100 reference than true ART1 or ART10 B-scans and achieved higher PSNR on whole images when conditioned on ART1 than ART10. Conclusions: Results reveal discrepancies between quantitative metrics and clinical evaluation, highlighting the need for combined assessment. cDDPM showed strong potential for generating clinically meaningful vitreous OCT images while reducing acquisition time fourfold. Translational Relevance: cDDPMs show promise for clinical integration, supporting faster, higher-quality vitreous imaging. Dataset and code will be made publicly available.

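Summary

This study evaluates deep learning models for improving vitreous optical coherence tomography (OCT) image quality and reducing acquisition time. Conditional Denoising Diffusion Probabilistic Models (cDDPMs), Brownian Bridge Diffusion Models (BBDMs), U-Net, Pix2Pix, and a Vector-Quantised Generative Adversarial Network (VQ-GAN) were used to generate high-quality spectral-domain (SD) vitreous OCT images. Performance was assessed with image quality metrics and Visual Turing Tests in which ophthalmologists ranked the generated images and rated their anatomical fidelity. U-Net achieved the best Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM), followed by cDDPM. cDDPM ranked highest in the first Visual Turing Test. On newly acquired data, the vitreous regions generated by cDDPM were closer in PSNR to the ART100 reference than true ART1 or ART10 B-scans, and the approach could cut acquisition time fourfold. cDDPM therefore shows strong potential for generating clinically meaningful vitreous OCT images.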

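Key Takeaways

Several deep learning models were used to enhance vitreous OCT image quality and shorten acquisition time.
U-Net achieved the highest PSNR and SSIM, followed by cDDPM.
cDDPM ranked highest (3.07) in the first Visual Turing Test and, in the second, achieved a 32.9% fool rate and 85.7% anatomical preservation.
On newly acquired data, cDDPM-generated vitreous regions were closer in PSNR to the ART100 reference than true ART1 or ART10 B-scans, while acquisition time could be reduced fourfold.
Quantitative metrics and clinical evaluation disagreed, which underlines the need for combined assessment.
cDDPM shows promise for clinical integration, supporting faster, higher-quality vitreous imaging.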
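
PSNR, the metric behind several comparisons in this entry, is defined as 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error against the reference. The sketch below is a minimal NumPy illustration that assumes images scaled to [0, 1] and uses synthetic noise rather than OCT data. The names `average_frames` and `psnr` and the optional `mask` argument are illustrative and not taken from the authors' code; the mask mimics restricting the metric to a segmented region such as the vitreous.

```python
import numpy as np

def average_frames(frames):
    """Pixel-wise mean of repeated acquisitions of the same location."""
    return np.mean(np.stack(frames, axis=0), axis=0)

def psnr(reference, test, data_range=1.0, mask=None):
    """Peak signal-to-noise ratio in dB, optionally restricted to a boolean mask."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    squared_error = (reference - test) ** 2
    if mask is not None:
        squared_error = squared_error[np.asarray(mask, dtype=bool)]
    mse = squared_error.mean()
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Synthetic stand-in for a scan: a "clean" image plus independent noise per frame.
rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.8, size=(64, 64))
frames = [clean + rng.normal(0.0, 0.1, size=clean.shape) for _ in range(10)]
averaged = average_frames(frames)

# Hypothetical region of interest: the top half of the image.
mask = np.zeros_like(clean, dtype=bool)
mask[:32, :] = True

psnr_single = psnr(clean, frames[0])
psnr_averaged = psnr(clean, averaged)
psnr_region = psnr(clean, averaged, mask=mask)
print(f"single frame:        {psnr_single:.2f} dB")
print(f"ten-frame average:   {psnr_averaged:.2f} dB")
print(f"average, masked ROI: {psnr_region:.2f} dB")
```

With independent zero-mean noise, averaging N frames reduces the noise variance by roughly a factor of N, which corresponds to a gain of about 10 dB for ten frames. Real OCT speckle does not satisfy that assumption exactly, so the numbers here only illustrate the mechanics of the metric.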

MediQ-GAN: Quantum-Inspired GAN for High Resolution Medical Image Generation

Authors: Qingyue Jiao, Yongcan Tang, Jun Zhuang, Jason Cong, Yiyu Shi

Machine learning-assisted diagnosis shows promise, yet medical imaging datasets are often scarce, imbalanced, and constrained by privacy, making data augmentation essential. Classical generative models typically demand extensive computational and sample resources. Quantum computing offers a promising alternative, but existing quantum-based image generation methods remain limited in scale and often face barren plateaus. We present MediQ-GAN, a quantum-inspired GAN with prototype-guided skip connections and a dual-stream generator that fuses classical and quantum-inspired branches. Its variational quantum circuits inherently preserve full-rank mappings, avoid rank collapse, and are theory-guided to balance expressivity with trainability. Beyond generation quality, we provide the first latent-geometry and rank-based analysis of quantum-inspired GANs, offering theoretical insight into their performance. Across three medical imaging datasets, MediQ-GAN outperforms state-of-the-art GANs and diffusion models. While validated on IBM hardware for robustness, our contribution is hardware-agnostic, offering a scalable and data-efficient framework for medical image generation and augmentation.

Summary

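Medical imaging datasets are often scarce, imbalanced, and constrained by privacy, which makes data augmentation essential for machine learning-assisted diagnosis. Quantum computing offers a promising alternative to resource-hungry classical generative models, but existing quantum-based image generation methods remain limited in scale and often face barren plateaus. This work presents MediQ-GAN, a quantum-inspired GAN that combines prototype-guided skip connections with a dual-stream generator fusing classical and quantum-inspired branches. Its variational quantum circuits inherently preserve full-rank mappings, avoid rank collapse, and are theory-guided to balance expressivity with trainability. The work also provides a latent-geometry and rank-based analysis of quantum-inspired GANs. Across three medical imaging datasets, MediQ-GAN outperforms state-of-the-art GANs and diffusion models. Although it was validated on IBM hardware for robustness, the contribution is hardware-agnostic and offers a scalable, data-efficient framework for medical image generation and augmentation.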
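
Rank collapse can be made concrete with a small numerical check. The sketch below is an illustration only, not the MediQ-GAN architecture or the paper's analysis. It contrasts a random unitary matrix with a linear map squeezed through a narrow bottleneck: ideal, noise-free quantum circuits act as unitary transformations, and unitary matrices are always full-rank. The helper names `random_unitary` and `numerical_rank`, the matrix size, and the tolerance are assumptions chosen for the example.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary matrix from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def numerical_rank(matrix, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest singular value."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values.max()))

rng = np.random.default_rng(0)
dim = 8

# Unitary map: every singular value equals 1, so no direction is lost.
unitary = random_unitary(dim, rng)

# Bottlenecked map: an 8 -> 2 -> 8 linear chain can have rank at most 2.
bottleneck = rng.normal(size=(dim, 2)) @ rng.normal(size=(2, dim))

print("unitary rank:   ", numerical_rank(unitary))
print("bottleneck rank:", numerical_rank(bottleneck))
```

A generator whose internal maps lose rank can only produce outputs confined to a lower-dimensional subspace. That is one common reading of "rank collapse", and it is offered here as an interpretation rather than as the paper's definition.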

Key Takeaways