⚠️ All of the summaries below are generated by a large language model and may contain errors. They are for reference only; use with caution.
🔴 Please note: never use these summaries in serious academic settings. They are only meant for a first-pass screening before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-09-24
HyPlaneHead: Rethinking Tri-plane-like Representations in Full-Head Image Synthesis
Authors: Heyuan Li, Kenkun Liu, Lingteng Qiu, Qi Zuo, Keru Zheng, Zilong Dong, Xiaoguang Han
Tri-plane-like representations have been widely adopted in 3D-aware GANs for head image synthesis and other 3D object/scene modeling tasks due to their efficiency. However, querying features via Cartesian coordinate projection often leads to feature entanglement, which results in mirroring artifacts. A recent work, SphereHead, attempted to address this issue by introducing spherical tri-planes based on a spherical coordinate system. While it successfully mitigates feature entanglement, SphereHead suffers from uneven mapping between the square feature maps and the spherical planes, leading to inefficient feature map utilization during rendering and difficulties in generating fine image details. Moreover, both tri-plane and spherical tri-plane representations share a subtle yet persistent issue: feature penetration across convolutional channels can cause interference between planes, particularly when one plane dominates the others. These challenges collectively prevent tri-plane-based methods from reaching their full potential. In this paper, we systematically analyze these problems for the first time and propose innovative solutions to address them. Specifically, we introduce a novel hybrid-plane (hy-plane for short) representation that combines the strengths of both planar and spherical planes while avoiding their respective drawbacks. We further enhance the spherical plane by replacing the conventional theta-phi warping with a novel near-equal-area warping strategy, which maximizes the effective utilization of the square feature map. In addition, our generator synthesizes a single-channel unified feature map instead of multiple feature maps in separate channels, thereby effectively eliminating feature penetration. With a series of technical improvements, our hy-plane representation enables our method, HyPlaneHead, to achieve state-of-the-art performance in full-head image synthesis.
Tri-plane-like representations are widely used in 3D-aware GANs for head image synthesis and other 3D object/scene modeling tasks because of their efficiency. However, querying features via Cartesian coordinate projection often leads to feature entanglement, which produces mirroring artifacts. A recent work, SphereHead, tried to address this problem by introducing spherical tri-planes based on a spherical coordinate system. Although it successfully mitigates feature entanglement, SphereHead suffers from an uneven mapping between the square feature maps and the spherical planes, which lowers the effective utilization of the feature maps during rendering and makes it difficult to generate fine image details. Moreover, both the tri-plane and the spherical tri-plane representations share a subtle but persistent issue: feature penetration across convolutional channels can cause interference between the planes, especially when one plane dominates the others. Together, these challenges keep tri-plane-based methods from reaching their full potential. In this paper, we systematically analyze these problems for the first time and propose solutions for them. Specifically, we introduce a novel hybrid-plane (hy-plane for short) representation that combines the strengths of planar and spherical planes while avoiding their respective drawbacks. We further enhance the spherical plane by replacing the conventional theta-phi warping with a near-equal-area warping strategy that maximizes the effective use of the square feature map. In addition, our generator synthesizes a single-channel unified feature map instead of multiple feature maps in separate channels, which effectively eliminates feature penetration. With this series of technical improvements, the hy-plane representation enables our method, HyPlaneHead, to achieve state-of-the-art performance in full-head image synthesis.
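To make the feature-entanglement discussion above concrete, here is a minimal sketch (not the authors' code) of the standard EG3D-style tri-plane query: a 3D point is orthogonally projected onto the XY, XZ, and YZ feature planes, each plane is sampled bilinearly, and the three features are aggregated. Because (x, y, z) and (x, y, -z) hit the same location on the XY plane, the planes are pushed to share features across mirrored positions, which is one way the mirroring artifacts described above can arise. The tensor shapes and the summation aggregation are assumptions for illustration.

```python
# A minimal sketch (not the authors' code) of the standard tri-plane query used in
# EG3D-style 3D-aware GANs: each 3D point is orthogonally projected onto the XY, XZ
# and YZ feature planes, bilinearly sampled, and the three features are aggregated.
# Note that (x, y, z) and (x, y, -z) hit the same location on the XY plane, which is
# one source of the feature entanglement / mirroring artifacts discussed above.
import torch
import torch.nn.functional as F

def query_triplane(planes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """
    planes: (3, C, H, W) feature maps for the XY, XZ, YZ planes.
    points: (N, 3) coordinates normalized to [-1, 1].
    returns: (N, C) aggregated features (summed over the three planes).
    """
    coords = torch.stack(
        [points[:, [0, 1]],   # XY plane uses (x, y)
         points[:, [0, 2]],   # XZ plane uses (x, z)
         points[:, [1, 2]]],  # YZ plane uses (y, z)
        dim=0,
    )                                         # (3, N, 2)
    grid = coords.unsqueeze(1)                # (3, 1, N, 2) for grid_sample
    sampled = F.grid_sample(planes, grid, mode="bilinear", align_corners=False)
    return sampled.squeeze(2).sum(dim=0).t()  # (N, C)

# Example: 32-channel planes of size 64x64, 1000 query points.
feats = query_triplane(torch.randn(3, 32, 64, 64), torch.rand(1000, 3) * 2 - 1)
```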
Paper and project links
PDF (Accepted by NeurIPS 2025)
Summary
This paper looks at tri-plane-like representations, which are widely adopted in 3D-aware GANs for head image synthesis and other 3D object/scene modeling tasks. Querying features via Cartesian coordinate projection, however, often causes feature entanglement and hence mirroring artifacts. The recent SphereHead tried to solve this with spherical tri-planes built on a spherical coordinate system; while it alleviates feature entanglement, it suffers from uneven mapping between the square feature maps and the spherical planes, inefficient feature-map utilization during rendering, and difficulty generating fine image details. The paper systematically analyzes these shortcomings of the tri-plane and spherical tri-plane representations and, for the first time, proposes solutions: a hybrid-plane (hy-plane) representation that combines the advantages of planar and spherical planes while avoiding their drawbacks, a near-equal-area warping strategy that replaces conventional theta-phi warping to improve utilization of the square feature map, and a generator that synthesizes a single-channel unified feature map, effectively eliminating feature penetration across channels. With these improvements, the hy-plane representation enables HyPlaneHead to reach state-of-the-art performance in full-head image synthesis.
Key Takeaways
- Tri-plane-like representations are widely used in 3D-aware GANs for 3D object/scene modeling tasks due to their efficiency.
- Feature entanglement caused by Cartesian coordinate projection querying leads to mirroring artifacts.
- SphereHead addresses feature entanglement but faces challenges of uneven mapping and inefficient feature map utilization.
- The proposed hy-plane representation combines the strengths of planar and spherical planes, addressing the respective drawbacks.
- A novel near-equal-area warping strategy maximizes the effective utilization of square feature maps (see the warping sketch after this list).
- The generator’s single-channel unified feature map effectively eliminates feature penetration.
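The near-equal-area warping mentioned in the takeaways above can be illustrated with a generic stand-in. The paper's exact warping scheme is not reproduced here; the sketch below simply contrasts naive theta-phi mapping with a Lambert cylindrical equal-area projection (u = azimuth, v = sin(elevation)), under which equal solid angles occupy equal area on the square feature map instead of over-sampling the poles. The function names and the [-1, 1] coordinate convention are assumptions.

```python
# A sketch of the general idea behind equal-area warping (not the paper's exact
# "near-equal-area" scheme): map a direction on the sphere to square feature-map
# coordinates with a Lambert cylindrical equal-area projection (u = azimuth,
# v = sin(elevation)). Unlike plain theta-phi mapping, equal solid angles then
# occupy equal area on the feature map, so fewer texels are wasted near the poles.
import numpy as np

def theta_phi_uv(direction: np.ndarray) -> np.ndarray:
    """Naive warping: u = azimuth, v = elevation (over-samples the poles)."""
    x, y, z = direction.T
    theta = np.arctan2(y, x)              # azimuth in (-pi, pi]
    phi = np.arcsin(np.clip(z, -1, 1))    # elevation in [-pi/2, pi/2]
    return np.stack([theta / np.pi, phi / (np.pi / 2)], axis=-1)   # in [-1, 1]^2

def equal_area_uv(direction: np.ndarray) -> np.ndarray:
    """Lambert cylindrical equal-area warping: u = azimuth, v = sin(elevation)."""
    x, y, z = direction.T
    theta = np.arctan2(y, x)
    return np.stack([theta / np.pi, np.clip(z, -1, 1)], axis=-1)   # in [-1, 1]^2

# Example: random unit directions; both functions return [-1, 1]^2 coordinates
# that could be fed to a bilinear lookup on a square feature map.
d = np.random.randn(1000, 3)
d /= np.linalg.norm(d, axis=1, keepdims=True)
uv_naive, uv_equal_area = theta_phi_uv(d), equal_area_uv(d)
```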
Click here to view paper screenshots




DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching
Authors: Meng Yang, Fan Fan, Zizhuo Li, Songchu Deng, Yong Ma, Jiayi Ma
Multimodal image matching seeks pixel-level correspondences between images of different modalities, crucial for cross-modal perception, fusion and analysis. However, the significant appearance differences between modalities make this task challenging. Due to the scarcity of high-quality annotated datasets, existing deep learning methods that extract modality-common features for matching perform poorly and lack adaptability to diverse scenarios. Vision Foundation Model (VFM), trained on large-scale data, yields generalizable and robust feature representations adapted to data and tasks of various modalities, including multimodal matching. Thus, we propose DistillMatch, a multimodal image matching method using knowledge distillation from VFM. DistillMatch employs knowledge distillation to build a lightweight student model that extracts high-level semantic features from VFM (including DINOv2 and DINOv3) to assist matching across modalities. To retain modality-specific information, it extracts and injects modality category information into the other modality’s features, which enhances the model’s understanding of cross-modal correlations. Furthermore, we design V2I-GAN to boost the model’s generalization by translating visible to pseudo-infrared images for data augmentation. Experiments show that DistillMatch outperforms existing algorithms on public datasets.
Multimodal image matching aims to find pixel-level correspondences between images of different modalities, which is crucial for cross-modal perception, fusion, and analysis. However, the large appearance differences between modalities make the task challenging. Because high-quality annotated datasets are scarce, existing deep learning methods that extract modality-common features for matching perform poorly and lack adaptability to diverse scenarios. A Vision Foundation Model (VFM), trained on large-scale data, produces generalizable and robust feature representations that adapt to data and tasks of various modalities, including multimodal matching. We therefore propose DistillMatch, a multimodal image matching method based on knowledge distillation from a VFM. DistillMatch uses knowledge distillation to build a lightweight student model that extracts high-level semantic features from the VFM (including DINOv2 and DINOv3) to assist matching across modalities. To retain modality-specific information, it extracts modality category information and injects it into the other modality's features, strengthening the model's understanding of cross-modal correlations. In addition, we design V2I-GAN, which translates visible images into pseudo-infrared images for data augmentation, improving the model's generalization. Experiments show that DistillMatch outperforms existing algorithms on public datasets.
Paper and project links
PDF (10 pages, 4 figures, 3 tables)
Summary
A Vision Foundation Model (VFM) trained on large-scale data can support effective multimodal image matching. To tackle the difficulty of cross-modal matching, the authors propose DistillMatch, which uses knowledge distillation to build a lightweight student model from the VFM that extracts high-level semantic features for matching. Modality category information is injected to strengthen the model's understanding of cross-modal correlations, and V2I-GAN improves generalization. Experiments show that DistillMatch outperforms existing algorithms on public datasets.
Key Takeaways
- Multimodal image matching faces the challenge of establishing pixel-level correspondences between images of different modalities.
- A Vision Foundation Model (VFM) provides generalizable, robust feature representations that adapt to many modalities.
- DistillMatch learns from the VFM through knowledge distillation and is well suited to multimodal matching (a minimal distillation sketch follows this list).
- DistillMatch builds a lightweight student model that extracts high-level semantic features and injects modality category information to strengthen cross-modal matching.
- V2I-GAN improves the model's generalization via data augmentation that translates visible images into pseudo-infrared images.
- DistillMatch outperforms existing algorithms on public datasets.
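As a rough illustration of the knowledge-distillation idea summarized above, the sketch below trains a small convolutional student so that its output feature map matches frozen teacher features, e.g. DINOv2/DINOv3 patch tokens reshaped into a grid. The teacher features are assumed to be precomputed; the backbone layout, the cosine loss, and all shapes are illustrative assumptions, not DistillMatch's actual architecture.

```python
# A minimal sketch (assumptions, not the paper's code) of VFM feature distillation:
# a lightweight student backbone is trained so that its feature map matches frozen
# teacher features (e.g. DINOv2/DINOv3 patch tokens reshaped to a grid).
# `teacher_feats` is assumed to be precomputed with the frozen VFM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentBackbone(nn.Module):
    """A small convolutional student that outputs a coarse semantic feature map."""
    def __init__(self, out_dim: int = 384):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, out_dim, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)                      # (B, out_dim, H/8, W/8)

def distillation_loss(student_feats, teacher_feats):
    """Cosine distillation loss between student and teacher feature maps.

    student_feats: (B, D, Hs, Ws); teacher_feats: (B, D, Ht, Wt).
    The student map is resized to the teacher grid before comparison.
    """
    student_feats = F.interpolate(
        student_feats, size=teacher_feats.shape[-2:],
        mode="bilinear", align_corners=False)
    return (1 - F.cosine_similarity(student_feats, teacher_feats, dim=1)).mean()

# Example with random tensors standing in for an image batch and frozen VFM features.
student = StudentBackbone(out_dim=384)
images = torch.randn(2, 3, 224, 224)
teacher_feats = torch.randn(2, 384, 16, 16)     # e.g. a 16x16 patch-token grid
loss = distillation_loss(student(images), teacher_feats)
loss.backward()
```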
Click here to view paper screenshots





QWD-GAN: Quality-aware Wavelet-driven GAN for Unsupervised Medical Microscopy Images Denoising
Authors: Qijun Yang, Yating Huang, Lintao Xiang, Hujun Yin
Image denoising plays a critical role in biomedical and microscopy imaging, especially when acquiring wide-field fluorescence-stained images. This task faces challenges in multiple fronts, including limitations in image acquisition conditions, complex noise types, algorithm adaptability, and clinical application demands. Although many deep learning-based denoising techniques have demonstrated promising results, further improvements are needed in preserving image details, enhancing algorithmic efficiency, and increasing clinical interpretability. We propose an unsupervised image denoising method based on a Generative Adversarial Network (GAN) architecture. The approach introduces a multi-scale adaptive generator based on the Wavelet Transform and a dual-branch discriminator that integrates difference perception feature maps with original features. Experimental results on multiple biomedical microscopy image datasets show that the proposed model achieves state-of-the-art denoising performance, particularly excelling in the preservation of high-frequency information. Furthermore, the dual-branch discriminator is seamlessly compatible with various GAN frameworks. The proposed quality-aware, wavelet-driven GAN denoising model is termed as QWD-GAN.
Image denoising plays a critical role in biomedical and microscopy imaging, especially when acquiring wide-field fluorescence-stained images. The task faces challenges on multiple fronts, including limitations in image acquisition conditions, complex noise types, algorithm adaptability, and clinical application demands. Although many deep learning-based denoising techniques have shown promising results, further improvements are needed in preserving image details, improving algorithmic efficiency, and increasing clinical interpretability. We propose an unsupervised image denoising method based on a generative adversarial network (GAN) architecture. The approach introduces a multi-scale adaptive generator based on the wavelet transform and a dual-branch discriminator that integrates difference-perception feature maps with the original features. Experimental results on multiple biomedical microscopy image datasets show that the proposed model achieves state-of-the-art denoising performance and is particularly strong at preserving high-frequency information. Furthermore, the dual-branch discriminator is seamlessly compatible with various GAN frameworks. The proposed quality-aware, wavelet-driven GAN denoising model is named QWD-GAN.
Paper and project links
Summary
A GAN-based method for denoising medical microscopy images. The authors combine a multi-scale adaptive generator with a dual-branch discriminator; experiments show the model reaches state-of-the-art denoising performance and is particularly good at preserving high-frequency information. The discriminator is compatible with a variety of GAN frameworks, and the model is named QWD-GAN.
Key Takeaways
- Image denoising is critical in biomedical and microscopy imaging, especially for wide-field fluorescence-stained images, where challenges include limited acquisition conditions, complex noise types, algorithm adaptability, and clinical application demands.
- Although many deep learning-based denoising techniques have shown promising results, further improvements are needed in preserving image details, improving algorithmic efficiency, and increasing clinical interpretability.
- An unsupervised image denoising method based on a GAN architecture is proposed.
- The method introduces a wavelet-transform-based multi-scale adaptive generator and a dual-branch discriminator that combines difference-perception feature maps with the original features (a wavelet-decomposition sketch follows this list).
- Experiments show the proposed model achieves state-of-the-art denoising performance and excels at preserving high-frequency information.
- The dual-branch discriminator is seamlessly compatible with various GAN frameworks.
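To give a concrete sense of the wavelet-driven part of the summary above, here is a minimal single-level 2D Haar decomposition of the kind a multi-scale generator could consume: it splits an image into a low-frequency band (LL) and three high-frequency detail bands (LH, HL, HH) that carry the fine structure a denoiser should preserve. This is a generic sketch, not QWD-GAN's implementation.

```python
# A sketch (not the authors' implementation) of a single-level 2D Haar wavelet
# decomposition of the kind a wavelet-driven generator could use to separate a
# noisy micrograph into a low-frequency band (LL) and high-frequency detail bands
# (LH, HL, HH) that carry most of the fine structure to be preserved.
import torch

def haar_dwt2(x: torch.Tensor):
    """x: (B, C, H, W) with even H and W. Returns (LL, LH, HL, HH), each (B, C, H/2, W/2)."""
    a = x[:, :, 0::2, 0::2]   # top-left of each 2x2 block
    b = x[:, :, 0::2, 1::2]   # top-right
    c = x[:, :, 1::2, 0::2]   # bottom-left
    d = x[:, :, 1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2  # responds to horizontal edges
    hl = (a - b + c - d) / 2  # responds to vertical edges
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    x = torch.zeros(ll.shape[0], ll.shape[1], ll.shape[2] * 2, ll.shape[3] * 2,
                    dtype=ll.dtype, device=ll.device)
    x[:, :, 0::2, 0::2], x[:, :, 0::2, 1::2] = a, b
    x[:, :, 1::2, 0::2], x[:, :, 1::2, 1::2] = c, d
    return x

# Round-trip check on a random "image" batch.
img = torch.randn(1, 1, 64, 64)
assert torch.allclose(haar_idwt2(*haar_dwt2(img)), img, atol=1e-6)
```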
Click here to view paper screenshots



Causal Fingerprints of AI Generative Models
Authors: Hui Xu, Chi Liu, Congcong Zhu, Minghao Wang, Youyang Qu, Longxiang Gao
AI generative models leave implicit traces in their generated images, which are commonly referred to as model fingerprints and are exploited for source attribution. Prior methods rely on model-specific cues or synthesis artifacts, yielding limited fingerprints that may generalize poorly across different generative models. We argue that a complete model fingerprint should reflect the causality between image provenance and model traces, a direction largely unexplored. To this end, we conceptualize the "causal fingerprint" of generative models, and propose a causality-decoupling framework that disentangles it from image-specific content and style in a semantic-invariant latent space derived from pre-trained diffusion reconstruction residual. We further enhance fingerprint granularity with diverse feature representations. We validate causality by assessing attribution performance across representative GANs and diffusion models and by achieving source anonymization using counterfactual examples generated from causal fingerprints. Experiments show our approach outperforms existing methods in model attribution, indicating strong potential for forgery detection, model copyright tracing, and identity protection.
AI generative models leave implicit traces in the images they generate; these are commonly called model fingerprints and are exploited for source attribution. Prior methods rely on model-specific cues or synthesis artifacts, yielding limited fingerprints that may generalize poorly across different generative models. We argue that a complete model fingerprint should reflect the causality between image provenance and model traces, a direction that remains largely unexplored. To this end, we propose the concept of the "causal fingerprint" of generative models and build a causality-decoupling framework that disentangles it from image-specific content and style in a semantic-invariant latent space derived from pre-trained diffusion reconstruction residuals. We further enhance fingerprint granularity with diverse feature representations. We validate causality by assessing attribution performance across representative GANs and diffusion models and by achieving source anonymization with counterfactual examples generated from causal fingerprints. Experiments show that our approach outperforms existing methods in model attribution, indicating strong potential for forgery detection, model copyright tracing, and identity protection.
Paper and project links
PDF (5 pages, in submission)
Summary
This paper studies the implicit traces, or model fingerprints, that AI generative models leave in their generated images, which can be used for source attribution. Traditional methods rely on model-specific cues or synthesis artifacts, producing limited fingerprints that may generalize poorly across generative models. The authors argue that a complete model fingerprint should reflect the causal relationship between image provenance and model traces, a largely unexplored direction. They formalize the "causal fingerprint" of generative models and propose a causality-decoupling framework that separates it from image-specific content and style in a semantic-invariant latent space derived from pre-trained diffusion reconstruction residuals, with diverse feature representations enhancing fingerprint granularity. Causality is validated by evaluating attribution performance on representative GANs and diffusion models and by achieving source anonymization with counterfactual examples generated from causal fingerprints. Experiments show the method outperforms existing approaches in model attribution, indicating strong potential for forgery detection, model copyright tracing, and identity protection.
Key Takeaways
- AI generative models leave implicit traces, i.e. model fingerprints, in the images they generate, which can be used for source attribution.
- Existing methods mainly rely on model-specific cues or synthesis artifacts, so their fingerprints generalize poorly.
- A complete model fingerprint should reflect the causal relationship between image provenance and model traces, a new direction to explore.
- The paper introduces the concept of a "causal fingerprint" of generative models and designs a causality-decoupling framework (a simplified residual-based sketch follows this list).
- The framework separates the model fingerprint from image content and style in a semantic-invariant latent space.
- Diverse feature representations enhance fingerprint granularity and improve attribution performance.
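As a heavily simplified illustration of the reconstruction-residual idea mentioned above, the sketch below assumes some frozen reconstruction model (standing in for the pre-trained diffusion reconstruction) and feeds the residual (image minus reconstruction), which is largely stripped of semantic content, to a small classifier that attributes the image to a candidate source model. The classifier head and the identity-like stand-in for the reconstruction model are assumptions; the paper's causality-decoupling framework is considerably more involved.

```python
# A generic sketch of the reconstruction-residual idea (assumptions only; the paper's
# causal-decoupling framework is more involved): a frozen reconstruction model
# (e.g. a pre-trained diffusion autoencoder) is applied to each image, and the
# residual image - reconstruction, which carries little semantic content, is fed to
# a small classifier that attributes the image to a candidate source generator.
import torch
import torch.nn as nn

class ResidualAttributor(nn.Module):
    def __init__(self, reconstruct, num_models: int):
        super().__init__()
        self.reconstruct = reconstruct        # frozen callable: (B,3,H,W) -> (B,3,H,W)
        self.head = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_models),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                 # the reconstruction model stays frozen
            residual = images - self.reconstruct(images)
        return self.head(residual)            # logits over candidate source models

# Example with an identity-like stand-in for the frozen reconstruction model.
fake_reconstruct = lambda x: x * 0.9
model = ResidualAttributor(fake_reconstruct, num_models=5)
logits = model(torch.randn(4, 3, 128, 128))   # (4, 5)
```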
Click here to view paper screenshots



RaceGAN: A Framework for Preserving Individuality while Converting Racial Information for Image-to-Image Translation
Authors: Mst Tasnim Pervin, George Bebis, Fang Jiang, Alireza Tavakkoli
Generative adversarial networks (GANs) have demonstrated significant progress in unpaired image-to-image translation in recent years for several applications. CycleGAN was the first to lead the way, although it was restricted to a pair of domains. StarGAN overcame this constraint by tackling image-to-image translation across various domains, although it was not able to map in-depth low-level style changes for these domains. Style mapping via reference-guided image synthesis has been made possible by the innovations of StarGANv2 and StyleGAN. However, these models do not maintain individuality and need an extra reference image in addition to the input. Our study aims to translate racial traits by means of multi-domain image-to-image translation. We present RaceGAN, a novel framework capable of mapping style codes over several domains during racial attribute translation while maintaining individuality and high level semantics without relying on a reference image. RaceGAN outperforms other models in translating racial features (i.e., Asian, White, and Black) when tested on Chicago Face Dataset. We also give quantitative findings utilizing InceptionReNetv2-based classification to demonstrate the effectiveness of our racial translation. Moreover, we investigate how well the model partitions the latent space into distinct clusters of faces for each ethnic group.
Generative adversarial networks (GANs) have made significant progress in unpaired image-to-image translation for several applications in recent years. CycleGAN led the way, although it was restricted to a single pair of domains. StarGAN overcame this constraint by handling image-to-image translation across multiple domains, but it could not map in-depth low-level style changes for those domains. StarGANv2 and StyleGAN made style mapping via reference-guided image synthesis possible; however, these models do not preserve individuality and require an extra reference image in addition to the input. Our study aims to translate racial traits through multi-domain image-to-image translation. We present RaceGAN, a novel framework that can map style codes over several domains during racial attribute translation while preserving individuality and high-level semantics, without relying on a reference image. RaceGAN outperforms other models in translating racial features (i.e., Asian, White, and Black) when tested on the Chicago Face Dataset. We also provide quantitative results using InceptionReNetv2-based classification to demonstrate the effectiveness of the racial translation. In addition, we investigate how well the model partitions the latent space into distinct clusters of faces for each ethnic group.
Paper and project links
Summary
Building on recent progress of GANs in unpaired image-to-image translation, this work proposes the RaceGAN framework. Compared with CycleGAN, which is restricted to a single pair of domains, and StarGAN, which cannot capture in-depth low-level style changes, RaceGAN maps style codes over multiple domains and preserves individuality and high-level semantics during racial-feature translation without requiring a reference image. On the Chicago Face Dataset, RaceGAN performs strongly when translating Asian, White, and Black racial features. The study also uses InceptionReNetv2-based classification to demonstrate the effectiveness of the racial translation and examines how the model partitions the latent space into clusters of faces for each ethnic group.
Key Takeaways
- The study uses GANs for multi-domain image-to-image translation with the goal of translating racial features.
- The proposed RaceGAN framework maps style codes across domains for racial-attribute translation while preserving individuality and high-level semantics (a style-code sketch follows this list).
- RaceGAN outperforms other models on the Chicago Face Dataset, particularly when translating Asian, White, and Black racial features.
- InceptionReNetv2-based classification is used to demonstrate the effectiveness of the racial translation.
- The model can partition the latent space into distinct clusters of faces for each ethnic group.
- RaceGAN needs no reference image, which makes it more flexible and practical.
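To make the "style codes without a reference image" point concrete, here is a StarGANv2-style sketch, stated as an assumption rather than RaceGAN's exact architecture: a mapping network turns a random latent and a target-domain label into a domain-specific style code, and the code modulates generator features through adaptive instance normalization (AdaIN), so no reference image is required. All layer sizes are illustrative.

```python
# A sketch of the general mechanism (StarGANv2-style, an assumption rather than
# RaceGAN's exact architecture): a mapping network turns a random latent plus a
# target-domain label into a style code, and the style code modulates generator
# features via adaptive instance normalization (AdaIN), so no reference image is needed.
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, latent_dim=16, style_dim=64, num_domains=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(inplace=True))
        # One output head per domain (e.g. the three ethnicity domains in the paper).
        self.heads = nn.ModuleList([nn.Linear(256, style_dim) for _ in range(num_domains)])

    def forward(self, z: torch.Tensor, domain: torch.Tensor) -> torch.Tensor:
        h = self.shared(z)                                             # (B, 256)
        styles = torch.stack([head(h) for head in self.heads], dim=1)  # (B, D, style)
        return styles[torch.arange(z.size(0)), domain]                 # pick target domain

class AdaIN(nn.Module):
    def __init__(self, style_dim: int, channels: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.affine = nn.Linear(style_dim, channels * 2)    # per-channel scale and shift

    def forward(self, feats: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.affine(style).chunk(2, dim=1)
        return (1 + gamma[:, :, None, None]) * self.norm(feats) + beta[:, :, None, None]

# Example: style codes for mixed target domains modulate a 128-channel feature map.
mapper, adain = MappingNetwork(), AdaIN(style_dim=64, channels=128)
style = mapper(torch.randn(4, 16), torch.tensor([2, 0, 1, 2]))
out = adain(torch.randn(4, 128, 32, 32), style)             # (4, 128, 32, 32)
```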
Click here to view paper screenshots





X-GAN: A Generative AI-Powered Unsupervised Model for Main Vessel Segmentation of Glaucoma Screening
Authors: Cheng Huang, Weizheng Xie, Tsengdar J. Lee, Jui-Kai Wang, Karanjit Kooner, Ning Zhang, Jia Zhang
Structural changes in main retinal blood vessels serve as critical biomarkers for the onset and progression of glaucoma. Identifying these vessels is vital for vascular modeling yet highly challenging. This paper proposes X-GAN, a generative AI-powered unsupervised segmentation model designed for extracting main blood vessels from Optical Coherence Tomography Angiography (OCTA) images. The process begins with the Space Colonization Algorithm (SCA) to rapidly generate a skeleton of vessels, featuring their radii. By synergistically integrating the generative adversarial network (GAN) with biostatistical modeling of vessel radii, X-GAN enables a fast reconstruction of both 2D and 3D representations of the vessels. Based on this reconstruction, X-GAN achieves nearly 100% segmentation accuracy without relying on labeled data or high-performance computing resources. Experimental results confirm X-GAN’s superiority in evaluating main vessel segmentation compared to existing deep learning models.
Structural changes in the main retinal blood vessels are critical biomarkers for the onset and progression of glaucoma. Identifying these vessels is essential for vascular modeling yet highly challenging. This paper proposes X-GAN, a generative AI-powered unsupervised segmentation model designed to extract main blood vessels from Optical Coherence Tomography Angiography (OCTA) images. The pipeline starts with the Space Colonization Algorithm (SCA), which rapidly generates a vessel skeleton together with vessel radii. By synergistically combining a generative adversarial network (GAN) with biostatistical modeling of the vessel radii, X-GAN quickly reconstructs both 2D and 3D representations of the vessels. Based on this reconstruction, X-GAN achieves nearly 100% segmentation accuracy without relying on labeled data or high-performance computing resources. Experimental results confirm X-GAN's superiority over existing deep learning models in main-vessel segmentation.
Paper and project links
Summary
Structural changes in the main retinal vessels are important biomarkers for the onset and progression of glaucoma, and identifying them is crucial for vascular modeling but highly challenging. This paper proposes X-GAN, a GAN-based unsupervised segmentation model that extracts main blood vessels from Optical Coherence Tomography Angiography (OCTA) images. By combining biostatistical modeling of vessel radii, X-GAN rapidly reconstructs 2D and 3D representations of the vessels and achieves nearly 100% segmentation accuracy without relying on labeled data or high-performance computing resources.
Key Takeaways
- Structural changes in the main retinal vessels are important biomarkers of glaucoma.
- X-GAN is a GAN-based unsupervised segmentation model for extracting main vessels from OCTA images (a skeleton-growing sketch follows this list).
- By combining biostatistical modeling of vessel radii, X-GAN rapidly reconstructs 2D and 3D representations of the vessels.
- X-GAN achieves nearly 100% segmentation accuracy without relying on labeled data or high-performance computing resources.
- Compared with existing deep learning models, X-GAN is superior at main-vessel segmentation.
- X-GAN can help diagnose glaucoma more accurately and monitor its progression.
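As an illustration of the first stage described above, the sketch below implements a minimal 2D Space Colonization Algorithm: attraction points scattered over the target region pull a growing skeleton toward them, yielding a branching vessel-like tree. X-GAN's radius modeling and GAN reconstruction are not reproduced; the influence, kill, and step distances are illustrative assumptions.

```python
# A minimal sketch of the Space Colonization Algorithm (SCA) used as the first stage:
# attraction points scattered over the target region pull the growing skeleton toward
# them, producing a branching vessel-like tree. The radius model and the GAN stage of
# X-GAN are not reproduced here; step size and distances below are illustrative.
import numpy as np

def space_colonization(attractors, root, steps=200,
                       influence=0.25, kill=0.05, step_len=0.03):
    """attractors: (M, 2) points to be reached; root: (2,) seed node.
    Returns node positions (N, 2) and parent indices (N,) describing the skeleton."""
    nodes, parents = [np.asarray(root, float)], [-1]
    attractors = np.asarray(attractors, float)
    for _ in range(steps):
        if len(attractors) == 0:
            break
        pos = np.stack(nodes)                                             # (N, 2)
        d = np.linalg.norm(attractors[:, None] - pos[None], axis=-1)      # (M, N)
        nearest = d.argmin(axis=1)                 # closest node per attraction point
        grew = False
        for i in range(len(nodes)):
            mask = (nearest == i) & (d[np.arange(len(attractors)), nearest] < influence)
            if not mask.any():
                continue
            direction = (attractors[mask] - pos[i]).mean(axis=0)
            direction /= (np.linalg.norm(direction) + 1e-8)
            nodes.append(pos[i] + step_len * direction)
            parents.append(i)
            grew = True
        # Remove attraction points that have been reached.
        d_new = np.linalg.norm(attractors[:, None] - np.stack(nodes)[None], axis=-1)
        attractors = attractors[d_new.min(axis=1) > kill]
        if not grew:
            break
    return np.stack(nodes), np.array(parents)

# Example: grow a skeleton toward 300 random attraction points in the unit square.
pts = np.random.rand(300, 2)
skeleton, parent = space_colonization(pts, root=[0.5, 0.0])
```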
Click here to view paper screenshots


