GAN

Published: 2025-09-20

Updated: 2025-11-27

Word count: 4.7k

Reading time: 19 min

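⚠️ All summaries below were generated by a large language model and may contain errors. They are for reference only, so use them with caution.
🔴 Note: never use this content in serious academic settings; it is only meant for screening papers before reading them!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace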

Updated 2025-09-20

A Race Bias Free Face Aging Model for Reliable Kinship Verification

Authors: Ali Nazari, Bardiya Kariminia, Mohsen Ebrahimi Moghaddam

The age gap in kinship verification refers to the time difference between when the parent's and the child's photos were taken. Same-age photos of the two are often unavailable, and face aging models are racially biased, which reduces the likeness of the synthesized photos. We therefore propose RA-GAN, a face aging GAN with two new modules, RACEpSp and a feature mixer, that produces racially unbiased images. The unbiased synthesized photos are used in kinship verification to investigate the results of verifying same-age parent-child images. In terms of racial accuracy, experiments show that RA-GAN outperforms SAM-GAN by 13.14% on average across all age groups and CUSP-GAN by 9.1% in the 60+ age group. Moreover, RA-GAN preserves subjects' identities better than SAM-GAN and CUSP-GAN across all age groups. We also show that transforming parent and child images from the KinFaceW-I and KinFaceW-II datasets to the same age improves verification accuracy across all age groups. On KinFaceW-I, RA-GAN increases accuracy for the father-son, father-daughter, mother-son, and mother-daughter relationships by 5.22, 5.12, 1.63, and 0.41, respectively. On KinFaceW-II, the increases for the father-daughter, father-son, and mother-son relationships are 2.9, 0.39, and 1.6, respectively. The code is available on GitHub: https://github.com/bardiya2254kariminia/An-Age-Transformation-whitout-racial-bias-for-Kinship-verification

Paper and project links

PDF

Summary
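In kinship verification, parent and child photos are typically taken at different ages, and same-age photos of both are rarely available. In addition, existing face aging models are racially biased, which reduces how faithful the aged photos are. To address this, the authors propose RA-GAN, a face aging GAN with two new modules, RACEpSp and a feature mixer, that produces racially unbiased images; these synthesized images are then used to verify same-age parent-child pairs. In terms of racial accuracy, RA-GAN outperforms SAM-GAN by 13.14% on average across all age groups and CUSP-GAN by 9.1% in the 60+ age group. RA-GAN also preserves subject identity better than SAM-GAN and CUSP-GAN. Transforming the parent and child images in KinFaceW-I and KinFaceW-II to the same age improves verification accuracy across all age groups, and RA-GAN improves accuracy for several kinship relationships. The code has been released on GitHub.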

Key Takeaways

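The work addresses the age gap in kinship verification and proposes RA-GAN, a face aging GAN.
RA-GAN adds two new modules, RACEpSp and a feature mixer, to generate racially unbiased images.
Experiments show that RA-GAN performs well for kinship verification and achieves higher racial accuracy than the compared models (SAM-GAN and CUSP-GAN).
RA-GAN preserves subject identity better than SAM-GAN and CUSP-GAN.
Transforming dataset images to the same age improves verification accuracy.
With RA-GAN, accuracy improves for several kinship relationships.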
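The abstract does not say how the kinship verifier itself works. As a rough illustration of how same-age transformation could change verification accuracy, the sketch below assumes a common baseline: each face has already been mapped to an embedding vector by some face model (not shown), and a pair is declared kin when the cosine similarity of the two embeddings reaches a threshold. The function names and the 0.5 threshold are hypothetical and are not taken from RA-GAN.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify_kinship(parent_emb, child_emb, threshold=0.5):
    """Hypothetical verifier: predict 'kin' if the embeddings are similar enough."""
    return cosine_similarity(parent_emb, child_emb) >= threshold


def pair_accuracy(pairs, labels, threshold=0.5):
    """Fraction of (parent, child) embedding pairs whose kin / non-kin prediction is correct."""
    preds = [verify_kinship(p, c, threshold) for p, c in pairs]
    return float(np.mean([pred == label for pred, label in zip(preds, labels)]))
```

Under these assumptions, an accuracy gain of the kind reported above would be `pair_accuracy` on embeddings of the age-transformed images minus `pair_accuracy` on embeddings of the original images, evaluated on the same labeled pairs.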

Dataset Distillation for Super-Resolution without Class Labels and Pre-trained Models

Authors: Sunwoo Cho, Yejin Jung, Nam Ik Cho, Jae Woong Soh

Training deep neural networks has become increasingly demanding, requiring large datasets and significant computational resources, especially as model complexity advances. Data distillation methods, which aim to improve data efficiency, have emerged as promising solutions to this challenge. In the field of single image super-resolution (SISR), the reliance on large training datasets highlights the importance of these techniques. Recently, a generative adversarial network (GAN) inversion-based data distillation framework for SR was proposed, showing potential for better data utilization. However, the current method depends heavily on pre-trained SR networks and class-specific information, limiting its generalizability and applicability. To address these issues, we introduce a new data distillation approach for image SR that does not need class labels or pre-trained SR models. In particular, we first extract high-gradient patches and categorize images based on CLIP features, then fine-tune a diffusion model on the selected patches to learn their distribution and synthesize distilled training images. Experimental results show that our method achieves state-of-the-art performance while using significantly less training data and requiring less computational time. Specifically, when we train a baseline Transformer model for SR with only 0.68% of the original dataset, the performance drop is just 0.3 dB. In this case, diffusion model fine-tuning takes 4 hours, and SR model training completes within 1 hour, much shorter than the 11-hour training time with the full dataset.

Paper and project links

PDF

Summary
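Training neural networks increasingly demands large datasets and computational resources, especially as models grow more complex. Data distillation, which aims to improve data efficiency, has emerged as a promising answer to this challenge. In single image super-resolution (SISR), a recent GAN-inversion-based data distillation framework showed potential for better data utilization, but it depends heavily on pre-trained SR networks and class-specific information, which limits its generality and applicability. To address this, the authors propose a data distillation method for image SR that needs neither class labels nor pre-trained SR models: it extracts high-gradient patches, categorizes images using CLIP features, and then fine-tunes a diffusion model on the selected patches to learn their distribution and synthesize distilled training images. Experiments show that the method achieves state-of-the-art performance with far less training data and shorter computation time.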
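The Summary mentions extracting high-gradient patches without saying how the gradient is scored. The sketch below assumes non-overlapping square tiles of a grayscale image, scores each tile by its mean gradient magnitude (finite differences via `np.gradient`), and keeps the top fraction. The patch size, the scoring rule, and `keep_ratio` are hypothetical choices, not the paper's settings.

```python
import numpy as np


def extract_patches(image, patch):
    """Split an H x W grayscale image into non-overlapping patch x patch tiles."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch  # drop the ragged border
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    # (row_block, r, col_block, c) -> (row_block, col_block, r, c) -> (n_tiles, r, c)
    return tiles.swapaxes(1, 2).reshape(-1, patch, patch)


def gradient_score(tile):
    """Mean gradient magnitude of one tile (hypothetical scoring rule)."""
    gy, gx = np.gradient(tile.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))


def select_high_gradient_patches(image, patch=32, keep_ratio=0.25):
    """Return the highest-scoring tiles and their scores, best first."""
    tiles = extract_patches(image, patch)
    scores = np.array([gradient_score(t) for t in tiles])
    k = max(1, int(round(keep_ratio * len(tiles))))
    order = np.argsort(scores)[::-1][:k]
    return tiles[order], scores[order]
```

Flat regions score zero and are discarded first, which is the intended effect of favoring textured, edge-rich patches.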

Key Takeaways

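Data distillation aims to improve data efficiency and is a promising response to the growing cost of training neural networks.
The existing GAN-inversion-based data distillation method for SISR relies on pre-trained SR models and class labels, which limits its generality.
The new method extracts high-gradient patches and categorizes images with CLIP features, so it needs neither class labels nor a pre-trained SR model.
By fine-tuning a diffusion model on the selected patches, the method learns their distribution and synthesizes distilled training images.
Experiments show that the method reaches state-of-the-art performance while using less training data and less computation.
Training a baseline Transformer SR model on only 0.68% of the original dataset lowers performance by just 0.3 dB, demonstrating the method's practical effectiveness.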
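The method also categorizes images by their CLIP features, but the grouping algorithm is not specified here. Assuming k-means over precomputed, L2-normalized image embeddings, a minimal sketch could look like the following. No CLIP model is loaded, so the caller supplies the feature matrix; the greedy farthest-point initialization and the iteration limit are hypothetical choices.

```python
import numpy as np


def kmeans(features, k, iters=50, seed=0):
    """Plain k-means on L2-normalized feature vectors; returns (labels, centroids)."""
    x = np.asarray(features, dtype=np.float64)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    # Greedy farthest-point initialization: start from a random vector, then
    # repeatedly add the vector farthest from all centroids chosen so far.
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[dist.argmax()])
    centroids = np.array(centroids)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Images that share a label form one category; how the paper uses these categories downstream is not detailed in the abstract.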

PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution

Authors: Yaning Zhang, Jiahe Zhang, Chunjie Ma, Weili Guan, Tian Gan, Zan Gao

The challenge of tracing the source of forged faces has gained significant attention due to the rapid advancement of generative models. However, existing deepfake attribution (DFA) works primarily focus on the interaction among various domains in the vision modality, while other modalities such as text and face parsing remain underexplored. Moreover, they fail to assess, in a fine-grained manner, how well deepfake attributors generalize to unseen advanced generators such as diffusion models. In this paper, we propose PVLM, a parsing-aware vision language model with dynamic contrastive learning, for zero-shot deepfake attribution (ZS-DFA), which enables effective and fine-grained tracing of unseen advanced generators. Specifically, we construct a novel, fine-grained ZS-DFA benchmark to evaluate how well deepfake attributors attribute images from unseen advanced generators such as diffusion models. PVLM is designed to capture general and diverse attribution features. We are motivated by the observation that GAN and diffusion models differ significantly in how well they preserve source face attributes in the facial images they generate. We exploit these inherent differences in attribute preservation to capture face parsing-aware forgery representations. To this end, we devise a novel parsing encoder that focuses on global face attribute embeddings, enabling parsing-guided DFA representation learning via dynamic vision-parsing matching. Additionally, we present a novel deepfake attribution contrastive center loss that pulls relevant generators closer and pushes irrelevant ones away; it can be incorporated into DFA models to enhance traceability. Experimental results show that our model outperforms the state of the art on the ZS-DFA benchmark across various evaluation protocols.

Paper and project links

PDF

Summary
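Tracing the source of forged faces has attracted significant attention as generative models advance rapidly. Existing deepfake attribution (DFA) work focuses mainly on interactions among domains within the vision modality, leaving other modalities such as text and face parsing underexplored. This paper proposes PVLM, a parsing-aware vision language model with dynamic contrastive learning, for zero-shot deepfake attribution (ZS-DFA), enabling effective, fine-grained tracing of unseen advanced generators.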

Key Takeaways

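Existing deepfake attribution methods focus on domain interactions within the vision modality and underexplore other modalities such as text and face parsing.
To trace unseen advanced generators (e.g., diffusion models), the paper proposes a parsing-aware zero-shot deepfake attribution method.
A parsing-guided vision language model trained with dynamic contrastive learning (PVLM) captures general and diverse attribution features.
The method exploits the differences in how GAN- and diffusion-generated face images preserve source face attributes to learn face-parsing-aware forgery representations.
A face parsing encoder focuses on global face attribute embeddings, enabling parsing-guided DFA representation learning through dynamic vision-parsing matching.
A new deepfake attribution contrastive center loss pulls relevant generators closer and pushes irrelevant ones apart; it can be added to DFA models to improve traceability.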
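The abstract describes the contrastive center loss only as pulling relevant generators closer and pushing irrelevant ones away. One plausible reading, shown below as a hypothetical sketch rather than PVLM's actual loss, treats each generator as a class with a center vector: a feature is pulled toward its own generator's center (squared distance) and penalized by a hinge whenever its squared distance to any other generator's center falls below `margin`. The function name, the squared-distance form, and `margin` are assumptions.

```python
import numpy as np


def contrastive_center_loss(features, labels, centers, margin=1.0):
    """Hypothetical center loss: pull each feature to its generator's center and
    push it at least `margin` (in squared distance) away from every other center."""
    features = np.asarray(features, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    labels = np.asarray(labels)
    n = len(features)
    # Squared Euclidean distance from every feature to every center: shape (N, C).
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    pos = d2[np.arange(n), labels]  # distance to the feature's own center
    mask = np.ones_like(d2, dtype=bool)
    mask[np.arange(n), labels] = False
    # Hinge on every other center; each row keeps exactly C - 1 entries.
    neg = np.maximum(0.0, margin - d2[mask].reshape(n, -1))
    return float(pos.mean() + neg.sum(axis=1).mean())
```

Features sitting on their own center and far from all others give zero loss; a feature that drifts toward another generator's center is penalized twice, by the pull term and by the hinge.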

A Mutual Information Perspective on Multiple Latent Variable Generative Models for Positive View Generation

Authors: Dario Serez, Marco Cristani, Alessio Del Bue, Vittorio Murino, Pietro Morerio

In image generation, Multiple Latent Variable Generative Models (MLVGMs) employ multiple latent variables to gradually shape the final images, from global characteristics to finer and local details (e.g., StyleGAN, NVAE), emerging as powerful tools for diverse applications. Yet their generative dynamics have only been observed empirically, without a systematic understanding of each latent variable's impact. In this work, we propose a novel framework that quantifies the contribution of each latent variable using Mutual Information (MI) as a metric. Our analysis reveals that current MLVGMs often underutilize some latent variables, and provides actionable insights for their use in downstream applications. With this foundation, we introduce a method for generating synthetic data for Self-Supervised Contrastive Representation Learning (SSCRL). By leveraging the hierarchical and disentangled variables of MLVGMs, our approach produces diverse and semantically meaningful views without the need for real image data. Additionally, we introduce a Continuous Sampling (CS) strategy, where the generator dynamically creates new samples during SSCRL training, greatly increasing data variability. Our comprehensive experiments demonstrate the effectiveness of these contributions, showing that MLVGM-generated views are on par with, or even surpass, views generated from real data. This work establishes a principled approach to understanding and exploiting MLVGMs, advancing both generative modeling and self-supervised learning. Code and pre-trained models are available at: https://github.com/SerezD/mi_ml_gen.
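The framework measures each latent variable's contribution with mutual information, I(X;Y) = sum over x, y of p(x,y) log[p(x,y) / (p(x) p(y))]. The abstract does not describe the estimator used for continuous latents and images, so the sketch below only illustrates the quantity itself with a simple plug-in (histogram) estimator for discrete samples. The function name and the discrete setting are assumptions.

```python
import numpy as np


def plugin_mutual_information(x, y):
    """Plug-in (histogram) estimate of I(X;Y) in nats from paired discrete samples."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    # Map arbitrary labels to contiguous integer codes.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # Empirical joint distribution p(x, y).
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    joint /= joint.sum()
    # Marginals as a column and a row vector; their product is p(x) p(y).
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # 0 * log 0 is taken as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))
```

In a toy setup where the output copies `z1 = [0, 0, 1, 1]` and ignores `z2 = [0, 1, 0, 1]`, the estimate is log 2 for `z1` and 0 for `z2`, the signature of an underutilized latent variable. With finite random samples the plug-in estimate is biased upward, so an ignored latent would score near, not exactly at, zero.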

Paper and project links

PDF

Summary
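This paper studies Multiple Latent Variable Generative Models (MLVGMs) for image generation. By using mutual information (MI) as a metric to quantify each latent variable's contribution, it sheds light on the generative dynamics of MLVGMs. The analysis finds that current MLVGMs often underutilize some latent variables. Building on this, the paper proposes a method that uses MLVGMs to generate synthetic data for self-supervised contrastive representation learning (SSCRL): it exploits the hierarchical and disentangled latent variables of MLVGMs to produce diverse, semantically meaningful views without any real image data. It also introduces a Continuous Sampling (CS) strategy in which the generator dynamically creates new samples during SSCRL training, greatly increasing data variability. Experiments show that views generated by MLVGMs match or even surpass views generated from real data. The work provides a principled approach to understanding and exploiting MLVGMs, advancing both generative modeling and self-supervised learning.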
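One way to read the view-generation and Continuous Sampling ideas together: two views form a positive pair when they share the coarse (global) latent variables and differ in the finer ones, and under CS a fresh pair is generated at every training step instead of being drawn from a fixed synthetic dataset. The sketch below assumes exactly that. The real MLVGM (e.g., StyleGAN or NVAE) is replaced by a hypothetical toy generator that sums one random linear map per latent level, and all class and function names, as well as `n_shared`, are illustrative.

```python
import numpy as np


class ToyHierarchicalGenerator:
    """Hypothetical stand-in for an MLVGM: one random linear map per latent level."""

    def __init__(self, latent_dims=(8, 8, 8), out_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.latent_dims = latent_dims
        # Coarser levels get larger weights, mimicking global-to-local influence (assumption).
        self.weights = [rng.normal(scale=1.0 / (i + 1), size=(d, out_dim))
                        for i, d in enumerate(latent_dims)]

    def sample_latents(self, batch, rng):
        return [rng.standard_normal((batch, d)) for d in self.latent_dims]

    def __call__(self, latents):
        return sum(z @ w for z, w in zip(latents, self.weights))


def positive_views(gen, batch, n_shared, rng):
    """Two views that share the first n_shared (coarse) latents and resample the rest."""
    anchor = gen.sample_latents(batch, rng)
    fresh = gen.sample_latents(batch, rng)
    other = anchor[:n_shared] + fresh[n_shared:]
    return gen(anchor), gen(other)


def continuous_sampling_batches(gen, batch, n_shared, steps, seed=0):
    """Yield a freshly generated positive pair at every training step (no fixed dataset)."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        yield positive_views(gen, batch, n_shared, rng)
```

A larger `n_shared` keeps more of the global content fixed and yields more similar views; `n_shared` equal to the number of levels makes the two views identical, and `n_shared = 0` makes them independent samples.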

Key Takeaways