Published: 2025-09-20

Updated: 2025-11-27

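⚠️ All summaries below are generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use this in serious academic settings. It is only suitable for initial screening before reading a paper.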

2025-09-20 Update

PVLM: Parsing-Aware Vision Language Model with Dynamic Contrastive Learning for Zero-Shot Deepfake Attribution

Authors: Yaning Zhang, Jiahe Zhang, Chunjie Ma, Weili Guan, Tian Gan, Zan Gao

The challenge of tracing the source attribution of forged faces has gained significant attention due to the rapid advancement of generative models. However, existing deepfake attribution (DFA) works primarily focus on the interaction among various domains in vision modality, and other modalities such as texts and face parsing are not fully explored. Besides, they tend to fail to assess the generalization performance of deepfake attributors to unseen advanced generators like diffusion in a fine-grained manner. In this paper, we propose a novel parsing-aware vision language model with dynamic contrastive learning (PVLM) method for zero-shot deepfake attribution (ZS-DFA), which facilitates effective and fine-grained traceability to unseen advanced generators. Specifically, we conduct a novel and fine-grained ZS-DFA benchmark to evaluate the attribution performance of deepfake attributors to unseen advanced generators like diffusion. Besides, we propose an innovative parsing-guided vision language model with dynamic contrastive learning (PVLM) method to capture general and diverse attribution features. We are motivated by the observation that the preservation of source face attributes in facial images generated by GAN and diffusion models varies significantly. We employ the inherent face attributes preservation differences to capture face parsing-aware forgery representations. Therefore, we devise a novel parsing encoder to focus on global face attribute embeddings, enabling parsing-guided DFA representation learning via dynamic vision-parsing matching. Additionally, we present a novel deepfake attribution contrastive center loss to pull relevant generators closer and push irrelevant ones away, which can be introduced into DFA models to enhance traceability. Experimental results show that our model exceeds the state-of-the-art on the ZS-DFA benchmark via various protocol evaluations.

Paper and Project Links

PDF

Summary

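With the rapid advancement of generative models, tracing the source of forged faces has drawn considerable attention. Existing deepfake attribution (DFA) work focuses mainly on interactions among domains within the vision modality and leaves other modalities, such as text and face parsing, underexplored. It also tends not to evaluate, in a fine-grained way, how well deepfake attributors generalize, especially to unseen advanced generators such as diffusion models. This paper proposes a parsing-aware vision language model with dynamic contrastive learning (PVLM) for zero-shot deepfake attribution (ZS-DFA), enabling effective and fine-grained tracing to unseen advanced generators. We build a fine-grained ZS-DFA benchmark to evaluate deepfake attribution performance, and we propose a parsing-guided vision language model with dynamic contrastive learning to capture general and diverse attribution features. We observe that face images generated by GANs and diffusion models differ markedly in how well they preserve source face attributes, and we exploit this difference to capture face parsing-aware forgery representations. To that end, we design a parsing encoder that focuses on global face attribute embeddings and enables parsing-guided DFA representation learning through dynamic vision-parsing matching. We also propose a deepfake attribution contrastive center loss that pulls relevant generators closer and pushes irrelevant ones apart, and that can be added to DFA models to improve traceability. Experimental results show that our model outperforms the state of the art on the ZS-DFA benchmark.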
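
The contrastive center loss is described above only in words. As a rough illustration of the pull/push idea, the sketch below implements a generic ratio-form contrastive center loss in plain Python. It is not the paper's formulation: the function names, the `delta` constant, and the toy 2-D centers are all assumptions made for this example.

```python
# Generic contrastive-center-style loss; NOT the formulation from the PVLM paper.
# All names here (contrastive_center_loss, centers, delta) are hypothetical.


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_center_loss(features, labels, centers, delta=1.0):
    """Mean over samples of 0.5 * d(own center) / (sum of d(other centers) + delta).

    features: list of feature vectors, one per image
    labels:   list of generator indices, one per image
    centers:  list of per-generator center vectors
    delta:    positive constant that keeps the denominator away from zero
    """
    total = 0.0
    for feature, label in zip(features, labels):
        # Pull term: distance to the center of the sample's own generator.
        pull = squared_distance(feature, centers[label])
        # Push term: summed distance to the centers of all other generators.
        push = sum(
            squared_distance(feature, center)
            for index, center in enumerate(centers)
            if index != label
        )
        total += 0.5 * pull / (push + delta)
    return total / len(features)


# Toy example: two generators whose centers are far apart in a 2-D feature space.
centers = [[0.0, 0.0], [4.0, 0.0]]
features = [[0.0, 1.0], [4.0, 1.0]]

well_attributed = contrastive_center_loss(features, [0, 1], centers)
mis_attributed = contrastive_center_loss(features, [1, 0], centers)
```

In the toy data, labeling each feature with its nearest center gives a much smaller loss than swapping the labels, which is the behavior the pull/push description calls for.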

Key Insights