Published: 2025-09-28

Updated: 2025-11-27

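⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use them in any serious academic setting. They are suitable only for initial screening before reading the papers.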

2025-09-28 Update

CusEnhancer: A Zero-Shot Scene and Controllability Enhancement Method for Photo Customization via ResInversion

Authors: Maoye Ren, Praneetha Vaddamanu, Jianjin Xu, Fernando De la Torre Frade

Recently remarkable progress has been made in synthesizing realistic human photos using text-to-image diffusion models. However, current approaches face degraded scenes, insufficient control, and suboptimal perceptual identity. We introduce CustomEnhancer, a novel framework to augment existing identity customization models. CustomEnhancer is a zero-shot enhancement pipeline that leverages face swapping techniques, pretrained diffusion model, to obtain additional representations in a zeroshot manner for encoding into personalized models. Through our proposed triple-flow fused PerGeneration approach, which identifies and combines two compatible counter-directional latent spaces to manipulate a pivotal space of personalized model, we unify the generation and reconstruction processes, realizing generation from three flows. Our pipeline also enables comprehensive training-free control over the generation process of personalized models, offering precise controlled personalization for them and eliminating the need for controller retraining for per-model. Besides, to address the high time complexity of null-text inversion (NTI), we introduce ResInversion, a novel inversion method that performs noise rectification via a pre-diffusion mechanism, reducing the inversion time by 129 times. Experiments demonstrate that CustomEnhancer reach SOTA results at scene diversity, identity fidelity, training-free controls, while also showing the efficiency of our ResInversion over NTI. The code will be made publicly available upon paper acceptance.

Summary

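This paper introduces CustomEnhancer, a framework that augments existing identity customization models. It uses face swapping techniques and a pretrained diffusion model to obtain additional representations in a zero-shot manner and encode them into personalized models. Its triple-flow fused PerGeneration approach unifies the generation and reconstruction processes, so that generation draws on three flows. The paper also introduces ResInversion, an inversion method that rectifies noise through a pre-diffusion mechanism and cuts inversion time by a factor of 129 relative to null-text inversion (NTI). According to the abstract, CustomEnhancer reaches state-of-the-art results in scene diversity, identity fidelity, and training-free control.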

Key Takeaways

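CustomEnhancer augments existing identity customization models using face swapping techniques and a pretrained diffusion model, in a zero-shot manner.
The triple-flow fused PerGeneration approach combines two compatible counter-directional latent spaces to manipulate a pivotal space of the personalized model, unifying generation and reconstruction.
CustomEnhancer provides comprehensive training-free control over personalized models, with no need to retrain a controller for each model.
ResInversion, a new inversion method, rectifies noise through a pre-diffusion mechanism and reduces inversion time by a factor of 129 compared with null-text inversion (NTI).
CustomEnhancer reaches state-of-the-art results in scene diversity, identity fidelity, and training-free control.
The reported experiments also show that ResInversion is more efficient than NTI.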

PerFace: Metric Learning in Perceptual Facial Similarity for Enhanced Face Anonymization

Authors: Haruka Kumagai, Leslie Wöhler, Satoshi Ikehata, Kiyoharu Aizawa

In response to rising societal awareness of privacy concerns, face anonymization techniques have advanced, including the emergence of face-swapping methods that replace one identity with another. Achieving a balance between anonymity and naturalness in face swapping requires careful selection of identities: overly similar faces compromise anonymity, while dissimilar ones reduce naturalness. Existing models, however, focus on binary identity classification “the same person or not”, making it difficult to measure nuanced similarities such as “completely different” versus “highly similar but different.” This paper proposes a human-perception-based face similarity metric, creating a dataset of 6,400 triplet annotations and metric learning to predict the similarity. Experimental results demonstrate significant improvements in both face similarity prediction and attribute-based face classification tasks over existing methods.
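
The triplet-based metric learning mentioned in the abstract can be illustrated with a minimal sketch. It assumes each annotation is a triplet of face embeddings (a reference face, the face annotators judged more similar, and the face judged less similar) and uses a standard triplet margin loss. The abstract does not say which loss, embedding model, or margin PerFace uses, so the function names, the margin of 0.2, and the toy 2-D embeddings below are illustrative assumptions, not the authors' implementation.

```python
import math


def triplet_margin_loss(anchor, more_similar, less_similar, margin=0.2):
    """Standard triplet margin (hinge) loss on Euclidean distances.

    The loss is zero once the face annotated as more similar is closer
    to the anchor than the other face by at least `margin` (assumed value).
    """
    d_pos = math.dist(anchor, more_similar)
    d_neg = math.dist(anchor, less_similar)
    return max(0.0, d_pos - d_neg + margin)


def triplet_agreement(triplets):
    """Fraction of triplets whose embedding distances match the human choice."""
    hits = sum(
        1
        for anchor, more_similar, less_similar in triplets
        if math.dist(anchor, more_similar) < math.dist(anchor, less_similar)
    )
    return hits / len(triplets)


# Toy 2-D "embeddings" (hypothetical data, for illustration only).
toy_triplets = [
    ((0.0, 0.0), (3.0, 4.0), (6.0, 8.0)),  # distances agree with the annotation
    ((0.0, 0.0), (6.0, 8.0), (3.0, 4.0)),  # distances contradict the annotation
]

mean_loss = sum(triplet_margin_loss(*t) for t in toy_triplets) / len(toy_triplets)
agreement = triplet_agreement(toy_triplets)
```

In a real training loop the embeddings would come from a learnable face encoder and the loss would be minimized by gradient descent. The agreement score is one simple way to check how often a learned metric matches the human judgments.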

Summary

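As public concern about privacy grows, face anonymization techniques have advanced, including face-swapping methods that replace one identity with another. Balancing anonymity and naturalness in face swapping requires careful identity selection: overly similar faces compromise anonymity, while overly dissimilar ones reduce naturalness. Existing models focus on binary identity classification (same person or not), which makes it hard to measure finer distinctions such as "completely different" versus "highly similar but different." The paper proposes a face similarity metric based on human perception, built from a dataset of 6,400 triplet annotations and trained with metric learning to predict similarity. The reported experiments show significant improvements over existing methods in both face similarity prediction and attribute-based face classification.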

Key Takeaways