
Diffusion Models


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never rely on these summaries in serious academic settings; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project, ChatPaperFree, helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-01-14

FaceMe: Robust Blind Face Restoration with Personal Identification

Authors: Siyu Liu, Zheng-Peng Duan, Jia OuYang, Jiayi Fu, Hyunhee Park, Zikun Liu, Chun-Le Guo, Chongyi Li

Blind face restoration is a highly ill-posed problem due to the lack of necessary context. Although existing methods produce high-quality outputs, they often fail to faithfully preserve the individual’s identity. In this paper, we propose a personalized face restoration method, FaceMe, based on a diffusion model. Given a single or a few reference images, we use an identity encoder to extract identity-related features, which serve as prompts to guide the diffusion model in restoring high-quality and identity-consistent facial images. By simply combining identity-related features, we effectively minimize the impact of identity-irrelevant features during training and support any number of reference image inputs during inference. Additionally, thanks to the robustness of the identity encoder, synthesized images can be used as reference images during training, and identity changing during inference does not require fine-tuning the model. We also propose a pipeline for constructing a reference image training pool that simulates the poses and expressions that may appear in real-world scenarios. Experimental results demonstrate that our FaceMe can restore high-quality facial images while maintaining identity consistency, achieving excellent performance and robustness.
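
As a concrete reading of the abstract's key mechanism, the minimal PyTorch sketch below pools identity embeddings from any number of reference images into prompt tokens for a diffusion model's cross-attention. Everything here, the encoder backbone, mean pooling as the "combining" step, and all dimensions, is an illustrative assumption rather than FaceMe's actual implementation.

```python
import torch
import torch.nn as nn

class IdentityPrompt(nn.Module):
    """Hypothetical sketch: pool identity embeddings from N reference
    images into prompt tokens for a diffusion model's cross-attention.
    Backbone, pooling, and dimensions are assumptions, not the paper's."""

    def __init__(self, id_encoder: nn.Module, id_dim: int = 512,
                 prompt_dim: int = 768, num_tokens: int = 4):
        super().__init__()
        self.id_encoder = id_encoder  # e.g. a frozen face-recognition net
        self.proj = nn.Linear(id_dim, num_tokens * prompt_dim)
        self.num_tokens, self.prompt_dim = num_tokens, prompt_dim

    def forward(self, refs: torch.Tensor) -> torch.Tensor:
        # refs: (N, 3, H, W), any number N of reference images
        feats = self.id_encoder(refs)        # (N, id_dim)
        pooled = feats.mean(dim=0)           # combining step: averaging damps
                                             # identity-irrelevant variation
        tokens = self.proj(pooled)           # (num_tokens * prompt_dim,)
        return tokens.view(1, self.num_tokens, self.prompt_dim)

# Toy usage with a stand-in encoder (a real system would use a
# pretrained face-recognition backbone):
encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(512))
prompt = IdentityPrompt(encoder)(torch.randn(3, 3, 112, 112))
print(prompt.shape)  # torch.Size([1, 4, 768])
```

Because the pooling is permutation-invariant and length-agnostic, the same module accepts one or many reference images at inference, which matches the abstract's "any number of reference image inputs" claim.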


Paper and Project Links

PDF: To appear at AAAI 2025

Summary

Based on a diffusion model, this paper proposes FaceMe, a personalized face restoration method. It uses an identity encoder to extract identity-related features, which serve as prompts guiding the diffusion model to restore high-quality, identity-consistent face images. By combining identity-related features, the method effectively reduces the influence of identity-irrelevant features during training and supports any number of reference images at inference. Moreover, thanks to the robustness of the identity encoder, synthesized images can be used as reference images during training, and changing the identity at inference requires no model fine-tuning. Experiments show that FaceMe restores high-quality face images while preserving identity consistency, with strong performance and robustness.

Key Takeaways

  1. The paper proposes FaceMe, a diffusion-based face restoration method aimed at preserving identity in blind face restoration.
  2. An identity encoder extracts identity-related features that serve as prompts for the diffusion model, guiding it to restore high-quality, identity-consistent face images.
  3. Combining identity-related features effectively reduces the influence of identity-irrelevant features during training.
  4. The method supports any number of reference images at inference time.
  5. The robustness of the identity encoder allows synthesized images to serve as training references, and no fine-tuning is needed when the identity changes.
  6. A reference-image training pool is constructed to simulate the poses and expressions that may appear in real-world scenarios.


ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

Authors: Zitian Zhang, Frédéric Fortier-Chouinard, Mathieu Garon, Anand Bhattad, Jean-François Lalonde

We present ZeroComp, an effective zero-shot 3D object compositing approach that does not require paired composite-scene images during training. Our method leverages ControlNet to condition from intrinsic images and combines it with a Stable Diffusion model to utilize its scene priors, together operating as an effective rendering engine. During training, ZeroComp uses intrinsic images based on geometry, albedo, and masked shading, all without the need for paired images of scenes with and without composite objects. Once trained, it seamlessly integrates virtual 3D objects into scenes, adjusting shading to create realistic composites. We developed a high-quality evaluation dataset and demonstrate that ZeroComp outperforms methods using explicit lighting estimations and generative techniques in quantitative and human perception benchmarks. Additionally, ZeroComp extends to real and outdoor image compositing, even when trained solely on synthetic indoor data, showcasing its effectiveness in image compositing.
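
To illustrate the conditioning setup described in the abstract, the snippet below stacks a set of assumed intrinsic maps (depth, normals, albedo, masked shading) into one multi-channel tensor and feeds it to a diffusers ControlNetModel with a widened conditioning input. The channel layout and all shapes are assumptions rather than the paper's configuration, and the conditioning_channels argument needs a reasonably recent diffusers release.

```python
import torch
from diffusers import ControlNetModel

# Hypothetical channel layout (the abstract does not specify one):
# depth (1) + normals (3) + albedo (3) + masked shading (1) = 8 channels.
depth   = torch.rand(1, 1, 512, 512)
normals = torch.rand(1, 3, 512, 512)
albedo  = torch.rand(1, 3, 512, 512)
shading = torch.rand(1, 1, 512, 512)
mask    = (torch.rand(1, 1, 512, 512) > 0.5).float()   # region to re-shade
cond = torch.cat([depth, normals, albedo, shading * mask], dim=1)

# A ControlNet that accepts 8 conditioning channels; in a ZeroComp-style
# setup it would be paired with a frozen Stable Diffusion UNet whose
# scene priors it steers.
controlnet = ControlNetModel(conditioning_channels=8)
down_res, mid_res = controlnet(
    sample=torch.randn(1, 4, 64, 64),                 # SD latent (512 / 8)
    timestep=torch.tensor(10),
    encoder_hidden_states=torch.randn(1, 77, 1280),   # context tokens
    controlnet_cond=cond,
    return_dict=False,
)
print(mid_res.shape)  # residual injected into the UNet mid-block
```

Training such a pair only needs the intrinsic maps of a single scene, which is consistent with the abstract's point that no paired with/without-object images are required.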


Paper and Project Links

PDF: Project page: https://lvsn.github.io/ZeroComp; Code: https://github.com/lvsn/ZeroComp

Summary

ZeroComp is an effective zero-shot 3D object compositing method that requires no paired composite-scene images for training. It conditions ControlNet on intrinsic images and combines it with a Stable Diffusion model to exploit its scene priors, the two together acting as an effective rendering engine. ZeroComp trains on intrinsic images of geometry, albedo, and masked shading, without paired scene images. Once trained, it seamlessly composites virtual 3D objects into scenes, adjusting the shading to produce realistic results.

Key Takeaways

  1. ZeroComp is a zero-shot 3D object compositing method.
  2. No paired composite-scene images are needed for training.
  3. ControlNet is combined with a Stable Diffusion model to form an effective rendering engine.
  4. Training uses intrinsic images (geometry, albedo, and masked shading).
  5. Virtual 3D objects can be seamlessly composited with adjusted shading.
  6. It performs strongly on quantitative and human-perception benchmarks.


Guess What I Think: Streamlined EEG-to-Image Generation with Latent Diffusion Models

Authors: Eleonora Lopez, Luigi Sigillo, Federica Colonnese, Massimo Panella, Danilo Comminiello

Generating images from brain waves is gaining increasing attention due to its potential to advance brain-computer interface (BCI) systems by understanding how brain signals encode visual cues. Most of the literature has focused on fMRI-to-Image tasks as fMRI is characterized by high spatial resolution. However, fMRI is an expensive neuroimaging modality and does not allow for real-time BCI. On the other hand, electroencephalography (EEG) is a low-cost, non-invasive, and portable neuroimaging technique, making it an attractive option for future real-time applications. Nevertheless, EEG presents inherent challenges due to its low spatial resolution and susceptibility to noise and artifacts, which makes generating images from EEG more difficult. In this paper, we address these problems with a streamlined framework based on the ControlNet adapter for conditioning a latent diffusion model (LDM) through EEG signals. We conduct experiments and ablation studies on popular benchmarks to demonstrate that the proposed method beats other state-of-the-art models. Unlike these methods, which often require extensive preprocessing, pretraining, different losses, and captioning models, our approach is efficient and straightforward, requiring only minimal preprocessing and a few components. The code is available at https://github.com/LuigiSigillo/GWIT.
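
As a concrete illustration of the "minimal preprocessing, few components" claim, here is a hypothetical PyTorch adapter that maps a raw EEG window to the image-shaped map a ControlNet conditioning input expects. The 128-channel, 440-sample layout follows a common EEG-image benchmark; every layer size is an illustrative assumption, not the GWIT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGCondAdapter(nn.Module):
    """Hypothetical sketch: turn a minimally preprocessed EEG window
    into an image-shaped conditioning map for a ControlNet. All layer
    sizes are illustrative assumptions."""

    def __init__(self, eeg_channels: int = 128, eeg_samples: int = 440,
                 cond_hw: int = 512):
        super().__init__()
        self.cond_hw = cond_hw
        self.encode = nn.Sequential(
            nn.Flatten(),                                # (B, C*T)
            nn.Linear(eeg_channels * eeg_samples, 256),
            nn.GELU(),
            nn.Linear(256, 3 * 64 * 64),                 # coarse 3x64x64 plane
        )

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # eeg: (B, eeg_channels, eeg_samples)
        plane = self.encode(eeg).view(-1, 3, 64, 64)
        # Upsample to the ControlNet conditioning resolution.
        return F.interpolate(plane, size=(self.cond_hw, self.cond_hw),
                             mode="bilinear", align_corners=False)

cond = EEGCondAdapter()(torch.randn(4, 128, 440))
print(cond.shape)  # torch.Size([4, 3, 512, 512])
```

The point of such an adapter is that the frozen latent diffusion model stays untouched: only this small module and the ControlNet branch need training, which is what keeps the pipeline streamlined.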


Paper and Project Links

PDF: Accepted at ICASSP 2025

Summary
This paper studies image generation from brain waves, conditioning a latent diffusion model (LDM) on EEG signals to avoid the high cost of fMRI and its unsuitability for real-time BCI. It proposes a streamlined framework based on a ControlNet adapter that copes with the low spatial resolution of EEG and its susceptibility to noise and artifacts. Experiments and ablation studies show the method outperforms other state-of-the-art models while being more efficient and simpler, requiring only minimal preprocessing and a few components. The code is publicly available.

Key Takeaways

  1. The work generates images from brain waves, aiming to advance brain-computer interface (BCI) systems.
  2. Most prior work relies on fMRI for image generation, but its high cost and unsuitability for real-time use remain unsolved problems.
  3. EEG, a low-cost, non-invasive, and portable neuroimaging technique, is a strong candidate for future real-time applications.
  4. The low spatial resolution of EEG and its susceptibility to noise pose inherent challenges.
  5. The paper proposes a framework based on a ControlNet adapter and a latent diffusion model (LDM) to address these problems.
  6. The method outperforms other state-of-the-art models in experiments and ablation studies.



Author: Kedreamix
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!