⚠️ All of the summaries below are generated by a large language model and may contain errors; they are provided for reference only, so use them with caution.
🔴 Please note: do not use these summaries for serious academic purposes; they are intended only as an initial screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
Updated 2025-10-23
Efficient Few-shot Identity Preserving Attribute Editing for 3D-aware Deep Generative Models
Authors:Vishal Vinod
Identity preserving editing of faces is a generative task that enables modifying the illumination, adding/removing eyeglasses, face aging, editing hairstyles, modifying expression etc., while preserving the identity of the face. Recent progress in 2D generative models has enabled photorealistic editing of faces using simple techniques leveraging the compositionality in GANs. However, identity preserving editing for 3D faces with a given set of attributes is a challenging task as the generative model must reason about view consistency from multiple poses and render a realistic 3D face. Further, 3D portrait editing requires large-scale attribute labelled datasets and presents a trade-off between editability in low-resolution and inflexibility to editing in high resolution. In this work, we aim to alleviate some of the constraints in editing 3D faces by identifying latent space directions that correspond to photorealistic edits. To address this, we present a method that builds on recent advancements in 3D-aware deep generative models and 2D portrait editing techniques to perform efficient few-shot identity preserving attribute editing for 3D-aware generative models. We aim to show from experimental results that using just ten or fewer labelled images of an attribute is sufficient to estimate edit directions in the latent space that correspond to 3D-aware attribute editing. In this work, we leverage an existing face dataset with masks to obtain the synthetic images for the few attribute examples required for estimating the edit directions. Further, to demonstrate the linearity of edits, we investigate one-shot stylization by performing sequential editing and use the (2D) Attribute Style Manipulation (ASM) technique to investigate a continuous style manifold for 3D consistent identity preserving face aging. Code and results are available at: https://vishal-vinod.github.io/gmpi-edit/
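The few-shot edit-direction idea in the abstract can be pictured with a minimal sketch: given a handful of latent codes for images with and without an attribute, a direction is estimated and then applied linearly, and sequential editing simply stacks several such directions. This is an illustrative assumption using a common estimator (difference of class means), not the paper's exact procedure; the names `estimate_edit_direction`, `apply_edits`, and the 512-dimensional random stand-in codes are hypothetical.

```python
import numpy as np

def estimate_edit_direction(w_pos, w_neg):
    """Estimate a latent edit direction from a handful of labelled codes.

    w_pos: (n_pos, d) latent codes of images WITH the attribute (e.g. eyeglasses)
    w_neg: (n_neg, d) latent codes of images WITHOUT the attribute
    Ten or fewer examples per class are assumed to suffice, as in the abstract.
    """
    direction = w_pos.mean(axis=0) - w_neg.mean(axis=0)
    return direction / np.linalg.norm(direction)  # unit-norm direction

def apply_edits(w, directions, alphas):
    """Sequentially apply several linear edits to one latent code.

    If the edits are linear, the result is simply w + sum_i alpha_i * d_i,
    so the order of application should not matter much.
    """
    w_edit = w.copy()
    for d, alpha in zip(directions, alphas):
        w_edit = w_edit + alpha * d
    return w_edit

# Toy usage with random stand-ins for real latent codes (d = 512).
rng = np.random.default_rng(0)
d_glasses = estimate_edit_direction(rng.normal(size=(10, 512)),
                                    rng.normal(size=(10, 512)))
d_age = estimate_edit_direction(rng.normal(size=(10, 512)),
                                rng.normal(size=(10, 512)))
w_edited = apply_edits(rng.normal(size=512), [d_glasses, d_age], [1.5, 0.8])
```

In practice the edited code would be fed back through the 3D-aware generator and rendered from multiple poses to check view consistency; the toy arrays above only illustrate the linear-algebra step.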
Paper and project links
PDF 14 pages, 7 figures
Summary
Identity-preserving face editing is a generative task that modifies illumination, adds or removes eyeglasses, ages the face, edits hairstyles, changes expressions, and so on, while preserving the identity of the face. Although 2D generative models already enable photorealistic face editing through simple techniques that exploit the compositionality of GANs, identity-preserving editing of 3D faces remains challenging. This work aims to relax some of the constraints on editing 3D faces by identifying latent-space directions that correspond to photorealistic edits. Building on recent 3D-aware deep generative models and 2D portrait-editing techniques, the method performs efficient few-shot identity-preserving attribute editing for 3D-aware generative models. Experimental results indicate that only a few attribute-labelled images are needed to estimate latent-space edit directions corresponding to 3D-aware attribute edits. The study uses a face dataset with masks to obtain the synthetic images for the few attribute examples needed to estimate the edit directions. To demonstrate the linearity of the edits, it investigates one-shot stylization via sequential editing and uses the (2D) Attribute Style Manipulation (ASM) technique to explore a continuous style manifold for 3D-consistent identity-preserving face aging. Code and results are available on the project page.
Key Takeaways
- Identity-preserving face editing is a generative task that modifies facial attributes while preserving identity.
- 3D face editing must handle view consistency across multiple poses and balance editability at low resolution against inflexibility to editing at high resolution.
- This work performs efficient few-shot identity-preserving attribute editing by identifying latent-space edit directions.
- Ten or fewer attribute-labelled images are sufficient to estimate edit directions corresponding to 3D-aware attribute editing.
- A face dataset with masks is used to obtain synthetic images for estimating the edit directions.
- The edits are shown to be linear, and one-shot stylization is demonstrated via sequential editing.
Regression is all you need for medical image translation
Authors:Sebastian Rassmann, David Kügler, Christian Ewert, Martin Reuter
While Generative Adversarial Nets (GANs) and Diffusion Models (DMs) have achieved impressive results in natural image synthesis, their core strengths - creativity and realism - can be detrimental in medical applications, where accuracy and fidelity are paramount. These models instead risk introducing hallucinations and replication of unwanted acquisition noise. Here, we propose YODA (You Only Denoise once - or Average), a 2.5D diffusion-based framework for medical image translation (MIT). Consistent with DM theory, we find that conventional diffusion sampling stochastically replicates noise. To mitigate this, we draw and average multiple samples, akin to physical signal averaging. As this effectively approximates the DM’s expected value, we term this Expectation-Approximation (ExpA) sampling. We additionally propose regression sampling YODA, which retains the initial DM prediction and omits iterative refinement to produce noise-free images in a single step. Across five diverse multi-modal datasets - including multi-contrast brain MRI and pelvic MRI-CT - we demonstrate that regression sampling is not only substantially more efficient but also matches or exceeds image quality of full diffusion sampling even with ExpA. Our results reveal that iterative refinement solely enhances perceptual realism without benefiting information translation, which we confirm in relevant downstream tasks. YODA outperforms eight state-of-the-art DMs and GANs and challenges the presumed superiority of DMs and GANs over computationally cheap regression models for high-quality MIT. Furthermore, we show that YODA-translated images are interchangeable with, or even superior to, physical acquisitions for several medical applications.
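A minimal sketch of the two sampling strategies described in the abstract: ExpA sampling averages several independent diffusion samples to approximate the model's expected value, and regression sampling keeps the model's single clean-image prediction without iterative refinement. The callables `sample_diffusion` and `predict_x0` and the toy stand-in models below are hypothetical placeholders for an actual 2.5D diffusion model, not YODA's real API.

```python
import torch

def expa_sampling(sample_diffusion, source, n_samples=8):
    """Expectation-Approximation (ExpA): draw several stochastic diffusion
    samples conditioned on the same source image and average them, akin to
    physical signal averaging, to suppress replicated acquisition noise."""
    samples = torch.stack([sample_diffusion(source) for _ in range(n_samples)])
    return samples.mean(dim=0)

def regression_sampling(predict_x0, source, noise_level=1.0):
    """Regression sampling: keep the model's initial prediction of the clean
    target image and skip iterative refinement, giving a noise-free result
    in a single forward pass."""
    x_t = torch.randn_like(source) * noise_level  # pure-noise starting point
    return predict_x0(x_t, source)

# Toy usage with stand-in "models" on a fake single-channel 2.5D slice.
source = torch.zeros(1, 1, 64, 64)
fake_sampler = lambda s: s + 0.05 * torch.randn_like(s)  # stochastic stand-in
fake_regressor = lambda x_t, s: s                        # deterministic stand-in
avg_img = expa_sampling(fake_sampler, source, n_samples=4)
reg_img = regression_sampling(fake_regressor, source)
```

The contrast the paper draws is visible in the structure alone: ExpA needs several full sampling passes per image, while regression sampling needs exactly one forward pass.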
Paper and project links
Summary
For medical image translation (MIT), the work proposes YODA (You Only Denoise once - or Average), a 2.5D diffusion-based framework. Expectation-Approximation (ExpA) sampling averages multiple diffusion samples to suppress replicated noise, while regression sampling keeps the initial prediction and skips iterative refinement, improving both efficiency and image quality. Across diverse multi-modal datasets, YODA matches or outperforms state-of-the-art DMs and GANs, challenging the presumed superiority of these models over computationally cheap regression for high-quality medical image translation.
Key Takeaways
- GANs and DMs excel at natural image synthesis, but in medical applications they risk introducing hallucinations and replicating unwanted acquisition noise, compromising accuracy and fidelity.
- YODA is a 2.5D diffusion-based framework for medical image translation that reduces noise replication and improves efficiency.
- Expectation-Approximation (ExpA) sampling draws and averages multiple samples to approximate the DM's expected value.
- Regression sampling retains the initial DM prediction and omits iterative refinement, producing noise-free images in a single step at much lower cost.
- Experiments on five multi-modal datasets show YODA matches or exceeds state-of-the-art DMs and GANs in image quality, which is confirmed on relevant downstream tasks.