Published: 2025-11-19

Updated: 2025-11-27

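⚠️ All summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Note: do not use these summaries in serious academic work. They are intended only for a first-pass screening before reading the papers.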

Updated 2025-11-19

PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos

Authors: Dianbing Xi, Guoyuan An, Jingsen Zhu, Zhijian Liu, Yuan Liu, Ruiyuan Zhang, Jiayuan Lu, Rui Wang, Yuchi Huo

We propose PFAvatar (Pose-Fusion Avatar), a new method that reconstructs high-quality 3D avatars from "Outfit of the Day" (OOTD) photos, which exhibit diverse poses, occlusions, and complex backgrounds. Our method consists of two stages: (1) fine-tuning a pose-aware diffusion model from few-shot OOTD examples and (2) distilling a 3D avatar represented by a neural radiance field (NeRF). In the first stage, unlike previous methods that segment images into assets (e.g., garments, accessories) for 3D assembly, which is prone to inconsistency, we avoid decomposition and directly model the full-body appearance. By integrating a pre-trained ControlNet for pose estimation and a novel Condition Prior Preservation Loss (CPPL), our method enables end-to-end learning of fine details while mitigating language drift in few-shot training. Our method completes personalization in just 5 minutes, achieving a 48$\times$ speed-up compared to previous approaches. In the second stage, we introduce a NeRF-based avatar representation optimized by canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared to mesh-based representations that suffer from resolution-dependent discretization and erroneous occluded geometry, our continuous radiance field can preserve high-frequency textures (e.g., hair) and handle occlusions correctly through transmittance. Experiments demonstrate that PFAvatar outperforms state-of-the-art methods in terms of reconstruction fidelity, detail preservation, and robustness to occlusions/truncations, advancing practical 3D avatar generation from real-world OOTD albums. In addition, the reconstructed 3D avatar supports downstream applications such as virtual try-on, animation, and human video reenactment, further demonstrating the versatility and practical value of our approach.

Paper and project links

PDF Accepted by AAAI 2026

Summary
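This paper proposes PFAvatar, a method that reconstructs high-quality 3D avatars from Outfit-of-the-Day (OOTD) photos with diverse poses, occlusions, and complex backgrounds. The method has two stages. The first stage fine-tunes a pose-aware diffusion model on a few photos and models the full-body appearance directly, without decomposing the image into separate assets, which avoids the inconsistency that asset-based assembly is prone to. The second stage distills a NeRF-based avatar representation, which preserves high-frequency textures and handles occlusions correctly. Personalization takes only 5 minutes, which the abstract reports as a 48$\times$ speed-up over previous approaches. The authors also report higher reconstruction fidelity, better detail preservation, and greater robustness to occlusions and truncations than state-of-the-art methods, and the reconstructed avatars support downstream applications such as virtual try-on, animation, and human video reenactment.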
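
For readers unfamiliar with the "SDS" in Multi-Resolution 3D-SDS, the block below writes out the standard score distillation sampling gradient as introduced in DreamFusion. It is background only, not the paper's multi-resolution variant, whose exact form the abstract does not give. The symbols are assumptions made for this sketch: $\theta$ are the NeRF parameters, $x = g(\theta)$ is a rendered image, $x_t$ is that image noised to timestep $t$ with noise $\epsilon$, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction under condition $y$, and $w(t)$ is a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta)
```

Read this way, the fine-tuned diffusion model from the first stage acts as the critic: renderings that the model finds implausible for the personalized subject produce a larger residual and therefore a larger update to the radiance field.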

Key Takeaways

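PFAvatar reconstructs high-quality 3D avatars from Outfit-of-the-Day photos.
The method has two stages: fine-tuning a pose-aware diffusion model, then distilling a NeRF-based avatar representation.
The first stage avoids the inconsistency caused by decomposing images into assets. Integrating a pre-trained ControlNet for pose estimation with the Condition Prior Preservation Loss (CPPL) enables end-to-end learning of fine details while mitigating language drift.
The second stage introduces a NeRF-based avatar representation, optimized by canonical SMPL-X space sampling and Multi-Resolution 3D-SDS, which avoids the resolution-dependent discretization and erroneous occluded geometry of mesh-based representations.
Personalization completes in just 5 minutes, reported as a 48$\times$ speed-up over previous approaches.
The continuous radiance field preserves high-frequency textures (e.g., hair) and handles occlusions correctly through transmittance.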
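
The claim that a radiance field handles occlusions "through transmittance" can be illustrated with the standard NeRF volume-rendering quadrature. The sketch below is not taken from the PFAvatar code; the function name `composite_ray` and the sample values are hypothetical, and a single color channel is used for brevity.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, ordered front to back.

    densities: volume density sigma_i at each sample (>= 0)
    colors:    scalar color c_i at each sample (one channel for brevity)
    deltas:    length delta_i of the ray segment at each sample
    Returns (rendered_color, per_sample_weights).
    """
    transmittance = 1.0  # fraction of light not yet blocked
    rendered = 0.0
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        weights.append(weight)
        rendered += weight * color
        transmittance *= 1.0 - alpha            # light surviving past it
    return rendered, weights


# Hypothetical ray: empty space, a dense occluder (color 1.0),
# empty space, then a second dense surface behind it (color 0.2).
densities = [0.0, 50.0, 0.0, 50.0]
colors = [0.0, 1.0, 0.0, 0.2]
deltas = [0.1, 0.1, 0.1, 0.1]

rendered, weights = composite_ray(densities, colors, deltas)
print(rendered, weights)
```

Because the transmittance has almost entirely decayed by the time the ray reaches the second surface, that surface receives a near-zero weight, so the front occluder determines the pixel without any explicit visibility test.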

Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars

Authors: Marcel C. Bühler, Ye Yuan, Xueting Li, Yangyi Huang, Koki Nagano, Umar Iqbal

We introduce Dream, Lift, Animate (DLA), a novel framework that reconstructs animatable 3D human avatars from a single image. This is achieved by leveraging multi-view generation, 3D Gaussian lifting, and pose-aware UV-space mapping of 3D Gaussians. Given an image, we first dream plausible multi-views using a video diffusion model, capturing rich geometric and appearance details. These views are then lifted into unstructured 3D Gaussians. To enable animation, we propose a transformer-based encoder that models global spatial relationships and projects these Gaussians into a structured latent representation aligned with the UV space of a parametric body model. This latent code is decoded into UV-space Gaussians that can be animated via body-driven deformation and rendered conditioned on pose and viewpoint. By anchoring Gaussians to the UV manifold, our method ensures consistency during animation while preserving fine visual details. DLA enables real-time rendering and intuitive editing without requiring post-processing. Our method outperforms state-of-the-art approaches on the ActorsHQ and 4D-Dress datasets in both perceptual quality and photometric accuracy. By combining the generative strengths of video diffusion models with a pose-aware UV-space Gaussian mapping, DLA bridges the gap between unstructured 3D representations and high-fidelity, animation-ready avatars.

Paper and project links

PDF Accepted to 3DV 2026

Summary

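This paper introduces Dream, Lift, Animate (DLA), a framework that reconstructs animatable 3D human avatars from a single image. It combines multi-view generation, 3D Gaussian lifting, and pose-aware UV-space mapping of 3D Gaussians. Given an image, a video diffusion model first generates plausible multi-view images that capture rich geometric and appearance details, and these views are lifted into unstructured 3D Gaussians. To enable animation, a transformer-based encoder models global spatial relationships and projects the Gaussians into a structured latent representation aligned with the UV space of a parametric body model. The latent code is decoded into UV-space Gaussians that can be animated via body-driven deformation and rendered conditioned on pose and viewpoint. Anchoring the Gaussians to the UV manifold keeps the avatar consistent during animation while preserving fine visual details, and DLA supports real-time rendering and intuitive editing without post-processing. On the ActorsHQ and 4D-Dress datasets, the authors report better perceptual quality and photometric accuracy than state-of-the-art approaches.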
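
To make "UV-anchored Gaussians animated via body-driven deformation" concrete, here is a minimal sketch. The abstract does not say which deformation model DLA uses, so linear blend skinning is swapped in here as an assumption, and the data layout (`uv_gaussians`, two joints, rotation about the z axis) is entirely hypothetical.

```python
import math


def rot_z(angle):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rigid(rotation, translation, point):
    """Apply a rigid transform R @ p + t to a 3D point."""
    return [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]


def skin_point(point, weights, joint_transforms):
    """Linear blend skinning: weighted sum of per-joint rigid transforms."""
    out = [0.0, 0.0, 0.0]
    for w, (rotation, translation) in zip(weights, joint_transforms):
        moved = apply_rigid(rotation, translation, point)
        out = [o + w * m for o, m in zip(out, moved)]
    return out


# Hypothetical UV map: each (u, v) texel owns one Gaussian with a
# canonical-space center and skinning weights over two joints.
uv_gaussians = {
    (0, 0): {"center": [1.0, 0.0, 0.0], "weights": [1.0, 0.0]},
    (0, 1): {"center": [1.0, 0.0, 0.0], "weights": [0.0, 1.0]},
    (1, 0): {"center": [1.0, 0.0, 0.0], "weights": [0.5, 0.5]},
}

# Pose: joint 0 stays at rest, joint 1 rotates 90 degrees about z.
rest = (rot_z(0.0), [0.0, 0.0, 0.0])
bent = (rot_z(math.pi / 2), [0.0, 0.0, 0.0])

posed = {
    uv: skin_point(g["center"], g["weights"], [rest, bent])
    for uv, g in uv_gaussians.items()
}
print(posed)
```

The point of the sketch is that the texel key of each Gaussian never changes across poses; only its center moves with the body. That fixed correspondence is one plausible reading of why anchoring to the UV manifold keeps the avatar consistent during animation.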

Key Takeaways