
Metaverse / Virtual Humans


⚠️ All summaries below are generated by a large language model and may contain errors; they are for reference only — use with caution.
🔴 Note: never use these summaries in serious academic settings; they are only meant as a first pass before actually reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-11-20

PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos

Authors:Dianbing Xi, Guoyuan An, Jingsen Zhu, Zhijian Liu, Yuan Liu, Ruiyuan Zhang, Jiayuan Lu, Yuchi Huo, Rui Wang

We propose PFAvatar (Pose-Fusion Avatar), a new method that reconstructs high-quality 3D avatars from Outfit of the Day(OOTD) photos, which exhibit diverse poses, occlusions, and complex backgrounds. Our method consists of two stages: (1) fine-tuning a pose-aware diffusion model from few-shot OOTD examples and (2) distilling a 3D avatar represented by a neural radiance field (NeRF). In the first stage, unlike previous methods that segment images into assets (e.g., garments, accessories) for 3D assembly, which is prone to inconsistency, we avoid decomposition and directly model the full-body appearance. By integrating a pre-trained ControlNet for pose estimation and a novel Condition Prior Preservation Loss (CPPL), our method enables end-to-end learning of fine details while mitigating language drift in few-shot training. Our method completes personalization in just 5 minutes, achieving a 48x speed-up compared to previous approaches. In the second stage, we introduce a NeRF-based avatar representation optimized by canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared to mesh-based representations that suffer from resolution-dependent discretization and erroneous occluded geometry, our continuous radiance field can preserve high-frequency textures (e.g., hair) and handle occlusions correctly through transmittance. Experiments demonstrate that PFAvatar outperforms state-of-the-art methods in terms of reconstruction fidelity, detail preservation, and robustness to occlusions/truncations, advancing practical 3D avatar generation from real-world OOTD albums. In addition, the reconstructed 3D avatar supports downstream applications such as virtual try-on, animation, and human video reenactment, further demonstrating the versatility and practical value of our approach.


Paper and Project Links

PDF Accepted by AAAI 2026

Summary
PFAvatar (Pose-Fusion Avatar) is a new method that reconstructs high-quality 3D avatars from everyday outfit-of-the-day photos. It works in two stages: (1) fine-tuning a pose-aware diffusion model, and (2) distilling a 3D avatar represented by a neural radiance field (NeRF). By avoiding the inconsistencies that arise from decomposing images into assets, and by integrating a pre-trained ControlNet for pose estimation together with a novel Condition Prior Preservation Loss (CPPL), the method learns fine details end-to-end while mitigating "language drift" in few-shot training. In the second stage, a NeRF-based avatar representation is optimized via canonical SMPL-X space sampling and Multi-Resolution 3D-SDS. Compared with mesh-based representations, the continuous radiance field preserves high-frequency textures and handles occlusion correctly. Experiments show that PFAvatar surpasses existing methods in reconstruction fidelity, detail preservation, and robustness to occlusion/truncation, advancing practical 3D avatar generation from real-world photo albums. The reconstructed 3D avatar also supports downstream applications such as virtual try-on, animation, and human video reenactment, further demonstrating the method's versatility and practical value.
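The second-stage distillation mentioned above is based on Score Distillation Sampling (SDS): the rendered view is perturbed with noise, a frozen diffusion model predicts that noise, and the prediction error drives the 3D representation's gradients. The sketch below is only a rough illustration of plain SDS (not the paper's Multi-Resolution 3D-SDS), with a toy function standing in for the frozen denoiser; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_t, t):
    # Stand-in for a frozen diffusion model's noise prediction;
    # a real implementation would condition on pose and a text prompt.
    return 0.9 * x_t / (1.0 + t)

def sds_gradient(rendered, t, alpha_bar):
    """Score Distillation Sampling: add noise to the rendered image at
    timestep t, ask the frozen denoiser to predict that noise, and use
    (predicted - true) noise as the gradient signal on the pixels."""
    eps = rng.standard_normal(rendered.shape)
    x_t = np.sqrt(alpha_bar) * rendered + np.sqrt(1.0 - alpha_bar) * eps
    eps_pred = toy_denoiser(x_t, t)
    w = 1.0 - alpha_bar          # a common timestep weighting choice
    return w * (eps_pred - eps)  # gradient w.r.t. the rendered pixels

rendered = rng.standard_normal((4, 4, 3))  # a tiny "rendered view"
g = sds_gradient(rendered, t=0.5, alpha_bar=0.8)
print(g.shape)  # → (4, 4, 3)
```

In practice this gradient is backpropagated through the differentiable renderer into the NeRF parameters, so the frozen diffusion model never needs to be differentiated through.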

Key Takeaways

  1. PFAvatar is a new method for reconstructing high-quality 3D avatars from outfit-of-the-day photos.
  2. The method has two stages: fine-tuning a pose-aware diffusion model, then distilling a 3D avatar represented by a neural radiance field.
  3. It avoids the inconsistencies caused by decomposing images into assets, and learns fine details end-to-end by integrating ControlNet and CPPL.
  4. The NeRF-based avatar representation preserves high-frequency textures and handles occlusion correctly.
  5. PFAvatar surpasses existing methods in reconstruction fidelity, detail preservation, and robustness to occlusion.
  6. The reconstructed 3D avatar supports downstream applications such as virtual try-on, animation, and human video reenactment.
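The occlusion-handling claim (takeaway 4) follows from standard NeRF volume rendering: along each ray, accumulated transmittance automatically down-weights samples hidden behind dense matter, so occluded geometry contributes little to the pixel. A minimal NumPy sketch of front-to-back compositing (illustrative only, not the paper's implementation):

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Front-to-back volume rendering along one ray.
    Transmittance T_i = prod_{j<i} (1 - alpha_j) shrinks toward zero
    once the ray passes through dense (occluding) samples."""
    alphas = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                           # contribution per sample
    rgb = (weights[:, None] * colors).sum(axis=0)      # composited pixel color
    return rgb, weights

# Two samples along a ray: a dense red blob in front of a dense blue blob.
sigmas = np.array([50.0, 50.0])
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
deltas = np.array([0.1, 0.1])
rgb, weights = composite_ray(sigmas, colors, deltas)
print(rgb)      # dominated by the front (red) sample
print(weights)  # the occluded sample receives almost no weight
```

Because occlusion falls out of the transmittance term itself, a continuous radiance field needs no explicit visibility reasoning, unlike mesh-based pipelines that must resolve occluded geometry explicitly.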

Cool Papers

Click here to view the paper's figures


Author: Kedreamix
Copyright: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit the source Kedreamix when reposting!