⚠️ All of the summaries below are produced by a large language model; they may contain errors, are provided for reference only, and should be used with caution
🔴 Please note: never rely on them in serious academic settings; they are only meant as a first-pass screen before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace
2025-09-18 Update
Dream3DAvatar: Text-Controlled 3D Avatar Reconstruction from a Single Image
Authors: Gaofeng Liu, Hengsen Li, Ruoyu Gao, Xuetong Li, Zhiyuan Ma, Tao Fang
With the rapid advancement of 3D representation techniques and generative models, substantial progress has been made in reconstructing full-body 3D avatars from a single image. However, this task remains fundamentally ill-posed due to the limited information available from monocular input, making it difficult to control the geometry and texture of occluded regions during generation. To address these challenges, we redesign the reconstruction pipeline and propose Dream3DAvatar, an efficient and text-controllable two-stage framework for 3D avatar generation. In the first stage, we develop a lightweight, adapter-enhanced multi-view generation model. Specifically, we introduce the Pose-Adapter to inject SMPL-X renderings and skeletal information into SDXL, enforcing geometric and pose consistency across views. To preserve facial identity, we incorporate ID-Adapter-G, which injects high-resolution facial features into the generation process. Additionally, we leverage BLIP2 to generate high-quality textual descriptions of the multi-view images, enhancing text-driven controllability in occluded regions. In the second stage, we design a feedforward Transformer model equipped with a multi-view feature fusion module to reconstruct high-fidelity 3D Gaussian Splatting (3DGS) representations from the generated images. Furthermore, we introduce ID-Adapter-R, which utilizes a gating mechanism to effectively fuse facial features into the reconstruction process, improving high-frequency detail recovery. Extensive experiments demonstrate that our method can generate realistic, animation-ready 3D avatars without any post-processing and consistently outperforms existing baselines across multiple evaluation metrics.
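The abstract names ID-Adapter-R's gating mechanism but does not spell out its form, so the snippet below is a minimal sketch of what a gated identity-feature fusion could look like in PyTorch. The class name, tensor shapes, and the residual-gate formulation are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a gated identity-feature fusion, illustrating the *general idea*
# behind ID-Adapter-R as described in the abstract: facial features are blended into
# the reconstruction tokens through a learned gate. All names, shapes, and the exact
# fusion rule are assumptions, not the paper's code.
import torch
import torch.nn as nn

class GatedIDFusion(nn.Module):
    def __init__(self, token_dim: int = 1024, face_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(face_dim, token_dim)   # map face features to token space
        self.gate = nn.Sequential(                   # per-token scalar gate in [0, 1]
            nn.Linear(2 * token_dim, token_dim),
            nn.SiLU(),
            nn.Linear(token_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, tokens: torch.Tensor, face_feat: torch.Tensor) -> torch.Tensor:
        # tokens:    (B, N, token_dim) multi-view reconstruction tokens
        # face_feat: (B, face_dim)     identity embedding from a face encoder
        face = self.proj(face_feat).unsqueeze(1).expand_as(tokens)
        g = self.gate(torch.cat([tokens, face], dim=-1))  # (B, N, 1)
        return tokens + g * face                          # gated residual injection

if __name__ == "__main__":
    fusion = GatedIDFusion()
    out = fusion(torch.randn(2, 256, 1024), torch.randn(2, 512))
    print(out.shape)  # torch.Size([2, 256, 1024])
```

A gate of this kind lets each reconstruction token decide how much identity information to absorb, which is one common way to inject an auxiliary feature without overwriting the base representation.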
Paper and project links
Summary
With the progress of 3D representation techniques and generative models, reconstructing a full-body 3D avatar from a single image has advanced considerably, but the limited information in monocular input still makes occluded regions hard to generate. This work introduces Dream3DAvatar, a two-stage framework for efficient, text-controllable reconstruction. The first stage is a lightweight multi-view generation model that injects pose and identity information through the Pose-Adapter and ID-Adapter-G to enforce cross-view consistency. The second stage uses a feedforward Transformer to fuse multi-view features and reconstruct a high-quality 3D avatar, with ID-Adapter-R improving the fusion and recovery of facial features. Experiments show that the method produces realistic, animation-ready 3D avatars and outperforms existing baselines on multiple evaluation metrics.
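To make the two-stage flow in the summary concrete, here is a high-level skeleton of how the stages hand data to each other. Every class, function, and file name below is a hypothetical stand-in chosen for illustration; it only makes the data flow explicit and is not the authors' code.

```python
# High-level skeleton of the two-stage flow described above. All interfaces are
# assumed placeholders, not the released implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class MultiViewSet:
    images: List[str]      # generated view images (paths, as placeholders)
    captions: List[str]    # BLIP2-style textual descriptions per view

def stage1_generate_views(input_image: str, smplx_render: str, prompt: str) -> MultiViewSet:
    """Stage 1 (assumed interface): adapter-enhanced multi-view generation.
    Pose conditioning and identity features would be injected into the
    diffusion backbone at this point."""
    views = [f"view_{i}.png" for i in range(4)]            # placeholder outputs
    captions = [f"{prompt}, view {i}" for i in range(4)]   # placeholder captions
    return MultiViewSet(images=views, captions=captions)

def stage2_reconstruct_3dgs(views: MultiViewSet) -> str:
    """Stage 2 (assumed interface): feedforward Transformer that fuses
    multi-view features and regresses a 3D Gaussian Splatting asset."""
    return "avatar_3dgs.ply"                               # placeholder artifact

if __name__ == "__main__":
    views = stage1_generate_views("person.jpg", "smplx_render.png",
                                  "a person in a red jacket")
    asset = stage2_reconstruct_3dgs(views)
    print(asset)
```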
Key Takeaways
Click here to view paper screenshots