Published: 2025-10-22

Updated: 2025-11-27

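⚠️ All of the summaries below were generated by a large language model. They may contain errors, are for reference only, and should be used with caution.
🔴 Please note: never use them in serious academic settings. They are only meant for screening papers before you read them.
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace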

2025-10-22 Update

Capturing Head Avatar with Hand Contacts from a Monocular Video

Authors: Haonan He, Yufeng Zheng, Jie Song

Photorealistic 3D head avatars are vital for telepresence, gaming, and VR. However, most methods focus solely on facial regions, ignoring natural hand-face interactions, such as a hand resting on the chin or fingers gently touching the cheek, which convey cognitive states like pondering. In this work, we present a novel framework that jointly learns detailed head avatars and the non-rigid deformations induced by hand-face interactions. There are two principal challenges in this task. First, naively tracking hand and face separately fails to capture their relative poses. To overcome this, we propose to combine depth order loss with contact regularization during pose tracking, ensuring correct spatial relationships between the face and hand. Second, no publicly available priors exist for hand-induced deformations, making them non-trivial to learn from monocular videos. To address this, we learn a PCA basis specific to hand-induced facial deformations from a face-hand interaction dataset. This reduces the problem to estimating a compact set of PCA parameters rather than a full spatial deformation field. Furthermore, inspired by physics-based simulation, we incorporate a contact loss that provides additional supervision, significantly reducing interpenetration artifacts and enhancing the physical plausibility of the results. We evaluate our approach on RGB(D) videos captured by an iPhone. Additionally, to better evaluate the reconstructed geometry, we construct a synthetic dataset of avatars with various types of hand interactions. We show that our method can capture better appearance and more accurate deforming geometry of the face than SOTA surface reconstruction methods.
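
The abstract's central modeling step replaces a full spatial deformation field with a compact set of PCA coefficients. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes the following:
- Hand-induced deformations are stored as per-vertex offsets on a single fixed face-mesh topology.
- The basis is learned by SVD.
- A known offset field is encoded by orthogonal projection.

The function names `fit_pca_basis`, `encode` and `decode`, the random stand-in data and the choice of `k` are all hypothetical. In the paper the coefficients are estimated from monocular video, not projected from a known offset field.

```python
import numpy as np

# Hypothetical sketch: a linear (PCA) model of hand-induced facial deformation.
# Assumption: each training sample is a per-vertex offset field (V, 3) on one
# fixed face-mesh topology, already expressed in a canonical head frame.


def fit_pca_basis(deformations: np.ndarray, k: int):
    """Learn a mean offset and k orthonormal deformation directions.

    deformations: (N, V, 3) offsets, e.g. from a face-hand interaction dataset.
    Returns mean with shape (3V,) and basis with shape (k, 3V).
    """
    n = deformations.shape[0]
    x = deformations.reshape(n, -1)              # flatten to (N, 3V)
    mean = x.mean(axis=0)
    # Rows of vt are the principal directions of the centered data,
    # sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]


def decode(mean: np.ndarray, basis: np.ndarray, coeffs: np.ndarray, num_vertices: int) -> np.ndarray:
    """Turn k PCA coefficients back into a dense (V, 3) offset field."""
    return (mean + coeffs @ basis).reshape(num_vertices, 3)


def encode(mean: np.ndarray, basis: np.ndarray, deformation: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for an observed (V, 3) offset field.

    The basis rows are orthonormal, so plain projection is the least-squares fit.
    """
    return basis @ (deformation.reshape(-1) - mean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, N, k = 500, 40, 8
    # Synthetic stand-in data: offsets that secretly live in a k-dim subspace.
    directions = rng.normal(size=(k, 3 * V))
    train = (rng.normal(size=(N, k)) @ directions).reshape(N, V, 3)
    mean, basis = fit_pca_basis(train, k)

    target = (rng.normal(size=k) @ directions).reshape(V, 3)
    coeffs = encode(mean, basis, target)
    recon = decode(mean, basis, coeffs, V)
    print(f"{3 * V} free values reduced to {coeffs.size} coefficients")
    print("max reconstruction error:", np.abs(recon - target).max())
```

With the synthetic data above, a 1,500-value offset field is described by 8 coefficients. The trade-off of such a linear prior is that deformations outside the learned subspace cannot be expressed at all.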


Paper and project links

PDF ICCV 2025

Summary
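Photorealistic 3D head avatars play an important role in telepresence, gaming, and virtual reality. Most current methods focus only on the facial region and ignore natural hand-face interactions, such as resting the chin on a hand or lightly touching the cheek with the fingers, which can express cognitive states like pondering. This work proposes a novel framework that jointly learns a detailed head avatar and the non-rigid deformations caused by hand-face interaction. The task poses two main challenges. First, tracking the hand and the face separately fails to capture their relative pose. Second, no public priors exist for hand-induced deformations, which makes them non-trivial to learn from monocular video. To address the first challenge, the method combines a depth order loss with contact regularization during pose tracking, ensuring the correct spatial relationship between hand and face. To address the second, it learns a PCA basis of hand-induced facial deformations, reducing the problem to estimating a compact set of PCA parameters instead of a full spatial deformation field. In addition, a physics-inspired contact loss improves the physical plausibility of the results and reduces interpenetration artifacts. The method was evaluated on RGB(D) videos captured with an iPhone and on a synthetic dataset of avatars with various hand interactions. It captures better appearance and more accurate deforming facial geometry than state-of-the-art surface reconstruction methods.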
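
The summary names three loss terms: a depth order loss and contact regularization for pose tracking, and a physics-inspired contact loss against interpenetration. The Python sketch below shows one plausible form of each, not the paper's exact formulation. It rests on the following assumptions:
- Rendered depth maps use `np.inf` where a part is not visible.
- A boolean map marks where 2D segmentation says the hand occludes the face.
- Face normals point outward.
- Signed distance is approximated as point-to-plane distance to the nearest face vertex.
- A mask of hand vertices in contact is given.

All function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of three tracking/fitting terms. Conventions assumed here:
# depth maps hold camera-space depth (smaller = closer, np.inf = not rendered),
# and face normals point outward, away from the head.


def depth_order_loss(depth_hand, depth_face, hand_in_front):
    """Penalise pixels whose rendered depth order contradicts the 2D evidence.

    hand_in_front: boolean (H, W) map, True where segmentation says the hand
    occludes the face. Only pixels covered by both renders are considered.
    """
    both = np.isfinite(depth_hand) & np.isfinite(depth_face)
    front = np.where(hand_in_front, depth_hand, depth_face)  # should be closer
    back = np.where(hand_in_front, depth_face, depth_hand)   # should be farther
    wrong = both & (front > back)
    # softplus(front - back) = log(1 + exp(front - back)), summed over violations
    return float(np.sum(np.logaddexp(0.0, front[wrong] - back[wrong])))


def signed_distances(hand_verts, face_verts, face_normals):
    """Point-to-plane distance from each hand vertex to its nearest face vertex.

    Positive = outside the face surface, negative = interpenetrating.
    Brute-force nearest neighbour: fine for a sketch, too slow for full meshes.
    """
    diff = hand_verts[:, None, :] - face_verts[None, :, :]     # (Nh, Nf, 3)
    nearest = np.argmin(np.sum(diff ** 2, axis=-1), axis=1)    # (Nh,)
    offset = hand_verts - face_verts[nearest]
    return np.einsum("ij,ij->i", offset, face_normals[nearest])


def penetration_loss(sd):
    """Squared penalty on hand vertices that sit inside the face."""
    return float(np.sum(np.minimum(sd, 0.0) ** 2))


def contact_loss(sd, contact_mask):
    """Pull hand vertices flagged as touching onto the face surface."""
    return float(np.sum(sd[contact_mask] ** 2))


if __name__ == "__main__":
    # Toy face patch: a 3x3 grid on the plane z = 0 with normals along +z.
    gx, gy = np.meshgrid([-0.01, 0.0, 0.01], [-0.01, 0.0, 0.01])
    face_verts = np.stack([gx.ravel(), gy.ravel(), np.zeros(9)], axis=1)
    face_normals = np.tile([0.0, 0.0, 1.0], (9, 1))

    hand_verts = np.array([[0.0, 0.0, 0.001],     # hovering 1 mm above the skin
                           [0.01, 0.0, -0.002]])  # pushed 2 mm into the face
    sd = signed_distances(hand_verts, face_verts, face_normals)
    print("signed distances:", sd)
    print("penetration:", penetration_loss(sd))
    print("contact:", contact_loss(sd, np.array([True, False])))

    depth_hand = np.array([[0.50, np.inf], [0.70, 0.40]])
    depth_face = np.full((2, 2), 0.60)
    hand_in_front = np.array([[True, False], [True, True]])
    print("depth order:", depth_order_loss(depth_hand, depth_face, hand_in_front))
```

In a real tracking loop these terms would be weighted, added to photometric terms and minimized with an autodiff framework. NumPy is used here only to keep the example self-contained.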

Key Takeaways