
Metaverse / Virtual Humans


⚠️ All of the summaries below are generated by a large language model and may contain errors; they are for reference only, so use them with caution.
🔴 Note: never use them in serious academic settings — they are only meant for an initial screening before reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-04-22

Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis

Authors: Radek Daněček, Carolin Schmitt, Senya Polikovsky, Michael J. Black

In order to be widely applicable, speech-driven 3D head avatars must articulate their lips in accordance with speech, while also conveying the appropriate emotions with dynamically changing facial expressions. The key problem is that deterministic models produce high-quality lip-sync but without rich expressions, whereas stochastic models generate diverse expressions but with lower lip-sync quality. To get the best of both, we seek a stochastic model with accurate lip-sync. To that end, we develop a new approach based on the following observation: if a method generates realistic 3D lip motions, it should be possible to infer the spoken audio from the lip motion. The inferred speech should match the original input audio, and erroneous predictions create a novel supervision signal for training 3D talking head avatars with accurate lip-sync. To demonstrate this effect, we propose THUNDER (Talking Heads Under Neural Differentiable Elocution Reconstruction), a 3D talking head avatar framework that introduces a novel supervision mechanism via differentiable sound production. First, we train a novel mesh-to-speech model that regresses audio from facial animation. Then, we incorporate this model into a diffusion-based talking avatar framework. During training, the mesh-to-speech model takes the generated animation and produces a sound that is compared to the input speech, creating a differentiable analysis-by-audio-synthesis supervision loop. Our extensive qualitative and quantitative experiments demonstrate that THUNDER significantly improves the quality of the lip-sync of talking head avatars while still allowing for generation of diverse, high-quality, expressive facial animations.


Paper and project links

PDF

Summary

This paper studies how to combine the strengths of deterministic and stochastic models when building speech-driven 3D head avatars, so that an avatar achieves both high-quality lip-sync and rich expressions. It proposes THUNDER, a new method built around a differentiable sound-production mechanism: a mesh-to-speech model regresses audio from facial animation, which markedly improves the avatar's lip-sync quality while preserving diverse, expressive animation.

Key Takeaways

  1. Speech-driven 3D head avatars must achieve both high-quality lip-sync and rich facial expressions.
  2. Deterministic models produce high-quality lip-sync but lack rich expressions, while stochastic models generate diverse expressions but with lower lip-sync quality.
  3. Combining the strengths of both requires a stochastic model with accurate lip-sync.
  4. THUNDER achieves high-quality lip-sync together with rich, expressive animation by introducing a differentiable sound-production mechanism.
  5. THUNDER first trains a mesh-to-speech model that regresses audio from facial animation, then incorporates it into a diffusion-based talking-avatar framework.
  6. During training, the audio regressed from the generated animation is compared against the input speech, creating a differentiable analysis-by-audio-synthesis supervision loop (see the sketch after this list).
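
The supervision loop in takeaway 6 can be made concrete with a short sketch. The snippet below is a minimal PyTorch illustration of analysis-by-audio-synthesis, not the authors' code: the `mesh_to_speech` interface, the noise schedule, and the loss weight are all assumptions. The point is that a mel-spectrogram regressed from the generated animation is compared against the mel-spectrogram of the input speech, and the resulting loss is differentiable through the mesh-to-speech model back into the diffusion denoiser.

```python
import torch
import torch.nn.functional as F

def thunder_training_step(denoiser, mesh_to_speech, audio_feats, mel_gt, anim_gt,
                          lambda_audio=1.0):
    """One hypothetical training step with an analysis-by-audio-synthesis loss.

    denoiser:       diffusion network predicting the clean animation from a
                    noised one, conditioned on input audio features.
    mesh_to_speech: frozen model regressing a mel-spectrogram from a face
                    animation sequence of shape (B, T, V, 3).
    """
    B, device = anim_gt.shape[0], anim_gt.device

    # Standard diffusion objective: noise the ground-truth animation and ask
    # the denoiser to recover it (simple linear beta schedule, for illustration).
    t = torch.randint(0, 1000, (B,), device=device)
    betas = torch.linspace(1e-4, 0.02, 1000, device=device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(B, 1, 1, 1)
    noise = torch.randn_like(anim_gt)
    anim_noisy = alpha_bar.sqrt() * anim_gt + (1.0 - alpha_bar).sqrt() * noise
    anim_pred = denoiser(anim_noisy, t, audio_feats)
    loss_diff = F.mse_loss(anim_pred, anim_gt)

    # Analysis-by-audio-synthesis: run the *generated* animation through the
    # frozen mesh-to-speech model and compare the synthesized mel-spectrogram
    # with the one computed from the real input audio. Gradients flow through
    # mesh_to_speech back into the denoiser, directly supervising lip motion.
    mel_pred = mesh_to_speech(anim_pred)
    loss_audio = F.l1_loss(mel_pred, mel_gt)

    return loss_diff + lambda_audio * loss_audio
```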

Cool Papers

Click here to view paper screenshots

SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars

Authors: Jaeseong Lee, Taewoong Kang, Marcel C. Bühler, Min-Jung Kim, Sungwon Hwang, Junha Hyung, Hyojin Jang, Jaegul Choo

Recent advancements in head avatar rendering using Gaussian primitives have achieved significantly high-fidelity results. Although precise head geometry is crucial for applications like mesh reconstruction and relighting, current methods struggle to capture intricate geometric details and render unseen poses due to their reliance on similarity transformations, which cannot handle stretch and shear transforms essential for detailed deformations of geometry. To address this, we propose SurFhead, a novel method that reconstructs riggable head geometry from RGB videos using 2D Gaussian surfels, which offer well-defined geometric properties, such as precise depth from fixed ray intersections and normals derived from their surface orientation, making them advantageous over 3D counterparts. SurFhead ensures high-fidelity rendering of both normals and images, even in extreme poses, by leveraging classical mesh-based deformation transfer and affine transformation interpolation. SurFhead introduces precise geometric deformation and blends surfels through polar decomposition of transformations, including those affecting normals. Our key contribution lies in bridging classical graphics techniques, such as mesh-based deformation, with modern Gaussian primitives, achieving state-of-the-art geometry reconstruction and rendering quality. Unlike previous avatar rendering approaches, SurFhead enables efficient reconstruction driven by Gaussian primitives while preserving high-fidelity geometry.


Paper and project links

PDF ICLR 2025, Project page with videos: https://summertight.github.io/SurFhead/

Summary
Head avatar rendering with Gaussian primitives achieves high-fidelity results, but precise head geometry remains hard to obtain: existing methods rely on similarity transforms and cannot express the stretch and shear required for detailed geometric deformation or for unseen poses. SurFhead addresses this by reconstructing riggable head geometry from RGB videos with 2D Gaussian surfels, which offer precise depth from fixed ray intersections and normals derived from surface orientation. By combining classical mesh-based deformation transfer with affine transformation interpolation, and by blending surfels through polar decomposition of transforms (including those affecting normals), SurFhead renders high-fidelity normals and images even in extreme poses, achieving state-of-the-art geometry reconstruction and rendering quality.

Key Takeaways

  1. Recent head avatar rendering with Gaussian primitives achieves high-fidelity results.
  2. Current methods struggle to capture intricate geometric details and to render unseen poses because they rely on similarity transforms, which cannot express the stretch and shear needed for detailed deformation.
  3. SurFhead is a new method that reconstructs riggable head geometry from RGB videos using 2D Gaussian surfels, which offer well-defined geometric properties.
  4. SurFhead renders high-fidelity normals and images even in extreme poses.
  5. SurFhead deforms geometry precisely and blends surfels through polar decomposition of affine transformations, including the transforms that affect normals (see the sketch after this list).
  6. The method bridges classical mesh-based graphics techniques with modern Gaussian primitives, achieving state-of-the-art geometry reconstruction and rendering quality.
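
As referenced in takeaway 5, the core numeric operation is blending affine transforms through polar decomposition, so that rotation and stretch/shear are handled separately, and deforming normals with the inverse-transpose Jacobian. The sketch below is a minimal PyTorch illustration under those assumptions, not the paper's implementation; in particular, averaging rotations by re-projecting a weighted sum onto SO(3) is one simple stand-in for the interpolation described in the abstract.

```python
import torch
import torch.nn.functional as F

def polar_decompose(A):
    """Split an affine matrix into A = R @ S with R a proper rotation and S a
    symmetric stretch/shear, via SVD: A = U diag(s) Vh  =>  R = U Vh,
    S = Vh^T diag(s) Vh. Reflections are folded into s so that det(R) = +1."""
    U, s, Vh = torch.linalg.svd(A)
    det = torch.det(U @ Vh)
    U, s = U.clone(), s.clone()
    U[..., :, -1] *= det.unsqueeze(-1)   # flip last column if det was -1
    s[..., -1] *= det
    R = U @ Vh
    S = Vh.transpose(-1, -2) @ torch.diag_embed(s) @ Vh
    return R, S

def blend_affine(As, weights):
    """Blend K affine transforms (K, 3, 3) with weights (K,): stretches are
    averaged linearly, rotations via a weighted sum re-projected onto SO(3)."""
    Rs, Ss = polar_decompose(As)
    w = weights.view(-1, 1, 1)
    R_blend, _ = polar_decompose((w * Rs).sum(dim=0))
    S_blend = (w * Ss).sum(dim=0)
    return R_blend @ S_blend

def deform_normal(A, n):
    """Normals transform with the inverse-transpose of the affine Jacobian
    (stretch/shear would otherwise tilt them), then get renormalized."""
    n_def = (torch.linalg.inv(A).transpose(-1, -2) @ n.unsqueeze(-1)).squeeze(-1)
    return F.normalize(n_def, dim=-1)
```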

Cool Papers

Click here to view paper screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!