
Talking Head Generation


⚠️ All of the summaries below are produced by a large language model; they may contain errors, are for reference only, and should be used with caution.
🔴 Please note: never rely on these for serious academic work; they are only a first-pass filter before actually reading a paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace

Updated 2025-01-28

SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation

Authors:Yujian Liu, Shidang Xu, Jing Guo, Dingbin Wang, Zairan Wang, Xianfeng Tan, Xiaoli Liu

Generating talking avatars driven by audio remains a significant challenge. Existing methods typically require high computational costs and often lack sufficient facial detail and realism, making them unsuitable for applications that demand high real-time performance and visual quality. Additionally, while some methods can synchronize lip movement, they still face issues with consistency between facial expressions and upper-body movement, particularly during silent periods. In this paper, we introduce SyncAnimation, the first NeRF-based method that achieves audio-driven, stable, and real-time generation of speaking avatars by combining generalized audio-to-pose matching with audio-to-expression synchronization. By integrating the AudioPose Syncer and AudioEmotion Syncer, SyncAnimation achieves high-precision pose and expression generation, progressively producing audio-synchronized upper-body, head, and lip shapes. Furthermore, the High-Synchronization Human Renderer ensures seamless integration of the head and upper body and achieves audio-synchronized lips. The project page can be found at https://syncanimation.github.io


Paper and Project Links

PDF: 11 pages, 7 figures

Summary

This post introduces SyncAnimation, a NeRF-based method for real-time, audio-driven talking-avatar generation. By combining audio-to-pose matching with audio-to-expression synchronization, SyncAnimation achieves high-precision pose and expression generation, progressively producing upper-body, head, and lip shapes that stay synchronized with the audio. Its High-Synchronization Human Renderer then ensures seamless integration of the head and upper body and delivers audio-synchronized lip motion.
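To make the staged design concrete, here is a minimal Python sketch of the pipeline as the summary describes it: audio features drive pose, pose-conditioned audio features drive expression, and a renderer fuses both into frames. Every class name, signature, and tensor shape below is an illustrative assumption, not the authors' published API.

```python
import numpy as np

# Minimal sketch of the three-stage pipeline described above.
# All classes and shapes are hypothetical placeholders.

class AudioPoseSyncer:
    """Stage 1: map per-frame audio features to head/upper-body pose."""
    def __call__(self, audio_feat: np.ndarray) -> np.ndarray:
        # A real model would regress pose (e.g., rotation + translation)
        # from the audio window; we return a zero placeholder.
        return np.zeros(6)

class AudioEmotionSyncer:
    """Stage 2: map audio features, conditioned on pose, to expression."""
    def __call__(self, audio_feat: np.ndarray, pose: np.ndarray) -> np.ndarray:
        return np.zeros(64)  # e.g., expression/blendshape coefficients

class HighSyncHumanRenderer:
    """Stage 3: NeRF-style renderer fusing head and upper body per frame."""
    def render(self, pose: np.ndarray, expr: np.ndarray) -> np.ndarray:
        return np.zeros((512, 512, 3), dtype=np.uint8)  # placeholder RGB frame

def animate(audio_feats: list) -> list:
    """Run the staged pipeline: audio -> pose -> expression -> rendered frame."""
    pose_net, expr_net = AudioPoseSyncer(), AudioEmotionSyncer()
    renderer = HighSyncHumanRenderer()
    frames = []
    for feat in audio_feats:              # one audio window per video frame
        pose = pose_net(feat)             # audio-to-pose matching
        expr = expr_net(feat, pose)       # audio-to-expression synchronization
        frames.append(renderer.render(pose, expr))
    return frames
```

The point this sketch tries to capture is the progressive ordering the paper emphasizes: pose is predicted first, and expression is conditioned on it, which is what keeps head, upper body, and lips consistent even during silent periods.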

Key Takeaways

  1. SyncAnimation is the first NeRF-based method for audio-driven talking-avatar generation.
  2. It generates audio-driven talking avatars stably and in real time.
  3. Combining audio-to-pose matching with audio-to-expression synchronization improves the precision of pose and expression generation.
  4. SyncAnimation progressively generates audio-synchronized upper-body, head, and lip shapes.
  5. The High-Synchronization Human Renderer ensures seamless integration of the head and upper body.
  6. The method achieves audio-synchronized lip motion, improving visual realism.

Cool Papers

Click here to view the paper's screenshots


Author: Kedreamix
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!