Published: 2024-12-21

Updated: 2024-12-21

Word count: 789

Reading time: 3 min

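⚠️ All summaries below are generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: never use them in serious academic settings. They are intended only for initial screening before reading the papers!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it free on HuggingFace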

2024-12-21 Update

Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters

Authors: Steven Hogue, Chenxu Zhang, Yapeng Tian, Xiaohu Guo

Recent advances in co-speech gesture and talking head generation have been impressive, yet most methods focus on only one of the two tasks. Those that attempt to generate both often rely on separate models or network modules, increasing training complexity and ignoring the inherent relationship between face and body movements. To address the challenges, in this paper, we propose a novel model architecture that jointly generates face and body motions within a single network. This approach leverages shared weights between modalities, facilitated by adapters that enable adaptation to a common latent space. Our experiments demonstrate that the proposed framework not only maintains state-of-the-art co-speech gesture and talking head generation performance but also significantly reduces the number of parameters required.
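The abstract's central idea (one network whose weights are shared by the face and body streams, with adapters mapping each modality into a common latent space) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Adapter` and `JointMotionDenoiser`, the bottleneck-adapter design (down-project, ReLU, up-project, residual add), the single dense shared layer, and all dimensions are hypothetical, and audio conditioning and the diffusion timestep are omitted entirely.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small uniform random weights; a real model would use a proper initializer.
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Dense matrix-vector product (biases omitted to keep the sketch short).
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, residual add."""

    def __init__(self, dim: int, bottleneck: int, rng: random.Random):
        self.down = init_matrix(bottleneck, dim, rng)
        self.up = init_matrix(dim, bottleneck, rng)

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.down, h)]
        return [a + b for a, b in zip(h, matvec(self.up, z))]


class JointMotionDenoiser:
    """Hypothetical sketch: one shared block serves every modality; only the
    per-modality projections and adapters are modality-specific."""

    def __init__(self, motion_dims: Dict[str, int], latent_dim: int,
                 bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        self.shared = init_matrix(latent_dim, latent_dim, rng)  # weights shared by all modalities
        self.inp = {m: init_matrix(latent_dim, d, rng) for m, d in motion_dims.items()}
        self.adapt = {m: Adapter(latent_dim, bottleneck, rng) for m in motion_dims}
        self.out = {m: init_matrix(d, latent_dim, rng) for m, d in motion_dims.items()}

    def __call__(self, modality: str, noisy_motion: Vector) -> Vector:
        h = matvec(self.inp[modality], noisy_motion)         # modality -> common latent space
        h = self.adapt[modality](h)                          # modality-specific adaptation
        h = [math.tanh(v) for v in matvec(self.shared, h)]   # shared computation
        return matvec(self.out[modality], h)                 # back to the modality's motion space


model = JointMotionDenoiser({"face": 6, "body": 9}, latent_dim=8, bottleneck=2)
face_out = model("face", [0.1] * 6)
body_out = model("body", [0.1] * 9)
print(len(face_out), len(body_out))  # 6 9
```

In a design like this, only the input/output projections and the small adapters grow with the number of modalities; the shared block is reused by every call, which is where parameter savings of the kind the abstract describes would come from.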


Paper and project links

PDF

Summary

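This paper proposes a novel model architecture that jointly generates facial and body motions within a single network. The architecture shares weights across modalities and uses adapters to adapt each modality to a common latent space. It maintains state-of-the-art co-speech gesture and talking-head generation performance while significantly reducing the number of required parameters.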

Key Takeaways

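Existing co-speech gesture and talking-head generation methods are limited: most focus on only one of the two tasks.
Methods that attempt both tasks usually rely on separate models or network modules, which increases training complexity and ignores the inherent relationship between face and body movements.
The proposed architecture jointly generates face and body motions within a single network.
It shares weights across modalities, using adapters to map each modality into a common latent space.
It maintains state-of-the-art performance while significantly reducing the number of required parameters.
Experiments show that the architecture performs well on both co-speech gesture and talking-head generation.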
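To see why sharing one backbone reduces the parameter count, the arithmetic below compares two separate backbones (one per modality) with a single shared backbone plus small per-layer adapters. The layer sizes are hypothetical, not figures from the paper, and the per-layer count uses the common rough approximation of about 12 * d^2 weights for a transformer layer (biases and norms ignored); the function names are illustrative only.

```python
def transformer_layer_params(d_model: int) -> int:
    # Rough count ignoring biases and norms: 4*d^2 for the Q, K, V, O projections
    # plus 8*d^2 for a feed-forward block with hidden size 4*d.
    return 12 * d_model * d_model


def adapter_params(d_model: int, bottleneck: int) -> int:
    # Down-projection (d -> r) plus up-projection (r -> d), biases ignored.
    return 2 * d_model * bottleneck


def separate_models(n_modalities: int, n_layers: int, d_model: int) -> int:
    # One full backbone per modality (e.g. one for the face, one for the body).
    return n_modalities * n_layers * transformer_layer_params(d_model)


def shared_with_adapters(n_modalities: int, n_layers: int, d_model: int,
                         bottleneck: int) -> int:
    # One backbone shared by all modalities, plus one adapter per layer per modality.
    shared = n_layers * transformer_layer_params(d_model)
    adapters = n_modalities * n_layers * adapter_params(d_model, bottleneck)
    return shared + adapters


# Hypothetical sizes for illustration only; not taken from the paper.
sep = separate_models(n_modalities=2, n_layers=8, d_model=512)
joint = shared_with_adapters(n_modalities=2, n_layers=8, d_model=512, bottleneck=64)
print(sep, joint, round(joint / sep, 3))  # 50331648 26214400 0.521
```

With these made-up sizes the shared design needs roughly 52% of the parameters of two separate backbones; the paper's actual savings depend on its real architecture and are not reproduced here.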


Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2024-12-21/Talking%20Head%20Generation/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting!

Talking Head Generation
