Published: 2025-02-26

Updated: 2025-05-14

Word count: 1.2k

Reading time: 4 min

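⚠️ All summaries below were generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: never use these summaries in serious academic contexts. They are only meant for initial screening before you read the paper!
💗 If you find our project ChatPaperFree helpful, please give us some encouragement! ⭐️ Try it for free on HuggingFace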

2025-02-26 Update

BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis

Authors: Seong-Eun Hong, Soobin Lim, Juyeong Hwang, Minwook Chang, Hyeongyeop Kang

Generating natural and expressive human motions from textual descriptions is challenging due to the complexity of coordinating full-body dynamics and capturing nuanced motion patterns over extended sequences that accurately reflect the given text. To address this, we introduce BiPO, Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis, a novel model that enhances text-to-motion synthesis by integrating part-based generation with a bidirectional autoregressive architecture. This integration allows BiPO to consider both past and future contexts during generation while enhancing detailed control over individual body parts without requiring ground-truth motion length. To relax the interdependency among body parts caused by the integration, we devise the Partial Occlusion technique, which probabilistically occludes certain motion-part information during training. In our comprehensive experiments, BiPO achieves state-of-the-art performance on the HumanML3D dataset, outperforming recent methods such as ParCo, MoMask, and BAMM in terms of FID scores and overall motion quality. Notably, BiPO excels not only in the text-to-motion generation task but also in motion editing tasks that synthesize motion based on partially generated motion sequences and textual descriptions. These results reveal BiPO's effectiveness in advancing text-to-motion synthesis and its potential for practical applications.
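The Partial Occlusion idea can be illustrated with a minimal sketch. The abstract does not spell out the exact scheme, so everything here is an assumption for illustration only: each body part is a hypothetical stream of discrete motion-token ids (e.g. `upper_body`, `lower_body`), each part is hidden as a whole, independently, with probability `p_occlude`, and hidden tokens are replaced by a hypothetical `MASK_TOKEN` id. This is not the authors' implementation.

```python
import random
from typing import Dict, List

# Hypothetical placeholder id for an occluded token (assumption, not from the paper).
MASK_TOKEN = -1


def partial_occlusion(
    part_tokens: Dict[str, List[int]],
    p_occlude: float,
    rng: random.Random,
) -> Dict[str, List[int]]:
    """Return a copy of the per-part token streams where each part is
    independently replaced by MASK_TOKENs with probability p_occlude.

    Assumed setup: during training, the model predicting one part would see
    this (possibly occluded) information about the other parts, so it cannot
    rely on every other part always being available.
    """
    occluded: Dict[str, List[int]] = {}
    for name, tokens in part_tokens.items():
        if rng.random() < p_occlude:
            # Hide the whole part: same length, every token masked.
            occluded[name] = [MASK_TOKEN] * len(tokens)
        else:
            # Keep the part visible (copied so the input is never mutated).
            occluded[name] = list(tokens)
    return occluded


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = {"upper_body": [12, 7, 33], "lower_body": [5, 5, 18]}
    print(partial_occlusion(tokens, p_occlude=0.5, rng=rng))
```

With `p_occlude=0.0` every part stays visible and with `p_occlude=1.0` every part is masked; intermediate values make the model (in this assumed setup) learn to generate one part without always seeing the others, which is the interdependency-relaxing effect the abstract describes.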


Summary
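Generating natural and expressive motion from text descriptions is challenging because it requires coordinating full-body motion and capturing subtle motion patterns over extended sequences. To address this, the authors propose BiPO (Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis), which combines part-based generation with a bidirectional autoregressive architecture to improve text-to-motion synthesis. This combination takes both past and future context into account and allows fine-grained control over individual body parts without needing the ground-truth motion length. A Partial Occlusion technique probabilistically occludes information about certain motion parts during training, relaxing the interdependency among body parts that the combination introduces. Experiments on the HumanML3D dataset show that BiPO outperforms methods such as ParCo, MoMask, and BAMM in FID score and overall motion quality. BiPO excels not only at text-to-motion generation but also at motion editing, i.e. synthesizing motion from partially generated sequences and text descriptions. These results show BiPO's effectiveness in advancing text-to-motion synthesis and its potential for practical applications.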

Key Takeaways

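BiPO improves text-to-motion synthesis by combining part-based generation with a bidirectional autoregressive architecture.
It considers both past and future context while giving fine-grained control over individual body parts.
The Partial Occlusion technique relaxes the interdependency among body parts, improving model performance.
BiPO outperforms other methods on the HumanML3D dataset in FID score and overall motion quality.
BiPO excels not only at text-to-motion generation but also at motion editing.
BiPO has potential value for practical applications.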
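FID, the main metric cited above, is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated motions: ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^{1/2}). A minimal sketch of that distance, assuming features have already been extracted into NumPy arrays (rows are samples); the pretrained motion feature extractor used in HumanML3D evaluation is not reproduced here, and the function name `frechet_distance` is illustrative only.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the 1-D feature case as a 1x1 matrix (np.cov squeezes it).
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^{1/2}) is the sum of square roots of the eigenvalues of
    # cov_a @ cov_b; these are real and non-negative in exact arithmetic, so
    # tiny negative or imaginary parts from rounding are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))  # stand-in for real-motion features
    fake = real + 3.0                 # same covariance, shifted mean
    print(frechet_distance(real, fake))
```

Identical feature sets give a distance of (numerically) zero, and shifting every feature of one set by a constant c in d dimensions adds d·c² while leaving the covariance term unchanged, so lower FID means the generated feature distribution is closer to the real one.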

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-02-26/Text-to-Motion/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix as the source when reposting!