Published: 2025-05-28

Updated: 2025-06-24

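⚠️ All summaries below were generated by a large language model and may contain errors. They are for reference only; use them with caution.
🔴 Note: do not use this content in serious academic settings. It is only suitable for screening papers before reading them.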

2025-05-28 Update

Absolute Coordinates Make Motion Generation Easy

Authors: Zichong Meng, Zeyu Han, Xiaogang Peng, Yiming Xie, Huaizu Jiang

State-of-the-art text-to-motion generation models rely on the kinematic-aware, local-relative motion representation popularized by HumanML3D, which encodes motion relative to the pelvis and to the previous frame with built-in redundancy. While this design simplifies training for earlier generation models, it introduces critical limitations for diffusion models and hinders applicability to downstream tasks. In this work, we revisit the motion representation and propose a radically simplified and long-abandoned alternative for text-to-motion generation: absolute joint coordinates in global space. Through systematic analysis of design choices, we show that this formulation achieves significantly higher motion fidelity, improved text alignment, and strong scalability, even with a simple Transformer backbone and no auxiliary kinematic-aware losses. Moreover, our formulation naturally supports downstream tasks such as text-driven motion control and temporal/spatial editing without additional task-specific reengineering and costly classifier guidance generation from control signals. Finally, we demonstrate promising generalization to directly generate SMPL-H mesh vertices in motion from text, laying a strong foundation for future research and motion-related applications.

Paper and project links

PDF Preprint

Summary

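This paper revisits the motion representation used for text-to-motion generation and proposes a simplified, long-neglected alternative: absolute joint coordinates in global space. A systematic analysis of design choices shows that this formulation achieves significantly higher motion fidelity, improved text alignment, and strong scalability, even with a simple Transformer backbone and no auxiliary kinematic-aware losses. The formulation also naturally supports downstream tasks such as text-driven motion control and temporal/spatial editing, without additional task-specific reengineering or costly classifier guidance from control signals. Finally, the work shows promising generalization to generating SMPL-H mesh vertices in motion directly from text, laying a foundation for future research and motion-related applications.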
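
To make the contrast between the two representations concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' code: the function names `to_local_relative` and `to_absolute` are hypothetical, the Y-up axis convention is assumed, and the "local-relative" encoding shown here (per-frame root displacement plus root-centered joints) is far simpler than the actual HumanML3D feature layout. The abstract does not say why the relative encoding hinders downstream tasks; the sketch only shows one property of any encoding that stores motion relative to the previous frame.

```python
import numpy as np


def to_local_relative(abs_joints: np.ndarray, root: int = 0):
    """Encode absolute joints (T, J, 3) in a simplified root-relative form.

    Assumes Y is up, so the ground plane is spanned by X and Z.
    Returns the per-frame root displacement on the ground plane,
    the joints expressed relative to the root's ground position,
    and the root's starting ground position.
    """
    root_xz = abs_joints[:, root, [0, 2]]           # (T, 2)
    root_vel = np.zeros_like(root_xz)
    root_vel[1:] = root_xz[1:] - root_xz[:-1]       # relative to previous frame
    rel = abs_joints.copy()
    rel[:, :, [0, 2]] -= root_xz[:, None, :]        # relative to the root
    return root_vel, rel, root_xz[0].copy()


def to_absolute(root_vel: np.ndarray, rel: np.ndarray, start_xz: np.ndarray):
    """Invert to_local_relative by integrating the root displacement."""
    root_xz = start_xz + np.cumsum(root_vel, axis=0)
    out = rel.copy()
    out[:, :, [0, 2]] += root_xz[:, None, :]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motion = rng.normal(size=(5, 4, 3))             # 5 frames, 4 joints (toy data)

    vel, rel, start = to_local_relative(motion)
    print("round trip ok:", np.allclose(to_absolute(vel, rel, start), motion))

    # Changing one frame's root displacement moves every later frame,
    # because global positions are recovered by a cumulative sum.
    vel_edit = vel.copy()
    vel_edit[2] += 1.0
    drift = to_absolute(vel_edit, rel, start) - motion
    print("frames changed by a single relative edit:",
          int(np.any(drift != 0, axis=(1, 2)).sum()))

    # In absolute coordinates, pinning one joint in one frame is a
    # direct assignment that leaves all other frames untouched.
    edited = motion.copy()
    edited[2, 3] = [0.0, 1.0, 0.0]
```

In this toy setting, a global-space constraint such as "joint 3 is at this position in frame 2" is a single array assignment in absolute coordinates, whereas in the relative encoding it has to be expressed through the integrated root trajectory.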

Key Takeaways