Published: 2025-08-05

Updated: 2025-08-20

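⚠️ All summaries below were generated by a large language model. They may contain errors, are provided for reference only, and should be used with caution.
🔴 Note: do not use these summaries in serious academic settings; they are intended only for preliminary screening before reading the papers.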

Updated 2025-08-05

ReAlign: Bilingual Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

Authors: Wanjiang Weng, Xiaofeng Tan, Hongsong Wang, Pan Zhou

Bilingual text-to-motion generation, which synthesizes 3D human motions from bilingual text inputs, holds immense potential for cross-linguistic applications in gaming, film, and robotics. However, this task faces critical challenges: the absence of bilingual motion-language datasets and the misalignment between text and motion distributions in diffusion models, leading to semantically inconsistent or low-quality motions. To address these challenges, we propose BiHumanML3D, a novel bilingual human motion dataset, which establishes a crucial benchmark for bilingual text-to-motion generation models. Furthermore, we propose a Bilingual Motion Diffusion model (BiMD), which leverages cross-lingual aligned representations to capture semantics, thereby achieving a unified bilingual model. Building upon this, we propose Reward-guided sampling Alignment (ReAlign) method, comprising a step-aware reward model to assess alignment quality during sampling and a reward-guided strategy that directs the diffusion process toward an optimally aligned distribution. This reward model integrates step-aware tokens and combines a text-aligned module for semantic consistency and a motion-aligned module for realism, refining noisy motions at each timestep to balance probability density and alignment. Experiments demonstrate that our approach significantly improves text-motion alignment and motion quality compared to existing state-of-the-art methods. Project page: https://wengwanjiang.github.io/ReAlign-page/.

Paper and project links

PDF. Authors' note: We believe that there are some areas in the manuscript that require further improvement, and out of our commitment to refining this work, we have decided to withdraw our manuscript after careful deliberation and discussion.

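Summary
Bilingual text-to-motion generation is a promising but challenging research direction. The paper introduces BiHumanML3D, a bilingual human motion dataset that serves as a benchmark for bilingual text-to-motion models. It also proposes a Bilingual Motion Diffusion model (BiMD), which uses cross-lingual aligned representations to capture semantics and so handles both languages in a single model. Building on BiMD, the authors propose Reward-guided sampling Alignment (ReAlign), which consists of a step-aware reward model that assesses alignment quality during sampling and a reward-guided strategy that steers the diffusion process toward an optimally aligned distribution. The reward model integrates step-aware tokens with a text-aligned module for semantic consistency and a motion-aligned module for realism, and it refines the noisy motion at each timestep to balance probability density and alignment. Experiments reported in the paper show that the approach significantly improves text-motion alignment and motion quality over existing state-of-the-art methods.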
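
The reward-guided sampling idea in the summary can be illustrated with a toy sketch. This is not the authors' implementation: `denoise_mean`, `reward_grad`, the weighting `w(t)`, the noise schedule, and `guidance_scale` are all hypothetical stand-ins chosen for illustration, and the "motion" is just a small vector. The sketch assumes a denoiser that pulls samples toward a high-density mode at the origin and a step-aware reward whose gradient pulls them toward a target, so each step trades off density against alignment.

```python
import numpy as np

def denoise_mean(x, t, num_steps):
    # Hypothetical stand-in for a trained motion diffusion model:
    # shrinks the sample toward the origin (the toy high-density mode).
    return 0.9 * x

def reward_grad(x, t, num_steps, target):
    # Gradient of a toy step-aware reward
    #   r(x, t) = -0.5 * w(t) * ||x - target||^2,
    # where w(t) grows as t approaches 0, so the reward is trusted
    # more once the sample is less noisy. (Assumed form, not from the paper.)
    w = 1.0 - t / num_steps
    return -w * (x - target)

def sample(target, num_steps=50, guidance_scale=0.0, n=256, dim=4, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim))  # start from pure noise
    for t in range(num_steps, 0, -1):
        sigma = 0.1 * t / num_steps  # toy noise schedule
        mean = denoise_mean(x, t, num_steps)
        # Reward guidance: shift the denoised mean along the reward gradient.
        mean = mean + guidance_scale * reward_grad(x, t, num_steps, target)
        noise = rng.standard_normal(x.shape) if t > 1 else 0.0
        x = mean + sigma * noise
    return x

target = np.ones(4)
plain = sample(target, guidance_scale=0.0)
guided = sample(target, guidance_scale=0.5)
print("mean distance to target, unguided:",
      np.linalg.norm(plain - target, axis=1).mean())
print("mean distance to target, guided:  ",
      np.linalg.norm(guided - target, axis=1).mean())
```

In this toy setting the guided samples settle between the origin and the target rather than exactly on the target, which is one simple way to picture the balance between probability density and alignment that the summary describes.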

Key points

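Bilingual text-to-motion generation has strong potential for cross-lingual applications in gaming, film, and robotics.
The key challenges are the absence of bilingual motion-language datasets and the misalignment between text and motion distributions in diffusion models, which lead to semantically inconsistent or low-quality motions.
BiHumanML3D, a new bilingual human motion dataset, provides a benchmark for comparing bilingual text-to-motion generation models and addresses the data gap.
The Bilingual Motion Diffusion model (BiMD) uses cross-lingual aligned representations to capture semantics in a single unified bilingual model. ReAlign then adds reward-guided sampling on top of it to improve text-motion alignment and motion quality.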

Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-08-05/Text-to-Motion/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting.

Text-to-Motion
