Published: 2025-10-01

Updated: 2025-11-27

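⚠️ All summaries below were generated by a large language model. They may contain errors; treat them as a reference only and use them with caution.
🔴 Note: do not use these summaries in serious academic work. They are intended only for initial screening before reading a paper.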

Updated 2025-10-01

LaMoGen: Laban Movement-Guided Diffusion for Text-to-Motion Generation

Authors: Heechang Kim, Gwanghyun Kim, Se Young Chun

Diverse human motion generation is an increasingly important task, having various applications in computer vision, human-computer interaction and animation. While text-to-motion synthesis using diffusion models has shown success in generating high-quality motions, achieving fine-grained expressive motion control remains a significant challenge. This is due to the lack of motion style diversity in datasets and the difficulty of expressing quantitative characteristics in natural language. Laban movement analysis has been widely used by dance experts to express the details of motion including motion quality as consistent as possible. Inspired by that, this work aims for interpretable and expressive control of human motion generation by seamlessly integrating the quantification methods of Laban Effort and Shape components into the text-guided motion generation models. Our proposed zero-shot, inference-time optimization method guides the motion generation model to have desired Laban Effort and Shape components without any additional motion data by updating the text embedding of pretrained diffusion models during the sampling step. We demonstrate that our approach yields diverse expressive motion qualities while preserving motion identity by successfully manipulating motion attributes according to target Laban tags.

Paper and project links

PDF

Summary

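This paper presents a new approach to diffusion-based human motion generation. Although text-to-motion synthesis already produces high-quality motions, fine-grained expressive control remains difficult. Drawing on Laban movement analysis, which dance experts widely use, the authors integrate quantification methods for the Laban Effort and Shape components into text-guided motion generation to obtain interpretable and expressive control. The method needs no additional motion data: during the sampling step it updates the text embedding of a pretrained diffusion model, steering the generated motion toward the desired Laban Effort and Shape components. The authors report that the approach yields diverse expressive motion qualities while preserving motion identity, adjusting motion attributes according to target Laban tags.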

Key Takeaways

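Text-to-motion generation has broad applications in computer vision, human-computer interaction, and animation.
The open challenge is fine-grained, expressive control over the generated motion.
Laban movement analysis is used to describe the details of a motion, including its quality.
The paper integrates quantification methods for the Laban Effort and Shape components to provide interpretable and expressive control of motion generation.
The method guides generation by updating the text embedding of a pretrained model and requires no additional motion data.
Experiments show that the method generates diverse expressive motions while preserving motion identity.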
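
The idea of steering a frozen generator by optimizing only its text embedding can be illustrated with a small toy. The sketch below is not the authors' implementation: the linear `toy_generator`, the `effort_proxy` descriptor (mean squared frame-to-frame velocity), the sizes, and the finite-difference gradient are all assumptions chosen to keep the example self-contained, and the optimization runs once on a fixed generator rather than inside a diffusion sampling loop.

```python
import numpy as np

# All sizes, names, and the linear "generator" are hypothetical stand-ins.
T, D, E = 16, 6, 8                           # frames, pose dimensions, embedding size
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(E, T * D))   # frozen toy generator weights


def toy_generator(embedding):
    """Stand-in for a frozen pretrained model: embedding -> motion of shape (T, D)."""
    return (embedding @ W).reshape(T, D)


def effort_proxy(motion):
    """Hypothetical Effort-like descriptor: mean squared frame-to-frame velocity."""
    return float(np.mean(np.diff(motion, axis=0) ** 2))


def guidance_loss(embedding, target):
    """Squared gap between the generated motion's descriptor and the target value."""
    return (effort_proxy(toy_generator(embedding)) - target) ** 2


def finite_difference_grad(fn, x, eps=1e-5):
    """Central-difference gradient; a real system would use autograd instead."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fn(x + step) - fn(x - step)) / (2.0 * eps)
    return grad


def guide_embedding(embedding, target, lr=5.0, steps=200):
    """Gradient descent on the embedding only; the generator weights stay fixed."""
    guided = embedding.copy()
    for _ in range(steps):
        guided -= lr * finite_difference_grad(lambda e: guidance_loss(e, target), guided)
    return guided


text_embedding = np.ones(E)                                  # placeholder prompt embedding
target = 2.0 * effort_proxy(toy_generator(text_embedding))   # ask for a "faster" motion
guided_embedding = guide_embedding(text_embedding, target)
```

Only `guided_embedding` changes; `W` is never updated, which mirrors the "no additional motion data, no retraining" property described in the summary. How the paper keeps the guided motion close to the original (its "motion identity") is not specified here, so the sketch leaves that out.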

Authors: Prerit Gupta, Shourya Verma, Ananth Grama, Aniket Bera

Generating realistic, context-aware two-person motion conditioned on diverse modalities remains a central challenge in computer graphics, animation, and human-computer interaction. We introduce DualFlow, a unified and efficient framework for multi-modal two-person motion generation. DualFlow conditions 3D motion synthesis on diverse inputs, including text, music, and prior motion sequences. Leveraging rectified flow, it achieves deterministic straight-line sampling paths between noise and data, reducing inference time and mitigating error accumulation common in diffusion-based models. To enhance semantic grounding, DualFlow employs a Retrieval-Augmented Generation (RAG) module that retrieves motion exemplars using music features and LLM-based text decompositions of spatial relations, body movements, and rhythmic patterns. We use a contrastive objective that further strengthens alignment with conditioning signals and introduce a synchronization loss that improves inter-person coordination. Extensive evaluations across text-to-motion, music-to-motion, and multi-modal interactive benchmarks show consistent gains in motion quality, responsiveness, and efficiency. DualFlow produces temporally coherent and rhythmically synchronized motions, setting a new state of the art in multi-modal human motion generation.

Paper and project links

PDF Under review at ICLR 2026

Summary

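DualFlow is a framework for multi-modal two-person motion generation that produces realistic, context-aware two-person motion conditioned on diverse inputs such as text and music. It uses rectified flow to obtain deterministic straight-line sampling paths between noise and data, which reduces inference time and mitigates the error accumulation common in diffusion-based models. To strengthen semantic grounding, DualFlow uses a Retrieval-Augmented Generation (RAG) module that retrieves motion exemplars using music features and text-based descriptions of spatial relations, body movements, and rhythmic patterns. A contrastive objective and a synchronization loss further improve alignment with the conditioning signals and coordination between the two people. The evaluations indicate gains in motion quality, responsiveness, and efficiency across modalities, with temporally coherent and rhythmically synchronized motions.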
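
To make the "straight-line sampling path" idea concrete, here is a minimal rectified-flow sketch. It is not DualFlow's code: the affine noise-to-data coupling (`A_SCALE`, `B_SHIFT`) and the closed-form `ideal_velocity` are assumptions that stand in for a trained velocity network. A learned field is only approximately straight, so practical samplers typically use a few steps rather than one.

```python
import numpy as np

# Hypothetical toy coupling: each data point is A_SCALE * noise + B_SHIFT.
A_SCALE, B_SHIFT = 0.5, 2.0


def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def velocity_target(x0, x1):
    """Regression target for a velocity network: constant along the line."""
    return x1 - x0


def ideal_velocity(x_t, t):
    """Closed-form velocity field for the toy coupling (stand-in for a trained network)."""
    x0 = (x_t - t * B_SHIFT) / (1.0 + t * (A_SCALE - 1.0))   # recover the start point
    return (A_SCALE - 1.0) * x0 + B_SHIFT


def euler_sample(x0, velocity_fn, num_steps):
    """Deterministic Euler integration of dx/dt = v(x, t) from t=0 to t=1."""
    x, h = x0.copy(), 1.0 / num_steps
    for k in range(num_steps):
        x = x + h * velocity_fn(x, k * h)
    return x


noise = np.random.default_rng(0).normal(size=(4, 3))
one_step = euler_sample(noise, ideal_velocity, num_steps=1)
many_steps = euler_sample(noise, ideal_velocity, num_steps=8)
```

Because the velocity is constant along each straight path, one Euler step and eight Euler steps land on the same sample in this toy, which is the property behind the reduced inference time mentioned above.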

Key Takeaways