Published: 2025-06-06

Updated: 2025-07-06


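⚠️ All summaries below are generated by a large language model. They may contain errors; treat them as reference only and use them with caution.
🔴 Note: do not use these summaries in serious academic settings. They are intended only for initial screening before reading a paper.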

2025-06-06 Update

ATI: Any Trajectory Instruction for Controllable Video Generation

Authors: Angtian Wang, Haibin Huang, Jacob Zhiyuan Fang, Yiding Yang, Chongyang Ma

We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion using trajectory-based inputs. In contrast to prior methods that address these motion types through separate modules or task-specific designs, our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models via a lightweight motion injector. Users can specify keypoints and their motion paths to control localized deformations, entire object motion, virtual camera dynamics, or combinations of these. The injected trajectory signals guide the generative process to produce temporally consistent and semantically aligned motion sequences. Our framework demonstrates superior performance across multiple video motion control tasks, including stylized motion effects (e.g., motion brushes), dynamic viewpoint changes, and precise local motion manipulation. Experiments show that our method provides significantly better controllability and visual quality compared to prior approaches and commercial solutions, while remaining broadly compatible with various state-of-the-art video generation backbones. Project page: https://anytraj.github.io/.



Summary
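The paper proposes a unified framework for motion control in video generation that integrates camera movement, object-level translation, and fine-grained local motion through trajectory-based inputs. A lightweight motion injector projects user-defined trajectories into the latent space of a pre-trained image-to-video generation model, so users can control local deformations, whole-object motion, virtual camera dynamics, or combinations of these. The injected trajectory signals guide generation toward temporally consistent, semantically aligned motion sequences. The authors report superior performance across several video motion control tasks, including stylized motion effects, dynamic viewpoint changes, and precise local motion manipulation.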
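
To make the idea of a single trajectory-based input more concrete, here is a minimal sketch of how object motion and a camera zoom could both be written as per-frame keypoint paths. The function names, the constant-velocity model, and the 2D radial approximation of a zoom are illustrative assumptions; the abstract does not specify the paper's actual input format.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # one (x, y) pixel position per frame


def translate_track(start: Point, velocity: Point, num_frames: int) -> Trajectory:
    """Object-level or local motion: one keypoint moving at constant velocity."""
    return [(start[0] + velocity[0] * t, start[1] + velocity[1] * t)
            for t in range(num_frames)]


def zoom_tracks(points: List[Point], center: Point, final_scale: float,
                num_frames: int) -> List[Trajectory]:
    """Camera zoom approximated in 2D: each point moves radially from `center`."""
    tracks = []
    for x, y in points:
        track = []
        for t in range(num_frames):
            # Linearly interpolate the scale factor from 1.0 to final_scale.
            s = 1.0 + (final_scale - 1.0) * t / max(num_frames - 1, 1)
            track.append((center[0] + (x - center[0]) * s,
                          center[1] + (y - center[1]) * s))
        tracks.append(track)
    return tracks


# Hypothetical example: one moving object plus a 2x zoom on two background points.
object_motion = [translate_track((100.0, 200.0), (5.0, 0.0), num_frames=5)]
camera_motion = zoom_tracks([(0.0, 0.0), (200.0, 0.0)], center=(100.0, 100.0),
                            final_scale=2.0, num_frames=5)
combined = object_motion + camera_motion  # one list of trajectories for all motion types
```

Under these assumptions, combining motion types is just concatenating lists of trajectories, which is one way to read the "unified" claim in the summary.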

Key Takeaways

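Proposes a unified framework for motion control in video generation that integrates multiple motion types.
Users define keypoints and their motion paths, and a lightweight motion injector uses them to control local deformations, whole-object motion, and virtual camera dynamics.
The injected trajectory signals guide the generation process toward temporally consistent, semantically aligned motion sequences.
The framework performs well across several video motion control tasks, including stylized motion effects, dynamic viewpoint changes, and precise local motion manipulation.
Compared with prior methods and commercial solutions, the method shows better controllability and visual quality.
The framework is broadly compatible with a range of state-of-the-art video generation backbones.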
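
As a rough illustration of what "projecting a trajectory into latent space" can mean, the sketch below rasterizes one pixel-space trajectory into per-frame Gaussian weight maps on a downsampled grid. This is not the paper's motion injector: the stride of 8, the Gaussian weighting, and the function name `trajectory_to_latent_maps` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def trajectory_to_latent_maps(track: List[Tuple[float, float]],
                              image_size: Tuple[int, int],
                              stride: int = 8,
                              sigma: float = 1.0) -> List[List[List[float]]]:
    """Return one [grid_h][grid_w] Gaussian weight map per frame of the trajectory.

    Assumes the latent grid is the image downsampled by `stride` (hypothetical).
    """
    width, height = image_size
    grid_w, grid_h = width // stride, height // stride
    maps = []
    for x, y in track:
        # Convert the pixel position to (fractional) latent-grid coordinates.
        cx, cy = x / stride, y / stride
        frame = [
            [math.exp(-((gx + 0.5 - cx) ** 2 + (gy + 0.5 - cy) ** 2)
                      / (2.0 * sigma ** 2))
             for gx in range(grid_w)]
            for gy in range(grid_h)
        ]
        maps.append(frame)
    return maps


# Hypothetical example: a keypoint moving right across a 64x32 image.
maps = trajectory_to_latent_maps([(12.0, 12.0), (44.0, 12.0)], image_size=(64, 32))
```

Each map peaks at the latent cell containing the keypoint in that frame, so a downstream module could use the maps to decide where along the path a conditioning signal should be applied.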


Kedreamix

https://kedreamix.github.io/Talk2Paper/Paper/2025-06-06/I2I%20Translation/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Kedreamix when reposting.

I2I Translation
