Generative Movement of Object Trajectories in Videos

Kiran Chhatre, Hyeonho Jeong, Yulia Gryaditskaya, Christopher E. Peters, Chun-Hao Paul Huang, Paul Guerrero

Abstract

Generative video editing has enabled several intuitive editing operations for short video clips that would previously have been difficult to achieve, especially for non-expert editors. Existing methods focus on prescribing an object's 3D or 2D motion trajectory in a video, or on altering the appearance of an object or a scene, while preserving both the video's plausibility and identity. Yet a method to move an object's 3D motion trajectory in a video, i.e., relocating an object while preserving its relative 3D motion, is still missing. The main challenge lies in obtaining paired video data for this scenario. Previous methods typically rely on clever data generation approaches to construct plausible paired data from unpaired videos, but such approaches fail if one video in a pair cannot easily be constructed from the other. Instead, we introduce TrajectoryAtlas, a new data generation pipeline for large-scale synthetic paired video data, and TrajectoryMover, a video generator fine-tuned with this data. We show that this successfully enables generative movement of object trajectories.

TrajectoryAtlas Data Generation Pipeline


TrajectoryAtlas data generation pipeline. The pipeline has five stages: Asset Cache Preparation, Preflight Validation, Collision-Aware Sampling and Scaling, Task Simulation, and Canonical Rendering with Runtime Metadata. Inputs, including the camera, 3D scene, lights and materials, and Objaverse or primitive assets, are converted to reusable collision caches; a skip-render preflight then selects valid frames. Paired A/B placements with a shared scale are filtered by visibility, support normal, and penetration clearance, and optional no-hit processing removes only non-structural obstacles in the trajectory corridor. Throw, drop, roll, and drag trajectories are simulated with Bullet and rendered with Blender into canonical RGB and binary-segmentation videos.
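The collision-aware sampling stage can be illustrated with a minimal standalone sketch. This is not the paper's implementation: it uses simple axis-aligned bounding boxes in place of Bullet collision caches, and the function names, the `clearance` parameter, and the rejection-sampling loop are all illustrative assumptions. It shows the core idea of drawing paired A/B placements with a shared object scale while rejecting positions that penetrate scene obstacles.

```python
import random

def aabb_overlap(a, b):
    """Axis-aligned bounding-box overlap test; boxes are (min_xyz, max_xyz)."""
    return all(a[0][i] < b[1][i] and b[0][i] < a[1][i] for i in range(3))

def sample_paired_placements(obstacles, half_extent, bounds, clearance=0.05,
                             max_tries=1000, rng=None):
    """Sample a source (A) and target (B) placement for an object with a shared
    scale (half_extent), rejecting positions whose clearance-inflated bounding
    box penetrates any obstacle. Returns (A, B) or None on failure."""
    rng = rng or random.Random(0)

    def sample_one():
        for _ in range(max_tries):
            # Draw a center inside the scene bounds, keeping the object inside.
            c = [rng.uniform(lo + half_extent, hi - half_extent) for lo, hi in bounds]
            box = ([x - half_extent - clearance for x in c],
                   [x + half_extent + clearance for x in c])
            if not any(aabb_overlap(box, obs) for obs in obstacles):
                return c
        return None

    a, b = sample_one(), sample_one()
    return (a, b) if a and b else None
```

In the actual pipeline, the equivalent checks run against the reusable collision caches and also enforce visibility and support-normal constraints before a pair is accepted.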

TrajectoryMover Architecture


TrajectoryMover architecture. We concatenate three latent streams, z_trj, z_src, and z_bb, before denoising. In the control image, red marks the source box and green marks the target box.
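The latent concatenation can be sketched with toy tensors. The shapes below and the channel-wise concatenation axis are assumptions for illustration only; the actual latent sizes and layout depend on the underlying video diffusion model.

```python
import numpy as np

# Toy latent shapes: (channels, frames, height, width). Real sizes are assumptions.
C, T, H, W = 4, 8, 16, 16
z_trj = np.random.randn(C, T, H, W)  # trajectory-control latent
z_src = np.random.randn(C, T, H, W)  # source-video latent
z_bb  = np.random.randn(C, T, H, W)  # bounding-box (source/target control) latent

# Channel-wise concatenation yields the combined input passed to the denoiser.
z_in = np.concatenate([z_trj, z_src, z_bb], axis=0)
```

The denoiser then sees 3C input channels, so only its input projection needs widening when fine-tuning from a pretrained video generator.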

Common Baseline Repurposing Pipeline


Common baseline repurposing pipeline. We convert each source-target case into method-specific controls. We estimate source depth, extract source and target frame-0 masks, lift the source object motion to a 3D trajectory proxy, compute the frame-0 displacement E, and re-anchor the trajectory to the target start. Red indicates source localization, green indicates target localization, and trajectory overlays visualize the source and re-anchored motion elements used for downstream baseline control conversion.
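The re-anchoring step admits a short sketch: the frame-0 displacement E is the offset between the target start and the trajectory's first point, and adding E to every frame preserves the relative motion. The function name and array layout are illustrative assumptions.

```python
import numpy as np

def reanchor_trajectory(src_traj, target_start):
    """Shift a 3D trajectory so it starts at target_start while preserving
    relative motion. E is the frame-0 displacement, applied to all frames."""
    src_traj = np.asarray(src_traj, dtype=float)          # (num_frames, 3)
    E = np.asarray(target_start, dtype=float) - src_traj[0]
    return src_traj + E                                   # broadcast over frames

# Example: a trajectory starting at the origin, re-anchored to (5, 5, 0).
moved = reanchor_trajectory([[0, 0, 0], [1, 0, 0], [1, 1, 0]], [5, 5, 0])
```

Frame-to-frame differences are unchanged under this shift, which is exactly the "moving an object while preserving its relative 3D motion" property the task requires.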

Qualitative Results

BibTeX

@article{trajectorymover_2026,
  title   = {TrajectoryMover: Generative Movement of Object Trajectories in Videos},
  author  = {To be added},
  journal = {To be added},
  year    = {2026}
}

Acknowledgements

We thank Valentin Deschaintre and Iliyan Georgiev for insightful discussions and valuable feedback. We are also grateful to Yannick Hold-Geoffroy and Vladimir Kim for their help with the assets used in dataset generation, and to Zhening Huang for support with the model pipeline. The core ideas for this project were developed while Kiran Chhatre was an intern at Adobe Research.