Modeling the dynamics of deformable objects is challenging due to their diverse physical properties and the difficulty of estimating states from limited visual information. We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. Our particle-grid model captures global shape and motion information while predicting dense particle movements, enabling the modeling of objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes the 3D space to ensure spatial continuity and enhance learning efficiency. Coupled with Gaussian Splatting for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we demonstrate that our model learns the dynamics of diverse objects—such as ropes, cloths, stuffed animals, and paper bags—from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views. Furthermore, we showcase the utility of our learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks.
Simulating deformable objects like cloths and ropes is hard because of their complex physics and partial observability. In this work, we overcome these challenges by learning a neural model for object dynamics directly from real-world videos.
Our particle-grid neural dynamics model represents objects as dense 3D particles and predicts their next-step velocities to simulate object dynamics. It consists of three stages: particle encoding, grid velocity prediction, and grid-to-particle velocity transfer.
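For concreteness, below is a minimal PyTorch sketch of one dynamics step following these three stages. The scatter-average particle-to-grid encoder, the small 3D convolutional grid network, the per-particle action features, and names such as ParticleGridStep, grid_size, and feat_dim are simplifying assumptions for illustration, not the exact PGND architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ParticleGridStep(nn.Module):
    def __init__(self, grid_size=32, feat_dim=16):
        super().__init__()
        self.grid_size, self.feat_dim = grid_size, feat_dim
        # Per-particle encoder: position, velocity, and action features -> latent feature.
        self.particle_enc = nn.Sequential(
            nn.Linear(9, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim))
        # Grid network: maps scattered particle features to a dense grid velocity field.
        self.grid_net = nn.Sequential(
            nn.Conv3d(feat_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 3, 3, padding=1))

    def forward(self, pos, vel, act):
        # pos, vel, act: (N, 3) particle positions in [0, 1]^3, current velocities,
        # and per-particle action features (e.g., broadcast end-effector motion).
        N, G, Fd = pos.shape[0], self.grid_size, self.feat_dim
        feat = self.particle_enc(torch.cat([pos, vel, act], dim=-1))      # (N, Fd)

        # 1) Particle encoding: scatter-average particle features into grid cells.
        idx = (pos * (G - 1)).round().long().clamp(0, G - 1)              # (N, 3) cell indices
        flat = idx[:, 0] * G * G + idx[:, 1] * G + idx[:, 2]
        grid = torch.zeros(G * G * G, Fd).index_add_(0, flat, feat)
        count = torch.zeros(G * G * G, 1).index_add_(0, flat, torch.ones(N, 1))
        grid = (grid / count.clamp(min=1)).T.reshape(1, Fd, G, G, G)

        # 2) Grid velocity prediction with a 3D convolutional network.
        grid_vel = self.grid_net(grid)                                    # (1, 3, G, G, G)

        # 3) Grid-to-particle transfer: trilinearly sample velocities at particle positions.
        coords = (pos.view(1, 1, 1, N, 3) * 2 - 1)[..., [2, 1, 0]]        # grid_sample expects (x, y, z) = (W, H, D)
        new_vel = F.grid_sample(grid_vel, coords, align_corners=True).reshape(3, N).T
        return pos + new_vel, new_vel                                     # advected particles and new velocities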
We evaluate our method on 6 diverse deformable object categories, including ropes, cloths, stuffed animals, and paper bags. Our model is trained separately on each category using less than 20 minutes of RGB-D videos of robot-object interactions.
Given initial states and actions, we show the prediction results of the GBND baseline compared to our particle-grid neural dynamics model. We overlay the predictions on the ground-truth final-state images to highlight prediction errors. PGND's predictions align more closely with the ground truth, producing denser particle predictions with fewer artifacts than the baseline.
When plugged into a Gaussian Splatting renderer, PGND can generate high-quality 3D action-conditioned videos. PGND's results align better with the ground truth, while the SOTA baseline method predicts visually unrealistic deformations.
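As a sketch of how predicted particle motion can drive the Gaussian Splatting renderer, the snippet below advects each Gaussian center by an inverse-distance-weighted average of the displacements of its k nearest particles. This binding scheme, and the function advect_gaussians itself, are illustrative assumptions rather than the paper's exact formulation.

import torch


def advect_gaussians(gauss_means, particles_prev, particles_next, k=8, eps=1e-6):
    # gauss_means:    (M, 3) Gaussian centers from the static reconstruction.
    # particles_prev: (N, 3) particle positions before the dynamics step.
    # particles_next: (N, 3) particle positions after the dynamics step.
    disp = particles_next - particles_prev                     # (N, 3) per-particle motion
    d2 = torch.cdist(gauss_means, particles_prev) ** 2         # (M, N) squared distances
    knn_d2, knn_idx = d2.topk(k, dim=1, largest=False)         # k nearest particles per Gaussian
    w = 1.0 / (knn_d2 + eps)                                   # inverse-distance weights
    w = w / w.sum(dim=1, keepdim=True)
    gauss_disp = (w.unsqueeze(-1) * disp[knn_idx]).sum(dim=1)  # (M, 3) blended displacement
    return gauss_means + gauss_disp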
PGND can serve as a deformable-object simulator on top of Gaussian Splatting reconstructions of the scene. Starting from only the initial static reconstruction, we apply PGND to simulate the segmented object under a sequence of actions (red arrows).
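A rollout under a sequence of actions then reduces to a loop over dynamics steps. The model interface below (positions, velocities, and per-particle action features in; next positions and velocities out) matches the step sketch above and is an assumed interface, not PGND's exact API.

import torch


def rollout(model, init_pos, actions):
    # init_pos: (N, 3) particles sampled from the segmented reconstruction.
    # actions:  (T, 3) end-effector motions, one per step (assumed parameterization).
    pos = init_pos.clone()
    vel = torch.zeros_like(pos)                  # the object starts at rest
    trajectory = [pos]
    for a in actions:
        act = a.expand_as(pos)                   # broadcast the action to per-particle features
        pos, vel = model(pos, vel, act)          # one learned dynamics step
        trajectory.append(pos)
    return torch.stack(trajectory)               # (T + 1, N, 3) predicted particle trajectory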
PGND can be integrated with model-predictive control (MPC) to generate actions for manipulating objects. We test on 4 tasks with distinct object types: cloth lifting, box closing, rope manipulation, and plush toy relocation. In all tasks, our method produces results that are closer to the target.
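To illustrate the planning loop, the sketch below uses random-shooting MPC with a Chamfer-style cost between predicted particles and a goal point cloud. The sampling strategy, cost, and helper names (chamfer_cost, plan_action) are illustrative choices and may differ from the planner used in the paper.

import torch


def chamfer_cost(pred_pts, goal_pts):
    # Symmetric nearest-neighbor distance between predicted and goal point clouds.
    d = torch.cdist(pred_pts, goal_pts)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()


def plan_action(model, pos, vel, goal_pts, horizon=5, num_samples=64, act_scale=0.05):
    # Random-shooting MPC: sample action sequences, roll out the learned dynamics,
    # and return the first action of the lowest-cost sequence (then replan).
    best_cost, best_action = float("inf"), None
    for _ in range(num_samples):
        seq = act_scale * torch.randn(horizon, 3)    # candidate end-effector motions
        p, v = pos.clone(), vel.clone()
        for a in seq:
            p, v = model(p, v, a.expand_as(p))       # predicted next state under this action
        cost = chamfer_cost(p, goal_pts).item()
        if cost < best_cost:
            best_cost, best_action = cost, seq[0]
    return best_action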
@inproceedings{zhang2024particle,
  title     = {Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos},
  author    = {Zhang, Kaifeng and Li, Baoyu and Hauser, Kris and Li, Yunzhu},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
  year      = {2025}
}