GOMP: Grasped Object Manifold Projection for Multimodal Imitation Learning of Manipulation

Abstract

Imitation Learning (IL) holds great potential for learning repetitive manipulation tasks, such as those in industrial assembly. However, its effectiveness is often limited by insufficient trajectory precision due to compounding errors. In this paper, we introduce Grasped Object Manifold Projection (GOMP), an interactive method that mitigates these errors by constraining a non-rigidly grasped object to a lower-dimensional manifold. GOMP assumes a precise task in which a manipulator holds an object that may shift within the grasp in an observable manner and must be mated with a grounded part. Crucially, all GOMP enhancements are learned from the same expert dataset used to train the base IL policy, and are adjusted with an n-armed bandit-based interactive component. We show that this manifold constraint theoretically improves upon the well-known compounding error bound in the IL literature. We demonstrate the framework on four precise assembly tasks using tactile feedback, and note that the approach remains modality-agnostic.
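To make the bandit-based interactive component more concrete, the following is a minimal sketch of how a multi-armed bandit could pick among candidate manifolds from rollout outcomes. It uses the standard UCB1 rule as a stand-in; the abstract does not state which bandit algorithm or reward signal GOMP uses, so `UCB1Selector`, `run_demo`, and the binary success reward are all illustrative assumptions.

```python
import math
import random


class UCB1Selector:
    """UCB1 bandit over a fixed set of arms (here: candidate manifolds).

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # rollouts executed per arm
        self.means = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return scores.index(max(scores))

    def update(self, arm, reward):
        # Incremental update of the running mean reward.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_demo(success_probs, n_rollouts=2000, seed=0):
    """Simulate rollouts where arm i succeeds with probability success_probs[i]."""
    rng = random.Random(seed)
    selector = UCB1Selector(len(success_probs))
    for _ in range(n_rollouts):
        arm = selector.select()
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        selector.update(arm, reward)
    return selector
```

In a real system, each arm would correspond to one candidate manifold, and the simulated success draw in `run_demo` would be replaced by the observed outcome of a physical rollout.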

Visualization of Grasped Object Manifold Projections

Six possible projections of object pose trajectories from expert demonstrations of nut threading. Each projection has a different number of degrees of freedom. GOMP defines a set of manifolds on which to project object pose trajectories, then interactively selects the best manifold for the task from this set.
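As a rough illustration of projecting object poses onto manifolds with different numbers of degrees of freedom, the sketch below fits an affine subspace to expert poses with PCA and orthogonally projects a desired pose onto it. This is an assumption made for illustration: the page does not specify how GOMP constructs its manifolds, and treating a pose as a plain Euclidean vector ignores the structure of rotations. The function names `fit_affine_manifold` and `project` are hypothetical.

```python
import numpy as np


def fit_affine_manifold(poses, n_dof):
    """Fit an n_dof-dimensional affine subspace to expert poses via PCA.

    poses: (N, D) array of object pose vectors (hypothetical parameterization).
    Returns (mean, basis), where basis has shape (n_dof, D) with orthonormal rows.
    """
    poses = np.asarray(poses, dtype=float)
    mean = poses.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
    return mean, vt[:n_dof]


def project(pose, mean, basis):
    """Orthogonally project a pose vector onto the fitted affine subspace."""
    pose = np.asarray(pose, dtype=float)
    return mean + (pose - mean) @ basis.T @ basis
```

A set of candidates with different degrees of freedom could then be built by calling `fit_affine_manifold` once per value of `n_dof`, with the policy's desired object pose passed through `project` for whichever candidate is currently selected.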

Visualization of current object pose, desired object pose from Vanilla Diffusion Policy, and desired object pose from Diffusion Policy + GOMP during rollout of select tasks.

Task 1: Nut Threading

Vanilla Diffusion Policy

GOMP-enhanced Diffusion Policy

Task 2: Tight Hexagonal Peg Insertion

Vanilla Diffusion Policy

GOMP-enhanced Diffusion Policy

Task 3: Remote Battery Cover Placement

Vanilla Diffusion Policy

GOMP-enhanced Diffusion Policy

Task 4: USB Insertion

Vanilla Diffusion Policy

GOMP-enhanced Diffusion Policy

Data Collection

Visualization of the in-hand pose (IHP) data collection used to train the SE(2) tactile IHP estimator.

Kinesthetic demonstrations used to train GOMP and Diffusion Policy in this work.

BibTeX

@article{gomp2024,
  title={GOMP: Grasped Object Manifold Projection for Multimodal Imitation Learning of Manipulation},
  author={van den Bogert, William and Linkowski, Gregory and Fazeli, Nima},
  journal={arXiv},
  year={2025}
}