Real-Time Operator Takeover for Visuomotor Diffusion Policy Training

Nils Ingelhag*, Jesper Munkeby*, Michael C. Welle*, Marco Moletta, Danica Kragic

We introduce a Real-Time Operator Takeover (RTOT) paradigm for imitation learning-based methods, together with new insights into using the Mahalanobis distance to automatically detect undesirable states. RTOT lets an operator take control of a live visuomotor diffusion policy to guide the system back into desirable states or to reinforce specific demonstrations. Once the operator has intervened and redirected the system, control returns to the policy, which resumes generating actions until further intervention is required. We demonstrate that incorporating these targeted takeover demonstrations significantly improves policy performance compared to training solely on an equal number of longer initial demonstrations. Furthermore, we provide an in-depth analysis of using the Mahalanobis distance to detect out-of-distribution states, illustrating its utility for identifying critical failure points during execution. Supporting materials, including videos of the initial and takeover demonstrations and of all rice-scooping experiments, are available on the project website.
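As a rough illustration of the out-of-distribution check described above, the sketch below fits a Gaussian to feature vectors from the training demonstrations and flags a new state when its Mahalanobis distance exceeds a threshold. The choice of features (here random vectors standing in for observation embeddings), the 99th-percentile threshold, and the names `fit_gaussian` and `mahalanobis` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def fit_gaussian(features, eps=1e-6):
    """Estimate mean and inverse covariance from in-distribution features of shape (N, D)."""
    mean = features.mean(axis=0)
    # A small ridge term keeps the covariance invertible (assumed regularisation).
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance sqrt((x - mean)^T Sigma^{-1} (x - mean))."""
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

# Synthetic stand-in for features extracted from the demonstration data.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
mean, cov_inv = fit_gaussian(train)

# Assumed rule: flag states beyond the 99th percentile of training distances.
train_d = np.array([mahalanobis(f, mean, cov_inv) for f in train])
threshold = float(np.quantile(train_d, 0.99))

near = mahalanobis(np.zeros(4), mean, cov_inv)       # close to the training data
far = mahalanobis(np.full(4, 6.0), mean, cov_inv)    # far from the training data
print(near < threshold, far > threshold)
```

In a live system, a distance above the threshold could serve as a cue that the policy has drifted into a state where an operator takeover may be warranted.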

Real-Time Operator Takeover
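The takeover loop can be pictured with a minimal sketch: at every control step the policy proposes an action, the operator's input overrides it when present, and the overridden steps are stored as a takeover demonstration for later training. The class `TakeoverRecorder`, its methods, and the use of `None` to signal "no operator input" are hypothetical choices for this example and do not describe the authors' implementation.

```python
class TakeoverRecorder:
    """Arbitrates between policy and operator actions and logs takeover segments."""

    def __init__(self):
        self.segments = []   # completed takeover demonstrations
        self._current = []   # takeover segment currently being recorded

    def step(self, obs, policy_action, operator_action=None):
        """Return the action to execute; operator_action is None when the operator is idle."""
        if operator_action is not None:
            # Operator is in control: execute and record the operator's action.
            self._current.append((obs, operator_action))
            return operator_action
        # Operator released control: close any open segment, hand back to the policy.
        self.flush()
        return policy_action

    def flush(self):
        """Store the open takeover segment, e.g. when control returns or an episode ends."""
        if self._current:
            self.segments.append(self._current)
            self._current = []

# Toy rollout: the operator intervenes at steps 2 and 3, then returns control.
recorder = TakeoverRecorder()
operator_inputs = [None, None, "op_a", "op_b", None]
executed = [
    recorder.step(obs=t, policy_action=f"pi_{t}", operator_action=op)
    for t, op in enumerate(operator_inputs)
]
recorder.flush()
print(executed)
print(recorder.segments)
```

The recorded segments would then be added to the initial demonstrations when retraining the policy.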

Download Preprint

Expert Demonstrations

Initial rice scooping demos

Takeover rice scooping demos #1

Takeover rice scooping demos #2

Experiments

We evaluated policies trained only on the initial demonstrations and policies trained with additional takeover demonstrations.

Rice Scooping with Takeover #2

Rice Scooping with Takeover #1

Rice Scooping initial 20

All experiments:

Contact

  • Michael C. Welle; mwelle(at)kth.se; KTH Royal Institute of Technology, Sweden