We present a Real-Time Operator Takeover (RTOT) paradigm that enables operators to seamlessly take control of a live visuomotor diffusion policy, guiding the system back to desirable states or providing targeted corrective demonstrations. Within this framework, the operator can intervene to correct the robot’s motion, after which control is smoothly returned to the policy until further intervention is needed. We evaluate the takeover framework on three tasks spanning rigid, deformable, and granular objects, and show that incorporating targeted takeover demonstrations significantly improves policy performance compared with training on an equivalent number of initial demonstrations alone. Additionally, we provide an in-depth analysis of the Mahalanobis distance as a signal for automatically identifying undesirable or out-of-distribution states during execution. Supporting materials, including videos of the initial and takeover demonstrations and all experiments, are available on the project website.
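As a rough illustration of how a Mahalanobis distance can flag out-of-distribution states, the sketch below fits a Gaussian to feature vectors from the training demonstrations and scores new states against it. This is not the authors' implementation: the feature source, the `fit_gaussian` and `mahalanobis` helpers, the ridge term, and the 99th-percentile threshold are all assumptions chosen for the example, and synthetic random vectors stand in for real policy embeddings.

```python
import numpy as np


def fit_gaussian(features, ridge=1e-6):
    """Estimate the mean and inverse covariance of training features (N x D)."""
    mean = features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    # A small ridge keeps the covariance invertible (assumed value).
    cov = cov + ridge * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of one feature vector from the fitted Gaussian."""
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


# Synthetic stand-in for features extracted from demonstration states.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
mean, cov_inv = fit_gaussian(train)

# Hypothetical threshold: the 99th percentile of training distances.
train_dist = np.array([mahalanobis(f, mean, cov_inv) for f in train])
threshold = float(np.quantile(train_dist, 0.99))

in_dist_state = np.zeros(4)       # near the training mean
out_dist_state = np.full(4, 6.0)  # far from anything seen in training

in_dist_flag = mahalanobis(in_dist_state, mean, cov_inv) > threshold
out_dist_flag = mahalanobis(out_dist_state, mean, cov_inv) > threshold
print(in_dist_flag, out_dist_flag)
```

In a live system, a distance above the threshold could serve as a cue that operator takeover may be needed; how the threshold is chosen and which features are used would depend on the policy and task.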
Real-Time Operator Takeover
Example of initial demonstrations and takeover demonstrations on the Rice Scooping task.
Initial rice scooping demos
Takeover rice scooping demos #1
Takeover rice scooping demos #1
We evaluated policies trained on initial demonstrations only and policies trained with additional takeover demonstrations.
Rice Scooping Task
Pen-in-Box Task
Trousers Folding Task
All experiments: