In this work, we introduce a Real-Time Operator Takeover (RTOT) paradigm for imitation learning-based methods, alongside novel insights into leveraging the Mahalanobis distance to automatically detect undesirable states. RTOT enables operators to seamlessly take control of a live visuomotor diffusion policy, guiding the system back into desirable states or reinforcing specific demonstrations. Once the operator has intervened and redirected the system, control is returned to the policy, which resumes generating actions until further intervention is required. We demonstrate that incorporating these targeted takeover demonstrations significantly improves policy performance compared to training solely on the same number of initial demonstrations, even though the initial demonstrations are longer. Furthermore, we provide an in-depth analysis of using the Mahalanobis distance to detect out-of-distribution states, illustrating its utility for identifying critical failure points during execution. Supporting materials, including videos of the initial and takeover demonstrations and of all rice-scooping experiments, are available on the project website.
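The out-of-distribution check described above can be illustrated with a minimal sketch. It assumes that each state is summarized by a fixed-length feature vector and that a Gaussian is fit to the features seen in the demonstrations. The feature dimension, the synthetic data, the 99th-percentile threshold, and the helper names `fit_gaussian` and `mahalanobis` are all hypothetical choices for illustration, not the setup used in this work.

```python
import numpy as np

def fit_gaussian(features, eps=1e-6):
    """Estimate the mean and a regularized inverse covariance from (N, D) features."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of each row of x to the fitted Gaussian."""
    diff = np.atleast_2d(x) - mean
    return np.sqrt(np.einsum("nd,de,ne->n", diff, cov_inv, diff))

# Synthetic stand-in for features extracted from demonstration states (assumption).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(500, 8))
mean, cov_inv = fit_gaussian(train_features)

# Hypothetical threshold: 99th percentile of the distances seen in training.
threshold = np.percentile(mahalanobis(train_features, mean, cov_inv), 99)

typical_state = np.zeros(8)        # close to the training distribution
unusual_state = np.full(8, 6.0)    # far from anything seen in training

typical_flagged = bool(mahalanobis(typical_state, mean, cov_inv)[0] > threshold)
unusual_flagged = bool(mahalanobis(unusual_state, mean, cov_inv)[0] > threshold)
```

In a live setting, a flag like `unusual_flagged` could be one signal that prompts the operator to take over; how the distance is actually computed and thresholded for the policy is described in the paper.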
Real-Time Operator Takeover
Initial rice scooping demos
Takeover rice scooping demos #1
Takeover rice scooping demos #2
We evaluated policies trained only on the initial demonstrations and policies trained with additional takeover demonstrations.
Rice Scooping with Takeover #2
Rice Scooping with Takeover #1
Rice Scooping initial 20
All experiments: