This work presents a next-generation human-robot interface that can infer and
realize the user's manipulation intention via sight only. Specifically, we
develop a system that integrates near-eye-tracking and robotic manipulation to
enable user-specified actions (e.g., grasp and pick-and-place), where visual
information is merged with human attention to create a mapping for desired
robot actions. To enable sight-guided manipulation, a head-mounted
near-eye-tracking device is developed to track eyeball movements in
real-time, so that the user's visual attention can be identified. To improve
the grasping performance, a transformer-based grasp model is then developed.
Stacked transformer blocks are used to extract hierarchical features: at each
stage the number of channels is expanded while the spatial resolution of the
feature maps is reduced. Experimental validation demonstrates that the eye-tracking
system yields low gaze estimation error and the grasping system yields
promising results on multiple grasping datasets. This work is a proof of
concept for a gaze-interaction-based assistive robot, which holds great promise
for helping the elderly or people with upper-limb disabilities in their daily lives. A demo video
is available at https://www.youtube.com/watch?v=yuZ1hukYUrM
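
The hierarchical feature extraction described above, in which channels expand and feature-map resolution shrinks at each stage, can be illustrated with a minimal shape-bookkeeping sketch. It is not the authors' model: the function name, the input size of 224, the patch size of 4, the base width of 96 channels, the four stages, and the factor of 2 per stage are all assumptions chosen for illustration, since the abstract does not state them.

```python
from dataclasses import dataclass


@dataclass
class StageShape:
    stage: int      # 1-based stage index
    channels: int   # number of feature channels at this stage
    height: int     # feature-map height in tokens
    width: int      # feature-map width in tokens


def hierarchical_shapes(image_size=224, patch_size=4,
                        base_channels=96, num_stages=4):
    """Return the feature-map shape produced by each stage.

    All default values are hypothetical; the abstract gives no sizes.
    Each stage after the first doubles the channel count and halves
    the spatial resolution.
    """
    # The image must divide evenly through the patch embedding and
    # through every later 2x downsampling step.
    if image_size % (patch_size * 2 ** (num_stages - 1)) != 0:
        raise ValueError("image_size is not divisible by the total stride")

    shapes = []
    side = image_size // patch_size   # resolution after patch embedding
    channels = base_channels
    for stage in range(1, num_stages + 1):
        shapes.append(StageShape(stage, channels, side, side))
        side //= 2       # squeeze the resolution for the next stage
        channels *= 2    # expand the channels for the next stage
    return shapes


if __name__ == "__main__":
    for s in hierarchical_shapes():
        print(f"stage {s.stage}: {s.channels} channels, {s.height}x{s.width}")
```

Under these assumed settings, the number of spatial positions falls by a factor of four at each stage while the channel count doubles, so later stages trade spatial detail for richer per-position features.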
No Creative Commons license