The automatic recognition of how people use their hands and fingers in natural settings – without instrumenting the fngers – can be useful for many mobile computing applications. To achieve such an interface, we propose a vision-based 3D hand pose estimation framework using a wrist-worn camera. The main challenge is the oblique angle of the wrist-worn camera, which makes the fngers scarcely visible. To address this, a special network that observes deformations on the back of the hand is required. We introduce DorsalNet, a two-stream convolutional neural network to regress finger joint angles from spatio-temporal features of the dorsal hand region (the movement of bones, muscle, and tendons). This work is the frst vision-based real-time 3D hand pose estimator using visual features from the dorsal hand region. Our system achieves a mean joint-angle error of 8.81° for user-specifc models and 9.77° for a general model. Further evaluation shows that our system outperforms previous work with an average of 20% higher accuracy in recognizing dynamic gestures, and achieves a 75% accuracy of detecting 11 different grasp types. We also demonstrate 3 applications which employ our system as a control device, an input device, and a grasped object recognizer.
- Erwin Wu, Ye Yuan, Hui-Shyong Yeo, Aaron Quigley, Hideki Koike, Kris M. Kitani, Back-Hand-Pose: 3D Hand Pose Estimation for a Wrist-worn Camera via Dorsum Deformation Network, ACM User Interface Software and Technology (UIST2020)
Ещё видео!