“Why doesn’t it recognize The Finger?!” asks my indignant, mischievous 10-year-old Henry, who with his brother has offered to donate a rich set of training data. 🙃
Juvenile amusement notwithstanding, I’m delighted that my teammates have released a badass hand-tracking model, especially handy (oh boy) for use with MediaPipe (see previous), our open-source pipeline for building ML projects.

Today we are announcing the release of a new approach to hand perception, which we previewed at CVPR 2019 in June, implemented in MediaPipe, an open-source, cross-platform framework for building pipelines to process perceptual data of different modalities, such as video and audio. This approach provides high-fidelity hand and finger tracking by employing machine learning (ML) to infer 21 3D keypoints of a hand from just a single frame. Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, our method achieves real-time performance on a mobile phone, and even scales to multiple hands. We hope that providing this hand perception functionality to the wider research and development community will result in an emergence of creative use cases, stimulating new applications and new research avenues.
🙌