From sign language to sports training to AR effects, tracking the human body unlocks some amazing possibilities, and my Google Research teammates are delivering great new tools:
We are excited to announce MediaPipe Holistic, […] a new pipeline with optimized pose, face and hand components that each run in real-time, with minimum memory transfer between their inference backends, and added support for interchangeability of the three components, depending on the quality/speed tradeoffs.
Call it AI, ML, FM (F’ing Magic), whatever: tech like this warms the heart and can free body & soul. Google’s Project Guideline helps people with impaired vision navigate the world on their own, independently & at speed. Runner & CEO Thomas Panek, who is blind, writes,
In the fall of 2019, I asked that question to a group of designers and technologists at a Google hackathon. I wasn’t anticipating much more than an interesting conversation, but by the end of the day they’d built a rough demo […].
I’d wear a phone on a waistband, and bone-conducting headphones. The phone’s camera would look for a physical guideline on the ground and send audio signals depending on my position. If I drifted to the left of the line, the sound would get louder and more dissonant in my left ear. If I drifted to the right, the same thing would happen, but in my right ear. Within a few months, we were ready to test it on an indoor oval track. […] It was the first unguided mile I had run in decades.
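For the curious, here's roughly how I picture that feedback loop, as a minimal sketch rather than anything the Project Guideline team has published. The function name `guidance_gains`, the sign convention (negative means left of the line), and the `max_offset_m` cutoff are all my own assumptions, and the "more dissonant" part of the signal is left out entirely:

```python
def guidance_gains(offset_m, max_offset_m=1.0):
    """Map a signed lateral offset to (left_gain, right_gain), each in [0, 1].

    Hypothetical convention: offset_m < 0 means the runner has drifted left
    of the guideline, offset_m > 0 means right. The gain grows linearly with
    the drift and saturates at max_offset_m (an assumed tuning value).
    """
    if max_offset_m <= 0:
        raise ValueError("max_offset_m must be positive")
    # Normalize the drift to [0, 1] so the gain cannot exceed full volume.
    magnitude = min(abs(offset_m) / max_offset_m, 1.0)
    if offset_m < 0:
        return (magnitude, 0.0)  # drifted left: louder in the left ear
    if offset_m > 0:
        return (0.0, magnitude)  # drifted right: louder in the right ear
    return (0.0, 0.0)            # on the line: no corrective sound


if __name__ == "__main__":
    for offset in (-0.5, 0.0, 0.25, 2.0):
        print(offset, guidance_gains(offset))
```

A real system would presumably smooth the offset over time before turning it into sound, but the basic idea of "further off the line, louder in that ear" fits in a few lines.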
Check out the journey. (Side note: how great is “Blaze” as a name for a speedy canine running companion? ☺️)
There’s no way the title can do this one justice, so just watch as this ML-based technique identifies moving humans (including their reflections!), then segments them out to enable individual manipulation—including syncing up their motions and even removing people wholesale:
Here’s the vid directly from the research team, which includes longtime Adobe vet David Salesin:
Awesome work by the team. Come grab a copy & build something great!
The ML Kit Pose Detection API is a lightweight versatile solution for app developers to detect the pose of a subject’s body in real time from a continuous video or static image. A pose describes the body’s position at one moment in time with a set of x,y skeletal landmark points. The landmarks correspond to different body parts such as the shoulders and hips. The relative positions of landmarks can be used to distinguish one pose from another.
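To make "relative positions of landmarks" a bit more concrete, here's a small sketch of one common trick: measuring the angle at a joint from three x,y landmarks. This is plain geometry, not the ML Kit API; the helper names `joint_angle` and `classify_knee`, the sample coordinates, and the 120-degree threshold are all hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b, in degrees, formed by the points a-b-c.

    Each point is an (x, y) tuple, e.g. hip, knee, ankle landmarks.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("landmarks must not coincide with the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against tiny floating-point overshoot before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def classify_knee(hip, knee, ankle, bent_below_deg=120.0):
    """Label a knee 'bent' or 'straight' using an assumed angle threshold."""
    return "bent" if joint_angle(hip, knee, ankle) < bent_below_deg else "straight"


if __name__ == "__main__":
    # Made-up landmark coordinates, with y growing downward as in image space.
    print(classify_knee((0, 0), (0, 1), (0, 2)))  # leg in a straight line
    print(classify_knee((0, 0), (0, 1), (1, 1)))  # right angle at the knee
```

Because the angle depends only on how the points sit relative to each other, it stays the same whether the person is near the camera or far from it, which is what makes it handy for telling poses apart.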
My old teammates keep slapping out the bangers, releasing machine-learning tech to help build apps that key off the human form.
First up is MediaPipe Iris, enabling depth estimation for faces without fancy (iPhone X-/Pixel 4-style) hardware, and that in turn opens up access to accurate virtual try-on for glasses, hats, etc.:
The model enables cool tricks like realtime eye recoloring:
I always find it interesting to glimpse the work that goes in behind the scenes. For example:
To train the model from the cropped eye region, we manually annotated ~50k images, representing a variety of illumination conditions and head poses from geographically diverse regions, as shown below.
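As for how an iris can stand in for a depth sensor, my understanding is that it comes down to similar triangles: if you know roughly how wide something is in the real world and how wide it appears in pixels, a pinhole camera model gives you the distance. Here's a back-of-the-envelope sketch under stated assumptions: the ~11.7 mm iris diameter is a commonly cited average that I'm plugging in as a default, the focal length in pixels is assumed to be known, and `estimate_depth_mm` is a hypothetical helper of mine, not part of MediaPipe.

```python
def estimate_depth_mm(focal_length_px, iris_diameter_px, iris_diameter_mm=11.7):
    """Estimate camera-to-eye distance in millimeters with a pinhole model.

    depth = focal_length * real_size / apparent_size

    focal_length_px:   camera focal length in pixels (assumed known)
    iris_diameter_px:  measured iris width in the image, in pixels
    iris_diameter_mm:  assumed physical iris width (default is an average)
    """
    if focal_length_px <= 0 or iris_diameter_px <= 0 or iris_diameter_mm <= 0:
        raise ValueError("all inputs must be positive")
    return focal_length_px * iris_diameter_mm / iris_diameter_px


if __name__ == "__main__":
    # Made-up numbers: a 1000 px focal length and an iris 23.4 px wide.
    print(round(estimate_depth_mm(1000.0, 23.4), 1), "mm")
```

Halve the apparent iris width and the estimated distance doubles, which is the whole trick; the hard part is measuring that iris width accurately in the first place.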
The team has followed up this release with MediaPipe BlazePose, which is in testing now & planned for release via the cross-platform ML Kit soon:
Our approach provides human pose tracking by employing machine learning (ML) to infer 33, 2D landmarks of a body from a single frame. In contrast to current pose models based on the standard COCO topology, BlazePose accurately localizes more keypoints, making it uniquely suited for fitness applications…
If one leverages GPU inference, BlazePose achieves super-real-time performance, enabling it to run subsequent ML models, like face or hand tracking.
Now I can’t wait for apps to help my long-suffering CrossFit coaches actually quantify the crappiness of my form. Thanks, team! 😛
Since making Google Meet premium video meetings free and available to everyone, we’ve continued to accelerate the development of new features… In the coming months, we’ll make it easy to blur out your background, or replace it with an image of your choosing so you can keep your team’s focus solely on you.
This is kinda inside-baseball, but I’m really happy that friends from my previous team will now have their work distributed on hundreds of millions, if not billions, of devices:
[A] face contours model — which can detect over 100 points in and around a user’s face and overlay masks and beautification elements atop them — has been added to the list of APIs shipped through Google Play Services…
Lastly, two new APIs are now available as part of the ML Kit early access program: entity extraction and pose detection… Pose detection supports 33 skeletal points like hands and feet tracking.
Let’s see what rad stuff the world can build with these foundational components. Here’s an example of folks putting an earlier version to use, and you can find a ton more in my Body Tracking category: