“VOGUE: Try-On by StyleGAN,” from my former Google colleague Ira Kemelmacher-Shlizerman & her team, promises to synthesize photorealistic clothing & automatically apply it to a range of body shapes (leveraging the same StyleGAN foundation that my new teammates are using to build images via text):
No markers, no mocap cameras, no suit, no keyframing. This take uses 3 DSLR cameras, though, and pretty far from being real-time. […]
Under the hood, it uses #OpenPose ML-network for 2d tracking of joints on each camera, and then custom Houdini setup for triangulating the results into 3d, stabilizing it and driving the rig (volumes, CHOPs, #kinefx, FEM – you name it 🙂
From sign language to sports training to AR effects, tracking the human body unlocks some amazing possibilities, and my Google Research teammates are delivering great new tools:
We are excited to announce MediaPipe Holistic, […] a new pipeline with optimized pose, face and hand components that each run in real-time, with minimum memory transfer between their inference backends, and added support for interchangeability of the three components, depending on the quality/speed tradeoffs.
Call it AI, ML, FM (F’ing Magic), whatever: tech like this warms the heart and can free body & soul. Google’s Project Guideline helps people with impaired vision navigate the world on their own, independently & at speed. Runner & CEO Thomas Panek, who is blind, writes,
In the fall of 2019, I asked that question to a group of designers and technologists at a Google hackathon. I wasn’t anticipating much more than an interesting conversation, but by the end of the day they’d built a rough demo […].
I’d wear a phone on a waistband, and bone-conducting headphones. The phone’s camera would look for a physical guideline on the ground and send audio signals depending on my position. If I drifted to the left of the line, the sound would get louder and more dissonant in my left ear. If I drifted to the right, the same thing would happen, but in my right ear. Within a few months, we were ready to test it on an indoor oval track. […] It was the first unguided mile I had run in decades.
Check out the journey. (Side note: how great is “Blaze” as a name for a speedy canine running companion? ☺️)
There’s no way the title can do this one justice, so just watch as this ML-based technique identifies moving humans (including their reflections!), then segments them out to enable individual manipulation—including syncing up their motions and even removing people wholesale:
Here’s the vid directly from the research team, which includes longtime Adobe vet David Salesin:
Awesome work by the team. Come grab a copy & build something great!
The ML Kit Pose Detection API is a lightweight versatile solution for app developers to detect the pose of a subject’s body in real time from a continuous video or static image. A pose describes the body’s position at one moment in time with a set of x,y skeletal landmark points. The landmarks correspond to different body parts such as the shoulders and hips. The relative positions of landmarks can be used to distinguish one pose from another.