Category Archives: VR/AR

Inside Google’s realtime face ML tech

I’m delighted that my teammates are getting to share the details of how the awesome face-tracking tech they built works:

We employ machine learning (ML) to infer approximate 3D surface geometry to enable visual effects, requiring only a single camera input without the need for a dedicated depth sensor. This approach provides the use of AR effects at realtime speeds, using TensorFlow Lite for mobile CPU inference or its new mobile GPU functionality where available. This technology is the same as what powers YouTube Stories’ new creator effects, and is also available to the broader developer community via the latest ARCore SDK release and the ML Kit Face Contour Detection API.
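If you’re curious what the developer-facing piece looks like, here’s a minimal Kotlin sketch against the ML Kit Face Contour Detection API mentioned above. Treat it as a sketch based on the public Firebase ML Kit docs, not my team’s internal code; the bitmap is assumed to come from your own camera pipeline.

```kotlin
import android.graphics.Bitmap
import android.util.Log
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.face.FirebaseVisionFaceContour
import com.google.firebase.ml.vision.face.FirebaseVisionFaceDetectorOptions

// Detect face contours in a single bitmap (e.g. a camera frame you've already captured).
fun detectFaceContours(bitmap: Bitmap) {
    val options = FirebaseVisionFaceDetectorOptions.Builder()
        .setPerformanceMode(FirebaseVisionFaceDetectorOptions.FAST)
        .setContourMode(FirebaseVisionFaceDetectorOptions.ALL_CONTOURS)
        .build()

    val detector = FirebaseVision.getInstance().getVisionFaceDetector(options)

    detector.detectInImage(FirebaseVisionImage.fromBitmap(bitmap))
        .addOnSuccessListener { faces ->
            for (face in faces) {
                // Each contour is a list of 2D points in image coordinates.
                val faceOval = face.getContour(FirebaseVisionFaceContour.FACE).points
                val upperLip = face.getContour(FirebaseVisionFaceContour.UPPER_LIP_TOP).points
                Log.d("FaceContours", "oval=${faceOval.size} pts, upper lip=${upperLip.size} pts")
            }
        }
        .addOnFailureListener { e -> Log.e("FaceContours", "Detection failed", e) }
}
```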

We’ve been hard at work ensuring that the tech works well for really demanding applications like realistic makeup try-on.

If you’re a developer, dig into the links above to see how you can use the tech—and everyone else, stay tuned for more fun, useful applications of it across Google products.

Googlers win VFX Oscars

Congrats to Paul Debevec, Xueming Yu, Wan-Chun Alex Ma, and their former colleague Timothy Hawkins for the recognition of their groundbreaking Light Stage work!

[YouTube]

Now they’re working with my extended team:

“We try to bring our knowledge and background to try to make better Google products,” Ma says. “We’re working on improving the realism of VR and AR experiences.”

I go full SNL Sue thinking about what might be possible.

Oh, and they worked on Ready Player One (nominated for Best Visual Effects this year) and won for Blade Runner 2049 last year:

Just prior to heading to Google, they worked on “Blade Runner 2049,” which took home the Oscar for Best Visual Effects last year and brought back the character Rachael from the original “Blade Runner” movie. The new Rachael was constructed with facial features from the original actress, Sean Young, and another actress, Loren Peta, to make the character appear to be the same age she was in the first film.

Check out their work in action:

[YouTube 1 & 2]

Machine learning in your browser tracks your sweet bod

A number of our partner teams have been working both on the foundation for browser-based ML & on cool models that can run there efficiently:

We are excited to announce the release of BodyPix, an open-source machine learning model which allows for person and body-part segmentation in the browser with TensorFlow.js. With default settings, it estimates and renders person and body-part segmentation at 25 fps on a 2018 15-inch MacBook Pro, and 21 fps on an iPhone X. […]

This might all make more sense if you try a live demo here.

Check out this post for more details.

It’s Friday: Let’s melt some faces!

I’m so pleased to say that my team’s face-tracking tech (which you may have seen powering AR effects in YouTube Stories and elsewhere) is now available for developers to build upon:

ARCore’s new Augmented Faces API (available on the front-facing camera) offers a high quality, 468-point 3D mesh that lets users attach fun effects to their faces. From animated masks, glasses, and virtual hats to skin retouching, the mesh provides coordinates and region specific anchors that make it possible to add these delightful effects.


“Why do you keep looking at King Midas’s wife?” my son Finn asked as I was making this GIF the other day. :-p
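To give a feel for the API surface, here’s roughly what enabling the mesh and reading it each frame looks like in Kotlin, assuming you’ve already created an ARCore Session on the front-facing camera and have a per-frame update loop (a sketch based on the public ARCore docs, not the YouTube Stories implementation):

```kotlin
import com.google.ar.core.AugmentedFace
import com.google.ar.core.Config
import com.google.ar.core.Pose
import com.google.ar.core.Session
import com.google.ar.core.TrackingState

// Turn on Augmented Faces for an existing front-camera Session.
fun enableFaceMesh(session: Session) {
    val config = Config(session)
    config.augmentedFaceMode = Config.AugmentedFaceMode.MESH3D
    session.configure(config)
}

// Call once per frame (after session.update()) to read the tracked face mesh.
fun onFrame(session: Session) {
    for (face in session.getAllTrackables(AugmentedFace::class.java)) {
        if (face.trackingState != TrackingState.TRACKING) continue

        // 468 vertices packed as x/y/z floats, relative to the face's center pose.
        val vertices = face.meshVertices
        val uvs = face.meshTextureCoordinates
        val triangles = face.meshTriangleIndices

        // Region-specific anchors for attaching props like hats or glasses.
        val noseTip: Pose = face.getRegionPose(AugmentedFace.RegionType.NOSE_TIP)
        val foreheadLeft: Pose = face.getRegionPose(AugmentedFace.RegionType.FOREHEAD_LEFT)
        val foreheadRight: Pose = face.getRegionPose(AugmentedFace.RegionType.FOREHEAD_RIGHT)

        // Hand the mesh + poses to your renderer here.
    }
}
```

From there, your renderer decides whether those vertices become a virtual hat, glasses, or a full King Midas treatment.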

Check out details & grab the SDKs:

We can’t wait to see what folks build with this tech, and we’ll share more details soon!

AR walking nav is starting to arrive in Google Maps

I’m really pleased to see that augmented reality navigation has gone into testing with Google Maps users.

On the Google AI Blog, the team gives some insights into the cool tech at work:

We’re experimenting with a way to solve this problem using a technique we call global localization, which combines Visual Positioning Service (VPS), Street View, and machine learning to more accurately identify position and orientation. […]

VPS determines the location of a device based on imagery rather than GPS signals. VPS first creates a map by taking a series of images which have a known location and analyzing them for key visual features, such as the outline of buildings or bridges, to create a large scale and fast searchable index of those visual features. To localize the device, VPS compares the features in imagery from the phone to those in the VPS index. However, the accuracy of localization through VPS is greatly affected by the quality of both the imagery and the location associated with it. And that poses another question—where does one find an extensive source of high-quality global imagery?
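The core idea behind that matching step is simple enough to sketch. Here’s a toy Kotlin illustration of “index descriptors that have known locations, then match a query image’s descriptors and vote.” This is my own simplification, not VPS itself; a real system uses far richer features and approximate nearest-neighbor search at global scale.

```kotlin
// Toy version of visual-feature localization: index descriptors extracted from imagery
// with known locations, then localize a query image by nearest-neighbor matching its
// descriptors against the index and letting the matches vote for a location.
data class LatLng(val lat: Double, val lng: Double)
class IndexedFeature(val descriptor: FloatArray, val location: LatLng)

private fun squaredDistance(a: FloatArray, b: FloatArray): Float {
    var sum = 0f
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sum
}

// Brute-force nearest neighbor plus voting over the indexed locations.
fun localize(queryDescriptors: List<FloatArray>, index: List<IndexedFeature>): LatLng? {
    val votes = HashMap<LatLng, Int>()
    for (q in queryDescriptors) {
        val best = index.minByOrNull { squaredDistance(q, it.descriptor) } ?: continue
        votes[best.location] = (votes[best.location] ?: 0) + 1
    }
    return votes.maxByOrNull { it.value }?.key
}
```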

Read on for the full story.

AR: Gambeezy lands on Pixel!

This is America… augmented by Childish Gambino on Pixel:

The Childish Gambino Playmoji pack features unique moves that map to three different songs: “Redbone,” “Summertime Magic,” and “This is America.” Pixel users can start playing with them today using the camera on their Pixel, Pixel XL, Pixel 2, Pixel 2 XL, Pixel 3 and Pixel 3 XL.

And with some help from my team:

He even reacts to your facial expressions in real time thanks to machine learning—try smiling or frowning in selfie mode and see how he responds.

Enjoy!

Wacom’s teaming up with Magic Leap for collaborative creation

Hmm—it’s a little hard to gauge just from a written description, but I’m excited to see new AR blood teaming up with an OG of graphics to try defining a new collaborative environment:

Wearing a Magic Leap One headset connected to a Wacom Intuos Pro pen tablet, designers can use the separate three-button Pro Pen 3D stylus to control their content on a platform called Spacebridge, which streams 3D data into a spatial computing environment. The program allows multiple people in a room to interact with the content, with the ability to view, scale, move, and sketch in the same environment.

Check out the rest of the Verge article for details. I very much look forward to seeing how this develops.

AR & AI help blind users navigate space & perceive emotions

I love assistive superpowers like this work from Caltech.

VR Scout shares numerous details:

[T]he team used the Microsoft HoloLens’s capability to create a digital mesh over a “scene” of the real-world. Using unique software called Cognitive Augmented Reality Assistant (CARA), they were able to convert information into audio messages, giving each object a “voice” that you would hear while wearing the headset. […]

If the object is at the left, the voice will come from the left side of the AR headset, while any object on the right will speak out to you from the right side of the headset. The pitch of the voice will change depending on how far you are from the object.
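To make that mapping concrete, here’s a rough Kotlin sketch of one way an object’s position could drive the voice’s direction and pitch. It’s my own illustration of the idea described above, not CARA’s actual code, and the exact pitch mapping is an assumption.

```kotlin
import kotlin.math.atan2
import kotlin.math.sqrt

// An object's position relative to the wearer becomes a stereo pan (which side the
// voice comes from) plus a pitch that varies with distance. Coordinates are in meters
// in a listener-centered frame where +x is to the right and +z is straight ahead.
data class AudioCue(val pan: Float, val pitch: Float) // pan: -1 = hard left, +1 = hard right

fun cueFor(x: Float, z: Float, basePitchHz: Float = 220f): AudioCue {
    val azimuth = atan2(x, z) // radians; negative means the object is on the left
    val pan = (azimuth / (Math.PI / 2)).toFloat().coerceIn(-1f, 1f)
    val distance = sqrt(x * x + z * z)
    // The article doesn't specify how pitch scales with distance, so this is just one
    // plausible choice: nearer objects speak at a higher pitch.
    val pitch = basePitchHz * (2f / (1f + distance))
    return AudioCue(pan, pitch)
}
```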

Meanwhile Huawei is using AI to help visually impaired users “hear” facial expressions:

Facing Emotions taps the Mate 20 Pro’s back cameras to scan the faces of conversation partners, identifying facial features like eyes, nose, brows, and mouth, and their positions in relation to each other. An offline, on-device machine learning algorithm interprets the detected emotions as sounds, which the app plays on the handset’s loudspeaker.

[YouTube] [Via Helen Papagiannis]