As I’m on a kick sharing recent work from Ira Kemelmacher-Shlizerman & team, here’s another banger:
Given an “in-the-wild” video, we train a deep network with the video frames to produce an animatable human representation.
This can be rendered from any camera view in any body pose, enabling applications such as motion re-targeting and bullet-time rendering without the need for rigged 3D meshes.
I look forward (?) to the not-so-distant day when a 3D-extracted Trevor Lawrence hucks a touchdown to Cleatus the Fox Sports Robot. Grand slam!!
“VOGUE: Try-On by StyleGAN,” from my former Google colleague Ira Kemelmacher-Shlizerman & her team, promises to synthesize photorealistic clothing & automatically apply it to a range of body shapes (leveraging the same StyleGAN foundation that my new teammates are using to build images via text):
I remain fascinated by what Snap & Facebook are doing with their respective AR platforms, putting highly programmable camera stacks into the hands of hundreds of millions of consumers & hundreds of thousands of creators. If you have thoughts on the subject & want to nerd out some time, drop me a note.
A few months back I wanted to dive into the engine that’s inside Instagram, and I came across the Spark AR masterclass put together & presented by filter creator Eddy Adams. I found it engaging & informative, even if a bit fast for my aging brain 🙃. If you’re tempted to get your feet wet in this emerging space, I recommend giving it a shot.
“‘Augmented Reality: A Land Of Contrasts.’ In this essay, I will…”
Okay, no, not really, but let me highlight some interesting mixed signals. (It’s worth noting that these are strictly my opinions, not those of any current or past employer.)
Pokémon Go debuted almost exactly 5 years ago, and last year, even amidst a global pandemic that largely immobilized people, it generated its best revenue ever—more than a billion dollars in just the first 10 months of the year, bringing its then-total to more than $4 billion.
Having said that…
In the five years since its launch, what other location-based AR games (or AR games, period) have you seen really take off? Even with triple-A characters & brands, Niantic’s own Harry Potter title made a far smaller splash, and Minecraft Earth (hyped extensively at an Apple keynote event) is being shut down.
When I opened Pokémon Go last year (for the first time in years), I noticed that the only apparent change since launch was that AR now defaults to off. That is, Niantic apparently decided that monster-catching was easier, more fun, and/or less resource-intensive when done in isolation, with no camera overlay.
The gameplay remains extremely rudimentary—no use (at least that I could see) of fancy SLAM tracking, depth processing, etc., despite Niantic having acquired startups to enable just this sort of thing, showing demos three years ago.
Network providers & handset makers really, really want you to want 5G—but I’ve yet to see it prove to be transformative (even for the cloud-rendered streaming AR that my Google team delivered last year). Even when “real” 5G is available beyond a couple of urban areas, it’s hard to imagine a popular title being 5G-exclusive.
So does this mean I think location-based AR games are doomed? Well, no, as I claim zero prognostication-fu here. I didn’t see Pokémon Go coming, despite my roommate in Nepal (who casually mentioned that he’d helped found Google Earth—as one does) describing it ahead of launch; and given the way public interest in the app dropped after launch (see above), I’d never have guessed that it would be generating record revenue now—much less during a pandemic!
So, who knows: maybe Niantic & its numerous partners will figure out how to recapture lightning in a bottle. Here’s a taste of how they expect that to look:
If I had to bet on someone, though, it’d be Snap: they’ve been doing amazing site-specific AR for the last couple of years, and they’ve prototyped collaborative experiences built on the AR engine that hundreds of millions of people use every day; see below. Game on!
I spent my last couple of years at Google working on a 3D & AR engine that could power experiences across Maps, YouTube, Search, and other surfaces. Meanwhile my colleagues have been working on data-gathering that’ll use this system to help people navigate via augmented reality. As TechCrunch writes:
Indoor Live View is the flashiest of these. Google’s existing AR Live View walking directions currently only work outdoors, but thanks to some advances in its technology to recognize where exactly you are (even without a good GPS signal), the company is now able to bring this indoors.
This feature is already live in some U.S. malls in Chicago, Long Island, Los Angeles, Newark, San Francisco, San Jose and Seattle, but in the coming months, it’ll come to select airports, malls and transit stations in Tokyo and Zurich as well (just in time for vaccines to arrive and travel to — maybe — rebound). Because Google is able to locate you by comparing the images around you to its database, it can also tell which floor you are on and hence guide you to your gate at the Zurich airport, for example.
I really enjoyed listening to the podcast version of this funny, accessible talk from AI Weirdness writer Janelle Shane, and think you’d get a kick out of it, too.
On her blog, Janelle writes about AI and the weird, funny results it can produce. She has trained AIs to produce things like cat names, paint colors, and candy heart messages. In this talk she explains how AIs learn, fail, adapt, and reflect the best and worst of humanity.
The high-key nutty (am I saying that right, kids?) thing is that they’ve devised a whole musical persona to go with it, complete with music videos:
L.L.A.M.A. is the first-ever Lego mini-figure to be signed to a major label and the building toy group’s debut attempt at creating its own star DJ/producer.
A cross between a helmet headed artist like Marshmello and a corporate synergy-prone artificial entity like Lil Miquela, L.L.A.M.A., which stands for “Love, Laughter and Music Always” (not kidding), is introducing himself to the world today with a debut single, “Shake.”
It appears that this guy & pals fly around on giant luckdragon-style copies of our goldendoodle Seamus, and I am here for that.
No markers, no mocap cameras, no suit, no keyframing. This take uses 3 DSLR cameras, though, and it’s pretty far from real-time. […]
Under the hood, it uses the #OpenPose ML network for 2D tracking of joints from each camera, then a custom Houdini setup for triangulating the results into 3D, stabilizing them, and driving the rig (volumes, CHOPs, #kinefx, FEM – you name it 🙂)
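For the curious, the triangulation step he mentions can be sketched in a few lines. This is a toy midpoint-method version in plain Python — not his actual Houdini setup — with made-up camera positions and ray directions purely for illustration:

```python
# Toy sketch of multi-view triangulation: back-project a ray from each
# camera through the 2D joint detection, then take the midpoint of the
# rays' closest approach as the 3D joint position. All numbers below are
# illustrative; a production setup would use calibrated camera matrices.

def sub(a, b): return [a[i] - b[i] for i in range(3)]
def add(a, b): return [a[i] + b[i] for i in range(3)]
def scale(a, s): return [a[i] * s for i in range(3)]
def dot(a, b): return sum(a[i] * b[i] for i in range(3))

def triangulate_midpoint(o1, d1, o2, d2):
    """Closest point between rays o1 + t*d1 and o2 + s*d2 (midpoint method)."""
    r = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, r), dot(d2, r)
    denom = a * c - b * b  # approaches 0 when the rays are parallel
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    p1 = add(o1, scale(d1, t))
    p2 = add(o2, scale(d2, s))
    return scale(add(p1, p2), 0.5)

# Two cameras, both "seeing" the same joint at (1, 1, 5):
joint = triangulate_midpoint([0, 0, 0], [1, 1, 5], [2, 0, 0], [-1, 1, 5])
print([round(v, 3) for v in joint])  # → [1.0, 1.0, 5.0]
```

With three cameras (as in the video), you’d triangulate pairwise and average, or solve a small least-squares system — which is presumably part of what the Houdini setup handles.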
This is one of the many Google projects to which I’ve been lucky enough to contribute just a bit (focusing on object tracking & graphical adornments). It’s built into Google Photos, among other surfaces, and I’m really pleased that people are seeking it out:
Imagine loading multi-gigabyte 3D models nearly instantaneously into your mobile device, then placing them into your driveway and stepping inside. That’s what we’ve now enabled via Google Search on Android:
Take it for a spin via the models listed below, and please let us know what you think!
As part of Fiat Chrysler’s Virtual Showroom CES event, you can experience the new innovative 2021 Jeep Wrangler 4xe by scanning a QR code with your phone. You can then see an Augmented Reality (AR) model of the Wrangler right in front of you—conveniently in your own driveway or in any open space. Check out what the car looks like from any angle, in different colors, and even step inside to see the interior with incredible details.
A bit on how it works:
The Cloud AR tech uses a combination of edge computing and AR technology to offload the computing power needed to display large 3D files, rendered by Unreal Engine, and stream them down to AR-enabled devices using Google’s Scene Viewer. Using powerful rendering servers with gaming-console-grade GPUs, memory, and processors located geographically near the user, we’re able to deliver a powerful but low-friction, low-latency experience.
This rendering hardware allows us to load models with tens of millions of triangles and textures up to 4K, allowing the content we serve to be orders of magnitude larger than what’s served on mobile devices (i.e., on-device rendered assets).
And to try it out:
Scan the QR code below, or check out the FCA CES website. Depending on your OS, device, and network strength, you will see either a photorealistic, cloud-streamed AR model or an on-device 3D car model, both of which can then be placed in your physical environment.
Mandalorian fans: This is the way to see the Child in your space. Just search “the Child” and then tap “View in 3D.” Send us your best adventures using #Google3D and catch up on #TheMandalorian, now streaming on Disney+. pic.twitter.com/cyRYdpeB0F
I’m delighted to be closing out 2020 on a pair of high notes, welcoming the arrival of my two biggest efforts from the last year+.
First, Google Search now supports 150+ new cars that you can view in 3D and AR (via iPhone or Android device), including in beautiful cloud-rendered quality (provided you have a good connection & up-to-date Android). As we initially previewed in October:
Bring the showroom to you with AR
You can easily check out what the car looks like in different colors, zoom in to see intricate details like buttons on the dashboard, view it against beautiful backdrops and even see it in your driveway. We’re experimenting with this feature in the U.S. and working with top auto brands, such as Volvo and Porsche, to bring these experiences to you soon.
Now, when you search for a lipstick or eyeshadow product, like L’Oréal’s Infallible Paints Metallic Eyeshadow, you can see what it looks like on a range of skin tones and compare shades and textures to help you find the right products.
To help you find the perfect match, you can now also virtually try makeup products right from the Google app.
Google researchers Ira Kemelmacher-Shlizerman, Brian Curless, and Steve Seitz have been working with University of Washington folks on tech that promises “30fps in 4K resolution, and 60fps for HD on a modern GPU.”
Our technique is based on background matting, where an additional frame of the background is captured and used in recovering the alpha matte and the foreground layer.
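The compositing math underneath is the classic matting equation, I = αF + (1 − α)B. Here’s a deliberately crude, per-pixel Python sketch of the idea — the actual paper solves for the matte with a deep network, and all values below are illustrative grayscale in [0, 1]:

```python
# Simplified matting sketch: given a known background B (the extra
# captured frame) and an observed frame I = a*F + (1-a)*B, pixels that
# match B get alpha ~ 0 and pixels that differ get alpha ~ 1. The real
# system learns this; here we just threshold the difference.

def estimate_alpha(frame, background, softness=0.2):
    """Crude alpha matte: scaled absolute difference from the clean plate."""
    return [min(1.0, abs(i - b) / softness) for i, b in zip(frame, background)]

def composite(frame, alpha, new_background):
    """Standard over-composite: out = a*F + (1-a)*B'. We reuse the observed
    frame as a stand-in for the true foreground F."""
    return [a * f + (1 - a) * b for f, a, b in zip(frame, alpha, new_background)]

background = [0.5, 0.5, 0.5, 0.5]   # captured clean plate
frame      = [0.5, 0.9, 0.9, 0.5]   # subject occupies the middle two pixels
alpha      = estimate_alpha(frame, background)
out        = composite(frame, alpha, [0.0, 0.0, 0.0, 0.0])  # new black backdrop
print(alpha)  # → [0.0, 1.0, 1.0, 0.0]
print(out)    # → [0.0, 0.9, 0.9, 0.0]
```

The hard part — which is why it takes a trained network — is recovering fractional alpha at hair and motion-blurred edges, where simple differencing like this falls apart.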
Nearly 20 years ago, on one of my first customer visits as a Photoshop PM, I got to watch artists use PS + After Effects to extract people from photo backgrounds, then animate the results. The resulting film—The Kid Stays In The Picture—lent its name to the distinctive effect (see previous).
Now I’m delighted that Google Photos is rolling out similar output to its billion+ users, without requiring any effort or tools:
We use machine learning to predict an image’s depth and produce a 3D representation of the scene—even if the original image doesn’t include depth information from the camera. Then we animate a virtual camera for a smooth panning effect—just like out of the movies.
Photos is also rolling out new collages, like this:
And they’re introducing new themes in the stories-style Memories section up top as well:
Now you’ll see Memories surface photos of the most important people in your life… And starting soon, you’ll also see Memories about your favorite things—like sunsets—and activities—like baking or hiking—based on the photos you upload.
I love these simple, practical uses of augmented reality. The Maps team writes,
Last month, we launched Live View in Location Sharing for Pixel users, and we’ll soon expand this to all Android and iOS users around the globe. When a friend has chosen to share their location with you, you can easily tap on their icon and then on Live View to see where and how far away they are–with overlaid arrows and directions that help you know where to go.
Live View in Location Sharing will soon expand to all Android and iOS users globally on ARCore and ARKit supported phones.
They’re also working hard to leverage visual data & provide better localization and annotation.
With the help of machine learning and our understanding of the world’s topography, we’re able to take the elevation of a place into account so we can more accurately display the location of the destination pin in Live View. Below, you can see how Lombard Street—a steep, winding street in San Francisco—previously appeared far off into the distance. Now, you can quickly see that Lombard Street is much closer and the pin is aligned with where the street begins at the bottom of the hill.
When Google started putting 3D animals in Search last year it only had a few standard animals available, like a tiger, a lion, a wolf, and a dog. It added more creatures in March, including alligators, ducks, and hedgehogs. In August, Google made prehistoric creatures and historical artifacts available in AR via its Arts and Culture app—and who among us wouldn’t love to check out the ancient crustacean Cambropachycope up close and personal?
From sign language to sports training to AR effects, tracking the human body unlocks some amazing possibilities, and my Google Research teammates are delivering great new tools:
We are excited to announce MediaPipe Holistic, […] a new pipeline with optimized pose, face and hand components that each run in real-time, with minimum memory transfer between their inference backends, and added support for interchangeability of the three components, depending on the quality/speed tradeoffs.
When including all three components, MediaPipe Holistic provides a unified topology for a groundbreaking 540+ keypoints (33 pose, 21 per-hand and 468 facial landmarks) and achieves near real-time performance on mobile devices. MediaPipe Holistic is being released as part of MediaPipe and is available on-device for mobile (Android, iOS) and desktop. We are also introducing MediaPipe’s new ready-to-use APIs for research (Python) and web (JavaScript) to ease access to the technology.
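The keypoint counts in the quote add up neatly (33 + 21 + 21 + 468 = 543). Here’s a tiny Python sketch of how an app might lay those components out in one flat landmark array — the offset scheme is my own illustration, not MediaPipe’s actual API:

```python
# Accounting sketch of the unified topology described above. The per-
# component landmark counts come straight from the announcement; the
# [start, end) slice layout is just one plausible way to index a single
# flattened keypoint list.

COMPONENTS = {"pose": 33, "left_hand": 21, "right_hand": 21, "face": 468}

def build_offsets(components):
    """Assign each component a [start, end) slice in one flat keypoint array."""
    offsets, cursor = {}, 0
    for name, count in components.items():
        offsets[name] = (cursor, cursor + count)
        cursor += count
    return offsets, cursor

offsets, total = build_offsets(COMPONENTS)
print(total)            # → 543 (the "540+ keypoints" in the announcement)
print(offsets["face"])  # → (75, 543)
```

The “interchangeability” mentioned in the quote means an app could drop, say, the face component and simply shrink this table to trade quality for speed.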
Researchers at Facebook & universities have devised a way to estimate depth from regular (monocular) video, enabling some beautiful AR effects. Check it out:
Call it AI, ML, FM (F’ing Magic), whatever: tech like this warms the heart and can free body & soul. Google’s Project Guideline helps people with impaired vision navigate the world on their own, independently & at speed. Runner & CEO Thomas Panek, who is blind, writes,
In the fall of 2019, I asked that question to a group of designers and technologists at a Google hackathon. I wasn’t anticipating much more than an interesting conversation, but by the end of the day they’d built a rough demo […].
I’d wear a phone on a waistband, and bone-conducting headphones. The phone’s camera would look for a physical guideline on the ground and send audio signals depending on my position. If I drifted to the left of the line, the sound would get louder and more dissonant in my left ear. If I drifted to the right, the same thing would happen, but in my right ear. Within a few months, we were ready to test it on an indoor oval track. […] It was the first unguided mile I had run in decades.
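The feedback scheme Thomas describes maps naturally onto a tiny function. Here’s a hedged Python sketch — the constants and the linear mapping are my own illustration, not Google’s actual Project Guideline tuning:

```python
# Sketch of line-following audio guidance: the farther the runner drifts
# from the detected guideline, the louder the warning tone in the ear on
# that side (the real system also makes it more dissonant). Offsets are in
# meters; the 1.0 m full-scale value is an assumed constant.

def guidance_gains(offset_m, max_offset_m=1.0):
    """offset_m < 0 means drifted left of the line; > 0 means right.
    Returns (left_gain, right_gain), each in [0, 1]."""
    strength = min(abs(offset_m) / max_offset_m, 1.0)
    if offset_m < 0:      # drifted left → warn in the left ear
        return (strength, 0.0)
    elif offset_m > 0:    # drifted right → warn in the right ear
        return (0.0, strength)
    return (0.0, 0.0)     # on the line → silence

print(guidance_gains(-0.5))  # → (0.5, 0.0)
print(guidance_gains(0.25))  # → (0.0, 0.25)
print(guidance_gains(0.0))   # → (0.0, 0.0)
```

The elegance of the design is that it needs no speech and no screen — just a camera, a line on the ground, and stereo panning.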
Check out the journey. (Side note: how great is “Blaze” as a name for a speedy canine running companion? ☺️)
There’s no way the title can do this one justice, so just watch as this ML-based technique identifies moving humans (including their reflections!), then segments them out to enable individual manipulation—including syncing up their motions and even removing people wholesale:
https://youtu.be/2pWK0arWAmU
Here’s the vid directly from the research team, which includes longtime Adobe vet David Salesin:
Google and Lucasfilm have teamed up to bring iconic moments from the first season of “The Mandalorian” to life with “The Mandalorian” AR Experience (available on the Play Store for 5G Google Pixels and other select 5G Android phones) as fans follow the show’s second season.
The app uses ARCore’s new Depth API to enable occlusion for more realistic environmental interactions:
New content will keep rolling out in the app each week on Mando Mondays, so stay tuned—and Pixel owners should keep an eye out for additional exclusive content outside of the app as well.
I’m reminded of the phrase, “I don’t know karate, but I know ka-reepy!”
xpression camera is a virtual camera app that imprints the movements of your face and head onto anyone you want while you chat on Zoom, stream on Twitch, or create a YouTube video.
Downside: You’re sticking it to the earth to the tune of 12 mpg. Upside: U-turn arrows are neatly curved!
Swapping out the traditional display area & presenting a camera feed—which can evidently feature night vision as well—is a clever alternative to projecting a washed-out HUD onto the windshield.
Visit this page in your mobile browser, or just take a peek (below) at the project, brought to you by the Google Arts & Culture Lab:
Some context for folks like me, who didn’t grow up with a connection to Indian traditions:
Diwali is the Indian festival of lights, usually lasting five days and celebrated during the Hindu Lunisolar month Kartika. One of the most popular festivals of Hinduism, Diwali symbolizes the spiritual “victory of light over darkness, good over evil, and knowledge over ignorance”.
With AR on Google, you can meet eight life-sized Aussie animals up close and bring them into your backyard, living room, classroom—or take them on your adventures. Just search for koala, kangaroo, quokka, wombat, platypus, emu, kookaburra or echidna on your mobile browser (Android or iOS) or in the Google App and tap “View in 3D.”
L’Oréal Paris is also releasing one “exclusive look” on Google Duo, making it the first beauty brand to be used directly within Google’s video conference system.
It’ll be interesting to see how the market for digital makeup & apparel evolves in a more socially distant, WFH world.
Brief, beautiful choreography intertwines with particle effects in Barnaby Roper’s collaboration with dancer Kendi Jones, memorializing Elizabeth Eckford’s historic walk to school in 1957:
Probably needless to say, 3D model creation remains hard AF for most people, and as such it’s a huge chokepoint in the adoption of 3D & AR viewing experiences.
Fortunately we may be on the cusp of some breakthroughs. Apple is about to popularize LiDAR on phones, and with it we’ll see interesting photogrammetry apps like Polycam:
Search for Halloween, Jack-o-lantern, human skeleton, cat, dog, or German Shepherd in the Google App or on your mobile browser (Android or iOS) and you’ll find these de-fright-ful AR characters on Google. Tap “View in 3D” to see it up close and then bring it into your space with AR. Don’t forget to take pictures or videos!
The notion of a metaverse, “a collective virtual shared space, created by the convergence of virtually enhanced physical reality and physically persistent virtual space,” has long beguiled those of us captivated by augmented reality. Now Snap has been doing the hard work of making this more real, being able to scan & recognize one’s surroundings and impose a “persistent, shared AR world built right on top of your neighborhood.” Check it out:
This experience (presently available on just one street in London, but presumably destined to reach many others) builds on the AR Landmarkers work the company did previously. (As it happens, I think David Salesin—who led Adobe Research for many years—contributed to this effort during his stopover at Snap before joining Google Research.)
I’m delighted to share that my team’s work to add 3D & AR automotive results to Google Search—streaming in cinematic quality via cloud rendering—has now been announced! Check out the demo starting around 36:30:
You can easily check out what the car looks like in different colors, zoom in to see intricate details like buttons on the dashboard, view it against beautiful backdrops and even see it in your driveway. We’re experimenting with this feature in the U.S. and working with top auto brands, such as Volvo and Porsche, to bring these experiences to you soon.
Cloud streaming enables us to take file size out of the equation, so we can serve up super detailed visuals from models that are hundreds of megabytes in size:
Right now the feature is in testing in the US, so there’s a chance you can experience it via Android (with iOS planned soon). We hope to make it widely available before long, and I can’t wait to hear what you think!
This is one of the far-flung projects I’ve been glad to help support. New features keep arriving, like this one that’s available on Pixel and coming soon to iOS & Android:
When a friend has chosen to share their location with you, you can easily tap on their icon and then on Live View to see where and how far away they are–with overlaid arrows and directions that help you know where to go.
It’s also getting smarter about recognizing landmarks:
Soon, you’ll also be able to see nearby landmarks so you can quickly and easily orient yourself and understand your surroundings. Live View will show you how far away certain landmarks are from you and what direction you need to go to get there.
“Are they gonna use the Snapchat dancing hot dog to steer them or what?” — Henry Nack, age 11, bringing the 🔥 feature requests 😌
Funded by the US military and developed by a Seattle-based company called Command Sight, the new goggles will allow handlers to see through a dog’s eyes and give directions while staying out of sight and at a safe distance.
While looking through the dog’s eyes thanks to the goggle’s built-in camera, the handler can direct the dog by controlling an augmented reality visual indicator seen by the dog wearing the goggles.
I’m excited to see the tech my team has built into YouTube, Duo, and other apps land in Arts & Culture, powering five new fun experiences:
Snap a video or image of yourself to become Van Gogh or Frida Kahlo’s self-portraits, or the famous Girl with a Pearl Earring. You can also step deep into history with a traditional Samurai helmet or a remarkable Ancient Egyptian necklace.
To get started, open the free Google Arts & Culture app for Android or iOS and tap the rainbow camera icon at the bottom of the homepage.
Okay, not wars—how about enamel pins? Color me a little skeptical that the augmented reality portion of these pins will get much use, but hey, if it’s just a nice little bonus on something people already wanted, what the heck?
I’m pleased to be playing a very small role in making very large things, well, rather small.
Today, Google Arts & Culture has brought together a new collection to help anyone choose their perfect virtual travel with thousands of museums and cultural destinations to explore. And with the help of our partner CyArk, we’ve launched on Google Search 37 cultural heritage sites from across the world in Augmented Reality (AR). Hop from your couch and search on your mobile phone to bring the Moai statues of Ahu Ature Huki, Rapa Nui (Easter Island), the Brandenburg Gate in Germany, or the Maya pyramid of Chichén Itzá, Mexico right into your living room.
Here’s a list of landmarks you can search & explore:
El Castillo – Chichén Itzá, Mexico
Brandenburg Gate, Germany
Ayutthaya, Thailand
Eim ya kyaung Temple – Bagan, Myanmar
Palace of Fine Arts, Mexico
Chacmol statue – Templo Mayor, Mexico
Thomas Jefferson Memorial, US
Lanzón – Chavín de Huántar, Peru
War Canoe – Waitangi Treaty Grounds, New Zealand
Ahu Ature Huki, Easter Island
Tomb of Tu Duc (Complex of Hué Monuments), Vietnam
Hovering the camera over the steering wheel will show customers how to use the steering wheel controls or paddle shifters, while pointing at the dashboard will show infotainment functionality.
The app was developed in just three months to roll out on the 2021 Ram TRX. The wild truck will be the first vehicle to use the Know & Go app, and it will be available on other FCA vehicles down the line.
Employee-Developed Know & Go Mobile App Debuts on 2021 Ram 1500 TRX
The free new app Diorama pairs with the $99 finger-worn Litho device to let you create AR movies directly inside your phone, using a selection of props & tapping into the Google Poly library:
“Diorama will democratize the creation of special effects in the same way the smartphone democratized photography. It will allow anyone to create beautiful visual effects the likes of which have previously only been accessible to Hollywood studios,” said Nat Martin, founder of Litho, in a statement.
When combined with the Litho controller, users can animate objects simply by dragging them, fine-tuning the path by grabbing specific points. Mood lighting can be added thanks to a selection of filters, plus the app supports body tracking so creators can interact with a scene.
In March we introduced a new WebAssembly (Wasm) accelerated backend for TensorFlow.js (scroll further down to learn more about Wasm and why this is important). Today we are excited to announce a major performance update: as of TensorFlow.js version 2.3.0, our Wasm backend has become up to 10X faster by leveraging SIMD (vector) instructions and multithreading via XNNPACK, a highly optimized library of neural network operators.
You can see the performance improvements for yourself:
Check out this demo of our BlazeFace model, which has been updated to use the new Wasm backend: https://tfjs-wasm-simd-demo.netlify.app/. To compare against the unoptimized binary, try this version of the demo, which manually turns off SIMD and multithreading support.
I have to admit, as eager as I am to see augmented reality thrive, I was a little skeptical about the value of this AR bike-modding application, but my neighbor Chris (who rides when he’s not designing motorsports gear) is enthusiastic and offered some good perspective:
Over the winter I will build my Suzuki into a pure track bike, but there are things I won’t know will fit until I get them all together. I know they all fit an otherwise-stock bike, but I won’t know if they fit together.
I dig it, though everything depicted is much more AR than what I think of as Photoshop—and I’d love to live in a world where AR & spatial effects are this delightfully easy to create.
Awesome work by the team. Come grab a copy & build something great!
The ML Kit Pose Detection API is a lightweight, versatile solution for app developers to detect the pose of a subject’s body in real time from a continuous video or static image. A pose describes the body’s position at one moment in time with a set of x,y skeletal landmark points. The landmarks correspond to different body parts such as the shoulders and hips. The relative positions of landmarks can be used to distinguish one pose from another.
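To make the “relative positions distinguish poses” point concrete, here’s a small Python sketch that classifies an arm as bent or extended from three landmarks. The coordinates and the 120-degree threshold are illustrative, not ML Kit values:

```python
# Sketch of pose classification from landmark geometry: compute the joint
# angle at the elbow from three (x, y) landmarks, then threshold it. This
# shoulder-elbow-wrist example generalizes to knees, hips, etc.
import math

def joint_angle(a, b, c):
    """Angle at vertex b (in degrees) formed by landmarks a-b-c."""
    ang = math.degrees(math.atan2(c[1] - b[1], c[0] - b[0])
                       - math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

def classify_arm(shoulder, elbow, wrist, threshold=120.0):
    """'extended' if the elbow is nearly straight, else 'bent'."""
    return "extended" if joint_angle(shoulder, elbow, wrist) > threshold else "bent"

print(classify_arm((0, 0), (1, 0), (2, 0)))  # straight arm → "extended"
print(classify_arm((0, 0), (1, 0), (1, 1)))  # right angle  → "bent"
```

A rep counter for, say, push-ups is then just a matter of watching this classification toggle between states frame over frame.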
“I know it’s not the ‘woke’ thing to say, but I hope the world is enslaved by an ancient soulless sentience.” — my Lovecraft-loving weirdo friend
Heh—they’re not all creepy, to be sure, but come browse fun stuff on Google Arts & Culture, view the models in 3D, and if you’re on your phone, place them in your space via AR.
Some of the creatures include the Aegirocassis, a sea creature that existed 480 million years ago; a creepy-looking ancient crustacean; and a digital remodel of a whale skeleton currently on view in the Natural History Museum’s Hintze Hall.
Exceedingly tangentially: who doesn’t love a good coelacanth reference?
My old teammates keep slapping out the bangers, releasing machine-learning tech to help build apps that key off the human form.
First up is MediaPipe Iris, enabling depth estimation for faces without fancy (iPhone X-/Pixel 4-style) hardware, which in turn opens up access to accurate virtual try-on for glasses, hats, etc.:
The model enables cool tricks like real-time eye recoloring:
I always find it interesting to glimpse the work that goes in behind the scenes. For example:
To train the model from the cropped eye region, we manually annotated ~50k images, representing a variety of illumination conditions and head poses from geographically diverse regions, as shown below.
The team has followed up this release with MediaPipe BlazePose, which is in testing now & planned for release via the cross-platform ML Kit soon:
Our approach provides human pose tracking by employing machine learning (ML) to infer 33 2D landmarks of a body from a single frame. In contrast to current pose models based on the standard COCO topology, BlazePose accurately localizes more keypoints, making it uniquely suited for fitness applications…
If one leverages GPU inference, BlazePose achieves super-real-time performance, enabling it to run subsequent ML models, like face or hand tracking.
Now I can’t wait for apps to help my long-suffering CrossFit coaches actually quantify the crappiness of my form. Thanks, team! 😛
“Comparison is the thief of joy.” — Theodore Roosevelt
“Move your ass, fat boy!” — CrossFit
Okay, CF doesn’t say the latter, at least at my gym, but there’s a lot to be said for having a mix of social support/pressure—which is exactly why I’m happy to pay for CF as well as Peloton (leaderboards, encouragement, etc.).
Now the Ghost Pacer headset promises to run you ragged, or at least keep you honest, through augmented reality: