I’ve long considered augmented reality apps to be “realtime Photoshop”—or perhaps more precisely, “realtime After Effects.” I think that’s true & wonderful, but most consumer AR tends to be ultra-confined filters that produce ~1 outcome well.
Walking around San Francisco today, it struck me that DALL•E & other emerging generative-art tools could—if made available via a simple mobile UI—offer a new kind of (almost) realtime Photoshop, with radically greater creative flexibility.
Here I captured a nearby sculpture, dropped out the background in Photoshop, uploaded it to DALL•E, and requested “a low-polygon metallic tree surrounded by big dancing robots and small dancing robots.” I like the results!
Building on yesterday’s post about Google’s new Geospatial API, developers can now embed a live view featuring a camera feed + augmentations, and companies like Bird are wasting no time in putting it to use. TNW writes,
When parking a scooter, the app prompts a rider to quickly scan the QR code on the vehicle and its surrounding area using their smartphone camera… [T]his results in precise, centimeter-level geolocation that enables the system to detect and prevent improper parking with extreme accuracy — all while helping monitor user behavior.
Out of the over 200 cities that Lime serves, its VPS is live now in six: London, Paris, Tel Aviv, Madrid, San Diego and Bordeaux. Similar to Bird, Lime’s pilot involves testing the tech with a portion of riders. The company said results from its pilots have been promising, with those who used the new tool seeing a 26% decrease in parking errors compared to riders who didn’t have the tool enabled.
My friend Bilawal & I collaborated on AR at Google, including our efforts to build a super compact 3D engine for driving spatial annotation & navigation. We’d often talk excitedly about location-based AR experiences, especially the Landmarker functionality arriving in Snapchat. All the while he’s been busy pushing the limits of photogrammetry (including putting me in space!) to scan 3D objects.
Now I’m delighted to see him & his team unveiling the Geospatial API (see blog post, docs, and code), which enables cross-platform (iOS, Android) deployment of experiences that present both close-up & far-off augmentations. Here’s the 1-minute sizzle reel:
For a closer look, check out this interesting deep dive into what it offers & how it works:
Among the Google teams working on augmented reality, there was a low-key religious war about the importance of “metric scale” (i.e. matching real-world proportions 1:1). The ARCore team believed it was essential (no surprise, given their particular tech stack), while my team (Research) believed that simply placing things in the world with a best guess as to size, then letting users adjust an object if needed, was often the better path.
I thought of this upon seeing StreetEasy’s new AR tech for apartment-hunting in NYC. At the moment it lets you scan a building to see its inventory. That’s very cool, but my mind jumped to the idea of seeing 3D representations of actual apartments (something the company already offers, albeit not in AR), and I’m amused to think of my old Manhattan place represented in AR: drawing it as a tiny box at one’s feet would be metric scale. 😅 My God that place sucked. Anyway, we’ll see how useful this tech proves & where it can go from here.
“A StreetEasy Instagram poll found that 95% of people have walked past an apartment building and wondered if it has an available unit that meets their criteria. At the same time, 77% have had trouble identifying a building’s address to search for later.”
I got a rude awakening a couple of years ago while working in Google’s AR group: the kind of displays that could fit into “glasses that look like glasses” (i.e. not Glass-style unicorn protuberances) had really tiny fields of view, crummy resolution, short battery life, and more. I knew that my efforts to enable cloud-raytraced Volvos & Stormtroopers & whatnot wouldn’t last long in a world that prioritized Asteroids-quality vector graphics on a display the size of a 3″x5″ index card held at arm’s length.
Having been out of that world for a year+ now, I have no inside info on how Google’s hardware efforts have been evolving, but I’m glad to see that they’re making a serious (billion-dollar+) investment in buying more compelling display tech. Per The Verge,
According to Raxium’s website, a Super AMOLED screen on your phone has a pixel pitch (the distance between the center of one pixel, and the center of another pixel next to it) of about 50 microns, while its MicroLED could manage around 3.5 microns. It also boasts of “unprecedented efficiency” that’s more than five times better than any world record.
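For context, pixel pitch converts directly to pixel density, so it’s easy to sanity-check how big a jump those quoted figures represent. A quick back-of-the-envelope calculation (using only the numbers above):

```python
def pixels_per_inch(pitch_um: float) -> float:
    """Convert pixel pitch (microns between pixel centers) to density (PPI)."""
    return 25_400 / pitch_um  # 25,400 microns per inch

# Figures quoted above: ~50 um for a phone Super AMOLED, ~3.5 um for Raxium's MicroLED
amoled_ppi = pixels_per_inch(50)     # ~508 PPI
microled_ppi = pixels_per_inch(3.5)  # ~7,257 PPI

# Pixel count grows with the *square* of linear density, so the areal win
# is roughly (50 / 3.5)^2 -- about 200x more pixels per unit of panel area.
areal_gain = (50 / 3.5) ** 2
```

That squared relationship is why a few microns of pitch matter so much for a panel that near-eye optics will magnify right in front of your eyeball.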
How does any of this compare to what we’ll see out of Apple, Meta, Snap, etc.? I have no idea, but at least parts of the future promise to be fun.
“In 2019, we started with templates of 30 beloved sites around the world which creators could build upon called Landmarkers… Today, we’re launching Custom Landmarkers in Lens Studio, letting creators anchor Lenses to local places they care about to tell richer stories about their communities through AR.”
At its Lens Fest event, the company announced that 250,000 lens creators from more than 200 countries have made 2.5 million lenses that have been viewed more than 3.5 trillion times. Meanwhile, on Snapchat’s TikTok clone Spotlight, the app awarded 12,000 creators a total of $250 million for their posts. The company says that more than 65% of Spotlight submissions use one of Snapchat’s creative tools or lenses.
Once the deal closes, BRIO XR will be joining an unparalleled community of engineers and product experts at Adobe – visionaries who are pushing the boundaries of what’s possible in 3D and immersive creation. Our BRIO XR team will contribute to Adobe’s Creative Cloud 3D authoring and experience design teams. Simply put, Adobe is the place to be, and in fact, it’s a place I’ve long set my sights on joining.
[Adobe] announced a tool that allows consumers to point their phone at a product image on an ecommerce site—and then see the item rendered three-dimensionally in their living space. Adobe says the true-to-life size precision—and the ability to pull multiple products into the same view—set its AR service apart from others on the market. […]
Chang Xiao, the Adobe research scientist who created the tool, said many of the AR services currently on the market provide only rough estimations of the size of the product. Adobe is able to encode dimensions information in its invisible marker code embedded in the photos, which its computer vision algorithms can translate into more precisely sized projections.
As I’ve noted previously, I’m (oddly?) much more bullish on Snap than on Niantic to figure out location-based augmentation of the world. That’s in part because of their very cool world lens tech, which can pair specific experiences with specific spots. It’s cool to see it rolling out more widely:
The first Lens is a new AR experience that takes users through the story of Asian-American businesswoman Lucy Yu, the owner of ‘Yu & Me Books’ in NYC, which is an independent bookshop that’s dedicated to showcasing stories from underrepresented authors.
And for one that’s more widely accessible,
Snap’s also added a new Year of the Tiger Lens, which uses Sky Segmentation technology to add an animated watercolor tiger jumping through the clouds.
“I’m like, ‘Bro, how much furniture do you think I buy??'”
I forget who said this while I was working on AR at Google, but it’s always made me laugh, because nearly every demo inevitably gets into the territory of, “Don’t you wish you could see whether this sofa fits in your space?”
Still, though, it’s a useful capability—especially if one can offer a large enough corpus of 3D models (something we found challenging, at least a few years back). Now, per the Verge:
Pinterest is adding a “Try On for Home Decor” feature to its app, letting you see furniture from stores like Crate & Barrel, CB2, Walmart, West Elm, and Wayfair in your house… According to the company’s announcement post, you’ll be able to use its Lens camera to try out over 80,000 pieces of furniture from “shoppable Pins.”
Hmm—I always want to believe in tools like this, but I remain skeptical. Back at Google I played with Blocks, which promised to make 3D creation fun, but which in my experience combined the inherent complexity of that art with the imprecision and arm fatigue of waving controllers in space. But who knows—maybe Shapes is different?
I’m intrigued but not quite sure how to feel about this. Precisely tracking groups of fast-moving human bodies & producing lifelike 3D copies in realtime is obviously a stunning technical coup—but is watching the results something people will prefer to high-def video of the real individuals & all their expressive nuances? I have no idea, but I’d like to know more.
(No, not that Notre Dame—the cathedral undergoing restoration.) This VR tour looks compelling:
Equipped with an immersive device (VR headset and backpack), visitors will be able to move freely in a 500 sqm space in Virtual Reality. Guided by a “Compagnon du Devoir” they will travel through different centuries and will explore several eras of Notre Dame de Paris and its environment, recreated in 3D.
Thanks to scientific surveys, and precise historical data, the cathedral and its surroundings have been precisely reproduced to enhance the visitor’s immersion and engagement in the experience.
Discover the experience for yourself with these QR Codes by downloading the Aero app. We recommend running the experience on iOS (iPhone 8S and above) or on Android (currently a private beta, US only; a list of supported Android devices can be found on HelpX). (FYI, the experience may take a few seconds to load, as it is a more sophisticated AR project.)
Pokémon Go remains the one-hit wonder of the location-based content/gaming space. That’s been true for 5+ years since its launch, during which time Niantic has launched & killed Harry Potter: Wizards Unite; Microsoft has done the same with Minecraft Earth; and Google has (AFAIK) followed suit with its location-based gaming API. I’m not sure that we’ll turn a corner until real AR glasses arrive.
The Niantic Lightship Augmented Reality Developer Kit, or ARDK, is now available for all AR developers around the world at Lightship.dev. To celebrate the launch, we’re sharing a glimpse of the earliest AR applications and demo experiences from global brand partners and developer studios from across the world.
We’re also announcing the formation of Niantic Ventures to invest in and partner with companies building the future of AR. With an initial $20 million fund, Niantic Ventures will invest in companies building applications that share our vision for the Real-World Metaverse and contribute to the global ecosystem we are building. To learn more about Niantic Ventures, go to Lightship.dev.
It’s cool that “The Multiplayer API is free for apps with fewer than 50,000 monthly active users,” and even above that number, it’s free to everyone for the first six months.
I was so excited to build an AR stack for Google Lens, aiming to bring realtime magic to the default camera on billions of phones. Sadly, after AR Playground went out the door three years ago & the world shrugged, Google lost interest.
Dubbed “Quick Tap to Snap,” the new feature will enable users to tap the back of the device twice to open the Snapchat camera directly from the lock screen. Users will have to authenticate before sending photos or videos to a friend or their personal Stories page.
Snapchat’s Pixel service will also include extra augmented-reality lenses and integrate some Google features, like live translation in the chat feature, according to the company.
I wish Apple would offer similar access to third-party camera apps like Halide Camera, etc. Its absence has entirely killed my use of those apps, no matter how nice they may be.
By now you’ve probably seen this big gato bounding around:
I’ve been wondering how it was done (e.g. was it something from Snap, using the landmarker tech that’s enabled things like Game of Thrones dragons to scale the Flatiron Building?). Fortunately the Verge provides some insights:
In short, what’s going on is that an animation of the virtual panther, which was made in Unreal Engine, is being rendered within a live feed of the real world. That means camera operators have to track and follow the animations of the panther in real time as it moves around the stadium, like camera operators would with an actual living animal. To give the panther virtual objects to climb on and interact with, the stadium is also modeled virtually but is invisible.
This tech isn’t baked into an app, meaning you won’t be pointing your phone’s camera in the stadium to get another angle on the panther if you’re attending a game. The animations are intended to air live. In Sunday’s case, the video was broadcast live on the big screens at the stadium.
I look forward to the day when this post is quaint, given how frequently we’re all able to glimpse things like this via AR glasses. I give it 5 years, or maybe closer to 10—but let’s see.
I swear I spent half of last summer staring at tiny 3D Naomi Osaka volleying shots on my desktop. I remain jealous of my former teammates who got to work with these athletes (and before them, folks like Donald Glover as Childish Gambino), even though doing so meant dealing with a million Covid safety protocols. Here’s a quick look at how they captured folks flexing & flying through space:
Last summer my former teammates got all kinds of clever in working around Covid restrictions—and the constraints of physics and 3D capture—to digitize top Olympic athletes performing their signature moves. I wish they’d share the behind-the-scenes footage, as it’s legit fascinating. (Also great: seeing Donald Glover, covered in mocap ping pong balls for the making of Pixel Childish Gambino AR content, sneaking up behind my colleague like some weird-ass phantom. 😝)
Anyway, after so much delay and uncertainty, I’m happy to see those efforts now paying off in the form of 3D/AR search results. Check it out:
Hmm—this looks slick, but I’m not sure that I want to have a big plastic box swinging around my face while I’m trying to get fit. As a commenter notes, “That’s just Beat Saber with someone saying ‘good job’ once in a while”—but a friend of mine says it’s great. ¯\_(ツ)_/¯
This vid (same poster frame but different content) shows more of the actual gameplay:
You’ll scream, you’ll cry, promises designer Dave Werner—and maybe not due just to “my questionable dance moves.”
Live-perform 2D character animation using your body. Powered by Adobe Sensei, Body Tracker automatically detects human body movement using a web cam and applies it to your character in real time to create animation. For example, you can track your arms, torso, and legs automatically. View the full release notes.
“VOGUE: Try-On by StyleGAN,” from my former Google colleague Ira Kemelmacher-Shlizerman & her team, promises to synthesize photorealistic clothing & automatically apply it to a range of body shapes (leveraging the same StyleGAN foundation that my new teammates are using to build images via text):
I remain fascinated by what Snap & Facebook are doing with their respective AR platforms, putting highly programmable camera stacks into the hands of hundreds of millions of consumers & hundreds of thousands of creators. If you have thoughts on the subject & want to nerd out some time, drop me a note.
A few months back I wanted to dive into the engine that’s inside Instagram, and I came across the Spark AR masterclass put together & presented by filter creator Eddy Adams. I found it engaging & informative, even if a bit fast for my aging brain 🙃. If you’re tempted to get your feet wet in this emerging space, I recommend giving it a shot.
“‘Augmented Reality: A Land Of Contrasts.’ In this essay, I will…”
Okay, no, not really, but let me highlight some interesting mixed signals. (It’s worth noting that these are strictly my opinions, not those of any current or past employer.)
Pokémon Go debuted almost exactly 5 years ago, and last year, even amidst a global pandemic that largely immobilized people, it generated its best revenue ever—more than a billion dollars in just the first 10 months of the year, bringing its then-total to more than $4 billion.
Having said that…
In the five years since its launch, what other location-based AR games (or AR games, period) have you seen really take off? Even with triple-A characters & brands, Niantic’s own Harry Potter title made a far smaller splash, and Minecraft Earth (hyped extensively at an Apple keynote event) is being shut down.
When I fired up Pokémon Go last year (for the first time in years), I noticed that the only apparent change since launch was that AR now defaults to off. That is, Niantic apparently decided that monster-catching was easier, more fun, and/or less resource-intensive when done in isolation, with no camera overlay.
The gameplay remains extremely rudimentary—no use (at least that I could see) of fancy SLAM tracking, depth processing, etc., despite Niantic having acquired startups to enable just this sort of thing, showing demos three years ago.
Network providers & handset makers really, really want you to want 5G—but I’ve yet to see it prove to be transformative (even for the cloud-rendered streaming AR that my Google team delivered last year). Even when “real” 5G is available beyond a couple of urban areas, it’s hard to imagine a popular title being 5G-exclusive.
So does this mean I think location-based AR games are doomed? Well, no, as I claim zero prognostication-fu here. I didn’t see Pokémon Go coming, despite my roommate in Nepal (who casually mentioned that he’d helped found Google Earth—as one does) describing it ahead of launch; and given the way public interest in the app dropped after launch (see above), I’d never have guessed that it would be generating record revenue now—much less during a pandemic!
So, who knows: maybe Niantic & its numerous partners will figure out how to recapture lightning in a bottle. Here’s a taste of how they expect that to look:
If I had to bet on someone, though, it’d be Snap: they’ve been doing amazing site-specific AR for the last couple of years, and they’ve prototyped collaborative experiences built on the AR engine that hundreds of millions of people use every day; see below. Game on!
I spent my last couple of years at Google working on a 3D & AR engine that could power experiences across Maps, YouTube, Search, and other surfaces. Meanwhile my colleagues have been working on data-gathering that’ll use this system to help people navigate via augmented reality. As TechCrunch writes:
Indoor Live View is the flashiest of these. Google’s existing AR Live View walking directions currently only work outdoors, but thanks to some advances in its technology to recognize where exactly you are (even without a good GPS signal), the company is now able to bring this indoors.
This feature is already live in some malls in the U.S. in Chicago, Long Island, Los Angeles, Newark, San Francisco, San Jose and Seattle, but in the coming months, it’ll come to select airports, malls and transit stations in Tokyo and Zurich as well (just in time for vaccines to arrive and travel to — maybe — rebound). Because Google is able to locate you by comparing the images around you to its database, it can also tell which floor you are on and hence guide you to your gate at the Zurich airport, for example.
I really enjoyed listening to the podcast version of this funny, accessible talk from AI Weirdness writer Janelle Shane, and think you’d get a kick out of it, too.
On her blog, Janelle writes about AI and the weird, funny results it can produce. She has trained AIs to produce things like cat names, paint colors, and candy heart messages. In this talk she explains how AIs learn, fail, adapt, and reflect the best and worst of humanity.
The high-key nutty (am I saying that right, kids?) thing is that they’ve devised a whole musical persona to go with it, complete with music videos:
L.L.A.M.A. is the first ever Lego mini-figure to be signed to a major label and the building toy group’s debut attempt at creating its own star DJ/producer.
A cross between a helmet headed artist like Marshmello and a corporate synergy-prone artificial entity like Lil Miquela, L.L.A.M.A., which stands for “Love, Laughter and Music Always” (not kidding), is introducing himself to the world today with a debut single, “Shake.”
It appears that this guy & pals fly around on giant luckdragon-style copies of our goldendoodle Seamus, and I am here for that.
No markers, no mocap cameras, no suit, no keyframing. This take uses 3 DSLR cameras, though, and pretty far from being real-time. […]
Under the hood, it uses #OpenPose ML-network for 2d tracking of joints on each camera, and then custom Houdini setup for triangulating the results into 3d, stabilizing it and driving the rig (volumes, CHOPs, #kinefx, FEM – you name it 🙂
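For the curious: the triangulation step described here (fusing each camera’s 2D joint detections into one 3D point) is classically done with the Direct Linear Transform. This is a minimal numpy sketch, not the actual Houdini setup; the calibrated 3×4 projection matrices and per-camera pixel coordinates are assumed inputs:

```python
import numpy as np

def triangulate(points_2d, proj_mats):
    """Triangulate one joint from N camera views via the Direct Linear Transform.

    points_2d: list of (x, y) image coords, one per camera
    proj_mats: list of 3x4 camera projection matrices
    Returns the estimated 3D point (x, y, z).
    """
    rows = []
    for (x, y), P in zip(points_2d, proj_mats):
        # Each view contributes two linear constraints on the homogeneous 3D point
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: the right singular vector with smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

In a mocap-style pipeline you’d run this once per joint per frame, then smooth and retarget the resulting 3D trajectories onto a rig.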
This is one of the many Google projects to which I’ve been lucky enough to contribute just a bit (focusing on object tracking & graphical adornments). It’s built into Google Photos, among other surfaces, and I’m really pleased that people are seeking it out:
Imagine loading multi-gigabyte 3D models nearly instantaneously into your mobile device, then placing them into your driveway and stepping inside. That’s what we’ve now enabled via Google Search on Android:
Take it for a spin via the models listed below, and please let us know what you think!
As part of Fiat Chrysler’s Virtual Showroom CES event, you can experience the new innovative 2021 Jeep Wrangler 4xe by scanning a QR code with your phone. You can then see an Augmented Reality (AR) model of the Wrangler right in front of you—conveniently in your own driveway or in any open space. Check out what the car looks like from any angle, in different colors, and even step inside to see the interior with incredible details.
A bit on how it works:
The Cloud AR tech uses a combination of edge computing and AR technology to offload the computing power needed to display large 3D files, rendered by Unreal Engine, and stream them down to AR-enabled devices using Google’s Scene Viewer. Using powerful rendering servers with gaming-console-grade GPUs, memory, and processors located geographically near the user, we’re able to deliver a powerful but low friction, low latency experience.
This rendering hardware allows us to load models with tens of millions of triangles and textures up to 4k, allowing the content we serve to be orders of magnitude larger than what’s served on mobile devices (i.e., on-device rendered assets).
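The client-side fallback implied here (stream the heavyweight model when the link can sustain it, render a lighter asset locally otherwise) can be sketched roughly as follows. The thresholds and tier names are my own illustrative guesses, not the actual FCA/Google logic:

```python
from dataclasses import dataclass

@dataclass
class ClientInfo:
    supports_ar: bool        # ARCore/ARKit available on this device
    bandwidth_mbps: float    # measured downlink
    latency_ms: float        # RTT to the nearest rendering edge node

def choose_render_path(c: ClientInfo,
                       min_bandwidth_mbps: float = 15.0,
                       max_latency_ms: float = 50.0) -> str:
    """Pick cloud streaming when the link is good enough, else fall back."""
    if not c.supports_ar:
        return "3d-viewer"       # plain on-screen 3D, no AR placement
    if c.bandwidth_mbps >= min_bandwidth_mbps and c.latency_ms <= max_latency_ms:
        return "cloud-streamed"  # full-detail model rendered server-side
    return "on-device"           # decimated asset rendered locally
```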
And to try it out:
Scan the QR code below, or check out the FCA CES website. Depending on your OS, device, and network strength, you will see either a photorealistic, cloud-streamed AR model or an on-device 3D car model, both of which can then be placed in your physical environment.
I’m delighted to be closing out 2020 on a pair of high notes, welcoming the arrival of my two biggest efforts from the last year+.
First, Google Search now supports 150+ new cars that you can view in 3D and AR (via iPhone or Android device), including in beautiful cloud-rendered quality (provided you have a good connection & up-to-date Android). As we initially previewed in October:
Bring the showroom to you with AR
You can easily check out what the car looks like in different colors, zoom in to see intricate details like buttons on the dashboard, view it against beautiful backdrops and even see it in your driveway. We’re experimenting with this feature in the U.S. and working with top auto brands, such as Volvo and Porsche, to bring these experiences to you soon.
Now, when you search for a lipstick or eyeshadow product, like L’Oreal’s Infallible Paints Metallic Eyeshadow, you can see what it looks like on a range of skin tones and compare shades and textures to help you find the right products.
To help you find the perfect match, you can now also virtually try makeup products right from the Google app.
Google researchers Ira Kemelmacher-Shlizerman, Brian Curless, and Steve Seitz have been working with University of Washington folks on tech that promises “30fps in 4K resolution, and 60fps for HD on a modern GPU.”
Our technique is based on background matting, where an additional frame of the background is captured and used in recovering the alpha matte and the foreground layer.
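The compositing model underneath all matting work is I = αF + (1 − α)B. The actual paper recovers F and α jointly with a neural network; as a much-simplified illustration of why a captured background frame B helps so much, here’s the closed-form α you can solve for when both layers happen to be known:

```python
import numpy as np

def alpha_from_known_layers(I, F, B, eps=1e-8):
    """Solve the compositing equation I = a*F + (1-a)*B for alpha per pixel.

    I, F, B: HxWx3 float arrays (observed frame, foreground, clean background).
    Least-squares over the three color channels; result clamped to [0, 1].
    Simplification for illustration only -- real background matting must also
    estimate F, which is why the paper trains a network for the joint problem.
    """
    d = F - B  # per-pixel color difference between the two layers
    a = np.sum((I - B) * d, axis=-1) / (np.sum(d * d, axis=-1) + eps)
    return np.clip(a, 0.0, 1.0)
```

Note the failure mode this exposes: wherever F ≈ B (foreground matches the background color), the denominator vanishes and α is unrecoverable, which is exactly where learned priors have to take over.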
Nearly 20 years ago, on one of my first customer visits as a Photoshop PM, I got to watch artists use PS + After Effects to extract people from photo backgrounds, then animate the results. The resulting film—The Kid Stays In The Picture—lent its name to the distinctive effect (see previous).
Now I’m delighted that Google Photos is rolling out similar output to its billion+ users, without requiring any effort or tools:
We use machine learning to predict an image’s depth and produce a 3D representation of the scene—even if the original image doesn’t include depth information from the camera. Then we animate a virtual camera for a smooth panning effect—just like out of the movies.
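The real pipeline predicts depth with a learned network and renders a proper 3D representation from a moving virtual camera. As a toy illustration of why a depth map enables that parallax at all, here’s a crude forward-warp in which near pixels slide farther than distant ones (hole-filling, which the real feature handles gracefully, is omitted):

```python
import numpy as np

def parallax_shift(image, depth, camera_offset_px):
    """Crude horizontal parallax: near pixels shift more than far ones.

    image: HxW grayscale array; depth: HxW positive depths (same shape).
    camera_offset_px: virtual camera translation, in pixels at unit depth.
    Returns the forward-warped frame (disocclusion holes left as 0).
    """
    h, w = image.shape
    out = np.zeros_like(image)
    disparity = camera_offset_px / depth  # inverse-depth parallax
    xs = np.arange(w)
    for y in range(h):
        new_x = np.round(xs + disparity[y]).astype(int)
        valid = (new_x >= 0) & (new_x < w)
        out[y, new_x[valid]] = image[y, xs[valid]]
    return out
```

Sweeping `camera_offset_px` over a few dozen frames produces the smooth pan; the holes it opens up behind foreground objects are what the production system inpaints.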
Photos is also rolling out new collages, like this:
And they’re introducing new themes in the stories-style Memories section up top as well:
Now you’ll see Memories surface photos of the most important people in your life… And starting soon, you’ll also see Memories about your favorite things—like sunsets—and activities—like baking or hiking—based on the photos you upload.
I love these simple, practical uses of augmented reality. The Maps team writes,
Last month, we launched Live View in Location Sharing for Pixel users, and we’ll soon expand this to all Android and iOS users around the globe. When a friend has chosen to share their location with you, you can easily tap on their icon and then on Live View to see where and how far away they are–with overlaid arrows and directions that help you know where to go.
Live View in Location Sharing will soon expand to all Android and iOS users globally on ARCore and ARKit supported phones.
They’re also working hard to leverage visual data & provide better localization and annotation.
With the help of machine learning and our understanding of the world’s topography, we’re able to take the elevation of a place into account so we can more accurately display the location of the destination pin in Live View. Below, you can see how Lombard Street—a steep, winding street in San Francisco—previously appeared far off into the distance. Now, you can quickly see that Lombard Street is much closer and the pin is aligned with where the street begins at the bottom of the hill.
When Google started putting 3D animals in Search last year it only had a few standard animals available like a tiger, a lion, a wolf, and a dog. It added more creatures in March, including alligators, ducks, and hedgehogs. In August, Google made prehistoric creatures and historical artifacts available in AR via its Arts and Culture app—and who among us wouldn’t love to check out the ancient crustacean Cambropachycope up close and personal?
From sign language to sports training to AR effects, tracking the human body unlocks some amazing possibilities, and my Google Research teammates are delivering great new tools:
We are excited to announce MediaPipe Holistic, […] a new pipeline with optimized pose, face and hand components that each run in real-time, with minimum memory transfer between their inference backends, and added support for interchangeability of the three components, depending on the quality/speed tradeoffs.
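The interesting engineering trick in Holistic is the pipelining: a pose model runs first, and its landmarks seed the crops that the dedicated hand and face models then refine at higher resolution. Here’s a rough sketch of deriving a hand region of interest from pose output; the 0.25 and `scale` constants are illustrative guesses, not MediaPipe’s actual (tuned) re-crop logic:

```python
import numpy as np

def hand_roi_from_pose(wrist, elbow, scale=2.0):
    """Estimate a square hand crop from pose landmarks, Holistic-style.

    wrist, elbow: (x, y) in normalized image coords. The hand is assumed to
    extend past the wrist along the forearm direction, and the crop size is
    tied to forearm length. Returns (center, half_size) of the square ROI.
    """
    wrist, elbow = np.asarray(wrist, float), np.asarray(elbow, float)
    forearm = wrist - elbow
    center = wrist + 0.25 * forearm  # hand center sits a bit beyond the wrist
    half_size = scale * 0.5 * np.linalg.norm(forearm)
    return center, half_size
```

Cropping this way means the expensive hand model only ever sees a small, well-framed patch, which is a big part of how the whole pipeline stays realtime on mobile.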
Call it AI, ML, FM (F’ing Magic), whatever: tech like this warms the heart and can free body & soul. Google’s Project Guideline helps people with impaired vision navigate the world on their own, independently & at speed. Runner & CEO Thomas Panek, who is blind, writes,
In the fall of 2019, I asked that question to a group of designers and technologists at a Google hackathon. I wasn’t anticipating much more than an interesting conversation, but by the end of the day they’d built a rough demo […].
I’d wear a phone on a waistband, and bone-conducting headphones. The phone’s camera would look for a physical guideline on the ground and send audio signals depending on my position. If I drifted to the left of the line, the sound would get louder and more dissonant in my left ear. If I drifted to the right, the same thing would happen, but in my right ear. Within a few months, we were ready to test it on an indoor oval track. […] It was the first unguided mile I had run in decades.
Check out the journey. (Side note: how great is “Blaze” as a name for a speedy canine running companion? ☺️)
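The feedback scheme Panek describes maps cleanly to a tiny piece of logic: lateral offset from the painted line becomes per-ear cue intensity. The real system also varies dissonance and relies on a learned line detector; this sketch covers only the stereo-panning part, with illustrative units:

```python
def guideline_gains(offset_m: float, max_offset_m: float = 1.0):
    """Map signed lateral offset from the guideline to per-ear cue gains.

    offset_m: distance from the line in meters; negative means drifted left.
    Returns (left_gain, right_gain) in [0, 1]. Per the description above,
    the cue gets louder in the ear on the side you've drifted toward.
    """
    drift = max(-1.0, min(1.0, offset_m / max_offset_m))
    left_gain = max(0.0, -drift)   # louder left cue when drifting left
    right_gain = max(0.0, drift)   # louder right cue when drifting right
    return left_gain, right_gain
```

Centered on the line, both ears stay quiet; the runner steers by nulling out the sound.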
There’s no way the title can do this one justice, so just watch as this ML-based technique identifies moving humans (including their reflections!), then segments them out to enable individual manipulation—including syncing up their motions and even removing people wholesale:
Here’s the vid directly from the research team, which includes longtime Adobe vet David Salesin: