Building on yesterday’s post about Google’s new Geospatial API, developers can now embed a live view featuring a camera feed + augmentations, and companies like Bird are wasting no time in putting it to use. TNW writes,
When parking a scooter, the app prompts a rider to quickly scan the QR code on the vehicle and its surrounding area using their smartphone camera… [T]his results in precise, centimeter-level geolocation that enables the system to detect and prevent improper parking with extreme accuracy — all while helping monitor user behavior.
Of the 200+ cities that Lime serves, its VPS is now live in six: London, Paris, Tel Aviv, Madrid, San Diego and Bordeaux. Similar to Bird, Lime’s pilot involves testing the tech with a portion of riders. The company said results from its pilots have been promising, with riders who used the new tool seeing a 26% decrease in parking errors compared to those who didn’t have it enabled.
My friend Bilawal & I collaborated on AR at Google, including our efforts to build a super compact 3D engine for driving spatial annotation & navigation. We’d often talk excitedly about location-based AR experiences, especially the Landmarker functionality arriving in Snapchat. All the while he’s been busy pushing the limits of photogrammetry (including putting me in space!) to scan 3D objects.
Now I’m delighted to see him & his team unveiling the Geospatial API (see blog post, docs, and code), which enables cross-platform (iOS, Android) deployment of experiences that present both close-up & far-off augmentations. Here’s the 1-minute sizzle reel:
For a closer look, check out this interesting deep dive into what it offers & how it works:
Hmm—dunno whether I’d prefer carrying this little dude over just pocketing a battery pack or two—but I dig the idea & message:
Once set up on its tripod, the 3-pound, 40-watt device automatically rotates towards the wind and starts charging its 5V, 12,000 mAh battery. (Alternatively it can charge your device directly via USB.) The company says that in peak conditions, the Shine Turbine can generate enough juice to charge a smartphone in just 20 minutes.
Heh—I got a kick out of seeing how AI would go about hallucinating its idea of what my flamed-out ’84 Volvo wagon looked like. See below for a comparison. And in retrospect, how did I not adorn mine with a tail light made from a traffic cone (or is it giant candy corn?) and “VOOFO NACK”? 😅
Not yet having access to this system [taps mic impatiently], I’m just checking out its simple but effective interface from afar. Here’s how artists can designate specific regions in order to repopulate them:
Among the Google teams working on augmented reality, there was a low-key religious war about the importance of “metric scale” (i.e. matching real-world proportions 1:1). The ARCore team believed it was essential (no surprise, given their particular tech stack), while my team (Research) believed that simply placing things in the world with a best guess as to size, then letting users adjust an object if needed, was often the better path.
I thought of this upon seeing StreetEasy’s new AR tech for apartment-hunting in NYC. At the moment it lets you scan a building to see its inventory. That’s very cool, but my mind jumped to the idea of seeing 3D representations of actual apartments (something the company already offers, albeit not in AR), and I’m amused to think of my old Manhattan place represented in AR: drawing it as a tiny box at one’s feet would be metric scale. 😅 My God that place sucked. Anyway, we’ll see how useful this tech proves & where it can go from here.
“A StreetEasy Instagram poll found that 95% of people have walked past an apartment building and wondered if it has an available unit that meets their criteria. At the same time, 77% have had trouble identifying a building’s address to search for later.”
I got a rude awakening a couple of years ago while working in Google’s AR group: the kind of displays that could fit into “glasses that look like glasses” (i.e. not Glass-style unicorn protuberances) had really tiny fields of view, crummy resolution, short battery life, and more. I knew that my efforts to enable cloud-raytraced Volvos & Stormtroopers & whatnot wouldn’t last long in a world that prioritized Asteroids-quality vector graphics on a display the size of a 3″x5″ index card held at arm’s length.
Having been out of that world for a year+ now, I have no inside info on how Google’s hardware efforts have been evolving, but I’m glad to see that they’re making a serious (billion-dollar+) investment in buying more compelling display tech. Per The Verge,
According to Raxium’s website, a Super AMOLED screen on your phone has a pixel pitch (the distance between the center of one pixel, and the center of another pixel next to it) of about 50 microns, while its MicroLED could manage around 3.5 microns. It also boasts of “unprecedented efficiency” that’s more than five times better than any world record.
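To put those pitch numbers in perspective, pixel pitch converts directly into pixel density. The quick calculation below is mine, not Raxium's—just the standard microns-to-PPI conversion applied to the two figures quoted above:

```python
# Pixel pitch (center-to-center spacing) determines pixel density:
# 1 inch = 25,400 microns, so PPI = 25,400 / pitch_in_microns.
MICRONS_PER_INCH = 25_400

def pitch_to_ppi(pitch_um: float) -> float:
    """Pixels per inch for a display with the given pixel pitch."""
    return MICRONS_PER_INCH / pitch_um

print(round(pitch_to_ppi(50)))   # Super AMOLED (~50 um pitch): ~508 ppi
print(round(pitch_to_ppi(3.5)))  # Raxium MicroLED (~3.5 um):  ~7257 ppi
```

That's roughly a 14x jump in linear density—the kind of leap that matters when the display sits millimeters from your eye.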
How does any of this compare to what we’ll see out of Apple, Meta, Snap, etc.? I have no idea, but at least parts of the future promise to be fun.
Each week we’ll cover a different aspect of machine learning. A short lecture covering theories and practices will be followed by demos using open source web tools and a web-browser tool called Google Colab. In the last 3 weeks of class, you’ll be given the chance to create your own project using the skills you’ve learned. Topics will include selecting the right model for your use case, gathering and manipulating datasets, and connecting your models to data sources such as audio, text, or numerical data. We’ll also talk a little ethics, because we can’t teach machine learning without a little ethics.
I really enjoyed this conversation—touching, as it does, on my latest fascination (AI-generated art via DALL•E) and myriad other topics. In fact, I plan to listen to it again—hopefully this time with a way to jot down & share some of the most resonant observations. Meanwhile, I think you’ll find it thoughtful & stimulating.
In this episode of the podcast, Sam Harris speaks with Eric Schmidt about the ways artificial intelligence is shifting the foundations of human knowledge and posing questions of existential risk.
Last year I took my then-11yo son Henry (aka my astromech droid) on a 2000-mile “Miodyssey” down Route 66 in my dad’s vintage Miata. It was a great way to see the country (see more pics & posts than you might ever want), and despite the tight quarters we managed not to kill one another—or to get slain by Anton Chigurh in an especially murdery Texas town (but that’s another story!).
Well, they do call themselves a camera company… ¯\_(ツ)_/¯ This little contraption looks incredibly lightweight (pocketable, even) and easy to use. Visual quality (particularly stabilization) seems a little borderline, but I dig its person-centric nature, including tracking & AR effects (segmentation, cloning, etc.). Check out a great review—including a man-machine “romantic montage” (!):
Adobe Super Resolution technology is the best solution I’ve yet found for increasing the resolution of digital images. It doubles the linear resolution of your file, quadrupling the total pixel count while preserving fine detail. Super Resolution is available in both Adobe Camera Raw (ACR) and Lightroom and is accessed via the Enhance command. And because it’s built-in, it’s free for subscribers to the Creative Cloud Photography Plan.
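The "doubles linear, quadruples total" relationship is worth a sanity check—a quick sketch of the pixel math (illustrative arithmetic only, not Adobe's code):

```python
def enhanced_size(width: int, height: int, factor: int = 2):
    """Pixel dimensions after a 2x linear upscale (Super Resolution-style)."""
    return width * factor, height * factor

# A 12-megapixel source (4000 x 3000) becomes 48 megapixels.
w, h = enhanced_size(4000, 3000)
print(w, h)                      # 8000 6000
print((w * h) // (4000 * 3000))  # 4 -- doubling each side quadruples the pixels
```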
“In 2019, we started with templates of 30 beloved sites around the world which creators could build upon called Landmarkers… Today, we’re launching Custom Landmarkers in Lens Studio, letting creators anchor Lenses to local places they care about to tell richer stories about their communities through AR.”
At its Lens Fest event, the company announced that 250,000 lens creators from more than 200 countries have made 2.5 million lenses that have been viewed more than 3.5 trillion times. Meanwhile, on Snapchat’s TikTok clone Spotlight, the company awarded 12,000 creators a total of $250 million for their posts. The company says that more than 65% of Spotlight submissions use one of Snapchat’s creative tools or lenses.
My old boss on Photoshop, Kevin Connor, used to talk about the inexorable progression of imaging tools from the very general (e.g. the Clone Stamp) to the more specific (e.g. the Healing Brush). In the process, high-complexity, high-skill operations were rendered far more accessible—arguably to a fault. (I used to joke that believe it or not, drop shadows were cool before Photoshop made them easy. ¯\_(ツ)_/¯)
I think of that observation when seeing things like the Face Swap tool from Icons8. What once took considerable time & talent in an app like Photoshop is now rendered trivially fast (and free!) to do. “Days of Miracles & Wonder,” though we hardly even wonder now. (How long will it take DALL•E to go from blown minds to shrugged shoulders? But that’s a subject for another day.)
I’m no 3D artist (had I but world enough and time…), but I sure love their work & anything that makes it faster and easier. Perhaps my most obscure point of pride from my Photoshop years is that we added per-layer timestamps into PSD files, so that Pixar could more efficiently render content by noticing which layers had actually been modified.
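The idea behind those per-layer timestamps is simple dependency tracking: compare each layer's modification time against the last render and re-process only what changed. Here's a hypothetical sketch (the names and structure are mine; the actual PSD format and Pixar's pipeline are far more involved):

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    modified_at: float  # per-layer timestamp, as stored in the PSD file

def layers_to_rerender(layers, last_render_time: float):
    """Return only the layers touched since the previous render."""
    return [lyr for lyr in layers if lyr.modified_at > last_render_time]

layers = [
    Layer("background", modified_at=100.0),
    Layer("character", modified_at=250.0),
    Layer("fx", modified_at=300.0),
]
stale = layers_to_rerender(layers, last_render_time=200.0)
print([lyr.name for lyr in stale])  # ['character', 'fx'] -- background is skipped
```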
The Substance 3D plugin (BETA) enables the use of Substance materials directly in Unreal Engine 5 and Unreal Engine 4. Whether you are working on games or visualization, or deploying across mobile, desktop, or XR, Substance delivers a unique experience with optimized features for enhanced productivity.
Work faster, be more productive: Substance parameters allow for real-time material changes and texture updates.
Substance 3D for Unreal Engine 5 contains the plugin for Substance Engine.
The Substance Assets platform is a vast library containing high-quality PBR-ready Substance materials and is accessible directly in Unreal through the Substance plugin. These customizable Substance files can easily be adapted to a wide range of projects.
Frame.io for Creative Cloud includes real-time review and approval tools with commenting and frame-accurate annotations, accelerated file transfers for fast uploading and downloading of media, 100GB of dedicated Frame.io cloud storage, the ability to work on up to 5 different projects with another user, free sharing with an unlimited number of reviewers, and Camera to Cloud.
I generally really enjoyed HBO’s Peacemaker series—albeit, as I told the kids, even I found the profanity excessive (“too much salt spoils the soup”). And I loved the whacked-out intro music & choreography:
Here the creators give a peek into how it was made:
And here a dance troupe in Bangladesh puts their spin on it:
There’s no way this is real, is there?! I think it must use NFW technology (No F’ing Way), augmented with a side of LOL WTAF. 😛
Here’s an NYT video showing the system in action:
The NYT article offers a concise, approachable description of how the approach works:
A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognize the links between the images and the words.
When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.
Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realize these features. The latest version of DALL-E, unveiled on Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
Though DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.
A big part of my rationale in going to Google eight (!) years ago was that a lot of creativity & expressivity hinge on having broad, even mind-of-God knowledge of one’s world (everywhere you’ve been, who’s most important to you, etc.). Given access to one’s whole photo corpus, a robot assistant could thus do amazing things on one’s behalf.
In that vein, MyStyle proposes to do smarter face editing (adjusting expressions, filling in gaps, upscaling) by being trained on 100+ images of an individual face. Check it out:
“Lost your keys? Lost your job?” asks illustrator Don Moyer. “Look at the bright side. At least you’re not plagued by pterodactyls, pursued by giant robots, or pestered by zombie poodles. Life is good!”
Once the deal closes, BRIO XR will be joining an unparalleled community of engineers and product experts at Adobe – visionaries who are pushing the boundaries of what’s possible in 3D and immersive creation. Our BRIO XR team will contribute to Adobe’s Creative Cloud 3D authoring and experience design teams. Simply put, Adobe is the place to be, and in fact, it’s a place I’ve long set my sights on joining.
Adam Buxton recorded a conversation with his 5-year-old daughter discussing her thoughts on Princess Leia’s famous slave outfit. She is hilarious by herself, but when he got The Brothers McLeod to animate her words, it all turned into pure comedic gold.
Can machines generate art like a human would? They already are.
Join us on March 30th, at 9AM Pacific for a live chat about what’s on the frontier of machine learning and art. Our team of panelists will break down how text prompts in machine learning models can create artwork like a human might, and what it all means for the future of artistic expression.
Aaron Hertzmann is a Principal Scientist at Adobe, Inc., and an Affiliate Professor at University of Washington. He received a BA in Computer Science and Art & Art History from Rice University in 1996, and a PhD in Computer Science from New York University in 2001. He was a professor at the University of Toronto for 10 years, and has worked at Pixar Animation Studios and Microsoft Research. He has published over 100 papers in computer graphics, computer vision, machine learning, robotics, human-computer interaction, perception, and art. He is an ACM Fellow and an IEEE Fellow.
Ryan is a Machine Learning Engineer/Researcher at Adobe with a focus on multimodal image editing. He has been creating generative art using machine learning for years, but is most known for his recent work with CLIP for text-to-image systems. With a Bachelor’s in Psychology from the University of Utah, he is largely self-taught.
We are excited to announce that Photoshop now has full support for the WebP file format! WebP files can now be opened, created, edited, and saved in Photoshop without the need for a plug-in or preference setting.
To open a WebP file, simply select and open it in the same manner as you would any other supported file or document. You can now create, edit, and save WebP files as well: once you are done editing your document, choose Save As or Save a Copy and select WebP from the options provided in the file format drop-down menu.
V7 Labs has created new AI-based software, delivered as a Google Chrome extension, that can detect artificially generated profile pictures — like the ones above — with a claimed 99.28% accuracy.
Creator Alberto Rizzoli walks through the flow in this video (more detailed than the one below).
[Adobe] announced a tool that allows consumers to point their phone at a product image on an ecommerce site—and then see the item rendered three-dimensionally in their living space. Adobe says the true-to-life size precision—and the ability to pull multiple products into the same view—set its AR service apart from others on the market. […]
Chang Xiao, the Adobe research scientist who created the tool, said many of the AR services currently on the market provide only rough estimations of the size of the product. Adobe is able to encode dimension information in an invisible marker embedded in the photos, which its computer vision algorithms can translate into more precisely sized projections.