While we’re all still getting our heads around the 2D image-generation magic of DALL•E, Imagen, MidJourney, and more, Google researchers are stepping into a new dimension as well with Dream Fields—synthesizing geometry simply from words.
“Not a single keyframe of animation was set in the making of the title, created by tweaking and bending the alignment knobs of a vintage TV,” writes Anthony Vitagliano. “Instead, I shot it using a vintage Montgomery Ward ‘Airline’ Portable Television, an iPhone, and a patchwork of cables and converters in my basement.”
Check out the results:
See Anthony’s site for high-res captures of the frames.
I’ve long considered augmented reality apps to be “realtime Photoshop”—or perhaps more precisely, “realtime After Effects.” I think that’s true & wonderful, but most consumer AR tends to be ultra-confined filters that produce ~1 outcome well.
Walking around San Francisco today, it struck me that DALL•E & other emerging generative-art tools could—if made available via a simple mobile UI—offer a new kind of (almost) realtime Photoshop, with radically greater creative flexibility.
Here I captured a nearby sculpture, dropped out the background in Photoshop, uploaded it to DALL•E, and requested “a low-polygon metallic tree surrounded by big dancing robots and small dancing robots.” I like the results!
I’m suddenly craving a mobile #dalle app that lets me photograph things, select them/backgrounds, and then inpaint with prompts. Here’s a quick experiment based on a “tree” I just saw 🤖: pic.twitter.com/Sx3LAACOVs
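For what it’s worth, the photograph-it, select-it, inpaint-it-with-a-prompt flow I’m wishing for maps pretty directly onto a single image-edit call. Here’s a rough Kotlin/OkHttp sketch of the plumbing such an app might use, assuming an OpenAI-style /v1/images/edits endpoint that accepts an image, a transparency mask marking the region to repaint, and a prompt. No such public API exists as I write this, so treat every name and parameter as illustrative:

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.MultipartBody
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.asRequestBody
import java.io.File

// Illustrative only: send a photo, a mask whose transparent pixels mark the
// region to repaint, and a text prompt to a hypothetical image-edit endpoint.
fun main() {
    val apiKey = System.getenv("OPENAI_API_KEY") ?: error("Set OPENAI_API_KEY")
    val png = "image/png".toMediaType()

    val body = MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("image", "photo.png", File("photo.png").asRequestBody(png))
        .addFormDataPart("mask", "mask.png", File("mask.png").asRequestBody(png))
        .addFormDataPart(
            "prompt",
            "a low-polygon metallic tree surrounded by big dancing robots and small dancing robots"
        )
        .addFormDataPart("n", "1")
        .addFormDataPart("size", "1024x1024")
        .build()

    val request = Request.Builder()
        .url("https://api.openai.com/v1/images/edits") // assumed endpoint; see note above
        .header("Authorization", "Bearer $apiKey")
        .post(body)
        .build()

    OkHttpClient().newCall(request).execute().use { response ->
        // The JSON response would contain URLs for the generated edits.
        println(response.body?.string())
    }
}
```

In an actual mobile app, the mask would presumably come from an on-device selection tool (tap-to-segment, lasso, etc.) rather than a pre-made PNG, but the shape of the request would be the same.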
Hard on the heels of OpenAI revealing DALL•E 2 last month, Google has announced Imagen, promising “unprecedented photorealism × deep level of language understanding.” Unlike DALL•E, it’s not yet available via a demo, but the sample images (below) are impressive.
I’m slightly amused to see Google flexing on DALL•E by highlighting Imagen’s strengths in figuring out spatial arrangements & coherent text (places where DALL•E sometimes currently struggles). The site claims that human evaluators rate Imagen output more highly than what comes from competitors (e.g. MidJourney).
I couldn’t be more excited about these developments—most particularly to figure out how such systems can enable amazing things in concert with Adobe tools & users.
I’ve long admired President Obama’s official portrait, but I haven’t known much about Kehinde Wiley. I enjoyed this brief peek into his painting process:
With reporting from 250 locations around the world, AP is a key addition to the CAI’s mission to help consumers everywhere better understand the provenance and attribution of images and video.
“We are pleased to join the CAI in its efforts to combat misinformation and disinformation around photojournalism,” said AP Director of Photography David Ake. “AP has worked to advance factual reporting for over 175 years. Teaming up to help ensure the authenticity of images aligns with that mission.”
We are building some rad stuff (seriously, I wish I could show you already) and would love to have you join us:
We are looking for a versatile and passionate Senior Developer to join us and help drive complex, full-stack component implementation. You’ll play a key role in architectural discussions, defining solutions, and solving highly technical issues. Our team builds both cloud services (Python and C++) and web experiences (JavaScript, TypeScript, Web Components, etc.). This high-impact role requires deep knowledge of cloud-based architectures as well as solid CS fundamentals.
Some key responsibilities:
Architect efficient, reusable full-stack systems that can support several different deep-learning models
Design, architect, and implement multiple low-latency microservices (we mostly use JavaScript, C++, and Python)
Build simple, robust, and scalable platforms used by many external users
Work closely with UX designers, product managers, and machine learning engineers to develop compelling experiences
Take a project from scoping requirements through launch
This may be the most accessible overall intro & discussion I’ve seen, and it’s chock full of fun example output.
Even the system’s frequent text “fails” are often charmingly bizarre—like a snapshot of a dream that makes sense only while dreaming. Some faves from the vid above:
Building on yesterday’s post about Google’s new Geospatial API: developers can now embed a live view featuring a camera feed + augmentations, and companies like Bird are wasting no time in putting it to use. TNW writes,
When parking a scooter, the app prompts a rider to quickly scan the QR code on the vehicle and its surrounding area using their smartphone camera… [T]his results in precise, centimeter-level geolocation that enables the system to detect and prevent improper parking with extreme accuracy — all while helping monitor user behavior.
Out of the over 200 cities that Lime serves, its VPS is live now in six: London, Paris, Tel Aviv, Madrid, San Diego and Bordeaux. Similar to Bird, Lime’s pilot involves testing the tech with a portion of riders. The company said results from its pilots have been promising, with those who used the new tool seeing a 26% decrease in parking errors compared to riders who didn’t have the tool enabled.
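For a sense of what that “centimeter-level geolocation” looks like from the developer’s side, here’s a minimal Kotlin sketch of dropping a world-anchored object via ARCore’s Geospatial API. The calls mirror ARCore’s published Android API, but the session wiring, coordinates, and logging are illustrative placeholders rather than production code:

```kotlin
import android.util.Log
import com.google.ar.core.Anchor
import com.google.ar.core.Config
import com.google.ar.core.Session
import com.google.ar.core.TrackingState

// Enable geospatial tracking on an existing ARCore session and, once the Earth
// subsystem is tracking, place an anchor at an explicit latitude/longitude.
fun placeGeospatialAnchor(session: Session): Anchor? {
    val config = session.config
    config.geospatialMode = Config.GeospatialMode.ENABLED
    session.configure(config)

    val earth = session.earth ?: return null
    if (earth.trackingState != TrackingState.TRACKING) return null

    // VPS-refined estimate of where the device camera sits in the world.
    val cameraPose = earth.cameraGeospatialPose
    Log.d("Geospatial", "lat=${cameraPose.latitude} lng=${cameraPose.longitude} " +
            "horizontalAccuracy=${cameraPose.horizontalAccuracy}m")

    // Placeholder coordinates; altitude is meters above the WGS84 ellipsoid,
    // and the four floats are an identity rotation quaternion (x, y, z, w).
    return earth.createAnchor(
        37.7793,             // latitude
        -122.4192,           // longitude
        cameraPose.altitude, // reuse the camera's altitude estimate
        0f, 0f, 0f, 1f
    )
}
```

The key shift is that anchors are expressed in global latitude/longitude/altitude, refined by Google’s Visual Positioning Service (the “VPS” mentioned above), rather than relative to a locally scanned scene, and that’s what makes checks like the scooter-parking validation possible.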
My friend Bilawal & I collaborated on AR at Google, including our efforts to build a super compact 3D engine for driving spatial annotation & navigation. We’d often talk excitedly about location-based AR experiences, especially the Landmarker functionality arriving in Snapchat. All the while he’s been busy pushing the limits of photogrammetry (including putting me in space!) to scan 3D objects.
Now I’m delighted to see him & his team unveiling the Geospatial API (see blog post, docs, and code), which enables cross-platform (iOS, Android) deployment of experiences that present both close-up & far-off augmentations. Here’s the 1-minute sizzle reel:
For a closer look, check out this interesting deep dive into what it offers & how it works:
Congrats to my old teammates in Research & AR on getting to unveil the incredible magic on which they’ve been working hard for years:
This is what I’ve been talking about:
This visually simple stuff (navigation, translation, notifications, etc.) is what will drive AR glasses value—not 3D whales jumping out of basketball courts, etc. https://t.co/5SQzyi56yP
Hmm—dunno whether I’d prefer carrying this little dude over just pocketing a battery pack or two—but I dig the idea & message:
Once set up on its tripod, the 3-pound, 40-watt device automatically rotates towards the wind and starts charging its 5V, 12,000 mAh battery. (Alternatively it can charge your device directly via USB.) The company says that in peak conditions, the Shine Turbine can generate enough juice to charge a smartphone in just 20 minutes.
Heh—I got a kick out of seeing how AI would go about hallucinating its idea of what my flamed-out ’84 Volvo wagon looked like. See below for a comparison. And in retrospect, how did I not adorn mine with a tail light made from a traffic cone (or is it giant candy corn?) and “VOOFO NACK”? 😅
Not yet having access to this system [taps mic impatiently], I’m just checking out its simple but effective interface from afar. Here’s how artists can designate specific regions in order to repopulate them:
Greetings from the galactic core, to which my friend Bilawal has dispatched me by editing the 3D model he made from drone-selfie footage that I recorded last year:
Among the Google teams working on augmented reality, there was a low-key religious war about the importance of “metric scale” (i.e. rendering content at its real-world size, 1:1). The ARCore team believed it was essential (no surprise, given their particular tech stack), while my team (Research) believed that simply placing things in the world with a best guess as to size, then letting users adjust an object if needed, was often the better path.
I thought of this upon seeing StreetEasy’s new AR tech for apartment-hunting in NYC. At the moment it lets you scan a building to see its inventory. That’s very cool, but my mind jumped to the idea of seeing 3D representations of actual apartments (something the company already offers, albeit not in AR), and I’m amused to think of my old Manhattan place represented in AR: drawing it as a tiny box at one’s feet would be metric scale. 😅 My God that place sucked. Anyway, we’ll see how useful this tech proves & where it can go from here.
“A StreetEasy Instagram poll found that 95% of people have walked past an apartment building and wondered if it has an available unit that meets their criteria. At the same time, 77% have had trouble identifying a building’s address to search for later.”
I got a rude awakening a couple of years ago while working in Google’s AR group: the kind of displays that could fit into “glasses that look like glasses” (i.e. not Glass-style unicorn protuberances) had really tiny fields of view, crummy resolution, short battery life, and more. I knew that my efforts to enable cloud-raytraced Volvos & Stormtroopers & whatnot wouldn’t last long in a world that prioritized Asteroids-quality vector graphics on a display the size of a 3″x5″ index card held at arm’s length.
Having been out of that world for a year+ now, I have no inside info on how Google’s hardware efforts have been evolving, but I’m glad to see that they’re making a serious (billion-dollar+) investment in buying more compelling display tech. Per The Verge,
According to Raxium’s website, a Super AMOLED screen on your phone has a pixel pitch (the distance between the center of one pixel, and the center of another pixel next to it) of about 50 microns, while its MicroLED could manage around 3.5 microns. It also boasts of “unprecedented efficiency” that’s more than five times better than any world record.
How does any of this compare to what we’ll see out of Apple, Meta, Snap, etc.? I have no idea, but at least parts of the future promise to be fun.
2 new classes this summer on Machine Learning Art for Designers w/@dvsch – 8 weekly online classes on Thursday evenings starting June 23 & a short workshop June 20 Generating Images from Text (on campus & online) @cooperunion https://t.co/MYUeXE0fAm pic.twitter.com/nrvJZvKk3f
Each week we’ll cover a different aspect of machine learning. A short lecture covering theories and practices will be followed by demos using open-source web tools and a browser-based tool called Google Colab. In the last 3 weeks of class you’ll be given the chance to create your own project using the skills you’ve learned. Topics will include selecting the right model for your use case, gathering and manipulating datasets, and connecting your models to data sources such as audio, text, or numerical data. We’ll also talk a little ethics, because we can’t teach machine learning without a little ethics.
I really enjoyed this conversation—touching, as it does, on my latest fascination (AI-generated art via DALL•E) and myriad other topics. In fact, I plan to listen to it again—hopefully this time near a surface on which to jot down & share some of the most resonant observations. Meanwhile, I think you’ll find it thoughtful & stimulating.
In this episode of the podcast, Sam Harris speaks with Eric Schmidt about the ways artificial intelligence is shifting the foundations of human knowledge and posing questions of existential risk.
I really enjoyed this highly accessible overview from one of the creators of this game-changing engine, Aditya Ramesh:
The methods underlying DALL·E 2 are conceptually simple, and I thought people might find it interesting to understand how it works. Here's an explanation accessible to a non-ML audience: https://t.co/FOSAiL6YgF
Last year I took my then-11yo son Henry (aka my astromech droid) on a 2000-mile “Miodyssey” down Route 66 in my dad’s vintage Miata. It was a great way to see the country (see more pics & posts than you might ever want), and despite the tight quarters we managed not to kill one another—or to get slain by Anton Chigurh in an especially murdery Texas town (but that’s another story!).