Among the great pleasures of this year’s revolutions in AI imaging has been the chance to discover & connect with myriad amazing artists & technologists. I’ve admired the work of Nathan Shipley, so I was delighted to connect him with my self-described “grand-mentee” Joanne Jang, PM for DALL•E. Nathan & his team collaborated with the Dalí Museum & OpenAI to launch Dream Tapestry, a collaborative realtime art-making experience.
The Dream Tapestry allows visitors to create original, realistic Dream Paintings from a text description. Then, it stitches a visitor’s Dream Painting together with five other visitors’ paintings, filling in the spaces between them to generate one collective Dream Tapestry. The result is an ever-growing series of entirely original Dream Tapestries, exhibited on the walls of the museum.
Whew—no more wheedling my “grand-mentee” Joanne on behalf of colleagues wanting access. 😅
Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.
We are currently testing a DALL·E API with several customers and are excited to soon offer it more broadly to developers and businesses so they can build apps on this powerful system.
It’s hard to overstate just how much this groundbreaking technology has rocked our whole industry—all since publicly debuting less than 6 months ago! Congrats to the whole team. I can’t wait to see what they’re cooking up next.
Karen X. Cheng & pals (including my friend August Kamp) went to work extending famous works by Vermeer, Da Vinci, and Magritte, then placing them into an AR filter (which you can launch from the post) that lets you walk right into the scenes. Wild!
Let the canvases extend in every direction! The thoughtfully designed new tiling UI makes it easy to synthesize adjacent chunks in sequence, partly overcoming current resolution limits in generative imaging:
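For the curious, here's a rough sketch of the bookkeeping behind that kind of tiling: working out where each square chunk should sit so that every new chunk shares a strip of existing pixels with its neighbor, giving the model context to continue from. This is a hypothetical helper of my own, not OpenAI's code, and the 1024-pixel tile and 256-pixel overlap are assumptions picked for illustration.

```python
def tile_origins(canvas_w, canvas_h, tile=1024, overlap=256):
    """Return (x, y) top-left corners of square tiles covering the canvas.

    Hypothetical helper: neighboring tiles overlap by at least `overlap`
    pixels so each new chunk can be synthesized against existing pixels.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap

    def axis(length):
        # Positions along one axis; the final tile is clamped to the edge.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)
        return positions

    # Row-major order: fill left to right, then top to bottom.
    return [(x, y) for y in axis(canvas_h) for x in axis(canvas_w)]


if __name__ == "__main__":
    # A 2048x1024 canvas needs three overlapping 1024px tiles across.
    print(tile_origins(2048, 1024))  # [(0, 0), (768, 0), (1024, 0)]
```

Clamping the last tile to the edge (rather than letting it hang off the canvas) means the final chunk overlaps its neighbor more than strictly necessary, which seems like a reasonable trade for never generating pixels you'd throw away.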
Here’s a nice little demo from our designer Davis Brown, who takes his dad Russell’s surreal desert explorations to totally new levels:
Ever since DALL•E hit the scene, I’ve been wanting to know what words its model for language-image pairing would use to describe images:
Now the somewhat scarily named CLIP Interrogator promises exactly that kind of insight:
What do the different OpenAI CLIP models see in an image? What might be a good text prompt to create similar images using CLIP guided diffusion or another text to image model? The CLIP Interrogator is here to get you answers!
Here’s hoping it helps us get some interesting image -> text -> image flywheels spinning.
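The core trick behind a tool like this, as I understand it, is scoring: CLIP maps images and text into a shared embedding space, and candidate phrases whose vectors point in roughly the same direction as the image's vector make better descriptions. Here's a toy sketch of that ranking step using cosine similarity. The three-number "embeddings" are made up for illustration (real CLIP vectors have hundreds of dimensions and come from the model itself), and the function names are hypothetical, not the CLIP Interrogator's actual code.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_captions(image_vec, caption_vecs):
    """Return candidate captions sorted from best to worst match for the image."""
    scored = [(cosine(image_vec, vec), text) for text, vec in caption_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Made-up 3-D "embeddings" standing in for real CLIP outputs.
image = [0.9, 0.1, 0.0]
captions = {
    "an oil painting": [0.0, 1.0, 0.0],
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a city at night": [0.0, 0.0, 1.0],
}
print(rank_captions(image, captions))
# ['a photo of a cat', 'an oil painting', 'a city at night']
```

Feed the top-ranked phrases back in as a prompt and you've got one turn of that image -> text -> image flywheel.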
Though we don’t (yet?) have the ability to use 3D meshes (e.g. those generated from a photo of a person) to guide text-based synthesis through systems like DALL•E, here’s a pretty compelling example of making 2D art, then wrapping it onto a body in real time:
“This emerging tech isn’t perfect yet, so we got some weird results along with ones that looked like Heinz—but that was part of the fun. We then started plugging in ketchup combination phrases like ‘impressionist painting of a ketchup bottle’ or ‘ketchup tarot card’ and the results still largely resembled Heinz. We ultimately found that no matter how we were asking, we were still seeing results that looked like Heinz.”
It’s cool & commendable to see OpenAI making improvements in the tricky area of increasing representation & diversity among the humans it depicts. From an email they sent today:
DALL·E now generates images of people that more accurately reflect the diversity of the world’s population. Thank you to everyone who has marked results as biased in our product; your feedback helped inform and evaluate this new mitigation, which we plan on refining as we gather more data and feedback.
Synthesizing wholly new images is incredible, but as I noted in my recent podcast conversation, it may well be that surgical slices of tech like DALL•E will prove to be just as impactful—à la Content-Aware Fill emerging from a thin slice of the PatchMatch paper. In this case,
To fix the image, [Nicholas Sherlock] erased the blurry area of the ladybug’s body and then gave a text prompt that reads “Ladybug on a leaf, focus stacked high-resolution macro photograph.”
A keen eye will note that the bug’s spot pattern has changed, but it’s still the same bug. Pretty amazing.
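Conceptually, the erased area acts as a mask: pixels outside it are kept untouched, and only pixels inside it are replaced by newly generated ones. Below is a minimal sketch of just that final compositing step on tiny grayscale grids. The "generated" pixels are stubbed in by hand as an assumption; as I understand it, the real model also conditions on the surrounding, unerased context when synthesizing the fill, and none of this is meant as OpenAI's implementation.

```python
def composite(original, generated, mask):
    """Keep original pixels where mask is 0; use generated pixels where mask is 1."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("images and mask must have the same number of rows")
    out = []
    for orow, grow, mrow in zip(original, generated, mask):
        if not (len(orow) == len(grow) == len(mrow)):
            raise ValueError("images and mask must have the same width")
        out.append([g if m else o for o, g, m in zip(orow, grow, mrow)])
    return out


# Tiny 3x3 grayscale example: the center pixel was "erased" (blurry).
original = [[10, 10, 10], [10, 99, 10], [10, 10, 10]]
generated = [[50, 50, 50], [50, 42, 50], [50, 50, 50]]  # stand-in for model output
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(composite(original, generated, mask))
# [[10, 10, 10], [10, 42, 10], [10, 10, 10]]
```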
I was thinking back yesterday to Ira Glass’s classic observations on the (productive) tension that comes from having developed a sense of taste but not yet the skills to create accordingly:
Independently, I came across this encouraging tweet from digital artist Claire Silver:
As it happens, Claire’s Twitter bio includes the phrase “Taste is the new skill.” I’ve been thinking along these lines as tools like DALL•E & Imagen suddenly grant mass access to what previously required hard-won skill. When mechanical execution is taken largely off the table, what’s left? Maybe the sum total of your curiosity & life’s experiences—your developed perspective, your taste—is what sets you apart, making you you, letting you pair that uniqueness with better execution tools & thereby stand out. At least, y’know, until the next big language model drops. 🙃
I’ve gathered links to some of the topics we discussed:
Don’t Give Your Users Shit Work. Seriously. But knowing just where to draw the line between objectively wasteful crap (e.g. tedious file format conversion) and possibly welcome labor (e.g. laborious but meditative etching) isn’t always easy. What happens when you skip the proverbial 10,000 hours of practice required to master a craft? What happens when everyone in the gym is now using a mech suit that lifts 10,000 lbs.?
“Vemödalen: The Fear That Everything Has Already Been Done” is demonstrated with painful hilarity via accounts like Insta Repeat. (And to make it meta, there’s my repetition of the term.) “So we beat on, boats against the current, borne back ceaselessly into the past…” Or as Marshawn Lynch might describe running through one’s face, “Over & over, and over & over & over…”
The disruption always makes me think of The Onion’s classic “Dolphins Evolve Opposable Thumbs”: “Holy f*ck, that’s it for us monkeys.” My new friend August replied with the armed dolphin below. 💪👀
A group of thoughtful creators recently mused on “What AI art means for human artists.” Like me, many of them likened this revolution to the arrival of photography in the 19th century. It immediately devalued much of what artists had labored for years to master—yet in doing so it freed them to interpret the world more expressively (think Impressionism, Cubism, etc.).
Content-Aware Fill was born from the amazing PatchMatch technology (see video). We got it into Photoshop by stripping it down to just one piece (inpainting), and I foresee similar streamlined applications of the many things DALL•E-type tech can do (layout creation, style transfer, and more).
Longtime generative artist Mario Klingemann used GPT-3 to coin the term “Promptomancy.” I wonder how long these incantations & koans will remain central, and how quickly we’ll supplement or even supplant them with visual affordances (presets, sliders, grids, etc.).
O.C.-actor-turned-author Ben McKenzie wrote a book on crypto that promises to be sharp & entertaining, based on the interviews with him I’ve heard.
Obviously I’m almost criminally obsessed with DALL•E et al. (sorry if you wanted to see my normal filler here 😌). Here’s an accessible overview of how we got here & how it all works:
The vid below gathers a lot of emerging thoughts from sharp folks like my teammate Ryan Murdock & my friend Mario Klingemann. “Maybe the currency is ideas [vs. execution]. This is a future where everyone is an art director,” says Rob Sheridan. Check it out:
The technology’s ability not only to synthesize new content, but to match it to context, blows my mind. Check out this thread showing the results of filling in the gap in a simple cat drawing via various prompts. Some of my favorites are below:
While we’re all still getting our heads around the 2D image-generation magic of DALL•E, Imagen, MidJourney, and more, Google researchers are stepping into a new dimension as well with Dream Fields—synthesizing geometry simply from words.
I’ve long considered augmented reality apps to be “realtime Photoshop”—or perhaps more precisely, “realtime After Effects.” I think that’s true & wonderful, but most consumer AR tends to be ultra-confined filters that produce ~1 outcome well.
While walking around San Francisco today, I was struck by the thought that DALL•E & other emerging generative-art tools could—if made available via a simple mobile UI—offer a new kind of (almost) realtime Photoshop, with radically greater creative flexibility.
Here I captured a nearby sculpture, dropped out the background in Photoshop, uploaded it to DALL•E, and requested “a low-polygon metallic tree surrounded by big dancing robots and small dancing robots.” I like the results!
I’m suddenly craving a mobile #dalle app that lets me photograph things, select them/backgrounds, and then inpaint with prompts. Here’s a quick experiment based on a “tree” I just saw 🤖:
Hard on the heels of OpenAI revealing DALL•E 2 last month, Google has announced Imagen, promising “unprecedented photorealism × deep level of language understanding.” Unlike DALL•E, it’s not yet available via a demo, but the sample images (below) are impressive.
I’m slightly amused to see Google flexing on DALL•E by highlighting Imagen’s strengths in figuring out spatial arrangements & coherent text (places where DALL•E sometimes currently struggles). The site claims that human evaluators rate Imagen output more highly than what comes from competitors (e.g. MidJourney).
I couldn’t be more excited about these developments—most particularly to figure out how such systems can enable amazing things in concert with Adobe tools & users.
Heh—I got a kick out of seeing how AI would go about hallucinating its idea of what my flamed-out ’84 Volvo wagon looked like. See below for a comparison. And in retrospect, how did I not adorn mine with a tail light made from a traffic cone (or is it giant candy corn?) and “VOOFO NACK”? 😅
Not yet having access to this system [taps mic impatiently], I’m just checking out its simple but effective interface from afar. Here’s how artists can designate specific regions in order to repopulate them:
I really enjoyed this conversation—touching, as it does, on my latest fascination (AI-generated art via DALL•E) and myriad other topics. In fact, I plan to listen to it again—hopefully this time near a surface through which to jot down & share some of the most resonant observations. Meanwhile, I think you’ll find it thoughtful & stimulating.
In this episode of the podcast, Sam Harris speaks with Eric Schmidt about the ways artificial intelligence is shifting the foundations of human knowledge and posing questions of existential risk.
My old boss on Photoshop, Kevin Connor, used to talk about the inexorable progression of imaging tools from the very general (e.g. the Clone Stamp) to the more specific (e.g. the Healing Brush). In the process, high-complexity, high-skill operations were rendered far more accessible—arguably to a fault. (I used to joke that, believe it or not, drop shadows were cool before Photoshop made them easy. ¯\_(ツ)_/¯)
I think of that observation when seeing things like the Face Swap tool from Icons8. What once took considerable time & talent in an app like Photoshop is now trivially fast (and free!) to do. “Days of Miracles & Wonder,” though we hardly even wonder now. (How long will it take DALL•E to go from blown minds to shrugged shoulders? But that’s a subject for another day.)
There’s no way this is real, is there?! I think it must use NFW technology (No F’ing Way), augmented with a side of LOL WTAF. 😛
Here’s an NYT video showing the system in action:
The NYT article offers a concise, approachable description of how the approach works:
A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognize the links between the images and the words.
When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.
Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realize these features. The latest version of DALL-E, unveiled on Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
Though DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.
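To make the “diffusion model” part a little less abstract: models of this kind are trained by progressively adding Gaussian noise to images and learning to undo it, then generate new images by running that process in reverse, starting from pure noise. Below is a sketch of just the forward (noising) half, x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, using the linear noise schedule from the original DDPM paper (Ho et al., 2020). Those schedule values are that paper's defaults, not anything the NYT piece or OpenAI specifies for DALL-E, and the function names are hypothetical.

```python
import math
import random


def linear_alpha_bars(steps=1000, beta_start=1e-4, beta_end=0.02):
    """ᾱ_t = product of (1 - β_s) for s <= t, with β rising linearly (DDPM defaults)."""
    alpha_bars, running = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noisy_sample(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ q(x_t | x_0) = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε, with ε ~ N(0, 1)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = linear_alpha_bars()
pixels = [0.5, -0.2, 0.8]  # toy pixel values scaled to roughly [-1, 1]
rng = random.Random(0)
print(noisy_sample(pixels, 0, alpha_bars, rng))    # almost the original pixels
print(noisy_sample(pixels, 999, alpha_bars, rng))  # essentially pure noise
```

The generative half trains a network to predict the added noise ε from x_t, then applies that prediction step by step from t = 999 back to 0; that reverse network is the part doing the real work, and it's far beyond a blog-sized snippet.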