Our friend Christian Cantrell (20-year Adobe vet, now VP of Product at Stability.ai) continues his invaluable work to plug the world of generative imaging directly into Photoshop. Check out the latest, available for free here:
Category Archives: DALL•E
Dalí meets DALL•E! 👨🏻‍🎨🤖
Among the great pleasures of this year’s revolutions in AI imaging has been the chance to discover & connect with myriad amazing artists & technologists. I’ve admired the work of Nathan Shipley, so I was delighted to connect him with my self-described “grand-mentee” Joanne Jang, PM for DALL•E. Nathan & his team collaborated with the Dalí Museum & OpenAI to launch Dream Tapestry, a collaborative realtime art-making experience.
The Dream Tapestry allows visitors to create original, realistic Dream Paintings from a text description. Then, it stitches a visitor’s Dream Painting together with five other visitors’ paintings, filling in the spaces between them to generate one collective Dream Tapestry. The result is an ever-growing series of entirely original Dream Tapestries, exhibited on the walls of the museum.
Check it out:
DALL•E arrives in Photoshop via the Flying Dog panel
I haven’t yet gotten to try this integration, but I’m excited to see it arrive.
🌿 Eat your heart out, Homer Simpson 🌿
Here’s another beautiful, DALL•E-infused collaboration between VFX whiz Paul Trillo & Shyama Golden:
DALL•E is now available to everyone
Whew—no more wheedling my “grand-mentee” Joanne on behalf of colleagues wanting access. 😅
Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.
You can sign up here. Also exciting:
We are currently testing a DALL·E API with several customers and are excited to soon offer it more broadly to developers and businesses so they can build apps on this powerful system.
It’s hard to overstate just how much this groundbreaking technology has rocked our whole industry—all since publicly debuting less than 6 months ago! Congrats to the whole team. I can’t wait to see what they’re cooking up next.
AR: Stepping inside famous paintings with a boost from DALL•E
Karen X. Cheng & pals (including my friend August Kamp) went to work extending famous works by Vermeer, Da Vinci, and Magritte, then placing them into an AR filter (which you can launch from the post) that lets you walk right into the scenes. Wild!
“Little Simple Creatures”: Family & game art-making with DALL•E
Creative director Wes Phelan shared this charming little summary of how he creates kids’ books & games using DALL•E, including its newly launched outpainting support:
John Oliver gets DALL•E-pilled
Judi Dench fighting a centaur on the moon!
Happy Friday. 😅
DALL•E outpainting arrives
Let the canvases extend in every direction! The thoughtfully designed new tiling UI makes it easy to synthesize adjacent chunks in sequence, partly overcoming current resolution limits in generative imaging:
Here’s a nice little demo from our designer Davis Brown, who takes his dad Russell’s surreal desert explorations to totally new levels:
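For the curious, the tiling idea can be sketched roughly like this: each new frame overlaps imagery that already exists, so the model has context to continue from, and only the remainder of the frame is newly synthesized. The 1024-pixel frame, the 256-pixel overlap, and the `frame_origins` helper are assumptions made for illustration, not details of OpenAI's editor.

```python
def frame_origins(canvas_width, frame=1024, overlap=256):
    """Left edges of successive frames that extend a canvas rightward.

    Each frame re-uses `overlap` pixels of existing imagery as context,
    so it contributes only `frame - overlap` new pixels.
    (Frame size and overlap are illustrative assumptions.)
    """
    if not 0 <= overlap < frame:
        raise ValueError("overlap must be non-negative and smaller than the frame")
    step = frame - overlap
    origins = [0]
    # Keep adding frames until the last one reaches the canvas edge.
    while origins[-1] + frame < canvas_width:
        origins.append(origins[-1] + step)
    return origins


print(frame_origins(2560))  # [0, 768, 1536]
```

Three frames cover a 2,560-pixel-wide canvas here, which hints at how working in sequence sidesteps the per-image resolution limit.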
Using DALL•E for generative fashion design
Amazing work from the always clever Karen X. Cheng, collaborating with Paul Trillo & others:
Speaking of Paul, here’s a fun new little VFX creation made using DALL•E:
AI is going to change VFX. This is a silly little experiment but it shows how powerful dall-e 2 is in generating elements into a pre existing video. These tools will become easier to use so when spectacle becomes cheap, ideas will prevail #aiart #dalle #ufo @openaidalle #dalle2 pic.twitter.com/XGHy9uY09H
— Paul Trillo (@paultrillo) August 30, 2022
CLIP interrogator reveals what your robo-artist assistant sees
Ever since DALL•E hit the scene, I’ve been wanting to know what words its model for language-image pairing would use to describe images:
Now the somewhat scarily named CLIP Interrogator promises exactly that kind of insight:
What do the different OpenAI CLIP models see in an image? What might be a good text prompt to create similar images using CLIP guided diffusion or another text to image model? The CLIP Interrogator is here to get you answers!
Here’s hoping it helps us get some interesting image -> text -> image flywheels spinning.
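As far as I can tell, the basic trick is to score candidate phrases against the image in a shared embedding space and keep the best matches. Here's a toy sketch of that ranking step. The three-number "embeddings" and the `rank_phrases` helper are made up for illustration; a real setup would get its vectors from a CLIP model rather than from hand-typed lists.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_phrases(image_vec, phrase_vecs, top_k=2):
    """Return the top_k phrases whose vectors best match the image vector."""
    ranked = sorted(
        phrase_vecs,
        key=lambda phrase: cosine(image_vec, phrase_vecs[phrase]),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up 3-D stand-ins; real embeddings have hundreds of dimensions.
image = [0.9, 0.1, 0.3]
phrases = {
    "oil painting": [0.8, 0.2, 0.4],
    "pencil sketch": [0.1, 0.9, 0.2],
    "3D render": [0.3, 0.1, 0.9],
}

print(rank_phrases(image, phrases))  # ['oil painting', '3D render']
```

Feed the winning phrases back into a text-to-image model and you have one turn of the image -> text -> image loop.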
AE + DALL•E = Concept car madness
More wildly impressive inpainting & animation from Paul Trillo:
DALL•E + Snapchat = Clothing synthesis + try-on
Though we don’t (yet?) have the ability to use 3D meshes (e.g. those generated from a photo of a person) to guide text-based synthesis through systems like DALL•E, here’s a pretty compelling example of making 2D art, then wrapping it onto a body in real time:
Asked #dalle2 to generate some jeans look in a style of Gustav Klimt, then put it on cloth template from the latest workshop from @SnapAR ✨👖 pic.twitter.com/lUH0YSqB1t
— Maxiм (@maximkuzlin) August 3, 2022
Ketchup goes AI…? Heinz puts DALL•E to work
Interesting, and of course inevitable:
“This emerging tech isn’t perfect yet, so we got some weird results along with ones that looked like Heinz—but that was part of the fun. We then started plugging in ketchup combination phrases like ‘impressionist painting of a ketchup bottle’ or ‘ketchup tarot card’ and the results still largely resembled Heinz. We ultimately found that no matter how we were asking, we were still seeing results that looked like Heinz.”
Pass the Kemp!
[Via Aaron Hertzmann]
More DALL•E + After Effects magic
Creator Paul Trillo (see previous) is back at it. Here’s new work + a peek into how it’s made:
Kids swoon as DALL•E brings their ideas into view
Nicely done; can’t wait to see more experiences like this.
Animated magic made via DALL•E + After Effects
DALL•E now depicts greater diversity
It’s cool & commendable to see OpenAI making improvements in the tricky area of increasing representation & diversity among the humans it depicts. From email they sent today:
DALL·E now generates images of people that more accurately reflect the diversity of the world’s population. Thank you to everyone who has marked results as biased in our product; your feedback helped inform and evaluate this new mitigation, which we plan on refining as we gather more data and feedback.
People have been noticing & sharing examples, e.g. via this Reddit thread.
[Update: See their blog post for more details & examples.]
Using DALL•E to sharpen macro photography 👀
Synthesizing wholly new images is incredible, but as I noted in my recent podcast conversation, it may well be that surgical slices of tech like DALL•E will prove to be just as impactful, a la Content-Aware Fill emerging from a thin slice of the PatchMatch paper. In this case,
To fix the image, [Nicholas Sherlock] erased the blurry area of the ladybug’s body and then gave a text prompt that reads “Ladybug on a leaf, focus stacked high-resolution macro photograph.”
A keen eye will note that the bug’s spot pattern has changed, but it’s still the same bug. Pretty amazing.
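Conceptually, a fix like this amounts to a masked composite: pixels outside the erased region stay as shot, and only the erased pixels get replaced by synthesized ones. Here's a toy sketch of that step, using tiny grids of numbers in place of images. The `composite` helper and the sample values are invented for illustration, and nothing here calls DALL•E itself.

```python
def composite(original, generated, erased):
    """Keep original pixels, except where `erased` is True."""
    return [
        [g if e else o for o, g, e in zip(o_row, g_row, e_row)]
        for o_row, g_row, e_row in zip(original, generated, erased)
    ]


# Tiny 2x3 "images": the original photo and a synthesized candidate.
original = [[10, 11, 12],
            [13, 14, 15]]
generated = [[90, 91, 92],
             [93, 94, 95]]
# True marks the pixels that were erased (the blurry part of the bug).
erased = [[False, True, True],
          [False, False, True]]

print(composite(original, generated, erased))  # [[10, 91, 92], [13, 14, 95]]
```

That the untouched pixels pass straight through is why the rest of the photo stays put while the spots get reinvented.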
“Taste is the new skill” in the age of DALL•E
I was thinking back yesterday to Ira Glass’s classic observations on the (productive) tension that comes from having developed a sense of taste but not yet the skills to create accordingly:
Independently I came across this encouraging tweet from digital artist Claire Silver:
As it happens, Claire’s Twitter bio includes the phrase “Taste is the new skill.” I’ve been thinking along these lines as tools like DALL•E & Imagen suddenly grant mass access to what previously required hard-won skill. When mechanical execution is taken largely off the table, what’s left? Maybe the sum total of your curiosity & life’s experiences—your developed perspective, your taste—is what sets you apart, making you you, letting you pair that uniqueness with better execution tools & thereby stand out. At least, y’know, until the next big language model drops. 🙃
New podcast: DALL•E & You & Me
On Friday I had a ball chatting with Brian McCullough and Chris Messina on the arrival of DALL•E & other generative-imaging tech on the Techmeme Ride Home podcast. The section intro begins at 31:30, with me chiming in at 35:45 & riffing for about 45 minutes. I hope you enjoy listening as much as I enjoy talking (i.e. one heck of a lot 😅), and I’d love to know what you think.
I’ve gathered links to some of the topics we discussed:
- Don’t Give Your Users Shit Work. Seriously. But knowing just where to draw the line between objectively wasteful crap (e.g. tedious file format conversion) and possibly welcome labor (e.g. laborious but meditative etching) isn’t always easy. What happens when you skip the proverbial 10,000 hours of practice required to master a craft? What happens when everyone in the gym is now using a mech suit that lifts 10,000 lbs.?
- “Vemödalen: The Fear That Everything Has Already Been Done,” is demonstrated with painful hilarity via accounts like Insta Repeat. (And to make it meta, there’s my repetition of the term.) “So we beat on, boats against the current, borne back ceaselessly into the past…” Or as Marshawn Lynch might describe running through one’s face, “Over & over, and over & over & over…”
- As Louis CK deftly noted, “Everything is amazing & nobody’s happy.”
- The disruption always makes me think of The Onion’s classic “Dolphins Evolve Opposable Thumbs”: “Holy f*ck, that’s it for us monkeys.” My new friend August replied with the armed dolphin below. 💪👀
- A group of thoughtful creators recently mused on “What AI art means for human artists.” Like me, many of them likened this revolution to the arrival of photography in the 19th century. It immediately devalued much of what artists had labored for years to master—yet in doing so it freed them up to interpret the world more freely (think Impressionism, Cubism, etc.).
- Content-Aware Fill was born from the amazing PatchMatch technology (see video). We got it into Photoshop by stripping it down to just one piece (inpainting), and I foresee similar streamlined applications of the many things DALL•E-type tech can do (layout creation, style transfer, and more).
- StyleCLIP is my team’s effort to edit faces via text by combining OpenAI’s CLIP (part of DALL•E’s magic sauce) with NVIDIA’s StyleGAN. You can try it out here.
- Longtime generative artist Mario Klingemann used GPT-3 to coin a name for Promptomancy. I wonder how long these incantations & koans will remain central, and how quickly we’ll supplement or even supplant them with visual affordances (presets, sliders, grids, etc.).
- O.C.-actor-turned-author Ben McKenzie wrote a book on crypto that promises to be sharp & entertaining, based on the interviews with him I’ve heard.
- Check out the DALL•E-made 3D Lego Teslas that, at a glance, fooled longtime Pixar vet Guido Quaroni. I also love these gibberish-filled ZZ Top concert posters.
- My grand-mentee (!) Joanne is the PM for DALL•E.
- Bill Atkinson created MacPaint, blowing my 1984 mind with breakthroughs like flood fill. The arrival of DALL•E feels so similar.
Two-Minute Papers dives into Google Imagen: “Outrageously Good!”
“I promised myself that I’d try not to flip out, but… holy mother of papers, look at that!” Take ‘er away, doc:
Here’s a great illustration of its text-handling chops:
DALL•E text as Italian comedic gibberish
Amidst my current obsession with AI-generated images, I’ve been particularly charmed by DALL•E’s penchant for rendering some delightfully whacked-out text, as in these concert posters:
This reminded me of an old Italian novelty song meant to convey what English sounds like to people who don’t speak it. Enjoy. 😛
Famous Paintings Expanded With DALL•E
Inpainting FTW! Check out a whole gallery of experiments, wherein the regions surrounding the original artwork are hallucinated via AI.
Just scratching the surface on generative inpainting
I’m having a ball asking the system to create illustrations, after which I can select regions and generate new variations. Click/tap if needed to play the animation below:
It’s a lot of fun for photorealistic work, too. Here I erased everything but Russell Brown’s face, then let DALL•E synthesize the rest:
And check out what it did with a pic of my wife & our friend (“two women surrounded by numerous sugar skulls and imagery from día de los muertos, in the style of salvador dalí, digital art”). 💃🏻💀👀
“The AI that creates any picture you want, explained”
Obviously I’m almost criminally obsessed with DALL•E et al. (sorry if you wanted to see my normal filler here 😌). Here’s an accessible overview of how we got here & how it all works:
The vid below gathers a lot of emerging thoughts from sharp folks like my teammate Ryan Murdock & my friend Mario Klingemann. “Maybe the currency is ideas [vs. execution]. This is a future where everyone is an art director,” says Rob Sheridan. Check it out:
[Via Dave Dobish]
“Content-Aware Fill… cubed”: DALL•E inpainting is nuts
The technology’s ability not only to synthesize new content, but to match it to context, blows my mind. Check out this thread showing the results of filling in the gap in a simple cat drawing via various prompts. Some of my favorites are below:
Also, look at what it can build out around just a small sample image plus a text prompt (a chef in a sushi restaurant); just look at it!
Google tech can generate 3D from text
“Skynet begins to learn at a geometric rate…”
While we’re all still getting our heads around the 2D image-generation magic of DALL•E, Imagen, MidJourney, and more, Google researchers are stepping into a new dimension as well with Dream Fields—synthesizing geometry simply from words.
Mobile DALL•E = My kind of location-based AR
I’ve long considered augmented reality apps to be “realtime Photoshop”—or perhaps more precisely, “realtime After Effects.” I think that’s true & wonderful, but most consumer AR tends to be ultra-confined filters that produce ~1 outcome well.
Walking around San Francisco today, it struck me that DALL•E & other emerging generative-art tools could, if made available via a simple mobile UI, offer a new kind of (almost) realtime Photoshop, with radically greater creative flexibility.
Here I captured a nearby sculpture, dropped out the background in Photoshop, uploaded it to DALL•E, and requested “a low-polygon metallic tree surrounded by big dancing robots and small dancing robots.” I like the results!
I’m suddenly craving a mobile #dalle app that lets me photograph things, select them/backgrounds, and then inpaint with prompts. Here’s a quick experiment based on a “tree” I just saw 🤖: pic.twitter.com/Sx3LAACOVs
— John Nack (@jnack) May 27, 2022
A warp-speed tour of DALL•E synthesis, variations, and editing
For my ADD peeps, here’s the gist in ~60s:
Meet “Imagen,” Google’s new AI image synthesizer
What a time to be alive…
Hard on the heels of OpenAI revealing DALL•E 2 last month, Google has announced Imagen, promising “unprecedented photorealism × deep level of language understanding.” Unlike DALL•E, it’s not yet available via a demo, but the sample images (below) are impressive.
I’m slightly amused to see Google flexing on DALL•E by highlighting Imagen’s strengths in figuring out spatial arrangements & coherent text (places where DALL•E sometimes currently struggles). The site claims that human evaluators rate Imagen output more highly than what comes from competitors (e.g. MidJourney).
I couldn’t be more excited about these developments—most particularly to figure out how such systems can enable amazing things in concert with Adobe tools & users.
Marques Brownlee talks DALL•E
This may be the most accessible overall intro & discussion I’ve seen, and it’s chock full of fun example output.
Even the system’s frequent text “fails” are often charmingly bizarre—like a snapshot of a dream that makes sense only while dreaming. Some faves from the vid above:
DALL•E vs. Reality: Flavawagon Edition
Heh—I got a kick out of seeing how AI would go about hallucinating its idea of what my flamed-out ’84 Volvo wagon looked like. See below for a comparison. And in retrospect, how did I not adorn mine with a tail light made from a traffic cone (or is it giant candy corn?) and “VOOFO NACK”? 😅
Quick demo: DALL•E 2 inpainting
Not yet having access to this system [taps mic impatiently], I’m just checking out its simple but effective interface from afar. Here’s how artists can designate specific regions in order to repopulate them:
[Via Cameron Smith]
AI: Sam Harris talks with Eric Schmidt
I really enjoyed this conversation, touching as it does on my latest fascination (AI-generated art via DALL•E) and myriad other topics. In fact, I plan to listen to it again, hopefully this time near a surface on which to jot down & share some of the most resonant observations. Meanwhile, I think you’ll find it thoughtful & stimulating.
In this episode of the podcast, Sam Harris speaks with Eric Schmidt about the ways artificial intelligence is shifting the foundations of human knowledge and posing questions of existential risk.
An illuminating peek inside DALL•E 2
I really enjoyed this highly accessible overview from one of the creators of this game-changing engine, Aditya Ramesh:
NASA celebrates Hubble’s 32nd birthday with a lovely photo of five clustered galaxies
Honestly, from DALL•E innovations to classic mind-blowers like this, I feel like my brain is cooking in my head. 🙃 Take ‘er away, science:
Bonus madness (see thread for details):
Famous logos recreated in Grotesque Middle Ages style
Heh—I love this kind of silly mashup. (And now I want to see what kind of things DALL•E would dream up for prompts like “medieval grotesque Burger King logo.”)
A free online face-swapping tool
My old boss on Photoshop, Kevin Connor, used to talk about the inexorable progression of imaging tools from the very general (e.g. the Clone Stamp) to the more specific (e.g. the Healing Brush). In the process, high-complexity, high-skill operations were rendered far more accessible—arguably to a fault. (I used to joke that believe it or not, drop shadows were cool before Photoshop made them easy. ¯\_(ツ)_/¯)
I think of that observation when seeing things like the Face Swap tool from Icons8. What once took considerable time & talent in an app like Photoshop is now rendered trivially fast (and free!) to do. “Days of Miracles & Wonder,” though we hardly even wonder now. (How long will it take DALL•E to go from blown minds to shrugged shoulders? But that’s a subject for another day.)
DALL•E 2 looks too amazing to be true
There’s no way this is real, is there?! I think it must use NFW technology (No F’ing Way), augmented with a side of LOL WTAF. 😛
Here’s an NYT video showing the system in action:
The NYT article offers a concise, approachable description of how the approach works:
A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognize the links between the images and the words.
When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.
Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realize these features. The latest version of DALL-E, unveiled on Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
Though DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.
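To make the "start from noise, refine step by step" flavor of that second stage a bit more concrete, here's a deliberately cartoonish sketch. It is not how DALL•E works internally: a real diffusion model uses a trained neural network to predict what noise to remove, whereas this toy `refine` function (a made-up name) is simply handed the target values and nudges random numbers toward them.

```python
import random


def refine(target, steps=50, rate=0.2, seed=0):
    """Start from random noise and nudge each value toward `target`.

    A stand-in for iterative denoising; the "model" here is just the
    known target, which a real system would not have.
    """
    rng = random.Random(seed)
    pixels = [rng.random() for _ in target]  # pure noise to begin with
    for _ in range(steps):
        # Remove a fixed fraction of the remaining error at every step.
        pixels = [p + rate * (t - p) for p, t in zip(pixels, target)]
    return pixels


target = [0.0, 0.5, 1.0]
result = refine(target)
print([round(p, 3) for p in result])
```

Each pass shrinks the remaining error by 20%, so after 50 passes the values sit within a hair of the target.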
I can’t wait to try it out.