Category Archives: DALL•E

DALL•E + Snapchat = Clothing synthesis + try-on

Though we don’t (yet?) have the ability to use 3D meshes (e.g. those generated from a photo of a person) to guide text-based synthesis through systems like DALL•E, here’s a pretty compelling example of making 2D art, then wrapping it onto a body in real time:

Ketchup goes AI…? Heinz puts DALL•E to work

Interesting, and of course inevitable:

“This emerging tech isn’t perfect yet, so we got some weird results along with ones that looked like Heinz—but that was part of the fun. We then started plugging in ketchup combination phrases like ‘impressionist painting of a ketchup bottle’ or ‘ketchup tarot card’ and the results still largely resembled Heinz. We ultimately found that no matter how we were asking, we were still seeing results that looked like Heinz.”

Pass the Kemp!

[Via Aaron Hertzmann]

More DALL•E + After Effects magic

Creator Paul Trillo (see previous) is back at it. Here’s new work + a peek into how it’s made:

DALL•E now depicts greater diversity

It’s cool & commendable to see OpenAI making improvements in the tricky area of increasing representation & diversity among the humans it depicts. From an email they sent today:

DALL·E now generates images of people that more accurately reflect the diversity of the world’s population. Thank you to everyone who has marked results as biased in our product; your feedback helped inform and evaluate this new mitigation, which we plan on refining as we gather more data and feedback.

People have been noticing & sharing examples, e.g. via this Reddit thread.

[Update: See their blog post for more details & examples.]

Using DALL•E to sharpen macro photography 👀

Synthesizing wholly new images is incredible, but as I noted in my recent podcast conversation, it may well be that surgical slices of tech like DALL•E will prove to be just as impactful—à la Content-Aware Fill emerging from a thin slice of the PatchMatch paper. In this case,

To fix the image, [Nicholas Sherlock] erased the blurry area of the ladybug’s body and then gave a text prompt that reads “Ladybug on a leaf, focus stacked high-resolution macro photograph.”

A keen eye will note that the bug’s spot pattern has changed, but it’s still the same bug. Pretty amazing.

“Taste is the new skill” in the age of DALL•E

I was thinking back yesterday to Ira Glass’s classic observations on the (productive) tension that comes from having developed a sense of taste but not yet the skills to create accordingly:

Independently I came across this encouraging tweet from digital artist Claire Silver:

As it happens, Claire’s Twitter bio includes the phrase “Taste is the new skill.” I’ve been thinking along these lines as tools like DALL•E & Imagen suddenly grant mass access to what previously required hard-won skill. When mechanical execution is taken largely off the table, what’s left? Maybe the sum total of your curiosity & life’s experiences—your developed perspective, your taste—is what sets you apart, making you you, letting you pair that uniqueness with better execution tools & thereby stand out. At least, y’know, until the next big language model drops. 🙃

New podcast: DALL•E & You & Me

On Friday I had a ball chatting with Brian McCullough and Chris Messina about the arrival of DALL•E & other generative-imaging tech on the Techmeme Ride Home podcast. The section intro begins at 31:30, with me chiming in at 35:45 & riffing for about 45 minutes. I hope you enjoy listening as much as I enjoy talking (i.e. one heck of a lot 😅), and I’d love to know what you think.

Image credit: August Kamp + DALL•E

I’ve gathered links to some of the topics we discussed:

  • Don’t Give Your Users Shit Work. Seriously. But knowing just where to draw the line between objectively wasteful crap (e.g. tedious file format conversion) and possibly welcome labor (e.g. laborious but meditative etching) isn’t always easy. What happens when you skip the proverbial 10,000 hours of practice required to master a craft? What happens when everyone in the gym is now using a mech suit that lifts 10,000 lbs.?
  • “Vemödalen: The Fear That Everything Has Already Been Done” is demonstrated with painful hilarity via accounts like Insta Repeat. (And to make it meta, there’s my repetition of the term.) “So we beat on, boats against the current, borne back ceaselessly into the past…” Or as Marshawn Lynch might describe running through one’s face, “Over & over, and over & over & over…”
  • As Louis CK deftly noted, “Everything is amazing & nobody’s happy.”
  • The disruption always makes me think of The Onion’s classic “Dolphins Evolve Opposable Thumbs”: “Holy f*ck, that’s it for us monkeys.” My new friend August replied with the armed dolphin below. 💪👀
  • A group of thoughtful creators recently mused on “What AI art means for human artists.” Like me, many of them likened this revolution to the arrival of photography in the 19th century. It immediately devalued much of what artists had labored for years to master—yet in doing so it freed them up to interpret the world more freely (think Impressionism, Cubism, etc.).
  • Content-Aware Fill was born from the amazing PatchMatch technology (see video). We got it into Photoshop by stripping it down to just one piece (inpainting), and I foresee similar streamlined applications of the many things DALL•E-type tech can do (layout creation, style transfer, and more).
  • StyleCLIP is my team’s effort to edit faces via text by combining OpenAI’s CLIP (part of DALL•E’s magic sauce) with NVIDIA’s StyleGAN. You can try it out here.
  • Longtime generative artist Mario Klingemann used GPT-3 to coin the term “Promptomancy.” I wonder how long these incantations & koans will remain central, and how quickly we’ll supplement or even supplant them with visual affordances (presets, sliders, grids, etc.).
  • O.C.-actor-turned-author Ben McKenzie wrote a book on crypto that promises to be sharp & entertaining, based on the interviews I’ve heard with him.
  • Check out the DALL•E-made 3D Lego Teslas that, at a glance, fooled longtime Pixar vet Guido Quaroni. I also love these gibberish-filled ZZ Top concert posters.
  • My grand-mentee (!) Joanne is the PM for DALL•E.
  • Bill Atkinson created MacPaint, blowing my 1984 mind with breakthroughs like flood fill. The arrival of DALL•E feels so similar.
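
To make the StyleCLIP bullet above a bit more concrete, here’s a toy sketch of the core idea: nudge a generator’s latent code by gradient ascent until a CLIP-style similarity score says the output matches the text. Everything here (the linear “generator,” the random “CLIP” embeddings) is an illustrative stand-in, not the actual StyleGAN or CLIP models:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: a linear "generator" maps an 8-D latent to a 16-D "image
# embedding," and the text prompt is a fixed unit "text embedding."
G = rng.normal(size=(8, 16))
text_emb = rng.normal(size=16)
text_emb /= np.linalg.norm(text_emb)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

latent = rng.normal(size=8)
init_sim = cosine(latent @ G, text_emb)

# Gradient ascent on cosine similarity, re-normalizing the latent so its
# scale (which doesn't affect cosine) stays fixed.
lr, norm = 0.3, np.linalg.norm(latent)
for _ in range(500):
    u = latent @ G
    nu = np.linalg.norm(u)
    grad_u = text_emb / nu - (u @ text_emb) * u / nu**3  # d(cosine)/du
    latent += lr * (G @ grad_u)                          # chain rule through G
    latent *= norm / np.linalg.norm(latent)

final_sim = cosine(latent @ G, text_emb)
print(round(init_sim, 3), "->", round(final_sim, 3))
```

In the real system the generator is StyleGAN, the embeddings come from CLIP, and the objective typically also includes terms that keep the edited face close to the original.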

DALL•E text as Italian comedic gibberish

Amidst my current obsession with AI-generated images, I’ve been particularly charmed by DALL•E’s penchant for rendering some delightfully whacked-out text, as in these concert posters:

This reminded me of an old Italian novelty song meant to show non-native English speakers what the language sounds like to non-speakers. Enjoy. 😛

Just scratching the surface on generative inpainting

I’m having a ball asking the system to create illustrations, after which I can select regions and generate new variations. Click/tap if needed to play the animation below:

It’s a lot of fun for photorealistic work, too. Here I erased everything but Russell Brown’s face, then let DALL•E synthesize the rest:

And check out what it did with a pic of my wife & our friend (“two women surrounded by numerous sugar skulls and imagery from día de los muertos, in the style of salvador dalí, digital art”). 💃🏻💀👀

“The AI that creates any picture you want, explained”

Obviously I’m almost criminally obsessed with DALL•E et al. (sorry if you wanted to see my normal filler here 😌). Here’s an accessible overview of how we got here & how it all works:

The vid below gathers a lot of emerging thoughts from sharp folks like my teammate Ryan Murdock & my friend Mario Klingemann. “Maybe the currency is ideas [vs. execution]. This is a future where everyone is an art director,” says Rob Sheridan. Check it out:

[Via Dave Dobish]

“Content-Aware Fill… cubed”: DALL•E inpainting is nuts

The technology’s ability not only to synthesize new content, but to match it to context, blows my mind. Check out this thread showing the results of filling in the gap in a simple cat drawing via various prompts. Some of my favorites are below:

Also, look at what it can build out around just a small sample image plus a text prompt (a chef in a sushi restaurant); just look at it!

Mobile DALL•E = My kind of location-based AR

I’ve long considered augmented reality apps to be “realtime Photoshop”—or perhaps more precisely, “realtime After Effects.” I think that’s true & wonderful, but most consumer AR tends to be ultra-confined filters that produce ~1 outcome well.

Walking around San Francisco today, I was struck by the thought that DALL•E & other emerging generative-art tools could—if made available via a simple mobile UI—offer a new kind of (almost) realtime Photoshop, with radically greater creative flexibility.

Here I captured a nearby sculpture, dropped out the background in Photoshop, uploaded it to DALL•E, and requested “a low-polygon metallic tree surrounded by big dancing robots and small dancing robots.” I like the results!

Meet “Imagen,” Google’s new AI image synthesizer

What a time to be alive…

Hard on the heels of OpenAI revealing DALL•E 2 last month, Google has announced Imagen, promising “unprecedented photorealism × deep level of language understanding.” Unlike DALL•E, it’s not yet available via a demo, but the sample images (below) are impressive.

I’m slightly amused to see Google flexing on DALL•E by highlighting Imagen’s strengths in figuring out spatial arrangements & coherent text (places where DALL•E sometimes currently struggles). The site claims that human evaluators rate Imagen output more highly than what comes from competitors (e.g. Midjourney).

I couldn’t be more excited about these developments—most particularly to figure out how such systems can enable amazing things in concert with Adobe tools & users.

What a time to be alive…

DALL•E vs. Reality: Flavawagon Edition

Heh—I got a kick out of seeing how AI would go about hallucinating its idea of what my flamed-out ’84 Volvo wagon looked like. See below for a comparison. And in retrospect, how did I not adorn mine with a tail light made from a traffic cone (or is it giant candy corn?) and “VOOFO NACK”? 😅

AI: Sam Harris talks with Eric Schmidt

I really enjoyed this conversation—touching, as it does, on my latest fascination (AI-generated art via DALL•E) and myriad other topics. In fact, I plan to listen to it again—hopefully this time near a surface on which to jot down & share some of the most resonant observations. Meanwhile, I think you’ll find it thoughtful & stimulating.

In this episode of the podcast, Sam Harris speaks with Eric Schmidt about the ways artificial intelligence is shifting the foundations of human knowledge and posing questions of existential risk.

NASA celebrates Hubble’s 32nd birthday with a lovely photo of five clustered galaxies

Honestly, from DALL•E innovations to classic mind-blowers like this, I feel like my brain is cooking in my head. 🙃 Take ’er away, science:

Bonus madness (see thread for details):

A free online face-swapping tool

My old boss on Photoshop, Kevin Connor, used to talk about the inexorable progression of imaging tools from the very general (e.g. the Clone Stamp) to the more specific (e.g. the Healing Brush). In the process, high-complexity, high-skill operations were rendered far more accessible—arguably to a fault. (I used to joke that believe it or not, drop shadows were cool before Photoshop made them easy. ¯\_(ツ)_/¯)

I think of that observation when seeing things like the Face Swap tool from Icons8. What once took considerable time & talent in an app like Photoshop is now rendered trivially fast (and free!) to do. “Days of Miracles & Wonder,” though we hardly even wonder now. (How long will it take DALL•E to go from blown minds to shrugged shoulders? But that’s a subject for another day.)

DALL•E 2 looks too amazing to be true

There’s no way this is real, is there?! I think it must use NFW technology (No F’ing Way), augmented with a side of LOL WTAF. 😛

Here’s an NYT video showing the system in action:

The NYT article offers a concise, approachable description of how the approach works:

A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognize the links between the images and the words.

When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.

Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realize these features. The latest version of DALL-E, unveiled on Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.

Though DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.
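
The two-stage design described above (a prior that maps text to a set of image features, then a diffusion model that turns noise into output matching those features) can be caricatured in a few lines of NumPy. The “prior” and the “denoiser” below are trivial stand-ins meant only to show the shape of the pipeline, not OpenAI’s actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1, a toy "prior": map a text embedding to target image features.
# (In DALL-E 2 this is a learned model; here it's a fixed random projection.)
W_prior = rng.normal(size=(8, 16))

def prior(text_embedding):
    return text_embedding @ W_prior

# Stage 2, a toy "diffusion decoder": start from pure noise and repeatedly
# take small denoising steps toward the target features.
def diffusion_decode(target, steps=100, step_size=0.1):
    x = rng.normal(size=target.shape)  # pure noise
    for _ in range(steps):
        x += step_size * (target - x)  # each step removes a bit of noise
    return x

text_embedding = rng.normal(size=8)    # stand-in for an encoded caption
target = prior(text_embedding)
decoded = diffusion_decode(target)

# After enough steps, the decoded sample sits very close to the target.
print(np.linalg.norm(decoded - target))
```

The real decoder learns to predict and subtract noise conditioned on the features; the loop above just preserves the two-stage structure: text to features, then iterative denoising to an image.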

I can’t wait to try it out.