Generative AI: Nuance > Sanctimony

Listen, I know that it’s a lot more seductive & cathartic to say “I f*cking hate generative AI,” and you can get 90,000+ likes for doing so, but—believe it or not—thoughtfulness & nuance actually matter. That is, how one uses generative tech can have very different implications for the creative community.

It’s therefore important to evaluate a range of risk/reward scenarios: What’s unambiguously useful & low-risk, vs. what’s an inducement to ripping people off, and what lies in the middle?

I see a continuum like this:

None of this will draw any attention or generate much conversation—at least if my attempts to engage people on Twitter are any indication—but it’s the kind of thing actual toolmakers must engage with if we’re to make progress together. And so, back to work.

PS—This, always this:

Reinterpreting classic instrument clusters in the age of CarPlay

“Tell me about a product you hate that you use regularly.” I asked this question of hundreds of Google PM candidates I interviewed, and it was always a great bozo detector. Most people don’t have much of an answer—no real passion or perspective. I want to know not just what sucks, but why it sucks.

If I were asked the same question, I’d immediately say “Every car infotainment system ever made.” As Tolstoy might say, “Each one is unhappy in its own way.” The most interesting thing, I think, isn’t just to talk about the crappy mismatched & competing experiences, but rather about why every system I’ve ever used sucks. The answer can’t be “Every person at every company is a moron”—so what is it?

So much comes down to the structure of the industry, with hardware & software being made by a mishmash of corporate frenemies, all contending with a soup of regulations, risk aversion (one recall can destroy the profitability of a whole product line), and surprisingly bargain-bin electronics.

Despite all that, talented folks continue to fight the good fight, and I enjoyed John LePore’s speculative designs that reinterpret the instrument clusters of classic cars (from Corvettes to DeLoreans) through Apple’s latest CarPlay framework:

Ahnuld’s Fables

My friend Nathan has fed a mix of Schwarzenegger photos & drawings from Aesop’s Fables into the new open-source Flux model, creating a rad woodcut style. That’s interesting enough on its own—but it’s so 24 hours ago, and thus he’s now taken to animating the results. Check out the thread below for details:

Pixel 9 adds on-device image generation

It’s wild that capabilities that blew our minds two years ago—for which I & others spent months on a waiting list for DALL•E, which demanded beefy servers to run—are now available (only better) running in your pocket, on your telephone. Check out the latest from Google:

Pixel Studio is a first-of-its-kind image generator. So now you can bring all ideas to life from scratch, right on your phone — a true creative canvas.

It’s powered by combining an on-device diffusion model running on Tensor G4 and our Imagen 3 text-to-image model in the cloud. With a UI optimized for easy prompting, style changes and editing, you can quickly bring your ideas to conversations with friends and family.

Days of Miracles & Wonder, as always…

Google Pixel introduces an interactive “Add Me” feature

Back when I worked on Google Photos, and especially later when I worked in Research, I really wanted to ship a camera mode that would help ensure great group photos. Prior to the user pressing the capture button, it would observe the incoming video stream, notice when it had at least one instance of each face smiling with their eyes open, and then knit together a single image in which everyone looked good.
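To make the frame-selection idea concrete, here is a minimal sketch of the "watch the stream until every face has one good moment" logic. It is only an illustration under stated assumptions, not anything Google shipped: the `FaceObservation` type, the `face_id` labels, and the smile/eyes-open flags are all hypothetical stand-ins for whatever a real face tracker would report, and the final step of knitting the chosen faces into one image is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceObservation:
    """One face seen in one frame (hypothetical tracker output)."""
    face_id: str      # assumed stable identity label for a person
    smiling: bool
    eyes_open: bool


def looks_good(obs):
    """A face is usable when it is smiling with its eyes open."""
    return obs.smiling and obs.eyes_open


def pick_source_frames(frames, expected_faces):
    """Scan frames in order and return {face_id: frame_index} as soon as
    every expected face has at least one good frame; otherwise None.

    `frames` is a list where each item is the list of FaceObservation
    objects detected in that frame.
    """
    best = {}
    for index, observations in enumerate(frames):
        for obs in observations:
            if obs.face_id in expected_faces and looks_good(obs):
                # Prefer the most recent good frame for each face.
                best[obs.face_id] = index
        if len(best) == len(expected_faces):
            return best
    return None


frames = [
    [FaceObservation("ann", True, True), FaceObservation("bo", False, True)],
    [FaceObservation("ann", True, False), FaceObservation("bo", True, True)],
]
print(pick_source_frames(frames, {"ann", "bo"}))
```

In this toy stream no single frame has both people looking good, so the result maps each person to a different source frame, which is exactly the situation where a merged image would help.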

Of course, the idea was hardly new: I’d done the same thing manually with my own wedding photos back in 2005, and in 2013 Google+ introduced “AutoAwesome Smile” to select good expressions across images & merge them into a single shot. It was a great feature, though sadly the only time people noticed its existence was when it failed in often hilarious “AutoAwful” ways (turning your baby or dog into, say, a two-nosed Picasso). My idea was meant to improve on this by not requiring multiple photos, and of course by suppressing unwanted hilarity.

Anyway, Googlers gonna Google, and now the Pixel team has introduced an interactive mode that helps you capture & merge two shots—the first one of a group, and the second of the photographer who took the first. Check out Marques Brownlee’s 1-minute demo:

For more details, check out his full review of Google’s new devices.

That’s all well and good—but wake me when they decide to bring back David Hasselhoff photobombs:

 

Uizard & the future of AI-assisted design

Uizard (“Wizard”), which was recently acquired by Miro, has rolled out Autodesigner 2.0:

We take the intuitive conversational flow of ChatGPT and merge it with Uizard generative UI capabilities and drag-and-drop editor, to provide you with an intuitive UI design generator. You can turn a couple of ideas into a digital product design concept in a flash!

I’m really curious to see how the application of LLMs & conversational AI reshapes the design process, from ideation & collaboration to execution, deployment, and learning—and I’d love to hear your thoughts! Meanwhile here’s a very concise look at how Autodesigner works:

And if that piques your interest, here’s a more in-depth look:

A little birthday lunacy

I fondly recall Andy Samberg saying years ago that they’d sometimes cook up a sketch that would air at the absolute tail end of Saturday Night Live, be seen by almost no one, and be gotten by far fewer still—and yet for, like, 10,000 kids, it would become their favorite thing ever.

Given that it was just my birthday, I’ve dug up such an old… gem (?). This is why I’ve spent the last ~25 years hearing Jack Black belting out “Ha-ppy Birth-DAYYY!!” Enjoy (?!).

“Top Billing,” huge egos, and the art of title design

99% Invisible is back at it, uncovering hidden but fascinating bits of design in action. This time around it’s concerned with the art of movie title & poster design—specifically with how to deal with actors who insist on being top billed. In the case of the otherwise forgotten movie Outrageous Fortune:

Two different prints of the movie were made, one listing Shelley Long’s name first and the other listing Bette Midler’s name first. Not only that, two different covers to take-home products (LaserDisc and VHS) were also made, with different names first. The art was mirrored, so that the names aligned with the actors’ images.

One interesting pattern that’s emerged is to place one actor’s name in the lower left & another in the upper right—thus deliberately conflicting with normal reading order in English:

Anyway, as always with this show, just trust me—the subject is way more interesting than you might think.

Throwback: “Behind the scenes with Olympians & Google’s AR ‘Scan Van’”

I’m old enough to remember 2020, when we sincerely (?) thought that everyone would be excited to put 3D-scanned virtual Olympians onto their coffee tables… or something. (Hey, it was fun while it lasted! And it temporarily kept a bunch of graphics nerds from having to slink back to the sweatshop grind of video game development.)

Anyway, here’s a look back to what Google was doing around augmented reality and the 2020 (’21) Olympics:


I swear I spent half of last summer staring at tiny 3D Naomi Osaka volleying shots on my desktop. I remain jealous of my former teammates who got to work with these athletes (and before them, folks like Donald Glover as Childish Gambino), even though doing so meant dealing with a million Covid safety protocols. Here’s a quick look at how they captured folks flexing & flying through space:

A post shared by Google (@google)

You can play with the content just by searching:

[Via Chikezie Ejiasi]

AI stuff I need to see in Photoshop

…and other creative imaging tools, stat!

Google Research has devised “Alchemist,” a new way to swap object textures:

And people keep doing wonderful things with realtime image synthesis:

“How To Draw An Owl,” AI edition

Always pushing the limits of expressive tech, Martin Nebelong has paired Photoshop painting with AI rendering, followed by Runway’s new image-to-video model. “Days of Miracles & Wonder,” as always:

Meta releases SAM 2 for fast segmentation

Man, I’m old enough to remember rotoscoping video by hand—a process that quickly made me want to jump right out a window. Years later, when we were working on realtime video segmentation at Google, I was so proud to show the tech to a bunch of high school design students—only to have them shrug and treat it as completely normal.

Ah, but so it goes: “One of history’s few iron laws is that luxuries tend to become necessities and to spawn new obligations. Once people get used to a certain luxury, they take it for granted.” — Yuval Noah Harari

In any case, Meta has just released what looks like a great update to their excellent—and open-source—Segment Anything Model. Check it out:

You can play with the demo and learn more on the site:

  • Following up on the success of the Meta Segment Anything Model (SAM) for images, we’re releasing SAM 2, a unified model for real-time promptable object segmentation in images and videos that achieves state-of-the-art performance.
  • In keeping with our approach to open science, we’re sharing the code and model weights with a permissive Apache 2.0 license.
  • We’re also sharing the SA-V dataset, which includes approximately 51,000 real-world videos and more than 600,000 masklets (spatio-temporal masks).
  • SAM 2 can segment any object in any video or image—even for objects and visual domains it has not seen previously, enabling a diverse range of use cases without custom adaptation.

Neural rendering: Neo + Firefly

Back when we launched Firefly (alllll the way back in March 2023), we hinted at the potential of combining 3D geometry with diffusion-based rendering, and I tweeted out a very early sneak peek:

A year+ later, I’m no longer working to integrate the Babylon 3D engine into Adobe tools, and instead I’m working directly with the Babylon team at Microsoft (!). Meanwhile I like seeing how my old teammates are continuing to explore integrations between 3D (in this case, Project Neo) and Firefly. Here’s one quick flow:

Here’s a quick exploration from the always-interesting Martin Nebelong:

And here’s a fun little Neo->Firefly->AI video interpolation test from Kris Kashtanova:

AI in Ai: Illustrator adds Vector GenFill

As I’ve probably mentioned already, when I first surveyed Adobe customers a couple of years ago (right after DALL•E & Midjourney first shipped), it was clear that they wanted selective synthesis—adding things to compositions, and especially removing them—much more strongly than whole-image synthesis.

Thus it’s no surprise that Generative Fill in Photoshop has so clearly delivered Firefly’s strongest product-market fit, and I’m excited to see Illustrator following the same path—but for vectors:

Generative Shape Fill will help you improve your workflow including:

  • Create detailed, scalable vectors: After you draw or select your shape, silhouette, or outline in your artboard, use a text prompt to ideate on vector options to fill it.
  • Style Reference for brand consistency: Create a wide variety of options that match the color, style, and shape of your artwork to ensure a consistent look and feel.
  • Add effects to your creations: Enhance your vector options further by adding styles like 3D, geometric, pixel art or more.

They’re also adding the ability to create vector patterns simply via prompting:

Photoshop’s new Selection Brush helps control GenFill

Soon after Generative Fill shipped last year, people discovered that using a semi-opaque selection could help blend results into an environment (e.g. putting fish under water). The new Selection Brush in Photoshop takes functionality that’s been around for 30+ years (via Quick Mask mode) and brings it more to the surface, which in turn makes it easier to control GenFill behavior:

Magnific magic comes to Photoshop

I’m delighted to see that Magnific is now available as a free Photoshop panel!

For now the functionality is limited to upscaling, but I have to think that they’ll soon turn on the super cool relighting & restyling tech that enables fun like transforming my dog using just different prompts:

Realtime face editing with LivePortrait

I wish Adobe hadn’t given up (at least for the last couple of years and foreseeable future) on the Smart Portrait tech we were developing. It’s been stuck at 1.0 since 2020 and could be so much better. Maybe someday!

In the meantime, check out LivePortrait:

And now you can try it out for yourself:

tyFlow: Stable Diffusion-based rendering in 3ds Max

Being able to declare what you want, instead of having to painstakingly set up parameters for materials, lighting, etc. may prove to be an incredible unlock for visual expressivity, particularly around the generally intimidating realm of 3D. Check out what tyFlow is bringing to the table:

You can see a bit more about how it works in this vid…

…or a lot more in this one:

How I wish Photoshop would embrace AI

Years ago Adobe experimented with a real-time prototype of Photoshop’s Landscape Mixer Neural Filter, and the resulting responsiveness made one feel like a deity—fluidly changing summer to winter & back again. I was reminded of using Google Earth VR, where grabbing & dragging th

Nothing came of it, but in the time since then, realtime diffusion rendering (see amazing examples from Krea & others) and image-to-image restyling have opened some amazing new doors. I wish I could attach filters to any layer in Photoshop (text, 3D, shape, image) and have it reinterpreted like this:

Magic Insert promises stylistically harmonized compositing

New tech from my old Google teammates makes some exciting claims:

Using Magic Insert we are, for the first time, able to drag-and-drop a subject from an image with an arbitrary style onto another target image with a vastly different style and achieve a style-aware and realistic insertion of the subject into the target image.

Of course, much of the challenge here—where art meets science—is around identity preservation: to what extent can & should the output resemble the input? Here it’s subject to some interpretation. In other applications one wants an exact copy of a given person or thing, but optionally transformed in just certain ways (e.g. pose & lighting).

When we launched Firefly last year, we showed off some of Adobe’s then-new ObjectStitch tech for making realistic composites. It didn’t ship while I was there due to challenges around identity preservation. As far as I know those challenges remain only partially solved, so I’ll continue holding out hope—as I have for probably 30 years now!—for future tech breakthroughs that get us all the way across that line.

Day & Night, Magnific + Luma Edition

Check out this striking application of AI-powered relighting: a single rendering is deeply & realistically transformed via one AI tool, and the results are then animated & extended by another.

Meanwhile Krea has just jumped into the game with similar-looking relighting tech. I’m off to check it out!

Luma’s AI meme machine rolls on

Days of Miracles & Wonder, as always…

Here’s a micro tutorial on how to create similar effects:

Can you use Photoshop GenFill on video?

Well, it doesn’t create animated results, but it can work perhaps surprisingly well on regions in static shots:

It can also be used to expand the canvas of similar shots:

OMG: DALL•E -> LEGO

Much amaze, wowo wowo:

This Lego machine can easily create a beautiful pixelart of anything you want! It is programmed in Python, and, with help of OpenAI’s DALL-E 3, it can make anything!

DesignBoom writes,

Sten of the YouTube channel Creative Mindstorms demonstrates his very own robot printer named Pixelbot 3000, made of LEGO bricks, that can produce pixel art with the help of OpenAI’s DALL-E 3 and AI images. Using a 32 x 32 plate and numerous round LEGO bricks, the robot printer automatically pins the pieces onto their designated positions until it forms the pixel art version of the image. He uses Python as his main programming language, and to create pixel art of anything, he employs AI, specifically OpenAI’s DALL-E 3.
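For the curious, the image-to-bricks step can be approximated in a few lines of Python. This is a rough sketch under my own assumptions, not Sten's actual code: the four-color `PALETTE` is made up (the real set of brick colors isn't stated), the input is assumed to be a plain list of rows of (R, G, B) tuples, and the sampling is simple nearest-neighbour rather than anything smarter.

```python
# Hypothetical brick colors as (R, G, B); the real build's palette is not stated.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (200, 0, 0),
    "blue": (0, 0, 200),
}


def nearest_color(pixel, palette=PALETTE):
    """Return the name of the palette color closest to `pixel` in RGB space."""
    def squared_distance(name):
        r, g, b = palette[name]
        return (pixel[0] - r) ** 2 + (pixel[1] - g) ** 2 + (pixel[2] - b) ** 2
    return min(palette, key=squared_distance)


def to_brick_grid(image, size=32, palette=PALETTE):
    """Downsample `image` (rows of RGB tuples) to a size x size grid of
    palette color names, one entry per stud on the plate."""
    height, width = len(image), len(image[0])
    grid = []
    for gy in range(size):
        src_y = gy * height // size          # nearest-neighbour row
        row = []
        for gx in range(size):
            src_x = gx * width // size       # nearest-neighbour column
            row.append(nearest_color(image[src_y][src_x], palette))
        grid.append(row)
    return grid


# Toy 64x64 input: reddish left half, bluish right half.
image = [[(210, 10, 10) if x < 32 else (5, 5, 190) for x in range(64)]
         for y in range(64)]
grid = to_brick_grid(image)
print(grid[0][0], grid[0][31])
```

Each cell of the resulting grid names the brick a printer would place at that position on the 32 x 32 plate; a real build would presumably also need to sequence the placements and track brick inventory.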

Glif enables SD-powered image remixing via right click

Fun! You can grab the free browser extension here.

  • right-click-remix any image w/ tons of amazing AI presets: Style Transfer, Controlnets…
  • build & remix your own workflows with full comfyUI support
  • local + cloud!

besides some really great default presets using all sorts of amazing ComfyUI workflows (which you can inspect and remix on http://glif.app), the extension will now also pull your own compatible glifs into it!

MimicBrush promises prompt-free regional adjustment

The tech, a demo of which you can try here, promises “‘imitative editing,’ allowing users to edit images using reference images without the need for detailed text descriptions.”

Here it is in action:

Runway introduces Gen-3 video

Good grief, the pace of change makes “AI vertigo” such a real thing. Just last week we were seeing “skeleton underwater” memes with Runway submerged in a rusty chair. :-p I’m especially excited to see how it handles text (which remains a struggle for text-to-image models including DALL•E):

Google introduces a super fun GenType tool

I’m really digging the simple joy in this little experiment, powered by Imagen:

Here’s a bit of fun enabled by “weedy seadragons on PVC pipes in a magical undersea kingdom”:

Luma unveils Dream Machine video generator

I’m super eager to try this one out!

It is a highly scalable and efficient transformer model trained directly on videos making it capable of generating physically accurate, consistent and eventful shots. Dream Machine is our first step towards building a universal imagination engine and it is available to everyone now!

Adobe TOS = POS? Not so much.

There’s been a firestorm this week about the terms of service that my old home team put forward, based (as such things have been since time immemorial) on a lot of misunderstanding & fear. Fortunately the company has been working to clarify what’s really going on.

I did at least find this bit of parody amusing:

HyperDreamBooth, explained in 5 minutes

My former Google teammates have been cranking out some amazing AI personalization tech, with HyperDreamBooth far surpassing the performance of their original DreamBooth (y’know, from 2022—such a simpler ancient time!). Here they offer a short & pretty accessible overview of how it works:

Using only a single input image, HyperDreamBooth is able to personalize a text-to-image diffusion model 25x faster than DreamBooth, by using (1) a HyperNetwork to generate an initial prediction of a subset of network weights that are then (2) refined using fast finetuning for high fidelity to subject detail. Our method both conserves model integrity and style diversity while closely approximating the subject’s essence and details.

Check out The TED AI Show

“Maybe the real treasure was the friends we made along the way” is, generally, ironic shorthand for “worthless treasure”—but I’ve also found it to be true. That’s particularly the case for the time I spent at Google, where I met excellent folks like Bilawal Sidhu (a fellow PM veteran of the augmented reality group). I’m delighted that he’s now crushing it as the new host of the TED AI Show podcast.

Check out their episodes so far, including an interview with former OpenAI board member Helen Toner, who discusses the circumstances of firing Sam Altman last year before losing her board position.