Monthly Archives: March 2025

New generative video hotness: Runway + Higgsfield

It’s funny to think of anyone or anything as being an “O.G.” in the generative space—but having been around for the last several years, Runway has as solid a claim as anyone. They’ve just dropped their Gen-4 model. Check out some amazing examples of character consistency & camera control:


Here’s just one of what I imagine will be a million impressive uses of the tech:

Meanwhile Higgsfield (of which I hadn’t heard before now) promises “AI video with swagger.” (Note: reel contains occasionally gory edgelord imagery.)

Virtual product photography in ChatGPT

Seeing this, I truly hope that Adobe isn’t as missing in action as they seem to be; fingers crossed.

In the meantime, simply uploading a pair of images & a simple prompt is more than enough to get some compelling results. See subsequent posts in the thread for details, including notes on some shortcomings I observed.

See also (one of a million tests being done in parallel, I’m sure):

Ideogram 3.0 is here

In the first three workdays of this week, we saw three new text-to-image models arrive! And now that it’s Thursday, I’m like, “WTF, no new Flux/Runway/etc.?” 🙂

For the last half-year or so, Ideogram has been my go-to model (see some of my more interesting creations), so I’m naturally delighted to see them moving things forward with the new 3.0 model:

I don’t yet quite understand the details of how their style-reference feature will work, but I’m excited to dig in.

Meanwhile, here’s a thread of some really impressive initial creations from the community:

ChatGPT reimagines family photos

“Dress Your Family in Corduroy and Denim” — David Sedaris
“Turn your fam into Minecraft & GTA” — Bilawal Sidhu

And meanwhile, on the server side:

Google’s “Photoshop Killer”?

Nearly twenty years ago (!), I wrote here about how The Killing’s Gotta Stop—ironically, perhaps, about then-new Microsoft apps competing with Adobe. I rejected false, zero-sum framing then, and I reject it now.

Having said that, my buddy Bilawal’s provocative framing in this video gets at something important: if Adobe doesn’t get on its game and actually deliver the conversational editing capabilities we publicly previewed 2+ years ago, things are gonna get bad. I’m reminded of the axiom that “AI will not replace you, but someone using AI just might.” The same goes for venerable old Photoshop competing against AI-infused & AI-first tools.

In any case, if you’re interested in the current state of the art around conversational editing (due to be different within weeks, of course!), I think you’ll enjoy this deep dive into what is—and isn’t—possible via Gemini:

Specific topic sections, if you want to jump right to ’em:

  • 00:00 Conversational Editing with Google’s Multimodal AI
  • 00:53 Image Generation w/ LLM World Knowledge
  • 02:12 Easy Image Editing & Colorization 
  • 02:46 Advanced Conversational Edits (Chaining Prompts Together)
  • 03:37 Long Text Generation (Google Beats OpenAI To The Punch)
  • 04:25 Making Spicy Memes (Google AI Studio Safety Settings) 
  • 05:48 Advanced Prompting (One Shot ComfyUI Workflows) 
  • 07:19 Re-posing Characters (While Keeping Likeness Intact) 
  • 08:27 Spatial 3D Understanding (NO ControlNet) 
  • 10:42 Semantic Editing & In/Out Painting 
  • 13:46 Sprite Sheets & Animation Keyframes 
  • 14:40 Using Gemini To Build Image Editing Apps
  • 16:37 Making Videos w/ Conversational Editing

Happy birthday, Adobe Firefly

The old (hah! but it seems that way) gal turns two today.

The ride has been… interesting, hasn’t it? I remain eager to see what all the smart folks at Adobe have been cooking up. As a user of Photoshop et al. for the last 30+ years, I selfishly hope it’s great!

In the meantime, I’ll admit that watching the video above—which I wrote & then made with the help of Davis Brown (son of Russell)—makes me kinda blue. Everything it depicts was based on real code we had working at the time. (I insisted that we not show anything that we didn’t think we could have shipping within three months’ time.) How much of that has ever gotten into users’ hands?

Yeah.

But as I say, I’m hoping and rooting for the best. My loyalty has never been to Adobe or to any other made-up entity, but rather to the spirit & practice of human creativity. Always will be, until they drag me off this rock. Rock the F on.

Adobe to offer access to non-Firefly models

Man, I’m old enough to remember writing a doc called “Yes, And…” immediately upon the launch of DALL•E in 2022, arguing that of course Adobe should develop its own generative models and of course it should also offer customers a choice of great third-party models—because of course no single model would be the best for every user in every situation.

And I’m old enough to remember being derided for just not Getting It™ about how selling per-use access to Firefly was going to be a goldmine, so of course we wouldn’t offer users a choice. ¯\_(ツ)_/¯

Oh well. Here we are, exactly two years after the launch of Firefly, and Adobe is going to offer access to third-party models. So… yay!

Runway reskins rock

Another day, another set of amazing reinterpretations of reality. Take it away, Nathan…


…and Bilawal:

Mystic structure reference: Dracarys!

I love seeing the Magnific team’s continued rapid march in delivering identity-preserving reskinning.

This example makes me wish my boys were, just for a moment, 10 years younger and still up for this kind of father/son play. 🙂

Behind the scenes: AI-augmented animation

“Rather than removing them from the process, it actually allowed [the artists] to do a lot more—so a small team can dream a lot bigger.”

Paul Trillo’s been killing it for years (see innumerable previous posts), and now he’s given a peek into how his team has been pushing 2D & 3D forward with the help of custom-trained generative AI:

Charmingly terrible AI-made infographics

A passing YouTube vid made me wonder about the relative strengths of World War II-era bombers, and ChatGPT quickly obliged by making me a great little summary, including a useful table. I figured, however, that it would totally fail at making me a useful infographic from the data—and that it did!

Just for the lulz, I then ran the prompt (“An infographic comparing the Avro Lancaster, Boeing B-17, and Consolidated B-24 Liberator bombers”) through a variety of apps (Ideogram, Flux, Midjourney, and even ol’ Firefly), creating a rogues’ gallery of gibberish & Franken-planes. Check ’em out.

Surrealism blooms through Pika

Check out this delightful demo:

Individual steps, as I understand them:

  • Generate image (in this example, using Google Imagen).
  • Apply background segmentation.
  • Synthesize a new background, and run what I think is a fine-tuned version of IC-Light (using Stable Diffusion) to relight the entire image, harmonizing foreground/background. Note that identity preservation (face shape, hair color, dress pattern, etc.) is very good but not perfect; see changes in the woman’s hair color, expression, and dress pattern.
  • Put the original & modified images into Pika, then describe the desired transformation (smooth transition, flowers growing, clouds moving, etc.).
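For the curious, here’s a rough Python sketch of how I imagine those steps fitting together. Fair warning: generate_image, relight_with_ic_light, and animate_with_pika are hypothetical placeholders (to my knowledge, neither IC-Light nor Pika exposes an official Python API), so treat this as an outline of the pipeline rather than working code. Only the rembg segmentation call corresponds to a real, installable library.

    # Rough sketch of the Pika "surreal transition" workflow described above.
    # NOTE: generate_image(), relight_with_ic_light(), and animate_with_pika()
    # are hypothetical placeholders, not real APIs. Only rembg (background
    # segmentation) is an actual library you can pip-install.

    from PIL import Image
    from rembg import remove  # pip install rembg

    def generate_image(prompt: str) -> Image.Image:
        """Placeholder: call a text-to-image model (e.g. Imagen) here."""
        raise NotImplementedError

    def relight_with_ic_light(foreground: Image.Image, background_prompt: str) -> Image.Image:
        """Placeholder: synthesize a new background, then relight/harmonize
        the composite (the post speculates this uses a fine-tuned IC-Light)."""
        raise NotImplementedError

    def animate_with_pika(start: Image.Image, end: Image.Image, prompt: str) -> bytes:
        """Placeholder: hand both frames to Pika and describe the transition."""
        raise NotImplementedError

    # 1. Generate the source image.
    original = generate_image("portrait of a woman in a flowered dress, soft light")

    # 2. Segment the subject from its background.
    foreground = remove(original)

    # 3. Synthesize a new background and relight the whole frame.
    modified = relight_with_ic_light(foreground, "sunlit meadow at golden hour")

    # 4. Describe the desired transformation and let Pika animate between the frames.
    video = animate_with_pika(original, modified,
                              "smooth transition, flowers growing, clouds moving")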