Category Archives: AI/ML

StarVector: Text/Image->SVG Code

Back at Adobe we introduced Firefly text-to-vector creation, but behind the scenes it was really text-to-image-to-tracing. That could be fine, actually, provided that the conversion process did some smart things around segmenting the image, moving objects onto their own layers, filling holes, and then harmoniously vectorizing the results. I’m not sure whether Adobe actually got around to shipping that support.
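
For contrast with the direct-generation approach below, here's a minimal sketch of what trace-based vectorization looks like in code, using scikit-image contour tracing plus svgwrite. This is my illustration of the general idea, not Adobe's actual pipeline; file names and threshold levels are placeholders.

```python
# A rough sketch of the "image -> tracing -> SVG" approach: trace luminance contours
# in a rasterized image and emit them as SVG paths. Assumes an RGB/RGBA PNG input.
from skimage import io, color, measure
import svgwrite

gray = color.rgb2gray(io.imread("generated.png")[..., :3])  # rasterized text-to-image output
dwg = svgwrite.Drawing("traced.svg", size=(int(gray.shape[1]), int(gray.shape[0])))

# Trace each luminance level into closed contours, then emit them as SVG paths.
for level in (0.25, 0.5, 0.75):
    for contour in measure.find_contours(gray, level):
        points = [(float(x), float(y)) for y, x in contour]  # (row, col) -> (x, y)
        d = "M " + " L ".join(f"{x:.1f},{y:.1f}" for x, y in points) + " Z"
        dwg.add(dwg.path(d=d, fill="none", stroke="black", stroke_width=0.5))

dwg.save()
```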

In any event, StarVector promises actual, direct creation of SVG. The results look simple enough that it hasn’t yet piqued my interest enough to spend my time with it, but I’m glad that folks are trying.

That Happy Meal feel

Sure, the environmental impact of this silliness isn’t great, but it’s probably still healthier than actually eating McDonald’s. :-p

Tangentially, I continue to have way too much fun applying different genres to amigos:

Google, Dolphins, and Ai-i-i-i-i!

Three years ago (seems like an eternity), I remarked regarding generative imaging:

The disruption always makes me think of The Onion’s classic “Dolphins Evolve Opposable Thumbs”: “Holy f*ck, that’s it for us monkeys.” My new friend August replied with the armed dolphin below.

I’m reminded of this seeing Google’s latest AI-powered translation (?!) work. Just don’t tell them about abacuses!

[Via Rick McCawley]

Friday’s Microsoft Copilot event in 9 minutes

The team showed off good new stuff, including—OMG—showing how to use Photoshop! (On an extremely personal level, “This is what it’s like when worlds colliiiide!!”)

As it marks its 50th anniversary, Microsoft is updating Copilot with a host of new features that bring it in line with other AI systems like ChatGPT or Claude. We got a look at them during the tech giant’s 50th anniversary event today, including new search capabilities and Copilot Vision, which will be able to analyze real-time video from a mobile camera. Copilot will also now be able to use the web on your behalf. Here’s everything you missed.

  • 00:00 Intro and Copilot Agents
  • 2:07 Copilot for planning
  • 2:30 Copilot AI podcast generating
  • 3:18 Copilot Shopping
  • 3:39 Copilot Vision
  • 4:07 Copilot feature use cases demo
  • 6:16 Researcher, Copilot Studio, custom agents
  • 6:53 Copilot Memory
  • 7:23 Custom Copilot appearances
  • 8:48 Outro

Rustlin’ up some Russells

2025 marks an unheard-of 40th year in Adobe creative director Russell Brown’s remarkable tenure at the company. I remember first encountering him via the Out Of Office message marking his 15-year (!) sabbatical (off to Burning Man with Rick Smolan, if I recall correctly). If it weren’t for Russell’s last-minute intervention back in 2002, when I was living out my last hours before being laid off from Adobe (interviewing at Microsoft, lol), I’d never have had the career I did, and you wouldn’t be reading this now.

In any event, early in the pandemic Russell kept himself busy & entertained by taking a wild series of self-portraits. Having done some 3D printing with him (the output of which still forms my Twitter avatar!), I thought, “Hmm, what would those personas look like as plastic action figures? Let’s see what ChatGPT thinks.” And voilà, here they are.

Click through the tweet below if you’re curious about the making-of process (e.g. the app starting to render him very faithfully, then freaking out midway through & insisting on delivering a more stylized, less specific rendition). But forget that—how insane is it that any of this is possible??

“The Worlds of Riley Harper”

It’s pretty stunning what a single creator can now create in a matter of days! Check out this sequence & accompanying explanation (click on the post) from Martin Gent:

Tools used:

Severance, through the animated lens of ChatGPT

People can talk all the smack they want about “AI slop”—and to be sure, there’s tons of soulless slop going around—but good luck convincing me that there’s no creativity in remixing visual idioms, and in reskinning the world in never-before-possible ways. We’re just now dipping a toe into this new ocean.

See the whole thread for a range of fun examples:

OMG AI KFC

It’s insane what a single creator—in this case David Blagojević—can do with AI tools; insane.

It’s worth noting that creative synthesis like this doesn’t “just happen,” much less in some way that replaces or devalues the human perspective & taste at the heart of the process: everything still hinges on having an artistic eye, a wealth of long-cultivated taste, and the willpower to make one’s vision real. It’s just that the distance between that vision & reality is now radically shorter than it’s ever been.

New generative video hotness: Runway + Higgsfield

It’s funny to think of anyone & anything as being an “O.G.” in the generative space—but having been around for the last several years, Runway has as solid a claim as anyone. They’ve just dropped their Gen-4 model. Check out some amazing examples of character consistency & camera control:


Here’s just one of what I imagine will be a million impressive uses of the tech:

Meanwhile Higgsfield (of which I hadn’t heard before now) promises “AI video with swagger.” (Note: reel contains occasionally gory edgelord imagery.)

Virtual product photography in ChatGPT

Seeing this, I truly hope that Adobe isn’t as missing in action as they seem to be; fingers crossed.

In the meantime, simply uploading a pair of images & a simple prompt is more than enough to get some compelling results. See subsequent posts in the thread for details, including notes on some shortcomings I observed.
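
For anyone who'd rather script the experiment than drag images into ChatGPT, here's a minimal sketch of the same idea via the OpenAI Images API. The gpt-image-1 edits call, file names, and prompt here are my assumptions, not the exact flow shown in the thread.

```python
# A minimal sketch of the "two images + a simple prompt" flow via the API rather than
# the ChatGPT UI. Assumes the OpenAI Python SDK and the gpt-image-1 edits endpoint;
# file names and prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.edit(
    model="gpt-image-1",
    image=[open("product_shot.png", "rb"), open("background_scene.png", "rb")],
    prompt="Place the product from the first image into the scene from the second, "
           "with soft studio lighting and a natural shadow.",
)

with open("composite.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```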

See also (one of a million tests being done in parallel, I’m sure):

Ideogram 3.0 is here

In the first three workdays of this week, we saw three new text-to-image models arrive! And now that it’s Thursday, I’m like, “WTF, no new Flux/Runway/etc.?” 🙂

For the last half-year or so, Ideogram has been my go-to model (see some of my more interesting creations), so I’m naturally delighted to see them moving things forward with the new 3.0 model:

I don’t yet quite understand the details of how their style-reference feature will work, but I’m excited to dig in.

Meanwhile, here’s a thread of some really impressive initial creations from the community:

ChatGPT reimagines family photos

“Dress Your Family in Corduroy and Denim” — David Sedaris
“Turn your fam into Minecraft & GTA” — Bilawal Sidhu

And meanwhile, on the server side:

Google’s “Photoshop Killer”?

Nearly twenty years ago (!), I wrote here about how The Killing’s Gotta Stop—ironically, perhaps, about then-new Microsoft apps competing with Adobe. I rejected false, zero-sum framing then, and I reject it now.

Having said that, my buddy Bilawal’s provocative framing in this video gets at something important: if Adobe doesn’t get on its game, actually delivering the conversational editing capabilities we publicly previewed 2+ years ago, things are gonna get bad. I’m reminded of the axiom that “AI will not replace you, but someone using AI just might.” The same goes for venerable old Photoshop competing against AI-infused & AI-first tools.

In any case, if you’re interested in the current state of the art around conversational editing (due to be different within weeks, of course!), I think you’ll enjoy this deep dive into what is—and isn’t—possible via Gemini:

Specific topic sections, if you want to jump right to ’em:

  • 00:00 Conversational Editing with Google’s Multimodal AI
  • 00:53 Image Generation w/ LLM World Knowledge
  • 02:12 Easy Image Editing & Colorization 
  • 02:46 Advanced Conversational Edits (Chaining Prompts Together)
  • 03:37 Long Text Generation (Google Beats OpenAI To The Punch)
  • 04:25 Making Spicy Memes (Google AI Studio Safety Settings) 
  • 05:48 Advanced Prompting (One Shot ComfyUI Workflows) 
  • 07:19 Re-posing Characters (While Keeping Likeness Intact) 
  • 08:27 Spatial 3D Understanding (NO ControlNet) 
  • 10:42 Semantic Editing & In/Out Painting 
  • 13:46 Sprite Sheets & Animation Keyframes 
  • 14:40 Using Gemini To Build Image Editing Apps
  • 16:37 Making Videos w/ Conversational Editing
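
If you'd rather poke at this programmatically than in AI Studio, here's a minimal sketch of conversational editing using the google-genai Python SDK. The model ID and file names are my assumptions, so check Google's current docs before relying on them.

```python
# A minimal sketch of conversational image editing via the Gemini API: send an image
# plus an instruction, get text and an edited image back.
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment
photo = Image.open("old_family_photo.jpg")

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # assumed image-capable model ID
    contents=[photo, "Colorize this photo and remove the scratches."],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response interleaves text and image parts; save any returned image.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("edited.png", "wb") as f:
            f.write(part.inline_data.data)
```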

Happy birthday, Adobe Firefly

The old (hah! but it seems that way) gal turns two today.

The ride has been… interesting, hasn’t it? I remain eager to see what all the smart folks at Adobe have been cooking up. As a user of Photoshop et al. for the last 30+ years, I selfishly hope it’s great!

In the meantime, I’ll admit that watching the video above—which I wrote & then made with the help of Davis Brown (son of Russell)—makes me kinda blue. Everything it depicts was based on real code we had working at the time. (I insisted that we not show anything that we didn’t think we could have shipped within three months.) How much of that has ever gotten into users’ hands?

Yeah.

But as I say, I’m hoping and rooting for the best. My loyalty has never been to Adobe or to any other made-up entity, but rather to the spirit & practice of human creativity. Always will be, until they drag me off this rock. Rock the F on.

Adobe to offer access to non-Firefly models

Man, I’m old enough to remember writing a doc called “Yes, And…” immediately upon the launch of DALL•E in 2022, arguing that of course Adobe should develop its own generative models and of course it should also offer customers a choice of great third-party models—because of course no single model would be the best for every user in every situation.

And I’m old enough to remember being derided for just not Getting It™ about how selling per-use access to Firefly was going to be a goldmine, so of course we wouldn’t offer users a choice. ¯\_(ツ)_/¯

Oh well. Here we are, exactly two years after the launch of Firefly, and Adobe is going to offer access to third-party models. So… yay!

Runway reskins rock

Another day, another set of amazing reinterpretations of reality. Take it away Nathan…


…and Bilawal:

Mystic structure reference: Dracarys!

I love seeing the Magnific team’s continued rapid march in delivering identity-preserving reskinning:

This example makes me wish my boys were, just for a moment, 10 years younger and still up for this kind of father/son play. 🙂

Behind the scenes: AI-augmented animation

“Rather than removing them from the process, it actually allowed [the artists] to do a lot more—so a small team can dream a lot bigger.”

Paul Trillo’s been killing it for years (see innumerable previous posts), and now he’s given a peek into how his team has been pushing 2D & 3D forward with the help of custom-trained generative AI:

Charmingly terrible AI-made infographics

A passing YouTube vid made me wonder about the relative strengths of World War II-era bombers, and ChatGPT quickly obliged by making me a great little summary, including a useful table. I figured, however, that it would totally fail at making me a useful infographic from the data—and that it did!

Just for the lulz, I then ran the prompt (“An infographic comparing the Avro Lancaster, Boeing B-17, and Consolidated B-24 Liberator bombers”) through a variety of apps (Ideogram, Flux, Midjourney, and even ol’ Firefly), creating a rogues’ gallery of gibberish & Franken-planes. Check ’em out.

Surrealism blooms through Pika

Check out this delightful demo:

Individual steps, as I understand them (a rough code sketch follows the list):

  • Generate image (in this example, using Google Imagen).
  • Apply background segmentation.
  • Synthesize a new background, and run what I think is a fine-tuned version of IC-Light (using Stable Diffusion) to relight the entire image, harmonizing foreground/background. Note that identity preservation (face shape, hair color, dress pattern, etc.) is very good but not perfect; see changes in the woman’s hair color, expression, and dress pattern.
  • Put the original & modified images into Pika, then describe the desired transformation (smooth transition, flowers growing, clouds moving, etc.).
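
Here's a rough sketch of that flow in code. Only the rembg segmentation call is a real library API; relight_with_iclight() and pika_transition() are hypothetical placeholder functions, since I'm only guessing at how those services are wired together.

```python
# A toy orchestration of the steps above, with hypothetical wrappers for the parts
# I can't vouch for (IC-Light relighting, Pika's image-to-video transition).
from PIL import Image
from rembg import remove

def relight_with_iclight(foreground, background_prompt):
    """Hypothetical wrapper around an IC-Light-style relighting model (step 3)."""
    raise NotImplementedError

def pika_transition(start_frame, end_frame, prompt):
    """Hypothetical wrapper around Pika's image-to-video transition (step 4)."""
    raise NotImplementedError

def surreal_bloom(image_path):
    original = Image.open(image_path)   # step 1: generated image (e.g. from Imagen)
    foreground = remove(original)       # step 2: background segmentation (RGBA cutout)
    relit = relight_with_iclight(foreground, "sunlit meadow at golden hour")
    return pika_transition(original, relit, "smooth transition, flowers growing, clouds moving")
```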

NeRFtastic BAFTAs

The British Academy Film Awards have jumped into a whole new dimension to commemorate the winners of this year’s awards:

The capturing work was led by Harry Nelder and Amity Studio. Nelder used his 16-camera rig to capture the recent winners. The reconstruction software was a combination of a cloud-based platform created by Nelder, which is expected to be released later this year, along with Postshot. Nelder further utilized the Radiance Field method known as Gaussian Splatting for the reconstruction. A compilation video of all the captures, recently posted by BAFTA, was edited by Amity Studio.

[Via Dan Goldman]

Lego together creative AI blocks in Flora

Looks promising:

Their pitch:

  • Create workflows, not just outputs. Connect Blocks to shape, refine, and scale your creative process.
  • Collaborate in real time. Work like you would in Figma, but for AI-powered media creation.
  • Discover & clone workflows. Learn from top creatives, build on proven systems and share generative workflows inside FLORA’s Community.

Perhaps image-to-3D was a mistake…

Behold the majesty (? :-)) of CapCut’s new “Microwave” filter (whose name makes more sense if you listen with sound on):

https://youtube.com/shorts/bshQXczbZdw?si=aFwvtgs-fKf2wl8x

As I asked Bilawal, who posted the compilation, “What is this, and how can I know less about it?”

EditIQ edits single long shots into multiple virtual shots

Check it out (probably easier to grok by watching vs. reading a description):

From the static camera feed, EditIQ initially generates multiple virtual feeds, emulating a team of cameramen. These virtual camera shots, termed rushes, are subsequently assembled using an automated editing algorithm, whose objective is to present the viewer with the most vivid scene content.
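
To make the "virtual rushes" idea concrete, here's a toy sketch (mine, not EditIQ's code) that carves a static wide shot into a few fixed virtual framings with OpenCV; the crop rectangles and file names are placeholders, and the source is assumed to be 1920×1080.

```python
# Carve a static wide shot into several fixed virtual camera framings ("rushes") that
# an editor -- human or algorithmic -- could later cut between.
import cv2

RUSHES = {                      # (x, y, width, height) crops within the wide frame
    "wide": (0, 0, 1920, 1080),
    "left_speaker": (100, 200, 640, 720),
    "right_speaker": (1180, 200, 640, 720),
}

cap = cv2.VideoCapture("static_wide_shot.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
writers = {
    name: cv2.VideoWriter(f"rush_{name}.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for name, (x, y, w, h) in RUSHES.items()
}

while True:
    ok, frame = cap.read()
    if not ok:
        break
    for name, (x, y, w, h) in RUSHES.items():
        writers[name].write(frame[y:y + h, x:x + w])  # emulate one virtual cameraman

cap.release()
for writer in writers.values():
    writer.release()
```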

Controlling video generation with simple props

Tired: Random “slot machine”-style video generation
Inspired: Placing & moving simple guidance objects to control results:
Check out VideoNoiseWarp:

Analog meets AI in the papercraft world of Karen X Cheng

Check out this fun mixed-media romp, commissioned by Adobe:

And here’s a look behind the scenes:

A cool Firefly image->video flow

For the longest time, Firefly users’ #1 request was to use images to guide composition of new images. Now that Firefly Video has arrived, you can use a reference image to guide the creation of video. Here’s a slick little demo from Paul Trani:

Google Photos will flag AI-manipulated images

These changes, reported by Forbes, sound like reasonable steps in the right direction:

Starting now, Google will be adding invisible watermarks to images that have been edited on a Pixel using Magic Editor’s Reimagine feature that lets users change any element in an image by issuing text prompts.

The new information will show up in the AI Info section that appears when swiping up on an image in Google Photos.

The feature should make it easier for users to distinguish real photos from AI-powered manipulations, which will be especially useful as Reimagined photos continue to become more realistic.

DeepSeek meets Flux in Krea Chat

Conversational creation & iteration is such a promising pattern, as shown through people making ChatGPT take images to greater & greater extremes:


But how do we go from ironic laughs to actual usefulness? Krea is taking a swing by integrating (I think) the Flux imaging model with the DeepSeek LLM:

It doesn’t yet offer the kind of localized refinements people want (e.g. “show me a dog on the beach,” then “put a hat on the dog” and don’t change anything outside the hat area). Even so, it’s great to be able to create an image, add a photo reference to refine it, and then create a video. Here’s my cute, if not exactly accurate, first attempt. 🙂

A mind-blowing Gemini + Illustrator demo

Wow—check out this genuinely amazing demo from my old friend (and former Illustrator PM) Mordy:

In this video, I show how you can use Gemini in the free Google AI Studio as your own personal tutor to help you get your work done. After you watch me use it to learn how to go from a sketch I made on paper to recreating a logo in Illustrator, I promise you’ll be running to do the same.

MatAnyone promises incredible video segmentation

What the what?

Per the paper,

We propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. 
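
Here's my toy reading of that fusion step in PyTorch: a per-pixel gate decides how much previous-frame memory to keep (stable core regions) versus how much to refresh from the current frame (fine boundary detail). It's an interpretation of the abstract, not the authors' code.

```python
# A toy sketch of "region-adaptive memory fusion": a learned per-pixel weight blends
# previous-frame memory with current-frame features.
import torch
import torch.nn as nn

class RegionAdaptiveMemoryFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a 0..1 fusion weight per spatial location from both feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, prev_memory: torch.Tensor, curr_features: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([prev_memory, curr_features], dim=1))  # (B, 1, H, W)
        return w * prev_memory + (1.0 - w) * curr_features             # adaptive blend

# Example: fuse memory for a batch of 64-channel feature maps.
fusion = RegionAdaptiveMemoryFusion(channels=64)
prev = torch.randn(1, 64, 120, 68)
curr = torch.randn(1, 64, 120, 68)
fused = fusion(prev, curr)
```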

Premiere Pro now lets you find video clips by describing them

I love it: nothing too fancy, nothing controversial, just a solid productivity boost:

Users can enter search terms like “a person skating with a lens flare” to find corresponding clips within their media library. Adobe says the media intelligence AI can automatically recognize “objects, locations, camera angles, and more,” alongside spoken words — providing there’s a transcript attached to the video. The feature doesn’t detect audio or identify specific people, but it can scrub through any metadata attached to video files, which allows it to fetch clips based on shoot dates, locations, and camera types. The media analysis runs on-device, so doesn’t require an internet connection, and Adobe reiterates that users’ video content isn’t used to train any AI models.
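
Under the hood, this kind of search is typically embedding-based. Here's a minimal sketch of the general technique (not Adobe's implementation) using CLIP via sentence-transformers; the clip names and frame paths are placeholders.

```python
# Embed sampled video frames and a text query into a shared space (CLIP), then rank
# clips by cosine similarity to the query.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Assume we've already sampled one representative frame per clip.
clip_frames = {
    "clip_001.mp4": Image.open("frames/clip_001.jpg"),
    "clip_002.mp4": Image.open("frames/clip_002.jpg"),
}

names = list(clip_frames)
frame_embeddings = model.encode(list(clip_frames.values()), convert_to_tensor=True)
query_embedding = model.encode("a person skating with a lens flare", convert_to_tensor=True)

scores = util.cos_sim(query_embedding, frame_embeddings)[0]
for name, score in sorted(zip(names, scores.tolist()), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {name}")
```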

Gemini turns photos into interactive simulations (!)

Check out this wild proof of concept from Trudy Painter at Google, and click into the thread for details.