Category Archives: AI/ML

Phota launches, promising maximum identity preservation

Phota—about which I expressed some initial misgivings, given its ability to rewrite memories—has launched Phota Studio & their API. From what I can tell, it builds upon a Nano Banana foundation and adds personalization that relies on uploading dozens of images of each individual in order to maximize identity preservation:

With Phota, for the first time, you can generate, edit, and enhance photos while keeping your identity intact, every time.

We’re not building a generic foundation model. We build personal models about you, and about the people and pets around you. At the center are profiles, built from your personal album that learn the details of your appearance that make you recognizable as yourself: how you smile, your eye color, and how your face looks from different angles. Your personal model is private and only used by you.

Here’s a quick thread in which I tried inserting myself into a couple of images, using both Phota’s model (which depended on my uploading 30+ images of myself) and just Nano Banana straight out of the Gemini app:

FreePik enables 3D photo shoots

I love seeing progress like this: upload a product pic, convert it to 3D, and photograph it on a virtual set:

Runway debuts Multi-Shot

Here’s a fun, ultra-simple way to turn an image (or just a prompt) into a short, multi-shot narrative:

Just for fun I fed it this image…

…and this prompt (based on an all-too-true story):

A family of Lego people and their dog gaze around Yosemite’s most iconic vista, then reminisce about that time they got stuck there in the snow in their VW van, expressing hope that they don’t get stuck again!

Check out the results:

I Love(art) to move it, move it…

I’ve long quoted James Ratliff, the super sharp designer behind Adobe’s Project Graph (who’s recently decamped to Figma), in nicely phrasing how the process of generating & refining ideas generally starts broad/declarative (searching, prompting) and moves towards fine-grained methods (selecting, moving, etc.):

I see an increasing number of tool & model creators mixing modalities—even in the Gemini Super Bowl ad featuring a mom & daughter drawing a simple circle to show where they’d like to add a dog bed.

I’m eager to check out Lovart’s take on the possibilities, especially for animation:

Update: Here’s a look at the UI, in which you can move & scale the selection rectangle, as well as the before & after images:

Spline enables agentic 3D creation

“3D scenes, websites, games, apps,” promises Spline. “Describe anything and Omma builds it for you in seconds.”

Omma combines code generation (LLMs), 3D AI mesh generation, and Image generation all in one place for you to build and ship. Deploy to production, assign custom domains, and more.

Sora is the new Peach

Ten years ago (!), the embryonic social app Peach suddenly blew up on the scene—only to molder shortly thereafter. Adam Lisagor tartly predicted that outcome right after Peach debuted:

I’m reminded of this upon hearing that OpenAI has bailed out on Sora, which they launched just a few months ago. In a way I’m not surprised—check out how interest in the tech spiked & then rapidly cratered—except that just a couple of months ago Disney signed a billion-dollar deal to use it. ¯\_(ツ)_/¯

Luma UNI-1 promises layered creation

When can we get this (or equivalent) into Photoshop??

On a conceptually (though not necessarily technically) related note, the LICA dataset may help model makers train layered generation:

Photoshop drops Rotate Object!

Speaking of spinning right ’round, check this out:

Check out another view, from Paul Trani:

Runway promises 100ms (!!) HD video generation

Five years ago, I spent an afternoon with a buddy watching Disco Diffusion resolve a weird, blurry, but ultimately delightful scene over the course of 15 minutes. Now Runway & NVIDIA are previewing generation that’s a mere ~90,000x faster than that. Ludicrous speed, go!!

Tips: Getting great text from Nano Banana

Structuring your prompt well turns out to be key in avoiding garbled text. As the presenter says, “It’s not about writing more. It’s about writing in the right order.” Check out this brief overview.

In this tutorial, you’ll see how to use Nano Banana Pro and Kling 3.0 Omni together to solve one of the most common pain points in AI product video: text that blurs, warps, or drifts mid-motion. We’ll walk through a practical workflow for maintaining legibility and visual consistency in product shots, so your labels, logos, and copy stay clean from the first frame to the last.

Using AI to save pets

Long dog walks are for nothing if not visualizing whatever silliness pops into my head—which today happened to be our puppy Ziggy becoming an impossible object called a “Ziggule.”

I shared this with my cousin Alicia, who does a tremendous amount of work sheltering & rescuing dogs in Austin, and she requested a portrait of their current foster pooch (Tesseract). I was of course all too happy to oblige:

As it happens, folks at Google have had the same idea, and they’ve been putting Nano Banana to work helping zhuzh up pics of shelter pets in hopes of helping them find their forever homes. Let’s hear it for using AI & old-fashioned human creativity for good!

Adobe Research shows promising “Vidmento” tech

As you’ve likely heard me say, I’ve gotten psyched up too many times about AI video-editing tech that fell short of its ambitions—but I’m hoping that this work from Adobe & Harvard collaborators can deliver what it describes:

We present Vidmento, an interactive video authoring tool that expands initial materials and ideas into compelling video stories through blending captured and generative media. To preserve narrative continuity and creative intent, Vidmento generates contextual clips that align with the user’s existing footage and story.

Per the site, Vidmento should enable:

  1. Story Discovery: Surface the stories within captured clips.
  2. Narrative Development: Suggest what’s needed to move the story forward.
  3. Contextual Blending: Generating visuals that align with real footage.
  4. Creative Control: Give creators controls to fine-tune the visuals and story.

From carbohydrates to polygons

Among the misbegotten “Oh, everyone will love this—but rarely will anyone actually use it” AR demos of 2017 (right alongside “See whether this toaster fits on my counter!”), imagining restaurants plopping a 3D model onto your plate was always a banger. Leaving aside whether anyone would actually want or value that experience, the cost of realistically modeling dishes was prohibitive.

This new tech at least promises to take the grunt work out of model creation, turning a single photo into an AR-ready 3D asset (give or take a tine or two ;-)):

Speak it -> See it, with Krea’s new voice mode

I try not to curse on this blog, doing so maybe a dozen times in 20+ (!!) years of posting. But circa 2013-2017, when I saw what felt like uncritical praise for Adobe’s voice-driven editing prototypes, I called bullshit.

The high-level concept was fine, but the tech at the time struck me as the worst of both worlds: the imprecision of language (e.g. how does a normal person know the term “saturation,” and how does an expert describe exactly how much they want?) combined with the fragility of traditional selection & adjustment algorithms.

Now, however, generative tech can indeed interpret our language & effect changes—and in the case of Krea’s new realtime mode, in a highly responsive way:

Whether or not voice per se becomes a popular modality here, closing the gap between idea & visual is just so seductive. To emphasize a previously made point:

Photoshop is totally cooked… except not

I couldn’t have contrived a better example of the power & pitfalls of generative imaging if I tried.

Here’s a pretty crummy cell phone picture I took yesterday from a moving train & then enhanced with a single prompt using Gemini. The results are incredible—if you don’t really care about the exact capacity of your jumbo jet! 🙂

The current state of AI-driven editing drives home the wisdom of that old Russian staying, “Trust… but verify.”

See also my previously shared example, in which Nano Banana quietly upgraded this propeller-driven plane into a jet:

AI + SVG: Vector all the things!

When it rains, it pours: No sooner did I post about text->vector than I saw two new entrants in that space. The new Quiver AI is claimed to have “solved vector design with AI”:

Here’s my first quick test, in which Quiver & Illustrator utterly smoke direct chat->vector output in Gemini & ChatGPT:

Meanwhile, check out what Recraft produced:

Elsewhere, Hero Studio promises great image->SVG conversion. I’ve applied for access & am eager to take it for a spin:

Can AI finally generate useful vectors?

When we launched Firefly three years ago (!), we talked up prompt-based vector creation. When the feature later arrived in Illustrator, it was really text-to-image-to-tracing. That could be fine, actually, provided that the conversion process did some smart things around segmenting the image, moving objects onto their own layers, filling holes, and then harmoniously vectorizing the results. I’m not sure whether Adobe actually got around to shipping that support.

In any case, Recraft now promises create vector creation directly from prompts:

Meanwhile Gemini promises SVG creation right out of the box. My previous attempts to use it produced results that were, um, impressionistic…

…and based on what they’re showing vis-à-vis recent updates, I haven’t been in a hurry to try again:

Creative technologist needed on the Flux team

I’ve really enjoyed collaborating with Black Forest Labs, the brain-geniuses behind Flux (and before that, Stable Diffusion). They’re looking for a creative technologist to join their team. Here’s a bit of the job listing in case the ideal candidate might be you or someone you know:

BFL’s models need someone who knows them inside out – not just what they can do today, but what nobody’s tried yet. This role sits at the intersection of creative excellence, deep model knowledge, and go-to-market impact. You’ll create the work that makes people realize what’s possible with generative media – original pieces, experiments, and creative assets that set the standard for what FLUX can do and show it to the world

Create original creative work that pushes FLUX to its limits – experiments, visual explorations, and pieces that show what’s possible before anyone else figures it out

Collaborate with the research and product teams from the start of training/product development to understand the core strengths of each new model/product and create assets that amplify and showcase these. You will also provide feedback to those teams throughout the development process on what needs to improve.

UI: Realtime generation & the undiscovered country

Former Apple designer Tuhin Kumar, who recently logged three years at Luma AI, makes a great point here:

To the extent I give Adobe gentle but unending grief about their near-total absence from the world of UI innovation, this is the kind of thing I have in mind. What if any layer in Photoshop—or any shape in Illustrator—could have realtime-rendering generative parameters attached?

Like, where are they? Don’t they want to lead? (It’s a genuine question: maybe the strategy is just to let everyone else try things, and then to finally follow along at scale.) And who knows, maybe certain folks are presently beavering away on secret awesome things. Maybe… I will continue hoping so!

Nano Banana goes to the Super Bowl

It’s hard to believe that when I dropped by Google in 2022, arguing vociferously that we work together to put Imagen into Photoshop, they yawned & said, “Can you show up with nine figures?”—and now they’re spending eight figures on a 60-second ad to promote the evolved version of that tech. Funny ol’ world…

Interactive relighting control for Qwen image creation

A couple of weeks ago I mentioned a cool, simple UI for changing camera angles using the Qwen imaging model. Along related lines, here’s an interface for relighting images:

Adobe vets launch AniStudio

My former colleagues Jue Wang & Chen Fang are making an impressive indie debut:

AniStudio exists because we believe animation deserves a future that’s faster, more accessible, and truly built for the AI era—not as an add-on, but from the ground up. This isn’t a finished story. It’s the first step of a new one, and we want to build it together with the people who care about animation the most.

Check it out:

GenLit enables animated relighting

I’m excited to learn more about GenLit, about which its creators say,

Given a single image and the 5D lighting signal, GenLit creates a video of a moving light source that is inside the scene. It moves around and behind scene objects, producing effects such as shading, cast shadows, secularities, and interreflections with a realism that is hard to obtain with traditional inverse rendering methods.

Vividon promises breakthrough relighting

I stumbled across some compelling teaser videos for this product, about which only a bit of info seems to be public:

A Photoshop plugin that brings truly photorealistic, prompt-free relighting into existing workflows. Instead of describing what you want in text, control lighting through visual adjustments. Change direction, intensity, and mood with precision… Modify lighting while preserving the structure and integrity of the original image. No more destructive edits or starting over.

Identity preservation—that is, exactly maintaining the shape & character of faces, products, and other objects—has been the lingering downfall of generative approaches to date, so I’m eager to take this for a spin & see how it compares to other approaches.

Krea is back with realtime creation—again

Those crazy presumable insomniacs are back at it, sharing a preview of the realtime generative composition tools they’re currently testing:

This stuff of course looks amazing—but not wholly new. Krea debuted realtime generation more than two years ago, leading to cool integrations with various apps, including Photoshop:

The interactive paradigm is brilliant, but comparatively low quality has always kept this approach from wide adoption. Compare these high-FPS renders to ChatGPT’s Studio Ghibli moment: the latter could require multiple minutes to produce a single image, but almost no one mentioned its slowness. “Fast is good, but good is better.”

I hope that Krea (and others) are quietly beavering away on a hybrid approach that combines this sort of addictive interactivity with a slower but higher-quality render (think realtime output fed into Nano Banana or similar for a final pass). I’d love to compare the results against unguided renders from the slower models. Perhaps we shall see!

Gettin’ deep with ML Sharp

Apple’s new 2D-to-3D tech looks like another great step in creating editable representations of the world that capture not just what a camera sensor saw, but what we humans would experience in real life:

Check out what my old teammate Luke was able to generate:

Adobe’s “Light Touch” promises powerful, intuitive relighting

Almost exactly 19 years ago (!), I blogged about some eye-popping tech that promised interactive control over portrait lighting:

I was of course incredibly eager to get it into Photoshop—but alas, it’d take years to iron out the details. Numerous projects have reached the market (see the whole big category here I’ve devoted to them), and now with “Light Touch,” Adobe is promising even more impressive & intuitive control:

This generative AI tool lets you reshape light sources after capture — turning day to night, adding drama, or adjusting focus and emotion without reshoots. It’s like having total control over the sun and studio lights, all in post.

Check it out:

If nothing else, make sure you see the pumpkin part, which rightfully causes the audience to go nuts. 🙂

“Keep the Robots Out of the Gym”

I keep finding myself thinking of this short essay from Daniel Miessler:

Think very carefully about where you get help from AI.

I think of it as Job vs. Gym.

  • If we’re working a manual labor job, it’s fine to have AI lift heavy things for us because the actual goal is to move the thing, not to lift it.
  • This is the exact opposite of going to the gym, where the goal is to lift the weight, not to move it.

He argues for identifying gym tasks (e.g. critical thinking, problem solving), and for those use just your brain (with minimal AI assistance, if any).

My primary metric for this is whether or not I am getting sharper at the skills that are closest to my identity.

The whole essay (2-min read) is worth checking out.

“There will still be smart people, but only those who choose to be”

As AI continues to infuse itself more deeply into our world, I feel like I’ll often think of Paul Graham’s observation here:

Qwen promises images->layers

I initially mistook this tech as text->layers, but it’s actually image->layers. Having said that, if it works well, it might be functionally similar to direct layer output. I need to take it for a spin!

Look at this & tell me Photoshop’s not cooked

Sorry-not-sorry to be a bit provocative, but seriously, to highlight one of one million examples:

And in a slightly more demanding case:

For the latter, I used Photoshop to remove a couple of artifacts from the initial Scarface-to-puppy Nano Banana generation, and to resize the image to fit onto a canvas—but geez, there’s almost no world where I’d now think to start in PS, as I would’ve for the last three decades.

Back in 2002, just after Photoshop godfather Mark Hamburg left the project in order to start what became Lightroom, he talked about how listening too closely to existing customers could backfire: they’ll always give you an endless list of nerdy feature requests, but in addressing those, you’ll get sucked up the complexity curve & end up focusing on increasingly niche value.

Meanwhile disruptive competitors will simply discard “must-have” features (in the case of Lightroom, layers), as those had often proved to be irreducibly complex. iOS did this to macOS not by making the file system easier to navigate, but by simply omitting normal file system access—and only later grudgingly allowing some of it.

Steve Jobs famously talked about personal computers vs. mobile devices in terms of cars vs. trucks:

Obviously Photoshop (and by analogy PowerPoint & Excel & other “indispensable” apps) will stick around for those who genuinely need it—but generative apps will do to Photoshop what (per Hamburg) Photoshop did to the Quantel Paintbox, i.e. shove it up into the tip of the complexity/usage pyramid.

Adobe will continue to gamely resist this by trying to make PS easier to use, which is fine (except of course where clumsy new affordances get in pros’ way, necessitating a whole new “quiet mode” just to STFU!). And—more excitingly to guys like me—they’ll keep incorporating genuinely transformative new AI tech, from image transformation to interactive lighting control & more.

Still, everyone sees what’s unfolding, and “You cannot stop it, you can only hope to contain it.” Where we’re going, we won’t need roads.

When life gives you hospitalized lemons…

…you waste pass the time screwing around doing competitive AI model featuring the building’s baffling architecture…

…and sketchy chow:

How-to: AI renovation vids

This seems like the kind of specific, repeatable workflow that’ll scale & create a lot of real-world value (for home owners, contractors, decorators, paint companies, and more). In this thread Justine Moore talks about how to do it (before, y’know, someone utterly streamlines it ~3 min from now!):

Google virtual try-on arrives

Well, after years and years of trying to make it happen, Google has now shipped the ability to upload a selfie & see yourself in a variety of outfits. You can try it here.


At least in my initial tests, results were kinda weird & off-putting:


I mean, obviously this ancient banger (courtesy of Bryan O’Neil Hughes, c.2003) is the only correct rendering! 🙂

Insane Nano Banana style transfer

As I’m fond of noting, only thing more incredible than witchcraft like this is just how little notice people now take of it. ¯\_(ツ)_/¯ But Imma keep noticing!

Two years ago (i.e. an AI eternity, obvs), I was duly impressed when, walking around a model train show with my son, DALL•E was able to create art kinda-sorta in the style of vintage boxes we beheld:

I still think that’s amazing—and it is!—but check out how far we’ve come. At a similar gathering yesterday, I took the photo below…

…and then uploaded it to Gemini with the following prompt: “Please create a stack of vintage toy car boxes using the style shown in the attached picture. The cars should be a silver 1990 Mazda Miata, a red 2003 Volkswagen Eurovan, a blue 2024 Volvo XC90, and a gray 2023 BMW 330.” And boom, head shot, here’s what it made:

I find all this just preposterously wonderful, and I hope I always do.

As Einstein is said to have remarked, “There are only two ways to live your life: one is as though nothing is a miracle, the other is as though everything is.”

Content-Aware Fail (and how to avoid it)

Jesús Ramirez has forgotten, as the saying goes, more about Photoshop than most people will ever know. So, encountering some hilarious & annoying Remove Tool fails…


…reminded me that I should check out his short overview on “How To Remove Anything From Photoshop.”