Category Archives: AI/ML

Google Earth + Nano Banana? Go Go Godzilla!

I love this kind of simple, scrappy creativity:

Here’s the Chrome extension:

  • Capture any Google Earth 3D view
  • Transform with AI (Nano Banana Pro) into cinematic shots
  • Generate videos (Veo 3.1) with customizable duration and audio

GenFill + Vividon = Magic

It’s insane what we can do now—from object removal to lighting changes—that was simply out of the question even a year ago.

Check out this little progression of edits, starting with the newly enhanced Generative Fill in the Photoshop beta, followed by a couple of steps of Remove, followed by a pass with Vividon & a few tweaks in Camera Raw (running inside PS):

Nutty & I’m here for it. Per PetaPixel,

Co-founder and Chief Innovation Officer Marcus Kurn adds that the ability to deliver two or three lighting variations alongside every final image is a real differentiator: “once you start delivering two or three lighting variations with every final image, your clients will never want to go back.”

Vividon relighting comes to Photoshop

“No prompting, no friction. Just incredible results.”

As I mentioned back in January, Vividon offers new generative relighting tech that promises amazing realism & identity preservation:

Vividon places every relight on its own Photoshop layer. Adjust opacity, change blend modes, paint in or out exactly what you want, or remove it entirely. Your original always stays untouched.

Check out a 10s demo below, and visit their site for a more interactive preview:

And here’s a full 2-minute tour:

“A vehicle that cares back”

“People will forget what you said, people will forget what you did, but people will never forget how you made them feel.” — Maya Angelou

I’ve reflected on this maxim countless times over the last couple of years, as I’ve considered the relationships I want with AI—particularly with notional creative partners. I want a partner who cares—who (which?) actually takes the time to get to know me, asking thoughtful questions, noodling on answers, and genuinely taking my feedback to heart.

I thought of this while listening to Stewart Brand talking to Ezra Klein the other day. Check out this poetic & provocative passage:


Well, it wound up that, basically, most of the book is Chapter 2, “Vehicles.” And the land vehicle that humans have used for 6,000 years is a horse, and the horse takes a lot of maintenance.

I’ll read something here from the book, if I may. There’s this philosopher named Albert Borgmann who wrote:

You cannot remain unmoved by the gentleness and conformation of a well-bred and well-trained horse — more than a thousand pounds of big-boned, well-muscled animal, slick of coat and sweet of smell, obedient and mannerly, and yet forever a menace with its innocent power and ineradicable inclination to seek refuge in flight, and always a burden with its need to be fed, wormed and shod, with its liability to cuts and infections, to laming and heaves. But when it greets you with a nicker, nuzzles your chest and regards you with a large and liquid eye, the question of where you want to be and what you want to do has been answered.

And I end with: “I wonder if that might come again someday — a vehicle that cares back.”

The scarily beautiful animation of Sincitium

Side note: “Macrófago” is 100% the best word I’ve learned all week.

AI filmmaking turns a (creepy, fun) corner

This is the first time I can recall watching a genuine narrative (not a handful of gee-whiz demo shots) made with AI & not really caring about the production details. We’re turning the inevitable corner where it’s just the quality of ideas & narrative that’ll matter—not so much how the proverbial sausage was made.

See yourself from a new angle in Google Photos

Get some fresh perspective from our amazing teammates in research:

Today we are announcing a new approach to fix scene alignment after a photo was taken. Our method, now available as part of the Auto frame feature in Google Photos, uses machine learning (ML) models to understand the scene and its spatial layout and uses generative AI to imagine the photo from that new perspective. In contrast to classical photo editing, our method interprets a photo as a 3D scene — think of a real moment frozen in time — and changes the camera position automatically within that space.

How to get the most from Nano Banana

My new teammates have posted a series of detailed tips & tech specs (e.g. you can upload as many as 14 images together with a prompt). Check it out!

1. Introduction to Nano Banana

  • The Models: Overview of Nano Banana 2 (powered by real-time web search) and Nano Banana Pro (built for high-end reasoning).
  • Core Strengths: Deep reasoning capabilities, accurate visual rendering, and premium features like text rendering and upscaling (2K/4K).

2. Technical Specs at a Glance

  • Context Windows: Up to 131,072 input tokens for Nano Banana 2.
  • Versatility: Supports multiple aspect ratios (from 1:1 to 21:9) and up to 14 reference images in a single prompt.
  • Safety: Built-in SynthID watermarking and C2PA credentials for responsible AI use.
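
If you're scripting against these limits, a tiny pre-flight check can save a failed request. Here's a minimal sketch in Python that uses only the two hard numbers quoted above (14 reference images, 131,072 input tokens for Nano Banana 2). The function name and parameters are my own invention for illustration, not anything from a Google SDK, and how you count tokens is left to you:

```python
# Limits as quoted in the spec summary above. The function and its
# parameter names are hypothetical, not part of any Google SDK.
MAX_REFERENCE_IMAGES = 14
MAX_INPUT_TOKENS = 131_072  # figure given for Nano Banana 2


def check_request(reference_image_count: int, input_token_count: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if reference_image_count > MAX_REFERENCE_IMAGES:
        problems.append(
            f"{reference_image_count} reference images exceeds the limit of {MAX_REFERENCE_IMAGES}"
        )
    if input_token_count > MAX_INPUT_TOKENS:
        problems.append(
            f"{input_token_count} input tokens exceeds the limit of {MAX_INPUT_TOKENS}"
        )
    return problems


# Example: two images too many, token count fine.
print(check_request(reference_image_count=16, input_token_count=2_000))
```

Returning a list of problems (rather than raising on the first one) means you can show every issue at once before anyone burns an API call.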

3. Best Practices for Prompting

  • Be Specific: Focus on concrete details regarding subject, lighting, and composition.
  • Positive Framing: Describe what should be there (e.g., “empty street”) rather than what shouldn’t.
  • Director’s Perspective: Use cinematic terms like “low angle,” “bokeh,” or “aerial view.”

4. Five Powerful Prompting Frameworks

  • Image Generation: Using the [Subject] + [Action] + [Context] + [Style] formula.
  • Image Editing: Utilizing “Semantic Masking” to change specific parts of an image via text.
  • Real-Time Data: Leveraging web search to create visuals based on current events or weather.
  • Text Rendering: How to get legible, localized text in over 10 languages within your images.
  • Creative Direction: Advanced tips for controlling lighting (e.g., Chiaroscuro), camera hardware (e.g., GoPro vs. Fujifilm), and film stock.
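
The [Subject] + [Action] + [Context] + [Style] formula is easy to turn into a reusable template. Here's a rough sketch, assuming you just want a single prompt string out the other end. The `build_prompt` helper and the way it joins the pieces are my own guesses at a sensible layout, not something the guide prescribes:

```python
def build_prompt(subject: str, action: str, context: str, style: str) -> str:
    """Assemble a prompt in Subject + Action + Context + Style order.

    Hypothetical helper: the guide names the four slots, but the
    punctuation used to join them here is an assumption.
    """
    fields = {"subject": subject, "action": action, "context": context, "style": style}
    cleaned = {}
    for name, value in fields.items():
        # Trim whitespace and trailing punctuation so the pieces join cleanly.
        text = value.strip().rstrip(".,")
        if not text:
            raise ValueError(f"{name} must not be empty")
        cleaned[name] = text
    return (
        f"{cleaned['subject']} {cleaned['action']}, "
        f"{cleaned['context']}, {cleaned['style']}."
    )


print(
    build_prompt(
        subject="A red fox",
        action="leaping over a stream",
        context="on an empty forest trail at dawn",
        style="low angle, shallow depth of field with bokeh",
    )
)
```

Note how the example leans on the earlier tips too: "empty forest trail" is positive framing, and "low angle" and "bokeh" borrow the director's vocabulary.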

5. The Creative Ecosystem

  • How to combine Nano Banana with other models like Gemini (for prompt engineering), Veo (for video keyframes), and Lyria (for AI soundtracks).

Photoshop, 3D, and redemption

“Being early is the same as being wrong.” — Marc Andreessen, Vol. ~900

We put 3D into Photoshop nearly 20 years ago, and it got used by nearly 20 people total, lol. For many of the past several years, it was on the team’s “gotta throw overboard, as soon as we can find time” list—but happily that time was never found.

I am so glad to see this foundation now finding a meaningful niche, and I have high hopes for its generative future. Posing a person or thing directly is so much more intuitive than trying to precisely describe an outcome via prompt, and simple 3D manipulation + generative rendering could well deliver a game-changing best of both worlds.

Canva’s new Magic Layers converter is really impressive

As generative imaging models like Nano Banana get increasingly adept at rendering text-heavy layouts, the ability to convert those layouts into native text/image compositions is of course hugely valuable for editing. Check out Canva’s new Magic Layers feature:

I couldn’t resist trying it out with a silly infographic I made using the new ChatGPT image model, and dang if it didn’t do a pretty great job:

“LooseRoPE” promises super intuitive illustration & compositing

Man, it must be nearly 20 years ago that we started envisioning drag-and-drop-simple composition and compositing in Photoshop—back when gradient-domain painting & blending was the emerging hotness. After plenty of false starts, could these simple interaction patterns finally become mainstream? Maybe! I must know more of this witchcraft:

You can now verify Google AI-generated videos in the Gemini app

You can now check if a video was edited or created with Google AI directly in the Gemini app.

Just upload a video and ask something like, “Was this generated using Google AI?” Gemini will scan for the imperceptible SynthID watermark across both the audio and visual tracks and use its own reasoning to return a response that gives you context. For example, it might say: “SynthID detected within the audio between 10-20 secs. No SynthID detected in the visuals.”

Uploaded files can be up to 100 MB and 90 seconds long.
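
If you're checking clips in bulk, it's worth screening them against those limits before uploading. A minimal sketch, assuming "100 MB" means decimal megabytes (the announcement doesn't say) and that you already know each clip's duration from your own tooling. Both function names are hypothetical:

```python
import os

MAX_BYTES = 100 * 1000 * 1000  # assumption: "100 MB" read as decimal megabytes
MAX_SECONDS = 90


def within_upload_limits(size_bytes: int, duration_seconds: float) -> bool:
    """True if a clip fits both the size and the length limit quoted above."""
    return size_bytes <= MAX_BYTES and duration_seconds <= MAX_SECONDS


def file_within_limits(path: str, duration_seconds: float) -> bool:
    """Same check for a file on disk; duration must be supplied by the caller."""
    return within_upload_limits(os.path.getsize(path), duration_seconds)


print(within_upload_limits(size_bytes=42_000_000, duration_seconds=75))
```

Reading the duration out of a video file needs a media library or a tool like ffprobe, so the sketch deliberately takes it as an argument rather than guessing.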

Scout with Maps, animate with Veo

Check out this super cool mashup between Google Maps & my new product, Veo (video generation):

The team writes,

With Maps Imagery Grounding, a film studio can use a laptop to quickly visualize a scene at a specific place, like Washington Square Park in New York City—before scouts ever set foot on set. It’s easy to use: just type a prompt like “generate an image of a futuristic spaceship hovering in front of the Washington Square Arch” into the Gemini Enterprise Agent Platform and enable grounding with Google Maps Imagery in settings. In seconds, you can storyboard your creative vision with an accurate image—and you can even use Veo to animate the scene.

“AI will never suffer from bipolar disorder and autism like me”

Spending four minutes listening to Diplo’s thoughts on how art will be made going forward, and specifically on the value of quirky, messy, world-experiencing humans, will be a good use of your time, I promise. The machine needs us ghosts.

“A rare look at how Hollywood is already using AI”

I’ve been sending this video to friends & family to explain what the heck it is I actually, y’know, do for a living. (It’s somehow related to enabling all this!)

Here’s a good summary from Gemini:

  • Digital Clones for A-listers (0:33–1:56): Creative Artists Agency (CAA) is helping actors create and store secure digital doubles of their likeness and vocal inflections. This serves as a “vault” to protect their intellectual property and assert rights against unauthorized use.
  • Deep Voodoo’s AI Innovations (2:15–3:54): Founded by Trey Parker and Matt Stone of South Park, this studio uses proprietary facial scanning and AI to perform tasks like real-time de-aging for projects like the TV series Before and Billy Joel’s recent music video.
  • Production Efficiency and Ethics (6:03–7:40): Director Darren Aronofsky and filmmaker Eliza McNitt utilized Google’s Veo 3 model for the short film Ancestra. AI allowed them to create complex cosmic visuals and even recreate a newborn baby digitally to avoid the ethical concerns of filming with a real infant.
  • Commercially Safe AI Tools (8:00–9:10): Asteria Film Company, co-founded by Natasha Lyonne and Bryn Mooser, focuses on building “commercially safe” AI models trained strictly on licensed materials to avoid copyright infringement, emphasizing that learning to use AI is an essential skill for modern filmmakers.
  • The Human Element (4:48–5:13): Despite the rapid evolution of AI, industry unions like SAG-AFTRA emphasize that human performers bring a unique, special quality to projects that algorithms cannot replicate, advocating for guardrails to ensure AI serves as a tool for creators rather than a replacement.

Phota launches, promising maximum identity preservation

Phota—about which I expressed some initial misgivings, given its ability to rewrite memories—has launched Phota Studio & their API. From what I can tell, it builds upon a Nano Banana foundation and adds personalization that relies on uploading dozens of images of each individual in order to maximize identity preservation:

With Phota, for the first time, you can generate, edit, and enhance photos while keeping your identity intact, every time.

We’re not building a generic foundation model. We build personal models about you, and about the people and pets around you. At the center are profiles, built from your personal album that learn the details of your appearance that make you recognizable as yourself: how you smile, your eye color, and how your face looks from different angles. Your personal model is private and only used by you.

Here’s a quick thread in which I tried inserting myself into a couple of images, using both Phota’s model (which depended on my uploading 30+ images of myself) and just Nano Banana straight out of the Gemini app:

FreePik enables 3D photo shoots

I love seeing progress like this: upload a product pic, convert it to 3D, and photograph it on a virtual set:

Runway debuts Multi-Shot

Here’s a fun, ultra-simple way to turn an image (or just a prompt) into a short, multi-shot narrative:

Just for fun I fed it this image…

…and this prompt (based on an all-too-true story):

A family of Lego people and their dog gaze around Yosemite’s most iconic vista, then reminisce about that time they got stuck there in the snow in their VW van, expressing hope that they don’t get stuck again!

Check out the results:

I Love(art) to move it, move it…

I’ve long quoted James Ratliff, the super sharp designer behind Adobe’s Project Graph (who’s recently decamped to Figma), on how the process of generating & refining ideas generally starts broad/declarative (searching, prompting) and moves towards fine-grained methods (selecting, moving, etc.):

I see an increasing number of tool & model creators mixing modalities—even in the Gemini Super Bowl ad featuring a mom & daughter drawing a simple circle to show where they’d like to add a dog bed.

I’m eager to check out Lovart’s take on the possibilities, especially for animation:

Update: Here’s a look at the UI, in which you can move & scale the selection rectangle, as well as the before & after images:

Spline enables agentic 3D creation

“3D scenes, websites, games, apps,” promises Spline. “Describe anything and Omma builds it for you in seconds.”

Omma combines code generation (LLMs), 3D AI mesh generation, and Image generation all in one place for you to build and ship. Deploy to production, assign custom domains, and more.

Sora is the new Peach

Ten years ago (!), the embryonic social app Peach suddenly blew up on the scene—only to molder shortly thereafter. Adam Lisagor tartly predicted that outcome right after Peach debuted:

I’m reminded of this upon hearing that OpenAI has bailed out on Sora, which they launched just a few months ago. In a way I’m not surprised—check out how interest in the tech spiked & then rapidly cratered—except that just a couple of months ago Disney signed a billion-dollar deal to use it. ¯\_(ツ)_/¯

Luma UNI-1 promises layered creation

When can we get this (or equivalent) into Photoshop??

On a conceptually (though not necessarily technically) related note, the LICA dataset may help model makers train layered generation:

Photoshop drops Rotate Object!

Speaking of spinning right ’round, check this out:

Check out another view, from Paul Trani:

Runway promises 100ms (!!) HD video generation

Five years ago, I spent an afternoon with a buddy watching Disco Diffusion resolve a weird, blurry, but ultimately delightful scene over the course of 15 minutes. Now Runway & NVIDIA are previewing generation that’s a mere ~9,000x faster than that (15 minutes is 900 seconds, versus 0.1 seconds). Ludicrous speed, go!!

Tips: Getting great text from Nano Banana

Structuring your prompt well turns out to be key in avoiding garbled text. As the presenter says, “It’s not about writing more. It’s about writing in the right order.” Check out this brief overview.

In this tutorial, you’ll see how to use Nano Banana Pro and Kling 3.0 Omni together to solve one of the most common pain points in AI product video: text that blurs, warps, or drifts mid-motion. We’ll walk through a practical workflow for maintaining legibility and visual consistency in product shots, so your labels, logos, and copy stay clean from the first frame to the last.

Using AI to save pets

Long dog walks are for nothing if not visualizing whatever silliness pops into my head—which today happened to be our puppy Ziggy becoming an impossible object called a “Ziggule.”

I shared this with my cousin Alicia, who does a tremendous amount of work sheltering & rescuing dogs in Austin, and she requested a portrait of their current foster pooch (Tesseract). I was of course all too happy to oblige:

As it happens, folks at Google have had the same idea, and they’ve been putting Nano Banana to work helping zhuzh up pics of shelter pets in hopes of helping them find their forever homes. Let’s hear it for using AI & old-fashioned human creativity for good!

Adobe Research shows promising “Vidmento” tech

As you’ve likely heard me say, I’ve gotten psyched up too many times about AI video-editing tech that fell short of its ambitions—but I’m hoping that this work from Adobe & Harvard collaborators can deliver what it describes:

We present Vidmento, an interactive video authoring tool that expands initial materials and ideas into compelling video stories through blending captured and generative media. To preserve narrative continuity and creative intent, Vidmento generates contextual clips that align with the user’s existing footage and story.

Per the site, Vidmento should enable:

  1. Story Discovery: Surface the stories within captured clips.
  2. Narrative Development: Suggest what’s needed to move the story forward.
  3. Contextual Blending: Generate visuals that align with real footage.
  4. Creative Control: Give creators controls to fine-tune the visuals and story.

From carbohydrates to polygons

Among the misbegotten “Oh, everyone will love this—but rarely will anyone actually use it” AR demos of 2017 (right alongside “See whether this toaster fits on my counter!”), imagining restaurants plopping a 3D model onto your plate was always a banger. Leaving aside whether anyone would actually want or value that experience, the cost of realistically modeling dishes was prohibitive.

This new tech at least promises to take the grunt work out of model creation, turning a single photo into an AR-ready 3D asset (give or take a tine or two ;-)):

Speak it -> See it, with Krea’s new voice mode

I try not to curse on this blog, doing so maybe a dozen times in 20+ (!!) years of posting. But circa 2013-2017, when I saw what felt like uncritical praise for Adobe’s voice-driven editing prototypes, I called bullshit.

The high-level concept was fine, but the tech at the time struck me as the worst of both worlds: the imprecision of language (e.g. how does a normal person know the term “saturation,” and how does an expert describe exactly how much they want?) combined with the fragility of traditional selection & adjustment algorithms.

Now, however, generative tech can indeed interpret our language & effect changes—and in the case of Krea’s new realtime mode, in a highly responsive way:

Whether or not voice per se becomes a popular modality here, closing the gap between idea & visual is just so seductive. To emphasize a previously made point:

Photoshop is totally cooked… except not

I couldn’t have contrived a better example of the power & pitfalls of generative imaging if I tried.

Here’s a pretty crummy cell phone picture I took yesterday from a moving train & then enhanced with a single prompt using Gemini. The results are incredible—if you don’t really care about the exact capacity of your jumbo jet! 🙂

The current state of AI-driven editing drives home the wisdom of that old Russian saying, “Trust… but verify.”

See also my previously shared example, in which Nano Banana quietly upgraded this propeller-driven plane into a jet:

AI + SVG: Vector all the things!

When it rains, it pours: No sooner did I post about text->vector than I saw two new entrants in that space. The new Quiver AI is claimed to have “solved vector design with AI”:

Here’s my first quick test, in which Quiver & Illustrator utterly smoke direct chat->vector output in Gemini & ChatGPT:

Meanwhile, check out what Recraft produced:

Elsewhere, Hero Studio promises great image->SVG conversion. I’ve applied for access & am eager to take it for a spin:

Can AI finally generate useful vectors?

When we launched Firefly three years ago (!), we talked up prompt-based vector creation. When the feature later arrived in Illustrator, it was really text-to-image-to-tracing. That could be fine, actually, provided that the conversion process did some smart things around segmenting the image, moving objects onto their own layers, filling holes, and then harmoniously vectorizing the results. I’m not sure whether Adobe actually got around to shipping that support.

In any case, Recraft now promises vector creation directly from prompts:

Meanwhile Gemini promises SVG creation right out of the box. My previous attempts to use it produced results that were, um, impressionistic…

…and based on what they’re showing vis-à-vis recent updates, I haven’t been in a hurry to try again:

Creative technologist needed on the Flux team

I’ve really enjoyed collaborating with Black Forest Labs, the brain-geniuses behind Flux (and before that, Stable Diffusion). They’re looking for a creative technologist to join their team. Here’s a bit of the job listing in case the ideal candidate might be you or someone you know:

BFL’s models need someone who knows them inside out – not just what they can do today, but what nobody’s tried yet. This role sits at the intersection of creative excellence, deep model knowledge, and go-to-market impact. You’ll create the work that makes people realize what’s possible with generative media – original pieces, experiments, and creative assets that set the standard for what FLUX can do and show it to the world.

Create original creative work that pushes FLUX to its limits – experiments, visual explorations, and pieces that show what’s possible before anyone else figures it out

Collaborate with the research and product teams from the start of training/product development to understand the core strengths of each new model/product and create assets that amplify and showcase these. You will also provide feedback to those teams throughout the development process on what needs to improve.

UI: Realtime generation & the undiscovered country

Former Apple designer Tuhin Kumar, who recently logged three years at Luma AI, makes a great point here:

To the extent I give Adobe gentle but unending grief about their near-total absence from the world of UI innovation, this is the kind of thing I have in mind. What if any layer in Photoshop—or any shape in Illustrator—could have realtime-rendering generative parameters attached?

Like, where are they? Don’t they want to lead? (It’s a genuine question: maybe the strategy is just to let everyone else try things, and then to finally follow along at scale.) And who knows, maybe certain folks are presently beavering away on secret awesome things. Maybe… I will continue hoping so!

Nano Banana goes to the Super Bowl

It’s hard to believe that when I dropped by Google in 2022, arguing vociferously that we work together to put Imagen into Photoshop, they yawned & said, “Can you show up with nine figures?”—and now they’re spending eight figures on a 60-second ad to promote the evolved version of that tech. Funny ol’ world…

Interactive relighting control for Qwen image creation

A couple of weeks ago I mentioned a cool, simple UI for changing camera angles using the Qwen imaging model. Along related lines, here’s an interface for relighting images:

Adobe vets launch AniStudio

My former colleagues Jue Wang & Chen Fang are making an impressive indie debut:

AniStudio exists because we believe animation deserves a future that’s faster, more accessible, and truly built for the AI era—not as an add-on, but from the ground up. This isn’t a finished story. It’s the first step of a new one, and we want to build it together with the people who care about animation the most.

Check it out: