Five years ago, I spent an afternoon with a buddy watching Disco Diffusion resolve a weird, blurry, but ultimately delightful scene over the course of 15 minutes. Now Runway & NVIDIA are previewing generation that’s a mere ~9,000x faster than that (15 minutes is 900,000 ms; at under 100ms to first frame, that’s roughly a 9,000x speedup). Ludicrous speed, go!!
A breakthrough in real-time video generation.
As a research preview developed with @NVIDIA and shared at @NVIDIAGTC this week, we trained a new real-time video model running on Vera Rubin. HD videos generate instantly, with time-to-first-frame under 100ms. Unlocking an entirely… pic.twitter.com/juafjvk0wm
I always appreciate getting a peek into the incredible effort & craftsmanship that go into a production like this. Forget special effects: the physical grit on display here can’t be faked.
Now throw your shoulders back and go effin’ nuts. 😀
And for some more blog-appropriate content: Here are some fun pics & vids my son Henry & I captured on Saturday during SF’s wonderfully diverse & quirky St. Patrick’s Day parade:
Bonus: here’s a gallery of Irish wolfhounds, if you’re into that kind of thing. I couldn’t quite get these good boys to align like Cerberus, so I resorted to telling Gemini my hopes & dreams—as one does.
An AI paradox: as models get vastly more complex, interfaces can get vastly simpler. We can make computers conform to our reality—not the other way around.
Steve Jobs described exactly this evolution all the way back in 1981:
Structuring your prompt well turns out to be key to avoiding garbled text. As the presenter says, “It’s not about writing more. It’s about writing in the right order.” Check out this brief overview.
In this tutorial, you’ll see how to use Nano Banana Pro and Kling 3.0 Omni together to solve one of the most common pain points in AI product video: text that blurs, warps, or drifts mid-motion. We’ll walk through a practical workflow for maintaining legibility and visual consistency in product shots, so your labels, logos, and copy stay clean from the first frame to the last.
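I haven’t reproduced the tutorial’s exact template here, but as a loose illustration of “writing in the right order,” the idea is to pin down the literal on-screen text early and let stylistic flourishes trail behind it. A hypothetical sketch (the product, label copy, and section ordering below are all invented for illustration):

```python
# A hypothetical prompt skeleton illustrating "right order": lock the
# exact label text before anything stylistic. Not the tutorial's
# literal template.
sections = [
    'Product shot of a matte-black coffee bag on a marble counter.',
    'The label reads exactly: "KONA GOLD: SINGLE ORIGIN".',  # verbatim text early
    'Keep all label text sharp, flat to camera, and unwarped.',
    'Soft morning light, shallow depth of field, subtle steam.',  # style last
]
prompt = " ".join(sections)
```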
Hey, remember the pandemic? We sure made some impulse buys then, didn’t we?
For me it was Insta360’s bizarre, modular 360º camera plus the elaborate mounting kit that promised to strap its shards onto the top & bottom of my DJI Mavic, enabling some magical, drone-less captures. Suffice it to say the thing was a complete POS—dysfunctional even as a handheld action cam, much less as a bunch of theoretically interconnected pieces thousands of feet in the air.
And yet… who doesn’t love the promise of capturing immersive footage that enables crazy post-processing camera moves? Insta’s on it, releasing their first 360º drone, the Antigravity A1:
Some cool details:
With Antigravity’s proprietary FreeMotion technology, the drone — together with the Vision goggles and Grip controller — enables an immersive flying experience that feels both natural and intuitive. Pilots can fly in one direction while looking in another. This level of immersion enables more freedom to explore. The 360 immersion doesn’t end just because the drone lands — recorded footage can be viewed in 360 over and over again, letting users discover new angles every time they watch.
Long dog walks are good for nothing if not visualizing whatever silliness pops into my head—which today happened to be our puppy Ziggy becoming an impossible object called a “Ziggule.”
I shared this with my cousin Alicia, who does a tremendous amount of work sheltering & rescuing dogs in Austin, and she requested a portrait of their current foster pooch (Tesseract). I was of course all too happy to oblige:
As it happens, folks at Google have had the same idea, and they’ve been putting Nano Banana to work helping zhuzh up pics of shelter pets in hopes of helping them find their forever homes. Let’s hear it for using AI & old-fashioned human creativity for good!
Photos play a big role in pet adoption.
We’ve teamed up with shelters across the country to give rescue pets glamorous headshots that show off their personalities, made with Nano Banana Pro.
As you’ve likely heard me say, I’ve gotten psyched up too many times about AI video-editing tech that fell short of its ambitions—but I’m hoping that this work from Adobe & Harvard collaborators can deliver what it describes:
We present Vidmento, an interactive video authoring tool that expands initial materials and ideas into compelling video stories through blending captured and generative media. To preserve narrative continuity and creative intent, Vidmento generates contextual clips that align with the user’s existing footage and story.
Per the site, Vidmento should enable:
Story Discovery: Surface the stories within captured clips.
Narrative Development: Suggest what’s needed to move the story forward.
Contextual Blending: Generate visuals that align with real footage.
Creative Control: Give creators controls to fine-tune the visuals and story.
The older I get, the harder it is to get the Kids These Days™ to grok just what a road-to-Damascus moment the arrival of the Mac presented. I flap my arms like some conspiracy nut at his cork board, trying in vain to convey the idea that in the pre-Mac days, personal computer “art” consisted of pecking out some green ASCII blocks on an Apple ][. Okay, grandpa, let’s get you to bed…
Anyway, predating even me (heh) is this glimpse of how computer animation was painstakingly eked out via data tape (!) back in 1971.
Among the misbegotten “Oh, everyone will love this—but rarely will anyone actually use it” AR demos of 2017 (right alongside “See whether this toaster fits on my counter!”), imagining restaurants plopping a 3D model onto your plate was always a banger. Leaving aside whether anyone would actually want or value that experience, the cost of realistically modeling dishes was prohibitive.
This new tech at least promises to take the grunt work out of model creation, turning a single photo into an AR-ready 3D asset (give or take a tine or two ;-)):
AR GenAI by AR Code is transforming the food industry. Creating an AR experience for a dish can now start with a single photo.
As shown in the video, a single dessert photo is converted into an AR-ready 3D model with realistic textures and depth. AR Code SaaS then instantly… pic.twitter.com/s1H5do1UUf
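AR Code’s actual pipeline isn’t public, but the general shape of photo-to-AR is easy to sketch: estimate depth, lift pixels into a mesh, and export a format AR viewers can ingest. Here’s a toy version that fakes the depth step with image brightness (a real system would use a learned depth or 3D reconstruction model; the filename is invented):

```python
import numpy as np
import trimesh
from PIL import Image

# Toy photo-to-3D sketch: brightness stands in for a real depth map.
img = Image.open("dessert.jpg").convert("L").resize((64, 64))
depth = np.asarray(img, dtype=np.float32) / 255.0  # fake depth, 0..1

# Lift the depth map to a height-field mesh.
h, w = depth.shape
xx, yy = np.meshgrid(np.arange(w), np.arange(h))
vertices = np.column_stack([xx.ravel(), yy.ravel(), depth.ravel() * 10])

faces = []
for y in range(h - 1):
    for x in range(w - 1):
        i = y * w + x
        faces.append([i, i + 1, i + w])        # two triangles per grid cell
        faces.append([i + 1, i + w + 1, i + w])

# GLB is a format AR viewers commonly accept.
trimesh.Trimesh(vertices=vertices, faces=faces).export("dessert.glb")
```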
“Wow, that’s some really sharp After Effects work,” I thought last year, when my wife showed me some animation her Airbnb colleague had created. But nope—the work came straight out of Canva.
Not content to chill with their surprisingly capable foundation, Canva is continuing to build out the “Creative Operating System” and has announced the acquisition of up-and-coming 2D animation tool Cavalry:
In their blog post they seem pretty adamant that the acquisition won’t result in dumbing down the core app:
Built for professional motion designers
Cavalry earned its place in the motion design world by doing something different. Its procedural, systems-based approach prioritises flexibility, repeatability, and performance. It wasn’t built as a simplified alternative; it was built specifically for professional motion designers and the complex workflows they rely on. That professional focus remains central.
We’ve invested in Cavalry because of its depth as a professional-grade motion tool. The goal isn’t to simplify what makes it powerful, but to support and strengthen it. Professional motion design demands precision, flexibility, and tools that can scale across complex projects.
Much as with their acquisition of Affinity, however, I’d fully expect Canva to integrate underlying tech into the core design platform, radically simplifying the interface to it—including by providing agentic and chat-based touchpoints.
As with the myriad node-based systems that sprang up last year, I wouldn’t expect most people to ever see or touch the underlying data structures. Rather, what’s essential is that the main tool can understand & modify them, so that it can deliver brilliant results at scale. That necessitates a very approachable, and totally complementary, UX.
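To make that concrete, here’s a minimal sketch of the kind of structured scene description an agent could read and rewrite on a user’s behalf while the user only ever sees chat. The node types, fields, and edit operation are all invented for illustration; this is neither Canva’s nor Cavalry’s actual data model:

```python
# A toy procedural-animation scene graph: nodes plus connections, the
# sort of structure a chat/agent layer could edit without the user
# ever opening a node editor.
scene = {
    "nodes": {
        "logo":   {"type": "shape",      "params": {"asset": "logo.svg", "scale": 1.0}},
        "bounce": {"type": "oscillator", "params": {"amplitude": 40, "frequency": 2.0}},
        "bind":   {"type": "bind",       "params": {"target": "logo", "property": "y"}},
    },
    "connections": [("bounce", "bind")],
}

def apply_edit(scene: dict, node: str, param: str, value) -> dict:
    """The one operation an agent needs: set a parameter on a node."""
    scene["nodes"][node]["params"][param] = value
    return scene

# "Make the logo bounce a little slower" might compile down to:
apply_edit(scene, "bounce", "frequency", 1.2)
```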
I try not to curse on this blog, doing so maybe a dozen times in 20+ (!!) years of posting. But circa 2013-2017, when I saw what felt like uncritical praise for Adobe’s voice-driven editing prototypes, I called bullshit.
The high-level concept was fine, but the tech at the time struck me as the worst of both worlds: the imprecision of language (e.g. how does a normal person know the term “saturation,” and how does an expert describe exactly how much they want?) combined with the fragility of traditional selection & adjustment algorithms.
Now, however, generative tech can indeed interpret our language & effect changes—and in the case of Krea’s new realtime mode, in a highly responsive way:
Whether or not voice per se becomes a popular modality here, closing the gap between idea & visual is just so seductive. To emphasize a previously made point:
We simply have not started rethinking interactions from the ground up.
So many possibilities wide open when you think of human – AI in micro feedback loops vs automation alone or classic back and forth. https://t.co/iVKb02SbdU
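For what it’s worth, here’s a minimal sketch of that micro-feedback-loop idea: loose verbal instructions get mapped to concrete parameter nudges and re-rendered immediately. The phrase table is a crude stand-in for whatever model Krea actually uses, and the filenames are invented; only the Pillow calls are real:

```python
from PIL import Image, ImageEnhance

# Toy interpreter standing in for a speech/LLM pipeline: maps loose
# language to a relative saturation adjustment. Entirely illustrative.
NUDGES = {
    "more saturated": 1.15,
    "a lot more saturated": 1.4,
    "less saturated": 0.85,
    "way less saturated": 0.6,
}

def interpret(utterance: str) -> float:
    """Return a multiplicative saturation factor for an instruction."""
    return NUDGES.get(utterance.strip().lower(), 1.0)

img = Image.open("photo.jpg")
saturation = 1.0

# The micro feedback loop: speak (here, type), see the result, refine.
for utterance in ["more saturated", "a lot more saturated", "less saturated"]:
    saturation *= interpret(utterance)
    preview = ImageEnhance.Color(img).enhance(saturation)
    preview.save("preview.jpg")  # in a real tool this would update live
```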
I got into the Mac scene just a touch too late to have interacted with Aldus (acquired by Adobe in 1994), and I’m sorry not to have known the late Paul Brainerd, who passed away a couple of weeks ago. To mark the occasion, some friends have been resharing this video, created when the company became part of the Big Red A. It’s fun to see a few familiar faces & to remember the tech vibe of those early days:
I had no idea that the ol’ girl had it in (er, on) her—but this is too odd & thus interesting not to pass along:
Meanwhile, speaking of odd: Having just visited the Mojave aircraft boneyard (see pics) and Spaceport, from which the weird creations of Burt Rutan & co. operate, I couldn’t resist trying this silliness:
I asked Nano Banana to imagine legendary aircraft designer Burt Rutan rocking the sort of canard wings he loves including on planes.