Here’s a little holiday-appropriate experiment featuring a shot of my dad & me (in Lego form, naturally) at my grandmother’s family farm in County Mayo. Sláinte!
Speaking of reskinning imagery (see last several posts), check out what’s now possible via Google’s Gemini model, below. I’ve been putting it to the test & will share results shortly.
Alright, Google really killed it here.
You can easily swap your garment just by uploading the pieces to Gemini Flash 2.0 and telling it what to do. pic.twitter.com/pNPBkIdRqy
This temporally coherent inpainting is utterly bonkers. It’s just the latest—and perhaps the most promising—in myriad virtual try-on techniques I’ve seen & written about over the years.
I love seeing the Magnific team’s continued rapid march in delivering identity-preserving reskinning:
IT’S FINALLY HERE!
Mystic Structure Reference!
Generate any image controlling structural integrity. Infinite use cases! Films, 3D, video games, art, interiors, architecture… From cartoon to real, the opposite, or ANYTHING in between!
This example makes me wish my boys were, just for a moment, 10 years younger and still up for this kind of father/son play. 🙂
Storyboarding? No clue! But with some toy blocks, my daughter’s wild imagination, and a little help from Magnific Structure Reference, we built a castle attacked by dragons. Her idea coming to life powered up with AI magic. Just a normal Saturday Morning. Behold, my daughter’s… pic.twitter.com/52tDZokmIT
“Rather than removing them from the process, it actually allowed [the artists] to do a lot more—so a small team can dream a lot bigger.”
Paul Trillo’s been killing it for years (see innumerable previous posts), and now he’s given a peek into how his team has been pushing 2D & 3D forward with the help of custom-trained generative AI:
Traditional 2d animation meets the bleeding edge of experimental techniques. This is a behind the scenes look at how we at Asteria brought the old and the new together in this throwback animation “A Love Letter to Los Angeles” and collaboration with music artist Cuco and visual… pic.twitter.com/3eWSdgckXn
A passing YouTube vid made me wonder about the relative strengths of World War II-era bombers, and ChatGPT quickly obliged by making me a great little summary, including a useful table. I figured, however, that it would totally fail at making me a useful infographic from the data—and that it did!
Just for the lulz, I then ran the prompt (“An infographic comparing the Avro Lancaster, Boeing B-17, and Consolidated B-24 Liberator bombers”) through a variety of apps (Ideogram, Flux, Midjourney, and even ol’ Firefly), creating a rogues’ gallery of gibberish & Franken-planes. Check ’em out.
Currently amusing myself with how charmingly bad every AI image generator is at making infographics—each uniquely bizarre! pic.twitter.com/U3cs8ySoVa
By combining @pika_labs Pikaframes and @freepik, I now have the magical ability to jump through space and time and in this example, music becomes a transformative element teleporting this woman to a new location. This is how it’s done. 1/6
Generate image (in this example, using Google Imagen).
Apply background segmentation.
Synthesize a new background, and run what I think is a fine-tuned version of IC-Light (using Stable Diffusion) to relight the entire image, harmonizing foreground/background. Note that identity preservation (face shape, hair color, dress pattern, etc.) is very good but not perfect; see changes in the woman’s hair color, expression, and dress pattern.
Put the original & modified images into Pika, then describe the desired transformation (smooth transition, flowers growing, clouds moving, etc.).
Another day, another ~infinite canvas for ideation & synthesis. This time, somewhat to my surprise, the surface comes from VSCO—a company whose users I’d have expected to be precious & doctrinaire in their opposition to any kind of AI-powered image generation. But who knows, “you can just do things.” ¯\_(ツ)_/¯
The capturing work was led by Harry Nelder and Amity Studio. Nelder used his 16-camera rig to capture the recent winners. The reconstruction software was a combination of a cloud-based platform created by Nelder, which is expected to be released later this year, along with Postshot. Nelder further utilized the Radiance Field method known as Gaussian Splatting for the reconstruction. A compilation video of all the captures, recently posted by BAFTA, was edited by Amity Studio.
Is it for me? Dunno: lately the only thing that justifies shooting with something other than my phone is a big, fast zoom lens, and I don’t know whether pairing such a thing with this slim beauty would kinda defeat the purpose. Still, I must know more…
Here’s a nice early look at the cam plus a couple of newly announced lenses:
Check it out (probably easier to grok by watching vs. reading a description):
From the static camera feed, EditIQ initially generates multiple virtual feeds, emulating a team of cameramen. These virtual camera shots termed rushes are subsequently assembled using an automated editing algorithm, whose objective is to present the viewer with the most vivid scene content.
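To make the idea concrete, here is a toy sketch of the two stages described above: cutting virtual "rushes" out of one wide static frame, then picking which rush to show at each moment. This is not EditIQ's actual algorithm. The names `Crop`, `virtual_feeds`, and `assemble`, the fixed zoom factor, the per-frame "interest" scores, and the greedy minimum-shot-length rule are all assumptions of mine for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    """A virtual camera: a rectangular window inside the wide static frame."""
    x: int
    y: int
    w: int
    h: int


def virtual_feeds(frame_w, frame_h, centers, zoom=2.0):
    """Return one crop per subject center, clamped so it stays inside the frame."""
    w, h = int(frame_w / zoom), int(frame_h / zoom)
    crops = []
    for cx, cy in centers:
        x = min(max(cx - w // 2, 0), frame_w - w)
        y = min(max(cy - h // 2, 0), frame_h - h)
        crops.append(Crop(x, y, w, h))
    return crops


def assemble(scores, min_shot=3):
    """Pick a feed index per frame.

    scores[t][k] is a (hypothetical) interest score for feed k at frame t.
    Greedy rule: cut to the best-scoring feed, but only once the current
    shot has been held for at least min_shot frames, to avoid jittery cuts.
    """
    if not scores:
        return []
    current = max(range(len(scores[0])), key=lambda k: scores[0][k])
    held = 0
    edit = []
    for frame_scores in scores:
        best = max(range(len(frame_scores)), key=lambda k: frame_scores[k])
        if best != current and held >= min_shot:
            current, held = best, 0
        edit.append(current)
        held += 1
    return edit


feeds = virtual_feeds(1920, 1080, [(100, 100), (1800, 1000)])
cut_list = assemble([[1, 0], [0, 1], [0, 1], [0, 1], [0, 1]], min_shot=3)
```

A real system would presumably score shots with something far richer (speaker activity, motion, framing quality) and optimize the whole edit at once rather than greedily, but the shape of the problem is the same: many crops in, one cut list out.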
Tired: Random “slot machine”-style video generation.
Inspired: Placing & moving simple guidance objects to control results.
Check out VideoNoiseWarp:
Every now and then something comes along that feels like it could change everything… NoiseWarp + CogVideoX lets you animate live action scenes with rough mockups!
Here’s the tutorial! This video combines AI-generated elements (balloon, kite, surfboard, and backgrounds) with my own real-world practical effects and stop motion.
The YouTube mobile app can now tap into Google’s Veo model to generate video, as shown below. Hmm—this feels pretty niche at the moment, but it may suggest the shape of things to come (ubiquitous media synthesis, anywhere & anytime it’s wanted).
For the longest time, Firefly users’ #1 request was to use images to guide composition of new images. Now that Firefly Video has arrived, you can use a reference image to guide the creation of video. Here’s a slick little demo from Paul Trani:
Building on the strong work from the previous season,
Berlin’s Extraweg have created… a full-blown motion design masterpiece that takes you on a wild ride through Mark’s fractured psyche. Think trippy CGI, hypnotic 3D animations, and a surreal vibe that’ll leave you questioning reality. It’s like Inception met a kaleidoscope, and they decided to throw a rave in your brain. [more]
These changes, reported by Forbes, sound like reasonable steps in the right direction:
Starting now, Google will be adding invisible watermarks to images that have been edited on a Pixel using Magic Editor’s Reimagine feature that lets users change any element in an image by issuing text prompts.
The new information will show up in the AI Info section that appears when swiping up on an image in Google Photos.
The feature should make it easier for users to distinguish real photos from AI-powered manipulations, which will be especially useful as Reimagined photos continue to become more realistic.
I really love the way the visual medium (simply black & white dots) enriches & evolves right alongside its subject matter in this ad for ChatGPT, and I hope we get to hear more soon from the creative team behind it.
But how do we go from ironic laughs to actual usefulness? Krea is taking a swing by integrating (I think) the Flux imaging model with the DeepSeek LLM:
Krea Chat is here.
a brand new way of creating images and videos with AI.
It doesn’t yet offer the kind of localized refinements people want (e.g. “show me a dog on the beach,” then “put a hat on the dog” and don’t change anything outside the hat area). Even so, it’s great to be able to create an image, add a photo reference to refine it, and then create a video. Here’s my cute, if not exactly accurate, first attempt. 🙂
Wow—check out this genuinely amazing demo from my old friend (and former Illustrator PM) Mordy:
In this video, I show how you can use Gemini in the free Google AI Studio as your own personal tutor to help you get your work done. After you watch me using it to learn how to take a sketch I made on paper to recreating a logo in Illustrator, I promise you’ll be running to do the same.
We propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries.
Users can enter search terms like “a person skating with a lens flare” to find corresponding clips within their media library. Adobe says the media intelligence AI can automatically recognize “objects, locations, camera angles, and more,” alongside spoken words — providing there’s a transcript attached to the video. The feature doesn’t detect audio or identify specific people, but it can scrub through any metadata attached to video files, which allows it to fetch clips based on shoot dates, locations, and camera types. The media analysis runs on-device, so doesn’t require an internet connection, and Adobe reiterates that users’ video content isn’t used to train any AI models.
Goodbye, endless scrolling. Hello, AI-powered search panel. With the all-new Media Intelligence in #PremierePro (beta), the content of your clips is automatically recognized, including objects, locations, camera angles & more. Just input your search to find exactly what you need. pic.twitter.com/cOYXDKKaFI
If you’re like me, you may well have spent hours of your youth lovingly recreating the iconic designs of pioneering Santa Cruz artist Jim Phillips. My first deck was a Roskopp 6, and I covered countless notebook covers, a leg cast, my bedroom door, and other surfaces with my humble recreations of his work.
That work is showcased in the documentary “Art And Life,” screening on Thursday in Santa Cruz. I hope to be there, and maybe to see you there as well. (To this day I can’t quite get over the fact that “Santa Cruz” is a real place, and that I can actually visit it. Growing up it was like “Timbuktu” or “Shangri-La.” Funny ol’ world.)
Putting the proverbial chocolate in the peanut butter, those fast-moving kids at Krea have combined custom model training with 3D-guided image generation. Generation is amazingly fast, and the results are some combo of delightful & grotesque (aka “…The JNack Story”). Check it out:
God help you, though, if you import your photo & convert it to 3D for use with the realtime mode. (Who knew I was Cletus the Slack-Jawed Yokel?) pic.twitter.com/nuesUOZ1Db
Here’s another interesting snapshot of progress in our collective speedrun towards generative storytelling. It’s easy to pick on the shortcomings, but can you imagine what you’d say upon seeing this in, say, the olden times of 2023?
The creator writes,
Introducing The Heist – Directed by Jason Zada. Every shot of this film was done via text-to-video with Google Veo 2. It took thousands of generations to get the final film, but I am absolutely blown away by the quality, the consistency, and adherence to the original prompt. When I described “gritty NYC in the 80s” it delivered in spades – CONSISTENTLY. While this is still not perfect, it is, hands down, the best video generation model out there, by a long shot. Additionally, it’s important to add that no VFX, no clean up, no color correction has been added. Everything is straight out of Veo 2. Google DeepMind
Here’s a nice write-up covering this paper. It’ll be interesting to dig into the details of how it compares to previous work (see category). [Update: The work comes in part from Adobe Research—I knew those names looked familiar :-)—so here’s hoping we see it in Photoshop & other tools soon.]
this is wild..
this new AI relighting tool can detect the light source in the 3D environment of your image and relight your character, the shadows look so realistic..
Part 9,201 of me never getting over the fact we were working on stuff like this 2 years ago at Adobe (modulo the realtime aspect, which is rad) & couldn’t manage to ship it. It’ll be interesting to see whether the Krea guys (and/or others) pair this kind of interactive-quality rendering with a really high-quality pass, as NVIDIA demonstrated last week using Flux.
3D arrived to Krea.
this new feature lets you turn images into 3D objects and use them in our Real-time tool.
Powered by advanced AI, TRELLIS enables users to create high-quality, customizable 3D objects effortlessly using simple text or image prompts. This innovation promises to improve 3D design workflows, making it accessible to professionals and beginners alike. Here are some examples:
Alpha channels are crucial for visual effects (VFX), allowing transparent elements like smoke and reflections to blend seamlessly into scenes. We introduce TransPixar, a method to extend pretrained video models for RGBA generation while retaining the original RGB capabilities. […] Our approach effectively generates diverse and consistent RGBA videos, advancing the possibilities for VFX and interactive content creation.
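For anyone wondering why the alpha channel matters so much here, this is the standard "over" compositing rule for a single pixel, sketched in plain Python. It has nothing to do with how TransPixar generates its RGBA frames; it only shows what a downstream compositor does with them. The function name `over`, the straight (non-premultiplied) alpha convention, and the sample color values are my own assumptions.

```python
def over(fg_rgb, fg_alpha, bg_rgb):
    """Composite one straight-alpha foreground pixel over an opaque background.

    All channels are floats in [0, 1]. Per channel:
        out = alpha * foreground + (1 - alpha) * background
    """
    return tuple(fg_alpha * f + (1.0 - fg_alpha) * b
                 for f, b in zip(fg_rgb, bg_rgb))


smoke = (0.8, 0.8, 0.8)   # light gray smoke (hypothetical values)
scene = (0.2, 0.4, 0.6)   # bluish background plate (hypothetical values)

wispy = over(smoke, 0.5, scene)    # half-transparent smoke blends with the scene
absent = over(smoke, 0.0, scene)   # alpha 0: the scene shows through untouched
solid = over(smoke, 1.0, scene)    # alpha 1: the smoke fully covers the scene
```

Without a per-pixel alpha, a generated smoke clip can only be pasted on as an opaque rectangle or keyed out after the fact, which is why RGBA output is a big deal for VFX work.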
The world moves on, and now NVIDIA has teamed up with Black Forest Labs to enable 3D-conditioned image generation. Check out this demo (starting around 1:31:48):
For users interested in integrating the FLUX NIM microservice into their workflows, we have collaborated with NVIDIA to launch the NVIDIA AI Blueprint for 3D-guided generative AI. This packaged workflow allows users to guide image generation by laying out a scene in 3D applications like Blender, and using that composition with the FLUX NIM microservice to generate images that adhere to the scene. This integration simplifies image generation control and showcases what’s possible with FLUX models.
The Former Bird App™ is of course awash in mediocre AI-generated video creations, so it’s refreshing to see what a gifted filmmaker (in this case Ruairi Robinson) can do with emerging tools (in this case Google Veo)—even if that’s some slithering horror I’d frankly rather not behold!