Building on the strong work from the previous season,
Berlin’s Extraweg have created… a full-blown motion design masterpiece that takes you on a wild ride through Mark’s fractured psyche. Think trippy CGI, hypnotic 3D animations, and a surreal vibe that’ll leave you questioning reality. It’s like Inception met a kaleidoscope, and they decided to throw a rave in your brain.
These changes, reported by Forbes, sound like reasonable steps in the right direction:
Starting now, Google will be adding invisible watermarks to images that have been edited on a Pixel using Magic Editor’s Reimagine feature that lets users change any element in an image by issuing text prompts.
The new information will show up in the AI Info section that appears when swiping up on an image in Google Photos.
The feature should make it easier for users to distinguish real photos from AI-powered manipulations, which will be especially useful as Reimagined photos continue to become more realistic.
I really love the way the visual medium (simply black & white dots) enriches & evolves right alongside its subject matter in this ad for ChatGPT, and I hope we get to hear more soon from the creative team behind it.
But how do we go from ironic laughs to actual usefulness? Krea is taking a swing by integrating (I think) the Flux imaging model with the DeepSeek LLM:
Krea Chat is here.
a brand new way of creating images and videos with AI.
It doesn’t yet offer the kind of localized refinements people want (e.g. “show me a dog on the beach,” then “put a hat on the dog” and don’t change anything outside the hat area). Even so, it’s great to be able to create an image, add a photo reference to refine it, and then create a video. Here’s my cute, if not exactly accurate, first attempt. 🙂
Wow—check out this genuinely amazing demo from my old friend (and former Illustrator PM) Mordy:
In this video, I show how you can use Gemini in the free Google AI Studio as your own personal tutor to help you get your work done. After you watch me use it to learn how to take a sketch I made on paper and recreate it as a logo in Illustrator, I promise you’ll be running to do the same.
We propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries.
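If you (like me) want the gist of “region-adaptive memory fusion” in plainer terms, here’s a purely illustrative sketch; the module name and architecture are my guesses, not the authors’ code:

```python
# Purely illustrative sketch of region-adaptive memory fusion (not MatAnyone's
# actual code): a learned per-pixel gate decides how much propagated memory to
# keep vs. how much current-frame detail to use.
import torch
import torch.nn as nn

class RegionAdaptiveFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a 0..1 "trust the memory" weight at every pixel.
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, current: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # High gate values (stable core regions) lean on the previous frame's
        # memory; low values (e.g. near boundaries) lean on current-frame detail.
        w = self.gate(torch.cat([current, memory], dim=1))
        return w * memory + (1.0 - w) * current
```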
Users can enter search terms like “a person skating with a lens flare” to find corresponding clips within their media library. Adobe says the media intelligence AI can automatically recognize “objects, locations, camera angles, and more,” alongside spoken words — provided there’s a transcript attached to the video. The feature doesn’t detect audio or identify specific people, but it can scrub through any metadata attached to video files, which allows it to fetch clips based on shoot dates, locations, and camera types. The media analysis runs on-device, so it doesn’t require an internet connection, and Adobe reiterates that users’ video content isn’t used to train any AI models.
Goodbye, endless scrolling. Hello, AI-powered search panel. With the all-new Media Intelligence in #PremierePro (beta), the content of your clips is automatically recognized, including objects, locations, camera angles & more. Just input your search to find exactly what you need. pic.twitter.com/cOYXDKKaFI
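For the curious, here’s the generic pattern behind this kind of semantic search (embed each clip once, embed the query, rank by similarity). This is just the standard retrieval recipe, not Adobe’s actual implementation:

```python
# Generic sketch of text-to-clip search via embedding similarity (not Adobe's
# implementation, just the common pattern): precompute an embedding per clip,
# embed the query, and rank clips by cosine similarity.
import numpy as np

def search_clips(query_emb: np.ndarray,
                 clip_embs: dict[str, np.ndarray],
                 top_k: int = 5) -> list[tuple[str, float]]:
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cosine(query_emb, emb) for name, emb in clip_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```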
If you’re like me, you may well have spent hours of your youth lovingly recreating the iconic designs of pioneering Santa Cruz artist Jim Phillips. My first deck was a Roskopp 6, and I covered countless notebook covers, a leg cast, my bedroom door, and other surfaces with my humble recreations of his work.
That work is showcased in the documentary “Art And Life,” screening on Thursday in Santa Cruz. I hope to be there, and maybe to see you there as well. (To this day I can’t quite get over the fact that “Santa Cruz” is a real place, and that I can actually visit it. Growing up it was like “Timbuktu” or “Shangri-La.” Funny ol’ world.)
Putting the proverbial chocolate in the peanut butter, those fast-moving kids at Krea have combined custom model training with 3D-guided image generation. Generation is amazingly fast, and the results are some combo of delightful & grotesque (aka “…The JNack Story”). Check it out:
God help you, though, if you import your photo & convert it to 3D for use with the realtime mode. (Who knew I was Cletus the Slack-Jawed Yokel?) pic.twitter.com/nuesUOZ1Db
Here’s another interesting snapshot of progress in our collective speedrun towards generative storytelling. It’s easy to pick on the shortcomings, but can you imagine what you’d say upon seeing this in, say, the olden times of 2023?
The creator writes,
Introducing The Heist – Directed by Jason Zada. Every shot of this film was done via text-to video with Google Veo 2. It took thousands of generations to get the final film, but I am absolutely blown away by the quality, the consistency, and adherence to the original prompt. When I described “gritty NYC in the 80s” it delivered in spades – CONSISTENTLY. While this is still not perfect, it is, hands down, the best video generation model out there, by a long shot. Additionally, it’s important to add that no VFX, no clean up, no color correction has been added. Everything is straight out of Veo 2. Google DeepMind
Here’s a nice write-up covering this paper. It’ll be interesting to dig into the details of how it compares to previous work (see category). [Update: The work comes in part from Adobe Research—I knew those names looked familiar :-)—so here’s hoping we see it in Photoshop & other tools soon.]
this is wild..
this new AI relighting tool can detect the light source in the 3D environment of your image and relight your character, the shadows look so realistic..
Part 9,201 of me never getting over the fact we were working on stuff like this 2 years ago at Adobe (modulo the realtime aspect, which is rad) & couldn’t manage to ship it. It’ll be interesting to see whether the Krea guys (and/or others) pair this kind of interactive-quality rendering with a really high-quality pass, as NVIDIA demonstrated last week using Flux.
3D has arrived in Krea.
this new feature lets you turn images into 3D objects and use them in our Real-time tool.
Powered by advanced AI, TRELLIS enables users to create high-quality, customizable 3D objects effortlessly using simple text or image prompts. This innovation promises to improve 3D design workflows, making it accessible to professionals and beginners alike. Here are some examples:
Alpha channels are crucial for visual effects (VFX), allowing transparent elements like smoke and reflections to blend seamlessly into scenes. We introduce TransPixar, a method to extend pretrained video models for RGBA generation while retaining the original RGB capabilities. […] Our approach effectively generates diverse and consistent RGBA videos, advancing the possibilities for VFX and interactive content creation.
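In case it’s been a while since you’ve thought about why that alpha channel matters, here’s the classic “over” compositing step that an RGBA render unlocks (standard Porter-Duff math, nothing specific to TransPixar):

```python
# Standard "over" compositing: the alpha channel lets a generated element
# (smoke, reflections, etc.) blend onto any background plate.
import numpy as np

def composite_over(fg_rgb: np.ndarray, fg_alpha: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """fg_rgb/bg_rgb: floats in [0, 1], shape (H, W, 3); fg_alpha: shape (H, W, 1)."""
    return fg_alpha * fg_rgb + (1.0 - fg_alpha) * bg_rgb
```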
The world moves on, and now NVIDIA has teamed up with Black Forest Labs to enable 3D-conditioned image generation. Check out this demo (starting around 1:31:48):
For users interested in integrating the FLUX NIM microservice into their workflows, we have collaborated with NVIDIA to launch the NVIDIA AI Blueprint for 3D-guided generative AI. This packaged workflow allows users to guide image generation by laying out a scene in 3D applications like Blender, and using that composition with the FLUX NIM microservice to generate images that adhere to the scene. This integration simplifies image generation control and showcases what’s possible with FLUX models.
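I haven’t tried the Blueprint myself, but the general idea (render a depth or similar pass from your 3D layout, then use it to condition generation) will feel familiar from ControlNet. Here’s a rough open-source analogue using a Stable Diffusion depth ControlNet rather than the FLUX NIM microservice; the filenames are placeholders:

```python
# Rough analogue of 3D-guided generation using a depth ControlNet in diffusers;
# this is not the NVIDIA/Black Forest Labs Blueprint itself.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Depth pass exported from a Blender scene layout (placeholder filename).
depth = Image.open("blender_depth_pass.png")

image = pipe(
    "a cozy cabin interior at golden hour, cinematic lighting",
    image=depth,
    num_inference_steps=30,
).images[0]
image.save("guided_render.png")
```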
The Former Bird App™ is of course awash in mediocre AI-generated video creations, so it’s refreshing to see what a gifted filmmaker (in this case Ruairi Robinson) can do with emerging tools (in this case Google Veo)—even if that’s some slithering horror I’d frankly rather not behold!
Happy (very slightly belated) new year, everyone! Thanks for continuing to join me on this wild, sometimes befuddling, often exhilarating journey into our shared creative future. Some good perspective on the path ahead:
I hope you’ve been able to spend at least some warm times with friends & family this holiday season, and here’s to a great new year of crackling creativity:
I’ve long wanted—and advocated for building—this kind of flexible, spatial way to compose & blend among ideas. Here’s to new ideas for using new tools.
Supporting Non-Linear Exploration
Creative exploration rarely follows a straight line. The graph structure naturally affords exploration by allowing users to diverge at various points, creating new forks of possible alternatives. As more exploration occurs, the graph grows… pic.twitter.com/Yq18Caj94T
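To make that concrete, here’s a tiny sketch of my own (not the researchers’ code) showing why a graph makes forking cheap:

```python
# Minimal exploration graph: every node is a prompt (and eventually a result),
# and forking from any earlier node just adds another child branch.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    prompt: str
    result: Optional[str] = None                  # e.g. path to a generated image
    children: list["Node"] = field(default_factory=list)

    def fork(self, new_prompt: str) -> "Node":
        """Diverge here without disturbing any existing branch."""
        child = Node(prompt=new_prompt)
        self.children.append(child)
        return child

root = Node("a lighthouse at dusk")
stormy = root.fork("a lighthouse at dusk, storm rolling in")
warm = root.fork("a lighthouse at dusk, warm lamplight")      # sibling branch
detail = stormy.fork("closer crop, rain streaking the lens")  # deeper fork
```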
It’s a touch odd to me that Meta is investing here while also shutting down the Meta Spark AR lens platform, but I guess interest in lenses has broadly faded, and AI interpretation of images may prove to be more accessible & scalable. (I wonder what’ll be its Dancing Hot Dog moment.)
I love seeing exactly how Chad Nelson was able to construct a Little Big Planet-inspired game world through some creative prompting & tweening in OpenAI’s new Sora video creation model. Check out his exploratory process:
Director Matan Cohen-Grumi shows off the radical acceleration in VFX-heavy storytelling that’s possible through emerging tools—including Pika’s new Scene Ingredients:
For 10 years, I directed TV commercials, where storytelling was intuitive—casting characters, choosing locations, and directing scenes effortlessly. When I shifted to AI over a year ago, the process felt clunky—hacking together solutions, spending hours generating images, and… pic.twitter.com/pJUamLFgWI
Instead of generating images with long, detailed text prompts, Whisk lets you prompt with images. Simply drag in images, and start creating.
Whisk lets you input an image for the subject, one for the scene, and another for the style. Then, you can remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.
The blog post gives a bit more of a peek behind the scenes & sets some expectations:
Since Whisk extracts only a few key characteristics from your image, it might generate images that differ from your expectations. For example, the generated subject might have a different height, weight, hairstyle or skin tone. We understand these features may be crucial for your project and Whisk may miss the mark, so we let you view and edit the underlying prompts at any time.
In our early testing with artists and creatives, people have been describing Whisk as a new type of creative tool — not a traditional image editor. We built it for rapid visual exploration, not pixel-perfect edits. It’s about exploring ideas in new and creative ways, allowing you to work through dozens of options and download the ones you love.
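Reading between the lines, the flow sounds roughly like “image to short description to composed prompt.” Here’s my back-of-napkin sketch; the describe() call stands in for whatever captioning Google actually uses:

```python
# Back-of-napkin sketch of a Whisk-style flow (not Google's code): reduce each
# input image to a short description, then compose one editable prompt.
def compose_prompt(describe, subject_img, scene_img, style_img) -> str:
    """`describe` is assumed to be an image-captioning call (image -> short text)."""
    return (
        f"{describe(subject_img)}, "
        f"set in {describe(scene_img)}, "
        f"rendered in the style of {describe(style_img)}"
    )

# The composed string is what you'd surface for the user to view and edit
# before handing it to the image model.
```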
And yes, uploading a 19th-century dog illustration to generate a plushie dancing an Irish jig is definitely the most JNack way to squander precious work time (er, do vital market research). 🙂
I’m a near-daily user of Ideogram to create all manner of images—mainly goofy dad jokes to (ostensibly) entertain my family. Now they’re enabling batch creation, making it easy to generate lots of variations (e.g. versions of a logo):
Check out this wild video-to-video demo from Nathan Shipley:
Sora Remix test: Scissors to crane
Prompt was “Close up of a curious crane bird looking around a beautiful nature scene by a pond. The birds head pops into the shot and then out.” pic.twitter.com/CvAkdkmFBQ
Just a taste of the torrent that blows past daily on The Former Bird App:
Rodin 3D: “Rodin 3D AI can create stunning, high-quality 3D models from just text or image inputs.”
Trellis 3D: “Iterative prompting/mesh editing. You can now prompt ‘remove X, add Y, Move Z, etc.’… Allows decoding to different output formats: Radiance Fields, 3D Gaussians, and meshes.”
Blender GPT: “Generating 3D assets has never been easier. Here’s me putting together an entire 3D scene in just over a minute.”
This might be the world’s lowest-key demo of what promises to be truly game-changing technology!
I’ve tried a number of other attempts at unlocking this capability (e.g. Meta.ai (see previous), Playground.com, and what Adobe sneak-peeked at the Firefly launch in early 2023), but so far I’ve found them all more unpredictable & frustrating than useful. Could Gemini now have turned the corner? Only hands-on testing (not yet broadly available) will tell!
If you or folks you know might be a good fit for one or more of these roles, please check ’em out & pass along info. Here’s some context from design director Mike Davidson.
————
These positions are United States only, Redmond-preferred, but we’ll also consider the Bay Area and other locations:
Diffusion models are ushering in what feels like a golden(-hour) age in relighting (see previous). Among the latest offerings is LumiNet:
[6/7] Here are a few more random relightings!
How accurate are these results? That’s very hard to answer at the moment. But our tests on the MIT dataset, our user study, plus qualitative results all point to us being on the right track.
What if your design tool could understand the meaning & importance of words, then help you style them accordingly?
I’m delighted to say that for what I believe is the first time ever, that’s now possible. For the last 40 years of design software, apps have of course provided all kinds of fonts, styles, and tools for manual typesetting. What they’ve lacked is an understanding of what words actually mean, and consequently of how they should be styled in order to map visual emphasis to semantic importance.
In Microsoft Designer, you can now create a new text object, then apply hierarchical styling (primary, secondary, tertiary) based on AI analysis of word importance:
I’d love to hear what you think. You can go to designer.microsoft.com, create a new document, and add some text. Note: The feature hasn’t yet been rolled out to 100% of users, so it may not yet be available to you—but even in that case it’d be great to hear your thoughts on Designer in general.
This feature came about after noticing that text-to-image models are not only learning to spell well (check out some examples I’ve gathered on Pinterest), but can also set text with varied size, position, and styling that’s appropriate to the importance of each word. Check out some of my Ideogram creations (which you can click on & remix using the included prompts):
These results are of course incredible (imagine seeing any of this even three years ago!), but they’re just flat images, not editable text. Our new feature, by contrast, leverages semantic understanding and applies it to normal text objects.
What we’ve shipped now is just the absolute tip of the iceberg: to start we’re simply applying preset values based on word hierarchy, but you can readily imagine richer layouts, smart adaptive styling, and much more. Stay tuned—and let us know what you’d like to see!
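And if you’re curious what “preset values based on word hierarchy” could mean in practice, here’s a deliberately simplified, hypothetical sketch (not our actual implementation):

```python
# Deliberately simplified, hypothetical sketch; not Designer's actual code.
# Assume `rank_words` is an AI call returning (word, tier) pairs, where
# tier 1 = primary, 2 = secondary, 3 = tertiary.
STYLE_PRESETS = {
    1: {"size": 96, "weight": "bold"},      # primary: the words that carry the message
    2: {"size": 48, "weight": "semibold"},  # secondary: supporting words
    3: {"size": 24, "weight": "regular"},   # tertiary: connective tissue
}

def style_text(text: str, rank_words) -> list[dict]:
    return [{"word": word, **STYLE_PRESETS[tier]} for word, tier in rank_words(text)]

# e.g. rank_words("HUGE sale this Saturday only") might return
# [("HUGE", 1), ("sale", 1), ("this", 3), ("Saturday", 2), ("only", 2)]
```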