This new tech from Meta promises to create geometry from video frames. You can try feeding it up to 16 frames via this demo site—or just check out this quick vid:
Huge drop by Meta: ActionMesh turns any video into an animated 3D mesh.
The moment I switched on gravity was the moment everything changed.
Lines I had just drawn started to fall, swing, and collide like they were suddenly alive inside my room. A simple sketch became an object with weight. A doodle turned into something that could react back. It is one of those Vision Pro moments where you catch yourself smiling because it feels playful in a way you do not see coming.
Of course, Old Man Nack™ feels like being a little cautious here: Ten years ago (!) my kids were playing in Adobe’s long-deceased Project Dali…
…and five years ago Google bailed on the excellent Tilt Brush 3D painting app it acquired. ¯\_(ツ)_/¯
And yet, and yet, and yet… I Want To Believe. As I wrote back in 2015,
I always dreamed of giving Photoshop this kind of expressive painting power; hence my long & ultimately fruitless endeavor to incorporate Flash or HTML/WebGL as a layer type. Ah well. It all reminds me of this great old-ish commercial:
So, in the world of AI, and with spatial computing staying a dead parrot (just resting & pining for the fjords!), who knows what dreams may yet come?
Apple’s new 2D-to-3D tech looks like another great step in creating editable representations of the world that capture not just what a camera sensor saw, but what we humans would experience in real life:
Excited to release our first public AI model web app, powered by Apple’s open-source ML SHARP.
Turn a single image into a navigable 3D Gaussian Splat with depth understanding in seconds.
On Friday I got to meet Dr. Fei-Fei Li, “the godmother of AI,” at the launch party for her new company, World Labs (see her launch blog post). We got to chat a bit about a paradox of complexity: that as computer models for perceiving & representing the world grow massively more sophisticated, the interfaces for doing common things—e.g. moving a person in a photo—can get radically simpler & more intentional. I’ll have more to say about this soon.
Meanwhile, here’s her fascinating & wide-ranging conversation with Lenny Rachitsky. I’m always a sucker for a good Platonic allegory-of-the-cave reference. 🙂
From the YouTube summary:
(00:00) Introduction to Dr. Fei-Fei Li
(05:31) The evolution of AI
(09:37) The birth of ImageNet
(17:25) The rise of deep learning
(23:53) The future of AI and AGI
(29:51) Introduction to world models
(40:45) The bitter lesson in AI and robotics
(48:02) Introducing Marble, a revolutionary product
(51:00) Applications and use cases of Marble
(01:01:01) The founder’s journey and insights
(01:10:05) Human-centered AI at Stanford
(01:14:24) The role of AI in various professions
(01:18:16) Conclusion and final thoughts
And here’s Gemini’s solid summary of their discussion of world models:
The Motivation: While LLMs are inspiring, they lack the spatial intelligence and world understanding that humans use daily. This ability to reason about the physical world—understanding objects, movement, and situational awareness—is essential for tasks like first response or even just tidying a kitchen (32:23).
The Concept: A world model is described as the lynchpin connecting visual intelligence, robotics, and other forms of intelligence beyond language (33:32). It is a foundational model that allows an agent (human or robot) to:
Create worlds in their mind’s eye through prompting (35:01).
Interact with that world by browsing, walking, picking up objects, or changing things (35:12).
Reason within the world, such as a robot planning its path (35:31).
The Application: World models are considered the key missing piece for building effective embodied AI, especially robots (36:08). Beyond robotics, the technology is expected to unlock major advances in scientific discovery, like deducing 3D structures from 2D data (37:48), as well as games and design (37:31).
The Product: Dr. Li co-founded World Labs to pursue this mission (34:25). Their first product, Marble, is a generative model that outputs genuinely 3D worlds which users can navigate and explore (49:11). Current use cases include virtual production/VFX, game development, and creating synthetic data for robotic simulation (53:05).
My friend Bilawal got to sit down with VFX pioneer John Gaeta to discuss “A new language of perception,” Bullet Time, groundbreaking photogrammetry, the coming Big Bang/golden age of storytelling, chasing “a feeling of limitlessness,” and much more.
In this conversation:
— How Matrix VFX techniques became the prototypes for AI filmmaking tools, game engines, and AR/VR systems
— How The Matrix team sourced PhD thesis films from university labs to invent new 3D capture techniques
— Why “universal capture” from Matrix 2 & 3 was the precursor to modern volumetric video and 3D avatars
— The Matrix 4 experiments with Unreal Engine that almost launched a transmedia universe based on The Animatrix
— Why dystopian sci-fi becomes infrastructure (and what that means for AI safety)
— Where John is building next: Escape.art and the future of interactive storytelling
Improvements to imaging continue at a breakneck pace, as engines evolve from “simple” text-to-image (which we considered miraculous just three years ago—and which I still kinda do, TBH) to understanding time & space.
Now Emu (see project page, code) can create entire multi-page/image narratives, turn 2D images into 3D worlds, and more. Check it out:
Turntable is now available in the Adobe #Illustrator Public Beta Build 29.9.14!!!
A feature that lets you “turn” your 2D artwork to view it from different angles. With just a few steps, you can generate multiple views without redrawing from scratch.
I love seeing the Magnific team’s continued rapid march in delivering identity-preserving reskinning:
IT’S FINALLY HERE!
Mystic Structure Reference!
Generate any image controlling structural integrity.
Infinite use cases! Films, 3D, video games, art, interiors, architecture…
From cartoon to real, the opposite, or ANYTHING in between!
This example makes me wish my boys were, just for a moment, 10 years younger and still up for this kind of father/son play. 🙂
Storyboarding? No clue! But with some toy blocks, my daughter’s wild imagination, and a little help from Magnific Structure Reference, we built a castle attacked by dragons. Her idea coming to life powered up with AI magic. Just a normal Saturday Morning. Behold, my daughter’s… pic.twitter.com/52tDZokmIT
The capturing work was led by Harry Nelder and Amity Studio. Nelder used his 16-camera rig to capture the recent winners. The reconstruction software was a combination of a cloud-based platform created by Nelder (expected to be released later this year) and Postshot. Nelder further utilized the Radiance Field method known as Gaussian Splatting for the reconstruction. A compilation video of all the captures, recently posted by BAFTA, was edited by Amity Studio.
Putting the proverbial chocolate in the peanut butter, those fast-moving kids at Krea have combined custom model training with 3D-guided image generation. Generation is amazingly fast, and the results are some combo of delightful & grotesque (aka “…The JNack Story”). Check it out:
God help you, though, if you import your photo & convert it to 3D for use with the realtime mode. (Who knew I was Cletus the Slack-Jawed Yokel?) pic.twitter.com/nuesUOZ1Db
Part 9,201 of me never getting over the fact we were working on stuff like this 2 years ago at Adobe (modulo the realtime aspect, which is rad) & couldn’t manage to ship it. It’ll be interesting to see whether the Krea guys (and/or others) pair this kind of interactive-quality rendering with a really high-quality pass, as NVIDIA demonstrated last week using Flux.
3D has arrived in Krea.
this new feature lets you turn images into 3D objects and use them in our Real-time tool.
Powered by advanced AI, TRELLIS enables users to create high-quality, customizable 3D objects effortlessly using simple text or image prompts. This innovation promises to improve 3D design workflows, making it accessible to professionals and beginners alike. Here are some examples:
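And if you’d rather poke at TRELLIS directly, the gist of driving it from Python looks roughly like the sketch below. This is just a sketch from my memory of the open-source repo’s README, so treat the pipeline class, checkpoint ID, and output keys as assumptions and verify them against the current code.

```python
# Rough sketch of image-to-3D with TRELLIS. Class names, checkpoint ID, and
# output keys are my recollection of the microsoft/TRELLIS README, not
# verified here--check the repo before running.
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline

# Load the pretrained image-conditioned pipeline and move it to the GPU.
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

# Generate 3D structure from a single input image (example filename is mine).
image = Image.open("moon_buggy.png")
outputs = pipeline.run(image, seed=1)

# As the tweet above notes, the same generation can be decoded into several
# representations (outputs may be lists of results):
gaussians = outputs["gaussian"]             # 3D Gaussian splats
radiance_field = outputs["radiance_field"]  # radiance field
mesh = outputs["mesh"]                      # triangle mesh
```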
The world moves on, and now NVIDIA has teamed up with Black Forest Labs to enable 3D-conditioned image generation. Check out this demo (starting around 1:31:48):
For users interested in integrating the FLUX NIM microservice into their workflows, we have collaborated with NVIDIA to launch the NVIDIA AI Blueprint for 3D-guided generative AI. This packaged workflow allows users to guide image generation by laying out a scene in 3D applications like Blender, and using that composition with the FLUX NIM microservice to generate images that adhere to the scene. This integration simplifies image generation control and showcases what’s possible with FLUX models.
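As a very rough mental model of that flow: you render the Blender layout to a conditioning image (e.g. a depth pass), then hand it to the FLUX microservice along with a text prompt. The sketch below is purely illustrative; the endpoint URL, payload fields, and auth are hypothetical stand-ins I’ve invented, not NVIDIA’s actual Blueprint API, so consult their docs for the real interface.

```python
# Hypothetical sketch of 3D-guided generation: send a depth render of a
# Blender scene plus a prompt to an image-generation microservice.
# The endpoint, JSON fields, and headers below are placeholders for
# illustration only; see NVIDIA's AI Blueprint docs for the real API.
import base64
import requests

NIM_URL = "http://localhost:8000/v1/generate"  # placeholder endpoint
API_KEY = "YOUR_KEY_HERE"                      # placeholder credential

# Depth (or other structure) pass rendered from the 3D scene layout.
with open("scene_depth.png", "rb") as f:
    depth_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "prompt": "a cozy cabin interior at golden hour, photoreal",
    "conditioning_image": depth_b64,  # guides composition from the 3D layout
    "steps": 30,
}

resp = requests.post(
    NIM_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
resp.raise_for_status()

# Assume the service returns a base64-encoded image in its JSON response.
with open("generated.png", "wb") as out:
    out.write(base64.b64decode(resp.json()["image"]))
```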
Just a taste of the torrent that blows past daily on The Former Bird App:
Rodin 3D: “Rodin 3D AI can create stunning, high-quality 3D models from just text or image inputs.”
Trellis 3D: “Iterative prompting/mesh editing. You can now prompt ‘remove X, add Y, Move Z, etc.’… Allows decoding to different output formats: Radiance Fields, 3D Gaussians, and meshes.”
Blender GPT: “Generating 3D assets has never been easier. Here’s me putting together an entire 3D scene in just over a minute.”
Adobe’s new generative 3D/vector tech is a real head-turner. I’m impressed that the results look like clean, handmade paths, with colors that match the original—and not like automatic tracing of crummy text-to-3D output. I can’t wait to take it for a… oh man, don’t say it don’t say it… spin.
Amazing, and literally immersive, work by artists at The Weather Channel. Yikes—stay safe out there, everybody.
The 3D artists at the weather channel deserve a raise for this insane visual
Now watch this, and then realize forecasts are now predicting up to 15 ft of storm surge in certain areas on the western coast of Florida pic.twitter.com/HHrCVWNgpg
I’m old enough to remember 2020, when we sincerely (?) thought that everyone would be excited to put 3D-scanned virtual Olympians onto their coffee tables… or something. (Hey, it was fun while it lasted! And it temporarily kept a bunch of graphics nerds from having to slink back to the sweatshop grind of video game development.)
Anyway, here’s a look back to what Google was doing around augmented reality and the 2020 (’21) Olympics:
I swear I spent half of last summer staring at tiny 3D Naomi Osaka volleying shots on my desktop. I remain jealous of my former teammates who got to work with these athletes (and before them, folks like Donald Glover as Childish Gambino), even though doing so meant dealing with a million Covid safety protocols. Here’s a quick look at how they captured folks flexing & flying through space:
Back when we launched Firefly (alllll the way back in March 2023), we hinted at the potential of combining 3D geometry with diffusion-based rendering, and I tweeted out a very early sneak peek:
Did you see this mind blowing Adobe ControlNet + 3D Composer Adobe is going to launch! It will really boost creatives’ workflow. Video through @jnack
A year+ later, I’m no longer working to integrate the Babylon 3D engine into Adobe tools—and instead I’m working directly with the Babylon team at Microsoft (!). Meanwhile I like seeing how my old teammates are continuing to explore integrations between 3D (in this case, Project Neo) and generative imaging. Here’s one quick flow:
Here’s a quick exploration from the always-interesting Martin Nebelong:
A very quick first test of Adobe Project Neo.. didn’t realize this was out in open beta by now. Very cool!
I had to try to sculpt a burger and take that through Krea. You know, the usual thing!
There’s some very nice UX in NEO and the list-based SDF editing is awesome.. very… pic.twitter.com/e3ldyPfEDw
And here’s a fun little Neo->Firefly->AI video interpolation test from Kris Kashtanova:
Tutorial: Direct your cartoons with Project Neo + Firefly + ToonCrafter
1) Model your characters in Project Neo
2) Generate first and last frame with Firefly + Structure Reference
3) Use ToonCrafter to make a video interpolation between the first and the last frame
Being able to declare what you want, instead of having to painstakingly set up parameters for materials, lighting, etc., may prove to be an incredible unlock for visual expressivity, particularly around the generally intimidating realm of 3D. Check out what tyFlow is bringing to the table:
You can see a bit more about how it works in this vid…
Pretty cool! I’d love to see Illustrator support model import & rendering of this sort, such that models could be re-posed in one’s .Ai doc, but this still looks like a solid approach:
3D meets 2D!
With the Expressive or Pixel Art styles in Project Neo, you can export your designs as SVGs to edit in Illustrator or use on your websites. pic.twitter.com/vOsjb2S2Un
Man, what I wouldn’t have given years ago, when we were putting 3D support into Photoshop, for the ability to compute meshes from objects (e.g. a photo of a soda can or a shirt) in order to facilitate object placement like this.
I still can’t believe I was allowed in the building with these giant throbbing brains. 🙂
Create a 3D model from a single image, set of images or a text prompt in < 1 minute
This new AI paper called CAT3D shows us that it’ll keep getting easier to produce 3D models from 2D images — whether it’s a sparser real world 3D scan (a few photos instead of hundreds) or… pic.twitter.com/sOsOBsjC8Q
Man, who knew that posting the tweet below would get me absolutely dragged by AI haters (“Worst. Dad. Ever.”) who briefly turned me into the Bean Dad of AI art? I should say more about that eye-opening experience, but for now, enjoy (unlike apparently thousands of others!) this innocuous mixing of AI & kid art:
This app looks like a delightful little creation tool that’s just meant for doodling, but I’d love to see this kind of physical creation paired with the world of generative AI rendering. I’m reminded of how “Little Big Planet” years ago made me yearn for Photoshop tools that felt like Sackboy’s particle-emitting jetpack. Someday, maybe…?
A kind of 3D brush
Tiny Glade is going to be just a relaxing castle doodling game. No more, no less. More than enough!
The game seems amazing. But oh my god… Think about what could be done by further abstracting the idea of that “3D brush.” pic.twitter.com/kguZCq5jrb
So, @StabilityAI has this new experimental imageTo3D model, and I just painted a moon buggy in SageBrush, dropped it into their Huggingface space, converted it in Reality Converter, and air dropped it onto the moon – all on #AppleVisionPro pic.twitter.com/pj3TTcy5zt
Heh—this fun little animation makes me think back to how I considered changing my three-word Google bio from “Teaching Google Photoshop” (i.e. getting robots to see & create like humans, making beautiful things based on your life & interests) to “Wow! Nobody Cares.” :-p Here’s to less of that in 2024.
F1 racing lover John LePore (whose VFX work you’ve seen in Iron Man 2 and myriad other productions over the years) has created the first demo for Apple Vision Pro that makes me say, “Okay, dang, that looks truly useful & compelling.” Check out his quick demo & behind-the-scenes narration:
Apple Vision Pro + Formula 1 = Killer App (?)
a story about: -design & innovation -racing royalty -property theft and more! pic.twitter.com/6MbLKEDqOB
Hey gang—here’s to having a great 2024 of making the world more beautiful & fun. Here’s a little 3D creation (with processing courtesy of Luma Labs) made from some New Year’s Eve drone footage I captured at Gaviota State Beach. (If it’s not loading for some reason, you can see a video version in this tweet).
Oh look, I’m George Clooney! Kinda. You can be, too. FAL AI promises “AI inference faster than you can type.”
“100ms image generation at 1024×1024. Announcing Segmind-Vega and Segmind-VegaRT, the fastest and smallest, open source models for image generation at the highest resolution.”
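If you want to kick the tires on those Segmind models yourself, Vega is (to my recollection) a distilled SDXL variant that loads via Hugging Face diffusers. Here’s a minimal sketch; the model ID and pipeline class are assumptions from my memory of the model card rather than anything verified here.

```python
# Minimal text-to-image sketch with Segmind-Vega via diffusers.
# Model ID and pipeline class are assumptions based on my memory of the
# Hugging Face model card (Vega is a distilled SDXL variant); verify there.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/Segmind-Vega",
    torch_dtype=torch.float16,
).to("cuda")

# Generate a 1024x1024 image (the SDXL-family default resolution).
image = pipe(
    "an astronaut riding a horse on the moon, detailed, cinematic lighting",
    num_inference_steps=25,
    guidance_scale=7.0,
).images[0]
image.save("vega_sample.png")
```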
Krea has announced their open beta, “free for everyone.”
Instagram has enabled image generation inside chat (pretty “meh,” in my experience so far), and in stories creation, “It allows you to replace a background of an image into whatever AI generated image you’d like.”
“Did you know that you can train an AI Art model and get paid every time someone uses it? That’s Generaitiv’s Model Royalties System for you.”
Here’s a great look at how the scrappy team behind Luma.ai has helped enable beautiful volumetric captures of Phoenix Suns players soaring through the air:
Go behind the scenes of the innovative collaboration between Profectum Media and the Phoenix Suns to discover how we overcame technological and creative challenges to produce the first 3D bullet time neural radiance field (NeRF) effect in a major sports NBA arena video. This involved not just custom-building a 48 GoPro multi-cam volumetric rig but also integrating advanced AI tools from Luma AI to capture athletes in stunning, frozen-in-time 3D visual sequences. This venture is more than just a glimpse behind the scenes – it’s a peek into the evolving world of sports entertainment and the future of spatial capture.
Man, I’m inspired—and TBH a little jealous—seeing 14yo creator Preston Mutanga creating amazing 3D animations, as he’s apparently been doing for fully half his life. I think you’ll enjoy the short talk he gave covering his passions:
The presentation will take the audience on a journey, a journey across the Spider-Verse where a self-taught, young, talented 14-year-old kid used Blender, to create high-quality LEGO animations of movie trailers. Through the use of social media, this young artist’s passion and skill caught the attention of Hollywood producers, leading to a life-changing invitation to animate in a new Hollywood movie.
That’s the promise of Adobe’s Project Neo—which you can sign up to test & use now! Check out the awesome sneak peek they presented at MAX:
Incorporating 3D elements into 2D designs (infographics, posters, logos or even websites) can be difficult to master, and often requires designers to learn new workflows or technical skills.
Project Neo enables designers to create 2D content by using 3D shapes without having to learn traditional 3D creation tools and methods. This technology leverages the best of 3D principles so designers can create 2D shapes with one, two or three-point perspectives easily and quickly. Designers using this technology are also able to collaborate with their stakeholders and make edits to mockups at the vector level so they can quickly make changes to projects.
Roughly 1,000 years ago (i.e. this past April!), I gave an early sneak peek at the 3D-to-image work we’ve been doing around Firefly. Now at MAX, my teammate Yi Zhou has demonstrated some additional ways we could put the core tech to work—by adding posable humans to the scene.
Project Poseable makes it easy for anyone to quickly design 3D prototypes and storyboards in minutes with generative AI.
Instead of having to spend time editing the details of a scene — the background, different angles and poses of individual characters, or the way the character interacts with surrounding objects in the scene — users can tap into AI-based character posing models and use image generation models to easily render 3D character scenes.
I’m so pleased & even proud (having at least offered my encouragement to him over the years) to see my buddy Bilawal spreading his wings and spreading the good word about AI-powered creativity.
Check out his quick thoughts on “Channel-surfing realities layered on top of the real world,” “3D screenshots for the real world,” and more:
Favorite quote 😉:
“All they need to do is have a creative vision, and a Nack for working in concert with these AI models”—beautifully said, my friend! 🙏😜. pic.twitter.com/f6oUNSQXul