Google Research has devised “Alchemist,” a new way to swap object textures:
And people keep doing wonderful things with realtime image synthesis:
Happy mixing of decoder embeddings in real-time! Base prompt is ‘photo of a room, sofa, decor’ and the two knobs are ‘industrial’ and ‘rococo’. If you are wondering what is running there in the background… pic.twitter.com/5svyDy5C4e
Always pushing the limits of expressive tech, Martin Nebelong has paired Photoshop painting with AI rendering, followed by Runway’s new image-to-video model. “Days of Miracles & Wonder,” as always:
Painting with AI in Photoshop – and doing magic with Runway’s new Gen 3 image-to-video. This stuff is insane.. wow.
Our tools and workflows are at the brink of an incredible renaissance.
In the history books, this clip will be referred to as “Owl and cake” 😛
Man, I’m old enough to remember rotoscoping video by hand—a process that quickly made me want to jump right out a window. Years later, when we were working on realtime video segmentation at Google, I was so proud to show the tech to a bunch of high school design students—only to have them shrug and treat it as completely normal.
Ah, but so it goes: “One of history’s few iron laws is that luxuries tend to become necessities and to spawn new obligations. Once people get used to a certain luxury, they take it for granted.” — Yuval Noah Harari
In any case, Meta has just released what looks like a great update to their excellent—and open-source—Segment Anything Model. Check it out:
Introducing Meta Segment Anything Model 2 (SAM 2) — the first unified model for real-time, promptable object segmentation in images & videos.
SAM 2 is available today under Apache 2.0 so that anyone can use it to build their own experiences
You can play with the demo and learn more on the site:
Following up on the success of the Meta Segment Anything Model (SAM) for images, we’re releasing SAM 2, a unified model for real-time promptable object segmentation in images and videos that achieves state-of-the-art performance.
In keeping with our approach to open science, we’re sharing the code and model weights with a permissive Apache 2.0 license.
We’re also sharing the SA-V dataset, which includes approximately 51,000 real-world videos and more than 600,000 masklets (spatio-temporal masks).
SAM 2 can segment any object in any video or image—even for objects and visual domains it has not seen previously, enabling a diverse range of use cases without custom adaptation.
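For the code-curious: the release ships a predictor interface much like the original SAM’s. Here’s a minimal single-image sketch; the module paths, config name, and checkpoint filename are taken from the public repo and may differ in whatever build you grab, so treat them as assumptions rather than gospel:

```python
# Minimal sketch of single-image segmentation with SAM 2 (Apache 2.0 release).
# Config/checkpoint names below are assumptions from the public repo; the model
# loads to GPU by default.
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = np.array(Image.open("owl_and_cake.jpg").convert("RGB"))

with torch.inference_mode():
    predictor.set_image(image)
    # Prompt with a single foreground click at (x, y); label 1 = foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[640, 360]]),
        point_labels=np.array([1]),
    )

print(masks.shape, scores)  # candidate masks plus confidence scores
```

For video, the repo also includes a video predictor that propagates masks across frames, which is the part that makes the “promptable object segmentation in videos” claim real.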
Back when we launched Firefly (alllll the way back in March 2023), we hinted at the potential of combining 3D geometry with diffusion-based rendering, and I tweeted out a very early sneak peek:
Did you see this mind blowing Adobe ControlNet + 3D Composer Adobe is going to launch! It will really boost creatives’ workflow. Video through @jnack
A year+ later, I’m no longer working to integrate the Babylon 3D engine into Adobe tools—and instead I’m working directly with the Babylon team at Microsoft (!). Meanwhile I like seeing how my old teammates are continuing to explore integrations between 3D (in this case, Project Neo) and generative imaging. Here’s one quick flow:
Here’s a quick exploration from the always-interesting Martin Nebelong:
A very quick first test of Adobe Project Neo.. didn’t realize this was out in open beta by now. Very cool!
I had to try to sculpt a burger and take that through Krea. You know, the usual thing!
There’s some very nice UX in NEO and the list-based SDF editing is awesome.. very… pic.twitter.com/e3ldyPfEDw
And here’s a fun little Neo->Firefly->AI video interpolation test from Kris Kashtanova:
Tutorial: Direct your cartoons with Project Neo + Firefly + ToonCrafter
1) Model your characters in Project Neo
2) Generate first and last frame with Firefly + Structure Reference
3) Use ToonCrafter to make a video interpolation between the first and the last frame
As I’ve probably mentioned already, when I first surveyed Adobe customers a couple of years ago (right after DALL•E & Midjourney first shipped), it was clear that they wanted selective synthesis—adding things to compositions, and especially removing them—much more strongly than whole-image synthesis.
Thus it’s no surprise that Generative Fill in Photoshop has so clearly delivered Firefly’s strongest product-market fit, and I’m excited to see Illustrator following the same path—but for vectors:
Generative Shape Fill will help you improve your workflow, including:
Create detailed, scalable vectors: After you draw or select your shape, silhouette, or outline in your artboard, use a text prompt to ideate on vector options to fill it.
Style Reference for brand consistency: Create a wide variety of options that match the color, style, and shape of your artwork to ensure a consistent look and feel.
Add effects to your creations: Enhance your vector options further by adding styles like 3D, geometric, pixel art, and more.
They’re also adding the ability to create vector patterns simply via prompting:
Soon after Generative Fill shipped last year, people discovered that using a semi-opaque selection could help blend results into an environment (e.g. putting fish under water). The new Selection Brush in Photoshop takes functionality that’s been around for 30+ years (via Quick Select mode) and brings it more to the surface, which in turn makes it easier to control GenFill behavior:
For now the functionality is limited to upscaling, but I have to think that they’ll soon turn on the super cool relighting & restyling tech that enables fun like transforming my dog using just different prompts (click to see larger):
I wish Adobe hadn’t given up (at least for the last couple of years and foreseeable future) on the Smart Portrait tech we were developing. It’s been stuck at 1.0 since 2020 and could be so much better. Maybe someday!
In the meantime, check out LivePortrait:
Some impressive early results coming out of LivePortrait, a new model for face animation.
Upload a photo + a reference video and combine them!
Being able to declare what you want, instead of having to painstakingly set up parameters for materials, lighting, etc., may prove to be an incredible unlock for visual expressivity, particularly around the generally intimidating realm of 3D. Check out what tyFlow is bringing to the table:
You can see a bit more about how it works in this vid…
Years ago Adobe experimented with a real-time prototype of Photoshop’s Landscape Mixer Neural Filter, and the resulting responsiveness made one feel like a deity—fluidly changing summer to winter & back again. I was reminded of using Google Earth VR, where grabbing & dragging the sun to change the time of day delivered a similar feeling of power.
Nothing came of it, but in the time since then, realtime diffusion rendering (see amazing examples from Krea & others) and image-to-image restyling have opened some amazing new doors. I wish I could attach filters to any layer in Photoshop (text, 3D, shape, image) and have it reinterpreted like this:
New way to navigate latent space. It preserves the underlying image structure and feels a bit like a powerful style-transfer that can be applied to anything. The trick is to… pic.twitter.com/orFBysBpkT
Pretty cool! I’d love to see Illustrator support model import & rendering of this sort, such that models could be re-posed in one’s .Ai doc, but this still looks like a solid approach:
3D meets 2D!
With the Expressive or Pixel Art styles in Project Neo, you can export your designs as SVGs to edit in Illustrator or use on your websites. pic.twitter.com/vOsjb2S2Un
Heh: András István Arató—aka Hide The Pain Harold, the wincing king of stock photography—seems like a genuinely good dude. Here he narrates his story in brief:
Using Magic Insert we are, for the first time, able to drag-and-drop a subject from an image with an arbitrary style onto another target image with a vastly different style and achieve a style-aware and realistic insertion of the subject into the target image.
Here is a demo that you can access on the desktop version of the website. We’re excited by the options Magic Insert opens up for artistic creation, content creation and for the overall expansion of GenAI controllability. pic.twitter.com/HhbfrEfXZH
Of course, much of the challenge here—where art meets science—is around identity preservation: to what extent can & should the output resemble the input? Here it’s subject to some interpretation. In other applications one wants an exact copy of a given person or thing, but optionally transformed in just certain ways (e.g. pose & lighting).
When we launched Firefly last year, we showed off some of Adobe’s then-new ObjectStitch tech for making realistic composites. It didn’t ship while I was there due to challenges around identity preservation. As far as I know those challenges remain only partially solved, so I’ll continue holding out hope—as I have for probably 30 years now!—for future tech breakthroughs that get us all the way across that line.
Check out this striking application of AI-powered relighting: a single rendering is deeply & realistically transformed via one AI tool, and the results are then animated & extended by another.
Wandering alone around the campus of my alma mater this past weekend had me in a deeply wistful, reflective mood. I reached out across time & space to some long-separated friends, and I thought you might enjoy this beautiful tune that’s been in my head the whole while.
Man, what I wouldn’t have given years ago, when we were putting 3D support into Photoshop, for the ability to compute meshes from objects (e.g. a photo of a soda can or a shirt) in order to facilitate object placement like this.
Here’s a micro tutorial on how to create similar effects:
Here’s how to morph memes using Dream Machine’s new Keyframe feature. Simply upload two of your favorite memes, write a prompt that describes how you’d like to transition between them, and we’ll dream up the rest. https://t.co/G3HUEBEAcO #LumaDreamMachine pic.twitter.com/yNaRhERutn
This Lego machine can easily create a beautiful pixelart of anything you want! It is programmed in Python, and, with help of OpenAI’s DALL-E 3, it can make anything!
Sten of the YouTube channel Creative Mindstorms demonstrates his robot printer, Pixelbot 3000, built from LEGO bricks, which produces pixel art with the help of OpenAI’s DALL-E 3. Using a 32 x 32 plate and numerous round LEGO bricks, the printer automatically pins the pieces onto their designated positions until it forms a pixel-art version of the image. The whole thing is programmed in Python, with DALL-E 3 generating the source artwork.
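I haven’t seen Sten’s actual source, but the software half of the pipeline is easy to picture: ask DALL-E 3 for an image, then squash it down to a 32 x 32 grid of brick-friendly colors. A hypothetical sketch (the prompt, palette size, and filenames are all mine, purely illustrative):

```python
# Hypothetical sketch of the software half of a DALL-E-3-to-pixel-art pipeline;
# not Sten's actual code. Requires: pip install openai pillow requests
import io
import requests
from openai import OpenAI
from PIL import Image

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# 1) Generate source artwork with DALL-E 3.
result = client.images.generate(
    model="dall-e-3",
    prompt="simple flat pixel-art style portrait of a red fox, centered, plain background",
    size="1024x1024",
    n=1,
)
image_bytes = requests.get(result.data[0].url, timeout=60).content
art = Image.open(io.BytesIO(image_bytes)).convert("RGB")

# 2) Downsample to the 32 x 32 build plate and snap to a small brick palette.
grid = art.resize((32, 32), Image.LANCZOS)
palette = grid.quantize(colors=16)  # ~16 colors seems plausible for 1x1 round bricks

# 3) Emit a per-stud color index the robot could step through row by row.
pixels = list(palette.getdata())
for row in range(32):
    print(pixels[row * 32:(row + 1) * 32])
```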
Fun! You can grab the free browser extension here.
* right-click-remix any image w/ tons of amazing AI presets: Style Transfer, Controlnets…
* build & remix your own workflows with full comfyUI support
* local + cloud!
besides some really great default presets using all sorts of amazing ComfyUI workflows (which you can inspect and remix on http://glif.app), the extension will now also pull your own compatible glifs into it!
The tech, a demo of which you can try here, promises “‘imitative editing,’ allowing users to edit images using reference images without the need for detailed text descriptions.”
Good grief, the pace of change makes “AI vertigo” such a real thing. Just last week we were seeing “skeleton underwater” memes (a skeleton submerged in a rusty chair, courtesy of Runway). :-p I’m especially excited to see how it handles text (which remains a struggle for text-to-image models including DALL•E):
I’m really digging the simple joy in this little experiment, powered by Imagen:
1 Prompt. 26 letters. Any kind of alphabet you can imagine. #GenType empowers you to craft, refine, and download one-of-a-kind AI generated type, building from A-Z with just your imagination.
It is a highly scalable and efficient transformer model trained directly on videos, making it capable of generating physically accurate, consistent and eventful shots. Dream Machine is our first step towards building a universal imagination engine and it is available to everyone now!
There’s been a firestorm this week about the terms of service that my old home team put forward, based (as such things have been since time immemorial) on a lot of misunderstanding & fear. Fortunately the company has been working to clarify what’s really going on.
Sorry for delay on this. Info here, including what actually changed in the TOS (not much), as well as what Adobe can / cannot do with your content. https://t.co/LZFkDXrmep
My former Google teammates have been cranking out some amazing AI personalization tech, with HyperDreamBooth far surpassing the performance of their original DreamBooth (y’know, from 2022—such a simpler ancient time!). Here they offer a short & pretty accessible overview of how it works:
Using only a single input image, HyperDreamBooth is able to personalize a text-to-image diffusion model 25x faster than DreamBooth, by using (1) a HyperNetwork to generate an initial prediction of a subset of network weights that are then (2) refined using fast finetuning for high fidelity to subject detail. Our method both conserves model integrity and style diversity while closely approximating the subject’s essence and details.
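To make that two-stage description a bit more concrete, here’s a toy PyTorch sketch of the general idea (emphatically not the paper’s architecture, and all names here are my own): a hypernetwork predicts low-rank weight deltas for a frozen base model from a single subject embedding, and those deltas then get a few steps of fast finetuning.

```python
# Toy illustration of the two-stage idea described above, NOT HyperDreamBooth's
# actual architecture: (1) a hypernetwork maps a subject embedding to low-rank
# weight deltas for a frozen base model, (2) the deltas are briefly finetuned.
import torch
import torch.nn as nn

DIM, RANK = 256, 4

class FrozenBase(nn.Module):
    """Stand-in for a pretrained diffusion backbone layer (weights frozen)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(DIM, DIM)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x, delta_a, delta_b):
        # Base output plus a low-rank personalized correction (LoRA-style).
        return self.proj(x) + x @ delta_a @ delta_b

class HyperNet(nn.Module):
    """Predicts the low-rank deltas from a single subject embedding."""
    def __init__(self):
        super().__init__()
        self.to_a = nn.Linear(DIM, DIM * RANK)
        self.to_b = nn.Linear(DIM, RANK * DIM)

    def forward(self, subject_emb):
        a = self.to_a(subject_emb).view(DIM, RANK)
        b = self.to_b(subject_emb).view(RANK, DIM)
        return a, b

base, hyper = FrozenBase(), HyperNet()
subject_emb = torch.randn(DIM)   # stand-in for an image encoder's output
x = torch.randn(8, DIM)          # stand-in for training inputs
target = torch.randn(8, DIM)     # stand-in for the training objective

# Stage 1: one hypernetwork forward pass gives an initial personalization.
a0, b0 = hyper(subject_emb)

# Stage 2: fast finetuning -- refine only the predicted deltas for a few steps.
a = nn.Parameter(a0.detach().clone())
b = nn.Parameter(b0.detach().clone())
opt = torch.optim.Adam([a, b], lr=1e-3)
for _ in range(25):
    loss = nn.functional.mse_loss(base(x, a, b), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"toy reconstruction loss after fast finetune: {loss.item():.4f}")
```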
“Maybe the real treasure was the friends we made along the way” is, generally, ironic shorthand for “worthless treasure”—but I’ve also found it to be true. That’s particularly the case for the time I spent at Google, where I met excellent folks like Bilawal Sidhu (a fellow PM veteran of the augmented reality group). I’m delighted that he’s now crushing it as the new host of the TED AI Show podcast.
Check out their episodes so far, including an interview with former OpenAI board member Helen Toner, who discusses the circumstances of firing Sam Altman last year before losing her board position.
I particularly enjoyed this Movie Mindset podcast episode, which in part plays as a fantastic tribute to the power of After Effects:
We sit down with Mike Cheslik, the director of the new(ish) silent comedy action farce Hundreds of Beavers. We discuss his Wisconsin influences, ultra-DIY approach to filmmaking, making your film exactly as stupid as it needs to be, and the inherent humor of watching a guy in a mascot costume get wrecked on camera.
(And no, I’m not just talking oppressive humidity—though after living in California so long, that was quite a handful.) My 14yo MiniMe Henry & I had a ball over the weekend on our first trip to Louisiana, chasing the Empress steam engine as it made its way from Canada down to Mexico City. I’ll try to share a proper photo album soon, but in the meantime here are some great shots from Henry (enhanced with the now-indispensable Generative Fill), plus a bit of fun drone footage:
“Combine your ink strokes with text prompts to generate new images in nearly real time with Cocreator,” Microsoft explains. “As you iterate, so does the artwork, helping you more easily refine, edit and evolve your ideas. Powerful diffusion-based algorithms optimize for the highest quality output over minimum steps to make it feel like you are creating alongside AI.”
The Designer team at Microsoft is working to enable AI-powered creation & editing experiences across a wide range of tools, and I’m delighted that my new teammates are rolling out a new set of integrations. Check out how you can now create images right inside Microsoft Teams:
I really enjoyed this TED talk from Fei-Fei Li on spatial computing & the possible dawning of a Cambrian explosion on how we—and our creations—perceive the world.
In the beginning of the universe, all was darkness — until the first organisms developed sight, which ushered in an explosion of life, learning and progress. AI pioneer Fei-Fei Li says a similar moment is about to happen for computers and robots. She shows how machines are gaining “spatial intelligence” — the ability to process visual data, make predictions and act upon those predictions — and shares how this could enable AI to interact with humans in the real world.
When I surveyed thousands of Photoshop customers waaaaaay back in the Before Times—y’know, summer 2022—I was struck by the fact that beyond wanting to insert things into images, and far beyond wanting to create images from scratch, just about everyone wanted better ways to remove things.
Happily, that capability has now come to Lightroom. It’s a deceptively simple change that, I believe, required a lot of work to evolve Lr’s non-destructive editing pipeline. Traditionally all edits were expressed as simple parameters, and then masks got added—but as far as I know, this is the first time Lr has ventured into transforming pixels in an additive way (that is, modify one bunch, then make subsequent edits that depend on the previous edits). That’s a big deal, and a big step forward for the team.
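Here’s my own toy mental model of why that’s hard (not Adobe’s actual pipeline, and every name below is hypothetical): once a generative Remove drops new pixels into the stack, every later edit has to operate on those cached pixels rather than simply on slider values applied to the original.

```python
# A toy mental model (emphatically NOT Lightroom's implementation) of why
# additive pixel edits complicate a parametric, non-destructive pipeline.
from __future__ import annotations

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Edit:
    kind: str                                # "exposure" | "remove"
    params: dict = field(default_factory=dict)
    cached_patch: np.ndarray | None = None   # only generative edits carry pixels

def render(original: np.ndarray, stack: list[Edit]) -> np.ndarray:
    """Re-render from the original every time; nothing is baked in."""
    img = original.astype(np.float32)
    for edit in stack:
        if edit.kind == "exposure":
            # Purely parametric: a function of whatever pixels precede it.
            img = np.clip(img * (2.0 ** edit.params["stops"]), 0, 255)
        elif edit.kind == "remove":
            # Additive pixel edit: paste a generated patch; later edits now
            # depend on these new pixels, not just on slider values.
            y, x, h, w = edit.params["rect"]
            img[y:y + h, x:x + w] = edit.cached_patch
    return img.astype(np.uint8)

photo = np.full((100, 100, 3), 128, dtype=np.uint8)
stack = [
    Edit("remove", {"rect": (10, 10, 20, 20)},
         cached_patch=np.zeros((20, 20, 3), dtype=np.float32)),  # fake "fill"
    Edit("exposure", {"stops": 0.5}),  # brightens the generated patch too
]
out = render(photo, stack)
```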
Adobe’s CEO (duh :-)) sat down with Nilay Patel for an in-depth interview. Here are some of the key points, as summarized by ChatGPT:
———-
AI as a Paradigm Shift: Narayen views AI as a fundamental shift, similar to the transitions to mobile and cloud technologies. He emphasizes that AI, especially generative AI, can automate tasks, enhance creative processes, and democratize access to creative tools. This allows users who might not have traditional artistic skills to create compelling content.
Generative AI in Adobe Products: Adobe’s Firefly, a family of generative AI models, has been integrated into various Adobe products. Firefly enhances creative workflows by enabling users to generate images, text effects, and video content with simple text prompts. This integration aims to accelerate ideation, exploration, and production, making it easier for creators to bring their visions to life.
Empowering Creativity: Narayen highlights that Adobe’s approach to AI is centered around augmenting human creativity rather than replacing it. Tools like Generative Fill in Photoshop and new generative AI features in Premiere Pro are designed to streamline tedious tasks, allowing creators to focus on the more creative aspects of their work. This not only improves productivity but also expands creative possibilities.
Business Model and Innovation: Narayen discusses how Adobe is adapting its business model to leverage AI. By integrating AI across Creative Cloud, Document Cloud, and Experience Cloud, Adobe aims to enhance its products and deliver more value to users. This includes experimenting with new business models and monetizing AI-driven features to stay at the forefront of digital creativity.
Content Authenticity and Ethics: Adobe emphasizes transparency and ethical use of AI. Initiatives like Content Credentials help ensure that AI-generated content is properly attributed and distinguishable from human-created content. This approach aims to maintain trust and authenticity in digital media.
I still can’t believe I was allowed in the building with these giant throbbing brains. 🙂
Create a 3D model from a single image, set of images or a text prompt in < 1 minute
This new AI paper called CAT3D shows us that it’ll keep getting easier to produce 3D models from 2D images — whether it’s a sparser real world 3D scan (a few photos instead of hundreds) or… pic.twitter.com/sOsOBsjC8Q
I’ve gotta say, this one touches a kinda painful nerve with me.
10 years ago I walked into the Google Photos team expecting normal humans to do things like say, “Show me the best pictures of my grandkids.” I immediately felt like a fool: something like 97% of daily users don’t search, preferring to simply launch the app and scroll scroll scroll forever.
A decade later, the Photos team is talking about using large language models to enable uses like the following:
With Ask Photos, you can ask for what you’re looking for in a natural way, like: “Show me the best photo from each national park I’ve visited.” Google Photos can show you what you need, saving you from all that scrolling.
For example, you can ask: “What themes have we had for Lena’s birthday parties?” Ask Photos will understand details, like what decorations are in the background or on the birthday cake, to give you the answer.
Will anyone actually do this? It’s really hard for me to imagine, at least as it’s been framed above.
Now, what I can imagine working—in pretty great ways—is a real Assistant experience that suggests a bunch of useful tasks with which it can assist, such as gathering up photos to make birthday or holiday cards. (The latter task always falls to me every year, and I wish I could more confidently do it better.) Assistant could easily ask whose birthday it is & on what date, then scan one’s library and suggest a nice range of images as well as presentation options (cards, short animations, etc.). That kind of agent could be a joy to interact with.
Never doubt the power of a motivated person or two to do what needs to be done. Stick around to the last section of this short vid to see Stable Diffusion-powered “Find & Replace” (maskless inpainting powered by prompts) in action:
I came across this post (originally from 2017) just now while looking for other work from Paul Asente. Here’s hoping it can finally see the light of day in Illustrator! —J.
———–
Paul Asente is an OG of the graphics world, having been responsible for (if I recall correctly) everything from Illustrator’s vector meshes & art brushes to variable-width strokes. Now he’s back with new Adobe illustration tech to drop some millefleurs science:
PhysicsPak automatically fills a shape with copies of elements, growing, stretching, and distorting them to fill the space. It uses a physics simulation to do this and to control the amount of distortion.
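The paper’s physics-driven approach is well beyond a blog-comment sketch, but if you want a feel for the “fill a shape with elements” problem, here’s a crude stand-in: plain disc packing with a little relaxation inside a circular container. PhysicsPak grows, stretches, and distorts arbitrary elements under a proper simulation; this toy does none of that.

```python
# A crude stand-in for the fill-a-shape-with-elements idea: simple disc packing
# with relaxation inside a circle. This is NOT PhysicsPak's method.
import numpy as np

rng = np.random.default_rng(0)
N, R_SHAPE = 60, 1.0                       # number of discs, container radius
pos = rng.uniform(-0.5, 0.5, size=(N, 2))  # initial disc centers
rad = np.full(N, 0.02)                     # initial disc radii

for step in range(400):
    # Grow every disc a little each iteration (the "fill the space" pressure).
    rad += 0.0005
    # Push apart overlapping pairs.
    for i in range(N):
        for j in range(i + 1, N):
            d = pos[j] - pos[i]
            dist = np.linalg.norm(d) + 1e-9
            overlap = rad[i] + rad[j] - dist
            if overlap > 0:
                shift = 0.5 * overlap * d / dist
                pos[i] -= shift
                pos[j] += shift
    # Keep discs inside the container by clamping against the boundary circle.
    r = np.linalg.norm(pos, axis=1)
    outside = r + rad > R_SHAPE
    pos[outside] *= ((R_SHAPE - rad[outside]) / r[outside])[:, None]

coverage = (rad ** 2).sum() / R_SHAPE ** 2
print(f"approximate area coverage: {coverage:.2%}")
```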