Here’s a bit more on how the new editing features work:
We’re testing two new features today: our image editor for uploaded images and image re-texturing for exploring materials, surfacing, and lighting. Everything works with all our advanced features, such as style references, character references, and personalized models pic.twitter.com/jl3a1ZDKNg
I’ve become an Ideogram superfan, using it to create imagery daily, so I’m excited to kick the tires on this new interactive tool—especially around its ability to synthesize new text in the style of a visual reference.
Today, we’re introducing Ideogram Canvas, an infinite creative board for organizing, generating, editing, and combining images.
Bring your face or brand visuals to Ideogram Canvas and use industry-leading Magic Fill and Extend to blend them with creative, AI-generated content. pic.twitter.com/m2yjulvmE2
You can upload your own images or generate new ones within Canvas, then seamlessly edit, extend, or combine them using industry-leading Magic Fill (inpainting) and Extend (outpainting) tools. Use Magic Fill and Extend to bring your face or brand visuals to Ideogram Canvas and blend them with creative, AI-generated elements. Perfect for graphic design, Ideogram Canvas offers advanced text rendering and precise prompt adherence, allowing you to bring your vision to life through a flexible, iterative process.
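If you’re curious how this class of feature works under the hood, the open-source analogue is mask-based inpainting: supply an image, a mask marking the area to regenerate, and a prompt. Here’s a minimal diffusers sketch of that general pattern (my illustration with hypothetical filenames, not Ideogram’s implementation); outpainting is the same idea with the canvas padded outward and the new border masked.

```python
# Minimal mask-based inpainting sketch with Hugging Face diffusers.
# An open-source illustration of the general technique, not Ideogram's
# implementation; filenames are hypothetical.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("brand_shot.png")   # source image
mask = load_image("mask.png")          # white = region to regenerate

result = pipe(
    prompt="a sleek product bottle on a marble counter, soft studio light",
    image=image,
    mask_image=mask,
).images[0]
result.save("filled.png")
```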
Filmmaker & Pika Labs creative director Matan Cohen Grumi makes this town look way more dynamic than usual (than ever?) through the power of his team’s tech:
Adobe’s new generative 3D/vector tech is a real head-turner. I’m impressed that the results look like clean, handmade paths, with colors that match the original—and not like automatic tracing of crummy text-to-3D output. I can’t wait to take it for a… oh man, don’t say it don’t say it… spin.
Oh man, for years we wanted to build this feature into Photoshop—years! We tried many times (e.g. I wanted this + scribble selection to be the marquee features in Photoshop Touch back in 2011), but the tech just wasn’t ready. But now, maybe, the magic is real—or at least tantalizingly close!
Being a huge nerd, I wonder about how the tech works, and whether it’s substantially the same as what Magnific has been offering (including via a Photoshop panel) for the last several months. Here’s how I used that on my pooch:
But even if it’s all the same, who cares?
Being useful to people right where they live & work, with zero friction, is tremendous. Generative Fill is a perfect example: similar (if lower quality) inpainting was available from DALL•E for a year+ before we shipped GenFill in Photoshop, but the latter has quietly become an indispensable, game-changing piece of the imaging puzzle for millions of people. I’d love to see compositing improvements go the same way.
As I drove the Micronaxx to preschool back in 2013, Macklemore’s “Can’t Hold Us” hit the radio & the boys flipped out, making their stuffed buddies Leo & Ollie go nuts dancing to the tune. I remember musing with Dave Werner (a fellow dad to young kids) about being able to animate said buddies.
Fast forward a decade+, and now Dave is using Adobe’s recently unveiled Firefly Video model to do what we could only dimly imagine back then:
Ever since Google dropped DreamBooth back in 2022, people have been trying—generally without much success—to train generative models that can incorporate the fine details of specific products. Thus far it just hasn’t been possible to meet most brands’ demanding requirements for fidelity.
Now tiny startup Flair AI promises to do just that—and to pair the object definitions with custom styling and even video. Check it out:
You can now generate brand-consistent video advertisements for your products on @flairAI_
1. Train a model on your brand’s aesthetic
2. Train a model on your clothing or product
3. Combine both models in one prompt
4. Animate✨
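I don’t know Flair’s internals, but the “two models in one prompt” recipe maps naturally onto training two LoRA-style adapters and stacking them on a single base model. A rough diffusers sketch of that general idea (hypothetical adapter files and weights, not Flair’s actual pipeline):

```python
# Rough sketch: combine a "brand aesthetic" LoRA with a "product" LoRA
# on one base model. Adapter paths/weights are hypothetical placeholders,
# not Flair's actual pipeline.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Two separately trained LoRAs: one on brand imagery, one on the product itself.
pipe.load_lora_weights("loras/brand_aesthetic.safetensors", adapter_name="brand")
pipe.load_lora_weights("loras/product.safetensors", adapter_name="product")
pipe.set_adapters(["brand", "product"], adapter_weights=[0.7, 0.9])

image = pipe("a model wearing the product jacket, brand campaign style").images[0]
image.save("brand_shot.png")
```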
In my limited experience so far, it’s cool but highly unpredictable. I’ll test it further, and I’d love to know how it works for you. Meanwhile you can try similar techniques via https://playground.com/:
Welcome to the new Playground
Use AI to design logos, t-shirts, social media posts, and more by just texting it like a person.
Wow @runwayml just dropped an updated Gen-3 Alpha Turbo Video-to-Video mode & it’s awesome! It’s super fast & lets you do 9:16 portrait video. Anything is possible! pic.twitter.com/AxeFaJwAPR
I have no idea what AI and other tools were used here, but it’d be fun to get a peek behind the curtain. As a commenter notes,
The meandering strings in the soundtrack. The hard studio lighting of the close-ups. The midtone-heavy Technicolor grading. The macro-lens DOF for animation sequences. This is spot-on 50’s film aesthetic, bravo.
And if that headline makes no sense, it probably just means you’re not terminally AI-pilled, and I’m caught flipping a grunt. 😉 Anyway, the tiny but mighty crew at Krea have brought the new Flux text-to-image model—including its ability to spell—to their realtime creation tool:
Flux now in Realtime.
available in Krea with hundreds of styles included.
What a fun little project & great NYC vibe-catcher: the folks at Runway captured street scenes with a disposable film camera, then used their model to put the images in motion. Check it out:
I love seeing how scrappy creators combine tools in new ways, blazing trails that we may come to see as commonplace soon enough. Here Eric Solorio (enigmatic_e) shows how he used Viggle & other tools to create his viral Deadpool animation:
As promised, here is a breakdown of how I did the Deadpool animation I recently posted. pic.twitter.com/F130Skq17U
I’ve been having a ball using the new Ideogram app for iOS to import photos & remix them into new creations. This is possible via their web UI as well, but there’s something extra magical about the immediacy of capture & remix. Check out a couple quick explorations I did while out with the kids, starting from a ballcap & the fuel tank of an old motorcycle:
I love this level of transparency from the folks behind Photo AI. Developer @levelsio reports,
[Flux] made Photo AI finally good enough overnight to be actually used by people and be satisfied with the results… it’s more expensive [than SD] but worth it because the photos are way way better… Not sure about profitability but with SD it was about 85% profit. With Flux def less maybe 65%… Very unplanned and grateful the foundational models got better.
We’re arguably in something of a trough of disillusionment in the AI-art hype cycle, but this kind of progress gives reason for hope: more quality & more utility do translate into more sustainable value—and there’s every reason to think that things will only improve from here.
Flux, the new AI model, changes businesses (and lives)
It made https://t.co/1vEawpI5vb finally good enough overnight to be actually used by people and be satisfied with the results
All my improvements before helped but now it’s accelerating with Flux’s photo quality pic.twitter.com/BiAqi5BgnY
Listen, I know that it’s a lot more seductive & cathartic to say “I f*cking hate generative AI,” and you can get 90,000+ likes for doing so, but—believe it or not—thoughtfulness & nuance actually matter. That is, how one uses generative tech can have very different implications for the creative community.
It’s therefore important to evaluate a range of risk/reward scenarios: What’s unambiguously useful & low-risk, vs. what’s an inducement to ripping people off, and what lies in the middle?
I see a continuum like this (click/tap to see larger):
None of this will draw any attention or generate much conversation—at least if my attempts to engage people on Twitter are any indication—but it’s the kind of thing actual toolmakers must engage with if we’re to make progress together. And so, back to work.
My friend Nathan has fed a mix of Schwarzenegger photos & drawings from Aesop’s Fables into the new open-source Flux model, creating a rad woodcut style. That’s interesting enough on its own—but it’s so 24 hours ago, and thus he’s now taken to animating the results. Check out the thread below for details:
Animating yesterday’s #FLUX woodcut Arnold using one of my favorite clips from the old soundboards
This uses Follow-Your-Emoji / Reference UNet in ComfyUI, which did a better job than LivePortrait.
It’s wild that capabilities that blew our minds two years ago—for which I & others spent months on the DALL•E waiting list, and which demanded beefy servers to run—are now available (only better) running in your pocket, on your telephone. Check out the latest from Google:
Pixel Studio is a first-of-its-kind image generator. So now you can bring all ideas to life from scratch, right on your phone — a true creative canvas.
It’s powered by combining an on-device diffusion model running on Tensor G4 and our Imagen 3 text-to-image model in the cloud. With a UI optimized for easy prompting, style changes and editing, you can quickly bring your ideas to conversations with friends and family.
3. Pixel Studio
Create anything you imagine with PixelStudio, a groundbreaking image generator powered by an on-device diffusion model. It’s your AI canvas. pic.twitter.com/oDBqkUfqOR
Back when I worked on Google Photos, and especially later when I worked in Research, I really wanted to ship a camera mode that would help ensure great group photos. Prior to the user pressing the capture button, it would observe the incoming video stream, notice when it had at least one instance of each face smiling with their eyes open, and then knit together a single image in which everyone looked good.
Of course, the idea was hardly new: I’d done the same thing manually with my own wedding photos back in 2005, and in 2013 Google+ introduced “AutoAwesome Smile” to select good expressions across images & merge them into a single shot. It was a great feature, though sadly the only time people noticed its existence is when it failed in often hilarious “AutoAwful” ways (turning your baby or dog into, say, a two-nosed Picasso). My idea was meant to improve on this by not requiring multiple photos, and of course by suppressing unwanted hilarity.
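For the curious, the frame-selection logic I had in mind was simple in principle. Here’s a rough sketch using a hypothetical face detector and attribute scores (the hard parts, identity tracking and seam-aware compositing, are hand-waved):

```python
# Rough sketch of the "everyone looks good" idea: scan incoming frames,
# keep the best-scoring crop of each person, then composite them onto a
# base frame. The detector and its scores are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Face:
    person_id: int       # stable identity across frames
    bbox: tuple          # (x, y, w, h) in pixels
    smiling: float       # 0..1 from a hypothetical classifier
    eyes_open: float     # 0..1 from a hypothetical classifier

def best_faces(frames, detect_faces):
    """Return the best (smiling, eyes-open) crop of each person seen so far."""
    best = {}  # person_id -> (score, frame, bbox)
    for frame in frames:
        for face in detect_faces(frame):          # hypothetical detector
            score = face.smiling * face.eyes_open
            if score > best.get(face.person_id, (0.0, None, None))[0]:
                best[face.person_id] = (score, frame, face.bbox)
    return best  # these crops then feed a seam-aware blend over a base frame
```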
Anyway, Googlers gonna Google, and now the Pixel team has introduced an interactive mode that helps you capture & merge two shots—the first one of a group, and the second of the photographer who took the first. Check out Marques Brownlee’s 1-minute demo:
The most interesting AI feature on the new Pixels IMO: “Add Me”
We take the intuitive conversational flow of ChatGPT and merge it with Uizard generative UI capabilities and drag-and-drop editor, to provide you with an intuitive UI design generator. You can turn a couple of ideas into a digital product design concept in a flash!
I’m really curious to see how the application of LLMs & conversational AI reshapes the design process, from ideation & collaboration to execution, deployment, and learning—and I’d love to hear your thoughts! Meanwhile here’s a very concise look at how Autodesigner works:
And if that piques your interest, here’s a more in-depth look:
Google Research has devised “Alchemist,” a new way to swap object textures:
And people keep doing wonderful things with realtime image synthesis:
Happy mixing of decoder embeddings in real-time! Base prompt is ‘photo of a room, sofa, decor’ and the two knobs are ‘industrial’ and ‘rococo’. If you are wondering what is running there in the background… pic.twitter.com/5svyDy5C4e
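I don’t know exactly what’s running behind that demo, but the general “knobs” technique is easy to sketch: encode the base prompt and each style word, blend the resulting text embeddings, and hand the mix to the pipeline. Here’s a hedged diffusers illustration (a realtime setup would swap in a few-step distilled model):

```python
# Illustrative sketch of blending prompt embeddings with two style "knobs".
# Not necessarily what this particular demo runs under the hood.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def embed(prompt: str) -> torch.Tensor:
    """Return the text embedding the UNet is conditioned on."""
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
    ).input_ids.to(pipe.device)
    return pipe.text_encoder(tokens)[0]

base = embed("photo of a room, sofa, decor")
industrial, rococo = embed("industrial"), embed("rococo")

a, b = 0.5, 0.2   # the two realtime knobs
mixed = base + a * (industrial - base) + b * (rococo - base)

image = pipe(prompt_embeds=mixed, negative_prompt_embeds=embed("")).images[0]
image.save("room.png")
```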
Always pushing the limits of expressive tech, Martin Nebelong has paired Photoshop painting with AI rendering, followed by Runway’s new image-to-video model. “Days of Miracles & Wonder,” as always:
Painting with AI in photoshop – And doing magic with Runways new Gen 3 image to video. This stuff is insane.. wow.
Our tools and workflows are at the brink of an incredible renaissance.
In the history books, this clip will be referred to as “Owl and cake” 😛
Man, I’m old enough to remember rotoscoping video by hand—a process that quickly made me want to jump right out a window. Years later, when we were working on realtime video segmentation at Google, I was so proud to show the tech to a bunch of high school design students—only to have them shrug and treat it as completely normal.
Ah, but so it goes: “One of history’s few iron laws is that luxuries tend to become necessities and to spawn new obligations. Once people get used to a certain luxury, they take it for granted.” — Yuval Noah Harari
In any case, Meta has just released what looks like a great update to their excellent—and open-source—Segment Anything Model. Check it out:
Introducing Meta Segment Anything Model 2 (SAM 2) — the first unified model for real-time, promptable object segmentation in images & videos.
SAM 2 is available today under Apache 2.0 so that anyone can use it to build their own experiences
You can play with the demo and learn more on the site:
Following up on the success of the Meta Segment Anything Model (SAM) for images, we’re releasing SAM 2, a unified model for real-time promptable object segmentation in images and videos that achieves state-of-the-art performance.
In keeping with our approach to open science, we’re sharing the code and model weights with a permissive Apache 2.0 license.
We’re also sharing the SA-V dataset, which includes approximately 51,000 real-world videos and more than 600,000 masklets (spatio-temporal masks).
SAM 2 can segment any object in any video or image—even for objects and visual domains it has not seen previously, enabling a diverse range of use cases without custom adaptation.
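The released code follows the same predictor pattern as the original SAM. Roughly, for a single image (checkpoint and config names are the ones published at launch; double-check them against the current facebookresearch/sam2 README):

```python
# Minimal image-segmentation sketch with the released SAM 2 code.
# Checkpoint/config filenames are as published at launch; verify against
# the current repo before running.
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = np.array(Image.open("street.jpg").convert("RGB"))
with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),   # a single foreground click
        point_labels=np.array([1]),            # 1 = include, 0 = exclude
    )
# `masks` holds candidate binary masks, ranked by `scores`.
```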
Back when we launched Firefly (alllll the way back in March 2023), we hinted at the potential of combining 3D geometry with diffusion-based rendering, and I tweeted out a very early sneak peek:
Did you see this mind blowing Adobe ControlNet + 3D Composer Adobe is going to launch! It will really boost creatives’ workflow. Video through @jnack
A year+ later, I’m no longer working to integrate the Babylon 3D engine into Adobe tools—and instead I’m working directly with the Babylon team at Microsoft (!). Meanwhile I like seeing how my old teammates are continuing to explore integrations between 3D (in this case, Project Neo) and generative imaging. Here’s one quick flow:
Here’s a quick exploration from the always-interesting Martin Nebelong:
A very quick first test of Adobe Project Neo.. didn’t realize this was out in open beta by now. Very cool!
I had to try to sculpt a burger and take that through Krea. You know, the usual thing!
There’s some very nice UX in NEO and the list-based SDF editing is awesome.. very… pic.twitter.com/e3ldyPfEDw
And here’s a fun little Neo->Firefly->AI video interpolation test from Kris Kashtanova:
Tutorial: Direct your cartoons with Project Neo + Firefly + ToonCrafter
1) Model your characters in Project Neo
2) Generate first and last frame with Firefly + Structure Reference
3) Use ToonCrafter to make a video interpolation between the first and the last frame
As I’ve probably mentioned already, when I first surveyed Adobe customers a couple of years ago (right after DALL•E & Midjourney first shipped), it was clear that they wanted selective synthesis—adding things to compositions, and especially removing them—much more strongly than whole-image synthesis.
Thus it’s no surprise that Generative Fill in Photoshop has so clearly delivered Firefly’s strongest product-market fit, and I’m excited to see Illustrator following the same path—but for vectors:
Generative Shape Fill will help you improve your workflow including:
Create detailed, scalable vectors: After you draw or select your shape, silhouette, or outline in your artboard, use a text prompt to ideate on vector options to fill it.
Style Reference for brand consistency: Create a wide variety of options that match the color, style, and shape of your artwork to ensure a consistent look and feel.
Add effects to your creations: Enhance your vector options further by adding styles like 3D, geometric, pixel art or more.
They’re also adding the ability to create vector patterns simply via prompting:
Soon after Generative Fill shipped last year, people discovered that using a semi-opaque selection could help blend results into an environment (e.g. putting fish under water). The new Selection Brush in Photoshop takes functionality that’s been around for 30+ years (via Quick Mask mode) and brings it more to the surface, which in turn makes it easier to control GenFill behavior:
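To see why a semi-opaque selection helps, think of the selection as an alpha mask that blends the generated pixels back over the original. A toy numpy illustration of that principle (not Photoshop’s actual compositing code):

```python
# Toy illustration (not Photoshop internals): a selection acts as an alpha
# mask, so a 50%-opaque, feathered selection blends generated pixels only
# partially into the original, letting the scene (e.g. water) show through.
import numpy as np

def blend_fill(original, generated, selection_alpha):
    """original, generated: HxWx3 float arrays in [0, 1];
    selection_alpha: HxW float array in [0, 1], where 1 = fully selected."""
    alpha = selection_alpha[..., None]            # broadcast over RGB channels
    return alpha * generated + (1.0 - alpha) * original
```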
For now the functionality is limited to upscaling, but I have to think that they’ll soon turn on the super cool relighting & restyling tech that enables fun like transforming my dog using just different prompts (click to see larger):
I wish Adobe hadn’t given up (at least for the last couple of years and foreseeable future) on the Smart Portrait tech we were developing. It’s been stuck at 1.0 since 2020 and could be so much better. Maybe someday!
In the meantime, check out LivePortrait:
Some impressive early results coming out of LivePortrait, a new model for face animation.
Upload a photo + a reference video and combine them!
Being able to declare what you want, instead of having to painstakingly set up parameters for materials, lighting, etc. may prove to be an incredible unlock for visual expressivity, particularly around the generally intimidating realm of 3D. Check out what tyFlow is bringing to the table:
You can see a bit more about how it works in this vid…
Years ago Adobe experimented with a real-time prototype of Photoshop’s Landscape Mixer Neural Filter, and the resulting responsiveness made one feel like a deity—fluidly changing summer to winter & back again. I was reminded of using Google Earth VR, where grabbing & dragging the sun to change the time of day produced the same godlike feeling.
Nothing came of it, but in the time since then, realtime diffusion rendering (see amazing examples from Krea & others) and image-to-image restyling have opened some amazing new doors. I wish I could attach filters to any layer in Photoshop (text, 3D, shape, image) and have it reinterpreted like this:
New way to navigate latent space. It preserves the underlying image structure and feels a bit like a powerful style-transfer that can be applied to anything. The trick is to… pic.twitter.com/orFBysBpkT
Using Magic Insert we are, for the first time, able to drag-and-drop a subject from an image with an arbitrary style onto another target image with a vastly different style and achieve a style-aware and realistic insertion of the subject into the target image.
Here is a demo that you can access on the desktop version of the website. We’re excited by the options Magic Insert opens up for artistic creation, content creation and for the overall expansion of GenAI controllability. pic.twitter.com/HhbfrEfXZH
Of course, much of the challenge here—where art meets science—is around identity preservation: to what extent can & should the output resemble the input? Here it’s subject to some interpretation. In other applications one wants an exact copy of a given person or thing, but optionally transformed in just certain ways (e.g. pose & lighting).
When we launched Firefly last year, we showed off some of Adobe’s then-new ObjectStitch tech for making realistic composites. It didn’t ship while I was there due to challenges around identity preservation. As far as I know those challenges remain only partially solved, so I’ll continue holding out hope—as I have for probably 30 years now!—for future tech breakthroughs that get us all the way across that line.
Check out this striking application of AI-powered relighting: a single rendering is deeply & realistically transformed via one AI tool, and the results are then animated & extended by another.
This Lego machine can easily create a beautiful pixelart of anything you want! It is programmed in Python, and, with help of OpenAI’s DALL-E 3, it can make anything!
Sten of the YouTube channel Creative Mindstorms demonstrates his robot printer, Pixelbot 3000: built from LEGO bricks and programmed in Python, it uses OpenAI’s DALL-E 3 to generate an image, then automatically pins round LEGO pieces onto a 32 x 32 plate, position by position, until a pixel-art version of the image takes shape.
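The software side is conceptually simple. Here’s a rough reconstruction of the image-processing step (my sketch, not Sten’s actual code, and the brick palette is hypothetical): generate an image, shrink it to the 32 x 32 plate, and snap each pixel to the nearest available brick color.

```python
# Rough reconstruction of the Pixelbot idea (not Sten's actual code):
# downsample a generated image to the 32x32 plate and quantize each pixel
# to the nearest color of round brick on hand (hypothetical palette).
import numpy as np
from PIL import Image

BRICK_COLORS = {
    "white": (242, 243, 242), "black": (27, 42, 52),
    "red": (196, 40, 27),     "blue": (13, 105, 171),
    "yellow": (245, 205, 47), "green": (40, 127, 70),
}

def to_brick_grid(image_path, plate_size=32):
    img = Image.open(image_path).convert("RGB").resize((plate_size, plate_size))
    pixels = np.asarray(img, dtype=float)
    names = list(BRICK_COLORS)
    palette = np.array([BRICK_COLORS[n] for n in names], dtype=float)
    # Nearest brick color per pixel (Euclidean distance in RGB space).
    dists = np.linalg.norm(pixels[:, :, None, :] - palette[None, None, :, :], axis=-1)
    return np.take(names, dists.argmin(axis=-1))  # 32x32 grid of brick names

# Each (row, col, color) entry then becomes a placement instruction for the robot.
```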
Fun! You can grab the free browser extension here.
* right-click-remix any image w/ tons of amazing AI presets: Style Transfer, Controlnets…
* build & remix your own workflows with full comfyUI support
* local + cloud!
besides some really great default presets using all sorts of amazing ComfyUI workflows (which you can inspect and remix on http://glif.app), the extension will now also pull your own compatible glifs into it!
The tech, a demo of which you can try here, promises “‘imitative editing,’ allowing users to edit images using reference images without the need for detailed text descriptions.”
Good grief, the pace of change makes “AI vertigo” such a real thing. Just last week we were seeing “skeleton underwater” memes with Runway submerged in a rusty chair. :-p I’m especially excited to see how it handles text (which remains a struggle for text-to-image models including DALL•E):