Category Archives: AI/ML

Krea introduces realtime 3D-guided image generation

Part 9,201 of me never getting over the fact we were working on stuff like this 2 years ago at Adobe (modulo the realtime aspect, which is rad) & couldn’t manage to ship it. It’ll be interesting to see whether the Krea guys (and/or others) pair this kind of interactive-quality rendering with a really high-quality pass, as NVIDIA demonstrated last week using Flux.

Creating a 3D scene from text

…featuring a dose of Microsoft Trellis!

More about Trellis:

Powered by advanced AI, TRELLIS enables users to create high-quality, customizable 3D objects effortlessly using simple text or image prompts. This innovation promises to improve 3D design workflows, making it accessible to professionals and beginners alike. Here are some examples:

Adobe demos generation of video with transparency

Exciting!

From the project page:

Alpha channels are crucial for visual effects (VFX), allowing transparent elements like smoke and reflections to blend seamlessly into scenes. We introduce TransPixar, a method to extend pretrained video models for RGBA generation while retaining the original RGB capabilities. […] Our approach effectively generates diverse and consistent RGBA videos, advancing the possibilities for VFX and interactive content creation.
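For anyone wondering what the alpha channel buys you in practice, here is a minimal sketch of the standard “over” compositing operation, in which alpha sets how much of a foreground pixel shows against the background. This is not TransPixar's code: the function name `over`, the straight (non-premultiplied) alpha convention, and the sample pixel values are all assumptions made for illustration.

```python
def over(fg, bg):
    """Composite one straight-alpha RGBA pixel over an opaque RGB pixel.

    All channels are floats in [0, 1]. Hypothetical helper for illustration.
    """
    r, g, b, a = fg
    # Each output channel is a blend weighted by the foreground's alpha.
    return tuple(a * f + (1.0 - a) * k for f, k in zip((r, g, b), bg))


# Made-up values: a wisp of light-gray smoke at 25% opacity over blue sky.
smoke = (0.8, 0.8, 0.8, 0.25)
sky = (0.2, 0.4, 1.0)
print(over(smoke, sky))
```

With alpha at 0 the background comes through untouched, and at 1 the foreground replaces it entirely. Everything in between is what lets smoke or a reflection sit believably in a scene.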

NVIDIA + Flux = 3D magic

I may never stop being pissed that the Firefly-3D integration we previewed nearly two years ago didn’t yield more fruit, at least on my watch:

The world moves on, and now NVIDIA has teamed up with Black Forest Labs to enable 3D-conditioned image generation. Check out this demo (starting around 1:31:48):

Details:

For users interested in integrating the FLUX NIM microservice into their workflows, we have collaborated with NVIDIA to launch the NVIDIA AI Blueprint for 3D-guided generative AI. This packaged workflow allows users to guide image generation by laying out a scene in 3D applications like Blender, and using that composition with the FLUX NIM microservice to generate images that adhere to the scene. This integration simplifies image generation control and showcases what’s possible with FLUX models.

Skillful Lovecraftian horror

The Former Bird App™ is of course awash in mediocre AI-generated video creations, so it’s refreshing to see what a gifted filmmaker (in this case Ruairi Robinson) can do with emerging tools (in this case Google Veo)—even if that’s some slithering horror I’d frankly rather not behold!

New AI-powered upscalers arrive

Check out the latest from Topaz:


Alternately, you can run InvSR via Gradio:

Strolling through the latent space in Runway

I’ve long wanted—and advocated for building—this kind of flexible, spatial way to compose & blend among ideas. Here’s to new ideas for using new tools.

A rather incredible demo of Pika Scene Ingredients

Director Matan Cohen-Grumi shows off the radical acceleration in VFX-heavy storytelling that’s possible through emerging tools—including Pika’s new Scene Ingredients:

Google introduces “Whisk,” a fun image remixer

Check out this fun little toy:

Instead of generating images with long, detailed text prompts, Whisk lets you prompt with images. Simply drag in images, and start creating.

Whisk lets you input images for the subject, one for the scene and another image for the style. Then, you can remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.

The blog post gives a bit more of a peek behind the scenes & sets some expectations:

Since Whisk extracts only a few key characteristics from your image, it might generate images that differ from your expectations. For example, the generated subject might have a different height, weight, hairstyle or skin tone. We understand these features may be crucial for your project and Whisk may miss the mark, so we let you view and edit the underlying prompts at any time.

In our early testing with artists and creatives, people have been describing Whisk as a new type of creative tool — not a traditional image editor. We built it for rapid visual exploration, not pixel-perfect edits. It’s about exploring ideas in new and creative ways, allowing you to work through dozens of options and download the ones you love.

And yes, uploading a 19th-century dog illustration to generate a plushie dancing an Irish jig is definitely the most JNack way to squander precious work time, er, do vital market research. 🙂

The cool generative 3D hits keep coming

Just a taste of the torrent that blows past daily on The Former Bird App:

  • Rodin 3D: “Rodin 3D AI can create stunning, high-quality 3D models from just text or image inputs.”
  • Trellis 3D: “Iterative prompting/mesh editing. You can now prompt ‘remove X, add Y, Move Z, etc.’… Allows decoding to different output formats: Radiance Fields, 3D Gaussians, and meshes.”
  • Blender GPT: “Generating 3D assets has never been easier. Here’s me putting together an entire 3D scene in just over a minute.”

Google demos amazing image editing done purely through voice

This might be the world’s lowest-key demo of what promises to be truly game-changing technology!

I’ve tried a number of other attempts at unlocking this capability (e.g. Meta.ai (see previous), Playground.com, and what Adobe sneak-peeked at the Firefly launch in early 2023), but so far I’ve found them all more unpredictable & frustrating than useful. Could Gemini now have turned the corner? Only hands-on testing (not yet broadly available) will tell!

Shedding new light with LumiNet

Diffusion models are ushering in what feels like a golden(-hour) age in relighting (see previous). Among the latest offerings is LumiNet:

I’ve shipped my first feature at Microsoft!

What if your design tool could understand the meaning & importance of words, then help you style them accordingly?

I’m delighted to say that for what I believe is the first time ever, that’s now possible. For the last 40 years of design software, apps have of course provided all kinds of fonts, styles, and tools for manual typesetting. What they’ve lacked is an understanding of what words actually mean, and consequently of how they should be styled in order to map visual emphasis to semantic importance.

In Microsoft Designer, you can now create a new text object, then apply hierarchical styling (primary, secondary, tertiary) based on AI analysis of word importance:

I’d love to hear what you think. You can go to designer.microsoft.com, create a new document, and add some text. Note: The feature hasn’t yet been rolled out to 100% of users, so it may not yet be available to you—but even in that case it’d be great to hear your thoughts on Designer in general.

This feature came about in response to noticing that text-to-image models are not only learning to spell well (check out some examples I’ve gathered on Pinterest), but can also set text with varied size, position, and styling that’s appropriate to the importance of each word. Check out some of my Ideogram creations (which you can click on & remix using the included prompts):

These results are of course incredible (imagine seeing any of this even three years ago!), but they’re just flat images, not editable text. Our new feature, by contrast, leverages semantic understanding and applies it to normal text objects.

What we’ve shipped now is just the absolute tip of the iceberg: to start we’re simply applying preset values based on word hierarchy, but you can readily imagine richer layouts, smart adaptive styling, and much more. Stay tuned—and let us know what you’d like to see!
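To make “preset values based on word hierarchy” a bit more concrete, here is a rough sketch of the general idea: take a per-word importance score, bucket it into primary, secondary, or tertiary, and look up a preset style for that tier. It is not how Designer is implemented. The importance scores, the thresholds, the `Style` fields, and every preset value below are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    font_size: int
    weight: str


# Hypothetical presets, one per hierarchy tier.
PRESETS = {
    "primary": Style(font_size=64, weight="bold"),
    "secondary": Style(font_size=36, weight="semibold"),
    "tertiary": Style(font_size=20, weight="regular"),
}


def tier_for(score):
    """Bucket an importance score in [0, 1] into a tier (assumed thresholds)."""
    if score >= 0.7:
        return "primary"
    if score >= 0.4:
        return "secondary"
    return "tertiary"


def style_words(scored_words):
    """Return (word, tier, Style) for each (word, score) pair."""
    styled = []
    for word, score in scored_words:
        tier = tier_for(score)
        styled.append((word, tier, PRESETS[tier]))
    return styled


# Made-up scores standing in for whatever the AI analysis would produce.
for word, tier, style in style_words(
    [("Grand", 0.9), ("Opening", 0.85), ("this", 0.1), ("Saturday", 0.5)]
):
    print(word, tier, style.font_size, style.weight)
```

Richer layouts and adaptive styling would presumably replace the fixed lookup table with something that also considers the canvas and neighboring elements, but the mapping from semantic importance to visual emphasis is the core of it.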

Kling AI promises virtual try-ons

Accurately rendering clothing on humans, and especially estimating their dimensions to enable proper fit (and thus reduce costly returns), has remained a seductive yet stubbornly difficult problem. I’ve written previously about challenges I observed at Google, plus possible steps forward.

Now Kling is promising to use generative video to pair real people & real outfits for convincing visualization (but not fit estimation). Check it out:

FlipSketch promises text-to-animation

We present FlipSketch, a system that brings back the magic of flip-book animation — just draw your idea and describe how you want it to move! …

Unlike constrained vector animations, our raster frames support dynamic sketch transformations, capturing the expressive freedom of traditional animation. The result is an intuitive system that makes sketch animation as simple as doodling and describing, while maintaining the artistic essence of hand-drawn animation.

BlendBox AI promises fast, interactive compositing

I’m finding the app (which is free to try for a couple of moves, but which quickly runs out of credits) to be pretty wacky, as it continuously regenerates elements & thus struggles with identity preservation. The hero vid looks cool, though:

Incisive points on AI & filmmaking from Ben Affleck

Ignoring the misguided (IMHO) contents of the surrounding tweet, I found these four minutes of commentary to be extremely sharp & well informed:

New Google ReCapture tech enables post-capture camera control

Man, I miss working with these guys & gals…

We present ReCapture, a method for generating new videos with novel camera trajectories from a single user-provided video. Our method allows us to re-generate the source video, with all its existing scene motion, from vastly different angles and with cinematic camera motion.

They note that ReCapture is substantially different from other work: existing methods can control the camera either on images or on generated videos, but not on arbitrary user-provided videos. Check it out:

A love letter to splats

Paul Trillo relentlessly redefines what’s possible in VFX—in this case scanning his back yard to tour a magical tiny world:

Here he gives a peek behind the scenes: 

And here’s the After Effects plugin he used:

Relighting via Midjourney

Check out this impressive use of the new “retexture” feature, which enables image-to-image transformations:

Here’s a bit more on how the new editing features work:

Ideogram Canvas arrives

I’ve become an Ideogram superfan, using it to create imagery daily, so I’m excited to kick the tires on this new interactive tool—especially around its ability to synthesize new text in the style of a visual reference.

You can upload your own images or generate new ones within Canvas, then seamlessly edit, extend, or combine them using industry-leading Magic Fill (inpainting) and Extend (outpainting) tools. Use Magic Fill and Extend to bring your face or brand visuals to Ideogram Canvas and blend them with creative, AI-generated elements. Perfect for graphic design, Ideogram Canvas offers advanced text rendering and precise prompt adherence, allowing you to bring your vision to life through a flexible, iterative process.

Project Perfect Blend promises game-changing compositing in Photoshop

Oh man, for years we wanted to build this feature into Photoshop—years! We tried many times (e.g. I wanted this + scribble selection to be the marquee features in Photoshop Touch back in 2011), but the tech just wasn’t ready. But now, maybe, the magic is real—or at least tantalizingly close!

Being a huge nerd, I wonder about how the tech works, and whether it’s substantially the same as what Magnific has been offering (including via a Photoshop panel) for the last several months. Here’s how I used that on my pooch:

But even if it’s all the same, who cares?

Being useful to people right where they live & work, with zero friction, is tremendous. Generative Fill is a perfect example: similar (if lower quality) inpainting was available from DALL•E for a year+ before we shipped GenFill in Photoshop, but the latter has quietly become an indispensable, game-changing piece of the imaging puzzle for millions of people. I’d love to see compositing improvements go the same way.

The ceiling can’t hold us stuffed animals

As I drove the Micronaxx to preschool back in 2013, Macklemore’s “Can’t Hold Us” hit the radio & the boys flipped out, making their stuffed buddies Leo & Ollie go nuts dancing to the tune. I remember musing with Dave Werner (a fellow dad to young kids) about being able to animate said buddies.

Fast forward a decade+, and now Dave is using Adobe’s recently unveiled Firefly Video model to do what we could only dimly imagine back then:

Time to unearth Leo & get him on stage at last. :->

Flair AI promises brand-consistent video creation

Ever since Google dropped DreamBooth back in 2022, people have been trying (generally without much success) to train generative models that can incorporate the fine details of specific products. Thus far it just hasn’t been possible to meet most brands’ demanding requirements for fidelity.

Now tiny startup Flair AI promises to do just that—and to pair the object definitions with custom styling and even video. Check it out:

Meta AI introduces conversational editing

I was super hyped last year when Meta announced “Emu Edit” tech for selectively editing images using just language:

Now you can try the tech via Meta.ai and in various apps:

In my limited experience so far, it’s cool but highly unpredictable. I’ll test it further, and I’d love to know how it works for you. Meanwhile you can try similar techniques via https://playground.com/:

“Jurassic Park – 1950’s Super Panavision 70”

Chaos reigns!

I have no idea what AI and other tools were used here, but it’d be fun to get a peek behind the curtain. As a commenter notes,

The meandering strings in the soundtrack. The hard studio lighting of the close-ups. The midtone-heavy Technicolor grading. The macro-lens DOF for animation sequences. This is spot-on 50’s film aesthetic, bravo.

[Via Andy Russell]

Flux goes realtime with Krea

And if that headline makes no sense, it probably just means you’re not terminally AI-pilled, and I’m caught flipping a grunt. 😉 Anyway, the tiny but mighty crew at Krea have brought the new Flux text-to-image model (including its ability to spell) to their realtime creation tool:

Behind the scenes: DIY Deadpool

I love seeing how scrappy creators combine tools in new ways, blazing trails that we may come to see as commonplace soon enough. Here Eric Solorio (enigmatic_e) shows how he used Viggle & other tools to create his viral Deadpool animation:

See also some of his luchador moves, plus more on his various feeds: