Monthly Archives: December 2024

New AI-powered upscalers arrive

Check out the latest from Topaz:


Alternatively, you can run InvSR via Gradio:
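If you're curious what such a demo looks like under the hood, here's a minimal Gradio wrapper sketch (purely illustrative: the upscale() function below is a hypothetical stand-in, and the actual InvSR repo ships its own demo app):

```python
# Minimal Gradio wrapper sketch (illustrative only). upscale() is a
# hypothetical stand-in for InvSR's real diffusion-based inference.
import gradio as gr
from PIL import Image

def upscale(image: Image.Image, scale: int = 4) -> Image.Image:
    # Placeholder: a real implementation would load the InvSR model
    # and run super-resolution here. This just resizes naively.
    return image.resize((image.width * scale, image.height * scale))

demo = gr.Interface(
    fn=upscale,
    inputs=[gr.Image(type="pil"), gr.Slider(2, 8, value=4, step=2, label="Scale")],
    outputs=gr.Image(type="pil"),
    title="InvSR-style upscaler (sketch)",
)

demo.launch()  # serves a local web UI, typically at http://127.0.0.1:7860
```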

Strolling through the latent space in Runway

I’ve long wanted—and advocated for building—this kind of flexible, spatial way to compose & blend ideas. Here’s to new ideas for using new tools.

A rather incredible demo of Pika Scene Ingredients

Director Matan Cohen-Grumi shows off the radical acceleration in VFX-heavy storytelling that’s possible through emerging tools—including Pika’s new Scene Ingredients:

Google introduces “Whisk,” a fun image remixer

Check out this fun little toy:

Instead of generating images with long, detailed text prompts, Whisk lets you prompt with images. Simply drag in images, and start creating.

Whisk lets you input images for the subject, one for the scene and another image for the style. Then, you can remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.

The blog post gives a bit more of a peek behind the scenes & sets some expectations:

Since Whisk extracts only a few key characteristics from your image, it might generate images that differ from your expectations. For example, the generated subject might have a different height, weight, hairstyle or skin tone. We understand these features may be crucial for your project and Whisk may miss the mark, so we let you view and edit the underlying prompts at any time.

In our early testing with artists and creatives, people have been describing Whisk as a new type of creative tool — not a traditional image editor. We built it for rapid visual exploration, not pixel-perfect edits. It’s about exploring ideas in new and creative ways, allowing you to work through dozens of options and download the ones you love.
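Reading between the lines, the pipeline seems to be: distill each input image into a short text description, merge those into one editable prompt, then generate. Here’s a rough sketch of that idea (hypothetical function names, based only on the description above; not Google’s actual code):

```python
# Rough sketch of a Whisk-style flow (hypothetical; based only on the
# blog post's description that images are distilled into a few key
# characteristics, exposed as editable prompts before generation).

def caption(image_path: str, role: str) -> str:
    # Stand-in for a vision model that extracts a few key
    # characteristics of the image for the given role.
    return f"<{role} characteristics of {image_path}>"

def build_prompt(subject_img: str, scene_img: str, style_img: str) -> str:
    # Merge the per-image captions into one editable prompt.
    return ", ".join([
        caption(subject_img, "subject"),
        caption(scene_img, "scene"),
        caption(style_img, "style"),
    ])

prompt = build_prompt("dog.jpg", "forest.jpg", "enamel-pin.jpg")
print(prompt)  # the user can view/edit this before generating an image
```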

And yes, uploading a 19th-century dog illustration to generate a plushie dancing an Irish jig is definitely the most JNack way to squander precious work time, er, do vital market research. 🙂

The cool generative 3D hits keep coming

Just a taste of the torrent that blows past daily on The Former Bird App:

  • Rodin 3D: “Rodin 3D AI can create stunning, high-quality 3D models from just text or image inputs.”
  • Trellis 3D: “Iterative prompting/mesh editing. You can now prompt ‘remove X, add Y, Move Z, etc.’… Allows decoding to different output formats: Radiance Fields, 3D Gaussians, and meshes.”
  • Blender GPT: “Generating 3D assets has never been easier. Here’s me putting together an entire 3D scene in just over a minute.”

Google demos amazing image editing done purely through voice

This might be the world’s lowest-key demo of what promises to be truly game-changing technology!

I’ve tried a number of other attempts at unlocking this capability (e.g. Meta.ai (see previous), Playground.com, and what Adobe sneak-peeked at the Firefly launch in early 2023), but so far I’ve found them all more unpredictable & frustrating than useful. Could Gemini now have turned the corner? Only hands-on testing will tell, and access isn’t yet broadly available!

Microsoft opens 13 new AI + Design roles

If you or folks you know might be a good fit for one or more of these roles, please check ’em out & pass along info. Here’s some context from design director Mike Davidson.

————

These positions are United States only, Redmond-preferred, but we’ll also consider the Bay Area and other locations:

These positions are specifically in our lovely Mountain View office:

Shedding new light with LumiNet

Diffusion models are ushering in what feels like a golden(-hour) age in relighting (see previous). Among the latest offerings is LumiNet:

I’ve shipped my first feature at Microsoft!

What if your design tool could understand the meaning & importance of words, then help you style them accordingly?

I’m delighted to say that for what I believe is the first time ever, that’s now possible. For the last 40 years of design software, apps have of course provided all kinds of fonts, styles, and tools for manual typesetting. What they’ve lacked is an understanding of what words actually mean, and consequently of how they should be styled in order to map visual emphasis to semantic importance.

In Microsoft Designer, you can now create a new text object, then apply hierarchical styling (primary, secondary, tertiary) based on AI analysis of word importance:

I’d love to hear what you think. You can go to designer.microsoft.com, create a new document, and add some text. Note: The feature hasn’t yet been rolled out to 100% of users, so it may not yet be available to you—but even in that case it’d be great to hear your thoughts on Designer in general.

This feature came about in response to noticing that text-to-image models are not only learning to spell well (check out some examples I’ve gathered on Pinterest), but can also set text with varied size, position, and styling that’s appropriate to the importance of each word. Check out some of my Ideogram creations (which you can click on & remix using the included prompts):

These results are of course incredible (imagine seeing any of this even three years ago!), but they’re just flat images, not editable text. Our new feature, by contrast, leverages semantic understanding and applies it to normal text objects.

What we’ve shipped now is just the absolute tip of the iceberg: to start we’re simply applying preset values based on word hierarchy, but you can readily imagine richer layouts, smart adaptive styling, and much more. Stay tuned—and let us know what you’d like to see!
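To make that concrete, here’s a toy sketch of the core idea (purely illustrative, not Designer’s actual implementation): score each word’s importance, bucket words into tiers, and map each tier to preset type styles. In the shipping feature, an AI model supplies the importance analysis; here the scores are hard-coded.

```python
# Toy sketch of hierarchy-based text styling (illustrative only;
# not Microsoft Designer's actual implementation). An AI model would
# supply the importance scores; here they're hard-coded.

PRESETS = {
    "primary":   {"size": 64, "weight": "bold"},
    "secondary": {"size": 36, "weight": "semibold"},
    "tertiary":  {"size": 20, "weight": "regular"},
}

def tier_for(score: float) -> str:
    # Bucket an importance score in [0, 1] into a hierarchy tier.
    if score >= 0.8:
        return "primary"
    if score >= 0.5:
        return "secondary"
    return "tertiary"

def style_text(words_with_scores):
    # Map each (word, score) pair to a styled text run.
    return [
        {"text": word, **PRESETS[tier_for(score)]}
        for word, score in words_with_scores
    ]

# Example: "HUGE Sale this weekend only"
scored = [("HUGE", 0.9), ("Sale", 0.85), ("this", 0.2),
          ("weekend", 0.6), ("only", 0.55)]
for run in style_text(scored):
    print(run)
```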

Kling AI promises virtual try-ons

Accurately rendering clothing on humans, and especially estimating body dimensions to enable proper fit (and thus reduce costly returns), has remained a seductive yet stubbornly difficult problem. I’ve written previously about challenges I observed at Google, plus possible steps forward.

Now Kling is promising to use generative video to pair real people & real outfits for convincing visualization (but not fit estimation). Check it out: