Category Archives: AI/ML

VFX: Karin Fong talks AI, title design, and more

March 24, 2024AI/ML, Designjnack

Last year I posted about the Imaginary Forces’ beautiful, eerie title sequence for Amazon’s Jack Ryan series, and now School of Motion has sat down for an in-depth discussion with creative director Karin Fong. They talk about a wide range of topics, including AI & its possible impacts towards the 1:09 mark.

Here’s a look behind the scenes of the Jack Ryan sequence:

Stability introduces Image Services APIs

March 23, 2024AI/MLjnack

Given just the latest news, the company’s name sounds ironic, but I love seeing them offer capabilities that we previewed in the Firefly teaser video now more than a year ago. (Here’s hoping Adobe announces some progress on that front at Adobe Summit this coming week.)

Magnific style transfer is amazing

March 20, 2024AI/ML, Illustrationjnack

It’s amazing to see what two people (?!) are able to do. Check out this video & the linked thread, as well as the tool itself.

IT’S FINALLY HERE!

Magnific Style Transfer!

Transform any image, controlling the amount of style transferred and the structural integrity Infinite use cases! 3d, video games, interior design, for fun…

Details, 10 tutorials & HUGE surprise pic.twitter.com/mtR3tVwKmR

— Javi Lopez (@javilopen) March 18, 2024

I’m gonna have a ball going down this rabbit hole, especially for type:

OMG, I’m gonna get completely addicted to @Magnific_AI stylization for typography, aren’t I? pic.twitter.com/6KzppIBTxE

— John Nack (@jnack) March 20, 2024

Amazing: Realtime AI rendering of Photoshop

March 12, 2024AI/ML, Illustrationjnack

I cannot tell you how deeply I hope that the Photoshop team is paying attention to developments like this…

My Photoshop is more fun than yours :-p With a bit of help from Krea ai.

It’s a crazy feeling to see brushstrokes transformed like this in realtime.. And the feeling of control is magnitudes better than with text prompts.#ai #art pic.twitter.com/Rd8zSxGfqD

— Martin Nebelong (@MartinNebelong) March 12, 2024

Fun little AI->3D->AR experiments with Vision Pro

March 6, 20243D, AI/ML, AR/VRjnack

I love watching people connect the emerging creative dots, right in front of our eyes:

This workflow is really fun! Create any 3D object you can imagine in Apple Vision Pro, FAST!

Midjourney (or other image gen) -> TripoSR (modded) – Free USDZ Converter

More info in the thread pic.twitter.com/UsvsFkk3bK

— Blaine Brown  (@blizaine) March 6, 2024

So, @StabilityAI has this new experimental imageTo3D model, and I just painted a moon buggy in SageBrush, dropped it into their Huggingface space, converted it in Reality Converter, and air dropped it onto the moon – all on #AppleVisionPro pic.twitter.com/pj3TTcy5zt

— Gregory Wieber (@dreamwieber) March 7, 2024

AI Mortal Kombat

March 5, 2024AI/MLjnack

Heh—these are obviously silly but well done, and they speak to the creative importance of being specific—i.e. representing particular famous faces. I sometimes note that a joke about a singer & a football player is one thing, whereas a joke about Taylor Swift & Travis Kelce is a whole other thing, all due to it being specific. Thus, for an AI toolmaker, knowing exactly where to draw the line (e.g. disallowing celebrity likenesses) isn’t always so clear.

Generative AI + Human creativity

pic.twitter.com/m8DzIRNDaC

— Linus ●ᴗ● Ekenstam (@LinusEkenstam) March 3, 2024

So… what am I actually doing at Microsoft?

March 1, 2024AI/ML, DALL•Ejnack

It’s a great question, and I think it’s really thoughtful that the day before I joined, the company was generous enough to run a Superb Owl—er, Super Bowl—commercial, just to help me explain the mission to my parents. 😀

But seriously, this ad provides a brief peek into the world of how Copilot can already generate beautiful, interesting things based on your needs—and that’s a core part of the mission I’ve come here to tackle.

A few salient screenshots:

Ideogram promises state-of-the-art text generation

February 28, 2024AI/ML, Ideogram, Typographyjnack

Founded by ex-Google Imagen engineers, Ideogram has just launched version 1.0 widely. It’s said to offer new levels of fidelity in the traditionally challenging domain of type rendering:

Introducing Ideogram 1.0: the most advanced text-to-image model, now available on https://t.co/Xtv2rRbQXI!

This offers state-of-the-art text rendering, unprecedented photorealism, exceptional prompt adherence, and a new feature called Magic Prompt to help with prompting. pic.twitter.com/VOjjulOAJU

— Ideogram (@ideogram_ai) February 28, 2024

Historically, AI-generated text within images has been inaccurate. Ideogram 1.0 addresses this with reliable text rendering capabilities, making it possible to effortlessly create personalized messages, memes, posters, T-shirt designs, birthday cards, logos and more. Our systematic evaluation shows that Ideogram 1.0 is the state-of-the-art in the accuracy of rendered text, reducing error rates by almost 2x compared to existing models.

Holy cow, I work at Microsoft!

February 26, 2024AI/ML, DALL•E, Shit That Actually Mattersjnack

Most folks’ first thought: Wait, whaaaaat?!

Second thought: Actually… that makes sense!

So, it’s true: After nearly three great years back at Adobe, I’ve moved to just the third place I’ve worked since the Clinton Administration: Microsoft!

I’ve signed on with a great group of folks to bring generative imaging magic to as many people as possible, leveraging the power of DALL•E, ChatGPT, Copilot, and other emerging tech to help make fun, beautiful, meaningful things. And yes, they have a very good sense of humor about Clippy, so go ahead and get those jokes out now. :->

It really is a small world: The beautiful new campus (see below) is just two blocks from my old Google office (where I reported to the same VP who’s now in charge of my new group), which itself is just down the road from the original Adobe HQ; see map. (Maybe I should get out more!)

And it’s a small world in a much more meaningful sense: I remain in a very rare & fortunate spot, getting to help guide brilliant engineers’ efforts in service of human creativity, all during what feels like one of the most significant inflection points in decades. I’m filled with gratitude, curiosity, and a strong sense of responsibility to make the most of this moment.

Thank you to my amazing Adobe colleagues for your hard & inspiring work, and especially for chance to build Firefly over the last year. It’s just getting started, and there’s so much we can do together.

Thank you to my new team for opening this door for us. And thank you to the friends & colleagues reading these words. I’ll continue to rely on your thoughtful, passionate perspectives as we navigate these opportunities together.

Let’s do this!

Fun papercraft-styled video

February 21, 2024AI/MLjnack

My friend Nathan Shipley has been deeply exploring AnimateDiff for the last several months, and he’s just collaborated with the always entertaining Karen X. Cheng to make this little papercraft-styled video:

While we’re all waiting for access to Sora…

Here’s our test using open source tools. You can get a decent level of creative control with AnimateDiff

Collab with @CitizenPlain
Music @Artlist_io pic.twitter.com/jNoWzzZDK7

— Karen X. Cheng (@karenxcheng) February 21, 2024

“Neither Artificial nor Intelligent: Artists Working with Algorithms”

February 20, 2024AI/MLjnack

Just in case you’ll be around San Jose this Friday, check out this panel discussion featuring our old Photoshop designer Julie Meridian & other artists discussing their relationship with AI:

Panel discussion: Friday, February 23rd 7pm–9pm. Free admission

Featuring Artists: Julie Meridian, James Morgan, and Steve Cooley
Moderator: Cherri Lakey

KALEID Gallery is proud to host this panel with three talented artists who are using various AI tools in their artistic practice while navigating all the ethical and creative dilemmas that arise with it. With all the controversy around AI collaborative / generated art, we’re looking forward to hearing from these avant-garde artists that are exploring the possibilities of a positive outcome for artists and creatives in this as-of-yet undefined new territory.

“Boximator” enables guided image->video

February 16, 2024AI/MLjnack

Check out this research from ByteDance, the makers of TikTok (where it could well be deployed), which competes with tools like Runway’s Motion Brush:

Boximator: TikTok enters the AI game!

Image-to-video by drawing constraints, motion paths and a prompt.

This the most impressive thing I’ve seen in months. We are facing a paradigm shift, and those who fail to see it will be swept away by the tsunami.

LINK + INFO pic.twitter.com/w9X0nCVBvb

— Javi Lopez (@javilopen) February 13, 2024

Check out Sora, OpenAI’s eye-popping video model

February 15, 2024AI/MLjnack

Hot on the heels of Lumiere from Google…

…here comes Sora from OpenAI:

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf

— OpenAI (@OpenAI) February 15, 2024

My only question: How did they not call it SORR•E? :-p

But seriously, as always…

OpenAI, Meta, & Microsoft promote AI transparency

February 12, 2024AI/MLjnack

Good progress across the board:

OpenAI is adding new watermarks to DALL-E 3
- “The company says watermarks from C2PA will appear in images generated on the ChatGPT website and the API for the DALL-E 3 model. Mobile users will get the watermarks by February 12th. They’ll include both an invisible metadata component and a visible CR symbol, which will appear in the top left corner of each image.”
Meta Will Label AI Images Across Facebook, Instagram, & Threads
- “Meta will employ various techniques to differentiate AI-generated images from other images. These include visible markers, invisible watermarks, and metadata embedded in the image files… Additionally, Meta is implementing new policies requiring users to disclose when media is generated by artificial intelligence, with consequences for failing to comply.”
Building trust with content credentials in Microsoft Designer
- “When you create a design in Designer you can also decide if you’d like to include basic, trustworthy facts about the origin of the design or the digital content you’ve used in the design with the file.”

Firefly image creation & Lightroom come to Apple Vision Pro

February 11, 2024Adobe Firefly, AR/VR, Lightroomjnack

Not having a spare $3500 burning a hole in my pocket, I’ve yet to take this for a spin myself, but I’m happy to see it. Per the Verge:

The interface of the Firefly visionOS app should be familiar to anyone who’s already used the web-based version of the tool — users just need to enter a text description within the prompt box at the bottom and hit “generate.” This will then spit out four different images that can be dragged out of the main app window and placed around the home like virtual posters or prints. […]

Meanwhile, we also now have a better look at the native Adobe Lightroom photo editing app that was mentioned back when the Apple Vision Pro was announced last June. The visionOS Lightroom experience is similar to that of the iPad version, with a cleaner, simplified interface that should be easier to navigate with hand gestures than the more feature-laden desktop software.

Come share your recipes via the Firefly site

February 11, 2024Adobe Firefly, AI/MLjnack

I’m delighted to say that firefly.adobe.com now supports a live stream of community-created generative recipes. You can share your own simply by creating images via the Text to Image module, then clicking the share button. I’m especially pleased that if you use Generative Match to choose a stylization guide image, that image will be included in the recipe for anyone to use.

Check out my chat with Wharton

February 6, 2024AI/MLjnack

I had a chance to sit down for an interesting & wide-ranging chat with folks from the Wharton Tech Club:

Tune into the latest episode of the Wharton Tech Toks podcast! Leon Zhang and Stephanie Kim chat with John Nack, Principal Product Manager at Adobe with 20+ years of PM experience across Adobe and Google, about GenAI for creators, AI ethics, and more. He also reflects on his career journey. This episode is great if you’re recruiting for tech, PM, or Adobe.

Listen now on Apple Podcasts or Spotify.

As always I’d love to know what you think.

View this post on Instagram

A post shared by Wharton Tech Club (@whartontech)

Making today’s AI interfaces “look completely absurd”

February 4, 2024AI/ML, User Interfacejnack

Time is a flat circle…

Daring Fireball’s Mac 40th anniversary post contained a couple of quotes that made me think about the current state of interaction with AI tools, particularly around imaging. First, there’s this line from Steven Levy’s review of the original Mac:

[W]hat you might expect to see is some sort of opaque code, called a “prompt,” consisting of phosphorescent green or white letters on a murky background.

Think about how revolutionarily different & better (DOS-head haters’ gripes notwithstanding) this was.

What you see with Macintosh is the Finder. On a pleasant, light background, little pictures called “icons” appear, representing choices available to you.

And then there’s this kicker:

“When you show Mac to an absolute novice,” says Chris Espinosa, the twenty-two-year-old head of publications for the Mac team, “he assumes that’s the way all computers work. That’s our highest achievement. We’ve made almost every computer that’s ever been made look completely absurd.”

I don’t know quite what will make today’s prompt-heavy approach to generation feel equivalently quaint, but think how far we’ve come in less than two years since DALL•E’s public debut—from swapping long, arcane codes to having more conversational, iterative creation flows (esp. via ChatGPT) and creating through direct, realtime UIs like those offered via Krea & Leonardo. Throw in a dash of spatial computing, perhaps via “glasses that look like glasses,” and who knows where we’ll be!

But it sure as heck won’t mainly be knowing “some sort of opaque code, called a ‘prompt.'”

My panel discussion at the AI User Conference

January 31, 2024Adobe Firefly, AI/MLjnack

Thanks to Jackson Beaman & crew for putting together a great event yesterday in SF. I joined him, KD Deshpande (founder of Simplified), and Sofiia Shvets (founder of Let’s Enhance & Claid.ai) for a 20-minute panel discussion (which starts at 3:32:03 or so, in case the embedded version doesn’t jump you to the proper spot) about creating production-ready imagery using AI. Enjoy, and please let me know if you have any comments or questions!

The Founding Fathers talk AI art

January 29, 2024AI/ML, Idle Philosophizingjnack

Well, not exactly—but T-Paine’s words about how we value things still resonate today:

We humans are fairly good at pricing effort (notably in dollars paid per hour worked), but we struggle much more with pricing value. Cue the possibly apocryphal story about Picasso asking $10,000 for a drawing he sketched in a matter of seconds, but the ability to create which had taken him a lifetime.

A couple of related thoughts:

My artist friend is a former Olympic athlete who talks about how people bond through shared struggle, particularly in athletics. For him, someone using AI-powered tools is similar to a guy showing up at the gym with a forklift, using it to move a bunch of weight, and then wanting to bond afterwards with the actual weightlifters.
I see ostensible thought leaders crowing about the importance of “taste,” but I wonder how they think that taste is or will be developed in the absence of effort.
As was said of—and by?—Steve Jobs, “The journey is the reward.”

[Via Louis DeScioli]

After Effects + Midjourney + Runway = Harry Potter magic

January 26, 2024After Effects, AI/MLjnack

It’s bonkers what one person can now create—bonkers!

I edited out ziplines to make a Harry Potter flying video, added something special at the end
byu/moviemaker887 inAfterEffects

I took a video of a guy zip lining in full Harry Potter costume and edited out the zip lines to make it look like he was flying. I mainly used Content Aware Fill and the free Redgiant/Maxon script 3D Plane Stamp to achieve this.

For the surprise bit at the end, I used Midjourney and Runway’s Motion Brush to generate and animate the clothing.

Trapcode Particular was used for the rain in the final shot.

I also did a full sky replacement in each shot and used assets from ProductionCrate for the lighting and magic wand blast.

[Via Victoria Nece]

Krea upgrades its realtime generation

January 25, 2024AI/ML, Illustrationjnack

I had the pleasure of hanging out with these crazy-fast-moving guys last week, and I remain amazed at the speed of their shipping velocity. Check out the latest updates to their realtime canvas:

big upgrades to quality!

announcing Portrait, Concept, CGI, and Cartoon.

try them for free in KREA real-time (link below). pic.twitter.com/iLqyWHT1Vn

— KREA AI (@krea_ai) January 25, 2024

Check out how trailblazing artist Martin Nebelong is putting it to use:

Speaking about control.. another step in the right direction!

Testing out the latest Krea ai updates.. crazy stuff

Combined with some Magnific ai magic.#ai #art pic.twitter.com/9Zz4JY4W15

— Martin Nebelong (@MartinNebelong) January 26, 2024

Google introduces Lumiere for video generation & editing

January 24, 2024AI/MLjnack

Man, not a day goes by without the arrival of some new & mind-blowing magic—not a day!

We introduce Lumiere — a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion — a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution — an approach that inherently makes global temporal consistency difficult to achieve. […]

We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.

Content credentials are coming to DALL•E

January 23, 2024AI/ML, DALL•Ejnack

From its first launch, Adobe Firefly has included support for content credentials, providing more transparency around the origin of generated images, and I’m very pleased to see Open AI moving in the same direction:

Early this year, we will implement the Coalition for Content Provenance and Authenticity’s digital credentials—an approach that encodes details about the content’s provenance using cryptography—for images generated by DALL·E 3.

We are also experimenting with a provenance classifier, a new tool for detecting images generated by DALL·E. Our internal testing has shown promising early results, even where images have been subject to common types of modifications. We plan to soon make it available to our first group of testers—including journalists, platforms, and researchers—for feedback.

Tutorial: Firefly + Character Animator

January 19, 2024Adobe Firefly, AI/ML, Illustrationjnack

Helping discover Dave Werner & bring him into Adobe remains one of my favorite accomplishments at the company. He continues to do great work in designing characters as well as the tools that can bring them to life. Watch how he combines Firefly with Adobe Character Animator to create & animate a stylish tiger:

Adobe Firefly’s text to image feature lets you generate imaginative characters and assets with AI. But what if you want to turn them into animated characters with performance capture and control over elements like arm movements, pupils, talking, and more? In this tutorial, we’ll walk through the process of taking a static Adobe Firefly character and turning it into an animated puppet using Adobe Photoshop or Illustrator plus Character Animator.

0:00 Intro
0:57 Generating artwork
1:47 Photoshop Basic Prep
6:19 Character Animator Basic Rigging
8:07 Eyes
14:02 Mouth
19:48 Arms & Physics Prep
26:05 Arms & Physics Rigging
32:02 Using Adobe Illustrator
34:59 Outro

“How Adobe is managing the AI copyright dilemma, with general counsel Dana Rao”

January 16, 2024AI/ML, Shit That Actually Mattersjnack

Honestly, if you asked, “Hey, wanna spend an hour+ listening to current and former intellectual property attorneys talking about EU antitrust regulation, ethical data sourcing, and digital provenance,” I might say, “Ehmm, I’m good!”—but Nilay Patel & Dana Rao make it work.

I found the conversation surprisingly engrossing & fast-moving, and I was really happy to hear Dana (with whom I’ve gotten to work some regarding AI ethics) share thoughtful insights into how the company forms its perspectives & works to put its values into practice. I think you’ll enjoy it—perhaps more than you’d expect!

Adobe’s hiring a prototyper to explore generative AI

January 10, 2024AI/MLjnack

We’re only just beginning to discover the experiential possibilities around generative creation, so I’m excited to see this rare gig open up:

You will build new and innovative user interactions and interfaces geared towards our customers unique needs, test and refine those interfaces in collaboration with academic research, user researchers, designers, artists and product teams.

Check out the listing for the full details.

Adobe Firefly named “Product of the Year”

January 2, 2024Adobe Fireflyjnack

Nice props from The Futurum Group:

Here is why: Adobe Firefly is the most commercially successful generative AI product ever launched. Since it was introduced in March in beta and made generally available in June, at last count in October, Firefly users have generated more than 3 billion images. Adobe says Firefly has attracted a significant number of new Adobe users, making it hard to imagine that Firefly is not aiding Adobe’s bottom line.

AI Holiday Leftovers, Vol. 3

December 30, 2023AI/MLjnack

Fun with famous IP:
- Game of Thrones Mario Kart Edition
- Midjourney happily replicates the Simpsons
- Batman, Star Wars, and 300 with text
- Star Wars action figures
- Biden in space, Trump with AR, etc.
Vectors: StarVector: Generating Scalable Vector Graphics Code from Images
How to train a custom Stable Diffusion model to generate consistent characters via LensGo.ai, which lets you train 3 custom models for free every month.
Insane 128x zoom-in on AI-generated meat.

AI Holiday Leftovers, Vol. 2

December 28, 2023AI/ML, Typographyjnack

3D:
- Paint Anything 3D with Lighting-Less Texture Diffusion Models: “Paint3D is a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs.”
- “Google just revealed an ABSOLUTE depth estimation model. As opposed to recent depth models (Marigold, PatchFusion) which aim for maximum details, DMD aims to estimate the ABSOLUTE depth (in meters) within the image.”
Typography:
- Retro-futuristic alphabet rendered with Midjourney V6: “Just swapped out the letter and kept everything else the same. Prompt: Letter “A”, cyberpunk style, metal, retro-futuristic, star wars, intrinsic details, plain black background. Just change the letter only. Not all renders are perfect, some I had to do a few times to get a good match. Try this strategy for any type of cool alphabet!”
- As many others have noted, Midjourney is now good at type. Find more here.

AI Holiday Leftovers, Vol. 1

December 26, 2023AI/ML, Try-onjnack

Dig in, friends. 🙂

Drawing/painting:
- Using a simple kids’ drawing tablet to create art: “I used @Vizcom_ai to transform the initial sketch. This tool has gotten soo good by now. I then used @LeonardoAi_’s image to image to enhance the initial image a bit, and then used their new motion feature to make it move. I also used @Magnific_AI to add additional details to a few of the images and Decohere AI’s video feature.”
- Latte art: “Photoshop paint sent to @freepik’s live canvas. The first few seconds of the video are real-time to show you how responsive it is. The music was made with @suno_ai_. Animation with Runways Gen-2.”

Photo editing:
- Google Photos gets a generative upgrade: “Magic Eraser now uses gen AI to fill in detail when users remove unwanted objects from photos. Google Research worked on the MaskGIT generative image transformer for inpainting, and improved segmentation to include shadows and objects attached to people.”
- Clothing/try-on:
  - PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns: “We propose a novel virtual try-on from unconstrained designs (ucVTON) task to enable photorealistic synthesis of personalized composite clothing on input human images.”
  - AnyDoor is “a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.”
- SDXL Auto FaceSwap enables to create new images using the face of a source image (example attached).

AI: Tons of recent rad things

December 20, 20233D, AI/MLjnack

Realtime:
- Oh look, I’m George Clooney! Kinda. You can be, too. FAL AI promises “AI inference faster than you can type.”
- “100ms image generation at 1024×1024. Announcing Segmind-Vega and Segmind-VegaRT, the fastest and smallest, open source models for image generation at the highest resolution.”
- Krea has announced their open beta, “free for everyone.”
- How incredible would it be to have realtime generative brushes like this?
- Drawing to Video, made using Vizcom -> Leonardo -> Pika.
3D generation:
- ByteDance has released ImageDream (image to 3D)
- SceneWiz3D offers “A new approach to create high-fidelity 3D scenes from text and 3D object control”
- Image -> depth -> geometry using Marigold + Blender
3D for fashion, sculpting, and more:
- This is what Adobe Substance & a notional 3D mode of Firefly Text-to-Image should feel like.
- Outfit Anyone + Animate Anyone = virtual try on + movement.
- Sculpting/rendering via Adobe Substance 3D Modeler + Dreams + Unbound + Krea.
AnimateDiff v3 was just released.
Instagram has enabled image generation inside chat (pretty “meh,” in my experience so far), and in stories creation, “It allows you to replace a background of an image into whatever AI generated image you’d like.”
“Did you know that you can train an AI Art model and get paid every time someone uses it? That’s Generaitiv’s Model Royalties System for you.”

How-to: Combining Photoshop + ComfyUI

December 19, 2023AI/MLjnack

It’s a little nerdy even for my blood, but some of my teammates swear by these techniques that enable connecting Photoshop to a hosted instance of Stable Diffusion, enabling one to guide the process via a Photoshop doc and/or custom-trained styles:

“I Draw Better Than AI!”

December 18, 2023AI/MLjnack

Hah—I can dig this finger-rich pin from Pictoplasma.

AI image generation is getting crazy fast

December 11, 2023AI/MLjnack

Cats-per-second (CPS) FTW! 😀

Gemini is bonkers

December 6, 2023AI/MLjnack

I mean, seriously, what even is all this?? I can’t explain; just please watch.

0:00 Intro
0:19 Multimodal Dialogue
1:32 Multilinguality 2:04
Game Creation 2:31
Visual Puzzles 3:17
Making Connections
3:39 Image & Text Generation
4:06 Logic & Spatial Reasoning
4:55 Translating Visuals
5:27 Cultural Understanding

Baby, You Can Drive My Bricks

December 6, 2023AI/ML, DALL•Ejnack

I’ve had way too much fun creating custom Lego sets based on friends’ & family’s rides, so to help others do it, I’ve made my first custom GPT, “Baby You Can Drive My Bricks.” Take it for a spin & let me know what you create!

Pika Labs “Idea-to-Video” looks stunning

November 28, 2023AI/MLjnack

It’s ludicrous to think that these folks formed the company just six months ago, and even more ludicrous to see what the model can already do—from video synthesis, to image animation, to inpainting/outpainting:

Our vision for Pika is to enable everyone to be the director of their own stories and to bring out the creator in each of us. Today, we reached a milestone that brings us closer to our vision. We are thrilled to unveil Pika 1.0, a major product upgrade that includes a new AI model capable of generating and editing videos in diverse styles such as 3D animation, anime, cartoon and cinematic, and a new web experience that makes it easier to use. You can join the waitlist for Pika 1.0 at https://pika.art.

“Emu Edit” enables instructional image editing

November 28, 2023AI/MLjnack

This tech—or something much like it—is going to be a very BFD. Imagine simply describing the change you’d like to see in your image—and then seeing it.

[Generative models] still face limitations when it comes to offering precise control. That’s why we’re introducing Emu Edit, a novel approach that aims to streamline various image manipulation tasks and bring enhanced capabilities and precision to image editing.

Emu Edit is capable of free-form editing through instructions, encompassing tasks such as local and global editing, removing and adding a background, color and geometry transformations, detection and segmentation, and more. […]

Emu Edit precisely follows instructions, ensuring that pixels in the input image unrelated to the instructions remain untouched. For instance, when adding the text “Aloha!” to a baseball cap, the cap itself should remain unchanged.

NBA goes NeRF

November 25, 20233D, AI/ML, NeRFjnack

Here’s a great look at how the scrappy team behind Luma.ai has helped enable beautiful volumetric captures of Phoenix Suns players soaring through the air:

Go behind the scenes of the innovative collaboration between Profectum Media and the Phoenix Suns to discover how we overcame technological and creative challenges to produce the first 3D bullet time neural radiance field NeRF effect in a major sports NBA arena video. This involved not just custom-building a 48 GoPro multi-cam volumetric rig but also integrating advanced AI tools from Luma AI to capture athletes in stunning, frozen-in-time 3D visual sequences. This venture is more than just a glimpse behind the scenes – it’s a peek into the evolving world of sports entertainment and the future of spatial capture.

Phat Splats

November 17, 20233D, AI/MLjnack

If you keep hearing about “Gaussian Splatting” & wondering “WTAF,” check out this nice primer from my buddy Bilawal:

There’s also Two-Minute Papers, offering a characteristically charming & accessible overview:

GenAI demos from Russell Brown

November 16, 2023Adobe Firefly, AI/MLjnack

It’s always great to learn from the master—especially when he’s making “spaghetti western” literal!

Iterative creation with ChatGPT

November 15, 2023AI/MLjnack

I’m really digging the experience of (optionally) taking a photo, feeding it into ChatGPT, and then riffing my way towards an interesting visual outcome. Here’s a gallery in which you can see some of the journeys I’ve undertaken recently.

Image->description->image quality is often pretty hit-or-miss. Even so, it’s such a compelling possibility that I keep wanting to try it (e.g. seeing a leaf on the ground, wanting to try turning it into a stingray).
The system attempts to maintain various image properties (e.g. pose, color, style) while varying others (e.g. turning the attached vehicle from a box truck to a tanker while maintaining its general orientation plus specifics like featuring three Holstein cows).
Overall text creation is vastly improved vs. previous models, though it can still derail. It’s striking that one can iteratively improve a particular line of text (e.g. “Make sure that the second line says ‘TRAIN’“).

GenFill vs. eternal dog-pant mysteries

November 10, 2023AI/ML, Generative Filljnack

Hah! This is my kind of ridiculous Adobe social content. 🙂 Happy Friday.

View this post on Instagram

A post shared by Adobe (@adobe)

Hands up for Res Up ⬆️

November 6, 2023Adobe MAX, AI/MLjnack

Speaking of increasing resolution, check out this sneak peek from Adobe MAX:

It’s a video upscaling tool that uses diffusion-based technology and artificial intelligence to convert low-resolution videos to high-resolution videos for applications. Users can directly upscale low-resolution videos to high resolution. They can also zoom-in and crop videos and upscale them to full resolution with high-fidelity visual details and temporal consistency. This is great for those looking to bring new life into older videos or to prevent blurry videos when playing scaled versions on HD screens.

Adventures in Upsampling

November 2, 2023AI/ML, Enormousnessjnack

Interesting recent finds:

Google Zoom Enhance. “Using generative AI, Zoom Enhance intelligently fills in the gaps between pixels and predicts fine details, opening up more possibilities when it comes to framing and flexibility to focus on the most important part of your photo.”
Nick St. Pierre writes, “I just upscaled an image in MJ by 4x, then used Topaz Photo AI to upscale that by another 6x. The final image is 682MP and 32000×21333 pixels large.”
Here’s a thread of 10 Midjourney upsampling examples, including a direct comparison against Topaz.