Category Archives: AI/ML

Pika Labs “Idea-to-Video” looks stunning

It’s ludicrous to think that these folks formed the company just six months ago, and even more ludicrous to see what the model can already do—from video synthesis, to image animation, to inpainting/outpainting:

Our vision for Pika is to enable everyone to be the director of their own stories and to bring out the creator in each of us. Today, we reached a milestone that brings us closer to our vision. We are thrilled to unveil Pika 1.0, a major product upgrade that includes a new AI model capable of generating and editing videos in diverse styles such as 3D animation, anime, cartoon and cinematic, and a new web experience that makes it easier to use. You can join the waitlist for Pika 1.0 at https://pika.art.

“Emu Edit” enables instructional image editing

This tech—or something much like it—is going to be a very BFD. Imagine simply describing the change you’d like to see in your image—and then seeing it.

[Generative models] still face limitations when it comes to offering precise control. That’s why we’re introducing Emu Edit, a novel approach that aims to streamline various image manipulation tasks and bring enhanced capabilities and precision to image editing.

Emu Edit is capable of free-form editing through instructions, encompassing tasks such as local and global editing, removing and adding a background, color and geometry transformations, detection and segmentation, and more. […]

Emu Edit precisely follows instructions, ensuring that pixels in the input image unrelated to the instructions remain untouched. For instance, when adding the text “Aloha!” to a baseball cap, the cap itself should remain unchanged.

Read more here & here.

And for some conceptually related (but technically distinct) ideas, see previous: Iterative creation with ChatGPT.

NBA goes NeRF

Here’s a great look at how the scrappy team behind Luma.ai has helped enable beautiful volumetric captures of Phoenix Suns players soaring through the air:

Go behind the scenes of the innovative collaboration between Profectum Media and the Phoenix Suns to discover how we overcame technological and creative challenges to produce the first 3D bullet-time neural radiance field (NeRF) effect in a major NBA arena video. This involved not just custom-building a 48-GoPro multi-cam volumetric rig but also integrating advanced AI tools from Luma AI to capture athletes in stunning, frozen-in-time 3D visual sequences. This venture is more than just a glimpse behind the scenes – it’s a peek into the evolving world of sports entertainment and the future of spatial capture.

Phat Splats

If you keep hearing about “Gaussian Splatting” & wondering “WTAF,” check out this nice primer from my buddy Bilawal:

There’s also Two-Minute Papers, offering a characteristically charming & accessible overview:

Iterative creation with ChatGPT

I’m really digging the experience of (optionally) taking a photo, feeding it into ChatGPT, and then riffing my way towards an interesting visual outcome. Here’s a gallery in which you can see some of the journeys I’ve undertaken recently.

  • Image→description→image quality is often pretty hit-or-miss. Even so, it’s such a compelling possibility that I keep wanting to try it (e.g. seeing a leaf on the ground, then wanting to turn it into a stingray); see the rough sketch after this list.
  • The system attempts to maintain various image properties (e.g. pose, color, style) while varying others (e.g. turning the attached vehicle from a box truck into a tanker while maintaining its general orientation plus specifics like featuring three Holstein cows).
  • Overall text creation is vastly improved vs. previous models, though it can still derail. It’s striking that one can iteratively improve a particular line of text (e.g. “Make sure that the second line says ‘TRAIN’”).
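
For the curious, here’s a minimal, hypothetical sketch of that image→description→image loop using the OpenAI Python SDK. The model names, file name, and prompts below are placeholders of my own, and ChatGPT’s built-in flow almost certainly differs under the hood; this just shows the general shape of the idea.

```python
# Hypothetical sketch of an image -> description -> image loop.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set;
# "leaf.jpg", the model names, and the prompts are illustrative placeholders.
import base64
from openai import OpenAI

client = OpenAI()

# 1) Ask a vision-capable chat model to describe the source photo.
with open("leaf.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

description = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this photo in detail: shape, color, texture, lighting."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
).choices[0].message.content

# 2) Riff on that description and hand it to an image-generation model.
prompt = f"Reimagine the subject as a stingray gliding over sand. Keep these scene details: {description}"
result = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024")
print(result.data[0].url)  # iterate by tweaking the prompt and regenerating
```

Each pass through step 2 with an edited prompt is the “riffing” described above; the hit-or-miss quality comes from the fact that only the description is carried forward, not the pixels themselves.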

Hands up for Res Up ⬆️

Speaking of increasing resolution, check out this sneak peek from Adobe MAX:

It’s a video upscaling tool that uses diffusion-based technology and artificial intelligence to convert low-resolution videos to high-resolution videos. Users can directly upscale low-resolution videos to high resolution. They can also zoom in, crop videos, and upscale them to full resolution with high-fidelity visual details and temporal consistency. This is great for those looking to breathe new life into older videos, or to prevent blurriness when playing scaled-up versions on HD screens.

Adventures in Upsampling

Interesting recent finds:

  • Google Zoom Enhance. “Using generative AI, Zoom Enhance intelligently fills in the gaps between pixels and predicts fine details, opening up more possibilities when it comes to framing and flexibility to focus on the most important part of your photo.”
  • Nick St. Pierre writes, “I just upscaled an image in MJ by 4x, then used Topaz Photo AI to upscale that by another 6x. The final image is 682MP and 32000×21333 pixels large.”
  • Here’s a thread of 10 Midjourney upsampling examples, including a direct comparison against Topaz.

Demos: Using Generative AI in Illustrator

If you’ve been sleeping on Text to Vector, check out this handful of quick how-to vids that’ll get you up to speed:

Reflect on this: Project See Through burns through glare

Marc Levoy (professor emeritus at Stanford) was instrumental in delivering the revolutionary Night Sight mode on Pixel 3 phones—and by extension on all the phones that quickly copied their published techniques. After leaving Google for Adobe, he’s been leading a research team that’s just shown off the reflection-zapping Project See Through:

Today, it’s difficult or impossible to manually remove reflections. Project See Through simplifies the process of cleaning up reflections by using artificial intelligence. Reflections are automatically removed, and optionally saved as separate images for editing purposes. This gives users more control over when and how reflections appear in their photos.

What’s even better than Generative Fill? GenFill that moves.

Back in the day, I dreaded demoing Photoshop ahead of the After Effects team: we’d do something cool, and they’d make that cool thing move. I hear echoes of that in Project Fast Fill—generative fill for video.

Project Fast Fill harnesses Generative Fill, powered by Adobe Firefly, to bring generative AI technology into video editing applications. This makes it easy for users to use simple text prompts to perform texture replacement in videos, even for complex surfaces and varying light conditions. Users can use this tool to edit an object on a single frame and that edit will automatically propagate into the rest of the video’s frames, saving video editors a significant amount of texture editing time.

Check it out:

Adobe Project Poseable: 3D humans guiding image generation

Roughly 1,000 years ago (i.e. this past April!), I gave an early sneak peek at the 3D-to-image work we’ve been doing around Firefly. Now at MAX, my teammate Yi Zhou has demonstrated some additional ways we could put the core tech to work—by adding posable humans to the scene.

Project Poseable makes it easy for anyone to quickly design 3D prototypes and storyboards in minutes with generative AI.

Instead of having to spend time editing the details of a scene — the background, different angles and poses of individual characters, or the way the character interacts with surrounding objects in the scene — users can tap into AI-based character posing models and use image generation models to easily render 3D character scenes.

Check it out:

Generative Match: It’s Pablos all the way down…

Here’s a fun little tutorial from my teammate Kris on using reference images to style your prompt (in this case, her pet turtle Pablo). And meanwhile, here’s a little gallery of good style reference images (courtesy of my fellow PM Lee) that you’re welcome to download and use in your creations.

Introducing Generative Match in Firefly

Hey everyone—I’m just back from Adobe MAX, and hopefully my blog is back from some WordPress database shenanigans that have kept me from posting.

I’m not sure what the site will handle right now, so I’ll start by simply pointing to a great 30-second tour of my favorite new feature in Firefly, Generative Match. It enables you to upload your own image as a style reference, or to pick one that Adobe provides, and mix it together with your prompt and other parameters.

You can then optionally share the resulting recipe (via “Copy link” in the Share menu that appears over results), complete with the image ingredient; try this example. This goes well beyond what one can do with just copying/pasting a prompt, and as we introduce more multimodal inputs (3D object, sketching, etc.), it’ll become all the more powerful.

All images below were generated with the following prompt: a studio portrait of a fluffy llama, hyperrealistic, shot on a white cyclorama + various style images:

Google promises interactive creation of dynamic, looping videos

My old teammates Richard Tucker, Noah Snavely, and co. have been busy. Check out this quick video & interactive demo:

80lv notes,

According to the team, they trained the prior using a dataset of motion trajectories extracted from real-life video sequences that featured natural, oscillating motions like those seen in trees, flowers, candles, and wind-blown clothing. These trajectories can then be applied to convert static images into smooth-looping dynamic videos, slow-motion clips, or interactive experiences that allow users to interact with the elements within the image.
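
The learned motion prior in the actual research is far more sophisticated, but the basic mechanic it builds on (displace each pixel along a small oscillating trajectory, resample every frame, and let the phase wrap around so the result loops) can be shown with a toy sketch. This is purely illustrative, not the team’s method, and the input file and displacement field below are made up:

```python
# Toy illustration only: warp a still RGB image with a small, seamlessly
# looping sinusoidal displacement field. Not the researchers' learned prior.
import numpy as np
import imageio.v2 as imageio
from scipy.ndimage import map_coordinates

img = imageio.imread("still.jpg").astype(np.float32)  # hypothetical RGB input
h, w = img.shape[:2]
yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

num_frames, amplitude = 60, 3.0  # frames per loop, max displacement in pixels
frames = []
for t in range(num_frames):
    phase = 2 * np.pi * t / num_frames  # wraps around, so the loop is seamless
    # Smooth, spatially varying oscillation -- a gentle "sway"
    dx = amplitude * np.sin(phase) * np.sin(yy / 40.0)
    dy = 0.5 * amplitude * np.cos(phase) * np.sin(xx / 60.0)
    warped = np.stack(
        [map_coordinates(img[..., c], [yy + dy, xx + dx], order=1, mode="reflect")
         for c in range(img.shape[2])],
        axis=-1,
    )
    frames.append(np.clip(warped, 0, 255).astype(np.uint8))

imageio.mimsave("loop.gif", frames)  # frame timing left at the library default
```

As described above, the research instead predicts per-pixel trajectories learned from real video, which is what makes trees, candles, and clothing move plausibly rather than just wobble.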

“Sky Dachshunds!” The future of creativity?

Here are four minutes that I promise you won’t regret spending as Nathan Shipley demonstrates DALL•E 3 working inside ChatGPT to build up an entire visual world:

I mean, seriously, the demo runs through creating:

  • Ideas
  • Initial visuals
  • Logos
  • Apparel featuring the logos
  • Game art
  • Box copy
  • Games visualized in multiple styles
  • 3D action figures
  • and more.

Insane. Also charming: its extremely human inability to reliably spell “Dachshund!”

Firefly summary on The Verge

In case you missed any or all of last week’s news, here’s a quick recap:

Firefly-powered workflows that have so far been limited to the beta versions of Adobe’s apps — like Illustrator’s vector recoloring, Express text-to-image effects, and Photoshop’s Generative Fill tools — are now generally available to most users (though there are some regional restrictions in countries with strict AI laws like China).

Adobe is also launching a standalone Firefly web app that will allow users to explore some of its generative capabilities without subscribing to specific Adobe Creative Suite applications. Adobe Express Premium and the Firefly web app will be included as part of a paid Creative Cloud subscription plan.

Specifically around credits:

To help manage the compute demand (and the costs associated with generative AI), Adobe is also introducing a new credit-based system that users can “cash in” to access the fastest Firefly-powered workflows. The Firefly web app, Express Premium, and Creative Cloud paid plans will include a monthly allocation of Generative Credits starting today, with all-app Creative Cloud subscribers receiving 1,000 credits per month.

Users can still generate Firefly content if they exceed their credit limit, though the experience will be slower. Free plans for supported apps will also include a credit allocation (subject to the app), but this is a hard limit and will require customers to purchase additional credits if they’re used up before the monthly reset. Customers can buy additional Firefly Generative Credit subscription packs starting at $4.99.

How Adobe is compensating Stock creators for their contributions to Firefly

None of this AI magic would be possible without beautiful source materials from creative people, and in a new blog post and FAQ, the Adobe Stock team provides some new info:

All eligible Adobe Stock contributors with photos, vectors or illustrations in the standard and Premium collection, whose content was used to train the first commercial Firefly model will receive a Firefly bonus. This initial bonus, which will be different for each contributor, is based on the all-time total number of approved images submitted to Adobe Stock that were used for Firefly training, and the number of licenses that those images generated in the 12-month period from June 3, 2022, to June 2, 2023. The bonus is planned to pay out once a year and is currently weighted towards the number of licenses issued for an image, which we consider a useful proxy for the demand and usefulness of those images. The next Firefly Bonus is planned for 2024 for new content used for training Firefly.

They’ve also provided info on what’s permissible around submitting AI-generated content:

With Adobe Firefly now commercially available, Firefly-generated works that meet our generative AI submission guidelines will now be eligible for submission to Adobe Stock. Given the proliferation of generative AI in tools like Photoshop, and many more tools and cameras to come, we anticipate that assets in the future will contain some number of generated pixels and we want to set up Adobe Stock for the future while protecting artists. We are increasing our moderation capabilities and systems to be more effective at preventing the use of creators’ names as prompts with a focus on protecting creators’ IP. Contributors who submit content that infringes or violates the IP rights of other creators will be removed from Adobe Stock.

Adobe, AI, and the FAIR act

From Dana Rao, Adobe’s General Counsel & Chief Trust Officer:

Adobe has proposed that Congress establish a new Federal Anti-Impersonation Right (the “FAIR” Act) to address this type of economic harm. Such a law would provide a right of action to an artist against those that are intentionally and commercially impersonating their work or likeness through AI tools. This protection would provide a new mechanism for artists to protect their livelihood from people misusing this new technology, without having to rely solely on laws around copyright and fair use. In this law, it’s simple: intentional impersonation using AI tools for commercial gain isn’t fair.

This is really tricky territory, as we seek to find a balance between enabling creative use of tools & protection of artists. I encourage you to read the whole post, and I’d love to hear your thoughts.

“The AI-Powered Tools Supercharging Your Imagination”

I’m so pleased & even proud (having at least offered my encouragement over the years) to see my buddy Bilawal spreading his wings and sharing the good word about AI-powered creativity.

Check out his quick thoughts on “Channel-surfing realities layered on top of the real world,” “3D screenshots for the real world,” and more:

Favorite quote 😉:

Firefly: Making a lo-fi animation with Adobe Express

Check out this quick tutorial from Kris Kashtanova:

Firefly site gets faster, adds dark mode support & more

Good stuff just shipped on firefly.adobe.com:

  • New menu options enable sending images from the Text to Image module to Adobe Express.
  • The UI now supports Danish, Dutch, Finnish, Italian, Korean, Norwegian, Swedish, and Chinese. Go to your profile and select preferences to change the UI language.
  • New fonts are available for Korean, Chinese (Traditional), and Chinese (Simplified).
  • Dark mode is here! Go to your profile and select preferences to change the mode.
  • A licensing and indemnification workflow is supported for entitled users.
  • Mobile bug fixes include significant performance improvements.
  • You can now access Firefly from the Web section of CC Desktop.

You may need to perform a hard refresh in your browser (Cmd/Ctrl + Shift + R) to see the changes.

If anything looks amiss, or if there’s more you’d like to see changed, please let us know!

GenFill + old photos = 🥰

Speaking of using Generative Fill to build up areas with missing detail, check out this 30-second demo of old photo restoration:

And though it’s not presently available in Photoshop, check out this use of ControlNet to revive an old family photo:

“ControlNet did a good job rejuvenating a stained blurry 70 year old photo of my 90 year old grandparents,” posted by u/prean625 in r/StableDiffusion.

“Where the Fireflies Fly”

I had a ball chatting with members of the Firefly community, including our new evangelist Kris Kashtanova & O.G. designer/evangelist Rufus Deuchler. It was a really energetic & wide-ranging conversation, and if you’d like to check it out, here ya go:

Photoshop introduces Generative Expand

It’s here (in your beta copy of Photoshop, same as Generative Fill), and it works pretty much exactly as I think you’d expect: drag out crop handles, then optionally specify what you want placed into the expanded region.

In addition:

Today, we’re excited to announce that Firefly-powered features in Photoshop (beta) will now support text prompts in 100+ languages — enabling users around the world to bring their creative vision to life with text prompts in the language they prefer.

AI images -> video: ridonkulous

It’s 2023, and you can make all of this with your GD telephone. And just as amazingly, a year or two from now, we’ll look back on feeling this way & find it quaint.