Monthly Archives: November 2023

Pika Labs “Idea-to-Video” looks stunning

It’s ludicrous to think that these folks formed the company just six months ago, and even more ludicrous to see what the model can already do—from video synthesis, to image animation, to inpainting/outpainting:

Our vision for Pika is to enable everyone to be the director of their own stories and to bring out the creator in each of us. Today, we reached a milestone that brings us closer to our vision. We are thrilled to unveil Pika 1.0, a major product upgrade that includes a new AI model capable of generating and editing videos in diverse styles such as 3D animation, anime, cartoon and cinematic, and a new web experience that makes it easier to use. You can join the waitlist for Pika 1.0 at https://pika.art.

“Emu Edit” enables instructional image editing

This tech—or something much like it—is going to be a very BFD. Imagine simply describing the change you’d like to see in your image—and then seeing it.

[Generative models] still face limitations when it comes to offering precise control. That’s why we’re introducing Emu Edit, a novel approach that aims to streamline various image manipulation tasks and bring enhanced capabilities and precision to image editing.

Emu Edit is capable of free-form editing through instructions, encompassing tasks such as local and global editing, removing and adding a background, color and geometry transformations, detection and segmentation, and more. […]

Emu Edit precisely follows instructions, ensuring that pixels in the input image unrelated to the instructions remain untouched. For instance, when adding the text “Aloha!” to a baseball cap, the cap itself should remain unchanged.

Read more here & here.

And for some conceptually related (but technically distinct) ideas, see previous: Iterative creation with ChatGPT.

NBA goes NeRF

Here’s a great look at how the scrappy team behind Luma.ai has helped enable beautiful volumetric captures of Phoenix Suns players soaring through the air:

Go behind the scenes of the innovative collaboration between Profectum Media and the Phoenix Suns to discover how we overcame technological and creative challenges to produce the first 3D bullet time neural radiance field NeRF effect in a major sports NBA arena video. This involved not just custom-building a 48 GoPro multi-cam volumetric rig but also integrating advanced AI tools from Luma AI to capture athletes in stunning, frozen-in-time 3D visual sequences. This venture is more than just a glimpse behind the scenes – it’s a peek into the evolving world of sports entertainment and the future of spatial capture.

Phat Splats

If you keep hearing about “Gaussian Splatting” & wondering “WTAF,” check out this nice primer from my buddy Bilawal:

There’s also Two Minute Papers, offering a characteristically charming & accessible overview:

Iterative creation with ChatGPT

I’m really digging the experience of (optionally) taking a photo, feeding it into ChatGPT, and then riffing my way toward an interesting visual outcome. Here’s a gallery in which you can see some of the journeys I’ve undertaken recently. A few observations:

  • Image->description->image quality is often pretty hit-or-miss. Even so, it’s such a compelling possibility that I keep wanting to try it (e.g. seeing a leaf on the ground, wanting to try turning it into a stingray).
  • The system attempts to maintain some image properties (e.g. pose, color, style) while varying others (e.g. turning a vehicle in one of my images from a box truck into a tanker while preserving its general orientation, plus specifics such as its three Holstein cows).
  • Rendering of text within images is vastly improved vs. previous models, though it can still go off the rails. It’s striking that one can iteratively improve a particular line of text (e.g. “Make sure that the second line says ‘TRAIN’”).

The Young & The Spiderverse

Man, I’m inspired (and TBH a little jealous) seeing 14-year-old Preston Mutanga create amazing 3D animations, something he’s apparently been doing for fully half his life. I think you’ll enjoy the short talk he gave covering his passions:

The presentation will take the audience on a journey, a journey across the Spider-Verse where a self-taught, young, talented 14-year-old kid used Blender, to create high-quality LEGO animations of movie trailers. Through the use of social media, this young artist’s passion and skill caught the attention of Hollywood producers, leading to a life-changing invitation to animate in a new Hollywood movie.

Hands up for Res Up ⬆️

Speaking of increasing resolution, check out this sneak peek from Adobe MAX:

It’s a video upscaling tool that uses diffusion-based technology and artificial intelligence to convert low-resolution videos to high-resolution videos for applications. Users can directly upscale low-resolution videos to high resolution. They can also zoom-in and crop videos and upscale them to full resolution with high-fidelity visual details and temporal consistency. This is great for those looking to bring new life into older videos or to prevent blurry videos when playing scaled versions on HD screens.

Adventures in Upsampling

Interesting recent finds:

  • Google Zoom Enhance. “Using generative AI, Zoom Enhance intelligently fills in the gaps between pixels and predicts fine details, opening up more possibilities when it comes to framing and flexibility to focus on the most important part of your photo.”
  • Nick St. Pierre writes, “I just upscaled an image in MJ by 4x, then used Topaz Photo AI to upscale that by another 6x. The final image is 682MP and 32000×21333 pixels large.”
  • Here’s a thread of 10 Midjourney upsampling examples, including a direct comparison against Topaz.
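For a sense of scale on Nick’s numbers, here’s a quick back-of-the-envelope check. It assumes (the quote doesn’t say) that both the 4x and the 6x are per-side scale factors, so they multiply to 24x per side, and the pixel count grows with the square of that.

```python
# Back-of-the-envelope check of the upscaling figures quoted above.
# Assumption: "4x" (Midjourney) and "6x" (Topaz Photo AI) are both
# per-side (linear) scale factors, so they multiply.

def megapixels(width: int, height: int) -> float:
    """Return the pixel count of a width x height image, in millions."""
    return width * height / 1_000_000

combined_factor = 4 * 6  # 24x per side under the assumption above

final_w, final_h = 32000, 21333  # final size as quoted
final_mp = megapixels(final_w, final_h)

# Implied starting size if both factors are per-side
start_w = final_w / combined_factor
start_h = final_h / combined_factor

print(f"Final size: {final_mp:.1f} MP")
print(f"Combined linear factor: {combined_factor}x")
print(f"Implied source: about {start_w:.0f} x {start_h:.0f} px")
print(f"Pixel-count increase: {combined_factor ** 2}x")
```

Under that assumption the quoted 682MP checks out (32000 × 21333 is about 682.7 million pixels), and the implied starting image is roughly 1333×889 pixels. If either tool’s factor referred to total pixel count rather than per-side size, the math would change.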