Category Archives: AI/ML

Say it -> Select it: Runway ML promises semantic video segmentation

I find myself recalling something that Twitter co-founder Evan Williams wrote about “value moving up the stack”:

As industries evolve, core infrastructure gets built and commoditized, and differentiation moves up the hierarchy of needs from basic functionality to non-basic functionality, to design, and even to fashion.

For example, there was a time when chief buying concerns included how well a watch might tell time and how durable a pair of jeans was.

Now apps like FaceTune deliver what used to be Photoshop-only levels of power to millions of people, and Runway ML promises to let you simply type words to select & track objects in video, using just a Web browser. 👀

ML + MIDI = trippy facial fun

“Hijacking Brains: The Why I’m Here Story” 😌

As I wrote many years ago, it was the chance to work with alpha geeks that drew me to Adobe:

When I first encountered the LiveMotion team, I heard that engineer Chris Prosser had built himself a car MP3 player (this was a couple of years before the iPod). Evidently he’d disassembled an old Pentium 90, stuck it in his trunk, connected it to the glovebox with some Ethernet cable, added a little LCD track readout, and written a Java Telnet app for synching the machine with his laptop. Okay, I thought, I don’t want to do that, but I’d like to hijack the brains of someone who could.

Now my new teammate Cameron Smith has spent a weekend wiring MIDI hardware to StyleGAN to control facial synthesis & modification:
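For flavor, here’s a minimal sketch of how such a rig might be wired up. It assumes a pretrained StyleGAN generator plus precomputed latent-edit directions (the kind you’d get from something like InterfaceGAN); both are hypothetical stand-ins here, not Cameron’s actual setup:

```python
# Hypothetical sketch: map MIDI knob turns to StyleGAN latent edits.
# Requires: pip install mido python-rtmidi
import numpy as np
import mido

LATENT_DIM = 512
base_w = np.random.randn(LATENT_DIM)        # starting latent code
directions = {                              # hypothetical edit vectors
    1: np.random.randn(LATENT_DIM),         # CC 1 -> e.g. "smile"
    2: np.random.randn(LATENT_DIM),         # CC 2 -> e.g. "age"
}
coeffs = {cc: 0.0 for cc in directions}

def generate(w):
    """Stand-in for a real StyleGAN forward pass (e.g., G.synthesis
    in the official code); here it just returns the latent unchanged."""
    return w

with mido.open_input() as port:             # default MIDI input device
    for msg in port:
        if msg.type == "control_change" and msg.control in directions:
            # Map the knob's 0-127 range onto a [-3, 3] edit strength.
            coeffs[msg.control] = (msg.value / 127.0) * 6.0 - 3.0
            w = base_w + sum(coeffs[cc] * directions[cc] for cc in directions)
            frame = generate(w)             # render the edited face
```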

“Total Relighting” promises to teleport(rait) you into new vistas

This stuff makes my head spin around—and not just because the demo depicts heads spinning around!

You might remember the portrait relighting features that launched on Google Pixel devices last year, leveraging some earlier research. Now a number of my former Google colleagues have created a new method for figuring out how a portrait is lit, then imposing new light sources in order to help it blend into new environments. Check it out:
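At a (very) high level, the pipeline estimates per-pixel surface properties and then re-shades them under the target scene’s lighting. Here’s a crude numpy sketch of that idea, with trivial stand-in estimators where the paper uses trained networks:

```python
# Loose sketch of the relighting idea, not the paper's implementation.
import numpy as np

H = W = 256

def estimate_matte(img):
    """Hypothetical matting net; returns per-pixel foreground alpha."""
    return np.ones((H, W))

def estimate_geometry(img):
    """Hypothetical inverse-rendering nets; returns albedo and normals."""
    albedo = np.full((H, W, 3), 0.5)
    normals = np.dstack([np.zeros((H, W)), np.zeros((H, W)), np.ones((H, W))])
    return albedo, normals

def relight(portrait, light_dir):
    alpha = estimate_matte(portrait)
    albedo, normals = estimate_geometry(portrait)
    # Lambertian re-shading: dot each surface normal with the target
    # environment's dominant light direction.
    shading = np.clip(normals @ light_dir, 0.0, None)
    relit = albedo * shading[..., None]
    return relit * alpha[..., None]          # foreground ready to composite

out = relight(np.zeros((H, W, 3)), np.array([0.3, 0.3, 0.9]))
```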

Interesting, interactive mash-ups powered by AI

Check out how StyleMapGAN (paper, PDF, code) enables combinations of human & animal faces, vehicles, buildings, and more. Unlike simple copy-paste-blend, this technique permits interactive morphing between source & target pixels:

From the authors, a bit about what’s going on here:

Generative adversarial networks (GANs) synthesize realistic images from random latent vectors. Although manipulating the latent vectors controls the synthesized outputs, editing real images with GANs suffers from i) time-consuming optimization for projecting real images to the latent vectors, or ii) inaccurate embedding through an encoder. We propose StyleMapGAN: the intermediate latent space has spatial dimensions, and a spatially variant modulation replaces AdaIN. It makes the embedding through an encoder more accurate than existing optimization-based methods while maintaining the properties of GANs. Experimental results demonstrate that our method significantly outperforms state-of-the-art models in various image manipulation tasks such as local editing and image interpolation. Last but not least, conventional editing methods on GANs are still valid on our StyleMapGAN. Source code is available at https://github.com/naver-ai/StyleMapGAN.
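The key move, if I’m reading it right, is that the style code gets spatial dimensions, so modulation can vary per pixel instead of applying one style vector across the whole feature map the way AdaIN does. A toy numpy sketch of the contrast (shapes and names are mine, not the authors’):

```python
import numpy as np

def adain(x, gamma, beta, eps=1e-5):
    """AdaIN: one (gamma, beta) pair per channel modulates every pixel."""
    mu = x.mean(axis=(1, 2), keepdims=True)            # x: (C, H, W)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (x - mu) / (sigma + eps) + beta[:, None, None]

def spatial_modulation(x, gamma_map, beta_map, eps=1e-5):
    """Spatially variant version: gamma/beta are full (C, H, W) maps,
    so an edit to part of the stylemap only touches those pixels."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return gamma_map * (x - mu) / (sigma + eps) + beta_map

C, H, W = 64, 8, 8
stylemap_a = np.random.randn(C, H, W)
stylemap_b = np.random.randn(C, H, W)
# Local editing amounts to mixing two stylemaps under a spatial mask:
mask = np.zeros((1, H, W))
mask[..., : W // 2] = 1.0                              # left half from image B
mixed = mask * stylemap_b + (1 - mask) * stylemap_a
out = spatial_modulation(np.random.randn(C, H, W), mixed, np.zeros((C, H, W)))
```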

Artbreeder is wild

Artbreeder is a trippy project that lets you “simply keep selecting the most interesting image to discover totally new images. Infinitely new random ‘children’ are made from each image. Artbreeder turns the simple act of exploration into creativity.” Check out interactive remixing:

Here’s an overview of how it works:

Generative Adversarial Networks are the main technology enabling Artbreeder. Artbreeder uses BigGAN and StyleGAN models. There is a minimal open source version available that uses BigGAN.
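In the same spirit, “breeding” can be sketched as blending two parents’ latent vectors and adding a pinch of mutation noise before handing the result to the GAN. A toy version (mine, not Artbreeder’s code):

```python
import numpy as np

LATENT_DIM = 512
rng = np.random.default_rng(0)

def breed(parent_a, parent_b, mutation=0.1):
    mix = rng.uniform(0.0, 1.0)                     # per-child blend ratio
    child = mix * parent_a + (1 - mix) * parent_b   # crossover
    return child + mutation * rng.standard_normal(LATENT_DIM)  # mutation

parents = rng.standard_normal((2, LATENT_DIM))
children = [breed(parents[0], parents[1]) for _ in range(8)]
# Each child latent would then be decoded by a BigGAN/StyleGAN generator.
```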

Using AI to create Disney- & Pixar-style caricatures

I find this emerging space so fascinating. Check out how Toonify.photos (which you can use for free, or at high quality for a very modest fee) can turn one’s image into a cartoon character. It leverages training data based on iconic illustration styles:

I also chuckled at this illustration from the video above, as it endeavors to show how two networks (the “adversaries” in “Generative Adversarial Network”) attempt, respectively, to fool the other with their output & to avoid being fooled. Check out more details in the accompanying article.
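For the curious, that tug-of-war boils down to two loss terms pulling in opposite directions. A bare-bones PyTorch sketch on toy 1-D data (illustrative only, not Toonify’s training code):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # "real" data: N(2, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator: label real as 1, fake as 0 (avoid being fooled).
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: make the discriminator call its fakes real (fool it).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```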

same.energy enables fun, visual search

“Same Energy is a visual search engine. You can use it to find beautiful art, photography, decoration ideas, or anything else.” I recommend simply clicking in & exploring a bit, but you can also see it in action here (the vid isn’t in English, but that doesn’t really matter):

I’m using it to find all kinds of interesting image sets, like this:

As for how it works,

The default feeds available on the home page are algorithmically curated: a seed of 5-20 images is selected by hand, then our system builds the feed by scanning millions of images in our index to find good matches for the seed images. You can create feeds in just the same way: save images to create a collection of seed images, then look at the recommended images.
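That matching step sounds like classic embedding search. A hedged sketch, assuming precomputed image embeddings from some visual model (the model, index size, and dimensions here are my assumptions, not Same Energy’s):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 512
index = rng.standard_normal((100_000, D))       # stand-in image embeddings
index /= np.linalg.norm(index, axis=1, keepdims=True)

seeds = index[rng.choice(len(index), 12)]       # 5-20 hand-picked seeds
query = seeds.mean(axis=0)                      # average the seed embeddings
query /= np.linalg.norm(query)

scores = index @ query                          # cosine similarity to seeds
feed = np.argsort(-scores)[:50]                 # top matches form the feed
```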