Things the internet loves:
Let’s do this:
Elsewhere, I told my son that I finally agree with his strong view that the live-action Lion King (which I haven’t seen) does look pretty effed up. 🙃
My colleagues Jingwan, Jimei, Zhixin, and Eli have devised new tech for re-posing bodies & applying virtual clothing:
Our work enables applications of pose-guided synthesis and virtual try-on. Thanks to spatial modulation, our result preserves the texture details of the source image better than prior work.
Check out some results (below), see the details of how it works, and stay tuned for more.
Hard to believe that it’s been almost seven years since my team shipped Halloweenify face painting at Google, and hard to believe how far things have come since then. For this Halloween you can use GANs to apply & animate all kinds of fun style transfers, like this:
I dunno, but it’s got me feeling kinda Zucked up…
Snap is using deepfakes for scripted micro-storytelling:
The new 10-episode Snap original series “The Me and You Show” taps into Snapchat’s Cameos — a feature that uses a kind of deepfake technology to insert someone’s face into a scene. Using Cameos, the show makes you the lead actor in comedy skits alongside one of your best friends by uploading a couple of selfies. […]
The Cameos feature is based on tech developed by AI Factory, a startup developing image and video recognition, analysis and processing technology that Snap acquired in 2019. […]
According to Snap, more than 44 million Snapchat users engage with Cameos on a weekly basis, and more than 16 million share Cameos with their friends.
I dunno—to my eye the results look like a less charming version of the old JibJab templates that were hot 20 years ago, but I’m 30 years older than the Snapchat core demographic, so what do I know?
These can be made from any still photo; they animate the head while other parts stay static, and backgrounds can’t be replaced. Still, the result below shows how movements and facial expressions performed by a real person are seamlessly added to a still photograph. The human acts as a sort of puppeteer of the still image.
What do you think?
I keep meaning to pour one out for my nearly-dead homie, Photoshop 3D (post to follow, maybe). We launched it back in 2007 thinking that widespread depth capture was right around the corner. But “Being early is the same as being wrong,” as Marc Andreessen says, and we were off by a decade (before iPhones started putting depth maps into images).
Now, though, the world is evolving further, and researchers are enabling apps to perceive depth even in traditional 2D images—no special capture required. Check out what my colleagues have been doing together with university collaborators:
A few months back, I mentioned that my teammates had connected some machine learning models to create StyleCLIP, a way of editing photos using natural language. People have been putting it to interesting, if ethically complicated, use:
Now you can try it out for yourself. Obviously it’s a work in progress, but I’m very interested in hearing what you think of both the idea & what you’re able to create.
And just because my kids love to make fun of my childhood bowl cut, here’s Less-Old Man Nack featuring a similar look, as envisioned by robots:
FaceMix offers a rather cool way to create a face by mixing together up to four individually editable images, which you can upload or select from a set of presets. The 30-second tour:
Here’s a more detailed look into how it works:
On the reasonable chance that you’re interested in my work, you might want to bookmark (or at least watch) this one. Two-Minute Papers shows how NVIDIA’s StyleGAN research (which underlies Photoshop’s Smart Portrait Neural Filter) has been evolving, recently being upgraded with Alias-Free GAN, which very nicely reduces funky artifacts such as a “sticky beard” and “boiling” regions (hair, etc.):
Side note: I continue to find the presenter’s enthusiasm utterly infectious: “Imagine saying that to someone 20 years ago. You would end up in a madhouse!” and “Holy mother of papers!”
Hmm—I’m not sure what to think about this & would welcome your thoughts. Promising to “Give people an idea of your appearance, while still protecting your true identity,” this Anonymizer service will take in your image, then generate multiple faces that vaguely approximate your characteristics:
Here’s what it made for me:
I find the results impressive but a touch eerie, and as I say, I’m not sure how to feel. Is this something you’d find useful (vs., say, just using something other than a photograph as your avatar)?
As I mentioned back in May:
You might remember the portrait relighting features that launched on Google Pixel devices last year, leveraging some earlier research. Now a number of my former Google colleagues have created a new method for figuring out how a portrait is lit, then imposing new light sources in order to help it blend into new environments.
Two-Minute Papers has put together a nice, accessible summary of how it works:
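The new method relies on learned models that aren’t detailed here. For intuition about what “imposing new light sources” means at its very simplest, here’s a sketch of the textbook Lambertian (diffuse) shading model. This is an assumed simplification, not the researchers’ approach: a point’s brightness is its albedo times the cosine between its surface normal and the light direction, clamped at zero, so relighting amounts to recomputing shading with a new light direction.

```python
import math

def normalize(v):
    """Scale a 3-D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def lambert(albedo, normal, light_dir, intensity=1.0):
    """Diffuse shading: albedo * intensity * max(0, n . l) for unit vectors n and l."""
    n, l = normalize(normal), normalize(light_dir)
    return albedo * intensity * max(0.0, sum(a * b for a, b in zip(n, l)))

# Hypothetical surface normal for a point on a cheek that faces partly to the left.
cheek = [-1.0, 0.0, 1.0]
print(round(lambert(0.8, cheek, [-1.0, 0.0, 1.0]), 3))  # lit from the left
print(round(lambert(0.8, cheek, [1.0, 0.0, 0.0]), 3))   # lit from the right
```

The first call prints 0.8 (light hits the point head-on) and the second prints 0.0 (the point faces away from the light). Real faces also have specular highlights, cast shadows, and translucent skin, none of which this toy model captures.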
Heh—I was amused to hear generative apps’ renderings of human faces—often eerie, sometimes upsetting—described as turning people into “rotten fruits.”
This reminded me of a recurring sketch from Conan O’Brien’s early work, which featured literal rotting fruit acting out famous films—e.g. Apocalypse Now, with Francis Ford Coppola sitting there to watch:
No, I don’t know what this has to do with anything—except now I want to try typing “rotting fruit” plus maybe “napalm in the morning” into a generative engine just to see what happens. The horror… the horror!
In the magical, frequently bizarre world of generative adversarial networks, changing one attribute will often accidentally affect other “entangled” ones (e.g. I’ve seen a change of gaze cause people to grow beards!). This new tech promises better isolation of—and thus control over—things like hair style, lighting, skin tone, and more.
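A toy sketch of why entanglement happens, assuming the common framing in which each attribute edit is a step along a direction vector in the GAN’s latent space. The directions and numbers below are invented for illustration, and this isn’t necessarily how the new tech works: if the “gaze” direction partly overlaps the “beard” direction, a gaze edit drags beard along with it. Projecting out the overlap is one simple way to reduce that.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edit(latent, direction, strength):
    """Move a latent code `strength` units along an attribute direction."""
    return [z + strength * d for z, d in zip(latent, direction)]

def orthogonalize(direction, other):
    """Strip out the part of `direction` that points along `other`."""
    scale = dot(direction, other) / dot(other, other)
    return [d - scale * o for d, o in zip(direction, other)]

# Hypothetical 3-D directions (real latent codes have hundreds of dimensions).
gaze = [1.0, 1.0, 0.0]   # overlaps with the beard direction: entangled
beard = [0.0, 1.0, 0.0]
latent = [0.0, 0.0, 0.0]

naive = edit(latent, gaze, 2.0)
clean = edit(latent, orthogonalize(gaze, beard), 2.0)
# Projection onto the beard direction, i.e. how much "beard" each edit added.
print("beard change, naive edit:", dot(naive, beard))
print("beard change, cleaned edit:", dot(clean, beard))
```

This prints 2.0 for the naive edit and 0.0 for the cleaned one. Real attribute directions rarely separate this neatly, which is part of why better disentanglement remains a research problem.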
A bunch of my former Google colleagues, including one I’m now working with again since she’s joined Adobe, have introduced new techniques that promise amazing colorization of old photos.
By characterizing the quirks & limitations of old cameras and film, then creating and manipulating a “digital sibling,” the team is able to achieve some really lifelike results:
These academic videos are often kinda dry, but I promise that this one is pretty intriguing:
Generative artist Glenn Marshall has used CLIP + VQGAN to send Radiohead down a rather Lovecraftian rabbit hole:
Okay, this one is a little “inside baseball,” but I’m glad to see more progress using GANs to transfer visual styles among images. Check it out:
The current state-of-the-art in neural style transfer uses a technique called Adaptive Instance Normalization (AdaIN), which transfers the statistical properties of style features to a content image, and can transfer an infinite number of styles in real time. However, AdaIN is a global operation, and thus local geometric structures in the style image are often ignored during the transfer. We propose Adaptive Convolutions, a generic extension of AdaIN, which allows for the simultaneous transfer of both statistical and structural styles in real time.
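For the curious, here’s what AdaIN’s “transfer the statistical properties” step looks like, as a minimal pure-Python sketch assuming feature maps stored as one flat list of values per channel. Each content channel is normalized to zero mean and unit standard deviation, then rescaled and shifted to match the corresponding style channel. Because those statistics are computed over every spatial position at once, the operation is global, which is exactly the limitation the abstract describes.

```python
import statistics

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization, channel by channel:
    sigma(style) * (content - mu(content)) / sigma(content) + mu(style).
    `eps` guards against division by zero for flat channels."""
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = statistics.fmean(c_chan), statistics.pstdev(c_chan)
        s_mean, s_std = statistics.fmean(s_chan), statistics.pstdev(s_chan)
        out.append([s_std * (v - c_mean) / (c_std + eps) + s_mean for v in c_chan])
    return out

# Toy "feature maps": one channel with four spatial positions each.
content = [[1.0, 2.0, 3.0, 4.0]]
style = [[10.0, 20.0, 30.0, 40.0]]
print([round(v, 2) for v in adain(content, style)[0]])
```

The content values keep their relative ordering but take on the style channel’s mean and spread. Note that nothing about *where* values sit in the style image survives, only their statistics.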
OMG—I’m away from our brick piles & thus can’t yet try this myself, but I can’t wait to take it for a spin. As PetaPixel explains:
If you have a giant pile of LEGO bricks and are in need of ideas on what to build, Brickit is an amazing app that was made just for you. It uses a powerful AI camera to rapidly scan your LEGO bricks and then suggest fun little projects you can build with what you have.
Here’s a short 30-second demo showing how the app works — prepare to have your mind blown:
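Brickit’s recognition and matching aren’t described here, but the “what can I build with what I have” step boils down to checking whether a project’s parts list fits inside your inventory. A minimal sketch, assuming the camera scan has already produced a count per part type (the part names and projects below are made up):

```python
from collections import Counter

def can_build(inventory, parts_needed):
    """True if every required part is available in sufficient quantity.
    A Counter returns 0 for parts you don't have at all."""
    return all(inventory[part] >= qty for part, qty in parts_needed.items())

def suggest(inventory, projects):
    """Return the names of projects buildable from the scanned inventory."""
    return [name for name, parts in projects.items() if can_build(inventory, parts)]

# Hypothetical scan result and project catalog.
scanned = Counter({"2x4 red": 6, "2x2 blue": 3, "1x1 yellow": 10})
projects = {
    "tiny car": Counter({"2x4 red": 2, "2x2 blue": 4}),  # needs one more blue brick
    "flower": Counter({"1x1 yellow": 5, "2x2 blue": 1}),
}
print(suggest(scanned, projects))  # ['flower']
```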
“A nuclear-powered pencil”: that’s how someone recently described Artbreeder, and the phrase comes to mind for NVIDIA Canvas, a new prototype app you can download (provided you have Windows & a beefy GPU) and use to draw in some trippy new ways:
Paint simple shapes and lines with a palette of real world materials, like grass or clouds. Then, in real-time, our revolutionary AI model fills the screen with show-stopping results.
Don’t like what you see? Swap a material, changing snow to grass, and watch as the entire image changes from a winter wonderland to a tropical paradise. The creative possibilities are endless.
Man, I’m not even the first to imagine a tripping-out Content-Aware Phil…
…cue the vemödalen. ¯\_(ツ)_/¯
Anyway, “Large Scale Image Completion via Co-Modulated Generative Adversarial Networks” (and you thought “Content-Aware Fill” was a mouthful), which you can try out right in your browser, promises next-level abilities to fill in gaps by using GANs that understand specific domains like human faces & landscapes.
I’m not sure whether the demo animation does the idea justice, as you might reasonably think “Why would I want to scarify a face & then make a computer fill in the gaps?,” but the underlying idea (that the computer can smartly fill holes based on understanding the real-world structure of a scene) seems super compelling.
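The co-modulated GAN itself is far too big to sketch here, but the final step of a typical hole-filling setup is simple: keep the original pixels wherever they exist and use the generated ones only inside the hole. A minimal sketch, assuming a binary mask (1 = missing) and a made-up stand-in for the generator’s output:

```python
def composite(original, generated, mask):
    """Keep known pixels; take generated pixels only where mask == 1 (the hole)."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]

original = [[10, 20], [30, 40]]
generated = [[99, 98], [97, 96]]  # hypothetical model output for the whole frame
mask = [[0, 1], [0, 0]]           # only the top-right pixel is missing
print(composite(original, generated, mask))  # [[10, 98], [30, 40]]
```

The hard part, of course, is making the generated pixels plausible, which is where a model’s understanding of faces or landscapes comes in.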
I love when tech opens a new portal in time, bringing the past closer & making it more relatable.
Photoshop Neural Filters are insanely cool, but right now adjusting any parameter generally takes several seconds of calculation. To make things more interactive, some of my teammates are collaborating with university researchers on an approach that couples cheap-n’-cheerful quality for interactive preview with nicer-but-slower calculation of final results. This is all a work in progress, and I can’t say if/when these techniques will ship in real products, but I’m very glad to see the progress.
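Exactly how the team pairs the two isn’t described here, so this is only a generic sketch of the interaction pattern, with hypothetical function names and a trivial brightening step standing in for a real neural filter: while a slider is moving, process a downsampled copy for a quick preview; once it’s released, run the full-resolution pass.

```python
def downsample(image, factor):
    """Keep every `factor`-th pixel in each direction (crude but fast)."""
    return [row[::factor] for row in image[::factor]]

def apply_filter(image, strength):
    """Hypothetical stand-in for an expensive neural filter: just brightens pixels."""
    return [[min(255, p + strength) for p in row] for row in image]

def render(image, strength, slider_moving, preview_factor=4):
    if slider_moving:
        # Interactive preview: process far fewer pixels so feedback stays snappy.
        return apply_filter(downsample(image, preview_factor), strength)
    # Final result: full resolution, full cost.
    return apply_filter(image, strength)

image = [[100] * 8 for _ in range(8)]
preview = render(image, 20, slider_moving=True)
final = render(image, 20, slider_moving=False)
print(len(preview), len(preview[0]))  # 2 2
print(len(final), len(final[0]))      # 8 8
```

Downsampling by 4 in each direction cuts the work by roughly 16x here; a real system might instead swap in a smaller, faster model for the preview.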
Watch how this new tech is able to move & blend just parts of an image (e.g. hair) while preserving others:
We propose a novel latent space for image blending which is better at preserving detail and encoding spatial information, and propose a new GAN-embedding algorithm which is able to slightly modify images to conform to a common segmentation mask.
Our novel representation enables the transfer of the visual properties from multiple reference images including specific details such as moles and wrinkles, and because we do image blending in a latent-space we are able to synthesize images that are coherent.
A few weeks ago I mentioned Toonify, an online app that can render your picture in a variety of cartoon styles. Researchers are busily cranking away to improve upon it, and the new AgileGAN promises better results & the ability to train models via just a few inputs:
Our approach provides greater agility in creating high quality and high resolution (1024×1024) portrait stylization models, requiring only a limited number of style exemplars (∼100) and short training time (∼1 hour).
This Adobe Research collaboration with Stanford & Brown Universities aims to make sense of people moving in space, despite having just 2D video as an input:
We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that leverages HuMoR as a motion prior to robustly estimate plausible pose and shape from ambiguous observations. Through extensive evaluations, we demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset, and enables motion reconstruction from multiple input modalities including 3D keypoints and RGB(-D) videos.
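To make “learns a distribution of the change in pose at each step” a bit more concrete, here’s a toy autoregressive rollout. It’s a sketch under loud assumptions: the hand-written `toy_prior` stands in for the learned conditional VAE (it just returns a Gaussian mean and spread for each joint’s change), poses are bare lists of joint angles, and it only samples motion rather than performing HuMoR’s optimization-based estimation from observations.

```python
import random

def toy_prior(pose):
    """Hypothetical stand-in for a learned conditional model: given the current
    pose, return a mean and std for the *change* in each joint angle.
    Here the mean gently pulls every joint back toward 0 (a 'rest' pose)."""
    means = [-0.1 * angle for angle in pose]
    stds = [0.05] * len(pose)
    return means, stds

def rollout(start_pose, steps, prior, rng):
    """Generate a motion by repeatedly sampling a pose change and adding it."""
    poses = [list(start_pose)]
    for _ in range(steps):
        means, stds = prior(poses[-1])
        delta = [rng.gauss(m, s) for m, s in zip(means, stds)]
        poses.append([a + d for a, d in zip(poses[-1], delta)])
    return poses

motion = rollout([1.0, -0.5, 0.3], steps=10, prior=toy_prior, rng=random.Random(0))
for pose in motion[:3]:
    print([round(a, 3) for a in pose])
```

In an estimation setting, a prior like this would instead be used to score how plausible each step of a candidate motion is, so that noisy or occluded observations get pulled toward realistic movement.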
Erik Härkönen recently interned at Adobe, collaborating with several of my teammates on interesting emerging tech. This was all a pleasant surprise to me, as I’d independently stumbled across this fun vid in which he encapsulates some exciting things AI is learning to do with photos:
Generative artist Nathan Shipley has been doing some amazing work with GANs, and he recently collaborated with BMW to use projection mapping to turn a new car into a dynamic work of art:
I’ve long admired the Art Cars series, with a particular soft spot for Jenny Holzer’s masterfully disconcerting PROTECT ME FROM WHAT I WANT:
Here’s a great overview of the project’s decades of heritage, including a dive into how Andy Warhol adorned what may be the most valuable car in the world—painting on it at lightning speed:
Years ago my friend Matthew Richmond (Chopping Block founder, now at Adobe) would speak admiringly of “math-rock kids” who could tinker with code to expand the bounds of the creative world. That phrase came to mind seeing this lovely little exploration from Derrick Schultz:
Here it is in high res:
From LiDAR scanners in millions of pockets to AIs that can now generate 3D from 2D, the magic’s getting deep:
NVIDIA Research is revving up a new deep learning engine that creates 3D object models from standard 2D images — and can bring iconic cars like the Knight Rider’s AI-powered KITT to life — in NVIDIA Omniverse.
A single photo of a car, for example, could be turned into a 3D model that can drive around a virtual scene, complete with realistic headlights, tail lights and blinkers.
I’ve always had a soft spot for incredibly crappy film dubbing—especially this Bill Murray SNL classic that I hadn’t seen in 30 years but remember like it was yesterday…
…and not to mention Police Academy (“Damn you, wanna fight? Fight me!!”):
“The extraction of facial data” — a time-consuming computational process — “runs parallel with the production itself.” The technology strips actors’ faces off, converting their visages into a 3D model, according to Lynes. “This creates millions of 3D models, which the AI uses as reference points,” he says.
“And then, using an existing foreign-language recording of the dialogue, it studies the actor and generates a new 3D model per frame,” he adds. Finally, the imagery is converted back to 2D. Digital effects artists can then manually fix anything that seems off.
I’ve obviously been talking a ton about the crazy-powerful, sometimes eerie StyleGAN2 technology. Here’s a case of generative artist Mario Klingemann wiring visuals to characteristics of music:
Watch it at 1/4 speed if you really want to freak yourself out.
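Klingemann’s exact mapping isn’t described here. One simple, commonly used approach is to measure loudness per video frame (root-mean-square amplitude) and let it set how far the latent code moves on that frame. A sketch under those assumptions, using a synthetic signal and a hypothetical 2-D latent:

```python
import math

def frame_rms(samples, frame_size):
    """Root-mean-square loudness of each consecutive chunk of audio samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def latent_path(start, direction, loudness, gain=1.0):
    """Step the latent along `direction`, moving farther on louder frames."""
    path, z = [], list(start)
    for level in loudness:
        z = [v + gain * level * d for v, d in zip(z, direction)]
        path.append(z)
    return path

# Synthetic audio: a quiet stretch followed by a loud one.
quiet = [0.1 * math.sin(i / 5) for i in range(100)]
loud = [0.9 * math.sin(i / 5) for i in range(100)]
levels = frame_rms(quiet + loud, frame_size=50)
path = latent_path([0.0, 0.0], [1.0, 0.0], levels)
print([round(x, 3) for x in levels])
```

Louder frames produce bigger jumps through latent space, so the imagery lurches on the beat; mapping different frequency bands to different directions is a natural extension.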
Beats-to-visuals gives me an excuse to dig up & reshare Michel Gondry’s brilliant old Chemical Brothers video that associated elements like bridges, posts, and train cars with the various instruments at play:
Back to Mario: he’s also been making weirdly bleak image descriptions using CLIP (the same model we’ve explored using to generate faces via text). I congratulated him on making a robot sound like Werner Herzog. 🙃
I find myself recalling something that Twitter founder Evan Williams wrote about “value moving up the stack”:
As industries evolve, core infrastructure gets built and commoditized, and differentiation moves up the hierarchy of needs from basic functionality to non-basic functionality, to design, and even to fashion.
For example, there was a time when chief buying concerns included how well a watch might tell time and how durable a pair of jeans was.
Now apps like FaceTune deliver what used to be Photoshop-only levels of power to millions of people, and Runway ML promises to let you just type words to select & track objects in video—using just a Web browser. 👀
“Hijacking Brains: The Why I’m Here Story” 😌
As I wrote many years ago, it was the chance to work with alpha geeks that drew me to Adobe:
When I first encountered the LiveMotion team, I heard that engineer Chris Prosser had built himself a car MP3 player (this was a couple of years before the iPod). Evidently he’d disassembled an old Pentium 90, stuck it in his trunk, connected it to the glovebox with some Ethernet cable, added a little LCD track readout, and written a Java Telnet app for synching the machine with his laptop. Okay, I thought, I don’t want to do that, but I’d like to hijack the brains of someone who could.
Now my new teammate Cameron Smith has spent a weekend wiring MIDI hardware to StyleGAN to control facial synthesis & modification:
This stuff makes my head spin around—and not just because the demo depicts heads spinning around!
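Cameron’s actual setup isn’t described here, but the core mapping is easy to sketch. MIDI control-change (CC) messages carry 7-bit values from 0 to 127; this sketch assumes each knob drives the strength of one latent edit direction, and the knob numbers, attribute names, and ±3 range are all hypothetical. Reading messages from real hardware would need a MIDI library, which is left out.

```python
def cc_to_strength(value, lo=-3.0, hi=3.0):
    """Map a 7-bit MIDI CC value (0-127) linearly onto an edit-strength range."""
    if not 0 <= value <= 127:
        raise ValueError("MIDI CC values are 7-bit: 0-127")
    return lo + (hi - lo) * value / 127

# Hypothetical knob assignments: CC number -> attribute being edited.
KNOBS = {21: "smile", 22: "age", 23: "head_yaw"}

def handle_cc(controller, value, strengths):
    """Update the current edit strengths when a mapped knob moves."""
    if controller in KNOBS:
        strengths[KNOBS[controller]] = cc_to_strength(value)
    return strengths

state = {}
handle_cc(21, 127, state)  # smile knob turned all the way up
handle_cc(23, 0, state)    # head-yaw knob turned all the way down
print(state)  # {'smile': 3.0, 'head_yaw': -3.0}
```

Each frame, the renderer would then add `strength * direction` for every active attribute to the base latent code and regenerate the face.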
You might remember the portrait relighting features that launched on Google Pixel devices last year, leveraging some earlier research. Now a number of my former Google colleagues have created a new method for figuring out how a portrait is lit, then imposing new light sources in order to help it blend into new environments. Check it out:
Check out how StyleMapGAN (paper, PDF, code) enables combinations of human & animal faces, vehicles, buildings, and more. Unlike simple copy-paste-blend, this technique permits interactive morphing between source & target pixels:
From the authors, a bit about what’s going on here:
Generative adversarial networks (GANs) synthesize realistic images from random latent vectors. Although manipulating the latent vectors controls the synthesized outputs, editing real images with GANs suffers from i) time-consuming optimization for projecting real images to the latent vectors, ii) or inaccurate embedding through an encoder. We propose StyleMapGAN: the intermediate latent space has spatial dimensions, and a spatially variant modulation replaces AdaIN. It makes the embedding through an encoder more accurate than existing optimization-based methods while maintaining the properties of GANs. Experimental results demonstrate that our method significantly outperforms state-of-the-art models in various image manipulation tasks such as local editing and image interpolation. Last but not least, conventional editing methods on GANs are still valid on our StyleMapGAN. Source code is available at https://github.com/naver-ai/StyleMapGAN.
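The key phrase is “a spatially variant modulation replaces AdaIN.” Here’s a heavily simplified, single-channel sketch of the difference; the function names are hypothetical, and in the real model the style maps are predicted by learned layers. AdaIN-style modulation applies one scale and shift to a whole channel, while a style map supplies a different scale and shift at each position, which is what makes local edits possible: take one image’s map inside a mask and the other’s elsewhere.

```python
def global_modulate(feature, scale, shift):
    """AdaIN-style: one scale and shift for the whole channel."""
    return [[scale * v + shift for v in row] for row in feature]

def spatial_modulate(feature, scale_map, shift_map):
    """Spatially variant: every position gets its own scale and shift."""
    return [
        [s * v + b for v, s, b in zip(frow, srow, brow)]
        for frow, srow, brow in zip(feature, scale_map, shift_map)
    ]

def blend_maps(map_a, map_b, mask):
    """Local-editing sketch: take map_b inside the mask, map_a elsewhere."""
    return [
        [b if m else a for a, b, m in zip(arow, brow, mrow)]
        for arow, brow, mrow in zip(map_a, map_b, mask)
    ]

feature = [[1.0, 1.0], [1.0, 1.0]]
scale_a = [[1.0, 1.0], [1.0, 1.0]]  # style map from image A
scale_b = [[2.0, 2.0], [2.0, 2.0]]  # style map from image B
shift = [[0.0, 0.0], [0.0, 0.0]]
mask = [[0, 1], [0, 0]]             # edit only the top-right region
scale = blend_maps(scale_a, scale_b, mask)
print(spatial_modulate(feature, scale, shift))  # [[1.0, 2.0], [1.0, 1.0]]
print(global_modulate(feature, 2.0, 0.0))       # [[2.0, 2.0], [2.0, 2.0]]
```

A single global scale can only change everything at once, whereas the blended style map changes just the masked region.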
Artbreeder is a trippy project that lets you “simply keep selecting the most interesting image to discover totally new images. Infinitely new random ‘children’ are made from each image. Artbreeder turns the simple act of exploration into creativity.” Check out interactive remixing:
Artbreeder is a nuclear powered pencil.
— Bay Raitt (@bayraitt) September 17, 2019
Here’s an overview of how it works:
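Artbreeder’s internals aren’t spelled out here. One way to picture “children” made from parent images, assuming each image is backed by a GAN latent vector, is a toy genetic step: blend two parents’ latent codes, then add a little random mutation. The vectors and parameters below are made up, and decoding children back into images is left to the (omitted) generator.

```python
import random

def breed(parent_a, parent_b, mix=0.5, mutation=0.1, rng=random):
    """Make a child latent: interpolate between parents, then jitter each value."""
    return [
        (1 - mix) * a + mix * b + rng.gauss(0.0, mutation)
        for a, b in zip(parent_a, parent_b)
    ]

rng = random.Random(42)
parent_a = [0.0, 1.0, -1.0, 0.5]
parent_b = [1.0, 0.0, 1.0, -0.5]
# Each child gets a random blend ratio plus its own random mutations.
children = [breed(parent_a, parent_b, mix=rng.random(), rng=rng) for _ in range(3)]
for child in children:
    print([round(v, 2) for v in child])
```

Picking a favorite child and breeding again repeats the loop, which is roughly the “keep selecting the most interesting image” experience.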
I find this emerging space so fascinating. Check out how Toonify.photos (which you can use for free, or at high quality for a very modest fee) can turn one’s image into a cartoon character. It leverages training data based on iconic illustration styles:
I also chuckled at this illustration from the video above, as it endeavors to show how two networks (the “adversaries” in “Generative Adversarial Network”) attempt, respectively, to fool the other with output & to avoid being fooled. Check out more details in the accompanying article.
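For reference, the original GAN formulation (Goodfellow et al., 2014) writes that contest as a two-player minimax game. The discriminator D tries to assign high probability to real images x and low probability to generated images G(z), while the generator G tries to push D(G(z)) up. Many later GANs train with modified versions of this loss, so treat it as the textbook starting point rather than the exact objective behind the video’s model.

```latex
\min_{G}\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

Here p_data is the distribution of real training images and p_z is the random noise distribution the generator samples from.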
You say “work with an AI to make art, purely from a text prompt,” I hear “monkey with a revolver”—which reminds me, I should plug “monkey with a revolver” into this system to see what comes out. Meanwhile, example weirdness:
“Same Energy is a visual search engine. You can use it to find beautiful art, photography, decoration ideas, or anything else.” I recommend simply clicking it & exploring a bit, but you can also see a bit here (vid not in English, but that doesn’t really matter):
I’m using it to find all kinds of interesting image sets, like this:
As for how it works:
The default feeds available on the home page are algorithmically curated: a seed of 5-20 images are selected by hand, then our system builds the feed by scanning millions of images in our index to find good matches for the seed images. You can create feeds in just the same way: save images to create a collection of seed images, then look at the recommended images.
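Beyond “good matches for the seed images,” the matching isn’t spelled out. A minimal sketch, assuming every image already has an embedding vector from some vision model (the 3-D vectors and file names below are made up): score each indexed image by its best cosine similarity to any seed, then rank. Scanning millions of images in practice would call for an approximate nearest-neighbor index rather than this brute-force loop.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

def build_feed(seeds, index, k=2):
    """Rank indexed images by their closest match to any seed image."""
    scored = [
        (max(cosine(vec, s) for s in seeds), name)
        for name, vec in index.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# Hypothetical embeddings for two hand-picked seeds and a tiny index.
seeds = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
index = {
    "foggy_forest.jpg": [0.95, 0.05, 0.0],
    "neon_city.jpg": [0.0, 1.0, 0.0],
    "misty_lake.jpg": [0.8, 0.2, 0.1],
    "desert.jpg": [0.0, 0.0, 1.0],
}
print(build_feed(seeds, index))
```

Taking the max over seeds lets a feed cover several distinct looks at once; averaging the seeds into one vector would instead favor images that blend them all.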
On Monday I mentioned my new team’s mind-blowing work to enable image synthesis through typing, and I noted that it builds on NVIDIA’s StyleGAN research. If you’re interested in the latter, check out this two-minute demo of how it enables amazing interactive generation of stylized imagery:
This new project called StyleGAN2, developed by NVIDIA Research, and presented at CVPR 2020, uses transfer learning to produce seemingly infinite numbers of portraits in an infinite variety of painting styles. The work builds on the team’s previously published StyleGAN project. Learn more here.
Welcome to the rabbit hole, my friends. 🙃
What if instead of pushing pixels, you could simply tell your tools what changes you’d like to see? (Cue Kramer voice: “Why don’t you just tell me the movie…??”) This new StyleCLIP technology (code) builds on NVIDIA’s StyleGAN foundation to enable image editing simply by typing descriptive terms. Check out some examples (“before” images in the top row; “after” below along with editing terms).
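Here’s one simplified way to frame a text-driven edit like this, offered as an assumption rather than the project’s exact formulation: search for a latent code w whose generated image G(w) lands close to the text prompt t in CLIP’s shared image-text embedding space, while a penalty weighted by λ keeps w near the code w_s of the source image so the edit doesn’t wander off into a different person.

```latex
w^{*} = \arg\min_{w}\; D_{\mathrm{CLIP}}\big(G(w),\, t\big) + \lambda\,\lVert w - w_{s} \rVert_{2}^{2},
\qquad
D_{\mathrm{CLIP}}(I, t) = 1 - \cos\big(E_{\mathrm{img}}(I),\, E_{\mathrm{text}}(t)\big)
```

E_img and E_text stand for CLIP’s image and text encoders. A larger λ yields subtler, more faithful edits; a smaller one lets the text pull harder. Real systems add further terms (e.g. to preserve identity) and faster variants that avoid per-image optimization, so treat this as a conceptual sketch.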
Here’s a demo of editing human & animal faces, and even of transforming cars:
By no means have I been around here long enough (five whole days!) to grok everything that’s going on here, but as I come up to speed, I’ll do my best to share what I’m learning. Meanwhile I’d love to hear your thoughts on how we might thoughtfully bring techniques like this to life.