What if Content-Aware Fill started hallucinating?

Man, I’m not even the first to imagine a tripping-out Content-Aware Phil…

…cue the vemödalen. ¯\_(ツ)_/¯

Anyway, “Large Scale Image Completion via Co-Modulated Generative Adversarial Networks” (and you thought “Content-Aware Fill” was a mouthful), which you can try out right in your browser, promises next-level abilities to fill in gaps by using GANs that understand specific domains like human faces & landscapes.

I’m not sure whether the demo animation does the idea justice, as you might reasonably think “Why would I want to scarify a face & then make a computer fill in the gaps?”, but the underlying idea (that the computer can smartly fill holes based on understanding the real-world structure of a scene) seems super compelling.
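
If you’re curious what the plumbing looks like at its simplest: the model gets the photo plus a mask of the missing region, invents plausible pixels for the hole, and the result is composited back over the untouched pixels. Here’s a toy numpy sketch of that compositing step (the `fake_comodgan_generator` stand-in is mine, not the paper’s actual model):

```python
import numpy as np

def fake_comodgan_generator(image, mask):
    """Placeholder for a pretrained co-modulated GAN generator.
    Here we just fill the hole with the mean color of the visible
    pixels so the sketch runs without any model weights."""
    visible = image[mask == 0]
    fill = visible.mean(axis=0) if visible.size else np.zeros(3)
    return np.broadcast_to(fill, image.shape).copy()

def complete_image(image, mask, generator=fake_comodgan_generator):
    """Keep known pixels, let the generator invent the masked region."""
    generated = generator(image, mask)
    mask3 = mask[..., None]          # broadcast mask over RGB channels
    return image * (1 - mask3) + generated * mask3

# Toy example: a 64x64 RGB image with a square hole punched in the middle.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
hole = np.zeros((64, 64), dtype=np.uint8)
hole[24:40, 24:40] = 1
result = complete_image(img, hole)
print(result.shape)  # (64, 64, 3)
```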

Lego introduces Adidas shelltoes

Oh my God.

LEGO has officially announced the new LEGO adidas Originals Superstar (10282) which will be available starting on July 1. The shoe has 731 pieces and will retail for $79.99. In the ongoing collaboration with adidas, LEGO has recreated the iconic Superstar sneaker in brick form. Instead of the regular LEGO packaging, the set will actually come in a shoebox for authenticity and even the laces on it are real.

Design: The “Supersonic Booze Carrier”

I’ve always said that when—not if—I die in a fiery crash alongside Moffett Field, it’ll be because I was rubbernecking at some cool plane or other (e.g. the immense Antonov An-124), and you’ll remember this and say, “Well, he did at least call his shot.”

Suffice it to say I’m a huge plane nerd with a special soft spot for exotic (to me) ex-Soviet aircraft. I therefore especially enjoyed this revealing look into the Tu-22, whose alcohol-based air conditioning system made it a huge hit with aircrews (that is, when it wasn’t killing them via things like its downward-firing ejection seats!). Even if planes aren’t your jam, I think you’ll find the segment on how the alcohol became currency really interesting.

Chuck Close compares golf & creativity

I had a long & interesting talk this week with Erik Natzke, whose multi-disciplinary art (ranging from code to textiles) has inspired me for years. As we were talking through the paths by which one can find a creative solution, he shared this quote from painter Chuck Close:

Chuck Close: I thought that using a palette was like shooting an arrow directly at a bull’s-eye. You hope that you make the right decision out of context. But when you shoot it at the bull’s eye, you hit what you were aiming at. And I thought, as a sports metaphor, golf was a much more interesting way to think about it.

If you think about golf, it’s the only sport—and it’s a little iffy if it’s a sport, although Tiger made it into a sport—in which you move from general to specific in an ideal number of correcting moves. The first stroke is just a leap of faith, you hit it out there; you hope you’re on the fairway. Second one corrects that, the third one corrects that. By the third or fourth you hope that you’re on the green. And at one or two putts, you place that ball in a very specific three-and-a-half inch diameter circle, which you couldn’t even see from the tee. How did you do it? You found it moving through the landscape, making mid-course corrections.

I thought, “This is exactly how I paint.” I tee off in the wrong direction to make it more interesting, now I’ve got to correct like crazy, then I’ve got to correct again. What’s it need? I need some of that. And then four or five or six strokes, I hopefully have found the color world that I want. Then I can sort of celebrate, you know, put that in the scorecard, and move on to the next one.

Bonus: “Is that a face made of meat??” — my 11yo Henry, walking by just now & seeing this image from afar 😛

“Anycost GAN” promises interactive editing using AI

Photoshop Neural Filters are insanely cool, but right now adjusting any parameter generally takes a number of seconds of calculation. To make things more interactive, some of my teammates are collaborating with university researchers on an approach that couples cheap-n’-cheerful quality for interactive previews with nicer-but-slower calculation of final results. This is all a work in progress, and I can’t say if/when these techniques will ship in real products, but I’m very glad to see the momentum.
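
To give a flavor of the idea (and to be clear, the function names and knobs below are my own invention, not the actual Anycost GAN API), the trick is an elastic generator you can run small while scrubbing a slider and large once you commit:

```python
import numpy as np

def stub_generator(latent, resolution=1024, channel_fraction=1.0):
    """Stand-in for an elastic ('anycost') generator: fewer channels and a
    lower output resolution mean less compute, at the cost of fidelity.
    The knobs and their ranges here are assumptions for illustration."""
    channels = int(512 * channel_fraction)
    _ = np.zeros((channels, resolution // 16, resolution // 16))  # fake compute
    rng = np.random.default_rng(int(latent[0] * 1e6) % 2**32)
    return rng.random((resolution, resolution, 3))

latent = np.random.default_rng(1).random(512)

# While the user scrubs a slider: cheap, low-res, reduced-width preview.
preview = stub_generator(latent, resolution=256, channel_fraction=0.25)

# When the user lets go: the same latent rendered at full quality.
final = stub_generator(latent, resolution=1024, channel_fraction=1.0)

print(preview.shape, final.shape)  # (256, 256, 3) (1024, 1024, 3)
```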

Trippy Adobe brushes

As I noted last year,

I’ve always been part of that weird little slice of the Adobe user population that gets really hyped about offbeat painting tools—from stretching vectors along splines & spraying out fish in Illustrator (yes, they’re both in your copy right now; no, you’ve never used them).

In that vein, I dig what Erik Natzke & co. have explored:

This one’s even trippier:

Here’s a quick tutorial on how to make your own brush via Adobe Capture:

And here are the multicolor brushes added to Adobe Fresco last year:

Illustrator & InDesign get big boosts on Apple Silicon

On an epic dog walk this morning, Old Man Nack™ took his son through the long & winding history of Intel vs. Motorola, x86 vs. PPC, CISC vs. RISC, toasted bunny suits, the shock of Apple’s move to Intel (Marklar!), and my lasting pride in delivering the Photoshop CS3 public beta to give Mac users native performance six months early.

As luck would have it, Adobe has some happy news to share about the latest hardware evolution:

Today, we’re thrilled to announce that Illustrator and InDesign will run natively on Apple Silicon devices. While users have been able to continue to use the tool on M1 Macs during this period, today’s development means a considerable boost in speed and performance. Overall, Illustrator users will see a 65 percent increase in performance on an M1 Mac, versus Intel builds — InDesign users will see similar gains, with a 59 percent improvement on overall performance on Apple Silicon. […]

These releases will start to roll out to customers starting today and will be available to all customers across the globe soon.

Check out the post for full details.

“Barbershop” uses GANs to flip your wig

Watch how this new tech is able to move & blend just parts of an image (e.g. hair) while preserving others:

We propose a novel latent space for image blending which is better at preserving detail and encoding spatial information, and propose a new GAN-embedding algorithm which is able to slightly modify images to conform to a common segmentation mask.

Our novel representation enables the transfer of the visual properties from multiple reference images including specific details such as moles and wrinkles, and because we do image blending in a latent-space we are able to synthesize images that are coherent.
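
In very rough terms, the appeal is that you blend latent codes region by region and then decode once, so the result hangs together. The real method embeds each image into the GAN via optimization and aligns everything to a shared segmentation mask; this toy numpy sketch skips all of that and just shows the region-wise blend-then-decode idea with stubs of my own:

```python
import numpy as np

def stub_decode(spatial_latent):
    """Placeholder for a GAN generator that maps a spatial latent code
    back to pixels; here it just slices the code to fake RGB output."""
    return spatial_latent[..., :3]

# Spatially varying latent codes for the target face and a hair reference.
rng = np.random.default_rng(0)
target_latent = rng.random((32, 32, 512))
hair_latent = rng.random((32, 32, 512))

# Segmentation mask (True = hair region) downsampled to the latent resolution.
hair_mask = np.zeros((32, 32), dtype=bool)
hair_mask[:12, :] = True

# Region-wise blend in latent space, then decode once for a coherent image.
blended = np.where(hair_mask[..., None], hair_latent, target_latent)
result = stub_decode(blended)
print(result.shape)  # (32, 32, 3)
```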

Automatic caricature creation gets better & better

A few weeks ago I mentioned Toonify, an online app that can render your picture in a variety of cartoon styles. Researchers are busily cranking away to improve upon it, and the new AgileGAN promises better results & the ability to train models via just a few inputs:

Our approach provides greater agility in creating high quality and high resolution (1024×1024) portrait stylization models, requiring only a limited number of style exemplars (∼100) and short training time (∼1 hour).

Adobe “HuMoR” estimates 3D human movements from 2D inputs

This Adobe Research collaboration with Stanford & Brown Universities aims to make sense of people moving in space, despite having just 2D video as an input:

We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that leverages HuMoR as a motion prior to robustly estimate plausible pose and shape from ambiguous observations. Through extensive evaluations, we demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset, and enables motion reconstruction from multiple input modalities including 3D keypoints and RGB(-D) videos.
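
The part I find most interesting is the motion prior acting as a regularizer during fitting. Here’s a toy stand-in (a simple smoothness penalty in place of the actual learned conditional VAE) just to show how a prior trades off against noisy observations:

```python
import numpy as np

def prior_penalty(poses):
    """Stand-in for a learned motion prior (HuMoR's real prior is a
    conditional VAE): here we just penalize large frame-to-frame changes."""
    deltas = np.diff(poses, axis=0)
    return np.sum(deltas ** 2)

def fit_poses(noisy_obs, weight=5.0, steps=300, lr=0.02):
    """Fit a pose sequence to noisy observations while keeping
    transitions plausible under the (stand-in) prior."""
    poses = noisy_obs.copy()
    for _ in range(steps):
        # Data term: pull each pose toward its observation.
        grad = 2 * (poses - noisy_obs)
        # Prior term (interior frames only): discourage jumpy transitions.
        grad[1:-1] += weight * 2 * (2 * poses[1:-1] - poses[:-2] - poses[2:])
        poses -= lr * grad
    return poses

rng = np.random.default_rng(0)
true_traj = np.cumsum(rng.normal(0, 0.05, size=(60, 3)), axis=0)  # smooth-ish motion
observed = true_traj + rng.normal(0, 0.3, size=true_traj.shape)   # noisy per-frame estimates
refined = fit_poses(observed)
print(prior_penalty(observed) > prior_penalty(refined))  # True: refined motion is smoother
```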

Google makes strides on equitable imaging

“I’m real black, like won’t show up on your camera phone,” sang Childish Gambino. It remains a good joke, but ten years later, it’s long past time for devices to be far fairer in how they capture and represent the world. I’m really happy to see my old teammates at Google focusing on just this area:

“Supernatural” offers home workouts in VR

Hmm—this looks slick, but I’m not sure that I want to have a big plastic box swinging around my face while I’m trying to get fit. As a commenter notes, “That’s just Beat Saber with someone saying ‘good job’ once in a while”—but a friend of mine says it’s great. ¯\_(ツ)_/¯

This vid (same poster frame but different content) shows more of the actual gameplay:

BMW art cars go AI

Generative artist Nathan Shipley has been doing some amazing work with GANs, and he recently collaborated with BMW to use projection mapping to turn a new car into a dynamic work of art:

I’ve long admired the Art Cars series, with a particular soft spot for Jenny Holzer’s masterfully disconcerting PROTECT ME FROM WHAT I WANT:

Here’s a great overview of the project’s decades of heritage, including a dive into how Andy Warhol adorned what may be the most valuable car in the world—painting on it at lightning speed:

AI art: GANimated flowers

Years ago my friend Matthew Richmond (Chopping Block founder, now at Adobe) would speak admiringly of “math-rock kids” who could tinker with code to expand the bounds of the creative world. That phrase came to mind seeing this lovely little exploration from Derrick Schultz:

Here it is in high res:

Mo’ betta witchcraft: NVIDIA turns 2D images into 3D

From LiDAR scanners in millions of pockets to AIs that can now generate 3D from 2D, the magic’s getting deep:

NVIDIA Research is revving up a new deep learning engine that creates 3D object models from standard 2D images — and can bring iconic cars like the Knight Rider’s AI-powered KITT to life — in NVIDIA Omniverse.

A single photo of a car, for example, could be turned into a 3D model that can drive around a virtual scene, complete with realistic headlights, tail lights and blinkers.

“Who Do We Want Our Customers to Become?”

As I’ve noted previously, this essay from Slack founder Stewart Butterfield is a banger. You should read the whole thing if you haven’t—or re-read it if you have—and care about building great products. In my new role exploring the crazy, sometimes scary world of AI-first creativity tools, I find myself meditating on this line:

Who Do We Want Our Customers to Become?… We want them to become relaxed, productive workers… masters of their own information and not slaves… who communicate purposively.

I want customers to be fearless explorers—to F Around & Find Out, in the spirit of Walt Whitman:

Yes, this is way outside Adobe’s comfort zone—but I didn’t come back here to be comfortable. Game on.

Apply for the Adobe Stock Artist Development Fund

I’m really happy to see Adobe putting skin in the game to increase diversity & inclusion in stock imagery:

Introducing the Artist Development Fund, a new $500,000 creative commission program from Adobe Stock. As an expression of our commitment to inclusion we’re looking for artists who self-identify with and expertly depict diverse communities within their work.

Here’s how it works:

The fund also ensures artists are compensated for their work. We will be awarding funding of $12,500 each to a total of 40 global artists on a rolling basis during 2021. Artist Development Fund recipients will also gain unique opportunities, including having their work and stories featured across Adobe social and editorial channels to help promote accurate and inclusive cultural representation within the creative industry.

Using AI to enhance film dubbing

I’ve always had a soft spot for incredibly crappy film dubbing—especially this Bill Murray SNL classic that I hadn’t seen in 30 years but remember like it was yesterday…

…not to mention Police Academy (“Damn you, wanna fight? Fight me!!”):

Not every example is so charming, however, and now a company called Flawless plans to use neural networks to fit actors’ mouth movements to dubbed dialogue:

“The extraction of facial data” — a time-consuming computational process — “runs parallel with the production itself.” The technology strips actors’ faces off, converting their visages into a 3D model, according to Lynes. “This creates millions of 3D models, which the AI uses as reference points,” he says.

“And then, using an existing foreign-language recording of the dialogue, it studies the actor and generates a new 3D model per frame,” he adds. Finally, the imagery is converted back to 2D. Digital effects artists can then manually fix anything that seems off.
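
Reading between the lines, and with the caveat that every stage below is a made-up stub rather than Flawless’s actual tech, the pipeline seems to boil down to three passes per frame:

```python
import numpy as np

def extract_face_mesh(frame):
    """Hypothetical stage 1: lift the actor's face into a 3D model
    (stubbed here as a fixed-topology landmark mesh full of zeros)."""
    return np.zeros((468, 3))

def retarget_mouth(mesh, phoneme):
    """Hypothetical stage 2: deform the mouth region so it matches the
    phoneme being spoken in the foreign-language track (placeholder here)."""
    out = mesh.copy()
    out[-50:, 1] += 0.01   # placeholder 'jaw opens slightly' deformation
    return out

def rerender(frame, mesh):
    """Hypothetical stage 3: project the adjusted 3D face back into the
    2D frame; a real system composites a re-rendered face, we pass through."""
    return frame

video = [np.zeros((720, 1280, 3), dtype=np.uint8) for _ in range(3)]
phonemes = ["b", "a", "n"]                     # aligned to the dubbed audio
dubbed = [rerender(f, retarget_mouth(extract_face_mesh(f), p))
          for f, p in zip(video, phonemes)]
print(len(dubbed), dubbed[0].shape)
```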

Google reveals Project Starline, a next-gen 3D video chat booth

I’m thrilled that a bunch of Google friends (including Dan Goldman, who was instrumental in bringing Content-Aware Fill to Photoshop) have gotten to reveal Project Starline, their effort to deliver breakthrough 3D perception & display to bring people closer together:

Imagine looking through a sort of magic window, and through that window, you see another person, life-size and in three dimensions. You can talk naturally, gesture and make eye contact.

To make this experience possible, we are applying research in computer vision, machine learning, spatial audio and real-time compression. We’ve also developed a breakthrough light field display system that creates a sense of volume and depth that can be experienced without the need for additional glasses or headsets.

Check out this quick tour, even if it’s hard to use regular video to convey the experience of using the tech:

I hope that Dan & co. will be able to provide some peeks behind the scenes, including at how they captured video for testing and demos. (Trust me, it’s all way weirder & more fascinating than you’d think!)

Body Movin’: Adobe Character Animator introduces body tracking (beta)

“You’ll scream, you’ll cry,” promises designer Dave Werner—and maybe not just due to “my questionable dance moves.”

Live-perform 2D character animation using your body. Powered by Adobe Sensei, Body Tracker automatically detects human body movement using a web cam and applies it to your character in real time to create animation. For example, you can track your arms, torso, and legs automatically. View the full release notes.

Check out the demo below & the site for full details.

Syncopated AI nightmare fuel

I’ve obviously been talking a ton about the crazy-powerful, sometimes eerie StyleGAN2 technology. Here’s a case of generative artist Mario Klingemann wiring visuals to characteristics of music:

Watch it at 1/4 speed if you really want to freak yourself out.
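
If you’ve ever wondered how this kind of wiring works under the hood, the usual trick (and I’m guessing at Mario’s specific rig here) is to map an audio feature like beat energy to the step size of a walk through latent space, then render one video frame per latent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "music": a 30-second energy envelope sampled at 30 fps.
t = np.linspace(0, 30, 30 * 30)
energy = np.clip(np.sin(2 * np.pi * 2 * t) ** 8 + 0.1 * rng.random(t.size), 0, 1)

# Walk through latent space: loud moments take bigger steps.
latent_dim = 512
direction = rng.normal(size=latent_dim)
direction /= np.linalg.norm(direction)

latents = np.zeros((t.size, latent_dim))
z = rng.normal(size=latent_dim)
for i, e in enumerate(energy):
    z = z + 0.2 * e * direction + 0.02 * rng.normal(size=latent_dim)  # beat-driven step + jitter
    latents[i] = z

# Each latent would then be fed to a StyleGAN-style generator, one per frame.
print(latents.shape)  # (900, 512)
```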

Beats-to-visuals gives me an excuse to dig up & reshare Michel Gondry’s brilliant old Chemical Brothers video that associated elements like bridges, posts, and train cars with the various instruments at play:

Back to Mario: he’s also been making weirdly bleak image descriptions using CLIP (the same model we’ve explored using to generate faces via text). I congratulated him on making a robot sound like Werner Herzog. 🙃

Say it -> Select it: Runway ML promises semantic video segmentation

I find myself recalling something that Twitter founder Evan Williams wrote about “value moving up the stack”:

As industries evolve, core infrastructure gets built and commoditized, and differentiation moves up the hierarchy of needs from basic functionality to non-basic functionality, to design, and even to fashion.

For example, there was a time when chief buying concerns included how well a watch might tell time and how durable a pair of jeans was.

Now apps like FaceTune deliver what used to be Photoshop-only levels of power to millions of people, and Runway ML promises to let you simply type words to select & track objects in video—right in a Web browser. 👀
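
Conceptually (and this is my guess at the recipe, not Runway’s actual implementation), you can get a long way by scoring candidate masks against the text prompt with an image-text model, keeping the best match, and then propagating that mask frame to frame:

```python
import numpy as np

def clip_similarity(image_crop, prompt):
    """Stand-in for an image-text model (e.g. CLIP): returns how well the
    crop matches the prompt. Random here so the sketch runs anywhere."""
    seed = abs(hash((image_crop.tobytes(), prompt))) % 2**32
    return float(np.random.default_rng(seed).random())

def select_by_text(frame, mask_proposals, prompt):
    """Pick the candidate mask whose pixels best match the text prompt."""
    scores = []
    for mask in mask_proposals:
        ys, xs = np.nonzero(mask)
        crop = frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        scores.append(clip_similarity(crop, prompt))
    return mask_proposals[int(np.argmax(scores))]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
proposals = [np.zeros((480, 640), dtype=bool) for _ in range(3)]
for i, m in enumerate(proposals):
    m[100 * i + 10:100 * i + 90, 200:400] = True
best = select_by_text(frame, proposals, "the skateboarder in the red jacket")
print(best.sum())  # pixel area of the winning mask
```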

ML + MIDI = trippy facial fun

“Hijacking Brains: The Why I’m Here Story” 😌

As I wrote many years ago, it was the chance to work with alpha geeks that drew me to Adobe:

When I first encountered the LiveMotion team, I heard that engineer Chris Prosser had built himself a car MP3 player (this was a couple of years before the iPod). Evidently he’d disassembled an old Pentium 90, stuck it in his trunk, connected it to the glovebox with some Ethernet cable, added a little LCD track readout, and written a Java Telnet app for synching the machine with his laptop. Okay, I thought, I don’t want to do that, but I’d like to hijack the brains of someone who could.

Now my new teammate Cameron Smith has spent a weekend wiring MIDI hardware to StyleGAN to control facial synthesis & modification:
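
The basic wiring is pleasingly simple, at least in caricature: each knob nudges the latent vector along a pre-computed semantic direction. Everything below (CC numbers, directions, scaling) is my own sketch, not Cameron’s code:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 512

# Pretend these directions were discovered ahead of time (e.g. via a
# latent-space analysis tool); each MIDI knob steers one semantic axis.
directions = {
    21: rng.normal(size=latent_dim),  # knob 21 -> "smile" (assumed)
    22: rng.normal(size=latent_dim),  # knob 22 -> "age" (assumed)
    23: rng.normal(size=latent_dim),  # knob 23 -> "head yaw" (assumed)
}

base_latent = rng.normal(size=latent_dim)

def apply_knobs(base, cc_values, scale=3.0):
    """Offset the base latent by each knob's direction; MIDI CC values are
    0-127, re-centered so the knob's midpoint means 'no change'."""
    z = base.copy()
    for cc, value in cc_values.items():
        if cc in directions:
            z += scale * (value - 64) / 64.0 * directions[cc]
    return z

# One 'frame' of controller state (normally read live from a MIDI port).
knob_state = {21: 100, 22: 64, 23: 30}
z = apply_knobs(base_latent, knob_state)
# z would then be passed to a StyleGAN generator to render the edited face.
print(z.shape)  # (512,)
```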

VFX & photography: Fireside chat tonight with Paul Debevec

If you liked yesterday’s news about Total Relighting, or pretty much anything else related to HDR capture over the last 20 years, you might dig this SIGGRAPH LA session, happening tonight at 7pm Pacific:

Paul Debevec is one of the most recognized researchers in the field of CG today. LA ACM SIGGRAPH’s “fireside chat” with Paul and Carolyn Giardina, of the Hollywood Reporter, will allow us a glimpse at the person behind all the innovative scientific work. This event promises to be one of our most popular, as Paul always draws a crowd and is constantly in demand to speak at conferences around the world.

“Total Relighting” promises to teleport(rait) you into new vistas

This stuff makes my head spin around—and not just because the demo depicts heads spinning around!

You might remember the portrait relighting features that launched on Google Pixel devices last year, leveraging some earlier research. Now a number of my former Google colleagues have created a new method for figuring out how a portrait is lit, then imposing new light sources in order to help it blend into new environments. Check it out:
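
For a feel of just the very last step, here’s a toy Lambertian relighting pass in numpy. The real pipeline uses networks to estimate albedo, normals, and full HDR environment lighting; this only shows how new shading gets applied once you have those estimates:

```python
import numpy as np

def relight(albedo, normals, light_dir, light_color):
    """Toy Lambertian relighting: shade estimated surface normals with a
    single new directional light, then tint by the albedo."""
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir /= np.linalg.norm(light_dir)
    shading = np.clip(normals @ light_dir, 0, None)          # (H, W)
    return albedo * shading[..., None] * np.asarray(light_color)

rng = np.random.default_rng(0)
h, w = 128, 128
albedo = rng.random((h, w, 3))                  # would come from a de-lighting network
normals = rng.normal(size=(h, w, 3))            # would come from a normal-estimation network
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)

warm_key_light = relight(albedo, normals, light_dir=[1, 1, 0.5], light_color=[1.0, 0.9, 0.7])
print(warm_key_light.shape)  # (128, 128, 3)
```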

“No One Is Coming. It’s Up To Us.”

“Everyone sweeps the floor around here.”

As I’ve noted many times, that core ethos from Adobe’s founders has really stuck with me over the years. In a similar, if superficially darker, vein, I keep meditating on the phrase “No One Is Coming. It’s Up To Us,” which appears on a sticker I put on the back of my car:

It’s reeeeeally easy to sit around and complain that we don’t have enough XYZ support (design cycles, eng bodies, etc.), and it’s all true/fair—but F that ’cause it doesn’t move the ball. I keep thinking of DMX, with regard to myself & other comfortable folks:

I put in work, and it’s all for the kids (uh)
But these cats done forgot what work is (uh-huh)

Some brief & bracing wisdom:

Happy Monday. Go get some.

A thoughtful conversation about race

I know it’s not a subject that draws folks to this blog, but I wanted to share a really interesting talk I got to attend recently at Google. Broadcaster & former NFL player Emmanuel Acho hosts “Uncomfortable Conversations With A Black Man,” and I was glad that he shared his time and perspective with us. If you stick around to the end, I pop in with a question. The conversation is also available in podcast form.

This episode is with Emmanuel Acho, who discusses his book and YouTube Channel series of the same name: “Uncomfortable Conversations with a Black Man”, which offers conversations about race in an effort to drive open dialogue.

Emmanuel is a Fox Sports analyst and co-host of “Speak for Yourself”. After earning his undergraduate degree in sports management in 2012, Emmanuel was drafted by the Cleveland Browns. He was then traded to the Philadelphia Eagles in 2013, where he spent most of his career. While in the NFL, Emmanuel spent off seasons at the University of Texas to earn his master’s degree in Sports Psychology. Emmanuel left the football field and picked up the microphone to begin his broadcast career. He served as the youngest national football analyst and was named a 2019 Forbes 30 Under 30 Selection. Due to the success of his web series, with over 70 million views across social media platforms, he wrote the book “Uncomfortable Conversations with a Black Man”, and it became an instant New York Times Best Seller.

Interesting, interactive mash-ups powered by AI

Check out how StyleMapGAN (paper, PDF, code) enables combinations of human & animal faces, vehicles, buildings, and more. Unlike simple copy-paste-blend, this technique permits interactive morphing between source & target pixels:

From the authors, a bit about what’s going on here:

Generative adversarial networks (GANs) synthesize realistic images from random latent vectors. Although manipulating the latent vectors controls the synthesized outputs, editing real images with GANs suffers from i) time-consuming optimization for projecting real images to the latent vectors, ii) or inaccurate embedding through an encoder. We propose StyleMapGAN: the intermediate latent space has spatial dimensions, and a spatially variant modulation replaces AdaIN. It makes the embedding through an encoder more accurate than existing optimization-based methods while maintaining the properties of GANs. Experimental results demonstrate that our method significantly outperforms state-of-the-art models in various image manipulation tasks such as local editing and image interpolation. Last but not least, conventional editing methods on GANs are still valid on our StyleMapGAN. Source code is available at https://github.com/naver-ai/StyleMapGAN​.
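
The “spatially variant modulation replaces AdaIN” bit is the crux, and a few lines of numpy make the difference concrete (this is a caricature of the idea, not the authors’ code): AdaIN applies one scale and shift per channel everywhere, while a stylemap lets every spatial location carry its own modulation, which is what makes local editing possible:

```python
import numpy as np

def adain(features, style_mean, style_std, eps=1e-5):
    """Classic AdaIN: one scale & shift per channel, applied everywhere."""
    mu = features.mean(axis=(1, 2), keepdims=True)
    sigma = features.std(axis=(1, 2), keepdims=True)
    return style_std[:, None, None] * (features - mu) / (sigma + eps) + style_mean[:, None, None]

def spatial_modulation(features, stylemap_scale, stylemap_shift, eps=1e-5):
    """StyleMapGAN-style idea: the modulation itself has spatial dimensions,
    so different image regions can carry different latent content."""
    mu = features.mean(axis=(1, 2), keepdims=True)
    sigma = features.std(axis=(1, 2), keepdims=True)
    return stylemap_scale * (features - mu) / (sigma + eps) + stylemap_shift

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 16, 16))            # (channels, H, W)
out_global = adain(feats, rng.normal(size=64), rng.random(64))
out_spatial = spatial_modulation(feats, rng.random((64, 16, 16)), rng.normal(size=(64, 16, 16)))
print(out_global.shape, out_spatial.shape)       # both (64, 16, 16)
```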

A little fun with Bullet Time

During our epic Illinois-to-California run down Route 66 in March, my son Henry and I had fun capturing all kinds of images, including via my Insta360 One X2 camera. Here are a couple of “bullet time” slow-mo vids I thought were kind of fun. The first comes from the Round Barn in Arcadia, OK…

…and the second from the Wigwam Motel in Holbrook, AZ (see photos):

It’s a bummer that the optical quality here suffers from having the company’s cheap-o lens guards applied. (Without the guards, one errant swipe of the selfie stick can result in permanent scratches to the lens, necessitating shipment back to China for repairs.) They say they’re working on more premium glass ones, for which they’ll likely get yet more of my dough. ¯\_(ツ)_/¯

What a difference four years makes in iPhone cameras

“People tend to overestimate what can be done in one year and to underestimate what can be done in five or ten years,” as the old saying goes. Similarly, it can be hard to notice one’s own kid’s progress until confronted with an example of that kid from a few years back.

My son Henry has recently taken a shine to photography & has been shooting with my iPhone 7 Plus. While passing through Albuquerque a few weeks back, we ended up shooting side by side—him with the 7, and me with an iPhone 12 Pro Max (four years newer). We share a camera roll, and as I scrolled through I was really struck seeing the output of the two devices placed side by side.

I don’t hold up any of these photos (all unedited besides cropping) as art, but it’s fun to compare them & to appreciate just how far mobile photography has advanced in a few short years. See gallery for more.

Vid2Actor: Turning video of humans into posable 3D models

As I’m on a kick sharing recent work from Ira Kemelmacher-Shlizerman & team, here’s another banger:

Given an “in-the-wild” video, we train a deep network with the video frames to produce an animatable human representation.

This can be rendered from any camera view in any body pose, enabling applications such as motion re-targeting and bullet-time rendering without the need for rigged 3D meshes.

I look forward (?) to the not-so-distant day when a 3D-extracted Trevor Lawrence hucks a touchdown to Cleatus the Fox Sports Robot. Grand slam!!

Artbreeder is wild

Artbreeder is a trippy project that lets you “simply keep selecting the most interesting image to discover totally new images. Infinitely new random ‘children’ are made from each image. Artbreeder turns the simple act of exploration into creativity.” Check out interactive remixing:

Here’s an overview of how it works:

Generative Adversarial Networks are the main technology enabling Artbreeder. Artbreeder uses BigGAN and StyleGAN models. There is a minimal open source version available that uses BigGAN.
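
The “breeding” itself is conceptually tiny: jitter a parent latent to get children, let the user pick one, repeat; mash-ups are just blends of two latents. A toy sketch (my own, with the generator omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 512

def breed(parent, n_children=6, mutation=0.3):
    """Make 'children' by jittering a parent latent; picking a child and
    breeding again is the whole exploration loop."""
    return parent + mutation * rng.normal(size=(n_children, latent_dim))

def crossover(a, b, mix=0.5):
    """Blend two images' latents to mash them up (e.g. face x painting)."""
    return mix * a + (1 - mix) * b

parent = rng.normal(size=latent_dim)
children = breed(parent)
favorite = children[2]                      # the user clicks the one they like
grandchildren = breed(favorite)             # ...and the walk continues
hybrid = crossover(parent, rng.normal(size=latent_dim))
print(children.shape, grandchildren.shape, hybrid.shape)
```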

Design: Split-flap signs

I’ve long loved the weird mechanical purring of those flappy-letter signs one sees (or at least used to see) in train stations & similar venues, but I haven’t felt like throwing down the better part of three grand to own a Vestaboard. Now maker Scott Bezek is working on an open-source project for making such signs at home, combining simple materials and code. In case you’d never peeked inside such a mechanism (and really, why would you have?) and are curious, here’s how they work:

And here, for some reason, are six oddly satisfying minutes of a sign spelling out four-letter words:
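
For the tinkerers: the control logic is charmingly simple, since the character wheel only spins one way. Here’s a toy sketch (the character set and gearing constant below are my assumptions, not Scott’s firmware):

```python
# The character wheel only spins one way, so the controller just advances
# however many flaps separate the current character from the target one.
FLAPS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,!?"
STEPS_PER_FLAP = 5   # assumed gearing: motor steps needed to advance one flap

def flaps_to_advance(current, target):
    """How many flaps (and motor steps) to go from one character to another."""
    i, j = FLAPS.index(current), FLAPS.index(target)
    flaps = (j - i) % len(FLAPS)
    return flaps, flaps * STEPS_PER_FLAP

def spell(word, current_chars):
    """Plan each module's move so the whole row lands on the word."""
    return [flaps_to_advance(c, t) for c, t in zip(current_chars, word.upper())]

print(spell("OPEN", "    "))  # [(15, 75), (16, 80), (5, 25), (14, 70)]
```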

Check out the Spark AR Master Class

I remain fascinated by what Snap & Facebook are doing with their respective AR platforms, putting highly programmable camera stacks into the hands of hundreds of millions of consumers & hundreds of thousands of creators. If you have thoughts on the subject & want to nerd out some time, drop me a note.

A few months back I wanted to dive into the engine that’s inside Instagram, and I came across the Spark AR masterclass put together & presented by filter creator Eddy Adams. I found it engaging & informative, if a bit fast for my aging brain 🙃. If you’re tempted to get your feet wet in this emerging space, I recommend giving it a shot.