Heh—I got a kick out of seeing how AI would go about hallucinating its idea of what my flamed-out ’84 Volvo wagon looked like. See below for a comparison. And in retrospect, how did I not adorn mine with a tail light made from a traffic cone (or is it giant candy corn?) and “VOOFO NACK”? 😅
Not yet having access to this system [taps mic impatiently], I’m just checking out its simple but effective interface from afar. Here’s how artists can designate specific regions in order to repopulate them:
Each week we’ll cover a different aspect of machine learning. A short lecture covering theories and practices will be followed by demos using open source web tools and a web-browser tool called Google Colab. The last 3 weeks of class you’ll be given the chance to create your own project using the skills you’ve learned. Topics will include selecting the right model for your use case, gathering and manipulating datasets, and connecting your models to data sources such as audio, text, or numerical data. We’ll also talk a little ethics, because we can’t teach machine learning without a little ethics.
I really enjoyed this conversation—touching, as it does, on my latest fascination (AI-generated art via DALL•E) and myriad other topics. In fact, I plan to listen to it again—hopefully this time near a surface on which to jot down & share some of the most resonant observations. Meanwhile, I think you’ll find it thoughtful & stimulating.
In this episode of the podcast, Sam Harris speaks with Eric Schmidt about the ways artificial intelligence is shifting the foundations of human knowledge and posing questions of existential risk.
My old boss on Photoshop, Kevin Connor, used to talk about the inexorable progression of imaging tools from the very general (e.g. the Clone Stamp) to the more specific (e.g. the Healing Brush). In the process, high-complexity, high-skill operations were rendered far more accessible—arguably to a fault. (I used to joke that believe it or not, drop shadows were cool before Photoshop made them easy. ¯\_(ツ)_/¯)
I think of that observation when seeing things like the Face Swap tool from Icons8. What once took considerable time & talent in an app like Photoshop is now rendered trivially fast (and free!) to do. “Days of Miracles & Wonder,” though we hardly even wonder now. (How long will it take DALL•E to go from blown minds to shrugged shoulders? But that’s a subject for another day.)
There’s no way this is real, is there?! I think it must use NFW technology (No F’ing Way), augmented with a side of LOL WTAF. 😛
Here’s an NYT video showing the system in action:
The NYT article offers a concise, approachable description of how the approach works:
A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognize the links between the images and the words.
When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.
Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realize these features. The latest version of DALL-E, unveiled on Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
Though DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.
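To make the “diffusion model” step in that description a little more concrete, here’s a minimal sketch of the noising arithmetic such models are commonly built around. To be clear, this is not OpenAI’s code: the function names (`add_noise`, `remove_noise`), the four-number “image,” and the `alpha_bar` value are all made up for illustration. A real system trains a neural network to guess the noise; this sketch cheats by reusing the true noise, just to show that the denoising step undoes the noising step.

```python
import math
import random


def add_noise(pixels, noise, alpha_bar):
    """Forward step: blend clean pixels with Gaussian noise."""
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    return [keep * p + mix * n for p, n in zip(pixels, noise)]


def remove_noise(noisy, predicted_noise, alpha_bar):
    """Reverse step: recover clean pixels from a noise estimate."""
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    return [(x - mix * n) / keep for x, n in zip(noisy, predicted_noise)]


rng = random.Random(0)           # fixed seed so runs are repeatable
clean = [0.1, 0.5, 0.9, 0.3]     # hypothetical four-pixel "image"
noise = [rng.gauss(0.0, 1.0) for _ in clean]
alpha_bar = 0.25                 # assumed: how much of the signal survives

noisy = add_noise(clean, noise, alpha_bar)
# Stand-in for a trained network: we pass in the true noise.
restored = remove_noise(noisy, noise, alpha_bar)

print("noisy:   ", [round(v, 3) for v in noisy])
print("restored:", [round(v, 3) for v in restored])
```

With a perfect noise estimate the restored values match the originals; the hard part, which this sketch skips entirely, is learning to make that estimate from text and millions of images.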
A big part of my rationale in going to Google eight (!) years ago was that a lot of creativity & expressivity hinge on having broad, even mind-of-God knowledge of one’s world (everywhere you’ve been, who’s most important to you, etc.). Given access to one’s whole photo corpus, a robot assistant could thus do amazing things on one’s behalf.
In that vein, MyStyle proposes to do smarter face editing (adjusting expressions, filling in gaps, upscaling) by being trained on 100+ images of an individual face. Check it out:
Can machines generate art like a human would? They already are.
Join us on March 30th, at 9AM Pacific for a live chat about what’s on the frontier of machine learning and art. Our team of panelists will break down how text prompts in machine learning models can create artwork like a human might, and what it all means for the future of artistic expression.
Aaron Hertzmann is a Principal Scientist at Adobe, Inc., and an Affiliate Professor at University of Washington. He received a BA in Computer Science and Art & Art History from Rice University in 1996, and a PhD in Computer Science from New York University in 2001. He was a professor at the University of Toronto for 10 years, and has worked at Pixar Animation Studios and Microsoft Research. He has published over 100 papers in computer graphics, computer vision, machine learning, robotics, human-computer interaction, perception, and art. He is an ACM Fellow and an IEEE Fellow.
Ryan is a Machine Learning Engineer/Researcher at Adobe with a focus on multimodal image editing. He has been creating generative art using machine learning for years, but is most known for his recent work with CLIP for text-to-image systems. With a Bachelor’s in Psychology from the University of Utah, he is largely self-taught.
V7 Labs has created a new artificial intelligence-based (AI) software that works as a Google Chrome extension that is capable of detecting artificially generated profile pictures — like the ones above — with a claimed 99.28% accuracy.
Creator Alberto Rizzoli walks through the flow in this video (more detailed than the one below).
My now-teammates’ work on Neural Filters is exactly what made me want to return to the ‘Dobe, and I’m thrilled to get to build upon what they’ve been doing. It’s great to see Fast Company recognizing this momentum:
For putting Photoshop wizardry within reach
Adobe’s new neural filters use AI to bring point-and-click simplicity to visual effects that would formerly have required hours of labor and years of image-editing expertise. Using them, you can quickly change a photo subject’s expression from deadpan to cheerful. Or adjust the direction that someone is looking. Or colorize a black-and-white photo with surprising subtlety. Part of Adobe’s portfolio of “Sensei” AI technologies, the filters use an advanced form of machine learning known as generative adversarial networks. That lets them perform feats such as rendering parts of a face that weren’t initially available as you edit a portrait. Like all new Sensei features, the neural filters were approved by an Adobe ethics committee and review board that assess AI products for problems stemming from issues such as biased data. In the case of these filters, this process identified an issue with how certain hairstyles were rendered and fixed it before the filters were released to the public.
I swear to God, stuff like this makes me legitimately feel like I’m having a stroke:
And that example, curiously, seems way more technically & aesthetically sophisticated than the bulk of what I see coming from the “NFT art” world. I really enjoyed this explication of why so much of such content seems like cynical horseshit—sometimes even literally:
PantherMedia, the first microstock agency in Germany, […] partnered with VAIsual, a technology company that pioneers algorithms and solutions to generate synthetic licensed stock media. The two have come together to offer the first set of 100% AI-generated, licensable stock photos of “people.”
None of the photos are of people who actually exist.
The “first” claim seems odd to me, as Generated.photos has been around for quite some time—albeit not producing torsos. That site offers an Anonymizer service that can take in your image, then generate multiple faces that vaguely approximate your characteristics. Here’s what it made for me:
Now I’m thinking of robots replacing humans in really crummy stock-photo modeling jobs, bringing to mind Mr. “Rob Ott” sliding in front of the camera:
Researchers at NVIDIA & Case Western Reserve University have developed an algorithm that can distinguish different painters’ brush strokes “at the bristle level”:
Extracting topographical data from a surface with an optical profiler, the researchers scanned 12 paintings of the same scene, painted with identical materials, but by four different artists. Sampling small square patches of the art, approximately 5 to 15 mm, the optical profiler detects and logs minute changes on a surface, which can be attributed to how someone holds and uses a paintbrush.
They then trained an ensemble of convolutional neural networks to find patterns in the small patches, sampling between 160 and 1,440 patches for each of the artists. Using NVIDIA GPUs with cuDNN-accelerated deep learning frameworks, the algorithm matches the samples back to a single painter.
The team tested the algorithm against 180 patches of an artist’s painting, matching the samples back to a painter at about 95% accuracy.
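For the curious, the two moving parts in that description (cutting a surface scan into small square patches, then letting an ensemble vote on who painted each one) can be sketched in a few lines. This is a rough illustration under loud assumptions, not the researchers’ pipeline: the height map is invented, and the three “classifiers” are simple threshold functions standing in for trained convolutional networks. The names `sample_patches`, `majority_vote`, and `attribute_patch` are hypothetical.

```python
from collections import Counter


def sample_patches(height_map, size, stride):
    """Cut a 2D grid of surface heights into square patches."""
    rows, cols = len(height_map), len(height_map[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patches.append([row[c:c + size] for row in height_map[r:r + size]])
    return patches


def majority_vote(predictions):
    """Return the label that the most ensemble members agreed on."""
    return Counter(predictions).most_common(1)[0][0]


def attribute_patch(patch, classifiers):
    """Ask every classifier for a label, then take the majority."""
    return majority_vote([clf(patch) for clf in classifiers])


def mean_height(patch):
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


# Stand-ins for trained CNNs: each looks at one crude surface statistic.
classifiers = [
    lambda p: "artist_a" if mean_height(p) > 0.5 else "artist_b",
    lambda p: "artist_a" if max(max(r) for r in p) > 0.8 else "artist_b",
    lambda p: "artist_a" if min(min(r) for r in p) > 0.2 else "artist_b",
]

# Invented 4x4 height map: thick paint on the left, thin on the right.
height_map = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.9, 0.2, 0.1],
    [0.9, 0.6, 0.1, 0.3],
    [0.8, 0.7, 0.2, 0.1],
]

patches = sample_patches(height_map, size=2, stride=2)
labels = [attribute_patch(p, classifiers) for p in patches]
print(labels)
```

The vote is the point of using an ensemble: any single member can be fooled by a given patch, but the group’s answer tends to be steadier.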
Illinois stayed largely snow-free during our recent visit, but I had some fun screwing around with Photoshop’s new Landscape Mixer Neural Filter, giving the place a dusting of magic:
Just for the lulz, I tried applying the filter to a 360º panorama I’d captured via my drone. The results don’t entirely withstand a lot of scrutiny (try showing the pano below in full-screen mode & examine the buildings), but they’re fun—and good grief, we can now do all this in literally one click!
For the sake of comparison, here’s the unmodified original:
As usual, I’m channeling Towlie in admitting I have no idea what’s going on right now—or at least just an inkling of one—but check out some recent witchcraft that takes in text & simple strokes, then synthesizes multiple kinds of outputs using a single model:
This new witchcraft “synthesizes not only high-resolution, multi-view-consistent images in real time, but also produces high-quality 3D geometry.” Plus it makes a literally dizzying array of gatos!
Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. For this purpose, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
The imagineers (are they still called that?) promise a new way to create photorealistic full-head portrait renders from captured data without the need for artist intervention.
Our method begins with traditional face rendering, where the skin is rendered with the desired appearance, expression, viewpoint, and illumination. These skin renders are then projected into the latent space of a pre-trained neural network that can generate arbitrary photo-real face images (StyleGAN2).
The result is a sequence of realistic face images that match the identity and appearance of the 3D character at the skin level, but are completed naturally with synthesized hair, eyes, inner mouth and surroundings.
Rather than needing to draw out every element of an imagined scene, users can enter a brief phrase to quickly generate the key features and theme of an image, such as a snow-capped mountain range. This starting point can then be customized with sketches to make a specific mountain taller or add a couple trees in the foreground, or clouds in the sky.
It doesn’t just create realistic images — artists can also use the demo to depict otherworldly landscapes.
Today we are introducing Pet Portraits, a way for your dog, cat, fish, bird, reptile, horse, or rabbit to discover their very own art doubles among tens of thousands of works from partner institutions around the world. Your animal companion could be matched with ancient Egyptian figurines, vibrant Mexican street art, serene Chinese watercolors, and more. Just open the rainbow camera tab in the free Google Arts & Culture app for Android and iOS to get started and find out if your pet’s look-alikes are as fun as some of our favorite animal companions and their matches.
In traditional graphics work, vectorizing a bitmap image produces a bunch of points & lines that the computer then renders as pixels, producing something that approximates the original. Generally there’s a trade-off between editability (relatively few points, requiring a lot of visual simplification, but easy to see & manipulate) and fidelity (tons of points, high fidelity, but heavy & hard to edit).
Importing images into a generative adversarial network (GAN) works in a similar way: pixels are converted into vectors which are then re-rendered as pixels—and guess what, it’s a generally lossy process where fidelity & editability often conflict. When the importer tries to come up with a reasonable set of vectors that fit the entire face, it’s easy to end up with weird-looking results. Additionally, changing one attribute (e.g. eyebrows) may cause changes to others (e.g. hairline). I saw a case once where making someone look another direction caused them to grow a goatee (!).
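Here’s a toy version of that import step, in case it helps to see both problems (lossy fit, tangled attributes) in miniature. Everything in it is an assumption for illustration: the “generator” is a hand-written function that turns two numbers into four “pixels,” nothing like a real GAN, and `invert` is just plain gradient descent on the squared error.

```python
def generate(z):
    """Toy 'generator': maps a 2-number latent vector to 4 'pixels'."""
    z0, z1 = z
    return [z0, z1, z0 + z1, z0 - z1]


def loss(z, target):
    """Squared error between the generated pixels and the target."""
    return sum((g - t) ** 2 for g, t in zip(generate(z), target))


def invert(target, steps=200, lr=0.1):
    """Search for the latent vector whose output best matches target."""
    z = [0.0, 0.0]
    for _ in range(steps):
        err = [g - t for g, t in zip(generate(z), target)]
        # Gradient of the squared error with respect to z0 and z1.
        grad0 = 2 * (err[0] + err[2] + err[3])
        grad1 = 2 * (err[1] + err[2] - err[3])
        z = [z[0] - lr * grad0, z[1] - lr * grad1]
    return z


target = [1.0, 2.0, 3.0, 0.0]      # hypothetical "photo" to import
z = invert(target)
reconstruction = generate(z)
residual = loss(z, target)          # stays above zero: the fit is lossy

# Nudge a single latent value and see which pixels move.
edited = generate([z[0] + 0.5, z[1]])
changed = [i for i, (a, b) in enumerate(zip(reconstruction, edited))
           if abs(a - b) > 1e-9]

print("latent:", [round(v, 3) for v in z])
print("leftover error:", round(residual, 3))
print("pixels changed by one edit:", changed)
```

Two numbers can’t perfectly describe four arbitrary pixels, so some error always remains, and because each latent value feeds several pixels, changing one knob moves more than one thing. That’s the goatee problem in four pixels.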
My teammates’ FaceStudio effort proposes to address this problem by sidestepping the challenge of fitting the entire face, instead letting you broadly select a region and edit just that. Check it out:
Leaving aside the eye-popping, sometimes disconcerting applications of GANs for facial synthesis & editing, what if the core tech could be used just to generate high-quality results even with poor bandwidth? That’s one possible application of NVIDIA’s recent endeavors:
By analyzing various artists’ distinctive treatment of facial geometry, researchers in Israel devised a way to render images with both their painterly styles (brush strokes, texture, palette, etc.) and shape. Here’s a great six-minute overview:
What if Photoshop’s breakthrough Smart Portrait, which debuted at MAX last year, could work over time?
One may think this is an easy task, as all that is needed is to apply Smart Portrait to every frame in the video. Not only is this tedious, but the result is also visually unappealing due to a lack of temporal consistency.
In Project Morpheus, we are building a powerful video face editing technology that can modify someone’s appearance in an automated manner, with smooth and consistent results.
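As a rough illustration of why frame-by-frame edits flicker, and of one generic way to calm them down, here’s a small sketch. I have no idea how Project Morpheus actually does it, and this is not that: it simply assumes each frame gets its own slightly different edit strength (a hypothetical “smile amount”), then smooths those numbers over time with an exponential moving average. The names `smooth` and `jitter` and the sample values are all made up.

```python
def smooth(values, weight=0.8):
    """Exponential moving average: each frame leans on the previous result."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(weight * smoothed[-1] + (1.0 - weight) * v)
    return smoothed


def jitter(values):
    """Mean absolute change from one frame to the next."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(steps) / len(steps)


# Hypothetical per-frame "smile amount" from editing each frame on its own.
per_frame = [0.50, 0.62, 0.41, 0.58, 0.39, 0.61, 0.44, 0.57]
stabilized = smooth(per_frame)

print("jitter before:", round(jitter(per_frame), 3))
print("jitter after: ", round(jitter(stabilized), 3))
```

The trade-off with this kind of smoothing is lag: the higher the `weight`, the steadier the result, but the slower it responds when the subject’s expression really does change.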
It’s that thing where you wake up, see some exciting research, tab over to Slack to share it with your team—and then notice that the work is from your teammates. 😝
Check out StyleAlign from my teammate Eli Shechtman & collaborators. Among other things, they’ve discovered interesting, useful correspondences in ML models for very different kinds of objects:
We find that the child model’s latent spaces are semantically aligned with those of the parent, inheriting incredibly rich semantics, even for distant data domains such as human faces and churches. Second, equipped with this better understanding, we leverage aligned models to solve a diverse set of tasks. In addition to image translation, we demonstrate fully automatic cross-domain image morphing
Here’s a little taste of what it enables:
And to save you the trouble of looking up the afore-referenced Ghostbusters line, here ya go. 👻
The visualizations for StyleNeRF tech are more than a little trippy, but the fundamental idea—that generative adversarial networks (GANs) can enable 3D control over 2D faces and other objects—is exciting. Here’s an oddly soundtracked peek:
And here’s a look at the realtime editing experience:
The new 10-episode Snap original series “The Me and You Show” taps into Snapchat’s Cameos — a feature that uses a kind of deepfake technology to insert someone’s face into a scene. Using Cameos, the show makes you the lead actor in comedy skits alongside one of your best friends by uploading a couple of selfies. […]
The Cameos feature is based on tech developed by AI Factory, a startup developing image and video recognition, analysis and processing technology that Snap acquired in 2019. […]
According to Snap, more than 44 million Snapchat users engage with Cameos on a weekly basis and more than 16 million share Cameos with their friends.
I dunno—to my eye the results look like a less charming version of the old JibJab templates that were hot 20 years ago, but I’m 30 years older than the Snapchat core demographic, so what do I know?
These can be made with any still photo. Only the head animates; other parts stay static, and backgrounds can’t be replaced. Still, the result below shows how movements and facial expressions performed by the real person are seamlessly added to a still photograph. The human can act as a sort of puppeteer of the still photo image.
I keep meaning to pour one out for my nearly-dead homie, Photoshop 3D (post to follow, maybe). We launched it back in 2007 thinking that widespread depth capture was right around the corner. But “Being early is the same as being wrong,” as Marc Andreessen says, and we were off by a decade (before iPhones started putting depth maps into images).
Now, though, the world is evolving further, and researchers are enabling apps to perceive depth even in traditional 2D images—no special capture required. Check out what my colleagues have been doing together with university collaborators:
On the reasonable chance that you’re interested in my work, you might want to bookmark (or at least watch) this one. Two-Minute Papers shows how NVIDIA’s StyleGAN research (which underlies Photoshop’s Smart Portrait Neural Filter) has been evolving, recently being upgraded with Alias-Free GAN, which very nicely reduces funky artifacts—e.g. a “sticky beard” and “boiling” regions (hair, etc.):
Side note: I continue to find the presenter’s enthusiasm utterly infectious: “Imagine saying that to someone 20 years ago. You would end up in a madhouse!” and “Holy mother of papers!”
Hmm—I’m not sure what to think about this & would welcome your thoughts. Promising to “Give people an idea of your appearance, while still protecting your true identity,” this Anonymizer service will take in your image, then generate multiple faces that vaguely approximate your characteristics:
Here’s what it made for me:
I find the results impressive but a touch eerie, and as I say, I’m not sure how to feel. Is this something you’d find useful (vs., say, just using something other than a photograph as your avatar)?
You might remember the portrait relighting features that launched on Google Pixel devices last year, leveraging some earlier research. Now a number of my former Google colleagues have created a new method for figuring out how a portrait is lit, then imposing new light sources in order to help it blend into new environments.
Two-Minute Papers has put together a nice, accessible summary of how it works:
Heh—I was amused to hear generative apps’ renderings of human faces—often eerie, sometimes upsetting—described as turning people into “rotten fruits.”
This reminded me of a recurring sketch from Conan O’Brien’s early work, which featured literal rotting fruit acting out famous films—e.g. Apocalypse Now, with Francis Ford Coppola sitting there to watch:
No, I don’t know what this has to do with anything—except now I want to try typing “rotting fruit” plus maybe “napalm in the morning” into a generative engine just to see what happens. The horror… the horror!