Category Archives: AI/ML

Stable Diffusion can draw the contents of your brain

“It’s all in your head.” — Gorillaz

I’ve spent the last ~year talking about my brain being “DALL•E-pilled,” where I’ve started seeing just about everything (e.g. a weird truck) as some kind of AI manifestation. But that’s nothing compared to using generative imaging models to literally see your thoughts:

Researchers Yu Takagi and Shinji Nishimoto, from the Graduate School of Frontier Biosciences at Osaka University, recently wrote a paper outlining how it’s possible to reconstruct high-resolution images (PDF) using latent diffusion models from human brain activity recorded via functional Magnetic Resonance Imaging (fMRI), “without the need for training or fine-tuning of complex deep generative models” (via Vice).

Use Stable Diffusion ControlNet in Photoshop

Check out this integration of sketch-to-image tech—and if you have ideas/requests on how you’d like to see capabilities like these get more deeply integrated into Adobe tools, lay ’em on me!

Also, it’s not in Photoshop, but as it made me think of the Photo Restoration Neural Filter in PS, check out this use of ControlNet to revive an old family photo:

3D + AI: Stable Diffusion comes to Blender

I’m really excited to see what kinds of images, not to mention videos & textured 3D assets, people will now be able to generate via emerging techniques (depth2img, ControlNet, etc.):

AI: Running image synthesis in seconds, *on your telephone*

Looks like a bunch of my former teammates have been doing great work to enable Stable Diffusion to synthesize images in ~15s on an Android device:

In a demo video, Qualcomm shows version 1.5 of Stable Diffusion generating a 512 x 512 pixel image in under 15 seconds. Although Qualcomm doesn’t say what the phone is, it does say it’s powered by its flagship Snapdragon 8 Gen 2 chipset (which launched last November and has an AI-centric Hexagon processor). The company’s engineers also did all sorts of custom optimizations on the software side to get Stable Diffusion running optimally.

ControlNet is wild

This new capability in Stable Diffusion (think image-to-image, but far more powerful) produces some real magic. Check out what I got with some simple line art:

And check out this thread of awesome sauce:

Welcome to the meme-predicted future.

An entirely generative realtime musical performance

1992 Pink Floyd laser light show in Dubuque, IA—you are back. 😅

Through this AI DJ project, we have been exploring the future of DJ performance with AI. At first, we tried to make an AI-based music selection system as an AI DJ. In the second iteration, we utilized a few AI models on stage to generate real-time symbolic music (i.e., MIDI). In the performance, a human DJ (Tokui) controlled various parameters of the generative AI models and drum machines. This time, we aim to advance one step further and deploy AI models to generate audio on stage in near real-time. Everything you hear during the performance will be pure AI-generation (no synthesizer, no drum machine).

In this performance, Emergent Rhythm, the human DJ will become an AJ or “AI Jockey” instead of a Disk Jockey, and he is expected to tame and ride the AI-generated audio stream in real-time. The distinctive characteristics of AI-based audio generation and “morphing” will provide a unique and even otherworldly sonic experience for the audience.

Live talk Saturday: “An Introduction to AI for Designers”

Sounds like it could be an interesting session:

Introducing the new DigitalFUTURES course of free AI tutorials.

Several of the top AI designers in the world are coming together to offer the world’s first free, comprehensive course in AI for designers. This course starts off at an introductory level and gets progressively more advanced. 18 Feb, Introductory Session 10.00 am EST, 4.00 pm CET, 11.00 pm China What is AI? What are Midjourney, DALL•E, Stable Diffusion, etc.? What is GPT3? What is ChatGPT? And how are they revolutionizing design?

Neil Leach
Shael Patel
Reem Mosleh
Clay Odom

New generative delights

Paul Trillo used Runway’s new Gen-1 experimental model to create a Cubist Simpsons intro:

Meanwhile fabdream.ai salutes the power of love:

Runway introduces “Gen-1” to stylize video

Check out this new generative stylization model. I’m intrigued by the idea of using simple primitives (think dollhouse furniture) to guide synthesis & stylization (e.g. of the buildings shown briefly here).

See this thread from company founder Cristóbal Valenzuela:

“The impossibilities are endless”: Yet more NeRF magic

Last month Paul Trillo shared some wild visualizations he made by walking around Michelangelo’s David, then synthesizing 3D NeRF data. Now he’s upped the ante with captures from the Louvre:

Over in Japan, Tommy Oshima used the tech to fly around, through, and somehow under a playground, recording footage via a DJI Osmo + iPhone:

https://twitter.com/jnack/status/1616981915902554112?s=20&t=5LOmsIoifLw8oNVMV2fYIw

As I mentioned last week, Luma Labs has enabled interactive model embedding, and now they’re making the viewer crazy-fast:

Me talk generative imaging one day

I got my professional start at AGENCY.COM, a big dotcom-era startup co-founded by creative whirlwind Kyle Shannon. Kyle has been exploring AI imaging like mad, and recently he’s organized an AI Artists Salon that anyone is welcome to join in person (Denver) or online:

The AI Artists Salon is a collaborative group of creatively-minded people and we welcome anyone curious about the tsunami of inspiring generative technologies already rocking our world. See Community Links & Resources.

On Tuesday evening I had the chance to present some ideas & progress that have inspired me—nothing confidential about Adobe work, of course, but hopefully illuminating nonetheless. If you’re interested, check it out (and pro tip: if you set playback to 1.5x speed or higher, I sound a lot sharper & funnier!).

The world’s first (?) NeRF-powered commercial

Karen X. Cheng, back with another 3D/AI banger:

As luck (?) would have it, the commercial dropped on the third anniversary of my former teammate Jon Barron & collaborators bringing NeRFs into existence:

The Chainsmokers meet Stable Diffusion

“HEY MAN, you ever drop acid?? No? Well I do, and it looks *just like this*!!” — an excitable Googler when someone wallpapered a big meeting room in giant DeepDream renderings

In a similar vein, have fun tripping balls with AI, courtesy of Remi Molettee:

Bonus: Journey gets the treatment:

Bonus bonus: Journey gets rather hilariously silenced:

AI-painted animation: “Help Changes Everything”

In this beautiful work from Paul Trillo & co., AI extends—instead of replaces—human creativity & effort:

Here’s a peek behind the scenes:

AI: From dollhouse to photograph

Check out Karen X. Cheng’s clever use of simple wooden props + depth-to-image synthesis to create 3D renderings:

She writes,

1. Take reference photo (you can use any photo – e.g. your real house, it doesn’t have to be dollhouse furniture)
2. Set up Stable Diffusion Depth-to-Image (google “Install Stable Diffusion Depth to Image YouTube”)
3. Upload your photo and then type in your prompts to remix the image

We recommend starting with simple prompts, and then progressively adding extra adjectives to get the desired look and feel. Using this method, @justinlv generated hundreds of options, and then we went through and cherrypicked our favorites for this video
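For anyone who would rather script the steps above than use a GUI, here is a minimal sketch. It assumes the Hugging Face `diffusers` library, its `StableDiffusionDepth2ImgPipeline`, the `stabilityai/stable-diffusion-2-depth` checkpoint, and a CUDA GPU; none of those are named in Karen's thread, and the helper names are hypothetical. The small `progressive_prompts` helper mirrors the "start simple, then add adjectives" advice.

```python
def progressive_prompts(base, adjectives):
    """Start from a simple prompt and append one extra descriptor at a time."""
    prompts = [base]
    for adjective in adjectives:
        prompts.append(f"{prompts[-1]}, {adjective}")
    return prompts


def remix_photo(photo_path, prompt, strength=0.7):
    """Remix a reference photo with depth-to-image (assumed setup, needs a GPU)."""
    # Heavy imports stay local so the prompt helper works without them installed.
    import torch
    from diffusers import StableDiffusionDepth2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth",  # assumed checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")
    reference = Image.open(photo_path).convert("RGB")
    # The pipeline estimates depth from the photo and keeps that layout
    # while the prompt restyles the content.
    return pipe(prompt=prompt, image=reference, strength=strength).images[0]
```

Looping `remix_photo` over the list returned by `progressive_prompts` gives a batch of candidates to cherry-pick from, which is roughly the workflow described in the quote.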

AI Snoop Dogg has arrived

…y’know, for all of you who were waiting. 🙄

I’m not sure what to say about “The first rap fully written and sung by an AI with the voice of Snoop Dogg,” except that now I really want the ability to drop in collaborations by other well known voices—e.g. Christopher Walken.

Maybe someone can now lip-sync it with the faces of YoDogg & friends:

Heinz AI Ketchup

Life’s like a mayonnaise soda…
What good is seeing eye chocolate…

Lou Reed

The marketers at Heinz had a little fun noticing that an AI image-making app (DALL•E, I’m guessing) tended to interpret requests for “ketchup” in the style of Heinz’s iconic bottle. Check it out:

ArtStation, Kickstarter, and others share their AI art policies

The whole community of creators, including toolmakers, continues to feel its way forward in the fast-moving world of AI-enabled image generation. For reference, here are some of the statements I’ve been seeing:

  • ArtStation has posted guidance on “Use of AI Software on ArtStation.”
    • Projects tagged using “NoAI” will automatically be assigned an HTML “NoAI” meta tag.
    • The tag is opt-in: projects aren’t tagged “NoAI” by default, as the site wants creators to choose whether or not their work is eligible for use in training.
  • Kickstarter has shared “Our Current Thinking on the Use of AI-Generated Image Software and AI Art.”
    • “Kickstarter must, and will always be, on the side of creative work and the humans behind that work. We’re here to help creative work thrive.”
    • Key questions they’ll ask include “Is a project copying or mimicking an artist’s work?” and “Does a project exploit a particular community or put anyone at risk of harm?”
  • From 3dtotal Publishing:
    • “3dtotal has four fundamental goals. One of them is to support and help the artistic community, so we cannot support AI art tools as we feel they hurt this community.”
  • Clip Studio Paint will no longer implement an image generator function:
    • “We received a lot of feedback from the community and will no longer implement the image generator palette.”
    • They “fear that this will make Clip Studio Paint artwork synonymous with AI-generated work” and are choosing to prioritize other features.
  • The Society of Illustrators has shared their thoughts:
    • “We oppose the commercial use of Artificially manufactured images and will not allow AI into our annual competitions at all levels.”
    • “AI was trained using copyrighted images. We will oppose any attempts to weaken copyright protections, as that is the cornerstone of the illustration community.”
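
To make ArtStation’s “NoAI” meta tag a bit more concrete, here is a rough sketch of how a well-behaved crawler might check a page before adding it to a training set. ArtStation’s guidance only mentions an HTML “NoAI” meta tag; the exact markup assumed here (`<meta name="robots" content="noai">`) and the function names are my assumptions, not something the site’s post spells out.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # A content value may hold several comma-separated directives.
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def page_opts_out_of_ai(html):
    """True if the page carries the (assumed) "noai" robots directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noai" in finder.directives
```

A tag like this only helps if model trainers choose to honor it, which is why the policy statements above matter as much as the markup.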

More NeRF magic: From Michelangelo to NYC

This stuff—creating 3D neural models from simple video captures—continues to blow my mind. First up is Paul Trillo visiting the David:

Then here’s AJ from the NYT doing a neat day-to-night transition:

And lastly, Hugues Bruyère used a 360º camera to capture this scene, then animate it in post (see thread for interesting details):

https://twitter.com/smallfly/status/1604609303255605251?s=20&t=jdSW1NC_n54YTxsnkkFPJQ

A cool, quick demo of Midjourney->3D

Numerous apps are promising pure text-to-geometry synthesis, as Luma AI shows here:

On a more immediately applicable front, though, artists are finding ways to create 3D (or at least “two-and-a-half-D”) imagery right from the output of apps like Midjourney. Here’s a quick demo using Blender:

In a semi-related vein, I used CapCut to animate a tongue-in-cheek self portrait from my friend Bilawal:

https://twitter.com/jnack/status/1599476677918478337?s=20&t=vu_Q7Wme3Q3Ueqp1WaGUpA

[Via Shi Yan]

Helping artists control whether AI trains on their work

I believe strongly that creative tools must honor the wishes & rights of creative people. Hopefully that sounds thuddingly obvious, but it’s been less obvious how to get to a better state than the one we now inhabit, where a lot of folks are (quite reasonably, IMHO) up in arms about AI models having been trained on their work, without their consent. People broadly agree that we need solutions, but getting to them—especially via big companies—hasn’t been quick.

Thus it’s great to see folks like Mat Dryhurst & Holly Herndon driving things forward, working with Stability.ai and others to define opt-out/-in tools & get buy-in from model trainers. Check out the news:

https://twitter.com/spawning_/status/1603126330261897217

Here’s a concise explainer vid from Mat:

DALL•E/Stable Diffusion Photoshop panel gains new features

Our friend Christian Cantrell (20-year Adobe vet, now VP of Product at Stability.ai) continues his invaluable work to plug the world of generative imaging directly into Photoshop. Check out the latest, available for free here:

More NeRF magic: Dolly zoom & beyond

It’s insane to me how much these emerging tools democratize storytelling idioms—and then take them far beyond previous limits. Recently Karen X. Cheng & co. created some wild “drone” footage simply by capturing handheld footage with a smartphone:

Now they’re creating an amazing dolly zoom effect, again using just a phone. (Click through to the thread if you’d like details on how the footage was (very simply) captured.)

Meanwhile, here’s a deeper dive on NeRF and how it’s different from “traditional” photogrammetry (e.g. in capturing reflective surfaces):

Disney demos new aging/de-aging tech

Check out the latest magic, as described by Gizmodo:

To make an age-altering AI tool that was ready for the demands of Hollywood and flexible enough to work on moving footage or shots where an actor isn’t always looking directly at the camera, Disney’s researchers, as detailed in a recently published paper, first created a database of thousands of randomly generated synthetic faces. Existing machine learning aging tools were then used to age and de-age these thousands of non-existent test subjects, and those results were then used to train a new neural network called FRAN (face re-aging network).

When FRAN is fed an input headshot, instead of generating an altered headshot, it predicts what parts of the face would be altered by age, such as the addition or removal of wrinkles, and those results are then layered over the original face as an extra channel of added visual information. This approach accurately preserves the performer’s appearance and identity, even when their head is moving, when their face is looking around, or when the lighting conditions in a shot change over time. It also allows the AI generated changes to be adjusted and tweaked by an artist, which is an important part of VFX work: making the alterations perfectly blend back into a shot so the changes are invisible to an audience.
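
The "predict the change, then layer it over the original" idea can be sketched in a few lines. This is only an illustration of the compositing step as Gizmodo describes it, not Disney’s implementation: the function name, the plain nested-list image format, and the single `opacity` knob standing in for artist control are all assumptions.

```python
def apply_age_delta(original, delta, opacity=1.0):
    """Layer a predicted per-pixel change over the original frame.

    original: rows of pixel values in [0, 1]
    delta:    rows of predicted changes in [-1, 1] (assumed network output)
    opacity:  hypothetical artist control, 0 = untouched, 1 = full effect
    """
    return [
        [min(1.0, max(0.0, pixel + opacity * change))
         for pixel, change in zip(pixel_row, delta_row)]
        for pixel_row, delta_row in zip(original, delta)
    ]


frame = [[0.5, 0.25]]
predicted_change = [[0.25, -0.5]]
full_effect = apply_age_delta(frame, predicted_change)
half_effect = apply_age_delta(frame, predicted_change, opacity=0.5)
```

Because the original pixels pass through wherever the predicted change is zero, the performer’s identity is preserved by construction, and dialing `opacity` down is one simple way an artist could soften the result.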

AI-made avatars for LinkedIn, Tinder, and more

As I say, another day, another specialized application of algorithmic fine-tuning. Per Vice:

For $19, a service called PhotoAI will use 12-20 of your mediocre, poorly-lit selfies to generate a batch of fake photos specially tailored to the style or platform of your choosing. The results speak to an AI trend that seems to regularly jump the shark: A “LinkedIn” package will generate photos of you wearing a suit or business attire…

…while the “Tinder” setting promises to make you “the best you’ve ever looked”—which apparently means making you into an algorithmically beefed-up dudebro with sunglasses. 

Meanwhile, the quality of generated faces continues to improve at a blistering pace:

Crowdsourced AI Snoop Doggs (is a real headline you can now read)

The Doggfather recently shared a picture of himself (rendered presumably via some Stable Diffusion/DreamBooth personalization instance)…

…thus inducing fans to reply with their own variations (click tweet above to see the thread). Among the many fun Snoop Doggs (or is it Snoops Dogg?), I’m partial to Cyberpunk…

…and Yodogg:

Some amazing AI->parallax animations

Great work from Guy Parsons, combining Midjourney with CapCut:

And from the replies, here’s another fun set:

Check out frame interpolation from Runway

I meant to share this one last month, but there’s just no keeping up with the pace of progress!

My initial results are on the uncanny side, but more skillful practitioners like Paul Trillo have been putting the tech to impressive use:

Happy Thanksgiving! Pass the tasty inpainting.

Among the many, many things for which I can give thanks this year, I want to express my still-gobsmacked appreciation of the academic & developer communities that have brought us this year’s revolution in generative imaging. One of those developers is our friend & Adobe veteran Christian Cantrell, and he continues to integrate new tech from his new company (Stability AI) into Photoshop at a breakneck pace. Here’s the latest:

Here he provides a quick comparison between results from the previous Stable Diffusion inpainting model (top) & the latest one:

In any event, wherever you are & however you celebrate (or don’t), I hope you’re well. Thanks for reading, and I wish all the best for the coming year!

Dalí meets DALL•E! 👨🏻‍🎨🤖

Among the great pleasures of this year’s revolutions in AI imaging has been the chance to discover & connect with myriad amazing artists & technologists. I’ve admired the work of Nathan Shipley, so I was delighted to connect him with my self-described “grand-mentee” Joanne Jang, PM for DALL•E. Nathan & his team collaborated with the Dalí Museum & OpenAI to launch Dream Tapestry, a collaborative realtime art-making experience.

The Dream Tapestry allows visitors to create original, realistic Dream Paintings from a text description. Then, it stitches a visitor’s Dream Painting together with five other visitors’ paintings, filling in the spaces between them to generate one collective Dream Tapestry. The result is an ever-growing series of entirely original Dream Tapestries, exhibited on the walls of the museum.

Check it out:

MyHeritage introduces “AI Time Machine”

Another day, another special-purpose variant of AI image generation.

A couple of years ago, MyHeritage struck a chord with the world via Deep Nostalgia, an online app that could animate the faces of one’s long-lost ancestors. In reality it could animate just about any face in a photo, but I give them tons of credit for framing the tech in a really emotionally resonant way. It offered not a random capability, but rather a magical window into one’s roots.

Now the company is licensing tech from Astria, which itself builds on Stable Diffusion & Google Research’s DreamBooth paper. Check it out:

Interestingly (perhaps only to me), it’s been hard for MyHeritage to sustain the kind of buzz generated by Deep Nostalgia. They later introduced the much more ambitious DeepStory, which lets you literally put words in your ancestors’ mouths. That seems not to have bent the overall needle in awareness, at least in the way that the earlier offering did. Let’s see how portrait generation fares.

Neural JNack has entered the chat… 🤖

Last year my friend Bilawal Singh Sidhu, a PM driving 3D experiences for Google Maps/Earth, created an amazing 3D render (also available in galactic core form) of me sitting atop the Trona Pinnacles. At that time he used “traditional” photogrammetry techniques (kind of a funny thing to say about an emerging field that remains new to the world), and this year he tried processing the same footage (consisting of a couple of simple orbits from my drone) using new Neural Radiance Field (“NeRF”) tech:

For comparison, here’s the 3D model generated via the photogrammetry approach:

The file is big enough that I’ve had some trouble loading it on my iPhone. If that affects you as well, check out this quick screen recording:

Feedback, please: AI-powered ideation & collaboration?

A new (to me, at least) group called Kive has just introduced AI Canvas.

Here’s a quick demo:

To my eye it’s similar to Prompt.ist, introduced a couple of weeks ago by Facet:

https://twitter.com/josephreisinger/status/1586042022401409024

I’m curious: Have you checked out these tools, and do you intend to put them to use in your creative processes? I have some thoughts that I can share soon, but in the meantime it’d be great to hear yours.