OMG, what is even happening?!
Every. Single. Week. There’s. A. Breakthrough.
— Suhail (@Suhail) September 29, 2022
Per the site,
The system uses images with descriptions to learn what the world looks like and how it is often described. It also uses unlabeled videos to learn how the world moves. With this data, Make-A-Video lets you bring your imagination to life by generating whimsical, one-of-a-kind videos with just a few words or lines of text.
Completely insane. DesireToKnowMoreIntensifies.gif!
Whew—no more wheedling my “grand-mentee” Joanne on behalf of colleagues wanting access. 😅
Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.
You can sign up here. Also exciting:
We are currently testing a DALL·E API with several customers and are excited to soon offer it more broadly to developers and businesses so they can build apps on this powerful system.
It’s hard to overstate just how much this groundbreaking technology has rocked our whole industry—all since publicly debuting less than 6 months ago! Congrats to the whole team. I can’t wait to see what they’re cooking up next.
Depending on how well it works, tech like this could be the greatest unlock in 3D creation the world has ever known.
The company blog post features interesting, promising details:
Though quicker than manual methods, prior 3D generative AI models were limited in the level of detail they could produce. Even recent inverse rendering methods can only generate 3D objects based on 2D images taken from various angles, requiring developers to build one 3D shape at a time.
GET3D gets its name from its ability to Generate Explicit Textured 3D meshes — meaning that the shapes it creates are in the form of a triangle mesh, like a papier-mâché model, covered with a textured material. This lets users easily import the objects into game engines, 3D modelers and film renderers — and edit them.
See also Dream Fields (mentioned previously) from Google:
Christian Cantrell + the Stability devs remain a house on fire:
— Christian Cantrell (@cantrell) September 26, 2022
Here’s a more detailed (3-minute) walk-through of this free plugin:
The Corridor Crew has been banging on Stable Diffusion & Google’s new DreamBooth tech (see previous) that enables training the model to understand a specific concept—e.g. one person’s face. Here they’ve trained it using a few photos of team member Sam Gorski, then inserted him into various genres:
From there they trained up models for various guys at the shop, then created an illustrated fantasy narrative. Just totally incredible, and their sheer exuberance makes the making-of pretty entertaining:
- Support of your own Server and Stability Cloud
- Text2Image, Inpainting (2 variations), and Image2Image
- Preview Screen
- Modifiers Library
- Working on Selection
- Tiling, Face Reconstruction, Multi Server Management and more
The Stable Diffusion-centered search engine (see a few posts back) now makes it easy to turn a real-world concept into a Stable Diffusion prompt:
This seems like precisely what I pined for publicly, albeit then about DALL•E:
Dryhurst and Herndon are developing a standard they’re calling Source+, which is designed as a way of allowing artists to opt into — or out of — allowing their work to be used as training data for AI. (The standard will cover not just visual artists, but musicians and writers, too.) They hope that AI generator developers will recognize and respect the wishes of artists whose work could be used to train such generative tools.
Source+ (now in beta) is a product of the organization Spawning… [It] also developed Have I Been Trained, a site that lets artists see if their work is among the 5.8 billion images in the LAION-5B dataset, which is used to train the Stable Diffusion and Midjourney AI generators. The team plans to add more training datasets to pore through in the future.
The creators also draw a distinction between the rights of living vs. dead creators:
The project isn’t aimed at stopping people putting, say, “A McDonalds restaurant in the style of Rembrandt” into DALL-E and gazing on the wonder produced. “Rembrandt is dead,” Dryhurst says, “and Rembrandt, you could argue, is so canonized that his work has surpassed the threshold of extreme consequence in generating in their image.” He’s more concerned about AI image generators impinging on the rights of living, mid-career artists who have developed a distinctive style of their own.
“We’re not looking to build tools for DMCA takedowns and copyright hell,” he says. “That’s not what we’re going for, and I don’t even think that would work.”
On a personal note, I’m amused to see what the system thinks constitutes “John Nack”—apparently chubby German-ish old chaps…? 🙃
More awesome work from Christian Cantrell in his free plugin. Check it out:
Lovely work from Glenn Marshall & friends:
Check out ClipDrop’s relighting app, demoed here:
The app allows you to apply professional lights to your portrait images 📸 in real time ⚡
— Onur Tasar (@onurxtasar) September 7, 2022
Fellow nerds might enjoy reading about the implementation details.
Great work from developer Christian Cantrell! I’d love to know what you think of this.
“Shoon is a recently released side-scrolling shmup,” says Vice, “that is fairly unremarkable, except for one quirk: it’s made entirely with art created by Midjourney, an AI system that generates images from text prompts written by users.” Check out the results:
Meanwhile my friend Bilawal is putting generative imaging to work in creating viral VFX:
Magdalena Bay has shared a new Felix Geen-directed video for “Dreamcatching.” The clip explores multiple dimensions through cutting-edge AI technology and GAN artwork, created with VQGAN+CLIP, a technique that utilizes a collection of neural networks working in unison to generate images based on input text and/or images.
Judi Dench fighting a centaur on the moon!
Happy Friday. 😅
Let the canvases extend in every direction! The thoughtfully designed new tiling UI makes it easy to synthesize adjacent chunks in sequence, partly overcoming current resolution limits in generative imaging:
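For the technically curious, the trick behind this kind of tiling is to grow the canvas one tile at a time, feeding an overlapping strip of the existing image back in as inpainting context. Here’s a minimal sketch using the Hugging Face diffusers inpainting pipeline (the checkpoint name, 512px tile size, and overlap width are my own illustrative choices, not details of the plugin itself):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Assumption: the runwayml inpainting checkpoint; any SD inpainting model works.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def extend_right(canvas: Image.Image, prompt: str, overlap: int = 128) -> Image.Image:
    """Grow a 512px-tall canvas 512px to the right, reusing `overlap` px as context."""
    tile = Image.new("RGB", (512, 512))
    # Copy the rightmost strip of the existing canvas into the new tile...
    strip = canvas.crop((canvas.width - overlap, 0, canvas.width, 512))
    tile.paste(strip, (0, 0))
    # ...and mask everything else (white = "paint here") for the model to fill.
    mask = Image.new("L", (512, 512), 255)
    mask.paste(0, (0, 0, overlap, 512))
    filled = pipe(prompt=prompt, image=tile, mask_image=mask).images[0]
    # Stitch the newly synthesized tile onto the growing canvas.
    out = Image.new("RGB", (canvas.width + 512 - overlap, 512))
    out.paste(canvas, (0, 0))
    out.paste(filled, (canvas.width - overlap, 0))
    return out
```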
Here’s a nice little demo from our designer Davis Brown, who takes his dad Russell’s surreal desert explorations to totally new levels:
Amazing work from the always clever Karen X. Cheng, collaborating with Paul Trillo & others:
Speaking of Paul, here’s a fun new little VFX creation made using DALL•E:
AI is going to change VFX. This is a silly little experiment but it shows how powerful dall-e 2 is in generating elements into a pre-existing video. These tools will become easier to use so when spectacle becomes cheap, ideas will prevail. #aiart #dalle #ufo @openaidalle #dalle2 pic.twitter.com/XGHy9uY09H
— Paul Trillo (@paultrillo) August 30, 2022
I… I just can’t handle it: this tech is advancing so fast, my hair is whipping back. 😅
My old teammate Yael Pritch & team have announced DreamBooth: by providing 3-5 images of a subject, you can fine-tune a model of that subject, then generate variations (e.g. changing the environment and context).
The creative possibilities are just bonkers.
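For those wondering what “fine-tune a model of that subject” actually means: per the paper, DreamBooth trains the whole diffusion model on the handful of subject photos, captioned with a rare identifier token, while a “prior preservation” loss on generic class images keeps the model from forgetting what the class looks like. A conceptual sketch of that objective (the unet, conditioning, and data plumbing below are placeholders, not the team’s actual code):

```python
import torch
import torch.nn.functional as F

# Conceptual sketch of the DreamBooth objective. `unet` is the denoising
# network; each batch supplies pre-noised latents, the target noise,
# timesteps, and text conditioning (all placeholder plumbing).
def dreambooth_step(unet, subject_batch, prior_batch, lambda_prior=1.0):
    # 1) Denoising loss on the 3-5 subject photos, captioned with a rare
    #    identifier token, e.g. "a photo of [V] dog".
    noisy, noise, t, cond = subject_batch
    loss = F.mse_loss(unet(noisy, t, cond), noise)
    # 2) Prior-preservation loss on generic class images ("a photo of a dog"),
    #    generated by the original model, so the class concept isn't forgotten.
    p_noisy, p_noise, p_t, p_cond = prior_batch
    loss += lambda_prior * F.mse_loss(unet(p_noisy, p_t, p_cond), p_noise)
    return loss  # backprop updates the whole (unfrozen) diffusion model
```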
Eng manager Barry Young writes,
The latest beta build of Photoshop contains a new feature called Photo Restoration. Whenever I have seen new updates in AI photo restoration over the last few years, I have tried the technology on an old family photo that I have of my great-great-great-grandfather, a Scotsman who lived from 1845 to 1919. I applied the neural filter plus the colorize technique to update the image in Photoshop. The restored photo is on the left, the original on the right. It is really astonishing how advanced AI is becoming.
Learn more about accessing the feature in Photoshop here.
I don’t know much about these folks, but I’m excited to see that they’re working to integrate Stable Diffusion into Photoshop:
You can add your name to the waitlist via their site. Meanwhile here’s another exploration of SD + Photoshop:
See, isn’t that a more seductive title than “Personalizing Text-to-Image Generation using Textual Inversion”? 😌 But the so-titled paper seems really important in helping generative models like DALL•E to become much more precise. The team writes:
We ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom.
Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way.
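The mechanics are refreshingly simple: the entire text-to-image model stays frozen, and the only thing being optimized is a single new token embedding. A conceptual sketch in PyTorch (the diffusion loss and data loader are placeholders for the real Stable Diffusion / latent-diffusion plumbing):

```python
import torch

# Conceptual sketch of textual inversion: the generator and text encoder stay
# frozen; only one new token embedding is learned. `diffusion_loss` and `data`
# are placeholders, not a real library API.
embedding_dim = 768                                         # e.g. CLIP text width
new_word = torch.randn(embedding_dim, requires_grad=True)   # the new "word"
optimizer = torch.optim.AdamW([new_word], lr=5e-3)

for image, prompt in data:   # 3-5 photos with templates like "a photo of S*"
    # Splice `new_word` into the prompt's embedding sequence where the
    # placeholder token S* appears, then score the usual denoising loss.
    loss = diffusion_loss(image, prompt, new_word)
    optimizer.zero_grad()
    loss.backward()          # gradients flow only into the single embedding
    optimizer.step()
```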
Check out the kind of thing it yields:
[Update: Seems that much of this may be fake. :-\ Still, the fact that it’s remotely plausible is nuts!]
Good lord (and poor Conan!). This creator used:
- DALL•E to create hundreds of similar-looking images of a face
- Create Skeleton to convert them into a 3D model
- DeepMotion.com to generate 3D body animation
- DeepFaceLab to generate facial animation
- Audio tools to deepen & distort her voice, creating a new one
The new open-source Stable Diffusion model is pretty darn compelling. Per PetaPixel:
“Just telling the AI something like ‘landscape photography by Marc Adamus, Glacial lake, sunset, dramatic lighting, mountains, clouds, beautiful’ gives instant pleasant looking photography-like images. It is incredible that technology has got to this point where mere words produce such wonderful images (please check the Facebook group for more).” — photographer Aurel Manea
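If you want to try Aurel’s experiment yourself, the open-source release means a few lines of Python suffice. A minimal sketch assuming the Hugging Face diffusers library, the public CompVis v1.4 checkpoint, and a GPU:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the publicly released Stable Diffusion v1.4 weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = ("landscape photography by Marc Adamus, Glacial lake, sunset, "
          "dramatic lighting, mountains, clouds, beautiful")
# guidance_scale: higher values follow the text more literally.
image = pipe(prompt, guidance_scale=7.5).images[0]
image.save("glacial_lake.png")
```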
Many years ago (nearly 10!), when I was in the thick of making up bedtime stories every night, I wished aloud for an app that would help do the following:
- Record you telling your kids bedtime stories (maybe after prompting you just before bedtime)
- Transcribe the text
- Organize the sound & text files (into a book, journal, and/or timeline layout)
- Add photos, illustrations, and links
- Share from the journal to a blog, Tumblr, etc.
I was never in a position to build it, but seeing this fusion of kid art + AI makes me hope again:
With #stablediffusion img2img, I can help bring my 4yr old’s sketches to life.
— PH AI (@fofrAI) August 23, 2022
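For fellow tinkerers: img2img takes the kid’s sketch as a starting point and lets the model redraw it under a text prompt. A minimal sketch assuming Hugging Face diffusers (the file names and prompt are my own examples; note that older diffusers releases call the image argument `init_image`):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

sketch = Image.open("kid_drawing.png").convert("RGB").resize((512, 512))
# `strength` controls how far the model may wander from the child's sketch:
# low values keep the composition, high values reimagine it.
image = pipe(prompt="a friendly dragon, storybook illustration",
             image=sketch, strength=0.6, guidance_scale=7.5).images[0]
image.save("dragon.png")
```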
So here’s my tweet-length PRD:
- Record parents’/kids’ voices.
- Transcribe as a journal.
- Enable scribbling.
- Synthesize images on demand.
On behalf of parents & caregivers everywhere, come on, world—LFG! 😛
Malick Lombion & friends used “more than 1,200 AI-generated art pieces combined with around 1,400 photographs” to create this trippy tour:
Elsewhere, After Effects ninja Paul Trillo is back at it with some amazing video-meets-DALL•E-inpainting work:
I’m eager to see all the ways people might combine generation & fashion—e.g. pre-rendering fabric for this kind of use in AR:
Happy Monday. 😌
[Via Dave Dobish]
We are teetering on the cusp of a Cambrian explosion in UI creativity, with hundreds of developers competing to put amazing controls atop a phalanx of ever-improving generative models. These next couple of months & years are gonna be wiiiiiiild.
Watching this clip from the Today Show’s 1990 introduction of Photoshop, it’s amazing to hear people raising, 32 years ago, the same ethical questions we contend with now around AI-generated imagery. Also amazing: I now work with Russell‘s son Davis (our designer) to explore AI imaging + Photoshop and beyond.
Ever since DALL•E hit the scene, I’ve been wanting to know what words its model for language-image pairing would use to describe images:
Now the somewhat scarily named CLIP Interrogator promises exactly that kind of insight:
What do the different OpenAI CLIP models see in an image? What might be a good text prompt to create similar images using CLIP guided diffusion or another text to image model? The CLIP Interrogator is here to get you answers!
Here’s hoping it helps us get some interesting image -> text -> image flywheels spinning.
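The underlying move is easy to sketch: embed the image and a pile of candidate phrases with CLIP, then rank the phrases by cosine similarity. A minimal version using OpenAI’s open-source CLIP package (the candidate list is my own toy example; the real Interrogator ranks huge vocabularies of artists, media, and modifiers):

```python
import clip  # OpenAI's CLIP: https://github.com/openai/CLIP
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("mystery.png")).unsqueeze(0).to(device)
candidates = ["a watercolor landscape", "a studio portrait",
              "a sci-fi cityscape at night", "macro photo of an insect"]
text = clip.tokenize(candidates).to(device)

with torch.no_grad():
    # Embed both modalities, normalize, and compare via cosine similarity.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    sims = (image_features @ text_features.T).squeeze(0)

for phrase, score in sorted(zip(candidates, sims.tolist()),
                            key=lambda p: -p[1]):
    print(f"{score:.3f}  {phrase}")
```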
More wildly impressive inpainting & animation from Paul Trillo:
Just fully bonkers—and I’ve gotta think there’s a lot more to come!
Hmm—this is no doubt brilliant tech, and I’d like to learn more, but I wonder about the Venn diagram between “Objects that people want in 3D,” “Objects for which a sufficiently large number of good images exist,” and “Objects for which good human-made 3D models don’t already exist.” In my experience photogrammetry is most relevant for making models from extremely specific subjects (e.g. a particular apartment) rather than from common objects that are likely to exist on Sketchfab et al. It’s entirely possible I’m missing a nuanced application here, though. As I say, cool tech!
I wish I’d gotten to work more with Steve Seitz at Google, as I’ve long admired his wide-ranging work (from Photosynth to Face Movies to the company’s new 3D video collaboration tech). Here he provides a pretty accessible overview of how large language models (e.g. those behind DALL•E & similar systems) actually work:
Though we don’t (yet?) have the ability to use 3D meshes (e.g. those generated from a photo of a person) to guide text-based synthesis through systems like DALL•E, here’s a pretty compelling example of making 2D art, then wrapping it onto a body in real time:
— Maxiм (@maximkuzlin) August 3, 2022
“This emerging tech isn’t perfect yet, so we got some weird results along with ones that looked like Heinz—but that was part of the fun. We then started plugging in ketchup combination phrases like ‘impressionist painting of a ketchup bottle’ or ‘ketchup tarot card’ and the results still largely resembled Heinz. We ultimately found that no matter how we were asking, we were still seeing results that looked like Heinz.”
Pass the Kemp!
[Via Aaron Hertzmann]
Creator Paul Trillo (see previous) is back at it. Here’s new work + a peek into how it’s made:
I mentioned Meta Research’s DALL•E-like Make-A-Scene tech when it debuted recently, but I couldn’t directly share their short overview vid. Here’s a quick look at how various artists have been putting the system to work, notably via hand-drawn cues that guide image synthesis:
GANs (generative adversarial networks), like the one underpinning Smart Portrait in Photoshop, promise all kinds of fine-grained image synthesis and editing. Check out new advances around one’s ‘do:
[Via Davis Brown]
AI animation tech, which in this case leverages the motion of a face in a video to animate a different face in a still image, keeps getting better & better. Check out these results from Samsung Labs:
When it rains, it pours: Text2LIVE promises the ability to use descriptions to modify parts of photos:
[O]ur goal is to edit the appearance of existing objects (e.g., object’s texture) or augment the scene with new visual effects (e.g., smoke, fire) in a semantically meaningful manner.
It also works on video:
This new tech from Meta looks promising.
The team writes,
We found that the image generated from both text and sketch was almost always (99.54 percent of the time) rated as better aligned with the original sketch. It was often (66.3 percent of the time) more aligned with the text prompt too. This demonstrates that Make-A-Scene generations are indeed faithful to a person’s vision communicated via the sketch.
Nicely done; can’t wait to see more experiences like this.
It’s cool & commendable to see OpenAI making improvements in the tricky area of increasing representation & diversity among the humans it depicts. From email they sent today:
DALL·E now generates images of people that more accurately reflect the diversity of the world’s population. Thank you to everyone who has marked results as biased in our product; your feedback helped inform and evaluate this new mitigation, which we plan on refining as we gather more data and feedback.
People have been noticing & sharing examples, e.g. via this Reddit thread.
[Update: See their blog post for more details & examples.]
Neil Leach, author of Architecture in the Age of Artificial Intelligence: An Introduction to AI for Architects, here shares his enthusiastic thoughts about emerging tools becoming “a prosthesis for the human imagination” (recalling Steve Jobs describing the computer as “a bicycle for the mind”).
Diffusion models, such as Midjourney, are going to be game changers that will change the way in which we operate. Consulting these models for inspiration in the design studio will become as common as using Google or Wikipedia when writing an essay. Importantly, however, we must recognise that there are other forms of AI that will be deployed in architectural design, that will look at other aspects of design, such as performance. For the moment they operate as a form of ‘extended intelligence’ – or as an extension to the human imagination – where the designer remains in charge. Eventually, however, we can expect these all to be incorporated on to a single ‘data to fabrication’ platform that will allow building designs to be generated completely autonomously.