Speaking of Bilawal, and in the vein of the PetPortrait.ai service I mentioned last week, here’s a fun little video in which he’s trained an AI model to create images of his mom’s dog. “Oreo lookin’ FESTIVE in that sweater, yo!” 🥰 I can only imagine that this kind of thing will become mainstream quickly.
Last year my friend Bilawal Singh Sidhu, a PM driving 3D experiences for Google Maps/Earth, created an amazing 3D render (also available in galactic core form) of me sitting atop the Trona Pinnacles. At that time he used “traditional” photogrammetry techniques (kind of a funny thing to say about an emerging field that remains new to the world), and this year he tried processing the same footage (just a couple of simple orbits from my drone) using new Neural Radiance Field (“NeRF”) tech:
I’m curious: Have you checked out these tools, and do you intend to put them to use in your creative processes? I have some thoughts that I can share soon, but in the meantime it’d be great to hear yours.
I’m not sure whom to credit with this impressive work (found here), nor how exactly they made it, but—like the bespoke pet portraits site I shared yesterday—I expect to see an explosion in such purpose-oriented applications of AI imaging:
We’re at just the start of what I expect to be an explosion of hyper-specific offerings powered by AI.
For $24, PetPortrait.ai offers “40 high resolution, beautiful, one-of-a-kind portraits of your pets in a variety of styles.” They say it takes 4-6 hours and requires the following input:
~10 portrait photos of their face
~5 photos from different angles of their head and chest
~5 full-body photos
It’ll be interesting to see what kind of traction this gets. The service Turn Me Royal provides human-made portraits in a similar vein, and we delighted our son by commissioning this doge-as-Venetian-doge portrait (via an artist on Etsy) a couple of years ago:
A few weeks ago I shared info on Google’s “Infinite Nature” tech for generating eye-popping fly-throughs from still images. Now that team has shared various interesting tech details on how it all works. And if reading all that isn’t your bag, hey, at least enjoy some beautiful results:
I’m not working on such efforts & am not making an explicit link between the two—but broadly speaking, I find the intersection of such primitives/techniques to be really promising.
He notes, “Custom, fine-tuned models are absolutely game-changing, and in the future will almost certainly represent the majority of diffusion-based creativity.” 👀 Seems like a non-trivial statement coming from the new VP of product at Stability.ai.
I’ve tried it & it’s pretty slick. These guys are cooking with gas! (Also, how utterly insane would this have been to see even six months ago?! What a year, what a world.)
Introducing Infinite Image
Extend any image to infinite possibilities using a text description. A limitless canvas of creativity.
Christian has trained a model on Rivians & says (ambitiously, but not without some justification) that “This is how all advertising and marketing collateral will be made sooner than most of the world realizes.”
On a related note, here’s a thread (from an engineer at Shopify) on fine-tuning models to generate images of specific products (showing strengths/limitations).
I see numerous custom models emerging that enable creation of art in the style of Spider-Man, Pixar, and more.
OMG—interactive 3D shadow casting in 2D photos FTW! 🔥
In this sneak, we re-imagine what image editing would look like if we used Adobe Sensei-powered technologies to understand the 3D space of a scene – the geometry of a road and the car on the road, and the trees surrounding, the lighting coming from the sun and the sky, the interactions between all these objects leading to occlusions and shadows – from a single 2D photograph.
One of the sleeper features that debuted at Adobe MAX is the new Create Background, found under Neural Filters. (Note that you need to be running the current public beta release of Photoshop, available via the Creative Cloud app—y’know, that little “Cc” icon dealio you ignore in your menu bar. 🙃)
As this quick vid demonstrates, the filter not only generates backgrounds based on text, it also links to a Behance gallery containing images and popular prompts. You can use these visuals as inspiration, then use the prompts to produce artwork within the plugin:
I’m really excited to learn more about this development, which I’ve been eagerly awaiting. More control + more speed will make generative imaging truly, broadly useful. I’d like to understand how it compares to techniques like prompt editing.
Generative AI incorporated into Adobe Express will help less experienced creators achieve their unique goals. Rather than having to find a pre-made template to start a project with, Express users could generate a template through a prompt, and use Generative AI to add an object to the scene, or create a unique font based on their description. But they still will have full control — they can use all of the Adobe Express tools for editing images, changing colors, and adding fonts to create the flyer, poster, or social media post they imagine.
It seems almost too good to be true, but Google Researchers & their university collaborators have unveiled a way to edit images using just text:
In this paper we demonstrate, for the very first time, the ability to apply complex (e.g., non-rigid) text-guided semantic edits to a single real image. For example, we can change the posture and composition of one or multiple objects inside an image, while preserving its original characteristics. Our method can make a standing dog sit down or jump, cause a bird to spread its wings, etc. — each within its single high-resolution natural image provided by the user.
Contrary to previous work, our proposed method requires only a single input image and a target text (the desired edit). It operates on real images, and does not require any additional inputs (such as image masks or additional views of the object).
Easy placement/movement of 3D primitives -> realistic/illustrative rendering has long struck me as extremely promising. Using tech like StyleGAN to render from 3D can produce interesting results, but it’s been difficult to bring the level of quality & consistency up to what Adobe users demand.
Now with Stable Diffusion (and, one hopes, other diffusion models in the future) attached to Blender (and, one hopes, other object manipulation tools), the vision is getting closer to reality:
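For the curious, the core loop here is simple img2img: render a rough scene, then let the diffusion model reinterpret it. Here’s a minimal sketch (not the Blender add-on’s actual code; it assumes the Hugging Face diffusers library, a standard Stable Diffusion checkpoint, and a hypothetical viewport render exported as a PNG):

```python
# Minimal img2img sketch: feed a rough Blender render to Stable Diffusion.
# Assumes the Hugging Face `diffusers` library; not the add-on's own code.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any Stable Diffusion checkpoint works
    torch_dtype=torch.float16,
).to("cuda")

# A quick viewport render exported from Blender (hypothetical filename).
render = Image.open("blender_viewport.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a photorealistic sports car on a coastal road at golden hour",
    image=render,
    strength=0.6,        # how far the model may drift from the input render
    guidance_scale=7.5,  # how strongly to follow the text prompt
).images[0]

result.save("rendered_from_primitives.png")
```

Lower the strength and the output hews closely to the placed primitives; raise it and the model takes more creative liberty.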
The power & immersiveness of rendering 3D from images is growing at an extraordinary rate. NeRF Studio promises to make creation much more approachable:
Easily my favorite thing at Google was getting to work with stone-cold geniuses like Noah Snavely (one of the minds behind Microsoft’s PhotoSynth) and Richard Tucker. Now they & their teammates have produced some jaw-dropping image synthesis tech:
And “hold onto your papers,” as here’s a look into how it all works:
The system uses images with descriptions to learn what the world looks like and how it is often described. It also uses unlabeled videos to learn how the world moves. With this data, Make-A-Video lets you bring your imagination to life by generating whimsical, one-of-a-kind videos with just a few words or lines of text.
Whew—no more wheedling my “grand-mentee” Joanne on behalf of colleagues wanting access. 😅
Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.
We are currently testing a DALL·E API with several customers and are excited to soon offer it more broadly to developers and businesses so they can build apps on this powerful system.
It’s hard to overstate just how much this groundbreaking technology has rocked our whole industry—all since publicly debuting less than 6 months ago! Congrats to the whole team. I can’t wait to see what they’re cooking up next.
Depending on how well it works, tech like this could be the greatest unlock in 3D creation the world has ever known.
The company blog post features interesting, promising details:
Though quicker than manual methods, prior 3D generative AI models were limited in the level of detail they could produce. Even recent inverse rendering methods can only generate 3D objects based on 2D images taken from various angles, requiring developers to build one 3D shape at a time.
GET3D can instead churn out some 20 shapes a second when running inference on a single NVIDIA GPU — working like a generative adversarial network for 2D images, while generating 3D objects. […]
GET3D gets its name from its ability to Generate Explicit Textured 3D meshes — meaning that the shapes it creates are in the form of a triangle mesh, like a papier-mâché model, covered with a textured material. This lets users easily import the objects into game engines, 3D modelers and film renderers — and edit them.
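That last point is the practical payoff: because the output is an ordinary triangle mesh with a texture, it drops straight into existing tooling. A tiny illustrative sketch (not NVIDIA’s code; it uses the Python trimesh library, and the filenames are hypothetical):

```python
# Inspect a generated textured mesh and convert it for a game engine.
# Uses the `trimesh` library; "generated_chair.obj" is a hypothetical GET3D-style output.
import trimesh

mesh = trimesh.load("generated_chair.obj", force="mesh")  # triangle mesh + texture
print(f"{len(mesh.vertices)} vertices, {len(mesh.faces)} triangles")

# Ordinary mesh edits work out of the box...
mesh.apply_scale(0.01)  # e.g. rescale centimeters to meters

# ...and export to glTF for import into Unity, Unreal, or Blender.
mesh.export("generated_chair.glb")
```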
See also Dream Fields (mentioned previously) from Google:
The Corridor Crew has been banging on Stable Diffusion & Google’s new DreamBooth tech (see previous) that enables training the model to understand a specific concept—e.g. one person’s face. Here they’ve trained it using a few photos of team member Sam Gorski, then inserted him into various genres:
From there they trained up models for various guys at the shop, then created an illustrated fantasy narrative. Just totally incredible, and their sheer exuberance makes the making-of pretty entertaining:
I’m really excited to see this work from artists Holly Herndon & Mat Dryhurst. From Input Mag:
Dryhurst and Herndon are developing a standard they’re calling Source+, which is designed as a way of allowing artists to opt into — or out of — allowing their work to be used as training data for AI. (The standard will cover not just visual artists, but musicians and writers, too.) They hope that AI generator developers will recognize and respect the wishes of artists whose work could be used to train such generative tools.
Source+ (now in beta) is a product of the organization Spawning… [It] also developed Have I Been Trained, a site that lets artists see if their work is among the 5.8 billion images in the Laion-5b dataset, which is used to train the Stable Diffusion and MidJourney AI generators. The team plans to add more training datasets to pore through in the future.
The creators also draw a distinction between the rights of living vs. dead creators:
The project isn’t aimed at stopping people putting, say, “A McDonalds restaurant in the style of Rembrandt” into DALL-E and gazing on the wonder produced. “Rembrandt is dead,” Dryhurst says, “and Rembrandt, you could argue, is so canonized that his work has surpassed the threshold of extreme consequence in generating in their image.” He’s more concerned about AI image generators impinging on the rights of living, mid-career artists who have developed a distinctive style of their own.
And lastly,
“We’re not looking to build tools for DMCA takedowns and copyright hell,” he says. “That’s not what we’re going for, and I don’t even think that would work.”
On a personal note, I’m amused to see what the system thinks constitutes “John Nack”—apparently chubby German-ish old chaps…? 🙃
The makers of this new search engine say they’re already serving more than 200,000 images/day & growing rapidly. Per this article, “It’s a massive collection of over 5 million Stable Diffusion images including its text prompts.” Just get ready to see some… interesting art (?). 🙃
'Consonance' is my project that explores how AI interprets the spoken word. This is an excerpt from a James Joyce novel. I use his words exactly for the prompt, in the style of artist John Lavery. Funded by @futurescreensni A collaboration with @HeaneyCentre + Armchair & Rocket pic.twitter.com/8sGgipjZeb
Karen X. Cheng & pals (including my friend August Kamp) went to work extending famous works by Vermeer, Da Vinci, and Magritte, then placing them into an AR filter (which you can launch from the post) that lets you walk right into the scenes. Wild!
“Shoon is a recently released side-scrolling shmup,” says Vice, “that is fairly unremarkable, except for one quirk: it’s made entirely with art created by Midjourney, an AI system that generates images from text prompts written by users.” Check out the results:
Magdalena Bay has shared a new Felix Geen-directed video for “Dreamcatching.” The clip explores multiple dimensions through cutting-edge AI technology and GAN artwork created with VQGAN+CLIP, a technique that uses a collection of neural networks working in unison to generate images based on input text and/or images.
Creative director Wes Phelan shared this charming little summary of how he creates kids’ books & games using DALL·E, including its newly launched outpainting support:
Let the canvases extend in every direction! The thoughtfully designed new tiling UI makes it easy to synthesize adjacent chunks in sequence, partly overcoming current resolution limits in generative imaging:
We just released a new edit interface for DALL·E that lets you use Outpainting to expand beyond the original borders of an image!
You can use this to make images with different aspect ratios, or arbitrarily large images like murals or magazine covers. pic.twitter.com/OW4lC6HQFl
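Conceptually, outpainting is simple: place the existing image on a larger, partly transparent canvas and ask the model to synthesize the empty region, one tile at a time. Here’s a rough sketch of a single step (assuming access to the DALL·E image-edit endpoint, which was still in limited API testing at the time, plus hypothetical filenames):

```python
# One outpainting step: pad the original onto a transparent canvas,
# then let the model fill in the empty (transparent) area.
# Assumes the `openai` library with OPENAI_API_KEY set; filenames are hypothetical.
import openai
from PIL import Image

original = Image.open("original.png").convert("RGBA")  # e.g. a 512x512 source image

# Place the original along the left edge of a wider transparent canvas.
canvas = Image.new("RGBA", (1024, 1024), (0, 0, 0, 0))
canvas.paste(original, (0, 256))
canvas.save("padded.png")

# Transparent pixels tell the edit endpoint where to generate new content.
response = openai.Image.create_edit(
    image=open("padded.png", "rb"),
    prompt="a sweeping landscape mural continuing the original scene",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])
```

Repeat with overlapping tiles and you can keep walking the canvas outward, which is essentially what the new UI automates.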
Speaking of Paul, here’s a fun new little VFX creation made using DALL·E:
AI is going to change VFX. This is a silly little experiment but it shows how powerful dall-e 2 is in generating elements into a pre existing video. These tools will become easier to use so when spectacle becomes cheap, ideas will prevail #aiart #dalle #ufo @openaidalle #dalle2 pic.twitter.com/XGHy9uY09H
I… I just can’t handle it: this tech is advancing so fast, my hair is whipping back. 😅
My old teammate Yael Pritch & team have announced DreamBooth: by providing 3-5 images of a subject, you can fine-tune a model to learn that subject, then generate variations (e.g. changing the environment and context).
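To make that concrete, here’s a minimal inference sketch of what the fine-tuned model buys you (assumptions: a checkpoint already trained with DreamBooth, e.g. via the Hugging Face diffusers library; a hypothetical local path; and the community convention of binding the subject to a rare placeholder token like “sks”):

```python
# Generate variations of a DreamBooth-learned subject in new contexts.
# Assumes a hypothetical local checkpoint fine-tuned on 3-5 photos of the subject,
# with the subject bound to the placeholder token "sks".
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-finetuned-model",  # hypothetical path to the fine-tuned checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompts = [
    "a photo of sks dog wearing a festive sweater",
    "a photo of sks dog on a beach at sunset",
    "an oil painting of sks dog in the style of Rembrandt",
]
for i, prompt in enumerate(prompts):
    pipe(prompt).images[0].save(f"variation_{i}.png")
```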