I… I just can’t handle it: this tech is advancing so fast, my hair is whipping back. 😅
My old teammate Yael Pritch & team have announced DreamBooth: by providing just 3-5 images of a subject, you can fine-tune a model to capture that subject, then generate variations (e.g. changing the environment and context).
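As far as I can tell the team’s code isn’t public, but here’s a rough sketch of what the resulting workflow looks like, assuming a hypothetical fine-tuned checkpoint loaded through Hugging Face diffusers; the checkpoint path and the “sks” identifier token below are placeholders (the recontextualization prompts come from the paper):

```python
# A rough sketch only: "path/to/dreambooth-checkpoint" stands in for a model
# fine-tuned on 3-5 photos of one subject, and "sks" is the rare identifier
# token bound to that subject during fine-tuning.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-checkpoint", torch_dtype=torch.float16
).to("cuda")

# Recontextualize the learned subject by varying the text around its token.
for context in ["in the Acropolis", "in a doghouse", "getting a haircut"]:
    image = pipe(f"a photo of sks dog {context}").images[0]
    image.save(f"sks_dog_{context.replace(' ', '_')}.png")
```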
The latest beta build of Photoshop contains a new feature called Photo Restoration. Whenever I’ve seen an advance in AI photo restoration over the last few years, I’ve tried the technology on an old family photo I have of my great-great-great-grandfather, a Scotsman who lived from 1845 to 1919. I applied the Photo Restoration neural filter plus Colorize in Photoshop to update the image. The restored photo is on the left, the original on the right. It’s really astonishing how advanced AI is becoming.
Learn more about accessing the feature in Photoshop here.
We ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom.
Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new “words” in the embedding space of a frozen text-to-image model. These “words” can be composed into natural language sentences, guiding personalized creation in an intuitive way.
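The core idea is to optimize a single new embedding vector while the generator stays frozen. As a rough illustration (not the authors’ code), recent builds of Hugging Face diffusers can load such learned embeddings; the “sd-concepts-library/cat-toy” repo and its “&lt;cat-toy&gt;” token are just one community-trained example:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Add one learned "word" (a single embedding vector) to the frozen model's
# text encoder, then compose it into an ordinary natural-language sentence.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipe("an oil painting of a <cat-toy> on a beach").images[0]
image.save("cat_toy_painting.png")
```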
The new open-source Stable Diffusion model is pretty darn compelling. Per PetaPixel:
“Just telling the AI something like ‘landscape photography by Marc Adamus, Glacial lake, sunset, dramatic lighting, mountains, clouds, beautiful’ gives instant pleasant looking photography-like images. It is incredible that technology has got to this point where mere words produce such wonderful images (please check the Facebook group for more).” — photographer Aurel Manea
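For anyone who wants to try reproducing that, here’s a minimal sketch using the openly released v1.4 weights via Hugging Face diffusers; the seed and guidance values are just reasonable defaults I picked, not anything from Manea’s workflow:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = ("landscape photography by Marc Adamus, Glacial lake, sunset, "
          "dramatic lighting, mountains, clouds, beautiful")
# A fixed seed makes the "mere words -> image" result reproducible.
generator = torch.Generator("cuda").manual_seed(42)
image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
image.save("adamus_style_landscape.png")
```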
What the what? From Sébastien Deguy, founder of Allegorithmic & now VP 3D & Immersive at Adobe:
I don’t communicate often about that other part of my activities, but I am so glad I could work with one of my all time favorites that I have to 🙂 My latest track, featuring Raekwon (Wu-Tang Clan) is available on all streaming platforms! Like here on Spotify.
Share your knowledge, passion and experience of Adobe Premiere Pro and After Effects with video makers.
Engage daily with communities around professional video wherever they are (Reddit, Twitter, Facebook, Instagram, etc.) in two-way conversations, representing Adobe.
Be active and visible within the community, build long-term relationships and trust, and demonstrate knowledge and understanding of users to help Adobe internally.
Build relationships with leaders in the identified communities.
Establish yourself as a leader through your work and participation in time-sensitive topics and conversations.
Answer questions and engage in discussion about Adobe products and policies with a heavy focus on newcomers to the ecosystem.
Encourage others through sharing your personal use of and experimentation with professional video tools.
Enable conversation, build content, and speak about Adobe tools to address specific audience needs.
Understand the competitive landscape and promptly report relevant findings internally.
Coordinate with other community, product, marketing and campaign teams to develop mini-engagements and activities for the community (e.g. AMAs with the product team on Reddit, or community activities and discussions via live streams, etc.)
Work closely with the broader community team, evangelism team, and product teams to provide insight and feedback to advocate for the pro video community within Adobe and to help drive product development direction.
We are teetering on the cusp of a Cambrian explosion in UI creativity, with hundreds of developers competing to put amazing controls atop a phalanx of ever-improving generative models. These next couple of months & years are gonna be wiiiiiiild.
Watching this clip from the Today Show introduction of Photoshop in 1990, it’s amazing to hear, 32 years ago, the same ethical questions we contend with now around AI-generated imagery. Also amazing: I now work with Russell’s son Davis (our designer) to explore AI imaging + Photoshop and beyond.
Ever since DALL•E hit the scene, I’ve been wanting to know what words its model for language-image pairing would use to describe images:
Now the somewhat scarily named CLIP Interrogator promises exactly that kind of insight:
What do the different OpenAI CLIP models see in an image? What might be a good text prompt to create similar images using CLIP guided diffusion or another text to image model? The CLIP Interrogator is here to get you answers!
Here’s hoping it helps us get some interesting image -> text -> image flywheels spinning.
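For the curious, usage looks to be about as simple as this sketch based on the project’s README (via the clip-interrogator pip package; the model name and image path below are illustrative):

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# Point CLIP at an image and get back a candidate text prompt that should
# produce similar images through a text-to-image model.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
image = Image.open("my_photo.jpg").convert("RGB")
print(ci.interrogate(image))
```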
Hmm—this is no doubt brilliant tech, and I’d like to learn more, but I wonder about the Venn diagram between “Objects that people want in 3D,” “Objects for which a sufficiently large number of good images exist,” and “Objects for which good human-made 3D models don’t already exist.” In my experience photogrammetry is most relevant for making models from extremely specific subjects (e.g. a particular apartment) rather than from common objects that are likely to exist on Sketchfab et al. It’s entirely possible I’m missing a nuanced application here, though. As I say, cool tech!
I wish I’d gotten to work more with Steve Seitz at Google, as I’ve long admired his wide-ranging work (from Photosynth to Face Movies to the company’s new 3D video collaboration tech). Here he provides a pretty accessible overview of how large language models (e.g. those behind DALL•E & similar systems) actually work:
Though we don’t (yet?) have the ability to use 3D meshes (e.g. those generated from a photo of a person) to guide text-based synthesis through systems like DALL•E, here’s a pretty compelling example of making 2D art, then wrapping it onto a body in real time:
Somehow I totally missed this announcement a few months back, perhaps because the device apparently isn’t compatible with my Mavic 2 Pro. I previously bought an Insta360 One R (which can split in half) plus a drone-mounting cage, but the cam proved so flaky overall that I never got around to mounting it in the cage, which was said to interfere with GPS signals anyway. In any event, this little guy looks fun:
“This emerging tech isn’t perfect yet, so we got some weird results along with ones that looked like Heinz—but that was part of the fun. We then started plugging in ketchup combination phrases like ‘impressionist painting of a ketchup bottle’ or ‘ketchup tarot card’ and the results still largely resembled Heinz. We ultimately found that no matter how we were asking, we were still seeing results that looked like Heinz.”
I mentioned Meta Research’s DALL•E-like Make-A-Scene tech when it debuted recently, but I couldn’t directly share their short overview vid. Here’s a quick look at how various artists have been putting the system to work, notably via hand-drawn cues that guide image synthesis:
Speaking of 360º vids, Stewart & Alina share a range of great points on “reframing with purpose” (serving the storytelling), plus technical details on relative sharpness (it’s much greater towards the center), color profiles, and more.