All posts by jnack

“Who Do We Want Our Customers to Become?”

As I’ve noted previously, this essay from Slack founder Stewart Butterfield is a banger. You should read the whole thing if you haven’t—or re-read it if you have—and care about building great products. In my new role exploring the crazy, sometimes scary world of AI-first creativity tools, I find myself meditating on this line:

Who Do We Want Our Customers to Become?… We want them to become relaxed, productive workers… masters of their own information and not slaves… who communicate purposively.

I want customers to be fearless explorers—to F Around & Find Out, in the spirit of Walt Whitman:

Yes, this is way outside Adobe’s comfort zone—but I didn’t come back here to be comfortable. Game on.

How-to for app developers combating misinformation

Although it’s just one piece of a large puzzle, the Content Authenticity Initiative is working to help toolmakers add content credentials that help establish the original of digital media & disclose what edits have been done to it.

If you make imaging-related tools, check out this in-depth workshop exploring Adobe’s three open-source products for adding CAI support:

Content credentials, currently integrated into Adobe Photoshop, will now be available for other products and services through three simple open-source tools. Dave Kozma, Eric Scouten, and Gavin Peacock from the CAI team will show off the new JavaScript SDK, Rust Toolkit, and Command Line Utility. CAI Lead Product Designer and C2PA UX Task Force Co-Chair Pia Blumenthal will walk through the UX guidelines, use cases, and trust signals these new tools enable.

Fun, these clone wars are

Cristóbal Valenzuela from Runway ML shared a fun example of what’s possible via video segmentation & overlaying multiple takes of a trick:

As a commenter noted, the process (shown in this project file) goes like this:

  1. Separate yourself from the background in each clip
  2. Throw away all backgrounds but one, and stack up all the clips of just you (with the success on top).

Coincidentally, I just saw Russell Brown posting a fun bonus-limbed image:

New podcast: DALL•E & You & Me

On Friday I had a ball chatting with Brian McCullough and Chris Messina on the arrival of DALL•E & other generative-imaging tech on the Techmeme Ride Home podcast. The section intro begins at 31:30, with me chiming in at 35:45 & riffing for about 45 minutes. I hope you enjoy listening as much as I enjoy talking (i.e. one heck of a lot 😅), and I’d love to know what you think.

Image credit: August Kamp + DALL•E

I’ve gathered links to some of the topics we discussed:

  • Don’t Give Your Users Shit Work. Seriously. But knowing just where to draw the line between objectively wasteful crap (e.g. tedious file format conversion) and possibly welcome labor (e.g. laborious but meditative etching) isn’t always easy. What happens when you skip the proverbial 10,000 hours of practice required to master a craft? What happens when everyone in the gym is now using a mech suit that lifts 10,000 lbs.?
  • Vemödalen: The Fear That Everything Has Already Been Done,” is demonstrated with painful hilarity via accounts like Insta Repeat. (And to make it meta, there’s my repetition of the term.) “So we beat on, boats against the current, borne back ceaselessly into the past…” Or as Marshawn Lynch might describe running through one’s face, “Over & over, and over & over & over…”
  • As Louis CK deftly noted, “Everything is amazing & nobody’s happy.”
  • The disruption always makes me think of The Onion’s classic “Dolphins Evolve Opposable Thumbs“: “Holy f*ck, that’s it for us monkeys.” My new friend August replied with the armed dolphin below. 💪👀
  • A group of thoughtful creators recently mused on “What AI art means for human artists.” Like me, many of them likened this revolution to the arrival of photography in the 19th century. It immediately devalued much of what artists had labored for years to master—yet in doing so it freed them up to interpret the world more freely (think Impressionism, Cubism, etc.).
  • Content-Aware Fill was born from the amazing PatchMatch technology (see video). We got it into Photoshop by stripping it down to just one piece (inpainting), and I foresee similar streamlined applications of the many things DALL•E-type tech can do (layout creation, style transfer, and more).
  • StyleCLIP is my team’s effort to edit faces via text by combining OpenAI’s CLIP (part of DALL•E’s magic sauce) with NVIDIA’s StyleGAN. You can try it out here.
  • Longtime generative artist Mario Klingemann used GPT-3 to coin a name for Promptomancy. I wonder how long these incantations & koans will remain central, and how quickly we’ll supplement or even supplant them with visual affordances (presets, sliders, grids, etc.).
  • O.C.-actor-turned-author Ben McKenzie wrote a book on crypto that promises to be sharp & entertaining, based on the interviews with him I’ve heard.
  • Check out the DALL•E-made 3D Lego Teslas that, at a glance, fooled longtime Pixar vet Guido Quaroni. I also love these gibberish-filled ZZ Top concert posters.
  • My grand-mentee (!) Joanne is the PM for DALL•E.
  • Bill Atkinson created MacPaint, blowing my 1984 mind with breakthroughs like flood fill. The arrival of DALL•E feels so similar.

Lightroom now supports video editing

At last…!

Per the team blog (which lists myriad other improvements):

The same edit controls that you already use to make your photography shine can now be used with your videos as well! Not only can you use Lightroom’s editing capabilities to make your video clips look their best, you can also copy and paste edit settings between photos and videos, allowing you to achieve a consistent aesthetic across both your photos and videos. Presets, including Premium Presets and Lightroom’s AI-powered Recommended Presets, can also be used with videos. Lightroom also allows you to trim off the beginning or end of a video clip to highlight the part of the video that is most important.

And here’s a fun detail:

Video: Creative — to go along with Lightroom’s fantastic new video features, these stylish and creative presets, created by Stu Maschwitz, are specially optimized to work well with videos.

I’ll share more details as I see tutorials, etc. arrive.

DALL•E text as Italian comedic gibberish

Amidst my current obsession with AI-generated images, I’ve been particularly charmed by DALL•E’s penchant for rendering some delightfully whacked-out text, as in these concert posters:

This reminded me of an old Italian novelty song meant to show non-native English speakers what the language sounds like to non-speakers. Enjoy. 😛

Just scratching the surface on generative inpainting

I’m having a ball asking the system to create illustrations, after which I can select regions and generate new variations. Click/tap if needed to play the animation below:

It’s a lot of fun for photorealistic work, too. Here I erased everything but Russell Brown’s face, then let DALL•E synthesize the rest:

And check out what it did with a pic of my wife & our friend (“two women surrounded by numerous sugar skulls and imagery from día de los muertos, in the style of salvador dalí, digital art”). 💃🏻💀👀

“The AI that creates any picture you want, explained”

Obviously I’m almost criminally obsessed with DALL•E et al. (sorry if you wanted to see my normal filler here 😌). Here’s an accessible overview of how we got here & how it all works:

The vid below gathers a lot of emerging thoughts from sharp folks like my teammate Ryan Murdock & my friend Mario Klingemann. “Maybe the currency is ideas [vs. execution]. This is a future where everyone is an art director,” says Rob Sheridan. Check it out:

[Via Dave Dobish]

“Content-Aware Fill… cubed”: DALL•E inpainting is nuts

The technology’s ability not only to synthesize new content, but to match it to context, blows my mind. Check out this thread showing the results of filling in the gap in a simple cat drawing via various prompts. Some of my favorites are below:

Also, look at what it can build out around just a small sample image plus a text prompt (a chef in a sushi restaurant); just look at it!

Making the “Gaslit” Main Titles

“Not a single keyframe of animation was set in the making of the title, created by tweaking and bending the alignment knobs of a vintage TV,” writes Anthony Vitagliano. “Instead, I shot it using a vintage Montgomery Ward ‘Airline’ Portable Television, an iPhone, and a patchwork of cables and converters in my basement.”

Check out the results:

See Anthony’s site for high-res captures of the frames.

Mobile DALL•E = My kind of location-based AR

I’ve long considered augmented reality apps to be “realtime Photoshop”—or perhaps more precisely, “realtime After Effects.” I think that’s true & wonderful, but most consumer AR tends to be ultra-confined filters that produce ~1 outcome well.

Walking around San Francisco today, it struck me today that DALL•E & other emerging generative-art tools could—if made available via a simple mobile UI—offer a new kind of (almost) realtime Photoshop, with radically greater creative flexibility.

Here I captured a nearby sculpture, dropped out the background in Photoshop, uploaded it to DALL•E, and requested “a low-polygon metallic tree surrounded by big dancing robots and small dancing robots.” I like the results!

Meet “Imagen,” Google’s new AI image synthesizer

What a time to be alive…

Hard on the heels of OpenAI revealing DALL•E 2 last month, Google has announced Imagen, promising “unprecedented photorealism × deep level of language understanding.” Unlike DALL•E, it’s not yet available via a demo, but the sample images (below) are impressive.

I’m slightly amused to see Google flexing on DALL•E by highlighting Imagen’s strengths in figuring out spatial arrangements & coherent text (places where DALL•E sometimes currently struggles). The site claims that human evaluators rate Imagen output more highly than what comes from competitors (e.g. MidJourney).

I couldn’t be more excited about these developments—most particularly to figure out how such systems can enable amazing things in concert with Adobe tools & users.

What a time to be alive

The AP joins the Content Authenticity Initiative

Adobe & partners in fighting disinformation have continued to grow their ranks, and it’s great to see the Associated Press coming on board:

With reporting from 250 locations around the world, AP is a key addition to the CAI’s mission to help consumers everywhere better understand the provenance and attribution of images and video. 

“We are pleased to join the CAI in its efforts to combat misinformation and disinformation around photojournalism,” said AP Director of Photography David Ake. “AP has worked to advance factual reporting for over 175 years. Teaming up to help ensure the authenticity of images aligns with that mission.”  

Full-stack engineers: come work with me!

We are building some rad stuff (seriously, I wish I could show you already) and would love to have you join us:

We are looking for a versatile and passionate Senior Developer to join us and help drive full stack, complex component implementation. You’ll play a key role in architectural discussions, defining solutions, and solving highly technical issues. Our team builds both cloud services (Python, and C++) and web experiences (Javascript, typescript, web Components, etc …) . The winning candidate for this high impact role requires a deep knowledge about cloud-based architectures as well as a solid CS fundamentals.

Some key responsibilities:

  • Architect efficient and reusable full-stack systems that can support several different deep learned models
  • Design, architect, and implement multiple low-latency micro-services (we mostly use JavaScript, C++, and Python)
  • Building simple, robust, and scalable platforms used by many external users
  • Work closely with UX designers, Product managers, Machine Learning engineers to develop compelling experiences
  • Take a project from scoping requirements through the actual launch

Bird scooters use Google AR to curb illegal parking

Building on yesterday’s post about Google’s new Geospatial API, developers can now embed a live view featuring a camera feed + augmentations, and developers like Bird are wasting no time in putting it to use. TNW writes,

When parking a scooter, the app prompts a rider to quickly scan the QR code on the vehicle and its surrounding area using their smartphone camera… [T]his results in precise, centimeter-level geolocation that enables the system to detect and prevent improper parking with extreme accuracy — all while helping monitor user behavior.

TechCrunch adds,

Out of the over 200 cities that Lime serves, its VPS is live now in six: London, Paris, Tel Aviv, Madrid, San Diego and Bordeaux. Similar to Bird, Lime’s pilot involves testing the tech with a portion of riders. The company said results from its pilots have been promising, with those who used the new tool seeing a 26% decrease in parking errors compared to riders who didn’t have the tool enabled. 

A deep dive on Google’s new Geospatial API

My friend Bilawal & I collaborated on AR at Google, including our efforts to build a super compact 3D engine for driving spatial annotation & navigation. We’d often talk excitedly about location-based AR experiences, especially the Landmarker functionality arriving in Snapchat. All the while he’s been busy pushing the limits of photogrammetry (including putting me in space!) to scan 3D objects.

Now I’m delighted to see him & his team unveiling the Geospatial API (see blog post, docs, and code), which enables cross-platform (iOS, Android) deployment of experiences that present both close-up & far-off augmentations. Here’s the 1-minute sizzle reel:

For a closer look, check out this interesting deep dive into what it offers & how it works:

A 3lb. wind turbine promises power on the go

Hmm—dunno whether I’d prefer carrying this little dude over just pocketing a battery pack or two—but I dig the idea & message:

Once set up on its tripod, the 3-pound, 40-watt device automatically rotates towards the wind and starts charging its 5V, 12,000 mAh battery. (Alternatively it can charge your device directly via USB.) The company says that in peak conditions, the Shine Turbine can generate enough juice to charge a smartphone in just 20 minutes.

DALL•E vs. Reality: Flavawagon Edition

Heh—I got a kick out of seeing how AI would go about hallucinating its idea of what my flamed-out ’84 Volvo wagon looked like. See below for a comparison. And in retrospect, how did I not adorn mine with a tail light made from a traffic cone (or is it giant candy corn?) and “VOOFO NACK”? 😅

ApARtment hunting

Among the Google teams working on augmented reality, there was a low-key religious war about the importance of “metric scale” (i.e. matching real-world proportions 1:1). The ARCore team believed it was essential (no surprise, given their particular tech stack), while my team (Research) believed that simply placing things in the world with a best guess as to size, then letting users adjust an object if needed, was often the better path.

I thought of this upon seeing StreetEasy’s new AR tech for apartment-hunting in NYC. At the moment it lets you scan a building to see its inventory. That’s very cool, but my mind jumped to the idea of seeing 3D representations of actual apartments (something the company already offers, albeit not in AR), and I’m amused to think of my old Manhattan place represented in AR: drawing it as a tiny box at one’s feet would be metric scale. 😅 My God that place sucked. Anyway, we’ll see how useful this tech proves & where it can go from here.

“A StreetEasy Instagram poll found that 95% of people have walked past an apartment building and wondered if it has an available unit that meets their criteria. At the same time, 77% have had trouble identifying a building’s address to search for later.”

Google acquires wearable AR display tech

I got a rude awakening a couple of years ago while working in Google’s AR group: the kind of displays that could fit into “glasses that look like glasses” (i.e. not Glass-style unicorn protuberances) had really tiny fields of view, crummy resolution, short battery life, and more. I knew that my efforts to enable cloud-raytraced Volvos & Stormtroopers & whatnot wouldn’t last long in a world that prioritized Asteroids-quality vector graphics on a display the size of a 3″x5″ index card held at arm’s length.

Having been out of that world for a year+ now, I have no inside info on how Google’s hardware efforts have been evolving, but I’m glad to see that they’re making a serious (billion-dollar+) investment in buying more compelling display tech. Per The Verge,

According to Raxium’s website, a Super AMOLED screen on your phone has a pixel pitch (the distance between the center of one pixel, and the center of another pixel next to it) of about 50 microns, while its MicroLED could manage around 3.5 microns. It also boasts of “unprecedented efficiency” that’s more than five times better than any world record.

How does any of this compare to what we’ll see out of Apple, Meta, Snap, etc.? I have no idea, but at least parts of the future promise to be fun.

New courses: Machine learning for typography & art

These Cooper Union courses sound fun:

Each week we’ll cover a different aspect of machine learning. A short lecture covering theories and practices will be followed by demoes using open source web tools and a web-browser tool called Google Colab. The last 3 weeks of class you’ll be given the chance to create your own project using the skills you’ve learned. Topics will include selecting the right model for your use case, gathering and manipulating datasets, and connecting your models to data sources such as audio, text, or numerical data. We’ll also talk a little ethics, because we can’t teach machine learning without a little ethics.

AI: Sam Harris talks with Eric Schmidt

I really enjoyed this conversation—touching, as it does, on my latest fascination (AI-generated art via DALL•E) and myriad other topics. In fact, I plan to listen to it again—hopefully this time near a surface through which to jot down & share some of the most resonant observations. Meanwhile, I think you’ll find it thoughtful & stimulating.

In this episode of the podcast, Sam Harris speaks with Eric Schmidt about the ways artificial intelligence is shifting the foundations of human knowledge and posing questions of existential risk.

A charming Route 66 doodle from Google

Last year I took my then-11yo son Henry (aka my astromech droid) on a 2000-mile “Miodyssey” down Route 66 in my dad’s vintage Miata. It was a great way to see the country (see more pics & posts than you might ever want), and despite the tight quarters we managed not to kill one another—or to get slain by Anton Chigurh in an especially murdery Texas town (but that’s another story!).

In any event, we were especially charmed to see the Goog celebrate the Mother Road in this doodle:

Outdoor Photographer reviews Adobe Super Resolution

Great to see Adobe AI getting some love:

Adobe Super Resolution technology is the best solution I’ve yet found for increasing the resolution of digital images. It doubles the linear resolution of your file, quadrupling the total pixel count while preserving fine detail. Super Resolution is available in both Adobe Camera Raw (ACR) and Lightroom and is accessed via the Enhance command. And because it’s built-in, it’s free for subscribers to the Creative Cloud Photography Plan.

Check out the whole article for details.

Snapchat rolls out Landmarker creation tools; Disney deploys them

Despite Pokemon Go’s continuing (and to me, slightly baffling) success, I’ve long been much more bullish on Snap than Niantic for location-based AR. That’s in part because of their very cool world lens tech, which they’ve been rolling out more widely. Now they’re opening up the creation flow:

“In 2019, we started with templates of 30 beloved sites around the world which creators could build upon called Landmarkers… Today, we’re launching Custom Landmarkers in Lens Studio, letting creators anchor Lenses to local places they care about to tell richer stories about their communities through AR.”

Interesting stats:

At its Lens Fest event, the company announced that 250,000 lens creators from more than 200 countries have made 2.5 million lenses that have been viewed more than 3.5 trillion times. Meanwhile, on Snapchat’s TikTok clone Spotlight, the app awarded 12,000 creators a total of $250 million for their posts. The company says that more than 65% of Spotlight submissions use one of Snapchat’s creative tools or lenses.

On a related note, Disney is now using the same core tech to enable group AR annotation of the Cinderella Castle. Seems a touch elaborate:

  • Park photographer takes your pic
  • That pic ends up in your Disney app
  • You point that app at the castle
  • You see your pic on the castle
  • You then take a pic of your pic on the castle… #YoDawg :upside_down_face:

NASA celebrates Hubble’s 32nd birthday with a lovely photo of five clustered galaxies

Honestly, from DALL•E innovations to classic mind-blowers like this, I feel like my brain is cooking in my head. 🙃 Take ‘er away, science:

Bonus madness (see thread for details):