I’m so pleased & even proud (having at least offered my encouragement to him over the years) to see my buddy Bilawal spreading his wings and sharing the good word about AI-powered creativity.
Check out his quick thoughts on “Channel-surfing realities layered on top of the real world,” “3D screenshots for the real world,” and more:
Favorite quote 😉:
“All they need to do is have a creative vision, and a Nack for working in concert with these AI models”—beautifully said, my friend! 🙏😜. pic.twitter.com/f6oUNSQXul
I’m still digging out (of email, Slack, and photos, but thankfully no longer of literal snow) following last weekend’s amazing photo adventure in Ely, NV. I need to try processing more footage via the amazing Luma app, but for now here’s a cool 3D version of the Nevada Northern Railway’s water tower, made simply by orbiting it with my drone & uploading the footage:
Last month Paul Trillo shared some wild visualizations he made by walking around Michelangelo’s David, then synthesizing 3D NeRF data. Now he’s upped the ante with captures from the Louvre:
NeRFs of the Louvre made from a handful of short iPhone videos shot during a location scout last month. Each shot reimagined over a month later. The impossibilities are endless. More to come…
Here’s an example made from a quick capture I did of my friend (nothing special, but amazing what one can get simply by walking in a circle while recording video):
As luck (?) would have it, the commercial dropped on the third anniversary of my former teammate Jon Barron & collaborators bringing NeRFs into existence:
Three years ago today, the project that eventually became NeRF started working (positional encoding was the missing piece that got us from "hmm" to "wow"). Here's a snippet of that email thread between Matt Tancik, @_pratul_, @BenMildenhall, and me. Happy birthday NeRF! pic.twitter.com/UtuQpWsOt4
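For the curious, that “positional encoding” step just means running each raw coordinate through sines and cosines at geometrically increasing frequencies before it hits the network, which is what lets a plain MLP capture fine detail. Here’s a tiny numpy sketch of the idea (the frequency count and pi scaling follow the NeRF paper; treat it as an illustration, not the team’s actual code):

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map each coordinate to sin/cos features at frequencies 2^0..2^(L-1), as in the NeRF paper.

    x: array of shape (..., D), e.g. D=3 for an xyz position.
    Returns shape (..., D * 2 * num_freqs).
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # 2^k * pi for k = 0..L-1
    scaled = x[..., None] * freqs                   # (..., D, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

# A single 3D point becomes a 60-dimensional feature vector (3 coords * 2 * 10 freqs).
print(positional_encoding(np.array([[0.1, 0.2, 0.3]])).shape)  # (1, 60)
```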
This stuff—creating 3D neural models from simple video captures—continues to blow my mind. First up is Paul Trillo visiting the David:
Finally got to see Michelangelo's David in Florence and rather than just take a photo like a normal person, I spent 20 minutes walking around it capturing every angle, looking like an insane person. It's hard to look cool when making a #NeRF but damn it looks cool later @LumaLabsAI pic.twitter.com/sLGJ2CKCJy
It’s insane to me how much these emerging tools democratize storytelling idioms—and then take them far beyond previous limits. Recently Karen X. Cheng & co. created some wild “drone” footage simply by shooting handheld with a smartphone:
Now they’re creating an amazing dolly zoom effect, again using just a phone. (Click through to the thread if you’d like details on how the footage was (very simply) captured.)
NeRF update: Dolly zoom is now possible using @LumaLabsAI. I shot this on my phone. NeRF is gonna empower so many people to get cinematic-level shots. Tutorial below –
Last year my friend Bilawal Singh Sidhu, a PM driving 3D experiences for Google Maps/Earth, created an amazing 3D render (also available in galactic core form) of me sitting atop the Trona Pinnacles. At that time he used “traditional” photogrammetry techniques (kind of a funny thing to say about a field that’s still so new to the world), and this year he tried processing the same footage (just a couple of simple orbits from my drone) using new Neural Radiance Field (“NeRF”) tech:
The power & immersiveness of rendering 3D from images are growing at an extraordinary rate. NeRF Studio promises to make creation much more approachable:
Check out this high-speed overview of recent magic courtesy of my friend Bilawal:
Photogrammetry is an art form that has been around for decades, but it’s never looked better thanks to ML techniques like Neural Radiance Fields (NeRF). This video shows a wide range of 3D captures made using this technique. And I gotta say, NeRF really breathes new life into my old photo scans! All these datasets were posed in COLMAP and trained + rendered with NVIDIA’s free Instant NGP tools.
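If you’re wondering what “posed in COLMAP” actually involves: COLMAP runs classic structure from motion (detect features in every frame, match them across frames, then solve for each camera’s position), and those recovered poses are what a NeRF trainer needs. Here’s a rough Python sketch of that pass; the paths and folder layout are placeholders, and Instant NGP ships its own conversion script for the hand-off:

```python
import subprocess
from pathlib import Path

# Placeholder layout for a folder of frames pulled from a capture; adjust to your own.
work = Path("capture")
db, images, sparse = work / "colmap.db", work / "images", work / "sparse"
sparse.mkdir(parents=True, exist_ok=True)

def colmap(*args):
    """Shell out to the COLMAP command-line tool (assumes `colmap` is on your PATH)."""
    subprocess.run(["colmap", *map(str, args)], check=True)

# 1. Detect keypoints in every frame.
colmap("feature_extractor", "--database_path", db, "--image_path", images)
# 2. Match those keypoints across frames.
colmap("exhaustive_matcher", "--database_path", db)
# 3. Solve for camera poses and a sparse point cloud (structure from motion).
colmap("mapper", "--database_path", db, "--image_path", images, "--output_path", sparse)

# The posed cameras written to `sparse/` are what a NeRF trainer such as Instant NGP
# consumes (its repo includes a script that converts COLMAP output to its own format).
```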
The visualizations for StyleNeRF tech are more than a little trippy, but the fundamental idea—that generative adversarial networks (GANs) can enable 3D control over 2D faces and other objects—is exciting. Here’s an oddly soundtracked peek:
And here’s a look at the realtime editing experience:
“This is certainly the coolest thing I’ve ever worked on, and it might be one of the coolest things I’ve ever seen.”
My Google Research colleague Jon Barron routinely makes amazing stuff, so when he gets a little breathless about a project, you know it’s something special. I’ll pass the mic to him to explain their new work around capturing multiple photos, then synthesizing a 3D model:
I’ve been collaborating with Berkeley for the last few months and we seem to have cracked neural rendering. You just train a boring (non-convolutional) neural network with five inputs (xyz position and viewing angle) and four outputs (RGB+alpha), combine it with the fundamentals of volume rendering, and get an absurdly simple algorithm that beats the state of the art in neural rendering / view synthesis by *miles*.
You can change the camera angle, change the lighting, insert objects, extract depth maps — pretty much anything you would do with a CGI model, and the renderings are basically photorealistic. It’s so simple that you can implement the entire algorithm in a few dozen lines of TensorFlow.
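For a sense of just how simple “absurdly simple” is, here’s a toy TensorFlow sketch (mine, not the team’s code) of the two pieces Jon describes: a plain MLP mapping position plus view direction to color plus density, and the volume-rendering step that composites those samples along a ray. The real NeRF adds positional encoding, hierarchical sampling, and a proper ray-marching loop:

```python
import tensorflow as tf

def make_nerf_mlp(hidden=256, depth=8):
    """A toy stand-in for the network Jon describes: 5 inputs
    (xyz position + 2D viewing direction), 4 outputs (RGB + density)."""
    return tf.keras.Sequential(
        [tf.keras.layers.Dense(hidden, activation="relu") for _ in range(depth)]
        + [tf.keras.layers.Dense(4)]  # raw RGB + density; activations applied at render time
    )

def composite_along_ray(raw_rgb, raw_sigma, deltas):
    """Classic volume rendering: alpha-composite per-sample predictions along one ray.

    raw_rgb:   (num_samples, 3) raw color outputs at points along the ray
    raw_sigma: (num_samples,)   raw density outputs at those points
    deltas:    (num_samples,)   distances between consecutive samples
    """
    alpha = 1.0 - tf.exp(-tf.nn.relu(raw_sigma) * deltas)         # opacity of each segment
    trans = tf.math.cumprod(1.0 - alpha + 1e-10, exclusive=True)  # light surviving to each sample
    weights = alpha * trans                                       # contribution of each sample
    return tf.reduce_sum(weights[:, None] * tf.sigmoid(raw_rgb), axis=0)  # final pixel color

# Query the MLP at sample points along a ray, then composite down to a pixel color.
mlp = make_nerf_mlp()
points = tf.random.uniform((64, 5))   # 64 (position, direction) samples on one ray
raw = mlp(points)                     # (64, 4)
deltas = tf.fill((64,), 0.01)         # uniform spacing between samples (toy value)
pixel = composite_along_ray(raw[:, :3], raw[:, 3], deltas)
```

The exclusive cumulative product is the transmittance term from the volume-rendering equation: each sample only contributes light that hasn’t already been absorbed by the samples in front of it.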