I’m still digging out (of email, Slack, and photos, but thankfully no longer of literal snow) following last weekend’s amazing photo adventure in Ely, NV. I need to try processing more footage via the amazing Luma app, but for now here’s a cool 3D version of the Nevada Northern Railway’s water tower, made simply by orbiting it with my drone & uploading the footage:
It’s insane to me how much these emerging tools democratize storytelling idioms—and then take them far beyond previous limits. Recently Karen X. Cheng & co. created some wild “drone” footage simply by capturing handheld footage with a smartphone:
Now they’re creating an amazing dolly zoom effect, again using just a phone. (Click through to the thread if you’d like details on how the footage was (very simply) captured.)
Meanwhile, here’s a deeper dive on NeRF and how it’s different from “traditional” photogrammetry (e.g. in capturing reflective surfaces):
Last year my friend Bilawal Singh Sidhu, a PM driving 3D experiences for Google Maps/Earth, created an amazing 3D render (also available in galactic core form) of me sitting atop the Trona Pinnacles. At that time he used “traditional” photogrammetry techniques (kind of a funny thing to say about an emerging field that remains new to the world), and this year he tried processing the same footage (consisting of a couple of simple orbits from my drone) using new Neural Radiance Field (“NeRF”) tech:
For comparison, here’s the 3D model generated via the photogrammetry approach:
Check out this high-speed overview of recent magic courtesy of my friend Bilawal:
Photogrammetry is an art form that has been around for decades, but it’s never looked better thanks to ML techniques like Neural Radiance Fields (NeRF). This video shows a wide range of 3D captures made using this technique. And I gotta say, NeRF really breathes new life into my old photo scans! All these datasets were posed in COLMAP and trained + rendered with NVIDIA’s free Instant NGP tools.
The visualizations for StyleNeRF tech are more than a little trippy, but the fundamental idea—that generative adversarial networks (GANs) can enable 3D control over 2D faces and other objects—is exciting. Here’s an oddly soundtracked peek:
And here’s a look at the realtime editing experience:
“This is certainly the coolest thing I’ve ever worked on, and it might be one of the coolest things I’ve ever seen.”
My Google Research colleague Jon Barron routinely makes amazing stuff, so when he gets a little breathless about a project, you know it’s something special. I’ll pass the mic to him to explain their new work around capturing multiple photos, then synthesizing a 3D model:
I’ve been collaborating with Berkeley for the last few months and we seem to have cracked neural rendering. You just train a boring (non-convolutional) neural network with five inputs (xyz position and viewing angle) and four outputs (RGB+alpha), combine it with the fundamentals of volume rendering, and get an absurdly simple algorithm that beats the state of the art in neural rendering / view synthesis by *miles*.
You can change the camera angle, change the lighting, insert objects, extract depth maps — pretty much anything you would do with a CGI model, and the renderings are basically photorealistic. It’s so simple that you can implement the entire algorithm in a few dozen lines of TensorFlow.
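For the curious, here’s a rough sketch of the “fundamentals of volume rendering” part Jon mentions: march along a camera ray, ask the network for a color and an opacity at each sample, and blend front to back. To be clear, this is my own toy illustration and not the team’s code. `toy_field` is a made-up stand-in for the trained network (it just describes a red sphere), `render_ray` and its parameters are hypothetical names, and I’m assuming the common formulation in which the fourth output is a density that gets converted into a per-step alpha.

```python
import math


def toy_field(x, y, z, theta, phi):
    """Hypothetical stand-in for the trained network: 5 inputs -> (r, g, b, density)."""
    # A solid red unit sphere at the origin. The viewing angle is ignored here;
    # a real model would use it for view-dependent effects such as reflections.
    inside = x * x + y * y + z * z <= 1.0
    return (1.0, 0.0, 0.0, 10.0 if inside else 0.0)


def render_ray(field, origin, direction, near=0.0, far=6.0, n_samples=64):
    """Composite color along one ray, front to back. Returns (rgb, opacity)."""
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / norm, dy / norm, dz / norm

    # Express the viewing direction as two angles, giving five inputs in total.
    theta = math.acos(max(-1.0, min(1.0, dz)))
    phi = math.atan2(dy, dx)

    step = (far - near) / n_samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that has not been absorbed yet

    for i in range(n_samples):
        t = near + (i + 0.5) * step  # sample at the midpoint of each segment
        x = origin[0] + t * dx
        y = origin[1] + t * dy
        z = origin[2] + t * dz
        r, g, b, density = field(x, y, z, theta, phi)

        alpha = 1.0 - math.exp(-density * step)  # opacity of this segment
        weight = transmittance * alpha
        color[0] += weight * r
        color[1] += weight * g
        color[2] += weight * b
        transmittance *= 1.0 - alpha

    return tuple(color), 1.0 - transmittance


# One ray through the middle of the sphere, and one that misses it entirely.
hit_color, hit_opacity = render_ray(toy_field, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0))
miss_color, miss_opacity = render_ray(toy_field, (5.0, 5.0, -3.0), (0.0, 0.0, 1.0))
```

Run one of these per pixel and you have an image. The training half (nudging the network until rendered rays match your photos) is left out here, but because every step above is differentiable, that’s the part a framework like TensorFlow handles for you.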