Great visual storytelling trickery, as always, from Karen X. Cheng:
I’m still digging out (of email, Slack, and photos, but thankfully no longer of literal snow) following last weekend’s amazing photo adventure in Ely, NV. I need to try processing more footage via the Luma app, but for now here’s a cool 3D version of the Nevada Northern Railway’s water tower, made simply by orbiting it with my drone & uploading the footage:
Last month Paul Trillo shared some wild visualizations he made by walking around Michelangelo’s David, then synthesizing 3D NeRF data. Now he’s upped the ante with captures from the Louvre:
Over in Japan, Tommy Oshima used the tech to fly around, through, and somehow under a playground, recording footage via a DJI Osmo + iPhone:
Pretty cool!
Here’s an example made from a quick capture I did of my friend (nothing special, but amazing what one can get simply by walking in a circle while recording video):
Karen X. Cheng, back with another 3D/AI banger:
As luck (?) would have it, the commercial dropped on the third anniversary of my former teammate Jon Barron & collaborators bringing NeRFs into existence:
Heh—before the holidays get past us entirely, check out this novel approach to 3D motion capture from the always entertaining Kevin Parry:
[Via Victoria Nece]
This stuff—creating 3D neural models from simple video captures—continues to blow my mind. First up is Paul Trillo visiting the David:
Then here’s AJ from the NYT doing a neat day-to-night transition:
And lastly, Hugues Bruyère used a 360° camera to capture this scene, then animated it in post (see thread for interesting details):
It’s insane to me how much these emerging tools democratize storytelling idioms—and then take them far beyond previous limits. Recently Karen X. Cheng & co. created some wild “drone” footage simply by capturing handheld footage with a smartphone:
Now they’re creating an amazing dolly zoom effect, again using just a phone. (Click through to the thread if you’d like details on how the footage was (very simply) captured.)
Meanwhile, here’s a deeper dive on NeRF and how it’s different from “traditional” photogrammetry (e.g. in capturing reflective surfaces):
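(The gist, in my own shorthand: a photogrammetry mesh bakes a single texture color into each surface point, while a NeRF predicts color and density from both position and viewing direction, which is why reflections and shiny surfaces hold up so much better. Very roughly:)

```latex
% Photogrammetry: one baked color per surface point, no matter where you view it from
c = T(\mathbf{x})

% NeRF: color and density depend on position \mathbf{x} and viewing direction \mathbf{d}
(\mathbf{c}, \sigma) = F_\Theta(\mathbf{x}, \mathbf{d})
```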
Last year my friend Bilawal Singh Sidhu, a PM driving 3D experiences for Google Maps/Earth, created an amazing 3D render (also available in galactic core form) of me sitting atop the Trona Pinnacles. At that time he used “traditional” photogrammetry techniques (kind of a funny thing to say about an emerging field that remains new to the world), and this year he tried processing the same footage (consisting of a couple of simple orbits from my drone) using new Neural Radiance Field (“NeRF”) tech:
For comparison, here’s the 3D model generated via the photogrammetry approach:
The file is big enough that I’ve had some trouble loading it on my iPhone. If that affects you as well, check out this quick screen recording:
The power & immersiveness of rendering 3D from images is growing at an extraordinary rate. NeRF Studio promises to make creation much more approachable:
The kind of results one can generate from just a series of photos or video frames is truly bonkers:
Here’s a tutorial on how to use it:
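For a taste of what “approachable” means here, this is roughly the whole loop, wrapped in Python purely for illustration; the command names and flags are from memory of the Nerfstudio docs, so treat them as assumptions and defer to the tutorial above:

```python
# Rough sketch of the Nerfstudio workflow (command names recalled from the project's
# docs; verify against the current tutorial before relying on them).
import subprocess

# 1. Turn a phone/drone video into posed frames (runs COLMAP under the hood).
subprocess.run(["ns-process-data", "video",
                "--data", "walkaround.mp4",          # hypothetical capture
                "--output-dir", "data/walkaround"], check=True)

# 2. Train the default "nerfacto" model on the processed capture.
subprocess.run(["ns-train", "nerfacto",
                "--data", "data/walkaround"], check=True)

# 3. Explore the result in the browser-based viewer (config path is printed by ns-train).
subprocess.run(["ns-viewer",
                "--load-config", "outputs/walkaround/nerfacto/<timestamp>/config.yml"],
               check=True)
```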
Check out this high-speed overview of recent magic courtesy of my friend Bilawal:
Photogrammetry is an art form that has been around for decades, but it’s never looked better thanks to ML techniques like Neural Radiance Fields (NeRF). This video shows a wide range of 3D captures made using this technique. And I gotta say, NeRF really breathes new life into my old photo scans! All these datasets were posed in COLMAP and trained + rendered with NVIDIA’s free Instant NGP tools.
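For anyone curious what that pipeline looks like in practice, here’s a rough sketch (not Bilawal’s actual setup; the script names and flags follow the public instant-ngp README from memory, so double-check before running):

```python
# Sketch of the COLMAP -> Instant NGP pipeline described above.
# Assumes the NVlabs/instant-ngp repo is cloned and built; helper script names
# and flags are recalled from its README and may have changed.
import subprocess

frames = "data/old_scan/images"   # hypothetical folder of frames from an old photo scan

# 1. Pose the images: colmap2nerf.py drives COLMAP's feature matching + SfM
#    and writes the transforms.json that Instant NGP expects.
subprocess.run(["python", "scripts/colmap2nerf.py",
                "--images", frames,
                "--run_colmap",         # let the helper invoke COLMAP itself
                "--aabb_scale", "16"],  # size of the bounding volume around the scene
               check=True)

# 2. Train and render interactively with the Instant NGP testbed.
subprocess.run(["./instant-ngp", "data/old_scan"], check=True)
```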
The visualizations for StyleNeRF tech are more than a little trippy, but the fundamental idea—that generative adversarial networks (GANs) can enable 3D control over 2D faces and other objects—is exciting. Here’s an oddly soundtracked peek:
And here’s a look at the realtime editing experience:
“This is certainly the coolest thing I’ve ever worked on, and it might be one of the coolest things I’ve ever seen.”
My Google Research colleague Jon Barron routinely makes amazing stuff, so when he gets a little breathless about a project, you know it’s something special. I’ll pass the mic to him to explain their new work around capturing multiple photos, then synthesizing a 3D model:
I’ve been collaborating with Berkeley for the last few months and we seem to have cracked neural rendering. You just train a boring (non-convolutional) neural network with five inputs (xyz position and viewing angle) and four outputs (RGB+alpha), combine it with the fundamentals of volume rendering, and get an absurdly simple algorithm that beats the state of the art in neural rendering / view synthesis by *miles*.
You can change the camera angle, change the lighting, insert objects, extract depth maps — pretty much anything you would do with a CGI model, and the renderings are basically photorealistic. It’s so simple that you can implement the entire algorithm in a few dozen lines of TensorFlow.
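To make that concrete, here’s a deliberately tiny sketch of the recipe he’s describing (not the authors’ code, just the shape of it): a plain MLP mapping position + viewing angle to color and density, followed by classic volume rendering along each camera ray. The real thing adds positional encoding, hierarchical sampling, and plenty of other refinements:

```python
# Toy sketch of the idea above, NOT the authors' code: a plain MLP maps a 5D input
# (xyz position + 2D viewing angle) to RGB + density, and classic volume rendering
# composites the samples along each camera ray into a pixel color.
import tensorflow as tf

def make_nerf_mlp(hidden=256, depth=4):
    layers = [tf.keras.layers.Dense(hidden, activation="relu", input_shape=(5,))]
    for _ in range(depth - 1):
        layers.append(tf.keras.layers.Dense(hidden, activation="relu"))
    layers.append(tf.keras.layers.Dense(4))  # r, g, b, density (pre-activation)
    return tf.keras.Sequential(layers)

def render_rays(mlp, origins, dirs, near=2.0, far=6.0, n_samples=64):
    """origins, dirs: [n_rays, 3]. Returns rendered RGB per ray: [n_rays, 3]."""
    t = tf.linspace(near, far, n_samples)                               # sample depths
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]     # [n_rays, n_samples, 3]
    view = tf.tile(dirs[:, None, :2], [1, n_samples, 1])                # crude 2D viewing-angle stand-in
    raw = mlp(tf.reshape(tf.concat([pts, view], axis=-1), [-1, 5]))
    raw = tf.reshape(raw, [tf.shape(origins)[0], n_samples, 4])
    rgb = tf.sigmoid(raw[..., :3])
    sigma = tf.nn.relu(raw[..., 3])
    delta = (far - near) / n_samples
    alpha = 1.0 - tf.exp(-sigma * delta)                                   # per-sample opacity
    trans = tf.math.cumprod(1.0 - alpha + 1e-10, axis=-1, exclusive=True)  # light surviving to each sample
    weights = alpha * trans
    return tf.reduce_sum(weights[..., None] * rgb, axis=-2)

# Training is just a photometric loss against the captured pixels, e.g.:
#   loss = tf.reduce_mean(tf.square(render_rays(mlp, ray_origins, ray_dirs) - true_rgb))
```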
Check it out in action:
[YouTube]