4K4D: High Resolution Dynamic 3D Scenes

Michael Rubloff

Oct 18, 2023

I apologize for being a bit slow this week. I've been at Ad Week in Manhattan and am finally sitting down to write. There's been a ton to cover, and I'm really excited to bring all of the news to you, starting with 4K4D.

Throughout this year, we've seen papers emerge that can handle dynamic NeRFs and Gaussian Splats, something I personally thought was impossible back in January. But time and time again, I've been proven wrong.

4K4D, which stands for 4K 4D point cloud representation, offers a new approach to dynamic view synthesis. A few components make up 4K4D's structure.

Initially, a coarse point cloud of the dynamic scene is derived using a space carving algorithm. Space carving is a computer vision method that reconstructs a 3D model of a scene or object from its silhouettes. For the dynamic portions of a scene, a segmentation method produces foreground masks, and space carving on those masks yields the coarse point cloud. For the static portions of a scene, say a brick wall, Instant-NGP is used instead to train the initial point clouds. These are trained for 250 and 300K steps, respectively.
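To make the idea concrete, here's a minimal space-carving sketch (my own illustration, not the authors' code): a voxel survives only if its projection lands inside the foreground mask in every camera view, and the survivors become the coarse point cloud. The function name, grid resolution, and scene bounds are assumptions for illustration.

```python
import numpy as np

def space_carve(masks, intrinsics, extrinsics, grid_res=64, bound=1.0):
    """masks: list of (H, W) boolean foreground masks, one per camera.
    intrinsics: list of (3, 3) K matrices. extrinsics: list of (3, 4) [R|t]."""
    # Build a dense voxel grid covering the cube [-bound, bound]^3.
    lin = np.linspace(-bound, bound, grid_res)
    xs, ys, zs = np.meshgrid(lin, lin, lin, indexing="ij")
    voxels = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)  # (N, 3)
    keep = np.ones(len(voxels), dtype=bool)

    for mask, K, Rt in zip(masks, intrinsics, extrinsics):
        h, w = mask.shape
        # Project every voxel center into this camera.
        cam = voxels @ Rt[:, :3].T + Rt[:, 3]             # (N, 3) camera coords
        uv = cam @ K.T
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)  # perspective divide
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        visible = (cam[:, 2] > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside = np.zeros(len(voxels), dtype=bool)
        inside[visible] = mask[v[visible], u[visible]]
        keep &= inside  # Carve away anything outside this view's silhouette.

    return voxels[keep]  # Surviving voxel centers form the coarse point cloud.
```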

For each point in the scene, its position is modeled as a learnable vector. A predefined 4D feature grid assigns a feature vector to every point. This vector is then processed through MLP networks to predict several attributes, including the point's radius, density, and spherical harmonics coefficients.
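As a rough sketch of what that might look like in PyTorch (a toy stand-in, not the paper's implementation; in particular, the per-point, per-frame feature table below substitutes for the paper's interpolated 4D feature grid, and all layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class PointModel(nn.Module):
    def __init__(self, num_points, num_frames, feat_dim=32, sh_degree=2):
        super().__init__()
        # Each point's 3D position is a learnable vector.
        self.positions = nn.Parameter(0.1 * torch.randn(num_points, 3))
        # Toy stand-in for the 4D feature grid: a learnable feature per
        # point per frame (the paper interpolates a spatio-temporal grid).
        self.features = nn.Parameter(
            0.01 * torch.randn(num_frames, num_points, feat_dim))
        self.n_sh = (sh_degree + 1) ** 2  # SH coefficients per color channel
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 + 3 * self.n_sh),  # radius, density, SH coeffs
        )

    def forward(self, frame_idx):
        out = self.head(self.features[frame_idx])         # (num_points, 2+3K)
        radius = torch.nn.functional.softplus(out[:, 0])  # radii stay positive
        density = torch.sigmoid(out[:, 1])                # opacity in [0, 1]
        sh = out[:, 2:].reshape(-1, 3, self.n_sh)         # per-channel coeffs
        return self.positions, radius, density, sh
```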

The big differentiator is the implementation of a 4D point cloud representation and a hybrid appearance model. But what exactly is it a hybrid of? The answer lies at the intersection of spherical harmonics and an image blending model.

Interestingly, they found that MLP-based spherical harmonics models don't represent dynamic scenes well, so they added an image blending model alongside the spherical harmonics, which leads to more accurate scene appearance. There's another well-thought-out piece to this: the image blending network is independent of the viewing direction, which allows it to be precomputed after training and directly contributes to faster rendering speed. And that rendering speed is on another level compared to other methods, offering a 30x speedup. On a 4090, they're able to achieve 400 fps at 1080p and 80 fps at 4K.
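Here's a hedged sketch of that hybrid idea (my own reading, with assumed shapes and a degree-1 SH basis for brevity): a view-independent base color is blended from nearby input images with weights that don't depend on the novel viewing direction, so they can be precomputed, while the spherical harmonics supply a view-dependent residual on top.

```python
import torch

def eval_sh_deg1(sh, dirs):
    """Evaluate degree-1 real SH. sh: (N, 3, 4) coeffs, dirs: (N, 3) unit vectors."""
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    basis = torch.cat([0.2820948 * torch.ones_like(x),          # Y_0^0
                       0.4886025 * y, 0.4886025 * z, 0.4886025 * x], dim=-1)
    return (sh * basis[:, None, :]).sum(-1)                     # (N, 3)

def hybrid_color(blend_weights, source_colors, sh, view_dirs):
    """blend_weights: (N, V) weights over V input views (precomputable),
    source_colors: (N, V, 3) colors sampled from the input images,
    sh: (N, 3, 4) per-point SH coefficients, view_dirs: (N, 3)."""
    base = (blend_weights[..., None] * source_colors).sum(1)  # image-blended
    residual = eval_sh_deg1(sh, view_dirs)                    # view-dependent
    return (base + residual).clamp(0.0, 1.0)
```

Because nothing in `blend_weights` or `source_colors` depends on the novel view direction, that whole first term can be baked once after training, which is the precomputation trick described above.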

Comparatively, 3D Gaussian Splatting relies on spherical harmonics alone for appearance. The hybrid approach allows 4K4D to fully exploit the input images, which in turn leads to a higher-fidelity output.

One major innovation is a differentiable depth peeling algorithm, implemented as a custom shader and built specifically around the 4K4D representation. Because the method starts from a point cloud, it can leverage the hardware rasterizer, which is what leads to the impressive rendering speeds you see above.
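The core peeling-and-compositing logic looks roughly like the sketch below (a plain-Python illustration of K-pass depth peeling for the fragments hitting a single pixel; the real system does this on the GPU in a shader, and the layer count here is an assumption).

```python
import numpy as np

def depth_peel_pixel(depths, colors, alphas, num_layers=4):
    """Composite the fragments hitting one pixel, nearest-first, K layers deep.
    depths: (F,), colors: (F, 3), alphas: (F,) for F fragments at this pixel."""
    last_depth = -np.inf
    out_color = np.zeros(3)
    transmittance = 1.0
    for _ in range(num_layers):
        # Peel: find the nearest fragment strictly behind the last peeled layer.
        candidates = np.where(depths > last_depth)[0]
        if len(candidates) == 0:
            break
        i = candidates[np.argmin(depths[candidates])]
        # Front-to-back alpha compositing.
        out_color += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
        last_depth = depths[i]
    return out_color, transmittance
```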

There are a ton of potential use cases for this technology, ranging from the more niche, such as sports replays and dance choreography, to full-scale productions and memory capture. It also doesn't seem totally dissimilar from what Apple has shown of its Spatial Videos. As we get closer to the unveiling of the Vision Pro, I am inordinately curious about how they are powering it.

The data footprint increases linearly with the length of the input video, so the longer the video, the more storage- and compute-heavy it becomes. The authors mention this as an area for improvement, so that other use cases, such as a 4D play or movie, can exist.

Please note that this is not something that can be generated immediately. The examples shown here took roughly 24 hours to train before they were ready to view. Impressively, they're trained for 800K steps across 200 original images. At least as we head into the winter months, you'll have something to keep your home warm overnight. What's even more impressive is that this can be run on a single 4090, showing that 4K4D is accessible to patient consumers. Unfortunately, people will need to be a little more patient still, as the source code has not been released yet.
