Getting higher and higher quality outputs from dynamic radiance fields has always been an exciting prospect. We've seen a handful of methods emerge, including one from a company, but today another arrives that seems to step it up a notch. This is 3DGStream, and it's fast to train, fast to render, and high fidelity.
Let's jump into it.
3DGStream introduces a novel method that leverages 3D Gaussian Splatting (3DGS) for scene representation, combined with a Neural Transformation Cache (NTC) and an adaptive 3DGS addition strategy. 3D Gaussian Splatting is a radiance-field-style representation that produces hyper-realistic reconstructions from standard 2D images.
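For intuition, here is a minimal sketch of what a single Gaussian carries in a typical 3DGS implementation. The field names and shapes are illustrative, not taken from the paper's code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    """Illustrative parameters of one Gaussian primitive in a 3DGS scene."""
    position: np.ndarray   # (3,) center of the Gaussian in world space
    rotation: np.ndarray   # (4,) unit quaternion orienting the covariance
    scale: np.ndarray      # (3,) per-axis extent (often stored in log-scale)
    opacity: float         # alpha used when splatting and compositing
    sh_coeffs: np.ndarray  # (16, 3) spherical-harmonic color coefficients (degree 3)

# A scene is a large collection of these primitives; 3DGStream's job is to
# move and rotate them frame to frame, and to add new ones when objects appear.
```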
The Neural Transformation Cache (NTC) is crucial to 3DGStream's efficiency, offering a streamlined way to model the complex motion in dynamic scenes. Unlike traditional methods that retrain or extensively adjust a model for each frame, the NTC operates at a more granular level, focusing on the movements of the individual 3D Gaussians, the small volumetric elements that collectively represent the scene.
The brilliance of the NTC lies in how efficiently it captures the translations (movements) and rotations of these 3D Gaussians between frames. This is achieved through a compact, highly optimized structure that combines multi-resolution hash encoding with a shallow, fully-fused Multilayer Perceptron (MLP). The hash encoding gives a compact, multi-scale representation of the scene, so the per-frame optimization naturally concentrates its capacity on the regions that actually move. This design makes the NTC highly adaptable while significantly reducing computational overhead, allowing the 3D Gaussians to be transformed rapidly with minimal storage requirements.
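As a rough illustration of the idea (not the authors' implementation, which uses a multi-resolution hash grid and a fully-fused MLP), here is a minimal PyTorch sketch where a simple frequency encoding stands in for the hash encoding and a shallow MLP predicts a per-Gaussian translation and rotation:

```python
import torch
import torch.nn as nn

class TinyTransformationCache(nn.Module):
    """Toy stand-in for an NTC: maps a Gaussian's center to a translation (3)
    and a rotation quaternion (4) describing its motion into the next frame."""
    def __init__(self, num_freqs: int = 6, hidden: int = 64):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 3 + 3 * 2 * num_freqs          # xyz plus sin/cos frequency features
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 7),                # 3 for translation + 4 for quaternion
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for i in range(self.num_freqs):
            feats += [torch.sin((2 ** i) * x), torch.cos((2 ** i) * x)]
        return torch.cat(feats, dim=-1)

    def forward(self, centers: torch.Tensor):
        out = self.mlp(self.encode(centers))
        d_xyz = out[:, :3]                                    # predicted translation
        d_rot = nn.functional.normalize(out[:, 3:], dim=-1)   # unit quaternion
        return d_xyz, d_rot

# Per frame, only this small network is optimized against the new images;
# its outputs are then applied to the Gaussians before streaming the frame.
```

The key design point this sketch tries to capture is that the per-frame trainable state is just a small network, not the full set of Gaussians.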
But dynamic scenes are not just about moving objects; they often involve elements that appear or disappear from view. This is where the adaptive 3DGS addition strategy comes into play, showcasing 3DGStream's adaptability and attention to detail. The method intelligently introduces new 3D Gaussians to model emerging objects — like a car entering the frame or a ball being thrown — without the need for manual adjustments or retraining from scratch.
The strategy employs a two-pronged approach, sketched below: first, it identifies areas where new objects emerge, based on the spatial gradients of the existing 3D Gaussians. This ensures that new Gaussians are only added where necessary, keeping the model efficient and focused. Next, it introduces the new 3D Gaussians in a way that mimics the scene's existing structure, adjusting their attributes (such as size, color, and opacity) so they integrate seamlessly with the surrounding environment.
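Here is a hedged sketch of how such gradient-driven spawning might look in PyTorch. The threshold, the jitter scale, and the attribute copying are assumptions for illustration, not the paper's exact procedure:

```python
import torch

def spawn_new_gaussians(positions: torch.Tensor,
                        pos_grads: torch.Tensor,
                        attributes: dict,
                        grad_threshold: float = 2e-4,
                        jitter: float = 0.01):
    """Select Gaussians whose positional gradients are large (a hint that the
    current set under-fits an emerging object) and clone them nearby."""
    mask = pos_grads.norm(dim=-1) > grad_threshold            # (N,) high-error regions
    new_pos = positions[mask] + jitter * torch.randn_like(positions[mask])
    # Inherit appearance from the parent Gaussians so the additions blend in,
    # then let the per-frame optimization refine them.
    new_attrs = {k: v[mask].clone() for k, v in attributes.items()}
    return new_pos, new_attrs
```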
The adaptive 3DGS addition strategy doesn't just add new elements; it also carefully considers the scene's coherence and computational efficiency. By selectively spawning new 3D Gaussians and optimizing them alongside the existing ones, 3DGStream maintains high-quality renderings without significantly increasing the model's complexity or storage requirements. Remarkably, 3DGStream achieves its efficiency with a model size of under 10MB, challenging preconceptions about the storage requirements for high-quality dynamic scene rendering.
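To get a feel for how a transformation cache plus a modest number of new Gaussians can fit in a few megabytes, here is a back-of-envelope estimate. The grid sizes, feature widths, and count of newly added Gaussians are illustrative guesses, not the paper's configuration:

```python
# Assumed numbers: a multi-resolution hash grid with 16 levels,
# 2^15 entries per level, 2 half-precision features per entry,
# plus a shallow MLP and a few thousand newly spawned Gaussians.
hash_grid_bytes = 16 * (2 ** 15) * 2 * 2              # ~2.1 MB
mlp_bytes = 2 * (64 * 64) * 2                          # two 64x64 fp16 layers
new_gaussians = 5_000
bytes_per_gaussian = (3 + 4 + 3 + 1 + 48) * 4          # pos, quat, scale, opacity, SH (fp32)
new_gaussian_bytes = new_gaussians * bytes_per_gaussian

total_mb = (hash_grid_bytes + mlp_bytes + new_gaussian_bytes) / 1e6
print(f"~{total_mb:.1f} MB of per-frame updates")      # roughly 3-4 MB under these guesses
```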
The seamless integration of the NTC and the adaptive 3DGS addition strategy allows 3DGStream to offer something truly remarkable: the ability to stream complex, dynamic scenes in real-time, with photorealistic quality and unprecedented efficiency. By focusing on the minutiae of scene changes and intelligently adapting to new elements, 3DGStream ensures that each frame is rendered with the utmost fidelity and detail.
3DGStream renders in real time while also training remarkably quickly: it reconstructs each frame of a free-viewpoint video (FVV) stream in just 12 seconds and renders at an astonishing 200 frames per second. Interestingly, the paper lists COLMAP as a potential bottleneck and aims to increase fidelity as other models improve. Poor COLMAP has been put through the wringer this year, despite virtually every method relying on it for Structure from Motion.
The question that I continue to see pop up more and more: What will replace COLMAP? Will DUSt3R emerge as an answer?
Another potential issue is the use of Spherical Harmonic rotation. The authors note that it takes up a majority of the processing time despite having only a minimal impact on reconstruction quality.
The experiments conducted by Sun and colleagues show that 3DGStream not only surpasses existing methods in speed and efficiency but also maintains high-quality image rendering. This positions 3DGStream as a promising solution for real-world applications, bringing us a little closer to the seamless integration of virtual and real worlds.
Unfortunately, it's going to be a while before we see any code, if we see it at all. 3DGStream has been accepted to CVPR 2024 and may or may not release its code by June of this year. It is, however, a clearer glimpse into where we are in the state of dynamic reconstruction.