Gaussian Splatting methods have continued to pour in over the first three months of the year. With this rate of adoption, being able to merge and compare these methods shortly after their release would be amazing.
Gaustudio wants to be a flexible and powerful Gaussian Splatting toolbox, capable of adding new methods as they are released. In its current release, it already supports meshing. The baseline they are starting from is already more complete than many platforms and potentially rivals Nerfstudio in the variety of methods supported.
There are three main takeaways from what Gaustudio is launching with:
Modular Design: GauStudio is highly modular, allowing various components to be combined or replaced to tailor 3D scene modeling methods. This adaptability extends to different foreground models, background models, and other components.
Hybrid Representation: GauStudio introduces a hybrid Gaussian representation, integrating foreground models with skyball background models. This innovation significantly reduces artifacts in unbounded outdoor scenes and enhances novel view synthesis.
Gaussian Splatting Surface Reconstruction (GauS): A standout feature is the GauS module, a render-then-fuse approach that converts 3DGS inputs into high-fidelity mesh reconstructions without the need for fine-tuning.
Gaustudio addresses challenges in novel view synthesis by enabling tailored modeling pipelines for different tasks. Its architecture encompasses several key stages, including scene initialization, optimization with geometric and sparsity regularizers, enhancement of 3D Gaussians’ representation ability, and scene compression through learnable or geometric pruning.
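As a rough illustration of how such stages might be chained and swapped, here is a minimal sketch in plain Python. Everything in it is hypothetical: the stage names, the dictionary-based Gaussians, and the opacity threshold are assumptions made for illustration, and opacity-threshold pruning is simply one common form of pruning in 3DGS rather than Gaustudio's confirmed method.

```python
def init_scene(num_points):
    # Hypothetical initialization: each Gaussian is a dict with an opacity.
    return [{"id": i, "opacity": 0.1 * (i % 10)} for i in range(num_points)]


def prune_low_opacity(gaussians, min_opacity=0.05):
    # Pruning stage: drop Gaussians that are nearly transparent.
    return [g for g in gaussians if g["opacity"] >= min_opacity]


def run_pipeline(scene, stages):
    # Apply each stage in order; stages can be added, removed, or replaced.
    for stage in stages:
        scene = stage(scene)
    return scene


scene = run_pipeline(init_scene(20), [prune_low_opacity])
print(len(scene))
```

Because each stage is just a function from scene to scene, a different pruning or regularization step could be dropped into the list without touching the rest of the pipeline.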
The framework not only simplifies the integration of 3DGS techniques but also introduces efficient modules for surface reconstruction and customizable background modeling. In particular, the GauS module easily extracts textured meshes, offering a versatile solution across different 3DGS-based methods.
Their Gaussians-to-mesh pipeline is quite fast, taking about 90 seconds to complete. Building on the foundation set by SuGaR, they employ a volumetric fusion approach: render the median depth, then fuse it into a mesh using VDBFusion. This method is compatible with most existing 3DGS platforms, and <a href="https://radiancefields.com/kiri-engine-adds-mesh-to-3dgs/" data-type="post" data-id="8740" target="_blank" rel="noreferrer noopener">Kiri Engine is using Gaustudio</a> to power its mesh extraction from captures.
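To make the "render the median depth" step more concrete, here is a minimal sketch in plain Python. It assumes the common definition of median depth as the depth of the first sample along a ray at which the accumulated transmittance falls below 0.5. The function name and sample values are hypothetical, and this is not Gaustudio's actual code.

```python
def median_depth(depths, alphas, threshold=0.5):
    """Return the depth where transmittance first drops below `threshold`.

    depths: per-sample depths along one ray, sorted front to back.
    alphas: per-sample opacity contributions in [0, 1].
    Returns None if the ray never becomes opaque enough.
    """
    transmittance = 1.0
    for depth, alpha in zip(depths, alphas):
        transmittance *= 1.0 - alpha
        if transmittance < threshold:
            return depth
    return None


# Hypothetical samples on one ray: a faint floater, then a surface near 2.0.
depths = [0.8, 2.0, 2.1, 5.0]
alphas = [0.1, 0.4, 0.6, 0.9]
print(median_depth(depths, alphas))  # 2.1
```

A depth map built this way for each training view is the kind of input a volumetric fusion step would then integrate into a single mesh.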
Gaussian Splatting and radiance fields are famous for their ability to model accurate view-dependent features, like reflections, shifting light, and shadows. The original Gaussian Splatting implementation uses Spherical Harmonics to model view-dependent effects, but whether that is the optimal choice has been a subject of discussion. Aras gave a good overview of why Spherical Harmonics are potentially not the best representation choice, given their large memory footprint.
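For a rough sense of that footprint, the arithmetic can be sketched as follows. It assumes degree-3 Spherical Harmonics stored as 32-bit floats for three color channels, as in the original 3DGS implementation; the one-million-Gaussian scene size is a hypothetical example.

```python
def sh_coefficients(degree):
    """Number of SH basis functions up to and including `degree`."""
    return (degree + 1) ** 2


def sh_bytes_per_gaussian(degree, channels=3, bytes_per_float=4):
    """Bytes needed to store SH color coefficients for one Gaussian."""
    return sh_coefficients(degree) * channels * bytes_per_float


per_gaussian = sh_bytes_per_gaussian(3)   # 16 coefficients * 3 channels * 4 bytes
scene_bytes = per_gaussian * 1_000_000    # hypothetical one-million-Gaussian scene
print(per_gaussian, scene_bytes / 1e6)    # 192 bytes each, 192.0 MB in total
```

For comparison, position, scale, rotation, and opacity together take 11 floats per Gaussian in that implementation, so the 48 color coefficients dominate the storage.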
Gaustudio instead opts for Spherical Gaussians, both for modeling specular components within scenes and for balancing image quality and rendering speed. With these changes, they're able to match or exceed the same fidelity level while increasing render speed and reducing the memory footprint.
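A Spherical Gaussian is a single smooth lobe on the sphere, which is why it suits highlights. The sketch below evaluates the standard lobe formula, amplitude * exp(sharpness * (dot(v, axis) - 1)). The function and parameter names are assumptions for illustration, and how Gaustudio parameterizes its lobes is not described here.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def spherical_gaussian(view_dir, lobe_axis, sharpness, amplitude):
    """Evaluate one Spherical Gaussian lobe for a viewing direction."""
    v = normalize(view_dir)
    axis = normalize(lobe_axis)
    cos_angle = sum(a * b for a, b in zip(v, axis))
    return amplitude * math.exp(sharpness * (cos_angle - 1.0))


# Looking straight down the lobe axis gives the full amplitude;
# looking away from it makes the response fall off quickly.
on_axis = spherical_gaussian((0, 0, 1), (0, 0, 1), sharpness=10.0, amplitude=0.8)
off_axis = spherical_gaussian((1, 0, 0), (0, 0, 1), sharpness=10.0, amplitude=0.8)
print(on_axis, off_axis)
```

One such lobe needs only an axis, a sharpness, and an amplitude, far fewer values than a set of higher-order Spherical Harmonics coefficients.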
They also support material and lighting decomposition, achieved by associating each point with additional PBR parameters, such as opacity, culling, and roughness, in addition to indirect illumination. Also boosting real-time rates are Neural Feature Vectors: each point is associated with a compact Neural Feature Vector that is then run through a small multilayer perceptron.
Gaustudio even extends to scenes with sparse input information, such as the one we looked at recently with InstantSplat, through generalized Gaussian Splatting initialization. Just as the name states, the Gaussians are initialized using pre-computed properties from a generalized Gaussian Splatting model and then optimized further. This helps with the consistency of the scene when looking at a novel viewpoint. I would be very curious to compare it with InstantSplat and, given Gaustudio's modular approach, to see how the two could be integrated with one another.
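The Neural Feature Vector idea can be sketched as a tiny decoder that takes a per-point feature plus the view direction and returns a color. The feature size, hidden width, activation choices, and random untrained weights below are all hypothetical; this only shows the shape of the computation, not Gaustudio's network.

```python
import math
import random


def linear(x, weights, biases):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def random_layer(rng, n_in, n_out):
    """Untrained layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def decode_color(feature, view_dir, layers):
    """Map (feature vector, view direction) to RGB values in (0, 1)."""
    (w1, b1), (w2, b2) = layers
    x = list(feature) + list(view_dir)
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]                   # ReLU
    return [1.0 / (1.0 + math.exp(-o)) for o in linear(hidden, w2, b2)]  # sigmoid


rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM = 8, 16  # hypothetical sizes
layers = [random_layer(rng, FEATURE_DIM + 3, HIDDEN_DIM),
          random_layer(rng, HIDDEN_DIM, 3)]
feature = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
rgb = decode_color(feature, (0.0, 0.0, 1.0), layers)
print(rgb)
```

The appeal is that each point stores only a short feature vector, while the shared network handles the view-dependent appearance.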
As Gaussian Splatting platforms have emerged, one of the big points of comparison has been how each one handles large outdoor scenes, in terms of representing an ever-extending skyline. It's quite a challenge to accurately model this through Gaussians, but certainly not impossible. There have not been many open source methods that address this. Gaustudio handles it by incorporating a spherical environment map composed of Gaussians, which models the sky separately from the foreground in the rendering model.
It's clear that Gaustudio will support quite a few of the leading Gaussian Splatting research papers, including Gaussian Pro, SuGaR, and Mip-Splatting data. Enabling people to train these methods is part of their roadmap. Also on the roadmap is the inclusion of papers such as VastGaussian, Scaffold-GS, Triplane-GS, and FSGS.
Almost all of Gaustudio is MIT Licensed, with the exception of the rasterizer. To prevent any miscommunication, Gaustudio is adding a flag or warning in the training pipeline, similar to OpenCV, that makes users aware when they are utilizing non-commercial modules derived from the 3DGS codebase. It also plans to offer substitutes for all of those modules that are free for commercial use, ensuring companies can utilize Gaustudio without any licensing conflicts.
Right now Gaustudio has only been tested on Linux, but their team is currently working on a Gradio instance and testing Windows. There is still a lot of work to do to enhance and build out the platform, but the emergence of another modular and wide-ranging platform will hopefully encourage people to build larger and larger implementations.
As the methods continue to evolve, it's important that the frameworks remain flexible and modular. With that in mind, Gaustudio's design lets it extend to different foreground models, background models, and other components as new methods arrive.
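One way to picture such a skyball is a shell of Gaussian centers spread evenly over a large sphere around the scene. The sketch below uses a Fibonacci lattice for the placement. The point count, the radius, and the lattice itself are assumptions for illustration, since how Gaustudio initializes its sky Gaussians is not described here.

```python
import math


def fibonacci_sphere(num_points, radius):
    """Return roughly evenly spaced points on a sphere of the given radius."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(num_points):
        z = 1.0 - 2.0 * (i + 0.5) / num_points  # strictly inside (-1, 1)
        ring = math.sqrt(1.0 - z * z)           # radius of the circle at height z
        theta = golden_angle * i
        points.append((radius * ring * math.cos(theta),
                       radius * ring * math.sin(theta),
                       radius * z))
    return points


# Hypothetical skyball: 1000 sky Gaussian centers far outside the foreground.
sky_centers = fibonacci_sphere(1000, radius=100.0)
print(len(sky_centers), sky_centers[0])
```

Keeping these sky points in their own background model is what lets the foreground Gaussians focus on nearby geometry.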