Tri-MipRF was one of the more underrated NeRF papers released last year. Now we're seeing a progression of that work with Rip-NeRF. As you might expect from a successor, Rip-NeRF is slightly slower, but a bit higher fidelity than its predecessor.
Rip-NeRF combines two main concepts: a Platonic Solid Projection and a Ripmap encoding, from which the method derives its name. Together, these allow Rip-NeRF to efficiently manage and render highly detailed, anti-aliased images from novel viewpoints. This dual approach not only addresses inherent limitations of previous NeRF implementations, such as blurring and aliasing, but also improves both memory consumption and processing time.
A Platonic Solid Projection is a spatial factorization technique that breaks 3D space into simpler, more manageable segments by projecting 3D anisotropic areas onto the faces of a Platonic solid, such as a cube, tetrahedron, or dodecahedron. Each of these solids has faces that are regular, congruent polygons, making them ideal for uniformly distributed projections.
The authors move from Tri-MipRF's cube-like tri-plane to an icosahedron as the Platonic solid. They experimented with quite a few different shapes (cubes, tetrahedrons, dodecahedrons), and there's a clear correlation between the choice of shape and the resulting fidelity and training time.
Each face of the chosen Platonic solid acts as a projection plane. The complex 3D volume is projected onto these 2D planes, which simplifies the representation of the scene. This is particularly effective at capturing anisotropic features, meaning those with directional dependencies.
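To make the idea concrete, here's a minimal sketch in PyTorch of projecting 3D points onto a few face planes of a solid. To be clear, this is my own illustration under simplifying assumptions: the face normals, helper vector, and function names are hypothetical, Rip-NeRF projects anisotropic sample areas rather than bare points, and it uses all of the icosahedron's faces rather than the four directions shown here.

```python
# Hypothetical sketch of Platonic solid projection: turn 3D points into
# per-face 2D coordinates. Illustrative only, not Rip-NeRF's actual code.
import torch

def face_frames():
    phi = (1 + 5 ** 0.5) / 2  # golden ratio, which appears in icosahedron geometry
    # Four hand-picked face-normal directions; a real implementation would
    # derive the full set of normals from the solid's geometry.
    normals = torch.tensor([
        [1.0, phi, 0.0],
        [-1.0, phi, 0.0],
        [0.0, 1.0, phi],
        [0.0, -1.0, phi],
    ])
    normals = normals / normals.norm(dim=-1, keepdim=True)
    # Build an orthonormal (u, v) basis on each plane via cross products.
    helper = torch.tensor([0.0, 0.0, 1.0]).expand_as(normals)
    u = torch.linalg.cross(normals, helper, dim=-1)
    u = u / u.norm(dim=-1, keepdim=True)
    v = torch.linalg.cross(normals, u, dim=-1)
    return u, v

def project(points, u, v):
    """Project (N, 3) points onto each face plane -> (F, N, 2) coordinates."""
    return torch.stack([points @ u.T, points @ v.T], dim=-1).permute(1, 0, 2)

u, v = face_frames()
pts = torch.randn(8, 3)
print(project(pts, u, v).shape)  # torch.Size([4, 8, 2])
```

The appeal of this factorization is that a volumetric query collapses into a handful of cheap 2D lookups, which is where much of the memory and speed benefit comes from.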
Complementing the Platonic Solid Projection, Ripmap Encoding precisely characterizes the 2D projections derived from the 3D space. It is a form of texture mapping that enhances the representation of anisotropic areas. Ripmaps are essentially an extension of mipmaps, designed to handle anisotropic textures more effectively. They are constructed by pre-filtering a learnable feature grid with anisotropic kernels, which allows for different levels of detail depending on the viewing direction and distance.
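For a rough intuition of how a ripmap differs from a mipmap, here is a minimal, hypothetical sketch of building a ripmap pyramid from a 2D feature grid. Plain average pooling stands in for the learned anisotropic pre-filtering kernels, and all names are my own placeholders rather than anything from the Rip-NeRF codebase.

```python
# Minimal ripmap sketch: unlike a mipmap, width and height are filtered
# independently, so level (i, j) downsamples x by 2**i and y by 2**j.
# Average pooling stands in for Rip-NeRF's learned anisotropic kernels.
import torch
import torch.nn.functional as F

def build_ripmap(grid, levels=4):
    """grid: (C, H, W) feature plane -> {(i, j): anisotropically filtered level}."""
    ripmap = {}
    for i in range(levels):      # downsampling exponent along x (width)
        for j in range(levels):  # downsampling exponent along y (height)
            kx, ky = 2 ** i, 2 ** j
            ripmap[(i, j)] = F.avg_pool2d(
                grid.unsqueeze(0), kernel_size=(ky, kx), stride=(ky, kx)
            ).squeeze(0)
    return ripmap

feat = torch.randn(8, 64, 64)    # an 8-channel 64x64 feature plane
pyramid = build_ripmap(feat)
print(pyramid[(2, 0)].shape)     # torch.Size([8, 64, 16]): filtered in x only
```

At query time, the level whose x/y footprints best match the projected anisotropic sample area is selected and interpolated, which is what lets the level of detail vary with viewing direction and distance.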
I wasn't sure what the VRAM draw was going to be for Rip-NeRF, but you will be able to run it on upper-end consumer cards, as it takes about 20GB. The training time isn't completely terrible either, with the full method's 120K steps taking about two hours and the lite version taking roughly 30 minutes. Compared to the time it takes to train a capture on Nerfacto-Huge (4 hours) or Zip-NeRF in Nerfstudio (55 minutes), it's not bad.
I am always excited to see NeRF papers that continue to push fidelity levels higher, even at the cost of training time for now. Rip-NeRF can be explored further on its project page or its GitHub, where the code has already been published. Please note that, as of this writing, no license has been assigned to the project.