We've continued to see higher-fidelity NeRFs over the last few months, but the trade-off has always been a lower frame rate or a heavier computational load.
The team from ARQUIMEA Research Center, Universidad Politécnica de Madrid, Universidad de las Palmas de Gran Canaria, IUSIANI Centre for Automation and Robotics, CSIC-UPM, and Institut de Robòtica i Informàtica Industrial, CSIC-UPC have published NeRFLight.
You might recognize some of the names, such as Fernando Rivas-Manzaneque, Volinga's co-founder and a contributing author to this site.
NeRFLight is a NeRF architecture that delivers high-quality representations at high frame rates with a low memory footprint. It achieves this by splitting the volume of the density field used in NeRF into eight regions, each with its own decoder but all sharing a common feature grid, which cuts the number of features needed by a factor of eight! The authors squeeze out even more efficiency by exploiting voxels in close proximity, which gives a more natural local gradient transition.
NeRFLight splits the 3D volume, that is, the density field used in NeRF, into multiple regions. In this case, the authors chose to split it into eight.
The reason they chose eight regions can be better understood if we consider 3D space and a cube. The bounding box of the scene, which is a cube, is divided in half along each of its three dimensions: width, height, and depth. This results in eight smaller cubes, or regions, each with a side length that's half that of the original bounding box.
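The halving described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `octant_index`, the bit ordering, and the example bounding box are assumptions made for the sketch.

```python
import numpy as np

def octant_index(points, bbox_min, bbox_max):
    """Return a region id in [0, 7] for each 3D point.

    Each axis contributes one bit: 0 if the point lies in the lower half
    of the bounding box along that axis, 1 if it lies in the upper half.
    The bit ordering (x, then y, then z) is an arbitrary choice.
    """
    points = np.asarray(points, dtype=np.float64)
    bbox_min = np.asarray(bbox_min, dtype=np.float64)
    bbox_max = np.asarray(bbox_max, dtype=np.float64)
    center = (bbox_min + bbox_max) / 2.0
    bits = (points >= center).astype(np.int64)  # shape (N, 3)
    return bits[:, 0] + 2 * bits[:, 1] + 4 * bits[:, 2]

# Example with a hypothetical bounding box of [-1, 1] on every axis.
pts = [[-0.5, -0.5, -0.5], [0.5, -0.5, -0.5], [0.5, 0.5, 0.5]]
print(octant_index(pts, [-1, -1, -1], [1, 1, 1]))  # [0 1 7]
```

One bit per axis gives 2 x 2 x 2 = 8 possible ids, which is where the eight regions come from.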
Each of these eight regions has its own density decoder, but they all share a common feature grid. The voxel grid resolution is kept constant across these regions, and the choice of resolution depends on the specific dataset. The feature located at each vertex of these grids has a dimension of 32.
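To make the "eight decoders, one grid" idea concrete, here is a minimal sketch under several assumptions. The decoders are reduced to a single linear layer followed by a softplus, the grid is tiny and randomly initialized, and features are fetched from the nearest vertex rather than interpolated. None of these choices come from the paper; only the feature dimension of 32 and the count of eight regions do.

```python
import numpy as np

FEATURE_DIM = 32    # feature size per vertex, as stated in the article
NUM_REGIONS = 8     # one density decoder per region, as stated in the article
GRID_VERTICES = 16  # hypothetical: vertices per axis, kept tiny for the demo

rng = np.random.default_rng(0)
# A single feature grid, reused by every region.
shared_grid = rng.normal(size=(GRID_VERTICES,) * 3 + (FEATURE_DIM,)).astype(np.float32)
# Hypothetical decoders: one weight vector and bias per region.
decoder_weights = rng.normal(size=(NUM_REGIONS, FEATURE_DIM)).astype(np.float32)
decoder_biases = np.zeros(NUM_REGIONS, dtype=np.float32)

def density(points, bbox_min=-1.0, bbox_max=1.0):
    """Look up shared features and decode them with the region's own decoder."""
    points = np.asarray(points, dtype=np.float32)
    center = (bbox_min + bbox_max) / 2.0
    half = (bbox_max - bbox_min) / 2.0
    bits = (points >= center).astype(np.int64)            # (N, 3)
    region = bits[:, 0] + 2 * bits[:, 1] + 4 * bits[:, 2]  # (N,)
    # Local coordinates in [0, 1] inside each region (plain translation).
    local = (points - (bbox_min + bits * half)) / half
    # Nearest-vertex lookup; a real renderer would interpolate instead.
    idx = np.clip(np.rint(local * (GRID_VERTICES - 1)).astype(np.int64),
                  0, GRID_VERTICES - 1)
    feats = shared_grid[idx[:, 0], idx[:, 1], idx[:, 2]]   # (N, 32)
    logits = np.einsum("nf,nf->n", feats, decoder_weights[region]) + decoder_biases[region]
    return np.logaddexp(0.0, logits)  # softplus keeps densities non-negative

# Two points in opposite regions read the same grid vertex but use different decoders.
print(density([[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]]))
```

The point of the sketch is the storage pattern: only one grid of features exists, and the per-region decoders are what let the same features mean different things in different parts of the scene.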
To sum up, the eight regions come from halving the 3D space along each of its three dimensions. This results in 2^3 = 8 regions, each a cube with a side length half that of the original bounding box.
While all of this is astounding, what does it result in? Through this method they've been able to achieve 181 FPS with a model footprint of 14 MB! That's truly insane. But surely, with all this improvement, there must be a significant quality trade-off, right? No; believe it or not, NeRFLight still posts PSNR and SSIM levels comparable to Instant NGP.
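For a sense of scale, the two headline numbers can be folded into a single frames-per-second-per-megabyte figure. The helper name `fps_per_mb` is made up for this sketch; the inputs are the 181 fps and 14 MB quoted above.

```python
def fps_per_mb(fps, size_mb):
    """Frame rate divided by model size: higher means more speed per unit of storage."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return fps / size_mb

# Figures quoted in the article: 181 fps at a 14 MB footprint.
print(round(fps_per_mb(181, 14), 2))  # 12.93
```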
The Problem with Linear Grids
Traditionally, linear grids are used in the process of visual reconstruction. However, these often lead to discontinuities, meaning abrupt changes or breaks in the visual presentation, which color decoders have difficulty handling. As a result, artifacts or visible distortions may arise at the boundaries between neighboring regions, as can be seen below in (a):
NeRFLight revolutionizes this process by using a symmetric grid instead. With this approach, the system ensures smoother transitions between regions, effectively eliminating the problematic discontinuities of linear non-symmetric grids. This process produces a seamless, high-quality visual output, as demonstrated in (b). Even in complex, non-symmetric scenes, such as a rotated Lego scene, NeRFLight still excels.
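One way to picture why a symmetric layout helps is to look at a single axis. In the sketch below, both mapping functions are assumptions about how "linear" and "symmetric" reuse of a shared grid could be implemented, not the paper's code. Two points sitting on either side of a region boundary land at opposite ends of the shared grid under the linear mapping, but at the same spot under the mirrored one.

```python
import numpy as np

def linear_local(x, lo=-1.0, hi=1.0):
    """Translate each half of [lo, hi] onto [0, 1] (same direction in both halves)."""
    x = np.asarray(x, dtype=np.float64)
    center = (lo + hi) / 2.0
    half = (hi - lo) / 2.0
    return np.where(x >= center, x - center, x - lo) / half

def symmetric_local(x, lo=-1.0, hi=1.0):
    """Mirror each half of [lo, hi] about the center, so both halves start at 0 there."""
    x = np.asarray(x, dtype=np.float64)
    center = (lo + hi) / 2.0
    half = (hi - lo) / 2.0
    return np.abs(x - center) / half

eps = 1e-3  # two points hugging the boundary at x = 0
print(linear_local(-eps), linear_local(eps))        # about 0.999 vs 0.001: a jump
print(symmetric_local(-eps), symmetric_local(eps))  # both about 0.001: continuous
```

Because both functions work elementwise, applying them to an (N, 3) array of points mirrors each axis independently, which covers all eight regions at once.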
Notably, the symmetric grid strategy not only enhances the quality of reconstruction but also improves the frame rate versus storage ratio. It does this by reducing the number of features, which results in a lighter model with faster access to the features for each voxel. In simpler terms, NeRFLight renders images faster and uses less storage space, a crucial advantage in applications like online games, social media, or displaying a NeRF in real time on, say, a mobile device.
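The saving from sharing one grid is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a dense grid stored as 32-bit floats with 64 vertices per axis; both are illustrative assumptions, since the article only says that resolution depends on the dataset and does not describe the storage format. The feature dimension of 32 and the eight regions come from the text.

```python
FEATURE_DIM = 32  # feature size per vertex, as stated in the article
NUM_REGIONS = 8   # number of regions, as stated in the article

def feature_count(vertices_per_axis, num_grids):
    """Total number of stored feature values for `num_grids` dense cubic grids."""
    return num_grids * vertices_per_axis ** 3 * FEATURE_DIM

def size_mib(n_values, bytes_per_value=4):
    """Storage in MiB, assuming float32 values by default."""
    return n_values * bytes_per_value / 2 ** 20

n = 64  # hypothetical vertices per axis
shared = feature_count(n, 1)              # one grid shared by all regions
separate = feature_count(n, NUM_REGIONS)  # one grid per region
print(size_mib(shared), size_mib(separate), separate // shared)  # 32.0 256.0 8
```

Whatever resolution is plugged in, the ratio stays at eight, which matches the reduction the authors report.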
Additionally, the combination of implicit and explicit training in NeRFLight bolsters the quality of the final model. This dual-training strategy also provides high-quality results when applying direct optimization of the features.
However, the groundbreaking NeRFLight isn't without its challenges. The biggest limitation is training time, which is considerably longer than that of other methods like TensoRF or Instant-NGP. The team behind NeRFLight believes that applying tensor decomposition, similar to TensoRF, could help alleviate this issue.
Another limitation is that NeRFLight currently cannot represent unbounded or forward-facing scenes. The team aims to overcome this obstacle by applying normalized device coordinates (NDC) or multi-sphere images (MSI) in future work. Despite these limitations, the research has proved promising, providing robust, seamless, and high-quality results in the current implementation.
NeRFLight currently boasts the highest frame rate versus storage cost ratio in the field. Even while optimizing for smaller size and inference efficiency, it delivers high-quality rendering results, equaling the most recent NeRF models. With its promise of seamless, accurate reconstructions and faster rendering, NeRFLight is set to revolutionize how we process and view digital images.
Whether it's streaming on social media, powering visuals in online games, or sculpting realities in the burgeoning metaverse, NeRFLight's superior FPS/MB ratio will help make it possible to view NeRFs in real time on a variety of devices, unlocking even more use cases.