4/22/23 Update: The code for F2-NeRF is now open sourced and available here!
New NeRF papers continue to pour in, this time with the introduction of F2-NeRF (Fast Free NeRF).
Unlike other NeRF platforms such as Instant-NGP and Luma, which are designed for bounded scenes, F2-NeRF uses a space-warping method called Perspective Warping. This allows for high-quality renders in a relatively short training time, and the results might shock you with just how clear F2-NeRF's output can get.
They introduce a general space-warping scheme called perspective warping that is applicable to arbitrary camera trajectories, along with a space-subdivision algorithm that adaptively allocates grids for foreground and background regions. Experimental results show that F2-NeRF outperforms existing grid-based NeRF methods on various datasets with different trajectory patterns.
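To build some intuition for what perspective warping might look like in practice, here is a minimal NumPy sketch. The rough idea assumed here: project a region's sample points into the image planes of the cameras that see that region, stack the resulting 2D coordinates, and reduce them back to a low-dimensional warped coordinate with a PCA-style projection. The pinhole camera model, the camera-to-world pose convention, and the function names are my assumptions for illustration, not the released F2-NeRF code.

```python
import numpy as np

def project_to_cameras(points, cam_poses, focal):
    """Project Nx3 world points into each camera's image plane.
    Assumes a simple pinhole model and camera-to-world 4x4 poses looking
    along +z in camera space (illustrative assumptions)."""
    coords = []
    for pose in cam_poses:
        R, c = pose[:3, :3], pose[:3, 3]
        p_cam = (points - c) @ R                    # world -> camera coordinates
        coords.append(focal * p_cam[:, :2] / p_cam[:, 2:3])  # perspective divide
    return np.concatenate(coords, axis=1)           # shape (N, 2 * num_cameras)

def perspective_warp(points, cam_poses, focal, out_dim=3):
    """Sketch of a perspective-style warp: stack each point's 2D projections in
    the cameras that observe this region, then reduce to a low-dimensional warped
    coordinate with PCA (via SVD). Hypothetical helper, not the official code."""
    proj = project_to_cameras(points, cam_poses, focal)
    proj_centered = proj - proj.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(proj_centered, full_matrices=False)
    return proj_centered @ vt[:out_dim].T            # warped (N, out_dim) coordinates
```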
The training time compared to Instant-NGP is roughly double, but given how quickly Instant-NGP runs, that does not appear to be a huge detractor, especially when the rendered output is so clear. Papers like this are among my favorites because they increase the overall sharpness while still being manageable on a consumer-grade GPU (these were trained on a 2080 Ti!).
There have been efforts to make platforms such as Instant-NGP require less memory using strategies such as voxel pruning, tensor decomposition, and hash indexing, but these can only be used for bounded scenes where the grids are built in the original Euclidean space. F2-NeRF is built on top of the Instant-NGP framework, but allows for free camera trajectories in large, unbounded scenes.
So how does it work? The rendering process is subdivided into two parts: the preparation stage and the actual rendering stage.
In other words, they focus on the immediate view of what the user is looking at, from the foreground to the background, and group regions together in order to apply local warping most effectively. The sizes of these grids also correspond directly to the region: coarse grids for background regions and fine grids for foreground regions.
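As a rough illustration of that adaptive subdivision idea, the sketch below splits an octree cell whenever the cell is large relative to its distance from the cameras that observe it, so nearby (foreground) space ends up with fine cells and distant (background) space stays coarse. The exact splitting criterion, threshold, and data structures are assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def should_subdivide(node_center, node_size, cam_positions, ratio_threshold=1.0):
    """Illustrative foreground/background test: split a cell when its size is
    large relative to its distance from the nearest camera. The real F2-NeRF
    criterion and threshold differ; this is an assumption for the sketch."""
    dists = np.linalg.norm(cam_positions - node_center, axis=1)
    return node_size / dists.min() > ratio_threshold

def subdivide(node_center, node_size, cam_positions, min_size=0.05):
    """Recursively build an adaptive grid: fine cells near the cameras
    (foreground), coarse cells far away (background)."""
    if node_size <= min_size or not should_subdivide(node_center, node_size, cam_positions):
        return [(node_center, node_size)]            # keep this cell as a leaf
    leaves, half = [], node_size / 2.0
    offsets = np.array([[dx, dy, dz] for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)])
    for offset in offsets:                           # split into 8 octants
        child_center = node_center + offset * node_size / 4.0
        leaves += subdivide(child_center, half, cam_positions, min_size)
    return leaves
```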
Then, in the rendering stage, F2-NeRF renders the pixel color by sampling points along the camera ray and performing a weighted accumulation of those points' colors. The sampled densities and colors are fetched from the multi-resolution hash grid. Inherently, a problem arises when there are multiple objects within the foreground.
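The weighted accumulation itself is the standard NeRF volume-rendering step. Here is a minimal sketch, assuming the per-sample densities and colors for one ray have already been fetched from the hash grid:

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Standard NeRF-style volume rendering along one ray: convert densities to
    per-sample alphas, accumulate transmittance, and take a weighted sum of the
    sample colors. densities: (N,), colors: (N, 3), deltas: (N,) segment lengths."""
    alphas = 1.0 - np.exp(-densities * deltas)                       # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance up to each sample
    weights = alphas * trans                                          # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)                    # final RGB for the pixel
```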
In comparison, F2-NeRF takes advantage of the perspective warping and the adaptive space subdivision to fully exploit the representation capacity, which enables it to produce better rendering quality. They did find that mip-NeRF-360 was able to produce highly accurate renderings, but only when the scene was allowed to train for hours. The example they show below used the short mip-NeRF-360 variant, trained for a period of 30 minutes.
Looking at the above photo, the results are getting close to the Ground Truth (original). You know the technology is good when a paper includes a disclaimer that it could be used to generate misleading fake images. For me, the biggest thing is the text clarity on the sign; text fidelity is something I have still been struggling with in Instant-NGP.
F2-NeRF represents another step in the right direction toward high-quality NeRFs running on consumer-grade GPUs. How the various platforms incorporate the technology remains to be seen, but the evidence is undeniable.