Long-time readers of this site may remember that earlier this year we looked into Google Research's Memory Efficient Radiance Fields (MERF). Now they're back with another groundbreaking method: Streamable Memory Efficient Radiance Fields, or SMERF. In a nutshell, SMERF delivers Zip-NeRF-quality NeRFs that run at a remarkable 60fps on everyday devices like smartphones and laptops.
The evolution from MERF to SMERF is striking. While MERF impressed with its fidelity, it struggled with larger, unbounded scenes.
One simple fix would be to increase the resolution of MERF's coarse 3D grid and fine 2D triplanes, but that creates a massive problem of computational overhead and memory consumption. To avoid that issue, SMERF introduces a hierarchical structure.
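To get a feel for why simply raising resolution hurts, here is a back-of-the-envelope calculation. It is a sketch under assumed numbers: the resolutions, the 8 channels per texel, and 1 byte per value are illustrative rather than MERF's actual configuration, and the 3D grid is treated as dense, ignoring any sparsity or compression a real system would use.

```python
def dense_grid_bytes(res, channels, bytes_per_value=1):
    # A dense 3D grid stores res * res * res texels.
    return res ** 3 * channels * bytes_per_value


def triplane_bytes(res, channels, bytes_per_value=1):
    # Three axis-aligned 2D planes, each res * res texels.
    return 3 * res ** 2 * channels * bytes_per_value


if __name__ == "__main__":
    # Illustrative (grid, triplane) resolutions; 8 channels at 1 byte each is an assumption.
    for res_grid, res_plane in [(512, 2048), (1024, 4096)]:
        g = dense_grid_bytes(res_grid, 8)
        p = triplane_bytes(res_plane, 8)
        print(f"{g / 2**30:.0f} GiB grid, {p / 2**20:.0f} MiB triplanes")
    # prints "1 GiB grid, 96 MiB triplanes" then "8 GiB grid, 384 MiB triplanes"
```

Doubling the resolution multiplies the dense grid's memory by 8 (it grows with the cube of the resolution) and the triplanes' by 4 (the square), which is why scaling a single model up to a large, unbounded scene quickly becomes impractical.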
At the heart of SMERF's success is its novel hierarchical model partitioning. Imagine walking around a large house. You don't necessarily need to know what the second floor looks like until it's in view. Similarly, instead of overwhelming a system by trying to render the entire space at once, SMERF cleverly divides a space into smaller segments, each represented by a distinct neural radiance field (NeRF) model. We've recently seen something similar employed in PyNeRF.
As you move through the virtual environment, SMERF seamlessly loads and processes only the segments you're viewing. This doesn't appear to be overly different from level-of-detail (LOD) rendering, where the section immediately in view is rendered at the highest level of detail.
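A minimal sketch of the idea, assuming the scene is cut into a regular 3D grid of cubic cells with one submodel per cell, the active submodel chosen by the camera's position, and its immediate neighbors kept loaded for smooth transitions. The names submodel_index, submodels_to_load, SubmodelCache, and the loader callback are hypothetical illustrations, not part of SMERF's viewer code.

```python
import math
from itertools import product


def submodel_index(camera_pos, cell_size):
    """Integer (i, j, k) grid cell containing the camera (hypothetical scheme)."""
    return tuple(math.floor(c / cell_size) for c in camera_pos)


def submodels_to_load(camera_pos, cell_size, radius=1):
    """Active cell plus all neighbors within `radius` cells, as a prefetch set."""
    ci, cj, ck = submodel_index(camera_pos, cell_size)
    offsets = range(-radius, radius + 1)
    return {(ci + di, cj + dj, ck + dk) for di, dj, dk in product(offsets, repeat=3)}


class SubmodelCache:
    """Keeps only nearby submodels in memory; `loader` fetches one by index."""

    def __init__(self, loader):
        self.loader = loader
        self.loaded = {}

    def update(self, camera_pos, cell_size, radius=1):
        wanted = submodels_to_load(camera_pos, cell_size, radius)
        # Evict submodels that are no longer near the camera.
        for idx in list(self.loaded):
            if idx not in wanted:
                del self.loaded[idx]
        # Load any newly needed submodels.
        for idx in wanted:
            if idx not in self.loaded:
                self.loaded[idx] = self.loader(idx)
        # Return the submodel responsible for the current camera position.
        return self.loaded[submodel_index(camera_pos, cell_size)]
```

Memory use stays bounded by the size of the neighborhood (27 submodels with radius=1) no matter how large the overall scene grows.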
This dynamic approach not only conserves memory but also ensures that your device, regardless of its power, delivers a smooth, uninterrupted experience. Despite segmenting the environment, SMERF maintains a remarkable consistency across the entire scene. This coherence means you can explore every corner of the virtual world without encountering visual hiccups or discrepancies.
Deferred Appearance Network Partitioning builds on the deferred rendering MERF already uses, which separates the neural processing of appearance (view-dependent color) from geometry (density and shape). Features are first accumulated along each ray, and a small appearance network is then evaluated only once per pixel rather than at every sample, which keeps the computation focused and efficient.
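To illustrate the deferred idea, the sketch below composites per-sample features along a ray using standard volume-rendering weights and then calls the appearance network a single time for the whole pixel. The function names and the appearance_net callable are hypothetical, and the feature sizes are placeholders, not MERF's or SMERF's actual layout.

```python
import math


def compositing_weights(densities, step):
    """Volume-rendering weights w_i = T_i * alpha_i for the samples on one ray."""
    weights = []
    transmittance = 1.0
    for sigma in densities:
        alpha = 1.0 - math.exp(-sigma * step)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_pixel_deferred(densities, features, view_dir, step, appearance_net):
    """Composite per-sample features first, then run the appearance network once."""
    weights = compositing_weights(densities, step)
    dim = len(features[0])
    # Weighted sum of feature vectors along the ray (geometry pass).
    pixel_feature = [sum(w * f[c] for w, f in zip(weights, features)) for c in range(dim)]
    # Single appearance-network evaluation per pixel (deferred shading pass).
    return appearance_net(pixel_feature, view_dir)
```

The cost of the appearance network no longer depends on how many samples each ray takes, which is what makes the deferred design attractive for real-time rendering.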
SMERF uses trilinear interpolation over a grid of deferred network parameters to determine the right parameters for every point in space. MERF uses a single tiny network (2 hidden layers of 16 units) for the same purpose. This network doesn't have enough capacity to model all kinds of reflections and specular highlights, but naively making it bigger increases compute costs at rendering time. SMERF sidesteps the issue by letting the parameters of this network vary over space. The result: higher model capacity, same compute!
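The following sketch shows trilinear interpolation over a grid of network parameters, assuming a grid with res vertices per axis spanning the unit cube and a flat parameter vector stored at each vertex. trilinear_params and tiny_layer are hypothetical names, and the single linear unit is a stand-in for the real deferred network. What it demonstrates is that the network's size, and therefore its per-pixel cost, stays fixed while its weights change with position.

```python
def trilinear_params(grid, res, pos):
    """Blend the parameter vectors at the 8 surrounding grid vertices.

    grid: dict mapping (i, j, k) -> list of floats, for 0 <= i, j, k < res (res >= 2)
    pos:  (x, y, z) in [0, 1]^3; values outside are clamped
    """
    coords = [min(max(p, 0.0), 1.0) * (res - 1) for p in pos]
    base = [min(int(c), res - 2) for c in coords]  # lower corner of the cell
    frac = [c - b for c, b in zip(coords, base)]   # position inside the cell
    out = [0.0] * len(grid[(0, 0, 0)])
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((frac[0] if di else 1.0 - frac[0])
                     * (frac[1] if dj else 1.0 - frac[1])
                     * (frac[2] if dk else 1.0 - frac[2]))
                corner = grid[(base[0] + di, base[1] + dj, base[2] + dk)]
                for t, value in enumerate(corner):
                    out[t] += w * value
    return out


def tiny_layer(params, x):
    """One linear unit: the last param is the bias, the rest are weights."""
    *weights, bias = params
    return sum(w * xi for w, xi in zip(weights, x)) + bias
```

Whatever position is queried, tiny_layer performs the same number of multiply-adds; only the values of its weights depend on where in the scene you are.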
Feature gating is a technique used in neural networks to regulate the flow of information. It uses 'gates' to control which features should be passed through or suppressed during the learning process. In the context of SMERF, feature gating is used to make the student model more efficient: it lets the model focus on the features most relevant to rendering, which streamlines both learning and rendering. Gating can be particularly useful in the distillation process, where the student model learns to prioritize the features that matter most for mimicking the teacher model's output. This is akin to how gates in electronic circuits control the flow of electrical signals.
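As a generic illustration of gating (not SMERF's exact formulation, which this sketch does not attempt to reproduce), each feature channel is multiplied by a sigmoid of a learned logit, so a gate near 1 passes the channel through and a gate near 0 suppresses it. The gate_features helper, and the assumption that one logit is learned per channel, are hypothetical choices for the example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, mapping any real to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_features(features, gate_logits):
    """Scale each feature channel by its gate value in (0, 1)."""
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]
```

Because the gate is smooth, training can gradually open or close channels instead of making a hard on/off decision.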
Once all of this is in place, SMERF moves on to training.
SMERF's distillation training strategy plays a crucial role in balancing high-quality visuals with speed. Think of it as an expert chef (the 'teacher' model) training a promising apprentice (the 'student' model, or SMERF). The apprentice learns to recreate the chef's complex recipes more quickly and efficiently, without compromising on taste. The result renders at Zip-NeRF quality, but three times faster!
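A minimal sketch of a distillation objective, assuming the student is trained to match the teacher's rendered colors on sampled rays, plus an optional term that matches per-sample ray weights as a stand-in for geometry. The function names, the geometry term, and the 0.1 weighting are illustrative assumptions, not SMERF's published loss.

```python
def mse(a, b):
    """Mean squared error over every channel of paired vectors."""
    diffs = [(x - y) ** 2 for va, vb in zip(a, b) for x, y in zip(va, vb)]
    return sum(diffs) / len(diffs)


def distillation_loss(student_rgb, teacher_rgb,
                      student_weights=None, teacher_weights=None,
                      geometry_weight=0.1):
    """Photometric term plus an optional geometry-matching term (hypothetical)."""
    # The teacher's rendered colors act as ground truth for the student.
    loss = mse(student_rgb, teacher_rgb)
    if student_weights is not None and teacher_weights is not None:
        # Encourage the student to place density where the teacher does.
        loss += geometry_weight * mse(student_weights, teacher_weights)
    return loss
```

Because the teacher can render any viewpoint, the student can be supervised on far more rays than the original photos provide, which is one practical reason distillation is attractive here.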
Snap's Mobile R2L also uses distillation, but with Instant-NGP as the teacher rather than Zip-NeRF.
The results of all of this are pretty spectacular. SMERF outperforms 3DGS and MERF while coming close to Zip-NeRF. This is by design: Zip-NeRF is SMERF's "teacher," so the student is trained to match its quality, not exceed it. If Zip-NeRF were a Sith Lord, it would be very happy.
Like all things, SMERF comes with some limitations and caveats. This low memory footprint comes at a high cost, and that cost is paid in training. These models were trained for up to 200,000 steps on either 8 V100s or 16 A100s. Yeah, you probably won't be running your custom datasets through SMERF anytime soon.
Like NVIDIA's Adaptive Shells, SMERF produces high-fidelity NeRFs that run in real time. However, SMERF is designed to run on your smartphone or directly in the browser at 60fps.
That said, there are some incredible demos from the Google team that you can check out right now on their Project Page! My personal favorite is the new Berlin dataset. It gives us a high-fidelity glimpse of how, in the near future, we might be able to revisit larger moments in time. I also think it's easy to see how experiential marketing could use this to hide Easter eggs for consumers. Take, for instance, a radiance field of Abbey Road Studios with Beatles lore spread throughout it.
Between CamP, Reconfusion, and most recently Nuvo, Google appears to have amassed quite a large number of proprietary methods.
Only time will tell exactly what Google plans to do with SMERF and the viewer they've developed, but the most straightforward application appears to be one Google has been very public about: Google Maps. The use case is so clear that it only makes sense Google has been prioritizing getting radiance fields running on smartphones. The code for the viewer is available here.
That said, I'm not sure the everyday consumer will immediately want or need to fully explore the interior of a restaurant, so I could see a toned-down version being pushed out first.
However, the training cost does pose a large obstacle to applying SMERF to every location in Google Maps, at least for the time being. That issue will definitely need to be solved before we see large-scale unbounded radiance fields in one of the most used apps in the world.