UniSDF comes on the heels of a few advancements aimed at extracting better geometry from radiance field methods, specifically NeRFs. Earlier this year, NVIDIA earned a spot on Time Magazine's Best Inventions of 2023 list for its work on Neuralangelo. UniSDF kicks these advancements up a notch with its method.
There's something about watching really clean radiance fields that is just aesthetically pleasing to me. I think it's knowing that photorealistic 3D outputs of the world are already here.
UniSDF produces noticeably cleaner outputs, particularly around reflections and highly specular objects, but how does it work?
UniSDF begins by understanding the shape and form of objects in the scene. It does this using a Signed Distance Field (SDF), a mathematical function that describes how far any point in space is from the nearest surface: negative inside, positive outside, zero exactly on it. Next, UniSDF turns to fine details, things like textures and small nuances. It employs NVIDIA's Instant-NGP hash grid encoding here, thanks in part to its fast training speed. Time and time again, we have seen researchers use NVIDIA's method as a foundation for further development.
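To make the SDF idea concrete, here's a tiny hand-rolled example (the sphere and its analytic formula are mine for illustration; in UniSDF the SDF is a learned neural network, and the surface is wherever that function hits zero). Surface normals, which matter later for reflections, fall out as the gradient of this function.

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the
    surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

# A point 2 units from the origin sits 1 unit outside a unit sphere.
print(sphere_sdf(np.array([[2.0, 0.0, 0.0]])))  # [1.]
# A point at the origin sits 1 unit inside the surface.
print(sphere_sdf(np.array([[0.0, 0.0, 0.0]])))  # [-1.]
```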
This is where things get really fun. UniSDF uses not one but two radiance fields as part of its reconstruction method, each serving a specific purpose: one camera-view radiance field and one reflected-view radiance field. You can now see why the full name of the method is Unified SDF.
Camera View Radiance Field: This field captures the scene as seen from the camera's viewpoint, focusing on non-reflective surfaces and diffuse reflections.
Reflected View Radiance Field: This field specializes in capturing reflections, particularly the tricky specular ones seen on shiny surfaces.
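The reflected-view field's key trick, borrowed from Ref-NeRF, is to condition color on the view direction mirrored about the surface normal (which comes from the SDF's gradient) rather than on the raw camera direction. A minimal sketch of that reflection, with direction conventions that are my own:

```python
import torch

def reflect(d: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    """Reflect unit view directions d about unit surface normals n:
    r = d - 2 (d . n) n. Here d points from the camera toward the
    surface; papers sometimes use the opposite convention."""
    return d - 2.0 * (d * n).sum(dim=-1, keepdim=True) * n

# A ray hitting a floor (normal +z) at 45 degrees bounces up at 45 degrees.
d = torch.tensor([[0.7071, 0.0, -0.7071]])  # looking down at the surface
n = torch.tensor([[0.0, 0.0, 1.0]])
print(reflect(d, n))  # tensor([[0.7071, 0.0000, 0.7071]])
```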
OK, so now they have two separate radiance fields, each specializing in its own domain, but obviously we want them to combine cleanly. UniSDF does this through a learned composition process: a Weight MLP determines how much each of the two radiance fields should contribute to the final view. It's critical that this combination is accurate; otherwise we'd end up with the shiniest car you've ever seen, or an output that more closely resembles photogrammetry.
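As a rough sketch of what that composition might look like (layer sizes, feature inputs, and names here are my guesses, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class BlendWeight(nn.Module):
    """Tiny MLP that predicts, per 3D sample, how much the reflected-view
    field should contribute. Sizes are illustrative."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # w in (0, 1)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)

weight_mlp = BlendWeight(feat_dim=32)
features = torch.randn(1024, 32)  # per-sample geometry features
c_cam = torch.rand(1024, 3)       # camera-view field colors
c_ref = torch.rand(1024, 3)       # reflected-view field colors
w = weight_mlp(features)
color = w * c_ref + (1.0 - w) * c_cam  # which branch w scales is a convention
```

Because w is learned per sample, the network can dial up the reflected branch on a mirror-like car hood and dial it back down on the matte pavement next to it.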
UniSDF uses a coarse-to-fine training strategy similar to Neuralangelo's. It starts by learning the broader structure of the scene before gradually refining its focus to capture finer details. This approach is complemented by regularization techniques that maintain geometric consistency and smoothness.
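Conceptually, that schedule amounts to starting with only the coarsest hash-grid levels active and unlocking finer ones as training progresses. Something like the following, where every constant is illustrative rather than taken from either paper:

```python
import numpy as np

def level_mask(step: int, num_levels: int = 16,
               start_levels: int = 4, unlock_every: int = 2500) -> np.ndarray:
    """Coarse-to-fine schedule: begin with only the coarsest hash-grid
    levels active and unlock one finer level every `unlock_every` steps."""
    active = min(num_levels, start_levels + step // unlock_every)
    mask = np.zeros(num_levels, dtype=np.float32)
    mask[:active] = 1.0
    return mask

# Early on, only coarse levels shape the scene...
print(level_mask(0))      # [1. 1. 1. 1. 0. 0. ...]
# ...later, the finest levels switch on to recover detail.
print(level_mask(40000))  # all ones
```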
Throughout the training process, the Weight MLP continuously refines its understanding of how reflective or non-reflective different parts of the scene are. This adaptability allows UniSDF to accurately reconstruct scenes with a complex mix of reflective and non-reflective surfaces.
UniSDF was trained on 8 NVIDIA Tesla V100s, putting it out of reach for most people to try on their own. But wait. What's this? The author of Instant-Angelo took to Twitter to say he has already implemented the UniSDF findings and achieved similar results on a single 3090. Very exciting, especially considering that Instant-Angelo is both open source and MIT licensed.
I'll be curious to see whether other research methods begin employing multiple radiance fields, each specializing in its own domain. I find the idea of combining radiance fields particularly interesting because, in theory, you might be able to create several fields to tackle individual drawbacks. Then again, I'm no engineer, and perhaps the compute would be better spent elsewhere. UniSDF makes it hard to argue, though.