GARField: Granular Semantics for Radiance Fields

Michael Rubloff

Jan 29, 2024

GARField nerfstudio

Last year, we saw the introduction of the first semantics-based NeRF papers, such as LERF, which later expanded into LERF-TOGO. Now we are getting another paper from the same group of authors: GARField. GARField stands for Group Anything with Radiance Fields, and it lives up to the name.

GARField allows for togglable granularity when querying a radiance field scene. For instance, if you had a radiance field of a lamp but were really only interested in the lampshade, you could toggle between the lamp as a whole and the individual pieces that make it up.

This progression of methods has evolved toward a world where we can understand and interact with the totality of a three-dimensional space. This is quite a big deal! However, it comes with a challenge of ambiguity, too. How do you know which groupings belong together, and how do you show that coherently? How do you categorize a lampshade as something distinct from the lamp someone indicates they want to focus on?

This is a central focus of GARField, which organizes entities into a hierarchical structure based on their physical scale.

I have made my opinion known on numerous occasions about the potential depth of use cases for semantics-based radiance fields, and this week a LinkedIn post from Wayfair Director Bryan Godwin caught my eye.

The potential for the e-commerce industry to leverage GARField is again staggering, and it presents several angles from which you can solve problems. The most straightforward one is product display, but that can be done with a standalone radiance field. Let's take it up a notch and explore other use cases. One that I would personally appreciate is a completely self-serve customer service platform that utilizes radiance fields for assembly instructions.

Growing up, I was terrible at building Legos because I had a tendency to ignore the instructions. Unfortunately, that early-life lesson in assembling things has carried over to adulthood, and thus assemble-it-yourself furniture is a dreaded activity. However, methods such as GARField enable brands like Wayfair to understand and break down problems and deliver feedback to struggling customers. No longer would I need to fear missing a step; instead, I could lean on one of the most practical applications of radiance field technology.

Additionally, Wayfair would be able to "explode" their products so people can see how the finished product is the sum of its parts: for instance, showing each set of screw types and how they should look when correctly assembled, versus an incorrect assembly.

With all of these use cases in mind, how exactly does it work?

Like the vast majority of radiance fields, GARField starts with a series of ordinary two-dimensional images. The similarities stop pretty early, though, as GARField preprocesses the images with the Segment Anything Model (SAM). Each mask from SAM is then assigned a physical scale by deprojecting depth rendered from the NeRF. Those resulting scales are used to train one of the most important pieces of GARField: a scale-conditioned affinity field. That's a pretty intimidating string of words, but it simply means a field that tells you which parts of the scene belong together at a requested granularity. For instance, if you have a car in the scene, the windshield and the windshield wipers group together as part of the entire car. But if the granularity you're searching for is just the windshield wipers, the two now belong to different groups. Having this contextual awareness is critical for the method to work.
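
To make the scale assignment concrete, here is a minimal Python sketch of how a SAM mask could be given a physical scale by deprojecting a NeRF-rendered depth map. The function name and the choice of scale statistic (the extent of the deprojected point cloud) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def mask_scale(mask, depth, K):
    """Assign a physical scale to a 2D SAM mask by deprojecting the
    NeRF-rendered depth map into a 3D point cloud. `mask` is an (H, W)
    boolean array, `depth` an (H, W) depth map, `K` the 3x3 intrinsics.
    Using the point cloud's extent as the scale is an assumption here."""
    v, u = np.nonzero(mask)             # pixel rows/cols inside the mask
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]     # back-project: p = z * K^-1 [u, v, 1]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1)
    return float(np.linalg.norm(pts.max(axis=0) - pts.min(axis=0)))
```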

Concretely, GARField optimizes a dense 3D feature field in which the affinity (or relatedness) between two points is determined by the distance between their features. Because the features are conditioned on scale, two points can exhibit high affinity at one scale and low affinity at another. It's a delicate balancing act that GARField manages, effectively addressing the challenge of representing multi-scale groupings within a 3D scene.

For instance, two points might be considered part of the same group at a larger scale but separate at a finer scale. This approach is key to resolving ambiguities inherent in grouping objects, where the same point can belong to different groups based on the chosen scale.
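
A minimal PyTorch sketch of the idea: a field maps a 3D position plus a query scale to a feature vector, and affinity is derived from feature distance. The architecture, layer sizes, and the exact affinity mapping are assumptions for illustration; GARField's actual field is built within nerfstudio and differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleConditionedField(nn.Module):
    """Toy scale-conditioned feature field: position + query scale in,
    unit-norm feature out. Sizes are illustrative assumptions."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, xyz, scale):
        # xyz: (N, 3) positions; scale: (N, 1) query scale per point
        return F.normalize(self.mlp(torch.cat([xyz, scale], dim=-1)), dim=-1)

def affinity(field, p1, p2, scale):
    """Affinity from feature distance at a given scale: the same pair of
    points can score high at a coarse scale and low at a fine one."""
    s = torch.full((p1.shape[0], 1), float(scale))
    f1, f2 = field(p1, s), field(p2, s)
    return 1.0 - (f1 - f2).norm(dim=-1) / 2.0  # in [0, 1] for unit features
```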

GARField then employs a margin-based contrastive objective to refine the relationships in the scene: features within the same group are pulled closer together, while features in different groups are pushed apart. This is done conditioned on the scale of the groups, ensuring consistency across the various scales of grouping. The objective is also supervised continuously across scale, rather than only at the discrete scale values where SAM masks happen to exist; without this continuous scale supervision, the hierarchy doesn't hold together.
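
Here is a hedged sketch of what a margin-based pull/push objective can look like on a batch of features, assuming `feats` are scale-conditioned features rendered for sampled rays and `group_ids` mark which SAM mask each ray fell into; GARField's actual pair sampling, loss weighting, and scale augmentation are more involved.

```python
import torch

def margin_contrastive_loss(feats, group_ids, margin=1.0):
    """Pull features with the same group id together; push features from
    different groups at least `margin` apart. (Sketch only.)"""
    d = torch.cdist(feats, feats)                    # (N, N) pairwise distances
    same = group_ids[:, None] == group_ids[None, :]  # (N, N) same-group mask
    pull = (d[same] ** 2).mean()                     # shrink intra-group distances
    push = (torch.clamp(margin - d[~same], min=0) ** 2).mean()
    return pull + push
```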

Once the scale-conditioned affinity field is established and optimized, GARField takes several steps to achieve hierarchical scene decomposition. It extracts a hierarchy of 3D groups by recursively clustering the features in the field at descending scales. The process ensures that each generated group is a subpart of the prior cluster, adhering to a coarse-to-fine approach.

GARField constructs a hierarchical tree of scene nodes. This is done by iteratively reducing the scale for affinity and running a density-based clustering algorithm on each leaf node. The result is a hierarchical tree where each node represents a grouping at a particular scale. Through either automatic tree construction or user interaction, GARField can generate a hierarchy of groupings within a scene. This hierarchy ranges from clusters of objects to individual objects and their sub-parts. It's this granularity of decomposition that sets GARField apart.
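
A sketch of that recursion, assuming a hypothetical `embed_fn(points, scale)` helper that queries the trained affinity field, and using HDBSCAN as the density-based clusterer; the stopping criteria and `min_cluster_size` here are illustrative choices.

```python
import numpy as np
import hdbscan  # pip install hdbscan

def decompose(points, embed_fn, scales, min_cluster_size=5):
    """Coarse-to-fine tree: cluster at the coarsest scale, then recurse
    into each cluster at the next finer scale, so every child node is a
    subpart of its parent. `embed_fn(points, scale) -> (N, D)` is assumed
    to return features from the trained affinity field."""
    node = {"points": points, "children": []}
    if len(scales) == 0 or len(points) <= min_cluster_size:
        return node
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size) \
        .fit_predict(embed_fn(points, scales[0]))
    for c in np.unique(labels):
        if c == -1:                      # HDBSCAN labels noise as -1
            continue
        node["children"].append(
            decompose(points[labels == c], embed_fn, scales[1:], min_cluster_size))
    return node
```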

In its final form, users can interact with the system to select specific groups based on scale and context. This is particularly useful for tasks like 3D asset extraction or interactive segmentation, where precision and context are crucial. Notably, the visualization examples actually show Gaussian Splatting outputs, because they're a bit easier to segment than NeRFs.

This is the first time we've seen Luma AI's name on a publicly released paper, but that's probably because paper author Matt Tancik works at Luma.

As pretty much every author of GARField is part of the nerfstudio team, it's probably obvious that it's built on top of nerfstudio's nerfacto. Further, it's one of the first papers to be built on nerfstudio's new viser viewer. Hopefully the code will be made available to the public, the same way LERF's is. Readers will be happy to know that this is something that can be run on consumer graphics cards, taking roughly 30 minutes on a 4090 GPU.
