FMGS: Gaussian Splatting with Semantics

Michael Rubloff

Jan 5, 2024

It's no secret that I am a massive fan of pairing radiance fields with semantics. I've written about it multiple times, and I always get excited when I see a new method. That brings me to last night, when Foundation Model Embedded Gaussian Splatting (FMGS) came across my screen.

For those not familiar, semantically paired methods let you ask a radiance field a question about what's contained inside it. It can be a simple question like, where did I leave my olive oil in the kitchen, or which screw do I use first to assemble this desk I got from IKEA?

There was another Gaussian Splatting based method just last week, named LangSplat, but this newest method also caught my attention. LangSplat offers a 200X speedup compared to LERF, but FMGS...FMGS is 800X faster than LERF! The authors report 103.4 fps at inference. Admittedly, they have not released much media or many examples that I can show, but I would very much like to.

Returning to LangSplat, it seems to make strong use of SAM, and I am super curious how the two new papers could be layered on top of one another. FMGS stands out for integrating vision-language embeddings from foundation models directly into the 3D scene representation, merging visual and linguistic data. LangSplat takes a slightly different approach: it constructs a 3D language field by attaching language embeddings distilled from CLIP to each Gaussian, and it uses a tile-based splatting technique to render the language features.

How is FMGS getting such a ridiculous speed boost? It feels like so much of the work people do can be tied back to NVIDIA's Multi-Resolution Hash Encoding from Instant NGP. They're not actually using Instant NGP, because that is NeRF based, but they take direct inspiration from it. FMGS gets its speed boost by integrating multi-resolution hash encoding into the framework.

The distinguishing feature of FMGS is its integration of vision-language embeddings from foundation models. These embeddings are incorporated into the 3D scene representation, enabling the model to understand and interpret the semantic content within the scene. In practice, this involves distilling feature maps generated from image-based foundation models and rendering them from the 3D GS model, effectively merging visual and linguistic data.

While we've seen various efforts aimed at optimizing Gaussian Splatting, FMGS takes its own approach to the problem. To work within the memory and compute constraints of adding language features to a scene, FMGS leverages a Multi-Resolution Hash Encoding (MHE). The MHE works in tandem with Gaussian Splatting, so the scene can represent complex language content efficiently.
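To make the idea concrete, here is a minimal sketch of a multi-resolution hash encoding in the style of Instant NGP. It is not the FMGS code, which has not been released. The class name, table size, level count, and feature width are all assumptions I picked for illustration; only the spatial hash primes and the trilinear interpolation follow the Instant NGP recipe.

```python
import math
import random

# Primes used by the Instant NGP spatial hash.
PRIMES = (1, 2654435761, 805459861)


def hash_index(ix, iy, iz, table_size):
    """Hash a non-negative integer grid vertex into [0, table_size)."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


class HashEncoding:
    """Toy multi-resolution hash encoding (all hyperparameters are illustrative)."""

    def __init__(self, num_levels=4, table_size=2 ** 10, feat_dim=2,
                 base_res=4, growth=2.0, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        self.feat_dim = feat_dim
        # Grid resolution grows geometrically from coarse to fine.
        self.resolutions = [int(base_res * growth ** level)
                            for level in range(num_levels)]
        # One small table of (normally trainable) feature vectors per level.
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)]
             for _ in range(table_size)]
            for _ in range(num_levels)
        ]

    def encode(self, x, y, z):
        """Encode a point in [0, 1]^3 as the concatenation of per-level features."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            gx, gy, gz = x * res, y * res, z * res
            x0, y0, z0 = math.floor(gx), math.floor(gy), math.floor(gz)
            fx, fy, fz = gx - x0, gy - y0, gz - z0
            feat = [0.0] * self.feat_dim
            # Trilinear interpolation over the 8 corners of the enclosing cell.
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = ((fx if dx else 1.0 - fx)
                             * (fy if dy else 1.0 - fy)
                             * (fz if dz else 1.0 - fz))
                        row = table[hash_index(x0 + dx, y0 + dy, z0 + dz,
                                               self.table_size)]
                        for k in range(self.feat_dim):
                            feat[k] += w * row[k]
            out.extend(feat)
        return out


enc = HashEncoding()
print(len(enc.encode(0.3, 0.5, 0.7)))  # num_levels * feat_dim values
```

The point of the design is that memory is bounded by the table size rather than by the number of points you query, which is why it is attractive when a scene holds a very large number of Gaussians.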

The MHE uses hash tables at multiple resolutions, reducing the computational load while maintaining the quality of the semantic embeddings. Another key innovation in FMGS is a pixel alignment loss. This loss minimizes the rendered feature distance between semantically similar entities, so that rendered features adhere to pixel-level semantic boundaries. Together, these contribute to the framework's high-quality rendering and fast training, which are crucial for practical applications.
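I don't have the exact formula from the paper in front of me, so treat the following as one plausible shape for a loss of this kind rather than the FMGS definition. It assumes you have a rendered feature map and a second, hypothetical reference feature map that says which neighboring pixels belong together. Neighbors that look alike in the reference are pulled together in the rendered features, and neighbors across a boundary are left alone. The function names and the cosine-based weighting are my own choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def neighbor_alignment_loss(rendered, reference):
    """Similarity-weighted feature distance between adjacent pixels.

    rendered, reference: H x W grids (nested lists) of feature vectors.
    The weight comes from the reference map, the distance from the rendered map.
    This is an assumed form, not the loss published with FMGS.
    """
    h, w = len(rendered), len(rendered[0])
    total, weight_sum = 0.0, 0.0
    for y in range(h):
        for x in range(w):
            # Compare each pixel with its right and bottom neighbor.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny >= h or nx >= w:
                    continue
                wgt = max(0.0, cosine(reference[y][x], reference[ny][nx]))
                dist = 1.0 - cosine(rendered[y][x], rendered[ny][nx])
                total += wgt * dist
                weight_sum += wgt
    return total / weight_sum if weight_sum > 0 else 0.0


# Two regions in the reference map: left half and right half.
reference = [[[1, 0], [1, 0], [0, 1], [0, 1]]]
sharp = [[[1, 0], [1, 0], [0, 1], [0, 1]]]    # features respect the boundary
blurry = [[[1, 0], [1, 1], [1, 1], [0, 1]]]   # features bleed across it
print(neighbor_alignment_loss(sharp, reference))
print(neighbor_alignment_loss(blurry, reference))
```

The sharp map scores lower than the blurry one, which is the behavior you want from a term that keeps language features from smearing across object edges.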

FMGS employs a unique training procedure that involves supervising the MHE-based language feature field using a hybrid feature map. This map is derived from multi-scale image crops obtained from various viewpoints. The training process ensures that the language embeddings capture relevant features at each scale, allowing for a comprehensive representation of the scene.
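As a rough illustration of what a hybrid feature map built from multi-scale crops could look like, here is a small sketch. It tiles an image at several crop sizes, embeds each crop, paints the crop's embedding onto the pixels it covers, and averages across scales. The `toy_embed` function is a stand-in I made up (mean color) so the example runs without a real foundation model; in practice that slot would be filled by an image encoder such as CLIP. The crop sizes and the plain averaging are also assumptions.

```python
import math


def toy_embed(crop):
    """Hypothetical stand-in for an image encoder: normalized mean RGB of the crop."""
    n = sum(len(row) for row in crop)
    mean = [sum(px[c] for row in crop for px in row) / n for c in range(3)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


def hybrid_feature_map(image, crop_sizes=(2, 4), embed=toy_embed):
    """Average per-pixel crop embeddings over several crop scales.

    image: H x W grid (nested lists) of RGB triples.
    Returns an H x W grid of unit-length feature vectors.
    """
    h, w = len(image), len(image[0])
    dim = len(embed([[image[0][0]]]))
    acc = [[[0.0] * dim for _ in range(w)] for _ in range(h)]
    for size in crop_sizes:
        for top in range(0, h, size):
            for left in range(0, w, size):
                crop = [row[left:left + size] for row in image[top:top + size]]
                feat = embed(crop)
                # Every pixel covered by the crop receives the crop's embedding.
                for y in range(top, min(top + size, h)):
                    for x in range(left, min(left + size, w)):
                        for k in range(dim):
                            acc[y][x][k] += feat[k] / len(crop_sizes)
    # Renormalize so each pixel feature has unit length.
    for y in range(h):
        for x in range(w):
            norm = math.sqrt(sum(v * v for v in acc[y][x])) or 1.0
            acc[y][x] = [v / norm for v in acc[y][x]]
    return acc


red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
image = [[red, red, blue, blue] for _ in range(4)]
fmap = hybrid_feature_map(image)
print(fmap[0][0], fmap[0][3])
```

Small crops keep the feature local, while the large crop mixes in context from the whole image, so each pixel ends up with a blend of both.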

For querying, FMGS allows users to interact with the 3D scene using natural language. The model generates relevancy maps based on the query, highlighting semantically relevant parts of the scene.
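Here is a small sketch of how a relevancy map can be computed from a rendered feature map and a text query. I am assuming the LERF-style relevancy score, where the query is compared against a handful of generic "canonical" phrases and the worst-case pairwise softmax is kept. Whether FMGS uses exactly this score is my assumption. The embeddings below are toy 2D vectors rather than real CLIP outputs.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relevancy(pixel_feat, query_emb, canon_embs):
    """LERF-style score: min over canonical phrases of a pairwise softmax."""
    q = math.exp(dot(pixel_feat, query_emb))
    return min(q / (q + math.exp(dot(pixel_feat, c))) for c in canon_embs)


def relevancy_map(feature_map, query_emb, canon_embs):
    """Score every pixel of an H x W feature map against the query."""
    return [[relevancy(f, query_emb, canon_embs) for f in row]
            for row in feature_map]


# Toy embeddings: the first pixel matches the query, the second matches
# a generic canonical phrase such as "object".
feature_map = [[[1.0, 0.0], [0.0, 1.0]]]
query_emb = [1.0, 0.0]
canon_embs = [[0.0, 1.0]]
scores = relevancy_map(feature_map, query_emb, canon_embs)
print(scores)  # first pixel scores above 0.5, second below
```

A score above 0.5 means the pixel looks more like the query than like any of the generic phrases, which is what ends up highlighted in the scene.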

Unlike traditional methods that focus either on geometric accuracy or semantic understanding, FMGS excels in both. It provides a more holistic understanding of the scene by integrating detailed geometry with rich semantic context. Additionally, FMGS demonstrates a significant improvement in inference speed and versatility compared to other state-of-the-art methods.

FMGS opens up a plethora of possibilities in augmented reality and robotics. In AR, it can enhance user experiences by providing more accurate and interactive representations of physical spaces. In robotics, FMGS can be instrumental in developing robots that understand and navigate spaces more effectively, recognizing objects not just by their shape but also by their semantic properties.

Funnily enough, in order to not go insane out of boredom in the days between Christmas and New Year's, I had a long phone call with a friend, and it clicked for him how many opportunities there are for this. Some of the ones we spoke about were hospital and patient SOP management, evacuation and simulation methods, and a grocery store automating inventory. Not far from that, some of my personal favorites are in the agricultural space.

Given that FMGS comes out of Google, I have to imagine they are thinking about how it might benefit search. I would be curious to see how a user of Google Maps might use FMGS. My thoughts go to more everyday uses, such as asking, where is the bathroom in this coffee shop?

Think about all the possibilities of what you can do with radiance fields paired with semantics. What do you think? How would you use a radiance field that can highlight what's contained in it?

The authors have also stated that they will release their code after the paper has been accepted.
