With GTC and GDC in the immediate future, we have seen an increase in consequential NeRF papers being published. Language Embedded Radiance Fields, or LERF, represents a new evolution of the technology: LERF allows a computer to identify, from a text prompt, the individual items contained in a given NeRF scene.
LERF Teaser Video
This unlocks a tremendous number of use cases, ranging from assembly instructions to building inspections to fire escape plans. Personally, I rely on YouTube tutorials to assemble furniture and pretty much anything else that doesn't come pre-assembled. Now it will be possible for retailers such as IKEA to train these models on which piece corresponds to each step of the assembly process and let the end user query where each piece goes. For instance: what are all the pieces I will be using for steps 1-3 of assembly?
Additionally, it allows for asynchronous customer support and troubleshooting.
Furthermore, these retailers will be able to generate NeRFs of what the completed product is supposed to look like in 3D, letting users verify that they followed the instructions correctly - something I continue to struggle with.
Given that two of the paper's authors, Matthew Tancik and Angjoo Kanazawa, are also founders of Nerfstudio, it's no surprise that they announced a Nerfstudio integration is coming. That integration is what transforms the technology into a realistic option for near-term business impact.
This is in line with other recent developments, as more and more features become compatible with Nerfstudio. The ecosystem developing around NeRF is extremely exciting and one to watch closely as the year continues.
This advance also gives us a glimpse into a future where it will be possible to use a text prompt to delete or transform an object. Perhaps it will also build upon NeRFshop to make some of these changes to a scene more efficiently, similar to how Content-Aware Fill emerged in Photoshop. It also opens up the possibility of giving a robot a command to fulfill. My mind immediately goes to how a distribution company such as Amazon could use NeRFs across its fulfillment network to transport items around a warehouse more efficiently and accurately.
Under the hood, LERF enables pixel-aligned queries of distilled 3D CLIP embeddings without relying on region proposals, masks, or fine-tuning, supporting long-tail open-vocabulary queries hierarchically across the volume.
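To make that concrete, here is a minimal sketch of what an open-vocabulary relevancy query against such a field might look like. The `embedding_map` tensor is a hypothetical stand-in for the per-pixel CLIP embeddings a LERF-style field would render, the model and checkpoint names are illustrative choices, and the pairwise-softmax score against canonical phrases follows the spirit of the relevancy formulation described in the paper rather than reproducing the authors' code.

```python
import torch
import open_clip

# Load an off-the-shelf CLIP model (this model/checkpoint choice is an
# assumption; LERF's actual setup may differ).
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def encode(phrases):
    """Encode text phrases into unit-norm CLIP embeddings."""
    with torch.no_grad():
        feats = model.encode_text(tokenizer(phrases))
    return feats / feats.norm(dim=-1, keepdim=True)

def relevancy_map(embedding_map, query,
                  canonical=("object", "things", "stuff", "texture")):
    """Score each pixel's embedding against the query vs. canonical phrases."""
    q = encode([query])[0]            # (D,)
    canon = encode(list(canonical))   # (C, D)
    sim_q = embedding_map @ q         # (H, W) cosine similarity to the query
    sim_c = embedding_map @ canon.T   # (H, W, C) similarity to canonical phrases
    # Pairwise softmax: how strongly each pixel prefers the query over
    # each canonical phrase; keep the worst case across the phrases.
    ratios = torch.exp(sim_q)[..., None] / (
        torch.exp(sim_q)[..., None] + torch.exp(sim_c))
    return ratios.min(dim=-1).values  # (H, W) relevancy in [0, 1]

# Hypothetical rendered embedding map for a 64x64 view (random here;
# a real LERF field would render these per pixel).
H, W, D = 64, 64, 512  # D=512 matches ViT-B-32's embedding width
emb = torch.randn(H, W, D)
emb = emb / emb.norm(dim=-1, keepdim=True)

heat = relevancy_map(emb, "rosemary")
print(heat.shape, float(heat.max()))
```

The appeal of this style of query is that nothing in it is category-specific: swapping "rosemary" for any other phrase requires no retraining, which is exactly what makes the open-vocabulary, long-tail behavior possible.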
The paper also demonstrates how it is now possible to pair natural-language algorithms with LERF to interact with the scene.
Once the items that ChatGPT has recommended for cleaning up the area have been defined, it is easy to establish a workflow to complete the goal, all while confirming that the needed resources actually exist in the live scene (see the sketch below).
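Here is a hedged sketch of that workflow, reusing `relevancy_map` and `emb` from the previous snippet. The item list is hardcoded as a stand-in for whatever ChatGPT would actually recommend, and the threshold is an illustrative assumption, not a value from the paper.

```python
# Stand-in for a list of cleanup items an LLM such as ChatGPT might
# recommend; in a real pipeline this would come from an API call.
RECOMMENDED_ITEMS = ["sponge", "paper towels", "trash bag"]

def items_present(embedding_map, items, threshold=0.6):
    """Return the items whose peak relevancy in the scene clears the threshold."""
    return [item for item in items
            if relevancy_map(embedding_map, item).max().item() > threshold]

# Confirm which recommended resources actually exist in the live scene
# before handing the task off to a downstream workflow.
print(items_present(emb, RECOMMENDED_ITEMS))
```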
There are also other examples on the project page that show how LERF can help transform businesses. For instance, a florist can demonstrate to a client how a bouquet will look and show live examples of how different types of flowers will pair.
The level of detail that can be drilled into is surprising, as demonstrated by the above video; not only is LERF able to understand the larger bouquet of flowers, it can further isolate the types of flowers contained within it. As this technology progresses, the documentation of everyday items for educational purposes will rapidly increase. For instance, I would have been hopeless attempting to figure out what rosemary was, but now it's as simple as asking what flowers are contained in the bouquet. This tool will immediately be a boon for many industries, and it's easy to imagine how florists could use it to upsell a variety of flowers to clients.
Professor Angjoo Kanazawa will also be giving a presentation on Nerfstudio next week at NVIDIA's GTC. Check out the article here with the NeRF schedule and make sure to add it to your list!