Just after the LERF paper was announced last week, I couldn't help but imagine how the method could eventually be paired with an in-NeRF editor. I could only guess at what year that might happen. As it turned out, I needed to wait less than two weeks: with the publication of Instruct-NeRF2NeRF, several of those dreams became reality.
UC Berkeley researchers have developed a groundbreaking new method for editing 3D scenes using simple text-based instructions, called Instruct-NeRF2NeRF. The technique allows users to make diverse local and global scene edits, offering a more accessible and intuitive 3D editing experience for everyone, even those without specialized training.
Traditional 3D editing tools require extensive training and expertise, but the emergence of neural 3D reconstruction techniques has made it easier to create realistic digital representations of real-world scenes. However, the tools for editing these scenes have remained underdeveloped, creating a barrier for those without specialized skills. Instruct-NeRF2NeRF aims to change that by allowing users to edit 3D scenes using simple text instructions.
In other words, Instruct-NeRF2NeRF lets a user edit a NeRF using nothing but a text prompt. On its own, it's an extremely powerful tool; paired with the power of LERF, it becomes a very dangerous combo.
If the model in the above video looks familiar, it's because it's Ethan Weber, one of the Nerfstudio founders and a researcher at UC Berkeley.
The researchers evaluated their approach on a variety of captured NeRF scenes, demonstrating that it can accomplish diverse contextual edits on real scenes: environmental changes such as adjusting the time of day, as well as localized changes that modify specific objects. These edits can be applied either at the global level of the scene or to an individual subject.
Instruct-NeRF2NeRF works by using an image-conditioned diffusion model, InstructPix2Pix, to iteratively edit input images while optimizing the underlying scene. This results in an optimized 3D scene that respects the edit instruction, allowing for more realistic and targeted edits than previous methods. By enabling a wide variety of edits using flexible and expressive textual instructions, the approach makes 3D scene editing more accessible and intuitive for everyday users.
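To make that loop concrete, here is a minimal sketch of the iterative dataset update idea, assuming the publicly available InstructPix2Pix weights via Hugging Face diffusers. `render_view` and `nerf_train_step` are hypothetical placeholders for whatever NeRF framework you train with (e.g. Nerfstudio), and the authors' actual method also conditions the diffusion model on the original captured images and a partially noised render, which this simplified loop omits. Treat it as an illustration of the idea, not the paper's implementation.

```python
# Sketch of iterative dataset updates: periodically re-render a training view,
# edit it with InstructPix2Pix, swap it back into the dataset, and keep training.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def edit_nerf_iteratively(nerf, dataset, prompt, total_iters=30_000, edit_every=10):
    """Alternate NeRF optimization with per-image diffusion edits (simplified)."""
    for step in range(total_iters):
        if step % edit_every == 0:
            # Pick one training view and re-render it from the current,
            # partially edited NeRF.
            idx = (step // edit_every) % len(dataset)
            render = render_view(nerf, dataset.cameras[idx])   # placeholder
            # Edit the render according to the text instruction and replace
            # the corresponding ground-truth image in the dataset.
            edited = pipe(
                prompt=prompt,                 # e.g. "make it look like autumn"
                image=render,
                num_inference_steps=20,
                image_guidance_scale=1.5,      # stay close to the input render
                guidance_scale=7.5,            # follow the text instruction
            ).images[0]
            dataset.images[idx] = edited
        # Standard NeRF optimization step against the gradually edited dataset.
        nerf_train_step(nerf, dataset)                         # placeholder
```

Because only a few images are edited at a time while optimization continues, the scene gradually converges toward a 3D-consistent version of the instructed edit rather than inheriting the inconsistencies of independent 2D edits.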
Using natural language instructions, even beginners can achieve high-quality results without the need for additional tools or specialized knowledge. This innovative method provides a more user-friendly interface for 3D editing and has the potential to revolutionize the way 3D scenes are created and modified.
These NeRFs were originally generated in Nerfstudio using the Nerfacto model. Several of the Instruct-NeRF2NeRF authors are also on the Nerfstudio team, including Matt Tancik and Professor Angjoo Kanazawa, and the project page states that an official Nerfstudio integration is coming. Many of the field's recent developments have been shown to work in Nerfstudio, which remains the most open NeRF project for adding new plugins and discoveries.
Instruct-NeRF2NeRF addresses some limitations of previous text-based stylization approaches, such as their inability to make localized edits. By taking advantage of a recent instruction-based, image-conditioned 2D diffusion model, the researchers enabled mask-free instructional edits, resulting in a purely language-based interface that supports a wider range of intuitive and content-aware 3D editing.
This development comes as natural language is increasingly seen as the next "programming language" for specifying complex tasks, with large language models (LLMs) like GPT and ChatGPT enabling more user-friendly interfaces through language instructions. The researchers' work is the first to demonstrate instructional editing in 3D, which is particularly significant given the difficulty of the base task.
Instruct-NeRF2NeRF could revolutionize the 3D editing field, offering a user-friendly, instruction-based interface that allows for a wide range of intuitive and content-aware edits. By enabling even novice users to create high-quality 3D scenes without specialized knowledge, this groundbreaking technique has the potential to transform the way we create and interact with digital 3D content. With each passing week, I find myself needing to recalibrate my expectations for the evolution of the field. What will be announced next is anyone's guess, but I am fairly sure that the team of researchers at UC Berkeley will be involved.