Today in our SIGGRAPH Asia series we host the best-paper-award-winning NVIDIA team and discuss their latest work, a hybrid NeRF named “Adaptive Shells”. We have Zian Wang, Merlin Nimier-David, Tianchang Shen, Nicholas Sharp, Jun Gao and Zan Gojcic with us.
We appreciate their willingness to share insights about their work on short notice. We have previously covered the paper for those interested in delving into the technical details, but nothing beats the authors’ own perspective and outlook on their work.
Excitingly, they also have plans to release the code to Adaptive Shells soon.
Q: Congratulations on winning the SIGGRAPH Asia’s best paper award this morning. Could you please introduce the team?
A: Thanks, our team comprises individuals with diverse expertise. Each member brings a unique perspective to problem-solving in graphics, and we are excited about how machine learning makes all these borders completely obsolete. Also, bringing computer vision people onto the team has made things better. They were missing the whole 2000s…
Merlin is our rendering expert, and we have specialists in computer vision and machine learning for 3D representations, while Nick focuses on old-school geometry. This diversity turned out to be incredibly useful for us, fostering efficient collaboration and building strong connections among team members. Overall it went really well.
Q: Could you share how your collaboration for Adaptive Shells came about?
A: Interestingly, we are not even a single team but a collaboration between two teams within NVIDIA, spread across four or five cities in the US, Switzerland and Canada, including Toronto. So we were two separate teams working on very similar topics. At some point, we recognized the synergies and decided to merge. And we found that you can cover much more with a heterogeneous group of people thinking about the problems from different angles than with nine people who have essentially the same background, because they will use the same tools and think about the problem in exactly the same way.
Q: That is quite a unique and collaborative setup. Let's dive into your paper. Could you provide an overview of the idea behind Adaptive Shells?
A: In some ways it is a really natural thing to try to build a hybrid NeRF representation, because everyone wants things to be faster. The core idea of our paper is to create two meshes for the Neuralangelo reconstruction: one bounding it from the outside and one from the inside. The advantage of this approach is that the expensive neural volume rendering only needs to happen between these two meshes. Some things are effectively surfaces; you do not need to treat them as volumes.
If you treat them as surfaces, you only need a single sample per ray. That same insight actually transfers to Gaussians.
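To make the sampling idea concrete, here is a minimal sketch in Python, not the authors’ released code: it assumes the inner and outer shell intersection distances along a ray are already known, and uses a hypothetical surface_eps threshold to decide between a single surface sample and several volumetric samples. The actual method derives the shell width adaptively from the learned reconstruction, so this only illustrates the principle.

```python
import numpy as np

def sample_in_shell(t_enter, t_exit, n_max=32, surface_eps=1e-2):
    """Place ray samples only inside the shell interval [t_enter, t_exit].

    Where the shell is thin (surface-like content), a single sample suffices;
    where it is thick (fuzzy, volumetric content), spread up to n_max samples.
    The threshold and sample-count heuristic here are illustrative assumptions.
    """
    thickness = t_exit - t_enter
    if thickness <= surface_eps:
        # Surface-like region: one sample at the shell midpoint.
        return np.array([0.5 * (t_enter + t_exit)])
    # Volumetric region: scale the sample count with shell thickness.
    n = min(n_max, max(2, int(np.ceil(thickness / surface_eps))))
    return np.linspace(t_enter, t_exit, n)

# A thin shell yields one sample; a thick shell yields many.
print(sample_in_shell(1.00, 1.005))  # -> [1.0025]
print(sample_in_shell(1.00, 1.20))   # -> 20 samples across the interval
```

Either way, samples outside the shell are never evaluated, which is where the speedup over sampling the whole ray comes from.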
Q: That sounds like a significant breakthrough. Moving on, it is mentioned that you plan to release code related to your research?
A: We are researchers: we just want to give people awesome tools and let them build cool things. So we are definitely working on getting some code out, but it is still in flight. We are still working on file size and speed optimizations to ensure a smoother and more accessible release. We are keen on engaging with the broader community, and releasing our code is a step in that direction. We strongly believe in fostering collaboration and providing tools that the community can benefit from.
Q: The rapid evolution of the field is undoubtedly a challenge. How have you managed to keep up with the fast-paced advancements?
A: For us it is important to be able to digest things on a high level and then just focus on what is actually important. There is a lot of noise. Out of this noise you need to somehow extract where the field is going and what the next important problems are, and then pursue those in our research.
We think about what will likely happen in the foreseeable future and how we can contribute to a bigger goal together. An idea should not end when the paper is published; at least in my opinion, publication should not be the final goal of our research projects.
Q: How do you envision the evolution and adoption of NeRF methods, especially considering the progress in hardware capabilities?
A: Hardware had to get there before NeRFs were useful, so practical use of NeRFs from the first paper (in 2020) was very limited. With Instant NGP last year this wave started: we actually reached the performance where it became interesting and useful.
The future, we believe, involves continuous maturation and aligning with hardware progress. The biggest thing is that you have to keep pushing the performance; you want it to keep getting faster. And you want to get it into people's hands: we were just talking about how excited we are about the ways people could use this in real life.
The other big part is making these things more controllable, more editable, more relightable, and so on; if we are really going to make them a piece of the graphics pipeline, you need that. We can also add animation and simulation.
It is the same with large or outdoor scenes, where you have foliage from trees and buildings. There are some extra challenges, right? Whenever you build data structures to accelerate what you are doing, on an outdoor scene you suddenly have content that is incredibly far away. So there are additional challenges to overcome. But in the end we were pretty much able to make it work, and it is something we can keep doing and looking into.
Occasionally, we also get direct feedback, although it is not always a closed loop… People might use our tools, and we only find out about it years later. Sometimes, it happens on platforms like Twitter, where someone shares a screenshot of their work using our tools.
Q: Finally, how do you foresee the future of graphics overall?
A: You might represent some content in a scene with a triangle mesh. When you have a fluid simulation, it is probably going to be voxels. But the point is: you use the right representation for the thing you are doing. And right now we are getting some additional choices of representation.
It is not yet super clear how it would play together with existing shadow-mapping systems, dynamic lighting systems and game engines. This is still open, but a lot of people are interested. Maybe you want to use Shells alongside other assets that you have for games. If you combine those with Gaussians or NeRFs, the lighting is baked into the representation, so it is not trivial.
Maybe problem number one would be robustness. To my point, I think people in film are very risk-averse on big projects; if something might fail, they will not even try it… So we might still have polygons, but what needs to change is that right now we still think a lot about artists and creators modifying representations explicitly. Whether it is meshes, NeRFs, Gaussians or something else, what needs to happen is that we give people the tools to manage these representations automatically and computationally, so we do not put the burden on the end creator to assemble the representation themselves.
(Note: The interview format is a condensed representation for clarity.)