Summer is coming to a close, and it seems the team at Google went away to CamP. No, this is not a summer camp designed around NeRF, as fun as that would be, but it is the next best thing! CamP stands for Camera Preconditioning for Neural Radiance Fields.
The team behind it should be no surprise to anyone: it features two of the original NeRF authors, Ben Mildenhall and Jon Barron. It also shouldn't surprise anyone that they're using Google's current state-of-the-art method, Zip-NeRF. You might remember seeing video clips or posts on Reddit when Zip-NeRF came out. It was the first time random friends and family members began texting me that they saw a NeRF on the internet.
CamP tackles one of the most important prerequisites for NeRFs to work: understanding each camera's position in space. NeRFs rely heavily on Structure from Motion (SfM) methods, such as COLMAP, to estimate where each camera was located. The problem is that this step is often unreliable. It can take a while to run and can fail outright, leading to artifacts or completely missing sections of the reconstruction. This part of the NeRF pipeline does not get as much research attention as it should. It has massive implications for what the NeRF model is able to learn from the dataset, and CamP is proof of that. Whenever you run Instant-NGP or nerfstudio to create a transforms file, you might recognize the final step.
The choice of how to parameterize the camera is often neglected. Various works, including BARF and SCNeRF, have adopted different parameterizations. Additionally, the choice of coordinate frame for translation, and the intricate interactions between focal length and translation, further complicate matters.
A popular approach to managing this is coarse-to-fine optimization. In multi-layer perceptron (MLP) based models, the higher-frequency bands of the positional encoding can be windowed, while grid-based strategies like Instant NGP can down-weight the higher-resolution levels. However, these methods cannot fully make up for an inaccurate camera pose initialization.
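To make the coarse-to-fine idea concrete, here is a minimal sketch of BARF-style frequency windowing of a positional encoding. The function name and the cosine ramp schedule are illustrative, not code from any of the papers mentioned here.

```python
import jax.numpy as jnp

def windowed_positional_encoding(x, num_freqs, alpha):
    """x: (..., 3) points; alpha grows from 0 to num_freqs over training,
    so early iterations only 'see' the low-frequency bands."""
    freqs = 2.0 ** jnp.arange(num_freqs)                   # frequency bands
    k = jnp.arange(num_freqs)
    # Per-band window: 0 before a band is opened, ramping to 1 via a cosine.
    w = jnp.clip(alpha - k, 0.0, 1.0)
    window = 0.5 * (1.0 - jnp.cos(jnp.pi * w))             # (num_freqs,)
    xb = x[..., None, :] * freqs[:, None]                  # (..., num_freqs, 3)
    enc = jnp.concatenate([jnp.sin(xb), jnp.cos(xb)], -1)  # (..., num_freqs, 6)
    return (window[:, None] * enc).reshape(*x.shape[:-1], -1)
```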
It’s crucial to get an accurate estimate of the camera intrinsics for a successful NeRF or radiance field. Structure from Motion (SfM) systems, such as COLMAP, generally provide these, covering aspects like focal length and distortion coefficients. These intrinsic parameters can also be refined with backpropagation, as demonstrated in SCNeRF.
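As a rough illustration of refining an intrinsic by backpropagation, the sketch below fits a single focal length to synthetic reprojection targets. The pinhole model and the made-up data are placeholders of my own, not SCNeRF's actual pipeline.

```python
import jax
import jax.numpy as jnp

# Synthetic 3D points and the pixel coordinates a "true" camera would observe.
points_3d = jnp.array([[0.1, 0.2, 2.0], [-0.3, 0.1, 3.0], [0.2, -0.4, 2.5]])
true_focal = 1200.0
observed_px = true_focal * points_3d[:, :2] / points_3d[:, 2:3]

def reprojection_loss(log_focal):
    focal = jnp.exp(log_focal)                              # keep focal positive
    projected = focal * points_3d[:, :2] / points_3d[:, 2:3]
    return jnp.mean((projected - observed_px) ** 2)

log_focal = jnp.log(1000.0)                                 # initial guess, in pixels
grad_fn = jax.grad(reprojection_loss)
for _ in range(200):
    log_focal = log_focal - 1e-5 * grad_fn(log_focal)       # plain gradient descent
```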
The question then becomes a little more complicated: with all of the potential camera parameters to optimize, which ones have the greatest effect? Additionally, the mutual influence between scene reconstruction and camera parameter estimation, coupled with the intricacies of modeling neural scene representations, adds to the complexity.
Left: (a) ARKit Poses (w/o COLMAP). Right: (b) ARKit Poses + CamP.
To help with this, the Google team asked: how does the camera parameterization affect the projection of points in front of the camera? Despite its elementary nature, this question highlights potential pitfalls in certain camera parameterizations. These pitfalls can result in one axis of the parameterization influencing the projections far more than the others.
Some issues can be manually resolved, for instance, by introducing scaling hyperparameters for each dimension. However, this doesn't solve all problems.
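To see why a single scale factor per dimension isn't enough, consider how differently a unit step in each parameter moves a projected point. The toy pinhole model and parameter names below are hypothetical, purely to show the mismatch in scales.

```python
import jax
import jax.numpy as jnp

def project(params, point):
    """Tiny pinhole model: params = [focal_px, rot_y_rad, tx]."""
    focal, rot_y, tx = params
    c, s = jnp.cos(rot_y), jnp.sin(rot_y)
    # Rotate the point about the y-axis, then translate along x.
    x = c * point[0] + s * point[2] + tx
    y = point[1]
    z = -s * point[0] + c * point[2]
    return focal * jnp.array([x / z, y / z])       # pixel coordinates

params = jnp.array([1000.0, 0.0, 0.0])             # focal in pixels, angle in radians
point = jnp.array([0.1, 0.2, 2.0])
J = jax.jacfwd(project)(params, point)             # (2, 3) Jacobian of the projection
print(jnp.linalg.norm(J, axis=0))                  # column norms differ by orders of magnitude
```

A unit change in the rotation angle shifts the projected pixel thousands of times more than a unit change in focal length, and a per-axis rescaling still says nothing about how the parameters interact with one another.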
CamP can be broken down into a few powerful observations. The first is that camera parameters are correlated with one another and have very different units of measurement, such as focal length measured in pixels while rotations and translations live in physical units. Left untreated, these mismatched scales and correlations make the joint optimization poorly conditioned. With that in mind, the authors employ a novel preconditioner that decouples the parameters and normalizes their effect. The preconditioner is a fixed matrix applied to the camera parameters before they are passed into the NeRF model.
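Conceptually, the sketch below shows one way such a preconditioner could be built: whiten the camera parameters using the Jacobian of a proxy point cloud's projections, so that a unit step in any preconditioned direction moves the projected points by a comparable amount. This is my own simplified sketch of the idea, not the released CamP implementation; the helper names and the joint-training comments are assumptions.

```python
import jax.numpy as jnp

def build_preconditioner(J, eps=1e-6):
    """J: (num_points * 2, num_params) Jacobian of proxy-point projections
    with respect to the camera parameters. Returns a matrix P such that
    the columns of J @ P all have comparable scale."""
    cov = J.T @ J / J.shape[0] + eps * jnp.eye(J.shape[1])
    # Inverse matrix square root via eigendecomposition (cov is symmetric PSD).
    vals, vecs = jnp.linalg.eigh(cov)
    return vecs @ jnp.diag(vals ** -0.5) @ vecs.T

# During joint optimization the optimizer updates a preconditioned vector phi,
# and P maps it back to physical camera parameters before rendering:
#   camera_params = initial_params + P @ phi
#   loss = photometric_loss(render(camera_params, nerf_params), image)
```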
So how does CamP stack up against existing workflows?
We can see the effect of CamP here in its ability to generate a significantly more complete view of the scene. Google researcher Ben Mildenhall expressed his appreciation of the ARKit-enabled iOS app NeRFCapture, which Jonathan Stephens gave a demo of earlier this year.
While the code has not been released yet, it looks like we're seeing another method deliver higher-quality NeRFs from the same data. Their GitHub project page states that it will be presented at SIGGRAPH Asia later this year, so I expect we might have to wait a little while for the full code to be released.