Back in middle school, I really thought that more of my life would involve decoding hidden messages in images for some reason. With Noise-NeRF, we're seeing that it's possible and it actually works pretty well.
This has always been a question in the back of my head, but I had never understood how it might be done until today: how do you encode images or messages within NeRFs without it being noticeable? The answer is a technique called steganography (not stenography), where data or messages are hidden within ordinary files. Noise-NeRF expands on this by not only maintaining the integrity of the original NeRF model, but also allowing information to be securely embedded into 3D models.
There's a range of what you can do with that, both good and bad, but let's focus on some of the positives. Embedded steganographic data can serve as provenance for copyright verification. This would answer a lot of questions about where a model's data came from or who created it. It would also help with licensing conversations, making it possible to verify how commercially used projects were created and ensure they have been properly licensed.
I'm actually a little curious whether any of the existing platforms, such as Luma AI or Polycam, have looked into or are already using steganography with their outputs. Noise-NeRF consumes ~16GB of VRAM in the examples shown and is trained on a 3090, so it's not impossible, but the question is whether it's worth it right now for these companies. Potentially. I have seen so many conversations take place surrounding the provenance of generative AI, and while this doesn't necessarily fall into that category, I believe that as 3D continues to emerge, having some level of imperceptible documentation will be important.
In some ways, it reminds me of the Gaussian Painters project from Alexandre Carlier, but this extends the idea into 3D.
Let's take a look now at how Noise-NeRF actually works. The authors propose two strategies to power it: Adaptive Pixel Selection and a Pixel Perturbation Strategy. These two strategies are pivotal in enhancing the steganography quality and efficiency of Noise-NeRF.
Instead of treating all pixels uniformly, they propose an Adaptive Pixel Selection Strategy, which identifies and selects pixels that show greater sensitivity or responsiveness to the injected noise. Obviously, it won't be perfect straight away, so it needs some level of optimization and refinement. As the embedding progresses, it dynamically adjusts its focus, choosing different pixel groups based on their performance in prior iterations.
This iterative optimization ensures that each pixel receives the exact level of attention required. Grouping pixels together wouldn't look quite right due to view-dependent effects; therefore, different pixels may require varying levels of iteration or noise adjustment.
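To make the idea concrete, here's a toy sketch of that selection loop. This is not the authors' implementation; the function names, the residual-based notion of "sensitivity," and the simple update rule are all my own illustrative assumptions.

```python
import numpy as np

def select_sensitive_pixels(residuals, k):
    """Pick the k pixels whose rendered values deviate most from the
    steganographic target -- a stand-in for 'sensitivity to injected noise'."""
    flat = np.abs(residuals).reshape(-1)
    return np.argsort(flat)[-k:]  # indices of the k largest residuals

def adaptive_embedding_loop(render, target, steps=5, k=4):
    """Toy loop: each iteration re-selects which pixel group to refine,
    based on how pixels performed in the prior iteration."""
    image = render.copy()
    for _ in range(steps):
        residuals = target - image
        idx = select_sensitive_pixels(residuals, k)
        # Nudge only the selected pixels toward the target; untouched
        # pixels keep their original rendered values.
        image.reshape(-1)[idx] += 0.5 * residuals.reshape(-1)[idx]
    return image
```

The key property is that effort concentrates on the worst-performing pixels each round rather than being spread uniformly, which is what the paper means by treating pixels non-uniformly.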
While Adaptive Pixel Selection focuses on efficiency and precision, the second part of Noise-NeRF, the Pixel Perturbation Strategy, is designed to enhance the speed and effectiveness of the steganographic embedding. In the initial stages of embedding, the strategy aims to create a significant deviation of the rendered image from the original. This is achieved by introducing calculated perturbations, or noise.
By causing a more substantial initial deviation, the Pixel Perturbation Strategy helps the steganographic process reach its desired state more quickly, kind of like giving the process a head start. The strategy ensures that the perturbations are significant enough to speed up the process but not so large as to degrade the quality of the final steganographic output.
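A minimal sketch of that trade-off might look like the following. Again, this is my own hypothetical formulation, not the paper's: I assume a perturbation whose magnitude is large early in the schedule, decays toward zero, and is clipped so it can never push pixel values out of range.

```python
import numpy as np

def perturb_early(image, target, step_frac, beta=0.3):
    """Add a calculated perturbation toward the hidden target.

    step_frac in [0, 1] is how far along embedding is: early on
    (step_frac near 0) the perturbation is large, giving the process
    a head start; late in embedding it shrinks toward zero so it
    can't degrade the final output.
    """
    direction = np.sign(target - image)      # which way to push each pixel
    magnitude = beta * (1.0 - step_frac)     # big early, ~0 late
    perturbation = np.clip(magnitude * direction, -beta, beta)
    # Keep pixel values in a valid [0, 1] range.
    return np.clip(image + perturbation, 0.0, 1.0)
```

The decay schedule and the `beta` cap are the two knobs balancing "significant enough to speed things up" against "not so large as to degrade quality."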
The combination of these two strategies allows Noise-NeRF to efficiently embed high-quality steganographic information into NeRFs. While Adaptive Pixel Selection ensures that the embedding is precise and efficient, Pixel Perturbation accelerates the process, ensuring quick convergence to high-quality results.
The authors mention a surprising lack of research in this area, and as far as I'm aware, they're correct. It's something I haven't really encountered, but I believe it will be important to have documentation not just of copyright information but also of the type of data being stored. Hopefully we'll steer clear of Snow Crash, and this opens up conversations about what messages are being embedded.