Earlier today, OpenAI unveiled Sora, arguably the most consequential announcement of the past twelve months.
Sora is a text-to-video model announced just after midday Eastern time. To be blunt, it is significantly stronger than any existing public method, by what looks like orders of magnitude.
That's a large claim, given how fast the field has been moving. On its face, generating video at that level of fidelity is extremely impressive, but where it becomes incredible is in generating view-consistent, parallax-rich footage of large areas. That unlocks large-scale generative radiance fields.
What does this mean? It means that generating high-fidelity, large-scale radiance fields will soon be a reality for the public. The most challenging part is now behind us: it is possible to generate view-consistent locations, with lifelike video output, from text alone. That is an extraordinarily difficult problem.
Like the vast majority of people, I only have access to the videos they've publicly released, but that's all I need. According to the Sora announcement, they're able to generate up to 60 seconds of footage per prompt. I took the demo video of Santorini as my base. It's only 9 seconds long, but that's sufficient.
From here, I simply slice the video into frames the same way I would with any real footage, run them through COLMAP, and feed the result into nerfstudio, though any radiance field pipeline would work. It worked on the first try: every camera pose was recovered, and it trained like any other capture. Here is the resulting NeRF from Nerfacto.
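If you want to reproduce this yourself, the whole pipeline is short. Here is a minimal sketch in Python, assuming nerfstudio is installed along with its COLMAP and ffmpeg dependencies; nerfstudio's `ns-process-data` wraps both the frame slicing and the COLMAP pose recovery, and the file and directory names are placeholders for wherever you saved the clip.

```python
import subprocess

# 1. Slice the clip into frames and recover camera poses.
#    ns-process-data runs ffmpeg frame extraction and COLMAP
#    structure-from-motion in a single step.
subprocess.run(
    ["ns-process-data", "video",
     "--data", "santorini.mp4",             # placeholder name for the demo clip
     "--output-dir", "processed/santorini"],
    check=True,
)

# 2. Train a Nerfacto model on the posed frames, exactly as with real footage.
subprocess.run(
    ["ns-train", "nerfacto",
     "--data", "processed/santorini"],
    check=True,
)
```

If COLMAP struggles to register poses on a short clip, lowering the extraction frame rate to drop near-duplicate frames is the usual first fix.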
This was built from a 9-second video. Imagine what you could do with the full 60 seconds, or with two minutes. Not only is this possible, it has effectively already been achieved. I am very curious to see what OpenAI does from here, but as things stand, they can generate large-scale radiance fields from a text prompt. I hadn't anticipated seeing large-scale generative radiance fields until much later.
Further, there is absolutely no reason this pipeline cannot be automated end to end; in fact, it would be trivial. When Sora is released to the world at large, it will allow for the rapid creation of hyper-realistic three-dimensional worlds.
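To make the automation claim concrete, here is what that loop could look like. This is only a sketch: `generate_video` is a hypothetical stand-in, since no public Sora API exists as of this writing, and everything downstream is the same nerfstudio pipeline as above.

```python
import subprocess
from pathlib import Path


def generate_video(prompt: str, out_path: Path) -> None:
    """Hypothetical text-to-video call; no public Sora API exists yet."""
    raise NotImplementedError("Wire in a real text-to-video backend here.")


def text_to_radiance_field(prompt: str, workdir: Path) -> None:
    """Prompt in, trained radiance field out."""
    workdir.mkdir(parents=True, exist_ok=True)
    video = workdir / "generated.mp4"
    processed = workdir / "processed"

    # 1. Text -> video (the hypothetical step).
    generate_video(prompt, video)

    # 2. Frame extraction + COLMAP pose recovery via nerfstudio.
    subprocess.run(
        ["ns-process-data", "video",
         "--data", str(video), "--output-dir", str(processed)],
        check=True,
    )

    # 3. Train the radiance field.
    subprocess.run(
        ["ns-train", "nerfacto", "--data", str(processed)],
        check=True,
    )


if __name__ == "__main__":
    # This will raise until a real text-to-video backend is wired in.
    text_to_radiance_field("a drone flyover of Santorini at golden hour",
                           Path("runs/santorini"))
```

Once a text-to-video endpoint exists, a script like this turns any prompt into a trained radiance field with no human in the loop.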
Before this, my thinking about radiance fields was limited to capturing the existing physical world and generating smaller objects, but this expands the possibilities to an almost unfathomable degree. You can generate hyper-realistic, large-scale, three-dimensional radiance field worlds from text.
This also has significant implications for existing radiance field companies with generative capabilities, such as Luma and CSM. Given that this is now proven possible, I can only wonder what has already been built behind closed doors. If they were not pursuing this already, I imagine they are now.