NeO 360 for Outdoor NeRFs

Michael Rubloff

Aug 30, 2023

Recently, we've seen Google publish blog posts on bringing NeRFs to Google Maps and Immersive View. At the end of one post, they gave a sneak peek of something new: an outdoor scene. Outdoor scenes pose a different challenge from, say, an indoor scene, where it's significantly easier both to control the shooting environment and to capture enough varied views to put together a complete representation.

The research community is well aware of this challenge, which brings us to today's paper, NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes. What's amazing is that it can reconstruct a 360° scene from just a few input images, scaling all the way down to a single image.

"The essence of our approach is in capturing the distribution of complex real-world outdoor 3D scenes and using a hybrid image-conditional triplanar representation that can be queried from any world point." (NeO 360 paper)

How NeO 360 works

NeO 360 employs a convolutional neural network (CNN) pretrained on ImageNet to extract features from the input images. This network turns each image into a 2D feature map, a format that's easier for the system to analyze and work with. From those features, NeO 360 builds conditional triplanes, three axis-aligned feature planes that together represent the scene. The essence here is to translate the images into a kind of "language" that the system can more effectively understand and manipulate.
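To make the "queried from any world point" idea concrete, here is a minimal sketch of a triplane lookup in plain Python. It is not the paper's implementation: the function names, the unit-cube coordinate range, the bilinear sampling, and the choice to concatenate the three plane features are all assumptions made for illustration.

```python
import math

def bilinear_sample(plane, u, v):
    """Sample a 2D feature grid at continuous coordinates u, v in [0, 1].

    `plane` is a list of rows; each cell holds a list of feature channels.
    (Hypothetical layout, chosen for readability.)
    """
    height, width = len(plane), len(plane[0])
    # Map [0, 1] to pixel coordinates, clamping to the grid.
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - fx) + plane[y0][x1][c] * fx
        bottom = plane[y1][x0][c] * (1 - fx) + plane[y1][x1][c] * fx
        out.append(top * (1 - fy) + bottom * fy)
    return out

def query_triplane(planes, point):
    """Look up features for a 3D point with coordinates in [0, 1]^3."""
    x, y, z = point
    # Project the point onto each axis-aligned plane and sample it.
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    # Concatenation is an assumption; summing the features is another option.
    return f_xy + f_xz + f_yz

# Tiny 2x2 planes with one feature channel each, just to exercise the lookup.
toy_plane = [[[0.0], [1.0]], [[2.0], [3.0]]]
toy_planes = {"xy": toy_plane, "xz": toy_plane, "yz": toy_plane}
center_features = query_triplane(toy_planes, (0.5, 0.5, 0.5))
```

The appeal of this layout is that three 2D grids are much cheaper to store than a dense 3D grid, while still giving every point in space its own feature vector.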

For a realistic 3D rendering, both local features (like the texture of a sofa) and broader, global features (like the layout of a room) are crucial. NeO 360 combines these two types of information. Leaning too heavily on either one can lead to unrealistic renderings, with ghosting artifacts, poorly handled occlusions, or overlooked details.

They then decode radiance fields, which assign a color and a density value to any point and viewing direction in the 3D scene. The difference from a standard NeRF is that the decoder is conditioned on both local and global features. They work in the view space of the input image and express the positions and camera rays in that coordinate system.

Two dedicated rendering MLPs are used for extracting and decoding the color and density information. Remember, not every point is solid or tangible. Even the air has a radiance (think about how sunlight looks when shining through mist or dust in the air).
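The mist example maps directly onto the standard NeRF volume rendering rule, where each sample along a camera ray contributes color in proportion to its density and to how much light survives up to that point. The sketch below uses that standard quadrature in plain Python; the function name and the toy values are hypothetical, and it does not reproduce NeO 360's own sampling or rendering code.

```python
import math

def composite_ray(densities, colors, step_sizes):
    """Blend samples along one ray using standard NeRF-style compositing.

    densities:  density (sigma) at each sample, ordered front to back
    colors:     (r, g, b) at each sample
    step_sizes: distance between consecutive samples
    Returns the accumulated color and the remaining transmittance.
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for sigma, color, delta in zip(densities, colors, step_sizes):
        # Opacity of this segment: 0 for empty space, near 1 for a solid surface.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for ch in range(3):
            rgb[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha
    return rgb, transmittance

# Thin mist (low density) in front of a solid red wall (very high density).
mist_rgb, mist_left = composite_ray(
    densities=[0.1, 0.1, 1e9],
    colors=[(1.0, 1.0, 1.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)],
    step_sizes=[1.0, 1.0, 1.0],
)
```

In this toy ray the mist adds a little white to every channel and the wall supplies most of the red, which is why low-density regions show up as haze rather than as hard surfaces.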

They compared NeO 360 to other sparse-input NeRF methods, including PixelNeRF and MVSNeRF, using PSNR, SSIM, and LPIPS. NeO 360 outperformed all the tested methods.
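Of those three metrics, PSNR is the simplest to compute by hand: it is a log-scaled version of the mean squared error between a rendered image and the ground truth, and higher is better. Here is a minimal sketch in plain Python, assuming pixel values in [0, 1] stored as flat lists; the function name and toy inputs are hypothetical, and this is not the paper's evaluation code.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in decibels between two flat pixel lists."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A render that is closer to the reference scores higher.
reference_pixels = [0.2, 0.5, 0.8, 1.0]
close_render = [0.21, 0.49, 0.80, 0.99]
rough_render = [0.30, 0.40, 0.70, 0.90]
close_score = psnr(close_render, reference_pixels)
rough_score = psnr(rough_render, reference_pixels)
```

SSIM and LPIPS are harder to sketch this briefly: SSIM compares local structure, and LPIPS relies on a learned network, so both are usually taken from an existing library.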

In conjunction with the NeO 360 method, they also introduce the NeRF for Reconstruction, Decomposition and Scene Synthesis of 360° outdoor scenes (NeRDS 360) dataset. NeRDS 360 contains 75 unbounded scenes across 3 locations.

The dataset aims for diversity both in scene content and in the shapes of the objects. Scenes are also shown under different lighting and shadow conditions, and include smaller details such as streetlamps and trees.

For me, getting to see large-scale, unbounded NeRFs outdoors is very exciting. It's a way to visualize our world in full photorealistic detail. It's additionally fascinating that as these methods get stronger and stronger, they can be retrained on existing data. It feels to me that these datasets, as compiled today, are the weakest they'll ever be. I've been doing a lot of thinking about datasets recently, and it points to an exciting, albeit a bit optimistic, view that progress stacks.

It's easy to see how NeO 360 could affect autonomous driving, and I'm equally excited about wayfinding applications à la Google Maps.
