What are the NeRF Metrics?

Michael Rubloff

Jun 12, 2023


I've become more accustomed to seeing repeating words and phrases within NeRF papers. Some of the most common are PSNR, SSIM, and LPIPS. All of them are used to evaluate the quality of each NeRF method within a given scene.

However, all three have been in existence well before NeRFs, so today I figured I would give a quick article on what the three terms are and what they measure, as well as highlight a new potential way to measure NeRF quality from a group of researchers at the University of Bristol.

While NeRF implementations can produce high-quality visual results, evaluating their performance accurately remains a challenge. Conventional evaluation methods provide approximate indicators and may not capture specific aspects of NeRFs. To address this issue, a group of researchers has proposed a new test framework that isolates the neural rendering network from the NeRF pipeline for more reliable evaluation.

What is PSNR (Peak Signal to Noise Ratio)?

Peak Signal-to-Noise Ratio (PSNR) is a metric widely used in image and video processing to measure the quality of reconstructed (i.e., compressed and then decompressed) images or videos. It evaluates images on a per-pixel, color-value basis, which overlaps somewhat with a NeRF's view-dependent colors. The higher the PSNR, the better the quality of the compressed or reconstructed image, video, or NeRF.
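For reference, PSNR is just a log-scaled transform of the mean squared error between the rendered image and the ground truth. Here is a minimal NumPy sketch (the function name and the max_val convention are mine, assuming float images in the range [0, max_val]):

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio between a rendered image and a reference.

    Both inputs are float arrays with values in [0, max_val].
    """
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```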

What is SSIM (Structural Similarity Index Measure)?

SSIM is used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference.

SSIM considers changes in structural information, perceived luminance, and contrast that can occur when images are subjected to various types of distortion. It aims to reflect the human visual system's perception more closely than simpler metrics like PSNR or mean squared error (MSE).

As SSIM computations are performed on image patches, they allow for some misalignment between the synthesized and reference images. This is helpful for evaluation as there may be variations between the NeRF camera model and the real camera used for capturing the training images.
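If you want to compute SSIM yourself, the implementation in scikit-image is a common choice. A short sketch, assuming float RGB images in [0, 1] and scikit-image >= 0.19 for the channel_axis argument:

```python
import numpy as np
from skimage.metrics import structural_similarity

# rendered and reference: float RGB images in [0, 1], shape (H, W, 3)
rendered = np.random.rand(256, 256, 3)   # placeholder for a NeRF render
reference = np.random.rand(256, 256, 3)  # placeholder for the ground-truth photo

# channel_axis tells scikit-image which axis holds the color channels;
# data_range is the value span of the inputs (1.0 for [0, 1] floats).
score = structural_similarity(rendered, reference, channel_axis=-1, data_range=1.0)
print(f"SSIM: {score:.4f}")
```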

What is LPIPS (Learned Perceptual Image Patch Similarity)?

Finally, LPIPS has gained popularity in areas such as frame interpolation; it measures the similarity between features of two images extracted from a pretrained network.

Learned Perceptual Image Patch Similarity (LPIPS) is a perceptual metric that quantifies the human-perceived similarity between two images. Unlike traditional metrics such as PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index Measure), which calculate differences based on raw pixel values or simple transformations thereof, LPIPS leverages deep learning to better align with human visual perception. It uses the distance between features extracted by a convolutional neural network (CNN) pretrained on an image classification task as a perceptual metric.
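In practice, LPIPS scores are usually computed with the lpips package from PyPI. A short sketch, assuming PyTorch tensors in NCHW layout scaled to [-1, 1], as that package expects:

```python
import torch
import lpips  # pip install lpips

# LPIPS compares deep features from a pretrained CNN backbone
loss_fn = lpips.LPIPS(net='alex')  # 'alex', 'vgg', or 'squeeze'

rendered = torch.rand(1, 3, 256, 256) * 2 - 1   # placeholder NeRF render, [-1, 1]
reference = torch.rand(1, 3, 256, 256) * 2 - 1  # placeholder ground truth, [-1, 1]

with torch.no_grad():
    distance = loss_fn(rendered, reference)  # lower = more perceptually similar
print(f"LPIPS: {distance.item():.4f}")
```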

What is Whole-scene Average Prediction Error (WAPE)?

With the three above metrics defined, a group of researchers at the University of Bristol advocate for a new metric that is NeRF specific: Whole-scene Average Prediction Error (WAPE).

The research paper introduces a configurable approach for generating representations specifically for evaluation purposes. This approach utilizes ray-casting to transform mesh models into explicit NeRF samples and "shade" these representations. By combining these methods, the researchers demonstrate how different scenes and types of networks can be evaluated within this framework. They also propose a novel metric called the Whole-scene Average Prediction Error (WAPE) to measure task complexity, considering visual parameters and the distribution of spatial data.

This framework first isolates the neural rendering network from the NeRF pipeline and then performs a parametric evaluation by training and evaluating the NeRF on an explicit radiance field representation.

By isolating the rendering network from the rest of the pipeline, the framework gives more tailored feedback on the underlying errors within the reconstruction and offers a deeper understanding of how well each method works. For instance,

"as image-based metrics only evaluate the prediction quality via a 2-D projection, this can still result in loss of information about the accuracy of spatial samples in relation to their distribution in volumetric space."

The proposed framework addresses the limitations of image-based metrics by providing a parametric evaluation that compares the learnable outputs of INR rendering networks against ground truth data. By generating synthetic radiance fields from mesh-based representations and applying ray tracing, the researchers accurately represent ground truths and enhance the quality of visual features.
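The exact WAPE formulation is laid out in the paper. As a loose, purely illustrative sketch of the idea (averaging a per-sample prediction error over every queried point in the scene, rather than over rendered 2-D pixels), it might look something like the following, where the function name, array shapes, and choice of L1 error are my own assumptions:

```python
import numpy as np

def whole_scene_average_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Illustrative only: average per-sample prediction error over every
    queried point in the scene, rather than over rendered 2-D pixels.

    pred, gt: arrays of shape (N, 4) holding (R, G, B, density) for N
    spatial samples drawn from the explicit radiance-field representation.
    The actual WAPE definition is given in the University of Bristol paper.
    """
    per_sample_error = np.abs(pred - gt).mean(axis=-1)  # L1 error per sample
    return float(per_sample_error.mean())
```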

In theory, the more challenging the data is for producing a high-quality NeRF, the more the scoring should compensate for that difficulty. With that in mind, they introduce my favorite part of the paper: a new metric for evaluating task complexity.

This takes into account the number of input samples, the relative distribution of novel views and training views, and the functional complexity of the chosen ray tracing algorithm(s). The NeRFs that I have processed thus far have all differed in size and complexity, and I believe this should be reflected in evaluation. It's also interesting that they interpret complexity as stemming not only from the input views, but also from the underlying positional distribution of ray samples.

Continuing an emerging trend, the WAPE code is built to work with Nerfstudio, though it doesn't appear to have been published yet.

Only time will tell if the NeRF community will embrace WAPE and begin weighing quality with it. It has been fascinating to compare each method's PSNR, SSIM, and LPIPS scores, but if there's a way to get more accurate metrics and identify areas of NeRF-specific improvement more efficiently, I'm all for it.
