Research

Live NeRF Video Calls

Michael Rubloff

Oct 5, 2023

Radiance Field Video Call

Catching up with my sister has been an exercise in bridging distances. She recently moved to Copenhagen, trading the familiar landscapes of our shared childhood for the charming streets of the Danish capital. Our interactions now mostly consist of FaceTime calls, where screens serve as a window to each other's lives. It's a decent solution, but sometimes the two-dimensional frames make me yearn for a more immersive experience.

This yearning for connection found an unexpected echo during my recent visit to SIGGRAPH, my first conference of this kind. It was akin to stepping into a digital art museum: a vast hall adorned with pioneering work from researchers across many niches. Drawn, as if by an unseen magnet, to a section in the back left corner of the room, I stumbled upon a spectacle that seemed like something straight out of science fiction.

At first I wasn't sure what I was looking at, but as I watched from a distance, I saw something incredible: live NeRFs being created from just a single webcam. Not only that, people were having video calls with NeRFs. The displays used were from Looking Glass and are currently available to the public.

I took a look at this paper, Real-Time Radiance Fields, back in early May, but I truly did not think I would get to experience it for myself so soon. I should know better by now, as this tends to happen repeatedly. Rereading my article, I realized it was more about general excitement than how the method actually works, so let's take a look at their approach.

I found this statement to be particularly funny:


We note that inferring a canonicalized 3D representation (i.e., the inferred 3D representation is frontalized and aligned) from an arbitrary RGB image while simultaneously synthesizing precise subject-specific details from the input is a highly non-trivial task.

— Real-Time Radiance Fields

I would have to agree, based solely on the words in that sentence. To pursue this challenge effectively, they break it into two steps: creating a canonicalized 3D representation of the subject from an image, and rendering high-frequency, person-specific details. Canonicalization in this context means representing the subject in a ‘standard’ form: the generated 3D model is aligned and oriented in a consistent, predefined manner regardless of how the subject is posed or oriented in the original 2D image. These two goals make sense, given the overarching aim.

It begins with a hybrid encoder that combines convolutional neural networks (CNNs) and Transformers, leveraging the strengths of both architectures. They use DeepLabV3 as the backbone because of its speed; it extracts low-resolution features from the input RGB image, which are then mapped toward the canonical representation. Once that is complete, the information is passed on to a Vision Transformer (ViT) and a CNN.

This is a deliberate choice: the ViT can quickly map those features into high-resolution outputs structured much like a triplane representation, allowing high-resolution feature maps to carry information from the original input through to the canonical representation.

While this is robust, smaller details such as strands of hair or birthmarks remain a challenge. To recover this information, they use a second encoder. Unlike the first, it focuses on high-resolution features and uses only a single downsampling stage, aiming to capture finer detail from the image. The final step mirrors the first branch, with this new information passed forward into another ViT.
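To make the two-branch design a bit more concrete, here is a minimal PyTorch-style sketch of how such a dual encoder could be wired up. The layer sizes, the small CNN standing in for DeepLabV3, the `nn.TransformerEncoder` standing in for the ViT, and the simple additive fusion are all my own assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CoarseBranch(nn.Module):
    """Stand-in for the low-resolution branch (DeepLabV3 backbone + ViT in the paper)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Aggressively downsampling CNN standing in for DeepLabV3 features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=4, padding=1), nn.ReLU(),
        )
        self.vit = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, img):                       # img: (B, 3, 256, 256)
        f = self.cnn(img)                         # (B, C, 16, 16) low-res features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)     # (B, H*W, C) tokens for the ViT
        tokens = self.vit(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class FineBranch(nn.Module):
    """Stand-in for the high-resolution branch with a single downsampling stage."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),  # one downsample
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )

    def forward(self, img):
        return self.cnn(img)                      # (B, C, 128, 128) detail features

class DualEncoder(nn.Module):
    """Fuses both branches into triplane-shaped features (3 planes of channels)."""
    def __init__(self, feat_dim=256, plane_res=128, plane_ch=32):
        super().__init__()
        self.coarse, self.fine = CoarseBranch(feat_dim), FineBranch(feat_dim)
        self.to_planes = nn.Conv2d(feat_dim, 3 * plane_ch, 3, padding=1)
        self.plane_res = plane_res

    def forward(self, img):
        coarse = nn.functional.interpolate(self.coarse(img), size=self.plane_res)
        fine = nn.functional.interpolate(self.fine(img), size=self.plane_res)
        fused = coarse + fine                     # simple additive fusion (assumption)
        planes = self.to_planes(fused)            # (B, 3*plane_ch, R, R)
        return planes.reshape(img.shape[0], 3, -1, self.plane_res, self.plane_res)

triplanes = DualEncoder()(torch.randn(1, 3, 256, 256))  # -> (1, 3, 32, 128, 128)
```

The output is shaped like a triplane so that, in principle, it could be queried and volume-rendered the same way EG3D's own planes are.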

Now they move on to the training stage. This is where a GAN called EG3D comes into play. We took a look at GANs as part of GANeRF, if you want to learn more about how they work.

EG3D serves as a crucial component in training the described encoder-based method, as it provides synthetic data that acts as a basis for supervising the new method. Its attributes and efficient design make it a reliable source for generating synthetic data to train the encoder, ensuring the quality and efficiency of the learned representations.

A latent vector is sampled and passed through an EG3D generator to yield a corresponding triplane, 𝑻, and images are rendered from various camera parameters, 𝑷. These parameters include focal length, principal point, and camera orientation and position. For each step in the training process where the model is updated, two images of the same identity are synthesized.
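A rough sketch of how one iteration's synthetic supervision pair could be assembled. The `eg3d`, `render`, and `sample_camera` callables are hypothetical wrappers standing in for the pretrained generator and renderer described above, not EG3D's real API.

```python
import torch

def make_training_pair(eg3d, render, sample_camera, latent_dim=512, device="cuda"):
    """Synthesize one supervision pair: two views of the same EG3D identity.

    eg3d(z) -> triplane T and render(T, P) -> image are hypothetical wrappers
    around the pretrained generator; sample_camera() returns camera parameters P.
    """
    z = torch.randn(1, latent_dim, device=device)  # latent vector for one identity
    T = eg3d(z)                                    # corresponding canonical triplane

    P_in, P_target = sample_camera(), sample_camera()
    img_in = render(T, P_in)          # image the encoder will see as input
    img_target = render(T, P_target)  # ground-truth novel view for supervision
    return img_in, img_target, T, P_target
```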

EG3D is a sophisticated pretrained 3D GAN, proficient at rendering 3D-aware images using a hybrid triplane representation and neural volumetric rendering, trained end-to-end with strong efficiency. Its role in training the encoder is pivotal, involving an adversarial process where the encoder's representations are evaluated against the original synthetic images, focusing on aspects like color accuracy, perceptual likeness, and fine detail. This ensures the encoder accurately and efficiently produces detailed, canonicalized 3D representations from 2D images, and EG3D's high-quality, efficient renderings make it an ideal basis for supervising the new encoder method.
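To illustrate the kind of supervision described, a toy objective might combine a pixel-wise color term with a perceptual term. The specific terms and weights below are assumptions for illustration, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def encoder_supervision_loss(pred_img, target_img, perceptual_fn=None,
                             w_color=1.0, w_perceptual=0.1):
    """Toy reconstruction objective: color accuracy plus an optional perceptual
    term (e.g. an LPIPS-style metric passed in as perceptual_fn)."""
    loss = w_color * F.l1_loss(pred_img, target_img)            # color accuracy
    if perceptual_fn is not None:                               # perceptual likeness
        loss = loss + w_perceptual * perceptual_fn(pred_img, target_img).mean()
    return loss
```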

This alone works great with purely synthetic data, but that wouldn't be very useful to someone on a video call. Because EG3D is pretrained, it assumes fixed values for camera roll, focal length, principal point, and distance from the subject when rendering images. To translate to the real world, these camera parameters are instead drawn from random distributions, introducing variability and diversity into the training data. Exposed to a wider range of perspectives and more challenging images during training, the model becomes more robust and better able to adapt to the variation found in real-world footage.
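Here is a minimal sketch of what sampling camera parameters from random distributions could look like; the distributions and ranges are invented for illustration and are not the values used in the paper.

```python
import torch

def sample_camera():
    """Draw one random set of camera parameters P (illustrative ranges only)."""
    return {
        "focal_length":    torch.empty(1).uniform_(4.0, 6.0),           # arbitrary units
        "principal_point": torch.empty(2).uniform_(-0.05, 0.05) + 0.5,  # jittered center
        "roll_deg":        torch.empty(1).uniform_(-10.0, 10.0),        # camera roll
        "distance":        torch.empty(1).uniform_(2.5, 3.0),           # subject distance
        "azimuth_deg":     torch.empty(1).uniform_(-45.0, 45.0),
        "elevation_deg":   torch.empty(1).uniform_(-20.0, 20.0),
    }
```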

With all of these steps, it's hard to imagine how this can all be accomplished and still run in real time to support video conferencing. But shockingly, it takes 22 ms on an A100 and 40 ms on a 3090. The end result is a 24 fps transmission that can photorealistically show a live person.
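For context, 40 ms per frame works out to 1000 / 40 = 25 frames per second, so even the consumer 3090 clears the 24 fps target, while the A100's 22 ms (roughly 45 fps) leaves comfortable headroom.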

Original on left. Output on right.

Watching the live demonstrations, my thoughts meandered back to my sister. The screens before me were not just about technological marvels; they held the promise of a future where I could feel closer to her. Where the faces on our screens could break the boundaries of two dimensions, giving a sense of presence that our current video calls couldn't. Profile views might still be a challenge for the technology, but I couldn't help but think of how enriching it would be to see my sister's expressions in real-time 3D.



The strides being made in this field are astounding. A normal webcam paired with a consumer-level GPU has the potential to redefine our virtual interactions, making the world feel a bit smaller. Though my sister and I are miles apart, advancements like these give me hope that, in our virtual conversations, those distances could soon feel trivial.

