Research

Live NeRF Video Calls

Michael Rubloff

Oct 5, 2023

Radiance Field Video Call

Catching up with my sister has been an exercise in bridging distances. She recently moved to Copenhagen, trading the familiar landscapes of our shared childhood for the charming streets of the Danish capital. Our interactions now mostly consist of FaceTime calls, where screens serve as a window to each other's lives. It's a decent solution, but sometimes the two-dimensional frames make me yearn for a more immersive experience.

This yearning for connection found an unexpected echo during my recent visit to SIGGRAPH, my first ever conference of this kind. It was akin to stepping into a digital art museum, with a vast hall adorned with pioneering works from people of various niches. Drawn, as if by an unseen magnet, to a section in the back left corner of the room, I stumbled upon a spectacle that seemed like something straight out of science fiction.

At first I wasn't sure what it was, but as I watched from a distance, I saw something incredible. Live NeRFs were being created from just a single webcam. Not only that, but people were having video calls with NeRFs. The screens used for the demo are from Looking Glass and are currently available to the public.

I took a look at this paper, Real-time Radiance Fields, back in early May, but I truly did not think I would get to experience it for myself so soon. I should probably learn better, as this tends to happen repeatedly. Rereading my article, I realized it was more about general excitement than how it actually works, so let's take a look at their method.

I found this statement to be particularly funny:


"We note that inferring a canonicalized 3D representation (i.e., the inferred 3D representation is frontalized and aligned) from an arbitrary RGB image while simultaneously synthesizing precise subject-specific details from the input is a highly non-trivial task" (Real-Time Radiance Fields)

I would have to agree, based solely on the words in that sentence. To tackle this challenge, they break it into two steps: creating a canonicalized 3D representation of the subject from an image, and rendering high-frequency, person-specific details. Canonicalization in this context refers to representing the subject in a ‘standard’ form, meaning that the generated 3D model is aligned and oriented in a consistent, predefined manner regardless of how the subject is posed or oriented in the original 2D image. These two steps make sense, given the overarching goal.
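To make "canonical" a little more concrete, here is a toy sketch of the idea. It is not the paper's method: the paper's encoder infers the canonical representation directly from the image, while this example assumes the head yaw is already known and simply undoes it. The landmark coordinates and function names are hypothetical.

```python
import math

def rotate_yaw(points, yaw_deg):
    """Rotate 3D points about the vertical (y) axis by yaw_deg degrees."""
    a = math.radians(yaw_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def canonicalize(points, observed_yaw_deg):
    """Undo a known head yaw so the points end up in the frontal (yaw = 0) frame."""
    return rotate_yaw(points, -observed_yaw_deg)

# Hypothetical landmarks for a frontal face: two eyes and a nose tip.
frontal = [(-1.0, 0.5, 0.0), (1.0, 0.5, 0.0), (0.0, 0.0, 1.0)]

# The same face observed under two different head turns.
turned_one_way = rotate_yaw(frontal, 30.0)
turned_other_way = rotate_yaw(frontal, -45.0)

# Both observations land in the same canonical frame.
canon_a = canonicalize(turned_one_way, 30.0)
canon_b = canonicalize(turned_other_way, -45.0)
```

However the subject was turned in the input, the canonical output is the same aligned, front-facing arrangement, which is the property the quote is describing.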

It begins with creating a hybrid encoder that combines convolutional neural networks (CNNs) and Transformer models, leveraging the strengths of both architectures. They use DeepLabV3 because of its speed. It extracts low-resolution features from the RGB images it is shown, which are then mapped to the canonical representation described above. Once that is complete, the information is passed on to a Vision Transformer (ViT) and CNN.

This was a very conscious choice: the ViT is able to quickly map these high-resolution outputs, in a way similar to a triplane representation, and it provides high-resolution feature maps through which information can pass from the original input to the representation.
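If the triplane idea is unfamiliar, this minimal sketch shows how one is typically queried in EG3D-style models: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled with bilinear interpolation, and the three feature vectors are summed. The plane sizes, channel counts, and helper names here are assumptions for illustration, and the small decoder network that would turn the summed features into color and density is left out.

```python
def bilinear(plane, u, v):
    """Sample a feature plane (H x W x C nested lists) at continuous (u, v) in [0, 1]."""
    h, w = len(plane), len(plane[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - fx) + plane[y0][x1][c] * fx
        bottom = plane[y1][x0][c] * (1 - fx) + plane[y1][x1][c] * fx
        out.append(top * (1 - fy) + bottom * fy)
    return out

def query_triplane(planes, point):
    """Project a point in [-1, 1]^3 onto the xy, xz and yz planes and sum the features."""
    x, y, z = [(p + 1.0) / 2.0 for p in point]  # map each coordinate to [0, 1]
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]

def constant_plane(value, size=4, channels=2):
    """Hypothetical stand-in for a learned feature plane."""
    return [[[value] * channels for _ in range(size)] for _ in range(size)]

planes = {"xy": constant_plane(1.0), "xz": constant_plane(2.0), "yz": constant_plane(3.0)}
features = query_triplane(planes, (0.25, -0.5, 0.9))
```

The appeal is that three 2D planes are far cheaper to store and produce than a dense 3D grid, which is part of why this kind of representation suits a real-time pipeline.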

While this is robust, there are still additional challenges with smaller details, such as strands of hair or birthmarks. In order to regain this information, they use a second encoder. This is different from the initial encoding, focusing more on high-resolution features and using only a single downsampling stage, aiming to capture more detailed information from the image. The final step is similar again, with the new information being passed forwards into another ViT.

Now they move onto the training stage. This is where a GAN called EG3D comes into play. We took a look at GANs as part of GANeRF, if you want to learn more about how they work.

EG3D serves as a crucial component in training the described encoder-based method, as it provides synthetic data that acts as a basis for supervising the new method. Its attributes and efficient design make it a reliable source for generating synthetic data to train the encoder, ensuring the quality and efficiency of the learned representations.

A latent vector is sampled and passed through an EG3D generator to yield a corresponding triplane, 𝑻, and images are rendered from various camera parameters, 𝑷. These parameters include focal length, principal point, camera orientation and position. For each step in the training process where the model is updated, two images of the same identity are synthesized.
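Here is a rough sketch of what a set of camera parameters 𝑷 might look like in code, and how two views of one identity could be set up for a single training step. The `Camera` class, the `orbit_camera` helper, and every numeric value are hypothetical and chosen only for illustration; the generator and renderer themselves are not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class Camera:
    focal: float        # focal length, in units of image width
    principal: tuple    # principal point (cx, cy), normalized to [0, 1]
    position: tuple     # camera center in world coordinates
    look_at: tuple = (0.0, 0.0, 0.0)  # orientation is defined by looking at this point

    def intrinsics(self):
        """3x3 pinhole intrinsics matrix built from focal length and principal point."""
        cx, cy = self.principal
        return [[self.focal, 0.0, cx],
                [0.0, self.focal, cy],
                [0.0, 0.0, 1.0]]

def orbit_camera(yaw_deg, pitch_deg, radius, focal=4.0):
    """Place a camera on a sphere of the given radius, pointed at the origin."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    position = (radius * math.sin(yaw) * math.cos(pitch),
                radius * math.sin(pitch),
                radius * math.cos(yaw) * math.cos(pitch))
    return Camera(focal=focal, principal=(0.5, 0.5), position=position)

# Two views of the same synthetic identity for one training step (illustrative values).
view_a = orbit_camera(yaw_deg=-20.0, pitch_deg=5.0, radius=2.7)
view_b = orbit_camera(yaw_deg=25.0, pitch_deg=-10.0, radius=2.7)
```

Rendering the same triplane from both cameras gives a pair of images that share an identity but differ in viewpoint, which is the kind of pair the paragraph above describes.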

EG3D is a sophisticated pretrained 3D GAN, proficient in rendering 3D-aware images, utilizing hybrid triplane representation and neural volumetric rendering, with end-to-end training and superior efficiency. Its role is pivotal in the training of the encoder, involving an adversarial process where the encoder’s representations are evaluated against the original synthetic images, focusing on various aspects like color accuracy, perceptual likeness, and fine details. This ensures the encoder accurately and efficiently produces detailed and canonicalized 3D representations from 2D images. The high-quality and efficient renderings of EG3D make it an ideal base for supervising the training of the new encoder method.
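For readers new to neural volumetric rendering, this is a minimal sketch of the standard compositing step used across NeRF-style methods: samples along a camera ray are blended front to back, each weighted by its opacity and by how much light survives to reach it. The densities and colors below are made up; in a real system they would come from decoding triplane features.

```python
import math

def composite_ray(densities, colors, step):
    """Alpha-composite samples along one ray, front to back."""
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)   # opacity of this sample
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Hypothetical ray: empty space, then a dense red surface, with blue hidden behind it.
densities = [0.0, 0.0, 50.0, 50.0]
colors = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
pixel, remaining = composite_ray(densities, colors, step=0.5)
```

The resulting pixel is almost pure red, because the first dense sample absorbs nearly all of the light before the blue sample behind it can contribute.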

This alone works great with just synthetic data, but that wouldn't be very useful to someone on a video call. Because EG3D is pretrained, it assumes fixed values for camera roll, focal length, principal point, and distance from the subject when rendering images. In order to have it translate to the real world, these camera parameters are instead sampled from random distributions, introducing variability and diversity into the training data. This exposes the model to a wider range of perspectives during training, forcing it to learn from highly variable and challenging images and improving its ability to adapt to the perspectives and details of real-world footage.
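The difference between fixed and randomized camera settings can be sketched in a few lines. The distributions and ranges below are assumptions picked for illustration, not the values used in the paper; the point is only that each training sample draws its own roll, focal length, principal point, and distance.

```python
import random

# What a fixed-camera setup looks like: every render shares these values.
FIXED_CAMERA = {"roll_deg": 0.0, "focal": 4.0, "principal": (0.5, 0.5), "distance": 2.7}

def sample_camera(rng):
    """Draw one randomized camera. All ranges here are hypothetical."""
    return {
        "roll_deg": rng.gauss(0.0, 2.0),
        "focal": rng.uniform(3.5, 5.0),
        "principal": (0.5 + rng.gauss(0.0, 0.02), 0.5 + rng.gauss(0.0, 0.02)),
        "distance": rng.uniform(2.2, 3.2),
    }

rng = random.Random(0)  # seeded so the sketch is repeatable
cameras = [sample_camera(rng) for _ in range(1000)]

focal_range = (min(c["focal"] for c in cameras), max(c["focal"] for c in cameras))
distance_range = (min(c["distance"] for c in cameras), max(c["distance"] for c in cameras))
```

With sampling like this, no two training images are guaranteed to share a framing, which is closer to what a real webcam pointed at a real person produces.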

With all of these steps, it's hard to imagine how this can all be accomplished and still run in real time to support video conferencing. But shockingly, it takes 22ms on an A100 and 40ms on a 3090. The end result is a 24 fps transmission that is able to photorealistically showcase a live person.
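As a quick sanity check on those numbers, here is a small sketch that converts per-frame time into a frame-rate ceiling and compares it against a 24 fps budget. It assumes the quoted timings cover all of the per-frame work, which the article does not spell out.

```python
def max_fps(frame_time_ms):
    """Upper bound on frame rate if each frame takes frame_time_ms to process."""
    return 1000.0 / frame_time_ms

timings_ms = {"A100": 22.0, "3090": 40.0}   # per-frame times quoted above
target_fps = 24.0
budget_ms = 1000.0 / target_fps             # time available per frame at 24 fps

for gpu, ms in timings_ms.items():
    headroom = budget_ms - ms
    print(f"{gpu}: up to {max_fps(ms):.1f} fps, {headroom:.1f} ms of headroom at {target_fps:.0f} fps")
```

Under that assumption, the A100 could reach roughly 45 fps and the 3090 tops out at 25 fps, so a 24 fps stream fits on both cards, though only barely on the 3090.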

Original on left. Output on right.

Watching the live demonstrations, my thoughts meandered back to my sister. The screens before me were not just about technological marvels; they held the promise of a future where I could feel closer to her. Where the faces on our screens could break the boundaries of two dimensions, giving a sense of presence that our current video calls couldn't. Profile views might still be a challenge for the technology, but I couldn't help but think of how enriching it would be to see my sister's expressions in real-time 3D.



The strides being made in this field are astounding. A normal webcam paired with a consumer-level GPU has the potential to redefine our virtual interactions, making the world feel a bit smaller. Though my sister and I are miles apart, advancements like these give me hope that, in our virtual conversations, those distances could soon feel trivial.

