Research

CAT3D Pounces on 3D Scene Generation

Michael Rubloff

May 17, 2024

We were very recently looking at RealmDreamer, which generates scenes from prompts. Just over a month later, CAT3D, short for "Create Anything in 3D," has emerged and takes things up a notch or two.

CAT3D leverages multi-view diffusion models to generate highly consistent novel views from any number of input images, all the way down to one. These views are then processed using robust 3D reconstruction techniques to produce detailed 3D representations that can be rendered in real time. Remarkably, CAT3D can create entire 3D scenes.

For all its impressiveness, there are really two main steps to CAT3D's approach: generating novel views and 3D reconstruction.

The model begins by taking conditional views as input, with each view comprising an image and its corresponding camera pose. Each input image is then encoded into a latent representation using an image variational auto-encoder. This transformation reduces the high-dimensional image data into a more manageable lower-dimensional latent space, facilitating easier processing by the model.
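To give a feel for how much the encoding step shrinks the data, here is a small arithmetic sketch. The downsampling factor of 8 and the 8 latent channels are illustrative assumptions, not figures stated in this article, and `latent_shape` and `compression_ratio` are hypothetical helper names.

```python
def latent_shape(height, width, downsample=8, latent_channels=8):
    """Shape of the latent grid for one image.

    downsample and latent_channels are assumed values for illustration.
    """
    return (height // downsample, width // downsample, latent_channels)


def compression_ratio(height, width, channels=3, downsample=8, latent_channels=8):
    """Number of input pixel values per latent value."""
    h, w, c = latent_shape(height, width, downsample, latent_channels)
    return (height * width * channels) / (h * w * c)


if __name__ == "__main__":
    print(latent_shape(512, 512))
    print(compression_ratio(512, 512))
```

Under these assumed settings, a 512 x 512 RGB image becomes a 64 x 64 x 8 grid, so the diffusion model works on 24 times fewer values per view.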

The diffusion model captures the joint distribution of target images based on their camera parameters. It predicts the latent representations of the target images from the input images and their camera poses. To ensure consistency among the generated views, the model employs 3D self-attention layers that connect the latents of multiple input images.

Camera poses are encoded using a raymap, which records the ray origin and direction at each spatial location. This representation is invariant to rigid transformations, ensuring that the generated views maintain accurate spatial relationships.
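As a rough illustration of what a raymap holds, the sketch below computes one ray origin and one unit ray direction per pixel. It assumes a simple pinhole camera looking down +z with the principal point at the image center; the function name `raymap` and its parameters are hypothetical, and it does not show how CAT3D normalizes poses or matches the raymap to the latent resolution.

```python
import math


def raymap(height, width, focal, rotation, translation):
    """Return an H x W grid of (ox, oy, oz, dx, dy, dz) tuples.

    rotation is a 3x3 camera-to-world matrix (list of rows) and translation
    is the camera center in world coordinates. Assumes a pinhole camera that
    looks down +z with the principal point at the image center.
    """
    cx, cy = width / 2.0, height / 2.0
    grid = []
    for v in range(height):
        row = []
        for u in range(width):
            # Direction through the pixel center, in camera coordinates.
            d_cam = ((u + 0.5 - cx) / focal, (v + 0.5 - cy) / focal, 1.0)
            # Rotate the direction into world coordinates.
            d_world = [sum(rotation[i][k] * d_cam[k] for k in range(3))
                       for i in range(3)]
            norm = math.sqrt(sum(c * c for c in d_world))
            d_world = [c / norm for c in d_world]
            # Every pixel of one camera shares the same ray origin.
            row.append((*translation, *d_world))
        grid.append(row)
    return grid
```

Each pixel ends up with six numbers, so a raymap can be stacked with the image latents as extra channels.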

Once the multi-view diffusion model has been trained, it can generate a large set of synthetic views to cover the entire scene. This includes designing camera trajectories that ensure thorough and dense coverage of the scene. These trajectories must avoid passing through objects and maintain reasonable viewing angles. Four types of paths are explored: orbital, forward-facing circle, spline, and spiral trajectories.
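The simplest of these paths, an orbit, can be sketched in a few lines. This is a minimal example, assuming a y-up world and a known scene center; `orbit_cameras` is a hypothetical helper, and it makes no attempt at the collision avoidance mentioned above.

```python
import math


def orbit_cameras(center, radius, height, num_views):
    """Place num_views cameras on a horizontal circle around center.

    Returns a list of (position, forward) pairs, where forward is the unit
    vector pointing from the camera toward the center. Assumes y is up.
    """
    cameras = []
    for i in range(num_views):
        angle = 2.0 * math.pi * i / num_views
        pos = (center[0] + radius * math.cos(angle),
               center[1] + height,
               center[2] + radius * math.sin(angle))
        to_center = tuple(c - p for c, p in zip(center, pos))
        norm = math.sqrt(sum(x * x for x in to_center))
        cameras.append((pos, tuple(x / norm for x in to_center)))
    return cameras
```

Spline and spiral paths would vary the radius and height along the path instead of holding them fixed.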

The target viewpoints are clustered into smaller groups based on their proximity. The model generates each group independently, ensuring local consistency within each group and long-range consistency between groups.
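One simple way to form such groups is a greedy nearest-neighbor split, sketched below. This is a stand-in heuristic chosen for brevity, not necessarily the clustering the authors use, and `group_by_proximity` is a hypothetical name.

```python
import math


def group_by_proximity(positions, group_size):
    """Greedily split camera positions into groups of nearby viewpoints.

    Returns a list of index lists. Each group is seeded with the lowest
    unassigned index and filled with that seed's nearest unassigned neighbors.
    """
    remaining = list(range(len(positions)))
    groups = []
    while remaining:
        seed = remaining[0]
        # Sort unassigned views by distance to the seed (the seed is at distance 0).
        remaining.sort(key=lambda i: math.dist(positions[seed], positions[i]))
        groups.append(remaining[:group_size])
        remaining = sorted(remaining[group_size:])
    return groups
```

For example, four cameras placed in two tight pairs far apart from each other come back as two groups of two, one per pair.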

For single-image conditioning, an autoregressive strategy is used. Initially, a set of anchor views is generated to cover the scene. Subsequent views are then generated in parallel, using the observed and anchor views as conditioning inputs.
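The two-stage schedule can be written down as a small planner. This sketch only decides which views are generated when and what they are conditioned on; the evenly spaced anchor selection and the name `plan_generation` are assumptions for illustration, and the actual sampling call is left out.

```python
def plan_generation(num_targets, num_anchors, group_size):
    """Return a two-stage plan as a list of (conditioning, targets) steps.

    Targets are indexed 0..num_targets-1 along the camera path. The string
    "observed" stands for the real input views.
    """
    # Stage 1: pick evenly spaced anchors, generated from the observed views only.
    stride = num_targets / num_anchors
    anchors = sorted({int(i * stride) for i in range(num_anchors)})
    plan = [(["observed"], anchors)]
    # Stage 2: the remaining views depend only on observed + anchor views,
    # so each group could be sampled in parallel.
    rest = [i for i in range(num_targets) if i not in anchors]
    for start in range(0, len(rest), group_size):
        plan.append((["observed", "anchors"], rest[start:start + group_size]))
    return plan
```

With 8 targets, 2 anchors, and groups of 3, the plan has one anchor step for views 0 and 4, followed by two parallel steps covering views 1 to 3 and 5 to 7.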

When the input is a single image, the authors generate 80 views to cover the scene; with more input images, that number rises to roughly 460-960 views.

Unsurprisingly, they build upon Google's in-house NeRF method, Zip-NeRF, with some additional modifications. They include a perceptual loss (LPIPS) between the rendered image and the input image, which helps preserve textures and fine details while ignoring potential inconsistencies. The losses for generated views are weighted based on their distance to the nearest observed view, so that views closer to the input images have a greater influence on the reconstruction.
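The distance-based weighting can be sketched as follows. The exponential falloff is an assumption made for illustration, since the exact weighting function is not given here, and `view_weights` and `weighted_loss` are hypothetical helpers rather than part of Zip-NeRF.

```python
import math


def view_weights(generated_positions, observed_positions, falloff=1.0):
    """Weight each generated view by its distance to the nearest observed view.

    Uses exp(-distance / falloff), an assumed choice: the weight is 1 at an
    observed viewpoint and decays toward 0 as the distance grows.
    """
    weights = []
    for g in generated_positions:
        nearest = min(math.dist(g, o) for o in observed_positions)
        weights.append(math.exp(-nearest / falloff))
    return weights


def weighted_loss(per_view_losses, weights):
    """Weighted mean of per-view losses."""
    return sum(w * l for w, l in zip(weights, per_view_losses)) / sum(weights)
```

A generated view sitting right next to an input photo is trusted almost fully, while one far from every input contributes much less to the reconstruction loss.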

Impressively, CAT3D can create full scenes in roughly a minute. We need to recognize that the ceiling on this technology is still unknown. Only a year ago, reducing generation time from hours to a single hour was impressive, and that was for individual objects.

CAT3D also benchmarks against ReconFusion, an impressive paper from the Google team late last year that explored diffusion priors. CAT3D exceeded ReconFusion's fidelity in every experiment run.

To train CAT3D, you do need 16 A100 GPUs, but I do not believe the takeaway is that you need a large-scale workstation to run this. I continue to believe that these are very positive indicators of the methods' ability to function and to be quickly optimized. With Google also recently announcing Veo, I have to wonder how the two might play into one another.

For more information visit the CAT3D project page. They additionally have some interactive demos of outputs that were converted to Gaussian Splatting for people to try out!
