Tripo AI, the company behind Triplane Gaussians, has announced a partnership with Stability AI on the release of its newest state-of-the-art image-to-3D model, TripoSR.
You could already generate reasonably high-fidelity 3D objects using Triplane Gaussians, but they've now stepped it up a notch. Inspired by LRM: Large Reconstruction Model for Single Image to 3D, which uses a NeRF as part of its backbone, TripoSR can generate a 3D object in under 0.5 seconds on an NVIDIA A100. However, you don't need such a powerful GPU to run TripoSR.
The initial phase involves an image encoder that employs a pre-trained vision transformer model to convert an image into latent vectors. These vectors encapsulate the critical global and local features of the image necessary for the reconstruction process.
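To make the "image into vectors" step a bit more concrete, here is a minimal sketch of how a vision transformer typically slices an image into patch vectors before any learned layers run. The `patchify` helper and the toy 4x4 image are made up for illustration; TripoSR's actual encoder is a pre-trained model with learned weights, which this sketch does not attempt to reproduce.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])  # append every channel of this pixel
            tokens.append(vec)
    return tokens


# Toy 4x4 "RGB" image whose pixel values encode their own position.
image = [[[y, x, 0] for x in range(4)] for y in range(4)]
tokens = patchify(image, patch=2)
print(len(tokens), len(tokens[0]))  # 4 patches, each holding 2*2*3 = 12 numbers
```

In a real encoder each of these flattened patches would then pass through a learned projection and a stack of transformer layers to become the latent vectors described above.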
TripoSR's transformation from a two-dimensional image to a three-dimensional model is facilitated by its image-to-triplane decoder. This innovative decoder converts the latent vectors into a triplane-NeRF representation, a method capable of capturing complex object shapes and textures. The decoder comprises transformer layers with self-attention and cross-attention mechanisms, enabling it to interpret the relationships within the triplane representation and integrate the image features encoded by the latent vectors effectively.
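As a rough illustration of the self-attention plus cross-attention pattern, the sketch below runs one simplified decoder layer over tiny hand-written vectors. It is an assumption-heavy toy: the names `attention` and `decoder_layer` are hypothetical, and the learned projections, multiple heads, layer normalization, and feed-forward blocks of a real transformer are all left out.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (no learned weights)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise residual sum of two lists of vectors."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def decoder_layer(triplane_tokens, image_tokens):
    # Self-attention: triplane tokens exchange information with each other.
    x = add(triplane_tokens,
            attention(triplane_tokens, triplane_tokens, triplane_tokens))
    # Cross-attention: triplane tokens are the queries, image latents the keys/values.
    return add(x, attention(x, image_tokens, image_tokens))


triplane_tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
updated = decoder_layer(triplane_tokens, image_tokens)
print(len(updated), len(updated[0]))  # same token count and width as the input
```

The point to take away is the direction of information flow: the triplane tokens keep their shape throughout, and the image features only enter through the cross-attention step.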
The triplane-based neural radiance field (NeRF) is the final stage of TripoSR's pipeline: it determines the color and density of points in 3D space. Multilayer perceptrons map the triplane features sampled at each point to those values, and this is how the abstract triplane representation is materialized into a textured 3D mesh whose colors closely mirror the physical object.
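Here is a minimal sketch of how a triplane is usually queried, to show what "determining the color and density of a point" can look like in practice. Everything specific in it is an assumption made for illustration: the plane layout, concatenating the three feature vectors, the softplus and sigmoid activations, and the hand-set weights in `hidden_w` and `out_w` are not taken from TripoSR's code.

```python
import math


def bilinear(plane, u, v):
    """Sample an R x R x C grid (R >= 2) at coordinates u, v in [-1, 1]."""
    res = len(plane)
    gx = (u + 1.0) * 0.5 * (res - 1)  # map [-1, 1] to grid index space
    gy = (v + 1.0) * 0.5 * (res - 1)
    x0 = min(max(int(math.floor(gx)), 0), res - 2)
    y0 = min(max(int(math.floor(gy)), 0), res - 2)
    tx, ty = gx - x0, gy - y0
    return [
        (1 - tx) * (1 - ty) * plane[y0][x0][c]
        + tx * (1 - ty) * plane[y0][x0 + 1][c]
        + (1 - tx) * ty * plane[y0 + 1][x0][c]
        + tx * ty * plane[y0 + 1][x0 + 1][c]
        for c in range(len(plane[0][0]))
    ]


def query_triplane(planes, point):
    """Project a 3D point onto the XY, XZ and YZ planes and gather features."""
    x, y, z = point
    xy, xz, yz = planes
    # List "+" concatenates the three sampled feature vectors (an assumption).
    return bilinear(xy, x, y) + bilinear(xz, x, z) + bilinear(yz, y, z)


def tiny_mlp(features, hidden_w, out_w):
    """One ReLU hidden layer, then 4 outputs: density followed by RGB."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features))) for row in hidden_w]
    raw = [sum(w * h for w, h in zip(row, hidden)) for row in out_w]
    density = math.log1p(math.exp(raw[0]))  # softplus keeps density non-negative
    rgb = [1.0 / (1.0 + math.exp(-r)) for r in raw[1:4]]  # sigmoid keeps color in [0, 1]
    return density, rgb


# Toy 2x2 single-channel plane whose feature grows from 0 to 1 along its first axis.
plane = [[[0.0], [1.0]], [[0.0], [1.0]]]
features = query_triplane((plane, plane, plane), (0.0, 0.0, 0.0))
hidden_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # made-up weights
out_w = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
density, rgb = tiny_mlp(features, hidden_w, out_w)
print(features, density, rgb)
```

Evaluating a function like this over a dense grid of points gives a density field, which a separate surface-extraction step (marching cubes is a common choice) can turn into a mesh.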
Notably, TripoSR adopts an unconventional approach to camera parameters: rather than being conditioned on explicit camera parameters, the model is left to infer them itself, which makes it more flexible and adaptable across imaging scenarios. Training also introduces triplane channel optimization and a mask loss function. These changes refine reconstruction quality and improve the balance between computational demand and the accuracy of the generated models.
A Hugging Face demo is already available for people to try out for themselves, and yes, it's very fast.
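The article doesn't spell out what the mask loss looks like, so treat the following as a sketch under an assumption: a common way to supervise silhouettes is binary cross-entropy between the rendered foreground opacity and a ground-truth 0/1 mask. The function name `mask_bce_loss` and the toy masks are hypothetical.

```python
import math


def mask_bce_loss(predicted, target, eps=1e-7):
    """Mean binary cross-entropy between a rendered opacity mask and a 0/1 mask."""
    total, count = 0.0, 0
    for pred_row, tgt_row in zip(predicted, target):
        for p, t in zip(pred_row, tgt_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count


target = [[1, 0], [0, 1]]            # ground-truth silhouette
sharp = [[0.9, 0.1], [0.1, 0.9]]     # rendering that matches the silhouette
blurry = [[0.5, 0.5], [0.5, 0.5]]    # rendering with no clear foreground

print(mask_bce_loss(sharp, target) < mask_bce_loss(blurry, target))  # True
```

A term like this penalizes renderings whose foreground bleeds into the background, which is one way such a loss can help clean up the final shape.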
With the model, source code, and an interactive demo readily available under the MIT license, TripoSR is poised to empower a wide range of users—from researchers and developers to creatives—eager to explore the latest advancements in 3D generative AI. We'll have to see if it ends up making an appearance in Comfy3D.