Reconstructing category-specific objects using a Neural Radiance Field (NeRF) from a single image is a promising yet challenging task. Existing approaches predominantly rely on projection-based feature retrieval to associate 3D points in the radiance field with local image features from the reference image. However, this process is computationally expensive, dependent on known camera intrinsics, and susceptible to occlusions. To address these limitations, we propose Variable Radiance Field (VRF), a novel framework that efficiently reconstructs category-specific objects without requiring known camera intrinsics and is robust to occlusions. First, we replace local feature retrieval with global latent representations, generated through a single feed-forward pass, which improves efficiency and eliminates the reliance on camera intrinsics. Second, to tackle coordinate inconsistencies inherent in real-world datasets, we define a canonical space by introducing a learnable, category-specific shape template and explicitly aligning each training object to this template using a learnable 3D transformation. This approach also reduces geometry prediction to modeling deformations from the template to individual instances. Finally, we employ a hypernetwork-based method for efficient NeRF creation and enhance reconstruction performance through a contrastive learning-based pretraining strategy. Evaluations on the CO3D dataset demonstrate that VRF achieves state-of-the-art performance in both reconstruction quality and computational efficiency.
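To make the two central mechanisms concrete, below is a minimal PyTorch sketch, not the paper's implementation, of (a) a learnable per-object 3D transformation that aligns instances into a shared canonical (template) space, and (b) a hypernetwork that maps a global latent code to the weights of a small radiance-field MLP. All class names (`CanonicalAlign`, `HyperNeRF`), layer sizes, and the choice of a similarity transform are illustrative assumptions.

```python
import torch
import torch.nn as nn


def axis_angle_to_matrix(v):
    # Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3).
    theta = v.norm().clamp(min=1e-8)
    k = v / theta
    zero = torch.zeros((), dtype=v.dtype, device=v.device)
    K = torch.stack([
        torch.stack([zero, -k[2], k[1]]),
        torch.stack([k[2], zero, -k[0]]),
        torch.stack([-k[1], k[0], zero]),
    ])
    eye = torch.eye(3, dtype=v.dtype, device=v.device)
    return eye + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)


class CanonicalAlign(nn.Module):
    """Learnable per-object similarity transform into the category's
    canonical space: x_canon = s * R @ x + t (assumed parameterization)."""

    def __init__(self, n_objects):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(n_objects))
        self.rot = nn.Parameter(torch.zeros(n_objects, 3))    # axis-angle
        self.trans = nn.Parameter(torch.zeros(n_objects, 3))

    def forward(self, x, obj_idx):
        # x: (N, 3) world-space points for training object `obj_idx`.
        R = axis_angle_to_matrix(self.rot[obj_idx])
        s = self.log_scale[obj_idx].exp()
        return s * (x @ R.T) + self.trans[obj_idx]


class HyperNeRF(nn.Module):
    """Hypernetwork: a global latent z (one feed-forward pass per image)
    generates the weights of a tiny point -> (density, RGB) MLP."""

    def __init__(self, latent_dim=256, hidden=64):
        super().__init__()
        self.hidden = hidden
        # Target MLP parameter counts: (3 -> hidden) then (hidden -> 4).
        self.sizes = [3 * hidden, hidden, hidden * 4, 4]
        self.head = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, sum(self.sizes)),
        )

    def forward(self, z, pts):
        # z: (latent_dim,) global latent; pts: (N, 3) canonical-space points.
        w1, b1, w2, b2 = torch.split(self.head(z), self.sizes)
        h = torch.relu(pts @ w1.view(3, self.hidden) + b1)
        out = h @ w2.view(self.hidden, 4) + b2
        density, rgb = out[:, :1], torch.sigmoid(out[:, 1:])
        return density, rgb


# Usage sketch: align sampled points, then query the generated NeRF.
align = CanonicalAlign(n_objects=50)
model = HyperNeRF()
z = torch.randn(256)               # stand-in for an image encoder's output
pts = torch.rand(1024, 3) * 2 - 1  # stand-in for ray samples in world space
density, rgb = model(z, align(pts, obj_idx=0))
```

In this sketch, the hypernetwork amortizes per-object optimization into a single weight-prediction pass, consistent with the abstract's efficiency claim, while the alignment module factors out pose and scale so the generated field only has to model template-to-instance deformation.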