HOSNeRF Revolutionizes 360° Free-Viewpoint Rendering of Dynamic Human-Object-Scene from a Single Video
Michael Rubloff
Apr 25, 2023
Researchers from the National University of Singapore and Tencent PCG have developed a new method called HOSNeRF (Human-Object-Scene Neural Radiance Fields) that can create 360° free-viewpoint renderings of dynamic scenes with human-environment interactions from just a single video.
Neural Radiance Fields (NeRFs) have made substantial progress in novel view synthesis, particularly in reconstructing static 3D scenes from multi-view images. However, NeRFs struggle with fast and complex human-object-scene motions and interactions, limiting their applicability in dynamic scenarios. To overcome these limitations, the researchers developed HOSNeRF, which introduces object bones and state-conditional representations to handle the non-rigid motions and interactions of humans, objects, and the environment more effectively.
The capture portion looks like something out of a Charlie Chaplin movie, but the end results are out of Upgrade.
HOSNeRF attempts (pretty successfully) to solve two challenges: complex object motions in human-object interactions, and the fact that humans interact with different objects at different times, for instance, if someone puts a book on a table and then picks it up later. For the first, the researchers add new object bones to the conventional human skeleton hierarchy, which helps estimate large object deformations. For the second, they introduce two new learnable object state embeddings that serve as conditions for learning their human-object representation and their scene representation.
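To make those two ideas a little more concrete, here is a minimal Python sketch. It is not the authors' code: the joint names, the `add_object_bone` and `chain_to_root` helpers, the `StateEmbeddings` class, and the embedding size are all hypothetical, and the random vectors stand in for values that a real model would learn during training. It only illustrates the general pattern of attaching an object bone to an existing skeleton hierarchy and appending a per-state vector to a model's input features.

```python
import random

# Hypothetical, heavily simplified human skeleton: joint -> parent joint.
HUMAN_PARENTS = {
    "pelvis": None,  # root of the hierarchy
    "spine": "pelvis",
    "right_shoulder": "spine",
    "right_hand": "right_shoulder",
}


def add_object_bone(parents, bone_name, attach_to):
    """Return a copy of the hierarchy with an object bone under an existing joint."""
    if attach_to not in parents:
        raise KeyError(f"unknown joint: {attach_to}")
    extended = dict(parents)
    extended[bone_name] = attach_to
    return extended


def chain_to_root(parents, joint):
    """List the joints from `joint` up to the root (the motion it inherits)."""
    chain = []
    while joint is not None:
        chain.append(joint)
        joint = parents[joint]
    return chain


class StateEmbeddings:
    """One vector per object state; random here, learned in a real model."""

    def __init__(self, num_states, dim, seed=0):
        rng = random.Random(seed)
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_states)
        ]

    def condition(self, features, state_id):
        """Append the vector for `state_id` to the input features."""
        return list(features) + self.table[state_id]


# Attach a hypothetical object bone (say, a backpack) to the spine.
skeleton = add_object_bone(HUMAN_PARENTS, "backpack_bone", attach_to="spine")
print(chain_to_root(skeleton, "backpack_bone"))

# Two separate tables, mirroring the two embeddings the article mentions:
# one for the human-object representation, one for the scene representation.
human_object_states = StateEmbeddings(num_states=3, dim=4, seed=0)
scene_states = StateEmbeddings(num_states=3, dim=4, seed=1)

point_features = [0.1, 0.2, 0.3]  # placeholder for an encoded 3D sample
conditioned = human_object_states.condition(point_features, state_id=1)
print(len(conditioned))
```

The point of the sketch is that the object bone inherits motion from whatever joint it hangs off, and that switching `state_id` (book on the table versus book in hand, for example) changes the input the network sees without changing the network itself.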
Combining the two yields a roughly 40-50% improvement in Learned Perceptual Image Patch Similarity (LPIPS), a metric where lower scores are better, and the difference is very noticeable! In practice, HOSNeRF allows pausing a video at any time and rendering all scene details, including dynamic humans, objects, and backgrounds, from arbitrary viewpoints, all from just a single video.
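A quick note on reading that number: LPIPS is a perceptual distance, so lower scores mean the rendered image looks closer to the reference. Assuming the 40-50% figure is a relative reduction against a baseline (the article does not spell this out), the arithmetic looks like the sketch below. The two scores are made-up placeholders for illustration, not results from the paper.

```python
def relative_lpips_reduction(baseline, ours):
    """Fraction by which `ours` lowers the LPIPS score relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline LPIPS must be positive")
    return (baseline - ours) / baseline


# Hypothetical scores, chosen only to illustrate the calculation.
baseline_lpips = 0.20
new_lpips = 0.11

reduction = relative_lpips_reduction(baseline_lpips, new_lpips)
print(f"{reduction:.0%} lower LPIPS")  # prints "45% lower LPIPS"
```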
As the video went on, I found myself shocked at how consistently their method reproduced the scene better than the methods it was compared against.
The research team has announced that they will release the code, data, and compelling examples of 360° free-viewpoint renderings from single videos on their website, further promoting the advancement and adoption of this groundbreaking technology.
With the development of HOSNeRF, we are one step closer to bridging the gap between static and dynamic scene renderings and unlocking new possibilities in the realm of immersive experiences. As researchers continue to push the boundaries of what is possible with novel view synthesis and 360° free-viewpoint rendering, we can expect even more exciting developments and innovations in the papers to come.