Unleashing the Power of HumanRF: High-Fidelity Dynamic Human NeRFs
Michael Rubloff
May 12, 2023
3D reconstructions of humans have been notoriously tough because of the large number of minute details involved. Additionally, we have a very low tolerance when something looks out of place. It becomes far more difficult when you introduce dynamic motion, facial expressions, and clothing textures to the scene. Synthesia aims to solve some of those challenges with the introduction of HumanRF and the accompanying dataset, ActorsHQ.
HumanRF captures full-body human motion from multi-view video input and reproduces it as astonishingly lifelike video. Its capabilities rest on a representation that captures minute details at impressive compression rates, which it achieves by factorizing space and time into a temporal matrix-vector decomposition.
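To give a feel for why factorizing space and time helps with compression, here is a minimal sketch in Python with NumPy. It is not the HumanRF implementation. It assumes a toy setup in which a dense 4D feature grid over (x, y, z, t) is approximated by a sum of a few spatial volumes, each weighted by its own temporal vector. The names `reconstruct` and `query`, the rank `R`, and the grid sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def reconstruct(spatial, temporal):
    """Expand the factors into a dense (X, Y, Z, T) grid.

    spatial:  (R, X, Y, Z) array, one spatial volume per rank component
    temporal: (R, T) array, one temporal weight vector per rank component
    """
    return np.einsum("rxyz,rt->xyzt", spatial, temporal)


def query(spatial, temporal, x, y, z, t):
    """Look up one space-time cell without building the dense grid."""
    return float(np.dot(spatial[:, x, y, z], temporal[:, t]))


# Hypothetical toy sizes: rank 4, a 16^3 spatial grid, 50 time steps.
rng = np.random.default_rng(0)
R, X, Y, Z, T = 4, 16, 16, 16, 50
spatial = rng.standard_normal((R, X, Y, Z))
temporal = rng.standard_normal((R, T))

dense = reconstruct(spatial, temporal)

# Storage cost: one value per space-time cell vs. only the factors.
dense_params = X * Y * Z * T
factored_params = R * (X * Y * Z + T)
compression = dense_params / factored_params

print(f"dense grid values: {dense_params}")
print(f"factored values:   {factored_params}")
print(f"compression ratio: {compression:.1f}x")
```

In this toy version the factored form stores the spatial volumes once and a short vector per time step, so the saving grows with the length of the sequence. The real method is more elaborate than this, but the storage argument is the same in spirit.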
HumanRF distinguishes itself by its precise reconstruction of human actors over extended sequences, capturing high-resolution details even amidst challenging motion. While the majority of research has been restricted to synthesizing at resolutions of 4MP or lower, HumanRF is setting new benchmarks by operating at a stunning 12MP. The amount of detail can be seen in the sweater as it flows with motion.
Accompanying HumanRF is a new multi-view dataset, ActorsHQ. It provides 12MP footage from 160 cameras for 16 sequences, along with high-fidelity, per-frame mesh reconstructions, and is designed to tackle the complex challenges posed by high-resolution data. ActorsHQ also cuts down on motion blur by using a shutter speed of approximately 1/1,538 of a second. HumanRF, in turn, leverages this data to reach its level of detail. Just like NeRSemble, this is another new high-quality dataset that will be made available. Luckily, this one will be smaller than 205 terabytes, coming in at just under 40,000 images. This is another exciting step forward for dynamic human NeRFs, but unlike HOSNeRF, HumanRF requires footage from several cameras to reconstruct a figure.
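To put the shutter speed quoted above in perspective, here is a rough back-of-the-envelope calculation in Python. Only the 1/1,538 of a second figure comes from the dataset description. The subject speed, the per-pixel footprint, and the 1/60 of a second comparison shutter are hypothetical values chosen purely for illustration.

```python
# Shutter speed quoted for ActorsHQ, in seconds.
fast_shutter_s = 1 / 1538

# Hypothetical comparison shutter, for contrast.
slow_shutter_s = 1 / 60

# Assumed values, not from the source:
subject_speed_m_per_s = 2.0     # e.g. a quickly moving hand
pixel_footprint_m = 0.001       # each pixel covers about 1 mm on the subject


def blur_in_pixels(exposure_s, speed_m_per_s, footprint_m):
    """Approximate motion blur: distance moved during the exposure, in pixels."""
    return speed_m_per_s * exposure_s / footprint_m


exposure_us = fast_shutter_s * 1e6
fast_blur = blur_in_pixels(fast_shutter_s, subject_speed_m_per_s, pixel_footprint_m)
slow_blur = blur_in_pixels(slow_shutter_s, subject_speed_m_per_s, pixel_footprint_m)

print(f"exposure time:     {exposure_us:.0f} microseconds")
print(f"blur at 1/1,538 s: {fast_blur:.1f} px")
print(f"blur at 1/60 s:    {slow_blur:.1f} px")
```

Under these assumptions, the fast shutter keeps the smear to roughly a pixel, while the slower one would spread the same motion over dozens of pixels.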
Nevertheless, the promising breakthrough of HumanRF and ActorsHQ comes with its own challenges. The technology relies heavily on the ActorsHQ dataset and optimizes a separate radiance field for each sequence. Future efforts could explore training a model on high-end recordings to target monocular-only test sequences, achieving explicit control over actor articulation, and speeding up render times. There is also room to improve smaller details, such as hair strands.
Despite these challenges, HumanRF and ActorsHQ mark a phenomenal advancement in human performance capture. As we continue to refine this technology, we inch closer to a future where digital representations of humans are practically indistinguishable from reality. The potential applications of HumanRF span various industries, including broadcasting, film production, video games, virtual reality, and video conferencing.
While a video conferencing setup with several cameras isn't exactly feasible, HumanRF goes a long way toward showing what's possible. I am curious to see how the broadcasting industry might leverage HumanRF and its successors to make interviews more immersive and lifelike, bringing the audience closer than ever before. Seeing the outputs makes me excited for a future where 3D photorealism of humans becomes less and less challenging and more and more commonplace.