
Michael Rubloff
Sep 22, 2025
At SIGGRAPH 2025, one theme resonated across the show floor: the accelerating convergence of radiance field research, AI, and human communication. Few people are closer to this shift than Dr. Shalini De Mello, Director of Research at NVIDIA, who leads NVIDIA’s AI-Mediated Reality and Interaction Research Group.
Her work spans the cutting edge of neural representations, dynamic avatars, and 4D scene reconstruction, including QUEEN, a system for compressing and streaming dynamic Gaussian splats, and most recently GAIA, a generative model for realistic 3D head avatars. Together, these projects hint at a future where digital telepresence is immersive, lifelike, and instantaneous.
I sat down with Dr. De Mello at SIGGRAPH to talk about her team’s latest research, why she believes 3D telepresence will reshape communication, and how the next wave of imaging technologies may bridge the physical and digital worlds.
Michael: I am here at SIGGRAPH 2025 with Director of Research at NVIDIA, Dr. Shalini De Mello. Thank you so much for joining me today.
Shalini: My pleasure. Thank you for having me.
Michael: I wanted to first start out with a general introduction about yourself, because you’re working on some really incredible projects around humans and dynamic avatars. Would you mind giving a quick introduction?
Shalini: Sure, thank you. I’m a Director of Research at NVIDIA. I’m part of the larger NVIDIA Research organization, and I lead a team focused on AI-mediated reality and interaction research. Our mission is to enable human–computer interaction grounded in the physical reality of 3D spaces.
Michael: Very cool to have you here. And it’s amazing to see some of the work your lab has been putting out lately. One of the projects I really want to touch on first is QUEEN. For people who don’t know, you’re able to reconstruct very lifelike, dynamic 3D avatars of people. I saw the demo at GTC earlier this year and now you’re showing it again here at SIGGRAPH. Could you give a quick overview?
Shalini: Thank you for asking. The basic idea behind QUEEN is that Gaussian Splatting enables us to reconstruct 3D worlds in very realistic form. If you capture 3D scenes at multiple time points, you get 4D videos, dynamic scenes changing over time.
But if you try to transmit these 4D Gaussian packets as-is, they’re extremely large. You simply cannot stream them in real time. So we wanted to figure out how to compress 4D Gaussians in a streaming fashion.
QUEEN applies ideas from 2D video compression to the 4D Gaussian space by exploiting temporal redundancy. Most of the scene doesn’t change frame to frame, so we only encode the residuals — the deltas over time. With that, we achieve up to 100× compression, making 4D Gaussians streamable for VR headsets or 3D displays.
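To make the residual idea concrete, here is a minimal sketch of delta encoding over per-Gaussian attributes, assuming they are stored as a NumPy array. The function names, the fixed threshold, and the uniform quantization step are illustrative assumptions, not QUEEN’s actual learned pipeline.

```python
import numpy as np

def encode_frame(prev_attrs: np.ndarray, curr_attrs: np.ndarray,
                 threshold: float = 1e-3, step: float = 1e-2):
    """Encode one frame of per-Gaussian attributes (positions, colors, etc.)
    as a sparse, quantized residual against the previous frame.

    Illustrative only: a real system would learn which residuals to keep and
    how to quantize them; here we use a fixed threshold and a uniform step.
    """
    residual = curr_attrs - prev_attrs                   # deltas over time
    changed = np.abs(residual).max(axis=1) > threshold   # most Gaussians are static
    indices = np.nonzero(changed)[0]
    quantized = np.round(residual[indices] / step).astype(np.int16)
    return indices, quantized                            # small payload to stream

def decode_frame(prev_attrs: np.ndarray, indices: np.ndarray,
                 quantized: np.ndarray, step: float = 1e-2) -> np.ndarray:
    """Reconstruct the current frame from the previous one plus the residual."""
    curr = prev_attrs.copy()
    curr[indices] += quantized.astype(np.float32) * step
    return curr
```

Only the Gaussians whose attributes actually changed contribute to the payload each frame, which is where most of the savings come from when the scene is largely static.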
Michael: It’s amazing to see the VR demo. You really feel a sense of presence. Some of the demos, like the boxer, make you feel like you have to dodge out of the way, because it really seems like you’re there.
Shalini: Absolutely. That’s a big reaction we get. For example, one demo was captured by Clear Angle Studios in the UK, a boxer in the ring. When the punches come, people actually flinch. It’s very immersive. Our hope is to use AI and GPUs to democratize this technology so many people can experience it.
Michael: I think many people aren’t aware that imaging has reached such high fidelity in 3D, and even dynamic 3D, that it can feel like you’re really with someone. Did you have a specific moment when you first realized lifelike 3D was possible?
Shalini: Yes, I think it was with the original NeRF — before Instant-NGP. The quality of the multiview rendering was so good. Those Lego and tractor scenes were the first time I thought: this makes sending 3D tangible.
Michael: It’s been only about five years since then. What set you and your lab in the direction of dynamic expression and reconstruction?
Shalini: One big project was 3D telepresence, which started during the pandemic. People were on endless Zoom calls, and fatigue was setting in. We asked: how can we make telepresence in 3D, to restore a sense of connection?
We began by reconstructing humans in 3D and transmitting them across a channel — basically 3D teleconferencing. We showed a demo of AI-mediated 3D telepresence at SIGGRAPH 2024. That was based on NeRFs, but when Gaussian splats came along, it was natural to move to them.
Michael: When evaluating different methods (NeRF, Gaussian Splatting, voxel-based radiance fields, ray-traced approaches), what advantages did Gaussians bring?
Shalini: First, speed. Gaussian Splatting uses rasterization and GPUs, making it much faster to render.
Second, Gaussians are flexible. There’s work on unwrapping Gaussians and placing them on UV textures, essentially Gaussian textures that wrap onto meshes. That lets you combine traditional graphics with neural representations: you get the fast rasterization and mesh compatibility of classical graphics with the photorealism of learned Gaussians. That’s new work we’re showing this year.
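As a rough illustration of that Gaussian-texture idea (a sketch under assumptions, not the work being shown at SIGGRAPH), the snippet below places Gaussians on a mesh surface by sampling barycentric points on each triangle and reading their parameters from a UV-space map. The function name, texture layout, and sampling scheme are all hypothetical.

```python
import numpy as np

def splats_from_gaussian_texture(vertices, faces, uvs, texture, samples_per_face=4):
    """Instantiate Gaussians on a mesh surface from a UV-space parameter map.

    Illustrative sketch only: `texture` is an (H, W, C) array whose channels
    hold per-Gaussian parameters such as scale, rotation, opacity, and color.
    For each face we draw random barycentric samples, place a Gaussian at the
    corresponding 3D surface point, and read its parameters from the texel at
    the interpolated UV coordinate.
    """
    H, W, _ = texture.shape
    centers, params = [], []
    for face in faces:
        v = vertices[face]                        # (3, 3) triangle corners in 3D
        t = uvs[face]                             # (3, 2) matching UV coordinates
        for _ in range(samples_per_face):
            b = np.random.dirichlet(np.ones(3))   # random barycentric weights
            centers.append(b @ v)                 # 3D position on the surface
            u, w = b @ t                          # interpolated UV coordinate
            px = min(int(u * (W - 1)), W - 1)
            py = min(int(w * (H - 1)), H - 1)
            params.append(texture[py, px])        # Gaussian parameters from texel
    return np.array(centers), np.array(params)
```

Because the Gaussians live on the mesh surface, ordinary mesh tooling such as skinning and deformation can carry them along, which reflects the classical-graphics compatibility described above.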
Michael: One of the things you hinted at was video calls. NVIDIA has also been showing GAIA. Could you talk about that?
Shalini: GAIA is a generative AI model that synthesizes highly photorealistic 3D head avatars in a controllable way. You can control identity and expression, and then animate the avatar with a sequence of expressions.
It runs in real time. We showed it running on a ThinkPad with a 4090 GPU. That means you can capture moments in 3D for posterity or communicate in a much more immersive way.
Michael: How do you think communication will evolve in the short term — say, the next one to three years? Will it shift more into 3D?
Shalini: Displays are the main limitation. There are three options:
2D displays with parallax,
stereoscopic or holographic 3D displays,
VR headsets.
Immersion increases as you go down that list, but adoption cost and friction increase too. The challenge is to deliver 3D experiences more seamlessly, in less invasive ways. If we can do that, I think 3D communication will grow quickly.
Michael: Over the past few years, software has leapt forward. Now it seems like distribution and display are the bottlenecks.
Shalini: It’s a symbiotic ecosystem. More applications drive hardware demand, and more hardware enables applications.
Michael: Shifting to sports, how do you think content will evolve with more dynamic volumetric capture?
Shalini: Today, many sports systems scan players ahead of time and then animate them with motion data. The stadiums are also pre-scanned. What you see is an animation, not reality.
With faster Gaussian Splatting and compression like QUEEN, I think we’ll see entire 4D scenes reconstructed and transmitted in real time. That’s the exciting future.
Michael: Compression is critical. But raw 2D inputs can create huge data sizes. What’s the future of compression and delivery to devices?
Shalini: There are no standards for 4D compression yet. It’s partly because neural representations evolve so fast — first NeRF, now Gaussians, and maybe something else next. There is a volumetric video compression group, but nothing finalized.
Standardization usually follows industry adoption. Right now, it’s still open research. For researchers, that’s exciting. Every stage of the pipeline, from capture to compression to streaming and decoding, is an open problem.
Michael: That makes this a very exciting field. Are there particular problems you’re most eager to solve?
Shalini: One is marrying Gaussians with physical AI. Gaussians are great for novel-view synthesis, but they aren’t grounded in physical meshes. For robotics and industrial AI, we need physical grounding — modeling contact, affordances, interactions. Embedding those physical priors into radiance fields is something I’m very excited about.
Michael: That will be fascinating to watch. Speaking of simulation, how do you see this applying to areas like sports performance analysis?
Shalini: In sports, there’s a huge need to simulate human motion and kinematics. Physics also matters: modeling the ball, the court, the interactions. All of the computer graphics knowledge about modeling physics will converge with radiance fields. That combination will open new possibilities for training, analysis, and fan experiences.
Michael: There are so many exciting developments. It feels like we’re entering a new age of imaging. Thank you for joining me today to talk about your work and the future of this technology.
Shalini: Thank you so much for having me.
Conclusion
Dr. Shalini De Mello’s work sits at the intersection of radiance fields, generative AI, and human communication. From QUEEN’s 100× compression of 4D Gaussian splats to GAIA’s real-time, controllable head avatars, her team is redefining how we might capture, transmit, and experience human presence.
What stands out most in her perspective is the balance between realism and efficiency, between neural representations and physical grounding, and between technical possibility and human connection.
As the field races ahead, De Mello believes the future lies not only in faster rendering or better compression, but in making these experiences seamless, accessible, and human-centered. If she’s right, the next leap in communication may not be a sharper video call, but the feeling of truly sharing space with someone, no matter where they are.