Gaussian Splatting with Bernhard Kerbl

Co-hosts Michael Rubloff and MrNeRF speak with Bernhard Kerbl, co-first author of 3D Gaussian Splatting.

Resources:
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Taming 3DGS: High-Quality Radiance Fields with Limited Resources
A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets

Transcript

Michael

Okay. Welcome back to the View Dependent podcast, where we speak to industry leaders, researchers, and creatives operating in the world of hyperreal 3D, including methods like radiance fields. Without exaggeration, no paper has created such impact as 3D Gaussian Splatting did last year. If you take a look at the top cited papers of 2023, number seventeen was Google's Gemini, and just two places below that, at number nineteen, was 3D Gaussian Splatting. And today we're really, really honored to have one of the two co-first authors of Gaussian Splatting with us, Bernhard Kerbl. Thank you so much for coming on the show today. We're really excited to have you here.

Bernhard

Thank you so much for having me, and also thank you for taking so much care and really getting the details correct, like saying the most impactful paper of that year specifically, and also co-first author. I really appreciate that you're taking so much care in making sure these things are on point. That's really, really nice. Yeah.

Janusch

Also from my side, welcome. I'm really happy to have you here. It has been a wild ride since last year; many things happened, and I think there are also many things we will see in the future based on your work. So it's really amazing to have one of the first authors here and talk with you about 3D Gaussian Splatting. I want to just get started straight away and ask: how did you get started in the world of computer science? Did you always want to be a researcher? What prompted your initial journey into this world?

Bernhard

So I actually started with computer science somewhat late; the real computer science I really only started at university. Before that, I was playing around with all kinds of 3D authoring tools, but more from the consumer side, never really from the developer side: making small 3D worlds, making small 3D models, doing some low-poly vertex creation, these kinds of things. And then, when I started my education at Graz University of Technology, at some point I discovered that I like programming. It just makes sense to me. That led into further studies in the master's for computer science, and specifically software engineering. Of all the subjects there, the most rewarding were the ones that gave you a visual result, a visual output for the work that you put in. So pretty early I was tending towards computer graphics and visual computing. And I found some very inspiring personalities there, like for instance Dieter Schmalstieg, Bernhard Kainz, and Markus Steinberger, who were all at TU Graz at the time, basically leading a force of high-performance graphics research and pushing really into that direction. I felt pretty much at home there, so I kept going in this direction and started out towards real-time graphics and high-performance graphics research, which I guess is the core of my scientific career.

And what were the first topics you were doing research on, more specifically?

I think I've dabbled at this point in a wide variety of topics. The very first topic I worked on was a little bit in the realm of medical visualization: different ways for users to interact with medical data, which I found really quite interesting, the point being that it was interactive. But then we very quickly moved on to really nitty-gritty, low-level solutions at the algorithmic level. Basically, you have some sort of problem, stated in a very abstract way, of how to handle data on the graphics processing unit efficiently. You can think of this as making tools similar to a sorting algorithm, but for taking arbitrary workloads, for instance the ones that come up in rendering, processing them very efficiently on the GPU, and then producing the outputs, again very efficiently. So working at the low level of making these things work, twiddling bits and optimizing individual instructions, was actually one of my first projects there.

And when you talk about having an interest in visual-based graphics too, what were your thoughts when you first saw NeRF, or when did you first come across it?

So my first impression when I saw NeRF was that I was really impressed. I was also really intimidated, because coming from computer graphics, I always looked at computer vision as: oh, that's the big math stuff, that's the part I want to stay a little away from. But over time, the more I got familiar with it, I figured out that all of this really doesn't need to be so scary. From the computer graphics side, we usually approach things, especially in real-time graphics, from a pragmatic point of view and a very discrete point of view, whereas in computer vision, things tend to be more continuous, more abstract, more generalizable. And that can seem very lofty and a little bit intimidating if you're coming from the pragmatic computer graphics side. But I think there's really nothing to be scared of, and I think those two can very much work hand in hand for great results.

Janusch

What was the inspiration for your follow-up work? Did NeRFs inspire you?

Bernhard

Yeah, definitely. So when I joined the group of George Drettakis, where we did this research in France, this was part of the Inria network, the Institut National de Recherche en Informatique et en Automatique, a national research institute which has locations all over France. I basically joined them because I knew they were working on radiance fields, and I wanted to get my hands dirty in a hands-on project. So I went there to learn about radiance fields. That was my initial intention.

And when you were there, you actually worked on a paper called NeRFshop. Speaking of intimidating things, how was it taking on something like making NeRFs editable?

Sure. So that was where I basically had the experience that, well, there's a good combination of both worlds, right? The people who understood very well what NeRF is doing, and all the differential behavior, were having a harder time understanding the intricate and very well designed system that is Instant-NGP and its implementation with efficient CUDA kernels. And in order to make something interesting happen, those two sides need to come together. So for me, looking at, for instance, those CUDA kernels, those highly optimized and very nice code sections, wasn't that hard. And I could see that other people were benefiting from me being able to do something there, while at the same time they provided ideas for high-level solutions for using these highly efficient functions to make the editing happen. Because in editing, things need to be interactive. You cannot wait ten seconds for your dragging operation to actually go through, right? If you drag something and then maybe ten seconds later it happens, that's always a hassle. So you need those high-performance solutions, as for instance given by Instant-NGP, to get the interactivity that we were after.

Was this also inspirational for your follow-up work? Because if NeRFs are really hard to edit, there might be a better solution to come up with.

No. I can't say that it was; it could have been, but it wasn't. With NeRFshop, we had an interesting challenge of making this less transparent, less elucidating representation, the NeRF, which is still a little bit of a black box to me, editable. And with 3D Gaussian Splatting, which came after, being easy to edit was honestly a side product. That is not what we were after initially.

Michael

And while you were in that group, is that where you first met George Kopanas, or how did you get linked up with him?

Bernhard

Yes, that's where I first met George Kopanas, or Yorgos Kopanas. That's also where I met George Drettakis for the first time. And it was also the first project where I worked with Thomas Leimkühler, who is at MPI in Saarbrücken, leading a group there. So a very, very capable team.

And now that you had this team all working together, how did the actual underlying foundation of 3D Gaussian Splatting come about?

So Yorgos Kopanas started his PhD, I don't know if it was 2020, I would need to look that up, I think around 2020. And most of his PhD work has been on these point-based representations. He really wanted to bring them back. There was some back and forth with George Drettakis as well, and in the end I think Yorgos basically convinced George that yes, this point-based representation is really going to make a change, it's going to make a difference, we should definitely go for it. This is a little bit complicated, right, because both of them are George; that's why I call Yorgos Yorgos, and I call George George.

So George really wanted to have the solution where we say: we have input images which are casually taken by a camera with decent settings. It's not a professional camera; it's a camera that anybody can wield. You get some recordings, and from these you reconstruct the scene, and it is really fast, and it looks really photorealistic. And I think he always wanted to have a reconstruction of Nice. George very consciously chose the south of France as his base of operations; he loves Nice. And he said his dream would be, at the end, that any casual user can walk their home city, for instance Nice, make a realistic, photorealistic reconstruction of it, and then just explore it, with all these extremely nice corners and places and nooks and crannies to show off to everybody else in 3D.

Sorry, I'm derailing a little bit, but long story short: George's dream was to have this image-based reconstruction work flawlessly, and Yorgos said, let's do it point-based. He had done two prior works before we started on 3D Gaussian Splatting, and those two prior works very much set the stage. They could do a lot of impressive things; in some cases they had really outstanding quality, but it was painfully slow. And that is where we came in, where we said: we have the core idea, now let's risk it. What if we can reduce the requirements tremendously, reduce the requirements of what needs to go into the process, and then also reduce the processing time? Then we would have something that would really make waves. And that was our goal.

Janusch

So you pretty much started working on the rasterizer first. Was the goal to rasterize a point cloud, even a very sparse one, in the first place?

Bernhard

Yeah. So Yorgos had libraries for rasterization, and I basically started with them as inspiration and as a foundation, and then tried to really figure out: in this whole system, what is taking the most amount of time? What is taking the most amount of pre-processing, and what's the actual payoff? And then I basically just started strategically trimming fat and figuring out ways to make each individual part perform more efficiently, of course always in collaboration with Yorgos, who knew the system really, really well.

So then you got it working at one point, I guess. But then you have, okay, a sparse point cloud, and nothing happens, right? So you needed something like an addition mechanism. Was that the densifier?

Not quite like that; we were always certain that we need a higher number of Gaussians than those SfM points. That was always clear to us, and I think from the beginning we knew somehow this kind of densification of Gaussians needs to happen, and it probably needs to happen progressively. So this was basically part of our schedule from the start.
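
A rough sketch of the clone-and-split idea behind this progressive densification, in illustrative NumPy. The array names and thresholds are placeholders rather than the reference codebase, though the split heuristic (two children, scales divided by 1.6) mirrors what the original paper describes:

```python
import numpy as np

def densify(positions, scales, grad_norms,
            grad_thresh=0.0002, scale_thresh=0.01):
    """One illustrative densification pass: Gaussians whose accumulated
    view-space gradient is large get cloned (if small) or split (if
    large), so detail grows where the images demand it."""
    needs_more = grad_norms > grad_thresh       # under-reconstructed regions
    small = scales.max(axis=1) <= scale_thresh  # clone candidates
    parents = needs_more & ~small               # split candidates

    # Clone: duplicate small Gaussians in place; optimization nudges them apart.
    clone = needs_more & small
    clone_pos, clone_scl = positions[clone], scales[clone]

    # Split: sample two children inside the parent and shrink their extent.
    offsets = np.random.normal(size=(parents.sum(), 3)) * scales[parents]
    kid_pos = np.concatenate([positions[parents] + offsets,
                              positions[parents] - offsets])
    kid_scl = np.tile(scales[parents] / 1.6, (2, 1))

    keep = ~parents                             # split parents are replaced
    positions = np.concatenate([positions[keep], clone_pos, kid_pos])
    scales = np.concatenate([scales[keep], clone_scl, kid_scl])
    return positions, scales

# e.g. one pass over 100 toy Gaussians:
pos, scl = densify(np.random.randn(100, 3),
                   np.abs(np.random.randn(100, 3)) * 0.02,
                   np.random.rand(100) * 0.0004)
```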

Janusch

And were there any really core challenges that you encountered up front, where you were asking: is this going to work, or is this actually going to be a fundamentally new representation?

Bernhard

So the thing is, the initial plan was actually to go directly for what is now the hierarchical 3D Gaussian representation. That is what we were actually after, because we were all thinking: hey, we can make this scale really well in the end if we do it nicely. And then, around Christmas, Yorgos was running the usual metrics and the usual evaluations, and he was like: hold on, we have something here. We have really, really good quality, actually. Even without all this hierarchical stuff and scaling-up stuff, we're getting really nice quality numbers in a very low time span. And that is where we actually considered: okay, maybe we should just submit this to the SIGGRAPH conference. That was not our initial intention, but that's how it actually turned out in the end. And that's what we did.

Janusch

And yeah, I can imagine there was a lot of frustrating work, especially getting the gradients right in the beginning, right? And you really didn't have any reference you could measure against, other than numerical verification, right?

Bernhard

So there we basically benefited from Yorgos's very disciplined and intricate work on his previous projects. We could always compare against his rasterizer, which he had very meticulously tested in all kinds of little toy cases. So if we had a certain number of Gaussians to differentiably render after projecting them into 2D Gaussians, we were always able to compare the gradients we produced with his gradients. And that was super, super helpful.

And what was the general timeframe from start to finish, from when this idea was first conceived to the actual final "this works"?

So, if you see it as a series of incremental steps from the very beginning, where Yorgos started with neural point-based rendering, or PBNR, point-based neural rendering actually, I guess it would be two, two and a half years all in all, taking all the experiences together. But this particular project, where we said, okay, this will be a submission to SIGGRAPH, we start working on it as a team now: that started in September of 2022, and then I guess four or five months later we submitted it to SIGGRAPH.
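
The cross-check described here, validating hand-written gradients of a differentiable rasterizer against a trusted reference or against numerical differentiation, can be sketched in PyTorch roughly like this. The `render` callable, tolerances, and the toy example are stand-ins, not the project's actual test harness:

```python
import torch

def finite_diff_check(render, params, eps=1e-4, tol=1e-3):
    """Compare autograd gradients of a scalar loss against central finite
    differences -- the classic sanity check for hand-written gradients."""
    params = params.clone().requires_grad_(True)
    render(params).sum().backward()
    analytic = params.grad.view(-1)

    flat = params.detach().clone().view(-1)
    numeric = torch.zeros_like(flat)
    for i in range(flat.numel()):
        base = flat[i].item()
        flat[i] = base + eps
        hi = render(flat.view_as(params)).sum()
        flat[i] = base - eps
        lo = render(flat.view_as(params)).sum()
        flat[i] = base                      # restore before the next coordinate
        numeric[i] = (hi - lo) / (2 * eps)

    return torch.allclose(analytic, numeric, atol=tol)

# e.g. a toy "rasterizer" that just squares its parameters:
ok = finite_diff_check(lambda p: p ** 2, torch.randn(5, dtype=torch.double))
```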

Michael

And just out of curiosity, because obviously NeRF at the time was really dominating the world of radiance fields: how difficult was it to depart from that to a different line of research that you just believed in?

Bernhard

For me, it was very easy, because NeRFshop, which is excellent work with Clément Jambon, who is now of course at MIT, was my first contact with NeRF. I had worked on it in one project, but I wasn't super invested, right? And the field was racing along. So working outside of what everybody else was going after was actually, compared to what I see now, relatively calm, is what I would say.

Michael

And with NeRF, NeRF had that moment with positional encoding where it just started to really look good. Was there a moment in time for you that was kind of equivalent to that?

Bernhard

Oh, that's a great question. There were quite a few of them, but in the end, I'm not so sure that we traced them correctly, or that we identified the reasons for them correctly. At some point we had this epiphany: hold on, what if we just have this opacity reset mechanic every so often? It seemed to make things better; these days I'm not so sure anymore. We also had: what if we just stop the densification at some key point? And that also was kind of a eureka moment where things immediately improved the scores that we needed. But mostly it was about where we could get the performance that we needed. And, go ahead.
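
Both heuristics survive in public 3DGS training loops as simple schedule checks. A paraphrased sketch with stub functions; the interval constants echo commonly used defaults but should be read as illustrative:

```python
def train_step():           # stand-in for render, loss, backprop, optimizer step
    pass

def densify_and_prune():    # stand-in for the clone/split pass sketched earlier
    pass

def reset_opacities():      # stand-in: clamp all opacities to a small value
    pass

DENSIFY_UNTIL = 15_000      # the "stop densification at some key point" heuristic
DENSIFY_EVERY = 100
OPACITY_RESET_EVERY = 3_000 # the periodic "opacity reset mechanic"

for it in range(1, 30_001):
    train_step()
    if it < DENSIFY_UNTIL and it % DENSIFY_EVERY == 0:
        densify_and_prune()     # grow detail only during the first half
    if it % OPACITY_RESET_EVERY == 0:
        reset_opacities()       # floaters must re-earn opacity or get pruned
```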

Janusch

Right. And then you submitted this paper to SIGGRAPH. Did you immediately start on follow-up work, or was it just submitting and waiting?

Bernhard

Yeah, of course. Of course. So, George has great intuition, he has tons of experience, and he has a great view of the field. And George was convinced from the beginning that this would be very well received. He had no doubt about it. So he basically said: yeah, we can immediately start to work on what's next.

Michael

And I think you published it, or it hit arXiv, in late April or very early May. And for that time between April and early August, with SIGGRAPH, it seemed like everyone was still very much focused on radiance field methods like NeRF. And then, as soon as you won best paper at SIGGRAPH, everything started to shift and move really fast. Did you notice a large influx of people starting to reach out after SIGGRAPH, or when did it really start to catch fire for you?

Bernhard

Great question. Yeah, I think so. Maybe half a month, maybe a month after SIGGRAPH, but I don't keep track of that very well. I think that would be where things started to pop off. Yes. Yeah, I think SIGGRAPH made a huge difference in this.

Michael

And I just want to show a quick recording, because we've been talking so much and we haven't actually looked at any examples of what it can do. Sorry. No, no, it's my bad. But yeah, you can see here that you're able to get really strong visual fidelity, and you get these really amazing view-dependent effects that are all rendering in real time. This is one of the really early times we've had radiance fields running well into real-time rates. So you finished the first 3D Gaussian Splatting paper, and then you mentioned you started on other papers, such as StopThePop, which addresses some of the artifacts in the original implementation of Gaussian Splatting.

Bernhard

Oh, the StopThePop paper. Ah, sure. So, I mean, I was always sort of surprised that more people aren't bothered by these popping artifacts as much as I am. Even when we submitted to SIGGRAPH, I was like: yeah, but what are they going to say about these popping artifacts, right? And nobody seemed to care about them as much as I did. But it's something where, if you change the camera settings, for instance if you put yourself in a very extreme situation, like VR, this is a little bit of foreshadowing, I guess, but in certain camera settings or configurations, these popping artifacts can be really, really nasty. So the StopThePop solution was a very principled, very nice piece of engineering, again with a great team, right? The guys from TU Graz really know GPUs; they really know parallel processing and high-performance computing. It's just top of the line.

Janusch

And just sorry, but the title, StopThePop: did you coin the term "popping"? Because I haven't heard it before.

Bernhard

No, no. I think that's a common term in real-time graphics, for when something changes suddenly. You see it a lot, for instance, when you have different levels of detail and you go from one LOD level to the next for a mesh, and it's just really obvious that you made the switch. That's commonly referred to as popping artifacts. So yeah, that just makes sense, I think.

Janusch

There are also rumors about the title StopThePop. Is it about stopping pop music or something like that?

Bernhard

So I think the guys at Graz who worked on this have a healthy disregard for pop music, but there's nothing evil in there. It's just: if you have a name that pops, haha, it can make a significant difference, because people can remember it.

Right. And I think it makes a huge difference not just for the name itself, but for the visual fidelity too. I showed this quick before-and-after example, and it really does address that popping effect.

Michael

And I guess I was wondering if you could just give a quick overview of what exactly it is.

Bernhard

Oh, sure. Sure. So it's doing two things. First of all, there's what is called visibility resolution in graphics: you basically resolve the order in which the primitives you are rendering should be blended together in each pixel. And 3D Gaussian Splatting does this inaccurately, because it only ever has one depth value per Gaussian, period. You can go a little bit further and say: well, when we render, we actually duplicate these Gaussians; we put them into little lists, and every region of the screen gets an independent list of Gaussians to work through. So you actually need to duplicate Gaussians, right? If there's one Gaussian that overlaps two image regions, you basically already need to duplicate it. So what you could do is at least give those two duplicated Gaussians two different depth values. You can do that, and you get a better, more accurate visibility resolution as a result. But you also get very clear artifacts at the borders: you have one order of rendering the Gaussians in one tile and another order in the next, which means that at the border you now get very visible artifacts.

So the second step is to combine this with hierarchical behavior, where you have not only these tiles with their lists, but also smaller subtiles within those tiles, which also have smaller lists. And these smaller lists are so small that they can be held in a memory type that is much, much cheaper to access. So you have a progressive set of smaller image regions, until you end up at the pixel, and each of them has its own shorter and shorter list. Basically you can say: I have the full list of Gaussians here; I take the closest one right now; then I go to my subtile, which has a separate list; I find out, in this subtile, what my best estimate is for the depth value this Gaussian should have; I update the list of Gaussians I have for the subtile; I take the current winner again; I move on and put it into the list of the next smaller subtile. You repeat the same process until you end up at the pixel level, and you have a choice of, like, two Gaussians that you can now pick and process, basically add to the pixel. And that's basically it.
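
The real implementation is a CUDA kernel organized around tiles and subtiles, but the core intuition, correcting a coarse global order with a small, cheap local re-sort keyed on per-pixel depth, can be caricatured in a few lines of Python. The splat tuples, `depth_fn`, and the `window` size below are illustrative stand-ins, not the StopThePop code:

```python
import heapq

def blend_resorted(splats, pixel, window=8):
    """Consume splats in the coarse tile order, but route them through a
    small heap keyed by exact per-pixel depth, so ordering mistakes within
    `window` entries are corrected before front-to-back blending.
    Each splat is (tile_depth, depth_fn, color, alpha)."""
    coarse = sorted(splats, key=lambda s: s[0])   # approximate global order
    heap, color, transmittance = [], 0.0, 1.0

    def blend(entry):
        nonlocal color, transmittance
        _, c, alpha = entry
        color += transmittance * alpha * c        # front-to-back compositing
        transmittance *= 1.0 - alpha

    for s in coarse:
        heapq.heappush(heap, (s[1](pixel), s[2], s[3]))
        if len(heap) > window:                    # emit the closest survivor
            blend(heapq.heappop(heap))
    while heap:                                   # drain the tail in exact order
        blend(heapq.heappop(heap))
    return color

# Two splats whose coarse order disagrees with their exact per-pixel depths:
splats = [(0.9, lambda p: 1.3, 0.2, 0.5), (1.0, lambda p: 1.2, 0.8, 0.5)]
print(blend_resorted(splats, pixel=(0, 0)))       # blended in corrected order
```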

Janusch

Yeah. Just to note, this process still isn't perfect, right? Because these lists aren't of infinite length, the process can still make mistakes. But the mistakes are luckily so small and so significantly reduced that it's a significant improvement compared to the original.

Bernhard

Yeah, and I'm surprised, because if you look at the rasterizers that are out there, commercial ones and also open-source ones, basically none of them implements this technique right now, or something like it, which is very surprising. Maybe it doesn't disturb people that much; I don't know why it's not picked up. I think the guys in Graz know what they're doing, right? It's not super trivial, first of all, to implement this, and it's not a hundred percent portable. You need certain instructions that, for instance, mobile GPUs might not have. So it requires, not a big one, but some base level of hardware capability, and it also requires a good level of software engineering ability to make it work.

Janusch

But let's go back a little bit in chronological order. After the 3DGS paper, what was the next work you were working on? The one that followed directly was the hierarchical Gaussian representation, right? Because it was, like, the dream of George.

Bernhard

Exactly. Exactly. With this, basically, he could do what he always wanted to do. The path that goes through Nice, I don't know, maybe we have that video somewhere. Sorry, no, that's the Inria campus; that's also nice. But the one where they're going through this long, long trek through the city of Nice, that's actually him riding his bicycle. He himself recorded this data set, wearing that cool helmet that we show off in the paper as well, riding around Nice and turning around at some point. And the thing that we see right here, that's the Inria campus, and there we had one of the authors, Alexandre, who was walking around with the same hat on, checking out all the places of the Inria campus in Nice.

Janusch

And did you know, going into this, that it was just going to work, that something at this scale was even possible?

Bernhard

No, no. The data collection, and figuring out what kind of data works and doesn't work, was a huge, huge challenge there. I think we spent more time on figuring out the data and what can and can't work than on the actual approach. But I'm told that's always the case when you go for these projects that are just large scale. And that's why it hasn't been done so many times, even with NeRF. With NeRF, there was Block-NeRF, and there were a few other approaches like Mega-NeRF, but many of them switch to the aerial view, right? If they say large scale, they basically mean: yeah, we have drone images from up top. But for foot-level or ground-level exploration of vast data sets, I think Block-NeRF is one of the few that actually succeeded, because getting the data, and data that works, is really, really nasty.

Right. And Block-NeRF, I think, came out of Waymo with Matt Tancik. Right? I guess it was helpful to have all the dash-cam data from their cars.

Yes. Yeah, yeah. And also the fact that everything takes so long: how will you know that something has worked? You wait a week. That, to me, is way, way too long.

Michael

Yeah. And one of the things I also wanted to highlight, with both large scale and the original implementation of 3DGS, is that you're reconstructing a scene with potentially hundreds of thousands to millions of individual Gaussians. With the original implementation, the file size you would get out of it was quite large, ranging anywhere from several hundred megabytes to a gigabyte or a gigabyte and a half. And since then there's been a tremendous amount of compression work, as well as densification methods, that have really cut down on the number of Gaussians while still getting the same level of fidelity. Is that kind of what led to the work of Taming 3DGS for you?

Bernhard

Hmm, so for Taming 3DGS, the motivation was pretty much in the name: just making sure things are controllable. We've seen these kinds of approaches which are trying to cut down, for instance, the size by, let's say, 10x. We want the file sizes to be 10x, 20x smaller. But what does x mean? That to me was always a little bit bothersome. Okay, I have a reduction of ten times, but what do I start out with? All of these relative reductions weren't very satisfactory, because you might be in a situation where you say: I have exactly 500 megabytes to spare; what is the best thing that I could get? Or: I need to produce a model that has exactly 5,192 Gaussians, because that's the dimension of the downstream neural network into which I want to feed it. And over all of these things you basically had zero control. So that was the main idea behind Taming 3DGS: basically making it controllable and freely usable.

And you were still able to get the same level of visual fidelity in the end, while really dropping the time.

Right, yeah. So that's where a very bright and gifted student from Carnegie Mellon University, Saswat Mallick, who was also part of this, comes in. We presented this at SIGGRAPH Asia this year, and basically he came up with a very nice scoring methodology for figuring out which of the Gaussians should be densified, and when. It just avoids a lot of the redundancy that was there in the original approach and replaces it with quite principled, well-informed decisions about where to actually put new Gaussians in the densification process. And then you combine that with the awesome work by Rahul Goel, who made very cool, very detailed and tough optimizations to the whole process. Because of these two combined features, we get a speed-up of about five to six times in training time, in addition to it being completely controllable.
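
The "controllable" part can be pictured as turning densification from a thresholded free-for-all into top-k selection under a hard budget. A minimal sketch, where the aggregate `scores` input is an invented placeholder for the paper's more elaborate scoring methodology:

```python
import numpy as np

def densify_under_budget(scores, n_current, n_budget):
    """Rank densification candidates by score and admit only as many as
    the budget allows, so the final model size is exact -- "exactly 500 MB
    worth" or "exactly N Gaussians" instead of "roughly 10x smaller"."""
    room = max(0, n_budget - n_current)   # how many new Gaussians still fit
    k = min(room, len(scores))
    if k == 0:
        return np.array([], dtype=int)
    return np.argsort(scores)[-k:]        # indices of the top-k candidates

# e.g. 100,000 Gaussians now, hard cap of 101,000: at most 1,000 new ones
chosen = densify_under_budget(np.random.rand(5000), 100_000, 101_000)
```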

Janusch

So, given all your work in this field, what do you think is the secret sauce of Gaussian Splatting that makes it so successful in comparison to NeRFs?

Bernhard

So I think it's an all-rounder, in so much as you can do a lot with the representation. It's very explicit. If you want to compress something, and you know it's points with certain attributes, you can rely on tons of previous literature to do that and apply it, and of course also extend it. So it's very explicit, and it lends itself to many approaches that we already know from established computer science literature. The other thing is that it is very fast: it has really nice real-time rendering performance especially, so it gives a big performance boost to people who need to iterate quickly. But then it's also quite portable. This is something we showed, I think, in our 3DV tutorial: it is somewhat straightforward to have not a software rasterizer, but whatever hardware you have, for instance in your mobile device, render a 3D Gaussian Splatting scene. It's comparably straightforward. So it lends itself to all kinds of improvements and extensions, it's high performance, it's good quality, and it is also portable. Ticking those four boxes makes it so that everybody can find something about it, and it basically keeps on living, because in each case one of these directions is something that people are exploiting.

Janusch

And did you expect it to be such a success? Was it something you had foreseen in any way, before the publication, or maybe even a couple of months after the publication?

Bernhard

No, no, no. I mean, I knew that it was going to be the most impactful thing that I had done up until this point, but I had no idea that it would get the kind of attention that it's getting right now.

And I guess, as a result of that, has your life changed in any way because of its success?

I would think so. I mean, I have been looking for next steps in my career, for instance faculty positions, and after 3D Gaussian Splatting, things like talking to people, making connections, learning about opportunities, and making an impression have become easier.

Michael

That's awesome. Yeah, I'm really glad to hear that. And just out of curiosity, I guess we didn't really even discuss, at a high level, how Gaussian Splatting works. How would you describe it to someone?

Bernhard

So basically, I would say you have something that you want to preserve for a longer time, and maybe to interact with: let's say making a 3D memory of a certain moment in time, or making a 3D imprint, a digital twin, of something that you have right in front of you. And all you are equipped with, or armed with, is a standard-grade camera. 3D Gaussian Splatting is then a way to find out, from the images, a good representation of the actual 3D object itself: a nice way to model what you have in your images, filling in the gaps that you don't see, because you can only take so many images, which means you will always be missing some part of the object you are filming or photographing. 3D Gaussian Splatting tries to fill in those gaps as best as it can. And in addition to being able to construct this 3D representation, which you can now look at from any view or any position you want, it also allows you to do this viewing really, really fast. That's the second thing that sets it apart from what was there before 3D Gaussian Splatting arrived: when you say, I want to look at this 3D memory from a particular point of view, it doesn't take a minute until you get that new view you haven't seen before; it's like sixty frames per second, the update rate of any casual real-time game you might be playing.

Janusch

Maybe let's talk for a moment about the spherical harmonics and their role within this process, because they have a nice effect, but they are also quite heavy. How would you describe it? Is there something you foresee that might solve this problem? Is it something we basically have to live with for a while? Or is there something coming up in the future that might at least remedy the memory requirements, but maybe also help with better quality?

Bernhard

Right. So there's not much keeping anybody from, let's say, replacing the spherical harmonics with other solutions that are out there. For instance, spatially encoded color: using a hash grid to store only the color information in a Gaussian-independent, continuous 3D encoding; that is something you could absolutely do, and the color is just taken from that. Or straight up replacing the evaluation of spherical harmonics with a small neural network; that can also be done. Or extending the spherical harmonics with additional primitives, for instance spherical harmonics plus spherical Gaussians, which some previous work has actually shown to work quite well. I forget which paper that was, so shame on me, but there was at least one paper recently that tried that mix, and it worked really well.

So the spherical harmonics are definitely nice. I'm not sure right now about the higher bands; it's a little bit iffy what they are exactly doing, because 3D Gaussian Splatting tends to create some of these view-dependent effects through means other than the spherical harmonics. So they definitely have an impact, and those higher bands are definitely helping with the quality. But how much, how principled it is, and what the actual payoff is, the return on investment per spherical harmonic spent, I'm not sure I fully understand yet, or what the best solution out there is. But it's very easy, and people have done it, to go ahead and replace it or extend it. And in terms of compression, there are tons of works already out there which look explicitly at the spherical harmonics bands, like for instance the "Reducing the Memory Footprint" paper, which very explicitly selects only the bands that actually matter. So, yeah.
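
For context on what the memory discussion is about: each Gaussian's view-dependent color is a low-order spherical harmonics polynomial in the viewing direction, and every extra band adds coefficients per Gaussian. A degree-1 evaluation sketch in NumPy, using the standard band constants that the public 3DGS utilities also use:

```python
import numpy as np

SH_C0 = 0.28209479177387814   # band-0 (constant color) coefficient
SH_C1 = 0.4886025119029199    # band-1 (linear, view-dependent) coefficient

def sh_to_color(sh, view_dir):
    """Evaluate degree-1 spherical harmonics for one Gaussian. `sh` is a
    (4, 3) array of RGB coefficients, `view_dir` a unit camera-to-Gaussian
    vector. Degree 3 (as in vanilla 3DGS) needs 16 such coefficient
    triples per Gaussian, which is most of its memory footprint."""
    x, y, z = view_dir
    rgb = (SH_C0 * sh[0]
           - SH_C1 * y * sh[1]
           + SH_C1 * z * sh[2]
           - SH_C1 * x * sh[3])
    return np.clip(rgb + 0.5, 0.0, 1.0)  # same 0.5 offset the reference code adds

color = sh_to_color(np.random.rand(4, 3), np.array([0.0, 0.0, 1.0]))
```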

Janusch

And you talk about Gaussian Splatting being a way to step back into a moment in time, or to document a moment in time. Have you been following along with other people who have been posting captures, whether of people or of production sites? What has that been like for you?

Bernhard

So I've seen, and that's actually super interesting, because we were looking at these standard data sets that everybody was using, which are, like, outdoor garden scenes, monuments, rooms. And we never looked at people, and we never looked at animals. And then somebody came up and said: hey, what the heck, if I do 3D Gaussian Splatting on my plush teddy bear, it actually works amazingly well. Or if I capture a person who actually has hair: look, here's what NeRF can do, and here's what 3D Gaussian Splatting can do. We were completely surprised by this; we had never considered it. But in the end, it makes perfect sense that it can do these fine, high-frequency details really, really well. So yeah, we followed along, and we were sort of surprised by all the things that people found when they were able to try it out from the code base that we hosted.

Janusch

Do you also do your own captures? Sorry, a follow-up: do you make your own captures in your spare time?

Bernhard

No, sorry, no. I'm not actually a big 3D Gaussian Splatting user myself. You know, I'm in academia. I need to always be thinking about what could be the next nice thing for the students that I supervise, and how they could have their own projects succeed in one way or another. So no, I don't think I've gone out and taken a single capture. No, that's not true: I've taken some data sets for the hierarchical 3D Gaussian Splatting work, but again, purely project-related.

Michael

And I do want to ask, because it seems like a lot of people in the world associate the words "radiance fields" with neural radiance fields, as being just a NeRF. In your opinion, is Gaussian Splatting also a radiance field representation, or what does that term mean to you?

Bernhard

So if you had asked me half a year ago, I would have said yes. If you ask me now, I would say: it depends. It definitely builds on a lot of the theory that motivates what radiance fields are. The intention was definitely to have a radiance field representation, and it is also described as such in the original submission. Whether it's still fully, 100% technically correct to call it a radiance field, that's a very, very subtle discussion we can have. But I think for all intents and purposes, we can say yes. For 99.5% of the people out there, if we call it a radiance field, I think that is the closest to the truth.

Janusch

So, from your perspective, what are the current biggest pain points in this approach, and what do you think should be solved to make it more successful, maybe also something that might let it be used in the gaming industry?

Bernhard

Right. So I guess so far, at least for vanilla 3D Gaussian Splatting, and this is already slightly outdated by sort-of-experimental approaches I've seen in the last couple of days, you right now have to go this way of reconstructing from images. Directly taking your mesh, for instance, and converting it to 3D Gaussian splats, there was no pipeline for that. But there was some work, and again I forgot the name, unfortunately, coming out in the last few days, where you can actually directly take meshes and convert them to a representation that builds on, extends, and modifies ideas on top of 3D Gaussian Splatting. And yes, that makes it much more usable; you now have a fairly unified pipeline for 3D game engines. Is there anything else that's preventing us from using them in 3D games? I don't think so, no. I think that loop is pretty much closed.

What else is nasty about 3D Gaussian Splatting is maybe still this somewhat unprincipled answer to the densification question: where do you put Gaussians? But there, too, we see some great work, like for instance that by Andrea Tagliasacchi and colleagues. I'm not sure who was the first author on that; I just know that Andrea worked on it. It was the "3D Gaussian Splatting as Markov Chain Monte Carlo" idea, which takes a very nice approach to the densification. The first author I see here is Shakiba Kheradmand, from the University of British Columbia. Really, really nice work, this whole MCMC work, really impressive also in terms of the quality that they can get. Really nice results.

Janusch

So what do you think about large-scale Gaussians? Because you were among the first to take the approach to a large scale. Do you think something like this might get more traction in the future? Are ideas like replacing something like Google Maps with Gaussian representations at a really large scale feasible? Because at a certain point you will get so many points, right? And it's going to be hard to handle them.

Bernhard

So, I mean, strictly speaking we were not the first. Even though we started working on it immediately and worked towards this hierarchical Gaussian representation, there were others before us, or slightly before us, like for instance Octree-GS, and, I'm not even sure, VastGaussian; I think VastGaussian was again aerial images. But people were looking at how we can do really, really large data sets, and basically introduce something that is called level of detail. Level of detail is a very well established concept in real-time graphics, because people have long been worrying about: okay, I have my digital world, which is thousands of kilometers long; how can I ever render that in real time? And once you have a level-of-detail solution, this problem of scalability is pretty much solved. You just need a place to store the whole data, which can be your hard disk or maybe even a remote server, and then you just need the mechanism to figure out: okay, what level of detail do I need right now, and from where? How can I index into my larger data structure to retrieve exactly that detail at this point? You get different levels of detail, which is usually a consistent number of overall primitives. Say, for instance, I need two million primitives at each point in time to get a nice image; I find those two million and make sure I have them available, and whenever the player moves, some of them are evicted from on-device memory and some others are streamed in. And as soon as you have that, basically nothing stops you from rendering almost infinitely large scenes.

Sure, but you still have some problem with training such large scenes, for instance. Take, say, Switzerland, the whole country: it's going to be really hard to train as one scene, right? Because you can bring up the compute, but still, connecting all the parts is going to be a challenge.

Yeah. So that's why, within the hierarchical 3D Gaussian Splatting, we went for a divide-and-conquer approach, where, at some point, you make a cut between those connecting contiguous regions and you treat them almost independently. And at the end, when you put all of them together, you just make sure that everybody mostly agrees with what their neighbor has, right? That there are no weird discrepancies just at the corner; you basically get rid of that. And that way you get something that is divide-and-conquer, independently processable, and more easily handleable. This is again a page out of the Block-NeRF solution, because there one block is one of these independent regions, and they are able to train those independently: you get one NeRF per block. So this is again something that's just very sensible, and divide and conquer is a powerful concept.
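
The level-of-detail cut described here, rendering distant regions as coarse merged Gaussians and refining near ones so the primitive count stays roughly constant, can be caricatured with a toy hierarchy. The `Node` layout and granularity threshold are illustrative, not the paper's data format:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    center: tuple     # world-space position of the merged stand-in Gaussian
    size: float       # world-space extent it covers
    children: list = field(default_factory=list)

def select_lod(node, cam_pos, granularity=0.01, out=None):
    """Cut the hierarchy where projected size drops below a granularity
    target: distant subtrees render as one coarse Gaussian, near ones are
    refined. Streaming then only needs the nodes in the selected cut."""
    if out is None:
        out = []
    dist = max(1e-6, sum((c - p) ** 2
                         for c, p in zip(node.center, cam_pos)) ** 0.5)
    if not node.children or node.size / dist < granularity:
        out.append(node)              # coarse enough on screen: keep merged
    else:
        for child in node.children:   # too coarse on screen: descend
            select_lod(child, cam_pos, granularity, out)
    return out

root = Node((0, 0, 0), 100.0,
            [Node((-25, 0, 0), 50.0), Node((25, 0, 0), 50.0)])
visible = select_lod(root, cam_pos=(30.0, 0.0, 5.0))
```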

Janusch

And with that in mind, what do you think are some of the more exciting directions for this technology?

Bernhard

Right. Something that I've been following somewhat closely are the feed-forward Gaussian models that are now coming up, which basically say: well, let's do away with all of this optimization nonsense, right? Let's just take our images and immediately predict all the Gaussians that we need, in 0.3 or 0.03 seconds, directly via a transformer architecture. And there are different ways of doing that. For instance, you have pixelSplat, which has been making quite a name for itself, where you use epipolar attention, a nice 3D- and projection-oriented prior, to actually make this prediction for the Gaussians. There are other works which say all you need is basically attention, exhaustive attention: you put attention on it and you actually get object-level Gaussian reconstructions which are super nice, with no need for any epipolar or 3D priors. And then there are also people who are combining this with lessons from the Mamba architecture. I think this is a paper called Long-LRM, which says: you have up to thirty-two images, you feed them into the transformer all at once, and because it's a Mamba-like system, an architecture based on Mamba-like components, you don't have quadratic complexity but something much cheaper, and you can again take all of these images and predict Gaussians from them directly. So what's exciting is being able to take all these images, put them into one transformer, and get the prediction out immediately. But also, for instance, taking sparser views, single-image or few-shot data sets, and then using generative AI to fill in the gaps so that it really remains consistent when you switch to a new test view. But there I am less firm on the current state of research.

Janusch

Let me follow up with one question on the densification. Do you think something like this, directly generating Gaussians, will eventually replace the optimization process? Or is it something that will stay alongside the traditional optimization-based approach? Because I think currently the quality is not yet there. So as long as it makes sense, I think, keep working with optimization.

Bernhard

So something that we learned from 3D Gaussian Splatting is: if you stay true to the things you can mentally follow and comprehend, you get the benefit at the end of it, right? And people can appreciate it. This is what we saw. We basically had 3D Gaussian Splatting, the explicit representation, versus NeRF, the more implicit representation, which has more of a black-box behavior. Using something that is very easy to mentally comprehend and follow, and to derive from first principles, had its benefits in being usable. And I think a similar thing applies to the optimization, at least for a little while longer, where you can say: this is actually the result of optimization. It's clear: those were the inputs, and this is what was derived only from these images. There's nothing that was, I forget what the word is, when the neural networks are basically making things up. What's the word?

Hallucination.

Yes, exactly. Sorry. There's nothing that's being hallucinated; everything basically comes from data. And I think there's definite benefit in that, especially in real-world or professional use cases, for, let's say, surveying, reconstruction, preservation of actual historical monuments. You don't want to hallucinate details there, right? You want to stay true to the actual data; you want to derive only what the data gives you. So I think there will always be benefits in these principled approaches. But I would also be very excited to see a pipeline that gives you high-quality, credible results immediately; of course, for me, that would be the most exciting to see.

And for the optimization-based approach, do you think we have squeezed all the performance out of current hardware, or is there still a gap where we can maybe get another 2x improvement? Because I think yesterday or two days ago there was another paper which also tremendously sped up the entire optimization process.

Janusch

So what do you think? Where's the limit?

Bernhard

So I think the 3D Gaussian Splatting method works well, but there are so many things in the design that still warrant further inspection and improvement. If you ever pull up the remote viewer, the one that actually lets you look into the training process, you can see how random and not well behaved the early stages of optimization are. So I think that being smart about this, and exploiting more knowledge of scenes, not for hallucinating details, but for guiding what needs to happen when and at which speeds, there's still significant potential for making things converge to optimal quality significantly faster.

Also, another thing I noticed over the last weeks or months is that many people are trying really hard to get rid of the splatting itself, right? To have a more exact projection, especially within the rasterization phase, or maybe also applying a ray tracing approach. Do you think that will dominate the current rasterizer-based implementations, which just do an approximation?

So I think the ray tracing, sorry, the work that was done at NVIDIA for ray tracing Gaussians, which I also saw at SIGGRAPH Asia, that was really, really nice. I think that was Nicholas Sharp as the first author, if I remember correctly. And this makes a ton of sense. Ray tracing is such a general and well understood toolbox that if you have a framework that ray traces Gaussians at good performance, then immediately you can extend ray tracing renderers to make cool artistic effects and different types of light transport so much more easily than you can in a rasterizer, where it's always a little bit nitty-gritty, with implementation details, writing different object shaders, and coding things into triangles that shouldn't be triangles. So having the support to do it with ray tracing is a really cool thing. I think much of what this ray tracing Gaussians work achieves can also still be done with a rasterizer, but with much, much nastier amounts of work to achieve the same thing. So this is really great to have. And something that we need to point out is that this ray tracing solution implicitly does a similar thing: it also gets rid of the popping, just implicitly, and I think it does it a hundred percent correctly. So both of them, StopThePop and ray tracing Gaussians, will probably give you the same visual impression. In the ray tracing solution, that's just a side product: popping-free visualization just falls out of all the engineering work that they put in to make this run in a 3D ray tracing setting. Yeah.

Michael

And speaking of NVIDIA as well, they actually just came out with another, I guess tangentially, follow-up work called 3D Gaussian Unscented Transform, which brings some of those ray-tracing-based effects into rasterization. I was definitely not expecting that to come this fast, so I was pretty excited. Right, right. Is there any work currently that surprises you the most, or that you didn't foresee coming?

Bernhard

Yeah, I think these feed-forward Gaussians that I mentioned before; those were the most surprising to me. I totally didn't expect that it would be possible to have, in sub-second time, something that makes sense. But that's because my background is not in generative AI, right? So it was surprising to me that this was done so quickly and so well.

Janusch

And speaking of the world of generative AI, have you seen any of the recent companies, like World Labs or Odyssey, that have started to use Gaussians to represent generative worlds?

Bernhard

Only at the fringes. I've seen them pop up every now and then, but no deep dive.

And so, I guess, extending that: a lot of companies and startups have come out since 3D Gaussian Splatting. Has it been surprising to you, have any of the use cases that came across your desk made you think, I wouldn't have thought of that?

So again, I might have been a little bit neglectful in that I don't follow the startup scene that much. I mean, I'm happy whenever I see something that was successful, made sense, and helped people launch themselves into the industrial environment. But I'm not keeping track of it, to be frank, because it's just not productive for trying to advance my academic career right now.

Janusch

Okay. Yeah. Do you foresee the success extending into the next couple of years? Will we see even more publications? Because currently it's really hard to track them all.

Yeah, I have no clue. I was convinced that 3D Gaussian Splatting would be replaced; I'm pessimistic by nature. So whenever I saw a paper that said, and many of them write this, "we are two times faster than 3D Gaussian Splatting," I assumed: oh, it's over, that's it, right? There's a new method and nobody will talk about 3D Gaussian Splatting anymore. But I mean, you have to read those papers very carefully. There's sometimes a lot of exaggeration going on. Papers should really be read very carefully and with scrutiny, figuring out what they are actually comparing, on which data sets, and whether that is actually a fair comparison. So I hope that I am doing research with, hopefully, a lot of principle, trying to make sure the numbers we get are correct within the acceptable margins of error that can always occur, and that the comparisons make sense, that they are fair. And yeah, I think that kind of scrutiny is really something that should be encouraged in computer science and in research papers.

Michael

And if there's one thing that you want the audience to take away, whether they're researchers or they just saw Gaussian Splatting and are amazed by it, what would you want them to take away?

Bernhard

About the approach itself, or about the future of what's to come?

Michael

Uh, yeah, I'd say about the future.

Bernhard

So, if you haven't tried it yet, there are tons of tools out there already that give you the ability to visualize these scenes. If anybody else is bothered by the popping artifacts, put some demands on somebody to make a widely available, maybe super portable, maybe web-based StopThePop solution, or just get the StopThePop solution that is out there already. And if you can think about what you could do with this representation, if it speaks to you and you understand, here are some input images and I can make my 3D experience from them, then yeah, please, by all means, go ahead, and I hope you can find some great new applications for it. But always be vigilant, and if you have the time to look at research, make sure you look at the research, you pick the method that is really sophisticated and works, and you give credit to the people who actually did it. Because if there's work out there that is more efficient and better and higher quality, it would be a shame if it gets drowned out by 3D Gaussian Splatting just because there's a hype, right? So, very much: if you have the time, stay vigilant, stay attentive to what's coming out, because it's really easy to get swept along with the hype.

Janusch

Yeah, thank you very much for your work, and I think we should wrap up, because we are a little bit over time. It was a joy to talk to you, and also great that you shared so much from your experience and the journey you had before and since the publication.

Bernhard

I had a great time here on the podcast. Thank you very much. Thank you guys for having me.

Michael

We really appreciate your time today. And yeah, we'll be back with more episodes in the future. So if you enjoyed today's episode, please consider giving us a follow or subscribing; it would help us tremendously. Thank you so much, Bernhard, we really appreciate having you on the show, and we will be back soon.