Rendering in 3D Gaussian Splatting
This blogpost is a web version of my notes for my “WebGPU 3D Gaussian Splatting” renderer. I’ve focused only on the forward path, especially what to do after we already have the covariance matrix. There are 2 main approaches here:
- Use the larger eigenvalue to find a square around the Gaussian’s mean. Most often used with a tiled renderer.
- Use eigenvectors to find the axes of the ellipse. Place vertices on these axes.
I’ve described the math for both approaches.
These are my notes, the notation is going to be sketchy. Don’t worry, my tiny handwriting would have been much worse to read! I don’t work in this sector. It’s been a while since I had to use LaTeX on this blog. If you get something out of this page, then it’s a value added for you. If not, well, it’s not my problem.
Our main sources will be:
- “3D Gaussian Splatting for Real-Time Radiance Field Rendering” by Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis.
- “EWA Volume Splatting” by Matthias Zwicker, Hanspeter Pfister, Jeroen van Baar, and Markus Gross.
Goal
Each Gaussian’s transform is described using position, rotation, and scale. When projected onto a screen, it will have the shape of an ellipse. You might remember from principal component analysis that the eigenvectors of the covariance matrix form the axes of this ellipse. Our goal is to get the image space covariance matrix.
Covariance 3D
Each Gaussian has a scale (vector $s = (s_x, s_y, s_z)$) and rotation (quaternion $q$ with real part $q_r$ and imaginary parts $q_i, q_j, q_k$). We need transformation matrices for both.
Scale:

$$S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{bmatrix}$$

Rotation (assuming unit quaternion):

$$R = \begin{bmatrix}
1 - 2(q_j^2 + q_k^2) & 2(q_i q_j - q_r q_k) & 2(q_i q_k + q_r q_j) \\
2(q_i q_j + q_r q_k) & 1 - 2(q_i^2 + q_k^2) & 2(q_j q_k - q_r q_i) \\
2(q_i q_k - q_r q_j) & 2(q_j q_k + q_r q_i) & 1 - 2(q_i^2 + q_j^2)
\end{bmatrix}$$
From the definition of the covariance matrix you might notice that $X X^T$ is proportional to the sample covariance matrix of the dataset $X$. If the mean is 0, the sum becomes a dot product, just like in matrix multiplication. How does covariance change if we have 2 transformation matrices? Let $M = RS$ be the combined transformation. Then the covariance of this 3D Gaussian is $\Sigma = M M^T$. From the properties of the transposition, we know that $(RS)^T = S^T R^T$. Then $\Sigma = R S S^T R^T$. The scale matrix is diagonal, so $S^T = S$. The transposition of the rotation matrix is equal to its inverse ($R^T = R^{-1}$). This might come in handy during debugging. Be prepared for left/right-handedness issues.
Covariance is the parameter of the Gaussian. It’s better if you don’t think of “calculating the covariance matrix given scale and rotation”. Instead, the Gaussian is parameterized by the covariance; it just so happens that $R S S^T R^T$ produces a valid covariance matrix. Not sure if this makes sense, but I found it clearer. Additionally, backpropagation is easier with rotation and scale than with the raw matrix. Covariance matrices have to be symmetric and positive semi-definite.
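The derivation above can be sketched as a CPU-side TypeScript snippet (the function and type names here are my own, not from any particular implementation):

```typescript
// Build the 3D covariance Σ = R·S·Sᵀ·Rᵀ = M·Mᵀ from a unit quaternion and a scale vector.
// Matrices are row-major number[][].

type Mat3 = number[][];

function rotationFromQuat(qr: number, qi: number, qj: number, qk: number): Mat3 {
  return [
    [1 - 2 * (qj * qj + qk * qk), 2 * (qi * qj - qr * qk), 2 * (qi * qk + qr * qj)],
    [2 * (qi * qj + qr * qk), 1 - 2 * (qi * qi + qk * qk), 2 * (qj * qk - qr * qi)],
    [2 * (qi * qk - qr * qj), 2 * (qj * qk + qr * qi), 1 - 2 * (qi * qi + qj * qj)],
  ];
}

function matMul(a: Mat3, b: Mat3): Mat3 {
  return a.map((row, i) =>
    b[0].map((_, j) => row.reduce((sum, _v, k) => sum + a[i][k] * b[k][j], 0))
  );
}

function transpose(a: Mat3): Mat3 {
  return a[0].map((_, j) => a.map((row) => row[j]));
}

function covariance3d(
  scale: [number, number, number],
  q: [number, number, number, number] // (qr, qi, qj, qk), assumed normalized
): Mat3 {
  const S: Mat3 = [
    [scale[0], 0, 0],
    [0, scale[1], 0],
    [0, 0, scale[2]],
  ];
  const R = rotationFromQuat(q[0], q[1], q[2], q[3]);
  const M = matMul(R, S); // M = R·S
  return matMul(M, transpose(M)); // Σ = M·Mᵀ, symmetric positive semi-definite
}
```

With the identity quaternion the result is just $\text{diag}(s_x^2, s_y^2, s_z^2)$, which is a quick sanity check for the handedness issues mentioned above.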
Covariance 2D
In 3D graphics, to project a vertex into clip space we multiply its position by the view and projection matrices. Let $W$ denote our view matrix. For Gaussians we will use the following Jacobian instead of a projection matrix:

$$J = \begin{bmatrix}
\frac{f_x}{t_z} & 0 & -\frac{f_x t_x}{t_z^2} \\
0 & \frac{f_y}{t_z} & -\frac{f_y t_y}{t_z^2} \\
0 & 0 & 0
\end{bmatrix}$$
- $p$ - position of the Gaussian in world space.
- $t = W p$ - position of the Gaussian in view space.
Check the explanation for the Jacobian in “EWA Volume Splatting”, section 4.4 “The Projective Transformation”. From what I understand, the mapping to perspective space can introduce perspective distortion: depending on where on the screen the Gaussian is rendered, it would have a different size. The paper mentions that this mapping is not affine, i.e. it does not preserve parallelism, so it uses ray space instead. Find more details in:
- “EWA Volume Splatting”,
- “[Concept summary] 3D Gaussian and 2D projection” (Google translated from Korean),
- “Fundamentals of Texture Mapping and Image Warping” by Paul S. Heckbert (section 3.5.5).
The values for the focal distance depend on the camera settings. Based on the perspective matrix values and the geometric definition of the tangent (e.g. $f_y = \frac{\text{height}}{2 \tan(\text{fovY}/2)}$):
```
const focalX = projectionMat[0] * viewport.width * 0.5;
const focalY = projectionMat[5] * viewport.height * 0.5;
```
The perspective matrix calculation only takes half of the provided field of view, hence the multiplication by 0.5. Depending on your coordinate system, you might have to transpose this matrix.
With this, we can calculate $\Sigma'$:

$$\Sigma' = J W \Sigma W^T J^T$$
This uses the same trick as we have used for the $\Sigma = M M^T$ calculations.
We are only interested in the 2D projection of the Gaussian (we discard depth). Get the 2D covariance matrix from $\Sigma'$ by keeping only the upper-left $2 \times 2$ submatrix (i.e. removing the third, and fourth, rows and columns):

$$\text{cov2d} = \begin{bmatrix} \Sigma'_{11} & \Sigma'_{12} \\ \Sigma'_{21} & \Sigma'_{22} \end{bmatrix}$$
If the determinant of the cov2d is ≤ 0, then the shape would be a parabola or a hyperbola instead of an ellipse.
At this point, the reference implementation adds 0.3 to the diagonal of the cov2d. It’s done to ensure that the matrix is invertible. You can find the detailed explanation in AI葵’s “Gaussian Splatting Notes (WIP)”.
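The whole cov3d → cov2d step can be sketched in CPU-side TypeScript. This is a sketch under my own naming (`computeCov2d`, `matMul3`, `transpose3`); it assumes `viewRot` is the $3 \times 3$ rotation part of the view matrix $W$ and returns the three unique entries of the symmetric 2D matrix:

```typescript
// Project a 3D covariance to 2D: Σ' = J·W·Σ·Wᵀ·Jᵀ, then keep the upper-left 2×2.

type Mat3 = number[][];

function matMul3(a: Mat3, b: Mat3): Mat3 {
  return a.map((row, i) =>
    b[0].map((_, j) => row.reduce((sum, _v, k) => sum + a[i][k] * b[k][j], 0))
  );
}

function transpose3(a: Mat3): Mat3 {
  return a[0].map((_, j) => a.map((row) => row[j]));
}

function computeCov2d(
  cov3d: Mat3,                  // world-space 3×3 covariance
  viewRot: Mat3,                // 3×3 rotation part of the view matrix W
  t: [number, number, number],  // Gaussian position in view space
  fx: number, fy: number        // focal lengths in pixels
): [number, number, number] {   // [a, b, c] of [[a, b], [b, c]]
  const tz2 = t[2] * t[2];
  const J: Mat3 = [
    [fx / t[2], 0, -(fx * t[0]) / tz2],
    [0, fy / t[2], -(fy * t[1]) / tz2],
    [0, 0, 0],
  ];
  const T = matMul3(J, viewRot);
  const cov = matMul3(matMul3(T, cov3d), transpose3(T));
  // upper-left 2×2, with the reference implementation's +0.3 on the diagonal
  return [cov[0][0] + 0.3, cov[0][1], cov[1][1] + 0.3];
}
```

For a Gaussian straight ahead of the camera ($t_x = t_y = 0$) the Jacobian degenerates to a plain $\frac{f}{t_z}$ scaling, which makes the result easy to verify by hand.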
Calculating eigenvalues
I recommend reading the “Eigenvalues and eigenvectors” article on Wikipedia. The “Matrix examples” section has nice examples.
Let’s calculate the eigenvalues. To make the notation simpler, I’ve assigned a letter to each element of the symmetric 2D matrix cov2d:

$$\text{cov2d} = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$

The eigenvalues $\lambda$ satisfy $\det(\text{cov2d} - \lambda I) = 0$ when (calculating the roots of the quadratic equation):

$$\lambda^2 - (a + c)\lambda + (ac - b^2) = 0$$

$$\lambda_{1,2} = \frac{a + c}{2} \pm \sqrt{\left(\frac{a + c}{2}\right)^2 - (ac - b^2)}$$

Remember that $ac - b^2 = \det(\text{cov2d})$. Might make the implementation easier.
The $\lambda_1$ and $\lambda_2$ (the eigenvalues) are the variances of the projected ellipse along its major and minor axes (their square roots give the radii). Since the value of the square root is always ≥ 0, we get $\lambda_1 \geq \lambda_2$. You would expect this from the ellipse projection.
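The closed-form solution above fits in a few lines of TypeScript (the function name is my own):

```typescript
// Eigenvalues of the symmetric 2×2 matrix cov2d = [[a, b], [b, c]],
// via λ = (a+c)/2 ± sqrt(((a+c)/2)² - det). Returns [λ1, λ2] with λ1 ≥ λ2.
function eigenvalues2d(a: number, b: number, c: number): [number, number] {
  const mid = 0.5 * (a + c);
  const det = a * c - b * b; // det(cov2d), also reusable for the conic later
  // clamp to 0 to guard against tiny negative values from floating-point error
  const disc = Math.sqrt(Math.max(0, mid * mid - det));
  return [mid + disc, mid - disc];
}
```

The `Math.max(0, …)` clamp is a floating-point guard, not part of the math; a genuinely negative discriminant cannot happen for a real symmetric matrix.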
Right now we have 2 options: we can either use the eigenvectors to derive vertex positions, or project each Gaussian to a screen-space square. The former is not what the “3D Gaussian Splatting” paper did. They’ve collected all the Gaussians for each tile of their tiled renderer using a conservative approximation. We will derive both solutions, starting with the one used in the paper.
Method 1: Project Gaussian to a square
Calculate tiles
The Gaussian distribution has a nonzero value even at infinity. This means that every Gaussian affects every pixel, most of them with negligible results. We want to know which closest tiles it affects the most. Look at the 1D Gaussian distribution, and find a radius $r$ so that the area under the curve between $-r$ and $r$ contains most of the total. For the normal distribution with $\sigma = 1$:

$$\int_{-r}^{r} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx$$
For example, if you picked $r = 3$, then the integral has a value of ~0.9973. Such an approximation has an error of ~0.27%. As you might suspect, the variance will affect the calculations. We end up with $r = 3\sigma$. Sigma is the standard deviation of the Gaussian. For us, it’s $\sqrt{\lambda_1}$ (where $\lambda_1$ is the bigger eigenvalue).
```
let radius = ceil(3.0 * sqrt(max(lambda1, lambda2)));
```
The `radius` is in pixels. We also have the Gaussian’s position projected into image space. Every tile inside the `radius` around the projected position is affected by the Gaussian. To map it to tiles, we calculate the affected square: `projectedPosition.xy ± radius`.
```
let projPos = mvpMatrix * gaussianSplatPositions[idx];
// (we can also cull the Gaussian here if we want)
projPos /= projPos.w; // perspective divide
let projPosPx = ndc2px(projPos.xy, viewportSize);

// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/59f5f77e3ddbac3ed9db93ec2cfe99ed6c5d121d/cuda_rasterizer/auxiliary.h#L46
let BLOCK_X = 16, BLOCK_Y = 16; // tile sizes
let blockSize = vec2<i32>(BLOCK_X, BLOCK_Y);
// block count for x,y axes
let grid = (viewportSize.xy + blockSize.xy - 1) / blockSize.xy;
let rectMin = vec2<i32>(
  min(grid.x, max(0, i32((projPosPx.x - radius) / BLOCK_X))),
  min(grid.y, max(0, i32((projPosPx.y - radius) / BLOCK_Y)))
);
let rectMax = vec2<i32>(
  min(grid.x, max(0, i32((projPosPx.x + radius + BLOCK_X - 1) / BLOCK_X))),
  min(grid.y, max(0, i32((projPosPx.y + radius + BLOCK_Y - 1) / BLOCK_Y)))
);
```
With this, we know which tiles are affected by each Gaussian.
Depth sorting
For each tile, we will need its Gaussians sorted by the distance to the camera (closest first). For sorting on the GPU, radix sort is popular, but bitonic sort is much easier to implement. On the CPU, you can use your language’s built-in sort for terrible performance, or counting sort for something 6+ times faster (according to my app’s profiler).
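As a sketch of the counting-sort approach on the CPU: quantize each Gaussian’s view-space depth into a fixed number of buckets, count, prefix-sum, then scatter the indices. The function name, bucket count, and near/far parameters are my own choices, not from the paper’s implementation:

```typescript
// Counting sort of Gaussian indices by quantized view-space depth, closest first.
function sortByDepth(
  depths: Float32Array, // per-Gaussian view-space depth
  near: number,
  far: number,
  buckets = 65536       // quantization resolution; a trade-off, not a magic value
): Uint32Array {
  const counts = new Uint32Array(buckets);
  const keys = new Uint32Array(depths.length);
  const scale = (buckets - 1) / (far - near);
  for (let i = 0; i < depths.length; i++) {
    const k = Math.min(buckets - 1, Math.max(0, Math.floor((depths[i] - near) * scale)));
    keys[i] = k;
    counts[k]++;
  }
  // exclusive prefix sum: counts[k] becomes the starting offset of bucket k
  let sum = 0;
  for (let k = 0; k < buckets; k++) {
    const c = counts[k];
    counts[k] = sum;
    sum += c;
  }
  const order = new Uint32Array(depths.length);
  for (let i = 0; i < depths.length; i++) {
    order[counts[keys[i]]++] = i;
  }
  return order; // Gaussian indices, sorted front to back
}
```

Counting sort is O(n + buckets) and only approximately ordered within a bucket, which is fine here: nearby depths quantized to the same key were nearly tied anyway.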
Gaussian’s value at pixel
You might know the general form of the Gaussian distribution:

$$G(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
“EWA Volume Splatting” provides a function (eq. 9) for the density of an elliptical Gaussian centered at a point $\mu$ with a variance matrix $V$:

$$G_V(x - \mu) = \frac{1}{2\pi\sqrt{|V|}} e^{-\frac{1}{2}(x - \mu)^T V^{-1} (x - \mu)}$$
The above formula depends on the dimensionality: only the normalization factor $\frac{1}{(2\pi)^{d/2} |V|^{1/2}}$ changes. Here, $d$ denotes dimensionality (2 in our case).
In our case, $V$ = the cov2d matrix. We need its determinant ($|V| = ac - b^2$) and inverse $V^{-1}$:

$$V^{-1} = \frac{1}{ac - b^2} \begin{bmatrix} c & -b \\ -b & a \end{bmatrix}$$
In “EWA Volume Splatting”, the inverse of the variance matrix is called the conic matrix. You might have seen this word used as a variable name.
We will now look at the exponent $(x - \mu)^T V^{-1} (x - \mu)$. Let’s define $d = x - \mu = (d_x, d_y)$ and:

$$V^{-1} = \begin{bmatrix} A & B \\ B & C \end{bmatrix}$$
After matrix multiplication and a dot product:

$$d^T V^{-1} d = A d_x^2 + 2 B d_x d_y + C d_y^2$$
We can then calculate the Gaussian’s value as an exponential falloff from the mean point of the Gaussian at the pixel $p$:

$$\alpha = o \cdot e^{-\frac{1}{2}(A d_x^2 + 2 B d_x d_y + C d_y^2)}$$

Multiply the result by the Gaussian’s innate opacity $o$ (learned during the training). The normalization factor $\frac{1}{2\pi\sqrt{|V|}}$ was removed (?). Your trained model file will not display correctly if you add it back in.
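A minimal sketch of this per-pixel evaluation (the function name is my own; the positive-power early-out and the 0.99 clamp mirror what the reference rasterizer does):

```typescript
// Evaluate a Gaussian's alpha at a pixel using the unnormalized falloff
// exp(-0.5 · dᵀ V⁻¹ d), scaled by the Gaussian's learned opacity.
function gaussianAlpha(
  px: number, py: number,          // pixel center
  mx: number, my: number,          // projected Gaussian mean (pixels)
  conic: [number, number, number], // [A, B, C] from V⁻¹ = [[A, B], [B, C]]
  opacity: number                  // innate opacity, learned during training
): number {
  const dx = px - mx;
  const dy = py - my;
  const power = -0.5 * (conic[0] * dx * dx + conic[2] * dy * dy) - conic[1] * dx * dy;
  if (power > 0) return 0; // numerical safety: falloff must never amplify
  return Math.min(0.99, opacity * Math.exp(power));
}
```

At the mean the falloff is exactly 1, so the result there is just the (clamped) opacity.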
Blending
Rendering transparent objects is always done starting from the closest ones. Blend the N Gaussians covering a pixel using the following formula (eq. 3 from the “3D Gaussian Splatting” paper):

$$C = \sum_{i=1}^{N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j)$$
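Eq. 3 maps directly to a front-to-back accumulation loop. A sketch (names are my own; the early-exit threshold mirrors the idea, not the exact constant, of the tiled rasterizer):

```typescript
// Front-to-back alpha blending: C = Σ cᵢ·αᵢ·Tᵢ, where Tᵢ = Π_{j<i} (1 - αⱼ).
// `samples` are (color, alpha) pairs for one pixel, already sorted closest-first.
function blend(
  samples: { color: [number, number, number]; alpha: number }[]
): [number, number, number] {
  const out: [number, number, number] = [0, 0, 0];
  let transmittance = 1; // T starts at 1: nothing occludes the first Gaussian
  for (const { color, alpha } of samples) {
    out[0] += color[0] * alpha * transmittance;
    out[1] += color[1] * alpha * transmittance;
    out[2] += color[2] * alpha * transmittance;
    transmittance *= 1 - alpha;
    if (transmittance < 1e-4) break; // nearly opaque already; later splats are invisible
  }
  return out;
}
```

Note the loop never needs the classic back-to-front "over" operator: accumulating transmittance front to back is what makes the early exit possible.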