3D Gaussian Splatting

About me

  • I'm Mitya / Митя
  • 3.5 years in ML
  • 1.5 years in 3D reconstruction specifically
  • I'm into:
    • 3D Generative models
    • 3D reconstruction tricks
    • Computer Graphics stuff
    • 3D modeling (sometimes)
    • Math!
    • Computer Vision in general

About today

  1. Prerequisites
  2. NeRF for dummies
  3. 3D Gaussian Splatting intro
  4. Rendering speed
  5. Practical notes & demo
  6. Dynamic reconstruction
  7. Comparison with NeRFs

What is: Triangular Mesh

The surface of any solid object can be approximated with triangles

What is: Rendering

Two most used approaches

Rasterization

Projects objects onto screen space

Ray Tracing

Traces a ray through each pixel

What is: Photogrammetry

Input: many, many photos of an object/scene

Output: point cloud of the object/scene

What is: Gaussian/Normal distribution

\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

This function:

With nice properties

There is also a multivariate version

If we linearly project the distribution to a lower dimension, it will stay Gaussian
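This property is easy to check numerically. A minimal sketch (the mean, covariance, and projection matrix below are made up for illustration): project samples from a 2D Gaussian down to 1D and compare the empirical mean/variance with the closed-form $A\mu$ and $A\Sigma A^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[0.8, -0.5]])            # linear projection 2D -> 1D

samples = rng.multivariate_normal(mu, Sigma, size=200_000)
projected = samples @ A.T              # shape (200000, 1)

# The projected distribution is Gaussian with these parameters:
mu_p = (A @ mu)[0]                     # A @ mu
var_p = (A @ Sigma @ A.T)[0, 0]        # A @ Sigma @ A.T

print(projected.mean(), mu_p)          # empirical vs. theoretical mean
print(projected.var(), var_p)          # empirical vs. theoretical variance
```

The same identity (with a perspective-approximating Jacobian instead of a fixed matrix) is what lets Gaussian Splatting project 3D Gaussians to the image plane.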

What is: Gradient Descent

Gradient Descent optimizes the parameters of a function

Some function of 2 variables

f(x) \xrightarrow[x]{} \min

Example for Neural Networks

NN(\theta, \text{img}) = \hat{\text{age}}
\theta \, - \text{weights of the NN}
L(\theta) = \sum\limits_{\text{img}, \text{age}} \left(NN(\theta, \text{img}) - \text{age}\right)^2

Dataset of pairs image:age

—  Loss function

\hat{\text{age}}\, - \text{age estimation}
L(\theta) \xrightarrow[\theta]{} \min

Optimization of the Neural Network

via Gradient Descent

We only need the gradient

\nabla_x f
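As a toy illustration (the function and learning rate are made up): minimize a simple 2-variable function using nothing but its gradient, exactly as the slide says.

```python
# f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, minimum at (3, -1)

def grad(x, y):
    # Analytic gradient of f
    return 2 * (x - 3), 4 * (y + 1)

x, y = 0.0, 0.0
lr = 0.1                      # learning rate (step size)
for _ in range(200):
    gx, gy = grad(x, y)
    x -= lr * gx              # step against the gradient
    y -= lr * gy

print(x, y)                   # close to the minimum (3, -1)
```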

What is: Differentiable Rendering

Our rendering is a function of:

  • camera parameters
  • the object
\text{Render}(\text{cam}, \text{obj})

And what if our object is the output of an NN?

And the loss is a function of the render

Camera parameters

object

L(\theta) = f(\text{Render}(\text{cam}, NN(\theta)))
\frac{\partial L}{\partial \theta} = \frac{\partial f}{\partial \text{Render}}\frac{\partial Render}{\partial \text{obj}}\frac{\partial NN}{\partial \theta}

Chain rule:

\frac{\partial \text{Render}}{\partial \text{obj}}

is usually not available,

but PyTorch3D (and some others) have it
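The chain rule above can be checked on a toy, hypothetical pipeline (these scalar stand-ins for NN, Render, and the loss are made up, not PyTorch3D): multiply the three analytic derivatives and compare against a finite-difference estimate.

```python
def nn(theta):            # "NN": object is a scalar parameterized by theta
    return theta ** 2

def render(obj):          # "Render": a differentiable map object -> pixel value
    return 3.0 * obj + 1.0

def loss(pixel):          # "f": loss on the rendered value
    return (pixel - 10.0) ** 2

theta = 2.0
obj = nn(theta)
pixel = render(obj)

# Analytic chain rule: dL/dtheta = dL/dpixel * dpixel/dobj * dobj/dtheta
dL_dpixel = 2 * (pixel - 10.0)
dpixel_dobj = 3.0
dobj_dtheta = 2 * theta
dL_dtheta = dL_dpixel * dpixel_dobj * dobj_dtheta

# Numerical check with central finite differences
eps = 1e-6
num = (loss(render(nn(theta + eps))) - loss(render(nn(theta - eps)))) / (2 * eps)
print(dL_dtheta, num)     # the two estimates agree
```

Autograd frameworks do exactly this multiplication for you; the hard part for rendering is that the middle factor, the derivative of the rasterizer, usually doesn't exist in standard pipelines.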

What is: Neural Radiance Fields (NeRF)

Empty space: color = (1, 1, 1), density = 0

Inside the object: color = (0.6, 0.4, 0.23), density = 1

Any object can be described

with a function 

f(x) = (\text{color}, \text{density})\\ x \;- \text{3D point coordinates}

In NeRF, this function is a NN

Ray direction is also passed because the color may depend on the point of view

NN(\theta, x, d) = (c, \sigma)
\alpha(x) = 1 - e^{-\sigma(x)\delta}

the probability of a light particle being stopped (absorbed) within this point (segment)

\delta \; - \text{length of the segment}

What is: Neural Radiance Fields (NeRF)


\alpha_i

— probability of the ray stopping at point i

T_i = \prod\limits_{j=1}^{i-1} (1-\alpha_j)

— probability of the ray reaching point i

\hat C(r) = \sum\limits_{i=1}^N T_i \alpha_i \text{c}_i

— expected color of the ray r

That is how we render a ray

This is called Volume Rendering; we can render the whole image like this

And it's differentiable
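The discrete sum above is short to implement. A sketch with made-up densities and colors along one ray (using the standard convention $\alpha_i = 1 - e^{-\sigma_i \delta}$, $T_i = \prod_{j<i}(1-\alpha_j)$):

```python
import numpy as np

delta = 0.1
sigma = np.array([0.0, 0.5, 3.0, 10.0])          # densities at sample points
colors = np.array([[1.0, 1.0, 1.0],
                   [0.9, 0.5, 0.1],
                   [0.6, 0.4, 0.2],
                   [0.2, 0.2, 0.2]])

alpha = 1.0 - np.exp(-sigma * delta)             # per-segment stopping prob
# T_i: probability of reaching point i (product of survival probs before i)
T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
C = (T[:, None] * alpha[:, None] * colors).sum(axis=0)   # expected ray color
print(C)
```

In NeRF this whole computation is a few differentiable tensor ops, so the loss gradient flows back into the densities and colors, and from there into the network weights.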

Training a NeRF is optimizing

NN(\theta, x, d)
L(\theta) = \sum\limits_{r} (C(r) - \hat C(r))^2

Loss function:

What is: 3D Gaussian Splatting


\alpha_i

— probability of the ray stopping at point i

T_i = \prod\limits_{j=1}^{i-1} (1-\alpha_j)

— probability of the ray reaching point i

\hat C(r) = \sum\limits_{i=1}^N T_i \alpha_i \text{c}_i

— expected color of the ray r

\sigma_i = NN_\sigma(\theta, x_i)\\ c_i = NN_c(\theta, x_i, d)

In NeRF we're using the NN to get density and color

What is: 3D Gaussian Splatting


\alpha_i
T_i = \prod\limits_{j=1}^{i-1} (1-\alpha_j)

— probability of the ray reaching point i

\hat C(r) = \sum\limits_{i=1}^N T_i \alpha_i \text{c}_i

— expected color of the ray r

w_{i, j} = w_j\mathcal{N}(x_i | \mu_j, \Sigma_j)\\ \sigma_i = \sum\limits_{j=1}^M w_{i,j}\\ c_i = \sum\limits_{j=1}^M c_jw_{i,j} / \sum\limits_{j=1}^M w_{i,j}

In 3DGS we're using many colored 3D Gaussians

— probability of the ray stopping at point i

What is: 3D Gaussian Splatting


\hat C(r) = \sum\limits_{i=1}^N T_i \alpha_i \text{c}_i

expected color of the ray r

w_{i, j} = w_j\mathcal{N}(x_i | \mu_j, \Sigma_j)\\ \sigma_i = \sum\limits_{j=1}^M w_{i,j}\\ c_i = \sum\limits_{j=1}^M c_jw_{i,j} / \sum\limits_{j=1}^M w_{i,j}

This is actually a numerical estimation done by sampling points along the ray
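A sketch of evaluating the mixture at one sample point on the ray, following the $w_{i,j}$ formulas (all Gaussian parameters below are made up):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    # Multivariate normal density N(x | mu, Sigma) in 3D
    d = x - mu
    norm = np.sqrt(((2 * np.pi) ** 3) * np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / norm

mus = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Sigmas = np.array([np.eye(3) * 0.1, np.eye(3) * 0.2])
weights = np.array([1.0, 0.5])                       # w_j
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

x = np.array([0.5, 0.0, 0.0])                        # one sample x_i on the ray
w = weights * np.array([gaussian_pdf(x, m, S) for m, S in zip(mus, Sigmas)])
sigma_x = w.sum()                                    # density: sum of w_{i,j}
color_x = (colors * w[:, None]).sum(axis=0) / w.sum()  # weighted average color
print(sigma_x, color_x)
```

Doing this at many sample points per ray is exactly the numerical estimation the slide mentions; the next step is replacing it with a closed form.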

What is: 3D Gaussian Splatting


\hat C(r) = \int\limits_{t_{near}}^{t_{far}} T(t) \sigma(x(t)) \text{c}(x(t))dt\\ x(t) = o + td

expected color of the ray r

w_{i, j} = w_j\mathcal{N}(x_i | \mu_j, \Sigma_j)\\ \sigma_i = \sum\limits_{j=1}^M w_{i,j}\\ c_i = \sum\limits_{j=1}^M c_jw_{i,j} / \sum\limits_{j=1}^M w_{i,j}

The true Volume Rendering formula

looks like this

What is: 3D Gaussian Splatting


\alpha_j = \mathcal{N}((u, v) | \mu_j', \Sigma_j')\\ T_j = \prod\limits_{k=1}^{j-1} (1-\alpha_k)\\ \hat C(r) = \sum\limits_{j=1}^{M'} T_j\alpha_j c_j
w_{i, j} = w_j\mathcal{N}(x_i | \mu_j, \Sigma_j)\\ \sigma_i = \sum\limits_{j=1}^M w_{i,j}\\ c_i = \sum\limits_{j=1}^M c_jw_{i,j} / \sum\limits_{j=1}^M w_{i,j}

And thanks to the nice properties of Gaussians, it can be computed in closed form!
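A hedged sketch of that idea for a single pixel: project each 3D covariance to 2D (here with a plain orthographic drop of the depth axis; the real method uses the perspective Jacobian of EWA splatting), then alpha-composite front to back. All Gaussians, opacities, and colors below are made up.

```python
import numpy as np

P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])                  # orthographic: drop depth

def splat_pixel(uv, mus3d, Sigmas3d, opacities, colors):
    # Gaussians are assumed already sorted by depth (near to far).
    C = np.zeros(3)
    T = 1.0                                      # probability ray got this far
    for mu, Sigma, o, c in zip(mus3d, Sigmas3d, opacities, colors):
        mu2d = P @ mu
        S2d = P @ Sigma @ P.T                    # projected 2D covariance
        d = uv - mu2d
        alpha = o * np.exp(-0.5 * d @ np.linalg.inv(S2d) @ d)
        C += T * alpha * c                       # front-to-back compositing
        T *= 1.0 - alpha
    return C

mus = [np.array([0.0, 0.0, 1.0]), np.array([0.2, 0.0, 2.0])]
Sigmas = [np.eye(3) * 0.05, np.eye(3) * 0.1]
opac = [0.8, 0.9]
cols = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]

print(splat_pixel(np.array([0.0, 0.0]), mus, Sigmas, opac, cols))
```

Note there is no ray sampling at all anymore: each Gaussian contributes its projected footprint directly, which is why this is so much faster than NeRF's per-ray integration.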

How to render


  1. Split the image into 16x16 blocks
  2. For each block, allocate only Gaussians with 99% confidence intersection (culling)
  3. Load each block B into a separate GPU group and:
    1. Load allocated Gaussians
    2. Radix Sort in depth order
    3. Each pixel is computed in its own thread:
      Accumulate colors of Gaussians until 0.9999 probability of ray stopping
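The steps above can be sketched in plain Python (not the real CUDA rasterizer; tile size, the culling radius, and the isotropic Gaussian footprint are made up, and radix sort is replaced by `argsort`):

```python
import numpy as np

H = W = 32
TILE = 16

# Toy "Gaussians": 2D center, culling radius, depth, opacity; grey rendering
rng = np.random.default_rng(1)
centers = rng.uniform(0, W, size=(50, 2))
radii = np.full(50, 4.0)                          # approximate 2D extent
depths = rng.uniform(1, 10, size=50)
opac = np.full(50, 0.3)

image = np.zeros((H, W))
for ty in range(0, H, TILE):                      # 1. split image into tiles
    for tx in range(0, W, TILE):
        # 2. cull: keep Gaussians whose extent touches this tile
        keep = ((centers[:, 0] > tx - radii) & (centers[:, 0] < tx + TILE + radii) &
                (centers[:, 1] > ty - radii) & (centers[:, 1] < ty + TILE + radii))
        idx = np.where(keep)[0]
        idx = idx[np.argsort(depths[idx])]        # 3.2 sort in depth order
        for py in range(ty, ty + TILE):
            for px in range(tx, tx + TILE):       # 3.3 one "thread" per pixel
                T, c = 1.0, 0.0
                for g in idx:
                    d2 = (px - centers[g, 0])**2 + (py - centers[g, 1])**2
                    alpha = opac[g] * np.exp(-0.5 * d2 / 4.0)
                    c += T * alpha                # front-to-back accumulation
                    T *= 1.0 - alpha
                    if T < 1e-4:                  # ray (almost surely) stopped
                        break
                image[py, px] = c
```

The per-tile culling and sorting is what keeps each pixel's inner loop short: a pixel only ever touches the Gaussians assigned to its tile.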

Culling + iterating over objects is a property of Rasterization, not Volume Rendering

How to train


During training, Gaussians with small opacity are removed

(Gaussians with small opacity contribute almost nothing, so they are useless)

Adaptive control:


And Gaussians with large gradients are cloned

(Large gradients mean this area has a large loss, therefore we need more components)
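Both rules are simple array operations. A sketch on made-up arrays (the thresholds `PRUNE_EPS` and `CLONE_TAU` are hypothetical, not the paper's values):

```python
import numpy as np

opacity = np.array([0.9, 0.003, 0.5, 0.8])
pos = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0]])
pos_grad_norm = np.array([0.0001, 0.0001, 0.01, 0.0001])

PRUNE_EPS = 0.005          # opacity below this -> remove
CLONE_TAU = 0.002          # position-gradient norm above this -> clone

keep = opacity > PRUNE_EPS                  # prune nearly transparent Gaussians
pos, opacity, g = pos[keep], opacity[keep], pos_grad_norm[keep]

clone = g > CLONE_TAU                       # densify where the loss pushes hard
pos = np.concatenate([pos, pos[clone]])
opacity = np.concatenate([opacity, opacity[clone]])

print(len(pos))   # 4 Gaussians -> pruned to 3 -> cloned back to 4
```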

A good visualization

http://leonidk.com/fmb-plus/

Practical notes

Tested 3DGS and NeRFStudio on 380 images of my cat (frames from a video)

A5000 GPU (rented a VPS)

3D Gaussian Splatting

  • 11GB VRAM
  • ~ 1 hour training
  • Default viewer is a pain in the ass to get working

NeRFStudio

  • 5GB VRAM
  • ~ 80 minutes training
  • Default viewer is great

Practical notes

Step-by-step tutorial for Windows

NeRFStudio Viewer plugin for 3D Gaussians!

https://github.com/yzslab/nerfstudio/tree/gaussian_splatting

Literally came out this week


DEMO

Dynamic Reconstruction


Unlike NeRFs, all the Gaussians are individual objects we can move/rotate

Therefore animating them is a lot easier. And it can be used for reconstruction of moving scenes.

Dynamic Reconstruction algorithm:

  1. Reconstruct the first frame
  2. For each of the next frames:
    1. Initialize this frame's Gaussian parameters with the previous frame's
    2. Run optimization only on positions and rotations
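The algorithm above, as a toy sketch (one Gaussian, a made-up quadratic per-frame loss, and plain gradient descent standing in for the real optimizer; colors and scales stay frozen):

```python
import numpy as np

# "Frames" are just moving 2D target positions for the Gaussian's center
targets = [np.array([0.0, 0.0]), np.array([0.1, 0.05]), np.array([0.25, 0.1])]

pos = targets[0].copy()          # step 1: reconstruct the first frame
trajectory = [pos.copy()]
for target in targets[1:]:       # step 2: each following frame
    # 2.1 initialize from the previous frame: pos simply carries over
    # 2.2 optimize only the position on this frame's toy loss ||pos - target||^2
    for _ in range(100):
        grad = 2 * (pos - target)
        pos -= 0.1 * grad
    trajectory.append(pos.copy())

print(trajectory[-1])            # tracks the last frame's target
```

The warm start is the whole trick: because the scene barely changes between frames, each per-frame optimization starts close to its solution and converges quickly.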

NeRF vs Gaussian Splatting

Previously, the state of the art was this NeRF-based method

3DGS didn't beat it, but is pretty much equal in quality

+ Gaussian Splatting renders HD in real-time

+ Much easier to manipulate (animation / dynamic reconstruction)

- No conditional generation (EpiGRAF / NeRF in the wild)

- Meshes are worse than current NeRF methods (Neuralangelo)

- Can't dynamically reconstruct changing topologies (NeRFPlayer)

 

🎊

Thank you for coming!
