Building a Rendering Engine in Tensorflow

Twitter: @awildtaber, Github: ataber

Full blog post on Medium, username @awildtaber

Geometric Representations

  • Polygonal (Meshes)
  • Parametric (Boundary Representation)
  • Implicit (Function Representation)

Parametric Representation

f : R^2 \rightarrow R^3
f(u,v) \in R^3

Represent geometry as the image of a function

Geometry := \{p \in R^3 ~\vert~ \exists u,v : f(u,v) = p\}

Easy to get local data

Just walk in the u or v direction

Hard to query point membership

Requires solving a hard computational geometry problem every time

Hard to perform boolean operations

Modeling complex shapes requires many parametric patches

Which means exploding memory cost for things like microstructures
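To make the "walk in u or v" point concrete, here is a minimal parametric cylinder sketch (the function names and the [0, 1]² parameterization are my own, not from the deck):

```python
from math import cos, sin, pi

def parametric_cylinder(radius, height):
    # Maps (u, v) in [0, 1]^2 onto points of the cylinder wall
    def f(u, v):
        return (radius * cos(2 * pi * u), radius * sin(2 * pi * u), height * v)
    return f

f = parametric_cylinder(1.0, 2.0)
p = f(0.0, 0.5)          # a point on the wall: (1.0, 0.0, 1.0)
neighbor = f(0.01, 0.5)  # nudging u yields a nearby point, still on the surface
```

Local queries are trivial: step u or v and you stay on the surface. But answering "is (0.3, 0.7, 1.1) on the surface?" means inverting f, which is the hard computational-geometry problem mentioned above.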

Implicit Representation

f : R^3 \rightarrow R
f(x, y, z) \in R

Represent geometry as the kernel of a function

Geometry := \{(x,y,z) \in R^3 ~\vert~ f(x,y,z) = 0\}

Easy to perform boolean operations

Can model essentially arbitrary complexity with low storage overhead

Rendering is time-consuming

You trade space for time

User interaction is not intuitive

Everything is inherently global
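The low-storage claim can be sketched with domain repetition, a standard implicit-modeling trick: fold space into one cell so a single primitive tiles infinitely. All names here are illustrative, not from the deck:

```python
import math

def sphere(radius):
    # Positive inside, negative outside, matching the deck's sign convention
    def f(x, y, z):
        return radius - math.sqrt(x*x + y*y + z*z)
    return f

def repeat(f, period):
    # Fold every coordinate into [-period/2, period/2); the primitive then
    # appears at every lattice point, with O(1) storage for infinite copies
    def repeated(x, y, z):
        half = period / 2
        fold = lambda c: ((c + half) % period) - half
        return f(fold(x), fold(y), fold(z))
    return repeated

lattice = repeat(sphere(0.3), period=1.0)
# Identical sphere centers appear at every integer coordinate
```

A mesh of the same infinite microstructure would need a triangle count proportional to the number of copies; the implicit version is one closure.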

Signed Distance Functions

Special case of implicit representations

The magnitude of the function underestimates the distance to the nearest surface

\forall x,y,z \quad \vert f(x,y,z) \vert \leq d((x,y,z), \mathrm{surface})

Underestimation matters because some surfaces (looking at you, ellipse) have no closed-form distance function

When (x,y,z) is inside (outside) the surface, the function's sign is positive (negative)

Prior Art

An Example

from math import sqrt
def cylinder(radius):
  def equation(x, y, z):
    return radius - sqrt(x**2 + y**2)
  return equation

my_cylinder = cylinder(1)
assert(my_cylinder(0, 0, 0) == 1) # 1 unit inside cylinder
assert(my_cylinder(1, 0, 0) == 0) # on wall
assert(my_cylinder(2, 0, 0) == -1) # 1 unit outside cylinder

An Example

def intersection(f, g):
  def intersected(x, y, z):
    return min(f(x, y, z), g(x, y, z))
  return intersected

def plane(normal, offset):
  def equation(x, y, z):
    a, b, c = normal
    return x*a + y*b + z*c + offset
  return equation

my_plane = plane([0, 0, 1], 0) # as in: this is myPlane get theseSnakes off it!
my_cylinder = cylinder(1)
cut_cylinder = intersection(my_cylinder, my_plane)
assert(cut_cylinder(0, 0, -2) == my_plane(0, 0, -2)) # like a plane below z = 0
assert(cut_cylinder(0, 1, 2) == my_cylinder(0, 1, 2)) # like a cylinder above z = 0

Build Geometry with Closures

Combine primitives lazily

Build a syntax tree in the language of geometries and operations
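The deck shows intersection; union and complement round out the boolean operations in the same closure style (a sketch, with intersection repeated from the earlier slide for completeness):

```python
def intersection(f, g):
    # From the earlier slide: a point is inside both shapes
    def intersected(x, y, z):
        return min(f(x, y, z), g(x, y, z))
    return intersected

def union(f, g):
    # Inside either shape: take the more-inside (larger) value
    def united(x, y, z):
        return max(f(x, y, z), g(x, y, z))
    return united

def complement(f):
    # Flip the sign to swap inside and outside
    def negated(x, y, z):
        return -f(x, y, z)
    return negated

def difference(f, g):
    # f minus g: intersect f with everything outside g
    return intersection(f, complement(g))
```

Each combinator returns a new closure without evaluating anything, so composing them builds exactly the lazy syntax tree described above.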

Rendering Signed Distance Functions

Solving a bunch of equations

Polygonization

Populate a grid with values of the function and use that grid to approximate the surface with triangles
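The grid-sampling step can be sketched in plain NumPy (bounds, resolution, and the radius-1 cylinder here are arbitrary choices for illustration):

```python
import numpy as np

# Sample a radius-1 cylinder SDF on a coarse grid
xs = np.linspace(-2, 2, 40)
X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
values = 1.0 - np.sqrt(X**2 + Y**2)

# The surface passes through any grid cell whose corner values disagree
# in sign; marching cubes emits triangles for exactly those cells
crossings = np.diff(np.sign(values), axis=0) != 0
```

Sign changes along an axis mark cells that straddle the surface, which is the information marching cubes turns into triangles.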

Raytracing

Setup a camera in space, and for each pixel, send a ray to intersect the surface. Draw the intersected points to the screen
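Specialized to a single ray and a sphere, the per-pixel loop looks like this in plain Python (sphere tracing; all names are illustrative, and the ray is assumed to start outside the shape):

```python
from math import sqrt

def sphere_sdf(px, py, pz, radius=1.0):
    # Deck's sign convention: positive inside, negative outside
    return radius - sqrt(px*px + py*py + pz*pz)

def trace(origin, direction, steps=64, epsilon=1e-4):
    # Each step advances by the distance bound, which can never
    # overshoot the surface because the SDF underestimates distance
    t = 0.0
    for _ in range(steps):
        p = [origin[i] + t * direction[i] for i in range(3)]
        d = sphere_sdf(*p)
        if abs(d) < epsilon:
            return t  # hit: p lies on the surface
        t += abs(d)
    return None  # missed, or did not converge in the step budget

hit_t = trace((-3.0, 0.0, 0.0), (1.0, 0.0, 0.0))  # hits the sphere at t = 2
```

The Tensorflow version later in the deck runs this same update for every pixel at once, as one tensor operation.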

Polygonization

Marching Cubes

Raytracing

How can Tensorflow help?

Our representations are trees, and therefore graphs

We would like to:

  • Reuse computation results automatically
  • Perform efficient computations on large tensors 
  • Parallelize!

Tensorflow comes with:

  • Common subexpression elimination
  • Automatic differentiation
  • Compilation to GPU code

Easy to implement*

import tensorflow as tf

cylinder_tf = my_cylinder(tf.constant(0.0), tf.constant(1.0), tf.constant(2.0))
session = tf.Session()
cylinder_tf_result = session.run(cylinder_tf)
assert(cylinder_tf_result == my_cylinder(0, 1, 2))

*Some Tensorflow-specific ops are needed, like tf.minimum/tf.maximum instead of Python's min/max

Works with tf.Variable and tf.placeholder too!

Polygonizing SDFs in Tensorflow

Input a tensor for each coordinate

import numpy as np

min_bounds = [-1, -1, -1] # the geometric coordinate bounds
max_bounds = [1, 1, 1]
output_shape = [200, 200, 200]
resolutions = list(map(lambda x: x*1j, output_shape)) # complex steps tell np.mgrid to take that many samples
mi0, mi1, mi2 = min_bounds
ma0, ma1, ma2 = max_bounds
r0, r1, r2 = resolutions
space_grid = np.mgrid[mi0:ma0:r0,mi1:ma1:r1,mi2:ma2:r2]
space_grid = space_grid.astype(np.float32)
x = tf.Variable(space_grid[0,:,:,:], trainable=False, name="X-Coordinates")
y = tf.Variable(space_grid[1,:,:,:], trainable=False, name="Y-Coordinates")
z = tf.Variable(space_grid[2,:,:,:], trainable=False, name="Z-Coordinates")

draw_op = function(x, y, z) # `function` is the SDF closure tree built earlier
session = tf.Session()
session.run(tf.initialize_all_variables())
volumetric_grid = session.run(draw_op)
marching_cubes(volumetric_grid) # => list of faces and vertices for rendering

Tensorboard!

Raytracing SDFs in Tensorflow

Solve the equations for t:

f(\mathbf{r}\,t + \mathbf{p}) = 0
t \geq 0

First build image plane

def vector_fill(shape, vector):
  return tf.pack([
    tf.fill(shape, vector[0]),
    tf.fill(shape, vector[1]),
    tf.fill(shape, vector[2]),
  ])

resolution = (1920, 1080)
aspect_ratio = float(resolution[0]) / resolution[1] # float division, even on Python 2
min_bounds, max_bounds = (-aspect_ratio, -1), (aspect_ratio, 1)
resolutions = list(map(lambda x: x*1j, resolution))
image_plane_coords = np.mgrid[min_bounds[0]:max_bounds[0]:resolutions[0],min_bounds[1]:max_bounds[1]:resolutions[1]]

# Position the camera and find the eye point
camera_position = tf.constant([-2.0, 0.0, 0.0])
look_at = (0.0, 0.0, 0.0)
camera = camera_position - np.array(look_at, dtype=np.float32)
camera_direction = normalize_vector(camera)
focal_length = 1
eye = camera + focal_length * camera_direction

# Coerce into correct shape
image_plane_center = vector_fill(resolution, camera_position)

# Convert u,v parameters to x,y,z coordinates for the image plane
v_unit = [0, 0, -1]
u_unit = tf.cross(camera_direction, v_unit)
uc, vc = image_plane_coords
center = image_plane_center
image_plane = center + uc * vector_fill(resolution, u_unit) + vc * vector_fill(resolution, v_unit)

Then build rays

# Populate the image plane with initial unit ray vectors
initial_vectors = image_plane - vector_fill(resolution, eye)
ray_vectors = normalize_vector(initial_vectors)

t = tf.Variable(tf.zeros(resolution, dtype=tf.float32), name="ScalingFactor")
space = (ray_vectors * t) + image_plane

# Name TF ops for better graph visualization
x = tf.squeeze(tf.slice(space, [0,0,0], [1,-1,-1]), squeeze_dims=[0], name="X-Coordinates")
y = tf.squeeze(tf.slice(space, [1,0,0], [1,-1,-1]), squeeze_dims=[0], name="Y-Coordinates")
z = tf.squeeze(tf.slice(space, [2,0,0], [1,-1,-1]), squeeze_dims=[0], name="Z-Coordinates")

evaluated_function = function(x,y,z)

The iteration operation

Made exquisitely simple because we're using SDFs!

epsilon = 0.0001
distance = tf.abs(evaluated_function)
distance_step = t - (tf.sign(evaluated_function) * tf.maximum(distance, epsilon))
ray_step = t.assign(distance_step)

Just run ray_step as many times as necessary to get convergence

Use automatic differentiation to calculate surface normals for shading

Since the function's value increases as you move from outside to inside, its gradient always points inward.

Code elided, but here's pictures!
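The elided code uses Tensorflow's automatic differentiation; as a stand-in, central finite differences illustrate the same normal computation in plain Python (the sphere SDF and helper names here are mine, not the deck's):

```python
from math import sqrt

def sdf(x, y, z):
    # Unit sphere, positive inside (the deck's sign convention)
    return 1.0 - sqrt(x*x + y*y + z*z)

def normal_at(x, y, z, h=1e-5):
    # Central differences approximate the gradient; with inside-positive
    # SDFs the gradient points inward, so negate it for the outward normal
    g = [
        (sdf(x + h, y, z) - sdf(x - h, y, z)) / (2 * h),
        (sdf(x, y + h, z) - sdf(x, y - h, z)) / (2 * h),
        (sdf(x, y, z + h) - sdf(x, y, z - h)) / (2 * h),
    ]
    length = sqrt(sum(c * c for c in g))
    return tuple(-c / length for c in g)

n = normal_at(1.0, 0.0, 0.0)  # points away from the sphere, roughly (1, 0, 0)
```

With tf.gradients the same vector comes out exactly and for every pixel at once, which is what makes autodiff attractive here.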

Convert to image

# Mask out pixels not on the surface
epsilon = 0.0001
bitmask = tf.less_equal(distance, epsilon)
masked = color_modulated_by_light * tf.to_float(bitmask) # shading tensor from the (elided) normals step
sky_color = [70.0, 130.0, 180.0] # steel blue
background = vector_fill(resolution, sky_color) * tf.to_float(tf.logical_not(bitmask))
image_data = tf.cast(masked + background, tf.uint8)

image = tf.transpose(image_data)
render = tf.image.encode_jpeg(image)

Running the iteration

session = tf.Session()
session.run(tf.initialize_all_variables())
for step in range(50):
  session.run(ray_step)

session.run(render) # <= returns jpeg data you can write to disk

Result in GIF form

Each frame a parameter in the model changes

Tensorboard!

Why is this interesting?

Topology Optimization

Use tf.Variables when building the representation, then run optimization routines to increase the strength-to-weight ratio

If you want to train on synthetic images, consider rendering them in Tensorflow

That way you have the image data in a tensor and you can calculate per-pixel gradients

Tensorflow is a powerful, generic graph computation framework

With a day job as a neural network library

Thank you!

Full blog post on Medium, username @awildtaber

Twitter: @awildtaber, Github: ataber

Many thanks to Íñigo Quilez, Matt Keeter, and the Hyperfun project for inspiration/ideas
