Graphics Programming Virtual Meetup

Homogeneous Perspective Transform

Jim Blinn's Corner

May 1993

Perspective Transformation Matrix

  • Transforms camera space to 'screen space'
  • Makes 3d scenes look like they do in real life
  • A non-affine transformation
  • Is harder to intuit than affine transformations

Affine?

  • Affine transformations preserve the following:
    • Collinearity
      • Points on a line still lie on a line after transformation
    • Ratios of distances
      • midpoint of a line segment is the same after transformation
  • Translation, Rotation, Scaling, and Shear are all Affine
  • Affine 4x4 matrices have `(0,0,0,1)` in the last column

Non-Affine

  • The general (projective) case; affine transformations are a subset of it
  • Do not preserve ratios of distances
    • Midpoint moves around
  • Commonly called 'Projection Transformations'
  • Perspective transformation falls in this category
  • Non-Affine 4x4 matrices do not have `(0,0,0,1)` in the last column.
    • These values represent the projection component of the matrix
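
The midpoint claim can be checked numerically. The sketch below is only an illustration: the two matrices, the sample points, and the helper names are assumptions made for this example, and the row-vector convention (point on the left) is assumed throughout.

```python
import math

def row_times_matrix(v, m):
    """Row vector (length 4) times a 4x4 matrix given as a list of rows."""
    return tuple(sum(v[i] * m[i][j] for i in range(4)) for j in range(4))

def transform_point(p, m):
    """Lift a Cartesian point to w = 1, transform it, then divide by w."""
    x, y, z, w = row_times_matrix((p[0], p[1], p[2], 1.0), m)
    return (x / w, y / w, z / w)

def midpoint(a, b):
    return tuple((a[i] + b[i]) / 2.0 for i in range(3))

def close(a, b):
    return all(math.isclose(a[i], b[i], abs_tol=1e-12) for i in range(3))

# Affine: scale by 2 plus a translation in x; last column is (0, 0, 0, 1).
AFFINE = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
]

# Non-affine: last column is (0, 0, 1, 1), a projection term on z.
PROJECTIVE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]

a, b = (0.0, 1.0, 0.0), (0.0, 1.0, 2.0)
mid = midpoint(a, b)

def keeps_midpoint(matrix):
    """Does the image of the midpoint equal the midpoint of the images?"""
    image_of_mid = transform_point(mid, matrix)
    mid_of_images = midpoint(transform_point(a, matrix), transform_point(b, matrix))
    return close(image_of_mid, mid_of_images)

print(keeps_midpoint(AFFINE), keeps_midpoint(PROJECTIVE))  # True False
```

The three transformed points still lie on one line in both cases; only the ratio of distances changes under the non-affine matrix.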

Homogeneous Coordinates

Wikipedia says:
"Any point in the projective plane is represented by a triple (X, Y, Z), called homogeneous coordinates or projective coordinates of the point, where X, Y and Z are not all 0. The point represented by a given set of homogeneous coordinates is unchanged if the coordinates are multiplied by a common factor."

 

Homogeneous Coordinates

  • The coordinate system used in projective geometry
  • For 3D Cartesian coordinates, tack on a w coordinate
  • Converting back to Cartesian requires dividing x, y, and z by the w coordinate
    • (X, Y, Z) == (x/w, y/w, z/w)
  • Can represent points at infinity
    • (x, y, z, 0) - the 0 for the w value is used to denote ∞
  • For every (X, Y, Z) Cartesian coordinate, there are infinitely many homogeneous coordinates
    • As w divides x,y,z into X, Y, Z, this w could scale arbitrarily
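
A minimal sketch of the conversion in both directions. The function names are hypothetical, and raising an error for w = 0 is a choice made for this example, not something the slides prescribe.

```python
def to_homogeneous(p, w=1.0):
    """Pick one of the infinitely many homogeneous forms of a Cartesian point."""
    X, Y, Z = p
    return (X * w, Y * w, Z * w, w)

def to_cartesian(h):
    """Divide x, y, z by w; w = 0 is a point at infinity with no Cartesian form."""
    x, y, z, w = h
    if w == 0:
        raise ValueError("w = 0 denotes a point at infinity")
    return (x / w, y / w, z / w)

# Scaling all four components by a common factor leaves the point unchanged.
print(to_cartesian((1.0, 2.0, 3.0, 1.0)))  # (1.0, 2.0, 3.0)
print(to_cartesian((2.0, 4.0, 6.0, 2.0)))  # (1.0, 2.0, 3.0)
```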

Fun things with these coordinates

  • Being able to represent ∞ means geometric calculations stay meaningful even for points at infinity
  • A negative w wraps the x,y,z around infinity
    • (1, 0, 0, w) with [w = -0.1] is (-10, 0, 0)
    • Also: (-1, 0, 0, 0) & (1, 0, 0, 0) are the same point
  • Creates a sort of Möbius twist to the geometry
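
The wrap-around can be seen by sweeping w through zero for the point (1, 0, 0, w). This is a small sketch with hypothetical helper names; the proportionality test assumes neither input is the all-zero vector.

```python
def cartesian_x(w):
    """Cartesian X of the homogeneous point (1, 0, 0, w)."""
    return 1.0 / w

def same_projective_point(a, b):
    """True when a and b are scalar multiples (neither may be all zeros)."""
    return all(a[i] * b[j] == a[j] * b[i] for i in range(4) for j in range(4))

# As w shrinks toward 0 the point runs off to +infinity,
# then comes back from -infinity once w turns negative.
for w in (1.0, 0.1, 0.001, -0.001, -0.1, -1.0):
    print(w, cartesian_x(w))

# The two "opposite" points at infinity are the same projective point.
print(same_projective_point((1, 0, 0, 0), (-1, 0, 0, 0)))  # True
```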

4 "coordinate" spaces

  1. 3D - Cartesian before perspective transformation
    • (X, Y, Z)
  2. 3DH - Homogeneous before perspective transformation
    • (x, y, z, 1) - x,y,z have same value as X, Y, Z
    • here things lie in the w=1 hyperplane in 4D space
  3. 3DHP - Homogeneous after perspective transformation
    • (x, y, z, w) - different values than before
  4. 3DP - Cartesian after perspective transformation
    • (X, Y, Z) = (x/w, y/w, z/w) - divide by w

Simple Perspective

\frac{Y_s}{D} = \frac{Y}{Z + D} \\ Y_s = \frac{Y}{Z/D + 1} \\ \frac{Y}{Z/D + 1} = \frac{y_s}{w_s} \\ w_s = Z/D + 1
(x_s, y_s, z_s, w_s) = (X, Y, Z, Z/D + 1)

Simple Perspective as a Matrix

  • Note: Row order
(x_s, y_s, z_s, w_s) = (X, Y, Z, 1) \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1/D\\ 0 & 0 & 0 & 1\\ \end{bmatrix} = (X, Y, Z, 1) \textbf{P}
  • The matrix multiplication is in homogeneous space
  • To get real world values, divide by the resulting w expression, Z/D + 1
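
As a rough sketch, the row-vector multiplication and the divide by w can be written out in plain Python. The value D = 2 and the sample point are arbitrary choices for illustration, and the helper names are hypothetical.

```python
def simple_perspective(d):
    """The matrix P above, stored as a list of rows (row-vector convention)."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0 / d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def row_times_matrix(v, m):
    """Row vector (length 4) times a 4x4 matrix."""
    return tuple(sum(v[i] * m[i][j] for i in range(4)) for j in range(4))

D = 2.0
P = simple_perspective(D)
X, Y, Z = 2.0, 4.0, 2.0

# 3DH -> 3DHP: only w changes, it becomes Z/D + 1.
xs, ys, zs, ws = row_times_matrix((X, Y, Z, 1.0), P)

# 3DHP -> 3DP: divide by w to get back to Cartesian values.
screen = (xs / ws, ys / ws, zs / ws)

print((xs, ys, zs, ws))  # (2.0, 4.0, 2.0, 2.0)
print(screen)            # (1.0, 2.0, 1.0)
```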

Z coordinate transformation

  • Follows the equation
    • Z_s = Z / ( Z/D + 1)
  • (X, Y, 0, 1)P = (X, Y, 0, 1)
    • z = 0 plane doesn't move
  • (0, 0, -D, 1)P = (0, 0, -D, 0)
    • Eyepoint moves to infinity
  • (0, 0, D, 0)P = (0, 0, D, 1)
    • A point infinitely away becomes a local point
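
The three special cases can be verified directly. This sketch assumes D = 2 and reuses the simple perspective matrix in row-vector form; the helper names are made up for the example.

```python
D = 2.0

# Simple perspective matrix P for row vectors.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0 / D],
    [0.0, 0.0, 0.0, 1.0],
]

def row_times_matrix(v, m):
    """Row vector (length 4) times a 4x4 matrix."""
    return tuple(sum(v[i] * m[i][j] for i in range(4)) for j in range(4))

def z_screen(z, d):
    """Z_s = Z / (Z/D + 1), the Cartesian depth after the transform."""
    return z / (z / d + 1.0)

on_plane = row_times_matrix((3.0, 5.0, 0.0, 1.0), P)  # z = 0 plane stays put
eyepoint = row_times_matrix((0.0, 0.0, -D, 1.0), P)   # w becomes 0: infinity
infinite = row_times_matrix((0.0, 0.0, D, 0.0), P)    # w becomes 1: local point

print(on_plane)  # (3.0, 5.0, 0.0, 1.0)
print(eyepoint)  # (0.0, 0.0, -2.0, 0.0)
print(infinite)  # (0.0, 0.0, 2.0, 1.0)

# Consistent with the last case: very distant Z values approach Z_s = D.
print(z_screen(1e9, D))
```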

Homogeneous Space Interpretation

  • Perspective transform is a shear in the w direction
  • Points in 3DH have w=1
  • These points shear up and down to get 3DHP
  • Then they project back onto w=1 for 3DP
  • note how the Z=0 and Z=∞ move

And now in stereo!

  • The projection moves the points along the W axis
    • But not along X or Z
  • Converting back to Cartesian coordinates yields our perspective transformation

A view from the real world

  • Elides the intermediate, 4-dimensional steps
  • Good visualization of the final output

Perspective Projection is a 3D transformation

  • The previous slide's cube is now deformed
    • But it's not 'on' a flat plane
  • This is handled by the rasterizer. It can now use the `X, Y` coordinates and simply scale + translate to go from NDC to framebuffer coordinates.
  • In a sense it's doing an 'orthographic projection', because depth doesn't alter the shape in that final process
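
The scale + translate step might look like the sketch below. It assumes NDC X and Y in [-1, 1] and a framebuffer origin at the top-left with y pointing down; real APIs differ on both points, and the function name is hypothetical.

```python
def ndc_to_framebuffer(x_ndc, y_ndc, width, height):
    """Map NDC x, y in [-1, 1] to framebuffer coordinates.

    Assumed convention: (0, 0) is the top-left corner and y grows downward,
    so the NDC y axis is flipped. Depth plays no part in this mapping.
    """
    fx = (x_ndc + 1.0) * 0.5 * width
    fy = (1.0 - y_ndc) * 0.5 * height
    return (fx, fy)

print(ndc_to_framebuffer(0.0, 0.0, 800, 600))   # (400.0, 300.0)
print(ndc_to_framebuffer(-1.0, 1.0, 800, 600))  # (0.0, 0.0)
```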

Note about GPU pipelines

  • Post-vertex, the pipeline will divide by w
    • Vertex positions shouldn't have w = 0
  • Normals and other attributes don't get divided
  • Also, the move from fixed-function to programmable GPUs altered the pipeline
    • The math however stays the same

Ultimate Understanding

YZ slice of space. Z=0 doesn't move

Little details that add a lot of noise to the matrix

  • The core of the perspective is the homogeneous transformation & back again
  • How the near & far, left & right, top & bottom planes are defined depends on the set of conventions used
    • Whether depth is [-1, 1] or [0,1]
    • Generally the X and Y axis are [-1, 1]

"Fuller" Perspective Matrix

 

  • Use the 'field of view' for width & height
  • Zn & Zf are the near and far clipping planes
  • (again, row order)
\begin{bmatrix} c & 0 & 0 & 0\\ 0 & c & 0 & 0\\ 0 & 0 & Q & s\\ 0 & 0 & -QZ_n & 0\\ \end{bmatrix} \\ s = \sin(fov/2)\\ c = \cos(fov/2)\\ Q = \frac{s}{1-Z_n/Z_f}

Reverse-Z, a better solution to depth

  • This paper describes needing to carefully select the near and far planes
  • Flipping the table on the problem by reversing which direction we store Z values fixes this
  • I can confirm it is awesome
  • Deserves its own talk...
