Graphics Programming Virtual Meetup

Visibility Buffer Rendering

Charles Giessen

Sources

Talk outline

  • Background
  • Visibility buffer
  • Hardware Partial Derivatives
  • Interpolation and Analytic Partial Derivatives
  • Performance considerations
  • Conclusion

Background

  • Forward rendering
    • Compute material properties and lighting in fragment shader
    • Implicit step done by the GPU to interpolate inputs
  • Light culling/tiling
    • Bucket lights into spatial groups, only light a fragment based on lights in the closest bucket
    • Orthogonal to Forward or Deferred rendering
      • Not discussed here in detail

Background (cont.)

  • Deferred Rendering
    • Only compute material properties in fragment shader, store in G-Buffer
    • Compute lighting in separate step using G-Buffer as input
    • Lighting is evaluated once per pixel, unlike forward rendering, where it is evaluated for every fragment of every triangle (overdrawn pixels get lit more than once)
    • G-Buffer usable in post processing effects
    • Still has implicit interpolation step per triangle

Visibility Buffer?

  • Split up Interpolation from material evaluation and lighting
    • Rasterize triangle and store 'triangle ids' in a buffer
      • "Visibility Buffer"
      • Still needs depth buffer - can be combined or separate
    • Feed visibility buffer into material evaluation
      • Can use compute for material/lighting

But why?

  • Use rasterizer only when necessary
  • We can fetch and interpolate texture data ourselves
  • Geared towards high triangle to pixel ratios
    • Dense 'subpixel triangles' tend to choke the hardware

But how?

  • Use the following fragment shader for all triangle rendering

// Pass 0: Rasterize all meshes, just output thin visibility
U32 VisibilityPS(U32 drawCallId, U32 triangleId)
{
  return (drawCallId << NUM_TRIANGLE_BITS) | triangleId;
}

  • Visibility buffer contains just the triangle number and draw call number bit-packed together
  • Note that all geometry must already be in GPU-resident buffers so it can be queried later
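
The bit-packing used by the visibility shader can be sketched on the CPU as a pack/unpack pair. This is a minimal sketch, not the talk's code: the 23/9 bit split and the helper names PackVisibility, UnpackDrawCallId and UnpackTriangleId are assumptions made for illustration. Only NUM_TRIANGLE_BITS and TRIANGLE_MASK appear in the slides, and their values are not given there.

```cpp
#include <cstdint>

// Assumed bit budget (not from the talk): 23 bits of triangle id,
// leaving 9 bits of draw call id in a 32-bit visibility value.
constexpr uint32_t NUM_TRIANGLE_BITS = 23;
constexpr uint32_t TRIANGLE_MASK     = (1u << NUM_TRIANGLE_BITS) - 1u;

// Pack both ids into one 32-bit value, as the Pass 0 shader does.
// The mask keeps an oversized triangle id from spilling into the draw call bits.
constexpr uint32_t PackVisibility(uint32_t drawCallId, uint32_t triangleId)
{
    return (drawCallId << NUM_TRIANGLE_BITS) | (triangleId & TRIANGLE_MASK);
}

// Recover the draw call id from the high bits.
constexpr uint32_t UnpackDrawCallId(uint32_t visibility)
{
    return visibility >> NUM_TRIANGLE_BITS;
}

// Recover the triangle id from the low bits.
constexpr uint32_t UnpackTriangleId(uint32_t visibility)
{
    return visibility & TRIANGLE_MASK;
}
```

With this split a single draw call can address about 8 million triangles and a frame can hold 512 draw calls; a different split (or a 64-bit target) trades one limit for the other.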

Two flavors

  • Combined Material and Lighting evaluation
    • Simpler to implement
    • Bigger fragment shader
    • Best when only 1 material is used
  • Separate passes for material evaluation and lighting
    • Generate a G-Buffer from Visibility Buffer
    • Feed it into the lighting evaluation step
    • More steps
    • Allows different materials to be used more easily
    • What this presentation will discuss in detail

Material Evaluation

  • Sample from visibility buffer at pixel
  • Determine where in the triangle it is (interpolate)
  • Compute Material and write to G-Buffer
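
// Pass 1: In a CS convert from triangle ID to BRDF data
BrdfData MaterialCS(float2 screenPos)
{
  // Fetch the packed value once, then split it into its two ids
  U32 visibility = FetchVisibility(screenPos);
  U32 drawCallId = visibility >> NUM_TRIANGLE_BITS;
  U32 triangleId = visibility &  TRIANGLE_MASK;

  // Fetch the triangle's vertices and interpolate them at this pixel
  Interpolators interp    = FetchInterpolators(drawCallId, triangleId, screenPos);
  BrdfData      brdfData  = MaterialEval(interp);
  return brdfData;
}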

Lighting Calculation

  • Ingest G-Buffer, output final pixel color
  • Already a step in deferred rendering
  • Post-processing is applied after this step, on the way to the final framebuffer output
// Pass 2: In a CS, fetch BRDF data and calculate lighting
LightData LightingCS(float2 screenPos)
{
  BrdfData      brdfData  = FetchMaterial(screenPos);
  LightData     lightData = LightingEval(brdfData);
  return lightData;
}

Multiple materials?

  • Want to invoke the material shader on only the pixels it applies to.
  • Achieved through multiple mechanisms
    • The idea presented in the article differs from what I found in the wild
    • Generally, you render a 'fullscreen quad' per material and early-out if the pixel's material id doesn't match the material being drawn
      • Can apply tiling & other fancy culling to bring this down
    • The article uses an interesting routine that counts how many pixels use each material and sorts the pixels by material before dispatching.
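
One way to picture the count-then-sort idea is a counting sort over per-pixel material ids. The sketch below is a CPU illustration under assumptions, not the article's actual routine: MaterialBins and SortPixelsByMaterial are hypothetical names, and every material id is assumed to be smaller than materialCount.

```cpp
#include <cstdint>
#include <vector>

struct MaterialBins
{
    // offsets[m] is the first slot of material m in pixelIndices;
    // offsets[m + 1] - offsets[m] is the number of pixels using material m.
    std::vector<uint32_t> offsets;
    // Pixel indices grouped by material, in original order within each group.
    std::vector<uint32_t> pixelIndices;
};

// Counting sort: count pixels per material, prefix-sum the counts into
// offsets, then scatter each pixel index into its material's range.
MaterialBins SortPixelsByMaterial(const std::vector<uint32_t>& materialIdPerPixel,
                                  uint32_t materialCount)
{
    MaterialBins bins;
    bins.offsets.assign(materialCount + 1, 0);

    // 1. Count how many pixels use each material.
    for (uint32_t materialId : materialIdPerPixel)
        ++bins.offsets[materialId + 1];

    // 2. Exclusive prefix sum turns counts into start offsets.
    for (uint32_t m = 0; m < materialCount; ++m)
        bins.offsets[m + 1] += bins.offsets[m];

    // 3. Scatter pixel indices using a moving write cursor per material.
    std::vector<uint32_t> cursor(bins.offsets.begin(), bins.offsets.end() - 1);
    bins.pixelIndices.resize(materialIdPerPixel.size());
    for (uint32_t p = 0; p < materialIdPerPixel.size(); ++p)
        bins.pixelIndices[cursor[materialIdPerPixel[p]]++] = p;

    return bins;
}
```

Each material can then be dispatched over only its own range of pixelIndices, so no thread has to early-out on a material mismatch. On a GPU the same three steps would typically use atomics and a parallel prefix sum.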

Hardware Partial Derivatives

  • Pixels aren't computed individually, but in 2x2 quads
  • This is to allow partial derivatives to be computed
    • Allows use of the 'finite difference method': sample the value at the quad's 4 locations and take differences between neighbors
    • dx = right pixel value - left pixel value
    • dy = bottom pixel value - top pixel value (with y pointing down, as in Direct3D and Vulkan)
  • Notice how all 4 lanes are needed to compute derivatives
  • Active lane == Pixel makes it to final output
  • Helper lane == Pixel is only needed for derivatives
  • Helper lanes occupy slots that active lanes could otherwise use
  • Note how 12 total quad lanes are needed to render 3 triangles
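
The finite difference step over a 2x2 quad can be sketched as follows. This is a minimal illustration, assuming the convention "right minus left" and "bottom minus top" with y pointing down, and assuming one shared ("coarse") difference per quad taken from the top row and left column. Quad, Derivatives and CoarseDerivatives are hypothetical names, and real hardware may pick different pixel pairs.

```cpp
// One 2x2 quad of shader values, indexed [row][column], row 0 on top.
struct Quad
{
    float v[2][2];
};

struct Derivatives
{
    float dx; // change of the value per pixel step in x
    float dy; // change of the value per pixel step in y
};

// Coarse finite differences: a single dx and dy for the whole quad.
// Even if only one lane is active, its neighbors must still have been
// shaded for these reads to be valid, which is why helper lanes exist.
Derivatives CoarseDerivatives(const Quad& q)
{
    Derivatives d;
    d.dx = q.v[0][1] - q.v[0][0]; // right minus left, top row
    d.dy = q.v[1][0] - q.v[0][0]; // bottom minus top, left column
    return d;
}
```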

Quad Utilization Efficiency

  • Quads are underutilized in forward and deferred rendering with small triangles

               Material   Lighting
  Forward      4x         n/a
  Deferred     4x         1x
  Visibility   1x         1x
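
To make the 4x worst case concrete, the sketch below counts lanes for a grid of per-pixel triangle ids under a simplified model: every triangle that touches a 2x2 quad launches all 4 lanes of that quad. The model ignores overdraw and assumes even width and height. NO_TRIANGLE, QuadStats and CountQuadLanes are hypothetical names for this illustration.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical marker for "no triangle covers this pixel".
constexpr uint32_t NO_TRIANGLE = 0xFFFFFFFFu;

struct QuadStats
{
    uint32_t activeLanes; // pixels that make it to the final output
    uint32_t totalLanes;  // active lanes plus helper lanes
};

// triangleIdPerPixel is a row-major width x height grid of triangle ids.
QuadStats CountQuadLanes(const std::vector<uint32_t>& triangleIdPerPixel,
                         uint32_t width, uint32_t height)
{
    QuadStats stats{0, 0};
    for (uint32_t qy = 0; qy < height; qy += 2)
    {
        for (uint32_t qx = 0; qx < width; qx += 2)
        {
            // Collect the distinct triangles covering this quad.
            std::set<uint32_t> trianglesInQuad;
            for (uint32_t i = 0; i < 4; ++i)
            {
                uint32_t x  = qx + (i & 1u);
                uint32_t y  = qy + (i >> 1u);
                uint32_t id = triangleIdPerPixel[y * width + x];
                if (id != NO_TRIANGLE)
                {
                    ++stats.activeLanes;
                    trianglesInQuad.insert(id);
                }
            }
            // Each distinct triangle costs a full quad of 4 lanes.
            stats.totalLanes += 4u * static_cast<uint32_t>(trianglesInQuad.size());
        }
    }
    return stats;
}
```

For instance, three one-pixel triangles sharing a single quad give 3 active lanes out of 12 under this model, or 25% utilization. A visibility buffer material pass shades each visible pixel once, so the same quad would need only 4 lanes.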

Extremely bad utilization examples

Interpolation and Analytic Partial Derivatives

  • Since we replaced hardware interpolation we gotta do it ourselves
  • Fetch the 3 vertices, interpolate using the xy location
  • Can store transformed vertices in a post-transform cache
    • Not required (vertices can be re-transformed per pixel), but a good thing to explore
  • Reuse barycentrics for all texture samples
  • Code samples can be found online
    • I think it's straightforward to understand

May need to fall back to Finite Difference Method

  • Unreal's Nanite tries to use analytic derivatives when possible
  • Not always possible, falls back to FDM for derivatives

Performance considerations

  • A big win in high triangle density scenes
    • Much better quad utilization
  • Beware of memory/cache coherence
    • Lots of places for memory stalls to occur as triangle data is fetched
  • Much harder to do generative geometry
    • Need to have all vertices available for shading later
  • Gains little, if anything, for big triangles
  • Multiple materials complicate matters
    • Different strategies to make it work

Conclusion

  • It's cool
  • It's fast
  • Implement your own rasterizer in compute shaders and ditch the fixed function pipeline today
  • Has some complications with the following techniques
    • MSAA
    • Variable Rate Shading
    • TAA, upscaling, and temporal techniques

Thanks for listening


Questions?
