Graphics Programming Virtual Meetup

vk_mini_path_tracer

Chapters 4-5
Command Buffers and Writing an Image

Link to the tutorial

https://nvpro-samples.github.io/vk_mini_path_tracer/

Source code

https://github.com/nvpro-samples/vk_mini_path_tracer

Chapter 4

Command Buffers

OpenGL vs Vulkan

"Execution of Work"

OpenGL:
  • Immediately executed
    • As-if rule applies
  • Implicit synchronization
  • Large state machine

Vulkan:
  • Deferred execution
    • Manual submission
  • Explicit synchronization
  • Little state machine
    • Automatically resets

Command Buffers

  • Where we 'write' our GPU commands to
    • vkCmdDraw(command_buffer, ...);
      
  • Categories of commands:
    • Binding - Pipelines, Shader resources, Buffers
    • Drawing/Executing (raster & compute)
    • Synchronization - Barriers
    • Data movement - Copying data, transitioning images
  • Designed to be quick to write
    • Recording a command only needs to write a few bytes and increment a pointer
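
To make the "write a few bytes and bump a pointer" idea concrete, here is a minimal toy model. It is not how any real driver encodes commands; `ToyCommandBuffer`, `ToyOpcode`, and the 8-byte packet layout are all made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model only: real drivers use their own private command encoding.
enum class ToyOpcode : uint32_t { Draw = 1, FillBuffer = 2 };

struct ToyCommandBuffer {
  std::vector<uint8_t> storage;          // Pre-allocated memory (a pool would own this)
  size_t               writeOffset = 0;  // The "pointer" that gets bumped per command

  explicit ToyCommandBuffer(size_t capacityBytes) : storage(capacityBytes) {}

  // Copies one small fixed-size packet and advances the write offset.
  // Returns false if the packet does not fit.
  bool record(ToyOpcode opcode, uint32_t argument) {
    const uint32_t packet[2] = {static_cast<uint32_t>(opcode), argument};
    if (writeOffset + sizeof(packet) > storage.size()) return false;
    std::memcpy(storage.data() + writeOffset, packet, sizeof(packet));
    writeOffset += sizeof(packet);
    return true;
  }

  // "Resetting" is just moving the offset back to the start.
  void reset() { writeOffset = 0; }
};
```

Recording two commands into a 16-byte `ToyCommandBuffer` fills it exactly; nothing is executed at record time, which is the point of deferred execution.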

Command Buffers cont'

  • Commands in a command buffer aren't guaranteed to execute (or finish) in the order they were recorded
    • Must manually define synchronization
  • Able to be recorded in parallel
    • Each thread must record into its own command buffer
  • Explicitly submitted to a 'Queue'
    • vkQueueSubmit(...);

"Queues" in Vulkan

  • The place where you submit work
  • Queues can support different capabilities & combinations of them
    • Graphics
    • Compute
    • Transfer
  • Grouped into "Queue Families"
    • Can be multiple "Queues" in a single family
  • Most hardware has an 'uber' queue that supports all three capability types (abbreviated as the GCT queue)

Command Pools

  • Hold the memory for Command Buffers
  • A command pool can only work with one Queue family
  • Multiple Command Buffers can be allocated from a single Command Pool
VkCommandPoolCreateInfo cmdPoolInfo = nvvk::make<VkCommandPoolCreateInfo>();
cmdPoolInfo.queueFamilyIndex        = context.m_queueGCT;
VkCommandPool cmdPool;
NVVK_CHECK(vkCreateCommandPool(context, &cmdPoolInfo, nullptr, &cmdPool));

Quick Vulkan & NVVK notes

The NVVK_CHECK() macro checks the return value of a Vulkan function

Most Vulkan functions that don't return void return `VkResult`, which is an enum

Returning `VK_SUCCESS` signals that the function didn't fail

`VK_NULL_HANDLE` is a macro for a null handle value (0). It is used when there is no 'valid' handle to pass

However, in C++ `nullptr` can be used instead wherever the handle type is a pointer
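
For intuition, a result-checking macro in the spirit of NVVK_CHECK could look like the sketch below. This is not NVVK's actual implementation, and NVVK may report errors differently. `ToyResult`, `TOY_CHECK`, `toySucceeds`, and `toyFails` are hypothetical stand-ins so the example runs without Vulkan.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for VkResult: zero means success, anything else is a failure here.
enum class ToyResult : int { Success = 0, ErrorDeviceLost = -4 };

// Evaluates the call once and throws if it did not succeed.
// The stringized expression (#expr) tells us which call failed.
#define TOY_CHECK(expr)                                               \
  do {                                                                \
    const ToyResult toyCheckResult = (expr);                          \
    if (toyCheckResult != ToyResult::Success) {                       \
      throw std::runtime_error(std::string("Call failed: ") + #expr); \
    }                                                                 \
  } while (false)

// Two fake "API calls" to exercise the macro.
inline ToyResult toySucceeds() { return ToyResult::Success; }
inline ToyResult toyFails() { return ToyResult::ErrorDeviceLost; }
```

`TOY_CHECK(toySucceeds());` does nothing visible, while `TOY_CHECK(toyFails());` throws with the message "Call failed: toyFails()".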

Allocating a Command Buffer

  • Command buffers can be 'primary' or 'secondary'
    • Secondary is useful for multi-threaded recording
    • Secondary command buffers can't be submitted to a queue directly
      • Instead they are 'called' by primary buffers
  • We will only use primary command buffers
VkCommandBufferAllocateInfo cmdAllocInfo = nvvk::make<VkCommandBufferAllocateInfo>();
cmdAllocInfo.level                       = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
cmdAllocInfo.commandPool                 = cmdPool;
cmdAllocInfo.commandBufferCount          = 1;
VkCommandBuffer cmdBuffer;
NVVK_CHECK(vkAllocateCommandBuffers(context, &cmdAllocInfo, &cmdBuffer));

Command Buffer Lifecycle

  • Multiple 'phases' once allocated
    • Initial - Call "Begin" on it to make it ready to record
    • Recording - Must "End" it when done recording
      • Where we call `vkCmdYYY()` functions
    • Executable - Ready to be submitted
    • Pending - Has been submitted, may still be running
      • Can't modify any resources the command buffer might reference
      • Returns to 'executable' once finished (or becomes 'invalid' if it was recorded with ONE_TIME_SUBMIT)
    • Can be 'reset' once it is no longer pending, to start the cycle over again
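
The lifecycle can be sketched as a small state machine. This is a simplified model, not Vulkan code: `ToyCommandBufferState` is a hypothetical type, and it leaves out the 'invalid' state and the implicit reset that "Begin" can perform.

```cpp
// The four phases named on this slide.
enum class CmdState { Initial, Recording, Executable, Pending };

struct ToyCommandBufferState {
  CmdState state = CmdState::Initial;

  bool begin()  { return transition(CmdState::Initial, CmdState::Recording); }    // "Begin"
  bool end()    { return transition(CmdState::Recording, CmdState::Executable); } // "End"
  bool submit() { return transition(CmdState::Executable, CmdState::Pending); }   // Queue submit
  bool finish() { return transition(CmdState::Pending, CmdState::Executable); }   // GPU is done

  // Resetting is refused while the GPU may still be using the command buffer.
  bool reset() {
    if (state == CmdState::Pending) return false;
    state = CmdState::Initial;
    return true;
  }

 private:
  // Moves to `to` only if we are currently in `from`; otherwise reports failure.
  bool transition(CmdState from, CmdState to) {
    if (state != from) return false;
    state = to;
    return true;
  }
};
```

Calling `submit()` before `begin()` and `end()` returns false, as does `reset()` while pending; these mirror the mistakes that Vulkan's validation layers are there to catch.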

Begin the Command Buffer

VkCommandBufferBeginInfo beginInfo = nvvk::make<VkCommandBufferBeginInfo>();
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
NVVK_CHECK(vkBeginCommandBuffer(cmdBuffer, &beginInfo));
  • ONE_TIME_SUBMIT - This recording will be submitted only once; the command buffer must be re-recorded before it can be submitted again.
  • This call will 'reset' the command buffer if it had been used previously.
    • This is just moving a pointer back to the start, nothing expensive

Record into the Command Buffer

  • We want to 'fill' the GPU buffer with the same value, 0.5f
  • This is to make sure we are actually modifying memory on the GPU
  • The reinterpret cast is needed because the API only accepts a 'uint32_t'.
    • We want the buffer filled with the bit pattern of a float, thus we must do dirty things
const float fillValue = 0.5f;
const uint32_t& fillValueU32 = reinterpret_cast<const uint32_t&>(fillValue);
vkCmdFillBuffer(cmdBuffer, buffer.buffer, 0, bufferSizeBytes, fillValueU32);
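
As a side note on the float-to-`uint32_t` trick used for the fill value: one common alternative to the reference cast is to copy the bytes with `std::memcpy`, which sidesteps aliasing concerns. The helper name `floatBits` below is hypothetical and is not part of the tutorial's code. In C++20, `std::bit_cast<uint32_t>(value)` from `<bit>` does the same job.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw IEEE-754 bit pattern of a 32-bit float.
inline uint32_t floatBits(float value) {
  static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");
  uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));  // Byte-wise copy, no type punning
  return bits;
}
```

On platforms with IEEE-754 floats, `floatBits(0.5f)` is `0x3F000000`, which is the value `vkCmdFillBuffer` would then write into every 4 bytes of the buffer.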

Problem:
Vulkan doesn't guarantee when a command will be finished

Consecutive commands in a command buffer may execute in any order they like, so long as they respect manually defined 'synchronization points'

Meaning: Our 'Fill GPU Buffer' command might not finish before we start reading from it on the CPU

Solution:
Define the order things must happen in

Meaning: Use pipeline barriers to specify the order everything must happen in.
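
To see why the order must be defined, here is a toy, single-threaded model of the problem. It is not Vulkan and not how a GPU schedules work: the hypothetical `ToyQueue` deliberately runs commands recorded between two barriers in reverse order, as a stand-in for "any order the hardware likes".

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

class ToyQueue {
 public:
  ToyQueue() : m_groups(1) {}

  // Adds a command to the current group (everything since the last barrier).
  void record(std::function<void()> command) { m_groups.back().push_back(std::move(command)); }

  // Starts a new group: nothing recorded after this may run before what came earlier.
  void barrier() { m_groups.emplace_back(); }

  // Runs groups in order, but commands INSIDE a group in reverse, to mimic reordering.
  void execute() {
    for (auto& group : m_groups) {
      std::reverse(group.begin(), group.end());
      for (auto& command : group) command();
    }
    m_groups.clear();
    m_groups.emplace_back();
  }

 private:
  std::vector<std::vector<std::function<void()>>> m_groups;
};

// Records a "fill" followed by a "read" and returns what the read observed.
inline float fillThenRead(bool useBarrier) {
  float buffer   = 0.0f;
  float observed = -1.0f;
  ToyQueue queue;
  queue.record([&] { buffer = 0.5f; });      // The fill
  if (useBarrier) queue.barrier();           // The ordering we define by hand
  queue.record([&] { observed = buffer; });  // The read
  queue.execute();
  return observed;
}
```

With the barrier the read sees the filled value; without it, this toy queue runs the read first and it sees stale data. Real pipeline barriers also deal with caches and memory visibility, which this model ignores.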

// Add a command that says "Make it so that memory writes by the vkCmdFillBuffer call
// are available to read from the CPU." (In other words, "Flush the GPU caches
// so the CPU can read the data.") To do this, we use a memory barrier.
VkMemoryBarrier memoryBarrier = nvvk::make<VkMemoryBarrier>();
memoryBarrier.srcAccessMask   = VK_ACCESS_TRANSFER_WRITE_BIT; // Make transfer writes
memoryBarrier.dstAccessMask   = VK_ACCESS_HOST_READ_BIT;      // Readable by the CPU
vkCmdPipelineBarrier(cmdBuffer,                      // The command buffer
                     VK_PIPELINE_STAGE_TRANSFER_BIT, // From the transfer stage
                     VK_PIPELINE_STAGE_HOST_BIT,     // To the CPU
                     0,                              // No special flags
                     1, &memoryBarrier,              // An array of memory barriers
                     0, nullptr, 0, nullptr);        // No other barriers

But what are Pipeline Barriers?

  • They synchronize memory
  • This is done by specifying the 'stages' of the memory operations involved
  • A stage is a discrete 'step' that the GPU goes through when it is doing work
    • Examples: Vertex shader, Fragment shader, Transfer
      • Also includes Compute shaders
  • They are likely the most 'complex' part of learning Vulkan
    • Necessary for optimal performance
    • Not obvious how to use them from the get-go

More about Pipeline Barriers

  • Can be imagined like a scheduling dependency
    • You have to finish task A before you can start task B
  • Several 'Types' of barriers:
    • Memory Barriers are what we just used
    • Buffer Memory Barriers
      • Can apply to a specific range of a buffer
    • Image Memory Barriers
      • Can apply to a specific image (& part of said image)
      • Can perform 'layout transitions'
  • More technical info is in the tutorial; what we have suits our needs for now

Ending a Command Buffer

Makes the command buffer ready to be submitted and executed

NVVK_CHECK(vkEndCommandBuffer(cmdBuffer));

Now to submit it!

Submitting a Command Buffer

VkSubmitInfo submitInfo       = nvvk::make<VkSubmitInfo>();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers    = &cmdBuffer;
NVVK_CHECK(vkQueueSubmit(context.m_queueGCT, 1, &submitInfo, VK_NULL_HANDLE));

vkQueueSubmit performance note:
This call is expensive.
When possible, put multiple command buffers into the same submit.

Now to read the data back! Right?

  • Not quite, we can't just start reading yet
  • We need to make sure the Command Buffer is finished executing
  • But we have Pipeline Barriers, right?
  • The barrier only guarantees that the memory is ready to read once `vkCmdFillBuffer` has finished
    • It doesn't tell the CPU when that has happened

"Easy Solution"

  • Just wait for the GPU to finish everything it is doing
  • vkQueueWaitIdle(context.m_queueGCT);
  • This will pause the running thread until the Queue we submitted the command buffer on finishes
  • A "Sledge Hammer" type solution.
    • If other unrelated work was happening, we would wait for that work too
  • But: We aren't doing anything else so this is fine

"Better Solution"

  • Use a VkFence to wait on an individual submission
  • A VkFence can be passed to vkQueueSubmit
    • The fence is signaled when that submission finishes, so it can be waited upon
  • 'vkWaitForFences' puts the thread to sleep only until the desired submission is finished
  • Can go further and poll the fence with 'vkGetFenceStatus', which avoids putting the thread to sleep
  • Ultimately, using the simplest solution that works is best
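
The difference between waiting on one submission and waiting for the whole queue can be shown with a toy model. This is not Vulkan: `ToyFence` and `ToyWorkQueue` are hypothetical, work is measured in made-up "ticks", and the waiting thread drives the work itself instead of sleeping. `waitForFence`, `waitIdle`, and `isSignaled` loosely play the roles of `vkWaitForFences`, `vkQueueWaitIdle`, and `vkGetFenceStatus`.

```cpp
#include <deque>

// Toy stand-in for a fence: flips to true when its submission completes.
struct ToyFence {
  bool signaled = false;
};

// Toy in-order queue: each submission needs some number of "ticks" of work.
class ToyWorkQueue {
 public:
  void submit(int workTicks, ToyFence* fence) { m_pending.push_back({workTicks, fence}); }

  // Polling: reports the fence status without waiting.
  static bool isSignaled(const ToyFence& fence) { return fence.signaled; }

  // Runs work only until the given fence signals; returns the ticks spent waiting.
  int waitForFence(const ToyFence& fence) {
    int ticks = 0;
    while (!fence.signaled && !m_pending.empty()) { tick(); ++ticks; }
    return ticks;
  }

  // Runs ALL remaining work, related or not; returns the ticks spent waiting.
  int waitIdle() {
    int ticks = 0;
    while (!m_pending.empty()) { tick(); ++ticks; }
    return ticks;
  }

 private:
  struct Submission {
    int       remainingTicks;
    ToyFence* fence;  // May be null, like passing VK_NULL_HANDLE
  };

  // Does one unit of work on the oldest submission and signals its fence when done.
  void tick() {
    Submission& front = m_pending.front();
    if (--front.remainingTicks <= 0) {
      if (front.fence != nullptr) front.fence->signaled = true;
      m_pending.pop_front();
    }
  }

  std::deque<Submission> m_pending;
};
```

If our 2-tick submission is followed by 5 ticks of unrelated work, waiting on the fence costs 2 ticks while waiting for idle costs all 7. In this chapter nothing else is submitted, so the two are equivalent.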

Cleanup

Delete the Command Pool once we are done

vkDestroyCommandPool(context, cmdPool, nullptr);

Command buffers can be freed individually, but destroying the pool frees all of them at once

Finally we are doing work on the GPU!

Output should now be:

First four elements: 0.500000, 0.500000, 0.500000, 0.500000

Chapter 5

Writing an image to the disk

MUCH easier than the previous Chapter

  • Steps:
    • Create Image
    • Write data
    • Close Image
    • Success!

Add one library to the list of 'includes'

// The define must appear in exactly one source file, before the include
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include <fileformats/stb_image_write.h>

Change Printing code to:

// Interpret the mapped bytes as floats, then write them out as an HDR image
float* fltData = reinterpret_cast<float*>(data);
stbi_write_hdr("out.hdr", render_width, render_height, 3, fltData);

3 is for 3 channels, for RGB

Voila!

Tech Note: sRGB & Linearity

  • We just wrote 0.5 to the entire image
  • So that should be literally 0.5 we are seeing!
  • Except no, while the image contains 0.5, we are seeing a slightly different color
    • Most image editors will list an sRGB color of (188/255, 188/255, 188/255)
  • sRGB uses a 'curve' because humans do not perceive brightness linearly
  • The details are worth reading about but nuanced
    • Don't want to ruin a perfectly good short chapter!
  • Generally, use Linear space for everything but the final render
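
The 188 figure can be reproduced with the standard sRGB transfer function. This is a small sketch for checking the number, not part of the tutorial's code; `linearToSrgb` and `toByte` are hypothetical helper names.

```cpp
#include <cmath>

// Encodes a linear-light value in [0, 1] with the piecewise sRGB transfer function.
inline float linearToSrgb(float linear) {
  if (linear <= 0.0031308f) return 12.92f * linear;        // Linear segment near black
  return 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;  // Power-curve segment
}

// Converts an encoded value in [0, 1] to the nearest 8-bit level.
inline int toByte(float encoded) {
  return static_cast<int>(std::lround(encoded * 255.0f));
}
```

`toByte(linearToSrgb(0.5f))` gives 188: linear 0.5 encodes to roughly 0.735, and 0.735 * 255 rounds to 188.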

Next week:

Compute Shaders

Thanks for listening!

 

Questions?
