Graphics Programming Virtual Meetup


vk_mini_path_tracer
Chapter 4-5
Command Buffers and Writing an image
Link to the tutorial
https://nvpro-samples.github.io/vk_mini_path_tracer/
Source code
https://github.com/nvpro-samples/vk_mini_path_tracer
Chapter 4
Command Buffers
OpenGL vs Vulkan
"Execution of Work"
OpenGL:
- Immediately executed
- As-if rule applies
- Implicitly synchronized
- Large state machine
Vulkan:
- Deferred execution
- Manual submission
- Explicit synchronization
- Little state machine
- Automatically resets
Command Buffers
- Where we 'write' our GPU commands to
vkCmdDraw(command_buffer, ...);
- Categories of commands:
- Binding - Pipelines, Shader resources, Buffers
- Drawing/Executing (raster & compute)
- Synchronization - Barriers
- Data movement - Copying data, transitioning images
- Designed to be quick to write
- Recording a command only needs to write a few bytes and increment a pointer
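The "few bytes and a pointer bump" idea can be pictured with a toy recorder. This is only a sketch under made-up assumptions: `ToyCommandBuffer`, `ToyDrawCmd` and `ToyOpcode` are hypothetical names, and real drivers use their own hardware-specific encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model only: not how any real Vulkan driver encodes commands.
enum class ToyOpcode : std::uint32_t { Draw = 1 };

struct ToyDrawCmd {
  ToyOpcode     opcode;
  std::uint32_t vertexCount;
  std::uint32_t instanceCount;
};

class ToyCommandBuffer {
public:
  explicit ToyCommandBuffer(std::size_t capacityBytes)
      : m_storage(capacityBytes), m_offset(0) {}

  // "Recording" a command: copy a few bytes, then bump the write offset.
  void recordDraw(std::uint32_t vertexCount, std::uint32_t instanceCount) {
    const ToyDrawCmd cmd{ToyOpcode::Draw, vertexCount, instanceCount};
    assert(m_offset + sizeof(cmd) <= m_storage.size());
    std::memcpy(m_storage.data() + m_offset, &cmd, sizeof(cmd));
    m_offset += sizeof(cmd);
  }

  // Resetting only rewinds the offset; no memory is freed.
  void reset() { m_offset = 0; }

  std::size_t bytesUsed() const { return m_offset; }

private:
  std::vector<std::uint8_t> m_storage;
  std::size_t               m_offset;
};
```

Nothing is executed while recording; the bytes just sit in the buffer until something consumes them, which is the role a queue submission plays in Vulkan.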
Command Buffers cont'
- Commands in a command buffer aren't guaranteed to finish in the order they were recorded
- Must manually define synchronization
- Able to be recorded in parallel
- Each thread must record into its own command buffer, allocated from its own command pool
- Explicitly submitted to a 'Queue'
vkQueueSubmit(...);
Queues in Vulkan
- The place where you submit work
- Queues can support different capabilities & combinations of them
- Graphics
- Compute
- Transfer
- Grouped into "Queue Families"
- Can be multiple "Queues" in a single family
- Most hardware has an "uber" queue that supports all three capability types (abbreviated as the GCT queue)
Command Pools
- Hold the memory for Command Buffers
- A command pool can only work with one Queue family
- Multiple Command Buffers can be allocated from a single Command Pool
VkCommandPoolCreateInfo cmdPoolInfo = nvvk::make<VkCommandPoolCreateInfo>();
cmdPoolInfo.queueFamilyIndex = context.m_queueGCT;
VkCommandPool cmdPool;
NVVK_CHECK(vkCreateCommandPool(context, &cmdPoolInfo, nullptr, &cmdPool));
Quick Vulkan & NVVK notes
NVVK_CHECK() macro is for checking return values of Vulkan functions
Most Vulkan functions that can fail return `VkResult`, which is an enum
Returning `VK_SUCCESS` signals that the function didn't fail
VK_NULL_HANDLE is a macro for a null value (0). This is used when there is no 'valid' handle to use
In C++, `nullptr` can often be used instead, since handles are pointer types on 64-bit platforms
Allocating a Command Buffer
- Command buffers can be 'primary' or 'secondary'
- Secondary is useful for multi-threaded recording
- Secondary command buffers can't be submitted to a queue directly
- Instead they are 'called' by primary buffers (with vkCmdExecuteCommands)
- We will only use primary command buffers
VkCommandBufferAllocateInfo cmdAllocInfo = nvvk::make<VkCommandBufferAllocateInfo>();
cmdAllocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
cmdAllocInfo.commandPool = cmdPool;
cmdAllocInfo.commandBufferCount = 1;
VkCommandBuffer cmdBuffer;
NVVK_CHECK(vkAllocateCommandBuffers(context, &cmdAllocInfo, &cmdBuffer));
Command Buffer Lifecycle
- Multiple 'phases' once allocated
- Initial - Call "Begin" on it to make it ready to record
- Recording - Must "End" it when done recording
- Where we call `vkCmd*()` functions
- Executable - Ready to be submitted
- Pending - Has been submitted, currently running
- Can't modify any resources the command buffer might reference
- Returns to 'executable' once finished
- Can be 'reset' to start the cycle over again.
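The phases above can be pictured as a small state machine. The sketch below is a toy model, not Vulkan code: `ToyLifecycle` and `CmdBufState` are hypothetical names, and it leaves out the 'invalid' state that the specification also defines.

```cpp
#include <cassert>

// Toy model of the lifecycle described above; names are made up.
enum class CmdBufState { Initial, Recording, Executable, Pending };

struct ToyLifecycle {
  CmdBufState state = CmdBufState::Initial;

  bool begin()  { return transition(CmdBufState::Initial,    CmdBufState::Recording);  }
  bool end()    { return transition(CmdBufState::Recording,  CmdBufState::Executable); }
  bool submit() { return transition(CmdBufState::Executable, CmdBufState::Pending);    }
  bool finish() { return transition(CmdBufState::Pending,    CmdBufState::Executable); }  // GPU is done
  bool reset()  { return transition(CmdBufState::Executable, CmdBufState::Initial);    }

private:
  // A transition only succeeds from the expected source state.
  bool transition(CmdBufState from, CmdBufState to) {
    if (state != from) return false;
    state = to;
    return true;
  }
};
```

Calling `submit()` before `end()`, or `reset()` while the buffer is still pending, returns false in this model; in real Vulkan those are usage errors that the validation layers report.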
Begin the Command Buffer
VkCommandBufferBeginInfo beginInfo = nvvk::make<VkCommandBufferBeginInfo>();
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
NVVK_CHECK(vkBeginCommandBuffer(cmdBuffer, &beginInfo));
- ONE_TIME_SUBMIT - This recording will only be submitted once; don't reuse it.
- This call will 'reset' the command buffer if it had been used previously (the pool must have been created with VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT for that).
- This is just moving a pointer back to the start, nothing expensive
Record into the Command Buffer
- We want to 'fill' the GPU buffer with the same value, 0.5f
- This is to make sure we are actually modifying the data on the GPU
- The reinterpret cast is due to the API only accepting 'uint32_t'.
- We want it to be filled with the bit pattern of a float, thus we must do dirty things
const float fillValue = 0.5f;
const uint32_t& fillValueU32 = reinterpret_cast<const uint32_t&>(fillValue);
vkCmdFillBuffer(cmdBuffer, buffer.buffer, 0, bufferSizeBytes, fillValueU32);
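The reinterpret cast follows the tutorial. A slightly less 'dirty' way to get the same bit pattern is to copy the bytes with `std::memcpy`. The helper below is a sketch, not part of the tutorial; `floatBits` is a made-up name, and it assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a float as a uint32_t.
// Hypothetical helper; assumes float is a 32-bit IEEE 754 value.
std::uint32_t floatBits(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits wide");
  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));  // byte-wise copy avoids aliasing issues
  return bits;
}
```

The result of `floatBits(0.5f)` (0x3F000000) could then be passed as the last argument of `vkCmdFillBuffer` in place of `fillValueU32`.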
Problem:
Vulkan doesn't guarantee when a command will be finished
Consecutive commands in a command buffer may work in any order they like, so long as they follow manually defined 'synchronization points'
Meaning: Our 'Fill GPU Buffer' command might not finish before we start reading from it on the CPU
Solution:
Define the order things must happen in
Meaning: Use pipeline barriers to insert the desired order everything must happen in.
// Add a command that says "Make it so that memory writes by the vkCmdFillBuffer call
// are available to read from the CPU." (In other words, "Flush the GPU caches
// so the CPU can read the data.") To do this, we use a memory barrier.
VkMemoryBarrier memoryBarrier = nvvk::make<VkMemoryBarrier>();
memoryBarrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT; // Make transfer writes
memoryBarrier.dstAccessMask = VK_ACCESS_HOST_READ_BIT; // Readable by the CPU
vkCmdPipelineBarrier(cmdBuffer, // The command buffer
VK_PIPELINE_STAGE_TRANSFER_BIT, // From the transfer stage
VK_PIPELINE_STAGE_HOST_BIT, // To the CPU
0, // No special flags
1, &memoryBarrier, // An array of memory barriers
0, nullptr, 0, nullptr); // No other barriers
But what are Pipeline Barriers?
- They synchronize memory
- But done by specifying the 'stages' for the various memory operations
- A stage is a discrete 'step' that the GPU has when it is doing work
- Examples: Vertex shader, Fragment shader, Transfer
- Compute shaders are a stage too
- They are likely the most 'complex' part of learning Vulkan
- Necessary for optimal performance
- Not obvious how to use them from the get go
More about Pipeline Barriers
- Can be imagined like a scheduling dependency
- You have to finish task A before you can start task B
- Several 'Types' of barriers:
- Memory Barriers are what we just used
- Buffer Memory Barriers
- Can apply to a specific range of a buffer
- Image Memory Barriers
- Can apply to a specific image (& part of said image)
- Can perform 'layout transitions'
- More technical info is in the tutorial; what we have suits our needs for now
Ending a Command Buffer
Makes the command buffer ready to be submitted and executed
NVVK_CHECK(vkEndCommandBuffer(cmdBuffer));
Now to submit it!
Submitting a Command Buffer
VkSubmitInfo submitInfo = nvvk::make<VkSubmitInfo>();
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &cmdBuffer;
NVVK_CHECK(vkQueueSubmit(context.m_queueGCT, 1, &submitInfo, VK_NULL_HANDLE));
vkQueueSubmit performance note:
This call is expensive. When possible, put multiple command buffers into the same submit.
Now to read the data back! Right?
- Not quite, we can't just start reading yet
- We need to make sure the Command Buffer is finished executing
- But we have Pipeline Barriers right?
- That only guarantees that the memory is ready to read once `vkCmdFillBuffer` is finished
- It doesn't tell us when it is ready to be read from
"Easy Solution"
- Just wait for the GPU to finish everything it is doing
vkQueueWaitIdle(context.m_queueGCT);
- This will pause the running thread until the Queue we submitted the command buffer on finishes
- A "Sledge Hammer" type solution.
- If other unrelated work was happening, we would wait for that work too
- But: We aren't doing anything else so this is fine
"Better Solution"
- Use VkFences to wait on an individual submission
- Can pass a VkFence to vkQueueSubmit
- Now the fence can be waited upon
- 'vkWaitForFences' sleeps the thread only until the desired submission is finished
- Can go further with 'vkGetFenceStatus' to poll the fence without putting the thread to sleep
- Ultimately, using the simplest solution that works is best
Cleanup
Delete the Command Pool once we are done
vkDestroyCommandPool(context, cmdPool, nullptr);
Can free the Command Buffer individually, but it is easier to destroy the pool, which frees every Command Buffer allocated from it
Finally we are doing work on the GPU!
Output should now be:
First four elements: 0.500000, 0.500000, 0.500000, 0.500000
Chapter 5
Writing an image to the disk
MUCH easier than the previous Chapter
- Steps:
- Create Image
- Write data
- Close Image
- Success!
Add one library to the list of 'includes'
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include <fileformats/stb_image_write.h>
Change Printing code to:
float* fltData = reinterpret_cast<float*>(data);
stbi_write_hdr("out.hdr", render_width, render_height, 3, reinterpret_cast<float*>(data));
3 is for 3 channels, for RGB

Voila!
Tech Note: sRGB & Linearity
- We just wrote 0.5 to the entire image
- So that should be literally 0.5 we are seeing!
- Except no, while the image contains 0.5, we are seeing a slightly different color
- Most image editors will list an sRGB color of (188/255, 188/255, 188/255)
- sRGB uses a 'curve' because humans do not perceive brightness linearly
- The details are worth reading about but nuanced
- Don't want to ruin a perfectly good short chapter!
- Generally, use Linear space for everything but the final render
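Where the 188 comes from can be checked with the standard sRGB encoding curve. The sketch below is not part of the tutorial; `linearToSrgb` and `linearToSrgb8` are made-up helper names, and the constants are the usual piecewise sRGB transfer function.

```cpp
#include <cassert>
#include <cmath>

// Encodes a linear value in [0, 1] with the piecewise sRGB transfer function.
double linearToSrgb(double c) {
  assert(c >= 0.0 && c <= 1.0);
  return (c <= 0.0031308) ? 12.92 * c
                          : 1.055 * std::pow(c, 1.0 / 2.4) - 0.055;
}

// Same value quantized to the 0..255 range an image editor displays.
int linearToSrgb8(double c) {
  return static_cast<int>(std::lround(linearToSrgb(c) * 255.0));
}
```

For a linear value of 0.5 the curve gives roughly 0.735, so `linearToSrgb8(0.5)` yields 188, matching the color most image editors report for our image.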
Next week:
Compute Shaders
Thanks for listening!
Questions?
Graphics Programming Virtual Meetup
Vulkan Mini Path Tracer Chapter 4-5
By Charles Giessen