A typical Vulkan frame
In previous articles, we have seen how to initialize Vulkan and allocate memory for buffers/images. We’ve also looked into synchronization. We will now investigate how to use graphic and compute pipelines. This allows you to render triangles on the screen, or do efficient computations.
In Vulkan, each compute/graphic pass is implemented in 2 steps:
- Create a VkPipeline object. To do so, we need to declare e.g. shader modules, uniform types, framebuffer attachment types, depth/stencil test and write, etc. In some apps, you can create all VkPipeline objects during app initialization. But sometimes you will have to create pipelines while the app is already running.
- Execute the pass. Call vkCmdBindPipeline() and record subsequent commands (e.g. vkCmdDraw() or vkCmdDispatch()) to a command buffer.
At the end of the frame, call vkQueueSubmit() (submit the commands for execution) and vkQueuePresentKHR() (present the swapchain image). We have already explained the render loop synchronization in “Vulkan synchronization”.
As graphic pipelines are much more complicated, we will explore compute first.
Compute shaders in Vulkan
Most modern video games do not just render triangles on screen. There is a need for efficient calculations on a massive scale. One example is particle simulation: each particle has a position and a velocity, and collides with the scene objects. If you wanted, you could do this with a graphic pipeline. In WebGL-GPU-particles I did the simulation inside the vertex shader. It’s very clunky but certainly possible. Nowadays you would use compute shaders.
Here are the objects used to declare a compute pipeline in Vulkan:
- Uniforms layout to declare the type of data on each shader binding. Raw Vulkan has complicated concepts like VkDescriptorPool and VkDescriptorSetLayout. I recommend using the VK_KHR_push_descriptor device extension instead. Each pass (both graphic and compute) will always have a single VkDescriptorSetLayout. We will assign buffers/images to different bindings of this single descriptor set.
- Push constant layout. A small (guaranteed 128 bytes) packet of data. It functions the same as a uniform buffer, but we can easily change it per draw call. This is usually faster than writing to a mapped GPU buffer.
- VkPipelineLayout. Combines layouts of uniforms and push constants.
- VkPipeline. Created using vkCreateComputePipelines(). For the most part, it requires just a VkPipelineLayout and a reference to the shader module.
As you can see, 3 out of 4 objects describe how to assign data consumed by the GPU. We have to specify e.g. which buffers and images to use, values for constants, etc. While uniforms are the ‘usual’ way of handling this task, push constants also work if the data fits in 128 bytes (not all hardware can handle more). We then create a VkPipeline object and are ready to start the computations.
If you want to follow along, you can use one of Rust-Vulkan-TressFX’s simulation steps as a reference. It contains examples of both uniform buffers and push constants.
Declaring uniforms
Vulkan has an always-enabled GLSL extension called GL_KHR_vulkan_glsl. It describes the GLSL changes with respect to OpenGL. E.g. gl_VertexID becomes gl_VertexIndex and gl_InstanceID becomes gl_InstanceIndex. It also defines layout(push_constant) and layout(set=1, ...). Non-opaque uniforms (floats, vectors, matrices) are now required to be members of a uniform buffer. Standalone declarations like uniform float u_blurRadius; are no longer valid. Opaque types such as samplers are still declared on their own.
As mentioned above, I recommend using the VK_KHR_push_descriptor device extension. It frees the user from managing VkDescriptorPools and other unpleasantries. Let’s look at an example of GLSL shader code that declares uniforms:
layout(binding=0) uniform GlobalConfigUniformBuffer { ... };
layout(binding=1) uniform sampler2D u_sourceTex;
layout(binding=2) uniform sampler2D u_linearDepthTex;
The bindings do not have to be consecutive, but it’s a good practice, as gaps can negatively affect performance. Make sure to set descriptorCount to 0 for each unused binding.
With VK_KHR_push_descriptor we no longer have to declare a set, as there is only one. vkCreateDescriptorSetLayout() requires VkDescriptorSetLayoutCreateInfo. Remember to set VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR in VkDescriptorSetLayoutCreateInfo.flags. The description of each binding contains:
- uint32_t binding. Same as in the GLSL code.
- VkDescriptorType descriptorType. Data type. Validated using Vulkan validation layers. Some example values:
  - VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER. A uniform buffer is a read-only buffer with CPU-written values. This can be data shared by every pass in a frame (like camera position, viewport dimensions, near and far planes, etc.). Or the data for a single currently rendered mesh.
  - VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER. Sampled image. E.g. texture from a hard drive or an attachment from the previous graphic pass.
  - VK_DESCRIPTOR_TYPE_STORAGE_BUFFER. Storage buffer. Big buffer with a lot of data that we can freely index (read/write) inside shaders.
  - VK_DESCRIPTOR_TYPE_STORAGE_IMAGE. Special image that does not use samplers. Allows reading and writing exact pixels as well as atomic operations.
- uint32_t descriptorCount. Used when this binding declares an array. Otherwise, it has a value of 1. Set to 0 if the binding is skipped.
- VkShaderStageFlags stageFlags. Shader stages e.g. VK_SHADER_STAGE_FRAGMENT_BIT. Validated using validation layers.
- VkSampler* pImmutableSamplers. Optional field with VkSamplers.
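To make the fields above a bit more tangible, here is a minimal sketch in plain Rust, without any Vulkan calls. BindingDesc, DescriptorKind and check_bindings() are hypothetical names that only mirror VkDescriptorSetLayoutBinding. The helper reports duplicated binding indices and skipped ones, assuming you want to keep the bindings consecutive.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a few `VkDescriptorType` values.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum DescriptorKind {
    UniformBuffer,
    CombinedImageSampler,
    StorageBuffer,
    StorageImage,
}

/// Hypothetical mirror of some `VkDescriptorSetLayoutBinding` fields.
#[derive(Copy, Clone, Debug)]
pub struct BindingDesc {
    pub binding: u32,
    pub kind: DescriptorKind,
    pub descriptor_count: u32,
}

/// Returns human-readable problems: duplicated and skipped binding indices.
pub fn check_bindings(bindings: &[BindingDesc]) -> Vec<String> {
    let mut problems = Vec::new();
    let mut seen = BTreeSet::new();
    for desc in bindings {
        // `insert` returns false if the index was already present
        if !seen.insert(desc.binding) {
            problems.push(format!("binding {} is declared more than once", desc.binding));
        }
    }
    // every index below the highest used one should be present
    if let Some(&max_binding) = seen.iter().next_back() {
        for idx in 0..max_binding {
            if !seen.contains(&idx) {
                problems.push(format!("binding {} is skipped", idx));
            }
        }
    }
    problems
}
```

For bindings 0, 1 and 3 the helper reports that binding 2 is skipped. Validation layers catch the duplicated index anyway, so treat this only as an illustration of the data involved.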
All in all, the whole operation is quite straightforward and can be simplified to the following example Rust code:
let bindings: Vec<vk::DescriptorSetLayoutBinding> = vec![
    // custom utils to construct VkDescriptorSetLayoutBinding
    create_ubo_binding(0, vk::ShaderStageFlags::FRAGMENT),
    create_texture_binding(1, vk::ShaderStageFlags::FRAGMENT),
    create_texture_binding(2, vk::ShaderStageFlags::FRAGMENT),
];
let create_info = vk::DescriptorSetLayoutCreateInfo::builder()
    .flags(vk::DescriptorSetLayoutCreateFlags::PUSH_DESCRIPTOR_KHR)
    .bindings(&bindings)
    .build();
let ds_layout = device
    .create_descriptor_set_layout(&create_info, None)
    .expect("Failed to create DescriptorSetLayout");
As you can see, we are describing everything that we have already specified in the GLSL shader code. Fortunately, there are tons of existing libraries that can use reflection to generate this data:
- https://github.com/KhronosGroup/SPIRV-Reflect.
- https://github.com/KhronosGroup/SPIRV-Cross. Has a user guide.
- AMD’s https://github.com/GPUOpen-LibrariesAndSDKs/V-EZ. Seems to be inactive.
- In Rust:
  - https://github.com/gfx-rs/rspirv. A comprehensive set of tools to manage SPIR-V from Rust, might be overkill.
  - https://github.com/Traverse-Research/rspirv-reflect. Does not work with VK_KHR_shader_non_semantic_info. Embark Studios kajiya uses a fork.
  - https://github.com/PENGUINLIONG/spirq-rs.
  - https://github.com/gwihlidal/spirv-reflect-rs. Wrapper for SPIRV-Reflect.
  - https://github.com/grovesNL/spirv_cross. Wrapper for SPIRV-Cross.
Later in this article, we will see how to bind the uniforms to the actual resources.
Declaring push constants
Vulkan push constants refer to a small amount of data that we can set during pass execution. The Vulkan specification guarantees at least 128 bytes. Anything more is hardware-dependent. Imagine we are writing a separable filter like a blur. For performance reasons we first do a horizontal blur, followed by memory barriers and the vertical blur. But how do we inform the shader about the direction of the blur? We could bind a uniform buffer with this data and change the value between draw commands. Or just use push constants to transfer a vec2 (memory aligned to vec4):
// In GLSL:
layout(push_constant) uniform Constants {
  // Direction of the blur:
  // - First pass: float2(1.0, 0.0)
  // - Second pass: float2(0.0, 1.0)
  vec4 u_sssDirection;
};
If your shader uses push constants, you will need to declare them using VkPushConstantRange:
fn get_push_constant_range() -> vk::PushConstantRange {
    vk::PushConstantRange::builder()
        .offset(0)
        .size(size_of::<SSSBlurPassPushConstants>() as _)
        // Stages that read the constants. The same flags are later passed
        // to vkCmdPushConstants(). Use COMPUTE for a compute pass.
        .stage_flags(vk::ShaderStageFlags::FRAGMENT)
        .build()
}

#[derive(Copy, Clone, Debug)]
#[repr(C)]
struct SSSBlurPassPushConstants {
    blur_direction: Vec4,
}

unsafe impl bytemuck::Zeroable for SSSBlurPassPushConstants {}
unsafe impl bytemuck::Pod for SSSBlurPassPushConstants {}
Later in this article, we will see how to assign the values to push constants.
Creating compute VkPipeline
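To create a compute pipeline from VkDescriptorSetLayout and VkPushConstantRange you need to:
- Create a pipeline layout that combines VkDescriptorSetLayout and VkPushConstantRange.
- Create the pipeline itself with vkCreateComputePipelines(), passing the pipeline layout and the compute shader stage.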
Neither vkCreatePipelineLayout() nor vkCreateComputePipelines() has many options. I can guarantee that 9 out of 10 times you will use this exact code:
pub unsafe fn create_compute_pipeline(
    device: &ash::Device,
    uniform_layouts: &[vk::DescriptorSetLayout],
    push_constant_ranges: &[vk::PushConstantRange],
    pipeline_cache: vk::PipelineCache,
    shader_path: &str,
) -> (vk::PipelineLayout, vk::Pipeline) {
    // create vk::PipelineLayout
    let pl_create_info = vk::PipelineLayoutCreateInfo::builder()
        .set_layouts(uniform_layouts)
        .push_constant_ranges(push_constant_ranges)
        .build();
    let pipeline_layout = device
        .create_pipeline_layout(&pl_create_info, None)
        .expect("Failed to create_pipeline_layout");

    // create vk::Pipeline
    let (module_cs, stage_cs) = load_compute_shader(device, shader_path);
    let create_info = vk::ComputePipelineCreateInfo::builder()
        .stage(stage_cs)
        .layout(pipeline_layout)
        .build();
    let pipelines = device
        .create_compute_pipelines(pipeline_cache, &[create_info], None)
        .expect("Failed to create_compute_pipelines");
    device.destroy_shader_module(module_cs, None);

    (pipeline_layout, take_first(pipelines))
}
The only undefined function, apart from the self-explanatory take_first(), is load_compute_shader():
unsafe fn load_shader_module(device: &ash::Device, path: &std::path::Path) -> vk::ShaderModule {
    let mut file = std::fs::File::open(path)
        .expect(&format!("Could not open file '{}'", path.to_string_lossy()));
    let spirv_code = ash::util::read_spv(&mut file).unwrap();
    let create_info = vk::ShaderModuleCreateInfo::builder()
        .code(&spirv_code)
        .build();
    device
        .create_shader_module(&create_info, None)
        .expect(&format!(
            "Failed to create shader module from file '{}'",
            path.to_string_lossy()
        ))
}

unsafe fn load_shader(
    device: &ash::Device,
    stage: vk::ShaderStageFlags,
    path: &std::path::Path,
) -> (vk::ShaderModule, vk::PipelineShaderStageCreateInfo) {
    // Entry point name as a NUL-terminated C string. Built from bytes,
    // so it does not depend on whether c_char is i8 or u8 on the platform.
    let shader_fn_name = std::ffi::CStr::from_bytes_with_nul(b"main\0").unwrap();
    let shader_module = load_shader_module(device, path);
    let stage_info = vk::PipelineShaderStageCreateInfo::builder()
        .stage(stage)
        .module(shader_module)
        .name(shader_fn_name)
        .build();
    (shader_module, stage_info)
}

pub unsafe fn load_compute_shader(
    device: &ash::Device,
    shader_path: &str,
) -> (vk::ShaderModule, vk::PipelineShaderStageCreateInfo) {
    load_shader(
        device,
        vk::ShaderStageFlags::COMPUTE,
        std::path::Path::new(shader_path),
    )
}
We will use load_shader() with a graphic pipeline later on too. I recommend adding a check that the file has the .spv extension. Surely, no one ever accidentally tried to load a .glsl file, right?
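Such a check could look like the sketch below. It uses only the standard library. The function names are my own (hypothetical), and the second check is an extra assumption of mine: it looks at the file content for the SPIR-V magic number (0x07230203), which starts every SPIR-V module.

```rust
use std::path::Path;

/// First 32-bit word of every SPIR-V module.
pub const SPIRV_MAGIC: u32 = 0x0723_0203;

/// Returns true if the path ends with the `.spv` extension (case-insensitive).
pub fn has_spv_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("spv"))
}

/// Returns true if `bytes` could be a SPIR-V binary: at least one word,
/// a multiple of 4 bytes, and starting with the magic number
/// (in either byte order).
pub fn looks_like_spirv(bytes: &[u8]) -> bool {
    if bytes.len() < 4 || bytes.len() % 4 != 0 {
        return false;
    }
    let first_word = [bytes[0], bytes[1], bytes[2], bytes[3]];
    u32::from_le_bytes(first_word) == SPIRV_MAGIC
        || u32::from_be_bytes(first_word) == SPIRV_MAGIC
}
```

You could call has_spv_extension() at the start of load_shader_module() and fail with a readable message instead of a cryptic shader module error.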
This concludes creating a compute pipeline. It can now be used to execute physics simulations or other compute tasks.
Executing compute dispatch
To dispatch a compute shader using VkPipeline you will usually have code like so:
add_synchronization_barriers(...);
device.cmd_bind_pipeline(
    command_buffer,
    vk::PipelineBindPoint::COMPUTE,
    pipeline,
);
bind_uniforms(...);
bind_push_constants(...);
let group_count_x = get_group_count_x(...);
device.cmd_dispatch(command_buffer, group_count_x, 1, 1);
That’s all there is to it. If you’ve used compute shaders in OpenGL, you already know about workgroup dimensions. In CUDA terms, these are the number of blocks per grid and the number of threads per block. vkCmdDispatch() lets you specify groupCountX (calculated by get_group_count_x()), as well as groupCountY and groupCountZ (both 1 in the code above). As you can see, using compute passes in Vulkan can be a bit tedious, but it’s simple. Don’t forget that RenderDoc offers a debugger!
It’s useful to have a callback before and after each compute/render pass. It’s used to assign a profiler scope or a debug name.
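A small sketch of such a callback, in plain Rust. PassHooks, record_pass() and EventLog are hypothetical names. A real implementation would start and end a profiler scope or a debug label instead of pushing strings into a Vec.

```rust
/// Hypothetical hook interface called around every compute/render pass.
pub trait PassHooks {
    fn before_pass(&mut self, name: &str);
    fn after_pass(&mut self, name: &str);
}

/// Runs `record` between the two hooks, so no pass can forget them.
pub fn record_pass<H: PassHooks, R>(
    hooks: &mut H,
    name: &str,
    record: impl FnOnce() -> R,
) -> R {
    hooks.before_pass(name);
    let result = record();
    hooks.after_pass(name);
    result
}

/// Example hook that only stores the events in memory.
#[derive(Default)]
pub struct EventLog {
    pub events: Vec<String>,
}

impl PassHooks for EventLog {
    fn before_pass(&mut self, name: &str) {
        self.events.push(format!("begin {}", name));
    }
    fn after_pass(&mut self, name: &str) {
        self.events.push(format!("end {}", name));
    }
}
```

The closure passed as record would contain the bind and dispatch calls of a single pass.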
Let’s now look at how to bind values for uniforms and push constants. This is something we will do for graphic passes too.
Binding uniform values
Let’s look again at the sample GLSL code that declares uniforms:
layout(binding=0) uniform GlobalConfigUniformBuffer { ... };
layout(binding=1) uniform sampler2D u_sourceTex;
layout(binding=2) uniform sampler2D u_linearDepthTex;
It declares:
- GlobalConfigUniformBuffer. Uniform buffer struct that will be bound to some VkBuffer memory (with offset from the buffer start).
- u_sourceTex, u_linearDepthTex. Sampled images.
Fortunately, with VK_KHR_push_descriptor, it’s easy to assign the resources to each binding. The vkCmdPushDescriptorSetKHR() function takes the following parameters:
- VkCommandBuffer commandBuffer. Self explanatory.
- VkPipelineBindPoint pipelineBindPoint. Either VK_PIPELINE_BIND_POINT_COMPUTE or VK_PIPELINE_BIND_POINT_GRAPHICS.
- VkPipelineLayout layout. We have already created this object.
- uint32_t set. Always 0 when using VK_KHR_push_descriptor.
- VkWriteDescriptorSet* pDescriptorWrites. Assignments between bindings and resources.
The extension is called push descriptor, which has nothing in common with push constants.
Some fields in VkWriteDescriptorSet are used only for VkBuffers, some only for VkImages, and some for both:
- VkDescriptorSet dstSet. Ignored when using VK_KHR_push_descriptor, so it can stay 0 (VK_NULL_HANDLE).
- uint32_t dstBinding. Value from GLSL.
- uint32_t dstArrayElement. Usually set to 0. It’s used only with arrays (e.g. layout(binding=0) uniform MaterialsData { Material u_Materials[]; };). It indicates offset into the array. VkWriteDescriptorSet was originally used to update descriptorCount descriptors starting at offset dstArrayElement.
- uint32_t descriptorCount. Number of elements in pImageInfo/pBufferInfo/pTexelBufferView. Only one pointer contains a value, the rest are NULL. This field is automatically calculated by Ash.
- VkDescriptorType descriptorType. Type of the uniform. E.g. a uniform/storage buffer or sampled/storage image etc.
- VkDescriptorImageInfo* pImageInfo. Descriptor for images. Equivalent to a tuple: (VkImageView, VkImageLayout, Option<VkSampler>).
- VkDescriptorBufferInfo* pBufferInfo. Descriptor for buffers. Equivalent to a tuple: (VkBuffer, offset, size).
- VkBufferView* pTexelBufferView. Used only if you want to access buffer contents using image operations.
Personally, I’ve created a utility function that handles declarations like:
let uniform_resources = [
    BindableResource::Buffer {
        usage: BindableBufferUsage::UBO, // or BindableBufferUsage::SSBO
        binding: BINDING_INDEX_CONFIG_UBO, // value from GLSL
        buffer: (config_buffer, 0, vk::WHOLE_SIZE), // (VkBuffer, offset, size)
    },
    BindableResource::SampledImage {
        binding: BINDING_INDEX_SCENE_DEPTH, // value from GLSL
        image_view: depth_stencil_image_view, // VkImageView
        layout: depth_stencil_image_layout, // VkImageLayout
        sampler: sampler_nearest, // VkSampler
    },
    BindableResource::StorageImage {
        binding: BINDING_INDEX_HEAD_POINTERS_IMAGE, // value from GLSL
        image_view: ppll_head_pointers_image_view, // VkImageView
        layout: ppll_head_pointers_layout, // VkImageLayout
    },
];
bind_resources_to_descriptors_for_compute(..., uniform_resources);
You can find the code in uniforms.rs: bind_resources_to_descriptors(). I’ve slightly changed names in the code sample above to make it easier to understand. After all, it’s just a simple call to vkCmdPushDescriptorSetKHR(). With this, the uniforms are set for the next graphic/compute command. Don’t forget you can use RenderDoc to preview the values.
Another approach to uniforms is to use bindless descriptors. You have 1 giant collection of descriptor sets that is shared by every shader. Read more in Vincent Parizet’s “Bindless descriptor sets”.
Setting push constants values
Setting the push constants values requires a call to vkCmdPushConstants(). It takes VkPipelineLayout, VkShaderStageFlags (e.g. VK_SHADER_STAGE_FRAGMENT_BIT; it has to match the stages declared in VkPushConstantRange), and the CPU memory region that contains values (defined as offset, size, and void* pValues). The documentation says you can use offset and size to partially update the data. Given we have only 128 bytes, there isn’t much difference if we overwrite the whole region. If the size limit is a concern, there are several ways to circumvent that. You can e.g. declare an array of uniform buffers and then use push constants to provide an index into the array. This is often used with materials. In GLSL you have layout(binding=0) uniform MaterialsData { Material u_Materials[]; };. Using push constants you can then provide an index to this array on a per-drawcall basis. Not the most performant solution, but should work fine.
Void pointers like void* pValues can be a daily occurrence in C/C++ (remember about alignment!). In Rust I recommend bytemuck:
// We have declared the `SSSBlurPassPushConstants` struct
// in the section dedicated to declaring push constants
let push_constants = SSSBlurPassPushConstants {
    blur_direction: vec4(blur_direction.x, blur_direction.y, 0.0, 0.0),
};
let push_constants_bytes = bytemuck::bytes_of(&push_constants);
device.cmd_push_constants(
    command_buffer,
    pipeline_layout,
    vk::ShaderStageFlags::FRAGMENT,
    0, // offset
    push_constants_bytes, // data
);
If you want to verify that push constants values are correct (data alignment!), I recommend RenderDoc.
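If you are curious what ends up in that byte slice, here is a dependency-free sketch. It is not the bytemuck-based code from the project. BlurPushConstants and to_bytes() are hypothetical, and the struct holds a plain [f32; 4] that mirrors the vec4 from the GLSL block. It also checks at compile time that the struct fits in the guaranteed 128 bytes.

```rust
/// Push constant size that every Vulkan implementation has to support.
pub const GUARANTEED_PUSH_CONSTANT_BYTES: usize = 128;

/// Hypothetical CPU-side mirror of a push constant block with one `vec4`.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct BlurPushConstants {
    pub blur_direction: [f32; 4],
}

// Compile-time check: the build fails if the struct grows past 128 bytes.
const _: () = assert!(
    std::mem::size_of::<BlurPushConstants>() <= GUARANTEED_PUSH_CONSTANT_BYTES
);

impl BlurPushConstants {
    /// Serializes the struct field by field, in native byte order.
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut bytes = [0u8; 16];
        for (chunk, value) in bytes.chunks_exact_mut(4).zip(self.blur_direction) {
            chunk.copy_from_slice(&value.to_ne_bytes());
        }
        bytes
    }
}
```

The resulting array can be passed wherever a &[u8] is expected. For larger structs bytemuck is far less tedious, which is why I use it.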
Rendering in Vulkan
To draw a mesh using a render pass we will need a few objects first:
- VkDescriptorSetLayout, VkPushConstantRange, VkPipelineLayout. Declare uniforms and push constants. Just like we have seen for compute passes.
- VkRenderPass. Contains information about expected attachments and the order of subpasses.
- VkPipeline. Combines all the above objects. Adds other things like vertex_input_state, input_assembly_state, rasterization_state, depth_stencil_state, color_blend_state etc. For compute pass, we created VkPipeline using vkCreateComputePipelines(). Now we will use vkCreateGraphicsPipelines(). VkPipeline for a graphic pass is by far the most complex object that exists in Vulkan.
Let’s look at each of the above objects and see what settings are available. If you want, you can follow along with the source code for Rust Vulkan TressFX’s SSSBlurPass.
Creating VkRenderPass
VkRenderPass is a combination of attachments and subpasses. vkCreateRenderPass() requires us to fill VkRenderPassCreateInfo.
If you are starting with Vulkan, I recommend having only one subpass inside each render pass. This will help you plan out synchronization. For example, image layout changes and barriers recorded between vkCmdBeginRenderPass() and vkCmdEndRenderPass() have a different scope than the global ones recorded outside of a render pass.
I advise you to manually write all barriers (vkCmdPipelineBarrier()) before each graphic/compute pass. With VkAttachmentDescription you can do layout transitions, but it’s a bit clunky. Both VkAttachmentDescription and VkSubpassDependency need knowledge about resource usage before the current pass. Imagine you write a pass that takes a depth buffer image as a uniform. To set VkAttachmentDescription.initialLayout, VkSubpassDependency.srcStageMask, and VkSubpassDependency.srcAccessMask you need to know what the last usage of the depth buffer was. Was it written in the forward pass? Was it read as a uniform for SSAO? You can hardcode the values, but if you swap the order of passes, there might be a lot of bug fixing to do. It’s much easier to get this info during the drawing process. Since the previous pass already recorded its commands, you can have a ‘last layout’ field. On the other hand, this solution could fail in the case of multithreaded command recording. Of course, if you have a render graph you already have this information.
Use the synchronization Vulkan validation layer to easily fix invalid image layouts and intra-pass dependencies.
Since we decided to nearly entirely skip subpasses, all that is left is attachment definitions. The important fields are:
- VkFormat format. Valid VkFormat value e.g. VK_FORMAT_R32G32B32A32_SFLOAT, VK_FORMAT_R8G8B8A8_UINT, or VK_FORMAT_D24_UNORM_S8_UINT.
- VkSampleCountFlagBits samples. Choose a number of samples from a list of predefined variants. Remember to also set the VkPipelineMultisampleStateCreateInfo.rasterizationSamples field later. With VK_EXT_sample_locations, you can also specify the x and y coordinates of each sample.
- VkAttachmentLoadOp loadOp, VkAttachmentStoreOp storeOp. Make sure to read the docs for VkAttachmentLoadOp and VkAttachmentStoreOp carefully. E.g. any value other than VK_ATTACHMENT_LOAD_OP_LOAD can clear/discard current memory content. If an attachment has depth/stencil format, loadOp and storeOp are operations for depth. stencilLoadOp, stencilStoreOp are only for stencil. For color attachments, you can set stencilLoadOp and stencilStoreOp to 0. This way validation layers will not complain if an uninitialized memory results in a random value.
- VkImageLayout initialLayout, VkImageLayout finalLayout. See above why using these fields for an implicit layout transition barrier is clunky. In the simplest case, both layouts have the same value:
  - VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL - color attachment (including swapchain image),
  - one of VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL / VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL / VK_IMAGE_LAYOUT_STENCIL_ATTACHMENT_OPTIMAL - for depth/stencil image.
With a few utility functions, this is what the render pass definition looks like for Rust-Vulkan-TressFX’s SSSBlurPass. The pass blurs forward render output based on the ‘skin’ stencil mask.
unsafe fn create_render_pass(device: &ash::Device) -> vk::RenderPass {
    // depth attachment is not written to, but it's needed for the stencil test
    let depth_attachment = create_depth_stencil_attachment(
        0, // idx
        ForwardPass::DEPTH_TEXTURE_FORMAT,
        vk::AttachmentLoadOp::LOAD, // depth_load_op
        vk::AttachmentStoreOp::STORE, // depth_store_op
        vk::AttachmentLoadOp::LOAD, // stencil_load_op
        vk::AttachmentStoreOp::STORE, // stencil_store_op
        vk::ImageLayout::DEPTH_STENCIL_ATTACHMENT_OPTIMAL, // initial and final layout
    );
    // result image
    let color_attachment = create_color_attachment(
        1, // idx
        ForwardPass::DIFFUSE_TEXTURE_FORMAT,
        vk::AttachmentLoadOp::LOAD, // load_op
        vk::AttachmentStoreOp::STORE, // store_op
        vk::ImageLayout::COLOR_ATTACHMENT_OPTIMAL, // or (for swapchain image) PRESENT_SRC_KHR
    );
    create_render_pass_from_attachments(device, Some(depth_attachment), &[color_attachment])
}
The code for create_render_pass_from_attachments() is in Rust-Vulkan-TressFX’s src\vk_utils\render_pass.rs. As you can see, if subpasses are not used, VkRenderPass is not that complicated.
Even if you use a depth/stencil buffer only for tests, you still have to declare it in VkRenderPass with VK_ATTACHMENT_LOAD_OP_LOAD. It does not matter whether you write to it.
Keep in mind that VkRenderPass does not allocate memory nor create VkImages. It only declares that the pass will use VkImage with format e.g. VK_FORMAT_R32G32B32A32_SFLOAT. Later, we will specify VkRenderPassBeginInfo.framebuffer based on VkImageViews (themselves derived from GPU-memory-backed VkImages).
There is a popular device extension VK_KHR_dynamic_rendering that simplifies rendering. It removes the concept of subpasses which means that using them for image layout transitions is no longer possible. You might notice that this is exactly what we did in the code above. With this extension, VkRenderPass becomes obsolete and is supplanted by VkPipelineRenderingCreateInfo. It also removes the need for VkFramebuffer and vkCmdBeginRenderPass(). Data from both is aggregated in vkCmdBeginRenderingKHR() that takes e.g. an array of VkImageViews to use as color attachments. Conversion between VK_KHR_dynamic_rendering and the code style I suggested above is trivial. You can find better examples in Lesley Lai’s “VK_KHR_dynamic_rendering tutorial”.
Creating graphic pipeline
VkPipeline for a graphic pass combines VkPipelineLayout and VkRenderPass. It also allows controlling e.g. VertexInputState, InputAssemblyState, RasterizationState, DepthStencilState, or ColorBlendState.
Creating VkPipeline requires filling all fields of the VkGraphicsPipelineCreateInfo. It’s tedious. I recommend writing a utility that will initialize VkGraphicsPipelineCreateInfo with some default values. This way:
- There is only one place to fix the errors. As you might have read in my “Debugging Vulkan using RenderDoc” article, I made the mistake of using memory after Vec went out of scope. This crashed RenderDoc at random. Fixing it was easy enough.
- Makes it impossible for certain classes of errors to happen. In Rust, Ash will happily initialize VkStencilOpState.write_mask to 0. It’s a bit dreary to remember to set it every time.
- It shows what is actually important. In Rust Vulkan TressFX’s forward rendering pass, the only thing that matters is depth stencil settings. I have worked with many people who like to copy-paste entire pages of code or JIRA ticket descriptions. Nothing says ‘a job’ more than diffing text of 5 JIRA tickets! As they say “If I had more time, I would have written a shorter letter”.
The default values I’ve chosen are ones that can render a fullscreen quad. E.g. depth, stencil test/write disabled, no culling, blend mode to override current content, etc.
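The same "defaults first, override only what matters" idea can be shown without any Vulkan types. In the sketch below PipelineSettings, CullMode and forward_like_settings() are hypothetical, and the default values follow the fullscreen quad description above. The real code works on VkGraphicsPipelineCreateInfo instead.

```rust
/// Hypothetical, heavily reduced set of pipeline options.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum CullMode {
    None,
    Back,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct PipelineSettings {
    pub depth_test: bool,
    pub depth_write: bool,
    pub stencil_test: bool,
    pub cull_mode: CullMode,
    pub blend_enabled: bool,
}

impl Default for PipelineSettings {
    /// Defaults that can render a fullscreen quad: no depth/stencil,
    /// no culling, new color overrides the current content.
    fn default() -> Self {
        Self {
            depth_test: false,
            depth_write: false,
            stencil_test: false,
            cull_mode: CullMode::None,
            blend_enabled: false,
        }
    }
}

/// A pass that cares only about depth overrides just these two fields.
pub fn forward_like_settings() -> PipelineSettings {
    PipelineSettings {
        depth_test: true,
        depth_write: true,
        ..PipelineSettings::default()
    }
}
```

Reading forward_like_settings() tells you immediately which settings are special for the pass. Everything else stays at the shared defaults.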
I’ve also suggested a similar approach in the “OpenGL state management” article. At least in Vulkan, the settings have less unhinged names.
It’s worthwhile to create a few functions to generate common settings combinations. E.g. fn stencil_write_if_touched(reference: u32, override_current: bool) or depth_stencil_noop() etc. Usefulness depends on use cases. For example, in Rust-Vulkan-TressFX, there were only 4 different use cases for depth/stencil. Similar utilities proved helpful for the rest of VkGraphicsPipelineCreateInfo's fields.
Unfortunately, VkGraphicsPipelineCreateInfo has a serious problem from the API usage standpoint. It contains a lot of pointers. So if you write a function to initialize this structure, you cannot just return it to the caller. The local variables would be out of scope. The two easiest solutions would be to:
- Have a class that stores all transient data in members.
- Use a closure.
Let’s be honest, classes and closures are the same thing. It is just a question of which syntax you prefer.
Here is simplified code from Rust Vulkan TressFX:
pub fn create_graphic_pipeline_with_defaults(
    render_pass: &vk::RenderPass,
    pipeline_layout: &vk::PipelineLayout,
    shader_paths: (&str, &str), // vertex, fragment shader .spv paths
    color_attachment_count: usize, // used for default blend state
    // callback that takes pre-filled `VkGraphicsPipelineCreateInfo`,
    // overrides the default values if needed and returns
    // final VkPipeline object.
    creator: impl Fn(vk::GraphicsPipelineCreateInfoBuilder) -> vk::Pipeline,
) -> vk::Pipeline {
    let stages = load_render_shaders(shader_paths);
    let create_info_builder = vk::GraphicsPipelineCreateInfo::builder()
        ... // set other default values
        .stages(&stages)
        .layout(*pipeline_layout)
        .render_pass(*render_pass);
    // invoke the callback with prefilled values
    creator(create_info_builder)
}

// usage:
create_graphic_pipeline_with_defaults(
    render_pass,
    pipeline_layout,
    Self::SHADER_PATHS,
    Self::COLOR_ATTACHMENT_COUNT,
    |builder| {
        let depth_stencil = vk::PipelineDepthStencilStateCreateInfo::builder()
            ...
            .build();
        let pipeline_create_info = builder
            .vertex_input_state(...)
            .depth_stencil_state(&depth_stencil)
            .build();
        create_pipeline(device, pipeline_cache, pipeline_create_info)
    },
)
You can find the full usage sample in ForwardPass.create_pipeline(), and the utility itself in pipeline.rs.
As for what each struct field does, I recommend reading the docs carefully. If you have worked with any graphic API before, you will find the options familiar. Some of the options I’ve described in detail in the “OpenGL state management” article. For some fields, you can use dynamic state to skip the value for now. You will have to provide it when executing the pipeline instead.
Think about it as providing a context for the shader compiler. SPIR-V code contained in .spv files is quite generic. The more information we provide during compilation, the more optimizations the driver can do.
VkPipeline explosion
VkGraphicsPipelineCreateInfo is the most complex object in Vulkan. Unfortunately, you might have to create a lot of VkPipelines for a single graphic pass. In Rust-Vulkan-TressFX, both mesh and hair can cast shadows. The fragment shader is the same in both cases. Only vertex shader changes. This means that for this pass I had to create:
- 1 VkRenderPass. Same attachments for both meshes and hair.
- 1 VkDescriptorSetLayout. Only hair rendering required uniforms (storage buffers for hair positions, some per-object settings, etc.).
- 1 VkPushConstantRange. Both meshes and hair can use the same Push Constants layout. It includes the model matrix, shadow caster position, and viewport size.
- 2 VkPipelineLayouts. One for meshes, one for hair.
- 2 VkPipelines. One for meshes, one for hair.
This gets exponentially worse in more complex apps. There might be many vertex layouts. Or material properties that are either a constant or are sampled from a texture. To solve this, some of the values can be provided during pipeline execution as a dynamic state. There are Vulkan extensions that extend which state can be dynamic: VK_EXT_extended_dynamic_state, VK_EXT_extended_dynamic_state2, VK_EXT_extended_dynamic_state3, and VK_EXT_vertex_input_dynamic_state. Probably easiest if you look through VkDynamicState values yourself.
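Whatever cannot be made dynamic still multiplies the number of pipelines. One way to keep track of them (my assumption, not something taken from Rust-Vulkan-TressFX) is a store that creates a pipeline the first time a given combination of settings is requested. PipelineKey, PipelineStore and the shader file names used later are hypothetical. P stands for the pipeline handle type.

```rust
use std::collections::HashMap;

/// Hypothetical description of everything that forces a separate pipeline.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PipelineKey {
    pub vertex_shader: String,
    pub fragment_shader: String,
    pub depth_write: bool,
}

/// Creates each pipeline lazily and reuses it for identical keys.
pub struct PipelineStore<P> {
    pipelines: HashMap<PipelineKey, P>,
}

impl<P> PipelineStore<P> {
    pub fn new() -> Self {
        Self { pipelines: HashMap::new() }
    }

    /// Returns the pipeline for `key`, calling `create` only on first use.
    pub fn get_or_create(
        &mut self,
        key: &PipelineKey,
        create: impl FnOnce(&PipelineKey) -> P,
    ) -> &P {
        if !self.pipelines.contains_key(key) {
            let pipeline = create(key);
            self.pipelines.insert(key.clone(), pipeline);
        }
        &self.pipelines[key]
    }

    /// Number of pipelines created so far.
    pub fn len(&self) -> usize {
        self.pipelines.len()
    }
}
```

For the shadow pass described above, the store would end up with 2 entries: the keys for meshes and hair differ only in the vertex shader. This does not reduce the number of pipelines, it only makes the growth visible and avoids creating duplicates.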
Another solution for vertex layouts is to skip vertex buffers. Our shader can fetch and interpret vertex data itself. This is often done using VK_KHR_buffer_device_address. Read more in Vincent Parizet’s “Bindless descriptor sets” or Hans-Kristian Arntzen’s “New game changing Vulkan extensions for mobile: Buffer Device Address”. This technique might even be faster than vertex buffers.
With VK_KHR_pipeline_library device extension you can split the vkCreateGraphicsPipelines() into more manageable stages. For example, imagine that between 2 VkGraphicsPipelineCreateInfo objects, only VkGraphicsPipelineCreateInfo.pVertexInputState changes. E.g. different vertex formats for some scene objects. Currently, in Vulkan, both vertex and fragment shaders would need to be compiled twice. VK_KHR_pipeline_library can optimize this process. Vertex format does not affect fragment shaders.
Creating framebuffers
We have already declared everything that is a part of a graphic pass. We will now create an entirely separate VkFramebuffer. As you might know from other APIs, the framebuffer is a collection of VkImages. vkCreateFramebuffer() takes an array of VkImageViews, size (as VkExtent2D) and a VkRenderPass object. My utility in Rust:
pub unsafe fn create_framebuffer(
    device: &ash::Device,
    render_pass: vk::RenderPass,
    image_views: &[vk::ImageView],
    size: &vk::Extent2D,
) -> vk::Framebuffer {
    let create_info = vk::FramebufferCreateInfo::builder()
        .render_pass(render_pass)
        .attachments(image_views)
        .width(size.width)
        .height(size.height)
        .layers(1)
        .build();
    device
        .create_framebuffer(&create_info, None)
        .expect("Failed to create framebuffer")
}
Make sure that VkImageView's format is the same as the one declared in VkRenderPass.
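If you keep the formats around on the CPU side, this rule is easy to check before calling create_framebuffer(). The sketch below is a hypothetical helper in plain Rust. Format stands in for VkFormat, and both slices are assumed to be in attachment order.

```rust
/// Hypothetical stand-in for a few `VkFormat` values.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Format {
    R32G32B32A32Sfloat,
    R8G8B8A8Uint,
    D24UnormS8Uint,
}

/// Compares formats declared in the render pass with the formats of the
/// image views used for the framebuffer (both in attachment order).
pub fn check_attachment_formats(
    declared: &[Format],
    image_views: &[Format],
) -> Result<(), String> {
    if declared.len() != image_views.len() {
        return Err(format!(
            "render pass declares {} attachments, but {} image views were provided",
            declared.len(),
            image_views.len()
        ));
    }
    for (idx, (expected, actual)) in declared.iter().zip(image_views).enumerate() {
        if expected != actual {
            return Err(format!(
                "attachment {}: expected {:?}, got {:?}",
                idx, expected, actual
            ));
        }
    }
    Ok(())
}
```

Validation layers report the same mistake, so this is mostly useful if you want an error message that names your own passes and resources.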
Here are a few use cases when creating framebuffers:
- Allocate new images for the framebuffer.
- Framebuffer reuses images created by the previous pass. E.g. Rust-Vulkan-TressFX’s forward pass writes to the depth buffer. Later, hair rendering reuses the depth buffer for depth tests.
- Framebuffer uses externally-created images. This will happen e.g. if you want to render to swapchain’s image view.
There are many other use cases. Somewhere in your app, you will have to store all the created VkImages, VkImageViews, and VkFramebuffers. There is often a need to access a specific VkImageView or VkFramebuffer. A popular solution is a token system. Your forward pass would ‘return’ forwardPassDiffuse: ResourceToken. You can use the token to retrieve the VkImageView later. Other approaches exist too.
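A token system can be sketched in a few lines of Rust. This is a minimal illustration, not the approach of any particular engine: `ResourceRegistry` and its methods are hypothetical names, and the handle type is generic so that the sketch compiles without a Vulkan binding. In a real app `H` would be something like `vk::ImageView`.

```rust
/// Opaque handle that a pass 'returns' instead of the raw Vulkan object.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ResourceToken(usize);

/// Hypothetical registry that owns all handles of one kind.
/// `H` stands in for e.g. `vk::ImageView` or `vk::Framebuffer`.
pub struct ResourceRegistry<H> {
    entries: Vec<(String, H)>,
}

impl<H: Copy> ResourceRegistry<H> {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Store a handle under a debug name and hand back a token for it.
    pub fn register(&mut self, debug_name: &str, handle: H) -> ResourceToken {
        self.entries.push((debug_name.to_string(), handle));
        ResourceToken(self.entries.len() - 1)
    }

    /// Resolve a token back to the handle it was created for.
    pub fn get(&self, token: ResourceToken) -> Option<H> {
        self.entries.get(token.0).map(|(_, handle)| *handle)
    }

    /// Name given at registration, handy for debug labels.
    pub fn debug_name(&self, token: ResourceToken) -> Option<&str> {
        self.entries.get(token.0).map(|(name, _)| name.as_str())
    }
}
```

A later pass only needs the token and a reference to the registry, so passes do not have to know about each other's internals.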
Executing draw commands
To draw triangles onto a framebuffer you will usually have code like this:
add_synchronization_barriers(...);

// begin render pass
let clear_values: [ClearValue; _] = [...];
let render_area: vk::Rect2D = size_to_rect_vk(&viewport_size);
let render_pass_begin_info = vk::RenderPassBeginInfo::builder()
    .render_pass(render_pass) // created during pass initialization
    .framebuffer(framebuffer) // created during pass initialization
    .render_area(render_area)
    .clear_values(&clear_values)
    .build();
device.cmd_begin_render_pass(
    command_buffer,
    &render_pass_begin_info,
    vk::SubpassContents::INLINE,
);

// set dynamic state that was declared in VkGraphicsPipelineCreateInfo
let viewport: vk::Viewport = create_viewport(&viewport_size);
device.cmd_set_viewport(command_buffer, 0, &[viewport]);

// bind pipeline
device.cmd_bind_pipeline(
    command_buffer,
    vk::PipelineBindPoint::GRAPHICS,
    pipeline,
);

// draw calls
for entity in &scene.objects {
    bind_uniforms(entity, ...);
    bind_push_constants(entity, ...);
    device.cmd_bind_vertex_buffers(command_buffer, 0, &[entity.vertex_buffer], &[0]);
    device.cmd_bind_index_buffer(
        command_buffer,
        entity.index_buffer,
        0,
        vk::IndexType::UINT32,
    );
    device.cmd_draw_indexed(
        command_buffer,
        entity.vertex_count,
        entity.instance_cnt,
        entity.first_index,
        entity.vertex_offset,
        entity.first_instance
    );
}

// end render pass
device.cmd_end_render_pass(command_buffer);
While that’s a lot of code, there is not much new. We have already seen most of it in the compute pass. Although we have to call vkCmdBeginRenderPass() (and the corresponding vkCmdEndRenderPass()), VkRenderPassBeginInfo doesn’t have many fields to fill, and those it has are quite self-explanatory. When defining VkGraphicsPipelineCreateInfo you could have declared some state as dynamic. Now it’s time to provide the actual values using e.g. vkCmdSetViewport(), vkCmdSetScissor(), etc. Like with the compute pass, call vkCmdBindPipeline(), but this time with VK_PIPELINE_BIND_POINT_GRAPHICS instead of VK_PIPELINE_BIND_POINT_COMPUTE. We’ve also seen binding uniforms and push constants.
Calls to vkCmdBindVertexBuffers() and vkCmdBindIndexBuffer() are optional. It may happen that certain objects do not have vertex or index buffers. In Rust-Vulkan-TressFX, when rendering hair, the vertex data is taken from storage buffers instead. Keep in mind that RenderDoc uses vertex and index buffers to preview the geometry for each draw call. While this is just an ‘extra’ and does not affect anything, the preview saved me a few hours of debugging.
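As of Vulkan 1.3, there are 4 basic vkCmdDraw*() commands:
- vkCmdDraw()
- vkCmdDrawIndexed()
- vkCmdDrawIndirect()
- vkCmdDrawIndexedIndirect()
Vulkan 1.2 also promoted the vkCmdDrawIndirectCount() and vkCmdDrawIndexedIndirectCount() variants to core.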
Of course, in a real app, there are more efficient ways of rendering the triangles. Rebinding uniforms for every object is probably not the best idea. You might also store all vertex and index data in big contiguous VkBuffers. Use the entity.first_index and entity.vertex_offset parameters to control offsets.
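The bookkeeping for such shared buffers is simple. Below is a minimal sketch in plain Rust that concatenates several meshes and records, for each one, the values that would go into vkCmdDrawIndexed(). The `Mesh` and `DrawRange` types and the `pack_meshes` function are hypothetical names for this illustration. Indices stay mesh-local, because the GPU adds vertexOffset to every index it reads.

```rust
/// Hypothetical CPU-side mesh: positions plus mesh-local indices.
pub struct Mesh {
    pub vertices: Vec<[f32; 3]>,
    pub indices: Vec<u32>,
}

/// Per-mesh parameters for vkCmdDrawIndexed() into the shared buffers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DrawRange {
    pub index_count: u32,
    pub first_index: u32,   // offset (in indices) into the big index buffer
    pub vertex_offset: i32, // value added to each index before vertex fetch
}

/// Concatenate all meshes into one vertex array and one index array.
pub fn pack_meshes(meshes: &[Mesh]) -> (Vec<[f32; 3]>, Vec<u32>, Vec<DrawRange>) {
    let mut all_vertices = Vec::new();
    let mut all_indices = Vec::new();
    let mut ranges = Vec::with_capacity(meshes.len());
    for mesh in meshes {
        // Record where this mesh starts before appending its data.
        ranges.push(DrawRange {
            index_count: mesh.indices.len() as u32,
            first_index: all_indices.len() as u32,
            vertex_offset: all_vertices.len() as i32,
        });
        all_vertices.extend_from_slice(&mesh.vertices);
        all_indices.extend_from_slice(&mesh.indices);
    }
    (all_vertices, all_indices, ranges)
}
```

With this layout the vertex and index buffers are bound once per pass, and only the draw parameters change between objects.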
It’s useful to have a callback before and after each compute/render pass. It’s used to assign a profiler scope or a debug name.
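One possible shape for such callbacks is sketched below in plain Rust. The `PassObserver` trait, `EventLog` and `run_pass` are hypothetical names. The sketch only records strings; in a real renderer the hooks might instead insert debug labels (e.g. through VK_EXT_debug_utils) or write profiler timestamps.

```rust
/// Hypothetical hook interface called around every compute/render pass.
pub trait PassObserver {
    fn before_pass(&mut self, pass_name: &str);
    fn after_pass(&mut self, pass_name: &str);
}

/// Toy observer that just logs events. A real one could open and close
/// a profiler scope or a debug label here.
pub struct EventLog {
    pub events: Vec<String>,
}

impl PassObserver for EventLog {
    fn before_pass(&mut self, pass_name: &str) {
        self.events.push(format!("begin {}", pass_name));
    }
    fn after_pass(&mut self, pass_name: &str) {
        self.events.push(format!("end {}", pass_name));
    }
}

/// Wrap the command recording of a single pass with both callbacks.
pub fn run_pass<O: PassObserver, F: FnOnce()>(
    observer: &mut O,
    pass_name: &str,
    record_commands: F,
) {
    observer.before_pass(pass_name);
    record_commands();
    observer.after_pass(pass_name);
}
```

Routing every pass through one function like `run_pass` means the profiling and naming code is written once, not repeated in each pass.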
Drawing fullscreen triangle
Drawing a triangle that covers every pixel of the screen is one of the most common operations. Every post-processing effect will use this technique. Sascha Willems’s “Vulkan tutorial on rendering a fullscreen quad without buffers” works wonders. I’ve created a single vertex shader and then reused it in every suitable pass.
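The trick relies on deriving the position purely from the vertex index, so a plain vkCmdDraw() with 3 vertices and no bound buffers is enough. Here is the same arithmetic mirrored on the CPU in Rust, as a sketch for checking the math. The function name is made up for this example; the actual computation belongs in the vertex shader.

```rust
/// CPU mirror of the vertex shader math for a buffer-less fullscreen triangle.
/// Returns (uv, clip-space position) for vertex_index in 0..3.
pub fn fullscreen_triangle_vertex(vertex_index: u32) -> ([f32; 2], [f32; 4]) {
    // Indices 0, 1, 2 map to UVs (0,0), (2,0), (0,2).
    let uv = [
        ((vertex_index << 1) & 2) as f32,
        (vertex_index & 2) as f32,
    ];
    // Remap UV range [0, 2] to clip-space range [-1, 3].
    let position = [uv[0] * 2.0 - 1.0, uv[1] * 2.0 - 1.0, 0.0, 1.0];
    (uv, position)
}
```

The resulting triangle has corners at (-1,-1), (3,-1) and (-1,3). It is larger than the screen, and the part that survives clipping covers exactly the [-1, 1] square with UVs in [0, 1].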
Unfortunately, if you come from OpenGL there is a caveat. Vulkan uses a different coordinate system: in normalized device coordinates the Y axis points down and depth ranges from 0 to 1. I admit that I always fix it by trial and error. Read Matthew Wellings’s “The new Vulkan Coordinate System” and Johannes Unterguggenberger’s “Setting Up a Proper Projection Matrix for Vulkan” for a detailed explanation of the consequences.
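As a rough illustration of the difference, the sketch below converts a clip-space position produced by an OpenGL-style projection matrix into Vulkan conventions: flip Y and remap depth from [-w, w] to [0, w]. The function name is hypothetical, and the sketch assumes that nothing else in the app (for example the viewport setup) already compensates for the flip. The linked articles explain how to bake this into the projection matrix itself.

```rust
/// Convert a clip-space position from OpenGL conventions
/// (Y up, depth in [-w, w]) to Vulkan conventions (Y down, depth in [0, w]).
pub fn opengl_clip_to_vulkan_clip(clip: [f32; 4]) -> [f32; 4] {
    let [x, y, z, w] = clip;
    [
        x,
        -y,              // Vulkan's Y axis points down in NDC
        0.5 * (z + w),   // remap depth so that z/w goes from [-1, 1] to [0, 1]
        w,
    ]
}
```

A point on the OpenGL near plane (z = -w) ends up at depth 0, and a point on the far plane (z = w) ends up at depth 1.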
Summary
This article concludes the series on using the Vulkan API. When I started Rust-Vulkan-TressFX, I had only the Vulkan-tutorial and the official specification as a guide. We’ve worked our way through countless functions, parameters, and structures. We’ve seen which ones we can ignore.
If you want to know more about Vulkan, I recommend looking through the references under every post. There is a scarcity of detailed explanations available on the net. Hope you’ve learned something.
References
- Arseny Kapoulkine’s “Writing an efficient Vulkan renderer”
- Yuriy O’Donnell’s “FrameGraph: Extensible Rendering Architecture in Frostbite”
- Vulkan docs for Push Constants
- CUDA best practices guide
- Sascha Willems’s “Vulkan tutorial on rendering a fullscreen quad without buffers”
- Matthew Wellings’s “The new Vulkan Coordinate System”
- Johannes Unterguggenberger’s “Setting Up a Proper Projection Matrix for Vulkan”
- Vincent Parizet’s “Bindless descriptor sets”
- Lesley Lai’s “VK_KHR_dynamic_rendering tutorial”
- Embark Studios’s kajiya