[Batch 4] Dedicated transfer queue + staging buffer ring #384
Open
Labels: batch-4, documentation, engine, enhancement, hotfix, perf/rendering
Description
Summary
Create a dedicated Vulkan transfer queue with a ring-buffered staging allocator for chunk mesh uploads. Currently mesh uploads happen on the graphics queue and may cause stalls. A dedicated transfer queue allows uploads to overlap with rendering.
Depends on: #376 (new vertex format — staging buffer stride must match)
Current State
Chunk mesh uploads happen via GlobalVertexAllocator.upload():
- Worker thread builds mesh vertices on CPU
- Main thread calls rm.updateBuffer() or maps vertex buffer memory
- Vertex data copied to GPU-visible memory
- Chunk becomes renderable
The rm.updateBuffer() path may stall the graphics queue if the buffer is in use by a previous frame. At 128+ chunks with constant streaming, this creates frame hitches.
Target Architecture
Dedicated Transfer Queue
- Query Vulkan for a queue family with TRANSFER_BIT (but not GRAPHICS_BIT, if available)
- Fallback: share graphics queue if no dedicated transfer queue exists
- Create command pool + command buffer for transfer operations
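
The selection rule above can be sketched as a pure function over the queue flags that vkGetPhysicalDeviceQueueFamilyProperties reports. This is a minimal sketch, not the engine's code: `TransferChoice` and `pickTransferFamily` are hypothetical names, and the flag constants are the VkQueueFlagBits values from the Vulkan headers.

```zig
const std = @import("std");

// Values of VkQueueFlagBits from the Vulkan headers.
const VK_QUEUE_GRAPHICS_BIT: u32 = 0x1;
const VK_QUEUE_TRANSFER_BIT: u32 = 0x4;

// Hypothetical result type, not taken from the codebase.
const TransferChoice = struct {
    family_index: u32,
    dedicated: bool,
};

/// `family_flags[i]` holds the queueFlags of queue family i.
/// Returns the first family with TRANSFER but without GRAPHICS, or falls
/// back to the graphics family when no such family exists.
fn pickTransferFamily(family_flags: []const u32, graphics_family: u32) TransferChoice {
    var i: u32 = 0;
    while (i < family_flags.len) : (i += 1) {
        const flags = family_flags[i];
        const has_transfer = (flags & VK_QUEUE_TRANSFER_BIT) != 0;
        const has_graphics = (flags & VK_QUEUE_GRAPHICS_BIT) != 0;
        if (has_transfer and !has_graphics) {
            return TransferChoice{ .family_index = i, .dedicated = true };
        }
    }
    return TransferChoice{ .family_index = graphics_family, .dedicated = false };
}

pub fn main() void {
    // Example layout: family 0 = graphics+compute+transfer, family 1 = transfer only.
    const flags = [_]u32{ 0x7, 0x4 };
    const choice = pickTransferFamily(&flags, 0);
    std.debug.print("family={d} dedicated={}\n", .{ choice.family_index, choice.dedicated });
}
```

The fallback is safe because the Vulkan specification guarantees that any queue supporting graphics or compute also supports transfer operations, even if TRANSFER_BIT is not reported for it.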
Staging Buffer Ring
    const StagingRing = struct {
        buffer: VkBuffer,
        memory: VkDeviceMemory,
        mapped: [*]u8,
        capacity: usize, // total ring size (e.g., 64MB)
        head: usize, // write cursor
        tail: usize, // read cursor (last completed fence)
        frame_fences: [MAX_FRAMES_IN_FLIGHT]VkFence,
        frame_offsets: [MAX_FRAMES_IN_FLIGHT]usize,

        pub fn beginFrame(self: *StagingRing, frame_index: usize) usize;
        pub fn allocate(self: *StagingRing, size: usize, alignment: usize) ![]u8;
        pub fn submit(self: *StagingRing, transfers: []Transfer) !void;
    };

- Ring buffer: HOST_VISIBLE memory, persistently mapped
- allocate() returns a slice into the ring for the caller to write into
- submit() issues vkCmdCopyBuffer commands on the transfer queue
- beginFrame() advances tail based on previous frame's fence completion
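
One possible way to do the tail bookkeeping in beginFrame() is sketched below as a CPU-only model. It rests on several assumptions that are not in the issue: cursors are monotonically increasing byte counts, a bool stands in for each VkFence, `endFrame` is a hypothetical helper, and the usize returned by `beginFrame` is taken to mean the free byte count.

```zig
const std = @import("std");

const MAX_FRAMES_IN_FLIGHT = 2;

// Simplified model: no Vulkan objects, only the cursor arithmetic.
const RingCursors = struct {
    capacity: usize,
    head: usize = 0, // total bytes handed out so far
    tail: usize = 0, // total bytes known to be consumed by the GPU
    fence_signaled: [MAX_FRAMES_IN_FLIGHT]bool = [_]bool{true} ** MAX_FRAMES_IN_FLIGHT,
    frame_offsets: [MAX_FRAMES_IN_FLIGHT]usize = [_]usize{0} ** MAX_FRAMES_IN_FLIGHT,

    /// Reclaims the space used the last time this frame slot was submitted.
    /// Returns the number of free bytes (an assumed meaning for the result).
    fn beginFrame(self: *RingCursors, frame_index: usize) usize {
        // Real code would wait on the slot's VkFence here instead of asserting.
        std.debug.assert(self.fence_signaled[frame_index]);
        if (self.frame_offsets[frame_index] > self.tail) {
            self.tail = self.frame_offsets[frame_index];
        }
        return self.capacity - (self.head - self.tail);
    }

    /// Records where this frame's allocations end and marks its fence pending.
    fn endFrame(self: *RingCursors, frame_index: usize) void {
        self.frame_offsets[frame_index] = self.head;
        self.fence_signaled[frame_index] = false;
    }
};

pub fn main() void {
    var ring = RingCursors{ .capacity = 1024 };
    _ = ring.beginFrame(0);
    ring.head += 300; // stand-in for allocate() calls during frame 0
    ring.endFrame(0);
    ring.fence_signaled[0] = true; // pretend the GPU finished frame 0's copies
    const free = ring.beginFrame(0);
    std.debug.print("free bytes: {d}\n", .{free});
}
```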
Upload Flow
- Worker thread produces mesh vertex data on CPU
- Main thread: staging.allocate(size) → get CPU-visible slice
- Copy vertex data into staging slice
- Record vkCmdCopyBuffer (staging → megabuffer at offset)
- Submit to transfer queue with fence
- Next frame: fence completed → staging ring space reclaimed
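
The CPU side of this flow (copy into the staging slice, then describe the GPU copy) might look roughly like the following. It is an illustrative sketch only: `stageUpload` is a hypothetical helper, a plain byte array stands in for the mapped ring, and the `Transfer` fields are assumed to mirror the srcOffset, dstOffset and size members of VkBufferCopy.

```zig
const std = @import("std");

// Hypothetical stand-in for the issue's `Transfer` type; field meanings
// follow VkBufferCopy (srcOffset, dstOffset, size).
const Transfer = struct {
    src_offset: usize, // offset into the staging ring
    dst_offset: usize, // offset into the device-local megabuffer
    size: usize,
};

/// Copies `vertices` into the mapped ring at `ring_offset` and returns the
/// region that a later vkCmdCopyBuffer would consume.
fn stageUpload(ring_mem: []u8, ring_offset: usize, vertices: []const u8, megabuffer_offset: usize) Transfer {
    std.debug.assert(ring_offset + vertices.len <= ring_mem.len);
    var i: usize = 0;
    while (i < vertices.len) : (i += 1) {
        ring_mem[ring_offset + i] = vertices[i];
    }
    return Transfer{
        .src_offset = ring_offset,
        .dst_offset = megabuffer_offset,
        .size = vertices.len,
    };
}

pub fn main() void {
    var ring_mem = [_]u8{0} ** 64; // stand-in for the persistently mapped ring
    const vertices = [_]u8{ 1, 2, 3, 4 };
    const t = stageUpload(&ring_mem, 16, &vertices, 4096);
    std.debug.print("copy {d} bytes: ring+{d} -> megabuffer+{d}\n", .{ t.size, t.src_offset, t.dst_offset });
}
```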
Implementation Plan
Step 1: Transfer queue detection + setup
- In VulkanDevice init: query queue families for dedicated transfer queue
- Create command pool with TRANSIENT_BIT + RESET_COMMAND_BUFFER_BIT
- If no dedicated queue: share graphics queue (no parallelism, but staging still works)
Step 2: Staging ring allocator
- Allocate a VK_BUFFER_USAGE_TRANSFER_SRC_BIT buffer with HOST_VISIBLE | HOST_COHERENT memory
- 64MB default capacity (configurable)
- Ring-wrap: if allocation would wrap, pad to end and allocate from start
- Frame-aware: each frame's allocations are released when that frame's fence completes
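
The ring-wrap rule can be sketched without any Vulkan calls. The model below is one possible approach, not the planned implementation: it assumes monotonically increasing cursors, a power-of-two alignment, and a capacity that is a multiple of the alignment. It returns a byte offset rather than the `[]u8` slice from the proposed signature, and `RingAllocator`, `alignUp` and `OutOfStagingMemory` are hypothetical names.

```zig
const std = @import("std");

// Hypothetical helper; `alignment` is assumed to be a power of two.
fn alignUp(value: usize, alignment: usize) usize {
    return (value + alignment - 1) & ~(alignment - 1);
}

const RingAllocator = struct {
    capacity: usize, // assumed to be a multiple of every alignment used
    head: usize = 0, // monotonically increasing write cursor
    tail: usize = 0, // monotonically increasing reclaim cursor

    /// Returns the physical byte offset of the allocation within the ring.
    fn allocate(self: *RingAllocator, size: usize, alignment: usize) error{OutOfStagingMemory}!usize {
        if (size > self.capacity) return error.OutOfStagingMemory;
        var start = alignUp(self.head, alignment);
        // If the block would straddle the end of the buffer, pad to the end
        // and place it at physical offset 0 instead.
        const phys = start % self.capacity;
        if (phys + size > self.capacity) {
            start += self.capacity - phys;
        }
        const end = start + size;
        // Padding counts as used space until the tail catches up.
        if (end - self.tail > self.capacity) return error.OutOfStagingMemory;
        self.head = end;
        return start % self.capacity;
    }
};

pub fn main() !void {
    var ring = RingAllocator{ .capacity = 1024 };
    const a = try ring.allocate(400, 16);
    const b = try ring.allocate(400, 16);
    ring.tail = 800; // pretend the GPU has consumed both blocks
    const c = try ring.allocate(400, 16); // does not fit at 800, so it wraps
    std.debug.print("offsets: {d} {d} {d}\n", .{ a, b, c });
}
```

Because the padding is charged against the ring until the tail passes it, the allocator fails with an error instead of overwriting data that an in-flight copy may still read.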
Step 3: Integration with GlobalVertexAllocator
- upload() method: instead of direct map+copy, allocate from staging ring, copy, then issue GPU-side copy
- The megabuffer stays device-local (not host-visible), which is better for GPU performance
- vkCmdCopyBuffer from staging → device-local megabuffer
Step 4: Queue synchronization
- Transfer queue fence: signals when copy completes
- Semaphore: transfer complete → graphics queue can consume the data
- Or: use vkQueueSubmit with timeline semaphore if available
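
If the timeline semaphore route is taken, the CPU-side bookkeeping could be as small as a counter. The sketch below models only that bookkeeping: `UploadTimeline`, `submit` and `isVisible` are hypothetical names, and `completed` stands in for the semaphore's current counter value, which real code could read with vkGetSemaphoreCounterValue.

```zig
const std = @import("std");

// Pure-CPU model of timeline-semaphore bookkeeping; no Vulkan calls are made.
const UploadTimeline = struct {
    next_value: u64 = 1, // value the next transfer submission will signal

    /// Call when submitting a batch of copies. The returned value is what the
    /// transfer submission signals and what the graphics submission waits on.
    fn submit(self: *UploadTimeline) u64 {
        const value = self.next_value;
        self.next_value += 1;
        return value;
    }

    /// A chunk's data is safe to read once the counter has reached its ticket.
    fn isVisible(ticket: u64, completed: u64) bool {
        return completed >= ticket;
    }
};

pub fn main() void {
    var timeline = UploadTimeline{};
    const chunk_a = timeline.submit();
    const chunk_b = timeline.submit();
    const completed: u64 = 1; // stand-in for the semaphore's current counter value
    std.debug.print("a ready={} b ready={}\n", .{
        UploadTimeline.isVisible(chunk_a, completed),
        UploadTimeline.isVisible(chunk_b, completed),
    });
}
```

A single monotonically increasing counter lets each chunk carry one integer ticket instead of owning a fence or binary semaphore.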
Files to Create
- src/engine/graphics/vulkan/transfer_queue.zig: queue management, staging ring
Files to Modify
- src/engine/graphics/vulkan/device.zig: transfer queue detection
- src/world/chunk_allocator.zig: upload() uses staging ring
- src/world/lod_upload_queue.zig: LOD uploads use staging ring
- src/engine/graphics/vulkan/rhi_resource_lifecycle.zig: resource transitions for copies
Testing
- Chunk meshes upload correctly via transfer queue
- No frame hitches during heavy chunk streaming
- Fallback works when no dedicated transfer queue
- Staging ring wraps correctly without corruption
- Memory usage bounded (ring doesn't grow unbounded)
- LOD mesh uploads work through same path
Roadmap: docs/PERFORMANCE_ROADMAP.md — Batch 4, Issue 3B-1