[Batch 4] Dedicated transfer queue + staging buffer ring #384

## Summary

Create a dedicated Vulkan transfer queue with a ring-buffered staging allocator for chunk mesh uploads. Currently, mesh uploads run on the graphics queue and can stall it. Moving them to a dedicated transfer queue lets uploads overlap with rendering.

**Depends on:** #376 (new vertex format — staging buffer stride must match)

## Current State

Chunk mesh uploads happen via `GlobalVertexAllocator.upload()`:
1. Worker thread builds mesh vertices on CPU
2. Main thread calls `rm.updateBuffer()` or maps vertex buffer memory
3. Vertex data copied to GPU-visible memory
4. Chunk becomes renderable

The `rm.updateBuffer()` path may stall the graphics queue if the buffer is in use by a previous frame. At 128+ chunks with constant streaming, this creates frame hitches.

## Target Architecture

### Dedicated Transfer Queue
- Query Vulkan for a queue family that reports `VK_QUEUE_TRANSFER_BIT` but not `VK_QUEUE_GRAPHICS_BIT`, if one is available
- Fallback: share the graphics queue if no dedicated transfer family exists
- Create a command pool + command buffer for transfer operations on the chosen family
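The selection rule above can be sketched as a pure function over the `queueFlags` values returned by `vkGetPhysicalDeviceQueueFamilyProperties`. This is a minimal sketch, not the engine's code: `TransferQueueChoice` and `pickTransferFamily` are hypothetical names, and the flag constants are copied from the Vulkan headers so the example has no binding dependency.

```zig
const std = @import("std");

// Numeric values of VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_TRANSFER_BIT
// as defined in the Vulkan headers.
const QUEUE_GRAPHICS_BIT: u32 = 0x1;
const QUEUE_TRANSFER_BIT: u32 = 0x4;

// Hypothetical result type for this sketch.
const TransferQueueChoice = struct {
    family_index: u32,
    dedicated: bool, // false: uploads share the graphics family
};

/// `family_flags[i]` holds the queueFlags of queue family `i`.
/// Prefers the first family with transfer but without graphics; otherwise
/// falls back to the first graphics family. Returns null if neither exists.
fn pickTransferFamily(family_flags: []const u32) ?TransferQueueChoice {
    std.debug.assert(family_flags.len <= std.math.maxInt(u32));
    var fallback: ?u32 = null;
    var i: u32 = 0;
    while (i < family_flags.len) : (i += 1) {
        const flags = family_flags[i];
        const has_transfer = (flags & QUEUE_TRANSFER_BIT) != 0;
        const has_graphics = (flags & QUEUE_GRAPHICS_BIT) != 0;
        if (has_transfer and !has_graphics) {
            return TransferQueueChoice{ .family_index = i, .dedicated = true };
        }
        // Graphics families support transfer commands even when the
        // transfer bit is not reported, so they are a valid fallback.
        if (has_graphics and fallback == null) fallback = i;
    }
    if (fallback) |index| {
        return TransferQueueChoice{ .family_index = index, .dedicated = false };
    }
    return null;
}
```

A family that exposes compute and transfer without graphics also passes this check. A stricter variant could prefer families that report only the transfer bit and treat compute-capable families as a second choice.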

### Staging Buffer Ring
```zig
const StagingRing = struct {
    buffer: VkBuffer,
    memory: VkDeviceMemory,
    mapped: [*]u8,            // persistently mapped base pointer
    capacity: usize,          // total ring size in bytes (e.g., 64MB)
    head: usize,              // write cursor
    tail: usize,              // read cursor (last completed fence)
    frame_fences: [MAX_FRAMES_IN_FLIGHT]VkFence,
    frame_offsets: [MAX_FRAMES_IN_FLIGHT]usize,

    // Signatures only; bodies are omitted in this outline.
    pub fn beginFrame(self: *StagingRing, frame_index: usize) usize;
    pub fn allocate(self: *StagingRing, size: usize, alignment: usize) ![]u8;
    pub fn submit(self: *StagingRing, transfers: []const Transfer) !void;
};
```

- Ring buffer: HOST_VISIBLE memory, persistently mapped
- `allocate()` returns a slice into the ring for the caller to write into
- `submit()` records `vkCmdCopyBuffer` commands and submits them to the transfer queue
- `beginFrame()` advances `tail` once the fence for the frame slot being reused has signaled

### Upload Flow
1. Worker thread produces mesh vertex data on CPU
2. Main thread: `staging.allocate(size, alignment)` → get CPU-visible slice
3. Copy vertex data into staging slice
4. Record `vkCmdCopyBuffer(staging → megabuffer at offset)`
5. Submit to transfer queue with fence
6. Next frame: fence completed → staging ring space reclaimed
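Steps 2 to 4 for a single mesh can be sketched without any Vulkan calls. In this minimal sketch a plain byte slice stands in for the mapped staging memory, and `CopyRegion` and `stageMesh` are hypothetical names; the struct fields mirror `VkBufferCopy` (`srcOffset`, `dstOffset`, `size`).

```zig
const std = @import("std");

/// Mirrors the fields of VkBufferCopy.
const CopyRegion = struct {
    src_offset: u64, // offset into the staging buffer
    dst_offset: u64, // offset into the device-local megabuffer
    size: u64,
};

/// Copies `vertex_bytes` into the staging memory at `staging_offset` and
/// returns the region that a later vkCmdCopyBuffer call would consume.
/// `staging_offset` is assumed to come from the ring allocator.
fn stageMesh(
    staging: []u8,
    staging_offset: usize,
    vertex_bytes: []const u8,
    megabuffer_offset: u64,
) CopyRegion {
    std.debug.assert(staging_offset + vertex_bytes.len <= staging.len);
    const dest = staging[staging_offset .. staging_offset + vertex_bytes.len];
    var i: usize = 0;
    while (i < vertex_bytes.len) : (i += 1) {
        dest[i] = vertex_bytes[i];
    }
    return CopyRegion{
        .src_offset = staging_offset,
        .dst_offset = megabuffer_offset,
        .size = vertex_bytes.len,
    };
}
```

Because `vkCmdCopyBuffer` accepts an array of regions, the regions collected for all meshes staged in one frame could be recorded in a single call before the submit in step 5.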

## Implementation Plan

### Step 1: Transfer queue detection + setup
- In `VulkanDevice` init: query queue families for a dedicated transfer family
- Create the command pool on that family with `VK_COMMAND_POOL_CREATE_TRANSIENT_BIT` + `VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT`
- If no dedicated queue exists: share the graphics queue (no parallelism, but staging still works)

### Step 2: Staging ring allocator
- Create a buffer with `VK_BUFFER_USAGE_TRANSFER_SRC_BIT` usage, backed by `HOST_VISIBLE | HOST_COHERENT` memory
- 64MB default capacity (configurable)
- Ring-wrap: if an allocation would cross the end of the buffer, pad to the end and allocate from the start
- Frame-aware: each frame's allocations are released when that frame's fence completes
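The wrap and release rules above can be sketched as offset bookkeeping alone, leaving out the Vulkan buffer. This is a minimal sketch under stated assumptions: `RingOffsets` is a hypothetical name, it tracks bytes consumed per frame rather than the `frame_offsets` field from the struct outline, `MAX_FRAMES_IN_FLIGHT` is assumed to be 2, and frames are assumed to be reused in round-robin order so releases happen oldest first.

```zig
const std = @import("std");

const MAX_FRAMES_IN_FLIGHT = 2; // assumed value for this sketch

const RingOffsets = struct {
    capacity: usize,
    head: usize = 0, // next write offset
    tail: usize = 0, // oldest byte still in flight
    used: usize = 0, // bytes in flight, wrap padding included
    current_frame: usize = 0,
    frame_bytes: [MAX_FRAMES_IN_FLIGHT]usize = [_]usize{0} ** MAX_FRAMES_IN_FLIGHT,

    /// Call only after the fence for `frame_index` has signaled.
    /// Releases everything that frame slot allocated last time it was used.
    pub fn beginFrame(self: *RingOffsets, frame_index: usize) void {
        const released = self.frame_bytes[frame_index];
        self.tail = (self.tail + released) % self.capacity;
        self.used -= released;
        self.frame_bytes[frame_index] = 0;
        self.current_frame = frame_index;
    }

    /// Returns the byte offset of the allocation inside the ring.
    /// `alignment` must be a power of two.
    pub fn allocate(self: *RingOffsets, size: usize, alignment: usize) error{OutOfStagingSpace}!usize {
        std.debug.assert(size > 0);
        std.debug.assert(alignment != 0 and (alignment & (alignment - 1)) == 0);
        var start = (self.head + alignment - 1) & ~(alignment - 1);
        var padding = start - self.head;
        if (start + size > self.capacity) {
            // Would cross the end: skip the remaining bytes and restart at 0.
            padding = self.capacity - self.head;
            start = 0;
        }
        const total = padding + size;
        if (size > self.capacity or total > self.capacity - self.used) {
            return error.OutOfStagingSpace;
        }
        self.head = (start + size) % self.capacity;
        self.used += total;
        self.frame_bytes[self.current_frame] += total;
        return start;
    }
};
```

Counting padding as used space keeps the invariant that the in-flight region is always the `used` bytes starting at `tail`, so a full ring and an empty ring are never confused. In the real allocator the returned offset would be turned into a slice of the mapped pointer.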

### Step 3: Integration with GlobalVertexAllocator
- `upload()` method: instead of a direct map+copy, allocate from the staging ring, copy the vertex data in, then issue a GPU-side copy
- The megabuffer stays device-local (not host-visible), which is better for GPU performance; it must be created with `VK_BUFFER_USAGE_TRANSFER_DST_BIT` to serve as a copy destination
- `vkCmdCopyBuffer` from staging → device-local megabuffer

### Step 4: Queue synchronization
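- Transfer queue fence: signals when the copy completes, so the CPU can reclaim staging space
- Semaphore: transfer complete → graphics queue can consume the data
- Or: use `vkQueueSubmit` with a timeline semaphore (`VkTimelineSemaphoreSubmitInfo`, core in Vulkan 1.2 or via `VK_KHR_timeline_semaphore`) if available
- If the megabuffer uses `VK_SHARING_MODE_EXCLUSIVE` and the transfer queue is in a different family, a queue family ownership transfer (release barrier on the transfer queue, acquire barrier on the graphics queue) is also needed; `VK_SHARING_MODE_CONCURRENT` avoids this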
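With the timeline semaphore option, the CPU-side question "is this chunk renderable yet?" reduces to comparing counters. The following is a minimal sketch of that bookkeeping only: `UploadTracker` is a hypothetical name, and the value passed to `observe` is assumed to come from `vkGetSemaphoreCounterValue` in real code.

```zig
const std = @import("std");

const UploadTracker = struct {
    next_value: u64 = 1, // value the next transfer submit will signal
    completed: u64 = 0, // highest counter value observed so far

    /// Returns the timeline value to signal in the next transfer submit.
    /// Store it with the chunk as its upload ticket.
    pub fn recordSubmit(self: *UploadTracker) u64 {
        const value = self.next_value;
        self.next_value += 1;
        return value;
    }

    /// Feed in the semaphore's current counter value, e.g. once per frame.
    pub fn observe(self: *UploadTracker, counter: u64) void {
        // The counter can never be ahead of what has been submitted.
        std.debug.assert(counter < self.next_value);
        if (counter > self.completed) self.completed = counter;
    }

    /// True once the copy that produced `ticket` has finished on the GPU.
    pub fn isReady(self: UploadTracker, ticket: u64) bool {
        return self.completed >= ticket;
    }
};
```

The same ticket value can serve as the wait value for the graphics submit that first draws the chunk, which would replace the separate fence and binary semaphore from the first two bullets.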

## Files to Create

- `src/engine/graphics/vulkan/transfer_queue.zig` — queue management, staging ring

## Files to Modify

- `src/engine/graphics/vulkan/device.zig` — transfer queue detection
- `src/world/chunk_allocator.zig` — `upload()` uses staging ring
- `src/world/lod_upload_queue.zig` — LOD uploads use staging ring
- `src/engine/graphics/vulkan/rhi_resource_lifecycle.zig` — resource transitions for copies

## Testing

- [ ] Chunk meshes upload correctly via transfer queue
- [ ] No frame hitches during heavy chunk streaming
- [ ] Fallback works when no dedicated transfer queue
- [ ] Staging ring wraps correctly without corruption
- [ ] Memory usage bounded (ring doesn't grow unbounded)
- [ ] LOD mesh uploads work through same path

**Roadmap:** `docs/PERFORMANCE_ROADMAP.md` — Batch 4, Issue 3B-1
