
Make better use of VPM #113

@doe300

Description

Up to 2 VPM writes can be queued in the VPM write FIFO (QPU -> VPM); a write only blocks when the FIFO is full.
-> There is no need to stall/delay between VPM writes, as is currently done.
-> This information could be used to insert non-VPM accesses between pairs of VPM writes (e.g. write vpm; write vpm; something else to prevent the stall; write vpm; ...)
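The FIFO behaviour can be illustrated with a toy queue model (plain Python, not real QPU code; the 2-cycle service time and one-instruction-per-cycle issue rate are made-up parameters for illustration, not figures from the specification):

```python
def run(program, depth=2, service=2):
    """Count stall cycles for a list of 'vpm' (write) / 'alu' instructions.

    Toy model: the VPM completes one queued write every `service` cycles,
    the QPU issues one instruction per cycle, and a VPM write blocks only
    when all `depth` FIFO slots are occupied.
    """
    t = 0        # current cycle
    done = []    # completion times of queued writes, ascending
    last = 0     # completion time of the most recent write
    stalls = 0
    for insn in program:
        done = [c for c in done if c > t]   # drained entries free their slot
        if insn == "vpm":
            if len(done) == depth:          # FIFO full: block until a slot frees
                stalls += done[0] - t
                t = done[0]
                done = done[1:]
            last = max(t, last) + service   # VPM services one write at a time
            done.append(last)
        t += 1                              # each instruction issues in one cycle
    return stalls

print(run(["vpm", "vpm", "vpm", "vpm"]))         # -> 1: back-to-back writes hit a full FIFO
print(run(["vpm", "vpm", "alu", "vpm", "vpm"]))  # -> 0: interleaved work hides the stall
```

With these (assumed) parameters, the fourth back-to-back write finds both FIFO slots occupied, while placing a single unrelated instruction between the pairs of writes removes the stall entirely.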

Up to 2 VPM read setups can be queued in the VPM read FIFO (VPM -> QPU); further writes to the setup register are ignored, and outstanding VPM reads are cancelled when the program finishes.
-> We could queue up to 2 read setups before waiting for the data to become available. Also, for loops, we could issue the read setup for the next iteration in advance; this requires draining the FIFO after the loop ends (to discard the data read for the one-after-last iteration).
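The loop idea can be sketched against a toy model of the read-setup FIFO (`ToyVPM` and `pipelined_sum` are invented names for illustration; the real code would be QPU assembly):

```python
from collections import deque

class ToyVPM:
    """Toy model of the 2-deep read-setup FIFO: extra setup writes are ignored."""
    def __init__(self, memory):
        self.memory = memory
        self.setups = deque()

    def write_setup(self, addr):
        if len(self.setups) < 2:    # further writes to the setup register are dropped
            self.setups.append(addr)

    def read_data(self):
        return self.memory[self.setups.popleft()]

def pipelined_sum(vpm, n):
    """Sum n elements, issuing each read setup one iteration in advance."""
    vpm.write_setup(0)              # prologue: setup for iteration 0
    total = 0
    for i in range(n):
        vpm.write_setup(i + 1)      # setup for the NEXT iteration, in advance
        total += vpm.read_data()    # data for THIS iteration is already queued
    vpm.read_data()                 # epilogue: drain the one-after-last read
    return total

# The extra trailing element is what the drained one-after-last read touches.
print(pipelined_sum(ToyVPM([3, 1, 4, 1, 5, 0]), 5))  # -> 14
```

The FIFO depth is never exceeded here: at most one setup is in flight per consumed element, plus the one issued ahead.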

DMA load/store operations cannot be queued, but a DMA load and a DMA store can run concurrently.
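One thing this concurrency would enable is double-buffering: while block i is being stored back, block i+1 can already be loaded. A sketch of that schedule (the `schedule` helper is hypothetical, purely to show the overlap):

```python
def schedule(n_blocks):
    """Return per-step (dma_load, dma_store) pairs for n blocks."""
    steps = []
    for i in range(n_blocks + 1):
        load = i if i < n_blocks else None   # prefetch the next block
        store = i - 1 if i > 0 else None     # drain the previous block
        steps.append((load, store))
    return steps

print(schedule(3))  # -> [(0, None), (1, 0), (2, 1), (None, 2)]
```

Every middle step overlaps one load with one store, using the separate DMA directions at the same time.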

Does VPM access need to be synchronized between all QPUs?
The specification makes no statement for (or against) this. Is the VPM truly shared (in the sense that locking is required), or is it "shared" but still usable by every QPU at once (like the TMU, which requires no locking)?

https://github.com/nineties/py-videocore uses a mutex to lock VPM access in its parallel examples, https://github.com/mn416/QPULib does not seem to use a mutex, and https://github.com/maazl/vc4asm uses semaphores to lock VPM access.
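To illustrate why the answer matters, here is a toy model of the "locking required" reading (a single shared setup register; `SharedVPM` is invented for illustration): without mutual exclusion, one QPU's setup can be clobbered by another's before the data is read.

```python
class SharedVPM:
    """One shared setup register: the 'locking required' interpretation."""
    def __init__(self, memory):
        self.memory = memory
        self.setup = None

    def write_setup(self, addr):
        self.setup = addr       # last writer wins, whichever QPU it was

    def read_data(self):
        return self.memory[self.setup]

vpm = SharedVPM({0x10: "A's data", 0x20: "B's data"})
vpm.write_setup(0x10)    # QPU A programs its read ...
vpm.write_setup(0x20)    # ... QPU B's setup lands in between ...
print(vpm.read_data())   # -> B's data: A receives the wrong element
```

If instead each QPU has its own setup/FIFO state (the TMU-like reading), no lock is needed and the per-QPU locking in py-videocore and vc4asm is pure overhead.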

Sources:
VideoCore IV Specification, pages 55+
