In some cases, the same RSPL code can be compiled to different instruction sequences that use different numbers of SU and VU instructions. This can in turn create more optimization opportunities. It would be nice if, during reordering, the optimizer also alternated between equivalent codegen sequences to find the one that optimizes best in the current context.
For instance:
- Copy of vectors (`v1 = v2`) can either be one VU (`vor`) or two SU (`sqv` + `lqv` on scratch space)
- Read/write to a single lane can either be `mtc2`/`mfc2` or `sh` + `lsv` / `ssv` + `lh` on scratch space
- Single-lane movements (`vmov`) can also be emitted as `ssv` + `lsv` on scratch space
- In addition to the above, `mfc2` from a vec32 can also be done with two `mfc2` intermixed with a vector right shift (assuming a scratch vector register can be used)
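
As a rough illustration of the idea, here is a minimal Python sketch of picking between equivalent sequences based on context. Everything in it is an assumption made for illustration: the `Variant` type, the `ALTERNATIVES` table, and the `pick_variant` helper are hypothetical names, not part of the RSPL transpiler, and the cost model is a toy one that only assumes one SU and one VU instruction can be issued together per cycle, ignoring dependencies, latencies, and stalls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One possible instruction sequence for an operation (hypothetical type)."""
    name: str
    su: int  # number of scalar-unit instructions
    vu: int  # number of vector-unit instructions


# Hypothetical table of equivalent sequences, based on the list above.
ALTERNATIVES = {
    "vector_copy": [Variant("vor", su=0, vu=1), Variant("sqv+lqv", su=2, vu=0)],
    "lane_move": [Variant("vmov", su=0, vu=1), Variant("ssv+lsv", su=2, vu=0)],
}


def estimated_cycles(su: int, vu: int) -> int:
    """Toy cost model: assumes perfect SU/VU pairing, so the busier unit dominates."""
    return max(su, vu)


def pick_variant(op: str, ctx_su: int, ctx_vu: int) -> Variant:
    """Pick the variant with the lowest toy cost, given the SU/VU instruction
    counts already present in the surrounding block. Ties keep the first entry."""
    return min(
        ALTERNATIVES[op],
        key=lambda v: estimated_cycles(ctx_su + v.su, ctx_vu + v.vu),
    )


# A VU-heavy block has free SU slots, so the two-SU copy costs nothing extra.
print(pick_variant("vector_copy", ctx_su=2, ctx_vu=6).name)
# An SU-heavy block is better served by the single VU instruction.
print(pick_variant("vector_copy", ctx_su=6, ctx_vu=2).name)
```

A real implementation would presumably have to re-run the reordering for each candidate and compare actual cycle counts, since the static counts used here say nothing about where the extra instructions can be scheduled.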