I see that `cudaMemcpy2DAsync` is used for KV cache transmission. However, in `Paraworker:migrate_blocks` there is no `torch.cuda.synchronize()`.
How is it guaranteed that the KV cache transmission has finished?
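For reference, the two completion-guarantee patterns I would expect here are (a) relying on CUDA stream ordering, where later work enqueued on the same stream implicitly waits for the copy, or (b) recording a CUDA event after the copy and synchronizing on it before reading the destination. A minimal PyTorch sketch of pattern (b) is below; `migrate_blocks_sketch` and its arguments are hypothetical names for illustration, not the actual DistServe code:

```python
import torch

def migrate_blocks_sketch(src: torch.Tensor, dst: torch.Tensor) -> torch.cuda.Event:
    """Hypothetical illustration: async block copy plus an event the caller
    can wait on, instead of a blanket torch.cuda.synchronize()."""
    stream = torch.cuda.current_stream()
    # Asynchronous device-to-device copy, analogous to cudaMemcpy2DAsync:
    # it is merely enqueued on `stream` and may not be done when copy_ returns.
    dst.copy_(src, non_blocking=True)
    # Record an event after the copy on the same stream; the event completes
    # only once everything enqueued before it (including the copy) has run.
    event = torch.cuda.Event()
    event.record(stream)
    return event

if torch.cuda.is_available():
    src = torch.randn(4, 1024, device="cuda")
    dst = torch.empty_like(src)
    ev = migrate_blocks_sketch(src, dst)
    ev.synchronize()  # block the host until the async copy has finished
    assert torch.equal(src, dst)
```

Note that under pattern (a), no host-side wait is needed at all: any kernel later enqueued on the same stream is already ordered after the copy by the CUDA stream semantics.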
Looking forward to your reply, thanks! @interestingLSY @PKUFlyingPig
Relevant code: `DistServe/distserve/worker.py`, line 250 at commit `3a5c539`.