Vine: Waiting Tasks Consume Only Disk #4343
dthain wants to merge 3 commits into cooperative-computing-lab:master from
memory, and disk, while WAITING_RETRIEVAL tasks consume only disk:
- Change count_worker_resources() to keep track of tasks in running/waiting_retrieval state.
- Change accounting for RUNNING and WAITING_RETRIEVAL tasks.
- Change uses of itable_size(w->current_tasks) to use the appropriate task counter instead.
FYI, let's not be in a hurry to accept this one. I think it may have some unexpected consequences, so it can wait until we have gained some experience with it.
    /* Waiting tasks consume only disk. */
    w->tasks_waiting_retrieval++;
    if(box) {
        w->resources->disk.inuse += box->disk;
I think here you want to use t->sandbox_measured. This will be helpful when using proportional resources (the default), which tries to assign as much disk as possible to the tasks.
That's an interesting idea. So t->sandbox_measured is made available when the task completion message arrives. That should give a value less than the "box" assigned at dispatch time, which makes sense.
I'm hesitating b/c the worker adds up resource consumption at its end based solely on the resource request made at submit time. Is there some problem that may result from the manager and the worker assuming different values?
Since count_worker_resources is called whenever the number of tasks on the worker changes, I think it should be ok.
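To make the suggestion in this thread concrete, here is a minimal sketch of how the disk charge for a waiting task might be chosen. It is not the code in this PR: the helper name `waiting_disk_charge` is hypothetical, the sketch assumes an unmeasured sandbox is reported as zero or a negative value, and it assumes both quantities are expressed in the same unit (the real code may need a conversion).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the disk amount to charge for a task that is
 * waiting for retrieval.
 *
 * sandbox_measured: size reported with the task completion message,
 *                   assumed to be <= 0 when no measurement is available.
 * box_disk:         disk assigned to the task at dispatch time.
 *
 * Both values are assumed to use the same unit.
 */
static int64_t waiting_disk_charge(int64_t sandbox_measured, int64_t box_disk)
{
    assert(box_disk >= 0);

    /* Prefer the measured size, which is normally smaller than the box. */
    if (sandbox_measured > 0) {
        return sandbox_measured;
    }

    /* Otherwise fall back to what was allocated at dispatch time. */
    return box_disk;
}
```

With proportional resources, the box may hold most of the worker's disk, so charging the measured size instead could free disk for new tasks much sooner. The open question raised above still applies: the worker keeps its own totals based on the original request.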
Proposed Changes
Under the current accounting scheme, tasks consume all resource types, even when they are in the WAITING_RETRIEVAL state. This results in TaskVine leaving cores/memory/gpus idle until it can clear out those waiting tasks.
This (experiment) changes the accounting of tasks, so that only RUNNING tasks consume cores/memory/gpus, and WAITING_RETRIEVAL tasks consume only disk. Accounting has been changed to look at specific counters rather than itable_size(w->current_tasks), which captures both types.
The intended result is that we should see more tasks running and more overlap between task execution and result retrieval.
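The accounting rule described above can be sketched in a few lines of C. This is a simplified illustration, not the actual count_worker_resources() implementation: the types `task_state_t`, `struct box`, `struct task`, and `struct worker_usage`, and the function `count_usage`, are hypothetical stand-ins, and the real manager walks an itable rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the manager's data structures. */
typedef enum { TASK_RUNNING, TASK_WAITING_RETRIEVAL } task_state_t;

struct box {
    int64_t cores, memory, disk, gpus;
};

struct task {
    task_state_t state;
    struct box box; /* resources assigned at dispatch time */
};

struct worker_usage {
    int tasks_running;
    int tasks_waiting_retrieval;
    struct box inuse;
};

/* Recompute a worker's usage from scratch over its current tasks. */
static struct worker_usage count_usage(const struct task *tasks, size_t n)
{
    struct worker_usage u = {0};

    assert(tasks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct task *t = &tasks[i];

        switch (t->state) {
        case TASK_RUNNING:
            /* Running tasks consume every resource type. */
            u.tasks_running++;
            u.inuse.cores += t->box.cores;
            u.inuse.memory += t->box.memory;
            u.inuse.disk += t->box.disk;
            u.inuse.gpus += t->box.gpus;
            break;
        case TASK_WAITING_RETRIEVAL:
            /* Waiting tasks consume only disk. */
            u.tasks_waiting_retrieval++;
            u.inuse.disk += t->box.disk;
            break;
        }
    }
    return u;
}
```

In this sketch, a worker holding two running tasks and one waiting task reports the cores, memory, and gpus of the two running tasks only, while disk is summed over all three. Code that previously used the total table size would read `tasks_running` or `tasks_waiting_retrieval` instead, depending on which population it cares about.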
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
- make test: Run local tests prior to pushing.
- make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
- make lint: Run lint on source code prior to pushing.