Vine: Waiting Tasks Consume Only Disk #4343

Open
dthain wants to merge 3 commits into cooperative-computing-lab:master from dthain:vine-current-tasks

dthain commented Feb 5, 2026

Proposed Changes

Under the current accounting scheme, tasks consume all resource types, even when they are in the WAITING_RETRIEVAL state. As a result, TaskVine leaves cores/memory/gpus idle until it can clear out those waiting tasks.

This (experimental) change adjusts the accounting of tasks so that only RUNNING tasks consume cores/memory/gpus, and WAITING_RETRIEVAL tasks consume only disk. Accounting has been changed to look at specific counters rather than itable_size(w->current_tasks), which counts both types.

The intended result is that we should see more tasks running and more overlap between task execution and result retrieval.
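
The accounting described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the struct layouts, the enum, and the function name count_usage are hypothetical stand-ins, and it assumes that RUNNING tasks are charged for disk as well as cores/memory/gpus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the manager's data structures. */
enum task_state { TASK_RUNNING, TASK_WAITING_RETRIEVAL };

struct resource_box { int64_t cores, memory, disk, gpus; };

struct task { enum task_state state; struct resource_box box; };

struct worker_usage {
	int tasks_running;
	int tasks_waiting_retrieval;
	struct resource_box inuse;
};

/* Recompute a worker's usage from scratch over its current tasks. */
static struct worker_usage count_usage(const struct task *tasks, size_t n)
{
	struct worker_usage u = {0};
	assert(tasks != NULL || n == 0);

	for(size_t i = 0; i < n; i++) {
		const struct task *t = &tasks[i];
		if(t->state == TASK_RUNNING) {
			/* Running tasks are charged for every resource type (assumed to include disk). */
			u.tasks_running++;
			u.inuse.cores += t->box.cores;
			u.inuse.memory += t->box.memory;
			u.inuse.gpus += t->box.gpus;
			u.inuse.disk += t->box.disk;
		} else {
			/* Waiting tasks consume only disk. */
			u.tasks_waiting_retrieval++;
			u.inuse.disk += t->box.disk;
		}
	}
	return u;
}
```

With separate counters, scheduling decisions that depend on how many tasks occupy cores can read tasks_running instead of the size of the combined task table.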

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test: Run local tests prior to pushing.
  • make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
  • make lint: Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

memory, and disk, while WAITING_RETRIEVAL tasks consume only disk:
- Change count_worker_resources() to keep track of tasks in running/waiting_retrieval state.
- Change accounting for RUNNING and WAITING_RETRIEVAL tasks.
- Change uses of itable_size(w->current_tasks) to use the appropriate task counter instead.

dthain commented Feb 5, 2026

FYI, let's not be in a hurry to accept this one, I think it may have some unexpected consequences, so it can wait until we have gained some experience with it...

@btovar btovar self-requested a review February 6, 2026 13:13
/* Waiting tasks consume only disk. */
w->tasks_waiting_retrieval++;
if(box) {
w->resources->disk.inuse += box->disk;
Member

I think here you want to use t->sandbox_measured. This will be helpful when using proportional resources (the default), which tries to assign as much disk as possible to the tasks.
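
A minimal sketch of that suggestion, under stated assumptions: waiting_disk_charge is a hypothetical helper, not a function in the codebase, and it assumes a measured value of zero or less means that no measurement has been reported yet, in which case it falls back to the disk allocated at dispatch time.

```c
#include <assert.h>
#include <stdint.h>

/*
Hypothetical helper: disk to charge for a task awaiting retrieval.
box_disk is the disk allocated at dispatch time; sandbox_measured is the
sandbox size reported on completion (assumed <= 0 when not yet known).
*/
static int64_t waiting_disk_charge(int64_t box_disk, int64_t sandbox_measured)
{
	assert(box_disk >= 0);
	if(sandbox_measured > 0) {
		/* Prefer the measured size once the completion message has arrived. */
		return sandbox_measured;
	}
	/* Otherwise fall back to the allocation made at dispatch time. */
	return box_disk;
}
```

Under proportional resources the allocated disk can be much larger than what the task actually wrote, so charging the measured size would free more disk for new tasks.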

Member Author

That's an interesting idea. So t->sandbox_measured is made available when the task completion message arrives. That should give a value less than the "box" assigned at dispatch time, which makes sense.

I'm hesitating b/c the worker adds up resource consumption at its end solely based on the resource request made at submit time. Is there some problem that may result from the manager and the worker assuming different values?

Member

Since count_worker_resources is called whenever the number of tasks on the worker changes, I think it should be ok.