Task Spooler PLUS

A single-binary, multi-user job scheduler: think of it as a lightweight Slurm for shared workstations. No multi-daemon stack, no database server to configure, no cluster infrastructure. Just one ts binary, which acts as both server and client.

Originally a single-user task queue, Task Spooler PLUS has been transformed by Kylin JIANG into a multi-user job scheduler with crash recovery, cgroups resource isolation, and NUMA-aware CPU binding — bringing Slurm-like scheduling to small shared machines.

Chinese documentation

Why Task Spooler PLUS?

vs. the original Task Spooler

| Aspect | Original TS | TS PLUS |
|---|---|---|
| Users | One queue per user | Central server, multi-user with per-user slot limits |
| Recovery | Jobs lost on crash | SQLite3 WAL: all jobs, states, timings survive crashes and reboots |
| Resource control | None | cgroups v1/v2 CPU limiting, freezer pause/resume, NUMA CPU binding |
| Scheduling | FIFO only | Dependency chains, wall-time auto-pause, `--at` scheduled execution |
| User management | None | Dynamic user config, suspend/resume per user, `ts -X` hot-reload |
| Client resilience | Disconnected on server restart | Auto-reconnect, re-attach running jobs seamlessly |

vs. Slurm

| Aspect | Slurm | TS PLUS |
|---|---|---|
| Setup | `slurmctld`, `slurmd`, `munge`, MySQL, config files | One binary, one config file |
| Target | Clusters (hundreds to thousands of nodes) | Workstations (one node, a few to tens of users) |
| Multi-user | ✓ | ✓ |
| Job recovery | Via database | SQLite3 WAL |
| CPU/NUMA binding | `--cpu-bind` / `--mem-bind` | `TS_CPU_BIND=1`, NUMA-aware allocator |
| cgroups | v1/v2 (via plugin) | v1/v2 (built-in) |
| Wall-time limits | ✓ | ✓ (auto-pause + re-queue) |
| Per-job callbacks | Epilog/Prolog scripts | `--on-finish` hook with placeholders |

If you have a shared workstation with a handful of users running simulation or ML workloads, TS PLUS gives you Slurm-like job management without the operational burden.

How it Started

Task Spooler PLUS began as a fork of Task Spooler by Lluís Batlle i Rossell. Kylin JIANG transformed it from a single-user queue into a multi-user system with:

cgroups v1/v2 — CPU quota limiting, freezer-based pause/resume, cpuset NUMA binding
SQLite3 crash recovery — jobs, states, and timing data survive reboots
CPU/NUMA binding allocator — topology-aware, best-fit group selection, auto-defrag with NUMA affinity preservation
Dynamic user management — struct User via vec_t, config hot-reload, per-user suspend/resume
Wall-time enforcement — auto-pause timed-out jobs, re-queue with extended deadline
Client auto-reconnect — running jobs survive server restarts
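
The "best-fit group selection" mentioned for the CPU/NUMA allocator can be illustrated with a small sketch. This is not the allocator in cpu_bind.c; the function name `pick_numa_node` and the input shape (a mapping from NUMA node ID to its free CPU IDs) are assumptions made for illustration. It only shows the general best-fit idea: prefer the node with the fewest free CPUs that can still hold the whole job, so larger nodes stay available for larger jobs.

```python
from typing import Dict, List, Optional, Tuple


def pick_numa_node(
    free_cpus: Dict[int, List[int]], needed: int
) -> Optional[Tuple[int, List[int]]]:
    """Best-fit choice of a NUMA node for a job needing `needed` CPUs.

    `free_cpus` maps a NUMA node ID to the CPU IDs currently free on it
    (hypothetical input shape, not taken from the project).
    """
    # Keep only nodes that can hold the whole job on their own.
    candidates = [
        (len(cpus), node) for node, cpus in free_cpus.items() if len(cpus) >= needed
    ]
    if not candidates:
        return None  # no single node fits; a real allocator might span nodes
    # Best fit: smallest number of free CPUs, ties broken by lower node ID.
    _, node = min(candidates)
    return node, sorted(free_cpus[node])[:needed]
```

With two nodes where node 0 has four free CPUs and node 1 has two, a two-CPU job lands on node 1, leaving node 0 intact for a later four-CPU job.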

Features

Cross-platform task queue for GNU/Linux, Darwin, Cygwin, FreeBSD
Multi-user support with per-user slot limits
Crash recovery via SQLite3 (WAL mode): jobs, states, and timings survive server crashes and reboots
Cgroups CPU limiting and freezer-based pause/resume (v1 and v2, compile-time selectable)
Wall-time management — auto-pause and re-queue timed-out tasks
Global user control — suspend/resume all jobs for a single user
Comprehensive output in default, JSON, and tab-separated formats
Simple build — just make (no autotools)
Optional stderr separation for better log management
PID lookup (--find-by-pid) to identify which job owns a process (including descendants)
Scheduled execution (--at) — delay jobs until a specified time (+5m, 14:00, 2025-06-01T14:00)
CPU binding (TS_CPU_BIND) — NUMA-aware topology-based CPU allocation with cgroups cpuset v1/v2, HT exclusion, and crash-restart recovery
Crash survival — jobs persist through server restart with automatic client reconnect
Post-job hook (--on-finish) — run a command after a job finishes, with access to job info via placeholders
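
The time formats accepted by --at (+5m, 14:00, 06-01_14:00, 2025-06-01T14:00) can be made concrete with a small parser sketch. This is an illustration only, not the project's parser. Several behaviours below are assumptions that the README does not state: that relative offsets accept s/h/d in addition to m, that a bare time of day that has already passed rolls over to tomorrow, and that a month-day form uses the current year. The function name `parse_at` is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: only "+5m" appears in the README; s/h/d are guesses.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_at(spec: str, now: datetime) -> datetime:
    """Turn an --at style time spec into an absolute datetime."""
    # Relative offset, e.g. "+5m"
    m = re.fullmatch(r"\+(\d+)([smhd])", spec)
    if m:
        return now + timedelta(seconds=int(m.group(1)) * _UNITS[m.group(2)])
    # Time of day, e.g. "14:00" (assumed to roll over to tomorrow if past)
    m = re.fullmatch(r"(\d{1,2}):(\d{2})", spec)
    if m:
        t = now.replace(
            hour=int(m.group(1)), minute=int(m.group(2)), second=0, microsecond=0
        )
        return t if t > now else t + timedelta(days=1)
    # Month-day plus time, e.g. "06-01_14:00" (assumed current year)
    m = re.fullmatch(r"(\d{2})-(\d{2})_(\d{1,2}):(\d{2})", spec)
    if m:
        month, day, hour, minute = map(int, m.groups())
        return datetime(now.year, month, day, hour, minute)
    # Full ISO form, e.g. "2025-06-01T14:00"
    return datetime.fromisoformat(spec)
```

Passing `now` explicitly keeps the function deterministic, which makes the rollover rule easy to check in isolation.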

Tools

tools/migrate_uid.py — Migrate old ts_UID column (vec index) to Linux UIDs
tools/clear_finished.py — Clear the Finished table, or drop and recreate it with --drop

Quick Start

make                      # build (cgroups v1, default)
make CGROUP_V2=1          # build with cgroups v2 support
make TS_CPU_BIND=1        # build with CPU binding (cgroups cpuset)
make CGROUP_V2=1 TS_CPU_BIND=1  # build with both cgroups v2 + CPU binding
sudo ./ts --daemon        # start the server as daemon (root only)
./ts -l                   # list jobs
./ts sleep 30             # enqueue a job
./ts -r <id>              # remove a job
./ts -k <id>              # kill a running job
./ts -w <id>              # wait for a job to finish

Per-job callback example

--on-finish lets you run a command after a job finishes, with full job info via placeholders:

./ts --on-finish "cat > job-{jobid}.info << 'EOF'
 pid={pid}  label={label}  exitcode={exitcode}
 realtime={realtime} usertime={usertime} systime={systime} pausetime={pausetime}
 start_time={start_time}
 enque_time={enque_time}
 end_time={end_time}
 slots={slots}
EOF" -L test_job sleep 10

After the job finishes, job-<id>.info will contain all the resolved values — useful for logging, email notifications (--on-finish "sendmail {exitcode} ..."), or chaining workflows.
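
To make the placeholder mechanism concrete, here is a minimal sketch of how a template such as the one above could be resolved. It is not the code in ts; the function name `render_on_finish` and the dictionary of job fields are assumptions. The sketch replaces only `{name}` tokens whose name is a known job field and leaves everything else untouched, so shell constructs such as `${HOME}` in the template survive.

```python
import re


def render_on_finish(template: str, info: dict) -> str:
    """Replace {name} placeholders with values from `info`.

    Unknown names are left as-is rather than raising, so braces that
    belong to the shell command itself are not disturbed.
    """

    def substitute(match: "re.Match") -> str:
        key = match.group(1)
        return str(info[key]) if key in info else match.group(0)

    return re.sub(r"\{(\w+)\}", substitute, template)
```

A plain `str.format` call would be shorter, but it raises on any brace pair it does not recognise, which is awkward for templates that are themselves shell commands.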

Build & Install

make                      # Build `ts` binary
make CGROUP_V2=1          # Build with cgroups v2
make clean                # Remove objects and binary
./install_make            # Install to /usr/local (needs root)

Default paths (override via environment variables):

| Variable | Default | Purpose |
|---|---|---|
| `TS_SOCKET` | `$TMPDIR/socket-ts.root` | Unix socket |
| `TS_USER_PATH` | `/home/kylin/task-spooler/user.txt` | User config |
| `TS_LOGFILE_PATH` | `/home/kylin/task-spooler/log.txt` | Job log |
| `TS_SQLITE_PATH` | `/home/kylin/task-spooler/task-spooler.db` | SQLite database |
| `TS_SLOTS` | `1` | Max concurrent jobs |
| `TS_MAXFINISHED` | `1000` | Max finished jobs |
| `TS_MAX_WALL_TIME` | `10080` (minutes) | Max wall-time limit |
| `TS_FIRST_JOBID` | `1000` | Starting job ID |

Edit defaults.h to change the built-in defaults.
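
The override rule (environment variable if set, otherwise the built-in default) can be sketched as a small helper, for example to print the effective settings before starting the server from a wrapper script. The helper name `resolve_settings` is hypothetical, and falling back to /tmp when TMPDIR is unset is an assumption; the default values are the ones listed above.

```python
import os


def resolve_settings(env=None) -> dict:
    """Return effective settings: environment value if present, else default."""
    env = os.environ if env is None else env
    tmpdir = env.get("TMPDIR", "/tmp")  # assumption: /tmp when TMPDIR is unset
    return {
        "TS_SOCKET": env.get("TS_SOCKET", os.path.join(tmpdir, "socket-ts.root")),
        "TS_SLOTS": int(env.get("TS_SLOTS", "1")),
        "TS_MAXFINISHED": int(env.get("TS_MAXFINISHED", "1000")),
        "TS_MAX_WALL_TIME": int(env.get("TS_MAX_WALL_TIME", "10080")),  # minutes
        "TS_FIRST_JOBID": int(env.get("TS_FIRST_JOBID", "1000")),
    }
```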

User Configuration

The server reads a user config file (path from TS_USER_PATH) mapping usernames to max slot counts. Usernames are resolved to Linux UIDs via getpwnam() — no need to manually specify UIDs.

user.txt format:

# <username> <max_slots>
TS_SLOTS = 16
john    4
mary    2

Root (uid=0) is automatically added with full control.

The config can be refreshed at runtime via ts -X (root only). The refresh only allows adding new users — existing users cannot be removed or have their slots changed.
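
The file format and the add-only refresh rule can be sketched as follows. This is an illustration under assumptions, not the parser in the server: the function names `parse_user_config` and `refresh_users` are hypothetical, and treating everything after `#` as a comment is a guess based on the sample file. UID resolution is left out; in Python the equivalent of getpwnam() is `pwd.getpwnam(name).pw_uid`.

```python
def parse_user_config(text: str):
    """Parse user.txt content into (total_slots, {username: max_slots})."""
    total_slots = None
    users = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "TS_SLOTS":
                total_slots = int(value)
            continue
        name, slots = line.split()
        users[name] = int(slots)
    return total_slots, users


def refresh_users(current: dict, reloaded: dict) -> dict:
    """Mimic the documented refresh rule: new users are added, existing
    users keep their slots and are never removed."""
    merged = dict(current)
    for name, slots in reloaded.items():
        merged.setdefault(name, slots)
    return merged
```

Under this rule, editing john's slot count in user.txt and running ts -X has no effect on john; only users that were not known before appear after the refresh.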

How it works

A server process runs as root, managing jobs in memory and persisting state to SQLite3. Client processes connect via a Unix socket. The server never executes user commands — clients fork and run jobs themselves, preserving the user's environment, ulimits, and working directory.

ts (client)  ──Unix socket──▶  ts (server daemon)
   │                              │
   fork() + exec(cmd)             │  manages queue, slots, users
   │                              │  persists to SQLite3 (WAL)
   waitpid() → notify server      │

On crash, the client auto-reconnects and re-attaches running jobs. On reboot, all job state is restored from SQLite.
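
The client-side half of the diagram (fork, exec, wait, report) follows the standard Unix pattern, sketched below in Python. This is not the project's client code, and the step that notifies the server over the Unix socket is omitted. The function name `run_job` and the convention of returning a negative signal number are assumptions. Because the child is forked from the caller, it inherits the caller's environment, limits, and working directory, which is the property the design relies on.

```python
import os
import sys


def run_job(argv) -> int:
    """Fork, exec `argv` in the child, and return its exit code.

    Returns a negative signal number if the child was killed by a signal
    (a convention chosen for this sketch).
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the user's command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed; never return into the parent's code.
            os._exit(127)
    # Parent: block until the child finishes, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

For example, `run_job([sys.executable, "-c", "import sys; sys.exit(3)"])` returns 3. This sketch is Unix-only, since `os.fork` is not available on Windows.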

Cgroups support

Build-time selection:

make — cgroups v1 (cpu.cfs_quota_us + freezer.state)
make CGROUP_V2=1 — cgroups v2 (cpu.max + cgroup.freeze)

Both provide CPU quota limiting and freezer-based pause/resume.

To check which cgroup version your system supports:

mount | grep cgroup
# v1 shows: cgroup on /sys/fs/cgroup/cpu, freezer, cpuset ...
# v2 shows: cgroup2 on /sys/fs/cgroup type cgroup2
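
The same check can be scripted. The sketch below relies on the fact that a cgroups v2 (unified) mount exposes a `cgroup.controllers` file at its root, while v1 mounts one directory per controller. The function names `detect_cgroup_version` and `build_command` are hypothetical, and hybrid setups (v1 controllers plus a separate v2 mount) are not handled.

```python
import os


def detect_cgroup_version(root: str = "/sys/fs/cgroup"):
    """Return 2 for a unified (v2) hierarchy, 1 for per-controller (v1)
    directories, or None if neither layout is found under `root`."""
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    if os.path.isdir(os.path.join(root, "cpu")) or os.path.isdir(
        os.path.join(root, "freezer")
    ):
        return 1
    return None


def build_command(version):
    """Map the detected version to the matching make invocation."""
    return {1: "make", 2: "make CGROUP_V2=1"}.get(version)
```

Calling `build_command(detect_cgroup_version())` then yields the build line to use, or None when no cgroup hierarchy was found.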

Single-instance guard

The server checks /proc on --daemon startup: if another instance of the same binary is already running as root, it refuses to start.
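
A /proc scan of this kind can be sketched as below. This is an illustration of the general technique rather than the server's actual check: the function name `find_running_instances` is hypothetical, and comparing the `/proc/<pid>/exe` link target against the resolved binary path is an assumption about what "same binary" means. The owner of each `/proc/<pid>` directory is used as the process's user.

```python
import os


def find_running_instances(exe_path: str, uid: int = 0, proc_root: str = "/proc"):
    """Return PIDs (other than ours) owned by `uid` whose executable
    resolves to the same file as `exe_path`."""
    target = os.path.realpath(exe_path)
    me = os.getpid()
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == me:
            continue  # skip non-process entries and the current process
        pid_dir = os.path.join(proc_root, entry)
        try:
            if os.stat(pid_dir).st_uid != uid:
                continue
            if os.path.realpath(os.path.join(pid_dir, "exe")) == target:
                found.append(int(entry))
        except OSError:
            continue  # process exited or is not readable; ignore it
    return sorted(found)
```

A startup guard would refuse to continue when this list is non-empty. The `proc_root` parameter exists only so the function can be exercised against a fake directory tree.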

Common problems

Server stuck: remove the socket file ($TMPDIR/socket-ts.root, typically /tmp/socket-ts.root, or the path set in TS_SOCKET), then restart
After SIGKILL: .db-wal and .db-shm files persist — SQLite auto-recovers on next open
After crash: running jobs lose exit code and signal information
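
The WAL behaviour behind the second item can be tried with Python's built-in sqlite3 module. The sketch below is not the project's schema: the `jobs` table, its columns, and the function names are hypothetical. It shows only the mechanism, which is switching a database to WAL mode and relying on SQLite to replay the write-ahead log when the file is next opened.

```python
import sqlite3


def open_job_db(path: str):
    """Open (or create) a database in WAL mode.

    Returns the connection and the journal mode SQLite reports.
    """
    conn = sqlite3.connect(path)
    # The pragma returns the mode actually in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.commit()
    return conn, mode


def record_state(conn, job_id: int, state: str) -> None:
    """Insert or overwrite the state of one job and commit immediately."""
    conn.execute("INSERT OR REPLACE INTO jobs (id, state) VALUES (?, ?)", (job_id, state))
    conn.commit()
```

While such a database is open, the `-wal` and `-shm` companion files sit next to it. They should be left in place after an unclean shutdown, since SQLite needs them to recover committed transactions.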

Manual

See man ts or run ts -h for the full command reference.

Task Spooler PLUS 2.6.1 - a multi-user job scheduler like slurm.
Copyright (C) 2007-2026  Kylin JIANG - Duc Nguyen - Lluis Batlle i Rossell

Environment Variables:
  TS_SOCKET        : Unix socket path (default: $TMPDIR/socket-ts.root)
  TS_SLOTS         : Max concurrent jobs (server start, default: 1)
  TS_USER_PATH     : User config file path (server start)
  TS_LOGFILE_PATH  : Job log path (server start)
  TS_SQLITE_PATH   : SQLite DB path (server start)
  TS_MAXFINISHED   : Max finished jobs (default: 1000)
  TS_MAX_WALL_TIME : Max wall-time (default: 10080 min)
  TS_MAXCONN       : Max connections (default: 1000)
  TS_SORTJOBS      : Job queue sorting control
  TS_SAVELIST      : Crash recovery file for job list
  TS_ENV           : Command to gather job info during enqueue
  TS_ONFINISH      : Default on-finish command (overridden by --on-finish)
  TMPDIR           : Temporary output directory

Long option actions:
  --getenv [var]          Get server environment variable
  --setenv [var]          Set server environment flag
  --unsetenv [var]        Remove server environment flag
  --get-label || -a [id]  Show job label
  --full-cmd || -F [id]   Show full command
  --find-by-pid [pid]     Find which running job a PID belongs to
  --check-daemon           Verify daemon status
  --count-running || -R   Count running jobs
  --last-queue-id || -q   Show last added job ID
  --get-logdir             Display log directory path
  --set-logdir [path]     Configure log directory
  --serialize || -M [fmt] Export job list (default/json/tab)
                          Use -M json -J <id> for single job JSON
  --hold [jobid]          Pause specified job
  --cont [jobid]          Resume paused job
  --suspend [USER]        Suspend user
  --resume [USER]         Resume user
  --lock                  Lock server
  --unlock                Release server lock
  --at <time>             Schedule: +5m, 14:00, 06-01_14:00, 2025-06-01T14:00
  --on-finish <template>  Run command after job finishes
                          Placeholders: {jobid} {output} {exitcode} {pid} {label}
                          {command} {realtime} {usertime} {systime}
                          {pausetime} {start_time} {enque_time} {end_time} {slots}
  --wtime [dur]           Wall time limit (e.g. 30s, 3.4m, 1.5H, 2d)
  --add-wtime [dur]       Increase job wall time (root only)
  --job [id] || -J [id]  Specify job ID
  --daemon                Run as daemon (root only)
  --no-bind-defrag        Disable defrag (server start, root only)

Actions:
  -A           List info for all users
  -X           Refresh user config (root only)
  -K           Stop server (root only)
  -C           Clear finished jobs
  -l           Show job list (default)
  -S [num]     Get/set max concurrent jobs (root only)
  -t [id]      Tail last 10 lines of output
  -c [id]      Show complete output
  -p [id]      Display job PID
  -o [id]      Show output file path
  -i [id]      Display job info
  -s [id]      Show job state
  -r [id]      Remove job
  -w [id]      Wait for job
  -k [id]      Send SIGTERM to job
  -T           SIGTERM all jobs (root only)
  -u [id]      Prioritize job
  -U <id-id>   Swap two jobs
  -h           Show help
  -V           Display version

Options adding jobs:
  -B           Exit if server full
  -n           Disable output storage
  -E           Separate stderr
  -O           Set log filename
  -z           Gzip output
  -f           Run in foreground
  -d           Run after last job
  -D <id,...>  Run after specified IDs
  -W <id,...>  Run after successful IDs
  -L [label]   Assign job label
  -N [num]     Required slots (default: 1)
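
The duration syntax shown for --wtime (30s, 3.4m, 1.5H, 2d) can be illustrated with a short parser. This is a sketch, not the parser in ts: the function name `parse_wall_time` is hypothetical, and accepting both upper- and lower-case unit letters as well as rounding to whole seconds are assumptions.

```python
import re

# Seconds per unit; case-insensitivity is an assumption of this sketch.
_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_wall_time(text: str) -> int:
    """Convert a duration such as '3.4m' or '1.5H' to whole seconds."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([smhdSMHD])", text.strip())
    if not m:
        raise ValueError(f"unrecognised duration: {text!r}")
    return round(float(m.group(1)) * _SECONDS[m.group(2).lower()])
```

Comparing the result with `TS_MAX_WALL_TIME * 60` would give a client-side check against the configured maximum, which is expressed in minutes.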

History

Андрей Пантюхин (Andrew Pantyukhin) maintains the BSD port.
Alessandro Öhler provided the original Gentoo ebuild.
Alexander V. Inyukhin maintains unofficial Debian packages.
Pascal Bleser packaged the program for SuSE/openSuSE.
Gnomeye maintains the AUR package.
Eric Keller wrote a Node.js web server for the task spooler queue.
Duc Nguyen developed GPU support.
Kylin JIANG transformed Task Spooler into Task Spooler PLUS: multi-user architecture with central server, SQLite3 WAL crash recovery, cgroups v1/v2 (CPU limiting, freezer, cpuset NUMA binding), NUMA-aware CPU binding allocator with auto-defrag, wall-time auto-pause, dynamic user management with hot-reload, client auto-reconnect, scheduled execution (--at), per-job hooks (--on-finish), PID lookup, and hundreds of stability fixes.

License

See the provided COPYING file.
