SlackSlurm posts Slurm job lifecycle notifications to Slack via Incoming Webhooks. It is designed to run as a watcher on a login node, so compute nodes do not need outbound internet access.
- Python 3.8+
- Slurm commands available in PATH: squeue, sacct, scontrol
- Login node can reach https://hooks.slack.com
- Go to your Slack workspace and navigate to "Apps": https://api.slack.com/apps?new_app=1
- Click "Create New App" and choose "From scratch".
- Name your app and select the workspace.
- In the left sidebar, go to "Incoming Webhooks" and activate them.
- Click "Add New Webhook to Workspace", choose a channel (your own channel, not a public one), and click "Allow".
- Install the package:
  pip install --user .
- Initialize config (writes the default config file):
  slackslurm init --webhook-url "https://hooks.slack.com/services/..."
- Add a marker to the sbatch script header:
  #SLACKSLURM project=demo mention_on_fail=@here
- Run the watcher on a login node:
  slackslurm watch
  You can run it in a tmux session or via nohup for long-term monitoring.
- Optional: send a test message:
  slackslurm test
- The watcher polls squeue for active jobs owned by the current user.
- It only tracks jobs whose sbatch script header contains #SLACKSLURM.
- When a job disappears from squeue, the watcher queries sacct for the final state and exit code. If sacct is unavailable, it falls back to scontrol show job (best effort).
- Notifications are sent via Slack Incoming Webhooks using Block Kit payloads.
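The polling steps above can be sketched roughly as follows. This is an illustration only, not SlackSlurm's actual code: the helper names `finished_jobs` and `parse_sacct_line` are hypothetical, and the sketch assumes sacct is called with `--format=JobID,State,ExitCode --parsable2 --noheader`, so that each record is pipe-delimited.

```python
from typing import Set, Tuple


def finished_jobs(previous: Set[str], current: Set[str]) -> Set[str]:
    """Job IDs seen in the previous squeue poll but absent from the current one."""
    return previous - current


def parse_sacct_line(line: str) -> Tuple[str, str, int]:
    """Parse one 'JobID|State|ExitCode' record into (job_id, state, exit_code).

    Assumes pipe-delimited output, e.g. from:
    sacct -j <id> --format=JobID,State,ExitCode --parsable2 --noheader
    """
    job_id, state, exit_code = line.strip().split("|")
    # State may carry a suffix such as "CANCELLED by 1234"; keep the first word.
    state = state.split()[0]
    # ExitCode is reported as "<return code>:<signal>".
    return job_id, state, int(exit_code.split(":")[0])


# Hypothetical poll results: job 102 has left the queue.
tracked = {"101", "102", "103"}
still_queued = {"101", "103"}
done = finished_jobs(tracked, still_queued)
record = parse_sacct_line("102|COMPLETED|0:0\n")
```

Keeping the set difference and the record parsing as pure functions makes them easy to test without a Slurm installation; the actual squeue and sacct calls would wrap them.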
- Job submitted: (screenshot)
- Job finished successfully: (screenshot)
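A notification of this kind could be assembled along the following lines. This is a sketch under assumptions, not the message layout SlackSlurm actually uses: `build_payload` and `post_to_slack` are hypothetical names, and the block layout (one section plus one context row for tags) is a guess at a reasonable shape. Only the Block Kit block types and the JSON POST to the webhook URL come from Slack's documented format.

```python
import json
import urllib.request
from typing import Dict, List


def build_payload(title: str, tags: Dict[str, str]) -> dict:
    """Build a Block Kit message: a section with the title, a context row with tags."""
    blocks: List[dict] = [
        {"type": "section", "text": {"type": "mrkdwn", "text": title}},
    ]
    if tags:
        tag_text = "  ".join(f"`{key}={value}`" for key, value in tags.items())
        blocks.append(
            {"type": "context", "elements": [{"type": "mrkdwn", "text": tag_text}]}
        )
    # "text" serves as the plain-text fallback for clients that cannot show blocks.
    return {"text": title, "blocks": blocks}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON to an Incoming Webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Illustrative values only; nothing is sent until post_to_slack is called.
payload = build_payload("Job 102 (train_x) finished: COMPLETED", {"project": "robot"})
```

Calling `post_to_slack(url, payload)` with a real webhook URL would deliver the message; building the payload separately keeps it inspectable before anything leaves the login node.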
- slackslurm init --webhook-url URL
  - Creates ~/.config/slackslurm/config.json.
  - You can also set SLACKSLURM_WEBHOOK_URL instead of passing the flag.
- slackslurm test
  - Sends a sample message to verify webhook connectivity.
- slackslurm watch [--once] [--poll-interval SECONDS]
  - Runs the watcher loop.
  - --once performs a single poll cycle.
Default config path:
~/.config/slackslurm/config.json
You can override paths with:
- SLACKSLURM_CONFIG (config file path)
- SLACKSLURM_STATE (state file path)
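The override behavior can be pictured with a small sketch. The function name `resolve_paths` is hypothetical, and treating an empty variable as unset is an assumption; the two environment variables and the default paths are the ones documented in this README.

```python
import os
from pathlib import Path
from typing import Mapping, Tuple

DEFAULT_CONFIG = "~/.config/slackslurm/config.json"
DEFAULT_STATE = "~/.local/state/slackslurm/state.json"


def resolve_paths(env: Mapping[str, str] = os.environ) -> Tuple[Path, Path]:
    """Return (config_path, state_path), preferring the environment overrides."""
    # An empty variable falls back to the default (an assumption of this sketch).
    config = Path(env.get("SLACKSLURM_CONFIG") or DEFAULT_CONFIG).expanduser()
    state = Path(env.get("SLACKSLURM_STATE") or DEFAULT_STATE).expanduser()
    return config, state


# Only the config path is overridden here; the state path keeps its default.
config_path, state_path = resolve_paths({"SLACKSLURM_CONFIG": "/tmp/alt.json"})
```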
Example config:
{
  "webhook_url": "https://hooks.slack.com/services/...",
  "poll_interval_seconds": 45,
  "notify_on": ["submit", "start", "end"],
  "script_marker": "#SLACKSLURM",
  "tail_log_lines_on_fail": 40,
  "max_log_chars": 1800,
  "mention_on_fail": "@here",
  "include_log_paths": true
}
Key fields:
- webhook_url: Slack Incoming Webhook URL.
- notify_on: choose any of submit, start, end.
- script_marker: change the marker if you want a different tag.
- tail_log_lines_on_fail: number of lines to include from stderr/stdout.
- max_log_chars: hard cap for the log snippet size.
- mention_on_fail: mention string added to failed jobs.
- include_log_paths: include stdout/stderr file paths in the message.
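To illustrate how tail_log_lines_on_fail and max_log_chars might interact, here is a minimal sketch. The function `tail_snippet` is hypothetical, and keeping the end of the log when the character cap applies is an assumption; the README only states that the cap is a hard limit.

```python
def tail_snippet(log_text: str, max_lines: int = 40, max_chars: int = 1800) -> str:
    """Keep the last max_lines lines, then enforce max_chars by trimming the front."""
    if max_lines <= 0 or max_chars <= 0:
        return ""
    lines = log_text.splitlines()
    snippet = "\n".join(lines[-max_lines:])
    if len(snippet) > max_chars:
        # Keep the end of the log, where the failure is most likely reported.
        snippet = snippet[-max_chars:]
    return snippet


# A 100-line log, trimmed first by line count and then by character count.
log = "\n".join(f"line {i}" for i in range(100))
short = tail_snippet(log, max_lines=3, max_chars=1800)
capped = tail_snippet(log, max_lines=40, max_chars=10)
```

Applying the line limit first and the character cap second keeps the snippet bounded even when individual log lines are very long.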
Add the marker line in the header (before the first non-comment line):
#!/bin/bash
#SBATCH --job-name=train_x
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err
#SLACKSLURM project=robot exp=run42 mention_on_fail=@here
python train.py
Supported tag behavior:
- key=value pairs are shown in the Slack message context as tags.
- mention_on_fail overrides the config value for this job only.
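Reading the marker line could look roughly like this. It is a sketch, not the parser SlackSlurm ships: `parse_marker` is a hypothetical name, and skipping blank lines in the header and ignoring tokens without `=` are assumptions.

```python
from typing import Dict, Optional


def parse_marker(script_text: str, marker: str = "#SLACKSLURM") -> Optional[Dict[str, str]]:
    """Return the key=value tags from the marker line, or None if the header has none.

    Scanning stops at the first line that is neither blank nor a comment.
    """
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines do not end the header
        if not line.startswith("#"):
            return None  # header is over and no marker was found
        if line.startswith(marker):
            rest = line[len(marker):]
            if rest and not rest[0].isspace():
                continue  # e.g. "#SLACKSLURMX" is a different comment
            tags: Dict[str, str] = {}
            for token in rest.split():
                key, sep, value = token.partition("=")
                if sep and key:
                    tags[key] = value
            return tags
    return None


script = """#!/bin/bash
#SBATCH --job-name=train_x
#SLACKSLURM project=robot exp=run42 mention_on_fail=@here
python train.py
"""
tags = parse_marker(script)
# The per-job value takes precedence over the config default ("@channel" is illustrative).
mention = tags.pop("mention_on_fail", "@channel")
```

After the `pop`, `tags` holds only the display tags and `mention` holds the job-level override.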
For long-running sessions, use tmux or nohup:
tmux new -s slackslurm
slackslurm watch

nohup slackslurm watch > ~/.local/state/slackslurm/watch.log 2>&1 &

To stop the watcher, send SIGINT or kill the process.
State is stored at:
~/.local/state/slackslurm/state.json
If you need to reset notification history, stop the watcher and delete the state file.
- No messages: ensure the sbatch script has the #SLACKSLURM marker and that the watcher runs on a login node with internet access.
- Missing end notifications: confirm sacct works on your cluster.
- Webhook errors: run slackslurm test and check Slack app configuration.
- Log tails missing: ensure StdOut/StdErr paths exist and are readable.
- Treat the webhook URL as a secret and do not commit it to the repo.
- The config file is written with 0600 permissions when possible.
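One common way to write a file with owner-only permissions on a POSIX system is sketched below. The function `write_private_json` is hypothetical and not necessarily how SlackSlurm does it; the silent fallback when chmod fails mirrors the "when possible" wording above.

```python
import json
import os
import stat
import tempfile


def write_private_json(path: str, data: dict) -> None:
    """Write JSON so that only the owner can read or write the file (mode 0600)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
    try:
        # The mode given to os.open only applies to newly created files, so
        # tighten existing ones too; skip filesystems that reject chmod.
        os.chmod(path, 0o600)
    except OSError:
        pass


# Demonstration in a throwaway directory, with the elided URL left as-is.
demo_path = os.path.join(tempfile.mkdtemp(), "config.json")
write_private_json(demo_path, {"webhook_url": "https://hooks.slack.com/services/..."})
mode = stat.S_IMODE(os.stat(demo_path).st_mode)
```

Creating the file with the restrictive mode up front avoids a window in which the webhook URL is readable by other users on a shared login node.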






