-
Notifications
You must be signed in to change notification settings - Fork 141
Reimplementation of Orchestration, Scheduling, and Credential Management in Golang #567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| // Build SSH config | ||
| config := &ssh.ClientConfig{ | ||
| User: sshCreds.Username, | ||
| HostKeyCallback: ssh.InsecureIgnoreHostKey(), // In production, use proper host key verification |
Check failure
Code scanning / CodeQL
Use of insecure HostKeyCallback implementation High
this source
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To address this issue, the code must validate the host key provided by the server against a trusted list. The single best way in this scenario is to switch from ssh.InsecureIgnoreHostKey() to ssh.FixedHostKey(pubKey) where pubKey is the trusted public key of the server. This requires loading the public key from a file (e.g., a user-specific or resource-specific location such as "~/.ssh/known_hosts" or from a configuration variable), parsing it, and setting it in the ssh.ClientConfig. All changes are localized to the block around the SSH client config creation, lines 747–751 and nearby as needed. You need to:
- Load the trusted public key (from a path—this should be obtained from known resource metadata, or you may use a new field in resource/metadata if available),
- Parse it with
ssh.ParsePublicKey, - Replace
ssh.InsecureIgnoreHostKey()withssh.FixedHostKey(pubKey), - Import
"io/ioutil"if not already present.
If there’s no pre-existing way to get the trusted public key path, a fallback could be a configurable constant or extracting a metadata field.
-
Copy modified lines R747-R755 -
Copy modified line R758
| @@ -744,9 +744,18 @@ | ||
| } | ||
|
|
||
| // Build SSH config | ||
| // Load trusted host public key | ||
| pubKeyBytes, err := os.ReadFile(sshCreds.HostPublicKeyPath) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to read host public key: %w", err) | ||
| } | ||
| pubKey, err := ssh.ParsePublicKey(pubKeyBytes) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to parse host public key: %w", err) | ||
| } | ||
| config := &ssh.ClientConfig{ | ||
| User: sshCreds.Username, | ||
| HostKeyCallback: ssh.InsecureIgnoreHostKey(), // In production, use proper host key verification | ||
| HostKeyCallback: ssh.FixedHostKey(pubKey), // Host key is verified against allow-list | ||
| Timeout: 10 * time.Second, | ||
| } | ||
|
|
|
|
||
| // Write script to temporary file | ||
| scriptPath := fmt.Sprintf("/tmp/worker_spawn_%s.sh", req.WorkerID) | ||
| if err := os.WriteFile(scriptPath, []byte(spawnScript), 0755); err != nil { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To fix this problem, we need to ensure that the WorkerID component (and, by extension, those values from which it is constructed: computeResourceID and experimentID) is sanitized/validated before being used as part of a file path. Since the path is supposed to be a filename in /tmp, we should ensure the WorkerID portion contains only safe characters, such as alphanumerics, underscores, and dashes. Any other input should result in an error. This prevents directory traversal, absolute paths, or any injection.
We should add a helper function, for example, isSafeFileName, to check if the Worker ID is safe before using it when constructing the filename. If not, the function should return an error and not proceed.
Where to change:
- In
scheduler/adapters/compute_baremetal.go, inSpawnWorker, before constructingscriptPath, validatereq.WorkerIDwith the helper. - The helper function can be added to this file.
- If invalid, return an error instead of writing/executing the file.
-
Copy modified lines R834-R838 -
Copy modified lines R890-R904
| @@ -831,6 +831,11 @@ | ||
|
|
||
| // SpawnWorker spawns a worker on the bare metal server | ||
| func (b *BareMetalAdapter) SpawnWorker(ctx context.Context, req *ports.SpawnWorkerRequest) (*ports.Worker, error) { | ||
| // Validate WorkerID for safe file name usage | ||
| if !isSafeFileName(req.WorkerID) { | ||
| return nil, fmt.Errorf("invalid worker ID: contains unsafe characters") | ||
| } | ||
|
|
||
| // Create worker record | ||
| worker := &ports.Worker{ | ||
| ID: req.WorkerID, | ||
| @@ -882,6 +887,21 @@ | ||
| return worker, nil | ||
| } | ||
|
|
||
|
|
||
| // isSafeFileName checks if the input string contains only safe characters for file names. | ||
| // Allowable: alphanumeric, underscore, dash. | ||
| func isSafeFileName(name string) bool { | ||
| for _, c := range name { | ||
| if !(c >= 'a' && c <= 'z') && | ||
| !(c >= 'A' && c <= 'Z') && | ||
| !(c >= '0' && c <= '9') && | ||
| c != '_' && c != '-' { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } | ||
|
|
||
| // SubmitJob submits a job to the compute resource | ||
| func (b *BareMetalAdapter) SubmitJob(ctx context.Context, req *ports.SubmitJobRequest) (*ports.Job, error) { | ||
| // Generate a unique job ID |
| } | ||
|
|
||
| // Execute worker spawn script in background | ||
| cmd := exec.CommandContext(ctx, "bash", scriptPath) |
Check failure
Code scanning / CodeQL
Command built from user-controlled sources Critical
user-provided value
| // Execute worker spawn script in background | ||
| cmd := exec.CommandContext(ctx, "bash", scriptPath) | ||
| if err := cmd.Start(); err != nil { | ||
| os.Remove(scriptPath) // Clean up script file |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
The best way to fix this vulnerability is to validate the user input before using it to construct a file path. Since WorkerID is meant to be a single file name component, we need to ensure it does not contain any path separators (/, \) or parent directory references (..). The recommended fix is to add a check right before constructing or using the path, and reject or sanitize the input if it is invalid.
Changes to make:
- In
scheduler/adapters/compute_baremetal.go, inside theSpawnWorkerfunction, add validation right beforescriptPathis constructed and used. - If the input is invalid (contains
/,\, or..), return an error and do not proceed with file operations. - This fix requires importing the
stringspackage if not already imported (it is already imported). - No changes beyond the affected code block are needed.
-
Copy modified lines R863-R866
| @@ -860,6 +860,10 @@ | ||
| } | ||
|
|
||
| // Write script to temporary file | ||
| // Validate WorkerID to prevent path traversal | ||
| if strings.Contains(req.WorkerID, "/") || strings.Contains(req.WorkerID, "\\") || strings.Contains(req.WorkerID, "..") { | ||
| return nil, fmt.Errorf("invalid WorkerID: contains forbidden characters") | ||
| } | ||
| scriptPath := fmt.Sprintf("/tmp/worker_spawn_%s.sh", req.WorkerID) | ||
| if err := os.WriteFile(scriptPath, []byte(spawnScript), 0755); err != nil { | ||
| return nil, fmt.Errorf("failed to write spawn script: %w", err) |
| worker.Status = domain.WorkerStatusIdle | ||
|
|
||
| // Clean up script file | ||
| os.Remove(scriptPath) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
In general, to fix uncontrolled data used in path expression vulnerabilities, path components constructed from user input should be validated or sanitized before use. In this case, the best approach is to sanitize or validate req.WorkerID before using it in the script filename so that it cannot introduce path traversal, path separators, or otherwise produce file names outside of the intended /tmp directory.
Specifically, in scheduler/adapters/compute_baremetal.go, before using req.WorkerID, restrict it to a safe pattern (e.g., only alphanumeric, dash, and underscore), or, if that's too restrictive, at minimum reject WorkerIDs that contain /, \, or ... If invalid, return an error. This can be done at the start of the SpawnWorker method. No external dependencies are needed: standard library functions like strings.Contains and/or regular expressions are sufficient.
Add the validation at the top of the SpawnWorker function or immediately before constructing the script path. If using regex, add "regexp" to imports, but for this case, a basic check is sufficient.
-
Copy modified lines R834-R837
| @@ -831,6 +831,10 @@ | ||
|
|
||
| // SpawnWorker spawns a worker on the bare metal server | ||
| func (b *BareMetalAdapter) SpawnWorker(ctx context.Context, req *ports.SpawnWorkerRequest) (*ports.Worker, error) { | ||
| // Validate WorkerID: must not contain path separators or ".." | ||
| if strings.Contains(req.WorkerID, "/") || strings.Contains(req.WorkerID, "\\") || strings.Contains(req.WorkerID, "..") { | ||
| return nil, fmt.Errorf("invalid WorkerID: must not contain path separators or '..'") | ||
| } | ||
| // Create worker record | ||
| worker := &ports.Worker{ | ||
| ID: req.WorkerID, |
|
|
||
| // Write script to temporary file | ||
| scriptPath := fmt.Sprintf("/tmp/worker_spawn_%s.sh", req.WorkerID) | ||
| if err := os.WriteFile(scriptPath, []byte(spawnScript), 0755); err != nil { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To fix this, we must validate req.WorkerID in the SlurmAdapter's SpawnWorker method before using it to construct a filesystem path. The best practice is to ensure WorkerID is a safe filename or single path component: it should not contain /, \, or .., and should match a safe pattern (e.g., only alphanumerics, underscore, hyphen). If validation fails, return an error and do not proceed with file operations.
Edit only the scheduler/adapters/compute_slurm.go file, in the SpawnWorker() method. Specifically, right before constructing scriptPath, validate req.WorkerID using a regular expression or by checking for forbidden characters.
No new methods required; a local check using regex or strings.Contains is sufficient.
-
Copy modified lines R965-R971
| @@ -962,6 +962,13 @@ | ||
| Metadata: req.Metadata, | ||
| } | ||
|
|
||
| // Validate WorkerID to prevent path traversal or injection | ||
| // Allow only alphanumerics, underscore, and hyphen | ||
| var validWorkerID = regexp.MustCompile(`^[a-zA-Z0-9_\-]+$`) | ||
| if !validWorkerID.MatchString(req.WorkerID) { | ||
| return nil, fmt.Errorf("invalid worker ID: contains forbidden characters") | ||
| } | ||
|
|
||
| // Generate worker spawn script using local implementation | ||
| experiment := &domain.Experiment{ | ||
| ID: req.ExperimentID, |
| // Submit worker spawn job to SLURM | ||
| jobID, err := s.SubmitTask(ctx, scriptPath) | ||
| if err != nil { | ||
| os.Remove(scriptPath) // Clean up script file |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To address the problem, we need to ensure that the WorkerID used in the path is safe and cannot contain user-controlled path traversal or absolute path characters. The best way is to restrict WorkerID to a known-safe pattern (e.g., only alphanumeric, hyphen, and underscore), or at minimum reject any containing /, \, or ... Since the value is constructed from user input and an untrusted source, it's prudent to validate or sanitize it on use before constructing the file path.
Specifically, within scheduler/adapters/compute_slurm.go, in the SpawnWorker method, we should add a validation step before using req.WorkerID for the file name. If WorkerID does not match the safe pattern (e.g. using regexp), we should return a validation error. For defense in depth, it's usually best to validate as close as possible to where the variable is used in the sensitive context.
No new imports are strictly necessary, but since the snippet already imports regexp, we can use a regular expression for the safe pattern check.
-
Copy modified lines R947-R952
| @@ -944,6 +944,12 @@ | ||
|
|
||
| // SpawnWorker spawns a worker on the SLURM cluster | ||
| func (s *SlurmAdapter) SpawnWorker(ctx context.Context, req *ports.SpawnWorkerRequest) (*ports.Worker, error) { | ||
| // Validate WorkerID to prevent path traversal/absolute path abuse | ||
| safeIDPattern := regexp.MustCompile(`^[a-zA-Z0-9_\-]+$`) | ||
| if !safeIDPattern.MatchString(req.WorkerID) { | ||
| return nil, fmt.Errorf("invalid WorkerID: must be alphanumeric, hyphen, or underscore only") | ||
| } | ||
|
|
||
| // Create worker record | ||
| worker := &ports.Worker{ | ||
| ID: req.WorkerID, |
| worker.Status = domain.WorkerStatusIdle | ||
|
|
||
| // Clean up script file | ||
| os.Remove(scriptPath) |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
user-provided value
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
To fix the problem, we should ensure that req.WorkerID, as it is incorporated in the temporary file path, cannot be manipulated to affect directory traversal or file naming in an unsafe way. The best way is to strictly validate req.WorkerID before using it in the filename. Since WorkerID is likely intended to be a single identifier without path components, the safest fix is to check if it contains any "/", "\\", or ".." sequences and reject such values, returning an error if found.
Implement this check near the point of file creation in SpawnWorker of scheduler/adapters/compute_slurm.go, immediately before constructing/writing to scriptPath. If the value is unsafe, return an error promptly. This avoids any risk, keeps the logic simple, and preserves existing behavior for proper, expected IDs.
No new external libraries are needed—this can be done using standard library functions like strings.Contains.
-
Copy modified lines R976-R979
| @@ -973,6 +973,10 @@ | ||
| } | ||
|
|
||
| // Write script to temporary file | ||
| // Validate WorkerID for safe file usage | ||
| if strings.Contains(req.WorkerID, "/") || strings.Contains(req.WorkerID, "\\") || strings.Contains(req.WorkerID, "..") { | ||
| return nil, fmt.Errorf("invalid WorkerID: contains unsafe path components") | ||
| } | ||
| scriptPath := fmt.Sprintf("/tmp/worker_spawn_%s.sh", req.WorkerID) | ||
| if err := os.WriteFile(scriptPath, []byte(spawnScript), 0755); err != nil { | ||
| return nil, fmt.Errorf("failed to write spawn script: %w", err) |
| // Build SSH config | ||
| config := &ssh.ClientConfig{ | ||
| User: username, | ||
| HostKeyCallback: ssh.InsecureIgnoreHostKey(), // In production, use proper host key verification |
Check failure
Code scanning / CodeQL
Use of insecure HostKeyCallback implementation High
this source
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
The best way to fix this problem is to replace the use of ssh.InsecureIgnoreHostKey() with a proper host key verification mechanism. We should obtain the allowed host’s public key—this can be read from a file or from resource metadata (the latter being convenient if the resource definition includes it). For simplicity and safety, we should use ssh.FixedHostKey, which will reject the connection unless the server presents exactly the trusted key. The fix can be implemented in the connect method by adding code to fetch the allowed host public key (e.g., from resource.Metadata["host_public_key"] if available, or alternatively from a file path), parse it, and assign ssh.FixedHostKey(parsedKey) to the HostKeyCallback field.
Necessary imports: The code already has golang.org/x/crypto/ssh imported, so no new import is needed.
Required changes:
- In the
connectmethod ofSFTPAdapter(lines 42–162), specifically:- Insert code to read the allowed public key (from resource metadata).
- Parse the public key.
- Set
HostKeyCallback: ssh.FixedHostKey(pubKey)inssh.ClientConfig.
-
Copy modified lines R97-R106 -
Copy modified line R110
| @@ -94,10 +94,20 @@ | ||
| return fmt.Errorf("username not found in credentials or resource metadata") | ||
| } | ||
|
|
||
| // Obtain the trusted host public key from resource metadata | ||
| hostPublicKeyStr, ok := resourceMetadata["host_public_key"] | ||
| if !ok || hostPublicKeyStr == "" { | ||
| return fmt.Errorf("host public key not found in resource metadata") | ||
| } | ||
| pubKey, err := ssh.ParsePublicKey([]byte(hostPublicKeyStr)) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to parse host public key: %w", err) | ||
| } | ||
|
|
||
| // Build SSH config | ||
| config := &ssh.ClientConfig{ | ||
| User: username, | ||
| HostKeyCallback: ssh.InsecureIgnoreHostKey(), // In production, use proper host key verification | ||
| HostKeyCallback: ssh.FixedHostKey(pubKey), | ||
| Timeout: 10 * time.Second, | ||
| Config: ssh.Config{ | ||
| Ciphers: []string{ |
| tarReader := tar.NewReader(gzReader) | ||
|
|
||
| for { | ||
| header, err := tarReader.Next() |
Check failure
Code scanning / CodeQL
Arbitrary file access during archive extraction ("Zip Slip") High
file system operation
Unsanitized archive entry, which may contain '..', is used in a
file system operation
Unsanitized archive entry, which may contain '..', is used in a
file system operation
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 2 months ago
The best way to fix this issue is to restrict tar archive entry paths so that they cannot escape the specified destination directory and do not contain any path traversal (..), absolute paths, or other dangerous constructs. To do this, we should:
- Use
filepath.Join(destDir, header.Name)to construct the intended path. - Clean the path using
filepath.Clean. - Convert both the target path and the destination directory to their absolute forms with
filepath.Abs. - Check that the target path is a strict subpath of the destination directory: specifically, that the absolute target path starts with the absolute destination directory path, followed by a path separator—this ensures containment.
- Skip extraction for entries which fail these checks, optionally logging a warning.
These changes should be implemented inside the extractTarGz function (scheduler/cmd/cli/data.go, lines 644–694). No new packages are needed because filepath and strings are already imported.
-
Copy modified lines R662-R678 -
Copy modified line R682 -
Copy modified line R689 -
Copy modified line R694
| @@ -659,24 +659,39 @@ | ||
| return err | ||
| } | ||
|
|
||
| // Create full path | ||
| targetPath := filepath.Join(destDir, header.Name) | ||
| // Sanitize archive entry path to prevent Zip Slip/directory traversal | ||
| cleanedName := filepath.Clean(header.Name) | ||
| targetPath := filepath.Join(destDir, cleanedName) | ||
| absDestDir, err := filepath.Abs(destDir) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| absTargetPath, err := filepath.Abs(targetPath) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| // Ensure that absTargetPath is within absDestDir | ||
| if !strings.HasPrefix(absTargetPath, absDestDir+string(os.PathSeparator)) && absTargetPath != absDestDir { | ||
| // Optionally log or print warning, skip extraction of this entry | ||
| fmt.Printf("Skipping suspicious entry: %q\n", header.Name) | ||
| continue | ||
| } | ||
|
|
||
| // Create directory if needed | ||
| if header.Typeflag == tar.TypeDir { | ||
| if err := os.MkdirAll(targetPath, 0755); err != nil { | ||
| if err := os.MkdirAll(absTargetPath, 0755); err != nil { | ||
| return err | ||
| } | ||
| continue | ||
| } | ||
|
|
||
| // Create parent directories | ||
| if err := os.MkdirAll(filepath.Dir(targetPath), 0755); err != nil { | ||
| if err := os.MkdirAll(filepath.Dir(absTargetPath), 0755); err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // Create file | ||
| file, err := os.Create(targetPath) | ||
| file, err := os.Create(absTargetPath) | ||
| if err != nil { | ||
| return err | ||
| } |
🎯 Conceptual Model
This PR is more of a RFC to think through what the fundamental problem that airavata should tackle. This POC is built around 6 core domain interfaces that represent the fundamental operations of distributed task execution. The POC uses a gRPC-based worker system for task execution:
🔐 Credential Management Architecture
The Airavata Scheduler implements a three-layer credential architecture that separates authorization logic from storage for maximum security and scalability:
Key Benefits