Feature/k8_v2 #754
alexcos20 left a comment:
AI automated code review (Gemini 3).
Overall risk: high
Summary:
This pull request introduces a significant refactoring and expansion of the Compute-to-Data (C2D) functionality. It modularizes C2D type definitions, integrates Docker and Kubernetes as new compute backend engines, and adds support for "free compute" jobs. A new SQLite database is implemented for persistent C2D job state, alongside new API endpoints for managing compute jobs (including streaming logs) and cron jobs for cleaning up expired data. This is a major architectural change that enhances the C2D capabilities of the Ocean Node.
Comments:
• [INFO][other] The change from url to fileObject for ComputeAsset and ComputeAlgorithm is a good step towards a more generic data source abstraction. It allows different storage types to be handled consistently.
• [WARNING][bug] The jobId generated here uses generateUniqueID() which returns a UUID. However, the getStatus handler in src/components/core/compute/getStatus.ts expects the jobId to be in the format hash-jobId to identify the correct engine. The current implementation exposes the bare UUID as cjob.jobId, which could lead to ambiguity if multiple Docker engines were configured or if the job ID isn't globally unique across all engines without the hash prefix. Consider prepending the clusterHash to the jobId before returning it to the user, similar to the commented-out line cjob.jobId = this.getC2DConfig().hash + '-' + cjob.jobId.
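A minimal sketch of the suggested fix, assuming the `clusterHash` has no `-` in it and that `getStatus` splits on the first separator (the helper names here are hypothetical, not from the PR):

```typescript
import { randomUUID } from 'node:crypto'

// Prefix the engine's cluster hash so getStatus can route a jobId back to
// the engine that created it, per the `hash-jobId` convention.
function makeJobId(clusterHash: string): string {
  return `${clusterHash}-${randomUUID()}`
}

// Recover the engine hash and the bare UUID. Splitting on the FIRST dash
// assumes clusterHash itself contains no dashes (UUIDs do contain them).
function splitJobId(jobId: string): { hash: string; id: string } {
  const sep = jobId.indexOf('-')
  return { hash: jobId.slice(0, sep), id: jobId.slice(sep + 1) }
}
```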
• [WARNING][performance] Multiple uses of synchronous I/O operations like statSync (line 90) and writeFileSync (line 273), and tar.create({ sync: true }) (line 296) can block the Node.js event loop. This can severely impact performance and responsiveness, especially when dealing with large files or a high volume of concurrent compute jobs. These should be replaced with asynchronous fs/promises methods or stream-based operations where appropriate.
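For illustration, the `fs/promises` equivalents of the flagged calls (paths here are placeholders, not the ones in the PR):

```typescript
import { stat, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Async replacements for statSync/writeFileSync: both yield to the event
// loop and run the I/O on libuv's thread pool instead of blocking.
async function writeJobFile(name: string, payload: string): Promise<number> {
  const file = join(tmpdir(), name)
  await writeFile(file, payload)
  const info = await stat(file)
  return info.size
}

// Similarly, node-tar's tar.create() without { sync: true } returns a
// promise when the `file` option is set, so it can simply be awaited.
```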
• [WARNING][bug] The stopComputeJob method for the Docker engine is currently a placeholder (return null). This method is essential for users to explicitly terminate their compute jobs. It needs to be fully implemented to ensure proper resource management and job lifecycle control.
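A hedged sketch of what the implementation could look like. `ContainerLike` mirrors only the subset of dockerode's `Container` API used here (`stop` and `remove` both exist on dockerode containers); the real engine would obtain the container via `docker.getContainer(id)`:

```typescript
interface ContainerLike {
  stop(): Promise<void>
  remove(): Promise<void>
}

// Stop the job's container, then remove it so its resources are freed.
async function stopComputeJob(container: ContainerLike): Promise<boolean> {
  try {
    await container.stop()   // SIGTERM, then SIGKILL after Docker's timeout
    await container.remove() // release the container's filesystem/resources
    return true
  } catch (e) {
    // the container may already be stopped or removed; the real
    // implementation should distinguish that from genuine failures
    return false
  }
}
```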
• [INFO][other] The entrypoint '/data/transformation/algorithm' is hardcoded. While this might be a common case, consider if this should be more flexible, perhaps derived from the algorithm's metadata or allowing for different input/output paths/conventions, especially for complex algorithms.
• [INFO][other] The maxJobDuration for free compute is set to 30 seconds. While this might be intentional for a free tier, it's a very short duration for many real-world computations. Ensure this is clearly communicated to users and evaluate if a slightly longer duration might be more practical for basic tests.
• [ERROR][security] CRITICAL SECURITY VULNERABILITY: Hardcoded Kubernetes certData and keyData on lines 44-46 (and potentially related configuration in src/utils/config.ts) expose sensitive credentials directly in the codebase. This grants unauthorized access to the Kubernetes cluster and is an extremely severe security risk. These credentials must be removed from the code and loaded securely at runtime, for example, from environment variables, Kubernetes secrets, or a dedicated secrets management service. This must be addressed immediately before merging this PR.
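A minimal sketch of loading the credentials from the environment instead. The variable names `K8_CERT_DATA` / `K8_KEY_DATA` are hypothetical, not variables the PR defines:

```typescript
// Read the Kubernetes client credentials at runtime rather than from the
// codebase. Fail fast if they are missing, so the engine never starts
// half-configured.
function loadK8Credentials(): { certData: string; keyData: string } {
  const certData = process.env.K8_CERT_DATA
  const keyData = process.env.K8_KEY_DATA
  if (!certData || !keyData) {
    throw new Error('K8_CERT_DATA and K8_KEY_DATA must be set')
  }
  return { certData, keyData }
}
```

In a Kubernetes deployment these values would typically be mounted from a `Secret` rather than set as literal environment variables.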
• [ERROR][bug] The Kubernetes compute engine is currently incomplete. getComputeEnvironments returns an empty array, and getComputeJobResult (line 115) and cleanupExpiredStorage (line 130) explicitly throw "Not implemented" errors. Additionally, createpvc (line 144) calls createNamespace instead of the appropriate createNamespacedPersistentVolumeClaim, and the YAML template uses placeholder values that are not dynamically replaced. The engine, as implemented, is not functional for actual compute operations and should be marked as experimental or withheld until fully developed and tested.
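A sketch of the PVC fix: call `createNamespacedPersistentVolumeClaim` rather than `createNamespace`, and build the manifest dynamically instead of using placeholder YAML. `CoreV1ApiLike` mirrors the relevant method of `@kubernetes/client-node`'s `CoreV1Api`; the manifest fields are illustrative:

```typescript
interface CoreV1ApiLike {
  createNamespacedPersistentVolumeClaim(
    namespace: string,
    body: object
  ): Promise<object>
}

// Create a per-job PVC in the given namespace with the requested size.
async function createPvc(
  api: CoreV1ApiLike,
  namespace: string,
  jobId: string,
  sizeGb: number
): Promise<object> {
  const body = {
    metadata: { name: `c2d-${jobId}` },
    spec: {
      accessModes: ['ReadWriteOnce'],
      resources: { requests: { storage: `${sizeGb}Gi` } }
    }
  }
  return api.createNamespacedPersistentVolumeClaim(namespace, body)
}
```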
• [ERROR][bug] The condition computeEnvironment.storageExpiry > Date.now() / 1000 to identify expired jobs appears to be inverted. To find jobs that are past their expiry date, the condition should be job.expireTimestamp < Date.now() / 1000. The storageExpiry field on computeEnvironment likely defines the retention policy for the environment, not the individual job's expiry, which is stored in job.expireTimestamp. This logic error will prevent expired jobs from being cleaned up correctly.
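The corrected predicate, as a minimal sketch (field name per the review text; a job is expired when its own timestamp, in seconds, is in the past):

```typescript
interface JobRow {
  expireTimestamp: number // unix time in seconds
}

// A job is expired once its expireTimestamp has passed; the environment's
// storageExpiry retention setting plays no role in this per-job check.
function isExpired(job: JobRow, nowMs: number = Date.now()): boolean {
  return job.expireTimestamp < nowMs / 1000
}
```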
• [WARNING][bug] The getFinishedJobs SQL query dateFinished IS NOT NULL OR results IS NOT NULL might be too broad. If a job has results but dateFinished is null, it could be picked up prematurely. It might be safer to rely solely on dateFinished IS NOT NULL or introduce a specific status for 'finished' states. Additionally, the SQL query itself does not filter by environment, but the subsequent filter call (line 273) does. This means the initial query fetches all finished jobs and filters in memory, which is inefficient. The SQL query should include WHERE (dateFinished IS NOT NULL OR results IS NOT NULL) AND environment = ? to filter at the database level.
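A sketch of pushing the environment filter into the SQL itself. Table and column names follow the review text and may differ from the actual schema; the `?` placeholder keeps the environment value parameterized:

```typescript
// Filter finished jobs for one environment at the database level, instead
// of fetching all finished jobs and filtering in memory.
function finishedJobsQuery(): string {
  return (
    'SELECT * FROM jobs ' +
    'WHERE (dateFinished IS NOT NULL OR results IS NOT NULL) ' +
    'AND environment = ?'
  )
}
// executed with the sqlite driver as, e.g.:
//   db.all(finishedJobsQuery(), [environment], callback)
```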
• [INFO][documentation] New environment variables like DOCKER_COMPUTE_ENVIRONMENTS and K8_CLUSTERS have been introduced. Ensure that the documentation for these, especially regarding the structure of their JSON values and how to securely configure Kubernetes credentials, is comprehensive and clear. The current env.md only adds CRON_DELETE_DB_LOGS and CRON_CLEANUP_C2D_STORAGE.
Initial code for k8; still heavy WIP.