Feature/k8_v2 #754
alexcos20 left a comment:
AI automated code review (Gemini 3).
Overall risk: high
Summary:
This pull request introduces a significant refactoring and expansion of the Compute-to-Data (C2D) functionality. It modularizes C2D type definitions, integrates Docker and Kubernetes as new compute backend engines, and adds support for "free compute" jobs. A new SQLite database is implemented for persistent C2D job state, alongside new API endpoints for managing compute jobs (including streaming logs) and cron jobs for cleaning up expired data. This is a major architectural change that enhances the C2D capabilities of the Ocean Node.
Comments:
• [INFO][other] The change from url to fileObject for ComputeAsset and ComputeAlgorithm is a good step towards a more generic data source abstraction. It allows different storage types to be handled consistently.
• [WARNING][bug] The jobId generated here uses generateUniqueID() which returns a UUID. However, the getStatus handler in src/components/core/compute/getStatus.ts expects the jobId to be in the format hash-jobId to identify the correct engine. The current implementation exposes the bare UUID as cjob.jobId, which could lead to ambiguity if multiple Docker engines were configured or if the job ID isn't globally unique across all engines without the hash prefix. Consider prepending the clusterHash to the jobId before returning it to the user, similar to the commented-out line cjob.jobId = this.getC2DConfig().hash + '-' + cjob.jobId.
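A minimal sketch of the suggested fix, assuming the `clusterHash` has no `-` in it and that `getStatus` splits on the first separator (the helper names here are hypothetical, not from the PR):

```typescript
import { randomUUID } from 'node:crypto'

// Prefix the engine's cluster hash so getStatus can route a jobId back to
// the engine that created it, per the `hash-jobId` convention.
function makeJobId(clusterHash: string): string {
  return `${clusterHash}-${randomUUID()}`
}

// Recover the engine hash and the bare UUID. Splitting on the FIRST dash
// assumes clusterHash itself contains no dashes (UUIDs do contain them).
function splitJobId(jobId: string): { hash: string; id: string } {
  const sep = jobId.indexOf('-')
  return { hash: jobId.slice(0, sep), id: jobId.slice(sep + 1) }
}
```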
• [WARNING][performance] Multiple uses of synchronous I/O operations like statSync (line 90) and writeFileSync (line 273), and tar.create({ sync: true }) (line 296) can block the Node.js event loop. This can severely impact performance and responsiveness, especially when dealing with large files or a high volume of concurrent compute jobs. These should be replaced with asynchronous fs/promises methods or stream-based operations where appropriate.
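For illustration, the `fs/promises` equivalents of the flagged calls (paths here are placeholders, not the ones in the PR):

```typescript
import { stat, writeFile } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Async replacements for statSync/writeFileSync: both yield to the event
// loop and run the I/O on libuv's thread pool instead of blocking.
async function writeJobFile(name: string, payload: string): Promise<number> {
  const file = join(tmpdir(), name)
  await writeFile(file, payload)
  const info = await stat(file)
  return info.size
}

// Similarly, node-tar's tar.create() without { sync: true } returns a
// promise when the `file` option is set, so it can simply be awaited.
```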
• [WARNING][bug] The stopComputeJob method for the Docker engine is currently a placeholder (return null). This method is essential for users to explicitly terminate their compute jobs. It needs to be fully implemented to ensure proper resource management and job lifecycle control.
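A hedged sketch of what the implementation could look like. `ContainerLike` mirrors only the subset of dockerode's `Container` API used here (`stop` and `remove` both exist on dockerode containers); the real engine would obtain the container via `docker.getContainer(id)`:

```typescript
interface ContainerLike {
  stop(): Promise<void>
  remove(): Promise<void>
}

// Stop the job's container, then remove it so its resources are freed.
async function stopComputeJob(container: ContainerLike): Promise<boolean> {
  try {
    await container.stop()   // SIGTERM, then SIGKILL after Docker's timeout
    await container.remove() // release the container's filesystem/resources
    return true
  } catch (e) {
    // the container may already be stopped or removed; the real
    // implementation should distinguish that from genuine failures
    return false
  }
}
```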
• [INFO][other] The entrypoint '/data/transformation/algorithm' is hardcoded. While this might be a common case, consider if this should be more flexible, perhaps derived from the algorithm's metadata or allowing for different input/output paths/conventions, especially for complex algorithms.
• [INFO][other] The maxJobDuration for free compute is set to 30 seconds. While this might be intentional for a free tier, it's a very short duration for many real-world computations. Ensure this is clearly communicated to users and evaluate if a slightly longer duration might be more practical for basic tests.
• [ERROR][security] CRITICAL SECURITY VULNERABILITY: Hardcoded Kubernetes certData and keyData on lines 44-46 (and potentially related configuration in src/utils/config.ts) expose sensitive credentials directly in the codebase. This grants unauthorized access to the Kubernetes cluster and is an extremely severe security risk. These credentials must be removed from the code and loaded securely at runtime, for example, from environment variables, Kubernetes secrets, or a dedicated secrets management service. This must be addressed immediately before merging this PR.
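A minimal sketch of loading the credentials from the environment instead. The variable names `K8_CERT_DATA` / `K8_KEY_DATA` are hypothetical, not variables the PR defines:

```typescript
// Read the Kubernetes client credentials at runtime rather than from the
// codebase. Fail fast if they are missing, so the engine never starts
// half-configured.
function loadK8Credentials(): { certData: string; keyData: string } {
  const certData = process.env.K8_CERT_DATA
  const keyData = process.env.K8_KEY_DATA
  if (!certData || !keyData) {
    throw new Error('K8_CERT_DATA and K8_KEY_DATA must be set')
  }
  return { certData, keyData }
}
```

In a Kubernetes deployment these values would typically be mounted from a `Secret` rather than set as literal environment variables.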
• [ERROR][bug] The Kubernetes compute engine is currently incomplete. getComputeEnvironments returns an empty array, and getComputeJobResult (line 115) and cleanupExpiredStorage (line 130) explicitly throw "Not implemented" errors. Additionally, createpvc (line 144) calls createNamespace instead of the appropriate createNamespacedPersistentVolumeClaim, and the YAML template uses placeholder values that are not dynamically replaced. The engine, as implemented, is not functional for actual compute operations and should be marked as experimental or withheld until fully developed and tested.
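A sketch of the PVC fix: call `createNamespacedPersistentVolumeClaim` rather than `createNamespace`, and build the manifest dynamically instead of using placeholder YAML. `CoreV1ApiLike` mirrors the relevant method of `@kubernetes/client-node`'s `CoreV1Api`; the manifest fields are illustrative:

```typescript
interface CoreV1ApiLike {
  createNamespacedPersistentVolumeClaim(
    namespace: string,
    body: object
  ): Promise<object>
}

// Create a per-job PVC in the given namespace with the requested size.
async function createPvc(
  api: CoreV1ApiLike,
  namespace: string,
  jobId: string,
  sizeGb: number
): Promise<object> {
  const body = {
    metadata: { name: `c2d-${jobId}` },
    spec: {
      accessModes: ['ReadWriteOnce'],
      resources: { requests: { storage: `${sizeGb}Gi` } }
    }
  }
  return api.createNamespacedPersistentVolumeClaim(namespace, body)
}
```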
• [ERROR][bug] The condition computeEnvironment.storageExpiry > Date.now() / 1000 to identify expired jobs appears to be inverted. To find jobs that are past their expiry date, the condition should be job.expireTimestamp < Date.now() / 1000. The storageExpiry field on computeEnvironment likely defines the retention policy for the environment, not the individual job's expiry, which is stored in job.expireTimestamp. This logic error will prevent expired jobs from being cleaned up correctly.
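The corrected predicate, as a minimal sketch (field name per the review text; a job is expired when its own timestamp, in seconds, is in the past):

```typescript
interface JobRow {
  expireTimestamp: number // unix time in seconds
}

// A job is expired once its expireTimestamp has passed; the environment's
// storageExpiry retention setting plays no role in this per-job check.
function isExpired(job: JobRow, nowMs: number = Date.now()): boolean {
  return job.expireTimestamp < nowMs / 1000
}
```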
• [WARNING][bug] The getFinishedJobs SQL query dateFinished IS NOT NULL OR results IS NOT NULL might be too broad. If a job has results but dateFinished is null, it could be picked up prematurely. It might be safer to rely solely on dateFinished IS NOT NULL or introduce a specific status for 'finished' states. Additionally, the SQL query itself does not filter by environment, but the subsequent filter call (line 273) does. This means the initial query fetches all finished jobs and filters in memory, which is inefficient. The SQL query should include WHERE (dateFinished IS NOT NULL OR results IS NOT NULL) AND environment = ? to filter at the database level.
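A sketch of pushing the environment filter into the SQL itself. Table and column names follow the review text and may differ from the actual schema; the `?` placeholder keeps the environment value parameterized:

```typescript
// Filter finished jobs for one environment at the database level, instead
// of fetching all finished jobs and filtering in memory.
function finishedJobsQuery(): string {
  return (
    'SELECT * FROM jobs ' +
    'WHERE (dateFinished IS NOT NULL OR results IS NOT NULL) ' +
    'AND environment = ?'
  )
}
// executed with the sqlite driver as, e.g.:
//   db.all(finishedJobsQuery(), [environment], callback)
```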
• [INFO][documentation] New environment variables like DOCKER_COMPUTE_ENVIRONMENTS and K8_CLUSTERS have been introduced. Ensure that the documentation for these, especially regarding the structure of their JSON values and how to securely configure Kubernetes credentials, is comprehensive and clear. The current env.md only adds CRON_DELETE_DB_LOGS and CRON_CLEANUP_C2D_STORAGE.
Initial code for k8; still heavy WIP.