From c591bb209e6f478807e365b3321048c323cc2cd9 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Tue, 17 Mar 2026 12:47:48 +0100 Subject: [PATCH 01/10] Initial Design document --- .../deadline-cloud-integration/.config.kiro | 1 + .../deadline-cloud-integration/design.md | 949 ++++++++++++++++++ 2 files changed, 950 insertions(+) create mode 100644 .kiro/specs/deadline-cloud-integration/.config.kiro create mode 100644 .kiro/specs/deadline-cloud-integration/design.md diff --git a/.kiro/specs/deadline-cloud-integration/.config.kiro b/.kiro/specs/deadline-cloud-integration/.config.kiro new file mode 100644 index 0000000..52fb3fa --- /dev/null +++ b/.kiro/specs/deadline-cloud-integration/.config.kiro @@ -0,0 +1 @@ +{"specId": "4b150a90-c8d3-4c94-9eba-9d1e668c9123", "workflowType": "design-first", "specType": "feature"} diff --git a/.kiro/specs/deadline-cloud-integration/design.md b/.kiro/specs/deadline-cloud-integration/design.md new file mode 100644 index 0000000..16b07b4 --- /dev/null +++ b/.kiro/specs/deadline-cloud-integration/design.md @@ -0,0 +1,949 @@ +# Design Document: AYON Deadline Cloud Integration + +## Overview + +This design describes the integration between AYON (a VFX/animation pipeline management tool) and AWS Deadline Cloud for render farm submission. The integration hooks AYON into the Deadline Cloud Submitter tool rather than creating job bundles directly. AYON handles pipeline concerns (validation, settings pre-population, post-render publishing) while delegating actual job submission to the native Deadline Cloud Submitter. + +The addon is structured as a standard AYON server addon with a client-side component. The server side manages settings and configuration (farm profiles, DCC-specific defaults, submission parameters). The client side runs inside DCC applications (Maya, Houdini, etc.) 
and integrates with the AYON Publisher workflow — collecting render data, running validations, pre-populating the Deadline Cloud Submitter, and handling post-render publishing (version registration, transcoding, validation). + +The key architectural principle is **separation of responsibilities**: AYON owns the pipeline (what to render, how to validate, what to do after render), while Deadline Cloud owns the execution (scheduling, resource management, job lifecycle). The integration point is the Deadline Cloud Submitter's hook/callback system, where AYON injects its pipeline logic at well-defined stages. + +## MVP Scope + +### In Scope (MVP) + +- **Studio-configurable validations**: Pre-submission validation plugins that run before any resource-intensive operations. Includes both built-in technical validations (renderable camera exists, valid frame range) and studio-defined custom validations (required AOVs, render settings checks). +- **Publishing and processing results**: Version registration in AYON, transcoding, reviewable creation, burnins, file movement/renaming via path templates. +- **Basic job dependencies**: Render → Post-render (publish) dependency chain. + +### Deferred (Post-MVP) + +- **Advanced project tracking**: Higher-level job dependencies beyond render→publish, priority management across assets, planning integration, task status updates. +- **Asset-level dependencies**: Complex dependency graphs like "create render archives → render images → publish". +- **Cross-job orchestration**: Managing priorities and dependencies across multiple submissions. 
+ +## Architecture + +```mermaid +graph TD + subgraph "DCC Application (Maya/Houdini/etc.)" + A[AYON Publisher UI] --> B[Collect Plugins] + B --> C[Validate Plugins] + C --> D[Deadline Cloud Submitter Bridge] + end + + subgraph "AYON Server" + E[Deadline Cloud Addon - Server] --> F[Settings / Farm Profiles] + F --> G[DCC-Specific Defaults] + end + + subgraph "Storage Layer" + S1[Job Attachments - S3 Bucket] + S2[Shared Storage - Storage Profiles] + end + + subgraph "AWS Deadline Cloud" + D --> H[Deadline Cloud Submitter Tool] + H -->|OJD Job Bundle| I[Deadline Cloud API] + I --> J[Render Jobs] + J --> K[Post-Render Job] + I -.->|CreateMonitor| MON[Job Monitor API] + end + + subgraph "Post-Render Pipeline" + K --> L[AYON Post-Render Script] + L --> M[Output Validation] + M --> N[Transcoding / Burnins] + N --> O[Version Registration in AYON] + end + + A -.->|fetch settings| E + D -.->|pre-populate options| H + J -->|read inputs| S1 + J -->|read inputs| S2 + J -->|write outputs| S1 + J -->|write outputs| S2 + L -->|access outputs| S1 + L -->|access outputs| S2 + L -.->|register versions| E +``` + +## Sequence Diagrams + +### Main Submission Flow + +```mermaid +sequenceDiagram + participant Artist + participant Publisher as AYON Publisher + participant Collector as Collect Plugins + participant Validator as Validate Plugins + participant Bridge as Submitter Bridge + participant Settings as AYON Server Settings + participant Submitter as DC Submitter Tool + participant DC as Deadline Cloud API + + Artist->>Publisher: Open Publisher, select render instances + Publisher->>Settings: Fetch farm profiles & DCC defaults + Settings-->>Publisher: DeadlineCloudSettings + Publisher->>Collector: Run collection plugins + Collector-->>Publisher: RenderInstance[] (layers, AOVs, cameras) + Publisher->>Validator: Run validation plugins + Validator-->>Publisher: Validation results (pass/fail) + + alt Validation Failed + Publisher-->>Artist: Show validation errors + else Validation 
Passed + Publisher->>Bridge: Submit via Deadline Cloud + Bridge->>Bridge: Map AYON instances to Submitter params + Bridge->>Submitter: Pre-populate settings & invoke submission + Submitter->>DC: CreateJob (render job) + DC-->>Submitter: job_id + Submitter->>DC: CreateJob (post-render job, depends on render) + DC-->>Submitter: post_job_id + Submitter-->>Bridge: Submission result (job IDs) + Bridge-->>Publisher: Submission complete + Publisher-->>Artist: Show success with job IDs + end +``` + +### Post-Render Publishing Flow + +```mermaid +sequenceDiagram + participant DC as Deadline Cloud + participant Script as Post-Render Script + participant AYON as AYON Server API + + DC->>Script: Trigger post-render job (render complete) + Script->>Script: Discover rendered output files + Script->>Script: Validate outputs (frame completeness, file integrity) + + alt Validation Failed + Script->>DC: Report failure + else Validation Passed + Script->>Script: Move/rename files via path templates + Script->>Script: Run transcoding (if configured) + Script->>Script: Apply burnins (if configured) + Script->>AYON: Register version (files, metadata) + AYON-->>Script: Version registered + Script->>DC: Report success + end +``` + +## Storage and Data Transfer + +AWS Deadline Cloud provides two options for managing input and output data: + +### Option 1: Job Attachments + +Deadline Cloud transfers data to and from Cloud Workers using S3 buckets: +- **Input sync**: Scene files and assets are uploaded to S3 and synced to workers when the job starts +- **Output sync**: Rendered outputs are synced back to the workstation when the job finishes +- **Linux VFS mount**: On Linux workers, job attachments can be mounted as a virtual filesystem for standard file access +- **Output retrieval**: The Deadline CLI provides commands to download job outputs, which can be run manually or as a scheduled CRON job + +### Option 2: Shared Storage (Storage Profiles) + +Uses storage profiles to remap paths 
between different filesystems and platforms: +- **Path remapping**: Automatically translates paths between Windows, Linux, and macOS workers +- **Existing files**: Files already on shared storage are not re-uploaded +- **Local files**: Files not on shared storage are uploaded to the job attachments S3 bucket +- **Cross-platform support**: Enables mixed-platform render farms with consistent path resolution + +### Post-Render Script Access + +The post-render script accesses rendered outputs via: +1. **Job attachments**: Download outputs using Deadline CLI before processing +2. **Shared storage**: Direct filesystem access using remapped paths from storage profiles +3. **Hybrid**: Combination based on storage configuration + +### Storage Configuration + +The `StorageConfig` in farm profiles determines which storage method is used and how paths are resolved. See the `StorageConfig` data model below. + +## Components and Interfaces + +### Component 1: Server Settings (`server/settings.py`) + +**Purpose**: Define all configurable settings for the addon — farm profiles, DCC defaults, submission parameters, and post-render options. Served to client via AYON server API. + +```python +class FarmProfile(BaseSettingsModel): + """A named farm configuration profile.""" + name: str + farm_id: str + queue_id: str + storage_profile_id: str | None = None + storage_config: StorageConfig = StorageConfig(storage_mode="job_attachments") + priority: int = 50 + max_retries: int = 3 + +class DCCSubmissionDefaults(BaseSettingsModel): + """Per-DCC default submission parameters.""" + dcc_name: str # "maya", "houdini", etc. 
+ job_template: str | None = None + parameter_overrides: dict[str, Any] = {} + +class PostRenderSettings(BaseSettingsModel): + """Configuration for post-render processing.""" + enable_transcoding: bool = False + transcode_profiles: list[TranscodeProfile] = [] + burnin_config: BurninConfig = BurninConfig() + validate_frame_completeness: bool = True + validate_file_integrity: bool = True + +class DeadlineCloudSettings(BaseSettingsModel): + """Root settings model for the addon.""" + farm_profiles: list[FarmProfile] = [] + default_profile: str = "" + dcc_defaults: list[DCCSubmissionDefaults] = [] + post_render: PostRenderSettings = PostRenderSettings() + custom_validations: list[CustomValidation] = [] + auto_detect_credentials: bool = True + +class CustomValidation(BaseSettingsModel): + """Studio-configurable validation rule.""" + name: str + enabled: bool = True + dcc_scope: list[str] = [] # Empty = all DCCs, or ["maya", "houdini"] + validation_type: str # "required_aovs", "render_settings", "custom_script" + parameters: dict[str, Any] = {} + error_message: str = "" +``` + +**Responsibilities**: +- Store farm connection details (farm ID, queue ID, storage profiles) +- Define per-DCC submission defaults and job template overrides +- Configure post-render pipeline behavior (transcoding, validation) +- Provide sensible defaults for all settings + +### Component 2: Render Instance Collector (`client/plugins/collect_render.py`) + +**Purpose**: Collect render-related data from the DCC scene — render layers, AOVs, cameras, frame ranges — and package them as AYON instances for the publish pipeline. 
+ +```python +class CollectedRenderInstance: + """Data collected from a DCC scene for a single render unit.""" + instance_name: str + render_layer: str + aovs: list[str] + cameras: list[str] + frame_range: tuple[int, int] + frame_step: int + scene_file: str + output_dir: str + expected_files: list[str] + dcc_specific_data: dict[str, Any] +``` + +**Responsibilities**: +- Query DCC scene for renderable items (layers, ROPs, write nodes) +- Resolve output paths using AYON anatomy templates +- Calculate expected output file lists for post-render validation +- Package DCC-specific data needed by the submitter + +### Component 3: Submitter Bridge (`client/plugins/submit_to_deadline_cloud.py`) + +**Purpose**: Bridge between AYON's publish pipeline and the Deadline Cloud Submitter tool. Maps AYON render instances to Submitter parameters, pre-populates settings, and invokes submission. The Submitter collects scene data, presents them in its UI, and creates an Open Job Description (OJD) job bundle — a grouped OJD template with asset references, parameter values, and additional files needed by the job. The job bundle is then submitted via the Deadline Cloud Python API. + +```python +class SubmitterBridge: + """Bridges AYON publish data to Deadline Cloud Submitter.""" + + def map_instance_to_params( + self, + instance: CollectedRenderInstance, + settings: DeadlineCloudSettings, + ) -> SubmissionParams: ... + + def pre_populate_submitter( + self, + submitter: Any, # DC Submitter tool handle + params: SubmissionParams, + ) -> None: ... + + def submit( + self, + instances: list[CollectedRenderInstance], + settings: DeadlineCloudSettings, + ) -> SubmissionResult: ... 
+``` + +**Responsibilities**: +- Translate AYON render instances into Deadline Cloud Submitter parameters +- Pre-populate the Submitter with AYON settings before submission +- Attach post-render job configuration to the submission +- Return job IDs and status back to the AYON publish pipeline +- Support job progress monitoring via the Deadline Cloud API (`CreateMonitor`) + +### Component 4: Validation Plugins (`client/plugins/validate_*.py`) + +**Purpose**: Run AYON-specific validations on collected render data before submission. These run within the AYON Publisher pipeline, before the Submitter is invoked. Validations are critical for farm rendering (especially cloud) to prevent wasting resources on jobs that would fail. + +**Validation Categories**: +- **Technical validations** (built-in): Renderable camera exists, valid output paths, scene file integrity +- **Project-context validations** (built-in): Frame range matches AYON context, correct folder/task assignment +- **Studio-configurable validations** (custom): Required render elements/AOVs present, specific render settings enforced, naming conventions, resolution checks — defined per-studio via settings + +```python +class ValidateFrameRange: + """Ensure frame range is valid and matches AYON context.""" + def process(self, instance: CollectedRenderInstance) -> None: ... + +class ValidateOutputPaths: + """Ensure output paths are resolvable and writable.""" + def process(self, instance: CollectedRenderInstance) -> None: ... + +class ValidateSceneIntegrity: + """DCC-specific scene checks before farm submission.""" + def process(self, instance: CollectedRenderInstance) -> None: ... + +class ValidateRenderElements: + """Studio-configurable: Check required AOVs/render elements are present.""" + def process(self, instance: CollectedRenderInstance) -> None: ... 
+ +class ValidateRenderSettings: + """Studio-configurable: Enforce specific render settings (resolution, sampling, etc.).""" + def process(self, instance: CollectedRenderInstance) -> None: ... +``` + +**Responsibilities**: +- Validate frame ranges match AYON asset context +- Verify output paths are resolvable via anatomy templates +- Run DCC-specific scene integrity checks +- Execute studio-defined custom validations from settings +- Block submission if any validation fails (with actionable error messages) + +### Component 5: Post-Render Script (`client/scripts/post_render.py`) + +**Purpose**: Runs as a Deadline Cloud job after rendering completes. Handles output validation, transcoding, burnin application, and version registration in AYON. + +**Publishing Process**: +Publishing is the process where a version is registered in AYON. Beyond registration, various operations run during publishing: +- **File movement/renaming**: Locally produced data is moved and renamed to final destinations controlled by AYON path templates (anatomy) +- **Remote data integration**: Data produced outside (e.g., cloud renders) can be directly integrated to final destinations from their online locations, or first downloaded then processed +- **Transcoding**: Convert formats (e.g., EXR → JPEG for review) +- **Burnin application**: Add frame numbers, shot info, and other metadata overlays to review media +- **Additional validations**: Post-render checks on output quality and completeness + +```python +class PostRenderProcessor: + """Handles post-render pipeline on the farm.""" + + def discover_outputs( + self, expected_files: list[str] + ) -> list[str]: ... + + def validate_outputs( + self, discovered: list[str], expected: list[str] + ) -> ValidationResult: ... + + def transcode( + self, files: list[str], profile: TranscodeProfile + ) -> list[str]: ... + + def apply_burnins( + self, files: list[str], burnin_config: BurninConfig + ) -> list[str]: ... 
+ + def move_to_final_destination( + self, files: list[str], anatomy_templates: dict + ) -> list[str]: ... + + def register_version( + self, files: list[str], metadata: dict[str, Any] + ) -> str: ... +``` + +**Responsibilities**: +- Discover and validate rendered output files (frame completeness, file integrity) +- Move/rename files to final destinations using AYON anatomy path templates +- Run transcoding if configured (e.g., EXR → JPEG for review) +- Apply burnins to review media (frame numbers, shot info, custom text) +- Register the rendered version in AYON (files, representations, metadata) +- Report success/failure back to Deadline Cloud + +## Data Models + +### SubmissionParams + +```python +@dataclass +class SubmissionParams: + """Parameters mapped from AYON instance to DC Submitter format.""" + job_name: str + farm_id: str + queue_id: str + priority: int + frame_range: str # "1-100" format for DC + scene_file: str + output_dir: str + job_template: str | None # OJD template name + job_bundle_dir: str | None # Path to assembled OJD job bundle + parameter_values: dict[str, Any] # Parameter values for the OJD template + storage_profile_id: str | None + storage_config: StorageConfig | None + max_retries: int + post_render_config: PostRenderConfig +``` + +**Validation Rules**: +- `job_name` must be non-empty and contain only alphanumeric, dash, underscore +- `farm_id` and `queue_id` must be valid AWS resource identifiers +- `priority` must be between 0 and 100 +- `frame_range` must match pattern `\d+(-\d+)?` +- `scene_file` must be an existing file path +- `post_render_config` must be present + +### PostRenderConfig + +```python +@dataclass +class PostRenderConfig: + """Configuration passed to the post-render job.""" + ayon_project: str + ayon_folder_path: str + ayon_task: str + ayon_product_name: str + expected_files: list[str] + representations: list[RepresentationConfig] + transcode_profiles: list[TranscodeProfile] + burnin_config: BurninConfig + 
anatomy_templates: dict[str, str] # Path templates for file movement/renaming + storage_config: StorageConfig # How to access rendered outputs + ayon_server_url: str + # Credentials handled via Deadline Cloud's secret management +``` + +**Validation Rules**: +- `ayon_project`, `ayon_folder_path`, `ayon_task` must be non-empty +- `expected_files` must contain at least one entry +- `representations` must contain at least one entry +- `ayon_server_url` must be a valid URL + +### SubmissionResult + +```python +@dataclass +class SubmissionResult: + """Result returned after submitting to Deadline Cloud.""" + success: bool + render_job_id: str | None + post_render_job_id: str | None + error_message: str | None = None + submitted_instances: list[str] = field(default_factory=list) +``` + +### TranscodeProfile + +```python +@dataclass +class TranscodeProfile: + """Defines a transcoding operation for post-render.""" + name: str + input_extension: str # e.g., ".exr" + output_extension: str # e.g., ".jpg" + ffmpeg_args: list[str] # Additional ffmpeg arguments + create_representation: bool = True +``` + +### StorageConfig + +```python +@dataclass +class StorageConfig: + """Configuration for storage and data transfer.""" + storage_mode: str # "job_attachments", "shared_storage", or "hybrid" + storage_profile_id: str | None = None + s3_bucket_name: str | None = None + + # Path mappings for cross-platform support + path_mappings: list[PathMapping] = field(default_factory=list) + + # Job attachments settings + auto_sync_inputs: bool = True + auto_sync_outputs: bool = True + use_vfs_on_linux: bool = False + + # Output retrieval settings + output_download_method: str = "manual" # "manual", "cron", "on_complete" + cron_schedule: str | None = None # e.g., "*/15 * * * *" + +@dataclass +class PathMapping: + """Maps paths between platforms for shared storage.""" + name: str + windows_path: str | None = None + linux_path: str | None = None + macos_path: str | None = None +``` + 
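As a rough illustration of how `PathMapping` entries could drive cross-platform path translation, the sketch below rewrites a path from one platform's root to another's. The `remap_path` helper, its `source`/`target` arguments, and the longest-prefix matching rule are assumptions made for illustration; they are not part of the Deadline Cloud storage profile API. The dataclass is repeated (using `Optional` for wider Python compatibility) so the example is self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathMapping:
    """Maps paths between platforms for shared storage (mirrors the model above)."""
    name: str
    windows_path: Optional[str] = None
    linux_path: Optional[str] = None
    macos_path: Optional[str] = None


def remap_path(path: str, mappings: list[PathMapping], source: str, target: str) -> str:
    """Translate `path` from the `source` platform root to the `target` root.

    Hypothetical helper: `source` and `target` are "windows", "linux" or "macos".
    The path is returned unchanged when no mapping applies. Matching is
    case-sensitive, which is a simplification for Windows paths.
    """
    normalized = path.replace("\\", "/")
    best = None  # (source_root, target_root) with the longest matching source root
    for mapping in mappings:
        src = getattr(mapping, f"{source}_path")
        dst = getattr(mapping, f"{target}_path")
        if not src or not dst:
            continue  # this mapping does not cover both platforms
        src_root = src.replace("\\", "/").rstrip("/")
        if normalized == src_root or normalized.startswith(src_root + "/"):
            if best is None or len(src_root) > len(best[0]):
                best = (src_root, dst.replace("\\", "/").rstrip("/"))
    if best is None:
        return path
    remapped = best[1] + normalized[len(best[0]):]
    # Use backslashes only when the target platform is Windows.
    return remapped.replace("/", "\\") if target == "windows" else remapped
```

Preferring the longest matching root keeps nested mappings (for example a project root inside a studio-wide root) from being shadowed by the shorter prefix.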
+**Validation Rules**: +- `storage_mode` must be one of: "job_attachments", "shared_storage", "hybrid" +- If `storage_mode` is "shared_storage" or "hybrid", `storage_profile_id` must be set +- If `storage_mode` is "job_attachments" or "hybrid", `s3_bucket_name` should be set (or use default) +- `path_mappings` must have at least two platform paths defined per mapping +- `output_download_method` must be one of: "manual", "cron", "on_complete" +- If `output_download_method` is "cron", `cron_schedule` must be a valid cron expression + +### BurninConfig + +```python +@dataclass +class BurninConfig: + """Configuration for burnin overlays on review media.""" + enabled: bool = False + frame_number: bool = True + shot_name: bool = True + task_name: bool = False + custom_text: str | None = None + font_size: int = 24 + position: str = "bottom" # "top", "bottom", "both" +``` + + +## Key Functions with Formal Specifications + +### Function 1: `SubmitterBridge.map_instance_to_params()` + +```python +def map_instance_to_params( + self, + instance: CollectedRenderInstance, + settings: DeadlineCloudSettings, +) -> SubmissionParams: + """Map an AYON render instance to Deadline Cloud submission parameters. + + Resolves the farm profile, applies DCC-specific defaults, and builds + the complete parameter set for the Submitter tool. 
+ """ +``` + +**Preconditions:** +- `instance` is a fully collected render instance (all fields populated) +- `instance.frame_range[0] <= instance.frame_range[1]` +- `settings.farm_profiles` contains at least one profile +- `settings.default_profile` references a valid profile name, or the first profile is used + +**Postconditions:** +- Returns a valid `SubmissionParams` with all required fields populated +- `result.farm_id` and `result.queue_id` come from the resolved farm profile +- `result.frame_range` is formatted as DC-compatible string (e.g., "1-100") +- `result.post_render_config` contains all data needed for post-render publishing +- No side effects on `instance` or `settings` + +**Loop Invariants:** N/A + +### Function 2: `SubmitterBridge.submit()` + +```python +def submit( + self, + instances: list[CollectedRenderInstance], + settings: DeadlineCloudSettings, +) -> SubmissionResult: + """Submit render instances to Deadline Cloud via the Submitter tool. + + Maps each instance to submission params, pre-populates the Submitter, + and invokes submission. Creates both render and post-render jobs. 
+ """ +``` + +**Preconditions:** +- `instances` is non-empty +- All instances have passed AYON validation +- `settings` contains valid farm profile configuration +- Deadline Cloud Submitter tool is available and authenticated + +**Postconditions:** +- If successful: `result.success is True`, `result.render_job_id` and `result.post_render_job_id` are valid job IDs +- If failed: `result.success is False`, `result.error_message` describes the failure +- Post-render job has a dependency on the render job (runs only after render completes) +- `result.submitted_instances` lists all instance names that were submitted +- No partial submissions: either all instances submit or none do + +**Loop Invariants:** +- For each processed instance: the instance has been mapped to valid `SubmissionParams` + +### Function 3: `PostRenderProcessor.validate_outputs()` + +```python +def validate_outputs( + self, + discovered: list[str], + expected: list[str], +) -> ValidationResult: + """Validate that rendered outputs match expectations. + + Checks frame completeness (all expected files exist) and + file integrity (files are non-zero size and readable). 
+ """ +``` + +**Preconditions:** +- `expected` is non-empty (at least one expected output file) +- `discovered` contains absolute file paths +- `expected` contains absolute file paths + +**Postconditions:** +- `result.is_valid` is True if and only if all expected files are present in discovered and all pass integrity checks +- `result.missing_files` contains expected files not found in discovered +- `result.corrupt_files` contains files that exist but fail integrity checks +- No file system modifications + +**Loop Invariants:** +- After checking file `i`: `missing_files ∪ valid_files ∪ corrupt_files` accounts for all files checked so far + +### Function 4: `PostRenderProcessor.register_version()` + +```python +def register_version( + self, + files: list[str], + metadata: dict[str, Any], +) -> str: + """Register a new version in AYON with the rendered files. + + Creates representations for each file type and attaches + metadata (frame range, render stats, etc.) to the version. + """ +``` + +**Preconditions:** +- `files` is non-empty and all files exist on disk +- `metadata` contains required keys: `project`, `folder_path`, `task`, `product_name` +- AYON server is reachable and authenticated + +**Postconditions:** +- Returns the version ID of the newly created version +- All files are registered as representations on the version +- Metadata is attached to the version entity +- Version is visible in AYON UI after registration + +**Loop Invariants:** N/A + +## Algorithmic Pseudocode + +### Main Submission Algorithm + +```python +def execute_submission(publisher_context, settings): + """ + ALGORITHM: Main AYON-to-Deadline-Cloud submission workflow. + INPUT: publisher_context (collected AYON publish context), settings (addon settings) + OUTPUT: SubmissionResult + + This runs as the "extract" phase of the AYON publish pipeline. 
+ """ + + # Step 1: Resolve farm profile + profile = resolve_farm_profile(settings) + assert profile is not None, "No valid farm profile found" + + # Step 2: Collect all render instances from publisher context + instances = [ + inst for inst in publisher_context.instances + if inst.data.get("family") == "render" + ] + assert len(instances) > 0, "No render instances to submit" + + # Step 3: Map each instance to submission parameters + all_params = [] + for instance in instances: + params = map_instance_to_params(instance, settings, profile) + assert params.farm_id != "" and params.queue_id != "" + all_params.append((instance, params)) + + # Step 4: Pre-populate and invoke Deadline Cloud Submitter + submitter = get_deadline_cloud_submitter() + results = [] + + for instance, params in all_params: + # Pre-populate submitter with AYON-derived settings + pre_populate_submitter(submitter, params) + + # Submit render job + render_job_id = submitter.submit_job(params) + + # Submit post-render job with dependency on render job + post_config = build_post_render_config(instance, settings) + post_job_id = submitter.submit_post_job( + post_config, depends_on=render_job_id + ) + + results.append(SubmissionResult( + success=True, + render_job_id=render_job_id, + post_render_job_id=post_job_id, + submitted_instances=[instance.instance_name], + )) + + # Step 5: Aggregate results + return aggregate_results(results) +``` + +### Post-Render Processing Algorithm + +```python +def execute_post_render(config: PostRenderConfig, settings: PostRenderSettings): + """ + ALGORITHM: Post-render processing on the farm. + INPUT: config (PostRenderConfig from submission), settings (PostRenderSettings) + OUTPUT: success (bool) + + Runs as a Deadline Cloud job after rendering completes. 
+ """ + + # Step 1: Discover rendered output files + discovered = discover_output_files(config.expected_files) + + # Step 2: Validate outputs + validation = validate_outputs(discovered, config.expected_files) + + if not validation.is_valid: + report_failure( + f"Missing: {validation.missing_files}, " + f"Corrupt: {validation.corrupt_files}" + ) + return False + + # Step 3: Move files to final destinations via path templates + final_files = move_to_final_destination( + discovered, config.anatomy_templates + ) + + # Step 4: Transcode if configured + all_files = list(final_files) + for profile in config.transcode_profiles: + matching = [f for f in final_files if f.endswith(profile.input_extension)] + if matching: + transcoded = transcode_files(matching, profile) + all_files.extend(transcoded) + + # Step 5: Apply burnins to review media if configured + if settings.burnin_config.enabled: + review_files = [f for f in all_files if is_review_format(f)] + if review_files: + burnin_files = apply_burnins(review_files, settings.burnin_config) + all_files.extend(burnin_files) + + # Step 6: Build representations + representations = build_representations(all_files, config.representations) + + # Step 7: Register version in AYON + metadata = { + "project": config.ayon_project, + "folder_path": config.ayon_folder_path, + "task": config.ayon_task, + "product_name": config.ayon_product_name, + } + version_id = register_version(representations, metadata) + + assert version_id is not None, "Version registration failed" + return True +``` + +### Farm Profile Resolution Algorithm + +```python +def resolve_farm_profile( + settings: DeadlineCloudSettings, + override_name: str | None = None, +) -> FarmProfile: + """ + ALGORITHM: Resolve which farm profile to use for submission. 
+ INPUT: settings (addon settings), override_name (optional explicit profile) + OUTPUT: FarmProfile + + Priority: explicit override > default profile > first profile + """ + + profiles_by_name = {p.name: p for p in settings.farm_profiles} + assert len(profiles_by_name) > 0, "No farm profiles configured" + + # Check explicit override first + if override_name and override_name in profiles_by_name: + return profiles_by_name[override_name] + + # Fall back to default profile + if settings.default_profile and settings.default_profile in profiles_by_name: + return profiles_by_name[settings.default_profile] + + # Last resort: first profile + return settings.farm_profiles[0] +``` + +## Example Usage + +```python +# Example 1: Server settings configuration (in AYON UI) +settings = DeadlineCloudSettings( + farm_profiles=[ + FarmProfile( + name="production", + farm_id="farm-abc123", + queue_id="queue-xyz789", + storage_profile_id="sp-def456", + priority=50, + max_retries=3, + ), + FarmProfile( + name="previs", + farm_id="farm-abc123", + queue_id="queue-previs", + priority=30, + ), + ], + default_profile="production", + dcc_defaults=[ + DCCSubmissionDefaults( + dcc_name="maya", + job_template="maya-arnold-render", + parameter_overrides={"renderer": "arnold"}, + ), + ], + post_render=PostRenderSettings( + enable_transcoding=True, + transcode_profiles=[ + TranscodeProfile( + name="review", + input_extension=".exr", + output_extension=".jpg", + ffmpeg_args=["-q:v", "2"], + ), + ], + ), +) + +# Example 2: Submission from AYON Publisher (client-side plugin) +bridge = SubmitterBridge() +result = bridge.submit( + instances=collected_render_instances, + settings=addon_settings, +) +if result.success: + print(f"Render job: {result.render_job_id}") + print(f"Post-render job: {result.post_render_job_id}") +else: + print(f"Submission failed: {result.error_message}") + +# Example 3: Post-render script execution (on farm worker) +processor = PostRenderProcessor() 
+outputs = processor.discover_outputs(config.expected_files) +validation = processor.validate_outputs(outputs, config.expected_files) +if validation.is_valid: + processor.transcode(outputs, transcode_profile) + version_id = processor.register_version(outputs, metadata) +``` + +## Correctness Properties + +The following properties must hold for the integration to be correct: + +1. **Submission Atomicity**: For any set of render instances submitted together, either all instances are submitted successfully (all job IDs returned) or none are (rollback on partial failure). + +2. **Settings Propagation**: For all settings `s` configured in AYON server and all submissions using those settings, the Deadline Cloud Submitter receives parameters consistent with `s` — i.e., `submitter.farm_id == resolved_profile(s).farm_id`. + +3. **Validation Gate**: For all render instances `i`, if any validation plugin reports failure on `i`, then `i` is never submitted to Deadline Cloud. Formally: `∀i: validation_failed(i) ⟹ ¬submitted(i)`. + +4. **Post-Render Ordering**: For all post-render jobs `p` with dependency on render job `r`, `p` executes only after `r` completes successfully. Formally: `∀(r, p): depends_on(p, r) ⟹ completed(r) before started(p)`. + +5. **Frame Completeness**: For all post-render validations, the set of discovered files must be a superset of expected files for the validation to pass. Formally: `∀v: v.is_valid ⟹ expected_files ⊆ discovered_files`. + +6. **Version Registration Idempotency**: Registering the same version with the same files and metadata multiple times produces exactly one version in AYON (handles retries gracefully). + +7. **Profile Resolution Determinism**: For the same settings and override inputs, `resolve_farm_profile` always returns the same profile. The resolution order is deterministic: explicit override > default > first. 
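The Frame Completeness property (property 5) can be illustrated with a minimal sketch of the output check. This is only one possible shape for `PostRenderProcessor.validate_outputs`, written as a free function: the `ValidationResult` fields mirror the names used in this document, while the `min_size` parameter and the size-based integrity check are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Outcome of the post-render output check (field names as used in this document)."""
    is_valid: bool
    missing_files: list[str] = field(default_factory=list)
    corrupt_files: list[str] = field(default_factory=list)


def validate_outputs(
    discovered: list[str],
    expected: list[str],
    min_size: int = 1,
) -> ValidationResult:
    """Check frame completeness and a basic notion of file integrity.

    Assumption: "integrity" here only means the file exists on disk and is at
    least `min_size` bytes; a real check might also try to open the image.
    The function only reads the file system and never modifies it.
    """
    discovered_set = set(discovered)
    missing: list[str] = []
    corrupt: list[str] = []
    for path in expected:
        if path not in discovered_set:
            missing.append(path)  # expected frame was never discovered
        elif not os.path.isfile(path) or os.path.getsize(path) < min_size:
            corrupt.append(path)  # discovered, but unreadable or too small
    return ValidationResult(
        is_valid=not missing and not corrupt,
        missing_files=missing,
        corrupt_files=corrupt,
    )
```

With this shape, `is_valid` can only be true when every expected file is in the discovered set, which is the `expected_files ⊆ discovered_files` condition stated in property 5; extra discovered files are ignored rather than treated as errors.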
+ +## Error Handling + +### Error Scenario 1: Deadline Cloud Submitter Not Available + +**Condition**: The Deadline Cloud Submitter tool/library is not installed or not importable in the DCC environment. +**Response**: Fail early during plugin discovery with a clear error message: "AWS Deadline Cloud Submitter is not installed. Please install the deadline-cloud-for-{dcc} package." +**Recovery**: User installs the required submitter package and retries. + +### Error Scenario 2: Authentication Failure + +**Condition**: AWS credentials are not configured or expired when attempting submission. +**Response**: Catch authentication errors from the Submitter and surface them in the AYON Publisher UI with guidance on credential configuration. +**Recovery**: User configures AWS credentials (via `aws configure`, environment variables, or Deadline Cloud Monitor) and retries. + +### Error Scenario 3: Partial Render Failure (Missing Frames) + +**Condition**: Render job completes but some frames are missing or corrupt. +**Response**: Post-render validation detects missing/corrupt files, reports the specific frames affected, and marks the Deadline Cloud job as failed. +**Recovery**: Artist can re-submit only the failed frames (if supported) or re-submit the entire job. The post-render job does not register a partial version. + +### Error Scenario 4: AYON Server Unreachable During Post-Render + +**Condition**: Post-render script cannot reach the AYON server to register the version. +**Response**: Retry with exponential backoff (3 attempts, 5s/15s/45s delays). If all retries fail, mark the Deadline Cloud job as failed with the connection error. +**Recovery**: Once AYON server is back, the post-render job can be manually retried from the Deadline Cloud console. + +### Error Scenario 5: Invalid Farm Profile Configuration + +**Condition**: Settings reference a farm ID or queue ID that doesn't exist in Deadline Cloud. +**Response**: The Submitter returns an error on submission. 
The bridge catches this and reports it in the Publisher UI with the specific invalid resource ID. +**Recovery**: Admin corrects the farm profile settings in AYON server. + +## Job Monitoring + +Job progress can be read from the Deadline Cloud API through job, step, and task queries such as `GetJob`, `ListSteps`, and `ListTasks`; `CreateMonitor` only provisions the Deadline Cloud monitor web portal and does not itself report job progress. This enables: +- Real-time progress tracking of render jobs from within AYON +- Notification when jobs complete, fail, or require attention +- Integration with AYON's event system for automated status updates + +For MVP, monitoring is informational only — artists can check job status via the Deadline Cloud Monitor UI or the AYON Publisher. Deeper integration (automatic retries, AYON task status updates) is deferred to post-MVP. + +## Testing Strategy + +### Unit Testing Approach + +- Test `map_instance_to_params` with various instance configurations and settings combinations +- Test `resolve_farm_profile` with all priority paths (override, default, fallback) +- Test validation plugins independently with mock DCC data +- Test `PostRenderProcessor.validate_outputs` with complete, partial, and empty file sets +- Test `PostRenderConfig` serialization/deserialization (data must survive round-trip through Deadline Cloud job parameters) +- Coverage goal: 90%+ on bridge logic and validation plugins + +### Property-Based Testing Approach + +**Property Test Library**: `hypothesis` (Python) + +- **Profile resolution determinism**: For any valid settings, calling `resolve_farm_profile` twice with the same inputs returns the same result +- **Frame range mapping**: For any valid `(start, end)` tuple where `start <= end`, the mapped DC frame range string parses back to the same range +- **Submission params completeness**: For any valid `CollectedRenderInstance` and `DeadlineCloudSettings`, `map_instance_to_params` returns params where all required fields are non-empty +- **Validation correctness**: For any file list where `expected ⊆ discovered`, validation returns `is_valid=True`; for any list where `expected
⊄ discovered`, returns `is_valid=False` + +### Integration Testing Approach + +- End-to-end submission test with mocked Deadline Cloud API (using `moto` or similar) +- Test the full Publisher pipeline: collect → validate → submit → verify job creation +- Test post-render script with fixture files simulating rendered outputs +- Test AYON version registration with a test AYON server instance +- DCC-specific integration tests for Maya and Houdini collectors (requires DCC licenses in CI or mock DCC APIs) + +## Performance Considerations + +- **Batch submission**: When submitting multiple render layers, batch API calls to Deadline Cloud where possible rather than one-at-a-time +- **File discovery**: For large frame ranges (1000+ frames), use parallel file existence checks in post-render validation +- **Settings caching**: Cache resolved farm profiles and DCC defaults for the duration of a publish session (settings don't change mid-publish) +- **Transcoding parallelism**: Run transcoding operations in parallel across frames using worker threads, bounded by available CPU cores + +## Security Considerations + +- **AWS Credentials**: Never store AWS credentials in AYON settings. Rely on Deadline Cloud's native credential management (Deadline Cloud Monitor, IAM roles, environment variables) +- **AYON API Token for Post-Render**: The post-render script needs AYON server access. 
Use Deadline Cloud's secret management to pass the AYON API token to the job, never embed it in job parameters +- **Scene File Access**: Ensure farm workers have read access to scene files and write access to output directories via Deadline Cloud storage profiles +- **Input Sanitization**: Validate all user-provided strings (job names, paths) before passing to the Submitter to prevent injection + +## Dependencies + +- **AYON Server** (>= 1.0.7): Server-side addon hosting and settings API +- **AYON Launcher**: Client-side plugin execution environment +- **AWS Deadline Cloud Submitter** (`deadline-cloud-for-maya`, `deadline-cloud-for-houdini`, etc.): Native DCC submitter tools that handle actual job creation +- **`deadline` Python package**: AWS Deadline Cloud client library for API interactions +- **`ayon-python-api`**: AYON server API client (used in post-render script for version registration) +- **`ffmpeg`** (optional): Required on farm workers if transcoding is enabled +- **DCC Applications**: Maya, Houdini (and potentially others) with their respective AYON integrations installed From 3c37551f5891451a6500f428e70136a2b61d8d2c Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Tue, 17 Mar 2026 13:35:47 +0100 Subject: [PATCH 02/10] Amending Design document --- .../deadline-cloud-integration/design.md | 398 +++++++++++++++++- 1 file changed, 384 insertions(+), 14 deletions(-) diff --git a/.kiro/specs/deadline-cloud-integration/design.md b/.kiro/specs/deadline-cloud-integration/design.md index 16b07b4..81fa474 100644 --- a/.kiro/specs/deadline-cloud-integration/design.md +++ b/.kiro/specs/deadline-cloud-integration/design.md @@ -113,9 +113,18 @@ sequenceDiagram sequenceDiagram participant DC as Deadline Cloud participant Script as Post-Render Script + participant CLI as Deadline Cloud CLI participant AYON as AYON Server API DC->>Script: Trigger post-render job (render complete) + + alt Job Attachments Mode + Script->>CLI: deadline job download-output + 
CLI-->>Script: Downloaded output files to local path + else Shared Storage Mode + Script->>Script: Access outputs directly via remapped paths + end + Script->>Script: Discover rendered output files Script->>Script: Validate outputs (frame completeness, file integrity) @@ -142,6 +151,8 @@ Deadline Cloud transfers data to and from Cloud Workers using S3 buckets: - **Output sync**: Rendered outputs are synced back to the workstation when the job finishes - **Linux VFS mount**: On Linux workers, job attachments can be mounted as a virtual filesystem for standard file access - **Output retrieval**: The Deadline CLI provides commands to download job outputs, which can be run manually or as a scheduled CRON job +- **Automatic output downloads (TBD)**: Deadline Cloud supports automatic output downloads via `deadline queue sync-output` configured as a cron job or scheduled task. This requires additional setup: dedicated long-term IAM credentials (not Deadline Cloud Monitor credentials), a storage profile with all output paths configured, and a checkpoint directory for tracking download progress. See [AWS docs: Automatic downloads](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/auto-downloads.html). The exact integration approach (whether AYON manages this configuration or defers to studio-level setup) is TBD. +- **No direct S3 access**: Render output data is encrypted and cannot be accessed directly from S3 buckets. All output retrieval must go through the Deadline Cloud CLI output download mechanism (e.g., `deadline job download-output` or `deadline queue sync-output`). This is a hard constraint of the job attachments mode. ### Option 2: Shared Storage (Storage Profiles) @@ -154,9 +165,9 @@ Uses storage profiles to remap paths between different filesystems and platforms ### Post-Render Script Access The post-render script accesses rendered outputs via: -1. **Job attachments**: Download outputs using Deadline CLI before processing -2. 
**Shared storage**: Direct filesystem access using remapped paths from storage profiles -3. **Hybrid**: Combination based on storage configuration +1. **Job attachments**: Outputs must first be downloaded using the Deadline Cloud CLI (`deadline job download-output`) before any processing. There is no direct S3 access — data is encrypted and can only be retrieved through the CLI download mechanism. This adds a mandatory "download outputs" step as the first operation in the post-render pipeline. Alternatively, if automatic downloads are configured (`deadline queue sync-output` as a cron job), outputs may already be available locally — but this requires additional IAM and storage profile setup (TBD, see [AWS docs](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/auto-downloads.html)). +2. **Shared storage**: Direct filesystem access using remapped paths from storage profiles. No download step required. +3. **Hybrid**: Combination based on storage configuration — shared storage files are accessed directly, job attachment files require CLI download first. ### Storage Configuration @@ -193,15 +204,64 @@ class PostRenderSettings(BaseSettingsModel): validate_frame_completeness: bool = True validate_file_integrity: bool = True +class HostRequirements(BaseSettingsModel): + """Hardware/OS requirements for Deadline Cloud worker hosts. + + Overrides the host requirements in the Deadline Cloud Submitter's + job settings. When set, these values are injected into the OJD + template's hostRequirements section, replacing the submitter defaults. + All fields are optional — only non-None values override the submitter. 
+ """ + os_family: str | None = None # "linux", "windows", "macos" + cpu_arch: str | None = None # "x86_64", "arm64" + min_vcpu: int | None = None # Minimum vCPUs + max_vcpu: int | None = None # Maximum vCPUs + min_memory_mib: int | None = None # Minimum memory in MiB + max_memory_mib: int | None = None # Maximum memory in MiB + min_gpu: int | None = None # Minimum GPU count + max_gpu: int | None = None # Maximum GPU count + min_gpu_memory_mib: int | None = None # Minimum GPU memory in MiB + max_gpu_memory_mib: int | None = None # Maximum GPU memory in MiB + +class CondaPackage(BaseSettingsModel): + """A single conda package specification.""" + name: str # e.g., "maya", "maya-openjd", "maya-vray" + version: str # Explicit version spec (e.g., "2026.*"), or "auto" to use installed version from artist machine, or "" for latest + +class CondaConfig(BaseSettingsModel): + """Conda package and channel configuration for farm workers. + + Overrides the default auto-detection behavior of the Deadline Cloud + DCC submitters (e.g., deadline-cloud-for-maya), allowing studios to + pin specific package versions from AYON server settings. 
+ """ + packages: list[CondaPackage] = [] # e.g., [{"name": "maya", "version": "2026.*"}, {"name": "maya-openjd", "version": "auto"}, {"name": "maya-vray", "version": ""}] + channels: list[str] = [] # Custom conda channels (overrides default channels if non-empty) + +class QueueConfig(BaseSettingsModel): + """A named Deadline Cloud queue.""" + name: str # Display name + queue_id: str # AWS queue ID + farm_id: str # Associated farm ID + description: str = "" + class DeadlineCloudSettings(BaseSettingsModel): """Root settings model for the addon.""" farm_profiles: list[FarmProfile] = [] default_profile: str = "" + available_queues: list[QueueConfig] = [] # All available queues defined at server level + default_queue_id: str = "" # Server-level default queue + conda_config: CondaConfig = CondaConfig() # Conda package/channel overrides + host_requirements: HostRequirements = HostRequirements() # Worker host hardware/OS overrides dcc_defaults: list[DCCSubmissionDefaults] = [] post_render: PostRenderSettings = PostRenderSettings() custom_validations: list[CustomValidation] = [] auto_detect_credentials: bool = True +class ProjectDeadlineCloudSettings(BaseSettingsModel): + """Per-project overrides for Deadline Cloud settings.""" + default_queue_id: str = "" # Project-level override; empty = use server default + class CustomValidation(BaseSettingsModel): """Studio-configurable validation rule.""" name: str @@ -268,8 +328,119 @@ class SubmitterBridge: instances: list[CollectedRenderInstance], settings: DeadlineCloudSettings, ) -> SubmissionResult: ... + + def _get_parameter_values( + self, + instance: CollectedRenderInstance, + settings: DeadlineCloudSettings, + ) -> dict[str, Any]: + """Build parameter values for the OJD template. + + Overrides the native submitter's auto-detected conda packages + with AYON-configured values. Specifically, this populates the + `CondaPackages` and `RezPackages` shared parameter values that + are passed to `SubmitJobToDeadlineDialog`. 
+ + If settings.conda_config.packages is non-empty, those packages + replace the auto-detected values (e.g., the default + `conda_packages = f"maya={maya_version}.* maya-openjd={adaptor_version}.*"` + from deadline-cloud-for-maya). + + For packages with version="auto", the installed version from the + artist's machine is resolved at submission time. + """ + ... + + def _resolve_conda_packages( + self, + conda_config: CondaConfig, + dcc_context: dict[str, Any], + ) -> str: + """Resolve conda package string from AYON settings. + + Builds the conda package specification string by combining + AYON-configured packages with version resolution: + - Explicit versions are used as-is (e.g., "maya=2026.*") + - "auto" versions are resolved from the artist's installed DCC + - Empty versions use latest (e.g., "maya-vray") + + Returns a space-separated package string compatible with the + Deadline Cloud submitter's CondaPackages parameter. + """ + ... + + def _resolve_queue_id( + self, + settings: DeadlineCloudSettings, + project_settings: ProjectDeadlineCloudSettings | None, + instance_override: str | None = None, + ) -> str: + """Resolve which queue ID to use for submission. + + Priority order: + 1. Instance-level override (if provided) + 2. Project default queue (from project settings) + 3. Server default queue (from server settings) + 4. First available queue (fallback) + """ + ... + + def _resolve_host_requirements( + self, + host_req: HostRequirements, + ) -> dict[str, Any] | None: + """Resolve host requirements from AYON settings into OJD format. + + Converts non-None fields from HostRequirements into the + hostRequirements dict expected by the OJD template. Only + fields explicitly set in AYON settings are included — unset + fields fall through to the submitter's defaults. + + Returns None if no fields are set (submitter defaults preserved). + """ + ... 
``` +#### Conda Package Override Behavior + +The native Deadline Cloud DCC submitters (e.g., `deadline-cloud-for-maya`) auto-detect the DCC version and pull the latest compatible conda packages automatically. For example, the Maya submitter builds: +```python +conda_packages = f"maya={maya_version}.* maya-openjd={adaptor_version}.*" +``` + +The AYON integration overrides this behavior when `conda_config.packages` is configured in server settings. AYON's settings take precedence over the auto-detected values. This is an intentional design decision — studios need version pinning control for reproducibility and stability on the farm. + +Key override rules: +- If `conda_config.packages` is non-empty, AYON builds the `CondaPackages` parameter value from settings instead of using auto-detection +- The Maya version always comes from AYON server settings (not auto-detected from the artist's DCC) +- For `maya-openjd`, the version depends on what's installed on the artist's machine when `version="auto"` is set — this allows the adaptor version to track the artist's local installation while still being explicitly controllable +- If `conda_config.channels` is non-empty, those channels override the default conda channels +- If `conda_config` is empty/default, the native submitter's auto-detection behavior is preserved (backward compatible) + +> **Integration Consideration**: This override may conflict with the native submitter's auto-detection logic. The AYON integration explicitly takes precedence. Studios should be aware that enabling conda config in AYON settings will suppress the submitter's built-in version resolution. This is documented as a known integration point that requires coordination between AYON addon updates and Deadline Cloud submitter updates. + +#### Queue Resolution + +Queue selection follows a priority chain: +1. **Instance-level override**: Explicit queue set on a specific render instance +2. 
**Project default queue**: `ProjectDeadlineCloudSettings.default_queue_id` for the current AYON project +3. **Server default queue**: `DeadlineCloudSettings.default_queue_id` +4. **First available queue**: Falls back to the first entry in `DeadlineCloudSettings.available_queues` + +This allows studios to define all available queues centrally, set a global default, and let individual projects override as needed. + +#### Host Requirements Override Behavior + +The native Deadline Cloud Submitter exposes host requirements in its job settings UI (OS family, vCPU, memory, GPU). The AYON integration allows studios to override these from server settings via `HostRequirements`. + +Key override rules: +- Only non-None fields in `HostRequirements` override the submitter's values — unset fields preserve the submitter's defaults or artist's manual selections +- This is a partial override model: studios can pin OS family and GPU requirements while leaving CPU/memory to the submitter defaults +- Host requirements are injected into the OJD template's `hostRequirements` section before submission +- If all fields are None (default), the submitter's host requirements are preserved entirely (backward compatible) + +> **Integration Consideration**: Host requirements interact with Deadline Cloud's fleet configuration. Studios should ensure that the configured requirements match available fleet capacity — e.g., requesting GPU workers when no GPU fleet is provisioned will cause jobs to remain queued indefinitely. + **Responsibilities**: - Translate AYON render instances into Deadline Cloud Submitter parameters - Pre-populate the Submitter with AYON settings before submission @@ -331,6 +502,22 @@ Publishing is the process where a version is registered in AYON. Beyond registra class PostRenderProcessor: """Handles post-render pipeline on the farm.""" + def download_outputs( + self, config: PostRenderConfig + ) -> list[str]: + """Download render outputs via Deadline Cloud CLI. 
+ + Required for job attachments mode — outputs are encrypted in S3 + and can only be retrieved through the Deadline Cloud CLI + (e.g., `deadline job download-output`). + + For shared storage mode, this is a no-op that returns the + expected file paths directly (files are already accessible). + + Returns the local file paths of downloaded/accessible outputs. + """ + ... + def discover_outputs( self, expected_files: list[str] ) -> list[str]: ... @@ -382,6 +569,9 @@ class SubmissionParams: job_template: str | None # OJD template name job_bundle_dir: str | None # Path to assembled OJD job bundle parameter_values: dict[str, Any] # Parameter values for the OJD template + conda_packages: str | None # Resolved conda package string (overrides auto-detection if set) + conda_channels: list[str] | None # Custom conda channels (overrides defaults if set) + host_requirements: dict[str, Any] | None # Resolved host requirements (overrides submitter defaults if set) storage_profile_id: str | None storage_config: StorageConfig | None max_retries: int @@ -469,6 +659,12 @@ class StorageConfig: # Output retrieval settings output_download_method: str = "manual" # "manual", "cron", "on_complete" cron_schedule: str | None = None # e.g., "*/15 * * * *" + + # Automatic download settings (TBD — requires dedicated IAM credentials + # and storage profile configuration, see AWS docs: Automatic downloads) + auto_download_enabled: bool = False + auto_download_checkpoint_dir: str | None = None # Checkpoint dir for sync-output tracking + auto_download_aws_profile: str | None = None # AWS credentials profile name (e.g., "deadline-downloader") @dataclass class PathMapping: @@ -528,9 +724,13 @@ def map_instance_to_params( **Postconditions:** - Returns a valid `SubmissionParams` with all required fields populated -- `result.farm_id` and `result.queue_id` come from the resolved farm profile +- `result.farm_id` and `result.queue_id` come from the resolved farm profile and queue resolution +- 
`result.queue_id` follows the resolution priority: instance override > project default > server default > first available - `result.frame_range` is formatted as DC-compatible string (e.g., "1-100") - `result.post_render_config` contains all data needed for post-render publishing +- If `settings.conda_config.packages` is non-empty, `result.conda_packages` is the resolved conda string from AYON settings (not auto-detected) +- If `settings.conda_config.packages` is empty, `result.conda_packages` is None (native auto-detection preserved) +- If any field in `settings.host_requirements` is non-None, `result.host_requirements` contains only those fields; otherwise `result.host_requirements` is None (submitter defaults preserved) - No side effects on `instance` or `settings` **Loop Invariants:** N/A @@ -641,6 +841,19 @@ def execute_submission(publisher_context, settings): profile = resolve_farm_profile(settings) assert profile is not None, "No valid farm profile found" + # Step 1b: Resolve conda packages from AYON settings (overrides auto-detection) + conda_packages = resolve_conda_packages( + settings.conda_config, + dcc_context=get_dcc_context(), # Artist's installed DCC/adaptor versions + ) + + # Step 1c: Resolve queue ID (instance override > project > server > first available) + project_settings = get_project_settings(publisher_context.project_name) + queue_id = resolve_queue_id(settings, project_settings) + + # Step 1d: Resolve host requirements from AYON settings + host_requirements = resolve_host_requirements(settings.host_requirements) + # Step 2: Collect all render instances from publisher context instances = [ inst for inst in publisher_context.instances @@ -653,6 +866,15 @@ def execute_submission(publisher_context, settings): for instance in instances: params = map_instance_to_params(instance, settings, profile) assert params.farm_id != "" and params.queue_id != "" + # Override queue_id with resolved value + params.queue_id = queue_id + # Apply conda package 
override if configured + if conda_packages: + params.conda_packages = conda_packages + params.conda_channels = settings.conda_config.channels or None + # Apply host requirements override if configured + if host_requirements: + params.host_requirements = host_requirements all_params.append((instance, params)) # Step 4: Pre-populate and invoke Deadline Cloud Submitter @@ -695,10 +917,22 @@ def execute_post_render(config: PostRenderConfig, settings: PostRenderSettings): Runs as a Deadline Cloud job after rendering completes. """ - # Step 1: Discover rendered output files - discovered = discover_output_files(config.expected_files) - - # Step 2: Validate outputs + # Step 1: Download render outputs (mandatory for job attachments mode) + # In job attachments mode, outputs are encrypted in S3 and must be + # retrieved via the Deadline Cloud CLI before any processing. + # In shared storage mode, this is a no-op (files already accessible). + if config.storage_config.storage_mode in ("job_attachments", "hybrid"): + local_paths = download_outputs_via_cli(config) + # Uses: deadline job download-output + assert len(local_paths) > 0, "No outputs downloaded from Deadline Cloud" + else: + # Shared storage: files are directly accessible via remapped paths + local_paths = config.expected_files + + # Step 2: Discover rendered output files + discovered = discover_output_files(local_paths) + + # Step 3: Validate outputs validation = validate_outputs(discovered, config.expected_files) if not validation.is_valid: @@ -708,12 +942,12 @@ def execute_post_render(config: PostRenderConfig, settings: PostRenderSettings): ) return False - # Step 3: Move files to final destinations via path templates + # Step 4: Move files to final destinations via path templates final_files = move_to_final_destination( discovered, config.anatomy_templates ) - # Step 4: Transcode if configured + # Step 5: Transcode if configured all_files = list(final_files) for profile in config.transcode_profiles: matching = [f 
for f in final_files if f.endswith(profile.input_extension)] @@ -721,17 +955,17 @@ def execute_post_render(config: PostRenderConfig, settings: PostRenderSettings): transcoded = transcode_files(matching, profile) all_files.extend(transcoded) - # Step 5: Apply burnins to review media if configured + # Step 6: Apply burnins to review media if configured if settings.burnin_config.enabled: review_files = [f for f in all_files if is_review_format(f)] if review_files: burnin_files = apply_burnins(review_files, settings.burnin_config) all_files.extend(burnin_files) - # Step 6: Build representations + # Step 7: Build representations representations = build_representations(all_files, config.representations) - # Step 7: Register version in AYON + # Step 8: Register version in AYON metadata = { "project": config.ayon_project, "folder_path": config.ayon_folder_path, @@ -774,6 +1008,76 @@ def resolve_farm_profile( return settings.farm_profiles[0] ``` +### Queue Resolution Algorithm + +```python +def resolve_queue_id( + settings: DeadlineCloudSettings, + project_settings: ProjectDeadlineCloudSettings | None = None, + instance_override: str | None = None, +) -> str: + """ + ALGORITHM: Resolve which queue ID to use for submission. + INPUT: settings (server settings), project_settings (per-project overrides), instance_override (optional) + OUTPUT: queue_id (str) + + Priority: instance override > project default > server default > first available queue + """ + + # 1. Instance-level override takes highest priority + if instance_override: + return instance_override + + # 2. Project-level default queue + if project_settings and project_settings.default_queue_id: + return project_settings.default_queue_id + + # 3. Server-level default queue + if settings.default_queue_id: + return settings.default_queue_id + + # 4. 
Fall back to first available queue + assert len(settings.available_queues) > 0, "No queues configured" + return settings.available_queues[0].queue_id +``` + +### Conda Package Resolution Algorithm + +```python +def resolve_conda_packages( + conda_config: CondaConfig, + dcc_context: dict[str, Any], +) -> str: + """ + ALGORITHM: Resolve conda package string from AYON settings. + INPUT: conda_config (from server settings), dcc_context (artist's DCC environment info) + OUTPUT: conda_packages_str (space-separated package spec string) + + If conda_config.packages is empty, returns empty string (native submitter + auto-detection is preserved). Otherwise, builds the package string from + AYON settings, overriding the submitter's auto-detected values. + """ + + if not conda_config.packages: + return "" # No override — let native submitter auto-detect + + parts = [] + for pkg in conda_config.packages: + if pkg.version == "auto": + # Resolve version from artist's installed DCC/adaptor + installed_version = dcc_context.get(f"{pkg.name}_version", "") + if installed_version: + parts.append(f"{pkg.name}={installed_version}.*") + else: + parts.append(pkg.name) # Fall back to latest if not detected + elif pkg.version: + parts.append(f"{pkg.name}={pkg.version}") + else: + parts.append(pkg.name) # Empty version = latest + + return " ".join(parts) +``` + ## Example Usage ```python @@ -796,6 +1100,29 @@ settings = DeadlineCloudSettings( ), ], default_profile="production", + available_queues=[ + QueueConfig( + name="Main Render Queue", + queue_id="queue-xyz789", + farm_id="farm-abc123", + description="Primary production render queue", + ), + QueueConfig( + name="Previs Queue", + queue_id="queue-previs", + farm_id="farm-abc123", + description="Lower priority previs renders", + ), + ], + default_queue_id="queue-xyz789", + conda_config=CondaConfig( + packages=[ + CondaPackage(name="maya", version="2026.*"), + CondaPackage(name="maya-openjd", version="auto"), # Use artist's installed 
version + CondaPackage(name="maya-vray", version=""), # Latest available + ], + channels=["my-studio-conda-channel"], + ), dcc_defaults=[ DCCSubmissionDefaults( dcc_name="maya", @@ -830,11 +1157,46 @@ else: # Example 3: Post-render script execution (on farm worker) processor = PostRenderProcessor() -outputs = processor.discover_outputs(config.expected_files) + +# Step 0: Download outputs first (required for job attachments mode) +local_files = processor.download_outputs(config) + +# Then proceed with discovery, validation, and publishing +outputs = processor.discover_outputs(local_files) validation = processor.validate_outputs(outputs, config.expected_files) if validation.is_valid: processor.transcode(outputs, transcode_profile) version_id = processor.register_version(outputs, metadata) + +# Example 4: Per-project queue override +project_settings = ProjectDeadlineCloudSettings( + default_queue_id="queue-previs", # This project uses the previs queue +) +queue_id = resolve_queue_id( + settings=server_settings, + project_settings=project_settings, +) +# Returns "queue-previs" (project override takes precedence over server default) + +# Example 5: Conda package resolution +conda_str = resolve_conda_packages( + conda_config=settings.conda_config, + dcc_context={"maya-openjd_version": "0.15"}, +) +# Returns: "maya=2026.* maya-openjd=0.15.* maya-vray" + +# Example 6: Host requirements override (GPU renders need GPU workers) +settings_with_gpu = DeadlineCloudSettings( + # ...other settings... + host_requirements=HostRequirements( + os_family="linux", + min_gpu=1, + min_gpu_memory_mib=8192, # 8 GB GPU memory minimum + ), +) +# Only os_family, min_gpu, and min_gpu_memory_mib are injected into the OJD template. +# All other host requirement fields (vcpu, memory, max_gpu, etc.) fall through +# to the Deadline Cloud Submitter's defaults. ``` ## Correctness Properties @@ -855,6 +1217,14 @@ The following properties must hold for the integration to be correct: 7. 
**Profile Resolution Determinism**: For the same settings and override inputs, `resolve_farm_profile` always returns the same profile. The resolution order is deterministic: explicit override > default > first. +8. **Conda Override Precedence**: When `conda_config.packages` is non-empty in AYON settings, the resolved `CondaPackages` parameter value must match the AYON-configured packages, not the native submitter's auto-detected values. Formally: `∀s: s.conda_config.packages ≠ [] ⟹ submission.conda_packages == resolve_conda_packages(s.conda_config, dcc_context)`. + +9. **Queue Resolution Determinism**: For the same settings, project settings, and instance override, `resolve_queue_id` always returns the same queue ID. The resolution order is deterministic: instance override > project default > server default > first available. + +10. **Output Download Precondition**: For job attachments mode, post-render processing must not begin validation or file operations until outputs have been successfully downloaded via the Deadline Cloud CLI. Formally: `∀j: j.storage_mode == "job_attachments" ⟹ download_complete(j) before discover_outputs(j)`. + +11. **Host Requirements Override Precedence**: When any field in `host_requirements` is non-None in AYON settings, the corresponding field in the OJD template's hostRequirements must match the AYON-configured value. Unset fields must not be injected (submitter defaults preserved). Formally: `∀f ∈ HostRequirements.fields: f is not None ⟹ ojd.hostRequirements[f] == settings.host_requirements[f]`.
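
Property 11 describes a partial override: only fields set in AYON replace the submitter's values. A minimal sketch of that behavior follows, assuming a plain dataclass in place of the settings model. The dictionary keys are the field names themselves, not the actual OJD `hostRequirements` schema, and `apply_overrides` is a hypothetical helper added only to show the merge.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class HostRequirements:
    # Simplified stand-in with a subset of the fields from the design.
    os_family: str | None = None
    cpu_arch: str | None = None
    min_vcpu: int | None = None
    min_memory_mib: int | None = None
    min_gpu: int | None = None
    min_gpu_memory_mib: int | None = None


def resolve_host_requirements(host_req: HostRequirements) -> dict[str, Any] | None:
    """Return only the explicitly set fields, or None if nothing is set."""
    overrides = {
        f.name: getattr(host_req, f.name)
        for f in fields(host_req)
        if getattr(host_req, f.name) is not None
    }
    return overrides or None


def apply_overrides(
    submitter_defaults: dict[str, Any], overrides: dict[str, Any] | None
) -> dict[str, Any]:
    """Hypothetical merge step: AYON values win, unset fields keep defaults."""
    merged = dict(submitter_defaults)
    if overrides:
        merged.update(overrides)
    return merged


gpu_req = HostRequirements(os_family="linux", min_gpu=1, min_gpu_memory_mib=8192)
resolved = resolve_host_requirements(gpu_req)
merged = apply_overrides({"os_family": "windows", "min_vcpu": 4}, resolved)
nothing_set = resolve_host_requirements(HostRequirements())
```

Filtering on `is not None` rather than truthiness keeps a deliberate `0` (for example `min_gpu=0`) as a real override, which matches the "only non-None values override" rule in the `HostRequirements` docstring. Translating these field names into the OJD template structure is a separate step not shown here.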
+ ## Error Handling ### Error Scenario 1: Deadline Cloud Submitter Not Available From 7799e51283c0b9ec4b02030a5c86b2376dcfcb65 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Tue, 17 Mar 2026 13:47:00 +0100 Subject: [PATCH 03/10] Amending Design document, validating --- .../deadline-cloud-integration/design.md | 102 ++++++++++-------- 1 file changed, 58 insertions(+), 44 deletions(-) diff --git a/.kiro/specs/deadline-cloud-integration/design.md b/.kiro/specs/deadline-cloud-integration/design.md index 81fa474..1da4987 100644 --- a/.kiro/specs/deadline-cloud-integration/design.md +++ b/.kiro/specs/deadline-cloud-integration/design.md @@ -14,7 +14,7 @@ The key architectural principle is **separation of responsibilities**: AYON owns - **Studio-configurable validations**: Pre-submission validation plugins that run before any resource-intensive operations. Includes both built-in technical validations (renderable camera exists, valid frame range) and studio-defined custom validations (required AOVs, render settings checks). - **Publishing and processing results**: Version registration in AYON, transcoding, reviewable creation, burnins, file movement/renaming via path templates. -- **Basic job dependencies**: Render → Post-render (publish) dependency chain. +- **Basic job dependencies**: Render → Post-render (publish) dependency chain within a single job using OJD step dependencies. ### Deferred (Post-MVP) @@ -45,9 +45,8 @@ graph TD subgraph "AWS Deadline Cloud" D --> H[Deadline Cloud Submitter Tool] H -->|OJD Job Bundle| I[Deadline Cloud API] - I --> J[Render Jobs] - J --> K[Post-Render Job] - I -.->|CreateMonitor| MON[Job Monitor API] + I --> J[Render Step] + J --> K[Post-Render Step] end subgraph "Post-Render Pipeline" @@ -68,6 +67,8 @@ graph TD L -.->|register versions| E ``` +> **Note on job structure**: Deadline Cloud models render and post-render as *steps within a single job*, not as separate jobs. 
Step dependencies (`dependencies: [dependsOn: RenderStep]`) ensure the post-render step only runs after the render step completes. This is the native Deadline Cloud pattern — the submitter creates a single OJD job template with multiple steps. The post-render step can access the render step's outputs via step-level job attachment syncing. + ## Sequence Diagrams ### Main Submission Flow @@ -97,13 +98,11 @@ sequenceDiagram Publisher->>Bridge: Submit via Deadline Cloud Bridge->>Bridge: Map AYON instances to Submitter params Bridge->>Submitter: Pre-populate settings & invoke submission - Submitter->>DC: CreateJob (render job) + Submitter->>DC: CreateJob (single job with render + post-render steps) DC-->>Submitter: job_id - Submitter->>DC: CreateJob (post-render job, depends on render) - DC-->>Submitter: post_job_id - Submitter-->>Bridge: Submission result (job IDs) + Submitter-->>Bridge: Submission result (job ID, step IDs) Bridge-->>Publisher: Submission complete - Publisher-->>Artist: Show success with job IDs + Publisher-->>Artist: Show success with job ID end ``` @@ -116,7 +115,7 @@ sequenceDiagram participant CLI as Deadline Cloud CLI participant AYON as AYON Server API - DC->>Script: Trigger post-render job (render complete) + DC->>Script: Trigger post-render step (render step complete) alt Job Attachments Mode Script->>CLI: deadline job download-output @@ -211,6 +210,14 @@ class HostRequirements(BaseSettingsModel): job settings. When set, these values are injected into the OJD template's hostRequirements section, replacing the submitter defaults. All fields are optional — only non-None values override the submitter. 
+ + OJD mapping: + - os_family → attributes: [{name: "attr.worker.os.family", anyOf: [value]}] + - cpu_arch → attributes: [{name: "attr.worker.cpu.arch", anyOf: [value]}] + - min/max_vcpu → amounts: [{name: "amount.worker.vcpu", min/max: value}] + - min/max_memory_mib → amounts: [{name: "amount.worker.memory", min/max: value}] + - min/max_gpu → amounts: [{name: "amount.worker.gpu", min/max: value}] + - min/max_gpu_memory_mib → amounts: [{name: "amount.worker.gpu.memory", min/max: value}] """ os_family: str | None = None # "linux", "windows", "macos" cpu_arch: str | None = None # "x86_64", "arm64" @@ -220,8 +227,8 @@ class HostRequirements(BaseSettingsModel): max_memory_mib: int | None = None # Maximum memory in MiB min_gpu: int | None = None # Minimum GPU count max_gpu: int | None = None # Maximum GPU count - min_gpu_memory_mib: int | None = None # Minimum GPU memory in MiB - max_gpu_memory_mib: int | None = None # Maximum GPU memory in MiB + min_gpu_memory_mib: int | None = None # Minimum GPU memory in MiB (per-GPU lower bound) + max_gpu_memory_mib: int | None = None # Maximum GPU memory in MiB (per-GPU lower bound) class CondaPackage(BaseSettingsModel): """A single conda package specification.""" @@ -412,6 +419,7 @@ The AYON integration overrides this behavior when `conda_config.packages` is con Key override rules: - If `conda_config.packages` is non-empty, AYON builds the `CondaPackages` parameter value from settings instead of using auto-detection +- `CondaPackages` and `CondaChannels` are queue environment parameters — the default conda queue environment adds these as job parameters at submission time. The submitter populates them based on the DCC application. AYON overrides these parameter values before submission. 
- The Maya version always comes from AYON server settings (not auto-detected from the artist's DCC) - For `maya-openjd`, the version depends on what's installed on the artist's machine when `version="auto"` is set — this allows the adaptor version to track the artist's local installation while still being explicitly controllable - If `conda_config.channels` is non-empty, those channels override the default conda channels @@ -419,6 +427,8 @@ Key override rules: > **Integration Consideration**: This override may conflict with the native submitter's auto-detection logic. The AYON integration explicitly takes precedence. Studios should be aware that enabling conda config in AYON settings will suppress the submitter's built-in version resolution. This is documented as a known integration point that requires coordination between AYON addon updates and Deadline Cloud submitter updates. +> **Conda Version Pinning**: AWS recommends pinning to major.minor versions only (e.g., `maya=2026`, not `maya=2026.1`), because patch releases replace previous packages on the `deadline-cloud` channel. Pinning to a specific patch version will cause submissions to fail when that patch is superseded. The AYON settings UI should guide studios toward this best practice. 
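The conda override rules above can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not the addon's code: the dataclasses are stand-ins for the settings models, the `"<name>_version"` lookup key and the trailing `.*` wildcard are inferred from Example 5, and raising `ValueError` for an unresolved `"auto"` version is a choice made only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CondaPackage:
    name: str
    version: str = ""  # "" = latest available, "auto" = taken from the artist's machine


@dataclass
class CondaConfig:
    packages: List[CondaPackage] = field(default_factory=list)
    channels: List[str] = field(default_factory=list)


def resolve_conda_packages(
    conda_config: CondaConfig,
    dcc_context: Optional[Dict[str, str]] = None,
) -> str:
    """Build the CondaPackages parameter value from AYON settings (sketch)."""
    dcc_context = dcc_context or {}
    specs = []
    for package in conda_config.packages:
        version = package.version
        if version == "auto":
            # Assumed convention: the locally detected version is passed in
            # under "<package name>_version", as in Example 5.
            key = f"{package.name}_version"
            if key not in dcc_context:
                raise ValueError(f"No detected version for {package.name!r}")
            version = dcc_context[key]
        # The wildcard keeps the pin above patch level; an empty version
        # means "latest available", so only the name is emitted.
        specs.append(f"{package.name}={version}.*" if version else package.name)
    return " ".join(specs)
```

An empty package list yields an empty string here, which a caller could treat as "do not override the submitter's auto-detection". Whether the real function signals that case with an empty string or with `None` is not specified in this document.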
+ #### Queue Resolution Queue selection follows a priority chain: @@ -444,9 +454,9 @@ Key override rules: **Responsibilities**: - Translate AYON render instances into Deadline Cloud Submitter parameters - Pre-populate the Submitter with AYON settings before submission -- Attach post-render job configuration to the submission -- Return job IDs and status back to the AYON publish pipeline -- Support job progress monitoring via the Deadline Cloud API (`CreateMonitor`) +- Attach post-render step configuration to the submission +- Return job ID and step IDs back to the AYON publish pipeline +- Support job progress monitoring via the Deadline Cloud API (`GetJob`, `SearchSteps`, `SearchTasks`) ### Component 4: Validation Plugins (`client/plugins/validate_*.py`) @@ -488,7 +498,7 @@ class ValidateRenderSettings: ### Component 5: Post-Render Script (`client/scripts/post_render.py`) -**Purpose**: Runs as a Deadline Cloud job after rendering completes. Handles output validation, transcoding, burnin application, and version registration in AYON. +**Purpose**: Runs as a Deadline Cloud step after rendering completes (dependent step within the same job). Handles output validation, transcoding, burnin application, and version registration in AYON. **Publishing Process**: Publishing is the process where a version is registered in AYON. 
Beyond registration, various operations run during publishing: @@ -591,7 +601,7 @@ class SubmissionParams: ```python @dataclass class PostRenderConfig: - """Configuration passed to the post-render job.""" + """Configuration passed to the post-render step.""" ayon_project: str ayon_folder_path: str ayon_task: str @@ -619,8 +629,9 @@ class PostRenderConfig: class SubmissionResult: """Result returned after submitting to Deadline Cloud.""" success: bool - render_job_id: str | None - post_render_job_id: str | None + job_id: str | None # Single job containing both render and post-render steps + render_step_id: str | None = None + post_render_step_id: str | None = None error_message: str | None = None submitted_instances: list[str] = field(default_factory=list) ``` @@ -746,7 +757,7 @@ def submit( """Submit render instances to Deadline Cloud via the Submitter tool. Maps each instance to submission params, pre-populates the Submitter, - and invokes submission. Creates both render and post-render jobs. + and invokes submission. Creates a single job with render and post-render steps. 
""" ``` @@ -757,9 +768,9 @@ def submit( - Deadline Cloud Submitter tool is available and authenticated **Postconditions:** -- If successful: `result.success is True`, `result.render_job_id` and `result.post_render_job_id` are valid job IDs +- If successful: `result.success is True`, `result.job_id` is a valid job ID, `result.render_step_id` and `result.post_render_step_id` are valid step IDs within that job - If failed: `result.success is False`, `result.error_message` describes the failure -- Post-render job has a dependency on the render job (runs only after render completes) +- Post-render step has a dependency on the render step (runs only after render completes successfully) - `result.submitted_instances` lists all instance names that were submitted - No partial submissions: either all instances submit or none do @@ -885,19 +896,19 @@ def execute_submission(publisher_context, settings): # Pre-populate submitter with AYON-derived settings pre_populate_submitter(submitter, params) - # Submit render job - render_job_id = submitter.submit_job(params) - - # Submit post-render job with dependency on render job + # Submit single job with render step + post-render step + # The OJD job template contains both steps, with the post-render + # step declaring a dependency on the render step: + # dependencies: + # - dependsOn: RenderStep post_config = build_post_render_config(instance, settings) - post_job_id = submitter.submit_post_job( - post_config, depends_on=render_job_id - ) + job_result = submitter.submit_job(params, post_config) results.append(SubmissionResult( success=True, - render_job_id=render_job_id, - post_render_job_id=post_job_id, + job_id=job_result.job_id, + render_step_id=job_result.render_step_id, + post_render_step_id=job_result.post_render_step_id, submitted_instances=[instance.instance_name], )) @@ -914,7 +925,9 @@ def execute_post_render(config: PostRenderConfig, settings: PostRenderSettings): INPUT: config (PostRenderConfig from submission), settings 
(PostRenderSettings) OUTPUT: success (bool) - Runs as a Deadline Cloud job after rendering completes. + Runs as a dependent step within the same Deadline Cloud job, after the + render step completes. The step dependency ensures this only executes + when all render tasks have succeeded. """ # Step 1: Download render outputs (mandatory for job attachments mode) @@ -1150,8 +1163,9 @@ result = bridge.submit( settings=addon_settings, ) if result.success: - print(f"Render job: {result.render_job_id}") - print(f"Post-render job: {result.post_render_job_id}") + print(f"Job: {result.job_id}") + print(f"Render step: {result.render_step_id}") + print(f"Post-render step: {result.post_render_step_id}") else: print(f"Submission failed: {result.error_message}") @@ -1209,7 +1223,7 @@ The following properties must hold for the integration to be correct: 3. **Validation Gate**: For all render instances `i`, if any validation plugin reports failure on `i`, then `i` is never submitted to Deadline Cloud. Formally: `∀i: validation_failed(i) ⟹ ¬submitted(i)`. -4. **Post-Render Ordering**: For all post-render jobs `p` with dependency on render job `r`, `p` executes only after `r` completes successfully. Formally: `∀(r, p): depends_on(p, r) ⟹ completed(r) before started(p)`. +4. **Post-Render Ordering**: For all post-render steps `p` with dependency on render step `r` within the same job, `p` executes only after `r` completes successfully. This is enforced by OJD step dependencies (`dependencies: [dependsOn: RenderStep]`). Formally: `∀(r, p): depends_on(p, r) ⟹ completed(r) before started(p)`. 5. **Frame Completeness**: For all post-render validations, the set of discovered files must be a superset of expected files for the validation to pass. Formally: `∀v: v.is_valid ⟹ expected_files ⊆ discovered_files`. 
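The frame completeness property reduces to a set comparison. The sketch below is illustrative only: `check_frame_completeness` is a hypothetical name, and comparing file names as plain strings is an assumption (a real check would also need to normalise paths and verify file integrity).

```python
from typing import Iterable, List, Tuple


def check_frame_completeness(
    expected_files: Iterable[str],
    discovered_files: Iterable[str],
) -> Tuple[bool, List[str]]:
    """Return (is_valid, missing), where is_valid means expected is a subset of discovered."""
    missing = sorted(set(expected_files) - set(discovered_files))
    # Extra discovered files do not affect validity; only missing ones do.
    return (not missing, missing)


expected = [f"beauty.{frame:04d}.exr" for frame in range(1001, 1006)]
discovered = [name for name in expected if "1003" not in name] + ["beauty.1006.exr"]
is_valid, missing = check_frame_completeness(expected, discovered)
print(is_valid, missing)
```

Returning the missing files alongside the boolean lets the post-render stage report the specific frames affected, as Error Scenario 3 requires.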
@@ -1241,15 +1255,15 @@ The following properties must hold for the integration to be correct: ### Error Scenario 3: Partial Render Failure (Missing Frames) -**Condition**: Render job completes but some frames are missing or corrupt. -**Response**: Post-render validation detects missing/corrupt files, reports the specific frames affected, and marks the Deadline Cloud job as failed. -**Recovery**: Artist can re-submit only the failed frames (if supported) or re-submit the entire job. The post-render job does not register a partial version. +**Condition**: Render step completes but some frames are missing or corrupt. +**Response**: Post-render step validation detects missing/corrupt files, reports the specific frames affected, and marks the step as failed. +**Recovery**: Artist can re-submit the job or re-queue failed tasks. The post-render step does not register a partial version. ### Error Scenario 4: AYON Server Unreachable During Post-Render **Condition**: Post-render script cannot reach the AYON server to register the version. -**Response**: Retry with exponential backoff (3 attempts, 5s/15s/45s delays). If all retries fail, mark the Deadline Cloud job as failed with the connection error. -**Recovery**: Once AYON server is back, the post-render job can be manually retried from the Deadline Cloud console. +**Response**: Retry with exponential backoff (3 attempts, 5s/15s/45s delays). If all retries fail, mark the step as failed with the connection error. +**Recovery**: Once AYON server is back, the failed step can be manually retried from the Deadline Cloud console. ### Error Scenario 5: Invalid Farm Profile Configuration @@ -1259,10 +1273,10 @@ The following properties must hold for the integration to be correct: ## Job Monitoring -Job progress can be monitored via the Deadline Cloud API using `CreateMonitor`. 
This enables: -- Real-time progress tracking of render jobs from within AYON -- Notification when jobs complete, fail, or require attention -- Integration with AYON's event system for automated status updates +Job progress can be monitored via: +- **Deadline Cloud Monitor**: A web-based UI created via the `CreateMonitor` API (requires IAM Identity Center setup). This is a management-level tool for viewing farms, queues, and fleets — not a per-job programmatic API. +- **Deadline Cloud API**: Programmatic job status tracking via `GetJob`, `SearchSteps`, `SearchTasks` API calls. This enables real-time progress tracking from within AYON. +- **Deadline Cloud CLI**: `deadline job get` and related commands for command-line monitoring. For MVP, monitoring is informational only — artists can check job status via the Deadline Cloud Monitor UI or the AYON Publisher. Deeper integration (automatic retries, AYON task status updates) is deferred to post-MVP. From 78ce3dca2eedec09d289d27a4e9b27c556498696 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Tue, 17 Mar 2026 13:52:16 +0100 Subject: [PATCH 04/10] Amending Design document, validating, adding doc urls --- .../deadline-cloud-integration/design.md | 27 +++++++++++-------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/.kiro/specs/deadline-cloud-integration/design.md b/.kiro/specs/deadline-cloud-integration/design.md index 1da4987..2a08bef 100644 --- a/.kiro/specs/deadline-cloud-integration/design.md +++ b/.kiro/specs/deadline-cloud-integration/design.md @@ -67,7 +67,7 @@ graph TD L -.->|register versions| E ``` -> **Note on job structure**: Deadline Cloud models render and post-render as *steps within a single job*, not as separate jobs. Step dependencies (`dependencies: [dependsOn: RenderStep]`) ensure the post-render step only runs after the render step completes. This is the native Deadline Cloud pattern — the submitter creates a single OJD job template with multiple steps. 
The post-render step can access the render step's outputs via step-level job attachment syncing. +> **Note on job structure**: Deadline Cloud models render and post-render as *steps within a single job*, not as separate jobs. Step dependencies (`dependencies: [dependsOn: RenderStep]`) ensure the post-render step only runs after the render step completes. This is the native Deadline Cloud pattern — the submitter creates a single OJD job template with multiple steps. The post-render step can access the render step's outputs via step-level job attachment syncing. See [Step dependencies](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/build-jobs-scheduling.html), [Using files from a step in a dependent step](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/using-files-output-from-a-step-in-a-dependent-step.html), and [OJD StepTemplate schema](https://github.com/OpenJobDescription/openjd-specifications/wiki/2023-09-Template-Schemas). ## Sequence Diagrams @@ -143,7 +143,7 @@ sequenceDiagram AWS Deadline Cloud provides two options for managing input and output data: -### Option 1: Job Attachments +### Option 1: [Job Attachments](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/build-job-attachments.html) Deadline Cloud transfers data to and from Cloud Workers using S3 buckets: - **Input sync**: Scene files and assets are uploaded to S3 and synced to workers when the job starts @@ -151,9 +151,9 @@ Deadline Cloud transfers data to and from Cloud Workers using S3 buckets: - **Linux VFS mount**: On Linux workers, job attachments can be mounted as a virtual filesystem for standard file access - **Output retrieval**: The Deadline CLI provides commands to download job outputs, which can be run manually or as a scheduled CRON job - **Automatic output downloads (TBD)**: Deadline Cloud supports automatic output downloads via `deadline queue sync-output` configured as a cron job or scheduled task. 
This requires additional setup: dedicated long-term IAM credentials (not Deadline Cloud Monitor credentials), a storage profile with all output paths configured, and a checkpoint directory for tracking download progress. See [AWS docs: Automatic downloads](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/auto-downloads.html). The exact integration approach (whether AYON manages this configuration or defers to studio-level setup) is TBD. -- **No direct S3 access**: Render output data is encrypted and cannot be accessed directly from S3 buckets. All output retrieval must go through the Deadline Cloud CLI output download mechanism (e.g., `deadline job download-output` or `deadline queue sync-output`). This is a hard constraint of the job attachments mode. +- **No direct S3 access**: Render output data is encrypted and cannot be accessed directly from S3 buckets. All output retrieval must go through the Deadline Cloud CLI output download mechanism (e.g., [`deadline job download-output`](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/getting-output-files-from-a-job.html) or [`deadline queue sync-output`](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/auto-downloads.html)). This is a hard constraint of the job attachments mode. 
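Because outputs in job attachments mode can only be retrieved through the CLI, an AYON-side service would end up shelling out to it. The sketch below only builds the argument list and does not run anything. `build_download_output_command` is a hypothetical helper, and the `--farm-id`, `--queue-id`, and `--job-id` options reflect the CLI as understood at the time of writing; confirm them with `deadline job download-output --help` before relying on this.

```python
from typing import List


def build_download_output_command(farm_id: str, queue_id: str, job_id: str) -> List[str]:
    """Assemble the argv for downloading one job's outputs (sketch, not executed here)."""
    for label, value in (("farm_id", farm_id), ("queue_id", queue_id), ("job_id", job_id)):
        if not value:
            raise ValueError(f"{label} must not be empty")
    return [
        "deadline", "job", "download-output",
        "--farm-id", farm_id,
        "--queue-id", queue_id,
        "--job-id", job_id,
    ]


# Placeholder IDs for illustration only.
command = build_download_output_command("farm-123", "queue-456", "job-789")
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it. Unattended use may need further options (for example to suppress confirmation prompts), which are deliberately left out of this sketch.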
-### Option 2: Shared Storage (Storage Profiles) +### Option 2: [Shared Storage (Storage Profiles)](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/storage-profile-shared-file.html) Uses storage profiles to remap paths between different filesystems and platforms: - **Path remapping**: Automatically translates paths between Windows, Linux, and macOS workers @@ -419,7 +419,7 @@ The AYON integration overrides this behavior when `conda_config.packages` is con Key override rules: - If `conda_config.packages` is non-empty, AYON builds the `CondaPackages` parameter value from settings instead of using auto-detection -- `CondaPackages` and `CondaChannels` are queue environment parameters — the default conda queue environment adds these as job parameters at submission time. The submitter populates them based on the DCC application. AYON overrides these parameter values before submission. +- `CondaPackages` and `CondaChannels` are [queue environment parameters](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/create-queue-environment.html) — the default conda queue environment adds these as job parameters at submission time. The submitter populates them based on the DCC application. AYON overrides these parameter values before submission. - The Maya version always comes from AYON server settings (not auto-detected from the artist's DCC) - For `maya-openjd`, the version depends on what's installed on the artist's machine when `version="auto"` is set — this allows the adaptor version to track the artist's local installation while still being explicitly controllable - If `conda_config.channels` is non-empty, those channels override the default conda channels @@ -427,7 +427,7 @@ Key override rules: > **Integration Consideration**: This override may conflict with the native submitter's auto-detection logic. The AYON integration explicitly takes precedence. 
Studios should be aware that enabling conda config in AYON settings will suppress the submitter's built-in version resolution. This is documented as a known integration point that requires coordination between AYON addon updates and Deadline Cloud submitter updates. -> **Conda Version Pinning**: AWS recommends pinning to major.minor versions only (e.g., `maya=2026`, not `maya=2026.1`), because patch releases replace previous packages on the `deadline-cloud` channel. Pinning to a specific patch version will cause submissions to fail when that patch is superseded. The AYON settings UI should guide studios toward this best practice. +> **Conda Version Pinning**: AWS recommends pinning to major.minor versions only (e.g., `maya=2026`, not `maya=2026.1`), because patch releases replace previous packages on the `deadline-cloud` channel. Pinning to a specific patch version will cause submissions to fail when that patch is superseded. The AYON settings UI should guide studios toward this best practice. See [Default conda queue environment](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/create-queue-environment.html) for the full list of available packages and pinning guidance. #### Queue Resolution @@ -441,12 +441,12 @@ This allows studios to define all available queues centrally, set a global defau #### Host Requirements Override Behavior -The native Deadline Cloud Submitter exposes host requirements in its job settings UI (OS family, vCPU, memory, GPU). The AYON integration allows studios to override these from server settings via `HostRequirements`. +The native Deadline Cloud Submitter exposes host requirements in its [Host requirements tab](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/jobs-using-submitter.html) (OS family, vCPU, memory, GPU). The AYON integration allows studios to override these from server settings via `HostRequirements`. 
Key override rules: - Only non-None fields in `HostRequirements` override the submitter's values — unset fields preserve the submitter's defaults or artist's manual selections - This is a partial override model: studios can pin OS family and GPU requirements while leaving CPU/memory to the submitter defaults -- Host requirements are injected into the OJD template's `hostRequirements` section before submission +- Host requirements are injected into the OJD template's [`hostRequirements` section](https://github.com/OpenJobDescription/openjd-specifications/wiki/2023-09-Template-Schemas) before submission. The OJD spec defines standard amount capabilities (`amount.worker.vcpu`, `amount.worker.memory`, `amount.worker.gpu`, `amount.worker.gpu.memory`) and attribute capabilities (`attr.worker.os.family`). See [Schedule jobs — Determine fleet compatibility](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/build-jobs-scheduling.html) for how host requirements interact with fleet capabilities. - If all fields are None (default), the submitter's host requirements are preserved entirely (backward compatible) > **Integration Consideration**: Host requirements interact with Deadline Cloud's fleet configuration. Studios should ensure that the configured requirements match available fleet capacity — e.g., requesting GPU workers when no GPU fleet is provisioned will cause jobs to remain queued indefinitely. @@ -498,7 +498,12 @@ class ValidateRenderSettings: ### Component 5: Post-Render Script (`client/scripts/post_render.py`) -**Purpose**: Runs as a Deadline Cloud step after rendering completes (dependent step within the same job). Handles output validation, transcoding, burnin application, and version registration in AYON. 
+> **TBD — Execution Model**: It is to be discussed whether the post-render processing should run as a dependent step within the Deadline Cloud job (on a farm worker) or be executed locally within the AYON pipeline (on the artist's workstation or a dedicated processing machine). Key trade-offs: +> - **Farm step**: Runs close to rendered data (especially with job attachments), scales with farm capacity, but requires AYON server access from farm workers and complicates credential management. +> - **Local AYON pipeline**: Keeps all AYON logic local, simpler credential handling, but requires downloading all rendered outputs first and doesn't leverage farm compute for transcoding/burnins. +> This decision affects the architecture of the post-render pipeline and how outputs are accessed. The current design documents both paths. + +**Purpose**: Handles output validation, transcoding, burnin application, and version registration in AYON after rendering completes. May run as a Deadline Cloud dependent step (on farm) or locally within the AYON pipeline (see TBD above). **Publishing Process**: Publishing is the process where a version is registered in AYON. Beyond registration, various operations run during publishing: @@ -1274,8 +1279,8 @@ The following properties must hold for the integration to be correct: ## Job Monitoring Job progress can be monitored via: -- **Deadline Cloud Monitor**: A web-based UI created via the `CreateMonitor` API (requires IAM Identity Center setup). This is a management-level tool for viewing farms, queues, and fleets — not a per-job programmatic API. -- **Deadline Cloud API**: Programmatic job status tracking via `GetJob`, `SearchSteps`, `SearchTasks` API calls. This enables real-time progress tracking from within AYON. +- **Deadline Cloud Monitor**: A web-based UI created via the [`CreateMonitor` API](https://docs.aws.amazon.com/deadline-cloud/latest/APIReference/API_CreateMonitor.html) (requires IAM Identity Center setup). 
This is a management-level tool for viewing farms, queues, and fleets — not a per-job programmatic API. +- **Deadline Cloud API**: Programmatic job status tracking via [`GetJob`](https://docs.aws.amazon.com/deadline-cloud/latest/APIReference/API_GetJob.html), [`SearchSteps`](https://docs.aws.amazon.com/deadline-cloud/latest/APIReference/API_SearchSteps.html), [`SearchTasks`](https://docs.aws.amazon.com/deadline-cloud/latest/APIReference/API_SearchTasks.html) API calls. This enables real-time progress tracking from within AYON. - **Deadline Cloud CLI**: `deadline job get` and related commands for command-line monitoring. For MVP, monitoring is informational only — artists can check job status via the Deadline Cloud Monitor UI or the AYON Publisher. Deeper integration (automatic retries, AYON task status updates) is deferred to post-MVP. From b52b4292398954de4cb8b9a202f1797a43e10d8a Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Fri, 10 Apr 2026 11:15:18 +0200 Subject: [PATCH 05/10] Add personal Kiro settings to .gitignore --- .gitignore | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/.gitignore b/.gitignore index 8fa135f..33302f2 100644 --- a/.gitignore +++ b/.gitignore @@ -165,3 +165,7 @@ Temporary Items # poetry and uv locks poetry.lock uv.lock + +# Kiro personal settings +.kiro/settings/ +.kiro/steering/ From deed0b44b319c0e67ed2d677ee11b51c868ec910 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Mon, 13 Apr 2026 11:37:52 +0200 Subject: [PATCH 06/10] Resolve post-render execution model: MVP on-prem, target state farm-side via OpenJD - Replace TBD in Component 5 with resolved decision and trade-off analysis - Update architecture diagram to show MVP on-prem post-render flow - Update sequence diagrams for render-only job submission - Add fileshare access analysis by fleet type (CMF, SMF via VPC Lattice) - Add post-MVP target state: OpenJD templates for portable post-render - Simplify SubmissionResult for render-only MVP - Move farm-side post-render 
to Deferred scope --- .../deadline-cloud-integration/design.md | 84 +++++++++++-------- 1 file changed, 51 insertions(+), 33 deletions(-) diff --git a/.kiro/specs/deadline-cloud-integration/design.md b/.kiro/specs/deadline-cloud-integration/design.md index 2a08bef..b280503 100644 --- a/.kiro/specs/deadline-cloud-integration/design.md +++ b/.kiro/specs/deadline-cloud-integration/design.md @@ -14,10 +14,11 @@ The key architectural principle is **separation of responsibilities**: AYON owns - **Studio-configurable validations**: Pre-submission validation plugins that run before any resource-intensive operations. Includes both built-in technical validations (renderable camera exists, valid frame range) and studio-defined custom validations (required AOVs, render settings checks). - **Publishing and processing results**: Version registration in AYON, transcoding, reviewable creation, burnins, file movement/renaming via path templates. -- **Basic job dependencies**: Render → Post-render (publish) dependency chain within a single job using OJD step dependencies. +- **Post-render publishing on-prem**: Rendered outputs are downloaded to the studio (via auto-download cron or AYON service), and the publish pipeline runs locally on a dedicated machine with filesystem access. ### Deferred (Post-MVP) +- **Farm-side post-render**: Running the publish pipeline on Deadline Cloud workers as OpenJD template steps. Requires a lightweight headless publish runner (without full AYON launcher/Qt dependencies) and fileshare access from workers. See [Post-Render Execution Model](#post-render-execution-model) for the full trade-off analysis. - **Advanced project tracking**: Higher-level job dependencies beyond render→publish, priority management across assets, planning integration, task status updates. - **Asset-level dependencies**: Complex dependency graphs like "create render archives → render images → publish". 
- **Cross-job orchestration**: Managing priorities and dependencies across multiple submissions. @@ -45,15 +46,14 @@ graph TD subgraph "AWS Deadline Cloud" D --> H[Deadline Cloud Submitter Tool] H -->|OJD Job Bundle| I[Deadline Cloud API] - I --> J[Render Step] - J --> K[Post-Render Step] + I --> J[Render Job] end - subgraph "Post-Render Pipeline" - K --> L[AYON Post-Render Script] - L --> M[Output Validation] - M --> N[Transcoding / Burnins] - N --> O[Version Registration in AYON] + subgraph "On-Prem Post-Render Pipeline (MVP)" + P1[Auto-Download / AYON Service] --> P2[Output Discovery & Validation] + P2 --> P3[File Movement / Renaming] + P3 --> P4[Transcoding / Burnins] + P4 --> P5[Version Registration in AYON] end A -.->|fetch settings| E @@ -62,12 +62,14 @@ graph TD J -->|read inputs| S2 J -->|write outputs| S1 J -->|write outputs| S2 - L -->|access outputs| S1 - L -->|access outputs| S2 - L -.->|register versions| E + S1 -->|download outputs| P1 + S2 -->|access outputs| P1 + P5 -.->|register versions| E ``` -> **Note on job structure**: Deadline Cloud models render and post-render as *steps within a single job*, not as separate jobs. Step dependencies (`dependencies: [dependsOn: RenderStep]`) ensure the post-render step only runs after the render step completes. This is the native Deadline Cloud pattern — the submitter creates a single OJD job template with multiple steps. The post-render step can access the render step's outputs via step-level job attachment syncing. See [Step dependencies](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/build-jobs-scheduling.html), [Using files from a step in a dependent step](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/using-files-output-from-a-step-in-a-dependent-step.html), and [OJD StepTemplate schema](https://github.com/OpenJobDescription/openjd-specifications/wiki/2023-09-Template-Schemas). +> **Note on MVP job structure**: For MVP, Deadline Cloud handles rendering only. 
The job submitted is a render-only job — there is no post-render step within the Deadline Cloud job. Post-render publishing runs on-prem, triggered by output availability (via auto-download cron or AYON service). This keeps the Deadline Cloud integration focused on rendering while AYON handles the full publish pipeline locally. +> +> **Target state (post-MVP)**: Post-render steps defined as [OpenJD templates](https://github.com/OpenJobDescription/openjd-specifications/wiki/2023-09-Template-Schemas) that can run either as dependent steps within the Deadline Cloud job (on farm workers) or on-prem via the [OpenJD runtime](https://github.com/OpenJobDescription/openjd-sessions). This gives customers the choice based on their infrastructure (cloud-native vs on-prem vs hybrid). See [Post-Render Execution Model](#post-render-execution-model) for details. ## Sequence Diagrams @@ -98,7 +100,7 @@ sequenceDiagram Publisher->>Bridge: Submit via Deadline Cloud Bridge->>Bridge: Map AYON instances to Submitter params Bridge->>Submitter: Pre-populate settings & invoke submission - Submitter->>DC: CreateJob (single job with render + post-render steps) + Submitter->>DC: CreateJob (render-only job) DC-->>Submitter: job_id Submitter-->>Bridge: Submission result (job ID, step IDs) Bridge-->>Publisher: Submission complete @@ -106,36 +108,31 @@ sequenceDiagram end ``` -### Post-Render Publishing Flow +### Post-Render Publishing Flow (MVP — On-Prem) ```mermaid sequenceDiagram participant DC as Deadline Cloud - participant Script as Post-Render Script - participant CLI as Deadline Cloud CLI + participant DL as Auto-Download / AYON Service + participant Script as Post-Render Script (On-Prem) participant AYON as AYON Server API - DC->>Script: Trigger post-render step (render step complete) - - alt Job Attachments Mode - Script->>CLI: deadline job download-output - CLI-->>Script: Downloaded output files to local path - else Shared Storage Mode - Script->>Script: Access outputs directly via 
remapped paths - end + DC->>DC: Render job completes + DL->>DC: Download outputs (deadline queue sync-output / manual) + DC-->>DL: Rendered output files + DL->>Script: Trigger publish pipeline (outputs available locally) Script->>Script: Discover rendered output files Script->>Script: Validate outputs (frame completeness, file integrity) alt Validation Failed - Script->>DC: Report failure + Script->>Script: Report failure else Validation Passed Script->>Script: Move/rename files via path templates Script->>Script: Run transcoding (if configured) Script->>Script: Apply burnins (if configured) Script->>AYON: Register version (files, metadata) AYON-->>Script: Version registered - Script->>DC: Report success end ``` @@ -498,10 +495,33 @@ class ValidateRenderSettings: ### Component 5: Post-Render Script (`client/scripts/post_render.py`) -> **TBD — Execution Model**: It is to be discussed whether the post-render processing should run as a dependent step within the Deadline Cloud job (on a farm worker) or be executed locally within the AYON pipeline (on the artist's workstation or a dedicated processing machine). Key trade-offs: -> - **Farm step**: Runs close to rendered data (especially with job attachments), scales with farm capacity, but requires AYON server access from farm workers and complicates credential management. -> - **Local AYON pipeline**: Keeps all AYON logic local, simpler credential handling, but requires downloading all rendered outputs first and doesn't leverage farm compute for transcoding/burnins. -> This decision affects the architecture of the post-render pipeline and how outputs are accessed. The current design documents both paths. +#### Post-Render Execution Model + +**MVP: On-prem post-render.** Rendered outputs are downloaded to the studio via auto-download (`deadline queue sync-output` as a cron job) or an AYON service. The publish pipeline runs locally on a dedicated machine with access to the studio filesystem. 
This is the simplest path — it avoids cloud packaging, worker-to-AYON connectivity, and fileshare access complexity. + +**Target state (post-MVP): Farm-side post-render via OpenJD templates.** Post-render steps are defined as OpenJD templates that can run as dependent steps on Deadline Cloud workers or on-prem via the OpenJD runtime (`openjd-sessions`). This gives customers the choice based on their infrastructure. The same templates work in both environments — the "where does it run?" question becomes a deployment choice per customer, not a development decision. + +**Trade-offs:** + +| | On-prem (MVP) | On-farm (post-MVP) | +|---|---|---| +| Compute | Dedicated studio machine | Farm workers (scales with farm) | +| Fileshare access | Direct (local filesystem) | Depends on fleet type (see below) | +| AYON dependencies | Already available locally | Must be packaged (conda, host config, AMI) | +| Latency | Download first, then process | Processing starts after render | +| Setup complexity | AYON service + auto-download | OpenJD templates + dependency packaging | + +**Fileshare access by fleet type (relevant for farm-side post-render):** +- **CMF on-prem**: Workers already on studio network, direct access +- **CMF on EC2**: Workers in customer VPC, reach on-prem via VPN/Direct Connect +- **SMF**: Workers can access customer VPC resources (NFS, fileshares) via [VPC Lattice resource endpoints](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/smf-vpc.html) +- **Cloud-native storage**: A synchronized fileshare available in the cloud (e.g., FSx) works directly + +**Prerequisites for farm-side post-render:** +- OpenJD templates defining the post-render steps +- A lightweight headless publish runner (subset of ayon-core, without full launcher/Qt dependencies) +- AYON dependencies available on workers (via conda packages, host configuration scripts, or pre-baked AMIs) +- Network connectivity from workers to the AYON server API **Purpose**: Handles output 
validation, transcoding, burnin application, and version registration in AYON after rendering completes. May run as a Deadline Cloud dependent step (on farm) or locally within the AYON pipeline (see TBD above). @@ -634,9 +654,7 @@ class PostRenderConfig: class SubmissionResult: """Result returned after submitting to Deadline Cloud.""" success: bool - job_id: str | None # Single job containing both render and post-render steps - render_step_id: str | None = None - post_render_step_id: str | None = None + job_id: str | None # Render job ID error_message: str | None = None submitted_instances: list[str] = field(default_factory=list) ``` From 7349582b088e96416fba5c56e6b88ed5add9e967 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Mon, 20 Apr 2026 19:18:11 +0200 Subject: [PATCH 07/10] design: per-DCC and per-project conda package configuration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Replace flat CondaConfig with per-DCC DCCCondaConfig model - Add project-level conda overrides to ProjectDeadlineCloudSettings - Support custom conda packages with explicit S3 channel configuration - Add Publisher UI visibility for editable conda packages on render instances - Update resolve_conda_packages algorithm for project → server → auto-detection chain - Add ValidateCondaPackages validation plugin and correctness property - Update examples, tests, and formal specs to reflect new model --- .../deadline-cloud-integration/design.md | 232 ++++++++++++++---- 1 file changed, 184 insertions(+), 48 deletions(-) diff --git a/.kiro/specs/deadline-cloud-integration/design.md b/.kiro/specs/deadline-cloud-integration/design.md index b280503..6921c86 100644 --- a/.kiro/specs/deadline-cloud-integration/design.md +++ b/.kiro/specs/deadline-cloud-integration/design.md @@ -232,15 +232,36 @@ class CondaPackage(BaseSettingsModel): name: str # e.g., "maya", "maya-openjd", "maya-vray" version: str # Explicit version spec (e.g., "2026.*"), or "auto" 
to use installed version from artist machine, or "" for latest -class CondaConfig(BaseSettingsModel): - """Conda package and channel configuration for farm workers. +class DCCCondaConfig(BaseSettingsModel): + """Per-DCC conda package and channel configuration for farm workers. + + Each DCC has its own combination of conda packages (DCC app, OpenJD + adaptor, renderer, etc.). This model defines the packages for a + single DCC application. Overrides the default auto-detection behavior of the Deadline Cloud DCC submitters (e.g., deadline-cloud-for-maya), allowing studios to pin specific package versions from AYON server settings. + + Override model (follows AYON's standard settings override pattern): + - project settings → server settings → auto-detection + - Auto-detection is the default out of the box. If no settings are + configured at any level, the native submitter behavior is preserved. + - TDs only need to configure when they want explicit control. + + Channel behavior: + - By default the `deadline-cloud` channel is used implicitly by the + native submitter. + - As soon as custom_packages are added, channels must be defined + explicitly — either both `deadline-cloud` and the custom S3 channel + (e.g., `s3://my-studio-conda-123456789-us-west-2/Conda/Default`) + if standard AWS packages are still needed, or only the S3 channel + if everything including the adaptor is packaged custom. """ - packages: list[CondaPackage] = [] # e.g., [{"name": "maya", "version": "2026.*"}, {"name": "maya-openjd", "version": "auto"}, {"name": "maya-vray", "version": ""}] - channels: list[str] = [] # Custom conda channels (overrides default channels if non-empty) + dcc_name: str # "maya", "houdini", etc. 
+ packages: list[CondaPackage] = [] # Standard DCC packages, e.g., [{"name": "maya", "version": "2026.*"}, {"name": "maya-openjd", "version": "auto"}, {"name": "maya-vray", "version": ""}] + custom_packages: list[CondaPackage] = [] # Studio-specific custom conda packages (proprietary tools, plugins, internal libraries) + channels: list[str] = [] # Conda channels. Empty = use default `deadline-cloud` channel implicitly. Must be set explicitly when custom_packages are used. class QueueConfig(BaseSettingsModel): """A named Deadline Cloud queue.""" @@ -255,7 +276,7 @@ class DeadlineCloudSettings(BaseSettingsModel): default_profile: str = "" available_queues: list[QueueConfig] = [] # All available queues defined at server level default_queue_id: str = "" # Server-level default queue - conda_config: CondaConfig = CondaConfig() # Conda package/channel overrides + dcc_conda_configs: list[DCCCondaConfig] = [] # Per-DCC conda package/channel configuration host_requirements: HostRequirements = HostRequirements() # Worker host hardware/OS overrides dcc_defaults: list[DCCSubmissionDefaults] = [] post_render: PostRenderSettings = PostRenderSettings() @@ -263,8 +284,14 @@ class DeadlineCloudSettings(BaseSettingsModel): auto_detect_credentials: bool = True class ProjectDeadlineCloudSettings(BaseSettingsModel): - """Per-project overrides for Deadline Cloud settings.""" + """Per-project overrides for Deadline Cloud settings. + + Follows AYON's standard settings override model: studio defaults + apply everywhere, projects only override when needed. Empty/default + values inherit from server settings. 
+ """ default_queue_id: str = "" # Project-level override; empty = use server default + dcc_conda_configs: list[DCCCondaConfig] = [] # Project-level per-DCC conda overrides; empty = use server defaults class CustomValidation(BaseSettingsModel): """Studio-configurable validation rule.""" @@ -345,7 +372,7 @@ class SubmitterBridge: `CondaPackages` and `RezPackages` shared parameter values that are passed to `SubmitJobToDeadlineDialog`. - If settings.conda_config.packages is non-empty, those packages + If a DCCCondaConfig exists for the active DCC, those packages replace the auto-detected values (e.g., the default `conda_packages = f"maya={maya_version}.* maya-openjd={adaptor_version}.*"` from deadline-cloud-for-maya). @@ -357,13 +384,18 @@ class SubmitterBridge: def _resolve_conda_packages( self, - conda_config: CondaConfig, + dcc_name: str, + server_settings: DeadlineCloudSettings, + project_settings: ProjectDeadlineCloudSettings | None, dcc_context: dict[str, Any], ) -> str: - """Resolve conda package string from AYON settings. + """Resolve conda package string from AYON settings for a specific DCC. + + Looks up the DCCCondaConfig for the active DCC, following the + override chain: project → server → auto-detection. Builds the conda package specification string by combining - AYON-configured packages with version resolution: + standard packages and custom packages with version resolution: - Explicit versions are used as-is (e.g., "maya=2026.*") - "auto" versions are resolved from the artist's installed DCC - Empty versions use latest (e.g., "maya-vray") @@ -412,17 +444,36 @@ The native Deadline Cloud DCC submitters (e.g., `deadline-cloud-for-maya`) auto- conda_packages = f"maya={maya_version}.* maya-openjd={adaptor_version}.*" ``` -The AYON integration overrides this behavior when `conda_config.packages` is configured in server settings. AYON's settings take precedence over the auto-detected values. 
This is an intentional design decision — studios need version pinning control for reproducibility and stability on the farm. +The AYON integration overrides this behavior when a `DCCCondaConfig` is configured for the active DCC in server or project settings. AYON's settings take precedence over the auto-detected values. This is an intentional design decision — studios need version pinning control for reproducibility and stability on the farm. + +Conda packages are defined **per DCC**. Each DCC has its own combination of packages — Maya needs `maya`, `maya-openjd`, and a renderer package like `maya-vray`; Houdini would need `houdini`, `houdini-openjd`, etc. The settings structure reflects this so TDs configure packages for each DCC independently. + +**Override resolution chain** (follows AYON's standard [settings override model](https://help.ayon.app/help/articles/8317800-working-with-settings)): + +**project `dcc_conda_configs` → server `dcc_conda_configs` → auto-detection** + +Auto-detection is the default out of the box. If no settings are configured at any level, the native submitter behavior is preserved. Studios can start using the integration without configuring any conda settings. TDs only step in to pin versions when they need explicit control. A TD sets `maya-vray=*` (latest) at the studio level once. If a specific project needs to stay on VRay 6.x, they override just that project. No need to configure every project individually. 
Key override rules: -- If `conda_config.packages` is non-empty, AYON builds the `CondaPackages` parameter value from settings instead of using auto-detection +- If a `DCCCondaConfig` exists for the active DCC (at project or server level), AYON builds the `CondaPackages` parameter value from settings instead of using auto-detection +- Project-level `dcc_conda_configs` take precedence over server-level for the same DCC - `CondaPackages` and `CondaChannels` are [queue environment parameters](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/create-queue-environment.html) — the default conda queue environment adds these as job parameters at submission time. The submitter populates them based on the DCC application. AYON overrides these parameter values before submission. -- The Maya version always comes from AYON server settings (not auto-detected from the artist's DCC) -- For `maya-openjd`, the version depends on what's installed on the artist's machine when `version="auto"` is set — this allows the adaptor version to track the artist's local installation while still being explicitly controllable -- If `conda_config.channels` is non-empty, those channels override the default conda channels -- If `conda_config` is empty/default, the native submitter's auto-detection behavior is preserved (backward compatible) +- For packages with `version="auto"`, the version depends on what's installed on the artist's machine — this allows the adaptor version to track the artist's local installation while still being explicitly controllable +- If no `DCCCondaConfig` exists for the active DCC at any level, the native submitter's auto-detection behavior is preserved (backward compatible) + +**Custom conda packages and channels:** -> **Integration Consideration**: This override may conflict with the native submitter's auto-detection logic. The AYON integration explicitly takes precedence. 
Studios should be aware that enabling conda config in AYON settings will suppress the submitter's built-in version resolution. This is documented as a known integration point that requires coordination between AYON addon updates and Deadline Cloud submitter updates. +The `custom_packages` field on `DCCCondaConfig` allows TDs to add studio-specific conda packages beyond the standard DCC/renderer/adaptor set (proprietary tools, custom plugins, internal libraries). + +When custom packages are used, channels must be defined explicitly. By default the `deadline-cloud` channel is used implicitly by the native submitter. As soon as custom packages are added, the `channels` field must be set — either both `deadline-cloud` and the custom S3 channel (e.g., `s3://my-studio-conda-123456789-us-west-2/Conda/Default`) if standard AWS packages are still needed, or only the S3 channel if everything including the adaptor is packaged custom. + +**Publisher UI visibility:** + +The resolved conda packages (from the override chain) are displayed as editable fields on the render instance in the AYON Publisher UI. This lets artists adjust packages before submitting (e.g., testing a different renderer version) without needing TD access to server settings. Whatever they set still goes through the publish validation step before submission, so invalid or unsupported packages are caught before reaching the farm. Central conda config acts as a form of human validation — TDs explicitly define what runs on the farm rather than relying on auto-detection. + +> **Current implementation note**: PR #5 implements a simpler version of this — an "Extra Conda Packages" text field that appends to the resolved packages. The full design calls for displaying and editing the complete resolved package list, not just appending extras. The current auto-detection logic in `environment.py` is already per-host (`_get_conda_pkgs_for_maya`), which aligns with the per-DCC `DCCCondaConfig` model. 
The implementation needs to evolve from flat string settings to the structured per-DCC model with project-level overrides. + +> **Integration Consideration**: Enabling per-DCC conda config in AYON settings will suppress the native submitter's built-in version resolution for that DCC. Studios should be aware that this requires coordination between AYON addon updates and Deadline Cloud submitter updates. > **Conda Version Pinning**: AWS recommends pinning to major.minor versions only (e.g., `maya=2026`, not `maya=2026.1`), because patch releases replace previous packages on the `deadline-cloud` channel. Pinning to a specific patch version will cause submissions to fail when that patch is superseded. The AYON settings UI should guide studios toward this best practice. See [Default conda queue environment](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/create-queue-environment.html) for the full list of available packages and pinning guidance. @@ -484,6 +535,15 @@ class ValidateRenderElements: class ValidateRenderSettings: """Studio-configurable: Enforce specific render settings (resolution, sampling, etc.).""" def process(self, instance: CollectedRenderInstance) -> None: ... + +class ValidateCondaPackages: + """Validate resolved conda packages before submission. + + Checks that the conda package string is well-formed and that + channels are defined when custom packages are present. This + validation also runs when artists edit packages in the Publisher UI. + """ + def process(self, instance: CollectedRenderInstance) -> None: ... 
``` **Responsibilities**: @@ -762,8 +822,8 @@ def map_instance_to_params( - `result.queue_id` follows the resolution priority: instance override > project default > server default > first available - `result.frame_range` is formatted as DC-compatible string (e.g., "1-100") - `result.post_render_config` contains all data needed for post-render publishing -- If `settings.conda_config.packages` is non-empty, `result.conda_packages` is the resolved conda string from AYON settings (not auto-detected) -- If `settings.conda_config.packages` is empty, `result.conda_packages` is None (native auto-detection preserved) +- If a `DCCCondaConfig` exists for the active DCC (at project or server level), `result.conda_packages` is the resolved conda string from AYON settings (not auto-detected) +- If no `DCCCondaConfig` exists for the active DCC at any level, `result.conda_packages` is None (native auto-detection preserved) - If any field in `settings.host_requirements` is non-None, `result.host_requirements` contains only those fields; otherwise `result.host_requirements` is None (submitter defaults preserved) - No side effects on `instance` or `settings` @@ -876,13 +936,18 @@ def execute_submission(publisher_context, settings): assert profile is not None, "No valid farm profile found" # Step 1b: Resolve conda packages from AYON settings (overrides auto-detection) - conda_packages = resolve_conda_packages( - settings.conda_config, + # Resolved per-DCC: looks up DCCCondaConfig for the active DCC, + # project settings override server settings, auto-detection is fallback. 
+ dcc_name = get_active_dcc_name() # e.g., "maya", "houdini" + project_settings = get_project_settings(publisher_context.project_name) + conda_packages, conda_channels = resolve_conda_packages( + dcc_name=dcc_name, + server_settings=settings, + project_settings=project_settings, dcc_context=get_dcc_context(), # Artist's installed DCC/adaptor versions ) # Step 1c: Resolve queue ID (instance override > project > server > first available) - project_settings = get_project_settings(publisher_context.project_name) queue_id = resolve_queue_id(settings, project_settings) # Step 1d: Resolve host requirements from AYON settings @@ -905,7 +970,7 @@ def execute_submission(publisher_context, settings): # Apply conda package override if configured if conda_packages: params.conda_packages = conda_packages - params.conda_channels = settings.conda_config.channels or None + params.conda_channels = conda_channels or None # Apply host requirements override if configured if host_requirements: params.host_requirements = host_requirements @@ -1081,24 +1146,41 @@ def resolve_queue_id( ```python def resolve_conda_packages( - conda_config: CondaConfig, + dcc_name: str, + server_settings: DeadlineCloudSettings, + project_settings: ProjectDeadlineCloudSettings | None, dcc_context: dict[str, Any], -) -> str: +) -> tuple[str, list[str]]: """ - ALGORITHM: Resolve conda package string from AYON settings. - INPUT: conda_config (from server settings), dcc_context (artist's DCC environment info) - OUTPUT: conda_packages_str (space-separated package spec string) + ALGORITHM: Resolve conda package string and channels from AYON settings for a specific DCC. + INPUT: dcc_name (active DCC), server_settings, project_settings (per-project overrides), dcc_context (artist's DCC environment info) + OUTPUT: (conda_packages_str, channels) — space-separated package spec string and list of channels - If conda_config.packages is empty, returns empty string (native submitter - auto-detection is preserved). 
Otherwise, builds the package string from - AYON settings, overriding the submitter's auto-detected values. + Override chain: project dcc_conda_configs → server dcc_conda_configs → auto-detection + If no DCCCondaConfig exists for the active DCC at any level, returns empty + (native submitter auto-detection is preserved). """ - if not conda_config.packages: - return "" # No override — let native submitter auto-detect + # Step 1: Find DCCCondaConfig for this DCC, project level first + dcc_config = None + if project_settings: + dcc_config = next( + (c for c in project_settings.dcc_conda_configs if c.dcc_name == dcc_name), + None, + ) + if dcc_config is None: + dcc_config = next( + (c for c in server_settings.dcc_conda_configs if c.dcc_name == dcc_name), + None, + ) + + if dcc_config is None or (not dcc_config.packages and not dcc_config.custom_packages): + return ("", []) # No override — let native submitter auto-detect + # Step 2: Build package string from standard + custom packages + all_packages = list(dcc_config.packages) + list(dcc_config.custom_packages) parts = [] - for pkg in conda_config.packages: + for pkg in all_packages: if pkg.version == "auto": # Resolve version from artist's installed DCC/adaptor installed_version = dcc_context.get(f"{pkg.name}_version", "") @@ -1111,7 +1193,12 @@ def resolve_conda_packages( else: parts.append(pkg.name) # Empty version = latest - return " ".join(parts) + # Step 3: Resolve channels + # Default `deadline-cloud` channel is implicit when no custom packages exist. + # When custom packages are present, channels must be explicit. 
+ channels = dcc_config.channels if dcc_config.channels else [] + + return (" ".join(parts), channels) ``` ## Example Usage @@ -1151,14 +1238,16 @@ settings = DeadlineCloudSettings( ), ], default_queue_id="queue-xyz789", - conda_config=CondaConfig( - packages=[ - CondaPackage(name="maya", version="2026.*"), - CondaPackage(name="maya-openjd", version="auto"), # Use artist's installed version - CondaPackage(name="maya-vray", version=""), # Latest available - ], - channels=["my-studio-conda-channel"], - ), + dcc_conda_configs=[ + DCCCondaConfig( + dcc_name="maya", + packages=[ + CondaPackage(name="maya", version="2026.*"), + CondaPackage(name="maya-openjd", version="auto"), # Use artist's installed version + CondaPackage(name="maya-vray", version=""), # Latest available + ], + ), + ], dcc_defaults=[ DCCSubmissionDefaults( dcc_name="maya", @@ -1215,12 +1304,55 @@ queue_id = resolve_queue_id( ) # Returns "queue-previs" (project override takes precedence over server default) -# Example 5: Conda package resolution -conda_str = resolve_conda_packages( - conda_config=settings.conda_config, +# Example 5: Per-DCC conda package resolution +conda_str, channels = resolve_conda_packages( + dcc_name="maya", + server_settings=server_settings, + project_settings=None, # No project override — uses server defaults + dcc_context={"maya-openjd_version": "0.15"}, +) +# Returns: ("maya=2026.* maya-openjd=0.15.* maya-vray", []) + +# Example 5b: Per-project conda override (project A pins VRay 6.x for Maya 2024) +project_a_settings = ProjectDeadlineCloudSettings( + default_queue_id="queue-previs", + dcc_conda_configs=[ + DCCCondaConfig( + dcc_name="maya", + packages=[ + CondaPackage(name="maya", version="2024.*"), + CondaPackage(name="maya-openjd", version="auto"), + CondaPackage(name="maya-vray", version="6.*"), + ], + ), + ], +) +conda_str, channels = resolve_conda_packages( + dcc_name="maya", + server_settings=server_settings, + project_settings=project_a_settings, 
dcc_context={"maya-openjd_version": "0.15"}, ) -# Returns: "maya=2026.* maya-openjd=0.15.* maya-vray" +# Returns: ("maya=2024.* maya-openjd=0.15.* maya-vray=6.*", []) +# Project override takes precedence over server default (maya=2026.*) + +# Example 5c: Custom conda packages with custom S3 channel +custom_config = DCCCondaConfig( + dcc_name="maya", + packages=[ + CondaPackage(name="maya", version="2026.*"), + CondaPackage(name="maya-openjd", version="auto"), + ], + custom_packages=[ + CondaPackage(name="my-studio-maya-tools", version="1.2.*"), + ], + channels=[ + "deadline-cloud", # Still need standard packages (maya, maya-openjd) + "s3://my-studio-conda-123456789-us-west-2/Conda/Default", + ], +) +# Both channels required: deadline-cloud for standard packages, +# S3 channel for custom my-studio-maya-tools package # Example 6: Host requirements override (GPU renders need GPU workers) settings_with_gpu = DeadlineCloudSettings( @@ -1254,7 +1386,7 @@ The following properties must hold for the integration to be correct: 7. **Profile Resolution Determinism**: For the same settings and override inputs, `resolve_farm_profile` always returns the same profile. The resolution order is deterministic: explicit override > default > first. -8. **Conda Override Precedence**: When `conda_config.packages` is non-empty in AYON settings, the resolved `CondaPackages` parameter value must match the AYON-configured packages, not the native submitter's auto-detected values. Formally: `∀s: s.conda_config.packages ≠ [] ⟹ submission.conda_packages == resolve_conda_packages(s.conda_config)`. +8. **Conda Override Precedence**: When a `DCCCondaConfig` exists for the active DCC, the resolved `CondaPackages` parameter value must match the AYON-configured packages, not the native submitter's auto-detected values. Project-level config takes precedence over server-level for the same DCC. 
Formally: `∀dcc, s: dcc_conda_config(dcc, s) ≠ empty ⟹ submission.conda_packages == resolve_conda_packages(dcc, s)`, where `dcc_conda_config` resolves project → server → empty. 9. **Queue Resolution Determinism**: For the same settings, project settings, and instance override, `resolve_queue_id` always returns the same queue ID. The resolution order is deterministic: instance override > project default > server default > first available. @@ -1262,6 +1394,8 @@ The following properties must hold for the integration to be correct: 11. **Host Requirements Override Precedence**: When any field in `host_requirements` is non-None in AYON settings, the corresponding field in the OJD template's hostRequirements must match the AYON-configured value. Unset fields must not be injected (submitter defaults preserved). Formally: `∀f ∈ HostRequirements.fields: f is not None ⟹ ojd.hostRequirements[f] == settings.host_requirements[f]`. +12. **Conda Package Validation Gate**: For all submissions where conda packages are resolved (from settings or artist edits in Publisher UI), the packages must pass validation before submission. Invalid package specs or missing channels when custom packages are present must block submission. Formally: `∀s: conda_validation_failed(s) ⟹ ¬submitted(s)`. 
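Property 12 can be illustrated with a minimal sketch of the kind of check `ValidateCondaPackages` might perform. The `CondaSubmission` container, the `validate_conda_submission` helper, the simplified spec pattern, and the error messages are all assumptions made for illustration. The design itself only states that malformed package specs, or missing channels when custom packages are present, must block submission.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified pattern: "name" or "name=version".
# This is an assumption for the sketch, not the full conda match-spec grammar.
_SPEC_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.\-]*(=[A-Za-z0-9_.*]+)?$")


@dataclass
class CondaSubmission:
    """Illustrative stand-in for the resolved conda data on a render instance."""
    conda_packages: str  # e.g. "maya=2026.* maya-openjd=0.15.*"
    custom_package_names: list[str] = field(default_factory=list)
    channels: list[str] = field(default_factory=list)


def validate_conda_submission(sub: CondaSubmission) -> list[str]:
    """Return error messages; an empty list means validation passed."""
    errors = []
    # Each space-separated spec must look like "name" or "name=version".
    for spec in sub.conda_packages.split():
        if not _SPEC_RE.match(spec):
            errors.append(f"Malformed conda package spec: {spec!r}")
    # Channels must be explicit as soon as custom packages are present.
    if sub.custom_package_names and not sub.channels:
        errors.append("Custom packages are present but no channels are defined")
    return errors


def may_submit(sub: CondaSubmission) -> bool:
    """Gate: submission is allowed only when validation reports no errors."""
    return not validate_conda_submission(sub)
```

Returning a list of messages rather than raising on the first problem is one possible design choice: it would let the Publisher UI show every issue at once after an artist edits the package fields.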
+ ## Error Handling ### Error Scenario 1: Deadline Cloud Submitter Not Available @@ -1309,6 +1443,7 @@ For MVP, monitoring is informational only — artists can check job status via t - Test `map_instance_to_params` with various instance configurations and settings combinations - Test `resolve_farm_profile` with all priority paths (override, default, fallback) +- Test `resolve_conda_packages` with per-DCC configs: project override, server fallback, auto-detection fallback, custom packages with channels - Test validation plugins independently with mock DCC data - Test `PostRenderProcessor.validate_outputs` with complete, partial, and empty file sets - Test `PostRenderConfig` serialization/deserialization (data must survive round-trip through Deadline Cloud job parameters) @@ -1322,6 +1457,7 @@ For MVP, monitoring is informational only — artists can check job status via t - **Frame range mapping**: For any valid `(start, end)` tuple where `start <= end`, the mapped DC frame range string parses back to the same range - **Submission params completeness**: For any valid `CollectedRenderInstance` and `DeadlineCloudSettings`, `map_instance_to_params` returns params where all required fields are non-empty - **Validation correctness**: For any file list where `expected ⊆ discovered`, validation returns `is_valid=True`; for any list where `expected ⊄ discovered`, returns `is_valid=False` +- **Conda resolution determinism**: For the same DCC name, server settings, project settings, and DCC context, `resolve_conda_packages` always returns the same result. Project-level config always takes precedence over server-level for the same DCC. 
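The conda resolution determinism property could be exercised with a small randomized check along the following lines. This is a sketch under stated assumptions: `find_dcc_config` is a hypothetical helper that mirrors only the project → server lookup step of `resolve_conda_packages`, plain dictionaries stand in for the settings models, and the standard-library `random` module is used instead of a property-testing framework.

```python
import random

DCC_NAMES = ("maya", "houdini")


def find_dcc_config(dcc_name, server_configs, project_configs):
    """Hypothetical lookup: project-level config wins, then server-level, else None."""
    for configs in (project_configs or [], server_configs):
        for cfg in configs:
            if cfg["dcc_name"] == dcc_name:
                return cfg
    return None


def random_configs(rng, level):
    """Generate a random subset of per-DCC configs, tagged with their level."""
    return [
        {"dcc_name": name, "level": level}
        for name in DCC_NAMES
        if rng.random() < 0.5
    ]


def check_properties(trials=200, seed=0):
    """Check determinism and project-over-server precedence on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        server = random_configs(rng, "server")
        project = random_configs(rng, "project")
        for dcc in DCC_NAMES:
            first = find_dcc_config(dcc, server, project)
            second = find_dcc_config(dcc, server, project)
            # Determinism: identical inputs give identical results.
            assert first == second
            # Precedence: a project-level entry always wins for the same DCC.
            if any(c["dcc_name"] == dcc for c in project):
                assert first["level"] == "project"
            elif any(c["dcc_name"] == dcc for c in server):
                assert first["level"] == "server"
            else:
                assert first is None  # falls through to auto-detection
    return True
```

The sketch deliberately leaves out version resolution and channel handling; a fuller test would call `resolve_conda_packages` itself with generated `DCCCondaConfig` values.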
### Integration Testing Approach From 80c9bbbc7fbc5c3ead852f7118405996c86d08d7 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Tue, 21 Apr 2026 10:35:51 +0200 Subject: [PATCH 08/10] design: hybrid SMF + on-prem CMF post-render architecture MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Replace external on-prem publish with in-job PUBLISH step on CMF worker - Two-step job: RENDER (SMF) → PUBLISH (on-prem CMF) via OJD step dependencies - Per-step hostRequirements route steps to different fleets on same queue - On-prem worker accesses S3 via Deadline Cloud credential chain (no VPN) - Submission hooks (deadline-cloud PR #986) for job template assembly - IAM Roles Anywhere for on-prem worker bootstrap credentials - Publish dependencies can be conda-packaged for multi-node scaling - Path to fully cloud-based publishing when fileshare access available --- .../deadline-cloud-integration/design.md | 120 ++++++++++-------- 1 file changed, 69 insertions(+), 51 deletions(-) diff --git a/.kiro/specs/deadline-cloud-integration/design.md b/.kiro/specs/deadline-cloud-integration/design.md index 6921c86..91e4bdd 100644 --- a/.kiro/specs/deadline-cloud-integration/design.md +++ b/.kiro/specs/deadline-cloud-integration/design.md @@ -14,11 +14,11 @@ The key architectural principle is **separation of responsibilities**: AYON owns - **Studio-configurable validations**: Pre-submission validation plugins that run before any resource-intensive operations. Includes both built-in technical validations (renderable camera exists, valid frame range) and studio-defined custom validations (required AOVs, render settings checks). - **Publishing and processing results**: Version registration in AYON, transcoding, reviewable creation, burnins, file movement/renaming via path templates. 
-- **Post-render publishing on-prem**: Rendered outputs are downloaded to the studio (via auto-download cron or AYON service), and the publish pipeline runs locally on a dedicated machine with filesystem access. +- **Post-render publishing via on-prem CMF worker**: The render job includes a PUBLISH step that runs on an on-prem customer-managed fleet (CMF) worker. The worker downloads render outputs from S3 via the Deadline Cloud credential chain (no VPN needed) and runs the publish pipeline locally. This keeps the full render→publish flow within a single Deadline Cloud job. ### Deferred (Post-MVP) -- **Farm-side post-render**: Running the publish pipeline on Deadline Cloud workers as OpenJD template steps. Requires a lightweight headless publish runner (without full AYON launcher/Qt dependencies) and fileshare access from workers. See [Post-Render Execution Model](#post-render-execution-model) for the full trade-off analysis. +- **Fully cloud-based publishing**: Once the publish worker fleet has access to the studio fileshare via VPN, Direct Connect, or FSx, the on-prem CMF can be replaced with a cloud CMF or SMF. The same OJD job template works — no changes to the AYON integration needed. - **Advanced project tracking**: Higher-level job dependencies beyond render→publish, priority management across assets, planning integration, task status updates. - **Asset-level dependencies**: Complex dependency graphs like "create render archives → render images → publish". - **Cross-job orchestration**: Managing priorities and dependencies across multiple submissions. 
@@ -46,30 +46,30 @@ graph TD subgraph "AWS Deadline Cloud" D --> H[Deadline Cloud Submitter Tool] H -->|OJD Job Bundle| I[Deadline Cloud API] - I --> J[Render Job] + I --> J[Step 1: RENDER - SMF] + J -->|write outputs| S1 + J -->|depends on| K[Step 2: PUBLISH - On-Prem CMF] end - subgraph "On-Prem Post-Render Pipeline (MVP)" - P1[Auto-Download / AYON Service] --> P2[Output Discovery & Validation] - P2 --> P3[File Movement / Renaming] - P3 --> P4[Transcoding / Burnins] - P4 --> P5[Version Registration in AYON] + subgraph "On-Prem CMF Worker" + K -->|sync outputs from S3| L[Output Discovery & Validation] + L --> M[File Movement / Renaming] + M --> N[Transcoding / Burnins] + N --> O[Version Registration in AYON] end A -.->|fetch settings| E D -.->|pre-populate options| H J -->|read inputs| S1 J -->|read inputs| S2 - J -->|write outputs| S1 - J -->|write outputs| S2 - S1 -->|download outputs| P1 - S2 -->|access outputs| P1 - P5 -.->|register versions| E + O -.->|register versions| E ``` -> **Note on MVP job structure**: For MVP, Deadline Cloud handles rendering only. The job submitted is a render-only job — there is no post-render step within the Deadline Cloud job. Post-render publishing runs on-prem, triggered by output availability (via auto-download cron or AYON service). This keeps the Deadline Cloud integration focused on rendering while AYON handles the full publish pipeline locally. +> **Job structure**: A single Deadline Cloud job with two steps. The RENDER step runs on a service-managed fleet (SMF) and produces outputs as job attachments in S3. The PUBLISH step depends on RENDER and runs on an on-prem customer-managed fleet (CMF) worker. The on-prem worker downloads render outputs from S3 via the Deadline Cloud credential chain (`AssumeQueueRoleForWorker` → queue role with S3 access). No VPN is needed — all communication is outbound HTTPS. 
> -> **Target state (post-MVP)**: Post-render steps defined as [OpenJD templates](https://github.com/OpenJobDescription/openjd-specifications/wiki/2023-09-Template-Schemas) that can run either as dependent steps within the Deadline Cloud job (on farm workers) or on-prem via the [OpenJD runtime](https://github.com/OpenJobDescription/openjd-sessions). This gives customers the choice based on their infrastructure (cloud-native vs on-prem vs hybrid). See [Post-Render Execution Model](#post-render-execution-model) for details. +> Per-step host requirements in the [OJD template](https://github.com/OpenJobDescription/openjd-specifications/wiki/2023-09-Template-Schemas) route each step to the correct fleet. Both the SMF and on-prem CMF are associated with the same queue. Custom capability attributes (e.g., `attr.worker.fleet.type`) differentiate them. +> +> **Path to fully cloud-based publishing**: Because the PUBLISH step is defined as an OpenJD template, the same job structure works when the publish fleet moves to the cloud. Once the publish fleet has fileshare access via VPN, Direct Connect, or FSx, the on-prem CMF can be replaced with a cloud CMF or SMF — no changes to the job template or AYON integration needed. 
## Sequence Diagrams @@ -108,31 +108,34 @@ sequenceDiagram end ``` -### Post-Render Publishing Flow (MVP — On-Prem) +### Post-Render Publishing Flow (On-Prem CMF Worker) ```mermaid sequenceDiagram participant DC as Deadline Cloud - participant DL as Auto-Download / AYON Service - participant Script as Post-Render Script (On-Prem) + participant S3 as S3 Job Attachments + participant Worker as On-Prem CMF Worker participant AYON as AYON Server API - DC->>DC: Render job completes - DL->>DC: Download outputs (deadline queue sync-output / manual) - DC-->>DL: Rendered output files + DC->>DC: RENDER step completes (all tasks) + DC->>Worker: Schedule PUBLISH step (dependency satisfied) + Worker->>DC: AssumeQueueRoleForWorker + DC-->>Worker: Queue role credentials (S3 access) + Worker->>S3: Sync render outputs (job attachments) + S3-->>Worker: Rendered output files - DL->>Script: Trigger publish pipeline (outputs available locally) - Script->>Script: Discover rendered output files - Script->>Script: Validate outputs (frame completeness, file integrity) + Worker->>Worker: Discover rendered output files + Worker->>Worker: Validate outputs (frame completeness, file integrity) alt Validation Failed - Script->>Script: Report failure + Worker->>DC: Report step failure else Validation Passed - Script->>Script: Move/rename files via path templates - Script->>Script: Run transcoding (if configured) - Script->>Script: Apply burnins (if configured) - Script->>AYON: Register version (files, metadata) - AYON-->>Script: Version registered + Worker->>Worker: Move/rename files via path templates + Worker->>Worker: Run transcoding (if configured) + Worker->>Worker: Apply burnins (if configured) + Worker->>AYON: Register version (files, metadata) + AYON-->>Worker: Version registered + Worker->>DC: Report step success end ``` @@ -557,31 +560,46 @@ class ValidateCondaPackages: #### Post-Render Execution Model -**MVP: On-prem post-render.** Rendered outputs are downloaded to the studio via 
auto-download (`deadline queue sync-output` as a cron job) or an AYON service. The publish pipeline runs locally on a dedicated machine with access to the studio filesystem. This is the simplest path — it avoids cloud packaging, worker-to-AYON connectivity, and fileshare access complexity. +**Hybrid SMF + on-prem CMF approach.** The render job is a single Deadline Cloud job with two steps. The RENDER step runs on a service-managed fleet (cloud). The PUBLISH step runs on an on-prem customer-managed fleet worker, with a step dependency on RENDER. The on-prem worker downloads render outputs from S3 via the Deadline Cloud credential chain and runs the publish pipeline locally (transcoding, burnins, file movement, AYON version registration). + +This keeps the full render→publish flow within a single Deadline Cloud job — no external orchestration, no cron jobs, no separate AYON service. The job either succeeds (render + publish) or fails with full visibility in the Deadline Cloud Monitor. + +**Per-step fleet routing.** The [OJD spec](https://github.com/OpenJobDescription/openjd-specifications/wiki/2023-09-Template-Schemas) defines `hostRequirements` at the step level. Each step can target a different fleet via custom capability attributes (e.g., `attr.worker.fleet.type` = `"smf-render"` vs `"cmf-publish"`). Both fleets are associated with the same queue. From the [AWS docs](https://docs.aws.amazon.com/deadline-cloud/latest/userguide/jobs-processing.html): "The fleet is chosen based on the capabilities configured for the fleet and the host requirements of a specific step." + +**Submission hooks for job template assembly.** The [submission hooks feature](https://github.com/aws-deadline/deadline-cloud/pull/986) (`hooks.yaml` in job bundles or via `DEADLINE_HOOKS_DIR`) can inject the PUBLISH step into the OJD template at submission time. A pre-submission hook adds the PUBLISH step with the correct `hostRequirements`, `dependsOn`, and AYON context metadata. 
AYON's pre-launch hook sets `DEADLINE_HOOKS_DIR` to point to AYON-managed hook scripts, so this applies to all submissions without modifying job bundles. + +**On-prem worker S3 access (no VPN needed).** The on-prem worker accesses S3 job attachments through the Deadline Cloud credential chain: +1. Worker bootstraps with `AWSDeadlineCloud-WorkerHost` credentials via [IAM Roles Anywhere](https://docs.aws.amazon.com/rolesanywhere/latest/userguide/introduction.html) (certificate-based, recommended for production) or IAM user access keys (for testing) +2. Worker calls `AssumeFleetRoleForWorker` → fleet role credentials (auto-refreshed) +3. When processing the PUBLISH step, worker calls `AssumeQueueRoleForWorker` → queue role credentials with S3 `GetObject`/`PutObject` on the job attachments bucket +4. Worker agent syncs render outputs from S3 before the step script runs + +All communication is outbound HTTPS (port 443) to: `scheduling.deadline.{region}.amazonaws.com`, `s3.{region}.amazonaws.com`, `logs.{region}.amazonaws.com`. No inbound connections. IAM Identity Center is not suitable here — it's for interactive human users, not headless worker agents. + +**On-prem worker prerequisites:** +- [Deadline Cloud worker agent](https://github.com/aws-deadline/deadline-cloud-worker-agent) installed +- IAM Roles Anywhere trust anchor configured (or IAM user keys for testing) +- Software dependencies: ffmpeg, OIIO, lightweight AYON publish runner. These can be packaged as conda packages and installed at step runtime via the conda queue environment — the same mechanism used for DCC packages on the render step. This means publish workers don't need all dependencies pre-installed, and studios can scale to multiple on-prem worker nodes without managing software on each one individually. 
+- Network: outbound HTTPS to AWS endpoints + access to AYON server on the studio network -**Target state (post-MVP): Farm-side post-render via OpenJD templates.** Post-render steps are defined as OpenJD templates that can run as dependent steps on Deadline Cloud workers or on-prem via the OpenJD runtime (`openjd-sessions`). This gives customers the choice based on their infrastructure. The same templates work in both environments — the "where does it run?" question becomes a deployment choice per customer, not a development decision. +**Path to fully cloud-based publishing.** Because the PUBLISH step is defined as an OpenJD template, the same job structure works when the publish fleet moves to the cloud. Once the publish fleet has fileshare access via VPN, Direct Connect, or FSx, the on-prem CMF can be replaced with a cloud CMF or SMF — no changes to the job template or AYON integration needed. **Trade-offs:** -| | On-prem (MVP) | On-farm (post-MVP) | -|---|---|---| -| Compute | Dedicated studio machine | Farm workers (scales with farm) | -| Fileshare access | Direct (local filesystem) | Depends on fleet type (see below) | -| AYON dependencies | Already available locally | Must be packaged (conda, host config, AMI) | -| Latency | Download first, then process | Processing starts after render | -| Setup complexity | AYON service + auto-download | OpenJD templates + dependency packaging | - -**Fileshare access by fleet type (relevant for farm-side post-render):** -- **CMF on-prem**: Workers already on studio network, direct access -- **CMF on EC2**: Workers in customer VPC, reach on-prem via VPN/Direct Connect -- **SMF**: Workers can access customer VPC resources (NFS, fileshares) via [VPC Lattice resource endpoints](https://docs.aws.amazon.com/deadline-cloud/latest/developerguide/smf-vpc.html) -- **Cloud-native storage**: A synchronized fileshare available in the cloud (e.g., FSx) works directly - -**Prerequisites for farm-side post-render:** -- OpenJD templates 
defining the post-render steps -- A lightweight headless publish runner (subset of ayon-core, without full launcher/Qt dependencies) -- AYON dependencies available on workers (via conda packages, host configuration scripts, or pre-baked AMIs) -- Network connectivity from workers to the AYON server API +| | On-prem publish (external) | Hybrid on-prem CMF (this design) | Fully cloud-based (post-MVP) | +|---|---|---|---| +| Compute | Local machine / AYON service | On-prem CMF worker(s) | Cloud CMF or SMF workers | +| Orchestration | External (cron/trigger) | Within DLC job | Within DLC job | +| VPN needed | No | No | Yes (for fileshare access) | +| AYON deps | Already local | Local or via conda | Must be packaged (conda/AMI) | +| Latency | Download first, then process | S3 sync only | Immediate (fileshare access) | +| DLC job tracking | Separate | Full visibility | Full visibility | +| Scalability | Single machine | Multiple CMF workers | Scales with farm | +| Setup | Service/cron infra | CMF fleet + IAM Roles Anywhere | + VPN/Direct Connect/FSx | + +**Open questions:** +- **Job attachment sync between steps**: Does the worker agent automatically sync outputs from a previous step as inputs to a dependent step within the same job? Or does the PUBLISH step script need to explicitly download them via `deadline job download-output`? This needs testing. +- **IAM Roles Anywhere setup complexity**: For studios without an existing PKI/CA, setting up IAM Roles Anywhere adds infrastructure overhead. AWS Private CA is an option but has cost implications. For smaller studios, time-bound IAM user keys may be the pragmatic starting point. **Purpose**: Handles output validation, transcoding, burnin application, and version registration in AYON after rendering completes. May run as a Deadline Cloud dependent step (on farm) or locally within the AYON pipeline (see TBD above). 
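
The output validation named in the purpose statement includes a frame completeness check, which can be sketched as a set comparison between expected and discovered frame numbers. This is a minimal sketch: `validate_frame_completeness`, `OutputValidation`, and the `name.####.ext` file naming convention are assumptions for illustration and do not describe `PostRenderProcessor.validate_outputs` itself.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Assumed naming convention: <name>.<frame>.<ext>, e.g. beauty.1001.exr
_FRAME_RE = re.compile(r"\.(\d+)\.[A-Za-z0-9]+$")


@dataclass
class OutputValidation:
    is_valid: bool
    missing_frames: List[int]


def validate_frame_completeness(filenames: Iterable[str],
                                start: int, end: int) -> OutputValidation:
    """Valid only when every frame in [start, end] has a discovered file."""
    discovered = set()
    for name in filenames:
        match = _FRAME_RE.search(name)
        if match:
            discovered.add(int(match.group(1)))
    missing = sorted(set(range(start, end + 1)) - discovered)
    return OutputValidation(is_valid=not missing, missing_frames=missing)
```

Returning the missing frame numbers, not just a boolean, gives the PUBLISH step something concrete to report when it fails.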
From 6755d5faf3719182b017f6e87dedd95bef8ff607 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Tue, 26 May 2026 19:36:44 +0200 Subject: [PATCH 09/10] docs: explore launcher distribution paths for Deadline Cloud workers Map out CMF/SMF distribution options for AYON Launcher with mermaid diagrams. Questions conda's immutability assumption and whether auto-update belongs on render workers, then compares stable conda package, per-job conda package, S3-staged centralized mode, and pre-baked AMI as SMF paths. --- .../launcher-distribution.md | 179 ++++++++++++++++++ 1 file changed, 179 insertions(+) create mode 100644 .kiro/specs/deadline-cloud-integration/launcher-distribution.md diff --git a/.kiro/specs/deadline-cloud-integration/launcher-distribution.md b/.kiro/specs/deadline-cloud-integration/launcher-distribution.md new file mode 100644 index 0000000..d0f33c8 --- /dev/null +++ b/.kiro/specs/deadline-cloud-integration/launcher-distribution.md @@ -0,0 +1,179 @@ +# AYON Launcher Distribution on Deadline Cloud Workers + +> Working notes — exploring distribution paths for AYON Launcher on Deadline Cloud workers. +> Context: discussion in [wg-deadline-cloud thread](https://discord.com/channels/517362899170230292/1496527767217377430) and [ayon-launcher PR #303](https://github.com/ynput/ayon-launcher/pull/303). + +## Two assumptions worth questioning + +Before picking a packaging format, two assumptions baked into AYON Launcher's design need a second look in the render farm context: + +### 1. Conda packages are immutable after install + +Conda's contract is that `$CONDA_PREFIX/` doesn't change between install and remove. AYON Launcher self-updates by writing into its own install tree. That's a conflict everywhere conda is involved — workstation, CMF, or SMF. It's not an SMF-specific problem. + +What differs is *how loud the conflict is*: + +| Environment | Why the conflict is louder or quieter | +|---|---| +| **Workstation / local install** | Quiet. 
User owns env, writes succeed. Conda's view of the package becomes inconsistent, but nobody runs `conda update ayon-launcher` on a workstation, so the inconsistency is invisible until someone tries. | +| **CMF on-prem** | Quiet. Studio controls perms and can grant write. Same latent inconsistency as workstation. Tolerated because studios manage launcher updates manually. | +| **SMF** | Loud. Job-user can't write to root-owned env. Self-update fails immediately with permission denied. | + +So conda + AYON Launcher works *only if you treat conda as a one-shot installer and never run `conda update` on it.* That's a coherent stance, but worth being explicit about. + +### 2. Auto-update is desirable on render workers + +This is the bigger question and it's not really about packaging at all. + +Auto-update on a workstation is great: artists always run the right launcher version for the project they open. But on a render farm, auto-update is **a determinism risk**. + +```mermaid +sequenceDiagram + participant Submit as Job submitted at T0
launcher v1.6.0 expected + participant Worker1 as Worker A
(starts at T0+5m) + participant Worker2 as Worker B
(starts at T0+3h) + participant Server as AYON server + + Submit->>Worker1: render frames 1-50 + Worker1->>Server: bootstrap, what version? + Server-->>Worker1: v1.6.0 + Note over Worker1: renders frames
with v1.6.0 + + Note over Server: Bundle updated mid-job
v1.6.1 set as production + + Submit->>Worker2: render frames 51-100 + Worker2->>Server: bootstrap, what version? + Server-->>Worker2: v1.6.1 + Note over Worker2: renders frames
with v1.6.1 + + Note over Worker1,Worker2: Same job, two launcher versions
+ potentially different addon versions +``` + +What can go wrong: + +- **Different launcher versions across workers in the same job.** Frames 1–50 run with v1.6.0 logic; frames 51–100 with v1.6.1. If addon resolution or publish behavior changed between versions, output drifts mid-sequence. +- **Different addon versions across workers.** Even with launcher pinned, if the active bundle is updated mid-job, addons resolved by later workers differ from earlier ones. +- **Re-renders aren't reproducible.** A frame re-render six months later picks up whatever version is current, not what produced the original. +- **Cold-start cost paid every time.** Auto-update means every worker session checks the server, possibly downloads, before doing useful work. On a Deadline Cloud session that's already paying 12+ minutes of overhead, this adds more. + +The render farm wants the **opposite** of auto-update: pin the launcher version + addon versions at submission time, ship that exact set to every worker, and never let a worker resolve a different version mid-job. + +This isn't an argument against auto-update on the launcher in general — it's an argument that **the render-farm code path should bypass auto-update entirely** and the distribution mechanism should support pinning. + +## What changes if we drop auto-update on the farm + +If the launcher on a worker doesn't try to mutate itself, the immutability problem with conda goes away. The packaging discussion gets a lot simpler. + +```mermaid +flowchart TD + Pin{Pin launcher + addons
at submission time?} + Pin -->|Yes — render farm| Static[Static distribution
no self-update on workers] + Pin -->|No — workstation| Dynamic[Self-updating launcher
existing AYON behavior] + + Static --> Choices{Which static
distribution?} + Choices --> A[Conda package
per launcher version] + Choices --> B[S3-staged install
centralized mode] + Choices --> C[Pre-baked AMI] + Choices --> D[Per-job conda package] +``` + +## Distribution paths for the farm + +```mermaid +flowchart TD + Start[Render job submitted with
pinned launcher + addon versions] --> FleetType{Fleet type?} + + FleetType -->|CMF / on-prem| CMFPath + FleetType -->|SMF| SMFPath + + CMFPath[Studio controls host
install once, no per-job overhead] + CMFPath --> CMFConda[Conda or rez package
installed system-wide] + CMFPath --> CMFLocal[Plain on-prem install
same as workstation] + + SMFPath{How to ship pinned bundle
to ephemeral SMF workers?} + SMFPath -->|A| PathA + SMFPath -->|B| PathB + SMFPath -->|C| PathC + SMFPath -->|D| PathD + + subgraph PathA[Path A — Stable conda package + addon overlay] + A1[Build conda package per launcher release] + A1 --> A2[Publish to S3-backed conda channel] + A2 --> A3[Queue env installs at session start] + A3 --> A4[Job resolves addons from manifest
passed via job parameters] + A4 --> A5[Addons cached in writable session dir] + end + + subgraph PathB[Path B — Per-job conda package] + B1[Build step at job start
creates conda package] + B1 --> B2[Upload to per-job S3 location] + B2 --> B3[Render and publish steps depend on build] + B3 --> B4[Each worker installs the
job-specific package] + end + + subgraph PathC[Path C — S3-staged centralized mode] + C1[Studio pre-stages launcher + addons
per release in S3] + C1 --> C2[Job parameter selects which staged
version to use] + C2 --> C3[Worker syncs from S3 at session start] + C3 --> C4[AYON centralized mode reads
from staged location] + end + + subgraph PathD[Path D — Pre-baked AMI] + D1[Custom AMI with launcher + tools] + D1 --> D2[Worker boots with everything ready] + D2 --> D3[Job verifies AMI version matches
submission pin] + end +``` + +## Path comparison + +| Path | Reproducibility | Session overhead | Update cadence | Native to | Notes | +|------|----------------|------------------|----------------|-----------|-------| +| **A. Stable conda package + addon overlay** | Strong — addons pinned per job, launcher pinned per channel release | Low after first cache hit | Rebuild conda package per launcher release | Deadline Cloud | Cleanest split: rare-changing launcher vs frequent-changing addons | +| **B. Per-job conda package** | Strongest — full pin per submission | Medium-high — build + upload (hundreds of MB) per job | None needed; each job builds fresh | Deadline Cloud | Build cost on every submit, cache thrash, but truly hermetic | +| **C. S3-staged centralized mode** | Strong — pinned via studio's staging process | Low — S3 sync at session start | Studio re-stages per release | AYON | Matches AYON's existing centralized mode; sidesteps conda fight entirely | +| **D. Pre-baked AMI** | Strong — pinned per AMI | Zero | Rebake AMI per launcher release | AWS | Heavy ops burden if launcher versions move often; great for fixed configs | + +## CMF / on-prem + +CMF and on-prem aren't constrained the way SMF is. The conda or rez package from PR #303 is a fine fit *as a one-shot installer* — studio installs it system-wide, treats updates as "uninstall + install new package version," and never relies on the launcher self-updating in place. + +If the studio wants self-update on CMF workers (because they update launcher versions out-of-band from job submissions), the conda/rez package can still be used, with the same caveat: don't run `conda update` on it. + +```mermaid +flowchart LR + Build[ayon-launcher PR 303
hatch build pipeline] --> Wheel[wheel] + Build --> CondaPkg[conda package] + Build --> RezPkg[rez package] + + CondaPkg --> CMFInstall[CMF / on-prem
one-shot install] + RezPkg --> CMFInstall + CMFInstall --> CMFRun[Launcher runs
self-update tolerated
but not via conda update] + + CondaPkg -.SMF caveat.-> SMFNote[SMF needs auto-update
disabled regardless of
packaging format] +``` + +## SMF privilege model — for reference + +Even with auto-update disabled, the SMF privilege model still shapes which paths are viable. Worker host configuration runs as root and prepares the environment; job tasks run as `jobRunAsUser` without admin rights. Anything the job-user needs to write must land in a session-scoped, job-user-owned location. + +```mermaid +sequenceDiagram + participant Host as Worker host config
(root) + participant Env as Conda env or staged install
(read-only to job-user) + participant Session as Session working dir
(writable by job-user) + participant Job as Job task
(jobRunAsUser) + + Host->>Env: install or sync launcher + addons + Job->>Env: read launcher binary + Job->>Session: write logs, addon caches, render output + Note over Job,Env: Auto-update disabled
job-user never writes to Env +``` + +## Open questions + +1. Confirm: should auto-update be **disabled** on render workers, with launcher + addon versions pinned at submission? +2. Where does the studio's source-of-truth for "this job's launcher + addon set" live — AYON server bundle, S3 manifest, both? +3. Which SMF path is the right MVP target, and which is the long-term answer? +4. How does this interact with the existing render → publish split (SMF → CMF) agreed on for MVP? +5. Does AYON Launcher already have a "pinned mode" flag, or does that need to be added? From 24760374c5fcd6052339b07f4f06b5a0c2037c73 Mon Sep 17 00:00:00 2001 From: Johannes Oehmen Date: Tue, 26 May 2026 20:07:58 +0200 Subject: [PATCH 10/10] docs: capture convergence on headless publish package for SMF Add TL;DR and closing section reflecting Ondrej's bootstrap idea and Leon's working ayon-publish recipe. SMF target is a headless conda package, separate from PR #303's full launcher which targets CMF and on-prem. --- .../launcher-distribution.md | 60 +++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/.kiro/specs/deadline-cloud-integration/launcher-distribution.md b/.kiro/specs/deadline-cloud-integration/launcher-distribution.md index d0f33c8..197cca6 100644 --- a/.kiro/specs/deadline-cloud-integration/launcher-distribution.md +++ b/.kiro/specs/deadline-cloud-integration/launcher-distribution.md @@ -3,6 +3,16 @@ > Working notes — exploring distribution paths for AYON Launcher on Deadline Cloud workers. > Context: discussion in [wg-deadline-cloud thread](https://discord.com/channels/517362899170230292/1496527767217377430) and [ayon-launcher PR #303](https://github.com/ynput/ayon-launcher/pull/303). +## TL;DR + +After thinking through it and seeing two parallel pieces of work converge: + +- **Don't ship the full cx_Freeze'd AYON Launcher to SMF workers.** It's the wrong primitive — too big, fights conda's immutability, conflicts with self-update. 
+- **Ship a headless publish package instead.** A small, conda-installable Python package with just the code paths a worker needs (`ayon-core` headless, `ayon-python-api`, `pyblish-base`, the `ayon-deadline-cloud` addon). Pin its version at submission time. +- **Keep the full launcher (PR #303 conda/rez output) for CMF and on-prem.** That's where the GUI, shim, and self-update belong. + +This is the shape Ondřej proposed and Leon independently prototyped. The rest of this doc is the reasoning that gets there. + ## Two assumptions worth questioning Before picking a packaging format, two assumptions baked into AYON Launcher's design need a second look in the render farm context: @@ -177,3 +187,53 @@ sequenceDiagram 3. Which SMF path is the right MVP target, and which is the long-term answer? 4. How does this interact with the existing render → publish split (SMF → CMF) agreed on for MVP? 5. Does AYON Launcher already have a "pinned mode" flag, or does that need to be added? + +--- + +## Update — convergence on a headless publish package + +After this doc went out, two things happened in close sequence in the wg-deadline-cloud thread: + +**Ondřej's reframe:** *"Maybe we don't really need the whole launcher. Maybe we could simply extract part of it as standalone pip installable thing that will bootstrap AYON and its own version can be independent of both launcher and addons."* + +**Leon's working prototype:** he had built a custom conda recipe (`ayon-publish` v1.9.5, noarch:python) that bundles `ayon-core` (headless), `ayon-python-api`, `pyblish-base`, `clique`, the `ayon_deadline_cloud` addon, and the dependencies needed to run `ayon addon deadline_cloud publish` on an SMF worker. He confirmed it works in the publish step. + +These are the same idea from two angles. Ondřej is asking *"what's the right factoring of AYON for headless workers?"* Leon answered with a recipe. 
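
A publish step that relies on such a package could start with a cheap preflight check that the bundled components are importable on the worker. The sketch below is hedged accordingly: the import names in `HEADLESS_MODULES` are assumptions derived from the component list above and may not match the real recipe, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed for the components listed above; adjust to the recipe.
HEADLESS_MODULES = (
    "ayon_core",
    "ayon_api",
    "pyblish",
    "clique",
    "ayon_deadline_cloud",
)


def missing_modules(names: Iterable[str] = HEADLESS_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

A step script could call `missing_modules()` first and exit non-zero with the list, so a broken environment fails fast instead of midway through publishing.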
+ +### Why this collapses most of the SMF discussion above + +| Concern raised earlier | How a headless publish package handles it | +|---|---| +| Conda immutability | Package never mutates itself — no self-update on workers | +| Auto-update determinism | Package version is the pin; bundle name + package version pin the publish-side code | +| SMF privilege model | Job-user only reads from `$CONDA_PREFIX`, writes go to session dir | +| cx_Freeze bundle size | Pure Python, tens of MB instead of hundreds | +| Shim and GUI complications | Not shipped to workers at all | + +The four SMF paths (A/B/C/D) above were assuming we ship the full launcher to workers. Once we accept that workers only need the headless code paths, **Path A (stable conda + addon overlay)** is essentially what Leon built — just sliced finer than the full launcher. + +### Refined picture + +```mermaid +flowchart LR + Build[ayon-launcher PR 303
hatch build pipeline] --> CondaFull[Full launcher
conda package
cx_Freeze + GUI + shim] + Build --> RezFull[Full launcher
rez package] + + Headless[ayon-publish recipe
ayon-core headless
+ python-api + pyblish
+ ayon-deadline-cloud] --> CondaHeadless[Headless conda package
noarch python] + + CondaFull --> CMFInstall[CMF / on-prem
workstation install] + RezFull --> CMFInstall + + CondaHeadless --> SMFChannel[S3-backed conda channel
queue env installs at session start] + SMFChannel --> SMFWorker[SMF worker
publish step only] + + CondaFull -. not appropriate .-> SMFNote[SMF — too heavy, fights
immutability + self-update] +``` + +### Open items now narrower + +1. Agree explicitly that the SMF target is a headless conda package, separate artifact from PR #303's full launcher package. +2. Decide ownership of the headless package recipe — does it live in `ynput/ayon-core`, `ynput/ayon-deadline-cloud`, or its own repo? Leon's prototype currently has hardcoded local paths and pins ayon-core 1.9.5 + addon by file copy. +3. Settle on dependency pinning strategy: pip-install with explicit versions (Leon's approach, defensible for a self-contained package) vs sourcing each component as a separate `source:` entry. +4. Cross-platform: Leon's wrapper is Linux-shaped. Windows fleets need a `bin\ayon.bat` equivalent. +5. How addons get pinned alongside the headless code — bake into the package (rebuild per addon update) vs pass via job parameters and stage from the AYON server.
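
For the second option in item 5 (passing pins via job parameters), the submission side could serialize the pin and the worker could compare it against what is installed. This is a sketch of one possible shape, not a decision: `encode_pin`, `pin_mismatches`, and the JSON layout are hypothetical.

```python
import json
from typing import Dict, List, Optional


def encode_pin(bundle_name: str, package_version: str,
               addon_versions: Dict[str, str]) -> str:
    """Serialize the pin deterministically so it can travel as a job parameter."""
    return json.dumps(
        {"bundle": bundle_name, "package": package_version, "addons": addon_versions},
        sort_keys=True,
    )


def pin_mismatches(encoded_pin: str, installed_package: str,
                   installed_addons: Dict[str, str]) -> List[str]:
    """Compare the submitted pin with what the worker actually has."""
    pin = json.loads(encoded_pin)
    problems: List[str] = []
    if pin["package"] != installed_package:
        problems.append(
            f"package: pinned {pin['package']}, found {installed_package}"
        )
    for addon, wanted in sorted(pin["addons"].items()):
        found: Optional[str] = installed_addons.get(addon)
        if found != wanted:
            problems.append(f"addon {addon}: pinned {wanted}, found {found}")
    return problems
```

An empty list means the worker matches the submission pin; anything else is a reason to fail the step before any publish logic runs.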