feat(ir): return callable CompiledProgram from ir.compile() #961
Hzfengsy wants to merge 1 commit into hw-native-sys:main
Conversation
📝 Walkthrough

ir.compile() now returns a new CompiledProgram object (callable) instead of a path string. CompiledProgram wraps compiled IR, lazily extracts orchestration metadata, supports in-place and return-style calls, and delegates execution to runtime.execute_compiled. The runtime exposes execute_compiled to run precompiled artifacts.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CP as CompiledProgram
    participant Runner as runtime.execute_compiled
    participant Device as _execute_on_device
    User->>CP: __call__(*torch.Tensor)
    activate CP
    CP->>CP: _get_metadata() / _extract_param_infos() (lazy)
    Note over CP: Determine param names, directions,<br/>output_indices, shapes, dtypes
    alt In-place style (args == total params)
        CP->>CP: validate arg count (in-place)
        CP->>CP: _build_full_args() (identity)
    else Return style (args == input params)
        CP->>CP: validate arg count (return-style)
        CP->>CP: _build_full_args() (allocate outputs)
    end
    CP->>Runner: execute_compiled(work_dir, tensors, param_infos, output_indices, platform, device_id)
    activate Runner
    Runner->>Runner: optionally patch headers
    Runner->>Runner: _write_call_golden() (optional)
    Runner->>Runner: serialize inputs to work_dir/cache/Default_inputs.pt
    Runner->>Device: _execute_on_device(...)
    activate Device
    Device-->>Runner: execution complete
    deactivate Device
    deactivate Runner
    alt Return style
        CP-->>User: return tensor or tuple[tensors]
    else In-place style
        CP-->>User: return None
    end
    deactivate CP
```
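The dual calling convention in the sequence diagram can be sketched in plain Python. This is a minimal mock of the dispatch logic only, with hypothetical names: the real `CompiledProgram` operates on torch tensors, allocates outputs from IR metadata, and delegates to `runtime.execute_compiled`.

```python
class ProgramSketch:
    """Mock of the in-place/return-style dispatch; lists stand in for tensors."""

    def __init__(self, n_inputs: int, n_outputs: int):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs

    def __call__(self, *args):
        total = self.n_inputs + self.n_outputs
        if len(args) == total:
            # In-place style: caller supplied the output buffers too.
            self._run(args[: self.n_inputs], list(args[self.n_inputs :]))
            return None
        if len(args) == self.n_inputs:
            # Return style: allocate outputs, run, return them.
            outputs = [[0.0] for _ in range(self.n_outputs)]
            self._run(args, outputs)
            return outputs[0] if self.n_outputs == 1 else tuple(outputs)
        raise TypeError(
            f"expects {total} args (in-place) or {self.n_inputs} (return style)"
        )

    def _run(self, inputs, outputs):
        # Stand-in for device execution: write the sum of the inputs.
        outputs[0][0] = sum(x[0] for x in inputs)


p = ProgramSketch(n_inputs=2, n_outputs=1)
a, b, c = [1.0], [2.0], [0.0]
assert p(a, b, c) is None and c[0] == 3.0  # in-place: c written in place
assert p(a, b) == [3.0]                    # return style: output allocated
```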
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Force-pushed: 6327606 to df719d4
Code Review
This pull request introduces a CompiledProgram wrapper for the ir.compile output, providing a Triton-like callable API to execute compiled programs directly with torch tensors. The implementation supports both in-place and return-style calling conventions while preserving backward compatibility for path-based artifact access. Review feedback identifies a serialization mismatch in the runtime runner, potential runtime crashes when automatically allocating tensors for dynamic shapes, and opportunities to improve type inference for scalar parameters.
Actionable comments posted: 4
🧹 Nitpick comments (7)
python/pypto/ir/compiled_program.py (3)
95-95: Add `strict=True` to zip for defensive programming.

While `orch_func.params` and `orch_func.param_directions` should always have matching lengths by IR construction, adding `strict=True` provides an early fail-fast if this invariant is ever violated.

♻️ Proposed fix
```diff
- for i, (param, direction) in enumerate(zip(orch_func.params, orch_func.param_directions)):
+ for i, (param, direction) in enumerate(zip(orch_func.params, orch_func.param_directions, strict=True)):
```
40-52: Missing dtype mappings for some PyPTO types.

The mapping is missing several `DataType` values exported from `pypto.ir`: `FP4`, `FP8E4M3FN`, `FP8E5M2`, `HF4`, `HF8`, `INT4`, `UINT16`, `UINT32`, `UINT64`. While PyTorch may not support all of these natively, users attempting return-style calls with these types will get an unhelpful "Unsupported dtype" error.

Consider adding a note in the error message at line 310 about which dtypes are supported, or mapping to the closest supported type where reasonable.
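A sketch of the suggested error-message improvement; the helper name and message format are hypothetical, and the toy dict stands in for the real `_DATATYPE_TO_TORCH` mapping:

```python
def unsupported_dtype_message(dtype, mapping):
    # Enumerate the supported keys so callers can see at a glance
    # which dtypes the DataType-to-torch mapping actually covers.
    supported = ", ".join(sorted(str(k) for k in mapping))
    return f"Unsupported dtype {dtype}; supported dtypes: {supported}"


# Toy mapping standing in for _DATATYPE_TO_TORCH:
msg = unsupported_dtype_message("FP4", {"FP16": "torch.float16", "INT8": "torch.int8"})
assert msg == "Unsupported dtype FP4; supported dtypes: FP16, INT8"
```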
197-205: Equality and hash semantics may surprise users.

`__eq__` compares only `output_dir`, ignoring `platform`, `device_id`, and `program`. Two `CompiledProgram` instances pointing to the same directory but with different platforms will be considered equal. This is likely intentional for backward compatibility with string paths, but could cause subtle bugs if users expect full equality.

Consider documenting this behavior in the class docstring, e.g., "Equality is based solely on output directory for backward compatibility with path strings."
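A minimal sketch of the semantics described above; the class and attribute names are illustrative, not the actual implementation:

```python
import os


class PathEqualsSketch:
    """Equality and hashing use only the output directory, so the object
    can stand in for the path string it replaced."""

    def __init__(self, output_dir, platform="a2a3", device_id=0):
        self._output_dir = str(output_dir)
        self.platform = platform
        self.device_id = device_id

    def __fspath__(self):
        return self._output_dir

    def __eq__(self, other):
        if isinstance(other, PathEqualsSketch):
            return self._output_dir == other._output_dir
        if isinstance(other, (str, os.PathLike)):
            return self._output_dir == os.fspath(other)
        return NotImplemented

    def __hash__(self):
        return hash(self._output_dir)


a = PathEqualsSketch("/tmp/out", platform="a2a3")
b = PathEqualsSketch("/tmp/out", platform="a5")
assert a == "/tmp/out"  # backward compatibility with path strings
assert a == b           # equal despite different platforms: the surprise
```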
tests/ut/ir/test_compiled_program.py (2)
169-176: Use raw string for regex pattern with metacharacters.

The pattern `"expects 3 .* or 2"` contains the `.` metacharacter. Use a raw string to make the intent clear and satisfy static analysis.

♻️ Proposed fix
```diff
  # Program has 3 params (2 in + 1 out), with return.
  # Valid: 3 args (in-place) or 2 args (return style)
- with pytest.raises(TypeError, match="expects 3 .* or 2"):
+ with pytest.raises(TypeError, match=r"expects 3 .* or 2"):
      cp(a)  # 1 arg is neither 3 nor 2
```
93-108: Minor: Prefix unused variable with underscore.

Per static analysis, `out_idx` at line 95 is unused. Prefix it with an underscore to indicate intentional discard.

♻️ Proposed fix
```diff
  def test_extracts_param_names_and_directions(self):
      prog = _make_program_with_orchestration()
-     infos, out_idx, _ = _extract_param_infos(prog)
+     infos, _out_idx, _ = _extract_param_infos(prog)
      assert len(infos) == 3
```
python/pypto/ir/compile.py (1)
52-54: Consider documenting the platform/device_id interaction with backend_type.

`RunConfig.__post_init__` auto-corrects `platform` to match `backend_type` (e.g., if `backend_type=Ascend950`, it ensures platform starts with `"a5"`). The `compile()` function doesn't perform this validation, so users could inadvertently create a `CompiledProgram` with mismatched `platform` and `backend_type`.

Either add validation here, document in the docstring that `platform` should match `backend_type`, or delegate validation to `CompiledProgram.__init__`.
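A sketch of the suggested validation. The backend-to-prefix mapping below is assumed from the comment's one example, not taken from the real `RunConfig.__post_init__`:

```python
# Assumed mapping, per the example above ("backend_type=Ascend950
# ensures platform starts with 'a5'"); extend as needed.
_BACKEND_PLATFORM_PREFIX = {"Ascend950": "a5"}


def check_platform(backend_type: str, platform: str) -> None:
    # Reject a platform string whose prefix contradicts the backend type.
    prefix = _BACKEND_PLATFORM_PREFIX.get(backend_type)
    if prefix is not None and not platform.startswith(prefix):
        raise ValueError(
            f"platform {platform!r} does not match backend_type "
            f"{backend_type!r} (expected prefix {prefix!r})"
        )


check_platform("Ascend950", "a5sim")  # consistent: no error raised
```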
python/pypto/runtime/runner.py (1)
879-906: Consider validating the `platform` parameter.

Unlike `RunConfig.__post_init__` (lines 278-288), which validates and auto-corrects platform values, `execute_compiled` passes the platform directly to `_execute_on_device` without validation. An invalid platform value could propagate to CodeRunner and cause unclear errors.

♻️ Suggested validation
```diff
 def execute_compiled(
     work_dir: Path,
     tensors: list[torch.Tensor],
     param_infos: list,
     output_indices: list[int],
     *,
     platform: str,
     device_id: int,
     pto_isa_commit: str | None = None,
 ) -> None:
+    valid_platforms = ("a2a3sim", "a2a3", "a5sim", "a5")
+    if platform not in valid_platforms:
+        raise ValueError(f"Invalid platform {platform!r}. Expected one of {valid_platforms}.")
+
     work_dir = Path(work_dir)
```
🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:

In `python/pypto/ir/__init__.py`:
- Line 44: The type stub `python/pypto/ir/__init__.pyi` is missing `CompiledProgram` in its exported `__all__` list; update `__all__` to include `"CompiledProgram"` so the stub matches the runtime `__init__.py`, which imports `CompiledProgram` from `.compiled_program`, keeping the public API in the stub and runtime synchronized.

In `python/pypto/ir/compiled_program.py`:
- Lines 303-314: The allocation loop calls `torch.zeros(info.shape, ...)` for outputs without guarding against dynamic dimensions (-1), which causes a `RuntimeError`. Validate `info.shape` before calling `torch.zeros`: if any dimension is negative, raise a clear `ValueError` such as "Cannot allocate output tensor 'name': dynamic dimension -1 in shape" (include `info.name` and `info.dtype` in the message for context). Keep the existing checks for `info.shape is None` and unsupported dtypes, and only call `torch.zeros` when all dimensions are non-negative.

In `python/pypto/runtime/runner.py`:
- Lines 953-956: The fallback `generate_inputs` loop creates tensors with hardcoded `torch.float32`; use the parameter's real dtype by mapping `info.dtype` to a torch dtype (e.g. `_to_torch_dtype(info.dtype)`), importing `_to_torch_dtype` from `compiled_program` (or duplicating that mapping), and fall back to `torch.float32` only if `info.dtype` is None/unknown, so dtype mismatches are avoided.
- Lines 917-921: The cache saved in `_write_call_golden` uses a list of `(name, tensor)` tuples (saved to `Default_inputs.pt`), but `_load_inputs` (used by `_cached_gen` in `_install_golden_inputs_patch`) expects dicts with `"kind"`, `"name"`, and `"data"` keys. Write the cache in the same format as `_save_inputs` (or have `_write_call_golden` call the existing `_save_inputs` helper instead of `torch.save` directly) so `Default_inputs.pt` uses the schema `_load_inputs` expects.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1073bb3d-11bc-4cf4-9eb5-5959a2ad2780
📒 Files selected for processing (6)
- python/pypto/ir/__init__.py
- python/pypto/ir/compile.py
- python/pypto/ir/compiled_program.py
- python/pypto/runtime/__init__.py
- python/pypto/runtime/runner.py
- tests/ut/ir/test_compiled_program.py
Pull request overview
This PR updates the Python IR compilation API so pypto.ir.compile() returns a callable CompiledProgram object (Triton-like execution style) while preserving path-like backward compatibility, and adds a runtime helper to execute precompiled artifacts with user-provided tensors.
Changes:
- Add `pypto.ir.CompiledProgram` wrapper with `__call__`, lazy orchestration metadata extraction, and `__str__`/`__fspath__` for path-like behavior.
- Change `pypto.ir.compile()` to return `CompiledProgram` (with platform/device binding for later execution).
- Add `pypto.runtime.execute_compiled()` plus a minimal auto-generated golden script for device execution via the existing runner infrastructure.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| python/pypto/ir/compile.py | Returns CompiledProgram instead of str, adds platform/device_id wiring. |
| python/pypto/ir/compiled_program.py | New callable wrapper and orchestration signature metadata extraction/allocation logic. |
| python/pypto/ir/__init__.py | Re-export CompiledProgram from pypto.ir. |
| python/pypto/runtime/runner.py | Adds execute_compiled() and _write_call_golden() to run compiled outputs with provided tensors. |
| python/pypto/runtime/__init__.py | Re-export execute_compiled. |
| tests/ut/ir/test_compiled_program.py | New unit tests for backward-compat, metadata extraction, and basic call/alloc behavior. |
♻️ Duplicate comments (2)
python/pypto/ir/compiled_program.py (1)
309-317: ⚠️ Potential issue | 🟠 Major

Return-style allocation fails for dynamic output dimensions.

Dynamic dims are represented as `-1` (Line 105), but Line 317 passes them directly to `torch.zeros(...)`, which raises at runtime.

🐛 Proposed fix
```diff
     if i in output_set:
         # Allocate output tensor from IR metadata
         if info.shape is None:
             raise ValueError(f"Cannot allocate output tensor {info.name!r}: no shape in IR")
+        if any(d < 0 for d in info.shape):
+            raise ValueError(
+                f"Cannot allocate output tensor {info.name!r}: dynamic dimensions {info.shape}. "
+                "Use in-place calling style and pass output tensors explicitly."
+            )
         torch_dtype = _to_torch_dtype(info.dtype)
         if torch_dtype is None:
             raise ValueError(f"Unsupported dtype {info.dtype} for output tensor {info.name!r}")
         all_tensors.append(torch.zeros(info.shape, dtype=torch_dtype))
```
python/pypto/runtime/runner.py (1)
925-930: ⚠️ Potential issue | 🔴 Critical

Input cache format is incompatible with `_load_inputs` and drops user tensors.

At Line 928, `Default_inputs.pt` is saved as a list of tuples, but `_load_inputs()` expects a list of dicts. That causes cache load failure and fallback to placeholder tensors, so callable execution can ignore provided inputs.

🐛 Proposed fix
```diff
 # 3. Save user tensors to cache so the golden_inputs_patch can load them
 cache_dir = work_dir / "cache"
 cache_dir.mkdir(exist_ok=True)
-inputs_data = [(info.name, tensor) for info, tensor in zip(param_infos, tensors)]
-torch.save(inputs_data, cache_dir / "Default_inputs.pt")
+inputs_data = [(info.name, tensor) for info, tensor in zip(param_infos, tensors, strict=True)]
+_save_inputs(inputs_data, cache_dir / "Default_inputs.pt")
```
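The dict schema the loader expects can be sketched as a small conversion helper. The helper name is hypothetical, and defaulting `kind` to `"input"` is an assumption; the real fix should reuse the existing `_save_inputs` helper so the schema stays in one place.

```python
from types import SimpleNamespace


def to_cache_records(param_infos, tensors):
    # Convert (info, tensor) pairs into the list-of-dicts schema that
    # _load_inputs expects: keys "kind", "name", and "data".
    return [
        {"kind": getattr(info, "kind", "input"), "name": info.name, "data": t}
        for info, t in zip(param_infos, tensors)
    ]


records = to_cache_records([SimpleNamespace(name="x")], [42])
assert records == [{"kind": "input", "name": "x", "data": 42}]
```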
🧹 Nitpick comments (2)
python/pypto/ir/compile.py (1)
41-41: Move inline `# noqa` suppressions to Ruff config.

Please avoid inline suppressions in `python/pypto/` and add targeted per-file ignores in `ruff.toml` instead (with rationale), for both `PLR0913` and `PLC0415`.

Based on learnings: In hw-native-sys/pypto, prefer fixing root causes in linter configuration (per-file-ignores in `ruff.toml`) rather than inline `# noqa` under `python/pypto`.

Also applies to: 175-175
python/pypto/ir/compiled_program.py (1)
277-277: Use Ruff per-file ignore instead of inline `# noqa`.

Please move this suppression to `ruff.toml` per-file-ignores (with rationale) instead of inline `# noqa`.

Based on learnings: In hw-native-sys/pypto, Python files under `python/pypto` should prefer `ruff.toml` per-file-ignores over inline suppression comments.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ccf14cca-eec9-4f93-b821-10ef1a2553f4
📒 Files selected for processing (6)
- python/pypto/ir/__init__.py
- python/pypto/ir/compile.py
- python/pypto/ir/compiled_program.py
- python/pypto/runtime/__init__.py
- python/pypto/runtime/runner.py
- tests/ut/ir/test_compiled_program.py
✅ Files skipped from review due to trivial changes (1)
- python/pypto/runtime/__init__.py
🚧 Files skipped from review as they are similar to previous changes (1)
- python/pypto/ir/__init__.py
Force-pushed: 04f4cff to 097696b
Make ir.compile() return a CompiledProgram object that can be called with torch tensors (Triton-like API), while maintaining backward compatibility via __str__ and __fspath__.

Two calling styles:

    compiled(a, b, c)   # in-place: c modified on device
    c = compiled(a, b)  # return: output allocated and returned

Fixes hw-native-sys#958
Force-pushed: 097696b to 99d7f32
Closing in favor of #1055.
Summary
- `ir.compile()` returns a `CompiledProgram` object that is callable with torch tensors (Triton-like API)
- Backward compatible via `__str__` and `__fspath__` — existing code using `ir.compile()` as a path string continues to work
- New `execute_compiled()` runtime helper for executing pre-compiled programs with user-provided tensors

API
Key design decisions

- Uses orchestration metadata (`param_directions`, `return_types`) to detect outputs vs inputs automatically
- Lazy import inside `__call__` to avoid circular dependency between the `ir` and `runtime` modules

Fixes #958
Test plan

- New unit tests covering the `ir.compile()` return type

🤖 Generated with Claude Code