Skip to content

fix(discovery): single-pass scan prevents duplicate resource discovery#271

Closed
deanq wants to merge 3 commits intomainfrom
deanq/ae-2456-fix-discovery-of-resources-is-multiplied
Closed

fix(discovery): single-pass scan prevents duplicate resource discovery#271
deanq wants to merge 3 commits intomainfrom
deanq/ae-2456-fix-discovery-of-resources-is-multiplied

Conversation

@deanq
Copy link
Member

@deanq deanq commented Mar 12, 2026

Summary

  • _discover_resources created a new ResourceDiscovery per .py file, each with its own cache. Files without @remote fell back to directory scanning, rediscovering the same resource N times. Replaced with discover_directory classmethod that scans all files in a single pass with resource_id deduplication.
  • Extracted _SKIP_DIRS constant shared by discover_directory and _scan_project_directory.
  • Promoted LB endpoint "Deployed" log message from DEBUG to INFO.
  • Updated uv.lock for v1.9.1.

Test plan

  • Added 12 new tests for discover_directory (dedup, skip dirs, nested dirs, syntax errors, regression test for the original 3x bug)
  • All 2348 tests pass
  • 85.55% coverage (threshold: 65%)
  • Verified manually with autoresearch example project: 1 resource discovered instead of 3

@deanq deanq requested a review from Copilot March 12, 2026 23:54
@deanq deanq changed the title fix(discovery): single-pass scan prevents duplicate resources fix(discovery): single-pass scan prevents duplicate resource discovery Mar 12, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes duplicate resource discovery during flash run by introducing a single-pass directory scanner that deduplicates resources by resource_id, reducing redundant scanning/imports when iterating over many project files.

Changes:

  • Added ResourceDiscovery.discover_directory(project_root) to scan all *.py files once and dedupe discovered resources.
  • Centralized ignored directories into a shared _SKIP_DIRS constant and reused it in directory scanning logic.
  • Increased the LB endpoint “Deployed” log message verbosity from DEBUG to INFO, and updated uv.lock for runpod-flash v1.9.1.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/runpod_flash/core/discovery.py Adds single-pass discover_directory and shared _SKIP_DIRS; updates directory-scan skip logic.
src/runpod_flash/cli/commands/run.py Switches CLI resource discovery to discover_directory to avoid repeated fallbacks.
tests/unit/test_discovery.py Adds unit tests covering discover_directory behavior (dedup, skip dirs, nested, syntax errors, regression).
src/runpod_flash/core/resources/load_balancer_sls_resource.py Promotes “Deployed” log line to INFO.
uv.lock Bumps runpod-flash to 1.9.1 in the lockfile.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Contributor

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. Fix is correct — root cause properly addressed

The original bug: _discover_resources created a new ResourceDiscovery instance per file. Each instance had its own cache and no cross-file state, so files without @remote triggered the _scan_project_directory fallback — which scanned all files again. With 3 files and 1 resource, the fallback fired twice and the resource appeared 3 times.

The fix — a single discover_directory pass with seen_ids deduplication keyed on resource_id — is the right approach. resource_id is an MD5 of resource_type + full config JSON, so two configs with the same name but different fields produce different IDs and are correctly kept separate.


2. Bug: file_path.parts uses absolute path — projects under skip-named dirs break

Both the new discover_directory and the updated _scan_project_directory use:

if not any(skip in p.parts for skip in _SKIP_DIRS)

p is an absolute Path from rglob, so p.parts includes every component of the machine path. The original _scan_project_directory used file_path.relative_to(project_root) first.

If a project lives at /home/user/build/myapp/ or /workspace/dist/my-service/, every Python file's absolute parts will contain "build" or "dist" — and all files will be silently skipped. flash run would report 0 resources with no error.

The test suite doesn't catch this because tmp_path is under /tmp/pytest-xxx/ which contains no skip names. The test_all_skip_dirs_are_excluded test only verifies that skip dirs inside the project root work.

Fix: use p.relative_to(project_root).parts instead of p.parts in both places:

# discover_directory
if not any(skip in p.relative_to(project_root).parts for skip in _SKIP_DIRS)

# _scan_project_directory (same change)
if any(skip in file_path.relative_to(project_root).parts for skip in _SKIP_DIRS)

Worth adding a test: project rooted at a tmp_path whose own absolute path contains a skip dir component (e.g., create a subdirectory named build as the project root and verify resources inside it are found).


3. _import_module accumulates in sys.modules without cleanup

Each discovered file is imported via importlib.util.spec_from_file_location and registered in sys.modules under its dotted name. Nothing removes it after discovery. For a flash run dev server that re-runs discovery on file change (via _watch_and_regenerate), modules accumulate in sys.modules on every hot-reload cycle. Pre-existing behavior in the old code, but discover_directory now imports every project file (not just those that import-chain from the entry point), so the growth per cycle is larger.

Not a blocker, but worth tracking as a known leak.


4. _path_to_module_name doesn't handle __init__.py

pkg/__init__.py produces module name pkg.__init__ from ".".join(rel.with_suffix("").parts). This is syntactically unusual and could conflict with the real pkg package if it's already in sys.modules. In practice, Flash projects with @remote in __init__.py are uncommon, but it's an unhandled edge case.


5. Tests are solid overall

The 12 new tests cover the important cases: deduplication, skip dirs (parameterized across all _SKIP_DIRS), nested dirs, syntax errors, regression for the 3x bug, duplicate stems. The regression test is particularly good — it directly reproduces the original failure scenario.

Two gaps:

  • No test for the absolute-path issue described in §2 above
  • No test for __init__.py files in packages

Nits

  • _scan_project_directory retains project_root = self.entry_point.parent and is now using file_path.parts — if entry_point.parent is inside a skip-named directory, same problem applies. Same fix.
  • The quick-skip heuristic if "@remote" not in content and "Endpoint(" not in content would produce false positives for files with these strings in comments or docstrings — harmless (extra import attempt, no incorrect results), but worth noting.

Verdict

Correct fix for AE-2456. The resource_id-based deduplication approach is solid. One genuine bug introduced by the refactor: switching from relative_to(project_root) to file_path.parts for the skip-dir check means projects living under a directory named build, dist, or similar will silently discover zero resources. Should be fixed before merge; the change is a one-liner in two places.

@KAJdev
Copy link
Contributor

KAJdev commented Mar 13, 2026

shouldn't we prioritize removing AST scanning to fix these issues?

@KAJdev
Copy link
Contributor

KAJdev commented Mar 13, 2026

#265

deanq added 3 commits March 17, 2026 12:28
_discover_resources created a new ResourceDiscovery per .py file, each
with its own cache. Files without @Remote fell back to directory scanning,
rediscovering the same resource. Replaced with discover_directory classmethod
that scans all files once with resource_id deduplication.

Extracted _SKIP_DIRS constant shared by discover_directory and
_scan_project_directory.
- _find_resource_config_vars accepts optional source param so
  discover_directory passes already-read content instead of re-reading.
- _import_module accepts optional module_name; discover_directory passes
  a path-derived dotted name to prevent sys.modules collisions between
  files with the same stem (e.g. pkg_a/worker.py vs pkg_b/worker.py).
- Added _path_to_module_name helper and test for duplicate-stem case.
@deanq deanq force-pushed the deanq/ae-2456-fix-discovery-of-resources-is-multiplied branch from 17cedc3 to 6428332 Compare March 17, 2026 20:00
@deanq
Copy link
Member Author

deanq commented Mar 18, 2026

#265 will address these

@deanq deanq closed this Mar 18, 2026
Copy link
Contributor

@runpod-Henrik runpod-Henrik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up on prior review

Good additions in the third commit — _path_to_module_name prevents sys.modules collisions between files with the same stem, and the source parameter eliminates the double file read. Both were flagged in the prior review.

The absolute-path skip-dir check (item 2 from the prior review) is still unresolved:

# discover_directory — still uses p.parts (absolute path)
if not any(skip in p.parts for skip in _SKIP_DIRS)

p is an absolute path from rglob, so p.parts includes every component of the machine path. A project rooted at /home/user/build/myapp/ has "build" in every file's parts — all files are silently skipped and flash run reports zero resources with no error. Same issue exists in _scan_project_directory.

The fix is a one-liner in both places:

if not any(skip in p.relative_to(project_root).parts for skip in _SKIP_DIRS)

This was the only blocking item from the prior review. Everything else is PASS.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants