Skip to content

PERF: Lazy-load CLI subcommands to fix 70s+ startup on Windows#1241

Open
Luay-Sol wants to merge 4 commits intoSolaceLabs:mainfrom
Luay-Sol:perf/lazy-cli-imports
Open

PERF: Lazy-load CLI subcommands to fix 70s+ startup on Windows#1241
Luay-Sol wants to merge 4 commits intoSolaceLabs:mainfrom
Luay-Sol:perf/lazy-cli-imports

Conversation

@Luay-Sol
Copy link
Copy Markdown

Problem

On air-gapped Windows Server 2022 environments, every SAM CLI command takes 75–80 seconds to return — including sam --version and sam --help.

Before (Windows Server 2022, air-gapped, Python 3.13):

sam --version       76.11s
sam --help          76.52s
sam run --help      80.89s
sam add --help      77.16s
sam init --help     75.46s
sam plugin --help   77.52s
sam tools --help    76.37s

Root cause

cli/main.py eagerly imports all 8 subcommand modules at startup. This triggers a transitive import chain that loads heavyweight ML/cloud packages regardless of which command the user actually runs:

main.py
  → from cli.commands.init_cmd import init
    → init_cmd/__init__.py imports all step modules
      → broker_step.py → config_portal.backend.common
      → web_init_step → config_portal.backend.server
        → import litellm                              (~5s)
        → from solace_agent_mesh.agent.tools.registry
          → agent/tools/__init__.py imports ALL tool modules
            → general_agent_tools → markitdown → magika → onnxruntime (~2s)
            → (transitively) google.adk → google.cloud.aiplatform   (~62s)
Package Windows import time
google.adk (→ google.cloud.aiplatform) 61.83s
litellm 4.90s
markitdown (→ magikaonnxruntime) 1.83s

Even sam --version pays this ~76s cost because all imports run at module load time, before Click parses the --version flag.

Solution

Introduce LazyGroup, a drop-in click.Group subclass that defers subcommand imports until the specific command is actually invoked:

  • sam --version / sam --helpzero subcommand imports
  • sam run --help → only imports run_cmd
  • sam init --help → shows cached help text, no heavy import
  • sam init (actual invocation) → imports init_cmd on demand

The same pattern is applied to nested groups (sam add, sam tools) so their --help also avoids heavyweight imports.

After (same Windows Server 2022 environment):

sam --version       0.27s   (282x faster)
sam --help          0.23s   (333x faster)
sam run --help      1.29s   ( 63x faster)
sam add --help      0.24s   (321x faster)
sam init --help     0.25s   (302x faster)
sam plugin --help   1.05s   ( 74x faster)
sam tools --help    0.24s   (318x faster)

Files changed

File Change
cli/lazy_group.py New — Reusable LazyGroup class with format_commands override
cli/main.py Replace eager from ... import + add_command() with LazyGroup
cli/commands/add_cmd/__init__.py Replace eager sub-imports with LazyGroup for agent/gateway/proxy
cli/commands/tools_cmd.py Replace @tools.command("list") registration with LazyGroup for list

How LazyGroup works

from cli.lazy_group import LazyGroup

_COMMANDS = {
    "run":  ("cli.commands.run_cmd:run",   "Run the application."),
    "init": ("cli.commands.init_cmd:init", "Initialize a project."),
}

@click.group(cls=LazyGroup, lazy_commands=_COMMANDS)
def cli():
    pass
  • list_commands() returns command names from the lazy_commands dict — no imports
  • format_commands() renders help text from the dict — no imports
  • get_command() calls importlib.import_module() only when the command is invoked, then caches it

Backward compatibility

  • All commands work identically when actually invoked
  • sam tools list, sam init --gui, sam add agent --gui all function as before
  • The heavy imports still happen — just deferred to when the feature is used
  • No changes to any command's interface, options, or behavior
  • LazyGroup is a strict superset of click.Group — passes through to super() for any non-lazy commands

Testing

  • All commands verified on macOS (Python 3.11) and Windows Server 2022 (Python 3.13)
  • sam tools list correctly loads the tool registry and displays all tools
  • sam init --gui correctly launches the web portal
  • sam add agent --help shows all options
  • sam run starts the application normally
  • No regressions in existing behavior

Copy link
Copy Markdown
Collaborator

@efunneko efunneko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: PERF: Lazy-load CLI subcommands to fix 70s+ startup on Windows

Summary

Introduces a LazyGroup click.Group subclass that defers subcommand imports until the specific command is invoked. This eliminates the ~76s startup penalty on air-gapped Windows environments caused by transitive imports of heavyweight packages (google.adk, litellm, markitdown/onnxruntime). Applied to the main CLI group, add subgroup, and tools subgroup. The init_cmd module also uses __getattr__ for lazy module-level attribute loading.

Architecture

Well-designed. The LazyGroup class is a clean, reusable abstraction that slots naturally into Click's extension points. Key positives:

  • Uses Click's own cls= parameter for groups — zero monkey-patching
  • Properly delegates to super() for non-lazy commands, making it a true superset of click.Group
  • The __getattr__ approach in cli/commands/init_cmd/__init__.py correctly handles the constraint that tests mock attributes on this module (e.g., mocker.patch("cli.commands.init_cmd.broker_setup_step")) — these trigger __getattr___load_all() and work transparently

Minor gap: plugin_cmd and task_cmd still use eager imports. These are lighter-weight than init_cmd but could be candidates for a follow-up if needed.

Code Quality

Generally clean. A few observations:

  1. init_cmd/__init__.py:136-155run_init_flow manually rebinds every function from mod.__dict__ into local variables. This is verbose and fragile — if a new step is added, two places need updating (the _load_all() bindings AND the local rebindings in run_init_flow). A simpler approach would be to just call _load_all() and use module-level names directly since __getattr__ handles first-access.

  2. lazy_group.py:99 — The format_commands truncation logic manually reimplements Click's help text truncation. This works but could diverge from Click's internal formatting if Click changes. Low risk but worth noting.

  3. lazy_group.py:121 — The broad except Exception is acceptable since all import paths are hardcoded, but the return None means a broken module path silently degrades to "no such command" for the user while only logging to debug. Consider whether a user-visible error message would be more helpful (e.g., "Command 'init' failed to load — see logs").

  4. init_cmd/__init__.py:66-133_build_defaults() re-imports the same modules that _load_all() already imported. This is harmless (Python caches modules) but adds unnecessary duplication. Since _build_defaults is only called from within _load_all, it could just use the already-imported names.

Security

No security concerns. All import paths are hardcoded string literals in source code. There is no mechanism for user input to influence module loading paths. The importlib.import_module() usage is safe given the hardcoded inputs.

Test Coverage

Adequate but incomplete:

  • test_main.py was updated to use list_commands() via Context instead of directly accessing cli.commands dict — this correctly adapts to LazyGroup
  • Existing tests/unit/cli/commands/init_cmd/test___init__.py tests use mocker.patch("cli.commands.init_cmd.broker_setup_step") which works with __getattr__
  • Missing: No unit tests for LazyGroup itself. Key behaviors that should be tested:
    • list_commands() returns lazy + eager commands sorted
    • get_command() defers import until called
    • get_command() caches resolved commands
    • format_commands() uses help text from dict without importing
    • _resolve_import() handles invalid paths gracefully
    • _resolve_import() handles missing attributes gracefully
  • Missing: No test verifying that sam --help does NOT trigger subcommand imports (the core value proposition)

Recommendations

  1. [Blocking] Add unit tests for LazyGroup — This is the core new abstraction and should have its own test suite covering the behaviors listed above.

  2. Simplify run_init_flow local rebindings — The manual mod.xxx → local variable pattern at lines 142-155 is fragile. Just use module-level names after _load_all().

  3. Deduplicate imports in _build_defaults — Since it's called from _load_all(), pass the already-imported names or use module globals.

  4. Consider user-facing error on import failure — A command silently disappearing is confusing. At minimum, get_command could click.echo a warning.

Verdict

Request Changes — The implementation is architecturally sound and the performance improvement is significant. The main gap is the lack of unit tests for LazyGroup itself, which is the key new abstraction. The code quality items are suggestions, not blockers. Adding LazyGroup tests would make this ready to approve.

🤖 Generated with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants