Skip to content

fix(viscy-utils): fall back to TensorBoard logger when wandb is absent#472

Merged
ieivanov merged 4 commits into
mainfrom
fix/cli-default-logger-fallback-tensorboard
Jun 29, 2026
Merged

fix(viscy-utils): fall back to TensorBoard logger when wandb is absent#472
ieivanov merged 4 commits into
mainfrom
fix/cli-default-logger-fallback-tensorboard

Conversation

@ieivanov

Copy link
Copy Markdown
Contributor

Summary

VisCyCLI.add_arguments_to_parser unconditionally set the default trainer.logger to a WandbLogger. But wandb is only an optional dependency (the eval extra / test group), not a core one. So on a base install, every trainer-building subcommand — predict, preprocess, export, precompute, convert_to_anndata — crashed during class instantiation with:

Problem with given class_path 'lightning.pytorch.loggers.WandbLogger':
    Requirement 'wandb>=0.12.10' not met.

Fix

Pick the default logger at runtime:

  • WandbLogger when wandb is importable (unchanged behavior for users with the eval extra),
  • otherwise TensorBoardLogger (tensorboard is already a core dependency), so the CLI works on a base install.

before_instantiate_classes_configure_wandb_logger already no-ops when the resolved logger isn't a WandbLogger (it checks class_path), so the TensorBoard fallback is safe.

Tests

Added unit tests for the new _default_logger helper (monkeypatching importlib.util.find_spec) covering both branches. Full test_cli.py passes; ruff check/ruff format clean.

Closes #471

VisCyCLI.add_arguments_to_parser unconditionally set the default
trainer.logger to a WandbLogger. wandb is only an optional dependency
(the `eval` extra / `test` group), not a core one, so on a base install
every trainer-building subcommand (predict, preprocess, export,
precompute, convert_to_anndata) crashed during class instantiation with
"Requirement 'wandb>=0.12.10' not met".

Pick the default logger at runtime: WandbLogger when wandb is importable,
otherwise TensorBoardLogger (tensorboard is a core dependency), so the CLI
is usable without the eval extra. before_instantiate_classes ->
_configure_wandb_logger already no-ops for non-W&B loggers, so the
TensorBoard fallback is safe.

Fixes #471

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ieivanov ieivanov requested review from edyoshikun and srivarra June 17, 2026 19:07
_WANDB_RUN_TIMESTAMP_FORMAT = r"%Y%m%d-%H%M%S"


def _default_logger():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this fix is a bit dangerous because it won't fail. Intestead it will try to find the second option. I am going to fix this by just making wandb a dependency

@edyoshikun edyoshikun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the changes. review and test @ieivanov if this works for your issue.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is intended to address viscy-utils CLI crashes on base installs caused by defaulting trainer.logger to WandbLogger while wandb is not installed. The current change set resolves the crash by making wandb a core dependency (rather than selecting a non-wandb default logger at runtime as described in the PR summary).

Changes:

  • Add wandb to packages/viscy-utils core dependencies.
  • Remove wandb from optional-dependencies.eval.
  • Remove wandb from the test dependency group.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/viscy-utils/pyproject.toml
Comment thread packages/viscy-utils/pyproject.toml Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@ieivanov

Copy link
Copy Markdown
Contributor Author

Yup, that works too, thank you!

torch.multiprocessing fork() of the multi-threaded pytest parent aborts
on macOS once the Metal/MPS Objective-C runtime is initialized:

  objc[...]: +[MPSGraphObject initialize] may have been in progress in
  another thread when fork() was called. ... Crashing instead.

This is a hard SIGABRT/SIGSEGV in the fork child, not a flake — it
reproduces deterministically. The tests already skip on Windows (no
fork start_method); mirror that with a Darwin skip. The Linux CI runner
still exercises these real-fork DDP paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ieivanov

Copy link
Copy Markdown
Contributor Author

Why the macOS DDP tests are now skipped

The macOS (and fail-fast-canceled Windows) Test Data jobs were failing on 3 parametrizations of test_combined_ddp.py::test_batched_concat_real_ddp_iter_does_not_hang with SIGSEGV/SIGABRT in the torch.multiprocessing fork child.

The real cause (reproduced deterministically on macOS):

objc[...]: +[MPSGraphObject initialize] may have been in progress in another thread
when fork() was called. We cannot safely call it or ignore it in the fork() child
process. Crashing instead.

This is the macOS Objective-C / Metal (MPS) fork-safety abort. The test intentionally uses start_method="fork" (spawn can't re-resolve modules under pytest's --import-mode=importlib), and forks a multi-threaded parent. Once torch initializes the MPS runtime, fork() of that process is fatal on Darwin — it's a hard crash, not a flake.

This isn't caused by this PR's logic. The PR only moves wandb into core deps and regenerates uv.lock (torch stays at 2.12.0); main passed these tests by timing luck, and the dependency reshuffle tipped MPS init across the fork boundary so it now crashes deterministically.

Fix: the test already skips on Windows ("fork" not in mp.get_all_start_methods()); I mirrored that with a sys.platform == "darwin" skip. These real-fork DDP integration tests now run only on the Linux CI runner, which passes them and is where actual DDP training happens.

@edyoshikun edyoshikun left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the changes. review and test @ieivanov if this works for your issue.

@ieivanov ieivanov merged commit 4b62365 into main Jun 29, 2026
28 checks passed
@ieivanov ieivanov deleted the fix/cli-default-logger-fallback-tensorboard branch June 29, 2026 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

viscy CLI crashes on base install: default trainer.logger requires optional wandb

4 participants