Merge recent updates from dev branch#79
Conversation
|
Warning Review limit reached
More reviews will be available in 58 minutes and 10 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR integrates wall-clock checkpointing/resume for Sampler training-loop boundaries, introduces empirical per-dimension step-size adaptation across MALA/HMC/GRW kernels and bundles, restructures documentation with a new architecture guide and consolidated guides index, modernises API stub generation and build tooling, and expands hyperparameter documentation. ChangesSampler checkpointing and resume support
Per-dimension step-size adaptation strategy and kernel integration
Documentation restructuring, API generation, and content updates
Build, deployment, and configuration updates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…API stub generation logic
…in TOML Co-authored-by: Copilot <copilot@github.com>
…ve deprecated parameters
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (2)
src/flowMC/Sampler.py (2)
295-305: ⚡ Quick winBroad exception handler might hide serialization issues.
The fallback from pickle to equinox serialization catches all exceptions (line 302). This could silently hide:
- Out-of-memory errors during large array serialization
- File system errors (disk full, permissions)
- Bugs in custom
__getstate__/__setstate__implementationsWhen the fallback succeeds, the user never learns that pickle failed, making it harder to diagnose issues or optimize serialization.
🔍 Suggested improvement
Log the exception before falling back:
try: pickle.dumps(v, protocol=pickle.HIGHEST_PROTOCOL) resources_to_save[k] = ("pkl", v) - except Exception: + except Exception as e: + logger.debug( + "Resource '%s' not pickle-serializable (%s), using equinox fallback.", + k, type(e).__name__ + ) buf = io.BytesIO() eqx.tree_serialise_leaves(buf, v) resources_to_save[k] = ("eqx", buf.getvalue())🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/flowMC/Sampler.py` around lines 295 - 305, The broad except in the serialization loop over self.resources (inside resources_to_save creation) should record the pickle failure before falling back to eqx.tree_serialise_leaves: change "except Exception:" to "except Exception as e:" and log the exception (including the resource key k, the fact pickle failed, and the exception/traceback) using the module/class logger (or logging.getLogger(__name__)), then proceed to the existing fallback that uses eqx.tree_serialise_leaves; reference the variables/functions resources_to_save, self.resources, pickle.dumps and eqx.tree_serialise_leaves when making this change.
233-246: 💤 Low valueSilent resource restoration failure could hide issues.
When a resource cannot be restored (line 244), only a warning is logged. The sampling will continue with a partially initialized resource, which might cause:
- Incorrect results if the resource contained critical state
- Cryptic downstream failures when strategies access the unrestored resource
Consider logging at
WARNINGlevel with more context:else: - logger.warning("Cannot restore resource '%s' — skipping.", k) + logger.warning( + "Cannot restore resource '%s' (method=%s) — using freshly initialized resource. " + "This may affect sampling correctness if the resource contained important state.", + k, method + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/flowMC/Sampler.py` around lines 233 - 246, The current silent fallback when a resource cannot be restored leaves a partially initialized entry; replace the bare logger.warning in the else branch with a richer log and remove the bad entry from self.resources to avoid partially initialized state: change the else branch that currently does logger.warning("Cannot restore resource '%s' — skipping.", k) to log the resource key, the restoration method/value (e.g. include method and payload size/info) using logger.warning (or logger.error if you prefer stronger signal), and then ensure you delete or pop the key from self.resources (e.g. del self.resources[k] or self.resources.pop(k, None)) so downstream code doesn't see a stale placeholder; reference symbols: ckpt["resources"], self.resources, method, payload, logger, and eqx.tree_deserialise_leaves.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/guides/architecture.md`:
- Line 75: Update the hyperlink in the sentence "You can find the
hyperparameters of a resource, a strategy, or a resource-strategy bundle in the
[API docs](../api/flowMC/Sampler.md)." to point to the dedicated hyperparameter
reference guide (not the Sampler API page); replace the link target
../api/flowMC/Sampler.md with the canonical hyperparameter reference page (e.g.,
../guides/hyperparameter-reference.md or the project's official hyperparameters
guide) so the sentence correctly directs readers to the resource/strategy/bundle
hyperparameter documentation.
In `@src/flowMC/resource/kernel/Gaussian_random_walk.py`:
- Around line 117-124: The jaxtyping shape annotation strings in this file have
leading spaces (e.g. " n_dim", " n_dim 2") which causes Ruff F722; update all
occurrences to remove the leading space so shapes are valid strings (e.g.
"n_dim", "n_dim 2" → "n_dim", "n_dim 2" if the latter was intended as "n_dim 2"
vs probably "n_dim, 2"—confirm intended shape). In practice edit signatures and
type annotations for functions and attributes such as get_effective_dim_profile,
apply_per_dim_scaling, and any annotations on step_size, periodic_mask,
periodic_bounds, kernel parameter and return type hints to use shape strings
without a leading space (e.g. replace Float[Array, " n_dim"] with Float[Array,
"n_dim"] across the file).
In `@src/flowMC/resource/kernel/HMC.py`:
- Around line 214-227: Update the jaxtyping forward-reference shape strings to
remove the leading space so Ruff F722 is resolved: replace occurrences of the
shape annotation '" n_dim"' with '"n_dim"' in get_effective_dim_profile and
apply_per_dim_scaling (and any other similar annotations in this file) so the
forward annotations use exact shape identifiers (e.g., change Float[Array, "
n_dim"] -> Float[Array, "n_dim"]).
- Around line 220-224: The code crashes when calling jnp.finfo on an integer
condition_matrix and has invalid forward-type annotations; fix
get_effective_dim_profile and apply_per_dim_scaling by first ensuring
condition_matrix is a floating array (e.g., condition_matrix =
jnp.asarray(self.condition_matrix, dtype=jnp.float32) or using jnp.result_type
to pick a float) before calling jnp.finfo/eps and before any arithmetic, and
change the jaxtyping/forward annotations in get_effective_dim_profile and
apply_per_dim_scaling to valid Python strings (or enable from __future__ import
annotations) to avoid Ruff F722 syntax errors; update RQSpline_HMC /
RQSpline_HMC_PT construction to pass a float-typed condition_matrix (or ensure
jnp.full(..., dtype=jnp.float32)) so types remain consistent.
In `@src/flowMC/Sampler.py`:
- Around line 213-231: The fingerprint comparison in the resume check uses a
hardcoded absolute tolerance of 1e-6 (variables: ckpt_fp, current_fp,
logpdf_resource.log_pdf) which can produce false positives across platforms;
change this to use a configurable tolerance (e.g., add a constructor/initializer
parameter checkpoint_fingerprint_rtol or checkpoint_fingerprint_atol) and
replace the direct abs(current_fp - ckpt_fp) > 1e-6 test with a robust
comparison (e.g., math.isclose or an equivalent relative+absolute tolerance
check using the new parameter(s)) so callers can relax the threshold for
cross-platform resumes.
- Around line 103-107: The Sampler.__init__ block unconditionally calls
jax.config.update for jax_compilation_cache_dir and
jax_persistent_cache_min_compile_time_secs, which mutates global JAX state and
causes cache collisions when multiple Sampler instances use different outdir
values; change this to first read jax.config.read("jax_compilation_cache_dir")
(or equivalent) and if it is unset, set it to Path(outdir)/"jax_cache",
otherwise if it differs emit a runtime warning (or raise) informing the user
that JAX cache is process-global and only one Sampler should enable
checkpointing per process; update the docstring for Sampler and the
checkpoint_interval handling to reflect this behavior and reference the affected
symbols (Sampler.__init__, checkpoint_interval, outdir, jax.config.update,
jax_compilation_cache_dir).
In `@tests/unit/test_strategies.py`:
- Around line 1030-1037: The test reads the mutated resources after the second
adaptation call causing a false-positive no-op; capture the first-call step size
before calling strategy a second time by reading
resources["local_sampler"].step_size into step1 immediately after the first
strategy(self.key, resources, pos, {}) call (before computing resources2), then
call strategy again to get resources2 and read step2 from
resources2["local_sampler"].step_size; update the assertions to compare
initial_step vs step1 and step1 vs step2 as shown using jnp.allclose with the
existing tolerances so the test truly verifies that the second call is a near
no-op while the first call changed the step size.
---
Nitpick comments:
In `@src/flowMC/Sampler.py`:
- Around line 295-305: The broad except in the serialization loop over
self.resources (inside resources_to_save creation) should record the pickle
failure before falling back to eqx.tree_serialise_leaves: change "except
Exception:" to "except Exception as e:" and log the exception (including the
resource key k, the fact pickle failed, and the exception/traceback) using the
module/class logger (or logging.getLogger(__name__)), then proceed to the
existing fallback that uses eqx.tree_serialise_leaves; reference the
variables/functions resources_to_save, self.resources, pickle.dumps and
eqx.tree_serialise_leaves when making this change.
- Around line 233-246: The current silent fallback when a resource cannot be
restored leaves a partially initialized entry; replace the bare logger.warning
in the else branch with a richer log and remove the bad entry from
self.resources to avoid partially initialized state: change the else branch that
currently does logger.warning("Cannot restore resource '%s' — skipping.", k) to
log the resource key, the restoration method/value (e.g. include method and
payload size/info) using logger.warning (or logger.error if you prefer stronger
signal), and then ensure you delete or pop the key from self.resources (e.g. del
self.resources[k] or self.resources.pop(k, None)) so downstream code doesn't see
a stale placeholder; reference symbols: ckpt["resources"], self.resources,
method, payload, logger, and eqx.tree_deserialise_leaves.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc34c21a-0d7c-4e7b-9ee3-a6c07736d824
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (43)
.github/workflows/docs.yml.gitignore.pre-commit-config.yamlCONTRIBUTING.mdREADME.mddocs/FAQ.mddocs/development_guide.mddocs/gen_api.pydocs/guides/architecture.mddocs/guides/bundles.mddocs/guides/hyperparameters.mddocs/guides/index.mddocs/index.mddocs/javascripts/mathjax.jsdocs/quickstart.mddocs/tutorials/dualmoon.ipynbdocs/tutorials/index.mddocs/tutorials/train_normalizing_flow.ipynbpyproject.tomlsrc/flowMC/Sampler.pysrc/flowMC/resource/buffers.pysrc/flowMC/resource/kernel/Gaussian_random_walk.pysrc/flowMC/resource/kernel/HMC.pysrc/flowMC/resource/kernel/MALA.pysrc/flowMC/resource/kernel/NF_proposal.pysrc/flowMC/resource/logPDF.pysrc/flowMC/resource/model/common.pysrc/flowMC/resource_strategy_bundle/RQSpline_GRW.pysrc/flowMC/resource_strategy_bundle/RQSpline_GRW_PT.pysrc/flowMC/resource_strategy_bundle/RQSpline_HMC.pysrc/flowMC/resource_strategy_bundle/RQSpline_HMC_PT.pysrc/flowMC/resource_strategy_bundle/RQSpline_MALA.pysrc/flowMC/resource_strategy_bundle/RQSpline_MALA_PT.pysrc/flowMC/strategy/adapt_step_size.pysrc/flowMC/strategy/optimization.pysrc/flowMC/strategy/parallel_tempering.pysrc/flowMC/strategy/take_steps.pysrc/flowMC/strategy/train_model.pytests/integration/test_periodic.pytests/integration/test_quickstart.pytests/unit/test_sampler.pytests/unit/test_strategies.pyzensical.toml
💤 Files with no reviewable changes (3)
- docs/development_guide.md
- .gitignore
- docs/tutorials/index.md
👮 Files not reviewed due to content moderation or server errors (2)
- docs/tutorials/train_normalizing_flow.ipynb
- docs/tutorials/dualmoon.ipynb
Summary by CodeRabbit
Release Notes
New Features
Documentation
Tests
Chores