Skip to content

Regenerate single-file output when its source grows; fix success-message accuracy (#221 follow-up)#253

Open
cboos wants to merge 3 commits into
mainfrom
dev/single-file-staleness
Open

Regenerate single-file output when its source grows; fix success-message accuracy (#221 follow-up)#253
cboos wants to merge 3 commits into
mainfrom
dev/single-file-staleness

Conversation

@cboos

@cboos cboos commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Summary

Follow-up to #221/#237. Two user-visible staleness/messaging bugs in the
default single-file conversion path, plus message-accuracy fixes surfaced
while reviewing the change.

Bug 1 — stale HTML on a grown single-file source

claude-code-log <file>.jsonl [--detail low] writes <file>[.low].html, then
re-runs were skipped purely on the embedded tool version
(renderer.is_outdated), which ignores source freshness. So an in-progress
session that grew between runs was wrongly reported "current" and served stale
HTML. #237 only forced regeneration for the explicit -o file case.

Fix: add a source-mtime term to the single-file/no-cache branch —
regenerate when the source file is newer than the existing output. Scoped to a
file source (a directory source has no single mtime and is handled by the
cache path). The version-marker check is kept for tool-version bumps.

Bug 2 — misleading success message

On a skip, the CLI printed Successfully converted … on top of the
converter's … is current, skipping regeneration. The converter now reports
whether it actually (re)wrote output via a non-breaking optional regenerated
out-param (the Path return is unchanged — ~20 callers unaffected); the CLI
prints the success line only on a real regeneration. Applied symmetrically to
the directory branch (Successfully combined …).

Message-accuracy fix

The regeneration gate initially tracked only the combined write, so
--combined no (which writes only per-session files, never a combined) fell
completely silent despite producing every session file — and the per-session
count was dropped whenever the combined was current but a session was
rewritten. The gate now also counts _generate_individual_session_files's
return, i.e. "combined or any session file (re)written". Pure skips still
stay quiet.

Tests

  • TestSingleFileStaleness (test_output_explicit.py): grown source regenerates;
    unchanged source skips with no redundant success line; directory no-op skip
    suppresses the combined success line; --combined no confirms its
    per-session work.
  • Flipped test_single_file_mode_regeneration_behavior, which previously
    enshrined the stale behavior as "expected".

All four behavioral tests were verified RED on the prior code and GREEN after.

Not addressed here

--no-cache on a directory with a grown source can still serve stale HTML
(the source-mtime term is scoped to file sources). This is pre-existing, not
introduced here, and tracked separately.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved regeneration handling so the CLI no longer shows a success message when output is already current and nothing was rebuilt.
    • Default single-file output now refreshes correctly when the source changes.
    • Directory-based runs now suppress misleading success text during skip scenarios while still reporting when session files are generated.
  • Tests
    • Expanded coverage for stale vs. current outputs, single-file regeneration, directory skips, and combined-off session file generation.

cboos and others added 3 commits July 1, 2026 00:17
…essage (#221 follow-up)

The default single-file output path (no `-o`, detail-suffixed naming like
`<file>.low.html`) rode the version-only staleness skip: `is_outdated()`
compares only the embedded tool version, not the source's freshness. So a
single `.jsonl` that grew between runs (an in-progress session re-exported
with the same tool version) was wrongly reported "current" and served stale
HTML. #237 only forced regeneration for the explicit `-o file` case.

Bug 1: add a source-mtime term to the single-file `else` branch in
convert_jsonl_to — regenerate when the source file is newer than the existing
output. Scoped to a file source (a directory source has no single mtime and is
already handled by the cache / `cache_was_updated` path). The version-marker
check is kept for tool-version bumps.

Bug 2: on a skip the CLI printed "Successfully converted" on top of the
converter's own "is current, skipping regeneration" line. convert_jsonl_to now
reports whether it actually (re)wrote the output via a non-breaking optional
`regenerated` out-parameter (the `Path` return contract that ~20 callers rely
on is unchanged); the CLI prints the success line only on a real regeneration.

Tests: flip test_single_file_mode_regeneration_behavior (it enshrined the stale
behavior as "expected") to assert regeneration on a grown source; add
TestSingleFileStaleness covering the exact CLI repro — grown source regenerates,
unchanged source skips with no redundant success line. Both fail on the prior
code.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ty, mtime rationale

Non-blocking review follow-ups on the #221 single-file-staleness fix:

1. Directory-branch symmetry: the skip-suppression of the success line was
   gated only on the single-file branch, so a directory whose combined output
   is current (Phase-1b early-exit) still printed "Successfully combined N
   transcript files ... and generated M individual session files" on a pure
   skip. The `regenerated` out-param already carries the bool — extend the
   same guard to the directory branch so skip-messaging is consistent for
   single-file AND directory inputs.

2. Readability: `regenerated != [False]` -> `any(regenerated)`.

3. Mtime rationale: add a comment explaining why the *output* file's mtime is
   the correct, sufficient, persistence-free basis in single-file mode (no
   cache DB to store the source mtime as the directory path does), and that
   both share the same filesystem-granularity limit.

Test: add test_directory_skip_suppresses_success_line pinning note 1 (RED on
the prior file-only guard, which printed the misleading combined line on a
no-op directory re-run).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bined no silent run)

The skip-suppression added earlier gated the CLI success line on the
COMBINED-output write only (`did_regenerate`). But per-session generation is
an independent axis: `--combined no` writes only individual session files and
never a combined transcript, so `did_regenerate` stayed False and the run
printed NOTHING despite producing every session file — a regression vs. the
prior unconditional "Successfully combined N transcript files ... and
generated M individual session files" line. The same gap silently dropped the
session count whenever the combined was current but a session was (re)written.

`_generate_individual_session_files` already returns a regenerated count; it
was discarded. Capture it and OR it into the `regenerated` out-param so the
gate means "combined OR any session file (re)written". Pure skips (single-file
or a fully-current directory) still report False and stay quiet, preserving
the original double-message fix.

Surfaced by a self-review pass (/code-review) on the branch; not caught by the
diff-scoped manual review because it only manifests under --combined no /
session-only regeneration, which the review didn't execute.

Test: test_combined_no_confirms_session_generation — `--combined no` writes
session files AND confirms the work (RED before: the run printed only
Processing/Loading with no summary line).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

convert_jsonl_to() gains an optional regenerated out-parameter populated across its early-exit, paginated, non-paginated, and per-session paths. The CLI uses this to compute was_regenerated and suppress success messages when output was skipped. Tests are updated and extended to verify this behavior across single-file, directory, and combined-no scenarios.

Changes

Regeneration tracking and messaging

Layer / File(s) Summary
Converter regeneration tracking
claude_code_log/converter.py
convert_jsonl_to accepts a regenerated out-parameter, tracks did_regenerate/sessions_regenerated across cache-hit early exit, paginated and non-paginated generation (with new source_is_newer mtime check), and per-session file generation.
CLI success message gating
claude_code_log/cli.py
CLI passes a regenerated list to convert_jsonl_to, computes was_regenerated, and only prints success messages when regeneration actually occurred.
Regeneration behavior tests
test/test_html_regeneration.py, test/test_output_explicit.py
Existing single-file-mode test updated to expect regeneration on newer source instead of version-marker-based skip; new TestSingleFileStaleness class covers grown/unchanged source, directory rerun, and --combined no scenarios.

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant convert_jsonl_to
  participant FileSystem

  CLI->>convert_jsonl_to: convert_jsonl_to(input_path, output_path, regenerated=[])
  convert_jsonl_to->>FileSystem: compare output mtime vs input/version marker
  alt all current, no regeneration needed
    convert_jsonl_to->>convert_jsonl_to: regenerated.append(False)
    convert_jsonl_to-->>CLI: return output_path
    CLI->>CLI: was_regenerated = False, suppress success print
  else regeneration required
    convert_jsonl_to->>FileSystem: write combined HTML and/or session files
    convert_jsonl_to->>convert_jsonl_to: regenerated.append(True)
    convert_jsonl_to-->>CLI: return output_path
    CLI->>CLI: was_regenerated = True, print success message
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • daaain/claude-code-log#237: Both PRs modify convert_jsonl_to()'s regeneration/skip logic, directly overlapping in the converter's caching behavior.

Poem

A rabbit checked the mtime twice,
"Skip or write?" — such careful advice.
No more false cheers when nothing's new,
Just honest hops when work is true. 🐇
Thump thump — regenerated, verified!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main changes: single-file regeneration on source growth and corrected success-message behavior.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev/single-file-staleness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
claude_code_log/cli.py (1)

1343-1361: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Handle --combined no as a session-only success path.

When write_combined is false, convert_jsonl_to() deliberately returns output_path without writing that combined file. This branch is now reachable because was_regenerated becomes true from session-file writes, so the CLI reports Successfully combined ... to {output_path} and click.launch() can target a path that does not exist.

Suggested fix
-        if not was_regenerated:
-            pass
-        elif input_path.is_file():
+        if not was_regenerated:
+            pass
+        elif input_path.is_file():
             click.echo(f"Successfully converted {input_path} to {output_path}")
         else:
             jsonl_count = len(list(input_path.glob("*.jsonl")))
-            if write_individual:
+            if write_combined and write_individual:
                 ext = get_file_extension(output_format)
                 session_files = list(input_path.glob(f"session-*.{ext}"))
                 click.echo(
                     f"Successfully combined {jsonl_count} transcript files from {input_path} to {output_path} and generated {len(session_files)} individual session files"
                 )
-            else:
+            elif write_combined:
                 click.echo(
                     f"Successfully combined {jsonl_count} transcript files from {input_path} to {output_path}"
                 )
+            else:
+                ext = get_file_extension(output_format)
+                session_files = list(input_path.glob(f"session-*.{ext}"))
+                click.echo(
+                    f"Successfully generated {len(session_files)} individual session files from {jsonl_count} transcript files in {input_path}"
+                )
 
-        if open_browser:
+        if open_browser and write_combined:
             click.launch(str(output_path))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude_code_log/cli.py` around lines 1343 - 1361, The success handling in
cli.py’s convert_jsonl_to flow is treating session-only regeneration as a
combined-output success, which can report a nonexistent output_path and launch
it in the browser. Update the branch around was_regenerated,
input_path.is_file(), and write_individual so that when write_combined is false
the CLI uses a session-only success message and only calls click.launch on a
real existing file path, while preserving the combined-file messaging when
output_path was actually written.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/test_html_regeneration.py`:
- Around line 331-352: The regeneration assertion in this test is too dependent
on `time.sleep(0.1)` and can flake on coarse mtime filesystems; update the
`convert_jsonl_to_html` setup so the source file is explicitly made newer than
the HTML output using `os.utime(...)`, matching the approach already used in
`test_output_explicit.py`. Keep the checks around `skip_calls`,
`output_file.stat().st_mtime`, and the `"Single file mode test."` content, but
ensure the test advances the source timestamp in a filesystem-safe way before
calling `convert_jsonl_to_html`.

---

Outside diff comments:
In `@claude_code_log/cli.py`:
- Around line 1343-1361: The success handling in cli.py’s convert_jsonl_to flow
is treating session-only regeneration as a combined-output success, which can
report a nonexistent output_path and launch it in the browser. Update the branch
around was_regenerated, input_path.is_file(), and write_individual so that when
write_combined is false the CLI uses a session-only success message and only
calls click.launch on a real existing file path, while preserving the
combined-file messaging when output_path was actually written.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f13ccd68-ef0e-4edc-8d40-7b4af1b1b440

📥 Commits

Reviewing files that changed from the base of the PR and between 8c974bf and 4d7f498.

📒 Files selected for processing (4)
  • claude_code_log/cli.py
  • claude_code_log/converter.py
  • test/test_html_regeneration.py
  • test/test_output_explicit.py

Comment on lines +331 to +352
# Modify the source: even without a cache, a single file that grows
# (e.g. an in-progress session) MUST regenerate — the output is now
# older than its source (issue #221 follow-up). Previously single
# file mode wrongly skipped on the version marker alone.
time.sleep(0.1)
new_message = '{"type":"user","timestamp":"2025-07-03T16:40:00Z","parentUuid":null,"isSidechain":false,"userType":"human","cwd":"/tmp","sessionId":"test_session","version":"1.0.0","uuid":"single_file_msg","message":{"role":"user","content":[{"type":"text","text":"Single file mode test."}]}}\n'
with open(jsonl_file, "a", encoding="utf-8") as f:
f.write(new_message)

# Single file mode doesn't have cache, so it should still skip based on version
with patch("builtins.print") as mock_print:
convert_jsonl_to_html(jsonl_file)
mock_print.assert_any_call(
"HTML file single_test.html is current, skipping regeneration"
)

# Verify file wasn't regenerated (this is expected behavior for single file mode)
assert output_file.stat().st_mtime == original_mtime
# No "skipping" line: it regenerated because the source is newer.
skip_calls = [
c
for c in mock_print.call_args_list
if c.args and "skipping regeneration" in str(c.args[0])
]
assert not skip_calls, f"unexpected skip after source grew: {skip_calls}"

# The output was rewritten (newer mtime) and now contains the new message.
assert output_file.stat().st_mtime > original_mtime
assert "Single file mode test." in output_file.read_text(encoding="utf-8")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Make this regeneration check coarse-timestamp safe.

This test only sleeps 0.1s before appending, but the production path keys off mtimes. On 1-second-granularity filesystems that can leave the source and HTML in the same tick, so this rerun still skips and stat().st_mtime > original_mtime flakes. test/test_output_explicit.py already avoids that by forcing the source mtime strictly newer with os.utime(...); this case should do the same.

🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 336-336: File path is request-/variable-derived; validate and normalize to prevent path traversal.
Context: open(jsonl_file, "a", encoding="utf-8")
Note: [CWE-22] Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal').

(open-filename-from-request)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test_html_regeneration.py` around lines 331 - 352, The regeneration
assertion in this test is too dependent on `time.sleep(0.1)` and can flake on
coarse mtime filesystems; update the `convert_jsonl_to_html` setup so the source
file is explicitly made newer than the HTML output using `os.utime(...)`,
matching the approach already used in `test_output_explicit.py`. Keep the checks
around `skip_calls`, `output_file.stat().st_mtime`, and the `"Single file mode
test."` content, but ensure the test advances the source timestamp in a
filesystem-safe way before calling `convert_jsonl_to_html`.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant