feat: cleanup_duplicates.py + Flex Query period warning#16

Merged
flowcool merged 23 commits into main from staging on Jun 22, 2026
@flowcool flowcool commented Jun 22, 2026

Owner

Summary

  • cleanup_duplicates.py: a cleanup tool for the duplicates created when the IBKR Flex Query is switched from a short period to Last 365 Calendar Days. It patches manual entries with their IBKR#{tradeID} and deletes the duplicate copies. Dry-run by default, writes a safety log before any mutation, and runs a read-only probe of the endpoint before acting.
  • README.md: a critical warning about the Flex Query period setting, a description of the two classes of silent failure (skipped sells, duplicate re-imports), and a pointer to cleanup_duplicates.py as the recovery path.

Context

Bug discovered in production: the Flex Query was configured with "Last Month" instead of "Last 365 Calendar Days". Consequences: recent sells were missing from the IBKR XML, and once the setting was corrected, 90 activities were re-imported as duplicates of the existing manual entries. The cleanup script resolved the 26 duplicate trades identified (matched on symbol + type + quantity + price + date ±2 days).
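
The matching rule described above (same symbol, type, quantity, and price, with dates within two days of each other) can be sketched as follows. This is a minimal illustration, not the code from cleanup_duplicates.py: the function name `is_duplicate_pair`, the flat dict layout, and the sample values are assumptions, and only the field names and the `DATE_TOLERANCE` name come from the review walkthrough.

```python
from datetime import date, timedelta

# Assumed value, taken from the "±2 days" in the PR description.
DATE_TOLERANCE = timedelta(days=2)


def is_duplicate_pair(ibkr: dict, manual: dict) -> bool:
    """Return True when an imported activity and a manual one look like the same trade."""
    same_fields = all(
        ibkr[key] == manual[key]
        for key in ("symbol", "type", "quantity", "unitPrice")
    )
    close_in_time = abs(ibkr["date"] - manual["date"]) <= DATE_TOLERANCE
    return same_fields and close_in_time


# Hypothetical sample data, for illustration only.
imported = {"symbol": "AAPL", "type": "BUY", "quantity": 10,
            "unitPrice": 187.5, "date": date(2026, 6, 3)}
manual = dict(imported, date=date(2026, 6, 1))       # 2 days apart: within tolerance
too_early = dict(imported, date=date(2026, 5, 30))   # 4 days apart: outside tolerance
```

Exact equality on `unitPrice` is a simplification; if manual entries were rounded differently from the broker data, a small price tolerance would be needed as well.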

Test plan

  • Dry-run validated: 0 pairs detected after cleanup (clean dedup)
  • Manual sync after cleanup: New trade activities: 0, duplicates skipped: 104
  • Read-only endpoint probe verified before any mutation
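
The "duplicates skipped" count in the sync output relies on activities carrying an IBKR#{tradeID} comment. A rough sketch of how such a check could work is shown below. The regular expression, the function names, and the sample activities are assumptions for illustration; the actual sync code is not part of this page.

```python
import re

# Assumed tag format: "IBKR#" followed by the trade ID.
TAG_PATTERN = re.compile(r"IBKR#(\w+)")


def known_trade_ids(activities):
    """Collect the trade IDs already recorded in activity comments."""
    ids = set()
    for activity in activities:
        match = TAG_PATTERN.search(activity.get("comment") or "")
        if match:
            ids.add(match.group(1))
    return ids


def split_new_and_skipped(trade_ids, activities):
    """Partition incoming trade IDs into new ones and already-known duplicates."""
    known = known_trade_ids(activities)
    new = [t for t in trade_ids if t not in known]
    skipped = [t for t in trade_ids if t in known]
    return new, skipped


# Hypothetical existing activities: one tagged, one untagged, one with a free-text note.
existing = [{"comment": "IBKR#1001"}, {"comment": None}, {"comment": "manual note"}]
new, skipped = split_new_and_skipped(["1001", "1002"], existing)
```

Under this scheme a manual entry without the tag is invisible to the check, which is why the cleanup script patches the tag onto manual entries before deleting the imported copy.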

🤖 Generated with Claude Code

https://claude.ai/code/session_0125mJBtEmStCj8rFFkrtGxb

Summary by CodeRabbit

  • New Features

    • Added manual trigger option for Docker image builds.
    • Introduced utility to clean up duplicate activities from IBKR imports.
  • Documentation

    • Added critical warning: use minimum 365-day period in Flex Query setup to prevent missed transactions and duplicates.
    • Documented recovery procedure for handling duplicate imports.
    • Updated CI/deployment and merge strategy documentation.

flowcool and others added 23 commits June 9, 2026 13:58
The container was running as root, giving any process inside full host
privileges if the container escaped. Creates a dedicated appuser,
transfers /app ownership, and switches to that user before the entrypoint.
The entrypoint writes /app/crontab at runtime, which still works because
appuser owns /app.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
>= constraints allow pip to silently pull a newer version on each image
rebuild, making two builds weeks apart potentially non-identical.
Pinning to the current stable releases (requests 2.32.3, PyYAML 6.0.2)
ensures every build produces the same environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review findings:
- --disabled-password creates a login-capable user; --system is the
  correct flag for a service account (no shell, no aging, UID<1000)
- VOLUME declaration moved after chown so layer order reflects intent
- Added comments documenting the /app/crontab write requirement and
  the mapping.yaml world-readable caveat for bind-mounts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review noted that == pinning doesn't cover transitive deps and there's no
audit trail. Added comment explaining the limitation and the date the
pins were last verified CVE-clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Monthly cadence to avoid noise. Covers:
- pip: requirements.txt (requests, PyYAML) + their transitive deps
- github-actions: docker/*, actions/checkout pinned to major versions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The original VOLUME before entrypoint.sh was left in when the second one
was added after chown. Docker ignores duplicates but it was confusing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- cleanup_duplicates.py: resolves duplicate activities created when
  switching IBKR Flex Query from a short period to Last 365 Calendar Days.
  Patches manual entries with their IBKR#{tradeID} comment (so future syncs
  recognise them) and deletes the duplicate IBKR#-synced copies.
  Dry-run by default; writes a full safety log before any mutation;
  verifies API endpoints with a read-only probe before touching anything.

- README.md: adds a critical warning on the Flex Query period setting,
  explains the two classes of silent failure (skipped sells, duplicate
  re-imports), and points to cleanup_duplicates.py as the recovery path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0125mJBtEmStCj8rFFkrtGxb
@flowcool flowcool merged commit b23c1f0 into main Jun 22, 2026
1 of 2 checks passed
coderabbitai Bot commented Jun 22, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 23e9ad68-9052-425e-804d-3c5ab6317379

📥 Commits

Reviewing files that changed from the base of the PR and between e4bc0b9 and 9eb0334.

📒 Files selected for processing (4)
  • .github/workflows/docker-publish.yml
  • CLAUDE.md
  • README.md
  • cleanup_duplicates.py

📝 Walkthrough

Adds cleanup_duplicates.py, a new Python CLI that matches IBKR#-tagged Ghostfolio activities to their manual duplicates, patches manual entry comments with the IBKR trade ID, and deletes the IBKR# copies. The README gains a critical warning about Flex Query date ranges. The Docker publish workflow gains a manual dispatch trigger and action version upgrades, with matching CLAUDE.md documentation.

Changes

Duplicate Activity Cleanup Script

| Layer / File(s) | Summary |
| --- | --- |
| Config, auth, date utils, and API mutation helpers (cleanup_duplicates.py) | Establishes DATE_TOLERANCE, environment-backed config/header loading, bulk activity fetch from Ghostfolio, parse_date, a read-only endpoint verification probe, put_comment, delete_activity, and symbol_of helpers. |
| Main CLI matching logic and apply/dry-run execution (cleanup_duplicates.py) | Implements main() with --apply flag parsing, activity partitioning into IBKR# vs manual sets, symbol/type/quantity/unitPrice matching with date tolerance, dry-run report output, JSON safety snapshot write, and the mutation loop with patched/deleted/error counters and sys.exit(1) on failure. |
| README critical warning and recovery instructions (README.md) | Inserts a bold Critical note in Flex Query setup steps warning against periods shorter than 365 days, describes missing-sell and silent-duplicate failure modes, and documents the dry-run then --apply recovery path via cleanup_duplicates.py. |
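
The dry-run-by-default behavior and the safety snapshot summarized above might look roughly like this. It is a sketch under stated assumptions: only the `--apply` flag comes from the PR, while `parse_args`, `write_snapshot`, `run`, and the default snapshot filename are hypothetical names chosen for illustration.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse CLI flags; mutations only happen when --apply is given."""
    parser = argparse.ArgumentParser(description="Duplicate cleanup (sketch)")
    parser.add_argument("--apply", action="store_true",
                        help="perform mutations; without it, only report")
    return parser.parse_args(argv)


def write_snapshot(pairs, path):
    """Write the matched pairs to a JSON file before anything is mutated."""
    path = Path(path)
    path.write_text(json.dumps(pairs, indent=2, default=str), encoding="utf-8")
    return path


def run(pairs, argv=None, snapshot_path="cleanup_snapshot.json"):
    """Report in dry-run mode; in apply mode, snapshot first, then proceed."""
    args = parse_args(argv)
    if not args.apply:
        return f"DRY RUN: {len(pairs)} pair(s) would be processed"
    write_snapshot(pairs, snapshot_path)
    # The patch/delete mutations would follow here, after the snapshot exists.
    return f"APPLY: snapshot written to {snapshot_path}"
```

Using `action="store_true"` makes the destructive path opt-in, so forgetting the flag can never mutate data.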

Docker Publish Workflow and CI Docs Updates

| Layer / File(s) | Summary |
| --- | --- |
| workflow_dispatch trigger and action version upgrades (.github/workflows/docker-publish.yml) | Adds workflow_dispatch with a ref input defaulting to staging, and upgrades all five Docker-related GitHub Actions to newer major versions. |
| CLAUDE.md merge strategy and CI deployment docs (CLAUDE.md) | Documents the merge-commits-only Git strategy and expands the CI/deployment section to describe main (push latest + Portainer redeploy) and staging (push staging + manual NAS test) branch workflows. |

Sequence Diagram

sequenceDiagram
  participant User as User (CLI)
  participant Script as cleanup_duplicates.py
  participant GF as Ghostfolio API

  User->>Script: python cleanup_duplicates.py [--apply]
  Script->>GF: GET /api/v1/activities (verify + fetch all)
  GF-->>Script: full activities list
  Script->>GF: GET /api/v1/activities/{id} (single probe)
  GF-->>Script: single activity (verify OK)
  Script->>Script: partition into IBKR# entries and manual entries
  Script->>Script: match by symbol/type/qty/price + DATE_TOLERANCE
  Script-->>User: dry-run report (if no --apply)

  rect rgba(255, 140, 0, 0.5)
    Note over Script,GF: --apply mode only
    Script->>Script: write JSON safety snapshot
    Script->>GF: GET /api/v1/activities/{manual_id} (re-fetch for PUT payload)
    GF-->>Script: current manual activity
    Script->>GF: PUT /api/v1/activities/{manual_id} (patch comment to IBKR#{trade_id})
    GF-->>Script: 200 OK
    Script->>GF: DELETE /api/v1/activities/{ibkr_id}
    GF-->>Script: 200 OK
  end

  Script-->>User: summary (patched/deleted/errors), exit 1 on errors
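
The apply-mode portion of the diagram (patch the manual entry first, delete the imported copy second, count outcomes) can be sketched as below. The HTTP calls are replaced by injected callables so the ordering logic stands alone; `apply_pairs` and its parameter names are hypothetical, and this is not the script's actual loop.

```python
from typing import Callable, Dict, Iterable, Tuple


def apply_pairs(
    pairs: Iterable[Tuple[str, str, str]],
    patch_comment: Callable[[str, str], None],
    delete_activity: Callable[[str], None],
) -> Dict[str, int]:
    """Patch each manual entry, then delete its imported duplicate.

    Each pair is (manual_id, ibkr_id, trade_id). If the patch fails, the
    delete for that pair is skipped so no activity is lost untagged.
    """
    patched = deleted = errors = 0
    for manual_id, ibkr_id, trade_id in pairs:
        try:
            patch_comment(manual_id, f"IBKR#{trade_id}")
            patched += 1
            delete_activity(ibkr_id)
            deleted += 1
        except Exception as exc:  # count the failure and continue with the next pair
            errors += 1
            print(f"error on pair ({manual_id}, {ibkr_id}): {exc}")
    return {"patched": patched, "deleted": deleted, "errors": errors}
```

A caller could then exit with a non-zero status when `errors` is greater than zero, matching the "exit 1 on errors" step in the diagram.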

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A bunny found duplicates hiding in rows,
With IBKR# stamps and manual woes.
I matched them by symbol, by price, and by date,
Then patched and deleted — oh, wasn't that great!
A snapshot for safety, a dry-run to check,
Now Ghostfolio's tidy — no duplicate wreck! ✨
