
Add table and model for memes from users #5

Closed

aleksspevak wants to merge 4 commits into main from database-user-meme

Conversation

Contributor

@aleksspevak aleksspevak commented Dec 30, 2023

An explanation of the meme_raw_upload table:

from datetime import datetime

class MemeUserUpload(CustomModel):
    message_id: int
    chat: dict

    content: str | None = None
    date: datetime

    out_links: list[str] | None = None
    mentions: list[str] | None = None # mentioned usernames
    hashtags: list[str] | None = None
    forwarded: dict | None = None
    
    image: list[dict] | None = None # in practice the incoming data is a dict, not a list, but this needs a closer look with multiple pictures and videos
    video: list[dict] | None = None
    
# Imports needed by this snippet (metadata is the project's shared MetaData instance)
from sqlalchemy import Column, DateTime, Identity, Integer, String, Table, func
from sqlalchemy.dialects.postgresql import JSONB

meme_raw_upload = Table(
    "meme_raw_upload",
    metadata,
    Column("id", Integer, Identity(), primary_key=True),
    Column("message_id", Integer, nullable=False),
    # from message_id
    # type int not null
    # Example 17, 20 ...
    Column("chat", JSONB, nullable=False),
    # from chat
    # type jsonb not null
    # Example Chat(first_name='', id=, type=<ChatType.PRIVATE>, username=''),
    # first_name doesn't seem necessary, but I would keep the rest

    Column("content", String),
    # from caption
    # type varchar null
    # Example: some text, more text ...
    Column("date", DateTime, nullable=False),
    # from date
    # type datetime not null
    # Example datetime.datetime(2023, 12, 29, 19, 25, 5, tzinfo=<UTC>)

    Column("out_links", JSONB),
    # from caption_entities where type MessageEntityType.TEXT_LINK
    # type jsonb null
    # Example [https://t.me/ffmemesbot?start=sc_267689, https://huggingface.co/spaces/badayvedat/LLaVA]
    Column("mentions", JSONB),
    # No examples seen yet, but may be needed
    # most likely it will appear in caption_entities with some type
    # type jsonb null
    Column("hashtags", JSONB),
    # from caption_entities where type MessageEntityType.HASHTAG; take length and offset -> parse the caption
    # type jsonb null
    # Example [#meme]
    Column("forwarded", JSONB),
    # from api_kwargs.forward_origin when the message is re-sent (forwarded) to the bot
    # type jsonb null
    # Examples: {'type': 'channel', 'chat': {
    # 					'id': ,
    # 					'title': 'Fast Food Memes / ffmemes',
    # 					'username': 'fastfoodmemes',
    # 					'type': 'channel'
    # 					},
    # 			  'message_id': 8118,
    # 			  'date': 1703875067
    # 		     }
    # 		     {'type': 'hidden_user', 'sender_user_name': '', 'date': 1703853440}
    #            {'type': 'user', 'sender_user': {'id': , 'is_bot': False, 'first_name': ''}, 'date': 1703877450}

    Column("media", JSONB),
    # from photo: I would take the single PhotoSize dict with the largest height+width; no example with two pictures yet
    # type jsonb null
    # Examples:
    # photo=(
	#	PhotoSize(file_id='QADNAQ', file_size=1446, file_unique_id='G00eYUh9', height=90, width=58),
    #	PhotoSize(file_id='QADNAQ', file_size=19393, file_unique_id='G00eYUh9', height=320, width=206),
    #	PhotoSize(file_id='QADNAQ', file_size=72237, file_unique_id='G00eYUh9', height=800, width=516),
    #	PhotoSize(file_id='QADNAQ', file_size=88190, file_unique_id='G00eYUh9-', height=1080, width=696)
	#	)
    # from video: lots of fields; the essential ones seem to be everything except api_kwargs and thumbnail; in an example with two videos, only one video's data was present
    # type jsonb null
    # Examples:
    # video=Video(
    # 	api_kwargs={
    # 		'thumb': {
    # 			'file_id': 'BwEAB20AAzQE',
    # 			'file_unique_id': 'A',
    # 			'file_size': 11829,
    # 			'width': 175,
    # 			'height': 320
    # 		 }
    # 	},
    # 	duration=21,
    # 	file_id='gTXqpPgc0BA',
    # 	file_name='IMG_2990.MP4',
    # 	file_size=2688663,
    # 	file_unique_id='F',
    # 	height=848,
    # 	mime_type='video/mp4',
    # 	thumbnail=PhotoSize(file_id='BwEAB20AAzQE', file_size=11829, file_unique_id='BwEAB20AAzQE', height=320, width=175),
    # 	width=464)
    Column("created_at", DateTime, server_default=func.now(), nullable=False),
    Column("updated_at", DateTime, onupdate=func.now()),
)
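The model and the "largest height+width" heuristic from the media column comment can be exercised with a minimal, stdlib-only sketch. Everything here is illustrative: dataclasses stand in for the project's CustomModel, and `best_photo` is a hypothetical helper, not code from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical stdlib-only stand-in for MemeUserUpload; field names mirror
# the meme_raw_upload columns above (photo and video both land in `media`).
@dataclass
class MemeUserUploadSketch:
    message_id: int
    chat: dict
    date: datetime
    content: str | None = None
    out_links: list[str] | None = None
    mentions: list[str] | None = None
    hashtags: list[str] | None = None
    forwarded: dict | None = None
    media: dict | None = None  # single "best" PhotoSize/Video dict


def best_photo(photo_sizes: list[dict]) -> dict:
    # Pick the PhotoSize with the largest height + width, per the
    # heuristic in the media column comment.
    return max(photo_sizes, key=lambda p: p["height"] + p["width"])
```

Against the four PhotoSize entries in the example above, this picks the 1080x696 variant.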

@aleksspevak aleksspevak requested a review from ohld December 30, 2023 11:58
@aleksspevak aleksspevak self-assigned this Dec 30, 2023
@ohld
Member

ohld commented Jan 2, 2024

There's no full clarity yet that this is exactly what we need. Let's keep it as a draft; it will come in handy.

@ohld ohld closed this Mar 26, 2026
ohld added a commit that referenced this pull request Apr 20, 2026
feat(popups): move channel popup to meme #5 with conversion tracking

75% of new users leave before meme #5, so showing the channel subscribe
popup at #50 misses nearly all of them. This moves it to #5 and adds:
- URL CTA button linking to the language-appropriate channel
- "I subscribed" callback button for click tracking
- Prefect events: ff.popup.telegram_channel.{shown,clicked,subscribed}
- 30-second delayed subscription verification via TG API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ohld added a commit that referenced this pull request Apr 20, 2026
…y + upload fixes (#178)

* fix(etl): move meme.raw_meme_id from JOIN ON to WHERE in IG retry queries

PostgreSQL forbids referencing the UPDATE target table in a JOIN's ON
clause within the FROM clause. Moved the meme.raw_meme_id condition to
WHERE for both the IG retry (broken_content_link → created) and IG
expire (broken_content_link → expired_content_link) queries.

Fixes FFM-456.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
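The restriction described above can be sketched as a before/after pair of query shapes. Everything except `meme.raw_meme_id` and the status values is an illustrative name, not the project's actual schema:

```python
# Sketch of the query shape before/after the fix. Per the commit message,
# PostgreSQL rejects referencing the UPDATE target table (meme) inside the
# ON clause of a JOIN in the FROM list; the condition belongs in WHERE.
broken = """
UPDATE meme
SET status = 'created'
FROM raw_ig_posts r
JOIN sources s ON s.id = r.source_id AND meme.raw_meme_id = r.id  -- rejected
WHERE meme.status = 'broken_content_link'
"""

fixed = """
UPDATE meme
SET status = 'created'
FROM raw_ig_posts r
JOIN sources s ON s.id = r.source_id
WHERE meme.raw_meme_id = r.id            -- target-table condition moved here
  AND meme.status = 'broken_content_link'
"""
```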

* fix(describe_memes): graceful exit on OpenRouter quota exhaustion (HTTP 402)

When OpenRouter balance drops below $0, all models return 402. Previously,
402 fell through raise_for_status() → HTTPStatusError → continued to next
model in the fallback chain. With 5 models × 20 memes, the flow burned
through the entire 900s timeout making doomed requests.

Now: 402 is detected immediately and returns a QUOTA_EXHAUSTED sentinel,
which propagates up to the main loop for an instant batch exit — no model
fallback needed since 402 is account-wide.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
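The sentinel-based early exit can be sketched as below. The function name, call signature, and `QUOTA_EXHAUSTED` object are hypothetical, the fallback-loop logic follows the commit message:

```python
# Hypothetical sketch of the fallback loop described above (names and call
# signature are illustrative, not the project's real API).
QUOTA_EXHAUSTED = object()


def describe_with_fallback(models, call):
    # Try each model in order; abort the whole batch on HTTP 402, because
    # 402 means the OpenRouter balance is exhausted account-wide and trying
    # the remaining fallback models would only burn timeout budget.
    for model in models:
        status, body = call(model)
        if status == 402:
            return QUOTA_EXHAUSTED   # propagate up: exit the batch at once
        if status >= 400:
            continue                 # model-specific failure: try next model
        return body
    return None                      # every model failed for its own reason
```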

* style: fix ruff formatting in test_engine_contracts.py

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(describe_memes): reduce quota burn from fallback chain and cron frequency

Root cause: 5-model fallback chain with 2 models (gemma-4-*) consistently
returning 403 wasted ~40% of daily quota on guaranteed failures. Combined
with 48 runs/day (every 30min) × 20 memes = 960+ requests against a
1,000/day limit, leaving zero headroom.

Changes:
- Remove gemma-4-31b-it:free and gemma-4-26b-a4b-it:free (persistent 403)
- Reduce cron from */30 to hourly (24 runs × 20 = 480 base requests)
- Widen request interval 3.5s → 4.0s (15 rpm effective vs 20 rpm cap)
- Update specs and CLAUDE.md to reflect new schedule

Resolves FFM-520.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
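The quota arithmetic above checks out; a quick sketch using the numbers from the commit message (the helper functions are illustrative):

```python
def daily_requests(runs_per_day: int, batch_size: int) -> int:
    # Base describe-requests per day = scheduled runs x memes per run.
    return runs_per_day * batch_size


def effective_rpm(interval_seconds: float) -> float:
    # Requests per minute implied by a fixed inter-request interval.
    return 60 / interval_seconds


old_schedule = daily_requests(48, 20)   # every 30 min -> 960 base requests
new_schedule = daily_requests(24, 20)   # hourly -> 480 base requests
```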

* fix(describe_memes): replace delisted Gemma 3 models with restored Gemma 4

All google/gemma-3-*:free models were removed from OpenRouter ~Apr 15,
causing the pipeline to fail on every attempt and trigger the circuit
breaker. The Gemma 4 free models (gemma-4-31b, gemma-4-26b-a4b) are
now available again after their earlier 403 issues were resolved.

Fixes FFM-543

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(upload): mark memes as broken_content_link after 3 failed TG upload attempts

Previously, when all 3 upload retries were exhausted (e.g. TimedOut),
the meme was left in created status with no telegram_file_id — permanently
stuck after the 24h query window expired. Now it gets marked as
broken_content_link so the failure is visible and retried on next run.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
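The retry-then-mark behavior can be sketched as follows. The `upload`/`mark_status` signatures are hypothetical; the real code talks to the Telegram API and the meme table, and `TimeoutError` stands in for `telegram.error.TimedOut`:

```python
# Hypothetical sketch of the upload wrapper described above.
def upload_with_retries(upload, mark_status, attempts=3):
    for _ in range(attempts):
        try:
            return upload()          # e.g. returns a telegram_file_id
        except TimeoutError:         # stand-in for telegram.error.TimedOut
            continue
    # All retries exhausted: record the failure instead of leaving the meme
    # silently stuck in "created" with no telegram_file_id.
    mark_status("broken_content_link")
    return None
```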

* ops: add SENTRY_AUTH_TOKEN + Coolify vars to CTO env, make Sentry required for QA

QA log scan routine needs SENTRY_AUTH_TOKEN to call sentry CLI — promote from
optional to required. Also grant CTO access to Sentry + Coolify vars for direct
debugging. Remove stale SENTRY_DSN from ops runbook (app-level var, not a
Paperclip company secret).

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(describe_memes): retry on 429 rate limit instead of aborting batch

Previously, any 429 response immediately stopped the entire batch — even
transient per-minute rate limits that reset in <60s. This caused 0 memes
described for 3+ consecutive runs (FFM-574).

Now the flow waits up to 65s (using Retry-After header when available)
and retries the same meme. After 3 waits without progress, it stops the
batch (likely daily quota exhausted, not a transient spike).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
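The wait-and-retry policy above can be sketched like this. The `request`/`sleep` callables and the 65-second cap mirror the commit message, but the function itself is illustrative:

```python
# Hypothetical sketch of the 429 handling described above.
def call_with_429_retry(request, sleep, max_waits=3, cap=65):
    # Retry the same meme on 429, honoring Retry-After up to `cap` seconds.
    # After `max_waits` waits without progress, give up on the batch: that
    # pattern suggests the daily quota, not a transient per-minute spike.
    waits = 0
    while True:
        status, headers, body = request()
        if status != 429:
            return body
        if waits >= max_waits:
            return None
        sleep(min(int(headers.get("Retry-After", cap)), cap))
        waits += 1
```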

* ops: add Prefect API vars to QA + CTO agent config, update runbook

QA agent was missing PREFECT_API_URL and PREFECT_AUTH_STRING declarations
in .paperclip.yaml, causing connection refused errors during QA log scans.
These secrets already exist in Paperclip company secrets (CTO has them).
Also documents the Prefect secrets in the ops runbook.

SENTRY_AUTH_TOKEN still needs to be created as a company secret (board action).

Refs: FFM-580

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(describe_memes): restore Gemma 3 models, reduce frequency for 50/day quota

Gemma 3 free models (27b, 12b) are back on OpenRouter as of 2026-04-20.
Re-added as fallbacks after Gemma 4 models.

Reduced cron from hourly to every 3 hours and batch_size from 20 to 6
(8 runs × 6 = 48 requests/day) to stay within the 50/day free quota.
Previous hourly×20 (480/day) was exhausting the daily limit within 2-3
runs, causing 0.7% coverage over 7 days. Revert to hourly×20 once $10+
lifetime credit unlocks 1,000/day. FFM-587.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* feat(popups): move channel popup to meme #5 with conversion tracking

75% of new users leave before meme #5, so showing the channel subscribe
popup at #50 misses nearly all of them. This moves it to #5 and adds:
- URL CTA button linking to the language-appropriate channel
- "I subscribed" callback button for click tracking
- Prefect events: ff.popup.telegram_channel.{shown,clicked,subscribed}
- 30-second delayed subscription verification via TG API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add experiment files, published comms, and update TODOS

- Add active experiments: goat-recency-filter, early-channel-popup
- Move cold-start-v2 experiment to completed
- Add 18 published communication docs (2026-04-02 to 2026-04-20)
- Update experiments/log.jsonl with recent activity
- Mark goat recency filter as DONE in TODOS.md
- Add uv.lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ohld added a commit that referenced this pull request Apr 27, 2026
Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2):

[P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue
done|blocked` was called directly inside each step, so the verification
block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the
ANTI-PATTERNS log all closed via those direct calls). Restructured so
each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step
9; step 9 is now the single `paperclipUpdateIssue` call site for the
wake. Added an explicit terminal-status mapping table.

[P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps —
two parallel SE wakes overwrite each other's snapshots. PR-scoped to
`/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`.

[P2] Path A3 Coolify probe didn't validate the curl response. Empty
body on 401/404/500/network failure → `jq -r .last_online_at` empty →
`date -u -d ""` undefined → silent no-op. Now checks curl exit, HTTP
200, and non-empty `last_online_at`; on failure files
`[chain-broken:coolify-probe-unhealthy]` for CTO.

[P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that
A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake
could have let a current silent-exit external-PR wake pass the gate.
E1/E2 now filter by wake-start.

[P2] A3 chain-broken contradicted the general "any failed check →
blocked" rule. Called out A3 as the explicit non-blocking exception:
SE delivered review + merge regardless; the broken handoff is a
separate `[chain-broken:*]` ticket. Updated "When a check fails" and
the terminal-status table to make this explicit.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
ohld added a commit that referenced this pull request Apr 27, 2026
Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2
that the structural pass missed:

[P1] CI-red branch fall-through. Old code had `exit 0`; my refactor
replaced it with `# ... goto step 9`, but bash doesn't honor prose
comments — execution fell through to `gh pr merge --squash --auto`
and queued the merge for a PR that should stay blocked. Fix: introduce
a `SKIP_MERGE` flag set by either precheck failure (CI red OR
auto-merge disabled), and gate the `gh pr merge ...` block on
`[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9.

[P1] A3 chain-broken issue never filed. Both A3 failure branches used
`: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null
command — the string is just an evaluated argument, no issue is ever
created. The wake closed `done` silently exactly as ANTI-PATTERNS #5
warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL`
only; an explicit prose step below the bash block tells the agent to
invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is
probe-unhealthy or not-triggered (filing an issue is a tool call, not
a shell command, so it shouldn't have been in the bash block).

[P2] `allow_auto_merge` precheck was documented AFTER the
`gh pr merge --squash --auto` call that depends on it. If the setting
drifts back to false, the merge call errors first and the diagnostic
recovery never runs. Moved the precheck above the merge command (now
under the same `c.` heading as the CI-red precheck), gated by the
SKIP_MERGE flag described above.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
ohld added a commit that referenced this pull request Apr 27, 2026
…firefighting (#205)

* feat(agents/se): Self-Check Gate + Anti-Patterns Log to close silent firefighting

Adds a mandatory verification step before staff-engineer marks an execution
issue done. Each PR-review outcome (merged / queued / blocked-CI / changes-
requested / external / already-resolved) maps to a path with explicit checks
that must pass; failures route to status=blocked with a reason instead of a
silent close.

Path A3 probes the next-link (Coolify deploy) via last_online_at on
/api/v1/applications/<uuid>: if the container is still healthy on a pre-merge
timestamp 5 min after merge, files [chain-broken:coolify-not-triggered]
HIGH for CTO. This catches the GH→Coolify webhook drop case ohld reported.

ANTI-PATTERNS.md is the case-log feeding the gate. Each row maps to a check;
six seed rows from real PRs (#177 6-day silent trigger drop, #199 17h
zero-artifact merge, #201 auto-merge race during changes-requested, #200
bare-merge CI race, the user-reported chain-break, and the
SC=$(gh pr view --json comments) JSON corruption found while testing).

Tested locally against PRs #199 (correctly identified as silent exit, 0
review-signal artifacts) and #201 (signal found, A3 timestamp probe passes).

* fix(agents/se): address codex review of self-check gate

Two real bugs codex caught before push:

[P1] A2/B2/C2 grep was too narrow — only matched the comment-fallback form
(STAFF ENGINEER REVIEW: APPROVED) used when GitHub self-review-blocks ohld.
For non-ohld internal and external authors, step 7 posts a real `gh pr review
--approve -b "Review summary"` whose body lacks the prefix. Those valid
approvals would have failed the gate. Fix: accept EITHER a .reviews[] entry
with state="APPROVED" OR the comment-prefix. Same dual-form for D1
CHANGES_REQUESTED.

[P2] A3 Coolify probe fired immediately after merge, when last_online_at is
still pre-merge from the previous deploy. Coolify needs ~3-5 min for the
deploy + healthcheck cycle. Without a grace window, every healthy PR would
file [chain-broken:coolify-not-triggered] and drown CTO in false alarms.
Fix: probe only fires when now - mergedAt >= 300s; otherwise deferred to
QA's hourly Process Health Check.

Both fixes logged as #7 and #8 in ANTI-PATTERNS.md so the same blind spots
don't reappear in future gate revisions.

* fix(agents/se): address SE agent CHANGES REQUESTED on PR #205

Two P1s the SE agent caught reviewing this PR:

[P1.1] Path A3 used BSD `date -u -j -f` which doesn't exist on the Linux
agent runtime. Probe failed at the first line, MERGED_EPOCH was empty, the
chain-broken issue never fired. Fix: GNU `date -u -d` auto-parses both
mergedAt (ISO 8601) and last_online_at (YYYY-MM-DD HH:MM:SS).

[P1.2] A2/B2/C2/D1/E1 jq commands grepped ALL artifacts on the PR. Spec
said "for THIS run" but had no time filter, so a stale APPROVED comment
from a prior wake would let a current silent-exit wake pass A2 — exactly
the silent-close mode the gate was meant to fix. Fix: capture
WAKE_START_ISO at the top of every wake, filter via --arg t in A2 + D1
jq calls.

Skipped per minimal-code preference: ANTI-PATTERNS rows for these (caught
pre-merge, not a production failure), Coolify UUID drift note, D3 MCP-tool
prose tweak. All non-blocking from the SE review.

Verified locally against PR #205 own data — the new wake-scope filter
correctly finds the SE review when WAKE_START < submittedAt and rejects
it otherwise.

* fix(agents/se): route all SE wake exits through Self-Check Gate

Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2):

[P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue
done|blocked` was called directly inside each step, so the verification
block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the
ANTI-PATTERNS log all closed via those direct calls). Restructured so
each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step
9; step 9 is now the single `paperclipUpdateIssue` call site for the
wake. Added an explicit terminal-status mapping table.

[P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps —
two parallel SE wakes overwrite each other's snapshots. PR-scoped to
`/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`.

[P2] Path A3 Coolify probe didn't validate the curl response. Empty
body on 401/404/500/network failure → `jq -r .last_online_at` empty →
`date -u -d ""` undefined → silent no-op. Now checks curl exit, HTTP
200, and non-empty `last_online_at`; on failure files
`[chain-broken:coolify-probe-unhealthy]` for CTO.

[P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that
A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake
could have let a current silent-exit external-PR wake pass the gate.
E1/E2 now filter by wake-start.

[P2] A3 chain-broken contradicted the general "any failed check →
blocked" rule. Called out A3 as the explicit non-blocking exception:
SE delivered review + merge regardless; the broken handoff is a
separate `[chain-broken:*]` ticket. Updated "When a check fails" and
the terminal-status table to make this explicit.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(agents/se): round-2 review fixes — bash semantics, precheck order

Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2
that the structural pass missed:

[P1] CI-red branch fall-through. Old code had `exit 0`; my refactor
replaced it with `# ... goto step 9`, but bash doesn't honor prose
comments — execution fell through to `gh pr merge --squash --auto`
and queued the merge for a PR that should stay blocked. Fix: introduce
a `SKIP_MERGE` flag set by either precheck failure (CI red OR
auto-merge disabled), and gate the `gh pr merge ...` block on
`[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9.

[P1] A3 chain-broken issue never filed. Both A3 failure branches used
`: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null
command — the string is just an evaluated argument, no issue is ever
created. The wake closed `done` silently exactly as ANTI-PATTERNS #5
warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL`
only; an explicit prose step below the bash block tells the agent to
invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is
probe-unhealthy or not-triggered (filing an issue is a tool call, not
a shell command, so it shouldn't have been in the bash block).

[P2] `allow_auto_merge` precheck was documented AFTER the
`gh pr merge --squash --auto` call that depends on it. If the setting
drifts back to false, the merge call errors first and the diagnostic
recovery never runs. Moved the precheck above the merge command (now
under the same `c.` heading as the CI-red precheck), gated by the
SKIP_MERGE flag described above.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>