
Add table and model for memes from users #5

Closed

aleksspevak wants to merge 4 commits into main from database-user-meme

Conversation

Contributor

@aleksspevak aleksspevak commented Dec 30, 2023

An explanation of the meme_raw_upload table:

from datetime import datetime

class MemeUserUpload(CustomModel):
    message_id: int
    chat: dict

    content: str | None = None
    date: datetime

    out_links: list[str] | None = None
    mentions: list[str] | None = None # mentioned usernames
    hashtags: list[str] | None = None
    forwarded: dict | None = None
    
    image: list[dict] | None = None # in practice the incoming data is a dict, not a list, but this needs a closer look with multiple pictures and videos
    video: list[dict] | None = None
    
# Imports needed by this snippet (metadata is the project's shared MetaData instance)
from sqlalchemy import Column, DateTime, Identity, Integer, String, Table, func
from sqlalchemy.dialects.postgresql import JSONB

meme_raw_upload = Table(
    "meme_raw_upload",
    metadata,
    Column("id", Integer, Identity(), primary_key=True),
    Column("message_id", Integer, nullable=False),
    # from message_id
    # type int not null
    # Example 17, 20 ...
    Column("chat", JSONB, nullable=False),
    # from chat
    # type jsonb not null
    # Example Chat(first_name='', id=, type=<ChatType.PRIVATE>, username=''),
    # first_name doesn't seem necessary, but I would keep the rest

    Column("content", String),
    # from caption
    # type varchar null
    # Example: some text, more text ...
    Column("date", DateTime, nullable=False),
    # from date
    # type datetime not null
    # Example datetime.datetime(2023, 12, 29, 19, 25, 5, tzinfo=<UTC>)

    Column("out_links", JSONB),
    # from caption_entities where type MessageEntityType.TEXT_LINK
    # type jsonb null
    # Example [https://t.me/ffmemesbot?start=sc_267689, https://huggingface.co/spaces/badayvedat/LLaVA]
    Column("mentions", JSONB),
    # No examples seen yet, but may be needed
    # most likely it will appear in caption_entities with some type
    # type jsonb null
    Column("hashtags", JSONB),
    # from caption_entities where type MessageEntityType.HASHTAG; take length and offset -> parse the caption
    # type jsonb null
    # Example [#meme]
    Column("forwarded", JSONB),
    # from api_kwargs.forward_origin when the message is re-sent (forwarded) to the bot
    # type jsonb null
    # Examples: {'type': 'channel', 'chat': {
    # 					'id': ,
    # 					'title': 'Fast Food Memes / ffmemes',
    # 					'username': 'fastfoodmemes',
    # 					'type': 'channel'
    # 					},
    # 			  'message_id': 8118,
    # 			  'date': 1703875067
    # 		     }
    # 		     {'type': 'hidden_user', 'sender_user_name': '', 'date': 1703853440}
    #            {'type': 'user', 'sender_user': {'id': , 'is_bot': False, 'first_name': ''}, 'date': 1703877450}

    Column("media", JSONB),
    # from photo: I would take the single PhotoSize dict with the largest height+width; no example with two pictures yet
    # type jsonb null
    # Examples:
    # photo=(
	#	PhotoSize(file_id='QADNAQ', file_size=1446, file_unique_id='G00eYUh9', height=90, width=58),
    #	PhotoSize(file_id='QADNAQ', file_size=19393, file_unique_id='G00eYUh9', height=320, width=206),
    #	PhotoSize(file_id='QADNAQ', file_size=72237, file_unique_id='G00eYUh9', height=800, width=516),
    #	PhotoSize(file_id='QADNAQ', file_size=88190, file_unique_id='G00eYUh9-', height=1080, width=696)
	#	)
    # from video: lots of fields; the essential ones seem to be everything except api_kwargs and thumbnail; in an example with two videos, only one video's data was present
    # type jsonb null
    # Examples:
    # video=Video(
    # 	api_kwargs={
    # 		'thumb': {
    # 			'file_id': 'BwEAB20AAzQE',
    # 			'file_unique_id': 'A',
    # 			'file_size': 11829,
    # 			'width': 175,
    # 			'height': 320
    # 		 }
    # 	},
    # 	duration=21,
    # 	file_id='gTXqpPgc0BA',
    # 	file_name='IMG_2990.MP4',
    # 	file_size=2688663,
    # 	file_unique_id='F',
    # 	height=848,
    # 	mime_type='video/mp4',
    # 	thumbnail=PhotoSize(file_id='BwEAB20AAzQE', file_size=11829, file_unique_id='BwEAB20AAzQE', height=320, width=175),
    # 	width=464)
    Column("created_at", DateTime, server_default=func.now(), nullable=False),
    Column("updated_at", DateTime, onupdate=func.now()),
)
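The model and the "largest height+width" heuristic from the media column comment can be exercised with a minimal, stdlib-only sketch. Everything here is illustrative: dataclasses stand in for the project's CustomModel, and `best_photo` is a hypothetical helper, not code from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical stdlib-only stand-in for MemeUserUpload; field names mirror
# the meme_raw_upload columns above (photo and video both land in `media`).
@dataclass
class MemeUserUploadSketch:
    message_id: int
    chat: dict
    date: datetime
    content: str | None = None
    out_links: list[str] | None = None
    mentions: list[str] | None = None
    hashtags: list[str] | None = None
    forwarded: dict | None = None
    media: dict | None = None  # single "best" PhotoSize/Video dict


def best_photo(photo_sizes: list[dict]) -> dict:
    # Pick the PhotoSize with the largest height + width, per the
    # heuristic in the media column comment.
    return max(photo_sizes, key=lambda p: p["height"] + p["width"])
```

Against the four PhotoSize entries in the example above, this picks the 1080x696 variant.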

@aleksspevak aleksspevak requested a review from ohld December 30, 2023 11:58
@aleksspevak aleksspevak self-assigned this Dec 30, 2023
@ohld
Member

ohld commented Jan 2, 2024

There's no full clarity yet that this is exactly what we need. Let's keep it as a draft; it will come in handy.

@ohld ohld closed this Mar 26, 2026
ohld added a commit that referenced this pull request Apr 20, 2026
feat(popups): move channel popup to meme #5 with conversion tracking

75% of new users leave before meme #5, so showing the channel subscribe
popup at #50 misses nearly all of them. This moves it to #5 and adds:
- URL CTA button linking to the language-appropriate channel
- "I subscribed" callback button for click tracking
- Prefect events: ff.popup.telegram_channel.{shown,clicked,subscribed}
- 30-second delayed subscription verification via TG API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ohld added a commit that referenced this pull request Apr 20, 2026
…y + upload fixes (#178)

* fix(etl): move meme.raw_meme_id from JOIN ON to WHERE in IG retry queries

PostgreSQL forbids referencing the UPDATE target table in a JOIN's ON
clause within the FROM clause. Moved the meme.raw_meme_id condition to
WHERE for both the IG retry (broken_content_link → created) and IG
expire (broken_content_link → expired_content_link) queries.

Fixes FFM-456.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
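The restriction described above can be sketched as a before/after pair of query shapes. Everything except `meme.raw_meme_id` and the status values is an illustrative name, not the project's actual schema:

```python
# Sketch of the query shape before/after the fix. Per the commit message,
# PostgreSQL rejects referencing the UPDATE target table (meme) inside the
# ON clause of a JOIN in the FROM list; the condition belongs in WHERE.
broken = """
UPDATE meme
SET status = 'created'
FROM raw_ig_posts r
JOIN sources s ON s.id = r.source_id AND meme.raw_meme_id = r.id  -- rejected
WHERE meme.status = 'broken_content_link'
"""

fixed = """
UPDATE meme
SET status = 'created'
FROM raw_ig_posts r
JOIN sources s ON s.id = r.source_id
WHERE meme.raw_meme_id = r.id            -- target-table condition moved here
  AND meme.status = 'broken_content_link'
"""
```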

* fix(describe_memes): graceful exit on OpenRouter quota exhaustion (HTTP 402)

When OpenRouter balance drops below $0, all models return 402. Previously,
402 fell through raise_for_status() → HTTPStatusError → continued to next
model in the fallback chain. With 5 models × 20 memes, the flow burned
through the entire 900s timeout making doomed requests.

Now: 402 is detected immediately and returns a QUOTA_EXHAUSTED sentinel,
which propagates up to the main loop for an instant batch exit — no model
fallback needed since 402 is account-wide.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
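The sentinel-based early exit can be sketched as below. The function name, call signature, and `QUOTA_EXHAUSTED` object are hypothetical, the fallback-loop logic follows the commit message:

```python
# Hypothetical sketch of the fallback loop described above (names and call
# signature are illustrative, not the project's real API).
QUOTA_EXHAUSTED = object()


def describe_with_fallback(models, call):
    # Try each model in order; abort the whole batch on HTTP 402, because
    # 402 means the OpenRouter balance is exhausted account-wide and trying
    # the remaining fallback models would only burn timeout budget.
    for model in models:
        status, body = call(model)
        if status == 402:
            return QUOTA_EXHAUSTED   # propagate up: exit the batch at once
        if status >= 400:
            continue                 # model-specific failure: try next model
        return body
    return None                      # every model failed for its own reason
```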

* style: fix ruff formatting in test_engine_contracts.py

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(describe_memes): reduce quota burn from fallback chain and cron frequency

Root cause: 5-model fallback chain with 2 models (gemma-4-*) consistently
returning 403 wasted ~40% of daily quota on guaranteed failures. Combined
with 48 runs/day (every 30min) × 20 memes = 960+ requests against a
1,000/day limit, leaving zero headroom.

Changes:
- Remove gemma-4-31b-it:free and gemma-4-26b-a4b-it:free (persistent 403)
- Reduce cron from */30 to hourly (24 runs × 20 = 480 base requests)
- Widen request interval 3.5s → 4.0s (15 rpm effective vs 20 rpm cap)
- Update specs and CLAUDE.md to reflect new schedule

Resolves FFM-520.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
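The quota arithmetic above checks out; a quick sketch using the numbers from the commit message (the helper functions are illustrative):

```python
def daily_requests(runs_per_day: int, batch_size: int) -> int:
    # Base describe-requests per day = scheduled runs x memes per run.
    return runs_per_day * batch_size


def effective_rpm(interval_seconds: float) -> float:
    # Requests per minute implied by a fixed inter-request interval.
    return 60 / interval_seconds


old_schedule = daily_requests(48, 20)   # every 30 min -> 960 base requests
new_schedule = daily_requests(24, 20)   # hourly -> 480 base requests
```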

* fix(describe_memes): replace delisted Gemma 3 models with restored Gemma 4

All google/gemma-3-*:free models were removed from OpenRouter ~Apr 15,
causing the pipeline to fail on every attempt and trigger the circuit
breaker. The Gemma 4 free models (gemma-4-31b, gemma-4-26b-a4b) are
now available again after their earlier 403 issues were resolved.

Fixes FFM-543

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(upload): mark memes as broken_content_link after 3 failed TG upload attempts

Previously, when all 3 upload retries were exhausted (e.g. TimedOut),
the meme was left in created status with no telegram_file_id — permanently
stuck after the 24h query window expired. Now it gets marked as
broken_content_link so the failure is visible and retried on next run.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
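The retry-then-mark behavior can be sketched as follows. The `upload`/`mark_status` signatures are hypothetical; the real code talks to the Telegram API and the meme table, and `TimeoutError` stands in for `telegram.error.TimedOut`:

```python
# Hypothetical sketch of the upload wrapper described above.
def upload_with_retries(upload, mark_status, attempts=3):
    for _ in range(attempts):
        try:
            return upload()          # e.g. returns a telegram_file_id
        except TimeoutError:         # stand-in for telegram.error.TimedOut
            continue
    # All retries exhausted: record the failure instead of leaving the meme
    # silently stuck in "created" with no telegram_file_id.
    mark_status("broken_content_link")
    return None
```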

* ops: add SENTRY_AUTH_TOKEN + Coolify vars to CTO env, make Sentry required for QA

QA log scan routine needs SENTRY_AUTH_TOKEN to call sentry CLI — promote from
optional to required. Also grant CTO access to Sentry + Coolify vars for direct
debugging. Remove stale SENTRY_DSN from ops runbook (app-level var, not a
Paperclip company secret).

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(describe_memes): retry on 429 rate limit instead of aborting batch

Previously, any 429 response immediately stopped the entire batch — even
transient per-minute rate limits that reset in <60s. This caused 0 memes
described for 3+ consecutive runs (FFM-574).

Now the flow waits up to 65s (using Retry-After header when available)
and retries the same meme. After 3 waits without progress, it stops the
batch (likely daily quota exhausted, not a transient spike).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
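The wait-and-retry policy above can be sketched like this. The `request`/`sleep` callables and the 65-second cap mirror the commit message, but the function itself is illustrative:

```python
# Hypothetical sketch of the 429 handling described above.
def call_with_429_retry(request, sleep, max_waits=3, cap=65):
    # Retry the same meme on 429, honoring Retry-After up to `cap` seconds.
    # After `max_waits` waits without progress, give up on the batch: that
    # pattern suggests the daily quota, not a transient per-minute spike.
    waits = 0
    while True:
        status, headers, body = request()
        if status != 429:
            return body
        if waits >= max_waits:
            return None
        sleep(min(int(headers.get("Retry-After", cap)), cap))
        waits += 1
```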

* ops: add Prefect API vars to QA + CTO agent config, update runbook

QA agent was missing PREFECT_API_URL and PREFECT_AUTH_STRING declarations
in .paperclip.yaml, causing connection refused errors during QA log scans.
These secrets already exist in Paperclip company secrets (CTO has them).
Also documents the Prefect secrets in the ops runbook.

SENTRY_AUTH_TOKEN still needs to be created as a company secret (board action).

Refs: FFM-580

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(describe_memes): restore Gemma 3 models, reduce frequency for 50/day quota

Gemma 3 free models (27b, 12b) are back on OpenRouter as of 2026-04-20.
Re-added as fallbacks after Gemma 4 models.

Reduced cron from hourly to every 3 hours and batch_size from 20 to 6
(8 runs × 6 = 48 requests/day) to stay within the 50/day free quota.
Previous hourly×20 (480/day) was exhausting the daily limit within 2-3
runs, causing 0.7% coverage over 7 days. Revert to hourly×20 once $10+
lifetime credit unlocks 1,000/day. FFM-587.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* feat(popups): move channel popup to meme #5 with conversion tracking

75% of new users leave before meme #5, so showing the channel subscribe
popup at #50 misses nearly all of them. This moves it to #5 and adds:
- URL CTA button linking to the language-appropriate channel
- "I subscribed" callback button for click tracking
- Prefect events: ff.popup.telegram_channel.{shown,clicked,subscribed}
- 30-second delayed subscription verification via TG API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add experiment files, published comms, and update TODOS

- Add active experiments: goat-recency-filter, early-channel-popup
- Move cold-start-v2 experiment to completed
- Add 18 published communication docs (2026-04-02 to 2026-04-20)
- Update experiments/log.jsonl with recent activity
- Mark goat recency filter as DONE in TODOS.md
- Add uv.lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
ohld added a commit that referenced this pull request Apr 27, 2026
Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2):

[P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue
done|blocked` was called directly inside each step, so the verification
block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the
ANTI-PATTERNS log all closed via those direct calls). Restructured so
each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step
9; step 9 is now the single `paperclipUpdateIssue` call site for the
wake. Added an explicit terminal-status mapping table.

[P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps —
two parallel SE wakes overwrite each other's snapshots. PR-scoped to
`/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`.

[P2] Path A3 Coolify probe didn't validate the curl response. Empty
body on 401/404/500/network failure → `jq -r .last_online_at` empty →
`date -u -d ""` undefined → silent no-op. Now checks curl exit, HTTP
200, and non-empty `last_online_at`; on failure files
`[chain-broken:coolify-probe-unhealthy]` for CTO.

[P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that
A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake
could have let a current silent-exit external-PR wake pass the gate.
E1/E2 now filter by wake-start.

[P2] A3 chain-broken contradicted the general "any failed check →
blocked" rule. Called out A3 as the explicit non-blocking exception:
SE delivered review + merge regardless; the broken handoff is a
separate `[chain-broken:*]` ticket. Updated "When a check fails" and
the terminal-status table to make this explicit.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
ohld added a commit that referenced this pull request Apr 27, 2026
Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2
that the structural pass missed:

[P1] CI-red branch fall-through. Old code had `exit 0`; my refactor
replaced it with `# ... goto step 9`, but bash doesn't honor prose
comments — execution fell through to `gh pr merge --squash --auto`
and queued the merge for a PR that should stay blocked. Fix: introduce
a `SKIP_MERGE` flag set by either precheck failure (CI red OR
auto-merge disabled), and gate the `gh pr merge ...` block on
`[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9.

[P1] A3 chain-broken issue never filed. Both A3 failure branches used
`: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null
command — the string is just an evaluated argument, no issue is ever
created. The wake closed `done` silently exactly as ANTI-PATTERNS #5
warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL`
only; an explicit prose step below the bash block tells the agent to
invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is
probe-unhealthy or not-triggered (filing an issue is a tool call, not
a shell command, so it shouldn't have been in the bash block).

[P2] `allow_auto_merge` precheck was documented AFTER the
`gh pr merge --squash --auto` call that depends on it. If the setting
drifts back to false, the merge call errors first and the diagnostic
recovery never runs. Moved the precheck above the merge command (now
under the same `c.` heading as the CI-red precheck), gated by the
SKIP_MERGE flag described above.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
ohld added a commit that referenced this pull request Apr 27, 2026
…firefighting (#205)

* feat(agents/se): Self-Check Gate + Anti-Patterns Log to close silent firefighting

Adds a mandatory verification step before staff-engineer marks an execution
issue done. Each PR-review outcome (merged / queued / blocked-CI / changes-
requested / external / already-resolved) maps to a path with explicit checks
that must pass; failures route to status=blocked with a reason instead of a
silent close.

Path A3 probes the next-link (Coolify deploy) via last_online_at on
/api/v1/applications/<uuid>: if the container is still healthy on a pre-merge
timestamp 5 min after merge, files [chain-broken:coolify-not-triggered]
HIGH for CTO. This catches the GH→Coolify webhook drop case ohld reported.

ANTI-PATTERNS.md is the case-log feeding the gate. Each row maps to a check;
six seed rows from real PRs (#177 6-day silent trigger drop, #199 17h
zero-artifact merge, #201 auto-merge race during changes-requested, #200
bare-merge CI race, the user-reported chain-break, and the
SC=$(gh pr view --json comments) JSON corruption found while testing).

Tested locally against PRs #199 (correctly identified as silent exit, 0
review-signal artifacts) and #201 (signal found, A3 timestamp probe passes).

* fix(agents/se): address codex review of self-check gate

Two real bugs codex caught before push:

[P1] A2/B2/C2 grep was too narrow — only matched the comment-fallback form
(STAFF ENGINEER REVIEW: APPROVED) used when GitHub self-review-blocks ohld.
For non-ohld internal and external authors, step 7 posts a real `gh pr review
--approve -b "Review summary"` whose body lacks the prefix. Those valid
approvals would have failed the gate. Fix: accept EITHER a .reviews[] entry
with state="APPROVED" OR the comment-prefix. Same dual-form for D1
CHANGES_REQUESTED.

[P2] A3 Coolify probe fired immediately after merge, when last_online_at is
still pre-merge from the previous deploy. Coolify needs ~3-5 min for the
deploy + healthcheck cycle. Without a grace window, every healthy PR would
file [chain-broken:coolify-not-triggered] and drown CTO in false alarms.
Fix: probe only fires when now - mergedAt >= 300s; otherwise deferred to
QA's hourly Process Health Check.

Both fixes logged as #7 and #8 in ANTI-PATTERNS.md so the same blind spots
don't reappear in future gate revisions.

* fix(agents/se): address SE agent CHANGES REQUESTED on PR #205

Two P1s the SE agent caught reviewing this PR:

[P1.1] Path A3 used BSD `date -u -j -f` which doesn't exist on the Linux
agent runtime. Probe failed at the first line, MERGED_EPOCH was empty, the
chain-broken issue never fired. Fix: GNU `date -u -d` auto-parses both
mergedAt (ISO 8601) and last_online_at (YYYY-MM-DD HH:MM:SS).

[P1.2] A2/B2/C2/D1/E1 jq commands grepped ALL artifacts on the PR. Spec
said "for THIS run" but had no time filter, so a stale APPROVED comment
from a prior wake would let a current silent-exit wake pass A2 — exactly
the silent-close mode the gate was meant to fix. Fix: capture
WAKE_START_ISO at the top of every wake, filter via --arg t in A2 + D1
jq calls.

Skipped per minimal-code preference: ANTI-PATTERNS rows for these (caught
pre-merge, not a production failure), Coolify UUID drift note, D3 MCP-tool
prose tweak. All non-blocking from the SE review.

Verified locally against PR #205 own data — the new wake-scope filter
correctly finds the SE review when WAKE_START < submittedAt and rejects
it otherwise.

* fix(agents/se): route all SE wake exits through Self-Check Gate

Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2):

[P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue
done|blocked` was called directly inside each step, so the verification
block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the
ANTI-PATTERNS log all closed via those direct calls). Restructured so
each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step
9; step 9 is now the single `paperclipUpdateIssue` call site for the
wake. Added an explicit terminal-status mapping table.

[P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps —
two parallel SE wakes overwrite each other's snapshots. PR-scoped to
`/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`.

[P2] Path A3 Coolify probe didn't validate the curl response. Empty
body on 401/404/500/network failure → `jq -r .last_online_at` empty →
`date -u -d ""` undefined → silent no-op. Now checks curl exit, HTTP
200, and non-empty `last_online_at`; on failure files
`[chain-broken:coolify-probe-unhealthy]` for CTO.

[P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that
A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake
could have let a current silent-exit external-PR wake pass the gate.
E1/E2 now filter by wake-start.

[P2] A3 chain-broken contradicted the general "any failed check →
blocked" rule. Called out A3 as the explicit non-blocking exception:
SE delivered review + merge regardless; the broken handoff is a
separate `[chain-broken:*]` ticket. Updated "When a check fails" and
the terminal-status table to make this explicit.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(agents/se): round-2 review fixes — bash semantics, precheck order

Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2
that the structural pass missed:

[P1] CI-red branch fall-through. Old code had `exit 0`; my refactor
replaced it with `# ... goto step 9`, but bash doesn't honor prose
comments — execution fell through to `gh pr merge --squash --auto`
and queued the merge for a PR that should stay blocked. Fix: introduce
a `SKIP_MERGE` flag set by either precheck failure (CI red OR
auto-merge disabled), and gate the `gh pr merge ...` block on
`[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9.

[P1] A3 chain-broken issue never filed. Both A3 failure branches used
`: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null
command — the string is just an evaluated argument, no issue is ever
created. The wake closed `done` silently exactly as ANTI-PATTERNS #5
warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL`
only; an explicit prose step below the bash block tells the agent to
invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is
probe-unhealthy or not-triggered (filing an issue is a tool call, not
a shell command, so it shouldn't have been in the bash block).

[P2] `allow_auto_merge` precheck was documented AFTER the
`gh pr merge --squash --auto` call that depends on it. If the setting
drifts back to false, the merge call errors first and the diagnostic
recovery never runs. Moved the precheck above the merge command (now
under the same `c.` heading as the CI-red precheck), gated by the
SKIP_MERGE flag described above.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>