Current file: app/src/lib/tools/caption-generator.ts
Current model: llama-3.3-70b
Current approach: Single prompt asking for 3 caption variations. No character count enforcement, no hashtag validation, no platform-specific formatting rules.
Problems with current approach:
- Character limits are stated in the prompt but not enforced. Captions regularly exceed platform limits (Twitter 280 chars).
- Hashtag relevance is not validated.
- No actual distinction between platform conventions beyond what the LLM remembers.
- The 3 "variations" (Professional, Casual, Bold) are often too similar in practice.
- No engagement prediction or optimization based on platform best practices.
Upgrade plan:
| Step | Agent | Action |
|---|---|---|
| 1 | Platform Config | Programmatic: Load platform-specific rules: character limits (Twitter 280, Instagram 2200, LinkedIn 3000, TikTok 2200), hashtag conventions (Instagram 20-30, Twitter 2-3, LinkedIn 3-5), emoji norms, CTA patterns. |
| 2 | Caption Generator | Generate 3 distinctly different caption variations using the platform rules as hard constraints. Each variation must have a clearly different tone and structure. |
| 3 | Constraint Validator | Programmatic: Check character count against platform limit. Count hashtags against platform convention. Verify emoji usage is within norms. If any constraint is violated, truncate or flag for regeneration. |
| 4 | Diversity Checker | Programmatic: Compute text similarity between the 3 variations (e.g., Jaccard similarity on word sets). If any two variations are more than 70% similar, flag for regeneration to ensure genuine diversity. |
| 5 | Refinement Agent | If constraints are violated or diversity is too low, regenerate only the failing variations with explicit constraint reminders. Max 2 retries. |
- The plan above is a reference layout only. You are free to extend or restructure the agent stack if needed.
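Steps 1 and 3 of the plan could be sketched roughly as below. This is a minimal illustration, not the existing code in caption-generator.ts: the names `PLATFORM_RULES`, `validateCaption`, and `countHashtags` are hypothetical, and only the numeric limits and hashtag ranges are taken from the plan. Emoji norms and CTA patterns are left out because the plan gives no concrete values for them, and TikTok has no hashtag range for the same reason. Character counting uses the plain string length, which is a simplification of how platforms actually count.

```typescript
type Platform = "twitter" | "instagram" | "linkedin" | "tiktok";

interface PlatformRules {
  maxChars: number;
  // Hashtag range from the plan; omitted where the plan states none (TikTok).
  hashtags?: { min: number; max: number };
}

// Values copied from Step 1 of the upgrade plan.
const PLATFORM_RULES: Record<Platform, PlatformRules> = {
  twitter: { maxChars: 280, hashtags: { min: 2, max: 3 } },
  instagram: { maxChars: 2200, hashtags: { min: 20, max: 30 } },
  linkedin: { maxChars: 3000, hashtags: { min: 3, max: 5 } },
  tiktok: { maxChars: 2200 },
};

interface ValidationResult {
  ok: boolean;
  charCount: number;
  hashtagCount: number;
  violations: string[];
}

function countHashtags(text: string): number {
  // Assumption: a hashtag is "#" followed by ASCII word characters.
  return (text.match(/#\w+/g) || []).length;
}

function validateCaption(text: string, platform: Platform): ValidationResult {
  const rules = PLATFORM_RULES[platform];
  // Assumption: UTF-16 length is close enough; real platforms may weight
  // URLs or emoji differently.
  const charCount = text.length;
  const hashtagCount = countHashtags(text);
  const violations: string[] = [];

  if (charCount > rules.maxChars) {
    violations.push(`${charCount} characters exceeds the ${rules.maxChars} limit`);
  }
  const range = rules.hashtags;
  if (range && (hashtagCount < range.min || hashtagCount > range.max)) {
    violations.push(`${hashtagCount} hashtags is outside ${range.min}-${range.max}`);
  }
  return { ok: violations.length === 0, charCount, hashtagCount, violations };
}
```

The validator only reports violations. Whether a failing caption is truncated or sent back for regeneration is left to the caller, as in Step 3.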
Model suggestions to start with:
- Step 2: Try llama-3.3-70b (current model, good at creative text). Also try qwen-3-32b or minimax-m2.5 for more varied creative output.
- Step 5: Same model as Step 2 for consistency.
- This tool benefits heavily from the programmatic constraint enforcement (Steps 1, 3, 4). Model choice is less critical than the validation pipeline.
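Because the validation pipeline matters more than the model, the retry decision in Step 5 could be kept as a pure function that never calls a model itself. The sketch below is one possible shape under that assumption. `planRetry`, `VariationStatus`, and `RetryPlan` are hypothetical names; only the "regenerate only the failing variations" and "max 2 retries" rules come from the plan.

```typescript
// Retry cap taken from Step 5 of the plan.
const MAX_RETRIES = 2;

interface VariationStatus {
  tone: string; // e.g. "Professional", "Casual", "Bold"
  violations: string[]; // constraint or diversity failures; empty means passing
}

type RetryPlan =
  | { action: "accept" }
  | { action: "regenerate"; indices: number[]; reminder: string }
  | { action: "give_up"; indices: number[] };

function planRetry(statuses: VariationStatus[], retriesUsed: number): RetryPlan {
  // Collect the positions of variations that still have violations.
  const failing: number[] = [];
  statuses.forEach((status, index) => {
    if (status.violations.length > 0) failing.push(index);
  });

  if (failing.length === 0) return { action: "accept" };
  if (retriesUsed >= MAX_RETRIES) return { action: "give_up", indices: failing };

  // Explicit constraint reminder, one line per failing variation.
  const reminder = failing
    .map((i) => `${statuses[i].tone}: fix ${statuses[i].violations.join("; ")}`)
    .join("\n");
  return { action: "regenerate", indices: failing, reminder };
}
```

The caller would pass `reminder` into the regeneration prompt for the listed `indices` only, using whichever model was chosen for Step 2, and decide separately what to do on `give_up`.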
Model Selection Guidance
- You are free to pick any model from the Oxlo catalog based on your own testing and evaluation.
- The model suggestions above are starting points, not mandates. Try them first, and if they do not meet the accuracy target, experiment with alternatives.
Compare against: GPT 5.3 Thinking & Claude Sonnet 4.6 Thinking.
Acceptance criteria:
- 100% of captions respect platform character limits (programmatically enforced).
- Hashtag count falls within platform conventions.
- All 3 variations have less than 70% word-level similarity to each other.
- Overall quality matches or exceeds GPT 5.3 Thinking & Claude Sonnet 4.6 on test cases.
- Overall accuracy at 80%+.
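The word-level similarity criterion could be checked with Jaccard similarity on word sets, as the plan suggests for Step 4. The sketch below is illustrative only. `wordSet`, `jaccard`, and `tooSimilarPairs` are hypothetical names, and the tokenizer is an assumption: it lowercases and splits on anything that is not an ASCII letter, digit, or apostrophe, so it would need changes for non-English captions. A pair is flagged at 0.7 or above, the stricter reading of "less than 70%".

```typescript
// Threshold from the acceptance criteria (70% word-level similarity).
const SIMILARITY_THRESHOLD = 0.7;

function wordSet(text: string): Set<string> {
  // Assumption: ASCII-only tokenization; punctuation and "#" are dropped.
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .filter((word) => word.length > 0);
  return new Set(words);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two word sets.
function jaccard(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty captions are identical
  let shared = 0;
  setA.forEach((word) => {
    if (setB.has(word)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Returns index pairs of captions that are too similar to each other.
function tooSimilarPairs(captions: string[]): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < captions.length; i++) {
    for (let j = i + 1; j < captions.length; j++) {
      if (jaccard(captions[i], captions[j]) >= SIMILARITY_THRESHOLD) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```

An empty result from `tooSimilarPairs` means all variations pass the diversity criterion. Otherwise the returned indices identify which variations to regenerate.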