Current file: app/src/lib/tools/text-formatter.ts
Current model: llama-3.3-70b
Current approach: Single prompt asking LLM to reformat messy text. No structural analysis, no format validation, no data loss detection.
Problems with current approach:
- LLM may silently drop rows or columns when reformatting.
- Inferred headers may be incorrect.
- Output format (CSV, JSON, YAML) may not parse correctly.
- No verification that all input data is preserved in output.
Upgrade plan:
| Step | Agent | Action |
| --- | --- | --- |
| 1 | Structure Detector | Programmatic: Analyze the input text to detect delimiter patterns (tabs, pipes, commas, fixed-width). Count potential rows and columns. |
| 2 | Parsing Agent | Using the detected structure hints, parse the messy text into a structured table. Infer column headers and data types. |
| 3 | Format Compiler | Programmatic: Convert the parsed data into the requested output format using Python serializers (csv module, json.dumps, yaml.dump). |
| 4 | Integrity Checker | Programmatic: Verify row count matches between input and output. Check no data values were dropped. If discrepancies found, flag them. |
- The plan above is a reference layout only. You are free to extend the agent stack or add further steps if needed.
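As an illustration of Step 1, the following is a minimal sketch of a programmatic structure detector. It is not the existing implementation: the function name `detect_structure`, the `CANDIDATES` map, and the returned keys are all hypothetical, and fixed-width detection is reduced to a fallback label when no delimiter is found.

```python
from collections import Counter

# Candidate delimiters named in the plan (hypothetical mapping for this sketch).
CANDIDATES = {"tab": "\t", "pipe": "|", "comma": ","}


def detect_structure(text: str) -> dict:
    """Guess the delimiter and the row/column counts of messy tabular text."""
    # Blank lines are ignored so they do not inflate the row count.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    best_name, best_score, best_cols = "fixed-width-or-unknown", 0, 1

    for name, delim in CANDIDATES.items():
        # Per-line delimiter counts, skipping lines that lack the delimiter.
        nonzero = [ln.count(delim) for ln in lines if ln.count(delim) > 0]
        if not nonzero:
            continue
        # The most frequent count is the likely number of separators per row.
        mode, freq = Counter(nonzero).most_common(1)[0]
        # Prefer the delimiter that gives the same count on the most lines.
        if freq > best_score:
            best_name, best_score, best_cols = name, freq, mode + 1

    return {
        "delimiter": best_name,
        "rows": len(lines),            # potential rows, header included
        "columns": best_cols,          # separators per row + 1
        "consistent_rows": best_score,  # lines agreeing with that column count
    }


if __name__ == "__main__":
    sample = "name|age|city\nAda|36|London\n\nLin|29|Oslo\n"
    print(detect_structure(sample))
```

The result is meant as a hint for the Parsing Agent in Step 2, not a final parse. Rows with leading or trailing pipes, or quoted commas, would skew the column count and are left to the LLM step.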
Model suggestions to start with:
- Step 2: Try llama-3.3-70b or qwen-3-32b for text parsing. Also try deepseek-v3.2 for structured extraction.
- Since most of the work is programmatic (Steps 1, 3, 4), the LLM mainly helps with ambiguous parsing decisions.
Model selection guidance:
- You are free to pick any model from the Oxlo catalog based on your own testing and evaluation.
- The model suggestions above are starting points, not mandates. Try them first, and if they do not meet the accuracy target, experiment with alternatives.
Compare against: Claude Sonnet 4.6 Thinking (strong at structured text extraction).
Acceptance criteria:
- Zero data loss: every value in the input must appear in the output (programmatically verified).
- Output format must be parseable in 100% of cases.
- Row count preservation verified programmatically.
- Overall accuracy of at least 80%.
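The first three criteria can be checked without an LLM. The sketch below shows one possible shape for Steps 3 and 4, assuming the Parsing Agent returns headers and rows as lists of strings and that the raw input values and expected data-row count come from Step 1. The names `compile_output`, `parse_output`, and `check_integrity` are hypothetical. Only the standard-library `csv` and `json` serializers are used; YAML output would need the third-party PyYAML package and is omitted here.

```python
import csv
import io
import json
from collections import Counter


def compile_output(headers, rows, fmt):
    """Step 3 sketch: serialize parsed rows with standard-library serializers."""
    if fmt == "json":
        return json.dumps([dict(zip(headers, r)) for r in rows], indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(headers)
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


def parse_output(output, fmt):
    """Re-parse the output; an exception here means the output is not parseable."""
    if fmt == "json":
        records = json.loads(output)
        header = list(records[0]) if records else []
        return header, [[str(v) for v in rec.values()] for rec in records]
    if fmt == "csv":
        all_rows = list(csv.reader(io.StringIO(output)))
        return (all_rows[0] if all_rows else []), all_rows[1:]
    raise ValueError(f"unsupported format: {fmt}")


def check_integrity(input_values, expected_rows, output, fmt):
    """Step 4 sketch: compare row counts and list input values missing from output."""
    header, rows = parse_output(output, fmt)
    available = Counter(header) + Counter(v for row in rows for v in row)
    # Multiset difference: a value that occurs twice in the input must
    # also occur at least twice in the output.
    missing = Counter(input_values) - available
    return {
        "row_count_ok": len(rows) == expected_rows,
        "missing_values": sorted(missing.elements()),
    }


if __name__ == "__main__":
    headers, rows = ["name", "age"], [["Ada", "36"], ["Lin", "29"]]
    raw_values = ["name", "age", "Ada", "36", "Lin", "29"]
    out = compile_output(headers, rows[:1], "json")  # simulate a dropped row
    print(check_integrity(raw_values, 2, out, "json"))
```

Comparing values as strings is a deliberate simplification: if Step 2 converts types (for example `"036"` to `36`), the check would report the original value as missing, which is arguably the desired behaviour under a zero-data-loss rule.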