Unstructured Text Formatter

**Current file:** `app/src/lib/tools/text-formatter.ts`  
**Current model:** `llama-3.3-70b`  
**Current approach:** A single prompt asks the LLM to reformat messy text. There is no structural analysis, no format validation, and no data-loss detection.

**Problems with current approach:**
- LLM may silently drop rows or columns when reformatting.
- Inferred headers may be incorrect.
- Output format (CSV, JSON, YAML) may not parse correctly.
- No verification that all input data is preserved in output.

**Upgrade plan:**

| Step | Agent | Action |
|------|-------|--------|
| 1 | Structure Detector | Programmatic: Analyze the input text to detect delimiter patterns (tabs, pipes, commas, fixed-width). Count potential rows and columns. |
| 2 | Parsing Agent | Using the detected structure hints, parse the messy text into a structured table. Infer column headers and data types. |
| 3 | Format Compiler | Programmatic: Convert the parsed data into the requested output format using Python serializers (csv module, json.dumps, yaml.dump). |
| 4 | Integrity Checker | Programmatic: Verify that the row count matches between input and output. Check that no data values were dropped. Flag any discrepancies. |
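
Step 1 in the table above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `detect_structure`, the returned keys, and the "two or more spaces" heuristic for fixed-width text are all assumptions made for this example.

```python
import re
from collections import Counter

# Candidate delimiters named in Step 1; fixed-width is handled as a fallback.
CANDIDATES = {"tab": "\t", "pipe": "|", "comma": ","}


def detect_structure(text: str) -> dict:
    """Guess the delimiter and rough table shape of messy text (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    best_name, best_score, best_cols = "unknown", 0.0, 1

    for name, delim in CANDIDATES.items():
        counts = [ln.count(delim) for ln in lines]
        if not counts or max(counts) == 0:
            continue
        # Most common per-line delimiter count, and how many lines share it.
        mode, freq = Counter(counts).most_common(1)[0]
        if mode == 0:
            continue
        score = freq / len(lines)
        if score > best_score:
            best_name, best_score, best_cols = name, score, mode + 1

    if best_name == "unknown" and lines:
        # Assumed fixed-width heuristic: columns are separated by 2+ spaces.
        gaps = [len(re.findall(r"\S {2,}(?=\S)", ln)) for ln in lines]
        mode, freq = Counter(gaps).most_common(1)[0]
        if mode > 0:
            best_name, best_score, best_cols = "fixed-width", freq / len(lines), mode + 1

    return {
        "delimiter": best_name,
        "rows": len(lines),          # non-blank lines, header included
        "columns": best_cols,
        "confidence": round(best_score, 2),
    }
```

The `confidence` value is simply the share of lines that agree on the column count, so a low value is a signal to lean more heavily on the Parsing Agent in Step 2. Pipe tables with leading and trailing pipes would be over-counted by this sketch and need extra handling.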

- The plan above is a reference layout, not a fixed design. You are free to extend or restructure the agent stack as needed.
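
As one possible shape for Step 3, the sketch below serializes parsed rows with the standard-library `csv` and `json` modules and parses the result back before returning it. The function name `compile_output` and its signature are hypothetical, and all cell values are assumed to be strings already.

```python
import csv
import io
import json


def compile_output(headers, rows, fmt):
    """Serialize parsed rows, then parse the result back as a sanity check.

    Hypothetical helper; assumes every cell value is already a string.
    """
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"row has {len(row)} cells, expected {len(headers)}")

    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(headers)
        writer.writerows(rows)
        out = buf.getvalue()
        # Round trip: the serialized text must parse back to the same cells.
        expected = [list(headers)] + [[str(v) for v in row] for row in rows]
        if list(csv.reader(io.StringIO(out))) != expected:
            raise ValueError("CSV round trip changed the data")
        return out

    if fmt == "json":
        if len(set(headers)) != len(headers):
            raise ValueError("duplicate headers would overwrite values in JSON objects")
        records = [dict(zip(headers, row)) for row in rows]
        out = json.dumps(records, indent=2, ensure_ascii=False)
        if json.loads(out) != records:
            raise ValueError("JSON round trip changed the data")
        return out

    raise ValueError(f"unsupported format: {fmt}")
```

YAML is left out here only to keep the example free of third-party dependencies; the same write-then-parse-back pattern should apply with PyYAML's `yaml.safe_dump` and `yaml.safe_load`. Because the serializers handle quoting and escaping, the "output must be parseable" criterion no longer depends on the LLM.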

**Model suggestions to start with:**
- Step 2: Try `llama-3.3-70b` or `qwen-3-32b` for text parsing. Also try `deepseek-v3.2` for structured extraction.
- Since most of the work is programmatic (Steps 1, 3, 4), the LLM mainly helps with ambiguous parsing decisions.

**Model Selection Guidance**
- **You are free to pick any model from the Oxlo catalog** based on your own testing and evaluation.
- The model suggestions above are starting points, not mandates. Try them first; if they do not meet the accuracy target, experiment with alternatives.

**Compare against:** Claude Sonnet 4.6 Thinking (strong at structured text extraction).

**Acceptance criteria:**
- Zero data loss: every value in the input must appear in the output (programmatically verified).
- Output format must be parseable in 100% of cases.
- Row count preservation verified programmatically.
- Overall accuracy of at least 80%.
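
The zero-data-loss and row-count criteria could be checked along the following lines. This is a sketch under assumptions: `split_cells` and `check_integrity` are hypothetical names, the input is assumed to split cleanly on the delimiter found in Step 1, and values are compared as whitespace-trimmed strings.

```python
from collections import Counter


def split_cells(text, delimiter):
    """Programmatic baseline: split non-blank lines on the detected delimiter."""
    return [
        [cell.strip() for cell in line.split(delimiter)]
        for line in text.splitlines()
        if line.strip()
    ]


def check_integrity(input_rows, output_rows):
    """Compare input and output cells as multisets and compare row counts."""
    want = Counter(cell for row in input_rows for cell in row if cell)
    got = Counter(
        str(cell).strip()
        for row in output_rows
        for cell in row
        if str(cell).strip()
    )
    # Counter subtraction keeps only positive counts, so duplicates are respected.
    missing = sorted((want - got).elements())
    unexpected = sorted((got - want).elements())
    row_count_ok = len(input_rows) == len(output_rows)
    return {
        "row_count_ok": row_count_ok,
        "missing": missing,        # input values absent from the output
        "unexpected": unexpected,  # output values not in the input, e.g. inferred headers
        "passed": row_count_ok and not missing,
    }
```

For example, `check_integrity(split_cells("id|name\n1|Ann\n2|Bob", "|"), [["id", "name"], ["1", "Ann"]])` reports a row-count mismatch and lists `"2"` and `"Bob"` as missing. Unexpected values are reported but do not fail the check in this sketch, since inferred headers legitimately add text that was not in the input.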


Unstructured Text Formatter #25
