Summary
When an Excel file (.xlsx) is extracted during compilation/packaging, the current implementation produces raw text output. Instead, each sheet should be extracted as a separate .csv.txt file with YAML frontmatter metadata.
Expected Behavior
Given an Excel file with sheets "Revenue" and "Expenses":
Output: revenue.csv.txt
---
source: financials.xlsx
sheet: Revenue
rows: 150
columns: 8
extracted_at: 2025-03-11T00:00:00Z
---
col1,col2,col3,...
data,data,data,...
Output: expenses.csv.txt
---
source: financials.xlsx
sheet: Expenses
rows: 75
columns: 6
extracted_at: 2025-03-11T00:00:00Z
---
col1,col2,col3,...
data,data,data,...
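The layout above could be produced by something like the following minimal sketch. The function name `render_sheet` and its parameters are hypothetical and do not come from any of the existing CLIs. Reading rows out of the workbook is out of scope here. The sketch also assumes that `rows` counts every row including the header, that `columns` is the width of the widest row, and that frontmatter values are simple enough to be written unquoted.

```python
import csv
import io
from datetime import datetime, timezone


def render_sheet(source, sheet, rows, extracted_at=None):
    """Render one sheet as YAML frontmatter followed by CSV text.

    `rows` is a list of rows (header first) already read from the workbook.
    """
    if extracted_at is None:
        extracted_at = datetime.now(timezone.utc)
    # Normalise to UTC so the trailing "Z" is accurate.
    stamp = extracted_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Assumption: the row count includes the header row, and the column
    # count is the width of the widest row.
    n_rows = len(rows)
    n_cols = max((len(r) for r in rows), default=0)

    # The csv module handles quoting of commas, quotes, and newlines in cells.
    body = io.StringIO()
    csv.writer(body, lineterminator="\n").writerows(rows)

    # Assumption: values are written unquoted, which is only safe for simple
    # file and sheet names (no colons, leading symbols, and so on).
    frontmatter = [
        "---",
        f"source: {source}",
        f"sheet: {sheet}",
        f"rows: {n_rows}",
        f"columns: {n_cols}",
        f"extracted_at: {stamp}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n" + body.getvalue()
```

A real implementation would probably need to quote or escape sheet names that are not plain YAML scalars, and to settle whether `rows` should exclude the header.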
Current Behavior
Excel files are extracted as a single blob of text without sheet separation or frontmatter.
Context
- This applies to the registry-bound output: by the time content hits the registry, it should already be in .csv.txt format
- Each sheet becomes its own file, broken out as individual code files with frontmatter
- Frontmatter should include source file, sheet name, and basic stats
- The xlsx dependency (0.18.5) is used only for trusted text extraction during compile/package, not for user-uploaded content
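
Because each sheet becomes its own file, the extractor needs a rule for turning sheet names into file names. The sketch below is one possible approach, not the behaviour of any existing CLI. The helper names `output_filename` and `plan_outputs`, the lowercase-and-underscore slug rule, and the `_2`, `_3` collision suffixes are all assumptions. The only part grounded in the examples above is that "Revenue" maps to revenue.csv.txt.

```python
import re

SUFFIX = ".csv.txt"


def output_filename(sheet_name):
    """Derive a file name from a sheet name (hypothetical slug rule)."""
    # Lowercase, then collapse every run of non-alphanumerics into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", sheet_name.lower()).strip("_")
    # Fall back to a generic name if nothing usable remains.
    return f"{slug or 'sheet'}{SUFFIX}"


def plan_outputs(sheet_names):
    """Map each sheet name to a unique output file name."""
    used = set()
    plan = {}
    for name in sheet_names:
        base = output_filename(name)
        candidate = base
        n = 2
        # Two different sheet names can slug to the same file name,
        # so append a counter until the name is unused.
        while candidate in used:
            candidate = f"{base[:-len(SUFFIX)]}_{n}{SUFFIX}"
            n += 1
        used.add(candidate)
        plan[name] = candidate
    return plan
```

For the workbook in the example, `plan_outputs(["Revenue", "Expenses"])` maps the two sheets to revenue.csv.txt and expenses.csv.txt.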
Affected Components
- Python CLI: compiler.py (binary asset extraction stage)
- Go CLI: compiler.go
- Node.js CLI: compilation pipeline