A Rust core parser with two build targets (C ABI shared library + WASM) to maximize ecosystem coverage from a single codebase. Template evaluation (Liquid) is deferred to host language implementations.
┌─────────────────────────────────────────────────────────────┐
│                      Rust Core Parser                       │
│                                                             │
│  Responsibilities:                                          │
│  - Structural parsing: |elements, :attrs, prose, |{...}     │
│  - Directive recognition: !name, !:name:                    │
│  - Streaming via ring buffer / pull-based batching          │
│                                                             │
│  Does NOT handle:                                           │
│  - Template evaluation (!if, !for, !{...})                  │
│  - Dialect semantics (what !cqrs means)                     │
│  - Output rendering                                         │
└───────────────┬─────────────────────────┬───────────────────┘
                │                         │
          C ABI export               WASM build
            (cdylib)              (wasm32-unknown)
                │                         │
    ┌───────────┴───────────┐     ┌───────┴────────┐
    │ Python, Ruby, C#,     │     │ Browser JS,    │
    │ Swift, Lua, Julia,    │     │ Deno, Node*,   │
    │ Node* (native addon)  │     │ Edge runtimes  │
    └───────────────────────┘     └────────────────┘
    * Node.js can use either path
    ┌─────────────────────────────────────────────┐
    │  Native ports (community/later):            │
    │  Go, Java, Elixir                           │
    └─────────────────────────────────────────────┘
The parser's only directive-level knowledge is body mode:
| Syntax | Body | Parser Behavior |
|---|---|---|
| !foo | UDON | Parse body recursively as UDON |
| !:foo: | Raw | Capture body verbatim, tag with "foo" |
That's it. No dialect registry. No special cases in the parser.
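The body-mode rule can be illustrated with a small header classifier. This is a minimal sketch, not the parser's actual code: `BodyMode` and `classify` are hypothetical names, and the sketch assumes a directive name ends at the first whitespace (UDON form) or at the closing colon (raw form).

```rust
/// Body mode of a directive, decided from its header alone.
#[derive(Debug, PartialEq)]
pub enum BodyMode {
    Udon, // `!name`: body is parsed recursively as UDON
    Raw,  // `!:name:`: body is captured verbatim
}

/// Hypothetical helper: split a directive header into (name, mode, rest of line).
/// Returns None when the line is not a named directive (for example `!{...}`).
pub fn classify(line: &str) -> Option<(&str, BodyMode, &str)> {
    let after_bang = line.strip_prefix('!')?;
    if let Some(after_colon) = after_bang.strip_prefix(':') {
        // Raw form: the name runs up to the closing ':'.
        let end = after_colon.find(':')?;
        let name = &after_colon[..end];
        if name.is_empty() {
            return None;
        }
        Some((name, BodyMode::Raw, after_colon[end + 1..].trim_start()))
    } else {
        // UDON form: the name runs up to the first whitespace (assumption).
        let end = after_bang
            .find(char::is_whitespace)
            .unwrap_or(after_bang.len());
        let name = &after_bang[..end];
        if name.is_empty() || name.starts_with('{') {
            return None;
        }
        Some((name, BodyMode::Udon, after_bang[end..].trim_start()))
    }
}
```

Note that the classifier never looks the name up anywhere: `!warning` and `!if` take exactly the same path, which is the point of having no dialect registry.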
!if user.admin ; UDON body: parser recurses
  |div Admin tools
!warning :severity high ; UDON body: unknown directive, still parses
  Do |{em not} run in prod.
!:sql: ; Raw body: captured verbatim
  SELECT * FROM users
  WHERE active = true
!:json: {"status": "ok"} ; Raw body: inline form
enum Event {
DirectiveStart { name: String, is_raw: bool, attrs: Vec<Attr> },
DirectiveEnd,
// ... other structural events
}
For !:sql:, the parser emits:
- DirectiveStart { name: "sql", is_raw: true, ... }
- RawContent { text: "SELECT * FROM users\n..." }
- DirectiveEnd
For !warning, the parser emits:
- DirectiveStart { name: "warning", is_raw: false, ... }
- Normal UDON events for the body
- DirectiveEnd
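To show how a consumer might use these event sequences, here is a minimal sketch that pulls raw bodies out of a flat event stream. The shape of `Attr`, the `Text` variant (a stand-in for the other structural events) and the `raw_bodies` function are assumptions for illustration; only `DirectiveStart`, `RawContent` and `DirectiveEnd` come from the design above.

```rust
/// Hypothetical attribute shape; the design does not define `Attr`.
#[derive(Debug, Clone, PartialEq)]
pub struct Attr {
    pub key: String,
    pub value: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    DirectiveStart { name: String, is_raw: bool, attrs: Vec<Attr> },
    RawContent { text: String },
    Text(String), // stand-in for "other structural events"
    DirectiveEnd,
}

/// Collect (directive name, raw body) pairs from a flat event stream.
pub fn raw_bodies(events: &[Event]) -> Vec<(String, String)> {
    // Stack of currently open directives: (name, is_raw).
    let mut open: Vec<(&str, bool)> = Vec::new();
    let mut out = Vec::new();
    for ev in events {
        match ev {
            Event::DirectiveStart { name, is_raw, .. } => {
                open.push((name.as_str(), *is_raw));
            }
            Event::RawContent { text } => {
                // Raw content belongs to the innermost open raw directive.
                if let Some(&(name, true)) = open.last() {
                    out.push((name.to_string(), text.clone()));
                }
            }
            Event::DirectiveEnd => {
                open.pop();
            }
            Event::Text(_) => {}
        }
    }
    out
}
```

Because start and end events are balanced, a consumer only needs a stack to know which directive any given content event belongs to.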
Unlike Liquid (deferred to host implementations), Markdown parsing may belong in the core parser. Rationale:
- Markdown is ubiquitous and stable (CommonMark spec)
- UDON explicitly prefers Markdown over inline UDON for simple formatting
- Consistent Markdown handling across all host languages is valuable
- Inline UDON elements (|{...}) must interleave with Markdown, which is easier if one parser handles both
Status: Not yet decided. Options include:
- Core parser handles Markdown (CommonMark) as part of prose parsing
- Core parser emits prose as-is; host applies Markdown post-processing
- Core parser recognizes Markdown structure but defers rendering to host
This needs further design work before implementation.
Liquid directives (!if, !for, !let, !{...}) are not special to the
parser. They're just directives with UDON bodies.
The host language layer intercepts these by name and routes them to its native Liquid implementation:
| Host Language | Liquid Implementation |
|---|---|
| Ruby | liquid gem (Shopify's original) |
| Python | python-liquid |
| JavaScript | LiquidJS |
| Go | liquid (osteele) — if native port exists |
| Elixir | Solid — if native port exists |
| C#/.NET | Fluid, DotLiquid |
| Java | Liqp — if native port exists |
| PHP | Liquid for PHP |
Liquid implementations vary in feature completeness and edge-case behavior (filters, whitespace handling, error modes, etc.). This is not yet specified by UDON. If formal alignment becomes necessary, UDON will likely defer to Shopify's Liquid specification as the reference standard, with host implementations expected to be "close enough" for practical use.
                         Parser events
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                         Host Layer                          │
│                                                             │
│  Intercepts by directive name:                              │
│                                                             │
│  !if, !elif, !else, !for, !let, !{...}                      │
│    → Route to native Liquid                                 │
│                                                             │
│  !:X:                                                       │
│    → Pass raw content to consumer (syntax highlight,        │
│      execute, embed; host decides)                          │
│                                                             │
│  Everything else (!warning, !cqrs, !api, ...)               │
│    → Pass through as labeled UDON subtree                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
                       Consumer/Renderer
Custom directives that are neither Liquid nor !:name: just flow through as
semantic markers: labeled UDON subtrees that the final consumer interprets.
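The three-way routing in the host layer can be sketched as a single function over the directive name and its raw flag. This is written in Rust only for consistency with the rest of the document; a real host layer would live in the host language. `Route` and `route` are hypothetical names, and checking `is_raw` before the name is an assumption about precedence. Interpolation (`!{...}`) is not a named directive and is left out of the sketch.

```rust
/// Where the host layer sends a directive.
#[derive(Debug, PartialEq)]
pub enum Route {
    Liquid,         // hand off to the host's native Liquid implementation
    RawToConsumer,  // pass the verbatim body through; the host decides what to do
    LabeledSubtree, // unknown directive: keep as a labeled UDON subtree
}

/// Hypothetical dispatch by directive name.
pub fn route(name: &str, is_raw: bool) -> Route {
    if is_raw {
        // Assumption: a raw body is never evaluated as Liquid, whatever its name.
        Route::RawToConsumer
    } else if matches!(name, "if" | "elif" | "else" | "for" | "let") {
        Route::Liquid
    } else {
        Route::LabeledSubtree
    }
}
```

The list of Liquid names lives entirely in the host layer, so adding or removing a template directive never touches the parser.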
Ring buffer in Rust-allocated memory:
Host                      Rust Parser
│                              │
│── write input to buffer ────▶│
│── call parse(len) ──────────▶│
│                              │ (parses, writes events to ring buffer)
│◀── (events_written, ─────────│
│     bytes_consumed)          │
│── read events via pointer ──▶│ [event buffer in Rust memory]
- Host writes input chunks to a buffer
- Parser processes, writes events to ring buffer
- Host reads events directly via pointer (zero-copy where possible)
- Minimal FFI boundary crossings
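The `(events_written, bytes_consumed)` return value is what makes the batching pull-based: when the event buffer fills up, the parser stops and tells the host how much input it actually used. The sketch below illustrates that contract in safe Rust. It is a toy under stated assumptions: `EventRing` and `parse_batch` are hypothetical names, a capped `VecDeque` stands in for the real ring buffer, and each complete line counts as one event.

```rust
use std::collections::VecDeque;

/// Bounded event queue standing in for the ring buffer.
pub struct EventRing {
    slots: VecDeque<String>,
    capacity: usize,
}

impl EventRing {
    pub fn new(capacity: usize) -> Self {
        Self { slots: VecDeque::with_capacity(capacity), capacity }
    }
    pub fn is_full(&self) -> bool {
        self.slots.len() == self.capacity
    }
    /// Host side: take the oldest unread event.
    pub fn pop(&mut self) -> Option<String> {
        self.slots.pop_front()
    }
}

/// Stand-in for `parse(len)`: returns (events_written, bytes_consumed).
/// Stops early when the ring is full or the input ends mid-line.
pub fn parse_batch(input: &[u8], ring: &mut EventRing) -> (usize, usize) {
    let mut events_written = 0;
    let mut bytes_consumed = 0;
    while !ring.is_full() {
        let rest = &input[bytes_consumed..];
        let newline = match rest.iter().position(|&b| b == b'\n') {
            Some(i) => i,
            None => break, // incomplete line: leave it unconsumed
        };
        let line = String::from_utf8_lossy(&rest[..newline]).into_owned();
        ring.slots.push_back(line);
        events_written += 1;
        bytes_consumed += newline + 1; // include the newline itself
    }
    (events_written, bytes_consumed)
}
```

A host loop would drain the ring after each call and then call again with the unconsumed tail of the input, so one FFI crossing can deliver a whole batch of events.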
Ring buffer in WASM linear memory:
Host (JS)                 WASM Parser
│                              │
│── write to WASM memory ─────▶│ [input buffer]
│── call parse(len) ──────────▶│
│                              │ (parses, writes to event buffer)
│◀── (events_written, ─────────│
│     bytes_consumed)          │
│── read via Memory view ─────▶│ [event buffer]
- Input and output buffers in WASM linear memory
- Host reads events via WebAssembly.Memory buffer views
- String data requires copy across JS/WASM boundary
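In both memory models the host delivers input in chunks, so a chunk may end in the middle of a token and the parser has to carry the partial tail over to the next call. The following is a minimal sketch of that buffering, not the real parser: `LineFeeder` is a hypothetical toy that treats each complete line as one event, with `feed` and `finish` mirroring the chunked contract described above.

```rust
/// Toy stand-in for the parser: one "event" per complete line.
pub struct LineFeeder {
    partial: Vec<u8>, // bytes of the current, not yet terminated line
}

impl LineFeeder {
    pub fn new() -> Self {
        Self { partial: Vec::new() }
    }

    /// Feed a chunk; return an event for every line the chunk completes.
    pub fn feed(&mut self, chunk: &[u8]) -> Vec<String> {
        self.partial.extend_from_slice(chunk);
        let mut events = Vec::new();
        while let Some(newline) = self.partial.iter().position(|&b| b == b'\n') {
            // Remove the line (including its newline) from the carry buffer.
            let line: Vec<u8> = self.partial.drain(..=newline).collect();
            events.push(String::from_utf8_lossy(&line[..newline]).into_owned());
        }
        events
    }

    /// Flush whatever is left after the final chunk.
    pub fn finish(&mut self) -> Vec<String> {
        if self.partial.is_empty() {
            return Vec::new();
        }
        let rest = std::mem::take(&mut self.partial);
        vec![String::from_utf8_lossy(&rest).into_owned()]
    }
}
```

Decoding only complete lines also sidesteps chunk boundaries that fall inside a multi-byte UTF-8 sequence, since a newline byte never occurs inside one.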
pub struct Parser {
// internal state, partial token buffer, etc.
}
impl Parser {
pub fn new(options: Options) -> Self;
/// Feed a chunk, get batch of events
pub fn feed(&mut self, chunk: &[u8]) -> Vec<Event>;
/// Flush remaining after final chunk
pub fn finish(&mut self) -> Vec<Event>;
}
// For zero-copy native Rust usage
impl Parser {
pub fn feed_iter<'a>(&'a mut self, chunk: &'a [u8]) -> impl Iterator<Item = Event<'a>>;
}
High-performance FFI via direct memory access.
| Language | Binding Approach |
|---|---|
| Python | PyO3 or cffi |
| Ruby | magnus, rutie, or FFI gem |
| C#/.NET | P/Invoke with Span |
| Swift | Native C interop |
| Lua/LuaJIT | LuaJIT FFI |
| Julia | ccall |
| Node.js | N-API (optional, for perf-critical) |
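All of these bindings would ultimately call a handful of exported C functions. A common way to expose a Rust struct over a C ABI is the opaque-handle pattern, sketched below. The function names (`udon_parser_new`, `udon_parser_feed`, `udon_parser_free`) and the stand-in `Parser` body are hypothetical, not the project's actual export surface. A real cdylib would also mark each function with the `no_mangle` attribute so the symbol name is stable; it is omitted here to keep the sketch independent of Rust edition.

```rust
use std::slice;

/// Stand-in parser state; the real struct would hold the partial token buffer etc.
pub struct Parser {
    bytes_seen: usize,
}

/// Allocate a parser on the Rust heap and hand the host an opaque pointer.
pub extern "C" fn udon_parser_new() -> *mut Parser {
    Box::into_raw(Box::new(Parser { bytes_seen: 0 }))
}

/// Feed `len` bytes starting at `data`; returns the total bytes seen so far.
///
/// # Safety
/// `parser` must come from `udon_parser_new` and not have been freed;
/// `data` must point to at least `len` readable bytes.
pub unsafe extern "C" fn udon_parser_feed(
    parser: *mut Parser,
    data: *const u8,
    len: usize,
) -> usize {
    if parser.is_null() || data.is_null() {
        return 0; // defensive: hosts sometimes pass null on error paths
    }
    let parser = unsafe { &mut *parser };
    let chunk = unsafe { slice::from_raw_parts(data, len) };
    parser.bytes_seen += chunk.len();
    parser.bytes_seen
}

/// Release a parser created by `udon_parser_new`.
///
/// # Safety
/// `parser` must come from `udon_parser_new` and must not be used afterwards.
pub unsafe extern "C" fn udon_parser_free(parser: *mut Parser) {
    if !parser.is_null() {
        drop(unsafe { Box::from_raw(parser) });
    }
}
```

The host never sees the layout of `Parser`; it only stores the pointer and passes it back, which is what lets cffi, P/Invoke, LuaJIT FFI and `ccall` all share one export surface.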
Portable, no native compilation required.
| Target | Notes |
|---|---|
| Browser JavaScript | Required — only option |
| Deno | Idiomatic |
| Node.js | Easier distribution than native addons |
| Cloudflare Workers | WASM only |
| Other edge runtimes | WASM typically required |
For ecosystems that strongly prefer pure implementations.
| Language | Reason | Priority |
|---|---|---|
| Go | cgo friction, "pure Go" culture | When demand exists |
| Java | JNI/Panama friction | When demand exists |
| Elixir | NIFs block scheduler | When demand exists |
Native ports would implement the same spec, potentially with different performance characteristics.
From the Rust codebase:
cargo build --release
→ target/release/libudon.{so,dylib,dll} # C ABI shared library
cargo build --release --target wasm32-unknown-unknown
→ target/wasm32-unknown-unknown/release/udon.wasm
wasm-bindgen / wasm-pack (for JS ergonomics)
→ pkg/udon.js, pkg/udon_bg.wasm
- Parser is simple: Knows UDON structure + !:name: = raw body. That's all.
- Liquid is host-native: Each language uses its own Liquid implementation.
- Custom directives pass through: Semantic markers for consumers to interpret.
- One Rust codebase, two targets: cdylib + WASM covers ~80% of ecosystem.
- Native ports later: Go, Java, Elixir when demand justifies.