libudon

High-performance UDON parser in Rust.

What is UDON?

UDON (Universal Document & Object Notation) is a unified notation for documents, data, and configuration. See the specification for details.

Status

Phase 3 is complete: the parser has been rewritten using descent, a parser generator that produces callback-based recursive descent parsers.

Features

Feature            Syntax                                                               Status
Elements           |element[id].class1.class2?
Attributes         :key value
Typed values       integers, floats, rationals, complex, booleans, nil, strings
Arrays             [1 2 3]
Text/prose         indented content
Hierarchy          indentation and rightward nesting
Comments           ; line and ;{inline}
Embedded elements  |{em bold text}
Interpolation      !{{expression}}
Block directives   !if condition
Inline directives  !{include partial}
Raw blocks         !:lang:
References         @[name]
Freeform blocks    ```

Performance

Benchmarked on Apple Silicon (M-series):

Benchmark             Time     Throughput
Comprehensive (15KB)  10.5 µs  1.35 GiB/s
Comments              47 ns    726 MiB/s
Minimal file          74 ns    696 MiB/s
Text content          58 ns    672 MiB/s
Dynamic content       86 ns    455 MiB/s
Nested elements       185 ns   336 MiB/s
Empty (overhead)      1.9 ns
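
The Time and Throughput columns are related by throughput = input size / parse time. The following is a minimal sketch of that arithmetic for the comprehensive benchmark. It assumes "15KB" means 15 × 1024 bytes (the exact file size is not stated here), and the function name is illustrative rather than part of libudon.

```rust
/// Bytes per second when `bytes` of input are parsed in `nanos` nanoseconds.
fn throughput_bytes_per_sec(bytes: f64, nanos: f64) -> f64 {
    bytes / (nanos * 1e-9)
}

fn main() {
    const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

    // Assumption: "15KB" is read as 15 * 1024 bytes.
    let bytes = 15.0 * 1024.0;
    // 10.5 µs expressed in nanoseconds.
    let nanos = 10_500.0;

    let gib_per_sec = throughput_bytes_per_sec(bytes, nanos) / GIB;
    println!("{:.2} GiB/s", gib_per_sec);
}
```

Under that size assumption the result is about 1.36 GiB/s, which agrees with the reported 1.35 GiB/s to within rounding of the file size.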

Comparison with Previous Parser (main branch)

Benchmark           Old (main)          New (phase-3)         Speedup
comprehensive.udon  28 µs (516 MiB/s)   10.5 µs (1.35 GiB/s)  2.7x
minimal.udon        106 ns (486 MiB/s)  74 ns (696 MiB/s)     1.4x
comments_only       123 ns (278 MiB/s)  47 ns (726 MiB/s)     2.6x
text_only           122 ns (318 MiB/s)  58 ns (672 MiB/s)     2.1x
empty (overhead)    80 ns               1.9 ns                42x

Cross-Format Comparison

Parsing semantically equivalent documents (~50% structure, ~30% short text, ~20% prose):

Format    Parser          s10 (MB/s)  s10 (El/s)  s50 (MB/s)  s50 (El/s)  s200 (MB/s)  s200 (El/s)  Size
UDON      libudon         897         9.4M        744         7.7M        748          7.7M         100%
XML       quick-xml       935         7.6M        983         7.9M        1,003        8.0M         129%
JSON      serde_json      353         3.4M        372         3.6M        335          3.2M         108%
Markdown  pulldown-cmark  199         2.2M        196         2.1M        207          2.2M         98%
TOML      toml            54          0.5M        56          0.5M        55           0.5M         122%
YAML      serde_yaml      41          0.3M        43          0.4M        43           0.4M         126%
  • s10/s50/s200: 10, 50, 200 item documents (22, 101, 401 elements)
  • MB/s: Raw byte throughput
  • El/s: Semantic elements parsed per second
  • Size: Average document size relative to UDON
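
The MB/s and El/s columns can be cross-checked against the Size column: dividing byte throughput by element throughput gives the average bytes per element, and because every format encodes the same elements, the ratio of those averages should approximate the relative document size. This is a rough sketch of that check using the s10 figures. It assumes MB means 10^6 bytes, and the function name is illustrative.

```rust
/// Average bytes per element implied by a (MB/s, million El/s) pair.
/// The 10^6 factors cancel, assuming MB means 10^6 bytes.
fn bytes_per_element(mb_per_sec: f64, million_elements_per_sec: f64) -> f64 {
    mb_per_sec / million_elements_per_sec
}

fn main() {
    // s10 figures taken from the table above.
    let udon = bytes_per_element(897.0, 9.4);
    let xml = bytes_per_element(935.0, 7.6);
    let json = bytes_per_element(353.0, 3.4);

    println!("UDON: {:.0} bytes/element", udon);
    println!("XML size relative to UDON:  {:.0}%", 100.0 * xml / udon);
    println!("JSON size relative to UDON: {:.0}%", 100.0 * json / udon);
}
```

The derived ratios land within about a percentage point of the Size column (129% for XML, 108% for JSON). The small differences are expected, since the published throughputs are rounded and Size is an average across document sizes.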

Run with: cargo bench --bench compare

Structure

libudon/
├── udon-core/           # Core parser library
│   └── src/
│       ├── lib.rs       # Public API
│       ├── parser.rs    # GENERATED by descent
│       ├── tree.rs      # Tree/AST representation
│       └── span.rs      # Source locations
├── generator/           # Parser specification
│   ├── udon.desc        # Main parser grammar
│   └── values.desc      # Value type parsing
└── regenerate-parser    # Script to regenerate parser

Building

cargo build --release

Testing

cargo test

Benchmarking

cargo bench --bench parse

Regenerating the Parser

The parser is generated from .desc specifications using descent:

# Install descent (from ~/src/descent/)
cd ~/src/descent && dx gem install

# Regenerate parser
./regenerate-parser

# Regenerate with tracing (for debugging)
./regenerate-parser --trace

Usage

libudon provides two APIs:

Tree API (DOM-like)

Parse into a navigable tree structure:

use udon_core::tree::Document;

let input = b"|article[intro].featured
  :author Joseph
  :tags [rust parsing udon]

  |heading Welcome

  Some introductory text.
";

let doc = Document::parse(input).unwrap();

// Navigate the tree
for node in doc.root().children() {
    if let Some(el) = node.as_element() {
        println!("Element: {}", el.name());
        println!("  id: {:?}", el.id());
        println!("  classes: {:?}", el.classes());

        for (name, value) in el.attrs() {
            println!("  :{} = {:?}", name, value);
        }
    }
}

Output:

Element: article
  id: Some("intro")
  classes: ["featured"]
  :author = Bare("Joseph")
  :tags = Array([Bare("rust"), Bare("parsing"), Bare("udon")])

Tree API features:

  • Full parent/child/sibling navigation
  • ElementView for typed access to element properties
  • Value enum preserves original representation (Integer, Float, Rational, Complex, Bool, Nil, Array)
  • Zero-copy where possible via Cow<str>
  • ~313 MB/s throughput (2.6x overhead vs streaming)
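
To illustrate what "zero-copy where possible via Cow<str>" typically means, here is a minimal, self-contained sketch. It is not libudon's implementation: the `unescape` helper and its backslash escape rule are invented for the example. The point is only that a value is borrowed straight from the input when it needs no rewriting, and allocated only when it does.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of libudon): resolve backslash escapes.
/// Borrows from the input when there is nothing to rewrite.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Fast path: no escapes, so hand back the original slice.
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // Keep the escaped character verbatim; a trailing backslash is dropped.
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}

fn main() {
    let plain = unescape("Joseph");
    let escaped = unescape(r"a\|b");

    assert!(matches!(plain, Cow::Borrowed(_))); // no allocation
    assert!(matches!(escaped, Cow::Owned(_))); // allocated because of the escape
    println!("{plain} {escaped}");
}
```

Callers can treat both cases uniformly as `&str` through `Deref`, so the allocation cost is paid only by values that actually need it.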

Streaming API (SAX-like)

For maximum performance or large documents:

use udon_core::Parser;

let input = b"|article[intro]\n  :author Joseph\n  Hello, world!\n";

Parser::new(input).parse(|event| {
    println!("{}", event.format_line());
});

Output:

ElementStart @ 1..1
Name "article" @ 1..8
Attr "id" @ 8..15
BareValue "intro" @ 9..14
Attr "author" @ 19..25
BareValue "Joseph" @ 27..33
Text "Hello, world!" @ 36..49
ElementEnd @ 50..50

Streaming API features:

  • Zero allocations during parse
  • ~800 MB/s throughput
  • Callback-based event delivery
  • Ideal for large documents or when you only need specific elements
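
The combination of callback-based delivery and zero allocations can be sketched with a toy scanner. Everything below is hypothetical: `ToyEvent` and `scan` are not libudon types, and the scanner handles only a flat, simplified subset of the syntax. It shows the general pattern, in which events borrow slices of the input and the caller keeps only what it needs.

```rust
/// Toy event type, NOT libudon's event enum. Every variant borrows from the input.
#[derive(Debug, PartialEq)]
enum ToyEvent<'a> {
    Element(&'a str),
    Attr { key: &'a str, value: &'a str },
    Text(&'a str),
}

/// Scan `input` line by line and hand each event to `on_event`.
/// The scanner itself allocates nothing; events are slices of `input`.
fn scan<'a>(input: &'a str, mut on_event: impl FnMut(ToyEvent<'a>)) {
    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(name) = line.strip_prefix('|') {
            on_event(ToyEvent::Element(name));
        } else if let Some(rest) = line.strip_prefix(':') {
            // Split ":key value" at the first space; a bare key gets an empty value.
            let (key, value) = rest.split_once(' ').unwrap_or((rest, ""));
            on_event(ToyEvent::Attr { key, value: value.trim() });
        } else {
            on_event(ToyEvent::Text(line));
        }
    }
}

fn main() {
    let input = "|article\n  :author Joseph\n  Hello, world!\n";

    // Keep only the one attribute we care about; all other events are ignored.
    let mut author = None;
    scan(input, |event| {
        if let ToyEvent::Attr { key: "author", value } = event {
            author = Some(value);
        }
    });

    assert_eq!(author, Some("Joseph"));
}
```

Because the callback decides what to retain, a consumer interested in a single attribute never pays for building a tree of the whole document.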

Related Repositories

License

MIT