Grammar: Verilog 2001 (synthesizable subset) #5

Description

@lanserge

Summary

Add a verilog_2001.rawast grammar covering the synthesizable subset of Verilog 2001 (IEEE 1364-2001) — modules, ports, declarations, continuous assignments, procedural blocks (always / initial), statements, instantiations, and the 17-level expression precedence chain.

This is the foundation for any future SystemVerilog work: SV-2017/2023 extends Verilog 2001 rather than replacing it, so a clean V2001 base unblocks SVA assertions (via subparse), classes, interfaces, packages, and constraints as incremental extensions.

A working sketch of the top-level rule structure lives at https://github.com/edacommons/rawast/discussions (paste from the design conversation) and demonstrates the patterns — module declaration, port-list Choice between ANSI and Verilog-95 styles, fan-out at MODULE_ITEM, the per-level expression precedence pattern, etc.

Why now

Three concrete use cases that exist today and want a bidirectional Verilog parser:

  • Source rewriters / formatters — Verible, the open-source SV formatter, took years to build because it shipped its own emitter alongside its parser. rawast's bidirectional walk gives save-direction emit for free; a Verilog formatter on top is a thin layer.
  • Lint / static-analysis tools — the AST is already the structured IR; tools written in Python (via the Python binding) can walk it without reimplementing parsing.
  • EDA flow integration — netlist round-tripping for tool-chain wrappers; the bidirectional model means you can take a flat netlist, mutate the AST, and save it back without losing comments or whitespace structure.

The combination of "bidirectional" + "grammar as data" + "JSON-uniform AST" is unique to rawast among EDA-targeted parsers. Verilog is the canonical EDA input format; not having it is a real coverage gap.

Scope: in

Area | Notes
Module structure | module … endmodule, parameter port list, port list (ANSI + Verilog-95 styles)
Module items | Net / reg declarations, parameters, continuous assign, always blocks, initial blocks, module instantiations
Statements | Blocking and non-blocking assignments, if/else, case/casex/casez, begin/end blocks, for, while, repeat, disable, wait, force/release, system task calls
Expressions | Full 17-level precedence chain (assignment → ternary → || → && → | → ^ → & → ==/!= → <=/< → shifts → +/- → * / % → ** → unary → primary)
Primaries | Numeric literals (every form), string literals, identifiers, bit/part selects, concatenation, function calls, system task calls ($display, etc.)
Event control | @(posedge clk or negedge rst_n), @*, @(a or b)
Generate blocks | generate / endgenerate, genvar, generate-loop, generate-if, generate-case
Tasks and functions | task / endtask, function / endfunction with their body grammars
Comments and whitespace | // line and /* */ block comments via the verilog parser group
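
As a rough illustration of what the per-level precedence chain in the Expressions row encodes, the sketch below uses table-driven precedence climbing over a handful of binary operators. It is not how the rawast grammar expresses precedence (the grammar uses one rule per level). The names kPrec, Cursor, parse_expr, and parenthesize are hypothetical, the binding-power numbers are illustrative, and tokens are assumed to be whitespace-separated.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical binding powers for a subset of Verilog binary operators
// (higher binds tighter). Unary, ternary, and ** are left out of the sketch.
static const std::map<std::string, int> kPrec = {
    {"||", 1}, {"&&", 2}, {"|", 3},  {"^", 4},   {"&", 5},  {"==", 6},
    {"!=", 6}, {"<", 7},  {"<=", 7}, {"<<", 8},  {">>", 8}, {"+", 9},
    {"-", 9},  {"*", 10}, {"/", 10}, {"%", 10}};

struct Cursor {
    std::vector<std::string> toks;
    std::size_t pos = 0;
};

// Parses operand (op operand)* and returns a fully parenthesized string.
// Operators of equal precedence group to the left.
std::string parse_expr(Cursor& c, int min_prec) {
    std::string lhs = c.toks.at(c.pos++);  // primary: identifier or number
    while (c.pos < c.toks.size()) {
        auto it = kPrec.find(c.toks[c.pos]);
        if (it == kPrec.end() || it->second < min_prec) break;
        const std::string op = it->first;
        const int prec = it->second;
        ++c.pos;
        // prec + 1 keeps same-level operators out of the right operand,
        // which is what makes the result left-associative.
        std::string rhs = parse_expr(c, prec + 1);
        lhs = "(" + lhs + " " + op + " " + rhs + ")";
    }
    return lhs;
}

// Splits on whitespace, then parses from the loosest precedence level.
std::string parenthesize(const std::string& src) {
    Cursor c;
    std::istringstream in(src);
    for (std::string t; in >> t;) c.toks.push_back(t);
    return parse_expr(c, 1);
}
```

Under these assumptions, parenthesize("a + b * c") yields "(a + (b * c))" and parenthesize("a - b - c") yields "((a - b) - c)".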

Scope: out (for this issue)

Area | Why deferred
Preprocessor (`define, `include, `ifdef/`else/`endif) | Conceptually a separate language. Best handled as a separate preprocessing pass before parsing, which is what real SV tools do. Filed as a follow-up issue.
UDP declarations | Out of scope for synthesizable subset; primitives can be added if a use case appears.
Specify blocks | Path delays and timing checks; rare in synthesizable code.
defparam | Generally banned in synthesizable RTL anyway.
SystemVerilog features (classes, SVA, interfaces, modports, constraints, covergroups) | Each gets its own issue. SVA is the most natural first SV extension via subparse.
The full IEEE 1364-2001 LRM grammar | ~500 production rules; this issue covers the productive synthesizable subset, not the whole spec.

Pre-requisite: verilog parser group

A C++ parser group (src/parsers_verilog.cpp) with four terminals — same pattern as gdsii / lefdef / tcl:

class VerilogIdentifierParser : public Parser {
    // [a-zA-Z_][a-zA-Z_0-9$]* (simple) OR
    // \<chars-until-whitespace> (escaped) OR
    // $<simple-identifier>     (system task / function name)
};

class VerilogNumberParser : public Parser {
    // 8'hFF, 8'sb01, 'd42, 42, 42.5, 1.2e-3: every numeric form
    // Verilog 2001 accepts (time literals such as 42.5ns and 1step are
    // SystemVerilog additions). ~150 lines on its own; the gating piece.
};

class VerilogStringParser : public Parser {
    // "literal" with C-style escapes (\n, \t, \", \\, \ddd octal); \xHH is a
    // SystemVerilog addition, not part of Verilog 2001
};

class VerilogCommentParser : public Parser {
    // // to EOL  AND  /* ... */ block (block comments do not nest, per the LRM)
};

The number parser is the load-bearing piece — Verilog has 7+ numeric literal syntactic forms. Comparable scope to the LEF/DEF specialised identifier parser.
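
To give a feel for why the number terminal carries so much of the effort, here is a minimal standalone sketch that classifies three families of literal: based integers, plain decimals, and reals. It does not use the rawast Parser base class; VerilogNumber and parse_number are hypothetical names. Several LRM details are deliberately simplified: whitespace between size and base is not accepted, x/z digits are not accepted in decimal base, and an underscore directly after the base letter is not rejected.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type; not part of rawast.
struct VerilogNumber {
    std::optional<unsigned long> size;  // bit width; absent when unsized
    bool is_signed = false;             // 's' flag after the apostrophe
    char base = 'd';                    // b, o, d, or h (lower-cased)
    std::string digits;                 // integers: lower-cased, '_' removed;
                                        // reals: the literal text as written
    bool is_real = false;               // true for forms like 42.5 or 1e3
};

std::optional<VerilogNumber> parse_number(const std::string& text) {
    auto is_digit = [](char ch) {
        return std::isdigit(static_cast<unsigned char>(ch)) != 0;
    };
    VerilogNumber n;
    std::size_t i = 0;
    std::string lead;  // leading decimal digits: size, integer, or real prefix
    while (i < text.size() && (is_digit(text[i]) || (i > 0 && text[i] == '_'))) {
        if (text[i] != '_') lead += text[i];
        ++i;
    }
    if (i == text.size()) {  // plain decimal such as 42
        if (lead.empty()) return std::nullopt;
        n.digits = lead;
        return n;
    }
    if (text[i] == '\'') {  // based literal: [size]'[s]<base><digits>
        if (!lead.empty()) n.size = std::stoul(lead);
        ++i;
        if (i < text.size() && (text[i] == 's' || text[i] == 'S')) {
            n.is_signed = true;
            ++i;
        }
        if (i >= text.size()) return std::nullopt;
        n.base = static_cast<char>(
            std::tolower(static_cast<unsigned char>(text[i++])));
        std::string allowed;
        switch (n.base) {
            case 'b': allowed = "01xz?"; break;
            case 'o': allowed = "01234567xz?"; break;
            case 'd': allowed = "0123456789"; break;
            case 'h': allowed = "0123456789abcdefxz?"; break;
            default: return std::nullopt;
        }
        for (; i < text.size(); ++i) {
            const char ch = static_cast<char>(
                std::tolower(static_cast<unsigned char>(text[i])));
            if (ch == '_') continue;  // digit separator
            if (allowed.find(ch) == std::string::npos) return std::nullopt;
            n.digits += ch;
        }
        if (n.digits.empty()) return std::nullopt;
        return n;
    }
    // Real literal: digits '.' digits [exponent], or digits exponent.
    if (lead.empty()) return std::nullopt;
    const std::size_t start = i;
    auto digit_run = [&]() {
        const std::size_t begin = i;
        while (i < text.size() && is_digit(text[i])) ++i;
        return i > begin;
    };
    if (text[i] == '.') {
        ++i;
        if (!digit_run()) return std::nullopt;
    }
    if (i < text.size() && (text[i] == 'e' || text[i] == 'E')) {
        ++i;
        if (i < text.size() && (text[i] == '+' || text[i] == '-')) ++i;
        if (!digit_run()) return std::nullopt;
    }
    if (i != text.size() || i == start) return std::nullopt;
    n.is_real = true;
    n.digits = text;
    return n;
}
```

Even with those simplifications the function needs separate branches per literal family, which is consistent with the estimate that the real terminal is the gating piece.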

Implementation phasing

Phase | Scope | Effort
0. Parser group | src/parsers_verilog.cpp + unit tests for each terminal | 1 week
1. Module + port list + simple module items | Modules, both port-list styles, net/reg decls, continuous assigns. Smoke-test on simple netlists. | 1 week
2. Procedural blocks + statements | always / initial, all statement kinds, event controls. Test against PicoRV32 RTL. | 1 week
3. Expression chain | All 17 precedence levels. Decision: parens-required for mixed-op chains (Option C) vs left-associative fold step (post-parse). Document in grammar. | 0.5 week
4. Module instantiation + generate | Hierarchical instantiation, generate-loop / -if / -case. | 1 week
5. Corpus testing + bug fixes | OpenTitan, OpenCores, PicoRV32, RISC-V core RTL. Expect ~50 edge-case fixes. | 2 weeks
6. Type-vs-variable disambiguation | Design decision (see below) | 1 week

Total: ~7.5 weeks of focused work (the sum of the phase estimates above) for production-grade synthesizable V2001 coverage.
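
The left-associative fold step mentioned for Phase 3 could look roughly like the sketch below. It assumes the grammar hands back a flat chain for one precedence level (a first operand followed by operator/operand pairs) and folds it into a left-leaning binary tree after the parse. Node, make_leaf, fold_left, and show are hypothetical names, and the node shape only approximates an {op, args} IR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical {op, args}-style node; a leaf has an empty op.
struct Node {
    std::string op;
    std::string leaf;  // identifier or literal text, used when op is empty
    std::vector<std::unique_ptr<Node>> args;
};

std::unique_ptr<Node> make_leaf(std::string text) {
    auto n = std::make_unique<Node>();
    n->leaf = std::move(text);
    return n;
}

// Folds first (op operand)* into a left-leaning tree:
// a - b + c  becomes  {+, [{-, [a, b]}, c]}.
std::unique_ptr<Node> fold_left(
    const std::string& first,
    const std::vector<std::pair<std::string, std::string>>& rest) {
    auto acc = make_leaf(first);
    for (const auto& [op, operand] : rest) {
        auto parent = std::make_unique<Node>();
        parent->op = op;
        parent->args.push_back(std::move(acc));      // everything so far
        parent->args.push_back(make_leaf(operand));  // next operand
        acc = std::move(parent);
    }
    return acc;
}

// Renders the tree with explicit parentheses, for inspection.
std::string show(const Node& n) {
    if (n.op.empty()) return n.leaf;
    return "(" + show(*n.args[0]) + " " + n.op + " " + show(*n.args[1]) + ")";
}
```

The trade-off against the parens-required option is that the grammar stays simple but the AST shape immediately after parsing differs from the shape downstream tools eventually see.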

Design decision: type-vs-variable identifier disambiguation

The classic Verilog ambiguity: foo bar; could be a variable declaration (foo is a type, bar is the new variable) or a statement (foo is an expression, bar is something else). This specific form is illegal in V2001, but the general type-vs-variable distinction shows up in many places. Real SV parsers maintain a symbol table during the parse and look up identifiers as they go.

PEG can't do this naturally. Two viable paths in rawast:

Path A: Permissive parsing + post-AST disambiguation. Grammar accepts both interpretations as the same AST shape (e.g. {type: "ident_followed_by_ident", first: foo, second: bar}). A post-parse pass walks the AST with a symbol table and rewrites the ambiguous nodes into typed declarations vs. expressions.

  • Pros: pure grammar, no callbacks, easy to test
  • Cons: AST has a brief "ambiguous" phase that downstream tools have to understand
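
A minimal sketch of the Path A post-parse pass, assuming a flat list of items instead of a real rawast AST. Kind, Item, and disambiguate are hypothetical names, and the TypeDef kind stands in for whatever construct introduces a type name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical node kinds. Ambiguous is the "ident followed by ident"
// shape the permissive grammar would emit.
enum class Kind { TypeDef, Ambiguous, Declaration, Statement };

struct Item {
    Kind kind;
    std::string first;   // for Ambiguous: the leading identifier
    std::string second;  // for TypeDef: the newly introduced type name
};

// Walks the items in source order with a symbol table of type names and
// rewrites each Ambiguous item in place.
void disambiguate(std::vector<Item>& items) {
    std::set<std::string> types;
    for (Item& it : items) {
        if (it.kind == Kind::TypeDef) {
            types.insert(it.second);
        } else if (it.kind == Kind::Ambiguous) {
            it.kind = types.count(it.first) > 0 ? Kind::Declaration
                                                : Kind::Statement;
        }
    }
}
```

Because the pass is an ordinary function over data, it can be unit-tested without running the parser at all, which is the main attraction of Path A.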

Path B: Mid-parse callbacks register types as they're declared. rawast already supports rule-completion callbacks (used by LEF's DIVIDERCHAR / BUSBITCHARS mechanism). A typedef-rule callback would register the new type name; the identifier parser would then know to dispatch differently for known type names.

  • Pros: closer to how real SV parsers work; AST is immediately correct
  • Cons: stateful parse (the type-set varies during the parse); test isolation is harder; the callback mechanism gets a serious workout (currently only used for LEF's 2 directives)
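
For comparison, the Path B idea can be sketched as a parse context that fires callbacks when a rule completes. This is not rawast's actual callback API, which is not shown in this issue. ParseContext, fire, is_type, and make_context are hypothetical names, and the rule name "typedef" is only a placeholder.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical parse-time state shared between rules and terminals.
struct ParseContext {
    using Callback = std::function<void(ParseContext&, const std::string&)>;

    std::set<std::string> known_types;
    std::map<std::string, std::vector<Callback>> on_complete;

    // Would be invoked by the parser when `rule` finishes matching, with
    // the text it captured.
    void fire(const std::string& rule, const std::string& captured) {
        auto it = on_complete.find(rule);
        if (it == on_complete.end()) return;
        for (auto& cb : it->second) cb(*this, captured);
    }

    // What the identifier terminal would consult before choosing a branch.
    bool is_type(const std::string& ident) const {
        return known_types.count(ident) > 0;
    }
};

// Registers a callback that records each newly declared type name.
ParseContext make_context() {
    ParseContext ctx;
    ctx.on_complete["typedef"].push_back(
        [](ParseContext& c, const std::string& name) {
            c.known_types.insert(name);
        });
    return ctx;
}
```

The answer from is_type for a given identifier changes partway through the parse, which is exactly the statefulness listed under Cons: a test of any later rule has to replay the declarations that came before it.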

Recommendation: start with Path A. Simpler, deterministic, and the post-AST pass is just regular Python / C++ code. Migrate to Path B only if benchmarks show it matters.

Roadmap fit

  • M1 (current): typed Python developer surface. This issue extends the per-format grammar corpus, so it sits naturally in M1.
  • M2 (.jast container): once V2001 parses cleanly, real RTL designs can be shipped as .jast artifacts (pre-parsed AST + grammar) that are portable across tooling with no re-parse cost. Significant value for cloud-scale verification pipelines.

Out-of-scope follow-ups (file as separate issues if/when)

  • verilog_preprocessor: `define, `include, `ifdef/`else/`endif, conditional compilation. Separate language, separate grammar.
  • systemverilog.rawast — extends V2001 with classes, interfaces, packages, generics, etc. Probably 6-8 weeks on top of V2001.
  • sva.rawast — SVA assertions and properties; loaded via subparse from SV-2017 grammar's assert property (...) expressions. ~2 weeks.
  • Verilog formatter as a rawast.tools.verilog_format Python package, using the bidirectional save with pretty-print attributes on the grammar.

Related

  • The sketch (~280 lines covering top-level structure) — paste below or attach.
  • FastDRC constraint grammar discussion — established the patterns for direct {op, args} IR emission and operator-precedence handling that this grammar uses.
  • LEF/DEF grammar (grammars/lefdef.rawast) — comparable scope; useful precedent for the parser-group + corpus-test pattern.
  • Tcl grammar (grammars/tcl.rawast) — precedent for subparse + rule-local ignore, both of which will be useful for SV's embedded sub-languages later.

Labels: enhancement