Summary
Add a verilog_2001.rawast grammar covering the synthesizable subset of Verilog 2001 (IEEE 1364-2001) — modules, ports, declarations, continuous assignments, procedural blocks (always / initial), statements, instantiations, and the 17-level expression precedence chain.
This is the foundation for any future SystemVerilog work: SV-2017/2023 extends Verilog 2001 rather than replacing it, so a clean V2001 base unblocks SVA assertions (via subparse), classes, interfaces, packages, and constraints as incremental extensions.
A working sketch of the top-level rule structure lives at https://github.com/edacommons/rawast/discussions (paste from the design conversation) and demonstrates the patterns — module declaration, port-list Choice between ANSI and Verilog-95 styles, fan-out at MODULE_ITEM, the per-level expression precedence pattern, etc.
Why now
Three concrete use cases that exist today and want a bidirectional Verilog parser:
- Source rewriters / formatters — Verible, the open-source SV formatter, took years to build because it shipped its own emitter alongside its parser. rawast's bidirectional walk gives save-direction emit for free; a Verilog formatter on top is a thin layer.
- Lint / static-analysis tools — the AST is already the structured IR; tools written in Python (via the Python binding) can walk it without reimplementing parsing.
- EDA flow integration — netlist round-tripping for tool-chain wrappers; the bidirectional model means you can take a flat netlist, mutate the AST, and save it back without losing comments or whitespace structure.
The combination of "bidirectional" + "grammar as data" + "JSON-uniform AST" is unique to rawast among EDA-targeted parsers. Verilog is the canonical EDA input format; not having it is a real coverage gap.
Scope: in
| Area | Notes |
|---|---|
| Module structure | module … endmodule, parameter port list, port list (ANSI + Verilog-95 styles) |
| Module items | Net / reg declarations, parameters, continuous assign, always blocks, initial blocks, module instantiations |
| Statements | Blocking and non-blocking assignments, if/else, case/casex/casez, begin/end blocks, for, while, repeat, disable, wait, force/release, system task calls |
| Expressions | Full 17-level precedence chain (assignment → ternary → \|\| → && → \| → ^ → & → ==/!= → <=/< → shifts → +/- → * / % → ** → unary → primary) |
| Primaries | Numeric literals (every form), string literals, identifiers, bit/part selects, concatenation, function calls, system task calls ($display, etc.) |
| Event control | @(posedge clk or negedge rst_n), @*, @(a or b) |
| Generate blocks | generate / endgenerate, genvar, generate-loop, generate-if, generate-case |
| Tasks and functions | task / endtask, function / endfunction with their body grammars |
| Comments and whitespace | // line and /* */ block comments via the verilog parser group |
Scope: out (for this issue)
| Area | Why deferred |
|---|---|
| Preprocessor (`define, `include, `ifdef/`else/`endif) | Conceptually a separate language. Best handled as a separate preprocessing pass before parsing, which is what real SV tools do. Filed as a follow-up issue. |
| UDP declarations | Out of scope for the synthesizable subset; primitives can be added if a use case appears. |
| Specify blocks | Module path delays and timing checks; rare in synthesizable code. |
| defparam | Generally banned in synthesizable RTL anyway. |
| SystemVerilog features (classes, SVA, interfaces, modports, constraints, covergroups) | Each gets its own issue. SVA is the most natural first SV extension via subparse. |
| The full IEEE 1364-2001 LRM grammar | ~500 production rules; this issue covers the productive synthesizable subset, not the whole spec. |
Pre-requisite: verilog parser group
A C++ parser group (src/parsers_verilog.cpp) with four terminals — same pattern as gdsii / lefdef / tcl:
class VerilogIdentifierParser : public Parser {
// [a-zA-Z_][a-zA-Z_0-9$]* (simple) OR
// \<chars-until-whitespace> (escaped) OR
// $<simple-identifier> (system task / function name)
};
class VerilogNumberParser : public Parser {
// 8'hFF, 8'sb01, 'd42, 42, 42.5, 1.2e-3: every numeric form
// Verilog 2001 accepts. (42.5ns and 1step are SystemVerilog forms,
// not V2001.) ~150 lines on its own; the gating piece.
};
class VerilogStringParser : public Parser {
// "literal" with C-style escapes (\n, \t, \", \\, \ddd octal); \xHH is a SystemVerilog addition
};
class VerilogCommentParser : public Parser {
// // to EOL AND /* ... */ block (block comments do not nest, per the LRM)
};
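The identifier and comment rules described in the class comments can be sketched as regular expressions. This is an illustrative Python sketch only, not the proposed C++ terminals: the names `classify_identifier` and `strip_comments` are hypothetical, and the comment pattern deliberately ignores the case of comment markers inside string literals.

```python
import re

# Simple identifier: letter or underscore, then letters, digits, '_' or '$'.
SIMPLE_ID = r"[A-Za-z_][A-Za-z0-9_$]*"
# Escaped identifier: backslash, then characters up to the next whitespace.
ESCAPED_ID = r"\\[^\s]+"
# System task / function name: '$' followed by a simple identifier.
SYSTEM_ID = r"\$" + SIMPLE_ID

IDENT_RE = re.compile(
    rf"(?P<escaped>{ESCAPED_ID})|(?P<system>{SYSTEM_ID})|(?P<simple>{SIMPLE_ID})"
)

# Line comment to end of line, or a block comment that ends at the FIRST "*/".
# The non-greedy match models the fact that block comments do not nest.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def classify_identifier(text):
    """Return (kind, lexeme) for an identifier at the start of text, else None."""
    m = IDENT_RE.match(text)
    if m is None:
        return None
    return m.lastgroup, m.group(0)


def strip_comments(source):
    """Replace each comment with one space.

    Lossy on purpose: a round-tripping parser would keep the comment text.
    Limitation: comment markers inside string literals are not recognised.
    """
    return COMMENT_RE.sub(" ", source)


if __name__ == "__main__":
    print(classify_identifier("$display(x)"))
    print(strip_comments("a /* x /* y */ b */ c"))
```

The second call shows the non-nesting behaviour: the comment closes at the first `*/`, so `b */ c` is left in the source as ordinary text.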
The number parser is the load-bearing piece — Verilog has 7+ numeric literal syntactic forms. Comparable scope to the LEF/DEF specialised identifier parser.
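To give a feel for why the number terminal is the gating piece, here is a rough Python sketch of a classifier for the main Verilog 2001 literal shapes (based, real, plain decimal). It is an assumption-laden illustration rather than the planned C++ parser: `classify_number` and its result dictionary are hypothetical, and the sketch intentionally rejects unit-suffixed forms such as `42.5ns`.

```python
import re

# Based literal: optional size, apostrophe, optional signed marker, base, digits.
_BASED = re.compile(
    r"""
    (?P<size>[1-9][0-9_]*)? \s*          # optional bit width, e.g. the 8 in 8'hFF
    ' (?P<signed>[sS])?                  # apostrophe, optional signed marker
    (?:
        (?P<bin>[bB]) \s* [01xXzZ?][01xXzZ?_]*
      | (?P<oct>[oO]) \s* [0-7xXzZ?][0-7xXzZ?_]*
      | (?P<hex>[hH]) \s* [0-9a-fA-FxXzZ?][0-9a-fA-FxXzZ?_]*
      | (?P<dec>[dD]) \s* (?:[0-9][0-9_]* | [xXzZ?]_*)
    )
    """,
    re.VERBOSE,
)

# Real literal: digits with a fraction and optional exponent, or digits with an exponent.
_REAL = re.compile(
    r"[0-9][0-9_]*(?:\.[0-9][0-9_]*(?:[eE][+-]?[0-9][0-9_]*)?|[eE][+-]?[0-9][0-9_]*)"
)

# Plain unsized decimal literal.
_DECIMAL = re.compile(r"[0-9][0-9_]*")


def classify_number(text):
    """Classify a complete literal; return a small dict, or None if not a number."""
    m = _BASED.fullmatch(text)
    if m:
        base = next(k for k in ("bin", "oct", "hex", "dec") if m.group(k))
        size = m.group("size")
        return {
            "kind": "based",
            "base": base,
            "size": int(size.replace("_", "")) if size else None,
            "signed": m.group("signed") is not None,
        }
    if _REAL.fullmatch(text):
        return {"kind": "real"}
    if _DECIMAL.fullmatch(text):
        return {"kind": "decimal"}
    return None


if __name__ == "__main__":
    for lit in ["8'hFF", "8'sb01", "'d42", "42", "42.5", "1.2e-3", "42.5ns"]:
        print(lit, classify_number(lit))
```

Even this reduced version needs three patterns and per-base digit sets, which is consistent with the ~150-line estimate for the real terminal.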
Implementation phasing
| Phase | Scope | Effort |
|---|---|---|
| 0. Parser group | src/parsers_verilog.cpp + unit tests for each terminal | 1 week |
| 1. Module + port list + simple module items | Modules, both port-list styles, net/reg decls, continuous assigns. Smoke-test on simple netlists. | 1 week |
| 2. Procedural blocks + statements | always / initial, all statement kinds, event controls. Test against PicoRV32 RTL. | 1 week |
| 3. Expression chain | All 17 precedence levels. Decision: parens-required for mixed-op chains (Option C) vs left-associative fold step (post-parse). Document in grammar. | 0.5 week |
| 4. Module instantiation + generate | Hierarchical instantiation, generate-loop / -if / -case. | 1 week |
| 5. Corpus testing + bug fixes | OpenTitan, OpenCores, PicoRV32, RISC-V core RTL. Expect ~50 edge-case fixes. | 2 weeks |
| 6. Type-vs-variable disambiguation | Design decision (see below) | 1 week |
Total: ~7.5 weeks of focused work if the phases run back to back (1 + 1 + 1 + 0.5 + 1 + 2 + 1), for production-grade synthesizable V2001 coverage.
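Phase 3 mentions a post-parse left-associative fold as one option for the expression chain. A minimal sketch of such a fold follows. It assumes the grammar hands back each precedence level as a flat list of alternating operands and operators and that the target IR is the `{op, args}` shape used elsewhere in rawast grammars. The function name `fold_left` and the flat-list input shape are hypothetical.

```python
def fold_left(chain):
    """Fold [operand, op, operand, op, operand, ...] into left-nested {op, args}.

    A single-operand chain is returned unchanged, so a precedence level that
    matched no operator adds no wrapper node.
    """
    if len(chain) % 2 == 0:
        raise ValueError("chain must alternate operand, op, operand, ...")
    node = chain[0]
    for i in range(1, len(chain), 2):
        # Each step makes the tree built so far the LEFT argument.
        node = {"op": chain[i], "args": [node, chain[i + 1]]}
    return node


if __name__ == "__main__":
    # a - b - c must group as (a - b) - c
    print(fold_left(["a", "-", "b", "-", "c"]))
```

Because Verilog's binary operators associate left to right, one fold of this shape could in principle serve every binary level; the right-associative conditional operator would need separate handling.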
Design decision: type-vs-variable identifier disambiguation
The classic Verilog ambiguity: foo bar; could be a variable declaration (foo is a type, bar is the new variable) or a statement (foo is an expression, bar is something else). This specific form is illegal in V2001, but the general type-vs-variable distinction shows up in many places. Real SV parsers maintain a symbol table during the parse and look identifiers up in it.
PEG can't do this naturally. Two viable paths in rawast:
Path A: Permissive parsing + post-AST disambiguation. Grammar accepts both interpretations as the same AST shape (e.g. {type: "ident_followed_by_ident", first: foo, second: bar}). A post-parse pass walks the AST with a symbol table and rewrites the ambiguous nodes into typed declarations vs. expressions.
- Pros: pure grammar, no callbacks, easy to test
- Cons: AST has a brief "ambiguous" phase that downstream tools have to understand
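A rough sketch of what the Path A post-parse pass could look like in Python, operating on a plain JSON-style AST. The ambiguous node shape is taken from the example above. The output node types (`declaration`, `unresolved`), the `known_types` set, and the function name are hypothetical and stand in for whatever the real symbol table provides.

```python
AMBIGUOUS = "ident_followed_by_ident"


def disambiguate(node, known_types):
    """Return a copy of the AST with ambiguous nodes rewritten.

    known_types is a set of identifiers known to name types. Nodes whose
    first identifier is a known type become declarations; the rest are
    marked unresolved so a later pass (or an error report) can handle them.
    """
    if isinstance(node, list):
        return [disambiguate(child, known_types) for child in node]
    if not isinstance(node, dict):
        return node  # leaf: string, number, None
    if node.get("type") == AMBIGUOUS:
        if node["first"] in known_types:
            return {
                "type": "declaration",
                "decl_type": node["first"],
                "name": node["second"],
            }
        return {"type": "unresolved", "first": node["first"], "second": node["second"]}
    # Ordinary node: rebuild it, recursing into every field.
    return {key: disambiguate(value, known_types) for key, value in node.items()}


if __name__ == "__main__":
    ast = {
        "type": "module",
        "items": [
            {"type": AMBIGUOUS, "first": "word_t", "second": "w"},
            {"type": AMBIGUOUS, "first": "foo", "second": "bar"},
        ],
    }
    print(disambiguate(ast, {"word_t"}))
```

The pass returns a new tree and leaves its input untouched, which keeps the "ambiguous" phase available for tools that want to inspect it and makes the pass easy to unit-test in isolation.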
Path B: Mid-parse callbacks register types as they're declared. rawast already supports rule-completion callbacks (used by LEF's DIVIDERCHAR / BUSBITCHARS mechanism). A typedef-rule callback would register the new type name; the identifier parser would then know to dispatch differently for known type names.
- Pros: closer to how real SV parsers work; AST is immediately correct
- Cons: stateful parse (the type-set varies during the parse); test isolation is harder; the callback mechanism gets a serious workout (currently only used for LEF's 2 directives)
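The statefulness that Path B introduces can be illustrated with a toy model. This is not rawast's callback API, whose signature is not shown here. `TypeRegistry`, `on_typedef_complete`, and the event tuples are hypothetical stand-ins for a rule-completion callback and the identifier lookups that follow it.

```python
class TypeRegistry:
    """Toy model of mid-parse type registration."""

    def __init__(self):
        self._types = set()

    def on_typedef_complete(self, name):
        # Stand-in for a rule-completion callback fired when a typedef rule matches.
        self._types.add(name)

    def classify(self, ident):
        # Stand-in for the identifier terminal consulting the registry.
        return "type_name" if ident in self._types else "identifier"


def classify_in_order(events):
    """Replay ("typedef", name) / ("ident", name) events in source order.

    Returns the classification of each "ident" event, showing that the answer
    depends on what has been declared earlier in the same parse.
    """
    registry = TypeRegistry()
    results = []
    for kind, name in events:
        if kind == "typedef":
            registry.on_typedef_complete(name)
        else:
            results.append(registry.classify(name))
    return results


if __name__ == "__main__":
    print(classify_in_order([("ident", "word_t"), ("typedef", "word_t"), ("ident", "word_t")]))
```

The same identifier is classified differently before and after the declaration, which is the test-isolation concern listed under the cons: every test has to control the registry state, not just the input text.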
Recommendation: start with Path A. Simpler, deterministic, and the post-AST pass is just regular Python / C++ code. Migrate to Path B only if benchmarks show it matters.
Roadmap fit
- M1 (current): typed Python developer surface; this issue extends the per-format grammar corpus and sits naturally in M1.
- M2 (.jast container): once V2001 parses cleanly, real RTL designs can be shipped as .jast artifacts (pre-parsed AST + grammar, portable across tooling, no re-parse cost). Significant value for cloud-scale verification pipelines.
Out-of-scope follow-ups (file as separate issues if/when)
- verilog_preprocessor: `define, `include, `ifdef/`else/`endif, conditional compilation. Separate language, separate grammar.
- systemverilog.rawast: extends V2001 with classes, interfaces, packages, generics, etc. Probably 6-8 weeks on top of V2001.
- sva.rawast: SVA assertions and properties; loaded via subparse from the SV-2017 grammar's assert property (...) expressions. ~2 weeks.
- Verilog formatter as a rawast.tools.verilog_format Python package, using the bidirectional save with pretty-print attributes on the grammar.
Related
- The sketch (~280 lines covering top-level structure): paste below or attach.
- FastDRC constraint grammar discussion: established the patterns for direct {op, args} IR emission and operator-precedence handling that this grammar uses.
- LEF/DEF grammar (grammars/lefdef.rawast): comparable scope; useful precedent for the parser-group + corpus-test pattern.
- Tcl grammar (grammars/tcl.rawast): precedent for subparse + rule-local ignore, both of which will be useful for SV's embedded sub-languages later.