Skip to content

Replace TLAPM parser with SANY #213

@ahelwer

Description

@ahelwer

As discussed in tlaplus/rfcs#16, the TLAPM parser is to be replaced with SANY. This work has been approved for funding by the TLA⁺ Foundation. The reasons for this replacement are as follows:

  1. Reduced maintenance burden, since only one fully-featured parser will need to be maintained
  2. The TLAPM parser lacks many semantic checks and does not have a fully-functional level checker
  3. The TLAPM parser has many bugs that need to be fixed (although most are minor)

One remaining question is how to handle the interface between TLAPM and SANY, as TLAPM is an OCaml program and SANY is a Java program. Three possibilities exist:

  1. From TLAPM, call SANY's XML Exporter program to parse the TLA⁺ input in a separate process, then consume the output XML from OCaml & translate it to TLAPM's existing abstract syntax tree (AST) format; to reduce startup time, SANY will be run as a GraalVM Native Image so it does not need to load the JVM.
  2. Compile SANY to a GraalVM Native Image, exposing a C API so SANY can be called directly from C code; then, call this C API from OCaml code in TLAPM.
  3. Call SANY from TLAPM OCaml code using some other OCaml/Java interface method; there are a few (1, 2, 3) of these projects floating around but nothing that seems fully-featured & maintained, thus this option is not seriously considered.

Of these options, (1) has been explored to the greatest depth. A prototype parser of SANY's XML output was written back in November 2024. However, at that time @Calvin-L had not yet experimented with compiling SANY to a Graal Native Image, and I was much less familiar with GraalVM generally and its interoperability goals. I have spent some time recently experimenting with exposing a simple C API from Java in Graal.

While it would be nicer to call SANY directly instead of having to parse an intermediate XML AST representation, there does seem to be one substantial drawback: Java code seems limited in the datatypes that can be exposed via Graal Native Image C APIs. My understanding at this time (which might be incorrect) is that the C APIs must return a concretely-defined datatype, not an abstract datatype that can be dynamically casted to an inheriting class. SANY's parse tree node datatype is currently implemented as a fairly deep tree of inheritance; see this by opening the type hierarchy of the SemanticNode class in Eclipse. Thus I believe if we wanted to go this route we would have to write a C API layer inside SANY which somehow simplifies its AST format and makes it concrete. I also don't believe the C API could support variants but might be wrong.

In contrast, XML is very flexible in a way which matches OCaml's type system fairly well, with variants and so forth. We also already have a fairly functional XML parser for this format from past work. Thus while the GraalVM Native Image C API option is compelling and would have substantial benefits for other users who want to consume SANY from non-JVM languages (including web languages!) for expediency the XML option might be the best. What do people think?

Other than this question of how to interface with SANY, the greatest remaining work item is to translate the SANY AST format - however we get it - into TLAPM's existing AST format. We want to leave TLAPM's AST format unchanged as all existing transformations to backend theorem provers are defined over this format.

Metadata

Metadata

Assignees

No one assigned

    Labels

    semantic checkerIssues relating to TLAPM's resolution of semantic propertiessyntax parserIssues relating to TLAPM's syntax parser

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions