Guira

This is a quick, informal and incomplete specification of the language.

The List

This language simply defines the structure of a list. Here, we will use S-Expressions to show how this list is parsed.

Consider the S-Expression [f [a b] c d]. The following Guira expressions are all equivalent to it:

[f [a b] c d]

f [a b] c d

f [a b] c
  d

f [a b]
  c
  d

f
  [a b]
  c
  d

f
  a b
  c
  d

f [a b] \
  c d

f a:b c d

f a:b
  c
  d
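
The indentation rule behind these equivalences can be sketched in Python. This is only an illustration, not the Guira parser: it handles atoms, spaces and indentation, and ignores brackets, the colon, line continuation and sugar. The function name parse_indented is hypothetical, and the rule that a line holding a single atom with no children stands for the atom itself is inferred from the examples above.

```python
def parse_indented(source):
    """Parse atoms grouped by indentation into nested Python lists.

    Assumption: children of a line are indented consistently.
    """
    lines = [
        (len(line) - len(line.lstrip(" ")), line.split())
        for line in source.splitlines()
        if line.strip()
    ]
    pos = 0

    def block(indent):
        nonlocal pos
        items = []
        while pos < len(lines) and lines[pos][0] == indent:
            tokens = lines[pos][1]
            pos += 1
            children = []
            # Lines indented deeper than this one become extra list elements.
            if pos < len(lines) and lines[pos][0] > indent:
                children = block(lines[pos][0])
            expr = tokens + children
            # A lone atom is kept as an atom rather than a one-element list.
            items.append(expr[0] if len(expr) == 1 else expr)
        return items

    return block(lines[0][0]) if lines else []


print(parse_indented("f\n  a b\n  c\n  d"))
```

Under these assumptions, the input above yields one top-level expression with the same shape as [f [a b] c d].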

Furthermore, if we consider [[a b] [c d] [e f]], in Guira, we have:

[[a b] [c d] [e f]]

[a:b c:d e:f]

a:b c:d e:f

We can also go further and write [[[a b] [c d]] [[e f] [g h]]] as:

[[[a b] [c d]] [[e f] [g h]]]
[[a b] [c d]] [[e f] [g h]]
[a b]:[c d] [e f]:[g h]

The expression [head [head list]] can be rewritten head:head:list.
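
The colon examples above suggest that a chain of terms folds from right to left. A minimal Python sketch of that reading follows; the name fold_colon is hypothetical, and Python lists stand in for Guira lists.

```python
def fold_colon(terms):
    """Fold colon-separated terms right to left: a:b:c -> [a, [b, c]]."""
    if not terms:
        raise ValueError("a pair needs at least one term")
    result = terms[-1]
    for term in reversed(terms[:-1]):
        result = [term, result]
    return result


print(fold_colon(["head", "head", "list"]))
```

With a single term the term is returned unchanged, which matches the grammar, where a Pair may consist of just one Term.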

Forms

Guira implements metaprogramming at runtime through the use of FEXPRs, which are called forms here.

A form is simply a function that receives arguments without evaluation, that is: arguments are implicitly quoted. Here's an example:

let my-quote
  form [a . b] 
    pair a b

This form behaves similarly to quote, except that it always returns a list and can receive multiple arguments. Another difference is that it only permits shallow unquoting, through form-unquote.

print
  my-quote 1 2 \
    form-unquote [+ 1 2]
    form-unquote [+ 2 3]

This prints [1 2 3 5]. form-unquote has syntax sugar, written ;, so the above code can be shortened to:

print
  my-quote 1 2 ;[+ 1 2] ;[+ 2 3]

By being shallow we mean that:

print
  my-quote 1 2
    + ;1 ;2

This prints [1 2 [+ ;1 ;2]], leaving the nested unquotes untouched, so that form-unquote can be used inside nested forms. If unquoting were not shallow, the following code would not work:

let f
  function [x]
    my-quote 1 ;[+ 1 x]

This is because let would try to evaluate ;[+ 1 x] without knowing about x.
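
Shallow unquoting can be sketched in Python as follows. This is an illustration under stated assumptions, not Guira's implementation: an unquote is represented as the list ["form-unquote", expr], eval_arith is a hypothetical stand-in evaluator that only understands numbers and +, and shallow_quote is a hypothetical name for the behavior of my-quote.

```python
def eval_arith(expr):
    """Stand-in evaluator: numbers and ['+', a, b, ...] only."""
    if isinstance(expr, list) and expr and expr[0] == "+":
        return sum(eval_arith(arg) for arg in expr[1:])
    return expr


def shallow_quote(args):
    """Evaluate only the unquotes that appear directly among the arguments."""
    out = []
    for arg in args:
        if isinstance(arg, list) and arg and arg[0] == "form-unquote":
            out.append(eval_arith(arg[1]))
        else:
            out.append(arg)  # nested lists are returned untouched
    return out


print(shallow_quote([1, 2, ["form-unquote", ["+", 1, 2]], ["form-unquote", ["+", 2, 3]]]))
print(shallow_quote([1, 2, ["+", ["form-unquote", 1], ["form-unquote", 2]]]))
```

The second call leaves the inner unquotes in place, mirroring the [1 2 [+ ;1 ;2]] result above.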

If a form returns code, you can evaluate it with eval.

let a
  form [x]
    '+ 1 ,x
print [a 1] [eval [a 1]]

This prints [+ 1 1] 2. The expression [eval [a 1]] can be shortened to ![a 1].

There are two kinds of forms: intrinsic forms and user-defined forms. They differ in that intrinsic forms have control of the environment and cannot be defined by the user.

This allows all special forms of other Lisps (if, quote, lambda, etc.) to be implemented as forms, which also means they are first-class. It is completely valid to write [help if], because if is just an identifier. This allows the language to document itself.
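
To illustrate how special forms can be ordinary first-class values, here is a minimal evaluator sketch in Python. It is not Guira's interpreter: the names Form, evaluate and form_if are hypothetical, Python strings stand for identifiers, and Python lists stand for Guira lists.

```python
class Form:
    """Wraps a callable that receives raw arguments plus the environment."""

    def __init__(self, fn):
        self.fn = fn


def evaluate(expr, env):
    if isinstance(expr, str):       # identifier lookup
        return env[expr]
    if not isinstance(expr, list):  # literal, such as a number
        return expr
    head = evaluate(expr[0], env)
    args = expr[1:]
    if isinstance(head, Form):
        return head.fn(args, env)   # arguments are passed unevaluated
    return head(*[evaluate(arg, env) for arg in args])


def form_if(args, env):
    cond, then, otherwise = args
    # Only the chosen branch is ever evaluated.
    return evaluate(then if evaluate(cond, env) else otherwise, env)


env = {
    "if": Form(form_if),
    "quote": Form(lambda args, env: args[0]),
    "+": lambda a, b: a + b,
    "<": lambda a, b: a < b,
}

print(evaluate(["if", ["<", 1, 2], ["+", 1, 2], ["undefined"]], env))
print(evaluate(["quote", ["+", 1, 2]], env))
```

In this sketch if is looked up like any other identifier, so it could be passed to a function such as help. The untaken branch ["undefined"] is never evaluated, so it raises no error.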

Syntax

The syntax draws inspiration from S-expressions, T-expressions, I-expressions, O-expressions, M-expressions and Wisp.

The notation used here is Wirth Syntax Notation, extended with operators from the article "Indentation-Sensitive Parsing for Parsec" and with PCRE regular expressions.

These extensions are, briefly:

  • the justification operator :, which forces the production to be at the same indentation as the parent production;
  • the indentation operator >, which forces the production to be at an indentation strictly greater than that of the parent production;
  • the indentation level of a production, which is defined to be the column position of the first token that is consumed (or produced) in that production;
  • the production Whitespace, which indicates tokens that serve only as separators and are otherwise ignored;
  • regular expressions, which are written between slashes (//).

The grammar is as follows:

Whitespace = '\r' | ' ' | Comment.
Comment = '#' {not_newline_char} '\n'.

Program = Block.
Block = {:I_Expr NL}.

I_Expr = sugar Pairs {Line_Continue} [End | NL >Block].
Line_Continue = '\\' NL Pairs.
Pairs = Pair {Pair}.
End = '.' Pair.
Pair = sugar Term {':' Term}.
Term = Atom | S_Expr.
S_Expr = '[' ML_Pairs ']'.
ML_Pairs = [NL] Pair {ML_Pair} [NL] [End [NL]].
ML_Pair = [NL] Pair.

NL = '\n' {'\n'}.
Atom = id | num | str.

sugar = {grain}.
grain = '!' | "'" | ',' | '@'.
str = /"[\u0000-\uFFFF]*"/.

id = ident_begin {ident_continue}.
ident_begin = /[a-zA-Z_<>\?=\-\+\*\/\%\$]/.
ident_continue = ident_begin | digit.

num = hex | bin | dec.
dec = [neg] integer [frac | float] [exp].
integer = digit {digit_}.
frac = '/' integer.
float = '.' integer.
exp = 'e' [neg] integer.

hex = '0x' hexdigits.
hexdigits = /[0-9A-Fa-f_]+/.
bin = '0b' bindigits.
bindigits = /[01_]+/.

neg = '~'.
digit = /[0-9]/.
digit_ = digit | '_'.
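
As a quick check of the num productions, they can be transcribed into a Python regular expression. This is a sketch for experimentation, not the lexer used by Guira; the names NUM and is_num are hypothetical.

```python
import re

INTEGER = r"[0-9][0-9_]*"                                          # integer = digit {digit_}
DEC = rf"~?{INTEGER}(?:/{INTEGER}|\.{INTEGER})?(?:e~?{INTEGER})?"  # dec
HEX = r"0x[0-9A-Fa-f_]+"                                           # hex
BIN = r"0b[01_]+"                                                  # bin
NUM = re.compile(rf"(?:{HEX}|{BIN}|{DEC})")


def is_num(text):
    """Return True if the whole string matches the num production."""
    return NUM.fullmatch(text) is not None


for sample in ["~1_000", "1/2", "3.14e~2", "0xFF_FF", "0b1010", "-1", "1."]:
    print(sample, is_num(sample))
```

Note that negation is written with ~ rather than -, since - is a valid identifier character in this grammar.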

Future

After the prototype is finished, I will rewrite everything in C99 for performance reasons. This will also allow me to use Emscripten to compile the interpreter to WASM and create a playground, which is much needed.