This is a quick, informal and incomplete specification of the language.
The language simply defines the structure of a list. Here we use S-expressions to show how that list is parsed.
Consider the S-expression [f [a b] c d]. The following Guira expressions are all equivalent to it:
[f [a b] c d]

f [a b] c d

f [a b] c
  d

f [a b]
  c
  d

f
  [a b]
  c
  d

f
  a b
  c
  d

f [a b] \
  c d

f a:b c d

f a:b
  c
  d
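The indentation rule behind these equivalences can be sketched in a few lines of Python. This is only an illustration, not the Guira parser: the function names (`tokenize`, `read_terms`, `parse_block`, `parse`) are hypothetical, and the sketch assumes that a line holding a single term and no children stands for that term itself. It ignores colons, sugar, line continuations, comments and dotted ends.

```python
def tokenize(line):
    # Pad brackets with spaces so str.split() separates them from atoms.
    return line.replace("[", " [ ").replace("]", " ] ").split()


def read_terms(tokens):
    """Turn a flat token list into terms; '[' ... ']' becomes a nested list."""
    stack = [[]]
    for tok in tokens:
        if tok == "[":
            stack.append([])
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return stack[0]


def parse_block(lines, start, indent):
    """Parse consecutive lines at column `indent`; return (exprs, next_index)."""
    exprs = []
    i = start
    while i < len(lines) and lines[i][0] == indent:
        terms = read_terms(tokenize(lines[i][1]))
        i += 1
        # Lines indented strictly deeper form the child block of this line.
        if i < len(lines) and lines[i][0] > indent:
            children, i = parse_block(lines, i, lines[i][0])
            terms.extend(children)
        # Assumption: a lone term with no children is not wrapped in a list.
        exprs.append(terms[0] if len(terms) == 1 else terms)
    return exprs, i


def parse(source):
    """Parse indented source text into a list of nested Python lists."""
    lines = [
        (len(raw) - len(raw.lstrip(" ")), raw.strip())
        for raw in source.splitlines()
        if raw.strip()
    ]
    if not lines:
        return []
    exprs, end = parse_block(lines, 0, lines[0][0])
    if end != len(lines):
        raise ValueError("inconsistent indentation")
    return exprs
```

Under these assumptions, `parse("f [a b] c\n  d")` and `parse("f\n  a b\n  c\n  d")` both produce the same nested list as `parse("[f [a b] c d]")`.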
Furthermore, the S-expression [[a b] [c d] [e f]] can be written in Guira in any of these ways:
[[a b] [c d] [e f]]
[a:b c:d e:f]
a:b c:d e:f
We can also go further and write [[[a b] [c d]] [[e f] [g h]]]
as:
[[[a b] [c d]] [[e f] [g h]]]
[[a b] [c d]] [[e f] [g h]]
[a b]:[c d] [e f]:[g h]
The expression [head [head list]] can be rewritten as head:head:list.
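Judging from the examples above, the colon groups to the right: a:b is [a b] and head:head:list is [head [head list]]. A minimal Python sketch of that folding, with the hypothetical helper name `fold_colons` and Python lists standing in for Guira lists:

```python
def fold_colons(terms):
    """Right-fold a colon chain: the terms of a:b:c become [a, [b, c]]."""
    if len(terms) == 1:
        return terms[0]
    return [terms[0], fold_colons(terms[1:])]
```

For instance, `fold_colons(["head", "head", "list"])` yields `["head", ["head", "list"]]`, and `fold_colons([["a", "b"], ["c", "d"]])` yields `[["a", "b"], ["c", "d"]]`.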
Guira implements metaprogramming at runtime through the use of FEXPRs, which are called forms here.
A form is simply a function that receives its arguments unevaluated; that is, arguments are implicitly quoted. Here's an example:
let my-quote
  form [a . b]
    pair a b
This form behaves similarly to quote,
except that it always returns a list and can receive multiple arguments.
Another difference is that it only permits shallow unquoting,
which is done with form-unquote.
print
  my-quote 1 2 \
    form-unquote [+ 1 2]
    form-unquote [+ 2 3]
This prints [1 2 3 5]. The syntactic sugar for form-unquote
is ;, so the code above can be shortened to:
print
  my-quote 1 2 ;[+ 1 2] ;[+ 2 3]
By being shallow we mean that:
print
  my-quote 1 2
    + ;1 ;2
will print [1 2 [+ ;1 ;2]]. This is so that we can use form-unquote
inside nested forms. If it were not shallow, the following code
would not work:
let f
  function [x]
    my-quote 1 ;[+ 1 x]
This is because let would try to evaluate ;[+ 1 x] without knowing about x.
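The shallow behaviour can be modelled in Python as follows. This is a sketch under stated assumptions, not how Guira implements it: expressions are Python lists, identifiers are strings, and `evaluate`, `is_unquote`, `my_quote` and `ENV` are hypothetical names. Only arguments that are themselves a form-unquote call get evaluated; anything nested deeper is returned untouched.

```python
import operator

# Hypothetical environment holding just the one function the examples need.
ENV = {"+": operator.add}


def evaluate(expr, env):
    """Tiny evaluator: ints are literals, strings are names, lists are calls."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    fn = evaluate(expr[0], env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return fn(*args)


def is_unquote(arg):
    """True when arg is a top-level [form-unquote x] expression."""
    return isinstance(arg, list) and len(arg) == 2 and arg[0] == "form-unquote"


def my_quote(args, env):
    """Return the arguments as a list, evaluating only top-level unquotes."""
    return [evaluate(arg[1], env) if is_unquote(arg) else arg for arg in args]
```

With this model, `my_quote([1, 2, ["form-unquote", ["+", 1, 2]], ["form-unquote", ["+", 2, 3]]], ENV)` gives `[1, 2, 3, 5]`, while an unquote nested inside `["+", ...]` is left as it was written.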
If a form returns code, you can evaluate it with eval.
let a
  form [x]
    '+ 1 ,x
print [a 1] [eval [a 1]]
This prints [+ 1 1] 2. The excerpt [eval [a 1]] can be shortened to ![a 1].
There are two kinds of forms: intrinsic forms and user-defined forms. They differ in that intrinsic forms have control of the environment and cannot be defined by the user.
This allows all special forms of other Lisps (if, quote, lambda, etc.)
to be implemented as forms, which also means they are first class. It is
completely valid to do [help if] because if is just an identifier.
This allows the language to document itself.
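To illustrate why first-class forms make [help if] possible, here is a small Python model. It is an assumption-laden sketch rather than Guira's design: the `Form` class, `if_form`, `help_fn`, the `doc` attribute and the `true`/`false` bindings are all hypothetical. The point is only that `if` is an ordinary binding in the environment whose value receives raw arguments plus the environment.

```python
class Form:
    """A callable that receives unevaluated arguments and the environment."""

    def __init__(self, fn, doc):
        self.fn = fn
        self.doc = doc


def evaluate(expr, env):
    """Strings are names, lists are calls, everything else is a literal."""
    if isinstance(expr, str):
        return env[expr]
    if not isinstance(expr, list):
        return expr
    head = evaluate(expr[0], env)
    if isinstance(head, Form):
        # Forms get their arguments as raw syntax, not as values.
        return head.fn(env, *expr[1:])
    return head(*[evaluate(arg, env) for arg in expr[1:]])


def if_form(env, cond, then, otherwise):
    # Only the selected branch is ever evaluated.
    return evaluate(then if evaluate(cond, env) else otherwise, env)


def help_fn(value):
    # An ordinary function: it sees the value bound to the name it was given.
    return getattr(value, "doc", "no documentation")


ENV = {
    "if": Form(if_form, "if cond then else: evaluate one of two branches"),
    "help": help_fn,
    "true": True,
    "false": False,
}
```

In this model `evaluate(["help", "if"], ENV)` returns the documentation string attached to `if`, and `evaluate(["if", "true", 1, ["undefined-name"]], ENV)` returns `1` without touching the other branch.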
The syntax draws inspiration from S-expressions, T-expressions, I-expressions, O-expressions, M-expressions and Wisp.
The notation here is Wirth Syntax Notation, with extensions from the article "Indentation-Sensitive Parsing for Parsec" and with PCRE regular expressions.
These extensions are, briefly:
- the justification operator : that forces the production to be in the same indentation as the parent production;
- the indentation operator > that forces the production to be in an indentation strictly greater than the parent production;
- the indentation level of a production, which is defined to be the column position of the first token that is consumed (or produced) in that production;
- the production Whitespace that indicates tokens that serve only as separators and are otherwise ignored;
- the regular expressions, which are inside //.
Whitespace = '\r' | ' ' | Comment.
Comment = '#' {not_newline_char} '\n'.
Program = Block.
Block = {:I_Expr NL}.
I_Expr = sugar Pairs {Line_Continue} [End | NL >Block].
Line_Continue = '\\' NL Pairs.
Pairs = Pair {Pair}.
End = '.' Pair.
Pair = sugar Term {':' Term}.
Term = Atom | S_Expr.
S_Expr = '[' ML_Pairs ']'.
ML_Pairs = [NL] Pair {ML_Pair} [NL] [End [NL]].
ML_Pair = [NL] Pair.
NL = '\n' {'\n'}.
Atom = id | num | str.
sugar = {grain}.
grain = '!' | "'" | ',' | '@'.
str = /"[\u0000-\uFFFF]*"/.
id = ident_begin {ident_continue}.
ident_begin = /[a-zA-Z_<>\?=\-\+\*\/\%\$]/.
ident_continue = ident_begin | digit.
num = hex | bin | dec.
dec = [neg] integer [frac | float] [exp].
integer = digit {digit_}.
frac = '/' integer.
float = '.' integer.
exp = 'e' [neg] integer.
hex = '0x' hexdigits.
hexdigits = /[0-9A-Fa-f_]+/.
bin = '0b' bindigits.
bindigits = /[01_]+/.
neg = '~'.
digit = /[0-9]/.
digit_ = digit | '_'.

After the prototype is finished, I will rewrite everything in C99 for performance reasons. This will also allow me to use Emscripten to compile the interpreter to WASM and create a playground, which is much needed.
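As a quick way to experiment with the num production from the grammar above, it can be transcribed into a Python regular expression. This is a direct reading of the rules as written, not code from the interpreter; the names `INTEGER`, `DEC`, `HEX`, `BIN`, `NUM` and `is_num` are hypothetical.

```python
import re

# integer = digit {digit_}.
INTEGER = r"[0-9][0-9_]*"
# dec = [neg] integer [frac | float] [exp], with neg = '~'.
DEC = rf"~?{INTEGER}(?:/{INTEGER}|\.{INTEGER})?(?:e~?{INTEGER})?"
# hex = '0x' hexdigits.
HEX = r"0x[0-9A-Fa-f_]+"
# bin = '0b' bindigits.
BIN = r"0b[01_]+"
# num = hex | bin | dec.
NUM = re.compile(rf"(?:{HEX}|{BIN}|{DEC})")


def is_num(text):
    """Return True when the whole string matches the num production."""
    return NUM.fullmatch(text) is not None
```

Under this reading, `~1_000`, `1/3`, `3.14e~2`, `0xFF_FF` and `0b1010` are numbers, while `-5` is not, since `-` is an identifier character and negation is written with `~`.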