thomasl/erlang-lex
Simple table-driven lexer in Erlang.

Works by building an NFA from the rules, then converting the NFA
into a DFA. If multiple rules can match, a priority can be specified
to choose between them. Ambiguous matches yield compile-time errors.

Note that building the lexing table dynamically is normally fast
enough for most uses. Use 'lex:with(LexSpec, String)'.
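
The NFA-to-DFA step described above can be illustrated with a small,
self-contained subset construction. This is only a sketch and not the
code in this repository: the module name, the {From, Label, To}
transition format, the accepting-state list, and the convention that a
lower priority number wins are all assumptions made for the example.

```erlang
-module(subset_sketch).
-export([closure/2, move/3, build/3, accept/2, demo/0]).

%% Hypothetical NFA format: a list of {From, Label, To} transitions,
%% where Label is a character or the atom eps.

%% Epsilon closure of a list of NFA states, returned as an ordset.
closure(Trans, States) ->
    closure(Trans, States, ordsets:new()).

closure(_Trans, [], Seen) ->
    Seen;
closure(Trans, [S | Rest], Seen) ->
    case ordsets:is_element(S, Seen) of
        true ->
            closure(Trans, Rest, Seen);
        false ->
            Next = [To || {From, eps, To} <- Trans, From =:= S],
            closure(Trans, Next ++ Rest, ordsets:add_element(S, Seen))
    end.

%% NFA states reachable from Set on character C, epsilon-closed.
move(Trans, Set, C) ->
    closure(Trans, [To || {From, L, To} <- Trans,
                          L =:= C, ordsets:is_element(From, Set)]).

%% Subset construction. Each DFA state is a set of NFA states.
%% Returns {StartState, Table}, Table being a map {State, Char} => State.
build(Trans, Start, Alphabet) ->
    S0 = closure(Trans, [Start]),
    {S0, build(Trans, Alphabet, [S0], [S0], #{})}.

build(_Trans, _Alphabet, [], _Seen, Table) ->
    Table;
build(Trans, Alphabet, [Set | Work], Seen, Table) ->
    Edges = [{C, T} || C <- Alphabet, T <- [move(Trans, Set, C)], T =/= []],
    New = lists:usort([T || {_C, T} <- Edges, not lists:member(T, Seen)]),
    Table1 = lists:foldl(fun({C, T}, Acc) -> Acc#{ {Set, C} => T } end,
                         Table, Edges),
    build(Trans, Alphabet, New ++ Work, New ++ Seen, Table1).

%% Rule accepted by a DFA state. Accepting is a list of
%% {NfaState, {Priority, Rule}}. Assumption: the lowest priority number
%% wins, and two different rules with equal priority are ambiguous.
accept(Accepting, Set) ->
    case lists:usort([PR || {S, PR} <- Accepting,
                            ordsets:is_element(S, Set)]) of
        [] -> none;
        [{P, R1}, {P, R2} | _] when R1 =/= R2 -> {error, {ambiguous, R1, R2}};
        [{_P, Rule} | _] -> {ok, Rule}
    end.

%% Two rules over the alphabet "if": the keyword "if" (priority 1)
%% and an identifier made of one or more of those letters (priority 2).
demo() ->
    Trans = [{0, eps, 1}, {0, eps, 4},
             {1, $i, 2}, {2, $f, 3},
             {4, $i, 5}, {4, $f, 5}, {5, $i, 5}, {5, $f, 5}],
    Accepting = [{3, {1, keyword}}, {5, {2, ident}}],
    {S0, Table} = build(Trans, 0, "if"),
    S1 = maps:get({S0, $i}, Table),
    S2 = maps:get({S1, $f}, Table),
    {accept(Accepting, S1), accept(Accepting, S2)}.
```

Calling subset_sketch:demo() returns {{ok,ident},{ok,keyword}}: after
"i" only the identifier rule accepts, while after "if" both rules
accept and the priority decides in favour of the keyword.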

TODO:
- refactor [lex gen; lex driver; lex codegen; tokenization library]
- improve regexp syntax, including character classes
  * currently need lots of escaping, nasty
- improved error checking
  * better messages on overlapping matches
- improved matching
  * current default is to fail on no match, but tools like lex/flex instead
    emit/skip the current char as a default and continue
- support for lexing on binaries
  * currently experimental support
- support for incremental lexing
  (basically return 'current state' and allow resume later)
- only supports ASCII, in particular no UTF-8 support
  * support UTF-8 literals
  * support matching single UTF-8 character (variable byte length)
  * support full UTF-8 regexps with character classes, etc
    (probably also needs codegen)
- code generation
  * emit a table as a constant data structure, and the lex driver with it,
    and a suitable API
  * maybe: emit leex-style full code for the lexer using the table
    - worth checking which is faster and smaller
  * note: for UTF-8, the table will usually be very sparse and leex-style
    explicit code may be required
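
The lex/flex-style default mentioned under "improved matching" could
look roughly like the driver below. It is a sketch only: the module
name, the {Start, Table, Accept} representation of the DFA, and the
{char, C} fallback token are hypothetical and do not describe this
library's actual table format or API.

```erlang
-module(driver_sketch).
-export([tokens/2, demo/0]).

%% Hypothetical DFA format: {Start, Table, Accept}, where Table maps
%% {State, Char} to the next state and Accept maps accepting states
%% to a rule name.
tokens(_Dfa, []) ->
    [];
tokens({Start, _, _} = Dfa, [C | Rest] = Input) ->
    case longest(Dfa, Start, Input, [], none) of
        {Rule, Lexeme, Rest1} ->
            [{Rule, Lexeme} | tokens(Dfa, Rest1)];
        none ->
            %% No rule matches here: emit the character and continue,
            %% instead of failing the whole scan.
            [{char, C} | tokens(Dfa, Rest)]
    end.

%% Follow transitions as far as possible, remembering the last point at
%% which an accepting state was reached (longest match). Empty matches
%% are never recorded, so every token consumes at least one character.
longest({_, Table, Accept} = Dfa, State, Input, Acc, Best) ->
    Best1 = case maps:find(State, Accept) of
                {ok, Rule} when Acc =/= [] ->
                    {Rule, lists:reverse(Acc), Input};
                _ ->
                    Best
            end,
    case Input of
        [C | Rest] ->
            case maps:find({State, C}, Table) of
                {ok, Next} -> longest(Dfa, Next, Rest, [C | Acc], Best1);
                error -> Best1
            end;
        [] ->
            Best1
    end.

%% A DFA that recognises only runs of decimal digits.
demo() ->
    Table = maps:from_list([{{S, C}, 1} || S <- [0, 1], C <- "0123456789"]),
    Accept = #{1 => int},
    tokens({0, Table, Accept}, "12+3").
```

Here driver_sketch:demo() returns [{int,"12"},{char,43},{int,"3"}]:
the "+" (character code 43) matches no rule, so it is passed through
as a single-character token and lexing resumes at the next character.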