A Tree-sitter grammar for the Multilingual Programming Language.
This repository gives you:
- A Tree-sitter parser for .multi source files
- Query files for highlighting, indentation, and folding
- Node and Rust bindings for embedding the parser in applications
- Generator scripts for TextMate and Monaco grammar outputs
The Multilingual Programming Language is an experimental language where the same semantic constructs can be written with keywords from multiple human languages.
Examples:
- English: if, def, class
- French: si, def, classe
- German: wenn, def, klasse
- Japanese: supported through the shared keyword registry
- Arabic: supported through the shared keyword registry
The full keyword registry currently covers 17 languages, including Spanish, Portuguese, Chinese, Italian, Dutch, Polish, Swedish, Danish, Finnish, Hindi, Bengali, and Tamil.
Keywords can be mixed freely in the same program. The language is indentation-sensitive and broadly Python-like in structure.
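To make the keyword-mixing idea concrete, here is a minimal Python sketch of how surface keywords from different human languages could resolve to one canonical construct. The `ALIASES` table and the `canonical_construct` helper are hypothetical and built only from the examples above. The real registry lives in data/keywords.yaml and its schema may differ.

```python
# Hypothetical subset of the keyword registry, built from the examples above.
# The real data lives in data/keywords.yaml and may be structured differently.
ALIASES = {
    "if": {"en": "if", "fr": "si", "de": "wenn"},
    "def": {"en": "def", "fr": "def", "de": "def"},
    "class": {"en": "class", "fr": "classe", "de": "klasse"},
}

# Reverse index: surface keyword -> canonical construct name.
CANONICAL = {
    surface: construct
    for construct, by_lang in ALIASES.items()
    for surface in by_lang.values()
}


def canonical_construct(word):
    """Return the canonical construct for a surface keyword, or None."""
    return CANONICAL.get(word)


# Keywords from different languages resolve to the same construct,
# which is why they can be mixed within one program.
mixed = ["si", "wenn", "if", "classe", "greet"]
resolved = [canonical_construct(w) for w in mixed]
print(resolved)  # ['if', 'if', 'if', 'class', None]
```

Ordinary identifiers such as `greet` are not in the table and resolve to `None`, so only registered keywords are treated specially.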
This repo is useful when you want to:
- Parse Multilingual source code into syntax trees
- Add syntax highlighting to an editor or viewer
- Support indentation and folding in Tree-sitter-aware editors
- Generate editor grammar artifacts from a single keyword registry
After building, the main outputs are:
- src/parser.c and src/scanner.c for the Tree-sitter parser
- bindings/node/ for Node.js consumers
- bindings/rust/ for Rust consumers
- queries/highlights.scm, queries/indents.scm, queries/folds.scm
- generated/multilingual.tmLanguage.json for TextMate-compatible editors
- generated/monarch.json for Monaco-based editors
This repository is a good upstream grammar source for future github-linguist support, but it is not the github-linguist integration itself.
For language detection and syntax-highlighting discussions, this repository treats:
- .multi as the canonical file extension
- source.multi as the TextMate scope
The repository no longer advertises .ml, because that extension is already heavily associated with other languages and would create avoidable detection conflicts.
- Node.js 14+ for the Tree-sitter CLI and Node binding build
- Python 3.10+ for the generator scripts
- PyYAML for the build scripts: pip install pyyaml
- A C compiler such as GCC, Clang, or MSVC
git clone https://github.com/multilingualprogramming/tree-sitter-multilingual.git
cd tree-sitter-multilingual
npm install
make all
If you only want the parser and tests:
npm install
make build
make test
The available make targets are:
make inject # Expand multilingual keyword aliases into grammar.js
make generate # Run tree-sitter generate
make build # Build the native Node binding
make test # Run Tree-sitter corpus tests
make tmgrammar # Generate TextMate grammar JSON
make monarch # Generate Monaco tokenizer JSON
make validate # Validate keyword coverage
make all # Run the full pipeline
The repository is laid out as follows:
- data/keywords.yaml: Canonical keyword registry across all supported languages
- grammar.js: Tree-sitter grammar source
- src/scanner.c: External scanner for indentation handling
- queries/: Highlight, indentation, and folding queries
- scripts/: Build and generation scripts
- test/corpus/: Grammar test corpus
- examples/: Sample .multi programs
- bindings/node/: Node.js binding entrypoint
- bindings/rust/: Rust crate wrapper
The build is driven from data/keywords.yaml.
- scripts/inject_aliases.py replaces // build:inject <construct> markers in grammar.js with generated choice(...) expressions.
- tree-sitter generate compiles the grammar into C sources.
- node-gyp rebuild builds the native Node binding.
- scripts/build_tmgrammar.py generates generated/multilingual.tmLanguage.json.
- scripts/build_monarch.py generates generated/monarch.json.
- scripts/validate_coverage.py checks that keyword coverage is complete and consistent.
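The alias-injection step described above can be sketched roughly as follows. This is not the code in scripts/inject_aliases.py. The exact marker shape, the `inject_aliases` function, and the formatting of the emitted `choice(...)` expression are assumptions made for illustration.

```python
import re

# Assumed marker shape: "// build:inject <construct>", where <construct>
# is a single word such as "if". The real script may accept other forms.
MARKER = re.compile(r"//\s*build:inject\s+(\w+)")


def inject_aliases(grammar_source, aliases):
    """Replace each marker with a choice(...) expression over its aliases."""

    def expand(match):
        construct = match.group(1)
        words = aliases.get(construct)
        if not words:
            # Leave unknown markers untouched so the gap stays visible.
            return match.group(0)
        quoted = ", ".join(f"'{w}'" for w in sorted(set(words)))
        return f"choice({quoted})"

    return MARKER.sub(expand, grammar_source)


# Hypothetical grammar fragment and alias table.
template = "if_keyword: $ => // build:inject if\n"
result = inject_aliases(template, {"if": ["if", "si", "wenn"]})
print(result)  # if_keyword: $ => choice('if', 'si', 'wenn')
```

Keeping unknown markers in place, rather than deleting them, means a missing registry entry shows up as a leftover comment instead of silently producing an empty rule.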
The files in generated/ are committed intentionally because they are directly useful to downstream editor and highlighting integrations.
If you change data/keywords.yaml, grammar.js, or generator scripts, regenerate and review the committed outputs before opening a PR:
make tmgrammar
make monarch
make validate
At a minimum, check that these files are updated together when relevant:
- generated/multilingual.tmLanguage.json
- generated/monarch.json
- README.md if user-facing behavior or supported extension guidance changed
After building the parser, you can load it through the bundled Node binding:
const Parser = require("tree-sitter");
const Multilingual = require("./bindings/node");
const parser = new Parser();
parser.setLanguage(Multilingual);
const source = `
def greet(name):
    return f"Hello, {name}"
`;
const tree = parser.parse(source);
console.log(tree.rootNode.toString());
Use this approach when you are building:
- A CLI formatter or linter
- A code analysis tool
- A desktop app or Electron app
- A custom language service prototype
The Rust binding exposes a language() function for tree-sitter:
let source = r#"
def greet(name):
    return f"Hello, {name}"
"#;
let mut parser = tree_sitter::Parser::new();
parser
    .set_language(tree_sitter_multilingual::language())
    .expect("failed to load multilingual grammar");
let tree = parser.parse(source, None).expect("failed to parse source");
println!("{}", tree.root_node().to_sexp());
Use this when you want to embed the grammar in:
- A Rust CLI
- A language server
- A static analysis tool
- A backend service that parses code snippets
This repository already includes the standard query files used by many Tree-sitter integrations:
- queries/highlights.scm
- queries/indents.scm
- queries/folds.scm
These are the files editors typically use for:
- Syntax highlighting
- Auto-indentation
- Code folding
For nvim-treesitter, the key pieces you need are:
- The generated parser
- The queries/ directory
- The language registration metadata
Example setup:
require("nvim-treesitter.configs").setup {
  highlight = { enable = true },
}
To fully integrate this language in Neovim, you would typically:
- Register the parser with nvim-treesitter
- Point it at this repository
- Install the queries/*.scm files alongside the parser
Run:
make tmgrammar
This generates:
generated/multilingual.tmLanguage.json
Use that file inside a VS Code extension, or any editor/tooling stack that consumes TextMate grammars.
This is the right path for:
- VS Code extensions
- Syntax highlighting in Shiki-compatible pipelines
- Any tool that relies on TextMate scopes rather than Tree-sitter directly
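As a rough illustration of what a TextMate keyword rule derived from the registry might look like, the Python sketch below builds one match rule from a keyword list. It is not the logic of scripts/build_tmgrammar.py. The `keyword_pattern` helper and the `keyword.control.multi` scope name are assumptions, and only `source.multi` comes from this README. TextMate grammars are evaluated with Oniguruma regular expressions, so Python's `re` module is used here only to sanity-check the pattern.

```python
import json
import re


def keyword_pattern(scope, words):
    """Build one TextMate-style match rule for a set of keywords."""
    # Longest first so a longer keyword wins over a shared prefix.
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    alternation = "|".join(re.escape(w) for w in ordered)
    return {"name": scope, "match": rf"\b(?:{alternation})\b"}


grammar = {
    "scopeName": "source.multi",
    "patterns": [
        # Hypothetical scope name, chosen for illustration only.
        keyword_pattern("keyword.control.multi", ["if", "si", "wenn"]),
    ],
}
print(json.dumps(grammar, indent=2))

# Word boundaries keep identifiers such as "iffy" from matching.
rule = re.compile(grammar["patterns"][0]["match"])
matches = rule.findall("si x: wenn y: iffy")
print(matches)  # ['si', 'wenn']
```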
Run:
make monarch
This generates:
generated/monarch.json
Use that file when registering a Monaco tokenizer in:
- Monaco Editor
- Browser IDEs
- Web playgrounds
- Electron apps using Monaco
If your application does not need incremental parsing, you can still use the generated grammar artifacts:
- Use generated/multilingual.tmLanguage.json for TextMate-compatible highlighters
- Use generated/monarch.json for Monaco-based editors
That is often the simplest option for:
- Documentation sites
- Playground pages
- Code preview components
- Read-only syntax highlighting
See these repository examples:
- examples/english.multi for a straightforward English-oriented sample
- examples/french.multi for a French-surface sample built around .multi
A small example:
def greet(name = "World"):
    return f"Hello, {name}!"

if True:
    print(greet("Alice"))
Current grammar coverage includes:
- Functions and classes
- Assignments and expressions
- if / elif / else
- for and while
- Imports
- Arithmetic, comparison, logical, and bitwise operators
- Strings, f-strings, numbers, lists, dicts, tuples, and sets
- Line comments using #
data/keywords.yaml is the source of truth for keyword aliases.
It defines multilingual surface forms for constructs such as:
- if, else, elif
- for, while, break, continue
- def, class, return
- import, from, as
- and, or, not, in, is
- True, False, None
If you want to add or update a language, start there.
- Edit data/keywords.yaml
- Run make all
- Run make test
- Inspect the updated generated outputs
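When adding a language, the most likely mistake is leaving a construct without an alias. The sketch below shows one way such a gap check could work. It is not the implementation of scripts/validate_coverage.py, and the registry shape (construct mapped to language code mapped to keyword) and the `find_coverage_gaps` function are assumptions made for illustration.

```python
def find_coverage_gaps(registry, languages):
    """Return sorted (construct, language) pairs that have no keyword alias."""
    gaps = []
    for construct, by_lang in registry.items():
        for lang in languages:
            # A missing key and an empty string both count as a gap.
            if not by_lang.get(lang):
                gaps.append((construct, lang))
    return sorted(gaps)


# Hypothetical registry shape: construct -> {language code -> keyword}.
registry = {
    "if": {"en": "if", "fr": "si", "de": "wenn"},
    "class": {"en": "class", "fr": "classe"},  # German alias missing
}

gaps = find_coverage_gaps(registry, ["en", "fr", "de"])
print(gaps)  # [('class', 'de')]
```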
Run the grammar corpus tests with:
make test
The test files live in test/corpus/*.txt.
- No complex unpacking in assignments
- No decorators or type annotations
- No async / await
- No with statements
- Comments cannot appear inside expressions
MIT