tree-sitter-multilingual

A Tree-sitter grammar for the Multilingual Programming Language.

This repository gives you:

A Tree-sitter parser for .multi source files
Query files for highlighting, indentation, and folding
Node and Rust bindings for embedding the parser in applications
Generator scripts for TextMate and Monaco grammar outputs

What is the Multilingual Programming Language?

The Multilingual Programming Language is an experimental language where the same semantic constructs can be written with keywords from multiple human languages.

Examples:

English: if, def, class
French: si, def, classe
German: wenn, def, klasse
Japanese: supported through the shared keyword registry
Arabic: supported through the shared keyword registry

The full keyword registry currently covers 17 languages, including Spanish, Portuguese, Chinese, Italian, Dutch, Polish, Swedish, Danish, Finnish, Hindi, Bengali, and Tamil.

Keywords can be mixed freely in the same program. The language is indentation-sensitive and broadly Python-like in structure.
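
As a rough illustration of what "the same semantic construct" means, the following Python sketch maps surface keywords back to one canonical construct. It is not code from this repository: the `ALIASES` table and the `canonical` helper are hypothetical, and they hard-code only the English, French, and German forms listed above. The real registry lives in data/keywords.yaml.

```python
# Hypothetical alias table: canonical construct -> surface keywords.
# Only the forms listed in this README are included.
ALIASES = {
    "if": ["if", "si", "wenn"],
    "def": ["def"],
    "class": ["class", "classe", "klasse"],
}

# Invert the table so each surface keyword points at its construct.
SURFACE_TO_CONSTRUCT = {
    surface: construct
    for construct, surfaces in ALIASES.items()
    for surface in surfaces
}


def canonical(word):
    """Return the canonical construct for a keyword, or None for any other word."""
    return SURFACE_TO_CONSTRUCT.get(word)
```

With this table, `canonical("si")` and `canonical("wenn")` both return `"if"`, which is why a program can mix keyword languages without changing its structure.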

What This Repository Produces

This repo is useful when you want to:

Parse Multilingual source code into syntax trees
Add syntax highlighting to an editor or viewer
Support indentation and folding in Tree-sitter-aware editors
Generate editor grammar artifacts from a single keyword registry

After building, the main outputs are:

src/parser.c and src/scanner.c for the Tree-sitter parser
bindings/node/ for Node.js consumers
bindings/rust/ for Rust consumers
queries/highlights.scm, queries/indents.scm, queries/folds.scm
generated/multilingual.tmLanguage.json for TextMate-compatible editors
generated/monarch.json for Monaco-based editors

GitHub Linguist Positioning

This repository is intended as an upstream grammar source for future github-linguist support; it is not the github-linguist integration itself.

For language detection and syntax-highlighting discussions, this repository treats:

.multi as the canonical file extension
source.multi as the TextMate scope

The repository no longer advertises .ml, because that extension is already strongly associated with other languages, most notably OCaml, and would create avoidable detection conflicts.

Quick Start

Prerequisites

Node.js 14+ for the Tree-sitter CLI and Node binding build
Python 3.10+ for the generator scripts
PyYAML for the build scripts: pip install pyyaml
A C compiler such as GCC, Clang, or MSVC

Build Everything

git clone https://github.com/multilingualprogramming/tree-sitter-multilingual.git
cd tree-sitter-multilingual
npm install
make all

If you only want the parser and tests:

npm install
make build
make test

Build Targets

make inject     # Expand multilingual keyword aliases into grammar.js
make generate   # Run tree-sitter generate
make build      # Build the native Node binding
make test       # Run Tree-sitter corpus tests
make tmgrammar  # Generate TextMate grammar JSON
make monarch    # Generate Monaco tokenizer JSON
make validate   # Validate keyword coverage
make all        # Run the full pipeline

Repository Layout

data/keywords.yaml: Canonical keyword registry across all supported languages
grammar.js: Tree-sitter grammar source
src/scanner.c: External scanner for indentation handling
queries/: Highlight, indentation, and folding queries
scripts/: Build and generation scripts
test/corpus/: Grammar test corpus
examples/: Sample .multi programs
bindings/node/: Node.js binding entrypoint
bindings/rust/: Rust crate wrapper

How the Build Works

The build is driven from data/keywords.yaml.

scripts/inject_aliases.py replaces // build:inject <construct> markers in grammar.js with generated choice(...) expressions.
tree-sitter generate compiles the grammar into C sources.
node-gyp rebuild builds the native Node binding.
scripts/build_tmgrammar.py generates generated/multilingual.tmLanguage.json.
scripts/build_monarch.py generates generated/monarch.json.
scripts/validate_coverage.py checks that keyword coverage is complete and consistent.
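
The marker expansion in the first step can be sketched as follows. This is a minimal illustration, not the actual scripts/inject_aliases.py: it assumes each marker sits on its own line, and the `KEYWORDS` excerpt, the `MARKER` pattern, and the `expand_markers` function are hypothetical names introduced for the example.

```python
import re

# Hypothetical registry excerpt; the real data lives in data/keywords.yaml.
KEYWORDS = {
    "if": ["if", "si", "wenn"],
    "class": ["class", "classe", "klasse"],
}

# Assumed marker shape: a whole line such as "  // build:inject if".
MARKER = re.compile(
    r"^(?P<indent>[ \t]*)// build:inject (?P<construct>\w+)[ \t]*$",
    re.MULTILINE,
)


def expand_markers(grammar_source, keywords):
    """Replace each marker line with a choice(...) over that construct's aliases."""

    def replace(match):
        construct = match.group("construct")
        if construct not in keywords:
            # Fail loudly instead of leaving an unexpanded marker behind.
            raise KeyError(f"no aliases registered for {construct!r}")
        aliases = ", ".join(f'"{alias}"' for alias in keywords[construct])
        return f"{match.group('indent')}choice({aliases})"

    return MARKER.sub(replace, grammar_source)
```

Under these assumptions, a line reading `  // build:inject if` becomes `  choice("if", "si", "wenn")`, and a marker for an unregistered construct raises an error.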

Keeping Generated Files in Sync

The files in generated/ are committed intentionally because they are directly useful to downstream editor and highlighting integrations.

If you change data/keywords.yaml, grammar.js, or generator scripts, regenerate and review the committed outputs before opening a PR:

make tmgrammar
make monarch
make validate

At a minimum, check that these files are updated together when relevant:

generated/multilingual.tmLanguage.json
generated/monarch.json
README.md if user-facing behavior or supported extension guidance changed
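
One way to notice stale committed outputs is to compare file digests before and after regenerating. The sketch below is an assumption about how such a check could look, not a script shipped in this repository; `snapshot` and `changed` are hypothetical helpers, and only the two paths in `GENERATED` come from this README.

```python
import hashlib
from pathlib import Path

# The two committed artifacts named in this README.
GENERATED = [
    "generated/multilingual.tmLanguage.json",
    "generated/monarch.json",
]


def snapshot(paths):
    """Map each path to the SHA-256 of its contents, or None if the file is missing."""
    digests = {}
    for path in paths:
        file = Path(path)
        digests[str(path)] = (
            hashlib.sha256(file.read_bytes()).hexdigest() if file.is_file() else None
        )
    return digests


def changed(before, after):
    """Return the paths whose digest differs between two snapshots."""
    return sorted(path for path in before if before[path] != after.get(path))
```

Take one snapshot of `GENERATED` before running make tmgrammar and make monarch and one after; any path returned by `changed` was out of date in the working tree. In a Git checkout, `git diff --exit-code generated/` after regenerating answers the same question.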

Using This Repository in Applications

1. Use It from Node.js

After building the parser, you can load it through the bundled Node binding:

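// Load the Tree-sitter runtime and this repository's Node binding.
const Parser = require("tree-sitter");
const Multilingual = require("./bindings/node");

const parser = new Parser();
parser.setLanguage(Multilingual);

// Block bodies are indented, as in Python.
const source = `
def greet(name):
  return f"Hello, {name}"
`;

// Parse the source and print the syntax tree as an S-expression.
const tree = parser.parse(source);
console.log(tree.rootNode.toString());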

Use this approach when you are building:

A CLI formatter or linter
A code analysis tool
A desktop app or Electron app
A custom language service prototype

2. Use It from Rust

The Rust binding exposes a language() function for tree-sitter:

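fn main() {
    let source = r#"
def greet(name):
  return f"Hello, {name}"
"#;

    let mut parser = tree_sitter::Parser::new();
    // Newer releases of the tree-sitter crate take a reference here:
    // `.set_language(&tree_sitter_multilingual::language())`.
    parser
        .set_language(tree_sitter_multilingual::language())
        .expect("failed to load multilingual grammar");

    // `parse` returns None if no language is set or parsing was cancelled.
    let tree = parser.parse(source, None).expect("failed to parse source");
    println!("{}", tree.root_node().to_sexp());
}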

Use this when you want to embed the grammar in:

A Rust CLI
A language server
A static analysis tool
A backend service that parses code snippets

3. Use It in Tree-sitter-Based Editors

This repository already includes the standard query files used by many Tree-sitter integrations:

queries/highlights.scm
queries/indents.scm
queries/folds.scm

These are the files editors typically use for:

Syntax highlighting
Auto-indentation
Code folding

4. Use It in Neovim

For nvim-treesitter, the key pieces you need are:

The generated parser
The queries/ directory
The language registration metadata

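Example setup (this only enables Tree-sitter highlighting; the parser itself must still be registered as outlined in the steps that follow):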

require("nvim-treesitter.configs").setup {
  highlight = { enable = true },
}

To fully integrate this language in Neovim, you would typically:

Register the parser with nvim-treesitter
Point it at this repository
Install the queries/*.scm files alongside the parser

5. Use It in VS Code or Any TextMate-Based Editor

Run:

make tmgrammar

This generates:

generated/multilingual.tmLanguage.json

Use that file inside a VS Code extension, or any editor/tooling stack that consumes TextMate grammars.

This is the right path for:

VS Code extensions
Syntax highlighting in Shiki-compatible pipelines
Any tool that relies on TextMate scopes rather than Tree-sitter directly
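
For readers unfamiliar with the format, a TextMate grammar is a JSON document of regex patterns tagged with scope names. The Python sketch below shows the general idea of deriving such a pattern from a keyword list. It is not scripts/build_tmgrammar.py, and the real generated file may be structured differently: the `keyword_pattern` and `build_grammar` helpers, the "Multilingual" display name, and the `keyword.control.multi` scope are assumptions. Only the `source.multi` scope and the .multi extension come from this README.

```python
import json
import re


def keyword_pattern(words):
    """Build one regex alternation that matches any of the keywords as whole words."""
    # Longest first, so a keyword never loses to one of its own prefixes.
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    return r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b"


def build_grammar(control_keywords):
    """Return a minimal TextMate grammar as a JSON-serializable dict."""
    return {
        "name": "Multilingual",          # assumed display name
        "scopeName": "source.multi",     # scope stated in this README
        "fileTypes": ["multi"],
        "patterns": [
            {
                "name": "keyword.control.multi",  # assumed scope name
                "match": keyword_pattern(control_keywords),
            },
        ],
    }


def to_json(grammar):
    """Serialize the grammar, keeping non-ASCII keywords readable."""
    return json.dumps(grammar, indent=2, ensure_ascii=False)
```

Sorting longer keywords first is a defensive choice for alternations in general; with the `\b` anchors it rarely matters, but it keeps the output deterministic.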

6. Use It in Monaco Editor

Run:

make monarch

This generates:

generated/monarch.json

Use that file when registering a Monaco tokenizer in:

Monaco Editor
Browser IDEs
Web playgrounds
Electron apps using Monaco
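
A Monarch definition is likewise plain data: a keyword list plus tokenizer rules that refer to it. The following Python sketch builds a minimal one under stated assumptions. It is not scripts/build_monarch.py, the real generated/monarch.json may look different, `build_monarch` is a hypothetical helper, and the identifier rule only covers ASCII identifiers.

```python
import json


def build_monarch(keywords):
    """Return a minimal Monarch language definition as a JSON-serializable dict."""
    return {
        "defaultToken": "",
        "keywords": sorted(set(keywords)),
        "tokenizer": {
            "root": [
                # Identifiers: registry keywords become "keyword", the rest "identifier".
                [
                    "[a-zA-Z_]\\w*",
                    {"cases": {"@keywords": "keyword", "@default": "identifier"}},
                ],
                # Line comments start with "#" and run to the end of the line.
                ["#.*$", "comment"],
            ],
        },
    }


def to_json(definition):
    """Serialize the definition, keeping non-ASCII keywords readable."""
    return json.dumps(definition, indent=2, ensure_ascii=False)
```

The `@keywords` case looks each matched identifier up in the `keywords` list, so adding a language only changes the list and not the tokenizer rules.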

7. Use It in Static Highlighting Pipelines

If your application does not need incremental parsing, you can still use the generated grammar artifacts:

Use generated/multilingual.tmLanguage.json for TextMate-compatible highlighters
Use generated/monarch.json for Monaco-based editors

That is often the simplest option for:

Documentation sites
Playground pages
Code preview components
Read-only syntax highlighting

Example Source File

See these repository examples:

examples/english.multi for a straightforward English-oriented sample
examples/french.multi for a sample written with French keywords

A small example:

def greet(name = "World"):
  return f"Hello, {name}!"

if True:
  print(greet("Alice"))

Grammar Features

Current grammar coverage includes:

Functions and classes
Assignments and expressions
if / elif / else
for and while
Imports
Arithmetic, comparison, logical, and bitwise operators
Strings, f-strings, numbers, lists, dicts, tuples, and sets
Line comments using #

Keyword Registry

data/keywords.yaml is the source of truth for keyword aliases.

It defines multilingual surface forms for constructs such as:

if, else, elif
for, while, break, continue
def, class, return
import, from, as
and, or, not, in, is
True, False, None

If you want to add or update a language, start there.

Adding a New Language

Edit data/keywords.yaml
Run make all
Run make test
Inspect the updated generated outputs
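
When adding a language, the most common mistake is leaving a construct without a surface form. The sketch below shows the kind of check that catches this. It is not scripts/validate_coverage.py: the `missing_coverage` helper, the registry shape (construct, then language code, then list of forms), and the language codes are all assumptions made for illustration.

```python
def missing_coverage(registry, languages):
    """Return sorted (construct, language) pairs that have no surface form."""
    gaps = []
    for construct, by_language in registry.items():
        for language in languages:
            # A missing key and an empty list both count as a gap.
            if not by_language.get(language):
                gaps.append((construct, language))
    return sorted(gaps)


# Hypothetical registry excerpt with one deliberate gap (German "class").
EXAMPLE_REGISTRY = {
    "if": {"en": ["if"], "fr": ["si"], "de": ["wenn"]},
    "class": {"en": ["class"], "fr": ["classe"]},
}
```

For `EXAMPLE_REGISTRY` and the languages `["en", "fr", "de"]`, the function reports the single gap `("class", "de")`.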

Testing

Run the grammar corpus tests with:

make test

The test files live in test/corpus/*.txt.
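
Tree-sitter corpus files pair a named source snippet with the syntax tree it should produce: a header fenced by lines of equals signs, the source, a line of dashes, then the expected S-expression. The Python sketch below splits such a file into its cases, which can help when reviewing a large corpus. The `split_corpus` helper is hypothetical and handles only the basic layout with a `---` separator.

```python
import re

# A header is a test name between two lines of at least three "=" characters.
HEADER = re.compile(r"^={3,}\n(?P<name>.+)\n={3,}\n", re.MULTILINE)


def split_corpus(text):
    """Split corpus text into (name, source, expected_tree) triples."""
    cases = []
    headers = list(HEADER.finditer(text))
    for index, header in enumerate(headers):
        # A case runs until the next header, or to the end of the file.
        end = headers[index + 1].start() if index + 1 < len(headers) else len(text)
        body = text[header.end():end]
        source, _, expected = body.partition("\n---\n")
        cases.append(
            (header.group("name").strip(), source.strip("\n"), expected.strip())
        )
    return cases
```

Each returned triple holds the test name, the source snippet with its indentation intact, and the expected tree as text.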

Known Limitations

No complex unpacking in assignments
No decorators or type annotations
No async / await
No with statements
Comments cannot appear inside expressions

License

MIT
