Scan Git repositories for hidden Unicode injection attacks. Detects invisible characters weaponized for supply chain compromise — the technique used by Glassworm and related campaigns to hide malicious payloads in plain sight.
In March 2026, threat actor Glassworm compromised 151+ packages across GitHub, npm, and VS Code marketplace by encoding malicious JavaScript payloads using invisible Unicode characters (Variation Selectors and PUA ranges). The code looks clean in every editor, terminal, and code review tool — but executes hidden payloads via eval().
Visual code review cannot catch characters that render as nothing, and static analysis tools that do not inspect raw code points miss them as well. unicode-sentinel scans for them directly.
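To make the hiding technique concrete, here is a minimal sketch of recovering bytes carried by variation selectors. It assumes the commonly published mapping (U+FE00 to U+FE0F carry byte values 0 to 15, U+E0100 to U+E01EF carry 16 to 255); the function name `extractHiddenBytes` is hypothetical and is not part of unicode-sentinel.

```typescript
// Sketch only: recovers bytes hidden in variation selectors.
// Assumed mapping (not confirmed by this README):
//   U+FE00..U+FE0F   -> byte values 0..15
//   U+E0100..U+E01EF -> byte values 16..255
function extractHiddenBytes(text: string): Uint8Array {
  const bytes: number[] = [];
  // for...of walks the string by code point, so astral characters
  // such as U+E0100 are visited once rather than as two surrogates.
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (cp >= 0xfe00 && cp <= 0xfe0f) {
      bytes.push(cp - 0xfe00);
    } else if (cp >= 0xe0100 && cp <= 0xe01ef) {
      bytes.push(cp - 0xe0100 + 16);
    }
    // Every other character is visible content and is ignored here.
  }
  return Uint8Array.from(bytes);
}
```

A string that displays as two characters can still yield a long byte array from this function, which is why scanning by code point matters more than reading the rendered text.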
- Supply-chain attack using invisible code hits GitHub and other repositories — Ars Technica
- Glassworm Returns: Unicode Attack on GitHub, npm, VSCode — Aikido Security
- Delivering Malware via Google Calendar Invites and PUAs — Aikido Security
- Invisible Unicode character detection: scans for 7 dangerous character ranges, including Variation Selectors, PUA, zero-width, and bidirectional controls
- Glassworm decoder pattern matching: detects the specific codePointAt() + eval() + Buffer.from() attack signature with 8 regex patterns
- Byte discrepancy analysis: flags files where byte size significantly exceeds visible character count, indicating hidden content
- Multiple output formats: table (colored terminal), JSON (structured), and SARIF 2.1.0 (GitHub Advanced Security / CI integration)
- Git hook generation: pre-commit and post-checkout hooks for automatic scanning
- Configurable allowlisting: .unicode-sentinel.yml config for per-repo file/directory/range exclusions
- Binary file detection: automatically skips binary files via magic byte signatures
- .gitignore awareness: respects repository ignore patterns
- npm registry scanning (planned): scan npm packages before install
- VS Code extension scanning (planned): scan .vsix files for hidden payloads
- --ci exit codes (planned): non-zero exit for CI/CD pipeline integration
- Test suite (planned): unit and integration tests
- npm publish (planned): publish as installable package
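The byte discrepancy check from the feature list can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the name `analyzeByteDiscrepancy`, the result shape, and the choice to count every code point outside the invisible ranges as "visible" are all hypothetical. The default threshold of 1.5 is taken from the example config in this README.

```typescript
// Sketch only: compares UTF-8 byte length with the visible code point count.
// The character class covers the invisible ranges this README lists.
const INVISIBLE =
  /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uE000-\uF8FF\uFE00-\uFE0F\u{E0100}-\u{E01EF}]/gu;

interface ByteAnalysis {
  byteLength: number;   // size of the text encoded as UTF-8
  visibleChars: number; // code points left after removing invisible ones
  ratio: number;        // byteLength / visibleChars
  suspicious: boolean;  // true when ratio exceeds the threshold
}

function analyzeByteDiscrepancy(text: string, threshold = 1.5): ByteAnalysis {
  const byteLength = new TextEncoder().encode(text).length;
  // Array.from splits by code point, so surrogate pairs count once.
  const visibleChars = Array.from(text.replace(INVISIBLE, "")).length;
  let ratio: number;
  if (visibleChars > 0) {
    ratio = byteLength / visibleChars;
  } else {
    // No visible content: any bytes at all are unexplained.
    ratio = byteLength === 0 ? 0 : Infinity;
  }
  return { byteLength, visibleChars, ratio, suspicious: ratio > threshold };
}
```

Under this sketch, legitimate non-ASCII text also raises the ratio (a CJK character takes 3 bytes in UTF-8), so a ratio check of this kind likely needs ignore patterns or an allowlist for localized files.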
| Range | Code Points | Description |
|---|---|---|
| Variation Selectors | U+FE00–FE0F | Glassworm primary encoding range |
| VS Supplement | U+E0100–E01EF | Glassworm secondary encoding range |
| Private Use Area | U+E000–F8FF | Arbitrary hidden data encoding |
| Zero-Width | U+200B–200F | Invisible text processing characters |
| Invisible Formatting | U+2060–2064 | Word joiner, invisible operators |
| Bidi Controls | U+202A–202E | Text direction manipulation |
| Bidi Isolates | U+2066–2069 | Isolated directional sections |
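A range check over the table above might look like the sketch below. The code point boundaries are copied from the table; the identifiers (`DANGEROUS_RANGES`, `findInvisibleCharacters`, `Finding`) are hypothetical and may not match what src/core/unicode-detector.ts actually exports.

```typescript
// Sketch only: reports every code point that falls in a listed range.
interface UnicodeRange {
  name: string;
  start: number;
  end: number;
}

const DANGEROUS_RANGES: UnicodeRange[] = [
  { name: "Variation Selectors", start: 0xfe00, end: 0xfe0f },
  { name: "VS Supplement", start: 0xe0100, end: 0xe01ef },
  { name: "Private Use Area", start: 0xe000, end: 0xf8ff },
  { name: "Zero-Width", start: 0x200b, end: 0x200f },
  { name: "Invisible Formatting", start: 0x2060, end: 0x2064 },
  { name: "Bidi Controls", start: 0x202a, end: 0x202e },
  { name: "Bidi Isolates", start: 0x2066, end: 0x2069 },
];

interface Finding {
  range: string;     // name from the table
  codePoint: string; // formatted like "U+200B"
  line: number;      // 1-based
  column: number;    // 1-based, counted in code points
}

function findInvisibleCharacters(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((lineText, lineIndex) => {
    let column = 0;
    // Iterating by code point keeps astral characters (U+E0100 and up) intact.
    for (const ch of lineText) {
      column += 1;
      const cp = ch.codePointAt(0)!;
      const hit = DANGEROUS_RANGES.find((r) => cp >= r.start && cp <= r.end);
      if (hit) {
        findings.push({
          range: hit.name,
          codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
          line: lineIndex + 1,
          column,
        });
      }
    }
  });
  return findings;
}
```

Reporting the line and column alongside the range name is a design choice made for this example, on the assumption that a reviewer needs a location for a character they cannot see.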
Requires Bun runtime.
git clone https://github.com/christauff/unicode-sentinel.git
cd unicode-sentinel
bun install

# Basic scan with colored table output
bun run src/index.ts scan .
# Scan with JSON output
bun run src/index.ts scan ./my-repo --format json
# Scan with SARIF output for GitHub Security
bun run src/index.ts scan . --format sarif > results.sarif
# Only show high and critical findings
bun run src/index.ts scan . --severity high
# Custom byte discrepancy threshold
bun run src/index.ts scan . --threshold 2.0
# Ignore i18n files
bun run src/index.ts scan . --ignore-patterns "*.i18n.*,locales/*"

# Generate post-checkout hook (scans on clone/pull)
bun run src/index.ts hooks --hook-type post-checkout
# Generate pre-commit hook (blocks commits with suspicious Unicode)
bun run src/index.ts hooks --hook-type pre-commit

# Create example .unicode-sentinel.yml in current directory
bun run src/index.ts init

| Level | Trigger |
|---|---|
| CRITICAL | Glassworm decoder pattern detected (codePointAt + eval + invisible chars) |
| HIGH | High density of invisible characters in a single file (>20) |
| MEDIUM | Isolated invisible characters in code files |
| LOW | Byte discrepancy without specific character findings |
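The severity table can be read as an ordered set of rules, with the first match winning. The sketch below encodes that reading; the input shape `FileSignals` and both function names are hypothetical, and the MEDIUM rule here ignores the "code files" qualifier for brevity.

```typescript
// Sketch only: maps per-file signals to the severity levels in the table.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface FileSignals {
  decoderPatternMatched: boolean; // a Glassworm decoder pattern was found
  invisibleCharCount: number;     // invisible characters in this file
  byteDiscrepancy: boolean;       // byte/visible ratio exceeded the threshold
}

function classifySeverity(s: FileSignals): Severity | null {
  if (s.decoderPatternMatched) return "CRITICAL";
  if (s.invisibleCharCount > 20) return "HIGH";
  if (s.invisibleCharCount > 0) return "MEDIUM";
  if (s.byteDiscrepancy) return "LOW";
  return null; // nothing to report for this file
}

// Ordered from least to most severe, used for minimum-severity filtering.
const SEVERITY_ORDER: Severity[] = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

function meetsMinimum(severity: Severity, minimum: Severity): boolean {
  return SEVERITY_ORDER.indexOf(severity) >= SEVERITY_ORDER.indexOf(minimum);
}
```

With this ordering, a minimum of HIGH keeps only HIGH and CRITICAL findings, which matches the behavior described for `--severity high`.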
Create a .unicode-sentinel.yml in your repo root:
allowlist:
  files: []          # Files to skip (glob patterns)
  directories: []    # Directories to skip
  unicodeRanges: []  # Range names to allow (e.g., "Zero-Width Characters")
  patterns: []       # Pattern names to allow
scanning:
  threshold: 1.5     # Byte-to-visible-char ratio threshold
  includePatterns:
    - "*.js"
    - "*.ts"
    - "*.jsx"
    - "*.tsx"
    - "*.mjs"
    - "*.py"
    - "*.rb"
    - "*.go"
  ignorePatterns:
    - "*.min.js"
    - "*.bundle.js"
  maxFileSize: 10485760  # 10MB
  followSymlinks: false
output:
  format: table
  colors: true
  verbose: false

src/
├── index.ts # Entry point and CLI router
├── cli/
│ ├── parser.ts # Argument parsing (node:util parseArgs)
│ └── commands.ts # Command handlers (scan, hooks, init)
├── core/
│ ├── scanner.ts # Orchestrator: file collection, filtering, scanning
│ ├── unicode-detector.ts # 7 dangerous Unicode range detection
│ ├── pattern-matcher.ts # 8 Glassworm decoder signature regexes
│ └── byte-analyzer.ts # Byte-to-visible-character ratio analysis
├── config/
│ └── loader.ts # YAML config loader (no external deps)
├── output/
│ ├── formatter.ts # Output format router
│ ├── table.ts # Colored terminal table output
│ ├── json.ts # Structured JSON output
│ └── sarif.ts # SARIF 2.1.0 for GitHub Security
├── git/
│ ├── hooks.ts # Pre-commit and post-checkout generators
│ └── gitignore.ts # .gitignore parser
├── utils/
│ ├── binary-check.ts # Magic byte binary file detection
│ ├── colors.ts # Terminal color utilities
│ └── file-filter.ts # Glob pattern file filtering
└── types/
└── index.ts # TypeScript type definitions
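As an illustration of the kind of check pattern-matcher.ts is described as performing, the sketch below tests source text for the three calls named in the decoder signature. These three regexes are illustrative stand-ins, not the tool's eight patterns, and the names `DECODER_SIGNALS`, `matchDecoderSignals`, and `looksLikeDecoder` are hypothetical.

```typescript
// Sketch only: flags source that combines the calls named in the
// decoder signature (codePointAt + eval, optionally Buffer.from).
interface Signal {
  name: string;
  pattern: RegExp;
}

// No "g" flag, so RegExp.test() keeps no state between calls.
const DECODER_SIGNALS: Signal[] = [
  { name: "codePointAt", pattern: /\.codePointAt\s*\(/ },
  { name: "eval", pattern: /\beval\s*\(/ },
  { name: "Buffer.from", pattern: /\bBuffer\s*\.\s*from\s*\(/ },
];

// Returns the names of all signals present in the source text.
function matchDecoderSignals(source: string): string[] {
  return DECODER_SIGNALS.filter((s) => s.pattern.test(source)).map((s) => s.name);
}

// Assumption for this sketch: code point decoding together with eval
// is treated as the minimum combination worth flagging.
function looksLikeDecoder(source: string): boolean {
  const hits = matchDecoderSignals(source);
  return hits.includes("codePointAt") && hits.includes("eval");
}
```

Each call on its own is common in ordinary code, so this sketch only flags the combination. A real matcher would presumably also require invisible characters in the same file, as the CRITICAL trigger in the severity table suggests.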
This tool detects techniques that map to the following MITRE ATT&CK entries:
- T1195.002: Supply Chain Compromise (Compromise Software Supply Chain)
- T1027.010: Obfuscated Files or Information (Command Obfuscation)
- T1059.007: Command and Scripting Interpreter (JavaScript)
- T1036.005: Masquerading (Match Legitimate Name or Location)
- T1140: Deobfuscate/Decode Files or Information
MIT
