christauff/unicode-sentinel

unicode-sentinel

Scan Git repositories for hidden Unicode injection attacks. Detects invisible characters weaponized for supply chain compromise — the technique used by Glassworm and related campaigns to hide malicious payloads in plain sight.

Why This Exists

In March 2026, the threat actor Glassworm compromised 151+ packages across GitHub, npm, and the VS Code marketplace by encoding malicious JavaScript payloads in invisible Unicode characters (Variation Selectors and Private Use Area ranges). The code looks clean in every editor, terminal, and code review tool, yet a small decoder reconstructs the hidden payload and runs it via eval().

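Visual code review cannot catch these attacks because the payload characters render as nothing, and static analysis tools that do not inspect raw code points miss them too. unicode-sentinel scans for them directly.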
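To make the decoder signature more concrete, here is a minimal sketch of how the co-occurrence of `codePointAt()`, dynamic evaluation, and `Buffer.from()` could be flagged. The names `DECODER_SIGNALS`, `matchDecoderSignals`, and `looksLikeDecoder` are hypothetical, and these three regexes are illustrative stand-ins, not the eight patterns shipped in `pattern-matcher.ts`.

```typescript
// Hypothetical signal set for illustration; not the tool's actual regexes.
const DECODER_SIGNALS: Array<{ name: string; pattern: RegExp }> = [
  { name: "codePointAt call", pattern: /\.codePointAt\s*\(/ },
  { name: "dynamic evaluation", pattern: /\beval\s*\(|\bnew\s+Function\s*\(/ },
  { name: "Buffer.from call", pattern: /\bBuffer\s*\.\s*from\s*\(/ },
];

// Return the names of every signal found in the source text.
function matchDecoderSignals(source: string): string[] {
  return DECODER_SIGNALS.filter((s) => s.pattern.test(source)).map((s) => s.name);
}

// Treat the file as decoder-like only when all signals appear together.
function looksLikeDecoder(source: string): boolean {
  return matchDecoderSignals(source).length === DECODER_SIGNALS.length;
}
```

Legitimate code can use all three calls, so a match like this is only a heuristic. That is presumably why the CRITICAL level further down also requires invisible characters in the same file.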

Related Reporting

Features

  1. Invisible Unicode character detection: scans for 7 dangerous character ranges, including Variation Selectors, PUA, zero-width, and bidirectional controls
  2. Glassworm decoder pattern matching: detects the specific codePointAt() + eval() + Buffer.from() attack signature with 8 regex patterns
  3. Byte discrepancy analysis: flags files whose byte size significantly exceeds their visible character count, which indicates hidden content
  4. Multiple output formats: table (colored terminal), JSON (structured), and SARIF 2.1.0 (GitHub Advanced Security / CI integration)
  5. Git hook generation: pre-commit and post-checkout hooks for automatic scanning
  6. Configurable allowlisting: .unicode-sentinel.yml config for per-repo file/directory/range exclusions
  7. Binary file detection: automatically skips binary files via magic byte signatures
  8. .gitignore awareness: respects repository ignore patterns
  9. npm registry scanning (planned): scan npm packages before install
  10. VS Code extension scanning (planned): scan .vsix files for hidden payloads
  11. --ci exit codes (planned): non-zero exit for CI/CD pipeline integration
  12. Test suite (planned): unit and integration tests
  13. npm publish (planned): publish as installable package
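The byte discrepancy check can be sketched as a ratio of UTF-8 bytes to visible characters. This is an illustration under stated assumptions: "visible" is taken to mean any code point outside the seven ranges this README lists, and the names `INVISIBLE_RANGES`, `byteRatio`, and `exceedsThreshold` are hypothetical. The logic in `byte-analyzer.ts` may differ.

```typescript
// Assumed definition of "invisible": the seven ranges listed in this README.
const INVISIBLE_RANGES: Array<[number, number]> = [
  [0xfe00, 0xfe0f], [0xe0100, 0xe01ef], [0xe000, 0xf8ff],
  [0x200b, 0x200f], [0x2060, 0x2064], [0x202a, 0x202e], [0x2066, 0x2069],
];

// Number of bytes a code point occupies in UTF-8.
function utf8Length(cp: number): number {
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  return cp < 0x10000 ? 3 : 4;
}

// Ratio of UTF-8 byte size to the count of visible code points.
function byteRatio(text: string): number {
  let bytes = 0;
  let visible = 0;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i) as number;
    bytes += utf8Length(cp);
    if (!INVISIBLE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi)) visible += 1;
    i += cp > 0xffff ? 2 : 1; // step over surrogate pairs
  }
  if (visible === 0) return bytes === 0 ? 0 : Infinity;
  return bytes / visible;
}

// 1.5 mirrors the default threshold shown in the example config.
function exceedsThreshold(text: string, threshold = 1.5): boolean {
  return byteRatio(text) > threshold;
}
```

Legitimate non-ASCII text such as CJK strings or emoji also raises this ratio, which is a plausible reason the threshold is configurable and a bare discrepancy only rates LOW.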

Unicode Ranges Detected

| Range | Code Points | Description |
|---|---|---|
| Variation Selectors | U+FE00–FE0F | Glassworm primary encoding range |
| VS Supplement | U+E0100–E01EF | Glassworm secondary encoding range |
| Private Use Area | U+E000–F8FF | Arbitrary hidden data encoding |
| Zero-Width | U+200B–200F | Invisible text processing characters |
| Invisible Formatting | U+2060–2064 | Word joiner, invisible operators |
| Bidi Controls | U+202A–202E | Text direction manipulation |
| Bidi Isolates | U+2066–2069 | Isolated directional sections |
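As a rough illustration of how the ranges above can be checked, the sketch below walks a string by code point and reports every character that falls into one of the seven ranges. `DANGEROUS_RANGES`, `RangeHit`, and `findInvisible` are hypothetical names, not taken from `unicode-detector.ts`.

```typescript
// Range table mirroring this README; not copied from the tool's source.
const DANGEROUS_RANGES: Array<{ name: string; start: number; end: number }> = [
  { name: "Variation Selectors", start: 0xfe00, end: 0xfe0f },
  { name: "VS Supplement", start: 0xe0100, end: 0xe01ef },
  { name: "Private Use Area", start: 0xe000, end: 0xf8ff },
  { name: "Zero-Width", start: 0x200b, end: 0x200f },
  { name: "Invisible Formatting", start: 0x2060, end: 0x2064 },
  { name: "Bidi Controls", start: 0x202a, end: 0x202e },
  { name: "Bidi Isolates", start: 0x2066, end: 0x2069 },
];

interface RangeHit {
  range: string;
  codePoint: string; // formatted like "U+200B"
  line: number;      // 1-based
  column: number;    // 1-based, counted in UTF-16 code units
}

function findInvisible(text: string): RangeHit[] {
  const hits: RangeHit[] = [];
  let line = 1;
  let column = 1;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i) as number;
    const width = cp > 0xffff ? 2 : 1; // astral code points use a surrogate pair
    const match = DANGEROUS_RANGES.find((r) => cp >= r.start && cp <= r.end);
    if (match) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ range: match.name, codePoint: "U+" + hex, line, column });
    }
    if (cp === 0x0a) {
      line += 1;
      column = 1;
    } else {
      column += width;
    }
    i += width;
  }
  return hits;
}
```

Iterating with `codePointAt` rather than by UTF-16 code unit matters here, because the VS Supplement range lies above U+FFFF and would otherwise be seen as two unrelated surrogates.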

Installation

Requires the Bun runtime.

git clone https://github.com/christauff/unicode-sentinel.git
cd unicode-sentinel
bun install

Usage

Scan a directory

# Basic scan with colored table output
bun run src/index.ts scan .

# Scan with JSON output
bun run src/index.ts scan ./my-repo --format json

# Scan with SARIF output for GitHub Security
bun run src/index.ts scan . --format sarif > results.sarif

# Only show high and critical findings
bun run src/index.ts scan . --severity high

# Custom byte discrepancy threshold
bun run src/index.ts scan . --threshold 2.0

# Ignore i18n files
bun run src/index.ts scan . --ignore-patterns "*.i18n.*,locales/*"
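For readers unfamiliar with the SARIF format, the sketch below shows roughly what a minimal SARIF 2.1.0 document for one finding looks like. The `SketchFinding` shape, the `toSarif` and `sarifLevel` names, and the severity-to-level mapping are assumptions made for illustration; the output of `sarif.ts` may be structured differently.

```typescript
// Hypothetical finding shape; the tool's own types may differ.
interface SketchFinding {
  ruleId: string;
  severity: "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
  message: string;
  file: string;
  line: number;
  column: number;
}

// Assumed mapping from this README's severities to SARIF result levels.
function sarifLevel(severity: SketchFinding["severity"]): "error" | "warning" | "note" {
  if (severity === "CRITICAL" || severity === "HIGH") return "error";
  return severity === "MEDIUM" ? "warning" : "note";
}

function toSarif(findings: SketchFinding[]) {
  const ruleIds = Array.from(new Set(findings.map((f) => f.ruleId)));
  return {
    $schema: "https://json.schemastore.org/sarif-2.1.0.json",
    version: "2.1.0",
    runs: [
      {
        tool: { driver: { name: "unicode-sentinel", rules: ruleIds.map((id) => ({ id })) } },
        results: findings.map((f) => ({
          ruleId: f.ruleId,
          level: sarifLevel(f.severity),
          message: { text: f.message },
          locations: [
            {
              physicalLocation: {
                artifactLocation: { uri: f.file },
                region: { startLine: f.line, startColumn: f.column },
              },
            },
          ],
        })),
      },
    ],
  };
}
```

Serializing the result with `JSON.stringify(toSarif(findings), null, 2)` gives a file in the general shape that SARIF consumers such as GitHub code scanning expect.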

Generate Git hooks

# Generate post-checkout hook (scans after git checkout or git switch)
bun run src/index.ts hooks --hook-type post-checkout

# Generate pre-commit hook (blocks commits with suspicious Unicode)
bun run src/index.ts hooks --hook-type pre-commit

Initialize config

# Create example .unicode-sentinel.yml in current directory
bun run src/index.ts init

Severity Levels

| Level | Trigger |
|---|---|
| CRITICAL | Glassworm decoder pattern detected (codePointAt + eval + invisible chars) |
| HIGH | High density of invisible characters in a single file (>20) |
| MEDIUM | Isolated invisible characters in code files |
| LOW | Byte discrepancy without specific character findings |
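The severity rules above can be read as a simple priority chain. The sketch below is one way to express that, assuming a hypothetical `FileSignals` summary per file and collapsing the CRITICAL trigger into a single boolean. It also shows how a minimum-severity filter like `--severity high` could work. None of these names come from the tool's source.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Hypothetical per-file summary of what the detectors found.
interface FileSignals {
  decoderPatternMatched: boolean; // decoder signature seen alongside invisible chars
  invisibleCount: number;         // invisible characters found in the file
  byteDiscrepancy: boolean;       // byte ratio exceeded the threshold
}

// Highest applicable level wins; null means nothing to report.
function classify(signals: FileSignals): Severity | null {
  if (signals.decoderPatternMatched) return "CRITICAL";
  if (signals.invisibleCount > 20) return "HIGH";
  if (signals.invisibleCount > 0) return "MEDIUM";
  if (signals.byteDiscrepancy) return "LOW";
  return null;
}

const SEVERITY_ORDER: Severity[] = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

// True when a finding is at or above the requested minimum level.
function meetsMinimum(found: Severity, minimum: Severity): boolean {
  return SEVERITY_ORDER.indexOf(found) >= SEVERITY_ORDER.indexOf(minimum);
}
```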

Configuration

Create a .unicode-sentinel.yml in your repo root:

allowlist:
  files: []                    # Files to skip (glob patterns)
  directories: []              # Directories to skip
  unicodeRanges: []            # Range names to allow (e.g., "Zero-Width Characters")
  patterns: []                 # Pattern names to allow

scanning:
  threshold: 1.5               # Byte-to-visible-char ratio threshold
  includePatterns:
    - "*.js"
    - "*.ts"
    - "*.jsx"
    - "*.tsx"
    - "*.mjs"
    - "*.py"
    - "*.rb"
    - "*.go"
  ignorePatterns:
    - "*.min.js"
    - "*.bundle.js"
  maxFileSize: 10485760         # 10MB
  followSymlinks: false

output:
  format: table
  colors: true
  verbose: false
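To illustrate how glob entries such as those under `ignorePatterns` might be applied, here is a minimal sketch. It assumes that a pattern without a slash is matched against the file name and a pattern with a slash against the repo-relative path, and that `*` does not cross directory boundaries. `globToRegExp` and `isIgnored` are hypothetical helpers; the matching rules in `file-filter.ts` may differ.

```typescript
// Convert a simple glob (only * and ? supported) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, "[^/]*")              // * matches within one path segment
    .replace(/\?/g, "[^/]");              // ? matches one non-slash character
  return new RegExp("^" + escaped + "$");
}

// Assumed rule: slash-free patterns test the base name, others the full path.
function isIgnored(relPath: string, ignorePatterns: string[]): boolean {
  const base = relPath.split("/").pop() ?? relPath;
  return ignorePatterns.some((pattern) =>
    globToRegExp(pattern).test(pattern.includes("/") ? relPath : base)
  );
}
```

Under these assumptions `isIgnored("dist/app.min.js", ["*.min.js"])` is true, while `locales/*` only matches a top-level `locales` directory, not a nested one.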

Architecture

src/
├── index.ts                 # Entry point and CLI router
├── cli/
│   ├── parser.ts            # Argument parsing (node:util parseArgs)
│   └── commands.ts          # Command handlers (scan, hooks, init)
├── core/
│   ├── scanner.ts           # Orchestrator: file collection, filtering, scanning
│   ├── unicode-detector.ts  # 7 dangerous Unicode range detection
│   ├── pattern-matcher.ts   # 8 Glassworm decoder signature regexes
│   └── byte-analyzer.ts     # Byte-to-visible-character ratio analysis
├── config/
│   └── loader.ts            # YAML config loader (no external deps)
├── output/
│   ├── formatter.ts         # Output format router
│   ├── table.ts             # Colored terminal table output
│   ├── json.ts              # Structured JSON output
│   └── sarif.ts             # SARIF 2.1.0 for GitHub Security
├── git/
│   ├── hooks.ts             # Pre-commit and post-checkout generators
│   └── gitignore.ts         # .gitignore parser
├── utils/
│   ├── binary-check.ts      # Magic byte binary file detection
│   ├── colors.ts            # Terminal color utilities
│   └── file-filter.ts       # Glob pattern file filtering
└── types/
    └── index.ts             # TypeScript type definitions
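The tree lists `binary-check.ts` as magic-byte binary detection. A minimal sketch of that idea compares the first bytes of a file against a few well-known signatures. The signature list and the names `MAGIC_SIGNATURES` and `detectBinaryKind` are assumptions chosen for illustration, not the set the tool actually ships.

```typescript
// A few well-known file signatures; the tool's real list is not shown here.
const MAGIC_SIGNATURES: Array<{ kind: string; bytes: number[] }> = [
  { kind: "PNG", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { kind: "JPEG", bytes: [0xff, 0xd8, 0xff] },
  { kind: "GIF", bytes: [0x47, 0x49, 0x46, 0x38] },
  { kind: "PDF", bytes: [0x25, 0x50, 0x44, 0x46] },
  { kind: "ZIP", bytes: [0x50, 0x4b, 0x03, 0x04] },
  { kind: "ELF", bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { kind: "GZIP", bytes: [0x1f, 0x8b] },
];

// Return the detected format name, or null when no signature matches.
function detectBinaryKind(head: Uint8Array): string | null {
  for (const sig of MAGIC_SIGNATURES) {
    const longEnough = head.length >= sig.bytes.length;
    if (longEnough && sig.bytes.every((b, i) => head[i] === b)) return sig.kind;
  }
  return null;
}
```

Only the first few bytes of each file are needed for this check, so a scanner can skip binaries cheaply before reading and decoding the full content.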

MITRE ATT&CK Coverage

This tool detects techniques used in the following attack patterns:

  • T1195.002 - Supply Chain Compromise: Compromise Software Supply Chain
  • T1027.010 - Obfuscated Files or Information: Command Obfuscation
  • T1059.007 - Command and Scripting Interpreter: JavaScript
  • T1036.005 - Masquerading: Match Legitimate Name
  • T1140 - Deobfuscate/Decode Files or Information

License

MIT
