Feature Request: Integrate Magika for AI-powered file type detection & malware defense
Problem
GoClaw agents frequently handle file uploads, perform read/write operations, and process user-provided files across skills (docx, pdf, xlsx, pptx, etc.). Currently, file type detection relies on file extensions or basic magic bytes, which leaves it vulnerable to:
- Extension spoofing: A .pdf file that's actually an executable
- Polyglot files: Files valid in multiple formats, potentially hiding malicious payloads
- MIME type confusion: Incorrect content-type leading to wrong processing pipeline
- Malware injection: Malicious files disguised as benign documents passing through skill scripts
This is especially critical given GoClaw's multi-tenant architecture where agents process files from untrusted sources.
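To make the extension-spoofing case concrete, here is a minimal sketch using only the Go standard library. It compares the type implied by the file name with the type sniffed from the leading bytes. The helper names `extensionMatchesContent` and `mediaType` are hypothetical and not part of GoClaw. Note that `http.DetectContentType` is itself a signature-based sniffer, so this only illustrates the mismatch check, not the detection quality being requested.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path/filepath"
	"strings"
)

// mediaType strips parameters such as "; charset=utf-8" and normalizes case.
func mediaType(v string) string {
	if i := strings.Index(v, ";"); i >= 0 {
		v = v[:i]
	}
	return strings.TrimSpace(strings.ToLower(v))
}

// extensionMatchesContent reports whether the MIME type implied by the file
// name agrees with the type sniffed from the first bytes of the content.
// Hypothetical helper for illustration only.
func extensionMatchesContent(name string, head []byte) bool {
	byExt := mime.TypeByExtension(strings.ToLower(filepath.Ext(name)))
	sniffed := http.DetectContentType(head)
	return mediaType(byExt) == mediaType(sniffed)
}

func main() {
	realPDF := []byte("%PDF-1.7\n")
	// Leading bytes of a DOS/PE executable, uploaded under a .pdf name.
	spoofed := []byte("MZ\x90\x00\x03\x00\x00\x00")

	fmt.Println("real PDF matches: ", extensionMatchesContent("report.pdf", realPDF))
	fmt.Println("spoofed PDF matches:", extensionMatchesContent("report.pdf", spoofed))
}
```

A check like this catches the crude case, but polyglot files and textual formats are where signature sniffing tends to fall short.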
Proposed Solution
Integrate Google Magika — an AI-powered file content type detection tool — into GoClaw's file handling pipeline.
What is Magika?
- AI-powered: Deep learning model trained on ~100M samples across 200+ content types
- ~99% accuracy: Outperforms the traditional file command and magic-byte detection, especially on textual content
- Fast: ~5ms inference time per file (near-constant, independent of file size)
- Lightweight: Model weighs only a few MBs
- Production-proven: Used at scale by Google (Gmail, Drive, Safe Browsing), VirusTotal, and abuse.ch — processing hundreds of billions of samples weekly
- Apache 2.0 license: Permissive, suitable for integration
- Multiple interfaces: CLI (Rust), Python API, JS/TS, Go bindings (WIP)
Integration Points in GoClaw
1. File Upload / Ingestion Gate
User uploads file → Magika scan → Verify actual type matches expected type → Accept or reject
- Validate files before they enter any skill processing pipeline
- Block mismatches (e.g., PE binary uploaded as .docx)
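The gate could be sketched in Go roughly as follows, assuming the magika binary is on PATH and that `magika --mime-type <path>` prints one line shaped like `<path>: <mime type>`. That output shape is an assumption and should be checked against the installed CLI version. The function names `parseMimeLine`, `detectMime`, and `gateUpload` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// parseMimeLine extracts the MIME type from a line shaped like
// "<path>: <mime type>". The shape is an assumption about the CLI output.
func parseMimeLine(line string) (string, error) {
	line = strings.TrimSpace(line)
	// Split on the last ": " because the path itself may contain colons.
	i := strings.LastIndex(line, ": ")
	if i < 0 {
		return "", errors.New("unexpected magika output: " + line)
	}
	return strings.TrimSpace(line[i+2:]), nil
}

// detectMime shells out to the magika binary for a single file.
func detectMime(path string) (string, error) {
	out, err := exec.Command("magika", "--mime-type", path).Output()
	if err != nil {
		return "", fmt.Errorf("running magika: %w", err)
	}
	return parseMimeLine(string(out))
}

// gateUpload accepts a file only when the detected type equals the expected one.
func gateUpload(path, expected string) error {
	actual, err := detectMime(path)
	if err != nil {
		return err // fail closed: a file that cannot be scanned is not accepted
	}
	if actual != expected {
		return fmt.Errorf("type mismatch for %s: expected %s, got %s", path, expected, actual)
	}
	return nil
}

func main() {
	const docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
	if err := gateUpload("uploads/user_file.docx", docx); err != nil {
		fmt.Println("reject:", err)
		return
	}
	fmt.Println("accept")
}
```

Keeping the parsing separate from the subprocess call makes the gate testable without the binary installed, and would make a later switch to native Go bindings a change to `detectMime` only.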
2. Skill Pre-flight Check
Before skill scripts execute, verify input files are the expected type:
{
"path": "uploads/user_file.docx",
"expected": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"actual": "application/x-dosexec",
"action": "block"
}
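One way to produce the record above from Go is a small struct with matching JSON tags. This is a sketch only: the type name `PreflightResult`, the constructor, and the `"allow"` action value are assumptions, since the issue only shows the `"block"` case.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PreflightResult mirrors the fields of the JSON record in the proposal.
type PreflightResult struct {
	Path     string `json:"path"`
	Expected string `json:"expected"`
	Actual   string `json:"actual"`
	Action   string `json:"action"`
}

// newPreflightResult sets Action to "block" on a mismatch.
// "allow" is an assumed counterpart value, not taken from the proposal.
func newPreflightResult(path, expected, actual string) PreflightResult {
	action := "allow"
	if expected != actual {
		action = "block"
	}
	return PreflightResult{Path: path, Expected: expected, Actual: actual, Action: action}
}

func main() {
	r := newPreflightResult(
		"uploads/user_file.docx",
		"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
		"application/x-dosexec",
	)
	b, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(b))
}
```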
3. Security Layer (5-Layer Security Model)
Add Magika as an additional layer in GoClaw's security architecture:
- Layer 0: Input validation
- Layer 1: File type verification (Magika) ← NEW
- Layer 2: Content sanitization
- Layer 3: Execution sandboxing
- Layer 4: Output validation
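The layering can be pictured as an ordered list of checks that stops at the first failure. The sketch below is illustrative and makes no claim about how GoClaw's security layers are actually implemented. `Layer`, `runLayers`, and `errMismatch` are hypothetical names, and every check except the Magika one is a stub.

```go
package main

import (
	"errors"
	"fmt"
)

// errMismatch stands in for a failed file type verification.
var errMismatch = errors.New("type mismatch")

// Layer is one stage of the pipeline; a non-nil error stops processing.
type Layer struct {
	Name  string
	Check func(path string) error
}

// runLayers applies the layers in order and returns the name of the first
// layer that fails, together with its error.
func runLayers(path string, layers []Layer) (string, error) {
	for _, l := range layers {
		if err := l.Check(path); err != nil {
			return l.Name, err
		}
	}
	return "", nil
}

func main() {
	pass := func(string) error { return nil }
	layers := []Layer{
		{"Layer 0: Input validation", pass},
		{"Layer 1: File type verification (Magika)", func(string) error { return errMismatch }},
		{"Layer 2: Content sanitization", pass},
		{"Layer 3: Execution sandboxing", pass},
		{"Layer 4: Output validation", pass},
	}
	if failed, err := runLayers("uploads/user_file.docx", layers); err != nil {
		fmt.Printf("stopped at %q: %v\n", failed, err)
	}
}
```

Placing the type check early means a disguised executable never reaches sanitization or sandboxed execution.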
4. magika Binary as System Dependency
Add magika to the package installer (dep_installer.go) as a recognized system binary, similar to ffmpeg, tesseract, pandoc:
apk add magika # or install via pipx: pipx install magika
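A presence check for the binary might look like the sketch below, which uses `exec.LookPath` from the standard library. How dep_installer.go actually registers binaries is not shown in this issue, so `missingBinaries` is a hypothetical helper rather than a description of the existing code.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingBinaries returns the names that cannot be found on PATH.
func missingBinaries(names []string) []string {
	var missing []string
	for _, n := range names {
		if _, err := exec.LookPath(n); err != nil {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	required := []string{"ffmpeg", "tesseract", "pandoc", "magika"}
	if m := missingBinaries(required); len(m) > 0 {
		fmt.Println("missing system binaries:", m)
		return
	}
	fmt.Println("all system binaries present")
}
```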
Implementation Options
| Option | Pros | Cons |
| --- | --- | --- |
| CLI integration (call magika binary from Go) | Simple, no Go dependency, works immediately | Process spawn overhead |
| Go bindings (when available) | Native, fastest, no subprocess | Go bindings still WIP |
| Python API (via existing Python skill runtime) | Available now, well-documented | Requires Python runtime |
| HTTP microservice (sidecar) | Language-agnostic, scalable | Adds infrastructure complexity |
Recommended: Start with CLI integration (option 1) for immediate value, migrate to Go bindings when stable.
Configuration
{
"security": {
"magika": {
"enabled": true,
"mode": "high-confidence",
"block_on_mismatch": true,
"allowed_types": ["document", "code", "text", "image"],
"blocked_types": ["executable", "archive", "inode"],
"max_file_size_mb": 50
}
}
}
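A possible Go mapping for this configuration, with a simple policy check, is sketched below. Several semantics are assumptions the proposal does not spell out: the `allowed_types` and `blocked_types` entries are treated as Magika content-type groups, a blocked entry wins over an allowed one, groups in neither list are denied, and files above `max_file_size_mb` are rejected rather than skipped. All type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const sampleConfig = `{
  "security": {
    "magika": {
      "enabled": true,
      "mode": "high-confidence",
      "block_on_mismatch": true,
      "allowed_types": ["document", "code", "text", "image"],
      "blocked_types": ["executable", "archive", "inode"],
      "max_file_size_mb": 50
    }
  }
}`

// MagikaConfig mirrors the "security.magika" object in the proposal.
type MagikaConfig struct {
	Enabled         bool     `json:"enabled"`
	Mode            string   `json:"mode"`
	BlockOnMismatch bool     `json:"block_on_mismatch"`
	AllowedTypes    []string `json:"allowed_types"`
	BlockedTypes    []string `json:"blocked_types"`
	MaxFileSizeMB   int64    `json:"max_file_size_mb"`
}

type config struct {
	Security struct {
		Magika MagikaConfig `json:"magika"`
	} `json:"security"`
}

// loadConfig parses the JSON document and returns the magika section.
func loadConfig(data []byte) (MagikaConfig, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return MagikaConfig{}, err
	}
	return c.Security.Magika, nil
}

// allows applies the assumed policy: size limit first, then the blocked
// list, then the allowed list; anything unlisted is denied.
func (c MagikaConfig) allows(group string, sizeBytes int64) bool {
	if !c.Enabled {
		return true
	}
	if sizeBytes > c.MaxFileSizeMB*1024*1024 {
		return false
	}
	for _, b := range c.BlockedTypes {
		if b == group {
			return false
		}
	}
	for _, a := range c.AllowedTypes {
		if a == group {
			return true
		}
	}
	return false
}

func main() {
	cfg, err := loadConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("document allowed:  ", cfg.allows("document", 1<<20))
	fmt.Println("executable allowed:", cfg.allows("executable", 1<<20))
}
```

Denying unlisted groups by default keeps the policy conservative. A deployment that prefers to allow unknown groups would only need to change the final return value.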
Use Cases
- Skill security: Ensure PDF/docx/xlsx skills only receive valid files of the expected type
- Upload validation: Reject spoofed files at the gateway before they reach agents
- Audit logging: Log file type detection results for compliance and forensics
- Malware prevention: Catch disguised executables, scripts, or polyglot files
- Multi-tenant isolation: Prevent cross-tenant file type attacks in shared environments
Labels: enhancement, security, malware-protection, file-handling