[Feature Request] Integrate Google Magika for AI-powered file type detection & malware defense

## Feature Request: Integrate Magika for AI-powered file type detection & malware defense

### Problem
GoClaw agents frequently handle file uploads, perform read/write operations, and process user-provided files across skills (docx, pdf, xlsx, pptx, etc.). Currently, file type detection relies on file extensions or basic magic bytes, which leaves it vulnerable to:

- **Extension spoofing**: A `.pdf` file that's actually an executable
- **Polyglot files**: Files valid in multiple formats, potentially hiding malicious payloads
- **MIME type confusion**: Incorrect content-type leading to wrong processing pipeline
- **Malware injection**: Malicious files disguised as benign documents passing through skill scripts

This is especially critical given GoClaw's multi-tenant architecture, in which agents process files from untrusted sources.

### Proposed Solution
Integrate [Google Magika](https://github.com/google/magika), an AI-powered file content type detection tool, into GoClaw's file handling pipeline.

### What is Magika?
- **AI-powered**: Deep learning model trained on ~100M samples across 200+ content types
- **~99% accuracy**: Outperforms the traditional `file` command and magic-byte detection, especially on textual content
- **Fast**: ~5ms inference time per file (near-constant, independent of file size)
- **Lightweight**: The model is only a few MB in size
- **Production-proven**: Used at scale by Google (Gmail, Drive, Safe Browsing), VirusTotal, and abuse.ch — processing hundreds of billions of samples weekly
- **Apache 2.0 license**: Permissive, suitable for integration
- **Multiple interfaces**: CLI (Rust), Python API, JS/TS, Go bindings (WIP)

### Integration Points in GoClaw

#### 1. File Upload / Ingestion Gate
```
User uploads file → Magika scan → Verify actual type matches expected type → Accept or reject
```
- Validate files before they enter any skill processing pipeline
- Block mismatches (e.g., PE binary uploaded as `.docx`)
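
A minimal sketch of the gate decision, assuming the detector has already returned a content-type label for the file. The function name `GateUpload`, the `expectedLabels` table, and the label strings (`docx`, `pdf`, `pebin`, ...) are illustrative assumptions modeled on Magika's naming, not existing GoClaw code:

```go
package main

import (
	"path/filepath"
	"strings"
)

// expectedLabels maps an upload's extension to the content-type label the
// gate expects the detector to report. ASSUMPTION: the label strings are
// modeled on Magika's naming and should be verified against the deployed
// version.
var expectedLabels = map[string]string{
	".docx": "docx",
	".pdf":  "pdf",
	".pptx": "pptx",
	".xlsx": "xlsx",
}

// GateUpload reports whether an upload may proceed. detected is the label
// the detector returned for the file's content. Unknown extensions are
// rejected so that the gate fails closed.
func GateUpload(filename, detected string) (ok bool, reason string) {
	ext := strings.ToLower(filepath.Ext(filename))
	want, known := expectedLabels[ext]
	if !known {
		return false, "extension " + ext + " is not accepted"
	}
	if detected != want {
		return false, "content detected as " + detected + ", expected " + want
	}
	return true, ""
}
```

With this table, `GateUpload("invoice.docx", "pebin")` rejects the upload because the detected content does not match what the extension claims.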

#### 2. Skill Pre-flight Check
Before skill scripts execute, verify input files are the expected type:
```json
{
  "path": "uploads/user_file.docx",
  "expected": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "actual": "application/x-dosexec",
  "action": "block"
}
```
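
The report above could be produced by a small Go type. This is only a sketch: `PreflightReport` and `NewPreflightReport` are hypothetical names, and the `"allow"` action is an assumed counterpart to the `"block"` value shown in the example:

```go
package main

import "encoding/json"

// PreflightReport mirrors the fields of the JSON report in the issue.
type PreflightReport struct {
	Path     string `json:"path"`
	Expected string `json:"expected"`
	Actual   string `json:"actual"`
	Action   string `json:"action"`
}

// NewPreflightReport compares the MIME type a skill expects with the one
// the detector reported. ASSUMPTION: "allow" is a hypothetical action
// name; only "block" appears in the original proposal.
func NewPreflightReport(path, expected, actual string) PreflightReport {
	action := "allow"
	if expected != actual {
		action = "block"
	}
	return PreflightReport{Path: path, Expected: expected, Actual: actual, Action: action}
}

// JSON renders the report, e.g. for audit logging.
func (r PreflightReport) JSON() (string, error) {
	b, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```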

#### 3. Security Layer (5-Layer Security Model)
Add Magika as an additional layer in GoClaw's security architecture:
- **Layer 0**: Input validation
- **Layer 1**: File type verification (Magika) ← NEW
- **Layer 2**: Content sanitization
- **Layer 3**: Execution sandboxing
- **Layer 4**: Output validation
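
One way to picture the layering is an ordered list of checks that stops at the first failure. The sketch below is an assumption about how such a chain might look; `Layer`, `RunLayers`, and `ErrTypeMismatch` are hypothetical names and do not describe GoClaw's actual security code:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTypeMismatch is a hypothetical sentinel a file type verification
// layer could return.
var ErrTypeMismatch = errors.New("file type mismatch")

// Layer is one stage of the security model. Check returns an error to
// stop processing of the file at path.
type Layer struct {
	Name  string
	Check func(path string) error
}

// RunLayers applies the layers in order and returns the first failure,
// wrapped with the layer index and name. Later layers are not run.
func RunLayers(path string, layers []Layer) error {
	for i, l := range layers {
		if err := l.Check(path); err != nil {
			return fmt.Errorf("layer %d (%s): %w", i, l.Name, err)
		}
	}
	return nil
}
```

Placing file type verification at index 1 means content sanitization and sandboxed execution never see a file whose detected type was rejected.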

#### 4. `magika` Binary as System Dependency
Add `magika` to the package installer (`dep_installer.go`) as a recognized system binary, similar to `ffmpeg`, `tesseract`, and `pandoc`:
```
apk add magika  # or install via pipx: pipx install magika
```

### Implementation Options

| Option | Pros | Cons |
|--------|------|------|
| **CLI integration** (call `magika` binary from Go) | Simple, no Go dependency, works immediately | Process spawn overhead |
| **Go bindings** (when available) | Native, fastest, no subprocess | Go bindings still WIP |
| **Python API** (via existing Python skill runtime) | Available now, well-documented | Requires Python runtime |
| **HTTP microservice** (sidecar) | Language-agnostic, scalable | Adds infrastructure complexity |

**Recommended**: Start with CLI integration (option 1) for immediate value, then migrate to the Go bindings once they are stable.
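
A rough sketch of option 1, calling the `magika` binary from Go with `os/exec` and parsing its `--json` output. The JSON shape decoded here (`result.status`, `result.value.output`, `result.value.score`) is an assumption based on recent CLI releases; older releases use different field names, so it should be checked against the installed version. `Detection`, `ParseDetection`, and `Detect` are hypothetical names:

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"os/exec"
)

// Detection is the subset of the detector's output this sketch keeps.
type Detection struct {
	Label    string
	Group    string
	MIMEType string
	Score    float64
}

// cliEntry models one element of the JSON array printed by the CLI.
// ASSUMPTION: field names follow recent releases and may differ in the
// version you deploy.
type cliEntry struct {
	Path   string `json:"path"`
	Result struct {
		Status string `json:"status"`
		Value  struct {
			Output struct {
				Label    string `json:"label"`
				Group    string `json:"group"`
				MIMEType string `json:"mime_type"`
			} `json:"output"`
			Score float64 `json:"score"`
		} `json:"value"`
	} `json:"result"`
}

// ParseDetection decodes the CLI output for a single scanned file.
func ParseDetection(raw []byte) (Detection, error) {
	var entries []cliEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return Detection{}, err
	}
	if len(entries) != 1 {
		return Detection{}, errors.New("expected exactly one result")
	}
	res := entries[0].Result
	if res.Status != "ok" {
		return Detection{}, errors.New("magika reported status " + res.Status)
	}
	out := res.Value.Output
	return Detection{
		Label:    out.Label,
		Group:    out.Group,
		MIMEType: out.MIMEType,
		Score:    res.Value.Score,
	}, nil
}

// Detect spawns the magika binary for path and parses its output. The
// context lets callers bound the subprocess with a timeout.
func Detect(ctx context.Context, path string) (Detection, error) {
	raw, err := exec.CommandContext(ctx, "magika", "--json", path).Output()
	if err != nil {
		return Detection{}, err
	}
	return ParseDetection(raw)
}
```

Keeping parsing separate from process spawning means the parser can be unit-tested without the binary installed, and a later switch to Go bindings would only replace `Detect`.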

### Configuration

```json
{
  "security": {
    "magika": {
      "enabled": true,
      "mode": "high-confidence",
      "block_on_mismatch": true,
      "allowed_types": ["document", "code", "text", "image"],
      "blocked_types": ["executable", "archive", "inode"],
      "max_file_size_mb": 50
    }
  }
}
```
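
As an illustration, the configuration above could be loaded and applied as follows. The struct and method names are hypothetical, and the policy semantics are assumptions that the proposal does not spell out: the blocked list takes precedence, groups absent from both lists are denied, and `max_file_size_mb` is an upper bound on accepted files:

```go
package main

import "encoding/json"

// MagikaConfig mirrors the "security.magika" object from the issue.
type MagikaConfig struct {
	Enabled         bool     `json:"enabled"`
	Mode            string   `json:"mode"`
	BlockOnMismatch bool     `json:"block_on_mismatch"`
	AllowedTypes    []string `json:"allowed_types"`
	BlockedTypes    []string `json:"blocked_types"`
	MaxFileSizeMB   int64    `json:"max_file_size_mb"`
}

// LoadMagikaConfig extracts the magika section from a full config document.
func LoadMagikaConfig(raw []byte) (MagikaConfig, error) {
	var doc struct {
		Security struct {
			Magika MagikaConfig `json:"magika"`
		} `json:"security"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return MagikaConfig{}, err
	}
	return doc.Security.Magika, nil
}

// Permits reports whether a file of the given detected group and size
// passes the policy. ASSUMPTION: blocked wins over allowed, and groups
// listed in neither are denied.
func (c MagikaConfig) Permits(group string, sizeBytes int64) bool {
	if !c.Enabled {
		return true // checks disabled: nothing is filtered
	}
	if sizeBytes > c.MaxFileSizeMB*1024*1024 {
		return false
	}
	for _, b := range c.BlockedTypes {
		if b == group {
			return false
		}
	}
	for _, a := range c.AllowedTypes {
		if a == group {
			return true
		}
	}
	return false
}
```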

### Use Cases
- **Skill security**: Ensure PDF/docx/xlsx skills only receive valid files of the expected type
- **Upload validation**: Reject spoofed files at the gateway before they reach agents
- **Audit logging**: Log file type detection results for compliance and forensics
- **Malware prevention**: Catch disguised executables, scripts, or polyglot files
- **Multi-tenant isolation**: Prevent cross-tenant file type attacks in shared environments

### References
- **Repo**: https://github.com/google/magika
- **Website**: https://securityresearch.google/magika/
- **Research Paper**: IEEE/ACM ICSE 2025
- **Web Demo**: https://securityresearch.google/magika/demo/magika-demo/
- **Google OSS Blog**: https://opensource.googleblog.com/2024/02/magika-ai-powered-fast-and-efficient-file-type-identification.html

---

**Labels:** enhancement, security, malware-protection, file-handling

