143 commits
541bf37
Add concat op to webgpu. (#20068)
yomaytk Mar 4, 2026
24d2ee0
[WebGPU] Fix wait logic for inflight jobs (#20096)
nikhilJain17 Mar 4, 2026
1a29907
hexagon: add llama-completion runner script (#20095)
tboinovski1 Mar 4, 2026
69fd345
opencl: add `SET`, support i32 for `CPY`, minor refactor for cpy (#20…
lhez Mar 5, 2026
7a99dc8
hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and M…
max-krasnyansky Mar 5, 2026
92f7da0
chore : correct typos [no ci] (#20041)
marcelpetrick Mar 5, 2026
5e335ba
webui: Improvements for Models Selector UI (#20066)
allozaur Mar 5, 2026
cf23251
convert : register Qwen 3.5 ForCausalLM for text only (#20119)
CISC Mar 5, 2026
b5ed0e0
cli : add command and file auto-completion (#19985)
CISC Mar 5, 2026
872646b
model : update Qwen3.5 model type detection (#20126)
EZForever Mar 5, 2026
2cd20b7
CUDA: Improve performance via less synchronizations between token (#…
aendk Mar 5, 2026
a0ed91a
models : kda chunk size = 16 (#19827)
ymcki Mar 5, 2026
2b10b62
hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139)
YardenTal44 Mar 6, 2026
6c97bff
opencl: add neg, exp and diag (#20127)
lhez Mar 6, 2026
f7db3f3
cli : Don't clear system prompt when using '/clear' (#20067)
roj234 Mar 6, 2026
17a4258
kv-cache : fix M-RoPE checkpoints (#20132)
ggerganov Mar 6, 2026
2850bc6
ggml-cpu: fix data race for debug asserts (#20148)
JohannesGaessler Mar 6, 2026
f6235a4
webui: Agentic Loop + MCP Client with support for Tools, Resources an…
allozaur Mar 6, 2026
f5ddcd1
Checkpoint every n tokens: squash (#20087)
pwilkin Mar 6, 2026
388baab
context: ignore zero scale LoRAs when checking sameness (#20166)
TimNN Mar 6, 2026
1e38a7a
CUDA: use shared mem for ssm_conv (#20128)
am17an Mar 6, 2026
c6980ff
ggml-cpu: Fix gcc 15 ICE on ppc64le (#20083) (#20130)
shalinib-ibm Mar 6, 2026
ba2ff79
ggml: update comments for backends which have no memory to report (#2…
taronaeo Mar 6, 2026
d48e876
ggml-cuda: add mem check for fusion (#19916)
am17an Mar 6, 2026
ba2fd11
cpu: skip redudant ROPE cache updates (#20149)
max-krasnyansky Mar 6, 2026
e68f2fb
server : preserve anthropic thinking blocks in conversion (#20120)
T0mSIlver Mar 6, 2026
34df42f
hexagon: add f32 ssm_conv op (#20122)
tboinovski1 Mar 6, 2026
566059a
Autoparser - complete refactoring of parser architecture (#18675)
pwilkin Mar 6, 2026
7463687
Add @pwilkin to CODEOWNERS for autoparser code (#20174)
pwilkin Mar 6, 2026
649f064
quants : Add memsets and other fixes for IQ quants (#19861)
bartowski1182 Mar 6, 2026
2f2923f
Autoparser: add optional argument reshuffle capability (#20171)
pwilkin Mar 6, 2026
c024d85
Autoparser: True streaming (#20177)
pwilkin Mar 7, 2026
6fce5c6
opencl: add l2_norm (#20160)
lhez Mar 7, 2026
c5a7788
ggml: add GATED_DELTA_NET op (#19504)
am17an Mar 7, 2026
213c4a0
[SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190)
arthw Mar 8, 2026
ff52ee9
server : correct index on finish in OAI completion streams (#20226)
decahedron1 Mar 8, 2026
b283f6d
Revert to OAI-compatible args (#20213)
pwilkin Mar 8, 2026
a950479
readme : update infra list (#20212)
Defilan Mar 8, 2026
a976ff0
llama: end-to-end tests (#19802)
JohannesGaessler Mar 8, 2026
cd18a50
vulkan: Fix data races in coopmat1 mul_mat(_id) (#20084)
jeffbolznv Mar 8, 2026
d088d5b
ggml-vulkan: Add ELU op support (#20183)
GiantPrince Mar 8, 2026
62b8143
Fix structured outputs (#20223)
pwilkin Mar 8, 2026
9b24886
Fix compile bug (#20203)
pwilkin Mar 8, 2026
451ef08
common : gracefully handle incomplete output (#20191)
aldehir Mar 8, 2026
35bee03
graph : remove redundant scale_w parameter (#20235)
CISC Mar 8, 2026
d417bc4
server : do not create checkpoints right after mtmd chunks (#20232)
ggerganov Mar 8, 2026
92bde36
docs: add detailed breakdown of KV cache compaction via Attention Mat…
claude Mar 8, 2026
48d5dc0
feat: add KV cache compaction via Attention Matching POC tool
claude Mar 8, 2026
97c64fb
PEG parser for LFM2 (#20251)
pwilkin Mar 9, 2026
ae87863
llama-bench: introduce `-hf` and `-hff` flags & use `--mmap 1` by def…
taronaeo Mar 9, 2026
5f4cdac
cuda : display total and free VRAM capacity during device initializat…
tehsiuhuang Mar 9, 2026
b2f460b
vulkan: skip zero size tensors in backend copies (#20233)
0cc4m Mar 9, 2026
0beb8db
ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (#…
bertaye Mar 9, 2026
55faeab
test: add unit tests for KV cache compaction math utilities
claude Mar 9, 2026
e2763a6
contributing: limit open PRs for new contributors to 1 (#20036)
am17an Mar 9, 2026
b518195
llama-quant : left-align tensor names in output (#20117)
ddh0 Mar 9, 2026
18c4c25
docs: add user stories for KV cache compaction feature
claude Mar 9, 2026
509907d
docs: add algorithms and techniques reference for KV cache compaction
claude Mar 9, 2026
8569f90
docs: add researcher rationale and improvement opportunities for KV c…
claude Mar 9, 2026
e8bbc73
ggml-cuda: disable gdn for musa (#20278)
am17an Mar 9, 2026
107d599
server : add kill switch when server is stuck (#20277)
ggerganov Mar 9, 2026
7681ba5
docs: add Cartridges (Eyuboglu 2025) as prior art context in rational…
claude Mar 9, 2026
d4947b8
docs: add adjacent concepts cross-pollination map for KV compaction
claude Mar 9, 2026
7f86341
docs: enrich adjacent concepts with deep research findings
claude Mar 9, 2026
7a006a6
docs: add WildCat/RPCholesky as unified framework, enrich Nystrom sec…
claude Mar 9, 2026
5d98a7c
docs: add Frank-Wolfe/attention equivalence, CKA, CS-VLM, grand unifi…
claude Mar 9, 2026
80339f7
docs: add 6 more adjacent concepts — sparse GP, token merging, submod…
claude Mar 9, 2026
bf28d86
docs: enrich coresets, WildCat, and Caratheodory with deep research f…
claude Mar 9, 2026
c152c85
docs: enrich FW, sketching, CS sections + add open theoretical questions
claude Mar 9, 2026
43e1cbd
models : fix assert in mamba2 graph (#20270)
ggerganov Mar 9, 2026
f76565d
common: map developer role to system (#20215)
pwilkin Mar 9, 2026
d6e1556
server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279)
ggerganov Mar 9, 2026
344ee2a
server : warn swa-full is not supported for non-SWA models (#20291)
ggerganov Mar 9, 2026
ed0007a
metal : add upscale (#20284)
ggerganov Mar 9, 2026
96cfc49
server : fix checkpoints n_tokens calculation (#20287)
ggerganov Mar 9, 2026
e22cd0a
metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250)
arkavo-com Mar 9, 2026
60a4387
Add strix-halo-optimizer skill for AMD Ryzen AI Max+ 395
fabiantax Mar 4, 2026
30dd1bc
Add eval workspace and metadata for strix-halo-optimizer iteration 1
fabiantax Mar 4, 2026
8ad6b83
Add eval run outputs and timing for iteration 1
fabiantax Mar 4, 2026
959bc8f
Add grading results and benchmark summary for iteration 1
fabiantax Mar 4, 2026
384266b
Rewrite AGENTS.md for AI-optimized fork development
fabiantax Mar 5, 2026
28020a0
Move strix-halo-optimizer skill to .claude/skills/ and expand content
fabiantax Mar 5, 2026
938e033
Add iteration 2 eval results for strix-halo-optimizer skill
fabiantax Mar 5, 2026
ea0497a
Update skill with research findings: gfx1151, Vulkan parity, MoE models
fabiantax Mar 5, 2026
73ff9e9
Add Vulkan cooperative matrix and FA tuning details
fabiantax Mar 5, 2026
96ebc81
Add iteration-3 eval metadata for strix-halo-optimizer skill
fabiantax Mar 5, 2026
80f1655
Add timing data for eval-3 with_skill run
fabiantax Mar 5, 2026
10e3f20
Add timing data for eval-1 with_skill run
fabiantax Mar 5, 2026
ca89fec
Add timing data for eval-2 with_skill run
fabiantax Mar 5, 2026
9b32d31
Add eval-3 without_skill outputs and timing
fabiantax Mar 5, 2026
ef72276
Add eval-2 without_skill outputs and timing
fabiantax Mar 5, 2026
8f85db6
Add eval-1 without_skill outputs and timing
fabiantax Mar 5, 2026
c25396f
Add eval-4 with_skill outputs and timing
fabiantax Mar 5, 2026
adc2b8c
Add eval-5 with_skill outputs and timing
fabiantax Mar 5, 2026
71db90a
Add eval-5 without_skill outputs and timing
fabiantax Mar 5, 2026
6ef221d
Add eval-4 without_skill outputs and timing
fabiantax Mar 5, 2026
c31cfa6
Add eval-6 with_skill outputs and timing
fabiantax Mar 5, 2026
4c9b6dd
Add iteration-3 eval results, grading, and benchmark viewer
fabiantax Mar 5, 2026
4e5622c
ggml-cuda: auto-detect AMD APU unified memory and improve iGPU support
fabiantax Mar 5, 2026
9230d66
ggml-cuda: tune flash attention kernels for RDNA 3.5
fabiantax Mar 5, 2026
2034f53
ggml-cuda: add UMA weight prefetching for integrated GPUs
fabiantax Mar 5, 2026
f229ccc
ggml-backend: selective backend sync and eliminate redundant sync on …
fabiantax Mar 5, 2026
4bdaa8f
ggml-cuda: add auto-tuning for flash attention parallel_blocks and GE…
fabiantax Mar 5, 2026
66c2d8d
docs: add implementation plan for auto-tuning flash attention
fabiantax Mar 5, 2026
e7fa9d0
docs: add flash attention auto-tuning documentation
fabiantax Mar 5, 2026
305fec9
common: add UMA auto-configuration for iGPU unified memory systems
fabiantax Mar 5, 2026
c9c1948
common: add APEX-inspired bandwidth-aware layer splitting for UMA
fabiantax Mar 5, 2026
51cbe35
docs: add user stories for APEX runtime scheduling
fabiantax Mar 5, 2026
428d70e
docs: add APEX runtime scheduling implementation plan
fabiantax Mar 5, 2026
e0bb11e
implement APEX runtime scheduling for hybrid CPU-GPU inference
fabiantax Mar 5, 2026
7d21e43
vulkan: add RDNA3 pipeline config, iGPU flash attention tuning, and U…
fabiantax Mar 5, 2026
7435012
fix: AVX512 build errors with GGML_ZEN5 non-native builds
fabiantax Mar 5, 2026
298be3e
Add Strix Halo benchmark script for validating build performance
fabiantax Mar 5, 2026
70bed2d
docs: add MoE token generation optimization stories, benchmark script…
fabiantax Mar 5, 2026
9ce49f3
Add MoE expert selection analyzer tool for prefetch optimization
fabiantax Mar 5, 2026
a2f7157
Add software prefetch for MoE expert weights in CPU MUL_MAT_ID
fabiantax Mar 6, 2026
3f39d21
Make UMA prefetch expert-aware for MoE on Strix Halo
fabiantax Mar 6, 2026
5677806
fix: moe-analyzer arg parsing, expert auto-detection, and token gener…
fabiantax Mar 6, 2026
e69b782
vulkan: fused SSM recurrence + batched elementwise + shared memory ti…
fabiantax Mar 8, 2026
4c5671f
chore: update .gitignore for graphrag pipeline artifacts
fabiantax Mar 9, 2026
c79e782
docs: add Strix Halo optimization report and user stories
fabiantax Mar 9, 2026
44a2af4
feat: add ModernBERT NER+RE fine-tuning pipeline for GraphRAG
fabiantax Mar 9, 2026
0163079
feat: add GraphRAG pipeline — Rust NER+RE with FalkorDB integration
fabiantax Mar 9, 2026
3cf69a2
chore: add Claude Code commands, settings, and unfinished prototypes
fabiantax Mar 9, 2026
8bea599
docs: add developer machine specs (Ryzen AI MAX+ 395) to CLAUDE.md
claude Mar 10, 2026
b20cc4b
docs: add 500 t/s throughput goal and Attention Matching strategy to …
claude Mar 10, 2026
9952316
docs: add implementation plan for KV cache compaction integration
claude Mar 10, 2026
4ad6e55
docs: add comprehensive gap analysis for KV cache compaction
claude Mar 10, 2026
2271991
feat: implement KV cache compaction via Attention Matching (Phase 1+2)
claude Mar 10, 2026
41d78b4
refactor: extract hparams from mctx to reduce upstream merge fragility
claude Mar 10, 2026
5a4c985
feat: add Q capture for repeat-prefill reference queries (Phase 4)
claude Mar 10, 2026
ae3c1fd
feat: add non-uniform per-head budgets for KV cache compaction (Phase 5)
claude Mar 10, 2026
0fa9866
feat: add online auto-compaction and bias serialization (Phase 7)
claude Mar 10, 2026
9491826
fix: correct AM compaction index mismatch and C_v solver instability
claude Mar 10, 2026
5f15148
feat: add per-layer beta injection and KV cache defragmentation
claude Mar 10, 2026
1dbbc87
chore: add wikitext-2-raw/ to .gitignore
claude Mar 11, 2026
10eda02
Add KV cache compaction development roadmap timeline to README
claude Mar 11, 2026
cfbf02b
fix: disable UMA profiler auto-enable — fixes 5x server regression
fabiantax Mar 14, 2026
3b472fb
feat: cache-aware expert routing API + test harness (WIP)
fabiantax Mar 15, 2026
000a010
feat: working cache-aware expert routing via llm_graph_input
fabiantax Mar 15, 2026
0a3eda7
feat: --expert-cache-bonus flag for cache-aware MoE routing in server
fabiantax Mar 15, 2026
15c5198
Merge remote-tracking branch 'origin/main' into claude/kv-cache-compa…
fabiantax Mar 27, 2026
f548100
wip: uncommitted work before Linux migration
fabiantax Apr 10, 2026
41 changes: 41 additions & 0 deletions .claude/commands/anno-ner.md
@@ -0,0 +1,41 @@
Run the Rust NER extraction pipeline using the anno crate (GLiNER zero-shot). Pass a source file path and options.

## Instructions

1. Build and run the pipeline in `graphrag-pipeline/`:
```bash
cd C:/Users/fabia/Projects/llama.cpp/llama.cpp/graphrag-pipeline
cargo run -- $ARGUMENTS
```

2. If no arguments given, show usage help:
```bash
cargo run -- --help
```

## CLI Options

- `--source <file>` — Input text file (paper, profiling log, code comments)
- `--dry-run` — Print extracted entities/relations without writing to FalkorDB
- `--ner-only` — Skip LLM relation extraction, NER pass only (fast, free)
- `--labels <csv>` — Custom entity types (default: hardware,gpu_feature,optimization_technique,algorithm,software_framework,performance_metric,memory_pattern,kernel_operation,model_architecture,constraint,data_structure,research_paper)

## How It Works

- **anno crate**: Uses `GLiNEROnnx` with model `onnx-community/gliner_small-v2.1`
- **ZeroShotNER trait**: `extract_with_types(text, labels, threshold=0.5)`
- Chunks text at ~600 tokens (2400 chars) with 400-char overlap (GraphRAG optimal)
- Outputs `<source>_extracted.json` with entities and relations
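
The chunking scheme described above (~2400-character windows with a 400-character overlap) can be sketched as follows. This is an illustration only — `chunk_text`, its signature, and the purely character-based windowing are assumptions; the actual implementation lives in `graphrag-pipeline/` and may differ:

```rust
/// Hypothetical sketch of the chunker: fixed-size character windows
/// with overlap, so entities straddling a boundary appear in both chunks.
fn chunk_text(text: &str, chunk_chars: usize, overlap_chars: usize) -> Vec<String> {
    assert!(overlap_chars < chunk_chars);
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_chars - overlap_chars; // advance per chunk
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_chars).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With the defaults above, a 5000-character input yields three chunks starting at offsets 0, 2000, and 4000.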

## Typical Workflows

- **Quick NER scan**: `--source paper.txt --ner-only --dry-run`
- **Full pipeline**: `--source paper.txt` (needs ANTHROPIC_API_KEY + FalkorDB running)
- **Custom domain**: `--source log.txt --labels "kernel,bandwidth,latency,occupancy" --ner-only`

## Build

```bash
cd C:/Users/fabia/Projects/llama.cpp/llama.cpp/graphrag-pipeline
cargo build
```
29 changes: 29 additions & 0 deletions .claude/commands/arxiv.md
@@ -0,0 +1,29 @@
Search arXiv for papers and read their contents. Pass a search query or paper ID.

## Instructions

1. Load arXiv MCP tools:
- Use ToolSearch to load: `select:mcp__arxiv-server__search_papers,mcp__arxiv-server__read_paper,mcp__arxiv-server__list_papers,mcp__arxiv-server__download_paper`

2. Execute the user's request: $ARGUMENTS

3. Search phase:
- Use `mcp__arxiv-server__search_papers` with the query
- Search tips: prefix `ti:` for title, `au:` for author
- Category filters: `cs.LG` (ML), `cs.AR` (architecture), `cs.DC` (distributed), `cs.PF` (performance)
- Example: `ti:flash attention au:dao cat:cs.LG`

4. Read phase:
- For each relevant paper, use `mcp__arxiv-server__read_paper` with the arXiv ID
- Summarize: problem, method, key results, relevance to GPU optimization

5. Save findings:
- If FalkorDB is running (use /falkordb), create research_paper nodes and relations
- Save paper text to `graphrag-pipeline/sources/` for later NER extraction
- Report paper IDs, titles, and key takeaways

## Quick Examples

- Search: `mcp__arxiv-server__search_papers` with query `"flash attention v3 hopper"`
- Read: `mcp__arxiv-server__read_paper` with id `"2307.08691"`
- List recent: `mcp__arxiv-server__list_papers` with category `"cs.LG"` and max_results `5`
32 changes: 32 additions & 0 deletions .claude/commands/falkordb.md
@@ -0,0 +1,32 @@
Query or modify the FalkorDB gpu_optimization knowledge graph. Pass a Cypher query or describe what you want.

## Instructions

1. Load FalkorDB MCP tools:
- Use ToolSearch to load: `+falkordb` (finds graph query/create tools)

2. Connection details:
- Host: localhost:6379 (Redis protocol)
- Graph name: `gpu_optimization`
- Browser UI: http://localhost:3000

3. Execute the user's request: $ARGUMENTS

## Common Cypher Patterns

- **List all node labels**: `CALL db.labels()`
- **List all relation types**: `CALL db.relationshipTypes()`
- **Find a node**: `MATCH (n {name: 'SharedMemoryTiling'}) RETURN n`
- **All neighbors**: `MATCH (n {name: 'DeltaNet'})-[r]-(m) RETURN n, type(r), m`
- **Shortest path**: `MATCH p=shortestPath((a {name: 'X'})-[*]-(b {name: 'Y'})) RETURN p`
- **Create node**: `CREATE (:optimization_technique {name: 'MyTech', description: 'desc', type: 'optimization_technique'})`
- **Create relation**: `MATCH (a {name: 'X'}), (b {name: 'Y'}) CREATE (a)-[:IMPROVES]->(b)`
- **Fuzzy search**: `MATCH (n) WHERE n.name CONTAINS 'SSM' RETURN n`
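
The create-node pattern above interpolates entity names directly into the Cypher string; when generating queries programmatically, quotes in names need escaping. A minimal sketch, assuming backslash escaping for Cypher string literals (`create_node_query` is an illustrative helper, not part of this repo):

```rust
/// Build a CREATE statement matching the pattern above, escaping
/// backslashes and single quotes so names like "O'Neill" are safe.
fn create_node_query(label: &str, name: &str, description: &str) -> String {
    let esc = |s: &str| s.replace('\\', "\\\\").replace('\'', "\\'");
    format!(
        "CREATE (:{} {{name: '{}', description: '{}', type: '{}'}})",
        label,
        esc(name),
        esc(description),
        label
    )
}
```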

## Relation Types

IMPLEMENTS, USES, OPTIMIZES, TARGETS, IMPROVES, REDUCES, ELIMINATES, MEASURES, LIMITS, ENABLES, EXTENDS, BUILDS_ON, VALIDATES, COMPETES_WITH, IS_PART_OF, IS_FEATURE_OF, REQUIRES, COULD_IMPROVE, INTRODUCES, PORTS_TO

## Entity Types

hardware, gpu_feature, optimization_technique, algorithm, software_framework, performance_metric, memory_pattern, kernel_operation, model_architecture, constraint, data_structure, research_paper
149 changes: 149 additions & 0 deletions .claude/commands/gliner.md
@@ -0,0 +1,149 @@
GLiNER zero-shot NER and relation extraction reference. Use when working with GLiNER models, gline-rs, or the graphrag-pipeline.

## GLiNER Model Modes

GLiNER supports multiple extraction modes via different model architectures:

### 1. Span Mode (NER only)
- Models: `gliner_small-v2.1`, `gliner_large-v2.1`, `gliner_multi-v2.1`
- API: `GLiNER::<SpanMode>::new(params, runtime_params, tokenizer, model)`
- Input: `TextInput::from_str(&texts, &labels)`
- Output: `SpanOutput` — list of entity spans with class, text, probability
- Fast, small models (175MB int8). Good for high-throughput NER.

### 2. Token Mode (NER, multitask models)
- Models: `gliner-multitask-large-v0.5`, `gliner-relex-large-v0.5`
- API: `TokenPipeline::new(tokenizer)?.to_composable(&model, &params)`
- Same input/output as span mode but uses token-level classification
- Required for multitask models that also support relation extraction

### 3. Relation Extraction (via composed pipeline)
- Model: `gliner-multitask-large-v0.5` (same model does both NER + RE)
- Requires `TokenPipeline` (NER) chained with `RelationPipeline` (RE)
- Relations are schema-driven: define allowed subject/object entity types per relation

## gline-rs API (Rust crate v1.0.1)

### NER (Span Mode)
```rust
use gliner::model::GLiNER;
use gliner::model::pipeline::span::SpanMode;
use gliner::model::params::Parameters;
use gliner::model::input::text::TextInput;
use orp::params::RuntimeParameters;

let model = GLiNER::<SpanMode>::new(
    Parameters::default(),
    RuntimeParameters::default().with_threads(2),
    "models/gliner_small-v2.1/tokenizer.json",
    "models/gliner_small-v2.1/onnx/model_int8.onnx",
)?;
let input = TextInput::from_str(&["some text"], &["person", "company"])?;
let output = model.inference(input)?;
for spans in &output.spans {
    for span in spans {
        println!("{} [{}] {:.0}%", span.text(), span.class(), span.probability() * 100.0);
    }
}
```

### NER + Relation Extraction (Composed Pipeline)
```rust
use composable::*;
use orp::model::Model;
use orp::params::RuntimeParameters;
use gliner::model::params::Parameters;
use gliner::model::pipeline::{token::TokenPipeline, relation::RelationPipeline};
use gliner::model::input::{text::TextInput, relation::schema::RelationSchema};

let params = Parameters::default();
let model = Model::new(
    "models/gliner-multitask-large-v0.5/onnx/model_q4f16.onnx",
    RuntimeParameters::default(),
)?;

let mut schema = RelationSchema::new();
schema.push_with_allowed_labels("USES", &["software_framework"], &["algorithm"]);
schema.push_with_allowed_labels("TARGETS", &["optimization_technique"], &["hardware"]);
// Or unconstrained:
schema.push("IMPROVES");

let pipeline = composed![
    TokenPipeline::new("models/gliner-multitask-large-v0.5/tokenizer.json")?
        .to_composable(&model, &params),
    RelationPipeline::default("models/gliner-multitask-large-v0.5/tokenizer.json", &schema)?
        .to_composable(&model, &params),
];

let input = TextInput::from_str(&["text"], &["person", "company"])?;
let output = pipeline.apply(input)?;
```

### Output Structures
```rust
// Entity (from SpanOutput or TokenPipeline)
span.text() -> &str // "Bill Gates"
span.class() -> &str // "person"
span.probability() -> f32 // 0.999
span.offsets() -> (usize, usize)

// Relation (from RelationOutput)
relation.subject() -> &str // "Bill Gates"
relation.object() -> &str // "Microsoft"
relation.class() -> &str // "founded"
relation.probability() -> f32 // 0.997
```

### Parameters
```rust
Parameters::default()
    .with_threshold(0.5)        // confidence threshold
    .with_flat_ner(true)        // no overlapping entities
    .with_multi_label(false)    // no overlapping different-class spans
    .with_max_length(Some(512)) // max sequence length
```

## Available Models (local)

| Model | Path | Size | Mode | Capabilities |
|-------|------|------|------|-------------|
| gliner_small-v2.1 | `models/gliner_small-v2.1/` | 175MB (int8) | Span | NER only |
| gliner-multitask-large-v0.5 | `models/gliner-multitask-large-v0.5/` | 519MB (q4f16) | Token | NER + Relations |

## ONNX Models on HuggingFace

| Repo | Tasks | License |
|------|-------|---------|
| `onnx-community/gliner_small-v2.1` | NER | Apache 2.0 |
| `onnx-community/gliner_large-v2.1` | NER | Apache 2.0 |
| `onnx-community/gliner-multitask-large-v0.5` | NER + RE | Apache 2.0 |
| `knowledgator/gliner-relex-large-v0.5` | NER + RE (needs ONNX conversion) | Apache 2.0 |

## Domain Entity Types (GPU optimization)

```
hardware, gpu_feature, optimization_technique, algorithm,
software_framework, performance_metric, memory_pattern,
kernel_operation, model_architecture, constraint,
data_structure, research_paper
```

## Domain Relation Types

```
IMPLEMENTS, USES, OPTIMIZES, TARGETS, IMPROVES, REDUCES,
ELIMINATES, MEASURES, LIMITS, ENABLES, EXTENDS, BUILDS_ON,
VALIDATES, COMPETES_WITH, IS_PART_OF, IS_FEATURE_OF,
REQUIRES, COULD_IMPROVE, INTRODUCES, PORTS_TO
```

## Docker

```bash
# NER only (fast, no API key needed)
docker compose run --rm graphrag --source sources/paper.txt --ner-only --dry-run

# Full pipeline (NER + LLM relations + FalkorDB)
ANTHROPIC_API_KEY=sk-... docker compose run --rm graphrag --source sources/paper.txt

# Skip local NER, LLM-only
docker compose run --rm graphrag --source sources/paper.txt --skip-ner
```
36 changes: 36 additions & 0 deletions .claude/commands/graph-enrichment.md
@@ -0,0 +1,36 @@
Run the full knowledge graph enrichment pipeline: arXiv paper -> chunk -> NER -> LLM relations -> dedup -> FalkorDB. Pass a paper ID, URL, or topic.

## Instructions

1. Load tools:
- Use ToolSearch to load: `select:mcp__arxiv-server__search_papers,mcp__arxiv-server__read_paper,mcp__arxiv-server__download_paper`
- Use ToolSearch to load: `+falkordb` (for graph merge verification)

2. **Acquire source** from $ARGUMENTS:
- If arXiv ID (e.g. `2307.08691`): use `mcp__arxiv-server__read_paper` to get full text
- If search topic: use `mcp__arxiv-server__search_papers`, pick best result, then read it
- Save text to `C:/Users/fabia/Projects/llama.cpp/llama.cpp/graphrag-pipeline/sources/<id>.txt`

3. **Run extraction pipeline**:
```bash
cd C:/Users/fabia/Projects/llama.cpp/llama.cpp/graphrag-pipeline
cargo run -- --source sources/<id>.txt
```
- Without ANTHROPIC_API_KEY: add `--ner-only` (NER pass only, no LLM gleaning)
- For preview: add `--dry-run` (print results, skip FalkorDB merge)

4. **Verify in FalkorDB**:
- Query new nodes: `MATCH (n) WHERE n.name CONTAINS '<keyword>' RETURN n`
- Check relations: `MATCH (n)-[r]->(m) RETURN n.name, type(r), m.name ORDER BY n.name LIMIT 20`

5. **Report**: Summarize entities created, relations found, and any dedup merges.

## Pipeline Stages (GraphRAG + LightRAG hybrid)

| Stage | Technique | Detail |
|-------|-----------|--------|
| Chunk | GraphRAG | 600-token chunks, 100-token overlap |
| NER | anno/GLiNER | Zero-shot with 12 GPU-domain entity types |
| Relations | GraphRAG gleaning | Claude Haiku, multi-round extraction per chunk |
| Dedup | LightRAG | Normalize names, merge properties, deduplicate rels |
| Merge | Incremental | MATCH-or-CREATE into FalkorDB `gpu_optimization` graph |
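
The LightRAG-style dedup stage in the table can be sketched as name normalization to a canonical key, so variant spellings merge into one node before the FalkorDB MATCH-or-CREATE. Everything here is illustrative (`canonical_key`, `dedup_entities`, and the exact normalization rule are assumptions, not the pipeline's actual code):

```rust
use std::collections::HashMap;

/// Normalize an entity name: keep ASCII alphanumerics, lowercase.
/// "Flash-Attention" and "flash attention" map to the same key.
fn canonical_key(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Group raw entity names by canonical key; each group becomes one node,
/// with the variant spellings retained for property merging.
fn dedup_entities(names: &[&str]) -> HashMap<String, Vec<String>> {
    let mut groups: HashMap<String, Vec<String>> = HashMap::new();
    for n in names {
        groups.entry(canonical_key(n)).or_default().push(n.to_string());
    }
    groups
}
```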
29 changes: 29 additions & 0 deletions .claude/commands/research.md
@@ -0,0 +1,29 @@
Research a topic using web search, arXiv papers, and Hugging Face. Synthesize findings into actionable results.

## Instructions

1. Load research tools first:
- Use ToolSearch to load: WebSearch, WebFetch
- Use ToolSearch to load arxiv MCP tools: mcp__arxiv-server__search_papers, mcp__arxiv-server__read_paper, mcp__arxiv-server__list_papers
- Use ToolSearch to load HuggingFace MCP tools: mcp__claude_ai_Hugging_Face__paper_search, mcp__claude_ai_Hugging_Face__hub_repo_search

2. Search phase — run these in parallel:
- WebSearch for: $ARGUMENTS
- mcp__arxiv-server__search_papers for relevant papers
- mcp__claude_ai_Hugging_Face__paper_search if ML models are relevant

3. Deep-dive phase:
- For each promising result, use WebFetch to get details (GitHub READMEs, docs)
- For key papers, use mcp__arxiv-server__read_paper to read full content
- Use mcp__claude_ai_Hugging_Face__hub_repo_search for relevant models/datasets

4. Compile findings into a structured comparison table with:
- Project name + URL
- Language/SDK (TypeScript, Rust, Python, etc.)
- Key features relevant to the query
- Maturity (stars, last update, version)
- How it could apply to our GPU optimization work

5. If FalkorDB is running, suggest new entities/relations to add from findings

6. Eval: verify at least 3 sources were consulted and findings are cross-referenced
5 changes: 5 additions & 0 deletions .claude/settings.json
@@ -0,0 +1,5 @@
{
"env": {
"CLAUDE_CODE_TASK_LIST_ID": "llama-cpp-graphrag"
}
}