
refactor: move memory.ts + ollama.ts to smriti/src, sync upstream v1.1.6#1

Open
ashu17706 wants to merge 156 commits into main from refactor/move-memory-ollama-to-smriti

Conversation

@ashu17706

Summary

  • Remove smriti additions from qmd/src/: memory.ts, ollama.ts, and memory.test.ts moved out of the QMD fork into smriti/src/. This makes qmd/ pure upstream code — no smriti-specific files live here anymore.
  • Sync upstream tobi/qmd to v1.1.6: Merged 105+ upstream commits (clean, no conflicts since our additions were already removed).

Why

The previous approach added smriti-specific modules (memory.ts, ollama.ts) directly into the QMD fork, making upstream syncs painful. Now:

  • qmd/ = pure upstream clone, synced with git merge upstream/main
  • smriti/src/qmd-internals.ts = thin adapter (the only file touching ../qmd/src/* internals)
  • smriti/src/memory.ts + smriti/src/ollama.ts = smriti-owned implementations

Future upstream syncs are just cd qmd && git merge upstream/main — no conflicts from smriti-owned files.

Changes

  • qmd/src/memory.ts — deleted (moved to smriti/src/memory.ts)
  • qmd/src/ollama.ts — deleted (moved to smriti/src/ollama.ts)
  • qmd/src/memory.test.ts — deleted (tests live in smriti/test/)
  • bun.lock — regenerated after upstream merge (new deps: picomatch, yaml, zod, better-sqlite3, vitest)

igrigorik and others added 30 commits February 10, 2026 16:37
* feat: MCP HTTP transport with daemon lifecycle

  Add streaming HTTP transport as an alternative to stdio for the MCP
  server. A long-lived HTTP server avoids reloading 3 GGUF models (~2GB)
  on every client connection, reducing warm query latency from ~16s (CLI)
  to ~10s.

  New CLI surface:
    qmd mcp --http [--port N]   # foreground, default port 3000
    qmd mcp --http --daemon     # background, PID in ~/.cache/qmd/mcp.pid
    qmd mcp stop                # stop daemon via PID file
    qmd status                  # now shows MCP daemon liveness

  Server implementation (mcp.ts):
  - Extract createMcpServer(store) shared by stdio and HTTP transports
  - HTTP transport uses WebStandardStreamableHTTPServerTransport with
    JSON responses (stateless, no SSE)
  - /health endpoint with uptime, /mcp for MCP protocol, 404 otherwise
  - Request logging to stderr with timestamps, tool names, query args

  Daemon lifecycle (qmd.ts):
  - PID file + log file management with stale PID detection
  - Absolute paths in Bun.spawn (process.execPath + import.meta.path)
    so daemon works regardless of cwd
  - mkdirSync for cache dir on fresh installs
  - Removes top-level SIGTERM/SIGINT handlers before starting HTTP
    server so async cleanup in mcp.ts actually runs

  Move hybridQuery() and vectorSearchQuery() into store.ts as standalone
  functions that take a Store as first argument. Both CLI and MCP now
  call the identical pipeline, eliminating the class of bugs where one
  copy drifts from the other.

  Shared pipeline (store.ts):
  - hybridQuery(): BM25 probe → expand → FTS+vec search → RRF →
    chunk → rerank (chunks only) → position-aware blending → dedup
  - vectorSearchQuery(): expand → vec search → dedup → sort
  - SearchHooks interface for optional progress callbacks
  - Constants: STRONG_SIGNAL_MIN_SCORE, STRONG_SIGNAL_MIN_GAP,
    RERANK_CANDIDATE_LIMIT (40), addLineNumbers()

  Bugs fixed by unification:
  - MCP now gets strong-signal short-circuit (was CLI-only)
  - Reranker candidate limit unified at 40 (MCP had 30)
  - File dedup added to hybrid query (MCP was missing it)
  - Collection filter pushed into searchVec DB query
  - Filter-then-slice ordering fixed (MCP was slice-then-filter)
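The RRF step in the shared pipeline above can be sketched as follows. This is a minimal illustration of reciprocal rank fusion, not qmd's actual code — the names rrfFuse/topK and the damping constant are assumptions:

```typescript
// Sketch of reciprocal rank fusion (RRF): merge several ranked lists by
// summing 1 / (k + rank) per list. A conventional damping constant.
const RRF_K = 60;

// Each ranking is an ordered list of document ids, best first.
function rrfFuse(rankings: string[][], k: number = RRF_K): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}

// Highest fused score wins, even if no single backend ranked it first.
function topK(rankings: string[][], n: number): string[] {
  return [...rrfFuse(rankings).entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([id]) => id);
}
```

A document ranked moderately well by both FTS and vector search can outscore one ranked first by only a single backend, which is why RRF is a good fit for fusing heterogeneous score scales.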

* feat: type-routed query expansion — lex→FTS, vec/hyde→vector

  expandQuery() now returns typed ExpandedQuery[] instead of string[],
  preserving the lex/vec/hyde type info from the LLM's GBNF-structured
  output. hybridQuery() and vectorSearchQuery() route searches by type:
  lex queries go to FTS only, vec/hyde go to vector only.

  Previously, every expanded query ran through BOTH backends — keyword
  variants wasted embedding forward passes, semantic paraphrases wasted
  BM25 lookups. Type routing eliminates ~4 calls/query with zero quality
  loss (cross-backend noise actually hurt RRF fusion).

  Cache format changed from newline-separated text to JSON (preserves
  types). Old cache entries gracefully re-expand on first access.

  CLI expansion tree now shows query types:
    ├─ original query
    ├─ lex: keyword variant
    ├─ vec: semantic meaning
    └─ hyde: hypothetical document...

  Benchmark (5 queries, 1756-doc index, warm LLM, Apple Silicon):

    Metric              Old (untyped)  New (typed)  Delta
    Avg backend calls   10.0           6.0          -40%
    Total wall time     1278ms         549ms        -57%
    Avg saved/query     —              —            146ms

    "authentication setup"          12 → 7 calls   511 → 112ms
    "database migration strategy"   10 → 6 calls   182 → 106ms
    "how to handle errors in API"   10 → 6 calls   216 → 121ms
    "meeting notes from last week"  10 → 6 calls   228 → 110ms
    "performance optimization"       8 → 5 calls   141 → 100ms

  Savings come from skipped embed() calls (~30-80ms each). FTS is
  synchronous SQLite (~0ms), so lex→FTS routing is free while
  vec/hyde→vector-only avoids wasted embedding passes.
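The routing decision described above can be sketched like this. The ExpandedQuery shape follows the commit text; the dispatch function is illustrative, not qmd's actual implementation:

```typescript
// Sketch of type-routed query dispatch: lex variants go to FTS (BM25) only,
// vec/hyde variants only need an embedding for vector search.
type ExpandedQuery = { type: "lex" | "vec" | "hyde"; text: string };

function routeQueries(expanded: ExpandedQuery[]): { fts: string[]; vector: string[] } {
  const fts: string[] = [];
  const vector: string[] = [];
  for (const q of expanded) {
    if (q.type === "lex") fts.push(q.text); // cheap synchronous SQLite lookup
    else vector.push(q.text);               // needs an embed() forward pass
  }
  return { fts, vector };
}
```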

* fix: MCP query snippets now use reranker's best chunk, not full body

  extractSnippet() was scanning the entire document body for keyword
  matches to build the snippet. But hybridQuery() already identified
  the most relevant chunk via cross-attention reranking — rescanning
  the full body is redundant and can land on a less relevant section
  if the query terms appear elsewhere in the document.

  CLI was already using bestChunk (set during the refactor). MCP was
  still using body — a pre-existing inconsistency, not a regression.

* feat: dynamic MCP instructions + tool annotations

  The MCP server now generates instructions at startup from actual index
  state and injects them into the initialize response. LLMs see collection
  names, document counts, content descriptions, and search strategy
  guidance in their system prompt — zero tool calls needed for orientation.

  Previously, the only guidance was generic static tool descriptions and
  a user-invocable "query" prompt that no LLM would discover on its own.
  An LLM connecting to QMD had no idea what collections existed, what they
  contained, or how to scope searches effectively.

* change default port to 8181

* fix: BM25 score normalization was inverted

  The normalization formula `1 / (1 + |bm25|)` is a decreasing function of
  match strength. FTS5 BM25 scores are negative where more negative = better
  match (e.g., -10 is strong, -0.5 is weak). The formula mapped:

    strong match (raw -10) → 1/(1+10) =  9%   ← should be highest
    weak match   (raw -0.5) → 1/(1+0.5) = 67%  ← should be lowest

  Three downstream effects:
  1. `--min-score 0.5` (or MCP minScore: 0.5) filtered OUT strong matches
     and kept only weak ones. The MCP instructions recommend this threshold.
  2. CLI `formatScore()` color bands never showed green for BM25 results
     (best matches scored ~9%, green threshold is 70%).
  3. The strong signal optimization in hybridQuery (skip ~2s LLM expansion
     when BM25 already has a clear winner) was dead code — strong matches
     scored ~0.09, never reaching the 0.85 threshold.

  Fix: `|x| / (1 + |x|)` — same (0,1) range, monotonic, no per-query
  normalization needed, but now correctly maps strong → high, weak → low.

  The normalization was born broken (Math.max(0, x) clamped all
  negative BM25 to 0 → every score = 1.0), then PR tobi#76 changed to
  Math.abs which made scores vary but inverted the direction. Neither
  state was ever correct.
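The broken and fixed formulas side by side, as a sketch (FTS5 BM25 scores are negative; more negative means a stronger match):

```typescript
// Inverted: a decreasing function of match strength.
const brokenNorm = (bm25: number) => 1 / (1 + Math.abs(bm25));

// Fixed: same (0,1) range, but monotonically increasing in match strength.
const fixedNorm = (bm25: number) => Math.abs(bm25) / (1 + Math.abs(bm25));

// strong match, raw -10:  broken ≈ 0.09, fixed ≈ 0.91
// weak match,   raw -0.5: broken ≈ 0.67, fixed ≈ 0.33
```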

* fix: rerank cache key ignores chunk content

  The rerank cache key was (query, file, model) but the actual text sent
  to the reranker is a keyword-selected chunk that varies by query terms.
  Two different queries hitting the same file can select different chunks,
  but the second query gets a stale cached score from the first chunk.

  Example:
    Query "auth flow" → selects chunk about authentication → score 0.92
    Query "auth tokens" → same file, selects chunk about tokens
      → cache HIT on (query, file, model) → returns 0.92 from wrong chunk

  Fix: include full chunk text in cache key. getCacheKey() already
  SHA-256 hashes its inputs, so this adds no key bloat — just
  disambiguation. Old cache entries become natural misses (different key
  shape) and re-warm on next query.
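A content-aware cache key can be sketched as below. This mirrors the fix's idea — hash all inputs including the chunk text — but the function name and signature are illustrative, not qmd's actual getCacheKey:

```typescript
import { createHash } from "node:crypto";

// SHA-256 over every input that affects the reranker's score. Adding the
// chunk text disambiguates without growing the key: the digest is always
// 64 hex chars regardless of input length.
function rerankCacheKey(query: string, file: string, model: string, chunk: string): string {
  return createHash("sha256")
    .update([query, file, model, chunk].join("\u0000")) // NUL separator avoids concatenation collisions
    .digest("hex");
}
```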

* rename MCP tools for clarity, rewrite descriptions for LLM tool selection

  Rename MCP tools: vsearch → vector_search, query → deep_search.
  LLMs see these names — self-documenting names reduce reliance on
  descriptions for tool selection. CLI commands stay unchanged
  (qmd vsearch, qmd query) — different namespace, users type those.

  Rewrite all search tool descriptions to be action-oriented:
    - search: "Search by keyword. Finds documents containing exact
      words and phrases in the query."
    - vector_search: "Search by meaning. Finds relevant documents even
      when they use different words than the query — handles synonyms,
      paraphrases, and related concepts."
    - deep_search: "Deep search. Auto-expands the query into variations,
      searches each by keyword and meaning, and reranks for top hits
      across all results."

  Rewrite instructions ladder — each tool says what it does, no
  "start here" / "escalate as needed" strategy language.

  Delete the "query" prompt (registerPrompt) — it restated what
  descriptions + instructions already cover. No LLM proactively
  calls prompts/get to learn how to use tools.

* suppress HTTP server logs during tests

searchResultsToMarkdown and searchResultsToXml in formatter.ts were
silently dropping the context field. Added formatter.test.ts covering
context visibility across all output formats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

List query first in --help as the recommended search method. Add
vector-search and deep-search as undocumented CLI aliases matching
MCP tool names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Generated with PaperBanana (Gemini 3 Pro). Shows query expansion
fanning HyDE+Vec into vector searches, Lex into BM25, merged via
reciprocal rank fusion and LLM reranking.

Three improvements to hybridQuery:

1. Collection filter pushed into SQL: searchFTS and searchVec now
   accept collectionName directly instead of filtering post-hoc.
   Reduces noise in FTS probe and all expanded-query FTS calls.
   Also fixes MCP server's FTS search to use SQL-level filtering.

2. Batch embed for vector searches: instead of embedding each
   vec/hyde query sequentially (one embed call per query), we now
   collect all texts that need vector search and embed them in a
   single embedBatch() call. The sqlite-vec lookups still run
   sequentially (they're fast), but the expensive LLM embed step
   is batched.

3. FTS-first ordering: all lex expansions run immediately (sync,
   no LLM needed) before the vector embedding batch. This means
   FTS results are ready while embeddings compute.

Also cleans up legacy collectionId parameter naming (was number,
now properly string collectionName throughout).
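The FTS-first ordering and batched embedding described above can be sketched like this. The searchFTS/embedBatch/searchVec signatures are assumptions for illustration, not qmd's real API:

```typescript
// Sketch: run synchronous FTS immediately, then one batched embed call
// for every vector query, then the cheap sqlite-vec lookups.
type Routed = { fts: string[]; vector: string[] };

async function runSearches(
  routed: Routed,
  searchFTS: (q: string) => string[],                   // sync SQLite FTS5
  embedBatch: (texts: string[]) => Promise<number[][]>, // one expensive LLM call
  searchVec: (v: number[]) => string[],                 // fast vector lookup
): Promise<string[][]> {
  // 1. FTS first: results are ready while embeddings compute.
  const ftsResults = routed.fts.map(searchFTS);
  // 2. One embedBatch() call instead of one embed() per query.
  const vectors = await embedBatch(routed.vector);
  // 3. sqlite-vec lookups run sequentially; they are not the bottleneck.
  const vecResults = vectors.map(searchVec);
  return [...ftsResults, ...vecResults];
}
```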
QMD was running all models on CPU even when CUDA/Vulkan/Metal
was available. The getLlama() call used no gpu option, defaulting
to false.

Now:
- ensureLlama() tries cuda → vulkan → metal → CPU fallback
- Prints warning to stderr if falling back to CPU
- 'qmd status' shows GPU type, device names, VRAM, and CPU cores
- On this machine: 7.5s query vs 5+ minutes on CPU (reranker)

The reranker (Qwen3-Reranker-0.6B) calls are serialized by a lock
in node-llama-cpp's rankAndSort() — each of the 40 chunks is
evaluated sequentially. This is inherent to the library's design
(single sequence context). GPU acceleration is the fix, not
batching — the lock prevents true parallelism regardless.
node-llama-cpp's LlamaRankingContext uses a single sequence with a
withLock() guard, making rankAll() effectively sequential despite
using Promise.all(). Each document evaluation erases the context,
evaluates tokens, and extracts the logit — all serialized.

Fix: create 4 parallel ranking contexts from the same model (model
weights are shared, only KV cache is duplicated). Split documents
across contexts and evaluate in parallel via Promise.all().

Benchmarks (40 chunks, CUDA, 4x A6000):
- 1 context:  898ms (baseline)
- 2 contexts: 460ms (2.0x)
- 4 contexts: 338ms (2.7x)  ← sweet spot
- 8 contexts: 458ms (VRAM contention)

End-to-end 'qmd query' time: 7.5s → 3.7s

Gracefully handles VRAM limits — if creating the Nth context fails,
falls back to however many were successfully created.
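The split-and-fall-back pattern can be sketched as below. createContext stands in for the node-llama-cpp context constructor; the helpers are illustrative:

```typescript
// Create up to n contexts; if the Nth allocation fails (e.g. VRAM exhausted),
// keep however many succeeded rather than erroring out.
async function createContexts<T>(n: number, createContext: () => Promise<T>): Promise<T[]> {
  const contexts: T[] = [];
  for (let i = 0; i < n; i++) {
    try {
      contexts.push(await createContext());
    } catch {
      break; // graceful fallback to the contexts already created
    }
  }
  return contexts;
}

// Round-robin the documents across contexts so each bucket is similar in size.
function splitAcross<T>(items: T[], n: number): T[][] {
  const buckets: T[][] = Array.from({ length: n }, (): T[] => []);
  items.forEach((item, i) => buckets[i % n].push(item));
  return buckets;
}
```

Each bucket is then ranked in its own context via Promise.all, which is where the parallelism comes from — model weights are shared, only the KV cache is per-context.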
Holistic overhaul of context management:

1. Parallel embedding contexts: embedBatch now splits work across
   multiple EmbeddingContexts (same pattern as reranking). Each
   context is ~143 MB. Benchmarked 6x speedup on 20 texts with
   4 contexts vs 1.

2. Rerank context size: was using auto (40960 tokens = 11.6 GB per
   context!). Reranking chunks are ~800 tokens max, so 1024 is
   plenty. Now 711 MB per context — 16x less VRAM. 4 contexts went
   from 46 GB to 2.8 GB.

3. Adaptive parallelism via computeParallelism(): checks available
   VRAM and allocates at most 25% of free VRAM for contexts, capped
   at 8. Falls back to 1 on CPU (no benefit from multiple contexts
   with node-llama-cpp's withLock serialization). Gracefully handles
   allocation failures — uses however many contexts succeeded.

VRAM budget per operation:
- Embed:  N × 143 MB (nomic-embed, 2048 ctx)
- Rerank: N × 711 MB (Qwen3-Reranker-0.6B, 1024 ctx)
- Generate: ~1.1 GB (qmd-expansion-1.7B, fresh ctx per call)

Works across:
- Large GPU boxes (4x A6000, 190 GB): allocates up to 8 contexts
- Consumer GPUs (16 GB): 2-4 contexts fit comfortably
- Apple Metal (8-16 GB unified): 1-4 contexts depending on memory
- CPU-only: single context (parallelism has no benefit)
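The adaptive rule above can be sketched as follows. The function name matches the commit text, but the signature and this commit's CPU behavior (a single context) are restated from the description, not copied from the real code:

```typescript
// Allocate at most 25% of free VRAM to contexts, capped at 8; fall back to
// a single context on CPU, where node-llama-cpp's withLock serializes anyway.
function computeParallelism(freeVramBytes: number | null, perContextBytes: number): number {
  if (freeVramBytes === null) return 1; // CPU / no VRAM info
  const budget = freeVramBytes * 0.25;
  const n = Math.floor(budget / perContextBytes);
  return Math.max(1, Math.min(8, n));
}
```

With the 711 MB rerank contexts from the VRAM budget above, a 16 GB consumer GPU lands at a handful of contexts while a multi-A6000 box hits the cap of 8.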
Holistic tuning pass on context and GPU configuration:

GPU detection:
- Use getLlamaGpuTypes() to discover available backends at runtime
  instead of try/catch loop. Prefer CUDA > Metal > Vulkan > CPU.
- getLlama({gpu:'auto'}) returns false even when CUDA is available
  (node-llama-cpp issue), so we can't rely on it.

Context tuning:
- Rerank context: 2048 tokens (was auto=40960). The Qwen3 reranker
  template adds ~200 tokens overhead, chunks are ~800, query ~50.
  Total ~1050 tokens, so 2048 gives comfortable margin.
  VRAM per context: ~960 MB (was 11.6 GB with auto).
- Flash attention enabled for rerank contexts (~20% less VRAM).
  Falls back gracefully if flash attention not supported.
- Embed context: kept at model default (2048 for nomic-embed).

Platform considerations:
- CUDA (server): up to 8 parallel contexts, flash attention
- Metal (MacBook): 1-4 contexts depending on unified memory
- Vulkan: detected and used if CUDA/Metal unavailable
- CPU: single context (parallelism has no benefit due to locks)

Context size was 1024 initially but Qwen3's reranker template is
verbose (system prompt + instruct + think tags) — some inputs
exceeded 1024 tokens. Bumped to 2048 for safety.

Standalone benchmark for the reranking pipeline. Reports:
- System info (CPU, GPU, VRAM)
- Model VRAM usage
- Per-config: parallelism, flash attention, median time,
  throughput (docs/s), VRAM per context, total VRAM, peak RSS
- Speedup relative to baseline (1 context)

Usage:
  bun src/bench-rerank.ts              # full (40 docs, 3 iters, 1/2/4/8 ctx)
  bun src/bench-rerank.ts --quick      # quick (10 docs, 1 iter)
  bun src/bench-rerank.ts --docs 100   # custom doc count

Results on this machine:
  CUDA: 254ms/40 docs (8 ctx), 688ms (1 ctx) = 2.7x speedup
  CPU:  9697ms/40 docs (1 ctx) = 38x slower than single GPU ctx

Our assumption that CPU can't benefit from multiple contexts was
wrong. The withLock in node-llama-cpp serializes within a single
context, but separate contexts with split threads run on different
cores in true parallel.

Key changes:
- computeParallelism() now returns >1 on CPU (cores / 4, max 4)
- threadsPerContext() splits math cores evenly across contexts
- Both embed and rerank contexts get proper thread counts
- Benchmark updated to test CPU parallelism

Before (CPU, 40 docs): 9.7s (4.1 docs/s) — 6 threads, 1 context
After  (CPU, 40 docs): 2.3s (17.2 docs/s) — 32 threads, 8 contexts

Two fixes stacked:
1. Thread count: default was 6 (library hardcode), now uses all
   math cores — 2× improvement alone
2. Multi-context: splitting cores across 8 contexts gives another
   2.2× on top

End-to-end 'qmd query' on CPU: 10.3s → 2.9s

CPU benchmark (Threadripper PRO 7975WX, 32 math cores):
  1 ctx: 5001ms (8.0 docs/s)
  2 ctx: 3585ms (11.2 docs/s)  1.4×
  4 ctx: 2874ms (13.9 docs/s)  1.7×
  8 ctx: 2323ms (17.2 docs/s)  2.2×
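The even thread split described above can be sketched like this. The name threadsPerContext matches the commit text; the exact formula is an assumption:

```typescript
// Split the machine's math cores evenly across contexts so 8 contexts on a
// 32-core part get 4 threads each, instead of the library's hardcoded default.
function threadsPerContext(mathCores: number, contexts: number): number {
  return Math.max(1, Math.floor(mathCores / contexts));
}
```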
Replace hard 800-token boundary chunking with scoring algorithm that
finds natural document break points. Chunks now end at headings,
code blocks, and paragraph boundaries when possible.

- Add break point scoring: h1=100, h2=90, h3=80, codeblock=80, blank=20
- Use squared distance decay so headings win even at window edge
- Protect code fences from being split
- Increase chunk size to 900 tokens to accommodate smart boundaries
- Add comprehensive tests for chunking functions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
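The scoring idea above can be sketched as follows. The base scores come from the commit text; the decay constant and function shape are assumptions chosen so a heading can still win at the window edge:

```typescript
// Break-point base scores from the commit message.
const BREAK_SCORES: Record<string, number> = { h1: 100, h2: 90, h3: 80, codeblock: 80, blank: 20 };

// Pick the best break inside the window: the base score decays with the
// square of the normalized distance from the target end, so decay is gentle
// near the center and a strong heading outweighs a nearby weak blank line.
function bestBreak(
  candidates: { kind: keyof typeof BREAK_SCORES; offset: number }[],
  target: number,
  window: number,
): number {
  let best = target;
  let bestScore = -Infinity;
  for (const c of candidates) {
    const d = Math.abs(c.offset - target) / window; // 0 at target, 1 at window edge
    const score = BREAK_SCORES[c.kind] * (1 - 0.5 * d * d); // illustrative decay
    if (score > bestScore) {
      bestScore = score;
      best = c.offset;
    }
  }
  return best;
}
```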
Add Smart Chunking section explaining break point scoring, distance
decay formula, and code fence protection. Update token counts from
800 to 900 throughout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
extractSnippet was using the snippet output length (500 chars) to
determine the search window, which was too small even for fixed
chunks. With variable-length smart chunks, this could miss relevant
content entirely.

Now uses CHUNK_SIZE_CHARS as fallback, ensuring the entire chunk
region is searched regardless of actual chunk length.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add test-preload.ts with global afterAll hook that ensures llama.cpp
Metal resources are properly disposed before process exit, avoiding
GGML_ASSERT failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The 4 chars/token estimate is accurate for prose but code can be
1.7-2 chars/token. This caused chunks to exceed the embedding
model's 2048 token context limit.

- Use 3 chars/token as initial estimate (balanced for mixed content)
- Add safety net: re-chunk any chunks that still exceed token limit
- Use actual chars/token ratio when re-chunking for accuracy

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
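The estimate-then-verify approach above can be sketched like this. countTokens stands in for the real tokenizer; the chunking here is plain character slicing, ignoring the smart boundaries, purely to show the safety net:

```typescript
const CHARS_PER_TOKEN_ESTIMATE = 3; // balanced for mixed prose + code
const TOKEN_LIMIT = 2048;           // embedding model context limit

function safeChunks(text: string, countTokens: (s: string) => number): string[] {
  // First pass: chunk by the character estimate.
  const step = TOKEN_LIMIT * CHARS_PER_TOKEN_ESTIMATE;
  const first: string[] = [];
  for (let i = 0; i < text.length; i += step) first.push(text.slice(i, i + step));

  // Safety net: re-chunk anything that still exceeds the real token limit,
  // using that chunk's measured chars/token ratio.
  const out: string[] = [];
  for (const chunk of first) {
    const tokens = countTokens(chunk);
    if (tokens <= TOKEN_LIMIT) {
      out.push(chunk);
      continue;
    }
    const ratio = chunk.length / tokens;
    const safeLen = Math.max(1, Math.floor(TOKEN_LIMIT * ratio));
    for (let i = 0; i < chunk.length; i += safeLen) out.push(chunk.slice(i, i + safeLen));
  }
  return out;
}
```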
- Scope package to @tobi/qmd, version 0.9.0
- Add files whitelist, publishConfig, repo metadata
- Add CI workflow (bun tests on ubuntu + macos, bun latest + 1.1.0)
- Add publish workflow (triggers on v* tags, publishes to npm)
- Add release script for version bumping + changelog generation
- Add LICENSE (MIT) and initial CHANGELOG.md
- Update install instructions to use @tobi/qmd

Model download + GPU inference won't work on CI runners.
Uses describe.skipIf(CI) for LlamaCpp Integration, LLM Session
Management, vector search, and deep search tests.

Token-based chunking, vector search, hybrid search, and store
LlamaCpp integration tests all require model downloads.

Update README installation and quick-start commands to Node examples.
- replace bun install/link commands with npm-based Node workflow
- bump package version to 0.9.9 for CLI and MCP metadata
- keep Bun guidance as optional development/runtime note

Document both Node and Bun execution paths.
- Update install examples to `@tobilu/qmd` for npm and bun.
- Add npx/bunx one-off usage examples.
- Reflect Bun as first-class supported runtime in requirements.
tobi and others added 30 commits March 7, 2026 14:24
Fix claude plugin setup syntax
feat: use `build: "autoAttempt"` on `getLlama`
fix(index): deactivate stale docs on empty collection updates
fix(store): handle emoji-only filenames in handelize (tobi#302)
feat: add ignore patterns for collections
fix(llm): make query expansion context size configurable
fix: skip unreadable files during indexing instead of crashing
fix(cli): suppress progress bars when not TTY
fix(cli): prevent parser breakage on empty results across output formats
fix: support multiple concurrent HTTP clients
feat: expose candidateLimit as MCP tool parameter and CLI flag
fix(package.json): correct Windows sqlite-vec package name + add linux-arm64
feat: add QMD_EMBED_MODEL env var for multilingual embedding support
feat(query): add --explain score traces for hybrid retrieval

- Cap rerank contexts at 4 to avoid VRAM exhaustion on high-core machines
- Deduplicate identical chunk texts before sending to reranker
- Cache rerank scores by chunk content instead of file path — same text
  from different files now shares a single reranker call
- Add truncation cache to avoid re-tokenizing duplicate documents

Add optional `intent` parameter that steers query expansion, reranking,
chunk selection, and snippet extraction without searching on its own.

When a query like "performance" is ambiguous (web-perf vs team health vs
fitness), intent provides background context that disambiguates results
across all pipeline stages:

- expandQuery: includes intent in LLM prompt ("Query intent: {intent}")
- rerank: prepends intent to rerank query for Qwen3-Reranker
- chunk selection: intent terms scored at 0.5x weight vs query terms
- snippet extraction: intent terms scored at 0.3x weight
- strong-signal bypass: disabled when intent provided

Available via CLI (--intent flag or intent: line in query documents),
MCP (intent field on query tool), and programmatic API.

Adapted from PR tobi#180 (thanks @vyalamar).
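The down-weighted term scoring described in the list above can be sketched like this. The weights come from the commit text; the scoring function itself is illustrative, not the real chunk-selection code:

```typescript
// Score a text span: query terms count fully, intent terms at a reduced
// weight (0.5 for chunk selection, 0.3 for snippet extraction).
function termScore(
  text: string,
  queryTerms: string[],
  intentTerms: string[],
  intentWeight: number,
): number {
  const lower = text.toLowerCase();
  const count = (t: string) => lower.split(t.toLowerCase()).length - 1;
  let score = 0;
  for (const t of queryTerms) score += count(t);
  for (const t of intentTerms) score += intentWeight * count(t);
  return score;
}
```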
Allow QMD to be used as a library (`import { createStore } from '@tobilu/qmd'`)
in addition to CLI and MCP modes. The constructor requires explicit dbPath and
either a configPath (YAML file) or inline config object — no defaults assumed,
making it safe to embed in any application.

- Add src/index.ts entry point with QMDStore interface exposing search,
  retrieval, collection/context management, and index health
- Add setConfigSource() to collections.ts for inline config support
  (in-memory config with no file I/O)
- Add main/types/exports fields to package.json
- Add SDK documentation section to README
- Add 56 unit tests covering constructor, collections, contexts, search,
  document retrieval, config isolation, YAML persistence, and lifecycle

These modules are Smriti-specific additions (conversation memory layer
and Ollama client) that don't belong in the QMD source tree.

Moving them to smriti/src/ means:
- qmd/ submodule stays as pure upstream code
- Upstream syncs are conflict-free (no smriti code in qmd/src/)
- A thin adapter (smriti/src/qmd-internals.ts) is the sole coupling point