Skip to content

feat(vectorstores): add Valkey vector store integration#1

Open
rileydes-improving wants to merge 4 commits into
mainfrom
feature/valkey-vector-store
Open

feat(vectorstores): add Valkey vector store integration#1
rileydes-improving wants to merge 4 commits into
mainfrom
feature/valkey-vector-store

Conversation

@rileydes-improving

Copy link
Copy Markdown
Owner
  • Add ValkeyApi credential for host/port/auth configuration
  • Add ValkeyUrlApi credential for connection URL-based setup
  • Implement ValkeyVectorStore node with full CRUD operations
  • Support vector similarity search with metadata filtering
  • Add comprehensive integration tests with Valkey 9.1+ and valkey-search v1.2+ module
  • Add unit tests covering core functionality and edge cases
  • Include Valkey icon asset for UI representation
  • Update package.json with valkey-glide dependency

- Add ValkeyApi credential for host/port/auth configuration
- Add ValkeyUrlApi credential for connection URL-based setup
- Implement ValkeyVectorStore node with full CRUD operations
- Support vector similarity search with metadata filtering
- Add comprehensive integration tests with Valkey 9.1+ and valkey-search  v1.2+ module
- Add unit tests covering core functionality and edge cases
- Include Valkey icon asset for UI representation
- Update package.json with valkey-glide dependency

Signed-off-by: Riley Des <riley.desserre@improving.com>

@edlng edlng left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: 3 actionable findings.

Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts
Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts
Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts Outdated

@MatthiasHowellYopp MatthiasHowellYopp left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a solid, well-structured integration. The test coverage — both unit and integration — is thorough and deterministic. A few issues worth addressing before merge:


Valkey.ts

[warning] createGlideClient uses any for clientConfig (~line 490)
Defeats strict-mode safety and masks type errors. The Glide client API has a typed config interface — use it.

import type { GlideClientConfiguration } from '@valkey/valkey-glide'
const clientConfig: GlideClientConfiguration = { ... }

[warning] Connection leak in init() (~line 590)
The persistent GlideClient created in init() has no teardown path. The comment acknowledges this matches the Postgres/TypeORM pattern, but one TCP connection per chatflow instance accumulates indefinitely in long-running servers. At minimum, document a maximum pool size or track a Flowise lifecycle teardown hook issue.

[warning] JSON.parse(valkeyMetadataFilter) without schema validation (~line 578)
A non-array value (e.g. "tag1") would silently produce a string filter instead of the expected array.

const parsed = JSON.parse(valkeyMetadataFilter)
if (!Array.isArray(parsed) && typeof parsed !== 'string') {
    throw new Error('valkeyMetadataFilter must be a JSON array of strings or a string')
}
filter = parsed

[nit] //@ts-ignore on vectorStoreMethods
Suppresses a real type mismatch. Use the correct IVectorStoreMethods interface or an explicit inline type instead.

[nit] Unused import: Embeddings alongside EmbeddingsInterface (lines 6–7)
Embeddings is only used as a type annotation where EmbeddingsInterface would be more correct and consistent.


ValkeyApi.credential.ts

[nit] default: '6379' is a string but type is 'number' (line 28)

default: 6379

Valkey.int.test.ts

[warning] deleteAll test uses scan without cursor loop (~line 80)
Uses client.scan('0', ...) once without looping on the cursor — may miss keys on larger datasets. The TTL test already does this correctly; the deleteAll test should match.

[nit] if (skip()) return makes skipped tests appear green
Intentional per the file header, but test.skip / xit would mark them as skipped in the report instead of passing, giving clearer CI signal.

[nit] catch (err: any) in checkValkeyAvailability
Prefer narrowing over any:

catch (err) {
    const msg = err instanceof Error ? err.message : String(err)
}

package.json

[nit] Dependency may be in the wrong section
@valkey/valkey-glide appears to be added under devDependencies. If the Valkey node is shipped as part of the components package, it should be a regular dependency.


Verdict

The core implementation is correct and well-tested. The connection-leak concern in init() and the missing filter validation are the most meaningful issues. Everything else is polish. Approve with changes — address the connection-leak documentation and the any typing in createGlideClient before merge.

- Replace `any` in createGlideClient with GlideClientConfiguration type
- Add validation for valkeyMetadataFilter after JSON.parse
- Pass sortby/dialect/limit from buildQuery to GlideFt.search
- Flatten array metadata values into TAG field (fixes silent drop)
- Replace @ts-ignore with @ts-expect-error + explanation
- Replace lodash flatten with native .flat()
- Use EmbeddingsInterface instead of Embeddings class for type assertions
- Import GlideClientConfiguration from @valkey/valkey-glide
- Fix port default from string '6379' to number 6379 in credential
- Use cursor loop in deleteAll test scan verification
- Narrow catch (err: any) to proper instanceof check
@rileydes-improving

Copy link
Copy Markdown
Owner Author

Review feedback addressed

@edlng @MatthiasHowellYopp — pushed two commits addressing 10 of 13 findings. All unit tests pass (65/65), lint is clean.

What changed

80da960b — Valkey.ts fixes:

  • createGlideClient typed with GlideClientConfiguration (was any)
  • Filter validation after JSON.parse — now throws on non-string, non-array inputs
  • @ts-ignore replaced with @ts-expect-error + rationale
  • Embeddings class swapped for EmbeddingsInterface in type assertions
  • lodash flatten replaced with native .flat()
  • sortby/dialect/limit passed through to GlideFt.search
  • flatMap handles array metadata values in TAG field

17cfc2e1 — Credential + test fixes:

  • Port default: '6379'6379 (matches declared type: 'number')
  • deleteAll test scan uses proper cursor loop
  • catch (err: any) narrowed with instanceof Error

Pushbacks (3 findings)

Connection leak in init() (MatthiasHowellYopp):
The persistent client is intentional. Flowise has no lifecycle teardown hook, and every other vector store (Postgres, TypeORM, Pinecone) works the same way. Recreating the client per query would bypass GLIDE's multiplexed connection and automatic reconnection. The in-code comment at line ~672 documents this tradeoff.

if (skip()) return vs test.skip (MatthiasHowellYopp):
Jest's test.skip is statically evaluated — it can't be called conditionally after an async beforeAll. Since Valkey availability requires a TCP connection attempt, runtime conditional skipping is the only option. This is a common pattern in integration test suites with external service dependencies.

@valkey/valkey-glide in wrong section (MatthiasHowellYopp):
Verified: line 118 of packages/components/package.json, inside "dependencies" (starts line 33). "devDependencies" begins at line 193. No change needed.


Suggestions for tightening AI-assisted reviews

A few calibration notes that might help catch false positives before posting:

  1. Run a verification command on factual claims. The dependency-placement finding would have been caught by grep -n "valkey-glide" packages/components/package.json and checking which section header it falls under. Takes 30 seconds, avoids a round-trip.

  2. Check runtime constraints before suggesting alternatives. The test.skip suggestion assumed it was a drop-in replacement. Checking Jest docs for "conditional skip" reveals that test.skip is compile-time only — async conditions require a different pattern.

  3. Cross-reference existing codebase patterns. The connection-leak concern is valid in isolation, but grep -r "client.close\|finally" packages/components/nodes/vectorstores/ shows every Flowise vector store follows the same lifecycle pattern. When you see a pattern repeated 4+ times across a codebase, it's likely a deliberate architectural choice.

  4. Verify before suggesting "confirm with docs." The SORTBY/LIMIT finding correctly identified dead code but hedged with "confirm with valkey-glide 2.4.0 docs." The reviewer could have checked node_modules/@valkey/valkey-glide/build-ts/server-modules/GlideFtOptions.d.ts directly — it takes less time than writing the hedge.

These are calibration points, not complaints. The TAG field array bug and the missing filter validation were real production-data-loss risks, and catching those is exactly what makes a review valuable. More of that, less of the easily-verifiable nits.

@MatthiasHowellYopp MatthiasHowellYopp left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Jonathan-Improving Jonathan-Improving left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valkey Vector Store Node Review

Verdict: COMMENT — Solid implementation that meets AEA-488 acceptance criteria. GLIDE API usage is mostly correct, test coverage is comprehensive, and the Flowise node pattern is followed well. A few items to address before upstream submission.

Summary

  • ✅ All AEA-488 acceptance criteria met
  • ✅ Comprehensive unit tests (798 lines) and integration tests (494 lines)
  • ✅ Correct GlideFt.create/search/info/dropindex usage
  • ✅ Proper TAG value escaping in FT.SEARCH queries
  • ✅ Connection lifecycle (try/finally/close) in upsert/delete paths
  • ✅ Concurrent index creation handled gracefully
  • ⚠️ 2 must-fix: NaN propagation in topK, potential query injection via field names
  • ⚠️ 3 suggestions: missing Decoder for binary data, dead createIndexOptions field, inconsistent retry strategy

Findings

See inline comments for details.

Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts Outdated
Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts
Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts
Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts Outdated
Comment thread packages/components/nodes/vectorstores/Valkey/Valkey.ts
- Guard against NaN propagation when topK is non-numeric string
- Validate contentKey/metadataKey/vectorKey against /^[a-zA-Z_][a-zA-Z0-9_]*$/
  to prevent FT.SEARCH query injection via crafted field names
- Remove unused createIndexOptions (interface, field, constructor assignment)
- Document that retryStrategy is ClusterBatchOptions-only per valkey-glide API

Signed-off-by: Riley Des <riley.desserre@improving.com>
@rileydes-improving rileydes-improving requested review from Jonathan-Improving and removed request for edlng June 3, 2026 14:36

@Jonathan-Improving Jonathan-Improving left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review (Round 2): Valkey Vector Store Integration

Verdict: COMMENT — 1 finding (1M). All 8 round-1 findings correctly resolved. GLIDE TypeScript API usage verified correct.

Round 1 Resolution Summary

All 8 findings addressed:

  • ✅ TAG field dropping arrays → flatMap handles arrays
  • ✅ SORTBY/LIMIT/DIALECT not passed → now passed explicitly
  • ✅ NaN propagation on topK → isNaN guard added
  • ✅ Field name query injection → regex validation in constructor
  • ✅ Dead createIndexOptions code → removed
  • ✅ Decoder option → confirmed not needed (returnFields excludes binary)
  • ✅ Retry strategy asymmetry → documented (API limitation)
  • ✅ lodash flatten → native .flat()

GLIDE API Verification

All TypeScript GLIDE calls verified correct:

  • GlideFt.search/create/dropindex/info ✅
  • Batch/ClusterBatch constructor + hset + expire + exec ✅
  • GlideClient.createClient config (addresses, credentials, useTLS) ✅
  • ClusterScanCursor scan loop ✅
  • FtSearchOptions structure (params, returnFields, sortby, limit, dialect) ✅
  • Field schema (TAG with separator, VECTOR with attributes) ✅

New Finding (Round 2)

See inline comment below.

Comment on lines +236 to +244
[`${this.metadataKey}_tags`]: Object.values(metadata)
.flatMap((v) =>
Array.isArray(v)
? v.filter((x) => x != null && typeof x !== 'object').map(String)
: v != null && typeof v !== 'object'
? [String(v)]
: []
)
.join(',')

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion — Medium] TAG field comma-separator collision with metadata values containing commas

Metadata values are joined with , (the TAG separator configured at line 138), but individual values that contain commas are NOT escaped before joining. A metadata value like "San Francisco, CA" produces two tags (San Francisco and CA) instead of one.

Impact: Metadata filtering with TAG-based queries will return incorrect results when metadata values contain commas — a common occurrence in real data (locations, multi-word categories, CSV-like fields).

Suggested fix: Escape commas in individual tag values before joining:

.flatMap((v) =>
    Array.isArray(v)
        ? v.filter((x) => x != null && typeof x !== 'object').map(String)
        : v != null && typeof v !== 'object' ? [String(v)] : []
)
.map((t) => t.replace(/,/g, '\\\\,'))  // escape commas within values
.join(',')

Or use a different TAG separator that is less likely to appear in natural metadata (e.g., | or \x1f).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants