[Feature]: Integrate PostgreSQL with pgvector as the Vector Store #8

@Harmeet10000

Description

Prerequisites

  • I have searched the existing issues to avoid duplicates
  • I understand that this is just a suggestion and might not be implemented

Problem Statement

Our current vector store (if any; it may be only a placeholder) may not be well suited to long-term scalability, complex metadata filtering, or integration with our existing data infrastructure. Specific issues may include:

  • Difficulty in performing advanced queries combining vector similarity with structured metadata filtering.
  • Challenges in managing the vector store alongside other application data, leading to fragmented data management.
  • Potential limitations in scalability or cost-effectiveness compared to a robust, self-hosted solution.
  • Desire to leverage an existing and familiar database technology (PostgreSQL) for vector storage.

Proposed Solution

Integrate PostgreSQL with the pgvector extension as the primary vector store for our LangChain project. This approach leverages the power and familiarity of PostgreSQL for both structured data and high-performance vector similarity search.

Key aspects of the integration would include:

  • pgvector Extension: Ensure pgvector is installed and enabled on our PostgreSQL instance.
  • LangChain Integration: Use LangChain's PGVector class to interact with the PostgreSQL vector store for embedding storage and retrieval.
  • Embedding Storage: Store document embeddings and associated metadata directly in PostgreSQL tables.
  • Hybrid Search: Combine pgvector's similarity search with standard SQL filtering (for example, WHERE clauses on metadata columns), so that retrieval can be restricted to the relevant subset of documents in a single query.
  • Scalability and Management: Leverage PostgreSQL's inherent features for replication, backups, and general database management.

This move would centralize our data management, provide robust querying capabilities, and ensure a scalable and maintainable vector store solution within our existing tech stack.
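
As a rough illustration of the setup and filtered search described above, the sketch below only builds the SQL that would be sent to PostgreSQL; it does not open a connection. The table name `documents`, its columns, and both helper functions are hypothetical and not part of this project. `CREATE EXTENSION vector`, the `vector(n)` column type, and the `<=>` cosine-distance operator come from pgvector, and the `%s` placeholders assume a psycopg-style driver.

```python
import json

# Hypothetical table name, used for illustration only.
TABLE = "documents"


def setup_statements(dimensions: int) -> list[str]:
    """Return SQL that enables pgvector and creates an embeddings table."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {TABLE} ("
        "id bigserial PRIMARY KEY, "
        "content text NOT NULL, "
        "metadata jsonb NOT NULL DEFAULT '{}', "
        f"embedding vector({dimensions}));",
    ]


def filtered_search(query_embedding, metadata_filter, limit=5):
    """Build a parameterized query that filters on metadata and ranks by cosine distance."""
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        f"FROM {TABLE} "
        "WHERE metadata @> %s::jsonb "  # jsonb containment: row metadata must include the filter
        "ORDER BY distance LIMIT %s;"
    )
    # pgvector accepts vectors written as a bracketed, comma-separated literal.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    return sql, (vector_literal, json.dumps(metadata_filter), limit)


if __name__ == "__main__":
    for statement in setup_statements(1536):
        print(statement)
    print(filtered_search([0.1, 0.2], {"source": "faq"}, limit=3))
```

In practice LangChain's PGVector class would manage its own tables, so this is only meant to show what the underlying pgvector queries can look like.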

Alternatives Considered

  • Dedicated vector databases (e.g., Pinecone, Weaviate, Milvus): While these offer specialized performance for vector search, pgvector provides a compelling alternative by allowing us to keep vector and structured data together within a familiar ecosystem, potentially simplifying deployment and reducing infrastructure complexity/cost if PostgreSQL is already in use.
  • In-memory vector stores (e.g., FAISS for local development): These are good for development but are not suitable for production-grade persistent storage or scalability.
  • Other relational databases with custom vector implementations: These are likely to be less mature and less performant than pgvector, which is purpose-built for vector similarity search in PostgreSQL.

Additional Context

Using pgvector is an increasingly popular and powerful choice for LangChain applications, offering a strong balance between performance, flexibility, and integration with existing data infrastructure. It allows us to perform precise RAG queries by filtering on document metadata before or during the vector similarity search.
See pgvector GitHub and LangChain PGVector documentation for more details.
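
To make "filtering on document metadata before the vector similarity search" concrete, here is a small in-memory sketch of the same semantics in plain Python. It is not how pgvector is implemented; the row layout and the `filtered_top_k` helper are hypothetical, and cosine distance is used because that is what pgvector's `<=>` operator computes.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def filtered_top_k(rows, query, required, k):
    """Keep rows whose metadata matches every key in `required`, then rank by distance."""
    candidates = [
        row for row in rows
        if all(row["metadata"].get(key) == value for key, value in required.items())
    ]
    ranked = sorted(candidates, key=lambda row: cosine_distance(row["embedding"], query))
    return [row["id"] for row in ranked[:k]]


# Toy data: row 2 is the nearest vector overall but has the wrong source.
rows = [
    {"id": 1, "metadata": {"source": "faq"}, "embedding": [1.0, 0.0]},
    {"id": 2, "metadata": {"source": "blog"}, "embedding": [1.0, 0.1]},
    {"id": 3, "metadata": {"source": "faq"}, "embedding": [0.0, 1.0]},
]
result = filtered_top_k(rows, [1.0, 0.05], {"source": "faq"}, 2)
print(result)  # [1, 3]
```

The filter removes row 2 before ranking, which is the behaviour a SQL WHERE clause on metadata gives when combined with an ORDER BY on vector distance.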

Priority

Critical
