Morphik Core

Note: Morphik is launching a hosted service soon! Please sign up for the waitlist.

What is Morphik?

Morphik is an open-source database designed for AI applications that simplifies working with unstructured data. It provides advanced RAG (Retrieval Augmented Generation) capabilities with multi-modal support, knowledge graphs, and intuitive APIs.

Built for scale and performance, Morphik can handle millions of documents while maintaining fast retrieval times. Whether you're prototyping a new AI application or deploying production-grade systems, Morphik provides the infrastructure you need.

Features

📄 First-class Support for Unstructured Data
- Ingest ANY file format (PDFs, videos, text) with intelligent parsing
- Advanced retrieval with ColPali multi-modal embeddings
- Automatic document chunking and embedding

🧠 Knowledge Graph Integration
- Extract entities and relationships automatically
- Graph-enhanced retrieval for more relevant results
- Explore document connections visually

🔍 Advanced RAG Capabilities
- Multi-stage retrieval with vector search and reranking
- Fine-tuned similarity thresholds
- Detailed metadata filtering

📏 Natural Language Rules Engine
- Define schema-like rules for unstructured data
- Extract structured metadata during ingestion
- Transform documents with natural language instructions

💾 Persistent KV-caching
- Pre-process and "freeze" document states
- Reduce compute costs and response times
- Cache selective document subsets

🔌 MCP Support
- Model Context Protocol integration
- Easy knowledge sharing with AI systems

🧩 Extensible Architecture
- Support for custom parsers and embedding models
- Multiple storage backends (S3, local)
- Vector store integration with PostgreSQL/pgvector
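
To make the multi-stage retrieval idea concrete, here is a minimal, self-contained sketch of the usual pattern: filter on metadata, run a vector search with a similarity threshold, then rerank the survivors. This is not Morphik's implementation. The function names (`cosine`, `retrieve`), the toy two-dimensional embeddings, and the keyword-overlap reranker are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, query_terms, chunks, filters=None, min_score=0.0, k=5):
    """Hypothetical pipeline: metadata filter -> vector search -> rerank."""
    filters = filters or {}
    # Stage 1: keep chunks whose metadata matches every filter key.
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(key) == val for key, val in filters.items())
    ]
    # Stage 2: score by cosine similarity and drop anything under the threshold.
    scored = [(cosine(query_vec, c["embedding"]), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= min_score]

    # Stage 3: rerank by keyword overlap, breaking ties with the vector score.
    def rerank_key(item):
        score, chunk = item
        overlap = len(query_terms & set(chunk["text"].lower().split()))
        return (overlap, score)

    scored.sort(key=rerank_key, reverse=True)
    return [c for _, c in scored[:k]]

# Toy data: embeddings are invented, not produced by a real model.
chunks = [
    {"text": "graphs link entities", "embedding": [1.0, 0.0], "metadata": {"category": "tech"}},
    {"text": "cooking pasta at home", "embedding": [0.0, 1.0], "metadata": {"category": "food"}},
    {"text": "vector search for entities", "embedding": [0.9, 0.1], "metadata": {"category": "tech"}},
]
results = retrieve([1.0, 0.0], {"vector", "search"}, chunks,
                   filters={"category": "tech"}, min_score=0.5, k=2)
print([c["text"] for c in results])
```

The reranker here is deliberately trivial; in practice a cross-encoder or similar model would usually take its place.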

Quick Start

Installation

# Clone the repository
git clone https://github.com/morphik-org/morphik-core.git
cd morphik-core

# Create a virtual environment
python3.12 -m venv .venv
source .venv/bin/activate  # Linux/macOS

# Install dependencies
pip install -r requirements.txt

# Configure and start the server
python quick_setup.py
python start_server.py

Using the Python SDK

from morphik import Morphik

# Connect to Morphik server
db = Morphik("morphik://localhost:8000")

# Ingest a document
doc = db.ingest_text(
    "This is a sample document about AI technology.",
    metadata={"category": "tech", "author": "Morphik"},
)

# Ingest a file (PDF, DOCX, video, etc.)
doc = db.ingest_file(
    "path/to/document.pdf",
    metadata={"category": "research"},
)

# Use ColPali for multi-modal documents (PDFs with images, charts, etc.)
doc = db.ingest_file("path/to/report_with_charts.pdf", use_colpali=True)

# Apply natural language rules during ingestion
rules = [
    {"type": "metadata_extraction", "schema": {"title": "string", "author": "string"}},
    {"type": "natural_language", "prompt": "Remove all personally identifiable information"}
]
doc = db.ingest_file("path/to/document.pdf", rules=rules)

# Retrieve relevant document chunks
chunks = db.retrieve_chunks(
    "What are the latest AI advancements?",
    filters={"category": "tech"},
    k=5,
)

# Generate a completion with context
response = db.query(
    "Explain the benefits of knowledge graphs in AI applications",
    filters={"category": "research"},
)
print(response.completion)

# Create and use a knowledge graph
db.create_graph("tech_graph", filters={"category": "tech"})
response = db.query(
    "How does AI relate to cloud computing?",
    graph_name="tech_graph",
    hop_depth=2,
)
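
The `hop_depth` argument is easiest to picture as a bounded graph traversal. The sketch below is an assumption about the general idea, not Morphik's internals: starting from the entities matched in a query, it collects every entity reachable within a given number of hops. The `expand_entities` name and the toy adjacency map are hypothetical.

```python
from collections import deque

def expand_entities(graph, seeds, hop_depth):
    """Return all entities within `hop_depth` edges of any seed (breadth-first)."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == hop_depth:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen

# Toy entity graph, invented for illustration.
graph = {
    "AI": ["GPU"],
    "GPU": ["AI", "cloud computing"],
    "cloud computing": ["GPU", "data center"],
    "data center": ["cloud computing"],
}
print(sorted(expand_entities(graph, ["AI"], hop_depth=2)))
```

With `hop_depth=2`, "cloud computing" is reached through "GPU", while "data center" stays out of scope because it is three hops away.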

Batch Operations

# Ingest multiple files
docs = db.ingest_files(
    ["doc1.pdf", "doc2.pdf"],
    metadata={"category": "research"},
    parallel=True
)

# Ingest all PDFs in a directory
docs = db.ingest_directory(
    "data/documents",
    recursive=True,
    pattern="*.pdf"
)

# Batch retrieve documents
docs = db.batch_get_documents(["doc_id1", "doc_id2"])
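
Before a large directory ingest, it can help to preview which files a recursive `*.pdf` pattern would match. This sketch uses only the standard library. Whether `ingest_directory` applies the pattern in exactly this way is an assumption, and `preview_matches` is a hypothetical helper, not part of the SDK.

```python
from pathlib import Path

def preview_matches(directory, pattern="*.pdf", recursive=True):
    """List files under `directory` that match a glob pattern, sorted by path."""
    root = Path(directory)
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())

if __name__ == "__main__":
    # Same directory and pattern as the ingest_directory call above.
    for path in preview_matches("data/documents"):
        print(path)
```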

Multi-modal Retrieval (ColPali)

# Ingest a PDF with charts and images
db.ingest_file("report_with_charts.pdf", use_colpali=True)

# Retrieve relevant chunks, including images
chunks = db.retrieve_chunks(
    "Show me the Q2 revenue chart", 
    use_colpali=True, 
    k=3
)

# Process retrieved images
for chunk in chunks:
    if hasattr(chunk.content, 'show'):  # If it's an image
        chunk.content.show()
    else:
        print(chunk.content)
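
ColPali-style retrievers score a page with late interaction: each query-token embedding is matched to its best page-patch embedding, and those maxima are summed. The sketch below illustrates that scoring rule in plain Python with made-up vectors. It is not the Morphik code path, and `maxsim_score` is a hypothetical name.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, page_patches):
    """Sum, over query tokens, of the best dot product with any page patch."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)

# Invented embeddings: two query tokens, two patches per page.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]
page_b = [[0.5, 0.5], [0.4, 0.4]]

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
best = max(scores, key=scores.get)
print(best)
```

Because every query token picks its own best patch, a page can score well when different regions (a chart title, an axis label) each answer a different part of the query.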

Why Choose Morphik?

| Feature | Morphik | Traditional Vector DBs | Document DBs | LLM Frameworks |
| --- | --- | --- | --- | --- |
| Multi-modal Support | ✅ Advanced ColPali embedding for text + images | ❌ or Limited | ❌ | ❌ |
| Knowledge Graphs | ✅ Automated extraction & enhanced retrieval | ❌ | ❌ | ❌ |
| Rules Engine | ✅ Natural language rules & schema definition | ❌ | ❌ | Limited |
| Caching | ✅ Persistent KV-caching with selective updates | ❌ | ❌ | Limited |
| Scalability | ✅ Millions of documents with PostgreSQL | ✅ | ✅ | Limited |
| Video Content | ✅ Native video parsing & transcription | ❌ | ❌ | ❌ |
| Deployment Options | ✅ Self-hosted, cloud, or hybrid | Varies | Varies | Limited |
| Open Source | ✅ MIT License | Varies | Varies | Varies |
| API & SDK | ✅ Clean Python SDK & RESTful API | Varies | Varies | Varies |

Key Advantages

- ColPali Multi-modal Embeddings: Process and retrieve from documents based on both textual and visual content, maintaining the visual context that other systems miss.
- Cache Augmented Retrieval: Pre-process and "freeze" document states to reduce compute costs by up to 80% and drastically improve response times.
- Schema-like Rules for Unstructured Data: Define rules to extract consistent metadata from unstructured content, bringing database-like queryability to any document format.
- Enterprise-grade Scalability: Built on proven PostgreSQL database technology that can scale to millions of documents while maintaining sub-second retrieval times.
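
One way to picture schema-like rules: after extraction, each document's metadata can be checked against the declared schema so that downstream filters can rely on it. The sketch below assumes the `{"field": "string"}` schema shape from the Quick Start example. The `check_metadata` helper and the type-name mapping are hypothetical and not part of Morphik.

```python
# Assumed mapping from schema type names to Python types (illustrative only).
TYPE_MAP = {"string": str}

def check_metadata(metadata, schema):
    """Return a list of problems; an empty list means the metadata fits the schema."""
    problems = []
    for field, type_name in schema.items():
        if field not in metadata:
            problems.append(f"missing field: {field}")
        elif not isinstance(metadata[field], TYPE_MAP.get(type_name, object)):
            problems.append(f"{field} should be {type_name}")
    return problems

schema = {"title": "string", "author": "string"}
print(check_metadata({"title": "Q2 Report", "author": "Morphik"}, schema))
print(check_metadata({"title": 42}, schema))
```

The first call reports no problems; the second flags the wrongly typed title and the missing author.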

Documentation

For comprehensive documentation:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Community

Discord - Join our community
GitHub - Contribute to development

Built with ❤️ by Morphik
