PostgreSQL-based RAG Foundation Framework with Database Query Capabilities
A comprehensive PostgreSQL-based RAG foundation framework that provides data synchronization, vector search, and advanced database query capabilities for Farcaster protocol data
In recent years, as Crypto and LLM technology evolved in parallel, a subtle yet profound realization began to surface: philosophically, LLMs might stand at the exact opposite end of the spectrum from Crypto. These two domains represent radically different human responses to entropy: one seeks to impose rational order; the other chooses to dance faster in the chaos.
Crypto, especially in its cypherpunk ethos, is an art of entropy control, using cryptography as a blade to carve order into a high-entropy world. Meanwhile, the LLM community behaves more like a tribe of intuitive dancers around a black box, embracing vibes, emergence, and generative chaos, without insisting on transparency, verification, or absolute control.
This core divergence creates two entirely different cultures:
- Crypto worships openness, transparency, and permissionless access.
- LLM culture seems content with black-box outputs, hardware aristocracy, and closed creative loops.
To fuse the two is to dance in chains.
SnapRAG is precisely such a Don Quixote-style endeavor.
It is fully open-source, written in Rust, with extreme performance and absolutely no hidden backend logic. No vendor lock-in, no gated APIs: anyone can run it locally and hold full sovereignty over their AI stack. Built on top of the decentralized social protocol Farcaster, SnapRAG leverages a full Snapchain node to synchronize every piece of raw network data, forming the foundation of a truly omniscient Farcaster AI.
As of today, the Snapchain network has produced:
- 17,894,257 blocks
- 1,393,099 registered users
- 224,749,776 casts
This is a dataset of massive scale.
But unlike Twitter, Farcaster grants radical data freedom: run a node, and you instantly gain access to everyone's data. No rate limits, no API gatekeeping, no silent throttling. On top of this freedom, SnapRAG offers a powerful operational toolkit:
- Import all Farcaster data into a local database
- Vectorize all Casts into embeddings
- Perform high-performance semantic search, entirely locally
For developers, this is the moment of reclaiming AI computation sovereignty:
Your node. Your database. Your embeddings. Your AI, not rented from someone else's API ceiling.
SnapRAG is not just a tool: it is a full capability surface.
It is a CLI, an API service, and an MCP-compatible interface, all with built-in support for our Farcaster-native protocol x402. This means SnapRAG can serve both individual hackers and teams seeking to build minimal, profitable open AI infrastructure on Farcaster.
We deeply respect the work done by Neynar; it's a fantastic product. But it remains closed. SnapRAG openly invites teams like Neynar to migrate onto this open infrastructure and run truly real-time, open, permissionless AI services at Farcaster scale.
Because that's what blockchain ethos demands.
Everything we do is so that the spirit of Crypto does not dissolve in the age of AI.
SnapRAG is a PostgreSQL-based RAG foundation framework designed specifically for Farcaster protocol data. It provides a complete data synchronization layer, vector search capabilities, and advanced database query functionality, making it an ideal foundation for building RAG (Retrieval-Augmented Generation) applications on top of Farcaster data.
- RAG Foundation: PostgreSQL-based framework for building RAG applications
- Library + CLI: Use as Rust library OR standalone CLI tool
- Data Synchronization: Complete historical + real-time Farcaster data sync
- Vector Search: Built-in pgvector support for semantic similarity search
- Advanced Queries: Rich database query capabilities and analytics
- High Performance: Rust-based with async PostgreSQL integration
- Data Integrity: Complete audit trail and change tracking
- CLI Tools: Full command-line interface for all operations
- Quick Start
- Docker Deployment (NEW)
- Features
- Library Usage (NEW)
- CLI Commands
- Database Schema
- Text Search Capabilities
- Block Data Distribution
- Installation
- Usage
- Architecture
- Testing
- Configuration
- Performance Tuning
- Troubleshooting
- Contributing
Note: SnapRAG can be built without a database connection! The build process uses SQLx offline mode by default, so you can compile the project before setting up your database. This makes it easy to get started quickly.
# 1. Clone and setup
git clone <repository-url> && cd snaprag
# 2. Build the project (no database required!)
cargo build
# 3. Create configuration file
cp config.example.toml config.toml
# 4. Edit config.toml with your database connection details
# Update the database.url field with your actual database connection string
# 5. Check your configuration
make check-config # Verify config.toml is valid
# 6. Ensure required extensions are enabled on your database
# Connect to your database and run:
# CREATE EXTENSION IF NOT EXISTS vector;
# CREATE EXTENSION IF NOT EXISTS pg_trgm;
# 7. Run database migrations and application
make migrate # Run database migrations
make run # Run the application

# Cargo.toml
[dependencies]
snaprag = { path = "../snaprag" }
tokio = { version = "1.0", features = ["full"] }

// src/main.rs
use snaprag::{SnapRag, AppConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = AppConfig::load()?;
let snaprag = SnapRag::new(&config).await?;
// Search profiles
let results = snaprag.semantic_search_profiles(
"developers interested in crypto",
10,
Some(0.7)
).await?;
println!("Found {} profiles", results.len());
Ok(())
}

See examples in the examples/ directory!
# 1. Create config.toml
cp config.example.toml config.toml
vim config.toml # Update database URL
# 2. Run container
docker run -d \
--name snaprag \
-p 3000:3000 \
-v $(pwd)/config.toml:/app/config.toml:ro \
-v $(pwd)/logs:/app/logs \
--add-host=host.docker.internal:host-gateway \
ryankung/snaprag:latest api
# Access at http://localhost:3000

# Quick start (all in one)
make -f Makefile.docker docker-quick-start
# Or step by step
make -f Makefile.docker setup-config # Create config.toml
make -f Makefile.docker docker-build # Build image
make -f Makefile.docker docker-run # Run container

See the complete Docker documentation for details.
Design Philosophy:
- Single-container deployment: Simple and fast
- Config always external: Security best practice (no secrets in image)
- Connects to external services: Use your existing PostgreSQL/Redis
- Production-ready: Non-root user, health checks, logging
Key Commands:
make -f Makefile.docker help # Show all commands
make -f Makefile.docker docker-build # Build image
make -f Makefile.docker docker-run # Run API server
make -f Makefile.docker docker-logs # View logs
make -f Makefile.docker docker-stop # Stop container

Requirements:
- PostgreSQL 15+ with pgvector (running separately)
- Redis 7+ (optional, for caching)
- config.toml configured with your database URL
- PostgreSQL-based Architecture: Built on PostgreSQL for robust data management
- Vector Search Ready: Built-in pgvector support for semantic similarity search
- Query Interface: Rich database query capabilities for complex analytics
- Data Synchronization Layer: Complete Farcaster data sync from snapchain
- RAG Application Ready: Designed as a foundation for building RAG applications
- Historical Data Sync: Complete synchronization of past Farcaster data from snapchain
- Real-time Monitoring: Live monitoring of new blocks and user activities
- Shard-based Processing: Efficient processing of data across multiple shards
- Lock File Management: Prevents concurrent sync processes with PID tracking (see the sketch after this list)
- Progress Tracking: Real-time sync progress and status monitoring
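The lock-file guard can be pictured with the following minimal sketch. It is illustrative only, using just the standard library (and Linux's /proc for the liveness check); SnapRAG's actual SyncLockFile implementation may differ.

use std::fs;
use std::io::ErrorKind;
use std::path::Path;

// Illustrative PID lock: returns Ok(true) if we acquired the lock,
// Ok(false) if another sync process still holds it.
fn acquire_sync_lock(path: &Path) -> std::io::Result<bool> {
    match fs::read_to_string(path) {
        Ok(contents) => {
            // Lock file exists: treat it as stale if its PID is gone.
            let stale = contents
                .trim()
                .parse::<u32>()
                .map(|pid| !Path::new(&format!("/proc/{pid}")).exists())
                .unwrap_or(true);
            if !stale {
                return Ok(false); // a live sync process owns the lock
            }
            fs::remove_file(path)?; // reclaim a stale lock
        }
        Err(e) if e.kind() == ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    fs::write(path, std::process::id().to_string())?;
    Ok(true)
}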
- Historical Profile Preservation: Complete snapshot history of user profile changes
- Efficient Current State Access: Fast queries for current profile data
- Vector Embeddings Support: Built-in support for pgvector for semantic search
- Advanced Database Queries: Complex analytics and data exploration capabilities
- Change Tracking: Detailed audit trail of all profile modifications
- Username Proofs: Support for Farcaster-style username verification
- Activity Timeline: Comprehensive user activity tracking
- No Data Cleanup: All historical data is preserved indefinitely
- Comprehensive CLI: Full command-line interface for all operations
- Sync Management: Start, stop, and monitor synchronization processes
- Data Querying: List and search FIDs, profiles, casts, and relationships
- Database Operations: Migration, reset, and maintenance commands
# Show help
cargo run -- --help
# List available data
cargo run list fid --limit 50
cargo run list profiles --limit 20
cargo run list casts --limit 100
cargo run list follows --limit 50
# Reset all data
cargo run reset --force
# Show configuration
cargo run config

# Sync all data (historical + real-time)
cargo run sync all
# Start historical sync with optional range
cargo run sync start
cargo run sync start --from 1000000 --to 2000000
# Start real-time sync
cargo run sync realtime
# Show sync status
cargo run sync status
# Stop running sync
cargo run sync stop

The system uses the following main tables:
- user_profiles: Current profile state (latest values only)
- user_profile_snapshots: Historical profile snapshots
- user_data_changes: Detailed change tracking
- user_activity_timeline: User activity history
- fids: Farcaster ID registry
- fname_transfers: Username transfer history
- signers: User signer keys
- signer_history: Signer key changes
- storage_rent_events: Storage rent events
- id_register_events: ID registration events
- sync_state: Synchronization state and progress
- shard_block_info: Shard and block tracking for data origin
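As a hedged sketch of how these tables fit together, the sqlx queries below read a user's current state from user_profiles and their audit trail from user_data_changes. The exact column types (e.g. data_type as a small integer) are assumptions based on the descriptions above, not verified schema.

use sqlx::PgPool;

// Current state lives in user_profiles; every change lands in user_data_changes.
async fn profile_history(pool: &PgPool, fid: i64) -> sqlx::Result<()> {
    let current: Option<(Option<String>, Option<String>)> =
        sqlx::query_as("SELECT username, bio FROM user_profiles WHERE fid = $1")
            .bind(fid)
            .fetch_optional(pool)
            .await?;
    println!("current profile: {current:?}");

    let changes: Vec<(i16, String)> = sqlx::query_as(
        "SELECT data_type, new_value FROM user_data_changes \
         WHERE fid = $1 ORDER BY timestamp DESC LIMIT 20",
    )
    .bind(fid)
    .fetch_all(pool)
    .await?;
    println!("{} recent changes", changes.len());
    Ok(())
}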
SnapRAG includes built-in support for PostgreSQL's pg_trgm extension, providing powerful trigram-based text search capabilities across all text fields.
Find text with high similarity using trigram matching:
-- Find casts with text similar to "crypto" (similarity threshold: 0.3)
SELECT fid, text, similarity(text, 'crypto') as sim
FROM casts
WHERE text % 'crypto'
ORDER BY sim DESC
LIMIT 10;
-- Find usernames similar to "alice"
SELECT fid, username, similarity(username, 'alice') as sim
FROM username_proofs
WHERE username % 'alice'
ORDER BY sim DESC;

Use ILIKE with trigram optimization for pattern searches:
-- Find casts containing "bitcoin" (case-insensitive)
SELECT fid, text
FROM casts
WHERE text ILIKE '%bitcoin%'
ORDER BY timestamp DESC
LIMIT 20;
-- Find usernames starting with "crypto"
SELECT fid, username
FROM username_proofs
WHERE username ILIKE 'crypto%';

Find text with typos or variations:
-- Find casts mentioning "ethereum" with fuzzy matching
SELECT fid, text, similarity(text, 'ethereum') as sim
FROM casts
WHERE text % 'ethereum'
AND similarity(text, 'ethereum') > 0.4
ORDER BY sim DESC;

The following GIN indexes are automatically created for optimal text search performance:
- idx_casts_text_trgm: Fast text search on cast content
- idx_cast_embeddings_text_trgm: Text search on embedding text
- idx_cast_embedding_chunks_text_trgm: Text search on embedding chunks
- idx_user_profile_changes_value_trgm: Text search on profile field values
- idx_username_proofs_username_trgm: Text search on usernames
- Use similarity thresholds: Always specify minimum similarity thresholds for better performance
- Combine with other filters: Use text search with timestamp or user filters
- Index utilization: The trigram indexes are automatically used by PostgreSQL when appropriate
-- Example: Efficient combined query
SELECT fid, text, timestamp
FROM casts
WHERE text % 'defi'
AND similarity(text, 'defi') > 0.3
AND timestamp > extract(epoch from now() - interval '7 days') * 1000
ORDER BY timestamp DESC
LIMIT 50;
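The same combined query can be issued from Rust through sqlx; this is a hedged sketch assuming the casts schema described above.

use sqlx::PgPool;

// Trigram similarity + recency filter, bound to a user-supplied term.
async fn recent_similar_casts(pool: &PgPool, term: &str) -> sqlx::Result<Vec<(i64, String)>> {
    sqlx::query_as(
        "SELECT fid, text FROM casts \
         WHERE text % $1 \
           AND similarity(text, $1) > 0.3 \
           AND timestamp > extract(epoch from now() - interval '7 days') * 1000 \
         ORDER BY timestamp DESC LIMIT 50",
    )
    .bind(term)
    .fetch_all(pool)
    .await
}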
Based on our analysis of the snapchain network, here's the distribution of user messages across different block ranges:

- Blocks 0-10: No user messages, only system messages
- Blocks 0-1000: No user messages found
- Blocks 5000-6000: No user messages found
- Blocks 10000-10100: No user messages found
- Blocks 50000-50100: No user messages found
- Blocks 625000-625100: No user messages found
- Blocks 1250000-1250100: User messages found
- Blocks 2500000-2500100: User messages found
- Blocks 5000000-5001000: High user message activity
- Profile Creation Messages: Found Type 11 (UserDataAdd) messages
- Current Network Height: ~15,550,000 blocks
- User messages start around block 625,000+
- Early blocks contain only system messages (ValidatorMessage types)
- Profile creation/modification messages (Type 11) are common in higher blocks
- Block production rate: ~1 second per block
- Timestamp format: snapchain-specific timestamps, not Unix epoch (see the conversion sketch below)
- No user messages: 0-1000, 5000-6000, 10000-10100
- First user messages: 625000-625100
- Active user activity: 1250000-1250100, 2500000-2500100
- High activity: 5000000-5001000
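Because snapchain timestamps are not Unix epoch, converting them is a common first stumbling block. The sketch below assumes snapchain inherits the Farcaster convention of seconds since the Farcaster epoch (2021-01-01T00:00:00Z); verify against your own node before relying on it.

// Assumption: timestamps count seconds from the Farcaster epoch.
const FARCASTER_EPOCH_UNIX: u64 = 1_609_459_200; // 2021-01-01T00:00:00Z

fn farcaster_to_unix(ts: u64) -> u64 {
    FARCASTER_EPOCH_UNIX + ts
}

fn main() {
    println!("unix timestamp: {}", farcaster_to_unix(100_000_000));
}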
- Rust 1.70+
- PostgreSQL 15+ with pgvector and pg_trgm extensions
- Remote database access
# 1. Clone the repository
git clone <repository-url>
cd snaprag
# 2. Build the project (no database required!)
cargo build
# 3. Create and configure config.toml
cp config.example.toml config.toml
# Edit config.toml and update the database.url field
# 4. Ensure required extensions are enabled on your remote database
# Connect to your database and run:
# CREATE EXTENSION IF NOT EXISTS vector;
# CREATE EXTENSION IF NOT EXISTS pg_trgm;
# 5. Run database migrations
make migrate
# 6. Run the application
cargo run

For production or development with remote databases:
# 1. Ensure your remote PostgreSQL has required extensions
# Connect to your remote database and run:
psql -h your-db-host -U your-username -d your-database
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
\q
# 2. Create and configure config.toml
cp config.example.toml config.toml
# Edit config.toml and update the database.url field with your connection string
# 3. Run database migrations
make migrate
# 4. Build and run
cargo build
cargo run

If you need to set up PostgreSQL locally for development:
# 1. Build the project first (no database required!)
cargo build
# 2. Install PostgreSQL and required extensions
# On Ubuntu/Debian:
sudo apt-get install postgresql-15 postgresql-15-pgvector # pg_trgm ships with PostgreSQL's contrib modules
# On macOS with Homebrew:
brew install postgresql@15 pgvector
# On CentOS/RHEL:
sudo yum install postgresql15-server postgresql15-contrib
# Then compile pgvector from source
# 3. Create database and user
sudo -u postgres psql
CREATE DATABASE snaprag;
CREATE USER snaprag WITH PASSWORD 'snaprag123';
GRANT ALL PRIVILEGES ON DATABASE snaprag TO snaprag;
\q
# 4. Connect to database and enable required extensions
psql -U snaprag -d snaprag -h localhost
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
\q
# 5. Create and configure config.toml
cp config.example.toml config.toml
# Edit config.toml and update database.url to: postgresql://snaprag:snaprag123@localhost/snaprag
# 6. Run migrations and start the application
make migrate
cargo run

For easier development workflow:
# Development Commands
make check-config # Check configuration file
make migrate # Run database migrations
make run # Run the application
make run-example # Run basic usage example
# Testing Commands
make test # Run all tests
# Build Commands
make build # Build the project
make build-release # Build in release mode
make clean # Clean build artifacts
# Code Quality Commands
make check # Run clippy and format checks
make fix # Fix clippy and format issues
make docs # Generate and open documentation
make bench # Run benchmarks

| Command | Description |
|---|---|
| make help | Show all available commands |
| make check-config | Check configuration file |
| make migrate | Run database migrations |
| make run | Run the application |
| make run-example | Run basic usage example |
| make test | Run all tests |
| make test-strict | Run tests with strict settings (warnings as errors) |
| make test-quick | Run quick tests (unit tests only) |
| make test-integration | Run integration tests only |
| make build | Build the project |
| make build-release | Build in release mode |
| make clean | Clean build artifacts |
| make check | Run clippy and format checks |
| make fix | Fix clippy and format issues |
| make docs | Generate documentation |
| make bench | Run benchmarks |
SnapRAG can be used as a Rust library in your projects:
use snaprag::{SnapRag, AppConfig};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Initialize
let config = AppConfig::load()?;
let snaprag = SnapRag::new(&config).await?;
// Query data
let profiles = snaprag.search_profiles("developer").await?;
let stats = snaprag.get_statistics().await?;
println!("Found {} developers, {} total users",
profiles.len(), stats.total_fids);
Ok(())
}

// Profile semantic search
let results = snaprag.semantic_search_profiles(
"AI and blockchain developers",
10,
Some(0.7) // similarity threshold
).await?;
// Cast semantic search with engagement metrics
let casts = snaprag.semantic_search_casts(
"discussions about frames",
15,
Some(0.7)
).await?;
for cast in casts {
println!("{} ({}% match, {} replies, {} reactions)",
cast.text,
(cast.similarity * 100.0) as i32,
cast.reply_count,
cast.reaction_count
);
}

// Create RAG service
let rag = snaprag.create_rag_service().await?;
// Natural language query
let response = rag.query("Find the most active builders on Farcaster").await?;
println!("Answer: {}", response.answer);
println!("Sources: {} profiles", response.sources.len());impl SnapRag {
// Initialization
pub async fn new(config: &AppConfig) -> Result<Self>;
pub async fn init_database(&self) -> Result<()>;
// Sync
pub async fn start_sync(&mut self) -> Result<()>;
pub async fn start_sync_with_range(&mut self, from: u64, to: u64) -> Result<()>;
// Queries
pub async fn get_profile(&self, fid: i64) -> Result<Option<UserProfile>>;
pub async fn search_profiles(&self, query: &str) -> Result<Vec<UserProfile>>;
pub async fn list_casts(&self, limit: Option<i64>) -> Result<Vec<Cast>>;
pub async fn get_user_activity(...) -> Result<Vec<UserActivityTimeline>>;
// Semantic Search
pub async fn semantic_search_profiles(...) -> Result<Vec<SearchResult>>;
pub async fn semantic_search_casts(...) -> Result<Vec<CastSearchResult>>;
// Services
pub async fn create_rag_service(&self) -> Result<RagService>;
pub fn create_embedding_service(&self) -> Result<Arc<EmbeddingService>>;
pub fn create_llm_service(&self) -> Result<Arc<LlmService>>;
// Embeddings
pub async fn backfill_profile_embeddings(&self, limit: Option<usize>) -> Result<...>;
pub async fn backfill_cast_embeddings(&self, limit: Option<usize>) -> Result<...>;
}

# Run example code
cargo run --example simple_query
cargo run --example semantic_search
cargo run --example rag_query
cargo run --example custom_pipeline

User Query
    ↓
【Retrieval】
 ├─ Semantic Search (vector similarity)
 ├─ Keyword Search (text matching)
 ├─ Hybrid Search (RRF fusion)
 └─ Auto Search (intelligent selection)
    ↓
【Ranking】
 ├─ Vector Similarity Scoring
 ├─ RRF (Reciprocal Rank Fusion)
 └─ Score Normalization
    ↓
【Context Assembly】
 ├─ Profile/Cast Formatting
 ├─ Author Information
 ├─ Engagement Metrics
 └─ Length Management (4096 tokens)
    ↓
【Generation】
 ├─ Prompt Template
 ├─ LLM Query (OpenAI/Ollama)
 └─ Streaming Response
    ↓
Answer + Sources
| Method | Use Case | Performance |
|---|---|---|
| Semantic | Conceptual queries ("find AI developers") | ~10ms |
| Keyword | Exact matches (names, specific terms) | ~5ms |
| Hybrid | Complex queries (combines both with RRF) | ~15ms |
| Auto | Unknown - system chooses best method | Adaptive |
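For the Hybrid row above, the two result lists are merged with Reciprocal Rank Fusion. The following is an illustrative implementation of RRF, not SnapRAG's internal code; k = 60 is the constant commonly used in the RRF literature.

use std::collections::HashMap;

// Each ranking contributes 1 / (k + rank) to a document's fused score.
fn rrf_fuse(rankings: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let semantic = vec![42, 7, 13]; // cast IDs ranked by vector similarity
    let keyword = vec![7, 99, 42]; // cast IDs ranked by text match
    for (id, score) in rrf_fuse(&[semantic, keyword], 60.0) {
        println!("cast {id}: {score:.4}");
    }
}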
- Profile search: ~10ms (10K profiles)
- Cast search: ~50ms (100K casts)
- Embedding generation: ~200ms (OpenAI)
- Embedding backfill: ~50 casts/sec (5x parallel; see the sketch after this list)
- Sync processing: 38% faster (batch optimization)
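The "5x parallel" figure above corresponds to a bounded-concurrency pattern like this sketch using futures::stream::buffer_unordered; embed_and_store is a hypothetical stand-in for the real embedding path.

use futures::stream::{self, StreamExt};

// Hypothetical helper: embed one cast and persist the vector.
async fn embed_and_store(cast_id: i64) -> Result<(), Box<dyn std::error::Error>> {
    println!("embedded cast {cast_id}");
    Ok(())
}

// Keep at most five embedding requests in flight at once.
async fn backfill(cast_ids: Vec<i64>) {
    stream::iter(cast_ids)
        .map(embed_and_store)
        .buffer_unordered(5)
        .for_each(|res| async move {
            if let Err(e) = res {
                eprintln!("backfill error: {e}");
            }
        })
        .await;
}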
# Start historical sync (recommended for first run)
cargo run sync start
# Start with specific block range
cargo run sync start --from 1000000 --to 2000000
# Start real-time sync (after historical sync)
cargo run sync realtime
# Run both historical and real-time sync
cargo run sync all

# Check sync status
cargo run sync status
# Stop running sync
cargo run sync stop
# Reset all data and start fresh
cargo run reset --force

# List FIDs
cargo run list fid --limit 50
# List user profiles
cargo run list profiles --limit 20
# List casts
cargo run list casts --limit 100
# List follow relationships
cargo run list follows --limit 50

use snaprag::models::*;
use snaprag::database::Database;
// Create a user profile
let create_request = CreateUserProfileRequest {
fid: 12345,
username: Some("alice".to_string()),
display_name: Some("Alice Smith".to_string()),
bio: Some("Blockchain enthusiast".to_string()),
message_hash: vec![1, 2, 3, 4, 5],
timestamp: 1640995200,
};
let profile = db.create_user_profile(create_request).await?;
// Update a profile
let update_request = UpdateUserProfileRequest {
fid: 12345,
data_type: UserDataType::Bio,
new_value: "Senior blockchain developer".to_string(),
message_hash: vec![6, 7, 8, 9, 10],
timestamp: 1640995800,
};
let updated_profile = db.update_user_profile(update_request).await?;
// Query historical data
let snapshot_query = ProfileSnapshotQuery {
fid: 12345,
start_timestamp: Some(1640995200),
end_timestamp: Some(1640995800),
limit: Some(10),
offset: None,
};
let snapshots = db.get_profile_snapshots(snapshot_query).await?;

The system supports vector embeddings for semantic search:
-- Search for similar profiles
SELECT
fid,
username,
display_name,
bio,
(profile_embedding <=> query_embedding) as similarity_score
FROM user_profile_snapshots
WHERE (profile_embedding <=> query_embedding) < 0.8
ORDER BY similarity_score
LIMIT 20;

- PostgreSQL Core: Robust database foundation for RAG applications
- Vector Search Engine: pgvector integration for semantic similarity
- Query Interface: Rich database query capabilities and analytics
- Data Synchronization: Complete Farcaster data sync from snapchain
- RAG Application Layer: Ready-to-use foundation for building RAG apps
- SyncService: Orchestrates the synchronization process
- ShardProcessor: Processes individual shard chunks
- SnapchainClient: Communicates with snapchain gRPC service
- StateManager: Manages sync state persistence
- SyncLockFile: Tracks running sync processes
- PID Management: Prevents concurrent sync operations
- Progress Tracking: Real-time sync progress monitoring
- Graceful Shutdown: Clean process termination
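Graceful shutdown in a tokio-based sync loop typically looks like the wiring below; this is an illustrative sketch, not SnapRAG's actual shutdown path.

use tokio::sync::watch;

async fn run_sync_loop(mut shutdown: watch::Receiver<bool>) {
    loop {
        tokio::select! {
            _ = shutdown.changed() => {
                // Flush state, release the lock file, then exit cleanly.
                println!("shutdown requested, finishing current chunk...");
                break;
            }
            _ = tokio::time::sleep(std::time::Duration::from_millis(100)) => {
                // Process the next shard chunk here.
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = watch::channel(false);
    let worker = tokio::spawn(run_sync_loop(rx));
    tokio::signal::ctrl_c().await.expect("ctrl-c handler");
    tx.send(true).ok();
    worker.await.ok();
}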
- SQLx Integration: Async PostgreSQL operations
- Migration System: Schema versioning and updates
- Connection Pooling: Efficient database connections
- Vector Support: pgvector integration for semantic search
- Query Engine: Advanced database query capabilities
# Run all tests
cargo test
# Run strict tests (recommended for development)
make test-strict
# Run quick tests (unit tests only)
make test-quick
# Run integration tests only
make test-integration
# Run specific test categories
cargo test integration_sync_test
cargo test grpc_shard_chunks_test
cargo test database_tests
# Run with verbose output
cargo test -- --nocapture

SnapRAG includes a comprehensive strict testing setup that ensures high code quality:
- Smart Warning Handling: Automatically distinguishes between generated code and hand-written code warnings
- Timeout Protection: Prevents tests from hanging indefinitely
- Comprehensive Validation: Tests strict configuration functionality
- Intelligent Error Detection: Differentiates between actual test failures and generated code warnings
# Run strict tests with intelligent warning handling
./scripts/run_strict_tests.sh
# Or use the Makefile target
make test-strict

- Integration Tests: End-to-end CLI functionality testing
- gRPC Tests: Real snapchain service interaction tests
- Database Tests: Database operations and schema tests
- Unit Tests: Individual component testing
- Strict Validation Tests: Test configuration and warning handling
Supported user data types:

- Pfp: Profile picture
- Display: Display name
- Bio: Bio/description
- Url: Website URL
- Username: Username
- Location: Location
- Twitter: Twitter username
- Github: GitHub username
- Banner: Banner image
- PrimaryAddressEthereum: Ethereum address
- PrimaryAddressSolana: Solana address
- ProfileToken: Profile token (CAIP-19)

Username proof types:

- Fname: Farcaster name
- EnsL1: ENS L1
- Basename: Basename
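For orientation, these data types map naturally onto a Rust enum like the illustrative one below. The variant set mirrors the lists above, but discriminant values are deliberately omitted; consult the generated protocol types in src/generated/ for the real wire-format numbers.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UserDataType {
    Pfp,
    Display,
    Bio,
    Url,
    Username,
    Location,
    Twitter,
    Github,
    Banner,
    PrimaryAddressEthereum,
    PrimaryAddressSolana,
    ProfileToken,
}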
Create a config.toml file in your project root (copy from config.example.toml):
# Database Configuration
[database]
url = "postgresql://username:password@your-db-host:5432/your-database"
max_connections = 20
min_connections = 5
connection_timeout = 30
# Snapchain Configuration
[snapchain]
http_endpoint = "http://your-snapchain-host:8080"
grpc_endpoint = "your-snapchain-host:8080"
# Logging Configuration
[logging]
level = "info"
backtrace = true
# Embeddings Configuration
[embeddings]
dimension = 1536
model = "text-embedding-ada-002"
# Performance Configuration
[performance]
enable_vector_indexes = true
vector_index_lists = 100

- No Data Loss: All historical data is preserved
- Efficient Queries: Current state is optimized for fast access
- Complete Audit Trail: Every change is tracked with timestamps and message hashes
- Vector Support: Built-in support for semantic search and RAG applications
- Snapshot-based History: Complete profile snapshots at each change point
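If you read config.toml from your own code rather than through AppConfig::load(), deserializing the [database] section could look like this hedged sketch using serde and the toml crate; SnapRAG's real AppConfig may differ.

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct DatabaseConfig {
    url: String,
    max_connections: u32,
    min_connections: u32,
    connection_timeout: u64,
}

#[derive(Debug, Deserialize)]
struct Config {
    database: DatabaseConfig,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("config.toml")?;
    let config: Config = toml::from_str(&raw)?;
    println!("connecting to {}", config.database.url);
    Ok(())
}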
# Check if your remote database is accessible
psql -h your-db-host -U your-username -d your-database -c "SELECT 1;"
# Check network connectivity
ping your-db-host
# Verify configuration file exists and is valid
ls -la config.toml
cargo run --bin migrate # This will show detailed connection info

# Connect to your remote database and enable required extensions:
psql -h your-db-host -U your-username -d your-database -c "CREATE EXTENSION IF NOT EXISTS vector;"
psql -h your-db-host -U your-username -d your-database -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"
# If extensions are not installed on your remote database, contact your database administrator
# or install them on your local development environment:
sudo apt-get install postgresql-15-pgvector # Ubuntu/Debian (pg_trgm ships with PostgreSQL's contrib modules)
brew install pgvector # macOS (pg_trgm is included in PostgreSQL contrib)

# Check if sync is running
cargo run sync status
# Stop any running sync
cargo run sync stop
# Reset and start fresh
cargo run reset --force
cargo run sync start

# Check snapchain endpoint connectivity
curl http://your-snapchain-host:8080/v1/info
# Verify gRPC endpoint
telnet your-snapchain-host 8080

Add to your postgresql.conf:
# Memory settings
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
# Vector-specific settings
max_connections = 200
shared_preload_libraries = 'vector'
# Index settings
maintenance_work_mem = 64MB
// In your application code (sqlx configures pools via PgPoolOptions)
use std::time::Duration;
use sqlx::postgres::PgPoolOptions;

let pool = PgPoolOptions::new()
    .max_connections(20)
    .min_connections(5)
    .acquire_timeout(Duration::from_secs(30))
    .connect(&database_url)
    .await?;

Key dependencies:

- sqlx: Async PostgreSQL driver
- serde: Serialization/deserialization
- chrono: Date/time handling
- uuid: UUID generation
- pgvector: Vector similarity search
- tokio: Async runtime
- anyhow: Error handling
- thiserror: Custom error types
- tonic: gRPC client
- prost: Protocol Buffers
- reqwest: HTTP client
- libc: System calls
This project is licensed under the GPLv3 License - see the LICENSE file for details.
SnapRAG follows strict development standards to ensure high code quality:
- Strict Testing: All tests must pass with zero warnings (except generated code)
- Code Formatting: Automatic formatting with rustfmt
- Linting: Comprehensive clippy checks with strict settings
- Documentation: All public APIs must be documented
# Set up development environment
make check-config # Verify configuration
make migrate # Set up database
# Development workflow
make test-strict # Run strict tests (recommended)
make check # Run clippy and format checks
make fix # Auto-fix formatting and clippy issues
make docs # Generate documentation
# Before committing
make test-strict && make check && make docsSnapRAG includes comprehensive Cursor IDE rules for enhanced development experience:
- Project-specific rules: Tailored for Farcaster data synchronization
- CI/CD guidelines: Automated testing and deployment rules
- Rust standards: Best practices for Rust development
- Notification system: Task completion notifications
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes following the development workflow
- Add tests for new functionality
- Run the strict test suite (make test-strict)
- Ensure all checks pass (make check)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Submit a pull request
- Ensure all tests pass with make test-strict
- Follow the existing code style and formatting
- Add documentation for new public APIs
- Include tests for new functionality
- Update README if adding new features or changing behavior
Why formatting issues occur:
- Generated code (src/generated/*.rs) is auto-generated during build and may need formatting
- Manual edits without formatting before commit
- Different Rust versions may format code differently
How to prevent:
- Pre-commit hook automatically formats code before commit (already installed)
- Run cargo fmt --all after building (generated code changes)
- Run make fix before committing
- Use cargo fmt --all -- --check to verify formatting
If CI fails on formatting:
cargo fmt --all
git add .
git commit -m "chore: fix formatting"
git push

For questions, issues, or contributions, please open an issue on the GitHub repository.
- Documentation: Check this README and inline code documentation
- Bug Reports: Use GitHub issues with detailed reproduction steps
- Feature Requests: Open a GitHub issue with use case description
- Discussions: Use GitHub Discussions for questions and ideas