Skip to content

Architecture Migration: Support both Cloudflare Serverless and Self-hosted Options #2753

@baderdean

Description

@baderdean

Problem Statement

The current architecture has limitations with Supabase's worker constraints:

  • Throughput limits: Supabase has limited throughput for high-volume mining
  • Worker duration: Edge Functions have execution time limits (not suitable for long-running mining)
  • Egress costs: High data transfer costs from Supabase
  • Redis dependency: Currently relies on Redis for queue management

Current Architecture Issues

  • Mining processes run on Express backend with Redis as message queue
  • All data flows through Supabase PostgreSQL (including temporary mining data)
  • Limited scalability for concurrent mining sessions
  • No separation between permanent data and temporary mining data

Proposed Solution: Two Independent Modes

Goal: Support two mutually exclusive deployment modes:

  • Cloudflare - Production/SaaS deployment (leadminer.io)
  • PostgreSQL - Dev/Local/Self-hosted deployment

Mode A: Cloudflare Serverless (Production - leadminer.io)

┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│   IMAP       │───▶│ Email Fetcher │───▶│  Cloudflare R2   │
│   Server     │    │  (Bun/Node)   │    │  (Raw emails)    │
└──────────────┘    └──────┬───────┘    └────────┬─────────┘
                           │                      │
                           ▼                      ▼
                  ┌────────────────┐    ┌──────────────────┐
                  │ Cloudflare     │    │  D1 (SQL)        │
                  │ Queues        │───▶│  (Temp mining    │
                  │ (Work Queue)  │    │   tables)        │
                  └────────┬───────┘    └────────┬─────────┘
                           │                      │
                           ▼                      │
                  ┌────────────────┐               │
                  │ Cloudflare     │───────────────┘
                  │ Workers        │    Write refined contacts
                  │ (Processors)   │
                  └────────┬───────┘
                           │
                           ▼
                  ┌────────────────┐
                  │  Supabase      │
                  │  PostgreSQL   │
                  │  (Permanent)  │
                  └────────────────┘

Components:

Component Solution Cost
Mining Workers Cloudflare Workers $5-50/mo
Work Queue Cloudflare Queues $5-20/mo
Raw Email Storage Cloudflare R2 $5-15/mo
Temp Tables Cloudflare D1 $5-15/mo
Permanent Data Supabase PostgreSQL $25/mo

Total: ~$45-125/mo

Mode B: PostgreSQL (Dev/Local/Self-Hosted)

┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│   IMAP       │───▶│ Email Fetcher │───▶│  PostgreSQL      │
│   Server     │    │  (Bun/Node)   │    │  (Temp mining   │
└──────────────┘    └──────┬───────┘    │   separate DB)   │
                           │              └────────┬─────────┘
                           ▼                       │
                  ┌────────────────┐               │
                  │ Redis Streams  │───────────────┘
                  │ (Message Queue)│    Write refined contacts
                  └────────┬───────┘
                           │
                           ▼
                  ┌────────────────┐
                  │  PostgreSQL    │
                  │  (Main DB)     │
                  └────────────────┘

Components:

Component Solution Cost
Mining Workers Express + Bull/Redis Included
Work Queue Redis Streams $0 (self-hosted)
Temp Tables Separate PostgreSQL DB $0 (self-hosted)
Permanent Data PostgreSQL (main DB) Included

Total: ~$20-50/mo (infrastructure only)


Data Separation Strategy

What Goes Where (Both Modes)

Data Type Mining Temp Permanent Cloudflare PostgreSQL
Raw emails Yes No R2 File storage
messages table Yes No D1 Temp DB
pointsofcontact Yes No D1 Temp DB
persons (during mining) Yes No D1 Temp DB
refined_persons No Yes Supabase Main DB
users, profiles No Yes Supabase Main DB
organizations No Yes Supabase Main DB
campaigns No Yes Supabase Main DB

Benefits of Separation

  1. Supabase load reduced - Only refined contacts hit permanent DB
  2. Mining scales independently - External storage handles GB bursts
  3. Cost optimization - Reduced egress on Supabase
  4. Performance - Workers process faster without DB latency

CI/CD: Cloudflare Deployment (leadminer.io)

Required GitHub Actions

# .github/workflows/cloudflare-deploy.yml
name: Deploy to Cloudflare

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  deploy-workers:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy Mining Workers
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          command: deploy --env production

  deploy-queues:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy Queues
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          command: queues deploy

  deploy-d1:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy D1 Database
        run: |
          wrangler d1 execute mining-temp --local=${{ !github.event.pull_request }} --env production

  deploy-r2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure R2
        run: wrangler r2 bucket create mining-emails

Required Secrets (GitHub Repository)

Secret Description
CLOUDFLARE_API_TOKEN Cloudflare API token with Workers edit permissions
CLOUDFLARE_ACCOUNT_ID Cloudflare account ID
CLOUDFLARE_R2_ACCESS_KEY_ID R2 access key
CLOUDFLARE_R2_SECRET_ACCESS_KEY R2 secret key
SUPABASE_SERVICE_KEY Supabase service role key

Environment Files per Environment

Environment Workers D1 R2 Queues
Production (leadminer.io) mining-workers-prod mining-temp-prod mining-emails-prod mining-queue-prod
Staging mining-workers-staging mining-temp-staging mining-emails-staging mining-queue-staging

Deployment Pipeline

┌─────────────────────────────────────────────────────────────┐
│                    CI/CD Pipeline                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Push to main ──▶ Build ──▶ Test ──▶ Deploy Workers        │
│       │              │         │         │                  │
│       │              │         │         ▼                  │
│       │              │         │    Deploy Queues         │
│       │              │         │         │                  │
│       │              │         │         ▼                  │
│       │              │         │    Deploy D1              │
│       │              │         │         │                  │
│       │              │         │         ▼                  │
│       │              │         │    Deploy R2              │
│       │              │         │         │                  │
│       ▼              ▼         ▼         ▼                  │
│  Tag Release    Lint     Unit      Deploy Complete        │
│                  Tests    Tests                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Implementation: Simple Env Toggle

No feature flags - just environment configuration selecting the mode.

// Simple toggle - one or the other
STORAGE_MODE=cloudflare  // or: postgresql

// Cloudflare mode bindings
CLOUDFLARE_R2_BUCKET=
CLOUDFLARE_D1_DATABASE=
CLOUDFLARE_QUEUES=

// PostgreSQL mode (existing)
POSTGRES_CONNECTION_STRING=
REDIS_URL=

Migration Phases

Phase 1: Data Schema Separation

  • Create D1 schema for temp mining tables (parallel to PostgreSQL existing)
  • Add mining_id to track data lifecycle

Phase 2: Storage Abstraction Layer

  • Create storage interface with Cloudflare/PostgreSQL adapters
  • Implement both adapter implementations
  • Add environment-based mode selection

Phase 3: Cloudflare Mode Implementation

  • Set up Cloudflare Workers for mining
  • Configure R2 + D1 bindings
  • Set up Cloudflare Queues
  • Connect to Supabase for permanent data

Phase 4: PostgreSQL Mode Implementation

  • Verify existing PostgreSQL path works unchanged
  • Test Redis queue integration

Phase 5: CI/CD Setup

  • Create GitHub Actions workflow for Cloudflare deployment
  • Configure secrets in GitHub repository
  • Set up environment promotion (staging → production)
  • Test deployment pipeline

Phase 6: Documentation

  • Document deployment options
  • Add environment variable reference
  • Update README

Key Decisions (Resolved)

Question Answer
Which path to prioritize? Cloudflare (primary) - PostgreSQL for dev/local/selfhost only
Timeline? As short as possible
Feature parity? Both modes support identical features
Data migration? None - separate schemas from start
Rollback strategy? Not needed - two independent modes
Feature flags? None - simple env-based toggle
CI/CD? GitHub Actions → Cloudflare Workers (leadminer.io)

Related Files

  • backend/src/workers/ - Current worker implementations
  • backend/src/db/pg/ - Database interfaces
  • backend/src/utils/redis.ts - Redis utilities
  • supabase/migrations/ - Schema definitions
  • .github/workflows/ - CI/CD workflows

Next Steps

  1. Approve this plan
  2. Start Phase 1: Data Schema Separation
  3. Proceed iteratively through each phase

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions