Architecture Migration: Support both Cloudflare Serverless and Self-hosted Options

## Problem Statement

The current architecture has limitations with Supabase's worker constraints:
- **Throughput limits**: Supabase has limited throughput for high-volume mining
- **Worker duration**: Edge Functions have execution time limits (not suitable for long-running mining)
- **Egress costs**: High data transfer costs from Supabase
- **Redis dependency**: Currently relies on Redis for queue management

### Current Architecture Issues
- Mining processes run on Express backend with Redis as message queue
- All data flows through Supabase PostgreSQL (including temporary mining data)
- Limited scalability for concurrent mining sessions
- No separation between permanent data and temporary mining data

---

## Proposed Solution: Two Independent Modes

**Goal**: Support two mutually exclusive deployment modes:
- **Cloudflare** - Production/SaaS deployment (leadminer.io)
- **PostgreSQL** - Dev/Local/Self-hosted deployment

### Mode A: Cloudflare Serverless (Production - leadminer.io)
```
┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│   IMAP       │───▶│ Email Fetcher │───▶│  Cloudflare R2   │
│   Server     │    │  (Bun/Node)   │    │  (Raw emails)    │
└──────────────┘    └──────┬───────┘    └────────┬─────────┘
                           │                      │
                           ▼                      ▼
                  ┌────────────────┐    ┌──────────────────┐
                  │ Cloudflare     │    │  D1 (SQL)        │
                  │ Queues        │───▶│  (Temp mining    │
                  │ (Work Queue)  │    │   tables)        │
                  └────────┬───────┘    └────────┬─────────┘
                           │                      │
                           ▼                      │
                  ┌────────────────┐               │
                  │ Cloudflare     │───────────────┘
                  │ Workers        │    Write refined contacts
                  │ (Processors)   │
                  └────────┬───────┘
                           │
                           ▼
                  ┌────────────────┐
                  │  Supabase      │
                  │  PostgreSQL   │
                  │  (Permanent)  │
                  └────────────────┘
```

**Components:**
| Component | Solution | Cost |
|-----------|----------|------|
| Mining Workers | Cloudflare Workers | $5-50/mo |
| Work Queue | Cloudflare Queues | $5-20/mo |
| Raw Email Storage | Cloudflare R2 | $5-15/mo |
| Temp Tables | Cloudflare D1 | $5-15/mo |
| Permanent Data | Supabase PostgreSQL | $25/mo |

**Total: ~$45-125/mo**

### Mode B: PostgreSQL (Dev/Local/Self-Hosted)
```
┌──────────────┐    ┌──────────────┐    ┌──────────────────┐
│   IMAP       │───▶│ Email Fetcher │───▶│  PostgreSQL      │
│   Server     │    │  (Bun/Node)   │    │  (Temp mining   │
└──────────────┘    └──────┬───────┘    │   separate DB)   │
                           │              └────────┬─────────┘
                           ▼                       │
                  ┌────────────────┐               │
                  │ Redis Streams  │───────────────┘
                  │ (Message Queue)│    Write refined contacts
                  └────────┬───────┘
                           │
                           ▼
                  ┌────────────────┐
                  │  PostgreSQL    │
                  │  (Main DB)     │
                  └────────────────┘
```

**Components:**
| Component | Solution | Cost |
|-----------|----------|------|
| Mining Workers | Express + Bull/Redis | Included |
| Work Queue | Redis Streams | $0 (self-hosted) |
| Temp Tables | Separate PostgreSQL DB | $0 (self-hosted) |
| Permanent Data | PostgreSQL (main DB) | Included |

**Total: ~$20-50/mo (infrastructure only)**

---

## Data Separation Strategy

### What Goes Where (Both Modes)

| Data Type | Mining Temp | Permanent | Cloudflare | PostgreSQL |
|-----------|------------|-----------|------------|------------|
| Raw emails | Yes | No | R2 | File storage |
| messages table | Yes | No | D1 | Temp DB |
| pointsofcontact | Yes | No | D1 | Temp DB |
| persons (during mining) | Yes | No | D1 | Temp DB |
| refined_persons | No | Yes | Supabase | Main DB |
| users, profiles | No | Yes | Supabase | Main DB |
| organizations | No | Yes | Supabase | Main DB |
| campaigns | No | Yes | Supabase | Main DB |

### Benefits of Separation
1. **Supabase load reduced** - Only refined contacts hit permanent DB
2. **Mining scales independently** - External storage handles GB bursts
3. **Cost optimization** - Reduced egress on Supabase
4. **Performance** - Workers process faster without DB latency

---

## CI/CD: Cloudflare Deployment (leadminer.io)

### Required GitHub Actions

```yaml
# .github/workflows/cloudflare-deploy.yml
name: Deploy to Cloudflare

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  deploy-workers:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy Mining Workers
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          command: deploy --env production

  deploy-queues:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy Queues
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          command: queues deploy

  deploy-d1:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy D1 Database
        run: |
          wrangler d1 execute mining-temp --local=${{ !github.event.pull_request }} --env production

  deploy-r2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure R2
        run: wrangler r2 bucket create mining-emails
```

### Required Secrets (GitHub Repository)

| Secret | Description |
|--------|-------------|
| `CLOUDFLARE_API_TOKEN` | Cloudflare API token with Workers edit permissions |
| `CLOUDFLARE_ACCOUNT_ID` | Cloudflare account ID |
| `CLOUDFLARE_R2_ACCESS_KEY_ID` | R2 access key |
| `CLOUDFLARE_R2_SECRET_ACCESS_KEY` | R2 secret key |
| `SUPABASE_SERVICE_KEY` | Supabase service role key |

### Environment Files per Environment

| Environment | Workers | D1 | R2 | Queues |
|-------------|---------|----|----|----|
| Production (leadminer.io) | mining-workers-prod | mining-temp-prod | mining-emails-prod | mining-queue-prod |
| Staging | mining-workers-staging | mining-temp-staging | mining-emails-staging | mining-queue-staging |

### Deployment Pipeline

```
┌─────────────────────────────────────────────────────────────┐
│                    CI/CD Pipeline                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Push to main ──▶ Build ──▶ Test ──▶ Deploy Workers        │
│       │              │         │         │                  │
│       │              │         │         ▼                  │
│       │              │         │    Deploy Queues         │
│       │              │         │         │                  │
│       │              │         │         ▼                  │
│       │              │         │    Deploy D1              │
│       │              │         │         │                  │
│       │              │         │         ▼                  │
│       │              │         │    Deploy R2              │
│       │              │         │         │                  │
│       ▼              ▼         ▼         ▼                  │
│  Tag Release    Lint     Unit      Deploy Complete        │
│                  Tests    Tests                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

---

## Implementation: Simple Env Toggle

No feature flags - just environment configuration selecting the mode.

```typescript
// Simple toggle - one or the other
STORAGE_MODE=cloudflare  // or: postgresql

// Cloudflare mode bindings
CLOUDFLARE_R2_BUCKET=
CLOUDFLARE_D1_DATABASE=
CLOUDFLARE_QUEUES=

// PostgreSQL mode (existing)
POSTGRES_CONNECTION_STRING=
REDIS_URL=
```

---

## Migration Phases

### Phase 1: Data Schema Separation
- [ ] Create D1 schema for temp mining tables (parallel to PostgreSQL existing)
- [ ] Add mining_id to track data lifecycle

### Phase 2: Storage Abstraction Layer
- [ ] Create storage interface with Cloudflare/PostgreSQL adapters
- [ ] Implement both adapter implementations
- [ ] Add environment-based mode selection

### Phase 3: Cloudflare Mode Implementation
- [ ] Set up Cloudflare Workers for mining
- [ ] Configure R2 + D1 bindings
- [ ] Set up Cloudflare Queues
- [ ] Connect to Supabase for permanent data

### Phase 4: PostgreSQL Mode Implementation
- [ ] Verify existing PostgreSQL path works unchanged
- [ ] Test Redis queue integration

### Phase 5: CI/CD Setup
- [ ] Create GitHub Actions workflow for Cloudflare deployment
- [ ] Configure secrets in GitHub repository
- [ ] Set up environment promotion (staging → production)
- [ ] Test deployment pipeline

### Phase 6: Documentation
- [ ] Document deployment options
- [ ] Add environment variable reference
- [ ] Update README

---

## Key Decisions (Resolved)

| Question | Answer |
|----------|--------|
| Which path to prioritize? | Cloudflare (primary) - PostgreSQL for dev/local/selfhost only |
| Timeline? | As short as possible |
| Feature parity? | Both modes support identical features |
| Data migration? | None - separate schemas from start |
| Rollback strategy? | Not needed - two independent modes |
| Feature flags? | None - simple env-based toggle |
| CI/CD? | GitHub Actions → Cloudflare Workers (leadminer.io) |

---

## Related Files

- `backend/src/workers/` - Current worker implementations
- `backend/src/db/pg/` - Database interfaces
- `backend/src/utils/redis.ts` - Redis utilities
- `supabase/migrations/` - Schema definitions
- `.github/workflows/` - CI/CD workflows

---

## Next Steps

1. Approve this plan
2. Start Phase 1: Data Schema Separation
3. Proceed iteratively through each phase

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture Migration: Support both Cloudflare Serverless and Self-hosted Options #2753

Problem Statement

Current Architecture Issues

Proposed Solution: Two Independent Modes

Mode A: Cloudflare Serverless (Production - leadminer.io)

Mode B: PostgreSQL (Dev/Local/Self-Hosted)

Data Separation Strategy

What Goes Where (Both Modes)

Benefits of Separation

CI/CD: Cloudflare Deployment (leadminer.io)

Required GitHub Actions

Required Secrets (GitHub Repository)

Environment Files per Environment

Deployment Pipeline

Implementation: Simple Env Toggle

Migration Phases

Phase 1: Data Schema Separation

Phase 2: Storage Abstraction Layer

Phase 3: Cloudflare Mode Implementation

Phase 4: PostgreSQL Mode Implementation

Phase 5: CI/CD Setup

Phase 6: Documentation

Key Decisions (Resolved)

Related Files

Next Steps

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Component	Solution	Cost
Mining Workers	Cloudflare Workers	$5-50/mo
Work Queue	Cloudflare Queues	$5-20/mo
Raw Email Storage	Cloudflare R2	$5-15/mo
Temp Tables	Cloudflare D1	$5-15/mo
Permanent Data	Supabase PostgreSQL	$25/mo

Component	Solution	Cost
Mining Workers	Express + Bull/Redis	Included
Work Queue	Redis Streams	$0 (self-hosted)
Temp Tables	Separate PostgreSQL DB	$0 (self-hosted)
Permanent Data	PostgreSQL (main DB)	Included

Data Type	Mining Temp	Permanent	Cloudflare	PostgreSQL
Raw emails	Yes	No	R2	File storage
messages table	Yes	No	D1	Temp DB
pointsofcontact	Yes	No	D1	Temp DB
persons (during mining)	Yes	No	D1	Temp DB
refined_persons	No	Yes	Supabase	Main DB
users, profiles	No	Yes	Supabase	Main DB
organizations	No	Yes	Supabase	Main DB
campaigns	No	Yes	Supabase	Main DB

Secret	Description
`CLOUDFLARE_API_TOKEN`	Cloudflare API token with Workers edit permissions
`CLOUDFLARE_ACCOUNT_ID`	Cloudflare account ID
`CLOUDFLARE_R2_ACCESS_KEY_ID`	R2 access key
`CLOUDFLARE_R2_SECRET_ACCESS_KEY`	R2 secret key
`SUPABASE_SERVICE_KEY`	Supabase service role key

Environment	Workers	D1	R2	Queues
Production (leadminer.io)	mining-workers-prod	mining-temp-prod	mining-emails-prod	mining-queue-prod
Staging	mining-workers-staging	mining-temp-staging	mining-emails-staging	mining-queue-staging

Question	Answer
Which path to prioritize?	Cloudflare (primary) - PostgreSQL for dev/local/selfhost only
Timeline?	As short as possible
Feature parity?	Both modes support identical features
Data migration?	None - separate schemas from start
Rollback strategy?	Not needed - two independent modes
Feature flags?	None - simple env-based toggle
CI/CD?	GitHub Actions → Cloudflare Workers (leadminer.io)

Architecture Migration: Support both Cloudflare Serverless and Self-hosted Options #2753

Description

Problem Statement

Current Architecture Issues

Proposed Solution: Two Independent Modes

Mode A: Cloudflare Serverless (Production - leadminer.io)

Mode B: PostgreSQL (Dev/Local/Self-Hosted)

Data Separation Strategy

What Goes Where (Both Modes)

Benefits of Separation

CI/CD: Cloudflare Deployment (leadminer.io)

Required GitHub Actions

Required Secrets (GitHub Repository)

Environment Files per Environment

Deployment Pipeline

Implementation: Simple Env Toggle

Migration Phases

Phase 1: Data Schema Separation

Phase 2: Storage Abstraction Layer

Phase 3: Cloudflare Mode Implementation

Phase 4: PostgreSQL Mode Implementation

Phase 5: CI/CD Setup

Phase 6: Documentation

Key Decisions (Resolved)

Related Files

Next Steps

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions