ruwadgroup/sabit

Sabit

Verified, not guessed.

A self-hostable email-verification engine in Rust. A layered, free-before-expensive pipeline (syntax → IDN/EAI → typo → gibberish → disposable/role → DNS/MX/null-MX → auth records → SMTP probe → catch-all) emits a 0–100 confidence score plus a coarse safe / risky / invalid / unknown status. The engine runs inside the sabitd server daemon. Reach it three ways: the sabit CLI, an HTTP API, and an MCP server for AI agents, all returning the same result.

License: Apache-2.0 · CI · Core: Rust · MCP · Conventional Commits

Getting started · Usage · MCP · CLI · API · Architecture · Docs

Overview

Email verification is sold as a binary checkbox ("is this address valid?"), and that framing is why every vendor's marketed 98–99% accuracy collapses in independent tests. Hunter.io's 2026 benchmark of 15 verifiers on ~3,000 real business addresses found no tool scored above 75%; the top three were Hunter (70.00%), Clearout (68.37%), and Kickbox (67.53%). The hard part isn't syntax. Catch-all domains make per-mailbox SMTP probing meaningless (15–28% of B2B domains are catch-all), Yahoo returns 250 for every address as an anti-harvesting measure, and probing from a dirty IP gets you blocklisted.

Sabit treats verification as what it is: a multi-signal probabilistic risk-scoring pipeline, not a yes/no checker. Fast, free, local checks run ahead of the slow, fragile, network-bound ones; every signal is weighed into one score; and risky and unknown are first-class outcomes, not failures.

Use it however you work: the sabit CLI for scripts and one-off checks, the HTTP API for your backend or signup flow, or the MCP server so an AI agent can decide, with evidence, whether an address is safe to email. Every interface talks to the same sabitd engine and returns the same result object.

Features

  • Layered, free-before-expensive pipeline. Every check that can run offline runs first; the network is the last resort, and a null-MX or no-MX+no-A domain short-circuits to invalid before any probe.
  • 0–100 confidence score + coarse status. safe / risky / invalid / unknown, modeled on Reacher's taxonomy so existing tooling maps cleanly. You set the threshold per use case: strict for cold outreach, lenient for signup forms.
  • Three interfaces, one result. A CLI, an HTTP API, and an MCP server call the same engine and return the same object. The MCP tools (verify, submit_batch, task_status, auth_check, parse_bounce, doctor) are honest by design: a tool never returns safe for an address the protocol couldn't confirm.
  • Never sends mail. The prober opens TCP to the MX, runs EHLO → MAIL FROM → RCPT TO → QUIT, and reads the reply code. No DATA is transmitted, so no message is delivered.
  • Catch-all aware. Repeated randomized probes detect accept-all domains and downgrade the result to risky instead of reporting a false safe.
  • Disposable & role detection, kept fresh. A community blocklist (7k+ domains) plus MX-fingerprint clustering and role/free-webmail classification. The daemon refreshes the volatile lists from source on a schedule (or sabit lists update).
  • Deliverability built in. SPF/DKIM/DMARC/BIMI reading, DSN/NDR bounce parsing (RFC 3464/3463), and ARF feedback-loop parsing (RFC 5965).
  • Self-hosted and private. Runs on your own unblacklisted VPS; nothing about what you verify leaves your server; no telemetry.
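
The catch-all bullet above can be sketched roughly as follows. This is an illustration, not Sabit's implementation: `rcpt_probe` is a hypothetical callable standing in for the real SMTP prober (it returns the RCPT TO reply code), the probe count and local-part length are assumptions, and treating every 5xx as invalid is a simplification.

```python
import secrets
import string
from typing import Callable

def random_local_part(length: int = 20) -> str:
    """Build a random mailbox name that is very unlikely to exist."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def is_catch_all(domain: str, rcpt_probe: Callable[[str], int], probes: int = 3) -> bool:
    """True only when every randomized address is accepted with a 2xx reply."""
    for _ in range(probes):
        code = rcpt_probe(f"{random_local_part()}@{domain}")
        if not 200 <= code < 300:
            return False  # the server rejected or deferred a made-up mailbox
    return True

def classify(address: str, rcpt_probe: Callable[[str], int], probes: int = 3) -> str:
    """Map RCPT TO reply codes to a coarse status (simplified)."""
    code = rcpt_probe(address)
    if 500 <= code < 600:
        return "invalid"   # permanent rejection
    if not 200 <= code < 300:
        return "unknown"   # e.g. a 4xx deferral such as greylisting
    domain = address.rsplit("@", 1)[1]
    # A 2xx only means something if the server also rejects made-up mailboxes.
    return "risky" if is_catch_all(domain, rcpt_probe, probes) else "safe"
```

A server that answers 250 to everything, as the Overview describes for Yahoo, ends up as risky here rather than safe, because the randomized probes are accepted too.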

Why Sabit

Most of the field is closed SaaS that hides its method and oversells its accuracy. The strongest open prior art, Reacher / check-if-email-exists, proved the modern technique: SMTP for domains where it still works, headless provider-specific checks where it doesn't, and per-domain rule overrides. Sabit builds on that, adds an explicit weighted scoring model and a deliverability layer, and exposes it through a CLI, an HTTP API, and an MCP server so it fits a script, a backend, or an AI agent.

| Principle | How Sabit applies it |
| --- | --- |
| Cheapest signal first | Local layers (syntax/typo/gibberish/disposable) resolve obvious input before any network call |
| Probabilistic, not binary | Every signal is weighted into a 0–100 score; risky/unknown exist for the gray zone |
| Honest about uncertainty | Catch-all, anti-harvesting 250s, and greylisting are reported as such, never hidden as valid |
| Built to integrate | One result contract over a CLI, an HTTP API, and an MCP server: evidence and confidence |
| Responsible probing by default | FCrDNS-checked HELO, real MAIL FROM, per-domain caps, backoff, caching, no DATA |
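
To make "every signal is weighted into a 0–100 score" concrete, here is a minimal sketch of one way such a model could look. The signal names, the weights, the neutral base of 50, and the `safe_at` cut-off are all hypothetical; Sabit's actual scoring model is described in its own docs.

```python
# Hypothetical weights: each signal moves the score away from a neutral base of 50.
WEIGHTS = {
    "smtp_deliverable": 40,
    "has_auth_records": 10,
    "catch_all": -25,
    "role_account": -10,
    "disposable": -40,
    "gibberish": -20,  # multiplied by a 0..1 gibberish probability
}

def score(signals: dict) -> int:
    """Combine boolean or 0..1 signals into a clamped 0-100 score."""
    total = 50.0
    for name, weight in WEIGHTS.items():
        total += weight * float(signals.get(name, 0))  # True counts as 1.0
    return max(0, min(100, round(total)))

def status(score_value: int, *, hard_fail: bool, confirmed: bool, safe_at: int = 80) -> str:
    """Collapse the score into a coarse status.

    hard_fail: a decisive failure such as bad syntax or a null-MX domain.
    confirmed: whether the network layers could actually reach a conclusion.
    """
    if hard_fail:
        return "invalid"
    if not confirmed:
        return "unknown"
    return "safe" if score_value >= safe_at else "risky"

print(score({"smtp_deliverable": True, "catch_all": True}))  # 50 + 40 - 25 = 65
```

The point of the sketch is the shape: no single signal decides the outcome unless it is a hard failure, and an unconfirmed address is reported as unknown rather than guessed.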

What Sabit is not

  • Not a sender. Sabit verifies and scores; it never delivers mail. The deliverability layer parses bounces and reads auth records, but the only thing it sends a target MX is an envelope it abandons before DATA.
  • Not a guarantee. The only ground truth for deliverability and ownership is double opt-in. Everything else is inference; Sabit expresses confidence rather than faking certainty.
  • Not a list-buying or scraping tool. It has no address-generation or harvesting features; dictionary attacks and list scraping are the abuse it is designed against.
  • Not a hosted SaaS. It is software you run. There is no Sabit cloud, no per-check billing, and no telemetry.

Architecture

   CLIENTS (thin: no verification logic, just point at a sabitd over HTTP)
   ┌──────────────────────────────┐   ┌──────────────────────────────┐
   │ sabit-mcp  (MCP, for agents) │   │ sabit  (CLI: verify/batch/…) │
   └───────────────┬──────────────┘   └───────────────┬──────────────┘
                   └───────────────┬──────────────────┘
                          HTTPS (REST + task polling)
                                   │
   ════════════════════════════════▼═══════════════════════════════════  server boundary
   ┌──────────────────────────────────────────────────────────────────┐
   │  sabitd: the daemon (runs where probing is possible: port 25,     │
   │  clean IP, PTR/FCrDNS). HTTP API + task/job worker (sabit-server). │
   ├──────────────────────────────────────────────────────────────────┤
   │  sabit-core (Rust, the only place behavior lives)                  │
   │   Local   • Syntax (RFC 5321) • Typo • IDN→Punycode • Gibberish    │
   │           • Disposable + MX-fingerprint • Role / free classify     │
   │   Network • DNS: MX+A/AAAA, null-MX, SPF/DKIM/DMARC/BIMI (hickory) │
   │           • SMTP probe: RCPT-TO, no DATA, greylist retry           │
   │           • Catch-all detection • Per-provider routing             │
   │           • Bounce/DSN parser   • Feedback-loop ingester           │
   │   Scoring • Weighted 0–100 score → {safe|risky|invalid|unknown}    │
   └──────────────────────────────────────────────────────────────────┘

All verification behavior lives in sabit-core, which runs only inside sabitd, the server daemon on a host that can probe (outbound port 25, a clean IP, matching PTR/FCrDNS). The sabit CLI and sabit-mcp are thin: they translate transport and relay the result contract, so a laptop or an agent host needs nothing special. The core splits into a local half (pure, offline, deterministic; verify --local) and a network half (DNS + SMTP). Full reasoning is in ARCHITECTURE.md.
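
The free-before-expensive ordering described above can be sketched as a short-circuiting pipeline. This is an illustration under stated assumptions: `dns_has_mail_host` and `smtp_probe` are hypothetical injected callables, the syntax check is a toy rather than a full RFC 5321 parser, and returning unknown for a local-only run is a guess at reasonable behavior, not a documented one.

```python
from typing import Callable, List, Tuple

def syntax_ok(address: str) -> bool:
    """Toy syntax check: one '@', a non-empty local part, a dotted domain."""
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and "." in domain and " " not in address)

def verify(
    address: str,
    dns_has_mail_host: Callable[[str], bool],
    smtp_probe: Callable[[str], str],
    local_only: bool = False,
) -> Tuple[str, List[str]]:
    """Run layers cheapest-first and stop at the first decisive verdict.

    Returns (status, trace), where trace lists the layers that actually ran.
    """
    trace = ["syntax"]
    if not syntax_ok(address):
        return "invalid", trace          # decided offline, no network touched
    if local_only:
        return "unknown", trace          # offline layers cannot confirm a mailbox

    trace.append("dns")
    domain = address.rpartition("@")[2]
    if not dns_has_mail_host(domain):    # null-MX, or no MX and no A record
        return "invalid", trace          # short-circuit before any SMTP probe

    trace.append("smtp")
    return smtp_probe(address), trace    # the slow, fragile layer runs last
```

Because the network callables are injected, the local half stays pure and deterministic, which mirrors the split between the local and network halves of the core.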

The result contract

Every interface returns the same object. A verify of one address:

{
  "input": "jane.doe@example.com",
  "status": "safe",
  "score": 96,
  "syntax": {
    "valid": true,
    "normalized": "jane.doe@example.com",
    "domain": "example.com",
    "is_idn": false,
    "is_eai": false
  },
  "mx": { "accepts_mail": true, "null_mx": false, "records": ["aspmx.l.example.com"] },
  "smtp": {
    "can_connect": true,
    "is_deliverable": true,
    "is_catch_all": false,
    "has_full_inbox": false,
    "is_disabled": false,
    "verif_method": "smtp"
  },
  "misc": { "is_disposable": false, "is_role_account": false, "is_free": false },
  "auth": { "spf": true, "dkim": true, "dmarc": "p=reject", "bimi": false },
  "signals": { "gibberish": 0.02, "typo_suggestion": null, "greylisted": false },
  "took_ms": 812
}

status is the coarse verdict; score is the tunable number; signals and the per-layer blocks are the evidence. Field reference: the result object and How it works.
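
A consumer of this object might apply its own threshold per use case, as the Features section suggests. The sketch below uses only fields shown in the sample above; the use-case names and cut-off values in `THRESHOLDS` are hypothetical, and `should_send` is an illustrative helper, not part of any Sabit client.

```python
import json

# Hypothetical cut-offs: strict for cold outreach, lenient for signup forms.
THRESHOLDS = {"cold_outreach": 90, "signup_form": 50}

def should_send(result: dict, use_case: str) -> bool:
    """Decide from one result object whether to email the address."""
    if result["status"] == "invalid":
        return False                      # never override a hard invalid
    return result["score"] >= THRESHOLDS[use_case]

# A trimmed copy of the sample result above.
sample = json.loads("""
{
  "input": "jane.doe@example.com",
  "status": "safe",
  "score": 96,
  "smtp": { "is_deliverable": true, "is_catch_all": false }
}
""")

print(should_send(sample, "cold_outreach"))  # True: 96 >= 90
```

Keeping the threshold on the caller's side means the same stored result can be re-evaluated later under a stricter or looser policy without re-probing.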

Getting started

The installer pulls prebuilt binaries; you don't build anything.

1. Run the server on a probing-capable host

sabitd needs outbound port 25, a clean (unblacklisted) IP, and matching PTR/FCrDNS, so run it on a VPS, not a laptop.

curl -fsSL https://raw.githubusercontent.com/ruwadgroup/sabit/main/install.sh | sh -s -- daemon

This installs sabitd, a systemd unit, /etc/sabit/sabit.toml, and a sabit system user. Set your hostname and turn on auth before exposing it:

sudoedit /etc/sabit/sabit.toml
#   [smtp]   helo = "sabit.example.com"            # this host's FQDN, matching its PTR
#   [server] api_key_file = "/etc/sabit/keys.txt"  # one `key` or `key:label` per line
sudo systemctl enable --now sabitd
sabit doctor --server http://127.0.0.1:8080        # port 25 / resolver / PTR-FCrDNS ready?

sabitd serves plain HTTP on loopback; put a TLS-terminating reverse proxy (nginx/Caddy) in front before it faces the network. Details: Deployment.

2. Point a client at it

On your workstation, CI, or next to an agent:

curl -fsSL https://raw.githubusercontent.com/ruwadgroup/sabit/main/install.sh | sh -s -- cli mcp

Tell the client where the server is, in ~/.config/sabit/config.toml:

default = "prod"

[instances.prod]
url = "https://sabit.example.com"
token = "sk_live_your_token"

Then verify:

sabit verify jane.doe@example.com              # one address
sabit verify --local "asdf@gmial.con"          # offline layers only
sabit batch contacts.csv --out results.jsonl   # async task: submit, poll, fetch
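
Once a batch has finished, the output can be summarized with a few lines of Python. This sketch assumes each line of results.jsonl is one result object with the `status` field shown in the result contract; that layout is an assumption, so check the CLI reference before relying on it.

```python
import json
from collections import Counter
from typing import Iterable

def summarize(lines: Iterable[str]) -> Counter:
    """Count results per status from JSON Lines input, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["status"]] += 1
    return counts

# In-memory stand-in for the contents of results.jsonl.
example_lines = [
    '{"input": "jane.doe@example.com", "status": "safe", "score": 96}',
    '{"input": "info@example.org", "status": "risky", "score": 55}',
    "",
    '{"input": "asdf@gmial.con", "status": "invalid", "score": 3}',
]
print(summarize(example_lines))
```

With a real file, pass the open file object instead of the list, for example `summarize(open("results.jsonl", encoding="utf-8"))`.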

For an AI agent (Claude Code), point sabit-mcp at the same server via env:

SABIT_SERVER=https://sabit.example.com SABIT_TOKEN=sk_live_your_token \
  claude mcp add sabit -- sabit-mcp

To build from source instead, run cargo build --release, then sudo make install-server (on the server) or make install-client (on a client machine). More: Getting started.

Documentation

| Doc | What you'll find |
| --- | --- |
| Getting started | Install the server, configure a client, first verification |
| Usage | Everyday CLI usage: verify, batch tasks, auth, bounces |
| Use with AI agents | The MCP server: setup, tools, a typical agent flow |
| CLI reference | Every command and flag |
| HTTP API | Endpoints, bearer auth, the result object, errors |
| Configuration | Server (sabit.toml) and client config |
| Deployment | Running sabitd in production (port 25, systemd, TLS, scaling) |
| How it works | The scoring model, the pipeline, and honest uncertainty |
| Architecture | The deep design reference |

Start at docs/README.md.

Repository layout

sabit/
├── crates/
│   ├── sabit-core/     # the engine: local + network layers + scoring (server-side only)
│   ├── sabit-server/   # HTTP API + task/job worker (lib, used by sabitd)
│   ├── sabitd/         # the server daemon: runs the engine + API on a probing-capable host
│   ├── sabit-client/   # thin HTTP client + multi-instance config (shared by the clients)
│   ├── sabit-cli/      # the `sabit` binary, a thin client of a sabitd instance
│   └── sabit-mcp/      # the MCP server, a thin client that exposes sabitd to agents
├── packaging/          # systemd unit, sysusers/tmpfiles, default config, nfpm (.deb/.rpm)
├── data/               # disposable lists, MX fingerprints, rules.json, gibberish model
├── docs/               # everything in the table above
├── install.sh          # the one-line installer
└── .github/            # CI, release, security scanning, templates

Verification only happens in sabitd, which must run where SMTP probing is possible (outbound port 25, a clean IP, matching PTR/FCrDNS). The sabit CLI and sabit-mcp carry no verification logic; they point at one or more sabitd instances over HTTP and relay the result.

Roadmap

The engine (local + DNS + SMTP layers, catch-all detection, per-provider routing, scoring), the sabitd HTTP API with the async task system, bearer auth, the deliverability parsers (DSN, ARF), and the Linux packaging are in place. Still ahead: an SMTP-verdict cache, a broader per-provider headless module set (Outlook B2C / Yahoo), real STARTTLS upgrade, a Streamable-HTTP MCP transport, and shared/persistent task storage. Full plan: ROADMAP.md.

Responsible use

SMTP probing uses the same handshake spammers use to harvest addresses, and aggressive probing can violate a receiving provider's ToS and get your IP blocklisted. Sabit ships responsible defaults (FCrDNS-checked HELO, a real MAIL FROM, per-domain rate caps, jittered backoff, and never sending DATA), but the operator is responsible for lawful use. Verifying a list you already hold for a legitimate purpose is generally defensible; probing arbitrary harvested addresses is not. See Deployment. This is technical guidance, not legal advice.
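
Two of the defaults named above, per-domain rate caps and jittered backoff, can be sketched as follows. This is a generic illustration of the techniques, not Sabit's code: the class name, the limits, and the full-jitter variant of exponential backoff are all assumptions.

```python
import random
import time
from collections import defaultdict, deque

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0, rng=random) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))

class DomainLimiter:
    """Allow at most `max_probes` probes per domain in a sliding `window` of seconds."""

    def __init__(self, max_probes: int = 5, window: float = 60.0, clock=time.monotonic):
        self.max_probes = max_probes
        self.window = window
        self.clock = clock                     # injectable so tests can fake time
        self._hits = defaultdict(deque)        # domain -> timestamps of recent probes

    def allow(self, domain: str) -> bool:
        now = self.clock()
        hits = self._hits[domain.lower()]
        while hits and now - hits[0] >= self.window:
            hits.popleft()                     # forget probes outside the window
        if len(hits) >= self.max_probes:
            return False                       # over the cap: caller should wait
        hits.append(now)
        return True
```

A caller would check `allow(domain)` before each probe and, on a 4xx deferral, sleep for `backoff_delay(attempt)` before retrying. The jitter spreads retries out so that many queued addresses for one domain do not all reconnect at the same moment.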

Contributing

Contributions are welcome, especially disposable-domain and MX-fingerprint updates, per-provider rules (data/rules.json), accuracy-corpus addresses (with permission), and SMTP edge-case reports. Read CONTRIBUTING.md for the invariants, dev setup, and commit conventions (Conventional Commits with enforced scopes).

License

Apache-2.0. Sabit bundles community disposable-domain lists and a gibberish model derived from an open word corpus under their own licenses; see LICENSING.md.

About

A self-hostable, multi-signal email-verification engine in Rust, built for AI agents over MCP. A Ruwad Group project.
