Replicating the services database #119

@progval

Currently, sable_services writes its database as a single JSON file on disk. This is similar to what Atheme does, so we know it works at least at Libera.Chat's scale.

While this can easily be replicated to other services, it means sable_services going down causes an outage where people cannot log in, channel ops cannot be opped, etc. This happens on Libera.Chat from time to time.

Given Sable's distributed architecture, we can do better here. @spb's idea is to have multiple sable_services nodes, one of which would be the leader and would stream its database to the others.
The database could remain a single JSON file, but it might become a scaling concern to copy this file over and over. We see a few options to solve this:

  1. use a database that supports streaming replication, like PostgreSQL
  2. make sable_services nodes coordinate over the Sable network, and each have their own independent database
  3. make sable_services nodes share a single replicated database (Cassandra, something on top of Ceph, CockroachDB, ...)
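
To make the "copy this file over and over" concern concrete, here is a rough sketch of the alternative that option 2 hints at: the leader sends individual, sequence-numbered changes, and each follower applies them to its own copy. This is only an illustration; `Change`, `Update`, `Replica` and `ApplyError` are hypothetical names, not existing sable_services types, and the key/value model is a stand-in for the real database contents.

```rust
use std::collections::HashMap;

/// Hypothetical change record streamed from the leader to followers.
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Set { key: String, value: String },
    Delete { key: String },
}

/// A change tagged with its position in the leader's update stream.
#[derive(Debug, Clone, PartialEq)]
pub struct Update {
    pub seq: u64,
    pub change: Change,
}

#[derive(Debug, PartialEq)]
pub enum ApplyError {
    /// The follower missed at least one update and needs a full resync.
    Gap { expected: u64, got: u64 },
}

/// A follower's local copy of the database (assumed key/value shaped here).
#[derive(Debug, Default)]
pub struct Replica {
    pub last_seq: u64,
    pub data: HashMap<String, String>,
}

impl Replica {
    /// Apply one update in order. Already-seen updates are ignored,
    /// and a gap in sequence numbers is reported to the caller.
    pub fn apply(&mut self, update: &Update) -> Result<(), ApplyError> {
        let expected = self.last_seq + 1;
        if update.seq < expected {
            return Ok(()); // duplicate delivery, nothing to do
        }
        if update.seq > expected {
            return Err(ApplyError::Gap { expected, got: update.seq });
        }
        match &update.change {
            Change::Set { key, value } => {
                self.data.insert(key.clone(), value.clone());
            }
            Change::Delete { key } => {
                self.data.remove(key);
            }
        }
        self.last_seq = update.seq;
        Ok(())
    }
}
```

With something like this, the full JSON file would only need to be copied when a follower first joins or when `apply` reports a gap.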

With options 1 and 2, if we want high availability, it means sable_services needs some form of leader election, because we can't allow writes to the same objects from multiple nodes at the same time. PostgreSQL does not provide a solution to this, and expects users to tell it when to switch between follower/leader state.
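
As a starting point for discussion, the simplest form of leader election I can think of is "the live node with the lowest ID is the leader". The sketch below assumes nodes exchange heartbeats and have numeric IDs; `Membership` and its methods are made-up names, not anything that exists in Sable today. It is deliberately naive: it is not a consensus protocol, and two nodes on opposite sides of a network split could both believe they are the leader.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical view of which sable_services nodes are currently alive.
pub struct Membership {
    /// How long a node may stay silent before it is considered down.
    timeout: Duration,
    /// Last heartbeat time per node ID (BTreeMap keeps IDs sorted).
    last_seen: BTreeMap<u64, Instant>,
}

impl Membership {
    pub fn new(timeout: Duration) -> Self {
        Membership { timeout, last_seen: BTreeMap::new() }
    }

    /// Record that `node` was heard from at time `at`.
    pub fn heartbeat(&mut self, node: u64, at: Instant) {
        self.last_seen.insert(node, at);
    }

    /// The leader is the lowest-numbered node heard from within `timeout`.
    pub fn leader(&self, now: Instant) -> Option<u64> {
        self.last_seen
            .iter()
            .find(|entry| now.saturating_duration_since(*entry.1) <= self.timeout)
            .map(|(id, _)| *id)
    }

    /// Only the current leader should accept writes.
    pub fn may_write(&self, me: u64, now: Instant) -> bool {
        self.leader(now) == Some(me)
    }
}
```

A real implementation would also need to tell the database (for example PostgreSQL in option 1) to switch roles when `leader` changes, and would need a stronger guarantee than this against two simultaneous leaders.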

Option 3 may be unsustainable for Libera, as all solutions I'm aware of in this space require extensive specialized knowledge of that solution (maybe not CockroachDB though? I've never tried it). In particular, Cassandra and Ceph are designed to work with petabyte-scale data, which is far beyond what we need here. Additionally, they often come with constraints/caveats on what software developers can do with the database.
