Currently, sable_services writes its database as a single JSON file on disk. This is similar to what Atheme does, so we know it works at least at Libera.Chat's scale.
While this can easily be replicated to other services, it means sable_services going down causes an outage where people cannot log in, channel ops cannot be opped, etc. This happens on Libera.Chat from time to time.
Given Sable's distributed architecture, we can do better here. @spb's idea is to have multiple sable_services nodes, one of which would be a leader and would stream its database to the other.
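To make the leader/follower idea concrete, here is a minimal in-memory sketch, not Sable code. The `ServicesNode` class, its method names, and the `accounts`/`channels` layout are all hypothetical, and the "stream" is just a method call standing in for whatever transport would actually be used. It ships the whole JSON document on every write, which is the simplest form the idea could take.

```python
import json


class ServicesNode:
    """Hypothetical stand-in for one sable_services node (not Sable's API)."""

    def __init__(self, name, is_leader=False):
        self.name = name
        self.is_leader = is_leader
        # Assumed database layout, for illustration only.
        self.db = {"accounts": {}, "channels": {}}
        self.followers = []

    def write(self, table, key, value):
        """Apply a write locally, then push the result to every follower."""
        if not self.is_leader:
            raise PermissionError(f"{self.name} is a follower and cannot accept writes")
        self.db[table][key] = value
        self._stream_snapshot()

    def _stream_snapshot(self):
        # Serialize the whole database, as the single-JSON-file design would.
        snapshot = json.dumps(self.db)
        for follower in self.followers:
            follower.receive_snapshot(snapshot)

    def receive_snapshot(self, snapshot):
        """Replace the local copy with the leader's latest snapshot."""
        self.db = json.loads(snapshot)


leader = ServicesNode("services-a", is_leader=True)
follower = ServicesNode("services-b")
leader.followers.append(follower)
leader.write("accounts", "alice", {"email": "alice@example.org"})
```

After the write, `follower.db` holds the same data as `leader.db`, so the follower could keep answering reads if the leader went away.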
The database could remain a single JSON file, but it might become a scaling concern to copy this file over and over. We see a few options to solve this:
- use a database that supports streaming replication, like PostgreSQL.
- make `sable_services` nodes coordinate over the Sable network, and each have their own independent database
- make `sable_services` nodes share a single replicated database (Cassandra, something on top of Ceph, CockroachDB, ...)
With options 1 and 2, if we want high availability, it means sable_services needs to somehow have a leader election, because we can't allow writes to the same objects from multiple nodes at the same time. PostgreSQL does not provide a solution to this, and expects users to tell it when to switch between follower/leader state.
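One common shape for this kind of leader election is a time-limited lease: a node may write only while it holds an unexpired lease, and another node may take over once the lease lapses. The sketch below is illustrative only and is not a proposal for how Sable should do it. `LeaseStore` and the node names are hypothetical, time is passed in explicitly to keep the example deterministic, and it assumes the lease record lives somewhere all nodes see consistently, which is itself the hard part in a real deployment.

```python
class LeaseStore:
    """Hypothetical shared lease record (assumed to be consistent across nodes)."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, now, ttl):
        """Grant or renew the lease if it is free, expired, or already held by `node`."""
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False

    def is_leader(self, node, now):
        """A node may accept writes only while it holds an unexpired lease."""
        return self.holder == node and now < self.expires_at


lease = LeaseStore()
first = lease.try_acquire("services-a", now=0, ttl=10)       # services-a becomes leader
contested = lease.try_acquire("services-b", now=5, ttl=10)   # refused, lease still valid
takeover = lease.try_acquire("services-b", now=11, ttl=10)   # services-a stopped renewing
```

Here `services-b` is refused while `services-a` holds a valid lease, and takes over only after it expires, so at no point do both nodes consider themselves leader. In practice clock skew and network partitions make this harder than the sketch suggests.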
And option 3 may be unsustainable for Libera, as all solutions I'm aware of in this space require extensive specialized knowledge of that particular solution (maybe not CockroachDB though? I've never tried it). In particular, Cassandra and Ceph are designed to work with petabyte-scale data, which is far beyond what we need here. Additionally, they often come with constraints/caveats on what software developers can do with the database.