4.8. Fault Tolerance and High Availability

4.8.1. Timeouts
4.8.2. Options

Shardman provides fault tolerance out of the box. The shardmand daemon monitors the cluster configuration and manages stolon clusters, which guarantee high availability and fault tolerance of all shards. The common Shardman configuration (shardmand, stolon clusters) is stored in an etcd cluster.

To ensure fault tolerance for each stolon cluster, you must set Repfactor to a value greater than 0 in cross-replication mode (PlacementPolicy=cross) or add at least one replica in manual topology mode (PlacementPolicy=manual).
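For example, a minimal sdmspec.json fragment requesting cross replication with one replica per shard might look like this (a sketch; the exact placement of the Repfactor and PlacementPolicy fields in sdmspec.json may differ between Shardman versions):

```json
{
  "Repfactor": 1,
  "PlacementPolicy": "cross",
  ...
}
```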

Stolon sentinels are responsible for observing the keepers and holding elections to choose one of the keepers as the master. Sentinels hold elections when the cluster starts and whenever the current master keeper goes down.

One of the keepers is elected as the master. All write operations take place on the master, while the other instances serve as followers.

During automatic failover, stolon automatically promotes a standby to master and demotes the failed master to standby. The only additional requirement is etcd, where stolon stores the current master/standby state.

If necessary, you can switch to the new master manually by calling the shardmanctl shard switch command.

Automatic failover is built on the use of timeouts, which can be overridden in sdmspec.json, as in the example:

                {
                  "ClusterSpec":{
                    "StolonSpec":{
                      "failInterval": "20s",
                      "sleepInterval": "5s",
                      "convergenceTimeout": "30s",
                      "deadKeeperRemovalInterval": "48h",
                      "requestTimeout": "10s",
                      ...
                    },
                    ...
                  },
                ...
                }
            

4.8.1. Timeouts

convergenceTimeout

Interval to wait for a db to converge to the required state when no long operations are expected (default 30s).

deadKeeperRemovalInterval

Interval after which a dead keeper will be removed from the cluster data (default 48h).

failInterval

Interval after the first failure to declare a keeper or a db as not healthy (default 20s).

requestTimeout

Time after which any request (e.g., keeper checks from a sentinel) will fail (default 10s).

sleepInterval

Interval to wait before next check (default 5s).
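The interplay of sleepInterval and failInterval can be sketched as a simplified health-check loop. This is only an illustration of the timeout semantics using the default values, not stolon's actual implementation; the function name is hypothetical:

```python
from datetime import timedelta

FAIL_INTERVAL = timedelta(seconds=20)   # failInterval: grace period before declaring failure
SLEEP_INTERVAL = timedelta(seconds=5)   # sleepInterval: pause between checks

def declared_unhealthy(consecutive_failed_checks: int) -> bool:
    """With one check per sleepInterval, a keeper or db that keeps
    failing is declared not healthy once its failures span more
    than failInterval."""
    return consecutive_failed_checks * SLEEP_INTERVAL > FAIL_INTERVAL

print(declared_unhealthy(3))  # 15s of failures: within failInterval, still healthy
print(declared_unhealthy(5))  # 25s of failures: declared not healthy
```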

4.8.2. Options

You can specify high availability options to control cluster behavior in a failure state.

masterDemotionEnabled

Enables master demotion in case the replica group master loses connectivity with the etcd cluster. The master attempts to connect to each of its standby nodes to determine whether any of them has become the master. If it discovers another master, it shuts down its own DBMS instance until connectivity with etcd is restored, to prevent inconsistent writes and stale reads. If the master fails to connect to one of its standby nodes for longer than the masterDemotionTimeout value, the DBMS instance is also shut down.

Default: false.

masterDemotionTimeout

Timeout during which the master attempts to connect to its standbys in case of connectivity loss with etcd (default 30s).

minSyncMonitorEnabled

Enables a monitor of the MinSynchronousStandbys value for every replica group. If a node loses connection with the cluster (all its keepers are unhealthy, that is, a keeper has not updated its state for longer than minSyncMonitorUnhealthyTimeout), the monitor decreases the MinSynchronousStandbys value of every replica group related to the disconnected node to the maximum available value. This prevents a read-only condition caused by a fake replica. The maximum available value is always less than or equal to the value specified in the cluster configuration. Once all keepers related to the disconnected node become healthy again, the monitor restores the MinSynchronousStandbys value of the replica group to the value specified in the cluster configuration.

Default: false.
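The effect of the monitor can be sketched as follows (a simplified illustration assuming the effective value is capped by the number of healthy standbys; the function name is hypothetical):

```python
def effective_min_sync(configured: int, healthy_standbys: int) -> int:
    """Sketch of the minSyncMonitor adjustment: the effective
    MinSynchronousStandbys never exceeds the configured value and is
    lowered to what the healthy standbys can still satisfy, so a dead
    (fake) replica cannot force the replica group into read-only mode."""
    return min(configured, healthy_standbys)

print(effective_min_sync(2, 0))  # node lost: lowered to 0
print(effective_min_sync(2, 1))  # one healthy standby: lowered to 1
print(effective_min_sync(2, 2))  # all healthy: configured value restored
```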

minSyncMonitorUnhealthyTimeout

Time duration after which a node (and all keepers related to this node) is considered unhealthy. Works only if minSyncMonitorEnabled is set to true (default 30s).
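These options can be overridden in sdmspec.json; the fragment below is a sketch that assumes they are set under StolonSpec, alongside the timeouts shown earlier:

```json
{
  "ClusterSpec": {
    "StolonSpec": {
      "masterDemotionEnabled": true,
      "masterDemotionTimeout": "30s",
      "minSyncMonitorEnabled": true,
      "minSyncMonitorUnhealthyTimeout": "30s",
      ...
    },
    ...
  },
  ...
}
```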