4.9. Fault Tolerance and High Availability

4.9.1. Timeouts
4.9.2. Options

Shardman provides out-of-the-box fault tolerance. The shardmand daemon monitors the cluster configuration and manages stolon clusters, which are used to guarantee high availability of all shards and fault tolerance. The common Shardman configuration (shardmand, stolon clusters) is stored in an etcd cluster.

To ensure fault tolerance for each stolon cluster, you must set Repfactor > 0 in the cross-replication mode (PlacementPolicy=cross) or add at least one replica in the manual-topology mode (PlacementPolicy=manual).

stolon sentinels have the responsibility of observing the keepers and carrying out elections to choose one of the keepers as the master. Sentinels hold elections when the cluster starts and every time the current master keeper goes down.

One of the keepers is elected as the master. All write operations take place at the master, and the other instances are used as follower instances.

In the case of automatic failover, stolon will take care of automatically changing slave to master and failed master to standby. Only one additional thing you need is etcd to store the master/slave instant information by stolon.

If necessary, you can switch to a new master manually by running the shardmanctl shard switch command.

Аutomatic failover is based on the use of timeouts, which can be overridden in sdmspec.json, as in the example:

                {
                "ShardSpec":{
                    "failInterval": "20s",
                    "sleepInterval": "5s",
                    "convergenceTimeout": "30s",
                    "deadKeeperRemovalInterval": "48h",
                    "requestTimeout": "10s",
                    ...
                },
                ...
                },
                ...
            

4.9.1. Timeouts

convergenceTimeout

Interval to wait for a database to be converged to the required state when no long operation are expected.

Default: 30s.

deadKeeperRemovalInterval

Interval after which a dead keeper will be removed from the cluster data.

Default: 48h.

failInterval

Interval after the first failure to declare a keeper or a database as not healthy.

Default: 20s.

requestTimeout

Time after which any request (keeper checks from sentinel etc...) will fail.

Default: 10s.

sleepInterval

Interval to wait before the next check.

Default: 5s.

4.9.2. Options

You could specify some high availability options to specify cluster behavior in fault state.

masterDemotionEnabled

Enable master demotion in case the replica group master has lost connectivity with etcd-cluster. Master attempts to connect to each of its standby nodes to determine if any of them have become the master. If it discovers another master, it shuts down its own DBMS instance until connectivity with etcd is restored to prevent inconsistent write and stale reads. If the master fails to connect to one of its standby nodes for a time longer than masterDemotionTimeout value, DBMS instance shutdown occurs.

Default: false.

masterDemotionTimeout

Configure timeout during which the master will attemts to connect to its standbys in cases connectivity loss with etcd (default 30s).

minSyncMonitorEnabled

Enable monitor for MinSynchronousStandbys value for every replica group. If a node loses connection with the cluster (all keepers are unhealthy: keeper does not update its state longer than minSyncMonitorUnhealthyTimeout), the monitor decreases the MinSynchronousStandbys value for every replica group related to the disconnected node to the maximum available value. It allows preventing a read-only condition caused by the fake replica. The maximum available value is always less than or equal to the value specified in the cluster configuration. If all keepers related to the disconnected node become healthy, the monitor changes the replica group MinSynchronousStandbys value to the value specified in the cluster configuration.

Default: false.

minSyncMonitorUnhealthyTimeout

Time duration after which the node (and all keepers related to this node) will be considered to be in the unhealthy condition. Works only if parameter minSyncMonitorEnabled set to true (default 30s).