
SAUNAFS-URAFT.CFG(5)   SAUNAFS-URAFT.CFG(5)

NAME

saunafs-uraft.cfg - main configuration file for saunafs-uraft

DESCRIPTION

The file saunafs-uraft.cfg contains the configuration of the SaunaFS high availability (HA) suite.

This configuration is consumed by:

saunafs-uraft (the daemon): election parameters, status port, timeouts.

saunafs-uraft-helper (the helper script): floating IP network details and local sfsmaster management.

SYNTAX

The syntax is:

OPTION = VALUE

Lines starting with the # character are ignored.
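For example (using URAFT_PORT and its default value from this page):

# peer-to-peer messaging port
URAFT_PORT = 9427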

OPTIONS

Configuration options:

URAFT_NODE_ADDRESS

Contains the IP address or hostname of a uraft node, optionally with a port.
Specify this option once for every node in the cluster.

Example:
URAFT_NODE_ADDRESS = node1:9427
URAFT_NODE_ADDRESS = node2
URAFT_NODE_ADDRESS = 192.168.0.1:9427

URAFT_ID

This option is closely tied to the URAFT_NODE_ADDRESS entries above.

It identifies the node on which this uraft instance runs.
Node numbers start from 0 and follow the order of the URAFT_NODE_ADDRESS entries.

For example, if this configuration resides on the node with hostname node2, its URAFT_ID should be set to 1.
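Continuing the URAFT_NODE_ADDRESS example above, the complete membership fragment on the node with hostname node2 would be:

URAFT_NODE_ADDRESS = node1:9427
URAFT_NODE_ADDRESS = node2
URAFT_NODE_ADDRESS = 192.168.0.1:9427
URAFT_ID = 1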

LOCAL_MASTER_ADDRESS

Specifies the address of the local SaunaFS master server (default: localhost).

LOCAL_MASTER_MATOCL_PORT

Specifies the client port of the local SaunaFS master server (default: 9421).

URAFT_PORT

UDP port used for uRaft peer-to-peer messages. This port must be reachable between all voting nodes. (default: 9427).

URAFT_STATUS_PORT

TCP port for the uRaft status endpoint. This endpoint is useful for debugging and automated tests. (default: 9428).

URAFT_FLOATING_IP

Floating IP address assigned to the node currently acting as the active master.

URAFT_FLOATING_NETMASK

Network mask for the floating IP, given as a prefix length (example: 24).

URAFT_FLOATING_IFACE

Network interface where floating IP should be managed (example: eth0).
Important: If the network interface used for the floating IP becomes unavailable, the failover
mechanism will not restore it. As a result, the floating IP cannot be recovered, causing SaunaFS
services that depend on it to become unavailable.
The failover mechanism is based on the URAFT_FLOATING_IP_CHECK_PERIOD option.
See its documentation for more details.

ELECTION_TIMEOUT_MIN

[advanced] Minimum timeout in milliseconds for the election algorithm. (default: 400)

ELECTION_TIMEOUT_MAX

[advanced] Maximum timeout in milliseconds for the election algorithm. (default: 600)

URAFT_ELECTOR_MODE

[advanced] Controls the election role of this node.

•If set to 0 (the default), the node can both vote in Leader elections and be elected as the
new Leader.

•If set to 1, the node acts as an elector only: it participates in voting for a new Leader,
but cannot itself be elected Leader.

When a node runs in elector mode, the metadata service is not required, which reduces its memory consumption.
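For instance, a node that should only vote and never become Leader could carry the following per-node fragment (assuming it is the fourth entry in URAFT_NODE_ADDRESS):

URAFT_ID = 3
URAFT_ELECTOR_MODE = 1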

HEARTBEAT_PERIOD

[advanced] Period in milliseconds between subsequent heartbeat messages. (default: 20)

LOCAL_MASTER_CHECK_PERIOD

[advanced] Period in milliseconds between checks of whether the local
master is alive. This value determines how frequently the daemon calls saunafs-uraft-helper isalive
on metadata-capable nodes. (default: 250)

URAFT_FLOATING_IP_CHECK_PERIOD

[advanced] Period in milliseconds between checks of whether
the floating IP is alive. A value of 0 disables floating IP monitoring and its automatic recovery
mechanism. Adjust this setting based on network stability and failover requirements. (default: 500)

URAFT_CHECK_CMD_PERIOD

[advanced] Period in milliseconds between checks of long-running helper commands (promotion/demotion). (default: 100).

URAFT_GETVERSION_TIMEOUT

[advanced] Timeout in milliseconds for helper commands that are expected to be fast, such as:

saunafs-uraft-helper metadata-version

saunafs-uraft-helper isalive

If this timeout is too low, the daemon may repeatedly log timeouts and keep a node blocked for promotions (blocked_promote=1),
preventing elections. (default: 100).

URAFT_PROMOTE_TIMEOUT

[advanced] Timeout in milliseconds for promotion (saunafs-uraft-helper promote). (default: 1000000000).

URAFT_DEMOTE_TIMEOUT

[advanced] Timeout in milliseconds for demotion (saunafs-uraft-helper demote). (default: 1000000000).

URAFT_DEAD_HANDLER_TIMEOUT

[advanced] Timeout in milliseconds for the dead handler (saunafs-uraft-helper dead). (default: 1000000000).

QUORUM_LOSS_GRACE_HEARTBEATS

[advanced] Number of consecutive missed heartbeats allowed before quorum is considered lost.

This parameter defines how many heartbeats may be missed in a row before the system declares quorum
loss and demotes the current Leader. Increasing this value helps tolerate short-lived network
glitches or transient latency spikes, reducing unnecessary demotions and improving overall
cluster stability. The default value is 5.

TIMING RELATIONSHIPS

Raft-style leader election relies on the relationship between heartbeat and election timeouts.

At a high level:

•Heartbeats are sent every HEARTBEAT_PERIOD.

•A follower starts an election if it has not observed a valid leader heartbeat for a duration
between ELECTION_TIMEOUT_MIN and ELECTION_TIMEOUT_MAX.

Practical guidance:

•Choose election timeouts substantially larger than the heartbeat period. A common starting point is:

ELECTION_TIMEOUT_MIN >= 10 * HEARTBEAT_PERIOD

•Keep ELECTION_TIMEOUT_MAX close to ELECTION_TIMEOUT_MIN (for example 1.5x) to reduce worst-case failover time,
but not so close that elections synchronize.

QUORUM_LOSS_GRACE_HEARTBEATS effectively controls demotion sensitivity:

demotion window ~= QUORUM_LOSS_GRACE_HEARTBEATS * HEARTBEAT_PERIOD

•In WAN/VPN environments with packet loss, consider increasing this value and/or increasing HEARTBEAT_PERIOD.
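As a worked example using the LAN-style values shown later in this page:

HEARTBEAT_PERIOD = 20
ELECTION_TIMEOUT_MIN = 400        # >= 10 * 20 = 200, satisfied with margin
ELECTION_TIMEOUT_MAX = 600        # 1.5 * ELECTION_TIMEOUT_MIN
QUORUM_LOSS_GRACE_HEARTBEATS = 10 # demotion window ~= 10 * 20 = 200 ms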

EXAMPLES

The examples below show complete configurations for three common environments.
Each node uses the same file contents except for URAFT_ID (and optionally URAFT_ELECTOR_MODE).

Example 1: LAN / Low Packet Loss

This configuration targets low-latency LAN environments where packet loss is rare and leader churn is unlikely.

# Cluster membership
URAFT_NODE_ADDRESS = 10.0.0.11:9427
URAFT_NODE_ADDRESS = 10.0.0.12:9427
URAFT_NODE_ADDRESS = 10.0.0.13:9427
# Per-node setting (0, 1, or 2 depending on which node)
URAFT_ID = 0
# Local metadata service
LOCAL_MASTER_ADDRESS = localhost
LOCAL_MASTER_MATOCL_PORT = 9421
LOCAL_MASTER_CHECK_PERIOD = 250
# Raft timing
HEARTBEAT_PERIOD = 20
ELECTION_TIMEOUT_MIN = 400
ELECTION_TIMEOUT_MAX = 600
QUORUM_LOSS_GRACE_HEARTBEATS = 10
# Helper timeouts
URAFT_GETVERSION_TIMEOUT = 200
# Status endpoint
URAFT_STATUS_PORT = 9428
# Floating IP
URAFT_FLOATING_IP = 10.0.0.100
URAFT_FLOATING_NETMASK = 24
URAFT_FLOATING_IFACE = eth0
URAFT_FLOATING_IP_CHECK_PERIOD = 500

Example 2: WAN/VPN / Higher Loss and Jitter

This configuration targets deployments where voting nodes communicate across a WAN or where the floating IP is effectively reached via VPN.

Symptoms of overly aggressive timing in these environments include a leader ping-pong effect (rapid leadership churn)
and prolonged periods without a stable Leader.

The strategy is:

•Send heartbeats less frequently.

•Use larger election timeouts.

•Increase quorum-loss hysteresis.

# Cluster membership (example with 5 voters)
URAFT_NODE_ADDRESS = 172.18.0.4:9427
URAFT_NODE_ADDRESS = 172.18.0.5:9427
URAFT_NODE_ADDRESS = 172.18.0.6:9427
URAFT_NODE_ADDRESS = 172.18.0.7:9427
URAFT_NODE_ADDRESS = 172.18.0.8:9427
# Per-node setting
URAFT_ID = 0
# Optional: elector nodes (set URAFT_ELECTOR_MODE=1 on electors)
URAFT_ELECTOR_MODE = 0
# Local metadata service
LOCAL_MASTER_ADDRESS = localhost
LOCAL_MASTER_MATOCL_PORT = 9421
LOCAL_MASTER_CHECK_PERIOD = 500
# Raft timing (slower but more stable over lossy links)
HEARTBEAT_PERIOD = 100
ELECTION_TIMEOUT_MIN = 3000
ELECTION_TIMEOUT_MAX = 4500
QUORUM_LOSS_GRACE_HEARTBEATS = 30
# Helper timeouts (avoid timeouts during transient stalls)
URAFT_GETVERSION_TIMEOUT = 1000
# Status endpoint
URAFT_STATUS_PORT = 9428
# Floating IP
URAFT_FLOATING_IP = 172.18.0.10
URAFT_FLOATING_NETMASK = 32
URAFT_FLOATING_IFACE = lo
URAFT_FLOATING_IP_CHECK_PERIOD = 1000

Example 3: 5-Node Topology (3 Metadata + 2 Electors)

This example shows a common production topology:

•3 metadata-capable nodes (eligible to become Leader)

•2 elector-only nodes (vote in elections but never become Leader)

Because elector nodes do not run the metadata service, this topology adds voters (improving resilience) without the resource pressure that additional metadata nodes would impose.


Important

Elector nodes are still voting members. They must have stable network connectivity to
the metadata nodes; otherwise they can contribute to leader churn.

All nodes share the same base configuration. Per-node differences are:

•URAFT_ID must be unique and match the node's position in URAFT_NODE_ADDRESS.

•URAFT_ELECTOR_MODE must be set to 1 on elector nodes.

# Cluster membership (5 voters total)
URAFT_NODE_ADDRESS = 10.0.0.11:9427   # metadata
URAFT_NODE_ADDRESS = 10.0.0.12:9427   # metadata
URAFT_NODE_ADDRESS = 10.0.0.13:9427   # metadata
URAFT_NODE_ADDRESS = 10.0.0.21:9427   # elector-only
URAFT_NODE_ADDRESS = 10.0.0.22:9427   # elector-only
# Per-node setting
URAFT_ID = 0
# Set to 0 on metadata nodes (IDs 0-2), set to 1 on elector nodes (IDs 3-4)
URAFT_ELECTOR_MODE = 0
# Local metadata service (used only on metadata nodes)
LOCAL_MASTER_ADDRESS = localhost
LOCAL_MASTER_MATOCL_PORT = 9421
LOCAL_MASTER_CHECK_PERIOD = 250
# Raft timing (LAN-friendly defaults)
HEARTBEAT_PERIOD = 20
ELECTION_TIMEOUT_MIN = 400
ELECTION_TIMEOUT_MAX = 600
QUORUM_LOSS_GRACE_HEARTBEATS = 10
# Helper timeouts
URAFT_GETVERSION_TIMEOUT = 200
# Status endpoint
URAFT_STATUS_PORT = 9428
# Floating IP (managed by the Leader, which must be a metadata node)
URAFT_FLOATING_IP = 10.0.0.100
URAFT_FLOATING_NETMASK = 24
URAFT_FLOATING_IFACE = eth0
URAFT_FLOATING_IP_CHECK_PERIOD = 500

TROUBLESHOOTING

Repeated "Isalive timeout" lines

Increase URAFT_GETVERSION_TIMEOUT and ensure the local sfsmaster responds promptly to sfsmaster -o ha-cluster-managed isalive.
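For example, raising the timeout well above the observed helper response time (1000 ms is the value used in the WAN example above):

URAFT_GETVERSION_TIMEOUT = 1000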

Nodes stuck with blocked_promote=1

This indicates the daemon is refusing to allow leader promotion on that node.
The most common causes are repeated isalive timeouts or repeated dead status.

Leader churn / ping-pong leadership

Increase election timeouts and quorum-loss grace, and ensure voting nodes have stable connectivity.

REPORTING BUGS

Report bugs to the GitHub repository <https://github.com/leil-io/saunafs> as an issue.

COPYRIGHT

Copyright 2008-2009 Gemius SA

Copyright 2013-2019 Skytechnology sp. z o.o.

Copyright 2023 Leil Storage OÜ

SaunaFS is free software: you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation, version 3.

SaunaFS is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with SaunaFS. If not, see <http://www.gnu.org/licenses/>.

SEE ALSO

saunafs-uraft(8)

2026-03-27