OPTIONS
Configuration options:
PERSONALITY
Current personality of this instance of metadata
server. Valid values are master, shadow and
ha-cluster-managed. If installation is managed by an HA cluster the
only valid value is ha-cluster-managed, otherwise the only valid values
are master and shadow, in which case only one metadata server in
SaunaFS shall have master personality.
master means that this instance of metadata server acts as the main metadata
server governing all file system metadata modifications.
shadow means that this instance of metadata server acts as backup
metadata server ready for immediate deployment as new master in case of
current master failure.
Metadata server personality can be changed at any time, but only from shadow to
master; changing it from master to shadow is forbidden.
ha-cluster-managed means that this instance is managed by an HA cluster;
the server runs in shadow mode until it is remotely promoted to
master.
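The transition rule for the master and shadow personalities can be sketched as a small check. This is only an illustration: `can_change_personality` is a hypothetical helper, not part of SaunaFS, and ha-cluster-managed is deliberately left out because promotion there is driven by the HA cluster rather than by editing this option.

```python
# Hypothetical helper, not part of SaunaFS: it only encodes the rule stated above.
CONFIGURABLE_PERSONALITIES = ("master", "shadow")


def can_change_personality(current: str, new: str) -> bool:
    """Return True if changing PERSONALITY from `current` to `new` is permitted."""
    for value in (current, new):
        if value not in CONFIGURABLE_PERSONALITIES:
            raise ValueError(f"unsupported personality: {value!r}")
    if current == new:
        return True  # nothing changes
    # Only promotion (shadow -> master) is allowed; demotion is forbidden.
    return current == "shadow" and new == "master"
```

For example, `can_change_personality("shadow", "master")` is accepted, while `can_change_personality("master", "shadow")` is rejected.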
CLUSTER_ID
Uniquely identifies this namespace. Prevents accidental
connections to wrong metadata or chunkservers. (default is
default)
DATA_PATH
where to store metadata files and lock file
WORKING_USER
user to run daemon as
WORKING_GROUP
group to run daemon as (optional - if empty then default
user group will be used)
SYSLOG_IDENT
name of process to place in syslog messages (default is
sfsmaster)
LOCK_MEMORY
whether to perform mlockall() to avoid swapping out
sfsmaster process (default is 0, i.e. no)
LIMIT_GLIBC_MALLOC_ARENAS
Linux only: limit the number of glibc malloc arenas to the given value, which
prevents the process from using a huge amount of virtual memory. This can improve
performance by reducing memory fragmentation and improving cache locality, but
it may also lead to contention and reduced parallelism in multi-threaded
applications. Use it in memory-constrained environments; recommended values
are 4 or 8. (default is 0: disabled, i.e. glibc decides)
NICE_LEVEL
nice level to run daemon with (default is -19 if
possible; note: process must be started as root to increase priority)
EXPORTS_FILENAME
alternative name of sfsexports.cfg file
TOPOLOGY_FILENAME
alternative name of sfstopology.cfg file
CUSTOM_GOALS_FILENAME
alternative name of sfsgoals.cfg file
PREFER_LOCAL_CHUNKSERVER
If a client mountpoint has a local chunkserver, and a
given chunk happens to reside locally, then sfsmaster will list the local
chunkserver first. However, when the local client mount issues many
reads/writes to many local chunks, these requests can overload the local
chunkserver and disk subsystem. Setting this to 0 (the default is 1) means that
remote chunkservers will be considered equivalent to the local chunkserver.
This is useful when the network is faster than the disk, and when there is
a high I/O load on the client mountpoints.
BACK_LOGS
number of metadata change log files (default is 50)
BACK_META_KEEP_PREVIOUS
number of previous metadata files to be kept (default is
1)
AUTO_RECOVERY
when this option is set (equals 1) master will try to
recover metadata from changelog when it is being started after a crash;
otherwise it will refuse to start and sfsmetarestore should be used to
recover the metadata (default is 0)
OPERATIONS_DELAY_INIT
initial delay in seconds before starting chunk operations
(default is 300)
OPERATIONS_DELAY_DISCONNECT
chunk operations delay in seconds after chunkserver
disconnection (default is 3600)
MATOML_LISTEN_HOST
IP address to listen on for metalogger connections
(* means any)
MATOML_LISTEN_PORT
port to listen on for metalogger connections (default is
9419)
MATOML_LOG_PRESERVE_SECONDS
how many seconds of change logs have to be preserved in
memory (default is 600; note: logs are stored in blocks of 5k lines, so
the actual number of seconds may sometimes be slightly larger; zero disables
extra log storage)
MATOCS_LISTEN_HOST
IP address to listen on for chunkserver connections
(* means any)
MATOCS_LISTEN_PORT
port to listen on for chunkserver connections (default is
9420)
MATOCL_LISTEN_HOST
IP address to listen on for client (mount) connections
(* means any)
MATOCL_LISTEN_PORT
port to listen on for client (mount) connections (default
is 9421)
MATOTS_LISTEN_HOST
IP address to listen on for tapeserver connections
(* means any)
MATOTS_LISTEN_PORT
Port to listen on for tapeserver connections (default is
9424)
CHUNKS_LOOP_MAX_CPS
The chunks loop will not check more chunks per
second than the given number (default is 100000)
CHUNKS_LOOP_MIN_TIME
The chunks loop will check all chunks in the specified time
(default is 300) unless CHUNKS_LOOP_MAX_CPS forces slower
execution.
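The interaction between CHUNKS_LOOP_MIN_TIME and CHUNKS_LOOP_MAX_CPS can be estimated with a rough calculation. The sketch below assumes that the time is expressed in seconds and that the loop duration is simply the larger of the two bounds; it ignores CHUNKS_LOOP_MAX_CPU and CHUNKS_LOOP_PERIOD, and `estimated_loop_seconds` is a hypothetical name, not a SaunaFS function.

```python
def estimated_loop_seconds(total_chunks: int,
                           min_time: float = 300,
                           max_cps: int = 100000) -> float:
    """Rough lower bound on the duration of one full pass over all chunks."""
    if max_cps <= 0:
        raise ValueError("max_cps must be positive")
    # Time needed if the loop runs at the chunks-per-second cap.
    cps_bound = total_chunks / max_cps
    # The loop takes at least min_time, or longer when the cap is the bottleneck.
    return max(float(min_time), cps_bound)
```

With the defaults, 10 million chunks fit within the 300 second target, whereas 60 million chunks would need about 600 seconds because the per-second cap becomes the limiting factor.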
CHUNKS_LOOP_PERIOD
Time in milliseconds between chunks loop executions
(default is 1000).
CHUNKS_LOOP_MAX_CPU
Hard limit on CPU usage by chunks loop (percentage value,
default is 60).
CHUNKS_SOFT_DEL_LIMIT
Soft maximum number of chunks to delete on one
chunkserver (default is 10)
CHUNKS_HARD_DEL_LIMIT
Hard maximum number of chunks to delete on one
chunkserver (default is 25)
CHUNKS_WRITE_REP_LIMIT
Maximum number of chunks to replicate to one chunkserver
(default is 2)
CHUNKS_READ_REP_LIMIT
Maximum number of chunks to replicate from one
chunkserver (default is 10)
ENDANGERED_CHUNKS_PRIORITY
Share of endangered chunks that should be replicated
with high priority, given as a fraction between 0 and 1. Example: when set to
0.2, up to 20% of the chunks served in one turn are taken from the endangered
priority queue. When set to 1 (the maximum), no other chunks are processed as
long as there are any endangered chunks in the queue (not advised). (default is
0, i.e. there is no overhead for prioritizing endangered chunks)
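The effect of ENDANGERED_CHUNKS_PRIORITY on a single turn can be illustrated with a short calculation. This is a sketch under assumptions: the function name is hypothetical, and rounding down to a whole number of chunks is a guess rather than documented behaviour.

```python
def max_endangered_per_turn(chunks_per_turn: int, priority: float) -> int:
    """Upper bound on chunks taken from the endangered queue in one turn."""
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0 and 1")
    # Assumption: fractional results are truncated to whole chunks.
    return int(chunks_per_turn * priority)
```

For a turn that serves 100 chunks, a setting of 0.2 allows up to 20 of them to come from the endangered queue, 0 allows none, and 1 allows all 100.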
ENDANGERED_CHUNKS_MAX_CAPACITY
Max capacity of endangered chunks queue. This value can
limit memory usage of master server if there are lots of endangered chunks in
the system. This value is ignored if ENDANGERED_CHUNKS_PRIORITY is set to 0.
(default is 1Mi, i.e. no more than 1Mi chunks will be kept in a queue).
ACCEPTABLE_DIFFERENCE
The maximum difference in disk usage between chunkservers
that does not trigger chunk rebalancing (default is 0.1, i.e.
10%).
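As an illustration of the threshold, the check below compares the most and least used chunkservers. It is a simplified sketch: `needs_rebalancing` is a hypothetical name, usage is expressed as a fraction of capacity, and the real master may compare servers differently (for example per label).

```python
def needs_rebalancing(disk_usages: list[float],
                      acceptable_difference: float = 0.1) -> bool:
    """Return True if the usage spread exceeds the acceptable difference."""
    if not disk_usages:
        return False
    # Assumption: the spread is measured between the fullest and emptiest server.
    spread = max(disk_usages) - min(disk_usages)
    return spread > acceptable_difference
```

With the default of 0.1, servers at 50%, 55% and 58% usage are considered balanced, while servers at 40% and 55% are not.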
CHUNKS_REBALANCING_BETWEEN_LABELS
When balancing disk usage, allow moving chunks between
servers with different labels (default is 0, i.e. chunks will be moved only
between servers with the same label).
GLOBALIOLIMITS_FILENAME
Configuration of global I/O limits (default is no I/O
limiting)
GLOBALIOLIMITS_RENEGOTIATION_PERIOD_SECONDS
How often mountpoints will request bandwidth allocations
under constant, predictable load (default is 0.1)
GLOBALIOLIMITS_ACCUMULATE_MS
After inactivity, no waiting is required to transfer the
amount of data equivalent to normal data flow over the period of that many
milliseconds (default is 250)
METADATA_CHECKSUM_INTERVAL
how often metadata checksum shall be sent to backup
servers (default is: every 50 metadata updates)
METADATA_CHECKSUM_RECALCULATION_SPEED
how fast the metadata checksum should be recalculated in the background
(default is 100 objects per function call)
DISABLE_METADATA_CHECKSUM_VERIFICATION
whether checksum verification should be disabled while applying
the changelog
NO_ATIME
when this option is set to 1 inode access time is not
updated on every access, otherwise (when set to 0) it is updated (default is
0)
METADATA_SAVE_REQUEST_MIN_PERIOD
minimal time in seconds between metadata dumps caused by
requests from shadow masters (default is 1800)
SESSION_SUSTAIN_TIME
Time in seconds for which client session data (e.g. list
of open files) should be sustained in the master server after connection with
the client was lost. Values between 60 and 604800 (one week) are accepted.
(default is 86400)
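A value can be checked against the accepted range before it is written to the configuration. The helper below is a hypothetical pre-flight check, not something SaunaFS provides; how the master itself reacts to out-of-range values is not covered here.

```python
MIN_SESSION_SUSTAIN_TIME = 60              # seconds
MAX_SESSION_SUSTAIN_TIME = 7 * 24 * 3600   # one week, i.e. 604800 seconds


def is_valid_session_sustain_time(seconds: int) -> bool:
    """Return True if `seconds` lies within the documented accepted range."""
    return MIN_SESSION_SUSTAIN_TIME <= seconds <= MAX_SESSION_SUSTAIN_TIME
```

The default of 86400 (one day) passes this check, while values such as 30 or 700000 fall outside the accepted range.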
USE_BDB_FOR_NAME_STORAGE
When this option is set to 1, Berkeley DB is used for
storing file/directory names in a file (DATA_PATH/name_storage.db). By default
all strings are kept in system memory. (default is 0)
BDB_NAME_STORAGE_CACHE_SIZE
Size of memory cache (in MB) for file/directory names
used by Berkeley DB storage. (default is 10)
AVOID_SAME_IP_CHUNKSERVERS
When this option is set to 1, the process of selecting
chunkservers for chunks will try to avoid those that share the same IP address.
(default is 0)
REDUNDANCY_LEVEL
minimum number of required redundant chunk parts that can
be lost before chunk becomes endangered (default is 0)
SNAPSHOT_INITIAL_BATCH_SIZE
This option can be used to specify initial number of
snapshotted nodes that will be atomically cloned before enqueuing the task for
execution in fixed-sized batches. (default is 1000)
SNAPSHOT_INITIAL_BATCH_SIZE_LIMIT
This option specifies the maximum initial batch size set
for snapshot request. (default is 10000)
FILE_TEST_LOOP_MIN_TIME
Test files loop will try to check
all files in specified time in seconds (default is 3600). It’s
possible for the loop to take more time if the master server is busy or the
machine doesn’t have enough processing power to make all the needed
calculations.
Options below are mandatory for all Shadow instances:
MASTER_HOST
address of the host running SaunaFS metadata server that
currently acts as master
MASTER_PORT
port number on which the SaunaFS metadata server currently
running as master listens for connections from shadows and
metaloggers (default is 9420)
MASTER_RECONNECTION_DELAY
delay in seconds before trying to reconnect to metadata
server after disconnection (default is 1)
MASTER_TIMEOUT
timeout (in seconds) for metadata server connections
(default is 60)
LOAD_FACTOR_PENALTY
When set, a percentage of load will be added to chunkserver
disk usage to determine the most fitting chunkserver. Heavily loaded chunkservers
will be picked for operations less frequently. (default is 0; valid values
range from 0 to 0.5)
PRIORITIZE_DATA_PARTS
When set, the master server will prioritize placing data parts of EC
goals on the chunkservers with a higher percentage of available space.
This could cause parity parts to always land on the same chunkservers if the
cluster is not well balanced. (default: 1)
POLL_TIMEOUT_MS
Maximum amount of time in milliseconds that the polling
operation will wait for events. The value is applied for the polling in the
events loop. Smaller values could reduce latency at the cost of CPU usage
(default: 50)
ENABLE_PROMETHEUS (EXPERIMENTAL)
Whether to enable Prometheus support and metric
collection. Note that this requires compiling with Prometheus support. Set to
either 1 to enable, or 0 to disable (default is 0)
PROMETHEUS_HOST (EXPERIMENTAL)
Host address where Prometheus metric data can be
collected; must be in the format HOST:PORT (default is 0.0.0.0:9499)
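A value in the HOST:PORT format can be sanity-checked before use. The sketch below is a hypothetical validator, not SaunaFS code; it splits on the last colon and does not attempt to validate the host part or handle IPv6 literals specially.

```python
def parse_host_port(value: str) -> tuple[str, int]:
    """Split a HOST:PORT string into (host, port), raising ValueError if malformed."""
    host, sep, port_text = value.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected HOST:PORT, got {value!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

Applied to the default, `parse_host_port("0.0.0.0:9499")` yields the host `0.0.0.0` and the port `9499`.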
CREATE_EMPTY_FOLDERS_WHEN_SPACE_DEPLETED (EXPERIMENTAL)
When enabled, this option allows the system to create metadata for empty
folders when storage space is depleted. This helps preserve the intended
file and directory structure. When this option is disabled, no metadata for
folders will be created if there is insufficient storage space (default:
1).
LOG_LEVEL
Sets the logging level. The level is taken from the environment variable
SAUNAFS_LOG_LEVEL or the config value LOG_LEVEL. Valid
log levels are:
• trace
• debug
• info
• warn or warning
• err or error
• crit or critical
• off
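The level names and their aliases can be normalized with a small lookup. The sketch below is illustrative only: `resolve_log_level` is a hypothetical function, and both the precedence (environment variable before config value) and the case-insensitive matching are assumptions that the text above does not confirm.

```python
import os

# Long spellings mapped to the short names from the list above.
_ALIASES = {"warning": "warn", "error": "err", "critical": "crit"}
_LEVELS = {"trace", "debug", "info", "warn", "err", "crit", "off"}


def resolve_log_level(config_value=None, environ=None):
    """Return the normalized level name, or None if nothing is configured."""
    env = os.environ if environ is None else environ
    # Assumption: SAUNAFS_LOG_LEVEL takes precedence over the config value.
    raw = env.get("SAUNAFS_LOG_LEVEL") or config_value
    if raw is None:
        return None
    name = raw.strip().lower()  # assumption: matching ignores case
    name = _ALIASES.get(name, name)
    if name not in _LEVELS:
        raise ValueError(f"invalid log level: {raw!r}")
    return name
```

Under these assumptions, a config value of `warning` with no environment override resolves to `warn`, and an environment value of `critical` overrides a config value of `info`.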
TLS_CERT_FILE (EXPERIMENTAL)
Path to the TLS certificate file the master/shadow server
will use for TLS connections (there is no default value).
TLS_KEY_FILE (EXPERIMENTAL)
Path to the TLS private key file the master/shadow server
will use for TLS connections (there is no default value).
TLS_CA_CERT_FILE (EXPERIMENTAL)
Path to the trusted CA certificate which is used to
authenticate the TLS connection (there is no default value).
USE_CHUNKSERVER_SIDE_CHUNK_LOCK (EXPERIMENTAL)
When set to 1, enables sending chunk part lock messages
to the chunkservers. This can be useful to track down which chunk parts are
currently being written. Reloadable (default: 0).
EMPTY_RESERVED_FILES_PERIOD_MSECONDS (EXPERIMENTAL)
Interval for periodic cleaning of reserved files, in
milliseconds. If set to 0, the reserved files deletion is disabled. (Default:
0)
Warning
Administrator-only option. Enabling periodic deletion may remove master-side
references to reserved files that are still held by other applications or
client instances that have not released them yet. This can disrupt ongoing
operations and lead to unexpected errors.
Recommended approach: Do not enable automatic deletion unless you
fully understand the impact on running workloads. Prefer manually freeing
reserved files by locating each reference or process using them and stopping
it before releasing the reservation.