mfsscadmin(1) This is part of MooseFS mfsscadmin(1)

NAME

mfsscadmin - MooseFS storage class administration tool

SYNOPSIS

mfscreatesclass [-?] [-M MOUNTPOINT] [-a admin_only] [-m labels_mode] [-o arch_mode] [-C CREATION_LABELS] -K KEEP_LABELS [-A ARCH_LABELS [-d arch_delay] [-s min_file_length]] [-T TRASH_LABELS [-t min_trashretention]] SCLASS_NAME...

mfsmodifysclass [-?] [-M MOUNTPOINT] [-f] [-a admin_only] [-m labels_mode] [-o arch_mode] [-C CREATION_LABELS] [-K KEEP_LABELS] [-A ARCH_LABELS] [-d arch_delay] [-s min_file_length] [-T TRASH_LABELS] [-t min_trashretention] SCLASS_NAME...

mfsdeletesclass [-?] [-M MOUNTPOINT] SCLASS_NAME...

mfsclonesclass [-?] [-M MOUNTPOINT] SRC_SCLASS_NAME DST_SCLASS_NAME...

mfsrenamesclass [-?] [-M MOUNTPOINT] SRC_SCLASS_NAME DST_SCLASS_NAME

mfslistsclass [-?] [-M MOUNTPOINT] [-l] [-i] [SCLASS_NAME_GLOB_PATTERN]

mfsimportsclass [-?] [-M MOUNTPOINT] [-r] [-n filename]

DESCRIPTION

This is a set of tools for creating and modifying storage classes, which can later be applied to MooseFS objects with the mfssclass tools (see mfssclass(1)). A storage class is a set of labels expressions and options that indicate on which chunkservers the files in this class should be written and later kept.

mfscreatesclass creates a new storage class with the given options, described below, and names it SCLASS_NAME; if more than one name is provided, one storage class with the same definition is created for each name

mfsmodifysclass changes the given options in a class or classes indicated by SCLASS_NAME parameter(s)

mfsdeletesclass removes the class or classes indicated by SCLASS_NAME parameter(s); a class that is not empty (i.e. it is still used by some MooseFS objects) will not be removed, and the tool will print an error message and return an error; empty classes will be removed in any case

mfsclonesclass copies class indicated by SRC_SCLASS_NAME under a new name provided with DST_SCLASS_NAME

mfsrenamesclass changes the name of a class from SRC_SCLASS_NAME to DST_SCLASS_NAME

mfslistsclass lists all the classes

mfsimportsclass imports storage class definitions from stdin or a file and creates them; the input format should be identical to the output of mfslistsclass -l.

OPTIONS

-C optional parameter, that tells the system to which chunkservers, defined by the CREATION_LABELS expression, the chunk should be first written just after creation; if this parameter is not provided for a class, the KEEP_LABELS chunkservers will be used

-K mandatory parameter, that tells the system on which chunkservers, defined by the KEEP_LABELS expression, the chunk(s) should be kept always, except for special conditions like creating, archiving and deleting (moving to Trash), if defined

-A optional parameter, that tells the system on which chunkservers, defined by the ARCH_LABELS expression, the chunk(s) should be kept for archiving purposes; the system starts to treat a chunk as archive, when atime/mtime/ctime (as set by -o) of the file it belongs to is older than the number of hours specified with -d option; see also ARCHIVE BEHAVIOUR section below

-d optional parameter that defines how long after atime/mtime/ctime (as set by -o) a file (and its chunks) is treated as archive; minimum unit is hours, default is 24, for value formatting see TIME

-o optional parameter that defines archive flags: C - ctime, M - mtime, A - atime, R - reversible, F - fastmode, P - per chunk; default is C; see ARCHIVE BEHAVIOUR section below for details

-s optional parameter that defines minimum file length in bytes that can be archived; default is 0

-T optional parameter, that tells the system on which chunkservers, defined by the TRASH_LABELS expression, the chunk(s) of files in Trash should be kept; see also -t

-t optional parameter that defines how much time in Trash must be left for the system to actually use the schema defined in -T for a chunk; minimum unit is hours, default is 0, for value formatting see TIME

-a can be either 1 or 0 and indicates if the storage class is available to everyone (0) or admin only (1)

-f force the changes on a predefined storage class (see PREDEFINED STORAGE CLASSES section), use with caution!

-m label mode used; possible values are l (or L, loose, Loose, LOOSE) for LOOSE mode, d (or D, std, Std, STD) for DEFAULT mode and s (or S, strict, Strict, STRICT) for STRICT mode; if no mode is defined, DEFAULT mode is assumed; behaviour of label modes is described below in LABEL MODES section

-l list also definitions, not only the names of existing storage classes

-i case insensitive storage class name matching

-r replace (overwrite) existing classes when importing storage classes

-n use provided filename as the source of storage classes definitions for importing, instead of stdin

-M MooseFS mount point, doesn't need to be specified if a tool is run inside MooseFS mounted directory or MooseFS is mounted in /mnt/mfs/

-? displays short usage message
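The way the creation options combine can be sketched as a small Python helper that only builds an argument list and does not run anything. The function name and its parameter names are hypothetical; the flags themselves are taken from the SYNOPSIS above. The checks mirror the nesting shown there (-d listed under -A, -t listed under -T); whether the tool itself rejects other combinations is not stated here, so treat those checks as an assumption.

```python
def create_sclass_argv(names, keep, creation=None, arch=None, arch_delay=None,
                       trash=None, min_trash=None, labels_mode=None):
    """Build an argument list for mfscreatesclass (hypothetical helper)."""
    if not names:
        raise ValueError("at least one SCLASS_NAME is required")
    argv = ["mfscreatesclass"]
    if labels_mode is not None:
        argv += ["-m", labels_mode]
    if creation is not None:
        argv += ["-C", creation]
    argv += ["-K", keep]                      # -K is mandatory when creating
    if arch is not None:
        argv += ["-A", arch]
        if arch_delay is not None:
            argv += ["-d", arch_delay]
    elif arch_delay is not None:
        # Assumption: follow the synopsis, where -d appears only with -A.
        raise ValueError("-d is only listed together with -A")
    if trash is not None:
        argv += ["-T", trash]
        if min_trash is not None:
            argv += ["-t", min_trash]
    elif min_trash is not None:
        # Assumption: follow the synopsis, where -t appears only with -T.
        raise ValueError("-t is only listed together with -T")
    return argv + list(names)


# "fast_then_ec" is a made-up class name used only for illustration.
print(create_sclass_argv(["fast_then_ec"], keep="2A", arch="@4+1", arch_delay="7d"))
```

The resulting list could be handed to subprocess.run() on a host where the MooseFS client tools are installed.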

TIME

For time variables their value can be defined as a number of seconds or hours (integer), depending on minimum unit of the variable, or as a time period in one of two possible formats:

first format: #.#T where T is one of: s-seconds, m-minutes, h-hours, d-days or w-weeks; fractions of minimum unit will be rounded to integer value

second format: #w#d#h#m#s, any number of definitions can be omitted, but the remaining definitions must be in order (so #d#m is still a valid definition, but #m#d is not); ranges: s,m: 0 to 59, h: 0 to 23, d: 0 to 6, w is unlimited and the first definition is also always unlimited (i.e. for #d#h#m d will be unlimited)

If a minimum unit of a variable is larger than seconds, units below the minimum one will not be accepted. For example, a variable that has hours as a minimum unit will not accept s and m units.

Examples:

1.5d is the same as 1d12h, is the same as 36h

2.5w is the same as 2w3d12h, is the same as 420h; 2w84h is not a valid time period (h is not the first definition, so it is bound by range 0 to 23)
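The rules above can be checked with a short Python sketch for a variable whose minimum unit is hours (such as -d or -t). The function name is hypothetical and this is not the parser used by the tools; in particular, the exact rounding of fractional values is not specified above, so rounding to the nearest hour is an assumption.

```python
import re

# Hours per unit; s and m are not accepted when the minimum unit is hours.
UNIT_HOURS = {"w": 168, "d": 24, "h": 1}
# Upper bounds for units that are not the first definition (w is unlimited).
UNIT_MAX = {"d": 6, "h": 23}


def period_to_hours(text):
    """Convert a time period string to whole hours (hypothetical helper)."""
    # Plain integer: already a number of hours.
    if re.fullmatch(r"\d+", text):
        return int(text)
    # First format: #.#T, for example "1.5d".
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([wdh])", text)
    if m:
        # Assumption: fractions of an hour are rounded to the nearest hour.
        return round(float(m.group(1)) * UNIT_HOURS[m.group(2)])
    # Second format: #w#d#h, units in descending order, any may be omitted.
    m = re.fullmatch(r"(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?", text)
    if not m or not any(m.groups()):
        raise ValueError(f"invalid time period: {text!r}")
    total, first = 0, True
    for unit, value in zip("wdh", m.groups()):
        if value is None:
            continue
        n = int(value)
        # Only the first definition present is exempt from its range.
        if not first and unit in UNIT_MAX and n > UNIT_MAX[unit]:
            raise ValueError(f"{unit} out of range in {text!r}")
        total += n * UNIT_HOURS[unit]
        first = False
    return total


for example in ("1.5d", "1d12h", "36h", "2.5w", "2w3d12h"):
    print(example, period_to_hours(example))
```

With this sketch 2w84h raises ValueError, because h is not the first definition and exceeds 23.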

LABELS EXPRESSIONS

Labels are letters (A-Z - 26 letters) that can be assigned to chunkservers. Each chunkserver can have multiple (up to 26) labels. Labels are defined in the mfschunkserver.cfg file; for more information refer to mfschunkserver.cfg(5).

Labels expression is a set of subexpressions separated by commas. For full copies each subexpression specifies the storage schema of one copy of a file. Subexpression can be: an asterisk or a label schema. Label schema can be one label or an expression with sums, multiplications, negations and brackets. Sum means a file can be stored on any chunkserver matching any element of the sum (logical or). Multiplication means a file can be stored only on a chunkserver matching all elements (logical and). Asterisk means any chunkserver. Negation means any chunkserver but the one matching negated subexpression. Identical subexpressions can be shortened by adding a number in front of one instead of repeating it a number of times.
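As an illustration of how a single label schema is evaluated against the labels of one chunkserver, here is a minimal Python sketch. The function name is hypothetical and this is not the master's implementation. It assumes that multiplication is written by placing elements next to each other (as in AB), that negation is written with ! and applies to the element that follows it, and that both round and square brackets group a sum; these spellings are taken from the examples in this page.

```python
def matches(schema, labels):
    """Return True if a server with the given label set satisfies schema."""
    if schema == "*":                  # asterisk: any chunkserver
        return True
    pos = 0
    closers = {"(": ")", "[": "]"}

    def parse_sum():
        nonlocal pos
        result = parse_product()
        while pos < len(schema) and schema[pos] == "+":
            pos += 1
            # Evaluate the operand first so parsing always advances.
            operand = parse_product()
            result = result or operand      # sum: logical or
        return result

    def parse_product():
        result = parse_factor()
        while pos < len(schema) and schema[pos] not in "+)]":
            operand = parse_factor()
            result = result and operand     # multiplication: logical and
        return result

    def parse_factor():
        nonlocal pos
        if pos >= len(schema):
            raise ValueError("unexpected end of schema")
        ch = schema[pos]
        pos += 1
        if ch == "!":                       # negation of the next element
            return not parse_factor()
        if ch in closers:
            inner = parse_sum()
            if pos >= len(schema) or schema[pos] != closers[ch]:
                raise ValueError("unbalanced bracket")
            pos += 1
            return inner
        if "A" <= ch <= "Z":
            return ch in labels
        raise ValueError(f"unexpected character {ch!r}")

    value = parse_sum()
    if pos != len(schema):
        raise ValueError("trailing characters in schema")
    return value


print(matches("AB", {"A", "B"}), matches("C+D+E", {"D"}), matches("!A", {"A"}))
```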

An EC labels expression starts with the @ sign, followed by the number of data parts, then the + sign and the number of parity parts the chunk should have. Possible numbers of data parts are 4 or 8. Possible numbers of parity parts are 1 (CE version) or 1 to 9 (PRO version). So, for example, @4+1 means EC with 4 data parts and 1 parity part, @8+3 means EC with 8 data parts and 3 parity parts. If the number of data parts is omitted, the master uses the default value defined by DEFAULT_EC_DATA_PARTS - see mfsmaster.cfg(5). In this case @2 means @8+2 or @4+2, depending on that setting. A maximum of two subexpressions, separated by commas, can follow. If only one is present, it defines where all the parts should be kept. If both are present, the first subexpression defines where data parts should be kept and the second subexpression defines where parity parts should be kept.
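The structure of an EC labels expression can be made concrete with a small Python sketch. The function name is hypothetical; the default_data_parts argument stands in for the DEFAULT_EC_DATA_PARTS setting and must be supplied by the caller, since its value depends on the master configuration. The sketch accepts the PRO range of parity parts (1 to 9) and assumes any / or : extensions have already been stripped.

```python
import re


def parse_ec(expr, default_data_parts):
    """Split an EC labels expression into (data, parity, subexpressions)."""
    m = re.fullmatch(r"@(?:(\d+)\+)?(\d+)((?:,[^,]+){0,2})", expr)
    if not m:
        raise ValueError(f"not an EC labels expression: {expr!r}")
    # "@2" has no explicit data part count, so the configured default is used.
    data = int(m.group(1)) if m.group(1) else default_data_parts
    parity = int(m.group(2))
    if data not in (4, 8):
        raise ValueError("data parts must be 4 or 8")
    if not 1 <= parity <= 9:
        raise ValueError("parity parts must be between 1 and 9")
    # Zero, one or two label subexpressions may follow the part counts.
    subs = m.group(3).split(",")[1:] if m.group(3) else []
    return data, parity, subs


print(parse_ec("@4+1", 8))      # explicit data and parity parts
print(parse_ec("@2", 8))        # data parts taken from the supplied default
print(parse_ec("@3,S,H", 8))    # separate placement for data and parity parts
```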

A labels expression can be either a regular labels expression or an EC labels expression (i.e. an EC labels expression cannot be a subexpression). An EC labels expression can only be used in place of ARCH_LABELS or TRASH_LABELS in the storage class definition; a regular labels expression can be used in any place.

At the end of each labels expression one or two extensions, each introduced by a special separator, can be added. The first possible extension is the distinguish extension, and its separator is the slash (/) sign. The second is the labels mode override, and this extension is separated by the colon (:) sign.

Distinguish extension can be a list of labels or one of the following special strings:

[IP] or [I] - distinguish by IP number

[RACK] or [R] - distinguish by RACK, as defined in topology, see mfstopology.cfg(5)

If present, the distinguish part lets the system know that it should try to distribute full copies so that each copy is either on a different label from the list or on a chunkserver with different IP address or from a different rack. For EC the distinguish part is currently ignored.

NOTICE! If CHUNKS_UNIQUE_MODE is defined in mfsmaster.cfg to a value other than 0, it will override any distinguish setting in storage classes. For more information about this parameter refer to the mfsmaster.cfg(5) manual.

Labels mode override extension can be one of three characters: d (alternatively D or in string form std or Std or STD), s (alternatively S or in string form strict or Strict or STRICT) or l (alternatively L or in string form loose or Loose or LOOSE) and they mean that the DEFAULT, STRICT or LOOSE label mode, respectively, should be applied only to this one labels expression. For explanation about label modes see the LABEL MODES section.

One or both extensions can be present for each labels expression, each has to start with their separator and if both are present, the order has to be kept, i.e. the distinguish extension has to be first and the label mode extension needs to be second.
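The ordering rule for the two extensions can be sketched in Python as follows. The function name is hypothetical; the accepted mode spellings are exactly those listed above. Because the label mode extension must come last, the sketch splits on the colon first and on the slash second.

```python
# Accepted spellings of each label mode, as listed in this page.
MODES = {}
for name, short, word in (("DEFAULT", "d", "std"),
                          ("STRICT", "s", "strict"),
                          ("LOOSE", "l", "loose")):
    for spelling in (short, short.upper(), word, word.capitalize(), word.upper()):
        MODES[spelling] = name


def split_extensions(expr):
    """Return (base expression, distinguish part or None, label mode or None)."""
    base, colon, mode = expr.partition(":")
    mode_name = None
    if colon:
        if mode not in MODES:
            raise ValueError(f"unknown label mode: {mode!r}")
        mode_name = MODES[mode]
    base, slash, distinguish = base.partition("/")
    return base, (distinguish if slash else None), mode_name


print(split_extensions("2A/[IP]:s"))
print(split_extensions("S,H,H/ABX-Z"))
print(split_extensions("AB,AC:l"))
```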

Examples of labels expressions:

A,B - files will have two copies, one copy will be stored on chunkserver(s) with label A, the other on chunkserver(s) with label B

A,* - files will have two copies, one copy will be stored on chunkserver(s) with label A, the other on any chunkserver(s)

A,!A - files will have two copies, one copy will be stored on chunkserver(s) with label A, the other on any chunkserver(s) that doesn't have the label A

*,* - files will have two copies, stored on any chunkservers (different for each copy)

AB,C+D+E - files will have two copies, one copy will be stored on any chunkserver(s) that has both labels A and B (multiplication of labels), the other on any chunkserver(s) that has either the C label or the D label or the E label (sum of labels)

A,B[X+Y],C[X+Y] - files will have three copies, one copy will be stored on any chunkserver(s) with the A label, the second on any chunkserver(s) that has the B label and either X or Y label, the third on any chunkserver(s) that has the C label and either X or Y label

2A expression is equivalent to A,A expression

A,3BC expression is equivalent to A,BC,BC,BC expression

2 expression is equivalent to 2* expression is equivalent to *,* expression

3*/[IP] - files will have 3 copies, each copy will be kept on a chunkserver with different IP address

A,B/[RACK] - files will have two copies, one copy will be stored on chunkserver(s) with label A, the other on chunkserver(s) with label B in a different rack than the other copy

S,H,H/ABX-Z - files will have 3 copies, one on server with label S, two on servers with label H, but each copy will be on a server with different label from the set of A, B, X, Y, Z

@4+1 - files will be kept in EC format, 4 data parts and 1 parity part

@8+3 - files will be kept in EC format, 8 data parts and 3 parity parts

@2 - files will be kept in EC format, default number of data parts, 2 parity parts

@4+3,Z - files will be kept in EC format, 4 data parts and 3 parity parts - all on chunkservers with label Z.

@2,A(X+Y) - files will be kept in EC format, default number of data parts, 2 parity parts, all parts will be kept on chunkservers with label A and either X or Y

@3,S,H - files will be kept in EC format, default number of data parts will be kept on chunkservers with label S, 3 parity parts will be kept on chunkservers with label H

AB,AC:l - files will be kept in copies format, one copy on a server with labels A and B, the second on a server with labels A and C and the behaviour of this should be LOOSE

@4+2,X,Y:s - files will be kept in EC format, 4 data parts will be kept on servers with label X, 2 parity (checksum) parts should be kept on servers with label Y and the behaviour of this should be STRICT

2A/[IP]:s - files should be kept in 2 copies, both copies on servers with label A, but each server should have different IP, behaviour of this when accounting for labels should be STRICT
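The count shorthand used in several of the examples above (2A, A,3BC, 2) can be expanded mechanically. The following Python sketch uses a hypothetical function name, handles only regular (non-EC) labels expressions, and assumes any / or : extensions have already been removed.

```python
import re


def expand_copies(expr):
    """Expand count prefixes into one label schema per copy."""
    copies = []
    for sub in expr.split(","):
        if not sub:
            raise ValueError("empty subexpression")
        # Leading digits are the repeat count; the rest is the label schema.
        m = re.fullmatch(r"(\d*)(.*)", sub)
        count = int(m.group(1)) if m.group(1) else 1
        schema = m.group(2) or "*"      # a bare number means any chunkserver
        copies.extend([schema] * count)
    return copies


print(expand_copies("2A"))
print(expand_copies("A,3BC"))
print(expand_copies("2"))
```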

LABEL MODES

It is important to specify what to do when it is not possible to meet the labels requirement of a storage class, i.e. when there is no space available on all servers with needed labels, there are not enough servers with needed labels, or servers with needed labels are all busy. The question is whether the system should create chunks on other servers (with non-matching labels) or not. This decision must be made by the user.

There are 3 modes of operation: DEFAULT, LOOSE and STRICT. The modes work a bit differently depending on whether a chunk is stored in copies or EC format, due to the different nature and algorithms that each of those formats uses.

For copies format the 3 modes behave as follows:

In DEFAULT mode in case of overloaded servers the system will wait for them, but in case of no space available it will use other servers and will replicate data to correct servers when it becomes possible. This means if some servers are in busy state for a long time, it might not be possible to create new chunks with certain storage classes and endangered (undergoal) chunks from those classes are at higher risk of being completely lost due to delayed replications.

In STRICT mode, when writing a new file, the system will return an error (ENOSPC) in case of no space available on servers marked with labels specified for chunk creation. It will still wait for overloaded servers. Undergoal replications will not be performed if there is no space on servers with labels matching the storage class. This means high risk of losing data if servers with some labels are permanently filled up with data!

In LOOSE mode the system will immediately use other servers in case of overloaded servers or no space on servers and will replicate data to correct servers when it becomes possible. There is no delay or error on file creation and undergoal replications are always done as soon as possible.

This table sums up the modes behaviour for chunks stored in copy format:

                        DEFAULT      STRICT     LOOSE
CREATE - BUSY           WAIT         WAIT       WRITE ANY
CREATE - NO SPACE       WRITE ANY    ENOSPC     WRITE ANY
REPLICATE - BUSY        WAIT         WAIT       WRITE ANY
REPLICATE - NO SPACE    WRITE ANY    NO COPY    WRITE ANY

For chunks stored in EC format the 3 modes behave as follows:

In general, chunks will only be converted from copy format to EC format if there are enough servers in the system to safely store all the parts of the EC format. For EC @N+X format, where N is number of data parts and can be either 4 or 8 and X is number of parity/checksum parts and can be equal to 1 (CE version) or any number from 1 to 9 (PRO version), the general requirements are:
- at least N+2X chunk servers to convert new chunks from copy format to EC format
- at least N+X chunk servers to keep chunks that are already in EC format still in this format
- if there are fewer than N+X servers, all chunks will revert to copy (KEEP definition) format.
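The arithmetic behind these requirements can be sketched in Python. The function name is hypothetical, and which servers are counted depends on the label mode, so the servers argument is simply the number of servers that count for the decision.

```python
def ec_format_decision(servers, data_parts, parity_parts, already_ec):
    """Return "EC" or "COPY" for a chunk, given the number of counted servers."""
    to_convert = data_parts + 2 * parity_parts    # N+2X: convert a new chunk
    to_keep = data_parts + parity_parts           # N+X: keep an existing EC chunk
    needed = to_keep if already_ec else to_convert
    return "EC" if servers >= needed else "COPY"


# For @4+1: conversion needs 4+2*1 = 6 servers, keeping needs 4+1 = 5.
print(ec_format_decision(5, 4, 1, already_ec=False))
print(ec_format_decision(5, 4, 1, already_ec=True))
# For @8+3: conversion needs 8+2*3 = 14 servers, keeping needs 8+3 = 11.
print(ec_format_decision(12, 8, 3, already_ec=False))
```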

In LOOSE mode the system will try to use first the servers matching the label expression defined in the used storage class, but if not enough servers with "correct" labels are available (because they are busy or have no space or are just not defined), it will use any available chunk servers regardless of label; so the N+2X and N+X are calculated from all available chunk servers when the system decides what format to use to keep a chunk. Also, when one part of a chunk in EC format becomes unavailable or corrupted, restoration of such part will also be done to any available server, if a server with "correct" labels cannot currently be used.

It's important to remember that if not enough servers with "correct" labels are available for a chunk in LOOSE mode, the system may use however many it wants of the "other" chunk servers, not just the minimal amount that is missing from the "correct" number of servers.

In STRICT mode the system will only use the servers matching the label expression defined in the used storage class, so only available or short-term busy servers matching the defined label expression will be used for calculation of N+2X and N+X when the system decides what format to use to keep a chunk. When one part of a chunk in EC format becomes unavailable or corrupted, restoration of such part can only be done to a server with a "correct" label; if such a server is unavailable long term (i.e. it is neither available outright nor only temporarily busy), this will automatically mean that the chunk needs to be reverted to keep format anyway (if the missing part is a parity/checksum part, the chunk will just revert to copy format using all available data parts; if a data part is missing, it will be restored to a chunk server hosting another part of the same chunk - which is not allowed under normal circumstances - and then the conversion to copy format will follow immediately).

In DEFAULT mode the system will behave like in STRICT mode when it needs to make a decision whether it will convert a new chunk from copy format to EC format, that is the N+2X in this step is calculated only from "correctly" labeled servers. But to make a decision whether existing chunks need to be converted back from EC format to copy format it will look at all available servers, regardless of labels, so the N+X in this step is calculated from all available servers, like in LOOSE mode. In case of missing parts, if it's not possible to restore them to chunk servers with "correct" labels, the system will also adopt the LOOSE mode behaviour and try to use any available servers.
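The difference between the three modes for EC chunks comes down to which servers are counted in each step. A minimal Python sketch, with a hypothetical function name and with "convert" and "keep" as made-up step names for the N+2X and N+X checks:

```python
def counted_servers(mode, step, matching, total):
    """Number of servers counted for a step ("convert" or "keep").

    matching: available servers with "correct" labels
    total:    all available servers, regardless of labels
    """
    if step not in ("convert", "keep"):
        raise ValueError(f"unknown step: {step!r}")
    if mode == "LOOSE":
        return total                 # any server may be used in both steps
    if mode == "STRICT":
        return matching              # only correctly labeled servers count
    if mode == "DEFAULT":
        # STRICT-like when converting, LOOSE-like when keeping.
        return matching if step == "convert" else total
    raise ValueError(f"unknown mode: {mode!r}")


# Example: 4 correctly labeled servers out of 7 available.
for mode in ("DEFAULT", "STRICT", "LOOSE"):
    print(mode, counted_servers(mode, "convert", 4, 7), counted_servers(mode, "keep", 4, 7))
```

The returned number is what would be compared against N+2X (convert) or N+X (keep).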

Notice! When a chunk is converted from copy format to EC format, the system first performs a "local split" operation, that is it picks one copy of the chunk and calculates all EC parts necessary on the server occupied by this selected copy. Then these parts are moved to separate chunkservers, matching the labels in the storage class definition for used EC mode. But temporarily, between the split and the "moving out" of the parts, they can be recorded on a "wrong" chunk server even in STRICT mode. This is because of the mechanics of the "local split" operation.

ARCHIVE BEHAVIOUR

The archive flag is set on chunks during the file maintenance loop, which means that the time to archiving defined by the -d option is the minimum time that has to pass before the flag is set, not the exact time.

Default behaviour of the system is that once a chunk has the archive bit set on, it IS NOT switched off even if atime/ctime/mtime changes, unless R flag is set by option -o. Writing to a chunk will always switch its archive flag off.

Archive flags:

C - use file's ctime to determine if archive flag should be set on - this is the default flag

M - use file's mtime to determine if archive flag should be set on

A - use file's atime to determine if archive flag should be set on

R - reversible, if atime/mtime/ctime changes for a file, system verifies if archive flag should be turned off for its chunks

F - fastmode, chunk has archive flag set to on as soon as possible, whatever is defined with -d option is disregarded

P - "per chunk" mode, use chunk's mtime to determine if archive flag should be set on
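The interaction of the R and F flags with the -d delay can be sketched in Python as a single decision made when the maintenance loop visits a chunk. This is a simplified model under stated assumptions, not the master's algorithm: the function name is hypothetical, age_hours is the age of whichever timestamp the C, M, A or P flag selects, and the minimum file length (-s) and manual changes to the archive flag are left out.

```python
def next_archive_flag(current, flags, age_hours, delay_hours=24, written=False):
    """Return the archive flag value after one maintenance pass (simplified)."""
    if written:
        return False                  # writing always switches the flag off
    # F ignores the delay; otherwise the selected timestamp must be old enough.
    due = "F" in flags or age_hours >= delay_hours
    if due:
        return True
    if current:
        # Without R the flag stays on even if the timestamp became recent.
        return "R" not in flags
    return False


print(next_archive_flag(False, "C", age_hours=30))    # old enough
print(next_archive_flag(True, "C", age_hours=10))     # not reversible
print(next_archive_flag(True, "CR", age_hours=10))    # reversible
```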

Archive flag can be modified manually. See mfsarchive(1)

PREDEFINED STORAGE CLASSES

For compatibility reasons, every fresh or freshly upgraded instance of MooseFS has 9 predefined storage classes. Their names are single digits, from 1 to 9, and their definitions are * to 9*. They are equivalents of simple numeric goals from previous versions of the system. In case of an upgrade, all files that had goal N before upgrade, will now have N storage class. These classes can be modified only when option -f is specified. It is advised to create new storage classes in an upgraded system and migrate files with mfsxchgsclass tool, rather than modify the predefined classes. The predefined classes CANNOT be deleted.

REPORTING BUGS

Report bugs to <bugs@moosefs.com>.

COPYRIGHT

Copyright (C) 2024 Jakub Kruszona-Zawadzki, Saglabs SA

This file is part of MooseFS.

MooseFS is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2 (only).

MooseFS is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with MooseFS; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02111-1301, USA or visit http://www.gnu.org/licenses/gpl-2.0.html

SEE ALSO

mfsmount(8), mfstools(1), mfssclass(1), mfsarchive(1), mfsmaster.cfg(5), mfschunkserver.cfg(5), mfstopology.cfg(5)

September 2024 MooseFS 4.56.6-1