Scroll to navigation

MTBL_FILESET(3)   MTBL_FILESET(3)

NAME

mtbl_fileset - automatic multiple MTBL data file merger

SYNOPSIS

#include <mtbl.h>

Fileset objects:

struct mtbl_fileset *
mtbl_fileset_init(const char *fname, const struct mtbl_fileset_options *fopt);

struct mtbl_fileset *
mtbl_fileset_dup(struct mtbl_fileset *orig, const struct mtbl_fileset_options *fopt);

void
mtbl_fileset_destroy(struct mtbl_fileset **f);

void
mtbl_fileset_reload(struct mtbl_fileset *f);

void
mtbl_fileset_reload_now(struct mtbl_fileset *f);

const struct mtbl_source *
mtbl_fileset_source(struct mtbl_fileset *f);

void
mtbl_fileset_partition(struct mtbl_fileset *f,

mtbl_filename_filter_func cb,
void *clos,
struct mtbl_merger **m1,
struct mtbl_merger **m2);

Fileset options:

struct mtbl_fileset_options *
mtbl_fileset_options_init(void);

void
mtbl_fileset_options_destroy(struct mtbl_fileset_options **fopt);

void
mtbl_fileset_options_set_merge_func(

struct mtbl_fileset_options *fopt,
mtbl_merge_func fp,
void *clos);

void
mtbl_fileset_options_set_dupsort_func(

struct mtbl_fileset_options *fopt,
mtbl_dupsort_func fp,
void *clos);

void
mtbl_fileset_options_set_filename_filter_func(

struct mtbl_fileset_options *fopt,
mtbl_filename_filter_func fp,
void *clos);

void
mtbl_fileset_options_set_reload_interval(

struct mtbl_fileset_options *fopt,
uint32_t reload_interval);

DESCRIPTION

The mtbl_fileset is a convenience interface for automatically maintaining a merged view of a set of MTBL data files. The merged entries may be consumed via the mtbl_source(3) and mtbl_iter(3) interfaces.

mtbl_fileset objects are initialized from a "setfile", which specifies a list of filenames of MTBL data files, one per line. For each filename, if it is not an absolute path, then the directory name of the fileset file’s pathname is prepended to it. Examples: If the setfile path is "/export/dnstable/mtbl/dns.fileset", and a line in it is of the form "dns.2017.Y.mtbl" then dnstable will try to open "/export/dnstable/mtbl/dns.2017.Y.mtbl". If the setfile path is "/export/dnstable/mtbl/dns.fileset", and a line in it is of the form "/tmp/dns.2017.Y.mtbl" then dnstable will try to open "/tmp/dns.2017.Y.mtbl".

Internally, an mtbl_reader object is initialized from each filename and added to an mtbl_merger object. The setfile is watched for changes and the addition or removal of filenames from the setfile will result in the corresponding addition or removal of mtbl_reader objects.

Because the MTBL format does not allow duplicate keys, the caller must provide a function which will accept a key and two conflicting values for that key and return a replacement value. This function may be called multiple times for the same key if the same key is inserted more than twice. See mtbl_merger(3) for further details about the merge function.

mtbl_fileset objects are created with the mtbl_fileset_init() function, which requires the path to a "setfile", fname, and a non-NULL fopt argument which has been configured with a merge function fp. mtbl_fileset_source() should then be called in order to consume output via the mtbl_source(3) interface.

The mtbl_fileset_dup() function, which requires a mtbl_fileset object and a non-NULL opt argument, creates a duplicate mtbl_fileset object that shares the underlying fileset, but with different fileset options, as specified.

Accesses via the mtbl_source(3) interface will implicitly check for updates to the setfile. However, it may be necessary to explicitly call the mtbl_fileset_reload() function in order to check for updates, especially if files are being removed from the setfile and the mtbl_source is infrequently accessed.

The mtbl_fileset_reload() function avoids checking for updates more frequently than every reload_interval seconds. If reload_interval is set to MTBL_FILESET_RELOAD_INTERVAL_NEVER, then mtbl_fileset_reload() function will only load the fileset once. The mtbl_fileset_reload_now() function can be called to bypass the reload_interval check.

The mtbl_fileset_partition() function yields two struct mtbl_merger objects that are split based on the output of a callback. The caller is responsible for calling mtbl_merger_destroy() on each of these mergers. Calls to mtbl_source_*() on the fileset’s source object, and calls to mtbl_fileset_reload() and mtbl_fileset_reload_now() may leave these mergers in an inconsistent state.

Fileset options

merge_func

See mtbl_merger(3). An mtbl_merger object is used internally for the external sort.

dupsort_func

See mtbl_merger(3). Used to sort the entries with duplicate keys during the merge process based on their data.

fname_filter_func

Used to filter specific files by name from a fileset. If the function returns false, the file’s data will not be included in the results returned by any iterators on the fileset.

reload_interval

Specifies the interval between checks for updates to the setfile, in seconds. Defaults to 60 seconds. MTBL_FILESET_RELOAD_INTERVAL_NEVER is a special value that indicates to never reload the fileset.

03/27/2019