OMINDEX(1) User Commands OMINDEX(1)

NAME

omindex - Index static website data via the filesystem

SYNOPSIS

omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY

DESCRIPTION

DIRECTORY is the directory to start indexing from.

BASEDIR is the directory corresponding to URL (default: DIRECTORY).
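
As a rough sketch of the synopsis, the shell helper below prints an omindex command line built from a database path, a DIRECTORY and an optional BASEDIR. The function name omindex_cmd and all paths are hypothetical, chosen only for illustration; the helper prints the command for inspection and does not run omindex or quote its arguments.

```shell
# Print an omindex command line following the SYNOPSIS:
#   omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY
# Usage: omindex_cmd DATABASE DIRECTORY [BASEDIR]
# (omindex_cmd is a hypothetical helper, not part of Omega.)
omindex_cmd() {
    if [ -n "${3:-}" ]; then
        # BASEDIR given: it goes before DIRECTORY.
        printf 'omindex --db %s %s %s\n' "$1" "$3" "$2"
    else
        # BASEDIR omitted: it defaults to DIRECTORY.
        printf 'omindex --db %s %s\n' "$1" "$2"
    fi
}

# Hypothetical paths: index /srv/www/docs, with /srv/www as BASEDIR.
omindex_cmd /tmp/omega-example-db /srv/www/docs /srv/www
```

Printing the command first makes it easy to check the argument order before pointing omindex at a real database.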

OPTIONS

set duplicate handling: ARG can be 'ignore' or 'replace' (default: replace)
skip the deletion of documents corresponding to deleted files
how to handle documents we extract no text from: ARG can be index, warn (issue a diagnostic and index), or skip. (default: warn)
path to database to use
base URL that BASEDIR corresponds to (default: /)
assume any file with extension EXT has MIME Content-Type TYPE, instead of using libmagic (empty TYPE removes any existing mapping for EXT; other special TYPE values: 'ignore' and 'skip')
assume any file with leaf name matching shell wildcard pattern GLOB has MIME Content-Type TYPE (special TYPE values: 'ignore' and 'skip')
process files with MIME Content-Type M using command CMD, which produces output (on stdout or in a temporary file) with format T (Content-Type or file extension; currently txt (default), html or svg) in character encoding C (default: UTF-8). E.g. -Fapplication/octet-stream:'|strings -n8' or -Ftext/x-foo,,utf-16:'foo2utf16 %f %t'
process files with MIME Content-Type TYPE using worker sub-process WORKER. WORKER is the name of the program to run to start the worker. If it has no path then it's looked for in pkglibbindir (which can be overridden by setting environment variable XAPIAN_OMEGA_PKGLIBBINDIR). This invocation will look in: /usr/local/lib/xapian-omega/bin
bulk-load --filter arguments from FILE, which should contain one such argument per line (e.g. text/x-bar:bar2txt --utf8). Lines starting with # are treated as comments and ignored.
bulk-load --worker arguments from FILE, which should contain one such argument per line (e.g. text/x-bar:omindex_libbar). Lines starting with # are treated as comments and ignored.
set recursion limit (0 = unlimited)
follow symbolic links
ignore meta robots tags and similar exclusions
index data for spelling correction
maximum size of file to index (in bytes or with a suffix of 'K'/'k', 'M'/'m', 'G'/'g') (default: unlimited)
what to use for the stored sample of text for HTML documents - SOURCE can be 'body' or 'description' (default: 'body')
maximum size for the document text sample (supports the same formats as --max-size). (default: 512)
maximum size for the document title (supports the same formats as --max-size). (default: 128)
retry files which omindex failed to extract text from on a previous run
sleep for SECS seconds before opening each directory - sleeping for 2 seconds seems to reliably work around problems with indexing files on Microsoft DFS shares.
track each file's ctime so we can detect changes to ownership or permissions.
index D, M and Y prefixed terms to support date range filtering using terms (we now recommend using a value slot for this instead).
ignored for compatibility with Omega 1.4.x.
show more information about what is happening
create the database anew (the default is to update if the database already exists)
set the stemming language (default: english). Possible values: arabic armenian basque catalan danish dutch dutch_porter earlyenglish english esperanto estonian finnish french german greek hindi hungarian indonesian irish italian lithuanian lovins nepali norwegian polish porter portuguese romanian russian serbian spanish swedish tamil turkish yiddish (pass 'none' to disable stemming)
display this help and exit
output version information and exit
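
The bulk-load options above read one argument per line from FILE and ignore lines starting with #. A minimal sketch of preparing such a file for --filter arguments follows, using the text/x-bar:bar2txt --utf8 example from the option description. The helper name active_filters and the temporary file are assumptions made for illustration; the helper only lists the lines that are not comments, so the file can be reviewed before it is handed to omindex.

```shell
# List the lines of a filter file that are not comments,
# i.e. lines that do not start with '#'.
# (active_filters is a hypothetical helper, not part of Omega.)
active_filters() {
    grep -v '^#' "$1"
}

# Write an example filter file to a temporary location.
filters_file=$(mktemp)
cat > "$filters_file" <<'EOF'
# Convert text/x-bar files with bar2txt (example from the option text).
text/x-bar:bar2txt --utf8
EOF

# Show the arguments that would be loaded from the file.
active_filters "$filters_file"
rm -f "$filters_file"
```

A file of --worker arguments could be prepared the same way, with lines such as text/x-bar:omindex_libbar.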

Please report bugs at: https://xapian.org/bugs

March 2026 xapian-omega 2.0.0