| SETOP(1) | User Commands | SETOP(1) |
NAME
setop - make set of strings from input
DESCRIPTION
Apply set operations like union, intersection, or set difference to input files and print the resulting set (sorted and with unique string elements) to standard output, or answer special queries such as the number of elements.
Usage: setop [input-stream]* [-C] [--include-empty] [--input-separator <value>] [--input-element <value>] [--trim <value>] [--combine (union|intersection|symmetric-difference|formula <value>)] [--subtract [input-stream]*] [--output (set|count|is-empty|contains <value>|equals <input-stream>|has-subset <input-stream>|has-superset <input-stream>)] [--output-separator <value>] [--quiet]
OPTIONS
- --help
- produce this help message and exit
- --version
- output name and version
- -C [ --ignore-case ]
- handle input elements case-insensitively
- --include-empty
- don’t ignore empty elements (these can come from empty lines, trimming, etc.)
- -n [ --input-separator ] arg
- describe the form of an input separator as a regular expression in ECMAScript syntax; defaults to the newline character only if --input-element is not present, i.e. when you set the input element manually, don’t forget to include the newline character \n if you want it!
- -l [ --input-element ] arg
- describe the form of input elements as regular expression in ECMAScript syntax
- -o [ --output-separator ] arg (=\n)
- string for separating output elements; escape sequences are allowed
- -t [ --trim ] arg
- trim all given characters at beginning and end of elements (escape sequences allowed)
- --combine arg
- define combination operation applied to given input streams; possible parameters are 'union' (default), 'intersection', 'symmetric-difference', and 'formula <value>'
- --subtract arg
- subtract all elements of all given streams from output set
- --output arg
- whether to output the determined set or certain information about this set instead; possible parameters are:
  - set (default): stream all output elements
  - count: just output the number of (different) elements, don’t list them
  - is-empty: check if the resulting set is empty
  - contains <element-string>: check if the given element is contained in the set
  - equals <input-stream>: check set equality, i.e. check if the output corresponds to the content of <input-stream>
  - has-subset <input-stream>: check if the content of <input-stream> is a subset of the output set
  - has-superset <input-stream>: check if the content of <input-stream> is a superset of the output set
- --quiet
- suppress all output messages in case of special queries (e.g. when checking whether an element is contained in the set)
Giving no input filename, or giving '-', means reading from standard input. When an input stream occurs multiple times in the calling command, it is read only once and cached.
setop proceeds in three steps. First, all input files are parsed and combined according to the --combine option. Next, all inputs given with the option --subtract are parsed and removed from the result of the first step. Finally, the output requested with the option --output is printed: the set itself, its number of elements, a comparison to another set, etc.
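The three steps above can be modelled with Python's built-in sets. This is only a sketch of the described behaviour, not setop's implementation: the function name `run_setop` and its parameters are hypothetical, folding more than two streams from left to right is an assumption, and Python's string ordering may differ from the order setop prints.

```python
from functools import reduce

# Hypothetical model of setop's three stages; not part of setop itself.
COMBINE_OPS = {
    "union": set.union,
    "intersection": set.intersection,
    "symmetric-difference": set.symmetric_difference,
}

def run_setop(inputs, combine="union", subtract=(), output="set"):
    """Combine the input streams, subtract, then produce the requested output."""
    sets = [set(stream) for stream in inputs]
    # Step 1: combine all input streams (left-to-right fold is an assumption).
    result = reduce(COMBINE_OPS[combine], sets) if sets else set()
    # Step 2: remove every element that occurs in any subtract stream.
    for stream in subtract:
        result -= set(stream)
    # Step 3: produce the requested output.
    if output == "set":
        return sorted(result)
    if output == "count":
        return len(result)
    if output == "is-empty":
        return not result
    raise ValueError(f"output mode not modelled in this sketch: {output}")

a = ["apple", "pear", "plum"]
b = ["pear", "plum", "fig"]
print(run_setop([a, b]))
print(run_setop([a, b], combine="intersection", subtract=[["plum"]]))
print(run_setop([a, b], combine="symmetric-difference", output="count"))
```

The remaining query modes (contains, equals, has-subset, has-superset) would map onto Python's `in`, `==`, `>=` and `<=` on the resulting set in the same way.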
The combination type 'formula' needs an additional string mainly consisting of positive integers and combining operators. The integers 1, 2, etc. represent the first, second, etc. input stream; possible operators are '&' for intersection, '^' for symmetric difference, '|' for union, and '-' for set difference. At the moment, they are prioritized in that order (i. e. &, ^, |, -), but this may change in the future. Use brackets for being explicit about the order of evaluation. Example: --combine formula '(1 | 2) & 3' unites first and second input stream and intersects the result with third input stream.
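To make the priority order concrete, here is a small Python sketch that evaluates such formulas over Python sets. It is an illustrative model only: the names `evaluate`, `LEVELS` and `APPLY` are hypothetical, left-associativity is an assumption, and malformed formulas are not checked. Python's own set operators bind differently ('-' binds tightest there), so the formula cannot simply be handed to Python.

```python
import re

# Operators from lowest to highest priority, following the order described above.
LEVELS = ["-", "|", "^", "&"]
APPLY = {
    "-": lambda a, b: a - b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "&": lambda a, b: a & b,
}

def evaluate(formula, streams):
    """Evaluate a formula such as '(1 | 2) & 3'; stream numbers are 1-based."""
    tokens = re.findall(r"\d+|[-|^&()]", formula)
    pos = 0

    def parse(level):
        nonlocal pos
        if level == len(LEVELS):
            return parse_atom()
        left = parse(level + 1)
        while pos < len(tokens) and tokens[pos] == LEVELS[level]:
            op = tokens[pos]
            pos += 1
            right = parse(level + 1)
            left = APPLY[op](left, right)  # left-associative (assumed)
        return left

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse(0)
            pos += 1  # skip the closing ')'
            return value
        return set(streams[int(token) - 1])

    return parse(0)

streams = [{1, 2, 3}, {3, 4}, {3, 5}]
print(evaluate("(1 | 2) & 3", streams))  # unite 1 and 2, then intersect with 3
print(evaluate("1 | 2 & 3", streams))    # '&' is applied before '|'
print(evaluate("1 - 2 | 3", streams))    # '-' is applied last: 1 - (2 | 3)
```

Because the priorities may change, writing the parentheses explicitly, as in the first call, stays the safer habit.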
By default, each line of an input stream is considered to be an element. You can change this by defining regular expressions with the options --input-separator or --input-element. When both are used, the input stream is first split according to the separator and then filtered by the desired input element form. Once the elements have been found, they are trimmed according to the argument given with --trim. The option --ignore-case lets you treat Word and WORD as equal; only the first occurrence across all input streams is kept. Note that --ignore-case does not affect the regular expressions used in --input-separator and --input-element.
When describing strings and characters for the output separator or for the option --trim, you can use escape sequences like \t, \n, \" and \'. Be aware that some of these sequences (especially \\ and \") might be interpreted by your shell before the string is passed to setop. In that case you have to write \\\\ to describe a \ and \\\" to describe a ". You can check your shell’s behavior with echo "\\ and \""
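The order of splitting, filtering, trimming and case folding can be sketched in Python. This is a hedged model, not setop's code: the function `extract_elements` and its parameters are hypothetical, Python's `re` syntax stands in for ECMAScript syntax (the two differ in details), and treating every match of the element pattern inside a piece as one element is an assumption.

```python
import re

def extract_elements(text, separator=None, element=None, trim="",
                     ignore_case=False, include_empty=False):
    """Hypothetical model of how one input stream is turned into elements."""
    # The separator defaults to a newline only when no element pattern is given.
    if separator is None and element is None:
        separator = r"\n"
    # Split first ...
    pieces = re.split(separator, text) if separator is not None else [text]
    # ... then filter by the element form (every match counts; an assumption).
    if element is not None:
        pieces = [m.group(0) for p in pieces for m in re.finditer(element, p)]
    seen = {}
    for piece in pieces:
        # Trim the given characters from both ends of the element.
        if trim:
            piece = piece.strip(trim)
        if piece == "" and not include_empty:
            continue
        key = piece.lower() if ignore_case else piece
        seen.setdefault(key, piece)  # keep only the first occurrence
    return sorted(seen.values())

print(extract_elements("a\nb\n\na\n"))
print(extract_elements("x 12, y 7 and 12", element=r"\d+"))
print(extract_elements(":fooBAR-:\nFOOBAR\n", trim=":-\t", ignore_case=True))
```

In the last call both lines fold to the same key, so only the first spelling, fooBAR, survives.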
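Python's standard `shlex` module follows POSIX shell quoting rules, so it can serve as a rough preview of what a POSIX-style shell would hand to setop after its own quote processing. This is a sketch under that assumption; real shells and their built-in echo commands can behave differently, which is why the check with echo remains useful.

```python
import shlex

# Inside double quotes, a POSIX shell only unescapes \\ and \" (among the
# sequences discussed here); \t is passed on unchanged for setop to interpret.
args = shlex.split(r'setop A.txt --trim ":-\t" --output-separator "\\\\"')
for arg in args:
    print(repr(arg))

# The echo test from the text, as seen after shell quote processing.
print(shlex.split(r'echo "\\ and \""'))
```

The trim argument arrives as the four characters `:`, `-`, `\`, `t`, and the output separator arrives as two backslashes, which setop's own escape handling then reads as a single \.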
Special boolean queries (e.g. checking whether an element is contained in the set) return exit code EXIT_SUCCESS (= 0) when the answer is 'yes'; otherwise (e.g. element not contained in the set) they return an exit code that is guaranteed to differ from both EXIT_SUCCESS and EXIT_FAILURE (= 1). This way, setop can be used in shell conditionals. With the option --quiet, the verbose result message is omitted for boolean queries.
EXAMPLES
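A caller can use these exit codes to tell 'yes', 'no' and failure apart. The Python sketch below shows one way to do so. The helper names `interpret_exit_code` and `contains` are hypothetical, reading EXIT_FAILURE (= 1) as "setop itself failed" is an assumption drawn from the text, and `contains` only works where setop is installed.

```python
import subprocess

def interpret_exit_code(code):
    """Map the exit code of a boolean setop query to 'yes', 'no' or 'error'."""
    if code == 0:
        return "yes"    # EXIT_SUCCESS: the answer is yes
    if code == 1 or code < 0:
        return "error"  # EXIT_FAILURE, or killed by a signal (negative in Python)
    return "no"         # any other code: the answer is no

def contains(path, element):
    """Hypothetical wrapper; requires setop to be installed."""
    done = subprocess.run(
        ["setop", path, "--output", "contains", element, "--quiet"]
    )
    return interpret_exit_code(done.returncode)

for code in (0, 1, 3):
    print(code, interpret_exit_code(code))
```

The value 3 is only a stand-in for "some code other than 0 and 1"; the text does not say which code setop actually returns for 'no'.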
setop A.txt --subtract B.txt --output contains ":fooBAR-:" --trim ":-\t" --ignore-case
- case-insensitive check if element 'foobar' is contained in A minus B
setop A.txt - B.txt --combine intersection --input-element "\d+"
- output the intersection of A, standard input, and B, where elements are recognized as strings of at least one digit, i.e. elements are non-negative integers
setop A.txt B.txt --combine symmetric-difference --input-separator "[[:space:]-]"
- find all elements contained in either A or B, but not both, where any whitespace character (i.e. \v \t \n \r \f or space) or a minus is interpreted as a separator between elements
For bug reports, use the issue tracker at https://github.com/phisigma/setop.
| April 2026 | setop 0.2 |