.TH file_sorter 3erl "stdlib 4.3.1.3" "Ericsson AB" "Erlang Module Definition"
.SH NAME
file_sorter \- File sorter.
.SH DESCRIPTION
.LP
This module contains functions for sorting terms on files, merging already sorted files, and checking files for sortedness\&. Chunks containing binary terms are read from a sequence of files, sorted internally in memory and written on temporary files, which are merged producing one sorted file as output\&. Merging is provided as an optimization; it is faster when the files are already sorted, but it always works to sort instead of merge\&.
.LP
On a file, a term is represented by a header and a binary\&. Two options define the format of terms on files:
.RS 2
.TP 2
.B
\fI{header, HeaderLength}\fR\&:
\fIHeaderLength\fR\& determines the number of bytes preceding each binary and containing the length of the binary in bytes\&. Defaults to 4\&. The order of the header bytes is defined as follows: if \fIB\fR\& is a binary containing a header only, size \fISize\fR\& of the binary is calculated as \fI<<Size:HeaderLength/unit:8>> = B\fR\&\&.
.TP 2
.B
\fI{format, Format}\fR\&:
Option \fIFormat\fR\& determines the function that is applied to binaries to create the terms to be sorted\&. Defaults to \fIbinary_term\fR\&, which is equivalent to \fIfun binary_to_term/1\fR\&\&. Value \fIbinary\fR\& is equivalent to \fIfun(X) -> X end\fR\&, which means that the binaries are sorted as they are\&. This is the fastest format\&. If \fIFormat\fR\& is \fIterm\fR\&, \fIio:read/2\fR\& is called to read terms\&. In that case, only the default value of option \fIheader\fR\& is allowed\&.
.RS 2
.LP
Option \fIformat\fR\& also determines what is written to the sorted output file: if \fIFormat\fR\& is \fIterm\fR\&, then \fIio:format/3\fR\& is called to write each term, otherwise the binary prefixed by a header is written\&. Notice that the binary written is the same binary that was read; the results of applying function \fIFormat\fR\& are thrown away when the terms have been sorted\&.
Reading and writing terms using the \fIio\fR\& module is much slower than reading and writing binaries\&.
.RE
.RE
.LP
Other options are:
.RS 2
.TP 2
.B
\fI{order, Order}\fR\&:
The default is to sort terms in ascending order, but that can be changed by value \fIdescending\fR\& or by specifying an ordering function \fIFun\fR\&\&. An ordering function is antisymmetric, transitive, and total\&. \fIFun(A, B)\fR\& is to return \fItrue\fR\& if \fIA\fR\& comes before \fIB\fR\& in the ordering, otherwise \fIfalse\fR\&\&. An example of a typical ordering function is less than or equal to, \fI=</2\fR\&\&.
.RE
.LP
As an alternative to files, a function of one argument can be specified as input or output (see \fIinfun()\fR\& and \fIoutfun()\fR\& in section DATA TYPES)\&. As an example, consider sorting the terms on a disk log file\&. A function that reads chunks from the disk log is used as input, and the results are collected in a list of terms:
.LP
.nf
sort(Log) ->
    {ok, _} = disk_log:open([{name,Log}, {mode,read_only}]),
    Input = input(Log, start),
    Output = output([]),
    Reply = file_sorter:sort(Input, Output, {format,term}),
    ok = disk_log:close(Log),
    Reply.

%% Input function: each read returns one chunk of terms together
%% with a new input function that holds the next continuation.
input(Log, Cont) ->
    fun(close) ->
            ok;
       (read) ->
            case disk_log:chunk(Log, Cont) of
                {error, Reason} ->
                    {error, Reason};
                {Cont2, Terms} ->
                    {Terms, input(Log, Cont2)};
                {Cont2, Terms, _Badbytes} ->
                    {Terms, input(Log, Cont2)};
                eof ->
                    end_of_input
            end
    end.

%% Output function: accumulates the lists of sorted terms in reverse
%% order and returns the concatenated result on close.
output(L) ->
    fun(close) ->
            lists:append(lists:reverse(L));
       (Terms) ->
            output([Terms | L])
    end.
.fi
.LP
For more examples of functions as input and output, see the end of the \fIfile_sorter\fR\& module; the \fIterm\fR\& format is implemented with functions\&.
.LP
The possible values of \fIReason\fR\& returned when an error occurs are:
.RS 2
.TP 2
*
\fIbad_object\fR\&, \fI{bad_object, FileName}\fR\& - Applying the format function failed for some binary, or the key(s) could not be extracted from some term\&.
.LP
.TP 2
*
\fI{bad_term, FileName}\fR\& - \fIio:read/2\fR\& failed to read some term\&.
.LP
.TP 2
*
\fI{file_error, FileName, file:posix()}\fR\& - For an explanation of \fIfile:posix()\fR\&, see \fIfile(3erl)\fR\&\&.
.LP
.TP 2
*
\fI{premature_eof, FileName}\fR\& - End-of-file was encountered inside some binary term\&.
.LP
.RE
.SH DATA TYPES
.nf
\fBfile_name()\fR\& = file:name()
.br
.fi
.nf
\fBfile_names()\fR\& = [file:name()]
.br
.fi
.nf
\fBi_command()\fR\& = read | close
.br
.fi
.nf
\fBi_reply()\fR\& =
.br
    end_of_input |
.br
    {end_of_input, value()} |
.br
    {[object()], infun()} |
.br
    input_reply()
.br
.fi
.nf
\fBinfun()\fR\& = fun((i_command()) -> i_reply())
.br
.fi
.nf
\fBinput()\fR\& = file_names() | infun()
.br
.fi
.nf
\fBinput_reply()\fR\& = term()
.br
.fi
.nf
\fBo_command()\fR\& = {value, value()} | [object()] | close
.br
.fi
.nf
\fBo_reply()\fR\& = outfun() | output_reply()
.br
.fi
.nf
\fBobject()\fR\& = term() | binary()
.br
.fi
.nf
\fBoutfun()\fR\& = fun((o_command()) -> o_reply())
.br
.fi
.nf
\fBoutput()\fR\& = file_name() | outfun()
.br
.fi
.nf
\fBoutput_reply()\fR\& = term()
.br
.fi
.nf
\fBvalue()\fR\& = term()
.br
.fi
.nf
\fBoptions()\fR\& = [option()] | option()
.br
.fi
.nf
\fBoption()\fR\& =
.br
    {compressed, boolean()} |
.br
    {header, header_length()} |
.br
    {format, format()} |
.br
    {no_files, no_files()} |
.br
    {order, order()} |
.br
    {size, size()} |
.br
    {tmpdir, tmp_directory()} |
.br
    {unique, boolean()}
.br
.fi
.nf
\fBformat()\fR\& = binary_term | term | binary | format_fun()
.br
.fi
.nf
\fBformat_fun()\fR\& = fun((binary()) -> term())
.br
.fi
.nf
\fBheader_length()\fR\& = integer() >= 1
.br
.fi
.nf
\fBkey_pos()\fR\& = integer() >= 1 | [integer() >= 1]
.br
.fi
.nf
\fBno_files()\fR\& = integer() >= 1
.br
.fi
.nf
\fBorder()\fR\& = ascending | descending | order_fun()
.br
.fi
.nf
\fBorder_fun()\fR\& = fun((term(), term()) -> boolean())
.br
.fi
.nf
\fBsize()\fR\& = integer() >= 0
.br
.fi
.nf
\fBtmp_directory()\fR\& = [] | file:name()
.br
.fi
.nf
\fBreason()\fR\& =
.br
    bad_object |
.br
    {bad_object, file_name()} |
.br
    {bad_term, file_name()} |
.br
    {file_error,
.br
     file_name(),
.br
     file:posix() | badarg | system_limit} |
.br
    {premature_eof, file_name()}
.br
.fi
.SH EXPORTS
.LP
.nf
.B
check(FileName) -> Reply
.br
.fi
.br
.nf
.B
check(FileNames, Options) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
FileNames = file_names()
.br
Options = options()
.br
Reply = {ok, [Result]} | {error, reason()}
.br
Result = {FileName, TermPosition, term()}
.br
FileName = file_name()
.br
TermPosition = integer() >= 1
.br
.RE
.RE
.RS
.LP
Checks files for sortedness\&. If a file is not sorted, the first out-of-order element is returned\&. The first term on a file has position 1\&.
.LP
\fIcheck(FileName)\fR\& is equivalent to \fIcheck([FileName], [])\fR\&\&.
.RE
.LP
.nf
.B
keycheck(KeyPos, FileName) -> Reply
.br
.fi
.br
.nf
.B
keycheck(KeyPos, FileNames, Options) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
KeyPos = key_pos()
.br
FileNames = file_names()
.br
Options = options()
.br
Reply = {ok, [Result]} | {error, reason()}
.br
Result = {FileName, TermPosition, term()}
.br
FileName = file_name()
.br
TermPosition = integer() >= 1
.br
.RE
.RE
.RS
.LP
Checks files for sortedness\&. If a file is not sorted, the first out-of-order element is returned\&. The first term on a file has position 1\&.
.LP
\fIkeycheck(KeyPos, FileName)\fR\& is equivalent to \fIkeycheck(KeyPos, [FileName], [])\fR\&\&.
.RE
.LP
.nf
.B
keymerge(KeyPos, FileNames, Output) -> Reply
.br
.fi
.br
.nf
.B
keymerge(KeyPos, FileNames, Output, Options) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
KeyPos = key_pos()
.br
FileNames = file_names()
.br
Output = output()
.br
Options = options()
.br
Reply = ok | {error, reason()} | output_reply()
.br
.RE
.RE
.RS
.LP
Merges tuples on files\&. Each input file is assumed to be sorted on key(s)\&.
.LP
\fIkeymerge(KeyPos, FileNames, Output)\fR\& is equivalent to \fIkeymerge(KeyPos, FileNames, Output, [])\fR\&\&.
.RE
.LP
.nf
.B
keysort(KeyPos, FileName) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
KeyPos = key_pos()
.br
FileName = file_name()
.br
Reply = ok | {error, reason()} | input_reply() | output_reply()
.br
.RE
.RE
.RS
.LP
Sorts tuples on files\&.
.LP
\fIkeysort(N, FileName)\fR\& is equivalent to \fIkeysort(N, [FileName], FileName)\fR\&\&.
.RE
.LP
.nf
.B
keysort(KeyPos, Input, Output) -> Reply
.br
.fi
.br
.nf
.B
keysort(KeyPos, Input, Output, Options) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
KeyPos = key_pos()
.br
Input = input()
.br
Output = output()
.br
Options = options()
.br
Reply = ok | {error, reason()} | input_reply() | output_reply()
.br
.RE
.RE
.RS
.LP
Sorts tuples on files\&. The sort is performed on the element(s) mentioned in \fIKeyPos\fR\&\&. If two tuples compare equal (\fI==\fR\&) on one element, the next element according to \fIKeyPos\fR\& is compared\&. The sort is stable\&.
.LP
\fIkeysort(N, Input, Output)\fR\& is equivalent to \fIkeysort(N, Input, Output, [])\fR\&\&.
.RE
.LP
.nf
.B
merge(FileNames, Output) -> Reply
.br
.fi
.br
.nf
.B
merge(FileNames, Output, Options) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
FileNames = file_names()
.br
Output = output()
.br
Options = options()
.br
Reply = ok | {error, reason()} | output_reply()
.br
.RE
.RE
.RS
.LP
Merges terms on files\&. Each input file is assumed to be sorted\&.
.LP
\fImerge(FileNames, Output)\fR\& is equivalent to \fImerge(FileNames, Output, [])\fR\&\&.
.RE
.LP
.nf
.B
sort(FileName) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
FileName = file_name()
.br
Reply = ok | {error, reason()} | input_reply() | output_reply()
.br
.RE
.RE
.RS
.LP
Sorts terms on files\&.
.LP
\fIsort(FileName)\fR\& is equivalent to \fIsort([FileName], FileName)\fR\&\&.
.RE
.LP
.nf
.B
sort(Input, Output) -> Reply
.br
.fi
.br
.nf
.B
sort(Input, Output, Options) -> Reply
.br
.fi
.br
.RS
.LP
Types:
.RS 3
Input = input()
.br
Output = output()
.br
Options = options()
.br
Reply = ok | {error, reason()} | input_reply() | output_reply()
.br
.RE
.RE
.RS
.LP
Sorts terms on files\&.
.LP
\fIsort(Input, Output)\fR\& is equivalent to \fIsort(Input, Output, [])\fR\&\&.
.RE
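.SH EXAMPLES
.LP
The default on-file format described above (a 4-byte header followed by a binary created with \fIterm_to_binary/1\fR\&) can be illustrated with a minimal sketch\&. The module name \fIfile_sorter_example\fR\&, the helper functions, and the file name are hypothetical and chosen for illustration only; the sketch assumes that the current directory is writable\&.
.LP
.nf
-module(file_sorter_example).
-export([run/0]).

%% Hypothetical example module, not part of stdlib.
run() ->
    File = "file_sorter_example.tmp",   % assumed scratch file name
    Terms = [{3, c}, {1, a}, {2, b}],
    %% Write each term as a 4-byte size header followed by the binary.
    ok = file:write_file(File, [encode(T) || T <- Terms]),
    %% Sort the file in place using the default options.
    ok = file_sorter:sort(File),
    %% A sorted file yields no out-of-order results.
    {ok, []} = file_sorter:check(File),
    {ok, Sorted} = file:read_file(File),
    ok = file:delete(File),
    decode(Sorted).

%% Header layout for the default {header, 4}.
encode(Term) ->
    Bin = term_to_binary(Term),
    [<<(byte_size(Bin)):4/unit:8>>, Bin].

%% Inverse of encode/1, applied to the whole file contents.
decode(<<Size:4/unit:8, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)];
decode(<<>>) ->
    [].
.fi
.LP
Under these assumptions, \fIfile_sorter_example:run()\fR\& is expected to return \fI[{1,a},{2,b},{3,c}]\fR\&\&. The output file uses the same header and binary layout as the input, which is why the same \fIdecode/1\fR\& pattern can read it back\&.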