STILTS-TGRIDMAP(1) | Stilts commands | STILTS-TGRIDMAP(1) |
NAME
stilts-tgridmap - Calculates N-dimensional density maps
SYNOPSIS
stilts tgridmap [ifmt=<in-format>] [istream=true|false] [in=<table>] [icmd=<cmds>] [ocmd=<cmds>] [omode=out|meta|stats|count|checksum|cgi|discard|topcat|samp|tosql|gui] [out=<out-table>] [ofmt=<out-format>] [coords=<expr> ...] [logs=true|false ...] [bounds=[<lo>]:[<hi>] ...] [binsizes=<size> ...] [nbins=<num> ...] [cols=<expr>[;<combiner>[;<name>]] ...] [combine=sum|sum-per-unit|count|count-per-unit|mean|median|Q1|Q3|min|max|stdev|stdev_pop|hit] [sparse=true|false] [runner=sequential|parallel|parallel<n>|partest]
DESCRIPTION
tgridmap scans an input table to create one or more N-dimensional density maps (equivalently, N-dimensional histograms) of its values, and outputs the result as a table, optionally sparse, containing a row for each grid cell. The maps/histograms can optionally be weighted by some quantity from the input table, and various options such as summing, averaging and counting are available for aggregating inputs into the output bins.
The supplied coords parameter defines which N numeric columns of the input table form the coordinates of the bin grid, and the cols parameter defines which quantities are aggregated into each bin. Either the binsizes or nbins parameter must be supplied to define the extents of the bins on each axis. The output table contains a row for each bin, with columns giving the central (and upper/lower bound) values of each grid coordinate, and a column for each aggregated value. The rows are output in first-coordinate-slowest sequence, and the sparse parameter determines whether a row is written for every cell in the hypercube defined by the grid dimensions, or only for those cells with non-blank data.
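As a rough illustration of the output layout described above, the following Python sketch builds a small count map by hand. It is not how STILTS is implemented: the function name `grid_counts`, the choice to align bins to the lower bound, the half-open bin intervals, and the use of `None` for empty cells are all assumptions made for the example.

```python
from itertools import product
from math import ceil, floor

def grid_counts(points, bounds, binsizes, sparse=True):
    """Count points per grid cell; hypothetical helper, not STILTS code.

    points   -- iterable of coordinate tuples
    bounds   -- one (lo, hi) pair per axis
    binsizes -- one bin width per axis
    Assumes bins start at each lower bound and are half-open [lo, lo+size).
    """
    # Number of bins needed to cover each axis.
    nbins = [max(1, ceil((hi - lo) / size))
             for (lo, hi), size in zip(bounds, binsizes)]

    counts = {}
    for p in points:
        idx = tuple(floor((x - lo) / size)
                    for x, (lo, _), size in zip(p, bounds, binsizes))
        # Skip points that fall outside the grid.
        if all(0 <= i < n for i, n in zip(idx, nbins)):
            counts[idx] = counts.get(idx, 0) + 1

    rows = []
    # itertools.product varies the last index fastest, so the first
    # coordinate changes slowest, matching the ordering described above.
    for idx in product(*(range(n) for n in nbins)):
        if sparse and idx not in counts:
            continue
        centres = tuple(lo + (i + 0.5) * size
                        for i, (lo, _), size in zip(idx, bounds, binsizes))
        # Empty cells are represented by None in this sketch.
        rows.append(centres + (counts.get(idx),))
    return rows
```

With `sparse=True` only occupied cells produce rows; with `sparse=False` every cell of the grid produces a row, so the row count is the product of the per-axis bin counts.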
The tabular form of the output may not be the most appropriate or compact way to write a density map, especially for multi-dimensional grids, but it means the output can be manipulated later by other STILTS commands or by TOPCAT. To do a similar job with more compact output, see tcube. See also tskymap, which does the same thing for sky geometry (and is probably a better choice if you find yourself accumulating onto a longitude-latitude grid).
OPTIONS
The in parameter gives the location of the input table, in one of the following forms:
- A filename.
- A URL.
- The special value "-", meaning standard input. In this case the input format must be given explicitly using the ifmt parameter. Note that not all formats can be streamed in this way.
- A scheme specification of the form :<scheme-name>:<scheme-args>.
- A system command line with either a "<" character at the start, or a "|" character at the end ("<syscmd" or "syscmd|"). This executes the given pipeline and reads from its standard output. This will probably only work on unix-like systems.
In any case, compressed data in one of the supported compression formats (gzip, Unix compress or bzip2) will be decompressed transparently.
For both the icmd and ocmd parameters, commands may alternatively be supplied in an external file, by using the indirection character '@'. Thus a value of "@filename" causes the file filename to be read for a list of filter commands to execute. The commands in the file may be separated by newline characters and/or semicolons, and lines which are blank or which start with a '#' character are ignored. A backslash character '\' at the end of a line joins it with the following line.
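The file format rules just described can be sketched in a few lines of Python. This is only an illustration of the stated rules: the helper name `read_filter_commands` is hypothetical, quoting is not handled because the text does not describe it, and stripping leading whitespace before testing for '#' is an assumption.

```python
def read_filter_commands(text):
    """Split the contents of a command file into individual filter commands.

    Hypothetical helper that follows the rules stated above; it does not
    attempt to handle quoting, which the text does not describe.
    """
    # Join any line ending in a backslash with the line that follows it.
    logical_lines = []
    pending = ""
    for line in text.splitlines():
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        logical_lines.append(pending + line)
        pending = ""
    if pending:
        logical_lines.append(pending)

    commands = []
    for line in logical_lines:
        stripped = line.strip()
        # Blank lines and lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Several commands on one line may be separated by semicolons.
        commands.extend(c.strip() for c in stripped.split(";") if c.strip())
    return commands
```

For instance, a file containing a comment line, the line `select x>0; head 10`, and an `addcol` command split over two lines with a trailing backslash would yield three commands.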
Possible values for omode are:
- out
- meta
- stats
- count
- checksum
- cgi
- discard
- topcat
- samp
- tosql
- gui
Use the help=omode flag or see SUN/256 for more information.
The out and ofmt parameters must only be given if omode has its default value of "out".
If supplied, each of the logs, bounds, binsizes and nbins parameters must have the same number of words as the coords parameter.
If any of the bounds are left to be determined automatically from the data, two passes through the data will be required, the first to determine bounds and the second to calculate the map.
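The per-coordinate word rule and the extra pass for automatic bounds can be pictured with a short Python sketch. It assumes, as the two-pass remark suggests, that a limit omitted from a `[<lo>]:[<hi>]` word is taken from the data; the helper name `resolve_bounds` and its arguments are hypothetical.

```python
def resolve_bounds(words, columns):
    """Turn bounds words such as "1:100" or ":20" into (lo, hi) pairs.

    words   -- one "[<lo>]:[<hi>]" string per coordinate
    columns -- one list of numeric values per coordinate
    Hypothetical helper; a missing limit is taken from the data, which is
    the extra pass over the input mentioned above.
    """
    if len(words) != len(columns):
        raise ValueError("bounds needs one word per coordinate")
    resolved = []
    for word, values in zip(words, columns):
        lo_text, _, hi_text = word.partition(":")
        # An empty side of the colon means "determine from the data".
        lo = float(lo_text) if lo_text else min(values)
        hi = float(hi_text) if hi_text else max(values)
        resolved.append((lo, hi))
    return resolved
```

When every word supplies both limits, the data columns are never scanned, which corresponds to the single-pass case.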
Each item is composed of one, two or three tokens, separated by semicolon (";") characters:
- <expr>: (required) column name or expression using the expression language for the quantity to be aggregated.
- <combiner>: (optional) combination method, using the same options as for the combine parameter. If omitted, the value specified for that parameter will be used.
- <name>: (optional) name of output column; if omitted, the <expr> value (perhaps somewhat sanitised) will be used.
It is often sufficient just to supply a space-separated list of input table column names for this parameter, but the additional syntax is needed, for instance, to calculate both a sum and a mean of the same input column.
The default value is "1;count;COUNT" which simply provides an unweighted histogram, i.e. a count of the rows in each bin (aggregation of the value "1" using the combination method "count", yielding an output column named "COUNT").
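The one-, two- or three-token item syntax can be made concrete with a small Python sketch. The helper name `parse_cols_item` is hypothetical, the sketch assumes the expression itself contains no semicolon, and it uses the expression unchanged as the default column name, whereas the real tool may sanitise it.

```python
def parse_cols_item(item, combine):
    """Split one cols word into (expr, combiner, name).

    item    -- e.g. "flux", "flux;sum" or "flux;sum;TOTAL_FLUX"
    combine -- value of the combine parameter, used when no combiner is given
    Hypothetical helper; assumes the expression itself contains no ';'.
    """
    tokens = item.split(";")
    if not 1 <= len(tokens) <= 3 or not tokens[0]:
        raise ValueError(f"bad cols item: {item!r}")
    expr = tokens[0]
    combiner = tokens[1] if len(tokens) > 1 and tokens[1] else combine
    # The real tool may sanitise the default name; this sketch uses expr as-is.
    name = tokens[2] if len(tokens) > 2 and tokens[2] else expr
    return expr, combiner, name
```

Under this reading, a value such as `flux;sum;FLUX_SUM flux;mean;FLUX_MEAN` (with the placeholder column name `flux`) would request both a sum and a mean of the same column under distinct output names.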
- sum: the sum of all the combined values per bin
- sum-per-unit: the sum of all the combined values per unit of bin size
- count: the number of non-blank values per bin (weight is ignored)
- count-per-unit: the number of non-blank values per unit of bin size (weight is ignored)
- mean: the mean of the combined values
- median: the median
- Q1: first quartile
- Q3: third quartile
- min: the minimum of all the combined values
- max: the maximum of all the combined values
- stdev: the sample standard deviation of the combined values
- stdev_pop: the population standard deviation of the combined values
- hit: 1 if any values present, NaN otherwise (weight is ignored)
- Q.nnn: quantile nnn (e.g. Q.05 is the fifth percentile)
Note this value may be overridden on a per-column basis by the cols parameter.
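To make the differences between the combination methods concrete, here is a minimal Python sketch applying several of them to the values that landed in a single bin. The helper name `combine_values` and the `bin_extent` argument are hypothetical. The quartile and Q.nnn methods are omitted because the text does not say how quantiles are interpolated, and returning NaN for an empty bin is an assumption for every method other than hit.

```python
import statistics

def combine_values(values, method, bin_extent=1.0):
    """Aggregate the non-blank values that fell into one bin.

    values     -- list of numbers submitted to the bin
    method     -- one of the combiner names listed above
    bin_extent -- size of the bin, used only by the per-unit methods
    Hypothetical helper; Q1, Q3 and Q.nnn are deliberately left out.
    """
    if not values:
        # Assumption: an empty bin yields NaN (stated only for "hit").
        return float("nan")
    if method == "sum":
        return sum(values)
    if method == "sum-per-unit":
        return sum(values) / bin_extent
    if method == "count":
        return len(values)
    if method == "count-per-unit":
        return len(values) / bin_extent
    if method == "mean":
        return statistics.fmean(values)
    if method == "median":
        return statistics.median(values)
    if method == "min":
        return min(values)
    if method == "max":
        return max(values)
    if method == "stdev":
        # Sample standard deviation needs at least two values.
        return statistics.stdev(values) if len(values) > 1 else float("nan")
    if method == "stdev_pop":
        return statistics.pstdev(values)
    if method == "hit":
        return 1
    raise ValueError(f"unsupported combiner: {method}")
```

The per-unit variants differ from their plain counterparts only by the division by bin size, which makes results comparable between grids with different bin sizes.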
- sequential: runs using only a single thread
- parallel: runs using multiple threads for large tables, with parallelism given by the number of available processors
- parallel<n>: runs using multiple threads for large tables, with parallelism given by the supplied value <n>
- partest: runs using multiple threads even when tables are small (only intended for testing purposes)
Using parallel processing can speed up execution considerably; however, depending on the I/O operations required, it can also slow it down by disrupting patterns of disk access. If the content of a file is on a solid state disk, or is already in cache for instance because a similar command has been run recently, then parallel will probably be faster. However, if the data is being read directly from a spinning disk, for instance because the file is too large to fit in RAM, then sequential or parallel<n> with a small <n> may be faster.
The value of this parameter should make only very tiny differences to the output table. If you notice significant discrepancies please report them.
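When comparing runner settings on a particular machine, it can be convenient to assemble the command line programmatically. The Python sketch below only builds and prints an argument list; it uses parameter names taken from the synopsis, while the helper name `tgridmap_argv` and the file and column names (`cat.fits`, `grid.fits`, `x`, `y`) are placeholders invented for the example.

```python
import shlex

def tgridmap_argv(table, coords, binsizes, out, runner):
    """Assemble an argument list for stilts tgridmap.

    Uses only parameter names from the synopsis; the file names and column
    names passed in are placeholders chosen by the caller.
    """
    return [
        "stilts", "tgridmap",
        f"in={table}",
        # Multi-word parameters are passed as one space-separated value.
        f"coords={' '.join(coords)}",
        f"binsizes={' '.join(str(b) for b in binsizes)}",
        f"runner={runner}",
        f"out={out}",
    ]

argv = tgridmap_argv("cat.fits", ["x", "y"], [0.5, 0.5], "grid.fits",
                     runner="sequential")
# Print a shell-quoted form of the command rather than running it.
print(shlex.join(argv))
```

The same list could be passed to `subprocess.run` once with `runner="sequential"` and once with `runner="parallel"` to time both on the actual storage in use.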
SEE ALSO
If the package stilts-doc is installed, the full documentation
SUN/256 is available in HTML format:
file:///usr/share/doc/stilts/sun256/index.html
VERSION
STILTS version 3.5-debian
This is the Debian version of STILTS, which lacks support for
some file formats and network protocols. For differences see
file:///usr/share/doc/stilts/README.Debian
AUTHOR
Mark Taylor (Bristol University)
Mar 2017 |