NAME¶
funcalc - Funtools calculator (for binary tables)
SYNOPSIS¶
funcalc [\-n] [\-a argstr] [\-e expr] [\-f file] [\-l link] [\-p prog]
<iname> [oname [columns]]
OPTIONS¶
-a argstr # user arguments to pass to the compiled program
-e expr # funcalc expression
-f file # file containing funcalc expression
-l libs # libs to add to link command
-n # output generated code instead of compiling and executing
-p prog # generate named program, no execution
-u # die if any variable is undeclared (don't auto-declare)
DESCRIPTION¶
funcalc is a calculator program that allows arbitrary expressions to be
constructed, compiled, and executed on columns in a Funtools table (FITS
binary table or raw event file). It works by integrating user-supplied
expression(s) into a template C program, then compiling and executing the
program.
funcalc expressions are C statements, although some important
simplifications (such as automatic declaration of variables) are supported.
funcalc expressions can be specified in three ways: on the command line
using the
\-e [expression] switch, in a file using the
\-f
[file] switch, or from stdin (if neither
\-e nor
\-f is
specified). Of course a file containing
funcalc expressions can be read
from stdin.
Each invocation of
funcalc requires an input Funtools table file to be
specified as the first command line argument. The output Funtools table file
is the second optional argument. It is needed only if an output FITS file is
being created (i.e., in cases where the
funcalc expression only prints
values, no output file is needed). If input and output file are both
specified, a third optional argument can specify the list of columns to
activate (using
FunColumnActivate()). Note that
funcalc
determines whether or not to generate code for writing an output file based on
the presence or absence of an output file argument.
A
funcalc expression executes on each row of a table and consists of one
or more C statements that operate on the columns of that row (possibly using
temporary variables). Within an expression, reference is made to a column of
the
current row using the C struct syntax
cur-[colname]>,
e.g. cur->x, cur->pha, etc. Local scalar variables can be defined using
C declarations at very the beginning of the expression, or else they can be
defined automatically by
funcalc (to be of type double). Thus, for
example, a swap of columns x and y in a table can be performed using either of
the following equivalent
funcalc expressions:
double temp;
temp = cur->x;
cur->x = cur->y;
cur->y = temp;
or:
temp = cur->x;
cur->x = cur->y;
cur->y = temp;
When this expression is executed using a command such as:
funcalc -f swap.expr itest.ev otest.ev
the resulting file will have values of the x and y columns swapped.
By default, the data type of the variable for a column is the same as the data
type of the column as stored in the file. This can be changed by appending
":[dtype]" to the first reference to that column. In the example
above, to force x and y to be output as doubles, specify the type 'D'
explicitly:
temp = cur->x:D;
cur->x = cur->y:D;
cur->y = temp;
Data type specifiers follow standard FITS table syntax for defining columns
using TFORM:
- •
- A: ASCII characters
- •
- B: unsigned 8\-bit char
- •
- I: signed 16\-bit int
- •
- U: unsigned 16\-bit int (not standard FITS)
- •
- J: signed 32\-bit int
- •
- V: unsigned 32\-bit int (not standard FITS)
- •
- E: 32\-bit float
- •
- D: 64\-bit float
- •
- X: bits (treated as an array of chars)
Note that only the first reference to a column should contain the explicit data
type specifier.
Of course, it is important to handle the data type of the columns correctly. One
of the most frequent cause of error in
funcalc programming is the
implicit use of the wrong data type for a column in expression. For example,
the calculation:
dx = (cur->x - cur->y)/(cur->x + cur->y);
usually needs to be performed using floating point arithmetic. In cases where
the x and y columns are integers, this can be done by reading the columns as
doubles using an explicit type specification:
dx = (cur->x:D - cur->y:D)/(cur->x + cur->y);
Alternatively, it can be done using C type-casting in the expression:
dx = ((double)cur->x - (double)cur->y)/((double)cur->x + (double)cur->y);
In addition to accessing columns in the current row, reference also can be made
to the
previous row using
prev-[colname]>, and to the
next row using
next-[colname]>. Note that if
prev-[colname]> is specified in the
funcalc expression, the
very first row is not processed. If
next-[colname]> is specified in
the
funcalc expression, the very last row is not processed. In this
way,
prev and
next are guaranteed always to point to valid rows.
For example, to print out the values of the current x column and the previous
y column, use the C fprintf function in a
funcalc expression:
fprintf(stdout, "%d %d\n", cur->x, prev->y);
New columns can be specified using the same
cur-[colname]> syntax by
appending the column type (and optional tlmin/tlmax/binsiz specifiers),
separated by colons. For example, cur->avg:D will define a new column of
type double. Type specifiers are the same those used above to specify new data
types for existing columns.
For example, to create and output a new column that is the average value of the
x and y columns, a new "avg" column can be defined:
cur->avg:D = (cur->x + cur->y)/2.0
Note that the final ';' is not required for single-line expressions.
As with FITS TFORM data type specification, the column data type specifier can
be preceded by a numeric count to define an array, e.g., "10I" means
a vector of 10 short ints, "2E" means two single precision floats,
etc. A new column only needs to be defined once in a
funcalc
expression, after which it can be used without re-specifying the type. This
includes reference to elements of a column array:
cur->avg[0]:2D = (cur->x + cur->y)/2.0;
cur->avg[1] = (cur->x - cur->y)/2.0;
The 'X' (bits) data type is treated as a char array of dimension
(numeric_count/8), i.e., 16X is processed as a 2\-byte char array. Each 8\-bit
array element is accessed separately:
cur->stat[0]:16X = 1;
cur->stat[1] = 2;
Here, a 16\-bit column is created with the MSB is set to 1 and the LSB set to 2.
By default, all processed rows are written to the specified output file. If you
want to skip writing certain rows, simply execute the C "continue"
statement at the end of the
funcalc expression, since the writing of
the row is performed immediately after the expression is executed. For
example, to skip writing rows whose average is the same as the current x
value:
cur->avg[0]:2D = (cur->x + cur->y)/2.0;
cur->avg[1] = (cur->x - cur->y)/2.0;
if( cur->avg[0] == cur->x )
continue;
If no output file argument is specified on the
funcalc command line, no
output file is opened and no rows are written. This is useful in expressions
that simply print output results instead of generating a new file:
fpv = (cur->av3:D-cur->av1:D)/(cur->av1+cur->av2:D+cur->av3);
fbv = cur->av2/(cur->av1+cur->av2+cur->av3);
fpu = ((double)cur->au3-cur->au1)/((double)cur->au1+cur->au2+cur->au3);
fbu = cur->au2/(double)(cur->au1+cur->au2+cur->au3);
fprintf(stdout, "%f\t%f\t%f\t%f\n", fpv, fbv, fpu, fbu);
In the above example, we use both explicit type specification (for
"av" columns) and type casting (for "au" columns) to
ensure that all operations are performed in double precision.
When an output file is specified, the selected input table is processed and
output rows are copied to the output file. Note that the output file can be
specified as "stdout" in order to write the output rows to the
standard output. If the output file argument is passed, an optional third
argument also can be passed to specify which columns to process.
In a FITS binary table, it sometimes is desirable to copy all of the other FITS
extensions to the output file as well. This can be done by appending a '+'
sign to the name of the extension in the input file name. See
funtable
for a related example.
funcalc works by integrating the user-specified expression into a
template C program called tabcalc.c. The completed program then is compiled
and executed. Variable declarations that begin the
funcalc expression
are placed in the local declaration section of the template main program. All
other lines are placed in the template main program's inner processing loop.
Other details of program generation are handled automatically. For example,
column specifiers are analyzed to build a C struct for processing rows, which
is passed to
FunColumnSelect() and used in
FunTableRowGet(). If
an unknown variable is used in the expression, resulting in a compilation
error, the program build is retried after defining the unknown variable to be
of type double.
Normally,
funcalc expression code is added to
funcalc row
processing loop. It is possible to add code to other parts of the program by
placing this code inside special directives of the form:
[directive name]
... code goes here ...
end
The directives are:
- •
- global add code and declarations in global space,
before the main routine.
- •
- local add declarations (and code) just after the
local declarations in main
- •
- before add code just before entering the main row
processing loop
- •
- after add code just after exiting the main row
processing loop
Thus, the following
funcalc expression will declare global variables and
make subroutine calls just before and just after the main processing loop:
global
double v1, v2;
double init(void);
double finish(double v);
end
before
v1 = init();
end
... process rows, with calculations using v1 ...
after
v2 = finish(v1);
if( v2 < 0.0 ){
fprintf(stderr, "processing failed %g -> %g\n", v1, v2);
exit(1);
}
end
Routines such as
init() and
finish() above are passed to the
generated program for linking using the
\-l [link directives ...]
switch. The string specified by this switch will be added to the link line
used to build the program (before the funtools library). For example, assuming
that
init() and
finish() are in the library libmysubs.a in the
/opt/special/lib directory, use:
funcalc -l "-L/opt/special/lib -lmysubs" ...
User arguments can be passed to a compiled funcalc program using a string
argument to the "\-a" switch. The string should contain all of the
user arguments. For example, to pass the integers 1 and 2, use:
funcalc -a "1 2" ...
The arguments are stored in an internal array and are accessed as strings via
the ARGV(n) macro. For example, consider the following expression:
local
int pmin, pmax;
end
before
pmin=atoi(ARGV(0));
pmax=atoi(ARGV(1));
end
if( (cur->pha >= pmin) && (cur->pha <= pmax) )
fprintf(stderr, "%d %d %d\n", cur->x, cur->y, cur->pha);
This expression will print out x, y, and pha values for all rows in which the
pha value is between the two user-input values:
funcalc -a '1 12' -f foo snr.ev'[cir 512 512 .1]'
512 512 6
512 512 8
512 512 5
512 512 5
512 512 8
funcalc -a '5 6' -f foo snr.ev'[cir 512 512 .1]'
512 512 6
512 512 5
512 512 5
Note that it is the user's responsibility to ensure that the correct number of
arguments are passed. The ARGV(n) macro returns a NULL if a requested argument
is outside the limits of the actual number of args, usually resulting in a
SEGV if processed blindly. To check the argument count, use the ARGC macro:
local
long int seed=1;
double limit=0.8;
end
before
if( ARGC >= 1 ) seed = atol(ARGV(0));
if( ARGC >= 2 ) limit = atof(ARGV(1));
srand48(seed);
end
if ( drand48() > limit ) continue;
The macro WRITE_ROW expands to the
FunTableRowPut() call that writes the
current row. It can be used to write the row more than once. In addition, the
macro NROW expands to the row number currently being processed. Use of these
two macros is shown in the following example:
if( cur->pha:I == cur->pi:I ) continue;
a = cur->pha;
cur->pha = cur->pi;
cur->pi = a;
cur->AVG:E = (cur->pha+cur->pi)/2.0;
cur->NR:I = NROW;
if( NROW < 10 ) WRITE_ROW;
If the
\-p [prog] switch is specified, the expression is not executed.
Rather, the generated executable is saved with the specified program name for
later use.
If the
\-n switch is specified, the expression is not executed. Rather,
the generated code is written to stdout. This is especially useful if you want
to generate a skeleton file and add your own code, or if you need to check
compilation errors. Note that the comment at the start of the output gives the
compiler command needed to build the program on that platform. (The command
can change from platform to platform because of the use of different
libraries, compiler switches, etc.)
As mentioned previously,
funcalc will declare a scalar variable
automatically (as a double) if that variable has been used but not declared.
This facility is implemented using a sed script named funcalc.sed, which
processes the compiler output to sense an undeclared variable error. This
script has been seeded with the appropriate error information for gcc, and for
cc on Solaris, DecAlpha, and SGI platforms. If you find that automatic
declaration of scalars is not working on your platform, check this sed script;
it might be necessary to add to or edit some of the error messages it senses.
In order to keep the lexical analysis of
funcalc expressions (reasonably)
simple, we chose to accept some limitations on how accurately C comments,
spaces, and new-lines are placed in the generated program. In particular,
comments associated with local variables declared at the beginning of an
expression (i.e., not in a
local...end block) will usually end up in
the inner loop, not with the local declarations:
/* this comment will end up in the wrong place (i.e, inner loop) */
double a; /* also in wrong place */
/* this will be in the the right place (inner loop) */
if( cur->x:D == cur->y:D ) continue; /* also in right place */
a = cur->x;
cur->x = cur->y;
cur->y = a;
cur->avg:E = (cur->x+cur->y)/2.0;
Similarly, spaces and new-lines sometimes are omitted or added in a seemingly
arbitrary manner. Of course, none of these stylistic blemishes affect the
correctness of the generated code.
Because
funcalc must analyze the user expression using the data file(s)
passed on the command line, the input file(s) must be opened and read twice:
once during program generation and once during execution. As a result, it is
not possible to use stdin for the input file:
funcalc cannot be used as
a filter. We will consider removing this restriction at a later time.
Along with C comments,
funcalc expressions can have one-line internal
comments that are not passed on to the generated C program. These internal
comment start with the
# character and continue up to the new\-line:
double a; # this is not passed to the generated C file
# nor is this
a = cur->x;
cur->x = cur->y;
cur->y = a;
/* this comment is passed to the C file */
cur->avg:E = (cur->x+cur->y)/2.0;
As previously mentioned, input columns normally are identified by their being
used within the inner event loop. There are rare cases where you might want to
read a column and process it outside the main loop. For example, qsort might
use a column in its sort comparison routine that is not processed inside the
inner loop (and therefore not implicitly specified as a column to be read). To
ensure that such a column is read by the event loop, use the
explicit
keyword. The arguments to this keyword specify columns that should be read
into the input record structure even though they are not mentioned in the
inner loop. For example:
explicit pi pha
will ensure that the pi and pha columns are read for each row, even if they are
not processed in the inner event loop. The
explicit statement can be
placed anywhere.
Finally, note that
funcalc currently works on expressions involving FITS
binary tables and raw event files. We will consider adding support for image
expressions at a later point, if there is demand for such support from the
community.
SEE ALSO¶
See
funtools(7) for a list of Funtools help pages