NCGEN(1) | UNIDATA UTILITIES | NCGEN(1) |
NAME¶
ncgen - From a CDL file generate a netCDF-3 file, a netCDF-4 file or a C program
SYNOPSIS¶
ncgen [-b] [-c] [-f] [-k file_format] [-l output_language] [-n] [-o netcdf_filename] [-x] input_file
DESCRIPTION¶
ncgen generates either a netCDF-3 (i.e. classic) binary .nc file, a netCDF-4 (i.e. enhanced) binary .nc file or a file in some source language that when executed will construct the corresponding binary .nc file. The input to ncgen is a description of a netCDF file in a small language known as CDL (network Common Data form Language), described below. If no options are specified in invoking ncgen, it merely checks the syntax of the input CDL file, producing error messages for any violations of CDL syntax. Other options can be used, for example, to create the corresponding netCDF file, or to generate a C program that uses the netCDF C interface to create the netCDF file.
Note that this version of ncgen was originally called ncgen4. The older ncgen program has been renamed to ncgen3.
ncgen may be used with the companion program ncdump to perform some simple operations on netCDF files. For example, to rename a dimension in a netCDF file, use ncdump to get a CDL version of the netCDF file, edit the CDL file to change the name of the dimension, and use ncgen to generate the corresponding netCDF file from the edited CDL file.
OPTIONS¶
- -b
- Create a (binary) netCDF file. If the -o option is absent, a default file name will be constructed from the netCDF name (specified after the netcdf keyword in the input) by appending the `.nc' extension. If a file already exists with the specified name, it will be overwritten.
- -c
- Generate C source code that will create a netCDF file matching the netCDF specification. The C source code is written to standard output; equivalent to -lc.
- -f
- Generate FORTRAN 77 source code that will create a netCDF file matching the netCDF specification. The source code is written to standard output; equivalent to -lf77.
- -o netcdf_file
- Name for the binary netCDF file created. If this option is specified, it implies the "-b" option. (This option is necessary because netCDF files cannot be written directly to standard output, since standard output is not seekable.)
- -k file_format
- The -k flag specifies the format of the file to be created and, by inference, the data model accepted by ncgen (i.e. netcdf-3 (classic) versus netcdf-4). The possible arguments are as follows.
- '1', 'classic' => netcdf classic file format, netcdf-3 type model.
- '2', '64-bit-offset', '64-bit offset' => netcdf 64 bit classic file format, netcdf-3 type model.
- '3', 'hdf5', 'netCDF-4', 'enhanced' => netcdf-4 file format, netcdf-4 type model.
- '4', 'hdf5-nc3', 'netCDF-4 classic model', 'enhanced-nc3' => netcdf-4 file format, netcdf-3 type model.
- -x
- Don't initialize data with fill values. This can speed up creation of large netCDF files greatly, but later attempts to read unwritten data from the generated file will not be easily detectable.
- -l output_language
- The -l flag specifies the output language to use when generating source code that will create or define a netCDF file matching the netCDF specification. The output is written to standard output. The currently supported languages have the following flags.
- 'c|C' => C language output.
- 'f77|fortran77' => FORTRAN 77 language output; note that currently only the classic model is supported.
- 'j|java' => (experimental) Java language output; targets the existing Unidata Java interface, which means that only the classic model is supported.
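The -k argument list above amounts to a lookup from each accepted spelling to a file format and a type model. The following Python sketch encodes only the spellings listed above. The names KINDS and resolve_kind are illustrative and are not part of ncgen, and exact, case-sensitive matching is an assumption.

```python
# Illustrative lookup for the documented -k arguments (not ncgen's own code).
# Each entry: (accepted spellings, file format, type model).
KINDS = [
    (("1", "classic"), "netcdf classic", "netcdf-3"),
    (("2", "64-bit-offset", "64-bit offset"), "netcdf 64 bit classic", "netcdf-3"),
    (("3", "hdf5", "netCDF-4", "enhanced"), "netcdf-4", "netcdf-4"),
    (("4", "hdf5-nc3", "netCDF-4 classic model", "enhanced-nc3"), "netcdf-4", "netcdf-3"),
]

# Flatten the table so every spelling maps directly to its (format, model) pair.
_LOOKUP = {alias: (fmt, model) for aliases, fmt, model in KINDS for alias in aliases}


def resolve_kind(arg):
    """Return (file format, type model) for a -k argument.

    Matching is exact and case-sensitive, which is an assumption of this sketch.
    """
    try:
        return _LOOKUP[arg]
    except KeyError:
        raise ValueError(f"unrecognized -k argument: {arg!r}") from None


if __name__ == "__main__":
    for arg in ("classic", "2", "enhanced", "hdf5-nc3"):
        print(arg, "->", resolve_kind(arg))
```

Note that arguments '3' and '4' select the same file format and differ only in the type model that the CDL input may use.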
EXAMPLES¶
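Check the syntax of the CDL file `foo.cdl':
    ncgen foo.cdl
From the CDL file `foo.cdl', generate an equivalent binary netCDF file named `x.nc':
    ncgen -o x.nc foo.cdl
From the CDL file `foo.cdl', generate a C program containing the netCDF function invocations necessary to create an equivalent binary netCDF file named `x.nc':
    ncgen -c -o x.nc foo.cdl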
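When ncgen is driven from a script, the invocations above can be assembled as argument lists. The following Python sketch uses only flags documented in OPTIONS. The function name ncgen_argv and its parameters are hypothetical helpers, not part of any netCDF distribution, and the sketch only builds the command without running it.

```python
import shlex


def ncgen_argv(cdl_path, output=None, kind=None, language=None, no_fill=False):
    """Build an ncgen command line from the options documented above.

    With no options the result only checks the CDL syntax.
    """
    argv = ["ncgen"]
    if kind is not None:
        argv += ["-k", kind]          # file format / type model
    if language is not None:
        argv += ["-l", language]      # source language, e.g. "c"
    if no_fill:
        argv.append("-x")             # do not initialize with fill values
    if output is not None:
        argv += ["-o", output]        # implies -b
    argv.append(cdl_path)
    return argv


if __name__ == "__main__":
    print(shlex.join(ncgen_argv("foo.cdl")))
    print(shlex.join(ncgen_argv("foo.cdl", output="x.nc")))
    print(shlex.join(ncgen_argv("foo.cdl", output="x.nc", language="c")))
```

The returned list can be handed to `subprocess.run(argv, check=True)` on a system where ncgen is installed.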
USAGE¶
CDL Syntax Overview¶
Below is an example of CDL syntax, describing a netCDF file with several named dimensions (lat, lon, and time), variables (Z, t, p, rh, lat, lon, time), variable attributes (units, long_name, valid_range, _FillValue), and some data. CDL keywords are in boldface. (This example is intended to illustrate the syntax; a real CDL file would have a more complete set of attributes so that the data would be more completely self-describing.)
    netcdf foo { // an example netCDF specification in CDL
    types:
        ubyte enum enum_t {Clear = 0, Cumulonimbus = 1, Stratus = 2};
        opaque(11) opaque_t;
        int(*) vlen_t;
    dimensions:
        lat = 10, lon = 5, time = unlimited ;
    variables:
        long lat(lat), lon(lon), time(time);
        float Z(time,lat,lon), t(time,lat,lon);
        double p(time,lat,lon);
        long rh(time,lat,lon);
        string country(time,lat,lon);
        ubyte tag;
        // variable attributes
        lat:long_name = "latitude";
        lat:units = "degrees_north";
        lon:long_name = "longitude";
        lon:units = "degrees_east";
        time:units = "seconds since 1992-1-1 00:00:00";
        // typed variable attributes
        string Z:units = "geopotential meters";
        float Z:valid_range = 0., 5000.;
        double p:_FillValue = -9999.;
        long rh:_FillValue = -1;
        vlen_t :globalatt = {17, 18, 19};
    data:
        lat = 0, 10, 20, 30, 40, 50, 60, 70, 80, 90;
        lon = -140, -118, -96, -84, -52;
    group: g {
    types:
        compound cmpd_t { vlen_t f1; enum_t f2;};
    } // group g
    group: h {
    variables:
        /g/cmpd_t compoundvar;
    data:
        compoundvar = { {3,4,5}, Stratus } ;
    } // group h
    }
    x : a = ...
In this situation, x could be either a type for a global attribute, or the variable name for an attribute. Since there could be both a type named x and a variable named x, there is an ambiguity. The rule is that in this situation, x will be interpreted as a type if possible, and otherwise as a variable.
If not specified, the data type of an attribute in CDL is derived from the type of the value(s) assigned to it. The length of an attribute is the number of data values assigned to it, or the number of characters in the character string assigned to it. Multiple values are assigned to non-character attributes by separating the values with commas. All values assigned to an attribute must be of the same type.
The names for CDL dimensions, variables, attributes, types, and groups may contain any non-control utf-8 character except the forward slash character (`/'). However, certain characters must be escaped if they are used in a name, where the escape character is the backward slash `\'. In particular, if the leading character of the name is a digit (0-9), then it must be preceded by the escape character. In addition, the characters ` !"#$%&()*,:;<=>?[]^`´{}|~\' must be escaped if they occur anywhere in a name.
Note also that the words `variable', `dimension', `data', `group', and `types' are legal CDL names, but be careful that there is a space between them and any following colon character. This is mostly an issue with attribute declarations. For example, consider this.
    netcdf ... {
    variables:
        int dimensions;
        dimensions: attribute=0 ; // this will cause an error
        dimensions : attribute=0 ; // this is ok.
    }
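The name-escaping rules above can be applied mechanically when CDL is generated by a script. The following Python sketch is one reading of those rules: it escapes a leading digit and every character in the listed set, and rejects the forward slash and control characters. The function name escape_cdl_name is hypothetical, and the character set is copied from the list above rather than taken from the ncgen sources.

```python
# Characters that the text above says must be escaped anywhere in a name.
SPECIAL = set(' !"#$%&()*,:;<=>?[]^`´{}|~\\')


def escape_cdl_name(name):
    """Return name with backslash escapes added as described above."""
    if "/" in name:
        raise ValueError("CDL names may not contain '/'")
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name):
        raise ValueError("CDL names may not contain control characters")
    out = []
    for i, ch in enumerate(name):
        # A leading digit and any listed special character get a backslash.
        if ch in SPECIAL or (i == 0 and ch in "0123456789"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for raw in ("lat", "2m_temp", "wind speed", "a:b"):
        print(raw, "->", escape_cdl_name(raw))
```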
Primitive Data Types¶
    char     characters
    byte     8-bit data
    short    16-bit signed integers
    int      32-bit signed integers
    long     (synonymous with int)
    int64    64-bit signed integers
    float    IEEE single precision floating point (32 bits)
    real     (synonymous with float)
    double   IEEE double precision floating point (64 bits)
    ubyte    unsigned 8-bit data
    ushort   16-bit unsigned integers
    uint     32-bit unsigned integers
    uint64   64-bit unsigned integers
    string   arbitrary length strings
CDL Constants¶
Constants assigned to attributes or variables may be of any of the basic netCDF types. The syntax for constants is similar to C syntax, except that type suffixes must be appended to shorts and floats to distinguish them from longs and doubles.
A byte constant is represented by a single character or multiple character escape sequence enclosed in single quotes. For example,
    'a'      // ASCII `a'
    '\0'     // a zero byte
    '\n'     // ASCII newline character
    '\33'    // ASCII escape character (33 octal)
    '\x2b'   // ASCII plus (2b hex)
    '\377'   // 377 octal = 255 decimal, non-ASCII
Character string constants are enclosed in double quotes:
    "a"              // ASCII `a'
    "Two\nlines\n"   // a 10-character string with two embedded newlines
    "a bell:\007"    // a string containing an ASCII bell
Short constants carry an `s' suffix:
    -2s      // a short -2
    0123s    // octal
    0x7ffs   // hexadecimal
Int constants take no suffix or an `L' suffix:
    -2
    1234567890L
    0123     // octal
    0x7ff    // hexadecimal
Int64 constants carry an `ll' or `LL' suffix:
    -2ll     // an int64 -2
    0123LL   // octal
    0x7ffLL  // hexadecimal
Float constants carry an `f' suffix:
    -2.0f
    3.14159265358979f   // will be truncated to less precision
    1.f
Double constants take no suffix or a `d' suffix:
    -2.0
    3.141592653589793
    1.0e-20
    1.d
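The suffix conventions visible in the numeric examples above can be summarized as a small classifier. The following Python sketch is a reading of those examples only: it covers the signed integer and floating point forms shown, ignores byte, string, and unsigned constants, and the name classify_constant is hypothetical. A hexadecimal prefix is checked first because `f' and `d' are also hexadecimal digits.

```python
import re

# sign, digits (hex | octal | decimal), optional integer suffix
_INT = re.compile(r"^([+-]?)(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)(s|S|ll|LL|l|L)?$")
# mantissa, optional exponent, optional f/d suffix
_REAL = re.compile(r"^[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?([fFdD]?)$")


def classify_constant(text):
    """Return (CDL type name, Python value) for a numeric CDL constant."""
    m = _INT.match(text)
    if m:
        sign, digits, suffix = m.group(1), m.group(2), (m.group(3) or "")
        if digits[:2].lower() == "0x":
            value = int(digits, 16)
        elif len(digits) > 1 and digits[0] == "0":
            value = int(digits, 8)      # leading zero means octal
        else:
            value = int(digits, 10)
        if sign == "-":
            value = -value
        kind = {"s": "short", "ll": "int64"}.get(suffix.lower(), "int")
        return kind, value
    m = _REAL.match(text)
    # Require a decimal point, an exponent, or a suffix to count as floating point.
    if m and ("." in text or m.group(2) or m.group(3)):
        suffix = m.group(3).lower()
        body = text[:-1] if suffix else text
        return ("float" if suffix == "f" else "double"), float(body)
    raise ValueError(f"not a recognized numeric constant: {text!r}")


if __name__ == "__main__":
    for c in ("-2s", "0x7ffs", "0123", "1234567890L", "-2ll", "1.f", "1.d"):
        print(c, "->", classify_constant(c))
```

The returned value is always a Python int or float, so the loss of precision noted for float constants is not modeled here.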
Compound Constant Expressions¶
In order to assign values to variables (or attributes) whose type is a user-defined type, the constant notation has been extended to include sequences of constants enclosed in curly brackets (e.g. "{"..."}"). Such a constant is called a compound constant, and compound constants can be nested.
Given a type "T(*) vlen_t", where T is some other arbitrary base type, constants for this should be specified as follows.
    vlen_t var[2] = {t11,t12,...t1N}, {t21,t22,...t2m};
The values tij are assumed to be constants of type T.
Given a type "compound cmpd_t {T1 f1; T2 f2...Tn fn}", where the Ti are other arbitrary base types, constants for this should be specified as follows.
    cmpd_t var[2] = {t11,t12,...t1n}, {t21,t22,...t2n};
The values tij are assumed to be constants of type Tj. If fields are missing, then they will be set using any specified or default fill value for the field's base type. The general set of rules for using braces is defined in the Specifying Datalists section below.
Scoping Rules¶
With the addition of groups, the name space for defined objects is no longer flat. References (names) of any type, dimension, or variable may be prefixed with the absolute path specifying a specific declaration. Thus one might say
    variables: /g1/g2/t1 v1;
The type being referenced (t1) is the one within group g2, which in turn is nested in group g1. The similarity of this notation to Unix file paths is deliberate, and one can consider groups as a form of directory structure.
When a name is not prefixed, scope rules are applied to locate the specified declaration. Currently, there are three rules: one for dimensions, one for types and enumeration constants, and one for all others.
1. When an unprefixed name of a dimension is used (as in a variable declaration), ncgen first looks in the immediately enclosing group for the dimension. If it is not found there, then it looks in the group enclosing this group. This continues up the group hierarchy until the dimension is found, or there are no more groups to search.
2. When an unprefixed name of a type or an enumeration constant is used, ncgen searches the group tree using a pre-order depth-first search. This essentially means that it will find the matching declaration that precedes the reference textually in the cdl file and that is "highest" in the group hierarchy.
3. For all other names, only the immediately enclosing group is searched.
One final note. Forward references are not allowed. This means that specifying, for example, /g1/g2/t1 will fail if this reference occurs before g1 and/or g2 are defined.
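The three lookup rules for unprefixed names can be modeled on a small group tree. The following Python sketch is an illustration of the rules as worded in this section, not ncgen's implementation. The Group class and the find_* functions are hypothetical, and the sketch assumes every declaration has already been read, so the ban on forward references is not modeled.

```python
class Group:
    """A group holding its own declarations and its child groups."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.dims = {}     # dimension name -> size
        self.types = {}    # type or enum constant name -> definition
        self.others = {}   # any other names, e.g. variables
        if parent is not None:
            parent.children.append(self)


def find_dimension(group, name):
    """Dimensions: search the enclosing group, then each ancestor in turn."""
    while group is not None:
        if name in group.dims:
            return group
        group = group.parent
    return None


def find_type(root, name):
    """Types and enum constants: pre-order depth-first search of the tree."""
    stack = [root]
    while stack:
        group = stack.pop()
        if name in group.types:
            return group
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(group.children))
    return None


def find_other(group, name):
    """All other names: only the immediately enclosing group is searched."""
    return group if name in group.others else None


if __name__ == "__main__":
    root = Group("/")
    g1 = Group("g1", root)
    g2 = Group("g2", g1)
    root.dims["time"] = 0
    g2.types["t1"] = "int(*)"
    print(find_dimension(g2, "time").name)  # dimension found in an ancestor
    print(find_type(root, "t1").name)       # type found by tree search
```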
Special Attributes¶
Special, virtual, attributes can be specified to provide performance-related information about the file format and about variable properties. The file must be a netCDF-4 file for these to take effect. These special virtual attributes are not actually part of the file; they are merely a convenient way to set miscellaneous properties of the data in CDL.
The special attributes currently supported are as follows: `_Format', `_Fletcher32', `_ChunkSizes', `_Endianness', `_DeflateLevel', `_Shuffle', and `_Storage'.
`_Format' is a global attribute specifying the netCDF format variant. Its value must be a single string matching one of `classic', `64-bit offset', `netCDF-4', or `netCDF-4 classic model'.
The rest of the special attributes are all variable attributes. Essentially all of them map to some corresponding `nc_def_var_XXX' function as defined in the netCDF-4 API. For the attributes that are essentially boolean (_Fletcher32, _Shuffle, and _NOFILL), the value true can be specified by using the strings `true' or `1', or by using the integer 1. The value false expects either `false', `0', or the integer 0. The actions associated with these attributes are as follows.
1. `_Fletcher32' sets the `fletcher32' property for a variable.
2. `_Endianness' is either `little' or `big', depending on how the variable is stored when first written.
3. `_DeflateLevel' is an integer between 0 and 9 inclusive if compression has been specified for the variable.
4. `_Shuffle' specifies if the shuffle filter should be used.
5. `_Storage' is `contiguous' or `chunked'.
6. `_ChunkSizes' is a list of chunk sizes for each dimension of the variable.
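The value constraints listed above can be checked before a CDL file is written. The following Python sketch validates a special attribute value against the rules in this section and normalizes the boolean forms. The function names are hypothetical, the checks reflect only what is stated here, and nothing in the sketch calls the netCDF library.

```python
FORMATS = ("classic", "64-bit offset", "netCDF-4", "netCDF-4 classic model")


def parse_special_bool(value):
    """Accept 'true', '1', 1 as true and 'false', '0', 0 as false."""
    if value in ("true", "1", 1):
        return True
    if value in ("false", "0", 0):
        return False
    raise ValueError(f"not a boolean special-attribute value: {value!r}")


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_special_attribute(name, value):
    """Return the normalized value, or raise ValueError if it is not allowed."""
    if name == "_Format":
        ok = value in FORMATS
    elif name in ("_Fletcher32", "_Shuffle"):
        return parse_special_bool(value)
    elif name == "_Endianness":
        ok = value in ("little", "big")
    elif name == "_DeflateLevel":
        ok = _is_int(value) and 0 <= value <= 9
    elif name == "_Storage":
        ok = value in ("contiguous", "chunked")
    elif name == "_ChunkSizes":
        value = list(value)
        ok = bool(value) and all(_is_int(n) for n in value)
    else:
        raise ValueError(f"unknown special attribute: {name}")
    if not ok:
        raise ValueError(f"invalid value for {name}: {value!r}")
    return value


if __name__ == "__main__":
    print(check_special_attribute("_Shuffle", "true"))
    print(check_special_attribute("_DeflateLevel", 9))
    print(check_special_attribute("_ChunkSizes", (1, 10, 5)))
```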
Specifying Datalists¶
Specifying datalists for variables in the `data:` section can be somewhat complicated. There are some rules that must be followed to ensure that datalists are parsed correctly by ncgen.
1. The top level is automatically assumed to be a list of items, so it should not be inside {...}.
2. Instances of UNLIMITED dimensions (other than the first dimension) must be surrounded by {...} in order to specify the size.
3. Instances of vlens must be surrounded by {...} in order to specify the size.
4. Compound instances must be embedded in {...}.
5. Non-scalar fields of compound instances must be embedded in {...}.
6. Datalists associated with attributes are implicitly a vector (i.e., a list) of values of the type of the attribute and the above rules must apply with that in mind.
7. No other use of braces is allowed.
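The brace rules above can be illustrated by rendering nested values as datalist text. The following Python sketch covers rules 1, 3, and 4 only and relies on conventions that are assumptions of the sketch rather than anything defined by ncgen: the top-level Python list is the datalist itself, a nested list is one vlen instance, a tuple is one compound instance, and a str is emitted as a bare token such as an enumeration constant.

```python
def format_item(value):
    """Render one datalist item, adding braces only where the rules require."""
    if isinstance(value, (list, tuple)):
        # A vlen instance (list) or a compound instance (tuple) is braced.
        return "{" + ", ".join(format_item(v) for v in value) + "}"
    return str(value)


def format_datalist(items):
    """Render the top-level list, which is never wrapped in braces itself."""
    return ", ".join(format_item(v) for v in items)


if __name__ == "__main__":
    # One compound instance holding a vlen field and an enum constant.
    print("compoundvar =", format_datalist([([3, 4, 5], "Stratus")]), ";")
    # A vlen-typed attribute with a single instance.
    print("vlen_t :globalatt =", format_datalist([[17, 18, 19]]), ";")
    # A plain list of scalars needs no braces at all.
    print("lat =", format_datalist([0, 10, 20]), ";")
```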
Specifying Character Datalists¶
Specifying datalists for variables of type char also has some complications. Consider, for example
    dimensions: u=UNLIMITED; d1=1; d2=2; d3=3; d4=4; d5=5; u2=UNLIMITED;
    variables: char var(d5,d4);
    datalist: var="1", "two", "three";
1. Use the size of the rightmost dimension (d4=4) and modify the constant list so that every string is less than or equal to this dimension size. Longer strings are decomposed. For our example, we get this.
    datalist: var= "1", "two", "thre", "e";
2. Pad any short strings to the length of the rightmost dimension. This produces the following.
    datalist: var= "1\0\0\0", "two\0", "thre", "e\0\0\0";
3. Move to the next-to-rightmost dimension (d5 in this case) and add fill values as needed, producing this.
    datalist: var= "1\0\0\0", "two\0", "thre", "e\0\0\0", "\0\0\0\0";
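The three steps above can be traced with a short function. The following Python sketch reproduces the worked example for a two-dimensional char variable with 5 rows of 4 characters (the d5 and d4 sizes used in the steps). The name layout_char_datalist and its parameters are hypothetical, and the sketch does not handle the case where the strings produce more rows than the variable has.

```python
def layout_char_datalist(strings, row_len, n_rows):
    """Split, pad, and fill a list of strings for a char variable.

    row_len is the size of the rightmost dimension and n_rows the size of
    the next dimension to its left.
    """
    # Step 1: decompose strings longer than the rightmost dimension.
    pieces = []
    for s in strings:
        for start in range(0, max(len(s), 1), row_len):
            pieces.append(s[start:start + row_len])
    # Step 2: pad short strings with NUL up to the rightmost dimension size.
    rows = [p.ljust(row_len, "\0") for p in pieces]
    # Step 3: add all-fill rows until the next dimension is full.
    while len(rows) < n_rows:
        rows.append("\0" * row_len)
    return rows


if __name__ == "__main__":
    for row in layout_char_datalist(["1", "two", "three"], row_len=4, n_rows=5):
        print(repr(row))
```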
1. Suppose we have only an unlimited dimension, as in this case.
    variables: char var(u);
    datalist: var="1", "two", "three";
In this case, we treat it as if it were defined like this.
    variables: char var(u,d1);
    datalist: var="1","t","w","o","t","h","r","e","e";
This means that u will have a length of nine.
2. In netcdf-4, dimensions other than the first can be unlimited. Of course, by the rules above, the interior unlimited instances must be delimited by {...}. For example.
    variables: char var(u,u2);
    datalist: var={"1", "two"}, {"three"};
Each instance is then split into single characters as above.
    datalist: var={"1","t","w","o"}, {"t","h","r","e","e"};
Finally, the shorter instance is padded to the length of the longer one.
    datalist: var={"1","t","w","o","\0"}, {"t","h","r","e","e"};
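The two unlimited-dimension cases above can be reproduced in a few lines. The following Python sketch splits strings into single characters and pads interior unlimited instances to a common length, matching the datalists shown. The function names are hypothetical, and padding with NUL follows the example rather than any statement about ncgen internals.

```python
def split_chars(strings):
    """Flatten a list of strings into a list of single characters."""
    return [ch for s in strings for ch in s]


def pad_instances(instances):
    """Split each instance into characters and pad to the longest instance."""
    rows = [split_chars(inst) for inst in instances]
    width = max((len(row) for row in rows), default=0)
    return [row + ["\0"] * (width - len(row)) for row in rows]


if __name__ == "__main__":
    # Case 1: char var(u) with three strings gives nine characters.
    print(len(split_chars(["1", "two", "three"])))
    # Case 2: char var(u,u2) with two brace-delimited instances.
    for row in pad_instances([["1", "two"], ["three"]]):
        print(row)
```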
BUGS¶
The programs generated by ncgen when using the -c flag use initialization statements to store data in variables, and will fail to produce compilable programs if you try to use them for large datasets, since the resulting statements may exceed the line length or number of continuation statements permitted by the compiler.
The CDL syntax makes it easy to assign what looks like an array of variable-length strings to a netCDF variable, but the strings may simply be concatenated into a single array of characters. Specific use of the string type specifier may solve the problem.
$Date: 2010/04/29 16:38:55 $ | Printed: 0-0-0 |