DACS_TRANSFORM(8) | DACS Web Services Manual | DACS_TRANSFORM(8) |
NAME
dacs_transform - rule-based document transformation
SYNOPSIS
dacs_transform
[ dacsoptions[1]]
DESCRIPTION
This web service is part of the DACS suite. dacs_transform can perform a variety of transformations on an original document to produce a new document. Transformations such as redaction, insertion, and replacement are available. What makes the program interesting is that any transformation can depend on a rule that is evaluated at run-time, allowing a new document to be tailored for a specific user or context.
The program looks for embedded markup (meta-information) called directives in the input document. A directive specifies a conditional (rule-based) or unconditional operation that is evaluated at that point in the document to determine the output text that is to be interpolated into the program's output. Text outside of a directive is copied verbatim to the program's output.
One application of this software is to produce different versions of documentation from the same input. For example, consider a requirement to produce technical documentation for a series of printers where the printers are substantially the same (their documentation shares a lot of the same text and graphics) but each model has unique features or capabilities. Instead of producing a single manual that describes all models, which makes the manual larger and more complicated than necessary, this software provides a convenient way to create model-specific documentation from the same input. This means that the documentation common to all printers is shared by all of the manuals, yet the manual for each printer is easily customized for the particular printer.
•dacs_transform is similar in
concept or purpose to the Apache modules mod_include[2] and
mod_rewrite[3]. It is not an Apache module, however, and can therefore
be used with any web server, and provides rich, rule-based selection of
regions for processing. It can be used in conjunction with
mod_ext_filter[4]. It is also conceptually similar to languages like
PHP[5] where ordinary content and special processing directives can be
combined within a document.
•The dacstransform(1)[6] command
provides similar functionality from the command line. The
transform()[7] function is also available.
•dacs_transform and
dacstransform can be particularly useful tools for generating both
static and dynamic web pages from template files.
•For security reasons, access to
dacs_transform is disabled by default. Some configuration capabilities
and features expected in a production version have not been implemented. If
there are multiple identities, only one identity ( REMOTE_USER) is
available during rule processing.
Regions
A directive delimits a region within the source document. A directive that is enabled processes its region in some way, otherwise the directive and its region are disabled and produce no output. Whether a directive is enabled or disabled depends on the DACS rule that is named in the directive. Zero-length regions (i.e., regions with no content) are allowed.
The output that is produced by dacs_transform depends on input that is copied verbatim, the selection of regions in the original document, the document's rules, region evaluation, and the context in which the rules are evaluated. Rules can not only inspect the requesting user's identity (based on the environment variable REMOTE_USER), including roles (obtained from DACS_ROLES), but can also employ C-like expressions and a variety of built-in functions (see dacs.exprs(5)[8]). Directives are not emitted as part of the transformed document.
Every conditional region is given a name. With the exception of reserved names, region names have no particular significance to dacs_transform; they are simply attributes to which rules may refer. Each document will usually have one or more rules associated with it. For example, an author might assign a name representing a security level to each region in a document: public, secret, top-secret, and so on. For each of these security levels, a specified set of rules would be examined by dacs_transform to decide which identities are granted (or denied) access to the corresponding region. In this way, different users may be given different versions of the same source document; some users might not be able to view secret and top-secret content, while others might be able to view all content.
A document that combines text written in different languages might name regions English, French, Russian, and so on; the English regions might be enabled by dacs_transform based on the user's DACS role or the user's stored preference for that language.
Another document might contain region names corresponding to time zones: PST, MST, EST, and so on. A rule might then require the time zone associated with the location corresponding to the user's apparent IP address to match that of the region being tested. Or an audio stream or speech synthesis content might be automatically enabled if a user has a role that indicates she is visually impaired.
At present, no facility is available to assist with working with meta-information. It must either be added manually or generated by an application that understands how to insert meta-information in its output.
In an HTML document served by dacs_transform, a region consists of a
single directive or is delimited by a pair of directives. All other document
content is ignored with respect to transformation. By default, directives are
contained within HTML-style comments and the start of a directive is indicated
by a line having the initial nine characters "<!--DACS" and
ending with the normal HTML end-of-comment syntax, "-->". Such a
line is unlikely to occur in a document by accident, but the syntax is
configurable[9].
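The default recognition rule above can be illustrated with a minimal Python sketch. It is not the DACS implementation. The text speaks of nine initial characters, so the sketch assumes the default prefix includes a trailing space; the function name directive_body is hypothetical.

```python
# Default delimiters as described in the text; both are configurable in DACS.
# Assumption: the nine-character prefix is "<!--DACS" plus one space.
PREFIX = "<!--DACS "
SUFFIX = "-->"

def directive_body(line):
    """Return the text between prefix and suffix, or None if the line
    is not a directive under the default syntax."""
    line = line.rstrip("\n")
    # Leading and trailing whitespace is significant, so only the
    # newline is removed before testing the delimiters.
    if (line.startswith(PREFIX) and line.endswith(SUFFIX)
            and len(line) >= len(PREFIX) + len(SUFFIX)):
        return line[len(PREFIX):len(line) - len(SUFFIX)].strip()
    return None

print(directive_body('<!--DACS begin="English" -->'))
print(directive_body('<!-- an ordinary comment -->'))
```

An indented directive line is deliberately rejected here, because whitespace at the start of an input line is not ignored.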
Note
•Whitespace is not ignored at the
beginning or end of an input line.
•Because the syntax for comments defined
for HTML is also acceptable in SGML and XML (and any similar markup language
that is based on SGML), dacs_transform can also work with those
documents.
•A directive cannot be "commented
out" except by modifying the line on which it occurs so that the
directive will not be recognized as such. That is, the context in which a
directive occurs with respect to the original document is not considered by
dacs_transform.
A directive contains one or more attribute="value"
pairs. Exactly one attribute name must be a directive name that indicates the
operation to be performed. An attribute name must be a syntactically valid
variable name[10]. The value must be enclosed in matching double
quotes, single quotes, or backticks (decimal ASCII code 96). Backtick quotes
are treated differently in that the enclosed string is evaluated as an
expression[11]. Variables from the Env and Conf
namespaces[12] are instantiated. The current directive's attributes are
accessible in the Attr namespace; these attribute values are
unevaluated and quoted.
This example input contains two regions:
    <!--DACS begin="English" -->
    Hello!
    <!--DACS end="English" -->
    <!--DACS begin="French" -->
    Bonjour!
    <!--DACS end="French" -->
Note
Directive names, which are described below, are reserved and have special
meaning to dacs_transform. Unrecognized attributes are ignored but can
be referenced as arguments by rules. A given attribute cannot appear more than
once within a directive. All attribute names are case-insensitive.
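The attribute rules described in this section (three quoting styles, case-insensitive names, no repeated attribute) can be sketched in Python as follows. This is an illustration under stated assumptions, not the DACS parser: the pattern used for a valid variable name is a guess, escaped quotes inside values are not handled, and parse_attributes is a hypothetical name.

```python
import re

# Assumption: a valid variable name is a letter or underscore followed by
# letters, digits, or underscores. The DACS definition may differ.
ATTR = re.compile(
    r"""\s*([A-Za-z_][A-Za-z0-9_]*)=(?:"([^"]*)"|'([^']*)'|`([^`]*)`)""")

def parse_attributes(body):
    """Map each lowercased attribute name to (value, is_expression)."""
    attrs = {}
    body = body.rstrip()
    pos = 0
    while pos < len(body):
        m = ATTR.match(body, pos)
        if m is None:
            raise ValueError("malformed attribute near: " + body[pos:])
        name = m.group(1).lower()          # names are case-insensitive
        if name in attrs:
            raise ValueError("attribute given more than once: " + name)
        double, single, backtick = m.group(2), m.group(3), m.group(4)
        if backtick is not None:
            # Backtick-quoted values are expressions to be evaluated later.
            attrs[name] = (backtick, True)
        else:
            attrs[name] = (double if double is not None else single, False)
        pos = m.end()
    return attrs

print(parse_attributes('begin="secret" Level=\'3\' when=`1 + 2`'))
```

The sketch only records that a backtick value is an expression; evaluating it would require the DACS expression language.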
•For all directives, the region name
"*" is reserved and indicates that the region should be enabled
without evaluating any rule. If an author wants to always insert some text or
an identification string, for instance, this feature eliminates the need to
create a rule that does nothing other than return True.
•For all directives, a region name
prefixed by the negation operator[13] inverts the selection test.
•For all directives, regardless of the
region name, an attribute named "cond" may be provided. Its value is
an expression that must evaluate to True for the region to be
processed. If a rule also applies to the directive, both the rule and the
expression must grant access.
A region name prefixed by the negation operator ("!") indicates that
the region should be enabled if the rule returns False and should be
disabled if it returns True. Note that the negation character is not
part of the region name. This syntax eliminates the need to write separate
"if-true" and "if-false" versions of the same rule,
although it is an inefficient substitute for an if/else construct.
For example, if a document only has public and secret regions, instead of
defining one rule for public regions and another for secret regions, an author
might simply define a single rule to identify secret regions and use negation:
    <!--DACS begin="!secret" -->
    This is public stuff, not secret stuff.
    <!--DACS end="!secret" -->
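The selection logic for the reserved name "*", the negation prefix, and the cond attribute can be summarized in a short Python sketch. It is a model of the behaviour described above, not DACS code: evaluate_rule is a hypothetical stand-in for DACS rule evaluation, and cond stands for the already-evaluated result of the cond expression. The combination "!*" is not described in the text, so the sketch simply passes it to rule evaluation.

```python
def region_enabled(region, evaluate_rule, cond=None):
    """Decide whether a region is enabled.

    region        -- value of the directive attribute, e.g. "secret" or "!secret"
    evaluate_rule -- callable(name) -> bool, stand-in for DACS rule evaluation
    cond          -- optional boolean result of the cond expression
    """
    if region == "*":
        granted = True                     # reserved name: no rule is evaluated
    elif region.startswith("!"):
        # The negation character is not part of the region name.
        granted = not evaluate_rule(region[1:])
    else:
        granted = evaluate_rule(region)
    # When cond is present, both the rule and the expression must grant access.
    return bool(granted and (cond is None or cond))

rules = {"secret": False, "public": True}
print(region_enabled("!secret", lambda name: rules[name]))
```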
By default, content that is included during processing of an insert, insertv, or
expand directive is recursively processed for directives. Recursion may be
disabled on a case-by-case basis by specifying the recurse attribute with a
value of "no" (case insensitively). It may also be explicitly
enabled by specifying the attribute value "yes" (case
insensitively).
A maximum stack depth is imposed, primarily to guard against infinite recursion.
This limit is currently set to 100 at compile time.
Variables set by outer levels can be referenced by inner levels. If variables at
different levels have the same name, however, only the innermost value is
accessible.
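A depth guard of the kind described above might look like the following Python sketch. The document model (a dictionary of item lists, with a tuple standing in for an insert directive) is purely illustrative, and the exact level at which DACS reports the error is an assumption.

```python
MAX_DEPTH = 100  # limit stated in the text; fixed at compile time in DACS

class RecursionLimit(Exception):
    """Raised when nested inclusion exceeds MAX_DEPTH."""

def render(name, docs, depth=0):
    """Return the output lines for docs[name], following inserts recursively.

    docs maps a document name to a list whose items are either a string
    (copied verbatim) or a tuple ("insert", other_name).
    """
    if depth >= MAX_DEPTH:
        raise RecursionLimit("maximum depth reached while processing " + name)
    out = []
    for item in docs[name]:
        if isinstance(item, tuple):
            out.extend(render(item[1], docs, depth + 1))
        else:
            out.append(item)
    return out

docs = {"a": ["A1", ("insert", "b"), "A2"], "b": ["B"]}
print(render("a", docs))
```

A document that inserts itself would otherwise recurse without end; with the guard it fails with RecursionLimit instead.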
Directive Index:
1. begin: start a region
2. debug: emit variables for debugging
3. end: end a region
4. expand: insert content and interpolate variables
5. filter: transform content
6. filterv: transform content, verbatim
7. id: insert an identification string
8. insert: insert content
9. insertv: insert content, verbatim
10. set: set or reset variables
begin
The begin directive starts a region with the
specified name:
    <!--DACS begin="secret" -->
If the region above named secret is enabled, its content is included in the
program's output. Directives that appear in the region, including other begin
directives, are processed. Variable references are not expanded; use the
expand[14] directive to interpolate variable references. Every begin
directive must have a matching end[15] directive.
The region name (the value of the begin attribute) is accessible in rules using
${Args::region}.
debug
The debug directive can be helpful for
understanding or debugging processing. It emits variables that exist at the
point where an enabled debug directive is processed. This directive has no
matching end directive; it is essentially a region with no content.
By default, all variables in the Attrs, Conf, and DACS
namespaces are emitted. The attribute name show can be set to Attrs, Conf, or
DACS to restrict output to the particular namespace. The value all is
equivalent to the default. Alternatively, the attribute name Attrs can be set
to "yes" (or "no") to select (or deselect) the
Attrs namespace. The same applies for the Conf and DACS attributes.
These attribute names are case sensitive but their values are not.
The emitted output is preceded by the directive prefix string in effect and
followed by the directive suffix string in effect. It is assumed that no text
is emitted in the debugging output that might accidentally be recognized as
the suffix string.
end
The end of a region started by a
begin[16], filter[17], filterv[18] or expand[14]
directive is indicated using the end directive:
    <!--DACS begin="secret" -->
    This is some secret text.
    <!--DACS end="secret" -->
When properly balanced, regions can be nested.
expand
This directive expands variable references in
inserted content. Also, text containing variable references may appear in the
original content, delimited by an end directive.
    % cat inputfile
    <!--DACS expand="*" -->
    Nice dog, ${DACS::dog1}.
    <!--DACS end="*" -->
    Meow!
    <!--DACS expand="*" cond="${DACS::dog3} eq:i 'bandito'" -->
    Good boy, ${DACS::dog2}.
    <!--DACS end="*" -->
    % dacstransform -Ddog1=Auggie -Ddog2=Harley -Ddog3=Bandito < inputfile
    Nice dog, Auggie.
    Meow!
    Good boy, Harley.
An expr, uri, or filename attribute may be used to specify the source of the
input as with the insert[19] directive. Variable references in the text
from these sources are expanded. If one of these attributes is not specified,
the directive must be terminated by an end directive.
Directives in the expanded text are recursively processed, modulo the recurse
attribute (see Recursion[20]).
filter
Note
This feature is only partially implemented. In the current implementation, a
filter directive must use the expr operation and may not include another
filter region.
Original document content within a delimited region, if any, is replaced by new
material using the filter directive. This directive must have a corresponding
end directive. A newline is appended to the result; if this behaviour is
undesirable, use filterv[18].
Either the expr or uri operations must be specified.
If an expr attribute is present, the original document content, including its
final newline character, is passed to the given expression[11] as the
value of the variable ${DACS::stdin}. The value of the expression (a
string) replaces the region's original content. An evaluation error causes the
program to terminate.
If the uri attribute is present, it specifies a web service to which the region
should be passed as input and the output of which should replace the original
document content. By default, the URI is invoked using the POST method but if
a method attribute is present it specifies the HTTP method to use. The http
and https schemes are supported. The input is passed as an argument named
CONTENT.
If no operation attribute is provided, the original content is evaluated as an
expression[11] and its value becomes the new content of the region.
To interpolate the current date you might use:
    <!--DACS filter="public" expr="strftime('%v')" -->
    <!--DACS end="public" -->
Or equivalently:
    <!--DACS filter="public" -->
    strftime("%v")
    <!--DACS end="public" -->
Tip
To simply substitute variable values into the original content, use the
expand[14] directive or the expand()[21] function. For example:
    <!--DACS filter="*" expr="expand(${DACS::stdin})" section="Hello, world" -->
    <h1>${Attr::section}</h1>
    <!--DACS end="*" -->
Or, alternatively:
    <!--DACS filter="*" section="Hello, world" -->
    expand("<h1>${Attr::section}</h1>")
    <!--DACS end="*" -->
In either variation, the three lines in the document are replaced by a single
line:
    <h1>Hello, world</h1>
Tip
A filter directive with an expr or uri attribute and an empty region can be
written more simply using an insert directive.
filterv
This directive is identical to
filter[17] except that no newline is appended.
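The difference between filter and filterv can be modeled in a few lines of Python. This is only a sketch of the described behaviour: a Python callable stands in for a DACS expression that reads ${DACS::stdin}, and apply_filter is a hypothetical name.

```python
def apply_filter(region_text, expr, verbatim=False):
    """Replace a region's content with the value of expr.

    region_text -- original region content, including its final newline
    expr        -- callable standing in for a DACS expression; it receives
                   the region content (the role played by ${DACS::stdin})
    verbatim    -- False models filter (newline appended),
                   True models filterv (no newline appended)
    """
    result = str(expr(region_text))
    return result if verbatim else result + "\n"

shout = lambda text: text.strip().upper()
print(repr(apply_filter("hello\n", shout)))
print(repr(apply_filter("hello\n", shout, verbatim=True)))
```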
id
If the region is enabled, the directive is
replaced by the current time and date, and a DACS version
identification string. This directive has no matching end directive; it is
essentially a region with no content.
For example, the directive:
    <!--DACS id="*" -->
will be replaced by a line similar to this:
    <--DACS Generated 6-Sep-2007@11:37:43 by dacstransform DACS 1.4.20 Release date 13-Aug-07 09:39:03 (revid 2034) on example.com -->
Note that the replacement line appears as a comment in the emitted document and
will pass through dacs_transform unaltered.
insert
Document text is read from a specified source
using the insert directive. Exactly one filename, uri, or expr attribute must
be provided. Variable references in the inserted content are not
expanded - if that is required, use the expand[14] directive.
Directives in the inserted content are not processed if recursion[20]
has been disabled.
    <!--DACS Interpolate browser-specific JavaScript -->
    <!--DACS insert="mozilla" filename="/dacs/dacs_transform/js/js1.html" -->
    <!--DACS insert="ie" filename="/dacs/dacs_transform/js/js2.html" -->
    <!--DACS insert="netscape" filename="/dacs/dacs_transform/js/js3.html" -->
    <!--DACS insert="*" uri="http://example.com/data" -->
    <!--DACS insert="*" expr="strftime('%v')" -->
    <!--DACS insert="*" expr="exec('/bin/date')" -->
    <!--DACS insert="*" expr="${Attr::arg1} + ${Attr::arg2}" arg1="10" arg2="20" -->
The region name (the value of the insert attribute) is accessible to rules as
the value of ${Args::region}. This directive has no matching end
directive; it is essentially a region with no content. Like the begin and end
directives, the insert directive names the region so that an appropriate rule
can be applied.
The filename attribute gives the pathname of a file to be inserted into the
document at the current location. When invoked as dacs_transform, the
pathname must be absolute (i.e., it must begin with a slash character).
The uri attribute gives a URI that is invoked to obtain the document to be
inserted. The GET method is used by default, but if a method attribute is
present it specifies the HTTP method to use. The returned document is inserted
at the current location. If the URI's scheme is file, it is equivalent to the
filename attribute. The http and https schemes are also recognized.
A third choice is the expr attribute. The expression[11] is evaluated and
its result is inserted into the document; an evaluation error causes the
program to terminate.
Because it is needed in many cases and harmless in many others, a newline
character is emitted after the inserted text. If this behaviour is
undesirable, use insertv[22].
These directives demonstrate how to expand a variable reference found in a
template file into the current document:
    <!--DACS insert="*" expr="expand(get('/tmp/h1_template'))" s="Section 1"-->
    <!--DACS insert="*" expr="expand(get('/tmp/h1_template'))" s="Section 2"-->
If the file /tmp/h1_template looks like:
    <h1>${Attr::s}</h1>
then these two lines would be inserted in the program's output:
    <h1>Section 1</h1>
    <h1>Section 2</h1>
Equivalent but slightly simpler directives can be used for this example:
    <!--DACS expand="*" filename="/tmp/h1_template" s="Section 1"-->
    <!--DACS expand="*" filename="/tmp/h1_template" s="Section 2"-->
insertv
This directive is identical to
insert[19] except that no newline is appended.
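The h1_template example given under insert relies on variable expansion. A rough Python model of that substitution is shown below. It handles only the simple ${Namespace::name} form seen in this manual; the real DACS variable syntax is richer, the name pattern is an assumption, and the behaviour for an undefined variable (a KeyError here) is not taken from DACS.

```python
import re

# Assumption: namespace and variable names are simple identifiers.
VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)::([A-Za-z_][A-Za-z0-9_]*)\}")

def expand(text, namespaces):
    """Substitute ${Namespace::name} references using nested dictionaries."""
    def lookup(match):
        namespace, name = match.group(1), match.group(2)
        return str(namespaces[namespace][name])
    return VAR.sub(lookup, text)

template = "<h1>${Attr::s}</h1>"
for title in ("Section 1", "Section 2"):
    # Each directive supplies its own s attribute, visible as ${Attr::s}.
    print(expand(template, {"Attr": {"s": title}}))
```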
set
This directive is used to set variables that
will exist for the remainder of the current scope (the document being
processed) and any documents that are recursively processed. Setting a
variable creates it or overrides its existing value. Any occurrences of the
variable in an outer scope are unaffected, as are variables that are created
from attributes associated with a directive.
This directive is useful to set a variable to a string that will be used more
than once during document processing, or to establish configuration-related
variables in a single location. This directive can prevent having to
repeatedly pass the same string argument as an attribute in multiple
directives.
If gfile looks like:
    <!--DACS set="*" foo="1" bar="2" -->
    <!--DACS debug="*" show="attrs" -->
    <!--DACS set="*" foo="3" bar="4" -->
    <!--DACS debug="*" show="attrs" -->
    <!--DACS insert="*" filename="gfile2" -->
    <!--DACS debug="*" show="attrs" -->
and gfile2 looks like:
    <!--DACS debug="*" show="attrs" -->
    <!--DACS set="*" foo="5" bar="6" bazz="7" -->
    <!--DACS debug="*" show="attrs" -->
then processing gfile will emit this output:
    <!--DACS Debug: Attrs: debug="*" show="attrs" set="*" foo="1" bar="2" -->
    <!--DACS Debug: Attrs: debug="*" show="attrs" set="*" foo="3" bar="4" -->
    <!--DACS Debug: Attrs: debug="*" show="attrs" insert="*" filename="gfile2" set="*" foo="3" bar="4" -->
    <!--DACS Debug: Attrs: debug="*" show="attrs" set="*" foo="5" bar="6" bazz="7" insert="*" filename="gfile2" -->
    <!--DACS Debug: Attrs: debug="*" show="attrs" set="*" foo="3" bar="4" -->
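The scoping seen in the gfile example (an inner document inherits outer values, and its own settings disappear when it ends) resembles a chain of dictionaries. The following Python sketch models that with collections.ChainMap; it is an analogy for the described behaviour, not a statement about how DACS stores variables.

```python
from collections import ChainMap

# Scope of gfile.
outer = ChainMap()
outer.update(foo="1", bar="2")             # set="*" foo="1" bar="2"
outer.update(foo="3", bar="4")             # set="*" foo="3" bar="4"

# Processing gfile2 opens a nested scope that can see the outer values.
inner = outer.new_child()
seen_on_entry = dict(inner)                # foo=3, bar=4 are inherited

# Setting variables in gfile2 only affects the inner scope.
inner.update(foo="5", bar="6", bazz="7")
seen_in_gfile2 = dict(inner)

# Back in gfile, the outer values are unchanged and bazz does not exist.
seen_after_return = dict(outer)

print(seen_on_entry, seen_in_gfile2, seen_after_return)
```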
Configuration variables can be set to change some of the program's defaults:
•transform_docs: This is the full
pathname of the root directory in which original documents are kept. By
default, the program will use a subdirectory
${Conf::DACS_HOME}dacs_transform/docs. (default:
/usr/local/dacs/dacs_transform/docs)
Security
Change the default with care. In the absence of an appropriate access control
rule, setting the pathname to "/" or the empty string would provide
access to any file on the server that can be read by this web service.
•transform_acls: This is the VFS
specification for the rules. By default, the program will use
${Conf::DACS_HOME}dacs_transform/acls. (default:
[transform-acls]dacs-fs:/usr/local/dacs/dacs_transform/acls)
•transform_annotation: This is
the annotation to interpolate in redacted text instead of the default.
•transform_prefix: Instead of the
default prefix used to introduce a directive, the value of this variable is
used. It must appear at the beginning of a line.
•transform_suffix: Instead of the
default suffix used to end a directive, the value of this variable is
used.
•transform_rprefix: A line whose
beginning matches the specified regular expression introduces a
directive.
•transform_rsuffix: The end of a
directive is found by matching the specified regular expression.
IEEE Std 1003.2 ("POSIX.2") "extended" regular expressions
are supported (regex(3)[23]).
Web Service Arguments
In addition to the standard CGI arguments[24], dacs_transform understands the following CGI arguments:
DOC
The value of this argument specifies the
document to be transformed as a file on the server running
dacs_transform. It is an absolute path that is remapped within a
segregated area of the web server.
DOCURI
This argument specifies the document to be
transformed as a URI. The URI must use the http or https scheme. The HTTP GET
method is used to fetch the document and the URI may include query arguments.
The URI must be properly URL-encoded.
ANNOTATE
If this argument is present and its value is
"yes", each deleted region of text is denoted in the retrieved
document; contiguous deleted regions are denoted only once. The default
annotation assumes a document Content-Type of text/html.
CONTENT_TYPE
If this argument is present, this will be the
MIME Content-Type of the returned document. If this argument is omitted, the
program will try to guess the type based on the suffix of the name of the
original document, or default to text/html.
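Because DOCURI must be properly URL-encoded, a client will typically build the request with a library rather than by hand. The Python sketch below shows one way to do so using the argument names listed above. The service location in the example is hypothetical, as is the helper name transform_url; the actual path depends on how DACS is installed.

```python
from urllib.parse import urlencode

def transform_url(service, doc_uri, annotate=False, content_type=None):
    """Build a GET URL for dacs_transform with an encoded DOCURI argument."""
    params = {"DOCURI": doc_uri}
    if annotate:
        params["ANNOTATE"] = "yes"
    if content_type:
        params["CONTENT_TYPE"] = content_type
    # urlencode percent-encodes reserved characters such as ":", "/", "?".
    return service + "?" + urlencode(params)

# Hypothetical service location, used for illustration only.
print(transform_url("https://example.com/cgi-bin/dacs/dacs_transform",
                    "http://example.com/doc.html?lang=en", annotate=True))
```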
DIAGNOSTICS¶
The program exits 0 if everything was fine, 1 if an error occurred.
SEE ALSO
dacstransform(1)[6], dacs.exprs(5)[11], dacs.acls(5)[25]
BUGS
There is a lot of room for improvement and new features. Directive syntax slants towards the arcane; that may get worse before it gets better.
AUTHOR
Distributed Systems Software (www.dss.ca[26])
COPYING
Copyright 2003-2012 Distributed Systems Software. See the LICENSE[27] file that accompanies the distribution for licensing information.
NOTES
1. dacsoptions
2. mod_include
3. mod_rewrite
4. mod_ext_filter
5. PHP
6. dacstransform(1)
7. transform()
8. dacs.exprs(5)
9. configurable
10. syntactically valid variable name
11. expression
12. namespaces
13. negation operator
14. expand
15. end
16. begin
17. filter
18. filterv
19. insert
20. Recursion
21. expand()
22. insertv
23. regex(3)
24. standard CGI arguments
25. dacs.acls(5)
26. www.dss.ca
27. LICENSE
10/22/2012 | DACS 1.4.27b |