NAME
flexml - generate validating XML processor and applications from DTD
SYNOPSIS
flexml [-ASHDvdnLXV] [-sskel] [-ppubid] [-iinit_header] [-uuri] [-rroottags] [-aactions] name[.dtd]
DESCRIPTION
Flexml reads name.dtd, which must be a DTD (Document Type Definition)
describing the format of XML (Extensible Markup Language) documents, and
produces a "validating" XML processor with an interface to support XML
applications. Proper applications can optionally be generated from special
"action files", either for linking with the processor or for textual
combination with it.
The generated processor will only validate documents that conform strictly to
the DTD, without extending it. More precisely, in practice we restrict
XML rule [28] to
[28r] doctypedecl ::= '<!DOCTYPE' S Name S ExternalID S? '>'
where the "ExternalID" denotes the DTD used. (One might say, in fact,
that flexml implements "non-extensible" markup. :)
The generated processor is a flex(1) scanner, by default named name.l, with
a corresponding C header file name.h for separate compilation of generated
applications. Optionally, flexml takes an actions file with per-element
actions and produces a C file with element functions for an XML application,
with entry points called from the XML processor. (It can also fold the XML
application into the XML processor to make stand-alone XML applications, but
this prevents sharing of the processor between applications.)
In "OPTIONS" we list the possible options, in "ACTION FILE
FORMAT" we explain how to write applications, in "COMPILATION"
we explain how to compile produced processors and applications into
executables, and in "BUGS" we list the current limitations of the
system before giving standard references.
OPTIONS
Flexml takes the following options.
- --stand-alone, -A
- Generate a stand-alone scanner application. If
combined with -aactions then the application will be named
as actions with the extension replaced by .l, otherwise it
will be in name.l. Conflicts with -S, -H, and
-D.
- --actions actions, -a
actions
- Uses the actions file to produce an XML application
in the file with the same name as actions after replacing the
extension with .c. If combined with -A then instead the
stand-alone application will include the action functions.
- --dummy [app_name], -D
[app_name]
- Generate a dummy application with just empty functions to
be called by the XML processor. If app_name is not specified on the
command line, it defaults to name-dummy.c. If combined with
-a actions then the application will insert the specified
actions and be named as actions with the extension replaced by
.c. Conflicts with -A; implied by -a unless either of
-SHD is specified.
- --debug, -d
- Turns on debug mode in the flex scanner and also prints out
the details of the DTD analysis performed by flexml.
- --header [header_name],
-H [header_name]
- Generate the header file. If header_name is not
specified on the command line, it defaults to name.h. Conflicts
with -A; on by default if none of -SHD is specified.
- --lineno, -L
- Makes the XML processor (as produced by flex(1))
count the lines in the input and keep it available to XML application
actions in the integer "yylineno". (This is off by default as
the performance overhead is significant.)
- --quiet, -q
- Prevents the XML processor (as produced by flex(1))
from reporting the errors it runs into on stderr. Instead, users will have
to poll for error messages with the parse_err_msg() function. By
default, error messages are written on stderr.
- --dry-run, -n
- "Dry-run": do not produce any of the output
files.
- --pubid pubid, -p pubid
- Sets the document type to be "PUBLIC" with the
identifier pubid instead of "SYSTEM", the default.
- --init_header init_header, -i
init_header
- Puts a line containing '#include "init_header"' in the "%{...%}" section
at the top of the generated .l file. This may be useful for making various
flex "#define"s, for example "YY_INPUT" or "YY_DECL".
- --sysid=sysid
- Overrides the "SYSTEM" id of the accepted DTD.
Sometimes useful when your DTD is placed in a subdirectory.
- --root-tags roottags, -r
roottags
- Restricts the XML processor to validate only documents with
one of the root elements listed in the comma-separated
roottags.
- --scanner [scanner_name],
-S [scanner_name]
- Generate the scanner. If scanner_name is not given
on the command line, it defaults to name.l. Conflicts with
-A; on by default if none of -SHD is specified.
- --skel skel, -s skel
- Use the skeleton scanner skel instead of the
default.
- --act-bin flexml-act, -T
flexml-act
- This is an internal option mainly used to test versions of
flexml not installed yet.
- --stack-increment stack_increment, -b
stack_increment
- Sets the FLEXML_BUFFERSTACKSIZE to stack_increment (100000
by default). This controls how much the data stack grows in each
realloc().
- --tag-prefix STRING, -O
STRING
- Use STRING to differentiate multiple versions of flexml in
the same C code, just like the -P flex argument.
- --uri uri, -u uri
- Sets the URI of the DTD, used in the "DOCTYPE"
header, to the specified uri (the default is the DTD name).
- --verbose, -v
- Be verbose: echo each DTD declaration (after parameter
expansion).
- --version, -V
- Print the version of flexml and exit.
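As an illustration of the -i option, the following is a minimal sketch of a header that redefines "YY_INPUT" so that the scanner reads from a string in memory instead of "yyin". The names example_doc, example_pos and example_fill are hypothetical, the document text is only a placeholder, and the sketch assumes the usual flex convention that YY_INPUT(buf, result, max_size) stores the number of bytes read in result, with 0 meaning end of input.

```c
/* Hypothetical init header, e.g. passed as: flexml -i init_header.h name.dtd */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the whole document is already in memory (placeholder text). */
static const char *example_doc = "<greeting>hi</greeting>";
static size_t example_pos = 0;

/* Copy up to max_size bytes of the document into buf and advance the
   read position. Returns the number of bytes copied; 0 means end of input. */
static size_t example_fill(char *buf, size_t max_size)
{
    assert(buf != NULL);
    size_t left = strlen(example_doc) - example_pos;
    size_t n = left < max_size ? left : max_size;
    memcpy(buf, example_doc + example_pos, n);
    example_pos += n;
    return n;
}

/* flex uses YY_INPUT to refill its buffer; a result of 0 ends the scan. */
#define YY_INPUT(buf, result, max_size) \
    ((result) = (int)example_fill((buf), (size_t)(max_size)))
```

Since the header is included before the flex defaults are defined, this definition should take precedence over the built-in one; see flex(1) for the exact rules.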
ACTION FILE FORMAT
Action files, passed to the -a option, are XML documents conforming to
the DTD flexml-act.dtd, which is the following:
<!ELEMENT actions ((top|start|end)*,main?)>
<!ENTITY % C-code "(#PCDATA)">
<!ELEMENT top %C-code;>
<!ELEMENT start %C-code;> <!ATTLIST start tag NMTOKEN #REQUIRED>
<!ELEMENT end %C-code;> <!ATTLIST end tag NMTOKEN #REQUIRED>
<!ELEMENT main %C-code;>
The elements should be used as follows:
- "top"
- Use for top-level C code such as global declarations,
utility functions, etc.
- "start"
- Attaches the code as an action to the element with the name
of the required "tag" attribute. The "%C-code;" component should be C code
suitable for inclusion in a C block (i.e., within "{"..."}", so it
may contain local variables); furthermore the following extensions are
available:
"{attribute}": Can be used to access the value of the attribute as set with
attribute=value in the start tag. In C, "{attribute}" will be interpreted
depending on the declaration of the attribute. If the attribute is declared
as an enumerated type like
<!ATTLIST attrib (alt1 | alt2 |...) ...>
then the C attribute value is of an enumerated type with the elements
written "{attribute=alt1}", "{attribute=alt2}", etc.; furthermore
an unset attribute has the "value" "{!attribute}". If the attribute is not
an enumeration then "{attribute}" is a null-terminated C string
(of type "char*") and "{!attribute}" is "NULL".
- "end"
- Similarly attaches the code as an action to the end tag
with the name of the required "tag" attribute; here too the "%C-code;"
component should be C code suitable for inclusion in a C block. In case the
element has "Mixed" contents, i.e., was declared to permit
"#PCDATA", then the following variable is available:
"{#PCDATA}": Contains the text ("#PCDATA") of the
element as a null-terminated C string (of type "char*"). In case
the Mixed contents element actually mixed text and child elements then
"pcdata" contains the plain concatenation of the text fragments
as one string.
- "main"
- Finally, an optional "main" element
can contain the C "main" function of the XML application.
Normally the "main" function should include (at least) one call
of the XML processor:
"yylex()": Invokes the XML processor produced by flex(1) on
the XML document found on the standard input (actually the
"yyin" file handle: see the manual for flex(1) for
information on how to change this as well as the name "yylex").
If no "main" action is provided then the following is used:
int main() { exit(yylex()); }
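A custom "main" often reads from a named file rather than from standard input. The following is a minimal sketch of a helper for that; parse_file is a hypothetical name, and the scanner and its input handle are passed in as parameters so that the fragment compiles on its own. The demo_in and demo_scan stand-ins only count bytes and take the place of "yyin" and "yylex()" here.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for yyin and yylex() so the sketch is self-contained:
   the "scanner" only counts the bytes it reads and reports success. */
static FILE *demo_in;
static long demo_bytes;

static int demo_scan(void)
{
    int c;
    while ((c = fgetc(demo_in)) != EOF)
        demo_bytes++;
    return 0;
}

/* Point *input at the named file, run the scanner once, and close the file.
   Returns the scanner's result, or -1 if the file cannot be opened. */
static int parse_file(const char *path, FILE **input, int (*scan)(void))
{
    assert(path != NULL && input != NULL && scan != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    *input = f;
    int status = scan();
    fclose(f);
    *input = stdin;  /* do not leave a dangling handle behind */
    return status;
}
```

In a real "main" element the call would presumably be parse_file(argv[1], &yyin, yylex), assuming the default flex names and the default "int yylex(void)" declaration.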
It is advisable to use XML "<![CDATA[ ... ]]>" sections for the C code to
make sure that all characters are properly passed to the output file.
Finally, note that flexml handles empty elements <tag/> as equivalent to
<tag></tag>.
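The two kinds of attribute value described under "start" call for different tests in action code. The following plain C sketch only models that distinction and is not what flexml generates: align_value, its enumerators and describe are hypothetical stand-ins for an enumerated attribute declared as (left | right) and a string attribute, with ALIGN_UNSET playing the role of "{!align}" and NULL the role of "{!title}".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of an enumerated attribute "align":
   ALIGN_UNSET ~ {!align}, ALIGN_LEFT ~ {align=left}, ALIGN_RIGHT ~ {align=right}. */
typedef enum { ALIGN_UNSET, ALIGN_LEFT, ALIGN_RIGHT } align_value;

/* Format "<align>/<title>" into out. An enumerated attribute is compared
   against its enumerators; a string attribute is a char* that is NULL
   when the attribute was not set in the start tag. */
static int describe(char *out, size_t size, align_value align, const char *title)
{
    assert(out != NULL && size > 0);
    const char *side = "default";
    if (align == ALIGN_LEFT)
        side = "left";
    else if (align == ALIGN_RIGHT)
        side = "right";
    return snprintf(out, size, "%s/%s", side, title != NULL ? title : "(none)");
}
```

In an action file the same checks would be written with the placeholders themselves, for example comparing "{align}" with "{align=left}" and testing "{title}" against "{!title}".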
COMPILATION
The following make(1) file fragment shows how one can compile
flexml-generated programs:
# Programs.
FLEXML = flexml -v
# Assumed here: flex is on the PATH.
FLEX = flex

# Generate linkable XML processor with header for application.
%.l %.h: %.dtd
	$(FLEXML) $<

# Generate C source from flex scanner.
%.c: %.l
	$(FLEX) -Bs -o"$@" "$<"

# Generate XML application C source to link with processor.
# Note: The dependency must be of the form "appl.c: appl.act proc.dtd".
%.c: %.act
	$(FLEXML) -D -a $^

# Direct generation of stand-alone XML processor+application.
# Note: The dependency must be of the form "appl.l: appl.act proc.dtd".
%.l: %.act
	$(FLEXML) -A -a $^
BUGS
The present version of flexml is to be considered in "early beta" state,
so bugs should be expected (and the author would like to hear about them).
Here are some known restrictions that we hope to overcome in the future:
- The character set is merely ASCII (actually flex(1) handles 8 bit
characters but only the ASCII character set is common with the XML default
UTF-8 encoding).
- "ID" type attributes are not validated for uniqueness; "IDREF" and
"IDREFS" attributes are not validated for existence.
- The "ENTITY" and "ENTITIES" attribute types are not supported.
- "NOTATION" declarations are not supported.
- The various "xml:"-attributes are treated like any other attributes; in
particular "xml:space" should be supported.
- The DTD parser is presently a Perl hack so it may parse some DTDs badly;
in particular the expansion of parameter entities may not conform fully to
the XML specification.
- A child should be able to "return" a value for the parent (also called a
synthesised attribute). Similarly an element in Mixed contents should be
able to inject text into the "pcdata" of the parent.
FILES
- /usr/share/flexml/skel
- The skeleton scanner with the generic parts of XML
scanning.
- /usr/share/doc/flexml/flexml/
- License, further documentation, and examples.
SEE ALSO
flex(1), Extensible Markup Language (XML) 1.0 (W3C Recommendation
REC-xml-19980210).
AUTHOR
Flexml was written by Kristoffer Rose, <krisrose@debian.org>.
COPYRIGHT
The program is Copyright (c) 1999 Kristoffer Rose (all rights reserved) and
distributed under the GNU General Public License (GPL, also known as
"copyleft"), which clarifies that the author provides absolutely no
warranty for flexml and ensures that flexml is and will remain
available for all uses, even commercial.
ACKNOWLEDGEMENT
I am grateful to NTSys (France) for supporting the development of flexml.
Finally, I extend my sincere thanks to Jef Poskanzer, Vern Paxson, and the
rest of the flex maintainers and GNU developers for a great tool.