NAME¶
Embperl::Syntax - base class for defining custom syntaxes
SYNOPSIS¶
DESCRIPTION¶
Embperl::Syntax provides a base class from which all custom syntaxes should be
derived. Currently Embperl comes with the following derived syntaxes:
- EmbperlHTML
- all the HTML tags that Embperl recognizes by default
- EmbperlBlocks
- all the [ ] blocks that Embperl supports
- Embperl
- The default syntax; it is derived from "EmbperlHTML"
and "EmbperlBlocks"
- ASP
- <% %> and <%= %>, see perldoc
Embperl::Syntax::ASP
- SSI
- Server Side Includes, see perldoc Embperl::Syntax::SSI
- Perl
- File contains pure Perl (similar to Apache::Registry), but
can be used inside EmbperlObject
- Text
- File contains only text; no action is taken on the
text
- Mail
- Defines the <mail:send> tag for sending mail. This
is an example of a taglib, which could serve as a base for writing your own
taglib to extend the number of available tags
- POD
- Parses POD out of any file and creates an XML tree similar
to pod2xml, which can be formatted by XSLT afterwards.
You can choose which syntax is used inside your page, either by the
"EMBPERL_SYNTAX" configuration directive, the "syntax"
parameter to "Execute" or the "[$ syntax $]" metacommand.
You can also specify multiple syntaxes, e.g.
PerlSetEnv EMBPERL_SYNTAX "Embperl SSI"
Execute ({inputfile => '*', syntax => 'Embperl ASP'}) ;
The syntax metacommand allows you to switch the syntax or to add or subtract
syntaxes e.g.
[$ syntax + Mail $]
will add the Mail taglib so the <mail:send> tag is available after this
line.
[$ syntax - Mail $]
now the <mail:send> tag is unknown again
[$ syntax SSI $]
now you can only use SSI commands inside your page.
Defining your own Syntax¶
If you want to define your own syntax, you have to derive a new class from one
of the existing ones and extend it with new tags/functionality. The best thing
is to take a look at the syntax classes that come with Embperl (inside the
directory Embperl/Syntax/).
For example, if you want to add new HTML tags, derive from
Embperl::Syntax::HTML; if you want to add new metacommands, derive from
Embperl::Syntax::EmbperlBlocks.
Some of the classes define additional methods to easily add new tags. See the
respective POD file for the methods available in a certain class.
Embperl::Syntax defines the basic methods to create a syntax:
Methods¶
Embperl::Syntax -> new / $self -> new¶
Create a new syntax class. This method should only be called inside a
constructor of a derived class.
$self -> AddToRoot ($elements)¶
This adds a new element to the root of the parser tree. $elements must be a
hashref. See
Embperl::Syntax::ASP for an example.
$self -> AddInitCode ($compiletimecode, $initcode, $termcode,
$procinfo)¶
This lets you add some Perl code that is always executed at
the beginning of a document ($initcode), at the end of the document
($termcode) or at compile time ($compiletimecode). The three strings must be
valid Perl code. See
Embperl::Syntax::SSI for an example. $procinfo is
a hashref that can consist of additional processor infos (see below) for the
document.
$self -> GetRoot¶
Returns the root of the parser tree.
Embperl::Syntax::GetSyntax ($name, $oldname)¶
Returns a syntax object which is built from the syntaxes named in $name. If
$oldname is given, $name can start with a "+" or "-" to
add or subtract a syntax. This is normally only needed by Embperl itself or to
implement a syntax switch statement (see
Embperl::Syntax::SSI for an
example).
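The add, subtract and replace semantics can be pictured with a small standalone model that works on lists of syntax names only. This is a sketch under stated assumptions, not Embperl's implementation: `resolve_syntax_names` is a hypothetical helper, and it returns a plain string of names rather than a syntax object.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Embperl: resolves a syntax spec such as
# "+ Mail", "- Mail" or "SSI" against the currently active syntax names.
sub resolve_syntax_names {
    my ($name, $oldname) = @_;
    my @old = split ' ', ($oldname // '');

    if ($name =~ /^\s*([+-])\s*(.+?)\s*$/) {
        my ($op, $rest) = ($1, $2);
        my @delta = split ' ', $rest;
        my %have  = map { $_ => 1 } @old;
        my %delta = map { $_ => 1 } @delta;
        my @new = $op eq '+'
            ? (@old, grep { !$have{$_} } @delta)   # add names not yet active
            : grep { !$delta{$_} } @old;           # drop the named syntaxes
        return join ' ', @new;
    }

    # No leading + or -: the new list replaces the old one outright.
    return join ' ', split ' ', $name;
}

print resolve_syntax_names('+ Mail', 'Embperl'), "\n";
print resolve_syntax_names('- Mail', 'Embperl Mail'), "\n";
print resolve_syntax_names('SSI',    'Embperl Mail'), "\n";
```

The three calls mirror the `[$ syntax + Mail $]`, `[$ syntax - Mail $]` and `[$ syntax SSI $]` examples shown earlier.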
$self -> CloneHash ($old, $replace)¶
Clones a hash which is given as a hashref in $old, optionally replaces the tags
given in the hashref $replace, and returns a hashref to the new hash.
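The clone-and-replace behaviour described here can be illustrated without loading Embperl. The function below is a hypothetical stand-in, not the module's code. It makes a shallow copy and only replaces names that already exist; both choices are assumptions, since this page does not say how the real method treats nested hashes or unknown names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the behaviour described above; NOT Embperl's code.
sub clone_hash_sketch {
    my ($old, $replace) = @_;
    my %new = %$old;                         # shallow copy of the original
    if ($replace) {
        for my $name (keys %$replace) {
            # Assumption: only names already present are replaced.
            $new{$name} = $replace->{$name} if exists $new{$name};
        }
    }
    return \%new;
}

my $tags    = { b => { text => 'b' }, i => { text => 'i' } };
my $patched = clone_hash_sketch($tags, { i => { text => 'em' } });

# The original hash is left untouched; only the clone carries the new tag.
print "original i: $tags->{i}{text}, cloned i: $patched->{i}{text}\n";
```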
Syntax Structure and Parameter¶
Internally the syntax object builds a data structure which serves as the base
for the parser. This structure consists of a list of tokens and options; option
names start with a dash:
Tokens¶
- '-lsearch' => 1
- Do a linear search instead of a binary search. This is
necessary if the tokens can't be clearly separated.
- '-defnodetype' => ntypText,
- Defines the default type for text nodes. Without any
specification the type is CDATA, which means no escaping takes place. With
"ntypText" all special characters are escaped.
- '-rootnode'
- Name for a root node to insert always.
- <name> => \%tokendescription
- All items which do not start with a dash are treated as
names. The name of a token is only descriptive and is used in error
messages. The item must contain a hashref which describes the token.
Tokendescription¶
Each token can have the following members:
- 'text' => '<'
- Start text
- 'end' => '>'
- End text
- 'matchall'
- When set to 1, a new token starts at the next character; when set
to -1, a new token starts at the next character, but only if it is the first
token inside another one.
- 'nodename'
- Text that should be output when the node is stringified.
Defaults to text. If the first character is a ':' you can specify the
surrounding delimiters for this tag with
:<start>:<end>:<text>:<endtag>. Example:
':{:}:NAME' . If the nodename starts with a '!' a unique internal id is
generated, so two or more nodenames with the same text can have different
meanings in different contexts.
- 'contains' =>
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789'
- Token consists of the following characters. Either
"text" and "end" or "contains" can
be specified.
NOTE: If an item that specifies only contains but no text should be
compiled, you must specify a nodename.
- 'unescape' => 1
- If "optRawInput" isn't set, unescape the data
inside the node
- 'nodetype' => ntypEndTag
- Type of the node
- 'cdatatype' => ntypAttrValue
- Type of nodes for data (which is not matched by 'inside'
definitions) inside this node. Set to zero to not generate any nodes for
text inside of this node, other than those that are matched by an 'inside'
definition.
- 'endtag'
- Name of the tag that marks the end of a block. This is used
by the parser to track correct nesting.
- 'follow' => \%tokenlist
- Hashref that specifies one or more tokens that must follow
this token.
- 'inside' => \%tokenlist
- Hashref that specifies one or more tokens that can occur
inside a node that is started with this token.
- exitinside
- When the token is found, the parser stops searching at the
current level and continues with the tokens that are defined in the hash
from which the current one was "called" via inside.
- donteat
- Set to 1 to not eat the start text, so it will be matched
again by any tokens set under "inside". Set to 2 to not eat the end
text. Set to 3 for both.
- 'procinfo' =>
- Processor info. Hashref with information on how to process
this token.
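To make the shape of these structures concrete, here is a minimal sketch of a token list as it might be handed to `AddToRoot`. Only the key names come from the lists above. The token name, the `<?` and `?>` delimiters and the generated code are invented for illustration, and nesting the processor info under an `embperl` key is an assumption based on the statement that processor info can exist for multiple processors. The sketch only builds and inspects the hash and does not load Embperl.

```perl
use strict;
use warnings;

# Hypothetical token list; it is plain data and is not validated by Embperl here.
my $tokens = {
    '-lsearch' => 1,                  # option: the key starts with a dash
    'Code block' => {                 # descriptive name, used in error messages
        'text'     => '<?',           # start text (invented delimiter)
        'end'      => '?>',           # end text (invented delimiter)
        'unescape' => 1,              # unescape data unless optRawInput is set
        'procinfo' => {
            embperl => {              # assumption: keyed by processor name
                perlcode      => '%#0%',   # text of child node 0 becomes the code
                removenode    => 1,        # remove this node after compiling
                compilechilds => 0,        # do not compile the child nodes
            },
        },
    },
};

# Options and token names are told apart by the leading dash.
my @options = grep {  /^-/ } sort keys %$tokens;
my @names   = grep { !/^-/ } sort keys %$tokens;
print "options: @options\n";
print "tokens:  @names\n";
```

Inside a derived class, a structure like this would be passed as `$self -> AddToRoot ($tokens)`; Embperl::Syntax::ASP, referenced above, is the place to check the exact form.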
Processor info¶
The processor info describes how to compile this token into valid code
that can be executed later on by the processor. There can be information for
multiple processors. At the moment only the
embperl processor is
defined. Normally you need not worry about different processors, because the
syntax object knows internally that all procinfo is for the
embperl
processor.
procinfo is a parameter to many methods; it is a hashref and
can take the following items:
- perlcode => <string> or <arrayref>
- Code to generate. You can also specify an arrayref of
strings. The first string which contains matching attributes is used. The
following special strings are replaced:
- %#<N>%
- Text of childnode number <N> (starting with
zero)
- %><N>%
- Text of sibling node number <N> . 0 gives the current
node, > 0 gives the Nth next node, < 0 gives the Nth previous
node.
- %&<attr>%
- Value of attribute <attr>.
- %^<stackname>%
- Stringvalue of given stack
- %?<stackname>%
- Set if stackvalue was used
- %$n%
- Source Dom Tree, Index of current node.
- %$t%
- Source Dom Tree
- %$x%
- Index of current node
- %$l%
- Index of last node
- %$c%
- Sets the current node Index, if not already done
- %$q%
- Index of source Dom Tree
- %$p%
- Number of current checkpoint
- %%
- Gives a single %
All of the above special values (except those starting with $) allow the
following modifiers:
- %<X>*<N>%
- Attribute/Child etc. must exist.
- %<X>!<N>%
- Attribute/Child etc. must not exist.
- %<X>=<N>:<value1>|<value2>|<value3>%
- Attribute/Child etc. must have the value = <value1>
or <value2> etc.
- %<X>~<N>:<value1>|<value2>|<value3>%
- Attribute/Child etc. must contain the substring
<value1> or <value2> etc. and a non alphanum character must
follow the substring.
Writing a minus sign (-) after * ! = or ~ will cause the child/attribute not to
be included, although the condition is still evaluated. Writing a ' will cause
the value to be quoted.
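The rule that the first string with matching attributes wins can be illustrated with a toy model. This is a deliberately reduced sketch, not Embperl's compiler. `expand_code` and `pick_perlcode` are hypothetical names, and only `%&*<attr>%`, `%&!<attr>%` and `%%` are handled. The plain `%&<attr>%` form is left out because its behaviour for missing attributes is not described here.

```perl
use strict;
use warnings;

# Toy model, NOT Embperl's compiler. It understands only:
#   %&*<attr>%  attribute must exist, its value is inserted
#   %&!<attr>%  attribute must not exist, nothing is inserted
#   %%          a literal percent sign
# Returns the expanded code, or undef if a condition is not met.
sub expand_code {
    my ($template, $attrs) = @_;
    my $ok = 1;
    my $replace = sub {
        my ($mod, $name) = @_;
        return '%' unless defined $name;             # "%%"
        my $exists = exists $attrs->{$name};
        $ok = 0 if $mod eq '*' ? !$exists : $exists; # condition violated
        return $mod eq '*' && $exists ? $attrs->{$name} : '';
    };
    (my $code = $template) =~ s/%(?:&([*!])(\w+))?%/$replace->($1, $2)/ge;
    return $ok ? $code : undef;
}

# perlcode may be a string or an arrayref; the first matching string wins.
sub pick_perlcode {
    my ($perlcode, $attrs) = @_;
    for my $template (ref($perlcode) ? @$perlcode : ($perlcode)) {
        my $code = expand_code($template, $attrs);
        return $code if defined $code;
    }
    return undef;
}

my $perlcode = [
    'greet("%&*name%", %&*times%) ;',   # needs both attributes
    'greet("%&*name%", 1) ;',           # needs only "name"
    'greet("nobody", 1) ;%&!name%',     # only when "name" is absent
];
print pick_perlcode($perlcode, { name => 'World' }), "\n";
```

With only `name` present, the first template is rejected and the second one is chosen. The `greet` function in the templates is just placeholder text for generated code.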
- perlcodeend => <string>
- Code to generate at the end of the block.
- compiletimeperlcode => <string> or
<arrayref>
- Code that is executed at compile time. You can also specify
an arrayref of strings. The first string which contains matching attributes
is used. The same special strings are replaced as in
"perlcode".
$_[0] contains the Embperl request object. The method "Code" can
be used to get or set the Perl code that should be generated by this node.
If the code begins with #!- all newlines are removed from the code. This is
mainly useful to keep all code on the same line, so the line number in
error reporting matches the line in the source.
- compiletimeperlcodeend => <string>
- Code that is executed at compile time, but at the end of
the tag. The same special strings are replaced as in "perlcode".
$_[0] contains the Embperl request object. The method "Code" can
be used to get or set the Perl code that should be generated by this node.
If the code begins with #!- all newlines are removed from the code. This is
mainly useful to keep all code on the same line, so the line number in
error reporting matches the line in the source.
- perlcoderemove => 0/1
- Remove perlcode if perlcodeend condition is not met.
- removenode => <removelevel>
- Remove node after compiling. <removelevel> can be
one of the following; values can be added together:
- 1
- Remove this node only
- 2
- Remove next node if it consists only of white space and
optKeepSpaces isn't set.
- 4
- Replace next node with one space if next node consists only
of white space and optKeepSpaces isn't set.
- 8
- Set this node to ignore for output.
- 16
- Remove all child nodes
- 32
- Set all child nodes to ignore for output.
- 64
- Calculate attribute values of this node also for nodes
that are set to ignore for output (only makes sense if 8 is also
set).
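Because the values can be added, a single removenode number can request several behaviours at once. The sketch below combines the first two values. It assumes the values act as power-of-two bit flags, which the wording "values could be added" and the reference to 8 suggest. The constant names are hypothetical, since only the numbers are documented.

```perl
use strict;
use warnings;

# Hypothetical names for the first two documented values.
use constant {
    REMOVE_THIS_NODE       => 1,
    REMOVE_NEXT_WHITESPACE => 2,
};

# Remove the node itself and a following whitespace-only node.
my $removenode = REMOVE_THIS_NODE + REMOVE_NEXT_WHITESPACE;

my $procinfo = { removenode => $removenode };

# Under the bit-flag assumption, individual settings can be tested with "&".
printf "removenode=%d, removes next whitespace node: %s\n",
    $procinfo->{removenode},
    ($procinfo->{removenode} & REMOVE_NEXT_WHITESPACE) ? 'yes' : 'no';
```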
- removespaces => <removeflags>
- Remove spaces before or after tag. Values can be added together:
- 1
- Remove all white space before tag
- 2
- Remove all white space after tag
- 4
- Remove spaces and tabs before tag
- 8
- Remove spaces and tabs after tag
- 16
- Remove all spaces and tabs but one before tag
- 32
- Remove all white space after text inside of tag
- 64
- Remove spaces and tabs after text inside of tag
- mayjump => 0/1
- If set, tells the compiler that this code may jump to
another program location. (e.g. if, while, goto etc.). Could also be a
condition as described under perlcode.
- compilechilds => 0/1
- Compile child nodes. Default: 1
- stackname => <name>
- Name of stack for "push",
"stackmatch"
- stackname2 => <name>
- Name of stack for "push2"
- push => <value>
- Push value on the stack whose name is given with
"stackname". Value can include the same special values as
"perlcode"
- push2 => <value>
- Push value on the stack whose name is given with
"stackname2". Value can include the same special values as
"perlcode"
- stackmatch => <value>
- Check if the value on the stack whose name is given with
"stackname" is the same as the given value. If not, give an error
message about tag mismatch. Value can include the same special values
as "perlcode"
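The interplay of stackname, push and stackmatch can be pictured as ordinary named stacks used to verify block nesting. The following standalone model is a sketch of that idea, not Embperl's internals. The function names are hypothetical, and popping the value during the comparison is an assumption.

```perl
use strict;
use warnings;

my %stacks;   # stack name => arrayref of pushed values

# Models "push": remember a value on the named stack.
sub push_value {
    my ($stackname, $value) = @_;
    push @{ $stacks{$stackname} }, $value;
    return;
}

# Models "stackmatch": compare the top of the named stack with $value.
# Returns undef on a match, otherwise a mismatch message.
# Assumption of this sketch: the compared value is popped.
sub stack_match {
    my ($stackname, $value) = @_;
    my $top = pop @{ $stacks{$stackname} ||= [] };
    return undef if defined $top && $top eq $value;
    return "tag mismatch: open '" . ($top // 'nothing') . "', closing '$value'";
}

push_value('metacmd', 'if');                    # an opening block pushes its kind
my $balanced = stack_match('metacmd', 'if');    # matching close: no error

push_value('metacmd', 'while');
my $crossed = stack_match('metacmd', 'if');     # wrong close: error message

print defined $balanced ? $balanced : 'ok', "\n";
print $crossed, "\n";
```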
- switchcodetype => <1/2>
- 1 means put the following code into normal code, which is
executed every time the page is requested.
2 means put the following code into code which is executed directly after
compilation. This is mainly for defining subs, using modules, etc.
- addflags
- cdatatype
- forcetype
- insidemustexist
- matchall
- exitinside
- addfirstchild
- starttag
- endtag
- parsetimeperlcode
- contains