.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is >0, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{\ . if \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{\ . nr % 0 . nr F 2 . \} . 
\}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Tree::XPathEngine 3pm"
.TH Tree::XPathEngine 3pm "2022-11-20" "perl v5.36.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Tree::XPathEngine \- a re\-usable XPath engine
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module provides an XPath engine that can be re-used by other
modules/classes that implement trees.
.PP
It is designed to be compatible with Class::XPath, i.e. it passes its tests if
you replace Class::XPath by Tree::XPathEngine.
.PP
This code is a more or less direct copy of the XML::XPath module by Matt
Sergeant. I only removed the \s-1XML\s0 processing part (that parses an \s-1XML\s0
document and loads it as a tree in memory) to remove the dependency on
XML::Parser, applied a couple of patches, removed a whole bunch of \s-1XML\s0
specific things (comments, processing instructions, namespaces...), renamed a
whole lot of methods to make Pod::Coverage happy, and changed the docs.
.PP
The article eXtending \s-1XML\s0 XPath,
http://www.xmltwig.com/article/extending_xml_xpath/ should give authors who
want to use this module enough background to do so.
.PP
Otherwise, my email is below ;\-\-)
.PP
\&\fB\s-1WARNING\s0\fR: while the underlying code is rather solid, this module
most likely lacks docs.
.PP
As they say, \*(L"patches welcome\*(R"... but I am also interested in any
experience using this module: what the tricky parts were, and how the code or
the docs could be improved.
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use Tree::XPathEngine;
\&
\&    my $tree= my_tree\->new( ...);
\&    my $xp = Tree::XPathEngine\->new();
\&
\&    my @nodeset = $xp\->find(\*(Aq/root/kid/grankid[1]\*(Aq, $tree); # find all first grankids
\&
\&    package tree;
\&
\&    # needs to provide these methods
\&    sub xpath_get_name { ... 
}
\&    sub xpath_get_next_sibling { ... }
\&    sub xpath_get_previous_sibling { ... }
\&    sub xpath_get_root_node { ... }
\&    sub xpath_get_parent_node { ... }
\&    sub xpath_get_child_nodes { ... }
\&    sub xpath_is_element_node { return 1; }
\&    sub xpath_cmp { ... }
\&    sub xpath_get_attributes { ... } # only if attributes are used
\&    sub xpath_to_literal { ... } # only if you want to use findnodes_as_string or findvalue
.Ve
.SH "DETAILS"
.IX Header "DETAILS"
.SH "API"
.IX Header "API"
The \s-1API\s0 of Tree::XPathEngine itself is extremely simple to allow you to get
going almost immediately. The deeper \s-1API\s0s are more complex, but you
shouldn't have to touch most of that.
.ie n .SS "new %options"
.el .SS "new \f(CW%options\fP"
.IX Subsection "new %options"
\fIoptions\fR
.IX Subsection "options"
.IP "xpath_name_re" 4
.IX Item "xpath_name_re"
a regular expression used to match names (node names or attribute names).
By default it is qr/[A\-Za\-z_][\ew.\-]*/ in order to work under perl 5.6.n,
but you might want to use something like qr/\ep{L}[\ew.\-]*/ in 5.8.n, to
accommodate letters outside of the \s-1ASCII\s0 range.
.ie n .SS "findnodes ($path, $context)"
.el .SS "findnodes ($path, \f(CW$context\fP)"
.IX Subsection "findnodes ($path, $context)"
Returns a list of nodes found by \f(CW$path\fR, in context \f(CW$context\fR.
In scalar context returns an \f(CW\*(C`Tree::XPathEngine::NodeSet\*(C'\fR object.
.ie n .SS "findnodes_as_string ($path, $context)"
.el .SS "findnodes_as_string ($path, \f(CW$context\fP)"
.IX Subsection "findnodes_as_string ($path, $context)"
Returns the text values of the nodes.
.ie n .SS "findvalue ($path, $context)"
.el .SS "findvalue ($path, \f(CW$context\fP)"
.IX Subsection "findvalue ($path, $context)"
Returns either a \f(CW\*(C`Tree::XPathEngine::Literal\*(C'\fR, a
\f(CW\*(C`Tree::XPathEngine::Boolean\*(C'\fR or a
\f(CW\*(C`Tree::XPathEngine::Number\*(C'\fR object.
If the path returns a NodeSet,
\&\f(CW$nodeset\fR\->xpath_to_literal is called automatically for you (and thus a
\&\f(CW\*(C`Tree::XPathEngine::Literal\*(C'\fR is returned). Note that for each of the
objects stringification is overloaded, so you can just print the value found,
or manipulate it in the ways you would a normal perl value (e.g. using regular
expressions).
.ie n .SS "exists ($path, $context)"
.el .SS "exists ($path, \f(CW$context\fP)"
.IX Subsection "exists ($path, $context)"
Returns true if the given path exists.
.ie n .SS "matches($node, $path, $context)"
.el .SS "matches($node, \f(CW$path\fP, \f(CW$context\fP)"
.IX Subsection "matches($node, $path, $context)"
Returns true if the node matches the path.
.ie n .SS "find ($path, $context)"
.el .SS "find ($path, \f(CW$context\fP)"
.IX Subsection "find ($path, $context)"
The find function takes an XPath expression (a string) and a context, and
returns either a Tree::XPathEngine::NodeSet object containing the nodes it
found (or empty if no nodes matched the path), or one of
Tree::XPathEngine::Literal (a string), Tree::XPathEngine::Number, or
Tree::XPathEngine::Boolean. It should always return something \- and you can
use \->\fBisa()\fR to find out what it returned. If you need to check how many
nodes it found you should check \f(CW$nodeset\fR\->size.
See Tree::XPathEngine::NodeSet.
.SS "XPath variables"
.IX Subsection "XPath variables"
XPath lets you use variables in expressions (see the XPath spec: ).
.ie n .IP "set_var ($var_name, $val)" 4
.el .IP "set_var ($var_name, \f(CW$val\fR)" 4
.IX Item "set_var ($var_name, $val)"
sets the variable \f(CW$var_name\fR to \f(CW$val\fR
.IP "get_var ($var_name)" 4
.IX Item "get_var ($var_name)"
gets the value of the variable (there should be no need to use this method from
outside the module, but it looked silly to have \f(CW\*(C`set_var\*(C'\fR and
\f(CW\*(C`_get_var\*(C'\fR).
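.PP
As a rough illustration of the xpath_name_re option described under new, the
two patterns quoted there can be compared in plain Perl, without loading the
engine. This is only a sketch: the helper is_name, the anchoring of the
patterns and the sample names are assumptions made for this example, and the
engine may apply the pattern differently internally.

```perl
use strict;
use warnings;

# The two patterns quoted in the xpath_name_re description.
my $ascii_name   = qr/[A-Za-z_][\w.-]*/;
my $unicode_name = qr/\p{L}[\w.-]*/;

# Hypothetical helper: true if the whole string is a name under $re.
sub is_name {
    my ($re, $string) = @_;
    return $string =~ /\A$re\z/ ? 1 : 0;
}

# Hypothetical sample names; the second one starts with a Greek letter.
my @samples = (
    [ 'kid'           => 'kid' ],
    [ 'Greek initial' => "\x{3b1}lpha" ],
    [ '_private'      => '_private' ],
);

for my $sample (@samples) {
    my ($label, $name) = @$sample;
    printf "%-14s default:%d unicode:%d\n",
        $label,
        is_name($ascii_name,   $name),
        is_name($unicode_name, $name);
}
```

Note one side effect of the second pattern: a name starting with an underscore
is accepted by the default pattern but not by the \p{L} variant, since an
underscore is not a letter.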
.SH "How to use this module"
.IX Header "How to use this module"
The purpose of this module is to add XPath support to generic tree modules.
.PP
It works by letting you create a Tree::XPathEngine object, that will be called
to resolve XPath queries on a context. The context is a node (or a list of
nodes) in a tree.
.PP
The tree should share some characteristics with an \s-1XML\s0 tree: it is made of
nodes, and there are 3 kinds of nodes: document (the whole tree, the root of
the tree is a child of this node), elements (regular nodes in the tree) and
attributes.
.PP
Nodes in the tree are expected to provide methods that will be called by the
XPath engine to resolve the query. Not all of the possible methods need be
available, depending on the type of XPath queries that need to be supported:
for example if the nodes do not have a text value then there is no need for a
\&\f(CW\*(C`xpath_string_value\*(C'\fR method, and XPath queries cannot include the
\f(CW\*(C`string()\*(C'\fR function (using it will trigger a \fBruntime\fR error).
.PP
Most of the expected methods are usual methods for a tree module, so it should
not be too difficult to implement them, by aliasing existing methods to the
required ones.
.PP
Just in case, here is a fast way to alias for example your own
\f(CW\*(C`parent\*(C'\fR method to the \f(CW\*(C`xpath_get_parent_node\*(C'\fR needed by
Tree::XPathEngine:
.PP
.Vb 1
\&    *xpath_get_parent_node= *parent; # in the node package
.Ve
.PP
The XPath engine expects the whole tree and attributes to be full blown
objects, which provide a set of methods similar to nodes. If they are not, see
below for ways to \*(L"fake\*(R" it.
.SS "Methods to be provided by the nodes"
.IX Subsection "Methods to be provided by the nodes"
.IP "xpath_get_name" 4
.IX Item "xpath_get_name"
returns the name of the node.
.Sp
Not used for the document.
.IP "xpath_string_value" 4
.IX Item "xpath_string_value"
The text corresponding to the node, used by the \f(CW\*(C`string()\*(C'\fR function
(for queries like \f(CW\*(C`//foo[string()="bar"]\*(C'\fR)
.IP "xpath_get_next_sibling" 4
.IX Item "xpath_get_next_sibling"
.PD 0
.IP "xpath_get_previous_sibling" 4
.IX Item "xpath_get_previous_sibling"
.IP "xpath_get_root_node" 4
.IX Item "xpath_get_root_node"
.PD
returns the document object. See \*(L"Document object\*(R" below for more details.
.IP "xpath_get_parent_node" 4
.IX Item "xpath_get_parent_node"
The parent of the root of the tree is the document node.
.Sp
The parent of an attribute is its element.
.IP "xpath_get_child_nodes" 4
.IX Item "xpath_get_child_nodes"
returns a list of children.
.Sp
Note that the attributes are not children of an element.
.IP "xpath_is_element_node" 4
.IX Item "xpath_is_element_node"
.PD 0
.IP "xpath_is_document_node" 4
.IX Item "xpath_is_document_node"
.IP "xpath_is_attribute_node" 4
.IX Item "xpath_is_attribute_node"
.IP "xpath_is_text_node" 4
.IX Item "xpath_is_text_node"
.PD
only if the tree includes textual nodes
.IP "xpath_to_string" 4
.IX Item "xpath_to_string"
returns the node as a string
.IP "xpath_to_number" 4
.IX Item "xpath_to_number"
returns the node value as a number object
.Sp
.Vb 2
\&    sub xpath_to_number
\&      { return Tree::XPathEngine::Number\->new( $_[0]\->xpath_string_value); }
.Ve
.ie n .IP "xpath_cmp ($node_a, $node_b)" 4
.el .IP "xpath_cmp ($node_a, \f(CW$node_b\fR)" 4
.IX Item "xpath_cmp ($node_a, $node_b)"
compares 2 nodes and returns \-1, 0 or 1 depending on whether \f(CW$node_a\fR is
before, equal to or after \f(CW$node_b\fR in the tree.
.Sp
This is needed in order to return sorted results and to remove duplicates.
.Sp See \*(L"Ordering nodesets\*(R" below for a ready-to-use sorting method if your tree does not have a \f(CW\*(C`cmp\*(C'\fR method .SS "Element specific methods" .IX Subsection "Element specific methods" .IP "xpath_get_attributes" 4 .IX Item "xpath_get_attributes" returns the list of attributes, attributes should be objects that support the following methods: .SH "Tricky bits" .IX Header "Tricky bits" .SS "Document object" .IX Subsection "Document object" The original XPath works on \s-1XML,\s0 and is roughly speaking based on the \s-1DOM\s0 model of an \s-1XML\s0 document. As far as the XPath engine is concerned, it still deals with a \s-1DOM\s0 tree. .PP One of the possibly annoying consequences is that in the \s-1DOM\s0 the document itself is a node, that has a single element child, the root of the document tree. If the tree you want to use this module on doesn't follow that model, if its root element \fBis\fR the tree itself, then you will have to fake it. .PP This is how I did it in Tree::DAG_Node::XPath: .PP .Vb 7 \& # in package Tree::DAG_Node::XPath \& sub xpath_get_root_node \& { my $node= shift; \& # The parent of root is a Tree::DAG_Node::XPath::Root \& # that helps getting the tree to mimic a DOM tree \& return $node\->root\->xpath_get_parent_node; \& } \& \& sub xpath_get_parent_node \& { my $node= shift; \& \& return $node\->mother # normal case, any node but the root \& # the root parent is a Tree::DAG_Node::XPath::Root object \& # which contains the reference of the (real) root node \& || bless { root => $node }, \*(AqTree::DAG_Node::XPath::Root\*(Aq; \& } \& \& # class for the fake root for a tree \& package Tree::DAG_Node::XPath::Root; \& \& \& sub xpath_get_child_nodes { return ( $_[0]\->{root}); } \& sub address { return \-1; } # the root is before all other nodes \& sub xpath_get_attributes { return [] } \& sub xpath_is_document_node { return 1 } \& sub xpath_is_element_node { return 0 } \& sub xpath_is_attribute_node { return 0 } .Ve .SS 
"Attribute objects" .IX Subsection "Attribute objects" If the attributes in the original tree are not objects, but simple fields in a hash, you can generate objects on the fly: .PP .Vb 11 \& # in the element package \& sub xpath_get_attributes \& { my $elt= shift; \& my $atts= $elt\->attributes; # returns a reference to a hash of attributes \& my $rank=\-1; # used for sorting \& my @atts= map { bless( { name => $_, value => $atts\->{$_}, elt => $elt, rank => $rank \-\- }, \& \*(AqTree::DAG_Node::XPath::Attribute\*(Aq) \& } \& sort keys %$atts; \& return @atts; \& } \& \& # the attribute package \& package Tree::DAG_Node::XPath::Attribute; \& use Tree::XPathEngine::Number; \& \& # not used, instead get_attributes in Tree::DAG_Node::XPath directly returns an \& # object blessed in this class \& #sub new \& # { my( $class, $elt, $att)= @_; \& # return bless { name => $att, value => $elt\->att( $att), elt => $elt }, $class; \& # } \& \& sub xpath_get_value { return $_[0]\->{value}; } \& sub xpath_get_name { return $_[0]\->{name} ; } \& sub xpath_string_value { return $_[0]\->{value}; } \& sub xpath_to_number { return Tree::XPathEngine::Number\->new( $_[0]\->{value}); } \& sub xpath_is_document_node { 0 } \& sub xpath_is_element_node { 0 } \& sub xpath_is_attribute_node { 1 } \& sub to_string { return qq{$_[0]\->{name}="$_[0]\->{value}"}; } \& \& # Tree::DAG_Node uses the address field to sort nodes, which simplifies things quite a bit \& sub xpath_cmp { $_[0]\->address cmp $_[1]\->address } \& sub address \& { my $att= shift; \& my $elt= $att\->{elt}; \& return $elt\->address . \*(Aq:\*(Aq . $att\->{rank}; \& } .Ve .SS "Ordering nodesets" .IX Subsection "Ordering nodesets" XPath query results must be sorted, and duplicates removed, so the XPath engine needs to be able to sort nodes. .PP I does so by calling the \f(CW\*(C`cmp\*(C'\fR method on nodes. 
.PP One of the easiest way to write such a method, for static trees, is to have a method of the object return its position in the tree as a number. .PP If that is not possible, here is a method that should work (note that it only compares elements): .PP .Vb 1 \& # in the tree element package \& \& sub xpath_cmp($$) \& { my( $a, $b)= @_; \& if( UNIVERSAL::isa( $b, $ELEMENT)) # $ELEMENT is the tree element class \& { # 2 elts, compare them \& return $a\->elt_cmp( $b); \& } \& elsif( UNIVERSAL::isa( $b, $ATTRIBUTE)) # $ATTRIBUTE is the attribute class \& { # elt <=> att, compare the elt to the att\->{elt} \& # if the elt is the att\->{elt} (cmp return 0) then \-1, elt is before att \& return ($a\->elt_cmp( $b\->{elt}) ) || \-1 ; \& } \& elsif( UNIVERSAL::isa( $b, $TREE)) # $TREE is the tree class \& { # elt <=> document, elt is after document \& return 1; \& } \& else \& { die "unknown node type ", ref( $b); } \& } \& \& \& sub elt_cmp \& { my( $a, $b)=@_; \& \& # easy cases \& return 0 if( $a == $b); \& return 1 if( $a\->in($b)); # a starts after b \& return \-1 if( $b\->in($a)); # a starts before b \& \& # ancestors does not include the element itself \& my @a_pile= ($a, $a\->ancestors); \& my @b_pile= ($b, $b\->ancestors); \& \& # the 2 elements are not in the same twig \& return undef unless( $a_pile[\-1] == $b_pile[\-1]); \& \& # find the first non common ancestors (they are siblings) \& my $a_anc= pop @a_pile; \& my $b_anc= pop @b_pile; \& \& while( $a_anc == $b_anc) \& { $a_anc= pop @a_pile; \& $b_anc= pop @b_pile; \& } \& \& # from there move left and right and figure out the order \& my( $a_prev, $a_next, $b_prev, $b_next)= ($a_anc, $a_anc, $b_anc, $b_anc); \& while() \& { $a_prev= $a_prev\->_prev_sibling || return( \-1); \& return 1 if( $a_prev == $b_next); \& $a_next= $a_next\->_next_sibling || return( 1); \& return \-1 if( $a_next == $b_prev); \& $b_prev= $b_prev\->_prev_sibling || return( 1); \& return \-1 if( $b_prev == $a_next); \& $b_next= 
$b_next\->_next_sibling || return( \-1);
\&        return 1 if( $b_next == $a_prev);
\&      }
\&    }
\&
\&    sub in
\&      { my ($self, $ancestor)= @_;
\&        while( $self= $self\->xpath_get_parent_node) { return $self if( $self == $ancestor); }
\&        return; # $ancestor is not an ancestor of the node
\&      }
\&
\&    sub ancestors
\&      { my( $self)= @_;
\&        my @ancestors; # collected from the parent up to the top of the tree
\&        while( $self= $self\->xpath_get_parent_node) { push @ancestors, $self; }
\&        return @ancestors;
\&      }
\&
\&    # in the attribute package
\&    sub xpath_cmp($$)
\&      { my( $a, $b)= @_;
\&        if( UNIVERSAL::isa( $b, $ATTRIBUTE))
\&          { # 2 attributes, compare their elements, then their name
\&            return ($a\->{elt}\->elt_cmp( $b\->{elt}) ) || ($a\->{name} cmp $b\->{name});
\&          }
\&        elsif( UNIVERSAL::isa( $b, $ELEMENT))
\&          { # att <=> elt : compare the att\->elt and the elt
\&            # if att\->elt is the elt (cmp returns 0) then 1 (elt is before att)
\&            return ($a\->{elt}\->elt_cmp( $b) ) || 1 ;
\&          }
\&        elsif( UNIVERSAL::isa( $b, $TREE))
\&          { # att <=> document, att is after document
\&            return 1;
\&          }
\&        else
\&          { die "unknown node type ", ref( $b); }
\&      }
.Ve
.SH "XPath extension"
.IX Header "XPath extension"
The module supports the XPath recommendation to the same extent as XML::XPath
(that is, rather completely).
.PP
It includes a perl-specific extension: direct support for regular expressions.
.PP
You can use the usual (in Perl!) \f(CW\*(C`=~\*(C'\fR and \f(CW\*(C`!~\*(C'\fR operators.
Regular expressions are / delimited (no other delimiter is accepted, \e inside
regexp must be backslashed), the \f(CW\*(C`imsx\*(C'\fR modifiers can be used.
.PP
.Vb 2
\&    $xp\->findnodes( \*(Aq//@att[.=~ /^v.$/]\*(Aq); # returns the list of attributes att
\&                                              # whose value matches ^v.$
.Ve
.SH "TODO"
.IX Header "TODO"
provide inheritable node and attribute classes for typical cases, starting with
nodes where the root \s-1IS\s0 the tree, and where attributes are a simple hash
(similar to what I did in Tree::DAG_Node).
.PP
better docs (patches welcome).
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Tree::DAG_Node::XPath for an example of using this module
.PP
for background information
.PP
Class::XPath, which is probably easier to use, but at this point supports much
less of XPath than Tree::XPathEngine.
.SH "AUTHOR"
.IX Header "AUTHOR"
Michel Rodriguez, \f(CW\*(C`\*(C'\fR
.PP
This code is heavily based on the code for XML::XPath by Matt Sergeant,
copyright 2000 Axkit.com Ltd
.SH "BUGS"
.IX Header "BUGS"
Please report any bugs or feature requests to
\&\f(CW\*(C`bug\-tree\-xpathengine@rt.cpan.org\*(C'\fR, or through the web interface at .
I will be notified, and then you'll automatically be notified of progress on
your bug as I make changes.
.SH "ACKNOWLEDGEMENTS"
.IX Header "ACKNOWLEDGEMENTS"
.SH "COPYRIGHT & LICENSE"
.IX Header "COPYRIGHT & LICENSE"
XML::XPath Copyright 2000\-2004 AxKit.com Ltd.
Copyright 2006 Michel Rodriguez, All Rights Reserved.
.PP
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.