NAME¶
HTML::StripScripts - Strip scripting constructs out of HTML
SYNOPSIS¶
use HTML::StripScripts;
my $hss = HTML::StripScripts->new({ Context => 'Inline' });
$hss->input_start_document;
$hss->input_start('<i>');
$hss->input_text('hello, world!');
$hss->input_end('</i>');
$hss->input_end_document;
print $hss->filtered_document;
DESCRIPTION¶
This module strips scripting constructs out of HTML, leaving as much
non-scripting markup in place as possible. This allows web applications to
display HTML originating from an untrusted source without introducing XSS
(cross site scripting) vulnerabilities.
You will probably use HTML::StripScripts::Parser rather than using this module
directly.
The process is based on whitelists of tags, attributes and attribute values.
This approach is the most secure against disguised scripting constructs hidden
in malicious HTML documents.
As well as removing scripting constructs, this module ensures that there is a
matching end for each start tag, and that the tags are properly nested.
Previously, in order to customise the output, you needed to subclass
"HTML::StripScripts" and override methods. Now, most customisation
can be done through the "Rules" option provided to
"new()". (See examples/declaration/ and examples/tags/ for cases
where subclassing is necessary.)
The HTML document must be parsed into start tags, end tags and text before it
can be filtered by this module. Use either HTML::StripScripts::Parser or
HTML::StripScripts::Regex instead if you want to input an unparsed HTML
document.
See examples/direct/ for an example of how to feed tokens directly to
HTML::StripScripts.
CONSTRUCTORS¶
- new ( CONFIG )
- Creates a new "HTML::StripScripts" filter object,
bound to a particular filtering policy. If present, the CONFIG parameter
must be a hashref. The following keys are recognized (unrecognized keys
will be silently ignored).
$s = HTML::Stripscripts->new({
Context => 'Document|Flow|Inline|NoTags',
BanList => [qw( br img )] | {br => '1', img => '1'},
BanAllBut => [qw(p div span)],
AllowSrc => 0|1,
AllowHref => 0|1,
AllowRelURL => 0|1,
AllowMailto => 0|1,
EscapeFiltered => 0|1,
Rules => { See below for details },
});
- "Context"
- A string specifying the context in which the filtered
document will be used. This influences the set of tags that will be
allowed.
If present, the "Context" value must be one of:
- "Document"
- If "Context" is "Document" then the
filter will allow a full HTML document, including the "HTML" tag
and "HEAD" and "BODY" sections.
- "Flow"
- If "Context" is "Flow" then most of the
cosmetic tags that one would expect to find in a document body are
allowed, including lists and tables but not including forms.
- "Inline"
- If "Context" is "Inline" then only
inline tags such as "B" and "FONT" are allowed.
- "NoTags"
- If "Context" is "NoTags" then no tags
are allowed.
The default "Context" value is "Flow".
- "BanList"
- If present, this option must be an arrayref or a hashref.
Any tag that would normally be allowed (because it presents no XSS hazard)
will be blocked if the lowercase name of the tag is in this list.
For example, in a guestbook application where "HR" tags are used
to separate posts, you may wish to prevent posts from including
"HR" tags, even though "HR" is not an XSS risk.
- "BanAllBut"
- If present, this option must be reference to an array
holding a list of lowercase tag names. This has the effect of adding all
but the listed tags to the ban list, so that only those tags listed will
be allowed.
- "AllowSrc"
- By default, the filter won't allow constructs that cause
the browser to fetch things automatically, such as "SRC"
attributes in "IMG" tags. If this option is present and true
then those constructs will be allowed.
- "AllowHref"
- By default, the filter won't allow constructs that cause
the browser to fetch things if the user clicks on something, such as the
"HREF" attribute in "A" tags. Set this option to a
true value to allow this type of construct.
- "AllowRelURL"
- By default, the filter won't allow relative URLs such as
"../foo.html" in "SRC" and "HREF" attribute
values. Set this option to a true value to allow them.
"AllowHref" and / or "AllowSrc" also need to be set to
true for this to have any effect.
- "AllowMailto"
- By default, "mailto:" links are not allowed. If
"AllowMailto" is set to a true value, then this construct will
be allowed. This can be enabled separately from AllowHref.
- "EscapeFiltered"
- By default, any filtered tags are outputted as
"<!--filtered-->". If "EscapeFiltered" is set to
a true value, then the filtered tags are converted to HTML entities.
For instance:
<br> --> <br>
- "Rules"
- The "Rules" option provides a very flexible way
of customising the filter.
The focus is safety-first, so it is applied after all of the previous
validation. This means that you cannot all malicious data should already
have been cleared.
Rules can be specified for tags and for attributes. Any tag or attribute not
explicitly listed will be handled by the default "*" rules.
The following is a synopsis of all of the options that you can use to
configure rules. Below, an example is broken into sections and explained.
Rules => {
tag => 0 | 1 | sub { tag_callback }
| {
attr => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
'*' => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
required => [qw(attrname attrname)],
tag => sub { tag_callback }
},
'*' => 0 | 1 | sub { tag_callback }
| {
attr => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
'*' => 0 | 1 | 'regex' | qr/regex/ | sub { attr_callback},
tag => sub { tag_callback }
}
}
EXAMPLE:
Rules => {
##########################
##### EXPLICIT RULES #####
##########################
## Allow <br> tags, reject <img> tags
br => 1,
img => 0,
## Send all <div> tags to a sub
div => sub { tag_callback },
## Allow <blockquote> tags,and allow the 'cite' attribute
## All other attributes are handled by the default C<*>
blockquote => {
cite => 1,
},
## Allow <a> tags, and
a => {
## Allow the 'title' attribute
title => 1,
## Allow the 'href' attribute if it matches the regex
href => '^http://yourdomain.com'
OR href => qr{^http://yourdomain.com},
## 'style' attributes are handled by a sub
style => sub { attr_callback },
## All other attributes are rejected
'*' => 0,
## Additionally, the <a> tag should be handled by this sub
tag => sub { tag_callback},
## If the <a> tag doesn't have these attributes, filter the tag
required => [qw(href title)],
},
##########################
##### DEFAULT RULES #####
##########################
## The default '*' rule - accepts all the same options as above.
## If a tag or attribute is not mentioned above, then the default
## rule is applied:
## Reject all tags
'*' => 0,
## Allow all tags and all attributes
'*' => 1,
## Send all tags to the sub
'*' => sub { tag_callback },
## Allow all tags, reject all attributes
'*' => { '*' => 0 },
## Allow all tags, and
'*' => {
## Allow the 'title' attribute
title => 1,
## Allow the 'href' attribute if it matches the regex
href => '^http://yourdomain.com'
OR href => qr{^http://yourdomain.com},
## 'style' attributes are handled by a sub
style => sub { attr_callback },
## All other attributes are rejected
'*' => 0,
## Additionally, all tags should be handled by this sub
tag => sub { tag_callback},
},
- Tag Callbacks
-
sub tag_callback {
my ($filter,$element) = (@_);
$element = {
tag => 'tag',
content => 'inner_html',
attr => {
attr_name => 'attr_value',
}
};
return 0 | 1;
}
A tag callback accepts two parameters, the $filter object and the
C$element>. It should return 0 to completely ignore the tag and its
content (which includes any nested HTML tags), or 1 to accept and output
the tag.
The $element is a hash ref containing the keys:
- "tag"
- This is the tagname in lowercase, eg "a",
"br", "img". If you set the tag value to an empty
string, then the tag will not be outputted, but the tag contents
will.
- "content"
- This is the equivalent of DOM's innerHTML. It contains the
text content and any HTML tags contained within this element. You can
change the content or set it to an empty string so that it is not
outputted.
- "attr"
- "attr" contains a hashref containing the
attribute names and values
If for instance, you wanted to replace "<b>" tags with
"<span>" tags, you could do this:
sub b_callback {
my ($filter,$element) = @_;
$element->{tag} = 'span';
$element->{attr}{style} = 'font-weight:bold';
return 1;
}
- Attribute Callbacks
-
sub attr_callback {
my ( $filter, $tag, $attr_name, $attr_val ) = @_;
return undef | '' | 'value';
}
Attribute callbacks accept four parameters, the $filter object, the $tag
name, the $attr_name and the $attr_value.
It should return either "undef" to reject the attribute, or the
value to be used. An empty string keeps the attribute, but without a
value.
- "BanList" vs "BanAllBut" vs
"Rules"
- It is not necessary to use "BanList" or
"BanAllBut" - everything can be done via "Rules",
however it may be simpler to write:
BanAllBut => [qw(p div span)]
The logic works as follows:
* If BanAllBut exists, then ban everything but the tags in the list
* Add to the ban list any elements in BanList
* Any tags mentioned explicitly in Rules (eg a => 0, br => 1)
are added or removed from the BanList
* A default rule of { '*' => 0 } would ban all tags except
those mentioned in Rules
* A default rule of { '*' => 1 } would allow all tags except
those disallowed in the ban list, or by explicit rules
METHODS¶
This class provides the following methods:
- hss_init ()
- This method is called by new() and does the actual
initialisation work for the new HTML::StripScripts object.
- input_start_document ()
- This method initializes the filter, and must be called once
before starting on each HTML document to be filtered.
- input_start ( TEXT )
- Handles a start tag from the input document. TEXT must be
the full text of the tag, including angle-brackets.
- input_end ( TEXT )
- Handles an end tag from the input document. TEXT must be
the full text of the end tag, including angle-brackets.
- input_text ( TEXT )
- Handles some non-tag text from the input document.
- input_process ( TEXT )
- Handles a processing instruction from the input
document.
- input_comment ( TEXT )
- Handles an HTML comment from the input document.
- input_declaration ( TEXT )
- Handles an declaration from the input document.
- input_end_document ()
- Call this method to signal the end of the input
document.
- filtered_document ()
- Returns the filtered document as a string.
SUBCLASSING¶
The only reason for subclassing this module now is to add to the list of
accepted tags, attributes and styles (See "WHITELIST INITIALIZATION
METHODS"). Everything else can be achieved with "Rules".
The "HTML::StripScripts" class is subclassable. Filter objects are
plain hashes and "HTML::StripScripts" reserves only hash keys that
start with "_hss". The filter configuration can be set up by
invoking the
hss_init() method, which takes the same arguments as
new().
OUTPUT METHODS¶
The filter outputs a stream of start tags, end tags, text, comments,
declarations and processing instructions, via the following
"output_*" methods. Subclasses may override these to intercept the
filter output.
The default implementations of the "output_*" methods pass the text on
to the
output() method. The default implementation of the
output() method appends the text to a string, which can be fetched with
the
filtered_document() method once processing is complete.
If the
output() method or the individual "output_*" methods are
overridden in a subclass, then
filtered_document() will not work in
that subclass.
- output_start_document ()
- This method gets called once at the start of each HTML
document passed through the filter. The default implementation does
nothing.
- output_end_document ()
- This method gets called once at the end of each HTML
document passed through the filter. The default implementation does
nothing.
- output_start ( TEXT )
- This method is used to output a filtered start tag.
- output_end ( TEXT )
- This method is used to output a filtered end tag.
- output_text ( TEXT )
- This method is used to output some filtered non-tag
text.
- output_declaration ( TEXT )
- This method is used to output a filtered declaration.
- output_comment ( TEXT )
- This method is used to output a filtered HTML comment.
- output_process ( TEXT )
- This method is used to output a filtered processing
instruction.
- output ( TEXT )
- This method is invoked by all of the default
"output_*" methods. The default implementation appends the text
to the string that the filtered_document() method will return.
- output_stack_entry ( TEXT )
- This method is invoked when a tag plus all text and nested
HTML content within the tag has been processed. It adds the tag plus its
content to the content for its parent tag.
REJECT METHODS¶
When the filter encounters something in the input document which it cannot
transform into an acceptable construct, it invokes one of the following
"reject_*" methods to put something in the output document to take
the place of the unacceptable construct.
The TEXT parameter is the full text of the unacceptable construct.
The default implementations of these methods output an HTML comment containing
the text "filtered". If "EscapeFiltered" is set to true,
then the rejected text is HTML escaped instead.
Subclasses may override these methods, but should exercise caution. The TEXT
parameter is unfiltered input and may contain malicious constructs.
- reject_start ( TEXT )
- reject_end ( TEXT )
- reject_text ( TEXT )
- reject_declaration ( TEXT )
- reject_comment ( TEXT )
- reject_process ( TEXT )
WHITELIST INITIALIZATION METHODS¶
The filter refers to various whitelists to determine which constructs are
acceptable. To modify these whitelists, subclasses can override the following
methods.
Each method is called once at object initialization time, and must return a
reference to a nested data structure. These references are installed into the
object, and used whenever the filter needs to refer to a whitelist.
The default implementations of these methods can be invoked as class methods.
See examples/tags/ and examples/declaration/ for examples of how to override
these methods.
- init_context_whitelist ()
- Returns a reference to the "Context" whitelist,
which determines which tags may appear at each point in the document, and
which other tags may be nested within them.
It is a hash, and the keys are context names, such as "Flow" and
"Inline".
The values in the hash are hashrefs. The keys in these subhashes are
lowercase tag names, and the values are context names, specifying the
context that the tag provides to any other tags nested within it.
The special context "EMPTY" as a value in a subhash indicates that
nothing can be nested within that tag.
- init_attrib_whitelist ()
- Returns a reference to the "Attrib" whitelist,
which determines which attributes each tag can have and the values that
those attributes can take.
It is a hash, and the keys are lowercase tag names.
The values in the hash are hashrefs. The keys in these subhashes are
lowercase attribute names, and the values are attribute value class names,
which are short strings describing the type of values that the attribute
can take, such as "color" or "number".
- init_attval_whitelist ()
- Returns a reference to the "AttVal" whitelist,
which is a hash that maps attribute value class names from the
"Attrib" whitelist to coderefs to subs to validate (and
optionally transform) a particular attribute value.
The filter calls the attribute value validation subs with the following
parameters:
- "filter"
- A reference to the filter object.
- "tagname"
- The lowercase name of the tag in which the attribute
appears.
- "attrname"
- The name of the attribute.
- "attrval"
- The attribute value found in the input document, in
canonical form (see "CANONICAL FORM").
The validation sub can return undef to indicate that the attribute should be
removed from the tag, or it can return the new value for the attribute, in
canonical form.
- init_style_whitelist ()
- Returns a reference to the "Style" whitelist,
which determines which CSS style directives are permitted in
"style" tag attributes. The keys are value names such as
"color" and "background-color", and the values are
class names to be used as keys into the "AttVal" whitelist.
- init_deinter_whitelist
- Returns a reference to the "DeInter" whitelist,
which determines which inline tags the filter should attempt to
automatically de-interleave if they are encountered interleaved. For
example, the filter will transform:
<b>hello <i>world</b> !</i>
Into:
<b>hello <i>world</i></b><i> !</i>
because both "b" and "i" appear as keys in the
"DeInter" whitelist.
CHARACTER DATA PROCESSING¶
These methods transform attribute values and non-tag text from the input
document into canonical form (see "CANONICAL FORM"), and transform
text in canonical form into a suitable form for the output document.
- text_to_canonical_form ( TEXT )
- This method is used to reduce non-tag text from the input
document to canonical form before passing it to the filter_text()
method.
The default implementation unescapes all entities that map to
"US-ASCII" characters other than ampersand, and replaces any
ampersands that don't form part of valid entities with
"&".
- quoted_to_canonical_form ( VALUE )
- This method is used to reduce attribute values quoted with
doublequotes or singlequotes to canonical form before passing it to the
handler subs in the "AttVal" whitelist.
The default behavior is the same as that of
"text_to_canonical_form()", plus it converts any CR, LF or TAB
characters to spaces.
- unquoted_to_canonical_form ( VALUE )
- This method is used to reduce attribute values without
quotes to canonical form before passing it to the handler subs in the
"AttVal" whitelist.
The default implementation simply replaces all ampersands with
"&", since that corresponds with the way most browsers
treat entities in unquoted values.
- canonical_form_to_text ( TEXT )
- This method is used to convert the text in canonical form
returned by the filter_text() method to a form suitable for
inclusion in the output document.
The default implementation runs anything that doesn't look like a valid
entity through the escape_html_metachars() method.
- canonical_form_to_attval ( ATTVAL )
- This method is used to convert the text in canonical form
returned by the "AttVal" handler subs to a form suitable for
inclusion in doublequotes in the output tag.
The default implementation converts CR, LF and TAB characters to a single
space, and runs anything that doesn't look like a valid entity through the
escape_html_metachars() method.
- validate_href_attribute ( TEXT )
- If the "AllowHref" filter configuration option is
set, then this method is used to validate "href" type attribute
values. TEXT is the attribute value in canonical form. Returns a possibly
modified attribute value (in canonical form) or "undef" to
reject the attribute.
The default implementation allows only absolute "http" and
"https" URLs, permits port numbers and query strings, and
imposes reasonable length limits.
It does not URI escape the query string, and it does not guarantee properly
formatted URIs, it just tries to give safe URIs. You can always use an
attribute callback (see "Attribute Callbacks") to provide
stricter handling.
- validate_mailto ( TEXT )
- If the "AllowMailto" filter configuration option
is set, then this method is used to validate "href" type
attribute values which begin with "mailto:". TEXT is the
attribute value in canonical form. Returns a possibly modified attribute
value (in canonical form) or "undef" to reject the attribute.
This uses a lightweight regex and does not guarantee that email addresses
are properly formatted. You can always use an attribute callback (see
"Attribute Callbacks") to provide stricter handling.
- validate_src_attribute ( TEXT )
- If the "AllowSrc" filter configuration option is
set, then this method is used to validate "src" type attribute
values. TEXT is the attribute value in canonical form. Returns a possibly
modified attribute value (in canonical form) or "undef" to
reject the attribute.
The default implementation behaves as validate_href_attribute().
OTHER METHODS TO OVERRIDE¶
As well as the output, reject, init and cdata methods listed above, it might
make sense for subclasses to override the following methods:
- filter_text ( TEXT )
- This method will be invoked to filter blocks of non-tag
text in the input document. Both input and output are in canonical form,
see "CANONICAL FORM".
The default implementation does no filtering.
- escape_html_metachars ( TEXT )
- This method is used to escape all HTML metacharacters in
TEXT. The return value must be a copy of TEXT with metacharacters escaped.
The default implementation escapes a minimal set of metacharacters for
security against XSS vulnerabilities. The set of characters to escape is a
compromise between the need for security and the need to ensure that the
filter will work for documents in as many different character sets as
possible.
Subclasses which make strong assumptions about the document character set
will be able to escape much more aggressively.
- strip_nonprintable ( TEXT )
- Returns a copy of TEXT with runs of nonprintable characters
replaced with spaces or some other harmless string. Avoids replacing
anything with the empty string, as that can lead to other security issues.
The default implementation strips out only NULL characters, in order to
avoid scrambling text for as many different character sets as possible.
Subclasses which make some sort of assumption about the character set in use
will be able to have a much wider definition of a nonprintable character,
and hence a more secure strip_nonprintable() implementation.
ATTRIBUTE VALUE HANDLER SUBS¶
References to the following subs appear in the "AttVal" whitelist
returned by the
init_attval_whitelist() method.
- _hss_attval_style( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value hander for the "style"
attribute.
- _hss_attval_size ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for attributes who's values are
some sort of size or length.
- _hss_attval_number ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for attributes who's values are a
simple integer.
- _hss_attval_color ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for color attributes.
- _hss_attval_text ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for text attributes.
- _hss_attval_word ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for attributes who's values must
consist of a single short word, with minus characters permitted.
- _hss_attval_wordlist ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for attributes who's values must
consist of one or more words, separated by spaces and/or commas.
- _hss_attval_wordlistq ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for attributes who's values must
consist of one or more words, separated by commas, with optional
doublequotes around words and spaces allowed within the doublequotes.
- _hss_attval_href ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for "href" type
attributes. If the "AllowHref" or "AllowMailto"
configuration options are set, uses the validate_href_attribute()
method to check the attribute value.
- _hss_attval_src ( FILTER, TAGNAME, ATTRNAME, ATTRVAL )
- Attribute value handler for "src" type
attributes. If the "AllowSrc" configuration option is set, uses
the validate_src_attribute() method to check the attribute
value.
- _hss_attval_stylesrc ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for "src" type style
pseudo attributes.
- _hss_attval_novalue ( FILTER, TAGNAME, ATTRNAME, ATTRVAL
)
- Attribute value handler for attributes that have no value
or a value that is ignored. Just returns the attribute name as the
value.
Many of the methods described above deal with text from the input document,
encoded in what I call "canonical form", defined as follows:
All characters other than ampersands represent themselves. Literal ampersands
are encoded as "&". Non "US-ASCII" characters may
appear as literals in whatever character set is in use, or they may appear as
named or numeric HTML entities such as "æ",
"穩" and "ÿ". Unknown named entities
such as "&foo;" may appear.
The idea is to be able to be able to reduce input text to a minimal form,
without making too many assumptions about the character set in use.
PRIVATE METHODS¶
The following methods are internal to this class, and should not be invoked from
elsewhere. Subclasses should not use or override these methods.
- _hss_prepare_ban_list (CFG)
- Returns a hash ref representing all the banned tags, based
on the values of BanList and BanAllBut
- _hss_prepare_rules (CFG)
- Returns a hash ref representing the tag and attribute rules
(See "Rules").
Returns undef if no filters are specified, in which case the attribute
filter code has very little performance impact. If any rules are
specified, then every tag and attribute is checked.
- _hss_get_attr_filter ( DEFAULT_FILTERS TAG_FILTERS
ATTR_NAME)
- Returns the attribute filter rule to apply to this
particular attribute.
Checks for:
- a named attribute rule in a named tag
- a default * attribute rule in a named tag
- a named attribute rule in the default * rules
- a default * attribute rule in the default * rules
- _hss_join_attribs (FILTERED_ATTRIBS)
- Accepts a hash ref containing the attribute names as the
keys, and the attribute values as the values. Escapes them and returns a
string ready for output to HTML
- _hss_decode_numeric ( NUMERIC )
- Returns the string that should replace the numeric entity
NUMERIC in the text_to_canonical_form() method.
- _hss_tag_is_banned ( TAGNAME )
- Returns true if the lower case tag name TAGNAME is on the
list of harmless tags that the filter is configured to block, false
otherwise.
- _hss_get_to_valid_context ( TAG )
- Tries to get the filter to a context in which the tag TAG
is allowed, by introducing extra end tags or start tags if necessary. TAG
can be either the lower case name of a tag or the string 'CDATA'.
Returns 1 if an allowed context is reached, or 0 if there's no reasonable
way to get to an allowed context and the tag should just be rejected.
- _hss_close_innermost_tag ()
- Closes the innermost open tag.
- _hss_context ()
- Returns the current named context of the filter.
- _hss_valid_in_context ( TAG, CONTEXT )
- Returns true if the lowercase tag name TAG is valid in
context CONTEXT, false otherwise.
- _hss_valid_in_current_context ( TAG )
- Returns true if the lowercase tag name TAG is valid in the
filter's current context, false otherwise.
BUGS AND LIMITATIONS¶
- Performance
- This module does a lot of work to ensure that tags are
correctly nested and are not left open, causing unnecessary overhead for
applications where that doesn't matter.
Such applications may benefit from using the more lightweight
HTML::Scrubber::StripScripts module instead.
- Strictness
- URIs and email addresses are cleaned up to be safe, but not
necessarily accurate. That would have required adding dependencies.
Attribute callbacks can be used to add this functionality if required, or
the validation methods can be overriden.
By default, filtered HTML may not be valid strict XHTML, for instance empty
required attributes may be outputted. However, with "Rules", it
should be possible to force the HTML to validate.
- REPORTING BUGS
- Please report any bugs or feature requests to
bug-html-stripscripts@rt.cpan.org, or through the web interface at
<http://rt.cpan.org>.
SEE ALSO¶
HTML::Parser, HTML::StripScripts::Parser, HTML::StripScripts::Regex
AUTHOR¶
Original author Nick Cleaton <nick@cleaton.net>
New code added and module maintained by Clinton Gormley
<clint@traveljury.com>
COPYRIGHT¶
Copyright (C) 2003 Nick Cleaton. All Rights Reserved.
Copyright (C) 2007 Clinton Gormley. All Rights Reserved.
LICENSE¶
This module is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.