.\" Automatically generated by Pod::Man 4.14 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "RSSLite 3pm"
.TH RSSLite 3pm "2022-11-20" "perl v5.36.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
XML::RSSLite \- lightweight, "relaxed" RSS (and XML\-ish) parser
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&  use XML::RSSLite;
\&
\&  parseRSS(\e%result, \e$content);
\&
\&  print "=== Channel ===\en",
\&        "Title: $result{\*(Aqtitle\*(Aq}\en",
\&        "Desc:  $result{\*(Aqdescription\*(Aq}\en",
\&        "Link:  $result{\*(Aqlink\*(Aq}\en\en";
\&
\&  foreach my $item (@{$result{\*(Aqitems\*(Aq}}) {
\&    print "  \-\-\- Item \-\-\-\en",
\&          "  Title: $item\->{\*(Aqtitle\*(Aq}\en",
\&          "  Desc:  $item\->{\*(Aqdescription\*(Aq}\en",
\&          "  Link:  $item\->{\*(Aqlink\*(Aq}\en\en";
\&  }
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This module attempts to extract the maximum amount of content from
available documents, and is less concerned with \s-1XML\s0 compliance than
alternatives. Rather than rely on XML::Parser, it uses heuristics and good
old-fashioned Perl regular expressions. It stores the data in a simple
hash structure, and \*(L"aliases\*(R" certain tags so that when done, you can
count on having the minimal data necessary for re-constructing a valid
\&\s-1RSS\s0 file. This means you get the basic title, description, and link for a
channel and its items.
.PP
This module extracts more usable links by parsing \*(L"scriptingNews\*(R" and
\&\*(L"weblog\*(R" formats in addition to \s-1RDF & RSS.\s0 It also \*(L"sanitizes\*(R" the
output for best results. The munging includes:
.IP "Remove \s-1HTML\s0 tags to leave plain text" 4
.IX Item "Remove HTML tags to leave plain text"
.PD 0
.IP "Remove leading whitespace from URIs" 4
.IX Item "Remove leading whitespace from URIs"
.IP "By default, strip all characters except 0\-9~!@#$%^&*()\-+=a\-zA\-Z[];',.:""<>?\es" 4
.IX Item "By default, strip all characters except 0-9~!@#$%^&*()-+=a-zA-Z[];',.:""<>?s"
.IP "Use <url> tags when <link> is empty" 4
.IX Item "Use <url> tags when <link> is empty"
.IP "Use misplaced URLs in <title> when <link> is empty" 4
.IX Item "Use misplaced URLs in <title> when <link> is empty"
.IP "Extract links from <a href=...> if required" 4
.IX Item "Extract links from <a href=...> if required"
.IP "Limit links to ftp and http(s)" 4
.IX Item "Limit links to ftp and http(s)"
.IP "Join relative item URLs (beginning with / or #) to the site base" 4
.IX Item "Join relative item URLs (beginning with / or #) to the site base"
.PD
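.PP
As a rough illustration only, two of the steps listed above can be
approximated with plain Perl regular expressions. This is a sketch, not the
module's actual implementation; the helper name
\&\f(CW\*(C`sanitize_text\*(C'\fR and the sample input are hypothetical.
.PP
.Vb 2
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical helper: approximates two of the steps listed above.
\&  sub sanitize_text {
\&      my ($text) = @_;
\&      $text =~ s/<[^>]*>//g;    # remove HTML tags, leaving plain text
\&      # keep only the default character set documented for parseRSS
\&      $text =~ s/[^0\-9~!\e@#\e$%^&*()\e\-+=a\-zA\-Z\e[\e];\*(Aq,.:"<>?\es]//g;
\&      return $text;
\&  }
\&
\&  my $link = "  http://example.com/a";
\&  $link =~ s/^\es+//;            # remove leading whitespace from a URI
\&
\&  print sanitize_text("<b>Hello</b>, w\ex{f6}rld!"), "\en";   # Hello, wrld!
\&  print "$link\en";                         # http://example.com/a
.Ve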
.SS "\s-1EXPORT\s0"
.IX Subsection "EXPORT"
.ie n .IP "parseRSS($outHashRef, $inScalarRef, [$strip])" 4
.el .IP "parseRSS($outHashRef, \f(CW$inScalarRef\fR, [$strip])" 4
.IX Item "parseRSS($outHashRef, $inScalarRef, [$strip])"
.RS 4
.PD 0
.IP "inScalarRef \- required" 4
.IX Item "inScalarRef - required"
.PD
Reference to a scalar containing the document to be parsed. \s-1NOTE:\s0 The
contents of this scalar will effectively be destroyed during parsing. Pass a
reference to a copy if you need the original document afterwards.
.IP "outHashRef \- required" 4
.IX Item "outHashRef - required"
Reference to the hash within which to store the parsed content.
.IP "strip \- optional" 4
.IX Item "strip - optional"
An expression indicating the level of winnowing to be performed on the
characters permitted in the results.
.RS 4
.IP "1 strip non-printable characters" 4
.IX Item "1 strip non-printable characters"
.PD 0
.IP "0 no characters are removed" 4
.IX Item "0 no characters are removed"
.IP "undefined (Default) strip everything but:" 4
.IX Item "undefined (Default) strip everything but:"
.PD
0\-9~!@#$%^&*()\-+= a\-zA\-Z[];',.:"<>?\et\en
.RE
.RS 4
.RE
.RE
.RS 4
.RE
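.PP
A minimal usage sketch, assuming the feed text is already in memory; the
sample feed and the variable names are hypothetical. Because the referenced
scalar is consumed during parsing, the example parses a copy, and it passes 1
as the strip argument so that only non-printable characters are removed.
.PP
.Vb 3
\&  use strict;
\&  use warnings;
\&  use XML::RSSLite;
\&
\&  # Hypothetical sample feed; real input would come from a file or a fetch.
\&  my $content = q{<rss version="0.91"><channel>
\&    <title>Example</title>
\&    <link>http://example.com/</link>
\&    <description>Sample channel</description>
\&    <item><title>First</title><link>http://example.com/1</link></item>
\&  </channel></rss>};
\&
\&  my $copy = $content;          # keep $content intact; parse the copy
\&  my %result;
\&  parseRSS(\e%result, \e$copy, 1);
\&
\&  print "Channel: $result{title}\en";
\&  print "Item:    $_\->{title}\en" for @{ $result{items} || [] };
.Ve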
.SS "\s-1EXPORTABLE\s0"
.IX Subsection "EXPORTABLE"
.ie n .IP "parseXML(\e%parsedTree, \e$parseThis, 'topTag', $comments);" 4
.el .IP "parseXML(\e%parsedTree, \e$parseThis, 'topTag', \f(CW$comments\fR);" 4
.IX Item "parseXML(%parsedTree, $parseThis, 'topTag', $comments);"
.RS 4
.PD 0
.IP "parsedTree \- required" 4
.IX Item "parsedTree - required"
.PD
Reference to hash to store the parsed document within.
.IP "parseThis  \- required" 4
.IX Item "parseThis - required"
Reference to scalar containing the document to parse.
.IP "topTag     \- optional" 4
.IX Item "topTag - optional"
Tag to consider the root node. Leaving this undefined is not recommended.
.IP "comments   \- optional" 4
.IX Item "comments - optional"
.RS 4
.PD 0
.IP "false will remove comments from parseThis" 4
.IX Item "false will remove comments from parseThis"
.IP "true will not remove comments from parseThis" 4
.IX Item "true will not remove comments from parseThis"
.IP "array reference is true, comments are stored here" 4
.IX Item "array reference is true, comments are stored here"
.RE
.RS 4
.RE
.RE
.RS 4
.RE
.PD
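.PP
The call described above might be used as follows. This is a sketch under
stated assumptions: the sample document, the \f(CW\*(C`doc\*(C'\fR root tag
and the variable names are hypothetical, and the exact layout of the
resulting hash is not shown here.
.PP
.Vb 3
\&  use strict;
\&  use warnings;
\&  use XML::RSSLite qw(parseXML);   # exportable, so it is requested by name
\&
\&  # Hypothetical sample document.
\&  my $xml = q{<doc><!\-\- a note \-\-><name>Example</name></doc>};
\&  my (%tree, @comments);
\&
\&  # An array reference counts as true: comments are kept and stored in it.
\&  parseXML(\e%tree, \e$xml, \*(Aqdoc\*(Aq, \e@comments);
\&
\&  print "Top\-level keys: @{[ sort keys %tree ]}\en";
\&  print "Comments seen:  ", scalar(@comments), "\en";
.Ve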
.SS "\s-1CAVEATS\s0"
.IX Subsection "CAVEATS"
This is not a conforming parser. It does not handle the following:
.IP "\(bu" 4

.Sp
.Vb 1
\&  <foo bar=">">
.Ve
.IP "\(bu" 4

.Sp
.Vb 1
\&  <foo><bar> <bar></bar> <bar></bar> </bar></foo>
.Ve
.IP "\(bu" 4

.Sp
.Vb 1
\&  <![CDATA[ ]]>
.Ve
.IP "\(bu" 4

.Sp
.Vb 1
\&  <?pi ...?>  (processing instructions)
.Ve
.PP
It is non-validating; without a \s-1DTD\s0, the following cannot be properly addressed:
.IP "entities" 4
.IX Item "entities"
.PD 0
.IP "namespaces" 4
.IX Item "namespaces"
.PD
Support for these may or may not arrive in a future release.
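.PP
One possible workaround for the \s-1CDATA\s0 limitation, sketched under the
assumption that escaped character data is acceptable in the results, is to
unwrap such sections before parsing. The helper
\&\f(CW\*(C`unwrap_cdata\*(C'\fR is hypothetical and is not part of this module.
.PP
.Vb 2
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical pre\-processing helper; not part of XML::RSSLite.
\&  # Replaces each CDATA section with its body, escaping &, < and >
\&  # so that the body remains plain character data.
\&  sub unwrap_cdata {
\&      my ($doc) = @_;
\&      $doc =~ s{<!\e[CDATA\e[(.*?)\e]\e]>}{
\&          my $text = $1;
\&          $text =~ s/&/&amp;/g;
\&          $text =~ s/</&lt;/g;
\&          $text =~ s/>/&gt;/g;
\&          $text;
\&      }gse;
\&      return $doc;
\&  }
\&
\&  my $doc = q{<title><![CDATA[Q & A <live>]]></title>};
\&  print unwrap_cdata($doc), "\en";   # <title>Q &amp; A &lt;live&gt;</title>
.Ve
.PP
The unwrapped text can then be passed to the parser by reference as usual.
Since entities are not decoded, the escaped forms will appear as written in
the results.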
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBperl\fR\|(1), \f(CW\*(C`XML::RSS\*(C'\fR, \f(CW\*(C`XML::SAX::PurePerl\*(C'\fR,
\&\f(CW\*(C`XML::Parser::Lite\*(C'\fR, \f(CW\*(C`XML::Parser\*(C'\fR
.SH "AUTHOR"
.IX Header "AUTHOR"
Jerrad Pierce <jpierce@cpan.org>.
.PP
Scott Thomason <scott@thomasons.org>
.SH "LICENSE"
.IX Header "LICENSE"
Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason.
All rights reserved. This program is free software; you can redistribute it 
and/or modify it under the same terms as Perl itself.