.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "HTML::HTML5::Writer 3pm"
.TH HTML::HTML5::Writer 3pm 2024-03-05 "perl v5.38.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
HTML::HTML5::Writer \- output a DOM as HTML5
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 1
\&  use HTML::HTML5::Writer;
\&
\&  my $writer = HTML::HTML5::Writer\->new;
\&  print $writer\->document($dom);
.Ve
.SH DESCRIPTION
.IX Header "DESCRIPTION"
This module outputs XML::LibXML::Node objects as HTML5 strings.
It works well on DOM trees that represent valid HTML/XHTML documents;
less well on other DOM trees.
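.PP
As a minimal sketch of typical use, assuming the DOM comes from parsing a
small, well-formed XHTML string with XML::LibXML (the sample markup is
hypothetical), the same tree can be written out in either serialisation:
.PP
.Vb 16
\&  use strict;
\&  use warnings;
\&  use XML::LibXML;
\&  use HTML::HTML5::Writer;
\&
\&  # Hypothetical sample document; any well\-formed XHTML would do.
\&  my $xhtml = \*(Aq<html xmlns="http://www.w3.org/1999/xhtml">\*(Aq
\&            . \*(Aq<head><title>Example</title></head>\*(Aq
\&            . \*(Aq<body><p>Hello<br/>world</p></body></html>\*(Aq;
\&  my $dom = XML::LibXML\->new\->parse_string($xhtml);
\&
\&  # Serialise the same DOM once as HTML5 and once as XHTML5.
\&  my $as_html  = HTML::HTML5::Writer\->new(markup => \*(Aqhtml\*(Aq)\->document($dom);
\&  my $as_xhtml = HTML::HTML5::Writer\->new(markup => \*(Aqxhtml\*(Aq)\->document($dom);
\&  print $as_html, "\en";
\&  print $as_xhtml, "\en";
.Ve
.PP
Going by the documented option defaults, the first writer should use the
HTML5 DOCTYPE and the second the about:legacy\-compat one, and void
elements such as br should only be self-closed in the XHTML output.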
.SS Constructor
.IX Subsection "Constructor"
.ie n .IP """$writer = HTML::HTML5::Writer\->new(%opts)""" 4
.el .IP "\f(CW$writer = HTML::HTML5::Writer\->new(%opts)\fR" 4
.IX Item "$writer = HTML::HTML5::Writer->new(%opts)"
Create a new writer object. Options include:
.RS 4
.IP \(bu 4
\&\fBmarkup\fR
.Sp
Choose which serialisation of HTML5 to use: 'html' or 'xhtml'.
.IP \(bu 4
\&\fBpolyglot\fR
.Sp
Set to true to attempt to produce output that works as both XML and
HTML. Set to false if the output does not need to work as both.
.Sp
If not explicitly set, it defaults to false for HTML and true for XHTML.
.IP \(bu 4
\&\fBdoctype\fR
.Sp
Set this to a string to choose which DOCTYPE declaration to output. Note
that this only sets the DOCTYPE declaration; it does not change how the
rest of the document is output. The value really is used as a plain
string literal...
.Sp
.Vb 2
\&  # Yes, this works...
\&  my $w = HTML::HTML5::Writer\->new(doctype => \*(Aq\*(Aq);
.Ve
.Sp
The following constants are provided for convenience:
\&\fBDOCTYPE_HTML2\fR,
\&\fBDOCTYPE_HTML32\fR,
\&\fBDOCTYPE_HTML4\fR (latest stable strict HTML 4.x),
\&\fBDOCTYPE_HTML4_RDFA\fR (latest stable HTML 4.x+RDFa),
\&\fBDOCTYPE_HTML40\fR (strict),
\&\fBDOCTYPE_HTML40_FRAMESET\fR,
\&\fBDOCTYPE_HTML40_LOOSE\fR,
\&\fBDOCTYPE_HTML40_STRICT\fR,
\&\fBDOCTYPE_HTML401\fR (strict),
\&\fBDOCTYPE_HTML401_FRAMESET\fR,
\&\fBDOCTYPE_HTML401_LOOSE\fR,
\&\fBDOCTYPE_HTML401_RDFA10\fR,
\&\fBDOCTYPE_HTML401_RDFA11\fR,
\&\fBDOCTYPE_HTML401_STRICT\fR,
\&\fBDOCTYPE_HTML5\fR,
\&\fBDOCTYPE_LEGACY\fR (about:legacy\-compat),
\&\fBDOCTYPE_NIL\fR (empty string),
\&\fBDOCTYPE_XHTML1\fR (strict),
\&\fBDOCTYPE_XHTML1_FRAMESET\fR,
\&\fBDOCTYPE_XHTML1_LOOSE\fR,
\&\fBDOCTYPE_XHTML1_STRICT\fR,
\&\fBDOCTYPE_XHTML11\fR,
\&\fBDOCTYPE_XHTML_BASIC\fR,
\&\fBDOCTYPE_XHTML_BASIC_10\fR,
\&\fBDOCTYPE_XHTML_BASIC_11\fR,
\&\fBDOCTYPE_XHTML_MATHML_SVG\fR,
\&\fBDOCTYPE_XHTML_RDFA\fR (latest stable strict XHTML+RDFa),
\&\fBDOCTYPE_XHTML_RDFA10\fR,
\&\fBDOCTYPE_XHTML_RDFA11\fR.
.Sp
Defaults to DOCTYPE_HTML5 for HTML and DOCTYPE_LEGACY for XHTML.
.IP \(bu 4
\&\fBcharset\fR
.Sp
This module always returns strings in Perl's internal utf8 encoding, but
you can set the 'charset' option to 'ascii' to create output that is
suitable for re-encoding to ASCII (i.e. characters which do not exist in
ASCII are entity-encoded).
.IP \(bu 4
\&\fBquote_attributes\fR
.Sp
Set this to true to force attributes to be quoted. If not explicitly
set, the writer automatically detects when attributes need quoting.
.IP \(bu 4
\&\fBvoids\fR
.Sp
Set this to true to force void elements to always be terminated with '/>'.
If not explicitly set, they are only terminated that way in polyglot or
XHTML documents.
.IP \(bu 4
\&\fBstart_tags\fR and \fBend_tags\fR
.Sp
Except in polyglot and XHTML documents, some elements allow their
start and/or end tags to be omitted in certain circumstances. Setting
these to true prevents them from being omitted.
.IP \(bu 4
\&\fBrefs\fR
.Sp
Special characters that can't be encoded as named entities need
to be encoded as numeric character references instead. These
can be expressed in decimal or hexadecimal. Setting this option to
\&'dec' or 'hex' allows you to choose. The default is 'hex'.
.RE
.RS 4
.RE
.SS "Public Methods"
.IX Subsection "Public Methods"
.ie n .IP """$writer\->document($node)""" 4
.el .IP \f(CW$writer\->document($node)\fR 4
.IX Item "$writer->document($node)"
Returns an XML::LibXML::Document serialised as an HTML string.
.ie n .IP """$writer\->element($node)""" 4
.el .IP \f(CW$writer\->element($node)\fR 4
.IX Item "$writer->element($node)"
Outputs an XML::LibXML::Element as HTML.
.ie n .IP """$writer\->attribute($node)""" 4
.el .IP \f(CW$writer\->attribute($node)\fR 4
.IX Item "$writer->attribute($node)"
Outputs an XML::LibXML::Attr as HTML.
.ie n .IP """$writer\->text($node)""" 4
.el .IP \f(CW$writer\->text($node)\fR 4
.IX Item "$writer->text($node)"
Outputs an XML::LibXML::Text as HTML.
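.Sp
The node-level methods are handy for serialising a fragment rather than a
whole document. A minimal sketch, assuming an XHTML-namespaced paragraph
built by hand with XML::LibXML (the element, attribute and text content are
hypothetical):
.Sp
.Vb 17
\&  use strict;
\&  use warnings;
\&  use XML::LibXML;
\&  use HTML::HTML5::Writer;
\&
\&  # Hypothetical fragment: a p element with a class, in the XHTML namespace.
\&  my $doc = XML::LibXML::Document\->new(\*(Aq1.0\*(Aq, \*(AqUTF\-8\*(Aq);
\&  my $p   = $doc\->createElementNS(\*(Aqhttp://www.w3.org/1999/xhtml\*(Aq, \*(Aqp\*(Aq);
\&  $p\->setAttribute(class => \*(Aqnote\*(Aq);
\&  $p\->appendText(\*(AqFish & chips\*(Aq);
\&  $doc\->setDocumentElement($p);
\&
\&  my $writer = HTML::HTML5::Writer\->new;
\&  # element() serialises the whole subtree; text() a single text node.
\&  my $p_html    = $writer\->element($p);
\&  my $text_html = $writer\->text($p\->firstChild);
\&  print "$p_html\en$text_html\en";
.Ve
.Sp
In HTML mode the writer may leave the attribute unquoted or drop optional
end tags; the \fBquote_attributes\fR and \fBend_tags\fR options force them.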
.ie n .IP """$writer\->cdata($node)""" 4
.el .IP \f(CW$writer\->cdata($node)\fR 4
.IX Item "$writer->cdata($node)"
Outputs an XML::LibXML::CDATASection as HTML.
.ie n .IP """$writer\->comment($node)""" 4
.el .IP \f(CW$writer\->comment($node)\fR 4
.IX Item "$writer->comment($node)"
Outputs an XML::LibXML::Comment as HTML.
.ie n .IP """$writer\->pi($node)""" 4
.el .IP \f(CW$writer\->pi($node)\fR 4
.IX Item "$writer->pi($node)"
Outputs an XML::LibXML::PI as HTML.
.ie n .IP """$writer\->doctype""" 4
.el .IP \f(CW$writer\->doctype\fR 4
.IX Item "$writer->doctype"
Outputs the writer's DOCTYPE.
.ie n .IP """$writer\->encode_entities($string, characters=>$more)""" 4
.el .IP "\f(CW$writer\->encode_entities($string, characters=>$more)\fR" 4
.IX Item "$writer->encode_entities($string, characters=>$more)"
Takes a string and returns the same string with some special characters
replaced. These special characters do not include any of '&', '<', '>'
or '"', but you can provide a string of additional characters to treat as
special:
.Sp
.Vb 1
\&  $encoded = $writer\->encode_entities($raw, characters=>\*(Aq&<>"\*(Aq);
.Ve
.ie n .IP """$writer\->encode_entity($char)""" 4
.el .IP \f(CW$writer\->encode_entity($char)\fR 4
.IX Item "$writer->encode_entity($char)"
Returns \f(CW$char\fR entity-encoded. Encoding is done regardless of whether
\&\f(CW$char\fR is "special" or not.
.ie n .IP """$writer\->is_xhtml""" 4
.el .IP \f(CW$writer\->is_xhtml\fR 4
.IX Item "$writer->is_xhtml"
Boolean indicating if \f(CW$writer\fR is configured to output XHTML.
.ie n .IP """$writer\->is_polyglot""" 4
.el .IP \f(CW$writer\->is_polyglot\fR 4
.IX Item "$writer->is_polyglot"
Boolean indicating if \f(CW$writer\fR is configured to output polyglot HTML.
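.Sp
These predicates reflect how the constructor options were resolved. As a
sketch, assuming only the defaults documented for \fBmarkup\fR and
\&\fBpolyglot\fR, one might inspect them like this:
.Sp
.Vb 15
\&  use strict;
\&  use warnings;
\&  use HTML::HTML5::Writer;
\&
\&  # Assumes the documented defaults: polyglot false for HTML, true for XHTML.
\&  for my $markup (qw(html xhtml)) {
\&      my $w = HTML::HTML5::Writer\->new(markup => $markup);
\&      printf "%\-5s xhtml=%s polyglot=%s\en", $markup,
\&          ($w\->is_xhtml    ? \*(Aqyes\*(Aq : \*(Aqno\*(Aq),
\&          ($w\->is_polyglot ? \*(Aqyes\*(Aq : \*(Aqno\*(Aq);
\&  }
\&
\&  # An explicit option overrides the default.
\&  my $polyglot_html = HTML::HTML5::Writer\->new(markup => \*(Aqhtml\*(Aq, polyglot => 1);
\&  print \*(Aqpolyglot HTML: \*(Aq, ($polyglot_html\->is_polyglot ? \*(Aqyes\*(Aq : \*(Aqno\*(Aq), "\en";
.Ve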
.ie n .IP """$writer\->should_force_start_tags""" 4
.el .IP \f(CW$writer\->should_force_start_tags\fR 4
.IX Item "$writer->should_force_start_tags"
.PD 0
.ie n .IP """$writer\->should_force_end_tags""" 4
.el .IP \f(CW$writer\->should_force_end_tags\fR 4
.IX Item "$writer->should_force_end_tags"
.PD
Booleans indicating whether optional start and end tags should be forced.
.ie n .IP """$writer\->should_quote_attributes""" 4
.el .IP \f(CW$writer\->should_quote_attributes\fR 4
.IX Item "$writer->should_quote_attributes"
Boolean indicating whether attributes need to be quoted.
.ie n .IP """$writer\->should_slash_voids""" 4
.el .IP \f(CW$writer\->should_slash_voids\fR 4
.IX Item "$writer->should_slash_voids"
Boolean indicating whether void elements should be closed in the XHTML style.
.SH "BUGS AND LIMITATIONS"
.IX Header "BUGS AND LIMITATIONS"
Certain DOM constructs cannot be output in non-XML HTML, for example:
.PP
.Vb 12
\&  use XML::LibXML;
\&  use HTML::HTML5::Writer;
\&
\&  my $xhtml = <<XHTML;
\&  <html xmlns="http://www.w3.org/1999/xhtml">
\&  <head><title>Test</title></head>
\&  <body><hr>This text is within the HR element</hr></body>
\&  </html>
\&  XHTML
\&  my $dom = XML::LibXML\->new\->parse_string($xhtml);
\&  my $writer = HTML::HTML5::Writer\->new(markup=>\*(Aqhtml\*(Aq);
\&  print $writer\->document($dom);
.Ve
.PP
There is no way to serialise that properly in HTML, because HR is a void
element and cannot contain text. Right now this module just outputs the
HR element with the text contained within it, as XHTML would. Future
versions may emit a warning or throw an error instead.
.PP
In these cases, the HTML::HTML5::{Parser,Writer} combination is not
round-trippable.
.PP
Outputting elements and attributes in foreign (non-XHTML) namespaces is
implemented pretty naively and not thoroughly tested. I'd be interested in
any feedback people have, especially on round-trippability of SVG, MathML
and RDFa content in HTML.
.PP
Please report any bugs to .
.SH "SEE ALSO"
.IX Header "SEE ALSO"
HTML::HTML5::Parser, HTML::HTML5::Builder, HTML::HTML5::ToText,
XML::LibXML.
.SH AUTHOR
.IX Header "AUTHOR"
Toby Inkster .
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright (C) 2010\-2012 by Toby Inkster.
.PP
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.