.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.0102 (Pod::Simple 3.45)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
.    if \nF \{\
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{\
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "MsOffice::Word::Surgeon::PackagePart 3pm"
.TH MsOffice::Word::Surgeon::PackagePart 3pm 2024-12-21 "perl v5.40.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
MsOffice::Word::Surgeon::PackagePart \- Operations on a single part within the ZIP package of a docx document
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 6
\&  my $part = $surgeon\->document;
\&  print $part\->plain_text;
\&  $part\->replace(qr[$pattern], $replacement_callback);
\&  $part\->replace_image($image_alt_text, $image_PNG_content);
\&  $part\->unlink_fields;
\&  $part\->reveal_bookmarks;
.Ve
.SH DESCRIPTION
.IX Header "DESCRIPTION"
This class is part of MsOffice::Word::Surgeon; it encapsulates operations for a single
\&\fIpackage part\fR within the ZIP package of a \f(CW\*(C`.docx\*(C'\fR document.
It is mostly used for the \fIdocument\fR part, which contains the XML representation of the
main document body. However, other parts such as headers, footers, footnotes, etc. have the
same internal representation and therefore the same operations can be invoked.
.SH METHODS
.IX Header "METHODS"
.SS new
.IX Subsection "new"
.Vb 4
\&  my $part = MsOffice::Word::Surgeon::PackagePart\->new(
\&    surgeon   => $surgeon,
\&    part_name => $name,
\&  );
.Ve
.PP
Constructor for a new part object. This is called internally from
MsOffice::Word::Surgeon; it is not meant to be called directly
by clients.
.PP
\fIConstructor arguments\fR
.IX Subsection "Constructor arguments"
.IP surgeon 4
.IX Item "surgeon"
a weak reference to the main surgeon object
.IP part_name 4
.IX Item "part_name"
ZIP member name of this part
.PP
\fIOther attributes\fR
.IX Subsection "Other attributes"
.PP
Other attributes, not passed through the constructor but generated lazily on demand, are:
.IP contents 4
.IX Item "contents"
the XML contents of this part
.IP runs 4
.IX Item "runs"
a decomposition of the XML contents into a collection of
MsOffice::Word::Surgeon::Run objects.
.IP relationships 4
.IX Item "relationships"
an arrayref of Office relationships associated with this part. This information comes from
a \f(CW\*(C`.rels\*(C'\fR member in the ZIP archive, named after the name of the package part.
Array indices correspond to relationship numbers. Array values are hashrefs with
keys
.RS 4
.IP Id 4
.IX Item "Id"
the full relationship id
.IP num 4
.IX Item "num"
the numeric part of \f(CW\*(C`rId\*(C'\fR
.IP Type 4
.IX Item "Type"
the full reference to the XML schema for this relationship
.IP short_type 4
.IX Item "short_type"
only the last word of the type, e.g. 'image', 'style', etc.
.IP Target 4
.IX Item "Target"
designation of the target within the ZIP file. The prefix 'word/' must be
prepended to obtain a complete ZIP member name.
.RE
.RS 4
.RE
.IP images 4
.IX Item "images"
a hashref of images within this package part. Keys of the hash are image \fIalternative texts\fR.
If present, the alternative \fItitle\fR will be preferred; otherwise the alternative \fIdescription\fR will be taken
(note: the \fItitle\fR field was displayed in Office 2013 and 2016, but more recent versions only display
the \fIdescription\fR field \-\- see
MsOffice documentation <https://support.microsoft.com/en-us/office/add-alternative-text-to-a-shape-picture-chart-smartart-graphic-or-other-object-44989b2a-903c-4d9a-b742-6a75b451c669>).
.Sp
Images without alternative text will not be accessible through the current Perl module.
.Sp
Values of the hash are zip member names for the corresponding
image representations in \f(CW\*(C`.png\*(C'\fR format.
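.PP
As a rough illustration of the \fIrelationships\fR structure described above, the
following sketch walks through such an arrayref and builds complete ZIP member names.
The sample data is invented for the example; in real use the arrayref would come
from \f(CW\*(C`$part\->relationships\*(C'\fR.
.PP
.Vb 17
\&  use strict;
\&  use warnings;
\&
\&  # Hypothetical sample data shaped like the structure described above.
\&  my $schema = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
\&  my $relationships = [
\&    undef,   # no relationship numbered 0 in this sample
\&    {Id => "rId1", num => 1, Type => "$schema/header",
\&     short_type => "header", Target => "header1.xml"},
\&    {Id => "rId2", num => 2, Type => "$schema/image",
\&     short_type => "image",  Target => "media/image1.png"},
\&  ];
\&
\&  # Keep only image relationships and prepend "word/" to each Target.
\&  my @image_members = map  {"word/$_\->{Target}"}
\&                      grep {$_ && $_\->{short_type} eq "image"} @$relationships;
\&  print "$_\en" foreach @image_members;
.Ve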
.SS "Contents restitution"
.IX Subsection "Contents restitution"
\fIcontents\fR
.IX Subsection "contents"
.PP
Returns a Perl string with the current internal XML representation of the part
contents.
.PP
\fIoriginal_contents\fR
.IX Subsection "original_contents"
.PP
Returns a Perl string with the XML representation of the
part contents, as it was in the ZIP archive before any
modification.
.PP
\fIindented_contents\fR
.IX Subsection "indented_contents"
.PP
Returns an indented version of the XML contents, suitable for inspection in a text editor.
This is produced by "toString" in XML::LibXML::Document and therefore is returned as an encoded
byte string, not a Perl string.
.PP
\fIplain_text\fR
.IX Subsection "plain_text"
.PP
Returns the text contents of the part, without any markup.
Paragraphs and breaks are converted to newlines, all other formatting instructions are ignored.
.PP
\fIruns\fR
.IX Subsection "runs"
.PP
Returns a list of MsOffice::Word::Surgeon::Run objects. Each of
these objects holds an XML fragment; joining all fragments
restores the complete document.
.PP
.Vb 1
\&  my $contents = join "", map {$_\->as_xml} $self\->runs;
.Ve
.SS "Modifying contents"
.IX Subsection "Modifying contents"
\fIcleanup_XML\fR
.IX Subsection "cleanup_XML"
.PP
.Vb 1
\&  $part\->cleanup_XML(%args);
.Ve
.PP
Applies several other methods for removing unnecessary nodes within the internal
XML. This method successively calls "reduce_all_noises", "unlink_fields",
"suppress_bookmarks" and "merge_runs".
.PP
Currently the only supported argument is:
.ie n .IP """no_caps""" 4
.el .IP \f(CWno_caps\fR 4
.IX Item "no_caps"
If true, the method "remove_caps_property" in MsOffice::Word::Surgeon::Run is automatically
called for each run object. As a result, all texts within runs with the \f(CW\*(C`caps\*(C'\fR property are automatically
converted to uppercase.
.PP
\fIreduce_noise\fR
.IX Subsection "reduce_noise"
.PP
.Vb 1
\&  $part\->reduce_noise($regex1, $regex2, ...);
.Ve
.PP
This method is used for removing unnecessary information in the XML
markup.  It applies the given list of regexes to the whole document,
suppressing matches.  The final result is put back into 
\&\f(CW\*(C`$self\->contents\*(C'\fR. Regexes may be given either as \f(CW\*(C`qr/.../\*(C'\fR
references, or as names of builtin regexes (described below).  Regexes
are applied to the whole XML contents, not only to run nodes.
.PP
\fInoise_reduction_regex\fR
.IX Subsection "noise_reduction_regex"
.PP
.Vb 1
\&  my $regex = $part\->noise_reduction_regex($regex_name);
.Ve
.PP
Returns the builtin regex corresponding to the given name.
Known regexes are:
.PP
.Vb 7
\&  proof_checking       => qr(<w:(?:proofErr[^>]+|noProof/)>),
\&  revision_ids         => qr(\esw:rsid\ew+="[^"]+"),
\&  complex_script_bold  => qr(<w:bCs/>),
\&  page_breaks          => qr(<w:lastRenderedPageBreak/>),
\&  language             => qr(<w:lang w:val="[^/>]+/>),
\&  empty_run_props      => qr(<w:rPr></w:rPr>),
\&  soft_hyphens         => qr(<w:softHyphen/>),
.Ve
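.PP
The effect of such regexes can be sketched in plain Perl on an invented XML
fragment. This only illustrates the principle of suppressing matches; it is not
the module's implementation, and the order of application is an assumption of the sketch.
.PP
.Vb 18
\&  use strict;
\&  use warnings;
\&
\&  # Invented sample fragment, not taken from a real document.
\&  my $xml = q{<w:r w:rsidR="00A1B2C3"><w:rPr><w:noProof/><w:bCs/></w:rPr>}
\&          . q{<w:t>Hello</w:t></w:r>};
\&
\&  # Copies of four of the builtin regexes listed above.
\&  my @regexes = (
\&    qr(<w:(?:proofErr[^>]+|noProof/)>),  # proof_checking
\&    qr(\esw:rsid\ew+="[^"]+"),            # revision_ids
\&    qr(<w:bCs/>),                        # complex_script_bold
\&    qr(<w:rPr></w:rPr>),                 # empty_run_props
\&  );
\&
\&  # Suppress every match of each regex in turn.
\&  $xml =~ s/$_//g foreach @regexes;
\&  print "$xml\en";   # <w:r><w:t>Hello</w:t></w:r>
.Ve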
.PP
\fIreduce_all_noises\fR
.IX Subsection "reduce_all_noises"
.PP
.Vb 1
\&  $part\->reduce_all_noises;
.Ve
.PP
Applies all builtin regexes known to "noise_reduction_regex".
.PP
\fImerge_runs\fR
.IX Subsection "merge_runs"
.PP
.Vb 1
\&  $part\->merge_runs(no_caps => 1); # optional arg
.Ve
.PP
Walks through all runs of text within the document, trying to merge
adjacent runs when possible (i.e. when both runs have the same
properties, and there is no other XML node in between).
.PP
This operation is a prerequisite before performing replace operations, because
documents edited in MsWord often have run boundaries across sentences or
even in the middle of words; so regex searches can only be successful if those
artificial boundaries have been removed.
.PP
If the argument \f(CW\*(C`no_caps => 1\*(C'\fR is present, the merge operation
will also convert runs with the \f(CW\*(C`w:caps\*(C'\fR property, putting all letters
into uppercase and removing the property; this makes more merges possible.
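.PP
Why run boundaries matter can be sketched in plain Perl on an invented fragment
in which one word is split across two runs. Joining the text pieces merely
stands in for a real merge; it is not how \f(CWmerge_runs()\fR is implemented.
.PP
.Vb 12
\&  use strict;
\&  use warnings;
\&
\&  # Invented fragment: the word "Hello" is split across two runs.
\&  my $xml = q{<w:r><w:t>Hel</w:t></w:r><w:r><w:t>lo</w:t></w:r>};
\&
\&  # Text of each run taken separately.
\&  my @texts = $xml =~ m{<w:t>([^<]*)</w:t>}g;
\&
\&  my $hits_per_run = grep {/Hello/} @texts;               # no single run matches
\&  my $hits_merged  = () = join("", @texts) =~ /Hello/g;   # the joined text does
\&  print "per run: $hits_per_run, merged: $hits_merged\en";
.Ve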
.PP
\fIreplace\fR
.IX Subsection "replace"
.PP
.Vb 1
\&  $part\->replace($pattern, $replacement, %replacement_args);
.Ve
.PP
Replaces all occurrences of \f(CW$pattern\fR regex within the text nodes by the
given \f(CW$replacement\fR. This is not exactly like a search-replace
operation performed within MsWord, because the search does not cross boundaries
of text nodes. In order to maximize the chances of successful replacements,
the "cleanup_XML" method is automatically called before starting the operation.
.PP
The argument \f(CW$pattern\fR can be either a string or a reference to a regular expression.
It should not contain any capturing parentheses, because that would perturb text
splitting operations.
.PP
The argument \f(CW$replacement\fR can be either a fixed string, or a reference to
a callback subroutine that will be called for each match.
.PP
The \f(CW%replacement_args\fR hash can be used to pass information to the callback
subroutine. That hash will be enriched with three entries:
.IP matched 4
.IX Item "matched"
The string that has been matched by \f(CW$pattern\fR.
.IP run 4
.IX Item "run"
The run object in which this text resides.
.IP xml_before 4
.IX Item "xml_before"
The XML fragment (possibly empty) found before the matched text.
.PP
The callback subroutine may return either plain text or structured XML.
See "SYNOPSIS" in MsOffice::Word::Surgeon::Run for an example of a replacement callback.
.PP
The following special keys within \f(CW%replacement_args\fR are interpreted by the
\&\f(CWreplace()\fR method itself, and therefore are not passed to the callback subroutine:
.IP keep_xml_as_is 4
.IX Item "keep_xml_as_is"
if true, no call is made to the "cleanup_XML" method before performing the replacements
.IP dont_overwrite_contents 4
.IX Item "dont_overwrite_contents"
if true, the internal XML contents is not modified in place; the new XML after performing
replacements is merely returned to the caller.
.IP cleanup_args 4
.IX Item "cleanup_args"
the argument should be an arrayref and will be passed to the "cleanup_XML" method. This
is typically used as
.Sp
.Vb 1
\&  $part\->replace($pattern, $replacement, cleanup_args => [no_caps => 1]);
.Ve
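.PP
The restriction on capturing parentheses in \f(CW$pattern\fR can be illustrated with
Perl's builtin \f(CW\*(C`split\*(C'\fR. The sketch assumes that the text is split on the pattern
wrapped in a single capturing group; this is an assumption made for illustration, not a
statement about the module's internals.
.PP
.Vb 11
\&  use strict;
\&  use warnings;
\&
\&  my $text = "Dear NAME, welcome";
\&
\&  # Assumption: the text is split on the pattern wrapped in one capturing group.
\&  my @ok  = split qr[((?:NA)ME)], $text, \-1;  # non\-capturing group inside
\&  my @bad = split qr[((NA)ME)],   $text, \-1;  # extra capturing group inside
\&
\&  print scalar(@ok),  "\en";   # 3 : "Dear ", "NAME", ", welcome"
\&  print scalar(@bad), "\en";   # 4 : "Dear ", "NAME", "NA", ", welcome"
.Ve
.PP
In this sketch the extra capturing group adds a spurious element to the list of
fragments, which is why grouping within a pattern is better written with \f(CW\*(C`(?:...)\*(C'\fR.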
.SS "Operations on bookmarks"
.IX Subsection "Operations on bookmarks"
\fIbookmark_boundaries\fR
.IX Subsection "bookmark_boundaries"
.PP
.Vb 2
\&  my $boundaries               = $part\->bookmark_boundaries;
\&  my ($boundaries, $final_xml) = $part\->bookmark_boundaries;
.Ve
.PP
Parses the XML content to discover bookmark boundaries.
In scalar context, returns an arrayref of MsOffice::Word::Surgeon::BookmarkBoundary objects.
In list context, returns the arrayref followed by a plain string containing the final XML fragment.
.PP
\fIsuppress_bookmarks\fR
.IX Subsection "suppress_bookmarks"
.PP
.Vb 1
\&  $part\->suppress_bookmarks(full_range => [qw/foo bar/], markup_only => qr/^_/);
.Ve
.PP
Suppresses bookmarks according to the specified options:
.IP full_range 4
.IX Item "full_range"
For bookmark names matching this option, the bookmark will be fully
suppressed (not only the start and end markers, but also any
content in between).
.IP markup_only 4
.IX Item "markup_only"
For bookmark names matching this option, start and end markers
are suppressed, but the inner content remains.
.PP
Options may be specified as lists of strings, or regexes, or coderefs ... anything suitable
to be compared through match::simple. In absence of any options, the default
is \f(CW\*(C`markup_only => qr/./\*(C'\fR, meaning that all bookmark markup is suppressed.
.PP
Removing bookmarks is useful because
MsWord may silently insert bookmarks in unexpected places; therefore
some searches within the text may fail because of such bookmarks.
.PP
The \f(CW\*(C`full_range\*(C'\fR option is especially convenient for removing bookmarks associated
with ASK fields. Such bookmarks contain ranges of text that are 
never displayed by MsWord.
.PP
\fIreveal_bookmarks\fR
.IX Subsection "reveal_bookmarks"
.PP
.Vb 1
\&  $part\->reveal_bookmarks(color => \*(Aqgreen\*(Aq);
.Ve
.PP
Usually bookmark boundaries in MsWord are not visible; the only way to have a visual clue is to turn on
an option in
Advanced / Show document content / Show bookmarks <https://support.microsoft.com/en-gb/office/troubleshoot-bookmarks-9cad566f-913d-49c6-8d37-c21e0e8d6db0> \-\- but this only displays where bookmarks start and end, without the names of the bookmarks.
.PP
The \f(CWreveal_bookmarks()\fR method will insert a visible run before each bookmark start and after each bookmark end, showing
the bookmark name. This is a useful tool for documenting where bookmarks are located in an existing document.
.PP
Options to this method are:
.IP color 4
.IX Item "color"
The highlighting color for visible marks. This should be a valid
highlighting color, i.e. black, blue, cyan, darkBlue, darkCyan,
darkGray, darkGreen, darkMagenta, darkRed, darkYellow, green,
lightGray, magenta, none, red, white or yellow. Default is yellow.
.IP props 4
.IX Item "props"
A string in \f(CW\*(C`sprintf\*(C'\fR format for building the XML to be inserted in the \f(CW\*(C`<w:rPr>\*(C'\fR node
when displaying bookmark marks, i.e. the style for displaying such marks.
The default is just a highlighting property: \f(CW\*(C`<w:highlight w:val="%s"/>\*(C'\fR.
.IP start 4
.IX Item "start"
A string in \f(CW\*(C`sprintf\*(C'\fR format for generating text before a bookmark start.
Default is \f(CW\*(C`<%s>\*(C'\fR.
.IP end 4
.IX Item "end"
A string in \f(CW\*(C`sprintf\*(C'\fR format for generating text after a bookmark end.
Default is \f(CW\*(C`</%s>\*(C'\fR.
.IP ignore 4
.IX Item "ignore"
A regexp for deciding which bookmarks will not be revealed. Default is \f(CW\*(C`qr/^_/\*(C'\fR,
because bookmarks with an initial underscore are usually technical bookmarks inserted
automatically by MsWord, such as \f(CW\*(C`_GoBack\*(C'\fR or \f(CW\*(C`_Toc53196147\*(C'\fR.
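.PP
How the \f(CW\*(C`sprintf\*(C'\fR formats combine with a bookmark name can be sketched in plain
Perl. The bookmark name is invented, and the sketch shows only the formatting step
with the documented default values; escaping and run construction are left out.
.PP
.Vb 15
\&  use strict;
\&  use warnings;
\&
\&  # Documented defaults for the props, start and end options.
\&  my %format = (
\&    props => q{<w:highlight w:val="%s"/>},
\&    start => q{<%s>},
\&    end   => q{</%s>},
\&  );
\&  my $bookmark_name = "chapter_1";   # invented name
\&
\&  my $props      = sprintf $format{props}, "yellow";   # default color
\&  my $start_mark = sprintf $format{start}, $bookmark_name;
\&  my $end_mark   = sprintf $format{end},   $bookmark_name;
\&  print "$props\en$start_mark ... $end_mark\en";
.Ve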
.SS "Operations on fields"
.IX Subsection "Operations on fields"
\fIfields\fR
.IX Subsection "fields"
.PP
.Vb 2
\&  my $fields               = $part\->fields;
\&  my ($fields, $final_xml) = $part\->fields;
.Ve
.PP
Parses the XML content to discover MsWord fields.
In scalar context, returns an arrayref of MsOffice::Word::Surgeon::Field objects.
In list context, returns the arrayref followed by a plain string containing the final XML fragment.
.PP
\fIreplace_fields\fR
.IX Subsection "replace_fields"
.PP
.Vb 2
\&  my $field_replacer = sub {my ($code, $result) = @_; return "...";};
\&  $part\->replace_fields($field_replacer);
.Ve
.PP
Replaces MsWord fields by the product of the \f(CW$field_replacer\fR callback.
The callback receives two arguments:
.ie n .IP $code 4
.el .IP \f(CW$code\fR 4
.IX Item "$code"
A plain string containing the field's full code instruction, i.e. a keyword followed by optional arguments and switches,
including initial and final spaces. Embedded fields are represented in curly braces, for example
.Sp
\&\f(CW\*(C`IF { DOCPROPERTY foo } = "bar" "is bar" "is not bar"\*(C'\fR.
.ie n .IP $result 4
.el .IP \f(CW$result\fR 4
.IX Item "$result"
An XML fragment containing the current value for the field.
.PP
The callback should return an XML fragment suitable to be inserted within an MsWord \fIrun\fR.
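.PP
A callback of this kind might look like the following sketch, which can be tried
without any document by calling the subroutine directly. The field codes, the result
fragments and the replacement policy are all invented for the example.
.PP
.Vb 15
\&  use strict;
\&  use warnings;
\&
\&  my $field_replacer = sub {
\&    my ($code, $result) = @_;
\&
\&    # The first word of the code instruction is the field keyword.
\&    my ($keyword) = $code =~ /(\ew+)/;
\&
\&    # Invented policy: mask PAGE fields, keep the current result otherwise.
\&    return defined $keyword && $keyword eq "PAGE" ? "<w:t>#</w:t>" : $result;
\&  };
\&
\&  print $field_replacer\->(" PAGE ", "<w:t>12</w:t>"), "\en";
\&  print $field_replacer\->(" DOCPROPERTY foo ", "<w:t>bar</w:t>"), "\en";
.Ve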
.PP
\fIreveal_fields\fR
.IX Subsection "reveal_fields"
.PP
.Vb 1
\&  $part\->reveal_fields;
.Ve
.PP
Replaces each field with a textual representation of its code instruction, embedded in curly braces.
.PP
\fIunlink_fields\fR
.IX Subsection "unlink_fields"
.PP
.Vb 1
\&  $part\->unlink_fields;
.Ve
.PP
Replaces each field with its current result, i.e. removing the code instruction.
This is the equivalent of performing Ctrl\-Shift\-F9 in MsWord on the whole document.
.SS "Operations on images"
.IX Subsection "Operations on images"
\fIreplace_image\fR
.IX Subsection "replace_image"
.PP
.Vb 1
\&  $part\->replace_image($image_alt_text, $image_PNG_content);
.Ve
.PP
Replaces an existing PNG image by a new image. All features of the old image will
be preserved (size, positioning, border, etc.) \-\- only the image itself will be
replaced. The \f(CW$image_alt_text\fR must correspond to the \fIalternative text\fR set in Word
for this image.
.PP
This operation replaces a ZIP member within the \f(CW\*(C`.docx\*(C'\fR file. If several XML
nodes refer to the \fIsame\fR ZIP member, i.e. if the same image is displayed at several
locations, the new image will appear at all locations, even if they do not have the
same alternative text \-\- unfortunately this module currently has no facility for
duplicating an existing image into separate instances. So if your intent is to only replace
one instance of the image, your original document should contain several distinct copies
of the \f(CW\*(C`.PNG\*(C'\fR file.
.PP
\fIadd_image\fR
.IX Subsection "add_image"
.PP
.Vb 1
\&  my $rId = $part\->add_image($image_PNG_content);
.Ve
.PP
Stores the given PNG image within the ZIP file, adds it as a relationship to the
current part, and returns the relationship id. This operation is not sufficient
to make the image visible in Word: it just stores the image, but you still
have to insert a proper \f(CW\*(C`drawing\*(C'\fR node in the contents XML, using the \f(CW$rId\fR.
Future versions of this module may offer helper methods for that purpose;
currently it must be done by hand.
.SH AUTHOR
.IX Header "AUTHOR"
Laurent Dami, <dami AT cpan DOT org>
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright 2019\-2024 by Laurent Dami.
.PP
This program is free software, you can redistribute it and/or modify it under the terms of the Artistic License version 2.0.