NAME¶
doctools::toc::parse - Parsing text in doctoc format
SYNOPSIS¶
package require
doctools::toc::parse ?0.1?
package require
Tcl 8.4
package require
doctools::toc::structure
package require
doctools::msgcat
package require
doctools::tcl::parse
package require
fileutil
package require
logger
package require
snit
package require
struct::list
package require
struct::stack
::doctools::toc::parse text text
::doctools::toc::parse file path
::doctools::toc::parse includes
::doctools::toc::parse include add path
::doctools::toc::parse include remove path
::doctools::toc::parse include clear
::doctools::toc::parse vars
::doctools::toc::parse var set name value
::doctools::toc::parse var unset name
::doctools::toc::parse var clear ?
pattern?
DESCRIPTION¶
This package provides commands to parse text written in the
doctoc markup
language and convert it into the canonical serialization of the table of
contents encoded in the text. See the section
ToC serialization format
for specification of their format.
This is an internal package of doctools, for use by the higher level packages
handling
doctoc documents.
API¶
- ::doctools::toc::parse text text
- The command takes the string contained in text and
parses it under the assumption that it contains a document written using
the doctoc markup language. An error is thrown if this assumption
is found to be false. The format of these errors is described in section
Parse errors.
When successful the command returns the canonical serialization of the table
of contents which was encoded in the text. See the section ToC
serialization format for specification of that format.
- ::doctools::toc::parse file path
- The same as text, except that the text to parse is
read from the file specified by path.
- ::doctools::toc::parse includes
- This method returns the current list of search paths used
when looking for include files.
- ::doctools::toc::parse include add
path
- This method adds the path to the list of paths
searched when looking for an include file. The call is ignored if the path
is already in the list of paths. The method returns the empty string as
its result.
- ::doctools::toc::parse include remove
path
- This method removes the path from the list of paths
searched when looking for an include file. The call is ignored if the path
is not contained in the list of paths. The method returns the empty string
as its result.
- ::doctools::toc::parse include clear
- This method clears the list of search paths for include
files.
- ::doctools::toc::parse vars
- This method returns a dictionary containing the current set
of predefined variables known to the vset markup command during
processing.
- ::doctools::toc::parse var set name
value
- This method adds the variable name to the set of
predefined variables known to the vset markup command during
processing, and gives it the specified value. The method returns
the empty string as its result.
- ::doctools::toc::parse var unset
name
- This method removes the variable name from the set
of predefined variables known to the vset markup command during
processing. The method returns the empty string as its result.
- ::doctools::toc::parse var clear
?pattern?
- This method removes all variables matching the
pattern from the set of predefined variables known to the
vset markup command during processing. The method returns the empty
string as its result.
The pattern matching is done with string match, and the default
pattern used when none is specified, is *.
PARSE ERRORS¶
The format of the parse error messages thrown when encountering violations of
the
doctoc markup syntax is human readable and not intended for
processing by machines. As such it is not documented.
However, the errorCode attached to the message is machine-readable and
has the following format:
- [1]
- The error code will be a list, each element describing a
single error found in the input. The list has at least one element,
possibly more.
- [2]
- Each error element will be a list containing six strings
describing an error in detail. The strings will be
- [1]
- The path of the file the error occured in. This may be
empty.
- [2]
- The range of the token the error was found at. This range
is a two-element list containing the offset of the first and last
character in the range, counted from the beginning of the input (file).
Offsets are counted from zero.
- [3]
- The line the first character after the error is on. Lines
are counted from one.
- [4]
- The column the first character after the error is at.
Columns are counted from zero.
- [5]
- The message code of the error. This value can be used as
argument to msgcat::mc to obtain a localized error message,
assuming that the application had a suitable call of
doctools::msgcat::init to initialize the necessary message catalogs
(See package doctools::msgcat).
- [6]
- A list of details for the error, like the markup command
involved. In the case of message code doctoc/include/syntax this
value is the set of errors found in the included file, using the format
described here.
[DOCTOC] NOTATION OF TABLES OF CONTENTS¶
The doctoc format for tables of contents, also called the
doctoc markup
language, is too large to be covered in single section. The interested
reader should start with the document
- [1]
- doctoc language introduction
and then proceed from there to the formal specifications, i.e. the documents
- [1]
- doctoc language syntax and
- [2]
- doctoc language command reference.
to get a thorough understanding of the language.
Here we specify the format used by the doctools v2 packages to serialize tables
of contents as immutable values for transport, comparison, etc.
We distinguish between
regular and
canonical serializations. While
a table of contents may have more than one regular serialization only exactly
one of them will be
canonical.
- regular serialization
- [1]
- The serialization of any table of contents is a nested Tcl
dictionary.
- [2]
- This dictionary holds a single key, doctools::toc,
and its value. This value holds the contents of the table of
contents.
- [3]
- The contents of the table of contents are a Tcl dictionary
holding the title of the table of contents, a label, and its elements. The
relevant keys and their values are
- title
- The value is a string containing the title of the table of
contents.
- label
- The value is a string containing a label for the table of
contents.
- items
- The value is a Tcl list holding the elements of the table,
in the order they are to be shown.
Each element is a Tcl list holding the type of the item, and its
description, in this order. An alternative description would be that it is
a Tcl dictionary holding a single key, the item type, mapped to the item
description.
The two legal item types and their descriptions are
- reference
- This item describes a single entry in the table of
contents, referencing a single document. To this end its value is a Tcl
dictionary containing an id for the referenced document, a label, and a
longer textual description which can be associated with the entry. The
relevant keys and their values are
- id
- The value is a string containing the id of the document
associated with the entry.
- label
- The value is a string containing a label for this entry.
This string also identifies the entry, and no two entries (references and
divisions) in the containing list are allowed to have the same label.
- desc
- The value is a string containing a longer description for
this entry.
- division
- This item describes a group of entries in the table of
contents, inducing a hierarchy of entries. To this end its value is a Tcl
dictionary containing a label for the group, an optional id to a document
for the whole group, and the list of entries in the group. The relevant
keys and their values are
- id
- The value is a string containing the id of the document
associated with the whole group. This key is optional.
- label
- The value is a string containing a label for the group.
This string also identifies the entry, and no two entries (references and
divisions) in the containing list are allowed to have the same label.
- items
- The value is a Tcl list holding the elements of the group,
in the order they are to be shown. This list has the same structure as the
value for the keyword items used to describe the whole table of
contents, see above. This closes the recusrive definition of the
structure, with divisions holding the same type of elements as the whole
table of contents, including other divisions.
- canonical serialization
- The canonical serialization of a table of contents has the
format as specified in the previous item, and then additionally satisfies
the constraints below, which make it unique among all the possible
serializations of this table of contents.
- [1]
- The keys found in all the nested Tcl dictionaries are
sorted in ascending dictionary order, as generated by Tcl's builtin
command lsort -increasing -dict.
BUGS, IDEAS, FEEDBACK¶
This document, and the package it describes, will undoubtedly contain bugs and
other problems. Please report such in the category
doctools of the
Tcllib SF Trackers [
http://sourceforge.net/tracker/?group_id=12883].
Please also report any ideas for enhancements you may have for either package
and/or documentation.
KEYWORDS¶
doctoc, doctools, lexer, parser
CATEGORY¶
Documentation tools
COPYRIGHT¶
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>