LOWDOWN(3) | Library Functions Manual | LOWDOWN(3) |
NAME
lowdown - simple markdown translator library
LIBRARY
library “liblowdown”
SYNOPSIS
#include <sys/queue.h>
#include <stdio.h>
#include <lowdown.h>
struct lowdown_metadata
struct lowdown_node
struct lowdown_opts
DESCRIPTION
This library parses lowdown(5) into various output formats.
The library first provides a high-level interface: lowdown_buf(3), lowdown_buf_diff(3), lowdown_file(3), and lowdown_file_diff(3).
The high-level functions interface with low-level functions that perform parsing and formatting. These consist of lowdown_doc_new(3), lowdown_doc_parse(3), and lowdown_doc_free(3) for parsing lowdown(5) documents into an abstract syntax tree.
The front-end functions for freeing, allocation, and rendering are as follows.
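- HTML5: lowdown_html_new(3), lowdown_html_rndr(3), lowdown_html_free(3)
- gemini: lowdown_gemini_new(3), lowdown_gemini_rndr(3), lowdown_gemini_free(3)
- LaTeX: lowdown_latex_new(3), lowdown_latex_rndr(3), lowdown_latex_free(3)
- OpenDocument: lowdown_odt_new(3), lowdown_odt_rndr(3), lowdown_odt_free(3)
- roff: lowdown_nroff_new(3), lowdown_nroff_rndr(3), lowdown_nroff_free(3)
- UTF-8 ANSI terminal: lowdown_term_new(3), lowdown_term_rndr(3), lowdown_term_free(3)
- debugging: lowdown_tree_rndr(3)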
To compile and link, use pkg-config(1):
% cc `pkg-config --cflags lowdown` -c -o sample.o sample.c
% cc -o sample sample.o `pkg-config --libs lowdown`
Pledge Promises
The lowdown library is built to operate in security-sensitive environments, such as those using pledge(2) on OpenBSD. The only promise required is stdio for lowdown_file_diff(3) and lowdown_file(3): both require access to the stream for reading input.
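The restriction described above might be applied as in the following sketch. The helper name sandbox_stdio is hypothetical, and the guard on __OpenBSD__ is an assumption made so that the same source also builds on systems without pledge(2).

```c
#include <unistd.h>

/*
 * Reduce the process to the "stdio" promise where pledge(2) exists.
 * Returns 0 on success and -1 on failure, like pledge(2) itself.
 */
static int
sandbox_stdio(void)
{
#ifdef __OpenBSD__
	return pledge("stdio", NULL);
#else
	return 0; /* no pledge(2) on this platform: nothing to do */
#endif
}
```

Input files need to be opened before this call, since opening a path requires more than the stdio promise; the already-open stream can then be handed to lowdown_file(3).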
Types
All lowdown functions use one or more of the following structures.
The struct lowdown_opts structure manages features. It has the following fields:
- unsigned int feat
- Features used during the parse. This bit-field may have the following bits
OR'd:
LOWDOWN_ATTRS
- Parse PHP extra link, header, and image attributes.
LOWDOWN_AUTOLINK
- Parse http, https, ftp, mailto, and relative links or link fragments.
LOWDOWN_COMMONMARK
- Tighten input parsing to the CommonMark specification. This also uses the first ordered list value instead of starting all lists at one. This feature is experimental and incomplete.
LOWDOWN_DEFLIST
- Parse PHP extra definition lists. This is currently constrained to single-key lists.
LOWDOWN_FENCED
- Parse GFM fenced (language-specific) code blocks.
LOWDOWN_FOOTNOTES
- Parse MMD style footnotes. This only supports the referenced footnote style, not the "inline" style.
LOWDOWN_HILITE
- Parse highlight sequences. This is disabled by default because such sequences may be erroneously interpreted as section headers.
LOWDOWN_IMG_EXT
- Deprecated. Use LOWDOWN_ATTRS instead.
LOWDOWN_MATH
- Parse mathematics equations.
LOWDOWN_METADATA
- Parse in-document MMD metadata. For the first paragraph to count as meta-data, the first line must have a colon in it.
LOWDOWN_NOCODEIND
- Do not parse indented content as code blocks.
LOWDOWN_NOINTEM
- Do not parse emphasis within words.
LOWDOWN_STRIKE
- Parse strikethrough sequences.
LOWDOWN_SUPER
- Parse superscripts. This accepts foo^bar, which puts the part following the caret up to the next whitespace in superscript; or foo^(bar), which puts only the part in parentheses in superscript.
LOWDOWN_TABLES
- Parse GFM tables.
LOWDOWN_TASKLIST
- Parse GFM task list items.
The default value is zero (none).
- unsigned int oflags
- Features used by the output generators. This bit-field may have the
following enabled. Note that bits are by definition specific to an output
type.
For LOWDOWN_HTML:
LOWDOWN_HTML_ESCAPE
- If LOWDOWN_HTML_SKIP_HTML has not been set, escapes in-document HTML so that it is rendered as opaque text.
LOWDOWN_HTML_HARD_WRAP
- Retain line-breaks within paragraphs.
LOWDOWN_HTML_HEAD_IDS
- Have an identifier written with each header element consisting of an HTML-escaped version of the header contents.
LOWDOWN_HTML_OWASP
- When escaping text, be extra paranoid in following the OWASP suggestions for which characters to escape.
LOWDOWN_HTML_NUM_ENT
- Convert, when possible, HTML entities to their numeric form. If not set, the entities are used as given in the input.
LOWDOWN_HTML_SKIP_HTML
- Do not render in-document HTML at all.
For LOWDOWN_GEMINI, there are several flags for controlling link placement. By default, links (images, autolinks, and links) are queued when specified in-line then emitted in a block sequence after the nearest block element.
LOWDOWN_GEMINI_LINK_END
- Emit the queue of links at the end of the document instead of after the nearest block element.
LOWDOWN_GEMINI_LINK_IN
- Render all links within the flow of text. This will cause breakage with nested links, such as images within links, links in blockquotes, etc. It should not be used except in carefully crafted documents.
LOWDOWN_GEMINI_LINK_NOREF
- Do not format link labels. Takes precedence over LOWDOWN_GEMINI_LINK_ROMAN.
LOWDOWN_GEMINI_LINK_ROMAN
- When formatting link labels, use lower-case Roman numerals instead of the default lowercase hexavigesimal (i.e., “a”, “b”, ..., “aa”, “ab”, ...).
LOWDOWN_GEMINI_METADATA
- Print metadata as the canonicalised key followed by a colon then the value, each on one line (newlines replaced by spaces). The metadata block is terminated by a double newline. If there is no metadata, this does nothing.
There may only be one of LOWDOWN_GEMINI_LINK_END or LOWDOWN_GEMINI_LINK_IN. If both are specified, the latter is unset.
For LOWDOWN_FODT:
LOWDOWN_ODT_SKIP_HTML
- Do not render in-document HTML at all. Text within HTML elements remains.
For LOWDOWN_LATEX:
LOWDOWN_LATEX_NUMBERED
- Use the default numbering scheme for sections, subsections, etc. If not specified, these are inhibited.
LOWDOWN_LATEX_SKIP_HTML
- Do not render in-document HTML at all. Text within HTML elements remains.
And for LOWDOWN_MAN and LOWDOWN_NROFF:
LOWDOWN_NROFF_GROFF
- Use GNU extensions (i.e., for groff(1)) when rendering output. The groff arguments must include -m pdfmark for formatting links with LOWDOWN_MAN or -m spdf instead of -m s for LOWDOWN_NROFF. Applies to the LOWDOWN_MAN and LOWDOWN_NROFF output types.
LOWDOWN_NROFF_NUMBERED
- Use numbered sections if LOWDOWN_NROFF_GROFF is not specified. Only applies to the LOWDOWN_NROFF output type.
LOWDOWN_NROFF_SKIP_HTML
- Do not render in-document HTML at all. Text within HTML elements remains.
LOWDOWN_NROFF_SHORTLINK
- Render link URLs in short form. Applies to images, autolinks, and regular links. Only in LOWDOWN_MAN or when LOWDOWN_NROFF_GROFF is not specified.
LOWDOWN_NROFF_NOLINK
- Don't show links at all if they have embedded text. Applies to images and regular links. Only in LOWDOWN_MAN or when LOWDOWN_NROFF_GROFF is not specified.
For LOWDOWN_TERM:
LOWDOWN_TERM_NOANSI
- Don't apply ANSI style codes at all. This implies LOWDOWN_TERM_NOCOLOUR.
LOWDOWN_TERM_NOCOLOUR
- Don't apply ANSI colour codes. This will still show underline, bold, etc. This should not be used in difference mode, as the output will make no sense.
LOWDOWN_TERM_NOLINK
- Don't show links at all. Applies to images and regular links: autolinks are still shown. This may be combined with LOWDOWN_TERM_SHORTLINK to also shorten autolinks.
LOWDOWN_TERM_SHORTLINK
- Render link URLs in short form. Applies to images, autolinks, and regular links. This may be combined with LOWDOWN_TERM_NOLINK to only show shortened autolinks.
For any mode, you may specify:
LOWDOWN_SMARTY
- Use smart typography formatting.
LOWDOWN_STANDALONE
- Emit a full document instead of a document fragment. This envelope is largely populated from metadata if LOWDOWN_METADATA was provided as an option or as given in meta or metaovr.
- size_t maxdepth
- The maximum parse depth before the parser exits. Most documents will have a parse depth in the single digits.
- size_t cols
- For LOWDOWN_TERM, the "soft limit" for width of terminal output not including margins. If zero, 80 shall be used.
- size_t hmargin
- For LOWDOWN_TERM, the left margin (space characters).
- size_t vmargin
- For LOWDOWN_TERM, the top/bottom margin (newlines).
- enum lowdown_type type
- May be set to LOWDOWN_HTML for HTML5 output, LOWDOWN_LATEX for LaTeX, LOWDOWN_MAN for -m an macros, LOWDOWN_FODT for “flat” OpenDocument, LOWDOWN_TERM for ANSI-compatible UTF-8 terminal output, LOWDOWN_GEMINI for the Gemini format, or LOWDOWN_NROFF for -m s macros. The LOWDOWN_TREE type causes a debug tree to be written.
- struct lowdown_opts_odt odt
- If type is LOWDOWN_FODT, this contains const char *sty, which is either NULL or the OpenDocument styles used when creating standalone documents. If NULL, the default styles are used.
- char **meta
- An array of metadata key-value pairs or NULL. Each pair must appear as if provided on one line (or multiple lines) of the input, including the terminating newline character. If not consisting of a valid pair (e.g., no newline, no colon), then it is ignored. When processed, these values are overridden by those in the document (if LOWDOWN_METADATA is specified) or by those in metaovr.
- size_t metasz
- Number of pairs in meta.
- char **metaovr
- See meta. The difference is that metaovr is applied after meta and in-document metadata, so it overrides prior values.
- size_t metaovrsz
- Number of pairs in metaovr.
Another common structure is struct lowdown_metadata, which is used to hold parsed (and output-formatted) metadata keys and values if LOWDOWN_METADATA was provided as an input bit. This structure consists of the following fields:
- char *key
- The metadata key in its lowercase, canonical form.
- char *value
- The metadata value as rendered in the current output format. This may be an empty string.
The abstract syntax tree is encoded in struct lowdown_node, which consists of the following.
- enum lowdown_rndrt type
- The node type. (Described below.)
- size_t id
- An identifier unique within the document. This can be used as a table index since the number is assigned from a monotonically increasing point during the parse.
- struct lowdown_node *parent
- The parent of the node, or NULL at the root.
- enum lowdown_chng chng
- Change tracking: whether this node was inserted (LOWDOWN_CHNG_INSERT), deleted (LOWDOWN_CHNG_DELETE), or neither (LOWDOWN_CHNG_NONE).
- struct lowdown_nodeq children
- A possibly-empty list of child nodes.
- <anon union>
- An anonymous union of type-specific structures. See below for a description of each one.
The nodes may be one of the following types, with default rendering in HTML5 to illustrate functionality.
LOWDOWN_BLOCKCODE
- A block-level (and possibly language-specific) snippet of code. Described by the <pre><code> elements.
LOWDOWN_BLOCKHTML
- A block-level snippet of HTML. This is simply opaque HTML content. (Only if configured during parse.)
LOWDOWN_BLOCKQUOTE
- A block-level quotation. Described by the <blockquote> element.
LOWDOWN_CODESPAN
- A snippet of code. Described by the <code> element.
LOWDOWN_DOC_HEADER
- A header with data gathered from document metadata (if configured). Described by the <head> element. (Only if configured during parse.)
LOWDOWN_DOUBLE_EMPHASIS
- Bold (or otherwise notable) content. Described by the <strong> element.
LOWDOWN_EMPHASIS
- Italic (or otherwise notable) content. Described by the <em> element.
LOWDOWN_ENTITY
- An HTML entity, which may either be named or numeric.
LOWDOWN_FOOTNOTE
- A footnote. (Only if configured during parse.)
LOWDOWN_HEADER
- A block-level header. Described (in the HTML case) by one of <h1> through <h6>.
LOWDOWN_HIGHLIGHT
- Marked text. Described by the <mark> element. (Only if configured during parse.)
LOWDOWN_HRULE
- A horizontal line. Described by <hr>.
LOWDOWN_IMAGE
- An image. Described by the <img> element.
LOWDOWN_LINEBREAK
- A hard line-break within a block context. Described by the <br> element.
LOWDOWN_LINK
- A link to external media. Described by the <a> element.
LOWDOWN_LINK_AUTO
- Like LOWDOWN_LINK, except inferred from text content. Described by the <a> element. (Only if configured during parse.)
LOWDOWN_LIST
- A block-level list enclosure. Described by <ul> or <ol>.
LOWDOWN_LISTITEM
- A block-level list item, always appearing within a LOWDOWN_LIST. Described by <li>.
LOWDOWN_MATH_BLOCK
- A block (or inline) of mathematical text in LaTeX format. Described within \[xx\] or \(xx\). This is usually (in HTML) externally handled by a JavaScript renderer. (Only if configured during parse.)
LOWDOWN_META
- Meta-data keys and values. (Only if configured during parse.) These are described by elements in the <head> element.
LOWDOWN_NORMAL_TEXT
- Normal text content.
LOWDOWN_PARAGRAPH
- A block-level paragraph. Described by the <p> element.
LOWDOWN_RAW_HTML
- An inline of raw HTML. (Only if configured during parse.)
LOWDOWN_ROOT
- The root of the document. This is always the topmost node, and the only node where the parent field is NULL.
LOWDOWN_STRIKETHROUGH
- Content struck through. Described by the <del> element. (Only if configured during parse.)
LOWDOWN_SUPERSCRIPT
- A superscript. Described by the <sup> element. (Only if configured during parse.)
LOWDOWN_TABLE_BLOCK
- A table block. Described by <table>. (Only if configured during parse.)
LOWDOWN_TABLE_BODY
- A table body section. Described by <tbody>. Parent is always LOWDOWN_TABLE_BLOCK. (Only if configured during parse.)
LOWDOWN_TABLE_CELL
- A table cell. Described by <td> or <th> if in the header. Parent is always LOWDOWN_TABLE_ROW. (Only if configured during parse.)
LOWDOWN_TABLE_HEADER
- A table header section. Described by <thead>. Parent is always LOWDOWN_TABLE_BLOCK. (Only if configured during parse.)
LOWDOWN_TABLE_ROW
- A table row. Described by <tr>. Parent is always LOWDOWN_TABLE_HEADER or LOWDOWN_TABLE_BODY. (Only if configured during parse.)
LOWDOWN_TRIPLE_EMPHASIS
- Combination of LOWDOWN_EMPHASIS and LOWDOWN_DOUBLE_EMPHASIS.
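The tree shape described above (a typed node with a parent pointer and a queue of children) can be walked recursively with the sys/queue.h macros. The sketch below deliberately uses stand-in types so that it builds without the library: struct toy_node, enum toy_type, and the sibling linkage field named entries are all assumptions of this example, not the definitions from lowdown.h.

```c
#include <sys/queue.h>
#include <stddef.h>

/* Stand-in types for this sketch; real code uses <lowdown.h>. */
enum toy_type { TOY_ROOT, TOY_PARAGRAPH, TOY_NORMAL_TEXT };

struct toy_node;
TAILQ_HEAD(toy_nodeq, toy_node);

struct toy_node {
	enum toy_type type;            /* node type */
	size_t id;                     /* unique within the document */
	struct toy_node *parent;       /* NULL at the root */
	struct toy_nodeq children;     /* possibly-empty child list */
	TAILQ_ENTRY(toy_node) entries; /* sibling linkage: assumed name */
};

/* Initialise a node and, if given, append it to the parent's children. */
static void
toy_init(struct toy_node *n, enum toy_type type, size_t id,
    struct toy_node *parent)
{
	n->type = type;
	n->id = id;
	n->parent = parent;
	TAILQ_INIT(&n->children);
	if (parent != NULL)
		TAILQ_INSERT_TAIL(&parent->children, n, entries);
}

/* Depth-first count of the nodes of a given type at or below n. */
static size_t
toy_count(const struct toy_node *n, enum toy_type type)
{
	const struct toy_node *child;
	size_t total = (n->type == type);

	TAILQ_FOREACH(child, &n->children, entries)
		total += toy_count(child, type);
	return total;
}
```

The same recursion applies to a parsed document once the stand-in types are replaced by struct lowdown_node and the node type constants listed above.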
The following anonymous union structures correspond to certain nodes. Note that all buffers may be zero-length.
- rndr_autolink
- For LOWDOWN_LINK_AUTO, the link address as link and the link type type, which may be HALINK_EMAIL for e-mail links or HALINK_NORMAL otherwise. Any buffer may be empty-sized.
- rndr_blockcode
- For LOWDOWN_BLOCKCODE, the opaque text of the block and the optional lang of the code language.
- rndr_blockhtml
- For LOWDOWN_BLOCKHTML, the opaque HTML text.
- rndr_codespan
- The opaque text of the contents.
- rndr_definition
- For LOWDOWN_DEFINITION, containing flags that may be HLIST_FL_BLOCK if the definition list should be interpreted as containing block elements.
- rndr_entity
- For LOWDOWN_ENTITY, the entity text.
- rndr_header
- For LOWDOWN_HEADER, the level of the header starting at zero (this value is relative to the metadata base header level, defaulting to one), optional space-separated class list attr_cls, and optional single identifier attr_id.
- rndr_image
- For LOWDOWN_IMAGE, the image address link, the image title title, dimensions NxN (width by height) in dims, and alternate text alt. CSS in-line style for width and height may be given in attr_width and/or attr_height, and a space-separated list of classes may be in attr_cls and a single identifier may be in attr_id.
- rndr_link
- Like rndr_autolink, but without a type and further defining an optional link title title, optional space-separated class list attr_cls, and optional single identifier attr_id.
- rndr_list
- For LOWDOWN_LIST, consists of a bitfield flags that may be set to HLIST_FL_ORDERED for an ordered list and HLIST_FL_UNORDERED for an unordered one. If HLIST_FL_BLOCK is set, the list should be output as if items were separate blocks. The start value for HLIST_FL_ORDERED is the starting list item position, which is one by default and never zero.
- rndr_listitem
- For LOWDOWN_LISTITEM, consists of a bitfield flags that may be set to HLIST_FL_ORDERED for an ordered list, HLIST_FL_UNORDERED for an unordered list, HLIST_FL_DEF for definition list data, HLIST_FL_CHECKED or HLIST_FL_UNCHECKED for an unordered “task” list element, and/or HLIST_FL_BLOCK for list item output as if containing block elements. The HLIST_FL_BLOCK should not be used: use the parent list (or definition list) flags for this. The num is the index in a HLIST_FL_ORDERED list. It is monotonically increasing with each item in the list, starting at the start variable given in struct rndr_list.
- rndr_math
- For LOWDOWN_MATH, the mode of display in blockmode: if 1, in-line math; if 2, multi-line. The opaque equation, which is assumed to be in LaTeX format, is in the opaque text.
- rndr_meta
- Each LOWDOWN_META key-value pair is represented. The keys are lower-case without spaces or non-ASCII characters. If provided, enclosed nodes may consist only of LOWDOWN_NORMAL_TEXT and LOWDOWN_ENTITY.
- rndr_normal_text
- The basic text content for LOWDOWN_NORMAL_TEXT.
- rndr_paragraph
- For LOWDOWN_PARAGRAPH, lines specifies how many lines the paragraph has in the input file and beoln is set to non-zero if the paragraph ends with an empty line instead of a breaking block element.
- rndr_raw_html
- For LOWDOWN_RAW_HTML, the opaque HTML text.
- rndr_table
- For LOWDOWN_TABLE_BLOCK, the number of columns in each row or header row. The number of columns in rndr_table, rndr_table_header, and rndr_table_cell are the same.
- rndr_table_cell
- For LOWDOWN_TABLE_CELL, the current col column number out of columns. See rndr_table_header for a description of the bits in flags. The number of columns in rndr_table, rndr_table_header, and rndr_table_cell are the same.
- rndr_table_header
- For LOWDOWN_TABLE_HEADER, the number of columns in each row and the per-column flags, which may be tested for equality against HTBL_FL_ALIGN_LEFT, HTBL_FL_ALIGN_RIGHT, or HTBL_FL_ALIGN_CENTER after being masked with HTBL_FL_ALIGNMASK; or HTBL_FL_HEADER. If no alignment is specified after the mask, the default should be left-aligned. The number of columns in rndr_table, rndr_table_header, and rndr_table_cell are the same.
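As a small worked example of the rndr_header description above, the sketch below maps a zero-based header level and a base level onto an HTML element name. The helper name header_tag is hypothetical, and clamping the result to the range h1 through h6 is an assumption of this sketch rather than a statement about what the HTML renderer does.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write the HTML element name for a header into buf and return buf.
 * level is zero-based and relative to base, which defaults to one.
 * The clamp to h1..h6 is an assumption made for this sketch.
 */
static const char *
header_tag(size_t level, size_t base, char *buf, size_t bufsz)
{
	size_t h = level + base;

	if (h < 1)
		h = 1;
	if (h > 6)
		h = 6;
	snprintf(buf, bufsz, "h%zu", h);
	return buf;
}
```

With the default base of one, a level of zero yields h1 and a level of two yields h3.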
SEE ALSO
lowdown(1), lowdown_buf(3), lowdown_buf_diff(3), lowdown_diff(3), lowdown_doc_free(3), lowdown_doc_new(3), lowdown_doc_parse(3), lowdown_file(3), lowdown_file_diff(3), lowdown_gemini_free(3), lowdown_gemini_new(3), lowdown_gemini_rndr(3), lowdown_html_free(3), lowdown_html_new(3), lowdown_html_rndr(3), lowdown_latex_free(3), lowdown_latex_new(3), lowdown_latex_rndr(3), lowdown_metaq_free(3), lowdown_nroff_free(3), lowdown_nroff_new(3), lowdown_nroff_rndr(3), lowdown_odt_free(3), lowdown_odt_new(3), lowdown_odt_rndr(3), lowdown_term_free(3), lowdown_term_new(3), lowdown_term_rndr(3), lowdown_tree_rndr(3), lowdown(5)
AUTHORS
lowdown was forked from hoedown by Kristaps Dzonsons, kristaps@bsd.lv. It has been considerably modified since.
October 21, 2024 | Debian |