.TH xmerl_sax_parser 3erl "xmerl 1.3.31.1" "Ericsson AB" "Erlang Module Definition" .SH NAME xmerl_sax_parser \- XML SAX parser API .SH DESCRIPTION .LP A SAX parser for XML that sends the events through a callback interface\&. SAX is the \fISimple API for XML\fR\&, originally a Java-only API\&. SAX was the first widely adopted API for XML in Java, and is a \fIde facto\fR\& standard where there are versions for several programming language environments other than Java\&. .SH "DATA TYPES" .RS 2 .TP 2 .B \fIoption()\fR\&: Options used to customize the behaviour of the parser\&. Possible options are: .RS 2 .LP .RE .RS 2 .TP 2 .B \fI{continuation_fun, ContinuationFun}\fR\&: ContinuationFun is a call back function to decide what to do if the parser runs into EOF before the document is complete\&. .TP 2 .B \fI{continuation_state, term()}\fR\&: State that is accessible in the continuation call back function\&. .TP 2 .B \fI{event_fun, EventFun}\fR\&: EventFun is the call back function for parser events\&. .TP 2 .B \fI{event_state, term()}\fR\&: State that is accessible in the event call back function\&. .TP 2 .B \fI{file_type, FileType}\fR\&: Flag that tells the parser if it\&'s parsing a DTD or a normal XML file (default normal)\&. .RS 2 .TP 2 * \fIFileType = normal | dtd\fR\& .LP .RE .TP 2 .B \fI{encoding, Encoding}\fR\&: Set default character set used (default UTF-8)\&. This character set is used only if not explicitly given by the XML document\&. .RS 2 .TP 2 * \fIEncoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list\fR\& .LP .RE .TP 2 .B \fIskip_external_dtd\fR\&: Skips the external DTD during parsing\&. This option is the same as {external_entities, none} and {fail_undeclared_ref, false} but just for the DTD\&. .TP 2 .B \fIdisallow_entities\fR\&: Implies that parsing fails if an ENTITY declaration is found\&. .TP 2 .B \fI{entity_recurse_limit, N}\fR\&: Sets how many levels of recursion that is allowed for entities\&. Default is 3 levels\&. .TP 2 .B \fI{external_entities, AllowedType}\fR\&: Sets which types of external entities that should be allowed, if not allowed it\&'s just skipped\&. .RS 2 .TP 2 * \fIAllowedType = all | file | none\fR\& .LP .RE .TP 2 .B \fI{fail_undeclared_ref, Boolean}\fR\&: Decides how the parser should behave when an undeclared reference is found\&. Can be useful if one has turned of external entities so that an external DTD is not parsed\&. Default is true\&. .RE .TP 2 .B : .TP 2 .B \fIevent()\fR\&: The SAX events that are sent to the user via the callback\&. .RS 2 .LP .RE .RS 2 .TP 2 .B \fIstartDocument\fR\&: Receive notification of the beginning of a document\&. The SAX parser will send this event only once before any other event callbacks\&. .TP 2 .B \fIendDocument\fR\&: Receive notification of the end of a document\&. The SAX parser will send this event only once, and it will be the last event during the parse\&. .TP 2 .B \fI{startPrefixMapping, Prefix, Uri}\fR\&: Begin the scope of a prefix-URI Namespace mapping\&. Note that start/endPrefixMapping events are not guaranteed to be properly nested relative to each other: all startPrefixMapping events will occur immediately before the corresponding startElement event, and all endPrefixMapping events will occur immediately after the corresponding endElement event, but their order is not otherwise guaranteed\&. There will not be start/endPrefixMapping events for the "xml" prefix, since it is predeclared and immutable\&. .RS 2 .TP 2 * \fIPrefix = string()\fR\& .LP .TP 2 * \fIUri = string()\fR\& .LP .RE .TP 2 .B \fI{endPrefixMapping, Prefix}\fR\&: End the scope of a prefix-URI mapping\&. .RS 2 .TP 2 * \fIPrefix = string()\fR\& .LP .RE .TP 2 .B \fI{startElement, Uri, LocalName, QualifiedName, Attributes}\fR\&: Receive notification of the beginning of an element\&. The Parser will send this event at the beginning of every element in the XML document; there will be a corresponding endElement event for every startElement event (even when the element is empty)\&. All of the element\&'s content will be reported, in order, before the corresponding endElement event\&. .RS 2 .TP 2 * \fIUri = string()\fR\& .LP .TP 2 * \fILocalName = string()\fR\& .LP .TP 2 * \fIQualifiedName = {Prefix, LocalName}\fR\& .LP .TP 2 * \fIPrefix = string()\fR\& .LP .TP 2 * \fIAttributes = [{Uri, Prefix, AttributeName, Value}]\fR\& .LP .TP 2 * \fIAttributeName = string()\fR\& .LP .TP 2 * \fIValue = string()\fR\& .LP .RE .TP 2 .B \fI{endElement, Uri, LocalName, QualifiedName}\fR\&: Receive notification of the end of an element\&. The SAX parser will send this event at the end of every element in the XML document; there will be a corresponding startElement event for every endElement event (even when the element is empty)\&. .RS 2 .TP 2 * \fIUri = string()\fR\& .LP .TP 2 * \fILocalName = string()\fR\& .LP .TP 2 * \fIQualifiedName = {Prefix, LocalName}\fR\& .LP .TP 2 * \fIPrefix = string()\fR\& .LP .RE .TP 2 .B \fI{characters, string()}\fR\&: Receive notification of character data\&. .TP 2 .B \fI{ignorableWhitespace, string()}\fR\&: Receive notification of ignorable whitespace in element content\&. .TP 2 .B \fI{processingInstruction, Target, Data}\fR\&: Receive notification of a processing instruction\&. The Parser will send this event once for each processing instruction found: note that processing instructions may occur before or after the main document element\&. .RS 2 .TP 2 * \fITarget = string()\fR\& .LP .TP 2 * \fIData = string()\fR\& .LP .RE .TP 2 .B \fI{comment, string()}\fR\&: Report an XML comment anywhere in the document (both inside and outside of the document element)\&. .TP 2 .B \fIstartCDATA\fR\&: Report the start of a CDATA section\&. The contents of the CDATA section will be reported through the regular characters event\&. .TP 2 .B \fIendCDATA\fR\&: Report the end of a CDATA section\&. .TP 2 .B \fI{startDTD, Name, PublicId, SystemId}\fR\&: Report the start of DTD declarations, it\&'s reporting the start of the DOCTYPE declaration\&. If the document has no DOCTYPE declaration, this event will not be sent\&. .RS 2 .TP 2 * \fIName = string()\fR\& .LP .TP 2 * \fIPublicId = string()\fR\& .LP .TP 2 * \fISystemId = string()\fR\& .LP .RE .TP 2 .B \fIendDTD\fR\&: Report the end of DTD declarations, it\&'s reporting the end of the DOCTYPE declaration\&. .TP 2 .B \fI{startEntity, SysId}\fR\&: Report the beginning of some internal and external XML entities\&. ??? .TP 2 .B \fI{endEntity, SysId}\fR\&: Report the end of an entity\&. ??? .TP 2 .B \fI{elementDecl, Name, Model}\fR\&: Report an element type declaration\&. The content model will consist of the string "EMPTY", the string "ANY", or a parenthesised group, optionally followed by an occurrence indicator\&. The model will be normalized so that all parameter entities are fully resolved and all whitespace is removed,and will include the enclosing parentheses\&. Other normalization (such as removing redundant parentheses or simplifying occurrence indicators) is at the discretion of the parser\&. .RS 2 .TP 2 * \fIName = string()\fR\& .LP .TP 2 * \fIModel = string()\fR\& .LP .RE .TP 2 .B \fI{attributeDecl, ElementName, AttributeName, Type, Mode, Value}\fR\&: Report an attribute type declaration\&. .RS 2 .TP 2 * \fIElementName = string()\fR\& .LP .TP 2 * \fIAttributeName = string()\fR\& .LP .TP 2 * \fIType = string()\fR\& .LP .TP 2 * \fIMode = string()\fR\& .LP .TP 2 * \fIValue = string()\fR\& .LP .RE .TP 2 .B \fI{internalEntityDecl, Name, Value}\fR\&: Report an internal entity declaration\&. .RS 2 .TP 2 * \fIName = string()\fR\& .LP .TP 2 * \fIValue = string()\fR\& .LP .RE .TP 2 .B \fI{externalEntityDecl, Name, PublicId, SystemId}\fR\&: Report a parsed external entity declaration\&. .RS 2 .TP 2 * \fIName = string()\fR\& .LP .TP 2 * \fIPublicId = string()\fR\& .LP .TP 2 * \fISystemId = string()\fR\& .LP .RE .TP 2 .B \fI{unparsedEntityDecl, Name, PublicId, SystemId, Ndata}\fR\&: Receive notification of an unparsed entity declaration event\&. .RS 2 .TP 2 * \fIName = string()\fR\& .LP .TP 2 * \fIPublicId = string()\fR\& .LP .TP 2 * \fISystemId = string()\fR\& .LP .TP 2 * \fINdata = string()\fR\& .LP .RE .TP 2 .B \fI{notationDecl, Name, PublicId, SystemId}\fR\&: Receive notification of a notation declaration event\&. .RS 2 .TP 2 * \fIName = string()\fR\& .LP .TP 2 * \fIPublicId = string()\fR\& .LP .TP 2 * \fISystemId = string()\fR\& .LP .RE .RE .TP 2 .B \fIunicode_char()\fR\&: Integer representing valid unicode codepoint\&. .TP 2 .B \fIunicode_binary()\fR\&: Binary with characters encoded in UTF-8 or UTF-16\&. .TP 2 .B \fIlatin1_binary()\fR\&: Binary with characters encoded in iso-latin-1\&. .RE .SH EXPORTS .LP .B file(Filename, Options) -> Result .br .RS .LP Types: .RS 3 Filename = string() .br Options = [option()] .br Result = {ok, EventState, Rest} | .br {Tag, Location, Reason, EndTags, EventState} .br Rest = unicode_binary() | latin1_binary() .br Tag = atom() (fatal_error, or user defined tag) .br Location = {CurrentLocation, EntityName, LineNo} .br CurrentLocation = string() .br EntityName = string() .br LineNo = integer() .br EventState = term() .br Reason = term() .br .RE .RE .RS .LP Parse file containing an XML document\&. This functions uses a default continuation function to read the file in blocks\&. .RE .LP .B stream(Xml, Options) -> Result .br .RS .LP Types: .RS 3 Xml = unicode_binary() | latin1_binary() | [unicode_char()] .br Options = [option()] .br Result = {ok, EventState, Rest} | .br {Tag, Location, Reason, EndTags, EventState} .br Rest = unicode_binary() | latin1_binary() | [unicode_char()] .br Tag = atom() (fatal_error or user defined tag) .br Location = {CurrentLocation, EntityName, LineNo} .br CurrentLocation = string() .br EntityName = string() .br LineNo = integer() .br EventState = term() .br Reason = term() .br .RE .RE .RS .LP Parse a stream containing an XML document\&. .RE .SH "CALLBACK FUNCTIONS" .LP The callback interface is based on that the user sends a fun with the correct signature to the parser\&. .SH EXPORTS .LP .B Module:ContinuationFun(State) -> {NewBytes, NewState} .br .RS .LP Types: .RS 3 State = NewState = term() .br NewBytes = binary() | list() (should be same as start input in stream/2) .br .RE .RE .RS .LP This function is called whenever the parser runs out of input data\&. If the function can\&'t get hold of more input an empty list or binary (depends on start input in stream/2) is returned\&. Other types of errors is handled through exceptions\&. Use throw/1 to send the following tuple {Tag = atom(), Reason = string()} if the continuation function encounters a fatal error\&. Tag is an atom that identifies the functional entity that sends the exception and Reason is a string that describes the problem\&. .RE .LP .B Module:EventFun(Event, Location, State) -> NewState .br .RS .LP Types: .RS 3 Event = event() .br Location = {CurrentLocation, Entityname, LineNo} .br CurrentLocation = string() .br Entityname = string() .br LineNo = integer() .br State = NewState = term() .br .RE .RE .RS .LP This function is called for every event sent by the parser\&. The error handling is done through exceptions\&. Use throw/1 to send the following tuple {Tag = atom(), Reason = string()} if the application encounters a fatal error\&. Tag is an atom that identifies the functional entity that sends the exception and Reason is a string that describes the problem\&. .RE