NAME
Net::Z3950::Simple2ZOOM::Config - configuration file for the Simple2ZOOM gateway
SYNOPSIS
<client>
<authentication>http://some.url/{user}/pwd={pass}</authentication>
<database name="srubooks">
<zurl>http://z3950.loc.gov:7090/voyager</zurl>
<option name="sru">get</option>
<charset>marc-8</charset>
<search>
<querytype>cql</querytype>
<map use="4"><index>title</index></map>
<map use="1003"><index>creator</index></map>
</search>
</database>
</client>
DESCRIPTION
The universal Swiss Army Gateway "simple2zoom" is configured by a
single file, named on the command-line, and expressed in XML. This file
specifies which back-end databases are supported, how the back-ends are
contacted, what character-sets they provide records in, and how to map Z39.50
searches to CQL.
The structure of the file is pretty simple.
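As a rough illustration of that structure, the following Python sketch reads a configuration shaped like the SYNOPSIS and summarises each database. It is not part of Simple2ZOOM: the function name "summarise" and the dictionary keys are invented for this example, and only the element and attribute names come from this document.

```python
import xml.etree.ElementTree as ET

# A cut-down configuration modelled on the SYNOPSIS.
CONFIG = """
<client>
  <database name="srubooks">
    <zurl>http://z3950.loc.gov:7090/voyager</zurl>
    <option name="sru">get</option>
    <charset>marc-8</charset>
    <search>
      <querytype>cql</querytype>
      <map use="4"><index>title</index></map>
      <map use="1003"><index>creator</index></map>
    </search>
  </database>
</client>
"""

def summarise(config_xml):
    """Return one dict per <database> element (hypothetical helper)."""
    root = ET.fromstring(config_xml)
    summary = []
    for db in root.findall("database"):
        summary.append({
            "name": db.get("name"),
            "zurl": db.findtext("zurl"),
            # findtext() returns None when the optional element is omitted
            "charset": db.findtext("charset"),
            "options": {o.get("name"): o.text for o in db.findall("option")},
            "querytype": db.findtext("search/querytype"),
        })
    return summary

databases = summarise(CONFIG)
```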
Top Level
- <client>
- The top-level element is <client>. It contains a
single optional <authentication> element, any number of
<database> elements and a single optional <search> element.
The second of these specifies how to interpret requests to search in the
configured databases; the last provides query mapping specifications for
dynamically specified databases.
- <authentication>
- This element contains a URL template, specifying the
address of an HTTP authentication server. The template must include the
special strings "{user}" and "{pass}", which are
substituted with the username and password supplied in the Init request,
if any. The resulting URL is fetched and the result examined: any
successful response (HTTP status 200) indicates that the username/password
combination is acceptable, and that the session can continue; any other
response (e.g. 401 Authorization Required) results in the Init request
being refused with BIB-1 diagnostic 1014 (Init/AC: Authentication System
error).
If the <authentication> element is omitted from the configuration, no
authentication credentials are required, and any that are provided are
ignored.
(A trivial example of an authentication server script is included in the
Simple2ZOOM distribution, as "etc/sru-auth".)
- <database>
- The <database> element carries a "name"
attribute specifying the Z39.50 database name by which it is known to
clients. It contains several complex elements, and is discussed in more
detail below.
- <search>
- Each <search> element, whether contained within a
specific <database> (see below) or at the top level, consists of a
single mandatory <querytype> element followed by any number of
<map>s. The content of <querytype> indicates the type of query
that should be sent to the back-end server, with Simple2ZOOM responsible for
translating incoming queries as required into that format. At present, the
only supported value is "cql".
- <map>
- Each <map> element carries a "use"
attribute, which is the numeric value of the BIB-1 use attribute to be
supported, and optionally contains a single <index> element which in
turn contains the name of the corresponding CQL index. Type-1 searches
against the specified BIB-1 access point are mapped to CQL searches
against the specified index.
If the <index> is omitted within a <map>, then the generated CQL
query term has no index specified. This can be useful for BIB-1 attributes
such as 1016 (any) and 1035 (anywhere).
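The <authentication> template substitution described earlier in this section can be sketched in Python as follows. This is an illustration only, not the gateway's code: the function names are hypothetical, and URL-escaping the credentials is an assumption, since this document does not say whether the substituted values are escaped.

```python
from urllib.parse import quote

# BIB-1 diagnostic returned when the Init request is refused (from the text).
AUTH_FAILURE_DIAGNOSTIC = 1014

def expand_auth_template(template, user, password):
    """Substitute {user} and {pass} into the URL template (hypothetical helper)."""
    for token in ("{user}", "{pass}"):
        if token not in template:
            raise ValueError(f"template must include {token}")
    # ASSUMPTION: credentials are percent-encoded before substitution.
    return (template
            .replace("{user}", quote(user, safe=""))
            .replace("{pass}", quote(password, safe="")))

def init_accepted(http_status):
    """Only HTTP status 200 lets the session continue."""
    return http_status == 200

url = expand_auth_template("http://some.url/{user}/pwd={pass}", "alice", "s3cret")
```

Fetching the expanded URL is left out on purpose; any HTTP client would do, with the status code passed to "init_accepted".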
Databases
The <database> element which describes each database contains the
following elements in the specified order.
In general, <database> entries are of two kinds: those connecting through
to a Z39.50 database will have no <search> element, since no query
mapping is necessary to translate an incoming Type-1 query; but those
connecting to an SRU or SRW database will have a <search> element with
<querytype> set to "cql" and containing information on how to
map from specified BIB-1 use attributes to CQL indexes.
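The mapping from BIB-1 use attributes to CQL indexes can be pictured with a small Python sketch. It handles a single term only and makes several assumptions that this document does not confirm: the "=" relation, the quoting of the term, and the error raised for an unmapped use attribute are all choices made for the example, and the names are hypothetical.

```python
# Table equivalent to:
#   <map use="4"><index>title</index></map>
#   <map use="1003"><index>creator</index></map>
#   <map use="1016"/>            (no <index>: term is emitted without an index)
USE_TO_INDEX = {4: "title", 1003: "creator", 1016: None}

def term_to_cql(use, term, use_map=USE_TO_INDEX):
    """Translate one (use attribute, term) pair into a CQL clause."""
    if use not in use_map:
        # ASSUMPTION: an unmapped use attribute is treated as an error here.
        raise KeyError(f"no <map> configured for BIB-1 use attribute {use}")
    # Escape backslashes and double quotes, then wrap the term in quotes.
    quoted = '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'
    index = use_map[use]
    # ASSUMPTION: the "=" relation is used for every mapped index.
    return quoted if index is None else f"{index}={quoted}"

title_clause = term_to_cql(4, "dinosaur")
any_clause = term_to_cql(1016, "fish")
```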
- <zurl>
- Contains the target address of the back-end database (e.g.
"tcp:z3950.indexdata.com/gils" or
"http://z3950.loc.gov:7090/voyager").
- <resultsetid>
- Optional. If provided, it must take one of the following
values, and if it is omitted then the value "fallback" is
assumed:
- "id"
- When queries are received that include references to
existing result-sets, these are translated into result-set references
using the "cql.resultSetId" index. It is an error if the server
does not support this facility.
- "search"
- References to existing result-sets are rewritten as
resubmissions of the query. This works on all servers, but does not
reliably give precisely correct results if the database is updated between
searches.
- "fallback"
- Result-set references are used when supported, but
resubmissions of prior queries are used when this facility is
unavailable.
- <nonamedresultsets>
- This is optional. If provided, it is empty and indicates
that the back-end database does not support named result sets.
- <option>
- There may be any number of these. Each <option>
element carries a "name" attribute and contains a corresponding
value. These are ZOOM options which are applied to the connection when it
is first created, and can be used to control, for example, the desired
"elementSetName" or "schema" of the records provided
by the back-end. A particularly important option is "sru", which
may be set to "get", "post" or "soap" to
request vanilla SRU, SRU over POST and SRW respectively.
- <charset>
- Optional. Contains the name of the character-set in which
the back-end target supplies records (e.g. "marc-8").
- <search>
- Optional. Provides specifications for how to search the
database, exactly like the top-level <search> element described
above.
- <schema>
- Optional and repeatable: each element indicates special
handling for when records are requested in a particular schema. See
below.
- <sutrs-record>, <usmarc-record>,
<grs1-record>.
- Optional. Provides specifications for how to construct
records in the relevant syntaxes when they are requested by clients.
The format is the same in all cases: the specification contains a list of
<field> elements, each of which has an "xpath" attribute
and textual content. Records are built by accessing the data addressed by
the specified XPath expressions, and encoding each as an element addressed
as specified by the element content. The interpretation of the content is
different for different record-syntaxes:
- SUTRS
- The content is ignored.
- USMARC
- The content indicates a MARC field by a string consisting
of the following parts, in order: a three-digit field number; optionally a
slash followed by the first indicator; optionally another slash followed
by the second indicator; optionally a dollar sign followed by a subfield
tag. In other words, MARC field specifications must match the regular
expression "/^\d\d\d(\/\w(\/\w)?)?(\$\w)?$/". It is impossible to
specify the second indicator without the first, but a subfield may be
specified along with zero, one or two indicators.
As usual, a few examples are worth any amount of explanation:
001
260$c
500$a
100/1$a
245/1/0$a
- GRS-1
- The content indicates an address within a GRS-1 record in
the form of one or more consecutive (type,value) pairs, each enclosed in
parentheses. For example, "(1,14)" would indicate an element of
type 1 (tagSet-M) with value 14 (localControlNumber). A longer path such
as "(3,admin)(2,6)" indicates an abstract field (tagSet-G
element 6) within an "admin" sub-record.
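The USMARC and GRS-1 field addresses described above can be parsed along the following lines. This Python sketch follows the prose description, not the gateway's source code; the function names are hypothetical, and treating GRS-1 types as integers and values as strings is an assumption.

```python
import re

# Three digits, optional /ind1, optional /ind2 (only after ind1), optional $subfield.
MARC_SPEC = re.compile(r"(\d\d\d)(?:/(\w)(?:/(\w))?)?(?:\$(\w))?")

def parse_marc_spec(spec):
    """Return (tag, ind1, ind2, subfield); absent parts are None."""
    m = MARC_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"bad MARC field specification: {spec!r}")
    return m.groups()

# One "(type,value)" step of a GRS-1 path.
GRS1_STEP = re.compile(r"\(([^,()]+),([^,()]+)\)")

def parse_grs1_path(path):
    """Return a list of (type, value) pairs, outermost element first."""
    steps = GRS1_STEP.findall(path)
    # Reject anything other than a plain run of consecutive pairs.
    if not steps or "".join(f"({t},{v})" for t, v in steps) != path:
        raise ValueError(f"bad GRS-1 path: {path!r}")
    # ASSUMPTION: types are integers; values are kept as strings.
    return [(int(t), v) for t, v in steps]

title_field = parse_marc_spec("245/1/0$a")
admin_abstract = parse_grs1_path("(3,admin)(2,6)")
```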
Schemas
Each <schema> element is empty, but carries the following attributes,
which are used to provide records to Z39.50 clients in MARC formats.
- oid
- Mandatory. This is the OID of a Z39.50 record-syntax which
is to be handled by schema mapping. Requests in this database for this
record-syntax are handled as specified. Example value:
1.2.840.10003.5.10
- sru
- Mandatory. This is the URI of an SRU/W schema which is
requested from the SRU or SRW back-end in order to fulfill the request.
Example value: "info:srw/schema/1/marcxml-v1.1"
- format
- Optional. Indicates which of the MARC variants is in use,
so that the record can be formatted correctly. Defaults to
"MARC21" if omitted. Example values: "MARC21",
"USMARC", "UNIMARC"
- encoding
- Optional. Indicates which character-set to use for the
formatted record. Defaults to "UTF-8" if omitted. Example
values: "UTF-8", "MARC-8"
NOTE that in its current form this schema-mapping only works for the
specific though common combination of Z39.50 front-end, SRU back-end and MARC
record syntax.
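The attribute rules for <schema> (two mandatory attributes, two optional ones with defaults) can be summarised in a short Python sketch. It is illustrative only: "read_schema" is a hypothetical helper, and rejecting a missing mandatory attribute with an exception is an assumption about error handling.

```python
import xml.etree.ElementTree as ET

def read_schema(element_xml):
    """Return the effective attributes of one <schema> element."""
    el = ET.fromstring(element_xml)
    for required in ("oid", "sru"):
        if el.get(required) is None:
            raise ValueError(f"<schema> lacks mandatory attribute {required!r}")
    return {
        "oid": el.get("oid"),
        "sru": el.get("sru"),
        "format": el.get("format", "MARC21"),    # documented default
        "encoding": el.get("encoding", "UTF-8"), # documented default
    }

schema = read_schema(
    '<schema oid="1.2.840.10003.5.10" sru="info:srw/schema/1/marcxml-v1.1"/>'
)
```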
CONFIGURATION FILE SCHEMA
The Simple2ZOOM distribution includes, in the "etc" directory, an XML
schema which can be used to validate configuration files. This schema is
provided in four formats:
- simple2zoom.rnc
- Relax-NG compact format: a simple, elegant, terse and
wholly comprehensible XML constraint language that you don't even need to
learn in order to understand. This is the master version: the others are
automatically generated from it.
- simple2zoom.rng
- Relax-NG XML format: the world seems to have this zany
fetish that everything must be specified in XML, so Relax-NG has an XML
syntax that corresponds trivially with the much nicer compact syntax. The
principal value of this is that "xmllint" understands it.
- simple2zoom.dtd
- An old-fashioned DTD (document type definition).
- simple2zoom.xsd
- If you must.
Use whichever you like. For example,
xmllint --relaxng simple2zoom.rng --noout test.xml
xmllint --dtdvalid simple2zoom.dtd --noout test.xml
xmllint --schema simple2zoom.xsd --noout test.xml
SEE ALSO
The "simple2zoom" program.
The "Net::Z3950::Simple2ZOOM" module.
AUTHOR
Mike Taylor <mike@indexdata.com>
COPYRIGHT AND LICENCE
Copyright (C) 2007 by Index Data.
This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself, either Perl version 5.8.8 or, at your option,
any later version of Perl 5 you may have available.