The following sections describe the configuration files that govern the internal management of grs records.
The system searches for the files in the directories specified by the profilePath setting in the zebra.cfg file.
The abstract syntax definition (also known as an Abstract Record Structure, or ARS) is the focal point of the record schema description. For a given schema, the ABS file may state any or all of the following:
The object identifier of the Z39.50 schema associated with the ARS, so that it can be referred to by the client.
The attribute set (which can possibly be a compound of multiple sets) which applies in the profile. This is used when indexing and searching the records belonging to the given profile.
The tag set (again, this can consist of several different sets). This is used when reading the records from a file, to recognize the different tags, and when transmitting the record to the client - mapping the tags to their numerical representation, if they are known.
The variant set which is used in the profile. This provides a vocabulary for specifying the forms of data that appear inside the records.
Element set names, which are a shorthand way for the client to ask for a subset of the data elements contained in a record. Element set names, in the retrieval module, are mapped to element specifications, which contain information equivalent to the Espec-1 syntax of Z39.50.
Map tables, which may specify mappings to other database profiles, if desired.
Possibly, a set of rules describing the mapping of elements to a MARC representation.
A list of element descriptions (this is the actual ARS of the schema, in Z39.50 terms), which lists the ways in which the various tags can be used and organized hierarchically.
Several of the entries above simply refer to other files, which describe the given objects.
This section describes the syntax and use of the various tables which are used by the retrieval module.
The number of different file types may appear daunting at first, but each type corresponds fairly clearly to a single aspect of the Z39.50 retrieval facilities. Further, the average database administrator, who is simply reusing an existing profile for which tables already exist, shouldn't have to worry too much about the contents of these tables.
Generally, the files are simple ASCII files, which can be maintained using any text editor. Blank lines, and lines beginning with a hash sign (#), are ignored. Any characters on a line following a (#) are also ignored. All other lines contain directives, which provide some setting or value to the system. Generally, settings are characterized by a single keyword, identifying the setting, followed by a number of parameters. Some settings are repeatable (r), while others may occur only once in a file. Some settings are optional (o), while others again are mandatory (m).
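The line conventions above can be sketched in a few lines of Python. This is only an illustration of the rules as stated (comment stripping, blank-line skipping, keyword followed by parameters), not the parser Zebra itself uses; the function name parse_directives is made up for the example.

```python
def parse_directives(text):
    """Return a list of (keyword, parameters) pairs, one per directive line."""
    directives = []
    for raw_line in text.splitlines():
        # Everything from the first '#' onward is a comment and is dropped.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank line, or a line holding only a comment
        keyword, *params = line.split()
        directives.append((keyword, params))
    return directives


sample = """\
# Element set names
esetname B gils-b.est   # trailing comment

esetname F @
"""
print(parse_directives(sample))
```

Checking which settings are mandatory, optional, or repeatable would be a separate pass over the returned list, since those rules differ per file type.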
The name of the abstract syntax (.abs) file type is slightly misleading in Z39.50 terms, since, apart from the actual abstract syntax of the profile, it also includes most of the other definitions that go into a database profile.
When a record in the canonical, SGML-like format is read from a file or from the database, the first tag of the file should reference the profile that governs the layout of the record. If the first tag of the record is, say, <gils>, the system will look for the profile definition in the file gils.abs.
Profile definitions are cached, so they only have to be read once during the lifespan of the current process.
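As a rough illustration of the lookup and caching just described, the following Python sketch derives the profile file name from the first tag of a record and memoizes the (simulated) load. The names profile_filename and load_profile are hypothetical, and the loader only records that it was called; a real one would read and parse the file found via profilePath.

```python
import re
from functools import lru_cache


def profile_filename(record_text):
    """Map the first tag of a record, e.g. <gils>, to a file name, e.g. gils.abs."""
    match = re.search(r"<\s*([A-Za-z_][\w.-]*)", record_text)
    if match is None:
        raise ValueError("record contains no opening tag")
    return match.group(1) + ".abs"


loads = []  # records each simulated read, to make the caching visible


@lru_cache(maxsize=None)
def load_profile(filename):
    loads.append(filename)  # stand-in for reading and parsing the file
    return {"file": filename}


record = "<gils>\n<title>Example</title>\n</gils>"
load_profile(profile_filename(record))
load_profile(profile_filename(record))  # served from the cache, no second read
print(profile_filename(record), len(loads))
```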
When writing your own input filters, the record-begin command introduces the profile, and should always be called first thing when introducing a new record.
The file may contain the following directives:
name symbolic-name
(m) This provides a shorthand name or description for the profile. Mostly useful for diagnostic purposes.
reference OID-name
(m) The reference name of the OID for the profile. The reference names can be found in the util module of YAZ.
attset filename
(m) The attribute set that is used for indexing and searching records belonging to this profile.
tagset filename
(o) The tag set (if any) that describes the fields of the records.
varset filename
(o) The variant set used in the profile.
maptab filename
(o,r) This points to a conversion table that might be used if the client asks for the record in a different schema from the native one.
marc filename
(o) Points to a file containing parameters for representing the record contents in the ISO2709 syntax. Read the description of the MARC representation facility below.
esetname name filename
(o,r) Associates the given element set name with an element selection file. If an (@) is given in place of the filename, this corresponds to a null mapping for the given element set name.
all tags
(o) This directive specifies a list of attributes which should be appended to the attribute list given for each element. The effect is to make every single element in the abstract syntax searchable by way of the given attributes. This directive provides an efficient way of supporting free-text searching across all elements. However, it does increase the size of the index significantly. The attributes can be qualified with a structure, as in the elm directive below.
elm path name attributes
(o,r) Adds an element to the abstract record syntax of the schema. The path follows the syntax suggested by the Z39.50 document: a sequence of tags separated by slashes (/). Each tag is given as a comma-separated pair of tag type and tag value, surrounded by parentheses. The name is the name of the element, and the attributes parameter specifies, as a comma-separated list, which attributes to use when indexing the element. A ! in place of the attribute name is equivalent to specifying an attribute name identical to the element name. A - in place of the attribute name specifies that no indexing is to take place for the given element. The attributes can be qualified with field types to specify which character set should govern the indexing procedure for that field. The same data element may be indexed into several different fields, using different character set definitions. See Chapter 10, Field Structure and Character Sets. The default field type is w for word.
xelm xpath attributes
Specifies indexing for record nodes given by xpath. Unlike directive elm, this directive allows you to index attribute contents. The xpath uses a syntax similar to XPath. The attributes have the same syntax and meaning as for directive elm, except that the operator ! refers to the nodes selected by xpath.
melm field$subfield attributes
This directive is specifically for MARC-formatted records, ingested either in the form of MARCXML documents, or in the ISO2709/Z39.2 format using the grs.marcxml input filter. You can specify indexing rules for any subfield, or you can leave off the $subfield part and specify default rules for all subfields of the given field (note: default rules should come after any subfield-specific rules in the configuration file). The attributes have the same syntax and meaning as for the elm directive above.
encoding encodingname
This directive specifies the character encoding for external records. For records such as XML that specify their encoding within the file via a header, this directive is ignored. If neither this directive is given nor an encoding is set within the external records, ISO-8859-1 encoding is assumed.
xpath enable/disable
If this directive is followed by enable, then extra indexing is performed to allow for XPath-like queries. If this directive is not specified (equivalent to disable), no extra XPath indexing is performed.
systag systemTag actualTag
Specifies what information, if any, Zebra should automatically include in retrieval records for the "system fields" that it supports. systemTag may be any of the following:
rank
An integer indicating the relevance-ranking score assigned to the record.
sysno
An automatically generated identifier for the record, unique within this database. It is represented by the <localControlNumber> element in XML and the (1,14) tag in GRS-1.
size
The size, in bytes, of the retrieved record.
The actualTag parameter may be none to indicate that the named element should be omitted from retrieval records.
The mechanism for controlling indexing is not adequate for complex databases, and will probably be moved into a separate configuration table eventually.
The following is an excerpt from the abstract syntax file for the GILS profile.
    name gils
    reference GILS-schema
    attset gils.att
    tagset gils.tag
    varset var1.var

    maptab gils-usmarc.map

    # Element set names

    esetname VARIANT gils-variant.est  # for WAIS-compliance
    esetname B gils-b.est
    esetname G gils-g.est
    esetname F @

    elm (1,10)                  rank                       -
    elm (1,12)                  url                        -
    elm (1,14)                  localControlNumber         Local-number
    elm (1,16)                  dateOfLastModification     Date/time-last-modified
    elm (2,1)                   title                      w:!,p:!
    elm (4,1)                   controlIdentifier          Identifier-standard
    elm (2,6)                   abstract                   Abstract
    elm (4,51)                  purpose                    !
    elm (4,52)                  originator                 -
    elm (4,53)                  accessConstraints          !
    elm (4,54)                  useConstraints             !
    elm (4,70)                  availability               -
    elm (4,70)/(4,90)           distributor                -
    elm (4,70)/(4,90)/(2,7)     distributorName            !
    elm (4,70)/(4,90)/(2,10)    distributorOrganization    !
    elm (4,70)/(4,90)/(4,2)     distributorStreetAddress   !
    elm (4,70)/(4,90)/(4,3)     distributorCity            !
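To make the elm lines in the excerpt easier to read, here is a minimal Python sketch that takes a path apart into (tag type, tag value) pairs and resolves the ! and - shorthands. The helper names parse_path and index_names are invented for this illustration, tag values are kept as strings, and field-type qualifiers such as the ones on the title line are deliberately not interpreted.

```python
import re

# One tag: a parenthesized, comma-separated pair of tag type and tag value.
TAG_RE = re.compile(r"\((\d+),([^()]+)\)")


def parse_path(path):
    """Split a slash-separated path into (tag type, tag value) pairs."""
    tags = []
    for part in path.split("/"):
        match = TAG_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed tag: {part!r}")
        tags.append((int(match.group(1)), match.group(2)))
    return tags


def index_names(element_name, attributes):
    """Resolve an unqualified attribute list to the attribute names to index under."""
    if attributes == "-":
        return []  # '-' means the element is not indexed at all
    # '!' stands for an attribute named after the element itself.
    return [element_name if attr == "!" else attr for attr in attributes.split(",")]


print(parse_path("(4,70)/(4,90)/(2,7)"))
print(index_names("distributorName", "!"))
```

A path such as (4,70)/(4,90)/(2,7) therefore describes an element nested two levels below the (4,70) element, which is how the excerpt expresses the distributor substructure.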
The attribute set (.att) file type describes the Use elements of an attribute set. It contains the following directives.
name symbolic-name
(m) This provides a shorthand name or description for the attribute set. Mostly useful for diagnostic purposes.
reference OID-name
(m) The reference name of the OID for the attribute set. The reference names can be found in the util module of YAZ.
include filename
(o,r) This directive is used to include another attribute set as a part of the current one. This is used when a new attribute set is defined as an extension to another set. For instance, many new attribute sets are defined as extensions to the bib-1 set. This is an important feature of the retrieval system of Z39.50, as it ensures the highest possible level of interoperability: those access points of your database which are derived from the external set (say, bib-1) can be used even by clients who are unaware of the new set.
att att-value att-name [local-value]
(o,r) This repeatable directive introduces a new attribute to the set. The attribute value is stored in the index (unless a local-value is given, in which case this is stored). The name is used to refer to the attribute from the abstract syntax.
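The rule for which value ends up in the index can be illustrated with a short Python sketch. It assumes that attribute values and local values are integers, as they are in the GILS excerpt; parse_att is a hypothetical helper, and the local value 7 used below is made up.

```python
def parse_att(line):
    """Parse an 'att att-value att-name [local-value]' line."""
    fields = line.split()
    if not fields or fields[0] != "att" or len(fields) not in (3, 4):
        raise ValueError(f"not an att directive: {line!r}")
    value = int(fields[1])
    name = fields[2]
    local = int(fields[3]) if len(fields) == 4 else None
    # The local value, when present, is what gets stored in the index.
    stored = local if local is not None else value
    return {"name": name, "value": value, "stored": stored}


print(parse_att("att 2001 distributorName"))
print(parse_att("att 2001 distributorName 7"))  # hypothetical local value
```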
This is an excerpt from the GILS attribute set definition. Notice how the file describing the bib-1 attribute set is referenced.
    name gils
    reference GILS-attset
    include bib1.att

    att 2001 distributorName
    att 2002 indextermsControlled
    att 2003 purpose
    att 2004 accessConstraints
    att 2005 useConstraints
The tag set (.tag) file type defines the tag set of the profile, possibly by referencing other tag sets (most tag sets, for instance, will include tagsetG and tagsetM from the Z39.50 specification). The file may contain the following directives.
name symbolic-name
(m) This provides a shorthand name or description for the tag set. Mostly useful for diagnostic purposes.
reference OID-name
(o) The reference name of the OID for the tag set. The reference names can be found in the util module of YAZ. The directive is optional, since not all tag sets are registered outside of their schema.
type integer
(m) The type number of the tag set within the schema profile (note: this specification really should belong to the .abs file. This will be fixed in a future release).
include filename
(o,r) This directive is used to include the definitions of other tag sets into the current one.
tag number names type
(o,r) Introduces a new tag to the set. The number is the tag number as used in the protocol (there is currently no mechanism for specifying string tags, but this would be quick work to add). The names parameter is a list of names by which the tag should be recognized in the input file format. The names should be separated by slashes (/). The type is the recommended data type of the tag. It should be one of the following:
structured
string
numeric
bool
oid
generalizedtime
intunit
int
octetstring
null
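A tag line can be checked against the list of data types with a few lines of Python. This is a sketch under the assumption that the keyword, number, names, and type appear in that order, as in the TagsetG excerpt; parse_tag is a hypothetical helper, and the alias title/heading in the second call is invented to show the slash-separated names.

```python
TAG_TYPES = {
    "structured", "string", "numeric", "bool", "oid",
    "generalizedtime", "intunit", "int", "octetstring", "null",
}


def parse_tag(line):
    """Parse a 'tag number names type' line into (number, [names], type)."""
    keyword, number, names, datatype = line.split()
    if keyword != "tag":
        raise ValueError(f"not a tag directive: {line!r}")
    datatype = datatype.lower()
    if datatype not in TAG_TYPES:
        raise ValueError(f"unknown data type: {datatype!r}")
    # Alternative names for the same tag are separated by slashes.
    return int(number), names.split("/"), datatype


print(parse_tag("tag 8 date generalizedtime"))
print(parse_tag("tag 1 title/heading string"))  # hypothetical alias
```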
The following is an excerpt from the TagsetG definition file.
    name tagsetg
    reference TagsetG
    type 2

    tag 1   title              string
    tag 2   author             string
    tag 3   publicationPlace   string
    tag 4   publicationDate    string
    tag 5   documentId         string
    tag 6   abstract           string
    tag 7   name               string
    tag 8   date               generalizedtime
    tag 9   bodyOfDisplay      string
    tag 10  organization       string
The variant set file is a straightforward representation of the variant set definitions associated with the protocol. At present, only the Variant-1 set is known.
These are the directives allowed in the file.
name symbolic-name
(m) This provides a shorthand name or description for the variant set. Mostly useful for diagnostic purposes.
reference OID-name
(o) The reference name of the OID for the variant set, if one is required. The reference names can be found in the util module of YAZ.
class integer class-name
(m,r) Introduces a new class to the variant set.
type integer type-name datatype
(m,r) Adds a new type to the current class (the one introduced by the most recent class directive). The type names belong to the same name space as the one used in the tag set definition file.
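The way each type directive attaches to the most recent class directive can be sketched as follows. This Python fragment is an illustration only; it assumes the argument order seen in the Variant-1 excerpt (number, name, and for types a data type), and parse_variant_set is a hypothetical name.

```python
def parse_variant_set(text):
    """Group type directives under the class directive that precedes them."""
    classes = {}
    current = None  # the class introduced by the most recent class directive
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "class":
            number, name = int(fields[1]), fields[2]
            current = classes.setdefault(number, {"name": name, "types": {}})
        elif fields[0] == "type":
            if current is None:
                raise ValueError("type directive before any class directive")
            current["types"][int(fields[1])] = (fields[2], fields[3])
    return classes


sample = """\
class 1 variantId
type 1 variantId octetstring
class 2 body
type 1 iana string
type 2 z39.50 string
type 3 other string
"""
variant_set = parse_variant_set(sample)
print(variant_set[2]["types"][2])
```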
The following is an excerpt from the file describing the variant set Variant-1.
    name variant-1
    reference Variant-1

    class 1 variantId

      type 1   variantId   octetstring

    class 2 body

      type 1   iana        string
      type 2   z39.50      string
      type 3   other       string
The element set specification files describe a selection of a subset of the elements of a database record. The element selection mechanism is equivalent to the one supplied by the Espec-1 syntax of the Z39.50 specification. In fact, the internal representation of an element set specification is identical to the Espec-1 structure, and we'll refer you to the description of that structure for most of the detailed semantics of the directives below.
Not all of the Espec-1 functionality has been implemented yet. The fields that are mentioned below all work as expected, unless otherwise noted.
The directives available in the element set file are as follows:
defaultVariantSetId OID-name
(o) If variants are used in the following, this should provide the name of the variant set used (it's not currently possible to specify a different set in the individual variant request). In almost all cases (certainly all profiles known to us), the name Variant-1 should be given here.
defaultVariantRequest variant-request
(o) This directive provides a default variant request for use when the individual element requests (see below) do not contain a variant request. Variant requests consist of a blank-separated list of variant components. A variant component is a comma-separated, parenthesized triple of variant class, type, and value (the first two being represented as integers). The value can currently only be entered as a string (this will change to depend on the definition of the variant in question). The special value (@) is interpreted as a null value, however.
simpleElement path ['variant' variant-request]
(o,r) This corresponds to a simple element request in Espec-1. The path consists of a sequence of tag-selectors, where each of these can consist of either:
A simple tag, consisting of a comma-separated type-value pair in parentheses, possibly followed by a colon (:) followed by an occurrences-specification (see below). The tag-value can be a number or a string. If the first character is an apostrophe ('), this forces the value to be interpreted as a string, even if it appears to be numerical.
A WildThing, represented as a question mark (?), possibly followed by a colon (:) followed by an occurrences specification (see below).
A WildPath, represented as an asterisk (*). Note that the last element of the path should not be a WildPath (WildPaths don't work in this version).
The occurrences-specification can be either the string all, the string last, or an explicit value-range. The value-range is represented as an integer (the starting point), possibly followed by a plus (+) and a second integer (the number of elements, default being one).
The variant-request has the same syntax as the defaultVariantRequest above. Note that it may sometimes be useful to give an empty variant request, simply to disable the default for a specific set of fields (we aren't certain if this is proper Espec-1, but it works in this implementation).
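The occurrences-specification and variant-request syntaxes described above are small enough to sketch as two Python parsers. Both function names are hypothetical, the sample values are invented, and variant values containing blanks or parentheses are not handled; treat this as an illustration of the stated rules rather than of Zebra's own parsing.

```python
import re

# One variant component: a parenthesized (class,type,value) triple.
COMPONENT_RE = re.compile(r"\((\d+),(\d+),([^()]*)\)")


def parse_occurrences(spec):
    """Return 'all', 'last', or a (start, count) pair for a value-range."""
    if spec in ("all", "last"):
        return spec
    start, plus, count = spec.partition("+")
    # Without an explicit '+count', the range covers a single element.
    return (int(start), int(count) if plus else 1)


def parse_variant_request(request):
    """Parse a blank-separated list of variant components."""
    components = []
    for token in request.split():
        match = COMPONENT_RE.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed variant component: {token!r}")
        variant_class, variant_type, value = match.groups()
        # The special value '@' denotes a null value.
        components.append(
            (int(variant_class), int(variant_type), None if value == "@" else value)
        )
    return components


print(parse_occurrences("3+5"))
print(parse_variant_request("(1,1,@) (2,1,text)"))
```

An empty request string yields an empty component list, which matches the use of an empty variant request to switch off the default for particular fields.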
The following is an example of an element specification belonging to the GILS profile.
    simpleelement (1,10)
    simpleelement (1,12)
    simpleelement (2,1)
    simpleelement (1,14)
    simpleelement (4,1)
    simpleelement (4,52)
Sometimes, the client might want to receive a database record in a schema that differs from the native schema of the record. For instance, a client might only know how to process WAIS records, while the database record is represented in a more specific schema, such as GILS. In this module, a mapping of data to one of the MARC formats is also thought of as a schema mapping (mapping the elements of the record into fields consistent with the given MARC specification, prior to actually converting the data to the ISO2709 format). This use of the object identifier for USMARC as a schema identifier represents an overloading of the OID which might not be entirely proper. However, it reflects the dual role of schema and record syntax which is assumed by the MARC family in Z39.50.
These are the directives of the schema mapping file format:
targetName name
(m) A symbolic name for the target schema of the table. Useful mostly for diagnostic purposes.
targetRef OID-name
(m) An OID name for the target schema. This is used, for instance, by a server receiving a request to present a record in a different schema from the native one. The name, again, is found in the oid module of YAZ.
map element-name target-path
(o,r) Adds an element mapping rule to the table.