The PQF grammar is documented in the YAZ manual and is not repeated here. The textual PQF representation is not transmitted to Zebra during search; instead, the client maps it to the equivalent Z39.50 binary query parse tree.
The RPN parse tree, or the equivalent textual representation in PQF, may start with one specification of the attribute set used. It is followed by a query tree, which consists of atomic query parts (APT) or named result sets, optionally paired by binary boolean operators and recursively combined into complex query trees.
Attribute sets define the exact meaning and semantics of the queries issued. Zebra comes with some predefined attribute set definitions; others can easily be defined and added to the configuration.
Table 5.1. Attribute sets predefined in Zebra
Attribute set | PQF notation (Short hand) | Notes | Status |
---|---|---|---|
Explain | exp-1 | Special attribute set used on the special automagic IR-Explain-1 database to gain information on server capabilities, database names, and database semantics. | predefined |
BIB-1 | bib-1 | Standard PQF query language attribute set which defines the semantics of Z39.50 searching. In addition, all of the non-use attributes (types 2-14) define the hard-wired Zebra internal query processing. | default |
GILS | gils | Extension to the BIB-1 attribute set. | predefined |
The use attribute (type 1) mappings for the predefined attribute sets are found in the attribute set configuration files tab/*.att.
The Zebra internal query processing is modeled after the BIB-1 attribute set, and the non-use attributes type 2-6 are hard-wired in. It is therefore essential to be familiar with Section 2.4, “Zebra general Bib1 Non-Use Attributes (type 2-6)”.
A pair of sub query trees, or of atomic queries, is combined using the standard boolean operators into new query trees. Thus, boolean operators are always internal nodes in the query tree.
Table 5.2. Boolean operators
Keyword | Operator | Description |
---|---|---|
@and | binary AND operator | Set intersection of the hit sets of two atomic queries |
@or | binary OR operator | Set union of the hit sets of two atomic queries |
@not | binary AND NOT operator | Set difference of the hit sets of two atomic queries: hits of the first query that are not hits of the second |
@prox | binary PROXIMITY operator | Set intersection of the hit sets of two atomic queries. In addition, all documents which do not satisfy the requested query term proximity are purged from the intersection set. Usually a proper subset of the AND operation. |
For example, we can combine the terms information and retrieval into different searches in the default index of the default attribute set as follows. Querying for the union of all documents containing the terms information OR retrieval:
Z> find @or information retrieval
Querying for the intersection of all documents containing the terms information AND retrieval: The hit set is a subset of the corresponding OR query.
Z> find @and information retrieval
Querying for the intersection of all documents containing the terms information AND retrieval, taking proximity into account: The hit set is a subset of the corresponding AND query (see the PQF grammar for details on the proximity operator):
Z> find @prox 0 3 0 2 k 2 information retrieval
Querying for the intersection of all documents containing the terms information AND retrieval, in the same order and near each other as described in the term list. The hit set is a subset of the corresponding PROXIMITY query.
Z> find "information retrieval"
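The subset relationships between these hit sets follow from ordinary set algebra. The following Python sketch illustrates this with made-up document IDs; the data and the function names pqf_and, pqf_or and pqf_not are hypothetical and only model the semantics of the operators, not how Zebra evaluates them internally.

```python
# Toy hit sets: IDs of documents containing each term (made-up data).
hits_information = {1, 2, 3, 5, 8}
hits_retrieval = {2, 3, 5, 13}


def pqf_and(a, b):
    """@and: documents present in both hit sets."""
    return a & b


def pqf_or(a, b):
    """@or: documents present in at least one hit set."""
    return a | b


def pqf_not(a, b):
    """@not (AND NOT): documents in the first hit set but not in the second."""
    return a - b


and_hits = pqf_and(hits_information, hits_retrieval)
or_hits = pqf_or(hits_information, hits_retrieval)
not_hits = pqf_not(hits_information, hits_retrieval)

print(sorted(and_hits))     # [2, 3, 5]
print(sorted(or_hits))      # [1, 2, 3, 5, 8, 13]
print(sorted(not_hits))     # [1, 8]
print(and_hits <= or_hits)  # True: the AND hit set is a subset of the OR hit set
```

A proximity or phrase search can only remove documents from the AND hit set, which is why each of those hit sets is in turn a subset of the previous one.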
Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called Attributes-Plus-Terms (APT) queries.
Atomic (APT) queries are always leaf nodes in the PQF query tree. Non-use attributes of types 2-12 that are not supplied are either inherited from higher nodes in the query tree, or are set to Zebra's default values. See Section 2.3, “BIB-1 Attribute Set” for details.
Table 5.3. Atomic queries (APT)
Name | Type | Notes |
---|---|---|
attribute list | List of orthogonal attributes | Any of the orthogonal attribute types may be omitted. Omitted types are inherited from higher query tree nodes or, if not inherited, are set to the default Zebra configuration values. |
term | single term or quoted term list | The search term or list of search terms of the query |
Querying for the term information in the default index using the default attribute set, the server choice of access point/index, and the default non-use attributes.
Z> find information
Equivalent query fully specified including all default values:
Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
Finding all documents which have the term debussy in the title field.
Z> find @attr 1=4 debussy
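The way omitted attributes fall back to defaults can be sketched in a few lines of Python. This is only an illustration: the helper full_apt is hypothetical, the default values are copied from the fully specified query above, and term quoting is deliberately left out.

```python
# Default attribute values, copied from the fully specified query above.
DEFAULTS = {1: 1017, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1}


def full_apt(term, overrides=None, attrset="bib-1"):
    """Return a PQF string with every attribute type 1-6 spelled out.

    `overrides` maps attribute types to the values given in the query;
    all other types keep the defaults listed in DEFAULTS.
    """
    attrs = dict(DEFAULTS)
    attrs.update(overrides or {})
    parts = ["@attrset " + attrset]
    parts += ["@attr %d=%s" % (typ, val) for typ, val in sorted(attrs.items())]
    parts.append(term)
    return " ".join(parts)


print(full_apt("information"))
print(full_apt("debussy", {1: 4}))
```

The first call reproduces the fully specified query shown above; the second spells out the title search with only the use attribute changed.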
The scan operation is only supported with atomic APT queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan.
For example, we might want to scan the title index, starting with the term debussy, and displaying this and the following terms in lexicographic order:
Z> scan @attr 1=4 debussy
Named result sets are supported in Zebra, and result sets can be used as operands without limitations. It follows that named result sets are leaf nodes in the PQF query tree, exactly as atomic APT queries are.
After the execution of a search, the result set is available at the server, such that the client can use it for subsequent searches or retrieval requests. The Z39.50 standard actually stresses the fact that result sets are volatile. A result set may cease to exist at any point in time after the search, in which case the server will send a diagnostic to the effect that the requested result set does not exist any more.
Defining a named result set and re-using it in the next query, using yaz-client. Notice that the client, not the server, assigns the string '1' to the named result set.
Z> f @attr 1=4 mozart
...
Number of hits: 43, setno 1
...
Z> f @and @set 1 @attr 1=4 amadeus
...
Number of hits: 14, setno 2
Named result sets are only supported by the Z39.50 protocol. The SRU web service is stateless, and therefore the notion of named result sets does not exist when accessing a Zebra server by the SRU protocol.
The numeric use (type 1) attribute is usually referred to from a given attribute set. In addition, Zebra lets you use any internal index name defined in your configuration as a use attribute value. This is a great feature for debugging, and when you do not need the complexity of defined use attribute values. It is the preferred way of accessing Zebra indexes directly.
Finding all documents which have the term list "information retrieval" in a Zebra index, using its internal full string name. Scanning the same index.
Z> find @attr 1=sometext "information retrieval"
Z> scan @attr 1=sometext aterm
Searching or scanning the bib-1 use attribute 54 using its string name:
Z> find @attr 1=Code-language eng
Z> scan @attr 1=Code-language ""
It is possible to search in any silly string index - if it's defined in your indexing rules and can be parsed by the PQF parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results.
Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"
See also Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes” for details, and the section called “The SRU Server” for the SRU PQF query extension using string names as a fast debugging facility.
As we have seen above, it is possible (albeit seldom a great idea) to emulate XPath 1.0 based search by defining use (type 1) string attributes which in appearance resemble XPath queries. There are two problems with this approach. First, the XPath look-alike has to be defined at indexing time, so no new undefined XPath queries can be entered at search time. Second, it might confuse users very much that an XPath-alike index name in fact gets populated from a possibly entirely different XML element than it pretends to access.
When using the GRS-1 Record Model (see Chapter 9, GRS-1 Record Model and Filter Modules), we have the possibility to embed live XPath expressions in the PQF queries, which are here called use (type 1) xpath attributes. You must enable the xpath enable directive in your .abs configuration files.
Only a very restricted subset of the XPath 1.0 standard is supported as the GRS-1 record model is simpler than a full XML DOM structure. See the following examples for possibilities.
Finding all documents which have the term "content" inside a text node found in a specific XML DOM subtree, whose starting element is addressed by XPath.
Z> find @attr 1=/root content
Z> find @attr 1=/root/first content
Notice that the XPath must be absolute, i.e., must start with '/', and that the XPath descendant-or-self axis followed by a text node selection text() is implicitly appended to the stated XPath.
It follows that the above searches are interpreted as:
Z> find @attr 1=/root//text() content
Z> find @attr 1=/root/first//text() content
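The rewriting rule just described can be expressed as a small Python sketch. The function expand_xpath_use is a hypothetical helper that only mirrors the textual rule for element paths; it is not part of Zebra, and paths ending in an attribute step are outside its scope.

```python
def expand_xpath_use(path):
    """Return the XPath as it is interpreted for an element path.

    Assumes `path` addresses an element; relative paths are rejected
    because the XPath must be absolute.
    """
    if not path.startswith("/"):
        raise ValueError("XPath use attributes must be absolute: %r" % path)
    # Implicitly appended: descendant-or-self axis plus text node selection.
    return path + "//text()"


print(expand_xpath_use("/root"))        # /root//text()
print(expand_xpath_use("/root/first"))  # /root/first//text()
```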
Searching inside attribute strings is possible:
Z> find @attr 1=/link/@creator morten
Filtering the addressing XPath by a predicate working on exact string values in attributes (in the XML sense) is possible. The first query below returns all documents which have the term "english" contained in any of the text subnodes of the subtree defined by the XPath /record/title[@lang='en']. The other queries show similar predicate filtering.
Z> find @attr 1=/record/title[@lang='en'] english
Z> find @attr 1=/link[@creator='sisse'] sibelius
Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius
Combining numeric indexes, boolean expressions, and xpath based searches is possible:
Z> find @attr 1=/record/title @and foo bar
Z> find @and @attr 1=/record/title foo @attr 1=4 bar
Escaping PQF keywords and other non-parseable XPath constructs with '{ }' to prevent client-side PQF parsing syntax errors:
Z> find @attr {1=/root/first[@attr='danish']} content
Z> find @attr {1=/record/@set} oai
It is worth mentioning that these dynamically performed XPath queries are a performance bottleneck, as no optimized specialized indexes can be used. Therefore, avoid this facility when speed is essential and the database content size is medium to large.
The Z39.50 standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities. Zebra exposes a "classic" Explain database by base name IR-Explain-1, which is populated with system internal information.
The attribute set exp-1 consists of a single use attribute (type 1).
In addition, the non-Use BIB-1 attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from the BIB-1 attribute set, and may be used within any explain query.
The following Explain search attributes are supported: ExplainCategory (@attr 1=1), DatabaseName (@attr 1=3), DateAdded (@attr 1=9), DateChanged (@attr 1=10).
A search in the use attribute ExplainCategory supports only these predefined values: CategoryList, TargetInfo, DatabaseInfo, AttributeDetails.
See tab/explain.att and the Z39.50 standard for more information.
Classic Explain only defines retrieval of Explain information via ASN.1. Practically no Z39.50 clients support this. Fortunately they don't have to, as Zebra allows retrieval of this information in other formats: SUTRS, XML, GRS-1 and ASN.1 Explain.
List supported categories to find out which explain commands are supported:
Z> base IR-Explain-1
Z> find @attr exp1 1=1 categorylist
Z> form sutrs
Z> show 1+2
Get target info, that is, investigate which databases exist at this server endpoint:
Z> base IR-Explain-1
Z> find @attr exp1 1=1 targetinfo
Z> form xml
Z> show 1+1
Z> form grs-1
Z> show 1+1
Z> form sutrs
Z> show 1+1
List all supported databases. The number of hits is the number of databases found, which most commonly are the following two: the Default and the IR-Explain-1 databases.
Z> base IR-Explain-1
Z> find @attr exp1 1=1 databaseinfo
Z> form sutrs
Z> show 1+2
Get database info record for database Default.
Z> base IR-Explain-1
Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default
Identical query with explicitly specified attribute set:
Z> base IR-Explain-1
Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default
Get attribute details record for database Default. This query is very useful to study the internal Zebra indexes. If records have been indexed using the alvis XSLT filter, the string representation names of the known indexes can be found.
Z> base IR-Explain-1
Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default
Identical query with explicitly specified attribute set:
Z> base IR-Explain-1
Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default
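The Explain queries above all follow the same pattern, which can be generated with a short Python sketch. The helper explain_query is hypothetical and only builds the PQF string; sending it to the server (for example by pasting it after find in yaz-client) is left to the reader.

```python
# The only ExplainCategory values listed as supported in this section.
CATEGORIES = {"categorylist", "targetinfo", "databaseinfo", "attributedetails"}


def explain_query(category, database=None, attrset="exp1"):
    """Build a PQF Explain query with an explicit attribute set.

    Use attribute 1=1 is ExplainCategory and 1=3 is DatabaseName.
    """
    category = category.lower()
    if category not in CATEGORIES:
        raise ValueError("unsupported explain category: %r" % category)
    query = "@attr 1=1 %s" % category
    if database is not None:
        # Restrict the category search to a single database.
        query = "@and %s @attr 1=3 %s" % (query, database)
    return "@attrset %s %s" % (attrset, query)


print(explain_query("categorylist"))
print(explain_query("attributedetails", "Default"))
```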
Most of the information contained in this section is an excerpt of the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS, found in the BIB-1 Attribute Set Semantics from 1995 and also in an updated BIB-1 Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of Zebra's capabilities, and the example queries.
A use attribute specifies an access point for any atomic query. These access points are highly dependent on the attribute set used in the query, and are user configurable using the following default configuration files: tab/bib1.att, tab/dan1.att, tab/explain.att, and tab/gils.att.
For example, a few of the BIB-1 use attributes from tab/bib1.att are:
att 1 Personal-name
att 2 Corporate-name
att 3 Conference-name
att 4 Title
...
att 1009 Subject-name-personal
att 1010 Body-of-text
att 1011 Date/time-added-to-db
...
att 1016 Any
att 1017 Server-choice
att 1018 Publisher
...
att 1035 Anywhere
att 1036 Author-Title-Subject
New attribute sets can be added by adding new tab/*.att configuration files, which need to be sourced in the main configuration zebra.cfg.
In addition, Zebra allows the access of internal index names and dynamic XPath as use attributes; see Section 2.1.5, “Zebra's special access point of type 'string'” and Section 2.1.6, “Zebra's special access point of type 'XPath' for GRS-1 filters”.
Phrase search for information retrieval in the title-register, scanning the same register afterwards:
Z> find @attr 1=4 "information retrieval"
Z> scan @attr 1=4 information
Relation attributes describe the relationship of the access point (left side of the relation) to the search term as qualified by the attributes (right side of the relation), e.g., Date-publication <= 1975.
Table 5.4. Relation Attributes (type 2)
Relation | Value | Notes |
---|---|---|
Less than | 1 | supported |
Less than or equal | 2 | supported |
Equal | 3 | default |
Greater or equal | 4 | supported |
Greater than | 5 | supported |
Not equal | 6 | unsupported |
Phonetic | 100 | unsupported |
Stem | 101 | unsupported |
Relevance | 102 | supported |
AlwaysMatches | 103 | supported * |
* AlwaysMatches searches are only supported if alwaysmatches indexing has been enabled. See Section 1, “The default.idx file”.
The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See Section 2.4.3, “Structure Attributes (type 4)”.
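The difference between lexicographical and numerical ordering matters as soon as terms look like numbers. The following Python sketch uses plain string and integer comparison as a stand-in; Zebra's actual ordering depends on its own character normalization, so this only illustrates the principle.

```python
terms = ["9", "10", "100", "25"]

# Lexicographical ordering compares character by character.
lexicographic = sorted(terms)
# Numerical ordering compares the values the strings represent.
numeric = sorted(terms, key=int)

print(lexicographic)  # ['10', '100', '25', '9']
print(numeric)        # ['9', '10', '25', '100']

# "Less than 25" selects different terms under the two orderings.
less_lex = [t for t in terms if t < "25"]
less_num = [t for t in terms if int(t) < 25]
print(less_lex)  # ['10', '100']
print(less_num)  # ['9', '10']
```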
Z> find @attr 1=Title @attr 2=1 music
...
Number of hits: 11745, setno 1
...
Z> find @attr 1=Title @attr 2=2 music
...
Number of hits: 11771, setno 2
...
Z> find @attr 1=Title @attr 2=3 music
...
Number of hits: 532, setno 3
...
Z> find @attr 1=Title @attr 2=4 music
...
Number of hits: 11463, setno 4
...
Z> find @attr 1=Title @attr 2=5 music
...
Number of hits: 11419, setno 5
The relation attribute Relevance (102) is supported, see Section 9, “Relevance Ranking and Sorting of Result Sets” for full information.
Ranked search for information retrieval in the title-register:
Z> find @attr 1=4 @attr 2=102 "information retrieval"
The relation attribute AlwaysMatches (103) is, in the default configuration, supported in conjunction with the structure attribute Phrase (1) (which may be omitted, as it is the default). It can be configured to work with other structure attributes; see the configuration file tab/default.idx and Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes”.
AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct PQF syntax. An empty search term may be supplied.
Z> find @attr 1=Title @attr 2=103 ""
Z> find @attr 1=Title @attr 2=103 @attr 4=1 ""
The position attribute specifies the location of the search term within the field or subfield in which it appears.
Table 5.5. Position Attributes (type 3)
Position | Value | Notes |
---|---|---|
First in field | 1 | supported * |
First in subfield | 2 | supported * |
Any position in field | 3 | default |
* Zebra only supports first-in-field searches if firstinfield is enabled for the index. Refer to Section 1, “The default.idx file”. Zebra does not distinguish between first in field and first in subfield; they result in the same hit count. Searching for first position in (sub)field is only supported in Zebra 2.0.2 and later.
The structure attribute specifies the type of search term. This causes the search to be mapped on different Zebra internal indexes, which must have been defined at index time.
The possible values of the structure attribute (type 4) can be defined using the configuration file tab/default.idx. The default configuration is summarized in this table.
Table 5.6. Structure Attributes (type 4)
Structure | Value | Notes |
---|---|---|
Phrase | 1 | default |
Word | 2 | supported |
Key | 3 | supported |
Year | 4 | supported |
Date (normalized) | 5 | supported |
Word list | 6 | supported |
Date (un-normalized) | 100 | unsupported |
Name (normalized) | 101 | unsupported |
Name (un-normalized) | 102 | unsupported |
Structure | 103 | unsupported |
Urx | 104 | supported |
Free-form-text | 105 | supported |
Document-text | 106 | supported |
Local-number | 107 | supported |
String | 108 | unsupported |
Numeric string | 109 | supported |
The structure attribute value Word list (6) is supported, and maps to the boolean AND combination of the words supplied. The word list is useful when Google-like bag-of-word queries need to be translated from a GUI query language to PQF. For example, the following queries are equivalent:
Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
Z> find @attr 1=Title @and mozart amadeus
The structure attribute values Free-form-text (105) and Document-text (106) are supported, and both map to the boolean OR combination of the words supplied. The following queries are equivalent:
Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
Z> find @attr 1=Body-of-text @or bach @or salieri teleman
This OR list of terms is very useful in combination with relevance ranking:
Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"
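The expansion of a word list into nested binary operators can be sketched in Python as follows. The helper combine is hypothetical; it simply splits on whitespace and does no PQF quoting, so it is only suitable for plain words.

```python
def combine(operator, phrase):
    """Expand a bag of words into a right-nested chain of binary operators.

    `operator` is a PQF boolean keyword such as "@and" or "@or".
    """
    words = phrase.split()
    if not words:
        raise ValueError("need at least one word")
    # Start with the last word and wrap it with one operator per earlier word.
    query = words[-1]
    for word in reversed(words[:-1]):
        query = "%s %s %s" % (operator, word, query)
    return query


# Word list (6) corresponds to an AND chain.
print("@attr 1=Title " + combine("@and", "mozart amadeus"))
# Free-form-text (105) and Document-text (106) correspond to an OR chain.
print("@attr 1=Body-of-text " + combine("@or", "bach salieri teleman"))
```

The two printed queries are the explicit boolean forms shown in the examples above.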
The structure attribute value Local number (107) is supported, and always maps to the Zebra internal document ID, irrespective of which use attribute is specified. The following queries have exactly the same unique record in the hit set:
Z> find @attr 4=107 10
Z> find @attr 1=4 @attr 4=107 10
Z> find @attr 1=1010 @attr 4=107 10
In the GILS schema (gils.abs), the west-bounding-coordinate is indexed as type n, and is therefore searched by specifying structure=Numeric String. To match all those records with west-bounding-coordinate greater than -114 we use the following query:
Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
The exact mapping between PQF queries and Zebra internal indexes and index types is explained in Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes”.
The truncation attribute specifies whether variations of one or more characters are allowed between search term and hit terms, or not. Using non-default truncation attributes will broaden the document hit set of a search query.
Table 5.7. Truncation Attributes (type 5)
Truncation | Value | Notes |
---|---|---|
Right truncation | 1 | supported |
Left truncation | 2 | supported |
Left and right truncation | 3 | supported |
Do not truncate | 100 | default |
Process # in search term | 101 | supported |
RegExpr-1 | 102 | supported |
RegExpr-2 | 103 | supported |
The truncation attribute values 1-3 behave as expected:
Z> scan @attr 1=Body-of-text schnittke
...
* schnittke (81)
schnittkes (31)
schnittstelle (1)
...
Z> find @attr 1=Body-of-text @attr 5=1 schnittke
...
Number of hits: 95, setno 7
...
Z> find @attr 1=Body-of-text @attr 5=2 schnittke
...
Number of hits: 81, setno 6
...
Z> find @attr 1=Body-of-text @attr 5=3 schnittke
...
Number of hits: 95, setno 8
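Which index terms each truncation value selects can be modelled with simple string tests. The Python sketch below uses the three terms from the scan output above; the function names are hypothetical, and the sketch works on terms only, so it does not reproduce the document hit counts.

```python
# The index terms shown in the scan output above.
index_terms = ["schnittke", "schnittkes", "schnittstelle"]


def term_matches(term, candidate, truncation=100):
    """Model of truncation values 1-3 and 100 on a single index term."""
    if truncation == 1:    # right truncation: candidate starts with term
        return candidate.startswith(term)
    if truncation == 2:    # left truncation: candidate ends with term
        return candidate.endswith(term)
    if truncation == 3:    # left and right truncation: term occurs anywhere
        return term in candidate
    if truncation == 100:  # do not truncate: exact match
        return candidate == term
    raise ValueError("truncation value not modelled: %r" % truncation)


def matching(term, truncation):
    return [t for t in index_terms if term_matches(term, t, truncation)]


print(matching("schnittke", 1))    # ['schnittke', 'schnittkes']
print(matching("schnittke", 2))    # ['schnittke']
print(matching("schnittke", 3))    # ['schnittke', 'schnittkes']
print(matching("schnittke", 100))  # ['schnittke']
```

This is consistent with the session above, where left truncation returns the same 81 hits as the single term schnittke, while right truncation also picks up documents containing schnittkes.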
The truncation attribute value Process # in search term (101) is a poor-man's regular expression search. It maps each # to .* and then performs a Regexp-1 (102) regular expression search. The following two queries are equivalent:
Z> find @attr 1=Body-of-text @attr 5=101 schnit#ke
Z> find @attr 1=Body-of-text @attr 5=102 schnit.*ke
...
Number of hits: 89, setno 10
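The mapping from # to .* is a plain text substitution, as the following Python sketch shows. Python's re module is used here only as a stand-in for Zebra's own regular expression engine, and the sketch assumes that the pattern has to match a whole index term; both function names are hypothetical.

```python
import re


def hash_to_regex(term):
    """Rewrite a 'Process # in search term' query term as a regular expression."""
    return term.replace("#", ".*")


def matches_term(term, candidate):
    """Test one index term against the rewritten pattern (whole-term match assumed)."""
    return re.fullmatch(hash_to_regex(term), candidate) is not None


print(hash_to_regex("schnit#ke"))                  # schnit.*ke
print(matches_term("schnit#ke", "schnittke"))      # True
print(matches_term("schnit#ke", "schnittstelle"))  # False
```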
The truncation attribute value Regexp-1 (102) is a normal regular expression search; see Section 3.6, “Zebra Regular Expressions in Truncation Attribute (type = 5)” for details.
Z> find @attr 1=Body-of-text @attr 5=102 schnit+ke
Z> find @attr 1=Body-of-text @attr 5=102 schni[a-t]+ke
The truncation attribute value Regexp-2 (103) is a Zebra specific extension which allows fuzzy matches. One single error in spelling of search terms is allowed, i.e., a document is hit if it includes a term which can be mapped to the used search term by one character substitution, addition, deletion or change of position.
Z> find @attr 1=Body-of-text @attr 5=100 schnittke
...
Number of hits: 81, setno 14
...
Z> find @attr 1=Body-of-text @attr 5=103 schnittke
...
Number of hits: 103, setno 15
...
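The "one single error" rule can be made concrete with a small Python check. This is a sketch of the rule as stated, not Zebra's implementation; in particular, "change of position" is interpreted here as a swap of two adjacent characters, which is an assumption, and the name within_one_edit is hypothetical.

```python
def within_one_edit(a, b):
    """True if b equals a, or differs from it by exactly one substitution,
    addition, deletion, or swap of two adjacent characters."""
    if a == b:
        return True
    la, lb = len(a), len(b)
    if abs(la - lb) > 1:
        return False
    # Skip the common prefix; i is the position of the first difference.
    i = 0
    while i < min(la, lb) and a[i] == b[i]:
        i += 1
    if la == lb:
        # Substitution: everything after position i is identical.
        if a[i + 1:] == b[i + 1:]:
            return True
        # Adjacent swap: positions i and i+1 exchanged, rest identical.
        return (i + 1 < la and a[i] == b[i + 1] and a[i + 1] == b[i]
                and a[i + 2:] == b[i + 2:])
    if la < lb:
        # Addition: b has one extra character at position i.
        return a[i:] == b[i + 1:]
    # Deletion: a has one extra character at position i.
    return a[i + 1:] == b[i:]


for candidate in ["schnittke", "schnitke", "schnittkes", "schnitkte", "schnittstelle"]:
    print(candidate, within_one_edit("schnittke", candidate))
```

Under this reading, schnittkes is within one addition of schnittke, while schnittstelle is not within one edit.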
The Completeness Attributes (type = 6) are used to specify that a given search term or term list is either part of the terms of a given index/field (Incomplete subfield (1)), or is what literally is found in the entire field's index (Complete field (3)).
Table 5.8. Completeness Attributes (type = 6)
Completeness | Value | Notes |
---|---|---|
Incomplete subfield | 1 | default |
Complete subfield | 2 | deprecated |
Complete field | 3 | supported |
The Completeness Attributes (type = 6) are only partially and conditionally supported, in the sense that they are ignored if the hit index is not of structure type="w" or type="p".
Incomplete subfield (1) is the default, and makes Zebra use register type="w", whereas Complete field (3) triggers search and scan in index type="p".
Complete subfield (2) is a remnant of the happy MARC binary format days. Zebra does not support it, but silently maps it to Complete field (3).
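The mapping described above is small enough to write down as a lookup table. The Python sketch below is only a restatement of the text; the names COMPLETENESS_TO_REGISTER and register_for are hypothetical and do not correspond to anything in Zebra's code or configuration.

```python
# Completeness value -> register type, as described in the text above.
COMPLETENESS_TO_REGISTER = {
    1: "w",  # Incomplete subfield (default)
    2: "p",  # Complete subfield: silently treated as Complete field (3)
    3: "p",  # Complete field
}


def register_for(completeness=1):
    """Return the register type a completeness attribute value selects."""
    try:
        return COMPLETENESS_TO_REGISTER[completeness]
    except KeyError:
        raise ValueError("unknown completeness value: %r" % completeness)


print(register_for())   # w
print(register_for(2))  # p
print(register_for(3))  # p
```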
The exact mapping between PQF queries and Zebra internal indexes and index types is explained in Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes”.