2. RPN queries and semantics

The PQF grammar is documented in the YAZ manual and is not repeated here. This textual PQF representation is not transmitted to Zebra during search; instead, the client maps it to the equivalent Z39.50 binary query parse tree.

2.1. RPN tree structure

The RPN parse tree, or the equivalent textual representation in PQF, may start with a specification of the attribute set used. What follows is a query tree consisting of atomic query parts (APT) or named result sets, possibly paired by binary boolean operators, and recursively combined into complex query trees.

2.1.1. Attribute sets

Attribute sets define the exact meaning and semantics of the queries issued. Zebra comes with some predefined attribute set definitions; others can easily be defined and added to the configuration.

Table 5.1. Attribute sets predefined in Zebra

Attribute set	PQF notation (shorthand)	Notes	Status
Explain	`exp-1`	Special attribute set used on the special, automatically populated `IR-Explain-1` database to gain information on server capabilities, database names, and database semantics.	predefined
BIB-1	`bib-1`	Standard attribute set which defines the semantics of Z39.50 searching. In addition, all of the non-use attributes (types 2-14) define the hard-wired Zebra internal query processing.	default
GILS	`gils`	Extension to the BIB-1 attribute set.	predefined

The use attribute (type 1) mappings of the predefined attribute sets are found in the attribute set configuration files tab/*.att.

Note

The Zebra internal query processing is modeled after the BIB-1 attribute set, and the non-use attributes of types 2-6 are hard-wired in. It is therefore essential to be familiar with Section 2.4, “Zebra general Bib1 Non-Use Attributes (type 2-6)”.

2.1.2. Boolean operators

Pairs of sub-query trees or atomic queries are combined into new query trees using the standard boolean operators. Boolean operators are therefore always internal nodes in the query tree.

Table 5.2. Boolean operators

Keyword	Operator	Description
`@and`	binary AND operator	Set intersection of the hit sets of the two sub-queries
`@or`	binary OR operator	Set union of the hit sets of the two sub-queries
`@not`	binary AND NOT operator	Set difference: the hits of the first sub-query minus the hits of the second
`@prox`	binary PROXIMITY operator	Set intersection of the hit sets of the two sub-queries. In addition, all documents which do not satisfy the requested term proximity are removed from the intersection. The result is a subset, usually a proper subset, of the AND result.

For example, we can combine the terms information and retrieval into different searches in the default index of the default attribute set as follows. Querying for the union of all documents containing the terms information OR retrieval:

       Z> find @or information retrieval

Querying for the intersection of all documents containing the terms information AND retrieval. The hit set is a subset of the corresponding OR query.

       Z> find @and information retrieval

Querying for the intersection of all documents containing the terms information AND retrieval, taking proximity into account. The hit set is a subset of the corresponding AND query (see the PQF grammar for details on the proximity operator):

       Z> find @prox 0 3 0 2 k 2 information retrieval

Querying for all documents containing the terms information AND retrieval near each other and in the same order as in the quoted term list. The hit set is a subset of the corresponding PROXIMITY query.

       Z> find "information retrieval"
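The subset relations between these four queries can be illustrated with a small, hypothetical in-memory model. This is a sketch under stated assumptions, not Zebra's implementation: the toy inverted index `INDEX` and the helpers `hits`, `prox` and `phrase` are invented for illustration, `@prox 0 3 0 2 k 2` is read as "unordered, at most 3 words apart" (our interpretation of the YAZ PQF proximity parameters), and phrase search is modeled as the second term directly following the first.

```python
# Hypothetical toy inverted index: term -> {document ID: [word positions]}.
INDEX = {
    "information": {1: [0], 2: [4], 3: [10], 5: [2]},
    "retrieval": {2: [5], 3: [20], 4: [1], 5: [1]},
}


def hits(term):
    """Set of document IDs containing the term (the hit set of an APT)."""
    return set(INDEX.get(term, {}))


def prox(t1, t2, max_distance):
    """Unordered proximity: AND hits where some pair of occurrences
    is at most max_distance words apart."""
    return {
        doc
        for doc in hits(t1) & hits(t2)
        if any(abs(p1 - p2) <= max_distance
               for p1 in INDEX[t1][doc] for p2 in INDEX[t2][doc])
    }


def phrase(t1, t2):
    """Ordered adjacency: t2 occurs directly after t1 (an assumption
    about phrase semantics made for this sketch)."""
    return {
        doc
        for doc in hits(t1) & hits(t2)
        if any(p1 + 1 == p2 for p1 in INDEX[t1][doc] for p2 in INDEX[t2][doc])
    }


hits_or = hits("information") | hits("retrieval")    # @or
hits_and = hits("information") & hits("retrieval")   # @and
hits_not = hits("information") - hits("retrieval")   # @not
hits_prox = prox("information", "retrieval", 3)      # @prox 0 3 0 2 k 2
hits_phrase = phrase("information", "retrieval")     # "information retrieval"

print(sorted(hits_or), sorted(hits_and), sorted(hits_not),
      sorted(hits_prox), sorted(hits_phrase))
# [1, 2, 3, 4, 5] [2, 3, 5] [1] [2, 5] [2]
```

In this toy data each query is a strict subset of the one before it, mirroring the OR, AND, PROXIMITY and phrase chain described above.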

2.1.3. Atomic queries (APT)

Atomic queries are the query parts which work on one access point only. These consist of an attribute list followed by a single term or a quoted term list, and are often called Attributes-Plus-Terms (APT) queries.

Atomic (APT) queries are always leaf nodes in the PQF query tree. Unsupplied non-use attributes of types 2-12 are either inherited from higher nodes in the query tree or set to Zebra's default values. See Section 2.3, “BIB-1 Attribute Set” for details.

Table 5.3. Atomic queries (APT)

Name	Type	Notes
attribute list	List of orthogonal attributes	Any of the orthogonal attribute types may be omitted; omitted types are inherited from higher query tree nodes or, if not inherited, set to the default Zebra configuration values.
term	single term or quoted term list	The search term, or list of search terms, of the query

Querying for the term information in the default index using the default attribute set, the server choice of access point/index, and the default non-use attributes.

       Z> find information

Equivalent query fully specified including all default values:

       Z> find @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
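How the short query expands into the fully specified one can be sketched as a merge of user-supplied attributes over defaults. The default values below are taken from this chapter's tables; the function name `fully_specified_apt` is hypothetical, and the sketch ignores inheritance from higher query tree nodes as well as any server-side configuration that changes defaults.

```python
# Default attribute values as stated in this chapter for BIB-1:
# 1=1017 (Server-choice), 2=3 (Equal), 3=3 (Any position in field),
# 4=1 (Phrase), 5=100 (Do not truncate), 6=1 (Incomplete subfield).
BIB1_DEFAULTS = {1: 1017, 2: 3, 3: 3, 4: 1, 5: 100, 6: 1}


def fully_specified_apt(term, attrs=None, attrset="bib-1"):
    """Render a PQF APT query with every unsupplied attribute set to its default."""
    merged = {**BIB1_DEFAULTS, **(attrs or {})}
    parts = [f"@attrset {attrset}"]
    parts += [f"@attr {attr_type}={merged[attr_type]}" for attr_type in sorted(merged)]
    # Quote multi-word terms so the PQF parser treats them as one term list.
    parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


print(fully_specified_apt("information"))
# @attrset bib-1 @attr 1=1017 @attr 2=3 @attr 3=3 @attr 4=1 @attr 5=100 @attr 6=1 information
print(fully_specified_apt("debussy", {1: 4}))
```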

Finding all documents which have the term debussy in the title field.

       Z> find @attr 1=4 debussy

The scan operation is only supported with atomic APT queries, as it is bound to one access point at a time. Boolean query trees are not allowed during scan.

For example, we might want to scan the title index, starting with the term debussy, and displaying this and the following terms in lexicographic order:

       Z> scan @attr 1=4 debussy

2.1.4. Named Result Sets

Named result sets are supported in Zebra, and result sets can be used as operands without limitations. It follows that named result sets are leaf nodes in the PQF query tree, exactly as atomic APT queries are.

After a search has executed, the result set is available at the server, so the client can use it for subsequent searches or retrieval requests. The Z39.50 standard stresses that result sets are volatile: they may cease to exist at any point after the search, in which case the server sends a diagnostic stating that the requested result set no longer exists.

Defining a named result set and re-using it in the next query, using yaz-client. Notice that the client, not the server, assigns the string '1' to the named result set.

       Z> f @attr 1=4 mozart
       ...
       Number of hits: 43, setno 1
       ...
       Z> f @and @set 1 @attr 1=4 amadeus
       ...
       Number of hits: 14, setno 2
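The session above can be modeled as a server-side store keyed by client-chosen names. This is a hypothetical sketch (the class `ResultSetStore` and the hit sets are invented), showing two points from the text: the client, not the server, picks the name, and a stored set may vanish, after which referring to it is an error.

```python
class ResultSetStore:
    """Toy model of server-side named result sets (hypothetical class)."""

    def __init__(self):
        self._sets = {}

    def save(self, name, hits):
        # The client chooses the name; the server stores the hits under it.
        self._sets[name] = frozenset(hits)
        return len(hits)

    def get(self, name):
        # Result sets are volatile: a missing set is reported as an error.
        if name not in self._sets:
            raise KeyError(f"result set {name!r} does not exist")
        return self._sets[name]

    def drop(self, name):
        # Models the server discarding a result set at any time.
        self._sets.pop(name, None)


store = ResultSetStore()
mozart = {1, 2, 3, 4}      # hypothetical title hits for "mozart"
amadeus = {2, 4, 9}        # hypothetical title hits for "amadeus"
store.save("1", mozart)                              # f @attr 1=4 mozart
count = store.save("2", store.get("1") & amadeus)    # f @and @set 1 @attr 1=4 amadeus
print(count)  # 2
```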

Note

Named result sets are only supported by the Z39.50 protocol. The SRU web service is stateless, and therefore the notion of named result sets does not exist when accessing a Zebra server by the SRU protocol.

2.1.5. Zebra's special access point of type 'string'

The numeric use (type 1) attribute usually refers to a value from a given attribute set. In addition, Zebra lets you use any internal index name defined in your configuration as a use attribute value. This is a great feature for debugging, and when you do not need the complexity of defined use attribute values. It is the preferred way of accessing Zebra indexes directly.

Finding all documents which have the term list "information retrieval" in a Zebra index, using its internal full string name, and scanning the same index:

       Z> find @attr 1=sometext "information retrieval"
       Z> scan @attr 1=sometext aterm

Searching or scanning the bib-1 use attribute 54 using its string name:

       Z> find @attr 1=Code-language eng
       Z> scan @attr 1=Code-language ""

It is possible to search in any silly string index, as long as it is defined in your indexing rules and can be parsed by the PQF parser. This is definitely not the recommended use of this facility, as it might confuse your users with some very unexpected results.

       Z> find @attr 1=silly/xpath/alike[@index]/name "information retrieval"

See also Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes” for details, and the section called “The SRU Server” for the SRU PQF query extension using string names as a fast debugging facility.

2.1.6. Zebra's special access point of type 'XPath' for GRS-1 filters

As we have seen above, it is possible (albeit seldom a great idea) to emulate XPath 1.0 based search by defining use (type 1) string attributes which resemble XPath queries. There are two problems with this approach: first, the XPath look-alike has to be defined at indexing time, so no new, undefined XPath queries can be entered at search time; and second, it might confuse users very much that an XPath-like index name is in fact populated from a possibly entirely different XML element than the one it pretends to access.

When using the GRS-1 Record Model (see Chapter 9, GRS-1 Record Model and Filter Modules), we can embed live XPath expressions in the PQF queries; these are called use (type 1) xpath attributes. This requires the xpath enable directive in your .abs configuration files.

Note

Only a very restricted subset of the XPath 1.0 standard is supported as the GRS-1 record model is simpler than a full XML DOM structure. See the following examples for possibilities.

Finding all documents which have the term "content" inside a text node found in a specific XML DOM subtree, whose starting element is addressed by XPath.

       Z> find @attr 1=/root content
       Z> find @attr 1=/root/first content

Notice that the XPath must be absolute, i.e., must start with '/', and that the XPath descendant-or-self axis followed by a text node selection text() is implicitly appended to the stated XPath. It follows that the above searches are interpreted as:

       Z> find @attr 1=/root//text() content
       Z> find @attr 1=/root/first//text() content
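The rewriting rule for element paths can be stated as a tiny function. This is an illustrative sketch with a hypothetical name (`effective_xpath`); it covers only the element-path case shown above and makes no claim about how attribute paths such as /link/@creator are interpreted.

```python
def effective_xpath(use_attr):
    """Return the XPath that an element-path xpath use attribute is
    interpreted as, per the rule described above."""
    if not use_attr.startswith("/"):
        raise ValueError("xpath use attributes must be absolute, i.e. start with '/'")
    # Implicitly append the descendant-or-self axis plus a text() selection.
    return use_attr + "//text()"


for path in ("/root", "/root/first"):
    print(path, "->", effective_xpath(path))
```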

Searching inside attribute strings is possible:

       Z> find @attr 1=/link/@creator morten

The addressing XPath can be filtered by a predicate working on exact string values of attributes (in the XML sense). The first query below returns all documents which have the term "english" in one of the text subnodes of the subtree defined by the XPath /record/title[@lang='en']; the others apply similar predicate filtering.

       Z> find @attr 1=/record/title[@lang='en'] english
       Z> find @attr 1=/link[@creator='sisse'] sibelius
       Z> find @attr 1=/link[@creator='sisse']/description[@xml:lang='da'] sibelius

Combining numeric indexes, boolean expressions, and xpath based searches is possible:

       Z> find @attr 1=/record/title @and foo bar
       Z> find @and @attr 1=/record/title foo @attr 1=4 bar

Escaping PQF keywords and other non-parseable XPath constructs with '{ }' to prevent client-side PQF parsing syntax errors:

       Z> find @attr {1=/root/first[@attr='danish']} content
       Z> find @attr {1=/record/@set} oai

Warning

Note that these dynamically performed XPath queries are a performance bottleneck, as no optimized, specialized indexes can be used. Therefore, avoid this facility when speed is essential and the database content is medium to large in size.

2.2. Explain Attribute Set

The Z39.50 standard defines the Explain attribute set Exp-1, which is used to discover information about a server's search semantics and functional capabilities. Zebra exposes a "classic" Explain database under the base name IR-Explain-1, which is populated with system internal information.

The attribute-set exp-1 consists of a single use attribute (type 1).

In addition, the non-Use BIB-1 attributes, that is, the types Relation, Position, Structure, Truncation, and Completeness are imported from the BIB-1 attribute set, and may be used within any explain query.

2.2.1. Use Attributes (type = 1)

The following Explain search attributes are supported: ExplainCategory (@attr 1=1), DatabaseName (@attr 1=3), DateAdded (@attr 1=9), and DateChanged (@attr 1=10).

A search in the use attribute ExplainCategory supports only these predefined values: CategoryList, TargetInfo, DatabaseInfo, AttributeDetails.

See tab/explain.att and the Z39.50 standard for more information.

2.2.2. Explain searches with yaz-client

Classic Explain only defines retrieval of Explain information via ASN.1. Practically no Z39.50 clients support this. Fortunately, they don't have to, since Zebra also allows retrieval of this information in the formats SUTRS, XML, GRS-1 and ASN.1 Explain.

List supported categories to find out which explain commands are supported:

       Z> base IR-Explain-1
       Z> find @attr exp1 1=1 categorylist
       Z> form sutrs
       Z> show 1+2

Get target info, that is, investigate which databases exist at this server endpoint:

       Z> base IR-Explain-1
       Z> find @attr exp1 1=1 targetinfo
       Z> form xml
       Z> show 1+1
       Z> form grs-1
       Z> show 1+1
       Z> form sutrs
       Z> show 1+1

List all supported databases. The number of hits is the number of databases found; most commonly these are the two databases Default and IR-Explain-1.

       Z> base IR-Explain-1
       Z> find @attr exp1 1=1 databaseinfo
       Z> form sutrs
       Z> show 1+2

Get database info record for database Default.

       Z> base IR-Explain-1
       Z> find @and @attr exp1 1=1 databaseinfo @attr exp1 1=3 Default

Identical query with explicitly specified attribute set:

       Z> base IR-Explain-1
       Z> find @attrset exp1 @and @attr 1=1 databaseinfo @attr 1=3 Default

Get attribute details record for database Default. This query is very useful to study the internal Zebra indexes. If records have been indexed using the alvis XSLT filter, the string representation names of the known indexes can be found.

       Z> base IR-Explain-1
       Z> find @and @attr exp1 1=1 attributedetails @attr exp1 1=3 Default

Identical query with explicitly specified attribute set:

       Z> base IR-Explain-1
       Z> find @attrset exp1 @and @attr 1=1 attributedetails @attr 1=3 Default

2.3. BIB-1 Attribute Set

Most of the information contained in this section is an excerpt of the ATTRIBUTE SET BIB-1 (Z39.50-1995) SEMANTICS found at . The BIB-1 Attribute Set Semantics from 1995 are also available in an updated BIB-1 Attribute Set version from 2003. Index Data is not the copyright holder of this information, except for the configuration details, the listing of Zebra's capabilities, and the example queries.

2.3.1. Use Attributes (type 1)

A use attribute specifies an access point for any atomic query. These access points are highly dependent on the attribute set used in the query, and are user configurable using the following default configuration files: tab/bib1.att, tab/dan1.att, tab/explain.att, and tab/gils.att.

For example, a few BIB-1 use attributes from tab/bib1.att are:

       att 1               Personal-name
       att 2               Corporate-name
       att 3               Conference-name
       att 4               Title
       ...
       att 1009            Subject-name-personal
       att 1010            Body-of-text
       att 1011            Date/time-added-to-db
       ...
       att 1016            Any
       att 1017            Server-choice
       att 1018            Publisher
       ...
       att 1035            Anywhere
       att 1036            Author-Title-Subject

New attribute sets can be added by adding new tab/*.att configuration files, which need to be sourced in the main configuration zebra.cfg.

In addition, Zebra allows the access of internal index names and dynamic XPath as use attributes; see Section 2.1.5, “Zebra's special access point of type 'string'” and Section 2.1.6, “Zebra's special access point of type 'XPath' for GRS-1 filters”.

Phrase search for information retrieval in the title-register, scanning the same register afterwards:

       Z> find @attr 1=4 "information retrieval"
       Z> scan @attr 1=4 information

2.4. Zebra general Bib1 Non-Use Attributes (type 2-6)

2.4.1. Relation Attributes (type 2)

Relation attributes describe the relationship of the access point (left side of the relation) to the search term as qualified by the attributes (right side of the relation), e.g., Date-publication <= 1975.

Table 5.4. Relation Attributes (type 2)

Relation	Value	Notes
Less than	1	supported
Less than or equal	2	supported
Equal	3	default
Greater or equal	4	supported
Greater than	5	supported
Not equal	6	unsupported
Phonetic	100	unsupported
Stem	101	unsupported
Relevance	102	supported
AlwaysMatches	103	supported *

Note

AlwaysMatches searches are only supported if alwaysmatches indexing has been enabled. See Section 1, “The default.idx file”.

The relation attributes 1-5 are supported and work exactly as expected. All ordering operations are based on a lexicographical ordering, except when the structure attribute numeric (109) is used. In this case, ordering is numerical. See Section 2.4.3, “Structure Attributes (type 4)”.

       Z> find @attr 1=Title @attr 2=1 music
       ...
       Number of hits: 11745, setno 1
       ...
       Z> find @attr 1=Title @attr 2=2 music
       ...
       Number of hits: 11771, setno 2
       ...
       Z> find @attr 1=Title @attr 2=3 music
       ...
       Number of hits: 532, setno 3
       ...
       Z> find @attr 1=Title @attr 2=4 music
       ...
       Number of hits: 11463, setno 4
       ...
       Z> find @attr 1=Title @attr 2=5 music
       ...
       Number of hits: 11419, setno 5

The relation attribute Relevance (102) is supported, see Section 9, “Relevance Ranking and Sorting of Result Sets” for full information.

Ranked search for information retrieval in the title-register:

       Z> find @attr 1=4 @attr 2=102 "information retrieval"

In the default configuration, the relation attribute AlwaysMatches (103) is supported in conjunction with the structure attribute Phrase (1), which may be omitted since it is the default. It can be configured to work with other structure attributes; see the configuration file tab/default.idx and Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes”.

AlwaysMatches (103) is a great way to discover how many documents have been indexed in a given field. The search term is ignored, but needed for correct PQF syntax. An empty search term may be supplied.

       Z> find @attr 1=Title  @attr 2=103  ""
       Z> find @attr 1=Title  @attr 2=103  @attr 4=1 ""

2.4.2. Position Attributes (type 3)

The position attribute specifies the location of the search term within the field or subfield in which it appears.

Table 5.5. Position Attributes (type 3)

Position	Value	Notes
First in field	1	supported *
First in subfield	2	supported *
Any position in field	3	default

Note

Zebra only supports first-in-field searches if firstinfield is enabled for the index. Refer to Section 1, “The default.idx file”. Zebra does not distinguish between first in field and first in subfield; they result in the same hit count. Searching for first position in (sub)field is only supported in Zebra 2.0.2 and later.

2.4.3. Structure Attributes (type 4)

The structure attribute specifies the type of search term. This causes the search to be mapped on different Zebra internal indexes, which must have been defined at index time.

The possible values of the structure attribute (type 4) can be defined using the configuration file tab/default.idx. The default configuration is summarized in this table.

Table 5.6. Structure Attributes (type 4)

Structure	Value	Notes
Phrase	1	default
Word	2	supported
Key	3	supported
Year	4	supported
Date (normalized)	5	supported
Word list	6	supported
Date (un-normalized)	100	unsupported
Name (normalized)	101	unsupported
Name (un-normalized)	102	unsupported
Structure	103	unsupported
Urx	104	supported
Free-form-text	105	supported
Document-text	106	supported
Local-number	107	supported
String	108	unsupported
Numeric string	109	supported

The structure attribute value Word list (6) is supported and maps to the boolean AND combination of the words supplied. The word list is useful when Google-like bag-of-words queries need to be translated from a GUI query language to PQF. For example, the following queries are equivalent:

       Z> find @attr 1=Title @attr 4=6 "mozart amadeus"
       Z> find @attr 1=Title  @and mozart amadeus

The structure attribute values Free-form-text (105) and Document-text (106) are supported, and both map to the boolean OR combination of the words supplied. The following queries are equivalent:

       Z> find @attr 1=Body-of-text @attr 4=105 "bach salieri teleman"
       Z> find @attr 1=Body-of-text @attr 4=106 "bach salieri teleman"
       Z> find @attr 1=Body-of-text @or bach @or salieri teleman
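The explicit forms on the right-hand side of these equivalences follow a regular pattern: prefix notation with right-nested binary operators. A client translating a bag-of-words query could generate them with a helper like the hypothetical `bag_of_words_to_pqf` below. It assumes plain single-word tokens (no quoting or escaping of PQF keywords) and produces only the boolean part; attribute lists such as `@attr 1=Title` would be prepended and are inherited by the operands.

```python
def bag_of_words_to_pqf(words, op):
    """Right-nested prefix PQF, e.g. ['a', 'b', 'c'], '@or' -> '@or a @or b c'."""
    if op not in ("@and", "@or"):
        raise ValueError("op must be '@and' or '@or'")
    if not words:
        raise ValueError("need at least one word")
    expr = words[-1]
    # Wrap from the right: each earlier word becomes the left operand.
    for word in reversed(words[:-1]):
        expr = f"{op} {word} {expr}"
    return expr


print(bag_of_words_to_pqf(["mozart", "amadeus"], "@and"))
# @and mozart amadeus
print(bag_of_words_to_pqf("bach salieri teleman".split(), "@or"))
# @or bach @or salieri teleman
```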

This OR list of terms is very useful in combination with relevance ranking:

       Z> find @attr 1=Body-of-text @attr 2=102 @attr 4=105 "bach salieri teleman"

The structure attribute value Local number (107) is supported, and always maps to the Zebra internal document ID, irrespective of which use attribute is specified. The following queries have exactly the same unique record in the hit set:

       Z> find @attr 4=107 10
       Z> find @attr 1=4 @attr 4=107 10
       Z> find @attr 1=1010 @attr 4=107 10

In the GILS schema (gils.abs), the west-bounding-coordinate is indexed as type n, and is therefore searched by specifying structure=Numeric String. To match all those records with west-bounding-coordinate greater than -114 we use the following query:

       Z> find @attr 4=109 @attr 2=5 @attr gils 1=2038 -114
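Why the structure attribute matters here can be seen by comparing the two orderings mentioned under relation attributes: ordering is lexicographic by default and numerical only with Numeric string (109). The sketch below uses Python's string and float comparison as stand-ins for Zebra's term ordering (an assumption), with hypothetical indexed values; `relation_matches` is an invented helper.

```python
import operator

# Relation attribute values 1-5 (Table 5.4) as comparison operators:
# index term <relation> search term.
RELATIONS = {1: operator.lt, 2: operator.le, 3: operator.eq,
             4: operator.ge, 5: operator.gt}


def relation_matches(index_term, search_term, relation, numeric=False):
    compare = RELATIONS[relation]
    if numeric:   # structure attribute Numeric string (109)
        return compare(float(index_term), float(search_term))
    return compare(index_term, search_term)   # lexicographic ordering


coordinates = ["-120", "-114", "-100", "-9"]   # hypothetical indexed values
print([c for c in coordinates if relation_matches(c, "-114", 5)])
# ['-120', '-9']
print([c for c in coordinates if relation_matches(c, "-114", 5, numeric=True)])
# ['-100', '-9']
```

Without numeric structure, "-120" sorts after "-114" character by character, so a "greater than" search would return the wrong coordinates.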

Note

The exact mapping between PQF queries and Zebra internal indexes and index types is explained in Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes”.

2.4.4. Truncation Attributes (type = 5)

The truncation attribute specifies whether variations of one or more characters are allowed between search term and hit terms, or not. Using non-default truncation attributes will broaden the document hit set of a search query.

Table 5.7. Truncation Attributes (type 5)

Truncation	Value	Notes
Right truncation	1	supported
Left truncation	2	supported
Left and right truncation	3	supported
Do not truncate	100	default
Process # in search term	101	supported
RegExpr-1	102	supported
RegExpr-2	103	supported

The truncation attribute values 1-3 behave in the obvious way:

       Z> scan @attr 1=Body-of-text  schnittke
       ...
       * schnittke (81)
       schnittkes (31)
       schnittstelle (1)
       ...
       Z> find @attr 1=Body-of-text  @attr 5=1 schnittke
       ...
       Number of hits: 95, setno 7
       ...
       Z> find @attr 1=Body-of-text  @attr 5=2 schnittke
       ...
       Number of hits: 81, setno 6
       ...
       Z> find @attr 1=Body-of-text  @attr 5=3 schnittke
       ...
       Number of hits: 95, setno 8
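At the level of individual index terms, truncation values 1-3 and the default 100 correspond to prefix, suffix, substring and exact matching. The hypothetical helper below applies them to the three terms from the scan excerpt above; a query's hit set is then the set of documents containing any matching term, so hit counts cannot simply be added per term.

```python
def truncation_matches(index_term, search_term, truncation):
    """Does index_term match search_term under truncation value 1-3 or 100?"""
    if truncation == 1:        # right truncation: search term is a prefix
        return index_term.startswith(search_term)
    if truncation == 2:        # left truncation: search term is a suffix
        return index_term.endswith(search_term)
    if truncation == 3:        # left and right truncation: substring
        return search_term in index_term
    if truncation == 100:      # do not truncate: exact term
        return index_term == search_term
    raise ValueError(f"unsupported truncation value {truncation}")


scan_list = ["schnittke", "schnittkes", "schnittstelle"]  # terms from the scan above
for value in (100, 1, 2, 3):
    print(value, [t for t in scan_list if truncation_matches(t, "schnittke", value)])
```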

The truncation attribute value Process # in search term (101) is a poor man's regular expression search. It maps each # to .*, and then performs a Regexp-1 (102) regular expression search. The following two queries are equivalent:

       Z> find @attr 1=Body-of-text  @attr 5=101 schnit#ke
       Z> find @attr 1=Body-of-text  @attr 5=102 schnit.*ke
       ...
       Number of hits: 89, setno 10

The truncation attribute value Regexp-1 (102) is a normal regular expression search; see Section 3.6, “Zebra Regular Expressions in Truncation Attribute (type = 5)” for details.

       Z> find @attr 1=Body-of-text  @attr 5=102 schnit+ke
       Z> find @attr 1=Body-of-text  @attr 5=102 schni[a-t]+ke

The truncation attribute value Regexp-2 (103) is a Zebra-specific extension which allows fuzzy matches. A single spelling error in the search term is allowed, i.e., a document is hit if it includes a term which can be mapped to the search term by one character substitution, addition, deletion or change of position.

       Z> find @attr 1=Body-of-text  @attr 5=100 schnittke
       ...
       Number of hits: 81, setno 14
       ...
       Z> find @attr 1=Body-of-text  @attr 5=103 schnittke
       ...
       Number of hits: 103, setno 15
       ...
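The "one error" rule can be expressed as a check on a pair of terms. In this sketch, "change of position" is read as swapping two adjacent characters, which is our assumption; the helper `within_one_edit` is hypothetical and only shows which term pairs qualify, not how Zebra finds them in its index.

```python
def within_one_edit(term, query):
    """True if term equals query or differs by one substitution,
    insertion, deletion, or swap of two adjacent characters."""
    if term == query:
        return True
    len_t, len_q = len(term), len(query)
    if abs(len_t - len_q) > 1:
        return False
    if len_t == len_q:
        diffs = [i for i in range(len_t) if term[i] != query[i]]
        if len(diffs) == 1:
            return True  # one substitution
        if len(diffs) == 2:
            i, j = diffs
            # Adjacent transposition: the two differing characters are swapped.
            return j == i + 1 and term[i] == query[j] and term[j] == query[i]
        return False
    # Lengths differ by one: deleting one character from the longer
    # string must yield the shorter one (insertion or deletion).
    longer, shorter = (term, query) if len_t > len_q else (query, term)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


for candidate in ["schnittke", "schnitke", "schnittkes", "schnitkte", "schnuttke", "schnittstelle"]:
    print(candidate, within_one_edit(candidate, "schnittke"))
```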

2.4.5. Completeness Attributes (type = 6)

The completeness attribute (type = 6) is used to specify whether a given search term or term list is one of the terms of a given index/field (Incomplete subfield (1)), or is literally the entire field's content as found in the index (Complete field (3)).

Table 5.8. Completeness Attributes (type = 6)

Completeness	Value	Notes
Incomplete subfield	1	default
Complete subfield	2	deprecated
Complete field	3	supported

The completeness attribute (type = 6) is only partially and conditionally supported, in the sense that it is ignored if the hit index is not of structure type="w" or type="p".

Incomplete subfield (1) is the default, and makes Zebra use register type="w", whereas Complete field (3) triggers search and scan in index type="p".

The Complete subfield (2) value is a remnant of the MARC binary format days. Zebra does not support it, but silently maps it to Complete field (3).
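The difference between the "w" and "p" registers can be pictured with a toy field value. This sketch assumes naive lowercasing and whitespace tokenization in place of Zebra's configured character mapping; all names are hypothetical. It shows that a single word hits with Incomplete subfield (1) but not with Complete field (3), which only matches the whole field.

```python
def w_register_terms(field_value):
    """Word register ("w"): one index term per word (naive tokenization)."""
    return field_value.lower().split()


def p_register_terms(field_value):
    """Phrase register ("p"): the whole field as a single index term."""
    return [" ".join(field_value.lower().split())]


def register_for(completeness):
    """Which register a completeness value (type 6) selects."""
    if completeness == 1:
        return "w"
    if completeness in (2, 3):   # 2 is silently mapped to 3
        return "p"
    raise ValueError(f"unsupported completeness value {completeness}")


title = "Information Retrieval Today"
registers = {"w": w_register_terms(title), "p": p_register_terms(title)}


def hit(search_term, completeness):
    return search_term.lower() in registers[register_for(completeness)]


print(hit("retrieval", 1), hit("retrieval", 3), hit("information retrieval today", 3))
# True False True
```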

Note

The exact mapping between PQF queries and Zebra internal indexes and index types is explained in Section 3.5, “Mapping from PQF atomic APT queries to Zebra internal register indexes”.