Zebra - User's Guide and Reference

Adam Dickmeiss

Heikki Levanto

Marc Cromme

Mike Taylor

Sebastian Hammer

2.1.4

Abstract

Zebra is a free, fast, friendly information management system. It can index records in XML, SGML, MARC, e-mail archives and many other formats, and quickly find them using a combination of boolean searching and relevance ranking. Search-and-retrieve applications can be written using APIs in a wide variety of languages, communicating with the Zebra server using industry-standard information-retrieval protocols or web services.

This manual explains how to build and install Zebra, configure it appropriately for your application, add data and set up a running information service. It describes version 2.1.4 of Zebra.


Table of Contents

1. Introduction
1. Overview
2. Zebra Features Overview
2.1. Zebra Document Model
2.2. Zebra Search Features
2.3. Zebra Index Scanning
2.4. Zebra Document Presentation
2.5. Zebra Sorting and Ranking
2.6. Zebra Live Updates
2.7. Zebra Networked Protocols
2.8. Zebra Data Size and Scalability
2.9. Zebra Supported Platforms
3. References and Zebra-based Applications
3.1. Koha free open-source ILS
3.2. Kete Open Source Digital Library and Archiving software
3.3. ReIndex.Net web-based ILS
3.4. DADS - the DTV Article Database Service
3.5. ULS (Union List of Serials)
3.6. Various web indexes
4. Support
2. Installation
1. UNIX
2. GNU/Debian
2.1. GNU/Debian Linux on amd64/i386 Platform
2.2. GNU/Debian and Ubuntu on other architectures
3. Windows
4. Upgrading from Zebra version 1.3.x
3. Tutorial
1. A first OAI indexing example
2. Searching the OAI database by web service
3. Presenting search results in different formats
4. More interesting searches
5. Investigating the content of the indexes
6. Setting up a correct SRU web service
7. Searching the OAI database by Z39.50 protocol
4. Overview of Zebra Architecture
1. Local Representation
2. Main Components
2.1. Core Zebra Libraries Containing Common Functionality
2.2. Zebra Indexer
2.3. Zebra Searcher/Retriever
2.4. YAZ Server Frontend
2.5. Record Models and Filter Modules
2.5.1. DOM XML Record Model and Filter Module
2.5.2. ALVIS XML Record Model and Filter Module
2.5.3. GRS-1 Record Model and Filter Modules
2.5.4. TEXT Record Model and Filter Module
3. Indexing and Retrieval Workflow
4. Retrieval of Zebra internal record data
5. Query Model
1. Query Model Overview
1.1. Query Languages
1.1.1. Prefix Query Format (PQF)
1.1.2. Common Query Language (CQL)
1.2. Operation types
1.2.1. Explain Operation
1.2.2. Search Operation
1.2.3. Scan Operation
2. RPN queries and semantics
2.1. RPN tree structure
2.1.1. Attribute sets
2.1.2. Boolean operators
2.1.3. Atomic queries (APT)
2.1.4. Named Result Sets
2.1.5. Zebra's special access point of type 'string'
2.1.6. Zebra's special access point of type 'XPath' for GRS-1 filters
2.2. Explain Attribute Set
2.2.1. Use Attributes (type = 1)
2.2.2. Explain searches with yaz-client
2.3. BIB-1 Attribute Set
2.3.1. Use Attributes (type 1)
2.4. Zebra general Bib1 Non-Use Attributes (type 2-6)
2.4.1. Relation Attributes (type 2)
2.4.2. Position Attributes (type 3)
2.4.3. Structure Attributes (type 4)
2.4.4. Truncation Attributes (type = 5)
2.4.5. Completeness Attributes (type = 6)
3. Extended Zebra RPN Features
3.1. Zebra specific retrieval of all records
3.2. Zebra specific Search Extensions to all Attribute Sets
3.2.1. Zebra Extension Embedded Sort Attribute (type 7)
3.2.2. Zebra Extension Rank Weight Attribute (type 9)
3.2.3. Zebra Extension Term Reference Attribute (type 10)
3.2.4. Local Approximative Limit Attribute (type 11)
3.2.5. Global Approximative Limit Attribute (type 12)
3.3. Zebra specific Scan Extensions to all Attribute Sets
3.3.1. Zebra Extension Result Set Narrow (type 8)
3.3.2. Zebra Extension Approximative Limit (type 12)
3.4. Zebra special IDXPATH Attribute Set for GRS-1 indexing
3.4.1. IDXPATH Use Attributes (type = 1)
3.5. Mapping from PQF atomic APT queries to Zebra internal register indexes
3.5.1. Mapping of PQF APT access points
3.5.2. Mapping of PQF APT structure and completeness to register type
3.6. Zebra Regular Expressions in Truncation Attribute (type = 5)
4. Server Side CQL to PQF Query Translation
6. Administrating Zebra
1. Record Types
2. The Zebra Configuration File
3. Locating Records
4. Indexing with no Record IDs (Simple Indexing)
5. Indexing with File Record IDs
6. Indexing with General Record IDs
7. Register Location
8. Safe Updating - Using Shadow Registers
8.1. Description
8.2. How to Use Shadow Register Files
9. Relevance Ranking and Sorting of Result Sets
9.1. Overview
9.2. Static Ranking
9.3. Dynamic Ranking
9.3.1. Dynamically ranking using PQF queries with the 'rank-1' algorithm
9.3.2. Dynamically ranking CQL queries
9.4. Sorting
10. Extended Services: Remote Insert, Update and Delete
10.1. Extended services in the Z39.50 protocol
10.2. Extended services from yaz-client
10.3. Extended services from yaz-php
10.4. Extended services debugging guide
7. DOM XML Record Model and Filter Module
1. DOM Record Filter Architecture
2. DOM XML filter pipeline configuration
2.1. Input pipeline
2.2. Extract pipeline
2.3. Store pipeline
2.4. Retrieve pipeline
2.5. Canonical Indexing Format
2.5.1. Processing-instruction governed indexing format
2.5.2. Magic element governed indexing format
2.5.3. Semantics of the indexing formats
3. DOM Record Model Configuration
3.1. DOM Indexing Configuration
3.2. DOM Indexing MARCXML
3.3. DOM Indexing Wizardry
3.4. Debugging DOM Filter Configurations
8. ALVIS XML Record Model and Filter Module
1. ALVIS Record Filter
1.1. ALVIS Internal Record Representation
1.2. ALVIS Canonical Indexing Format
2. ALVIS Record Model Configuration
2.1. ALVIS Indexing Configuration
2.2. ALVIS Exchange Formats
2.3. ALVIS Filter OAI Indexing Example
9. GRS-1 Record Model and Filter Modules
1. GRS-1 Record Filters
1.1. GRS-1 Canonical Input Format
1.1.1. Record Root
1.1.2. Variants
1.2. GRS-1 REGX And TCL Input Filters
2. GRS-1 Internal Record Representation
2.1. Tagged Elements
2.2. Variants
2.3. Data Elements
3. GRS-1 Record Model Configuration
3.1. The Abstract Syntax
3.2. The Configuration Files
3.3. The Abstract Syntax (.abs) Files
3.4. The Attribute Set (.att) Files
3.5. The Tag Set (.tag) Files
3.6. The Variant Set (.var) Files
3.7. The Element Set (.est) Files
3.8. The Schema Mapping (.map) Files
3.9. The MARC (ISO2709) Representation (.mar) Files
4. GRS-1 Exchange Formats
5. Extended indexing of MARC records
5.1. The index-formula
5.2. Notation of index-formula for Zebra
5.2.1. Examples
10. Field Structure and Character Sets
1. The default.idx file
2. Charmap Files
3. ICU Chain Files
I. Reference
zebraidx — Zebra Administrative Tool
zebrasrv — Zebra Server
idzebra-config — Script to get information about idzebra
idzebra-abs2dom — Converts .abs files to DOM XML configuration files
A. License
B. GNU General Public License
1. Preamble
2. TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
2.1. Section 0
2.2. Section 1
2.3. Section 2
2.4. Section 3
2.5. Section 4
2.6. Section 5
2.7. Section 6
2.8. Section 7
2.9. Section 8
2.10. Section 9
2.11. Section 10
2.12. NO WARRANTY Section 11
2.13. Section 12
3. How to Apply These Terms to Your New Programs
C. About Index Data and the Zebra Server

List of Figures

7.1. DOM XML filter architecture

List of Tables

1.1. Zebra document model
1.2. Zebra search functionality
1.3. Zebra index scanning
1.4. Zebra document presentation
1.5. Zebra sorting and ranking
1.6. Zebra live updates
1.7. Zebra networked protocols
1.8. Zebra data size and scalability
1.9. Zebra supported platforms
4.1. Special Retrieval Elements
5.1. Attribute sets predefined in Zebra
5.2. Boolean operators
5.3. Atomic queries (APT)
5.4. Relation Attributes (type 2)
5.5. Position Attributes (type 3)
5.6. Structure Attributes (type 4)
5.7. Truncation Attributes (type 5)
5.8. Completeness Attributes (type = 6)
5.9. Zebra Search Attribute Extensions
5.10. Zebra Scan Attribute Extensions
5.11. Zebra specific IDXPATH Use Attributes (type 1)
5.12. Access point name mapping
5.13. Structure and completeness mapping to register types
5.14. Regular Expression Operands
5.15. Regular Expression Operators
6.1. Extended services Z39.50 Package Fields
7.1. DOM XML filter pipelines overview
10.1. Character maps predefined in Zebra

List of Examples

10.1. Field types
10.2. Indexing Greek text
10.3. MARCXML indexing using ICU