Installation instructions for the OCKHAM Harvest to Query Service (H2Q)
Version 0.5.3
Installing H2Q is fairly straightforward, although it still takes more manual steps than we'd like. Remember, we haven't hit the 1.0 version yet.
1) First, download the H2Q package from here:
http://wiki.osuosl.org/download/attachments/1797/h2q-0.5.3.tar.gz?version=1
2) Next, you will need to download and install the following prerequisites:
YAZ - Z39.50 client/server. http://indexdata.dk/yaz
The following Perl modules (all are available via RPM, apt-get, or CPAN):
- LWP::UserAgent
- Net::OAI::Harvester
- XML::SAX::Expat
- Date::Calc
Tcl scripting language - make sure that Tcl 8.3 or greater is installed, along with its development libraries (Zebra needs tcl-devel in order to build properly). Most Linux distributions should already have this installed. Go to http://tcl.sourceforge.net/ if you need to install a newer version.
Zebra - XML-based, high performance structured text indexing and retrieval engine. http://www.indexdata.dk/zebra/
Apache - Web server. H2Q has been tested with the latest 1.x version of Apache, but should work with all versions. http://httpd.apache.org
PHP - PHP scripting language (tested with the latest 4.x version). http://www.php.net/
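Before moving on, it can help to check which prerequisites are actually present. The following is a minimal sketch, not part of the H2Q package. It assumes that the tools are on your PATH under the names perl, tclsh, yaz-client, zebraidx, zebrasrv and php; your distribution may name or place them differently. The helper names missing_commands and missing_perl_modules are made up for this example.

```shell
#!/bin/sh
# Sketch: report which H2Q prerequisites appear to be missing.
# The command names used below are assumptions about a typical install.

# Print each given command name that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Print each given Perl module that perl cannot load
# (also printed when perl itself is not installed).
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

echo "Missing commands:"
missing_commands perl tclsh yaz-client zebraidx zebrasrv php
echo "Missing Perl modules:"
missing_perl_modules LWP::UserAgent Net::OAI::Harvester XML::SAX::Expat Date::Calc
```

An empty list under each heading means that nothing was reported missing. The script does not check version numbers or the Tcl development libraries.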
3) Untar the H2Q package directly into your htdocs directory. This should place the necessary files and directories in the location where your web server will access H2Q.
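As a rough sketch of step 3, the helper below lists the archive contents first (so you can see where the files will land) and then extracts them into the given directory. The function name h2q_unpack and the example htdocs path are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: unpack the H2Q tarball into the web server's document root.
# h2q_unpack is a hypothetical helper, not shipped with H2Q.

h2q_unpack() {
    tarball=$1
    htdocs=$2
    # List the archive first so you can see where the files will land.
    tar -tzf "$tarball" || return 1
    # Extract into the target directory.
    tar -xzf "$tarball" -C "$htdocs"
}

# Example call; adjust both paths to your system.
# h2q_unpack ./h2q-0.5.3.tar.gz /path/to/htdocs
```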
4) Both Apache and the owner of the Zebra process need write permission in the zebra subdirectory under htdocs. We recommend running both Apache and Zebra as the same user. Change the permissions of the zebra subdirectory so that it is writable by this user:
a. Change to the directory where you untarred the files (usually htdocs)
b. Make sure that Apache has permission to read the files in the 'zebra' directory
c. The owner of the Zebra server process needs write permission in the 'zebra' folder and in all of its subfolders.
d. Apache should have read/write/execute permissions for all H2Q folders:
- ./config
- ./records
- ./status
- ./zebra
- ./zebra/database
- ./zebra/shadow
e. Apache should have write permission on all *.LCK files inside the ./zebra folder
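The permission changes in step 4 could be scripted roughly as follows. This sketch assumes the recommended setup, where a single user runs both Apache and Zebra and already owns the files. If the files belong to someone else, you will need chown (as superuser) or group permissions instead. The function name h2q_fix_perms is made up for this example.

```shell
#!/bin/sh
# Sketch: give the owning user the permissions described in step 4.
# Assumes the files are owned by the one user that runs Apache and Zebra.

h2q_fix_perms() {
    root=$1
    for d in config records status zebra zebra/database zebra/shadow; do
        # Skip directories that this installation does not have.
        [ -d "$root/$d" ] && chmod u+rwx "$root/$d"
    done
    # Lock files in the zebra directory must be writable as well.
    for f in "$root"/zebra/*.LCK; do
        [ -f "$f" ] && chmod u+w "$f"
    done
    return 0
}

# Example call; adjust the path to your own htdocs directory.
# h2q_fix_perms /path/to/htdocs
```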
5) Set up the global path to H2Q
The distributed files contain a hard-coded installation path. Replace '/home/trubin/public_html/' with the path to your own H2Q installation in both of the following files: scheduler.pl and scheduler.cfg
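If you would rather not edit the two files by hand, the substitution can be sketched with sed. The helper name h2q_set_path is hypothetical. The sketch assumes that your new path ends with a slash (like the original) and contains no '|' or '&' characters, since those are special in the sed expression used here. A .bak copy of each file is kept.

```shell
#!/bin/sh
# Sketch: replace the hard-coded H2Q path in the given files.
# The new path must not contain '|' or '&'.

h2q_set_path() {
    new_path=$1
    shift
    for f in "$@"; do
        # Keep a backup, then rewrite the original in place so that
        # its permissions are preserved.
        cp "$f" "$f.bak" || return 1
        sed "s|/home/trubin/public_html/|$new_path|g" "$f.bak" > "$f" || return 1
    done
}

# Example call; run it from the directory that holds the two files.
# h2q_set_path /var/www/htdocs/ scheduler.pl scheduler.cfg
```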
6) Configure cron (the Unix scheduler). If the Apache username is different from yours, you will need to be a superuser to run crontab. Run the command 'crontab -u apache_username ./scheduler.cfg' (NOTE: in some Linux distributions, the command syntax is 'crontab ./scheduler.cfg -u apache_username'). Be aware that installing a crontab from a file replaces any existing crontab for that user.
7) You now need to make 2 symbolic links in the zebra directory under your htdocs folder, one for each Zebra binary. From inside that directory, the 2 commands should look like the following:
ln -s /LOCATION/TO/ZEBRAIDX/zebraidx
ln -s /LOCATION/TO/ZEBRASRV/zebrasrv
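If the Zebra binaries are already on your PATH, you can let the shell find them instead of typing their locations. This is only a sketch: h2q_link_binary is a made-up helper, and it assumes that 'command -v' resolves the name to a full path.

```shell
#!/bin/sh
# Sketch: create a symbolic link to a binary found on PATH.

h2q_link_binary() {
    name=$1
    dest_dir=$2
    # Look up the binary; give up with a message if it is not on PATH.
    target=$(command -v "$name") || {
        echo "$name not found on PATH" >&2
        return 1
    }
    ln -s "$target" "$dest_dir/$name"
}

# Example calls; adjust the path to your own htdocs/zebra directory.
# h2q_link_binary zebraidx /path/to/htdocs/zebra
# h2q_link_binary zebrasrv /path/to/htdocs/zebra
```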
8) Go into your htdocs/zebra directory, and run the zebra server by typing the following command:
./zebrasrv @:2100 &
Everything should now be running. Point your browser to http://yourhost/index.php and try harvesting an OAI repository. Once you have successfully harvested the repository, check the permissions of the files in the htdocs/zebra folder. Zebra will have created several new files, which are likely to be owned by www-data (or whatever user the web server is running as). Add write permission to them so that the Zebra background process, if it is owned by a different user, can write to them. Then fire up the YAZ client, connect to your Zebra server (remember, on port 2100 or whatever port you chose), and do a scan for a term. You should get back a list of terms. If so, everything is working as it should!
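The final check can be sketched as a short scripted session. The helper below only prints the client commands (open, scan, quit); piping them into yaz-client is shown as a commented example. The host, port and scan term are illustrative assumptions, and depending on how your index is configured you may need to add attributes to the scan term.

```shell
#!/bin/sh
# Sketch: build a short yaz-client session that scans the Zebra index.
# h2q_scan_commands is a hypothetical helper; host, port and term are examples.

h2q_scan_commands() {
    host=$1
    port=$2
    term=$3
    printf 'open %s:%s\n' "$host" "$port"
    printf 'scan %s\n' "$term"
    printf 'quit\n'
}

# Example: feed the commands to yaz-client on standard input.
# h2q_scan_commands localhost 2100 water | yaz-client
```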