ODN Manual 1.0.0 wip

Table Of Contents

Previous topic

DataStore plugin

Next topic

Spatial plugin

This Page

Harvester plugin

As root install:

yum install redis

and start redis:

systemctl enable redis
systemctl start redis

Installing the CKAN harvester

As user ckan:

. default/bin/activate
cd default/src
git clone https://github.com/ckan/ckanext-harvest.git
cd ckanext-harvest/
pip install -e .
pip install -r pip-requirements.txt

Edit file /etc/ckan/default/production.ini and add the harvester related plugins:

ckan.plugins = [...] harvest ckan_harvester
ckan.harvest.mq.type = redis

Init the db for the harvester services:

paster --plugin=ckanext-harvest harvester initdb --config=/etc/ckan/default/production.ini

Script harvesting

Running harvesting procedure requires issuing a couple of command lines. It’s handy to create a script file that runs them. We’ll use the same script to run the cron’ed harvest.

Create the file /usr/lib/ckan/run_harvester.sh and add the following lines:

#!/bin/bash

. /usr/lib/ckan/default/bin/activate

paster --plugin=ckanext-harvest harvester job-all --config=/etc/ckan/default/production.ini
paster --plugin=ckanext-harvest harvester run     --config=/etc/ckan/default/production.ini

and make it executable:

chmod +x /usr/lib/ckan/run_harvester.sh

Important

The “official” script would only require the line with harvester run. The line containing the harvester job-all is an additional command that will force harvesting from all the configured sources.

See ckan_harvesting_running for further details.

Periodic harvesting

Add a cron job for the harvester:

crontab -e -u ckan

Add in the crontab the following line to run the harvesting every 15 minutes:

*/15 * * * * /usr/lib/ckan/run_harvester.sh

supervisord configuration

CKAN does not provide a default script for autostarting; we’ll use the supervisord daemon to do that.

You should already have installed it when installing CKAN at supervisord configuration

As root create the file /etc/supervisord.d/gather.ini and put in it these lines:

[program:ckan_gather_consumer]
command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini
user=ckan
autostart=true
autorestart=true
numprocs=1
log_stdout=true
log_stderr=true
stdout_logfile=/var/log/ckan/gather_out.log
stderr_logfile=/var/log/ckan/gather_err.log
logfile=/var/log/ckan/gather.log
startsecs=10
startretries=3

Create file /etc/supervisord.d/fetch.ini and put in this content:

[program:ckan_fetch_consumer]
command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini
user=ckan
autostart=true
autorestart=true
numprocs=1
log_stdout=true
log_stderr=true
stdout_logfile=/var/log/ckan/fetch_out.log
stderr_logfile=/var/log/ckan/fetch_err.log
logfile=/var/log/ckan/fetch.log
startsecs=10
startretries=3

Then restart supervisord:

systemctl restart supervisord