As root install:
yum install redis
and start redis:
systemctl enable redis
systemctl start redis
As user ckan:
. default/bin/activate
cd default/src
git clone https://github.com/ckan/ckanext-harvest.git
cd ckanext-harvest/
pip install -e .
pip install -r pip-requirements.txt
Edit file /etc/ckan/default/production.ini and add the harvester related plugins:
ckan.plugins = [...] harvest ckan_harvester
ckan.harvest.mq.type = redis
Init the db for the harvester services:
paster --plugin=ckanext-harvest harvester initdb --config=/etc/ckan/default/production.ini
Running harvesting procedure requires issuing a couple of command lines. It’s handy to create a script file that runs them. We’ll use the same script to run the cron’ed harvest.
Create the file /usr/lib/ckan/run_harvester.sh and add the following lines:
#!/bin/bash
. /usr/lib/ckan/default/bin/activate
paster --plugin=ckanext-harvest harvester job-all --config=/etc/ckan/default/production.ini
paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini
and make it executable:
chmod +x /usr/lib/ckan/run_harvester.sh
Important
The “official” script would only require the line with harvester run. The line containing the harvester job-all is an additional command that will force harvesting from all the configured sources.
See ckan_harvesting_running for further details.
Add a cron job for the harvester:
crontab -e -u ckan
Add in the crontab the following line to run the harvesting every 15 minutes:
*/15 * * * * /usr/lib/ckan/run_harvester.sh
CKAN does not provide a default script for autostarting; we’ll use the supervisord daemon to do that.
You should already have installed it when installing CKAN at supervisord configuration
As root create the file /etc/supervisord.d/gather.ini and put in it these lines:
[program:ckan_gather_consumer]
command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini
user=ckan
autostart=true
autorestart=true
numprocs=1
log_stdout=true
log_stderr=true
stdout_logfile=/var/log/ckan/gather_out.log
stderr_logfile=/var/log/ckan/gather_err.log
logfile=/var/log/ckan/gather.log
startsecs=10
startretries=3
Create file /etc/supervisord.d/fetch.ini and put in this content:
[program:ckan_fetch_consumer]
command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini
user=ckan
autostart=true
autorestart=true
numprocs=1
log_stdout=true
log_stderr=true
stdout_logfile=/var/log/ckan/fetch_out.log
stderr_logfile=/var/log/ckan/fetch_err.log
logfile=/var/log/ckan/fetch.log
startsecs=10
startretries=3
Then restart supervisord:
systemctl restart supervisord