Introduction
I've worked with CKAN for quite some time, and because it is so robust and flexible, I decided to write a short step-by-step guide on how you can set up and run CKAN instances with ease. Oh, and did I forget to mention that CKAN is open-source and completely free software?!
For those who aren’t already familiar with the Comprehensive Knowledge Archive Network (CKAN), it is a tool for making open data websites. Think of it as a content management system like WordPress – but for data, instead of pages and blog posts. It helps you manage and publish collections of data. It is used by national and local governments, research institutions, and other organizations that collect a lot of data.
Once your data is published, users can use its faceted search features to browse and find the data they need and preview it using maps, graphs, and tables – whether they are developers, journalists, researchers, NGOs, citizens, or even your staff.
Okay, enough boring stuff – let’s get to the point!
Prerequisites
This guide relies heavily on docker, so some docker knowledge will be useful but not essential, as we'll provide all docker images, steps, and commands.
First, let's install and set up git, docker, and docker-compose. To do so, follow the instructions in the following links:
Once you’ve successfully installed git, docker, and docker-compose we’re ready to go to the next step.
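If you want to double-check the installation before moving on, a small shell helper along these lines can do it. This is only a sketch: check_tools is a name made up for this post, and it only tests whether each command is on your PATH, not which version is installed.

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not part of docker-ckan.
# It prints every command from its arguments that is not on the PATH
# and returns a non-zero status if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_tools git docker docker-compose should print nothing when all three are installed.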
Setting up and running a docker-based CKAN
We at Keitaro understand the value of open-source software, so we created a dockerized CKAN setup that lets you run, update, and extend CKAN easily and efficiently in containers.
Our images are based on Alpine Linux and include only the extensions required to start a CKAN instance. The docker images are built using a multi-stage docker approach to produce slim, production-grade images with the right libraries and configuration. This multi-stage approach allows us to build python binary wheels in the build stages and then install them in the main stage.
The first step is to clone our repository by executing the following command in the terminal: git clone git@github.com:keitaroinc/docker-ckan.git
NOTE: we're cloning the repository over SSH, but feel free to use HTTPS instead with: git clone https://github.com/keitaroinc/docker-ckan.git
Once the cloning is complete, navigate to the newly created folder docker-ckan/compose and simply type: docker-compose up -d --build
And voilà, in a few minutes (depending on your connection), after all images are pulled and built, CKAN should be up and running.
To verify this, open your preferred browser and go to http://localhost:5000.
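CKAN can take a little while to answer after the containers start, so if you prefer the terminal you can poll it until it responds. The helper below is a rough sketch, assuming curl is installed; wait_for_url and its default of 30 attempts, 5 seconds apart, are made up for this post.

```shell
#!/bin/sh
# wait_for_url is a hypothetical helper. It polls a URL with curl until it
# answers successfully or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failures, -s: quiet,
        # -o /dev/null: discard the body, --max-time: cap each request
        if curl -fs --max-time 10 -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "gave up waiting for $url" >&2
    return 1
}
```

For example, wait_for_url http://localhost:5000 returns as soon as the site responds, or fails after roughly two and a half minutes.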
Simple enough, isn’t it?! Let’s try to explain what’s happening behind the scenes and how you can create your own custom images.
IMPORTANT: The default setup creates an admin user with the following credentials:
- user: sysadmin
- password: password
It is highly recommended that you change the default credentials!
The -d flag in docker-compose up -d --build runs the containers in the background. You can view the logs from all running containers with docker-compose logs -f, or with docker-compose logs -f SERVICE_NAME for a specific service, and stop following them by pressing Ctrl+C.
If we examine the docker-compose.yml file, we can see that it defines all the necessary services (the PostgreSQL database, solr, redis, datapusher, and CKAN) along with the configuration for each one.
If you have a technical background, you will immediately notice that our setup relies heavily on environment variables, which are used at build time or at runtime to configure CKAN. Runtime variables are located in .ckan-env, while build-time variables are in .env.
Configuring CKAN dynamically at runtime is made possible by the ckanext-envvars extension, which is pre-installed and enabled in our core CKAN image. This leads us to our next step: how to extend CKAN.
How to extend and modify default CKAN behavior by using extensions
As mentioned in the previous section, the ckanext-envvars extension allows us to configure CKAN on the fly. We'll make use of this and extend the core CKAN image with the ckanext-disqus extension, which lets users comment on datasets using Disqus.
Create a file named Dockerfile next to docker-compose.yml, which will contain our custom build, and add the following snippet:
###################
### Extensions ####
###################
FROM keitaro/ckan:2.9.1 as extbuild
# Switch to the root user
USER root
# Install any system packages necessary to build extensions
RUN apk add --no-cache python3-dev
# Locations and tags of additional CKAN extensions
ENV DISQUS_GIT_URL=https://github.com/keitaroinc/ckanext-disqus
ENV DISQUS_GIT_BRANCH=ckan-2.9
# Fetch and build the custom CKAN extensions
RUN pip wheel --wheel-dir=/wheels git+${DISQUS_GIT_URL}@${DISQUS_GIT_BRANCH}#egg=ckanext-disqus
############
### MAIN ###
############
FROM keitaro/ckan:2.9.1
# Add the custom extensions to the plugins list
ENV CKAN__PLUGINS="envvars image_view text_view recline_view datastore datapusher disqus"
# Switch to the root user
USER root
COPY --from=extbuild /wheels /srv/app/ext_wheels
# Install and enable the custom extensions
RUN pip install --no-index --find-links=/srv/app/ext_wheels ckanext-disqus && \
    ckan config-tool ${APP_DIR}/production.ini "ckan.plugins = ${CKAN__PLUGINS}" && \
    chown -R ckan:ckan /srv/app
# Remove wheels
RUN rm -rf /srv/app/ext_wheels
# Switch to the ckan user
USER ckan
Next, we need to edit the .ckan-env file and append disqus to the list of plugins on line 14:
CKAN__PLUGINS=envvars image_view text_view recline_view datastore datapusher disqus
as well as add configuration for disqus at the end of the file:
# Disqus
CKAN___DISQUS__NAME=some-disqus-name
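The variable names above follow the naming convention that ckanext-envvars uses to map environment variables to CKAN config keys. As we understand it, names are lowercased, a double underscore becomes a dot, and a CKAN___ prefix (three underscores) is dropped for keys that don't start with ckan. The helper below only mimics that convention for illustration; envvar_to_key is a made-up name and is not the extension's own code.

```shell
#!/bin/sh
# envvar_to_key is a hypothetical helper that imitates the naming
# convention described above. It is not part of ckanext-envvars.
envvar_to_key() {
    name=$1
    # A CKAN___ prefix (three underscores) marks a key that does not
    # start with "ckan.", so the prefix is dropped entirely.
    case $name in
        CKAN___*) name=${name#CKAN___} ;;
    esac
    # Double underscores become dots, then everything is lowercased.
    printf '%s\n' "$name" | sed 's/__/./g' | tr '[:upper:]' '[:lower:]'
}

envvar_to_key CKAN__PLUGINS         # prints ckan.plugins
envvar_to_key CKAN___DISQUS__NAME   # prints disqus.name
```

This is why the plugin list is set through CKAN__PLUGINS, while the Disqus setting, whose config key is disqus.name, needs the triple-underscore form.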
Before we can test our new extended image, we need to stop the running services with docker-compose down -v (note that -v also removes the volumes, so any data stored in this instance is deleted) and edit the docker-compose.yml file so that docker-compose builds the image from our new Dockerfile.
Open docker-compose.yml and replace line 12:
image: keitaro/ckan:${CKAN_VERSION}
with
build:
  context: .
  dockerfile: Dockerfile
With this change, instead of using our pre-built image, we're making sure that CKAN will be started using the custom Dockerfile that we just created. Now you can run docker-compose up -d --build again and browse the instance in which we installed and enabled the disqus extension.
To verify this, you can check the list of enabled plugins by visiting the following CKAN API URL: http://localhost:5000/api/3/action/status_show
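The same check can be scripted. The sketch below assumes the status_show response is a single line of JSON with the enabled plugins in an extensions array, and it uses plain text matching rather than a real JSON parser; lists_extension is a made-up helper name.

```shell
#!/bin/sh
# lists_extension is a hypothetical helper. It reads a status_show response
# on stdin and succeeds if the given plugin name appears in the
# "extensions" array. This is text matching, not real JSON parsing.
lists_extension() {
    # Keep only the contents of the extensions array, then look for the
    # plugin name as a quoted string.
    sed -n 's/.*"extensions": *\[\([^]]*\)].*/\1/p' | grep -q "\"$1\""
}
```

With curl installed, running curl -s http://localhost:5000/api/3/action/status_show | lists_extension disqus && echo "disqus is enabled" prints the message only when the plugin is listed.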
If you are eager to learn more, you can visit our docker-ckan repository and explore the codebase. You can find some more complex examples here.
We try to keep the documentation and readme up to date, but please do reach out if we missed something.
If you are interested in the pre-built images, you can also visit us on Docker Hub.
Final thoughts
To sum up, we've familiarized ourselves with our dockerized CKAN setup, used docker and multi-stage builds to build and extend CKAN, and used docker-compose to spin up an integrated environment for CKAN. But wait… there's more.
If by any chance you are using kubernetes and helm, we've got you covered! We also have our own CKAN helm chart that you can use to deploy a CKAN instance on your cluster. If you are interested in this, stay tuned, as we'll do another blog post on how you can use our CKAN helm chart.
Happy coding! 😀