ckanext-dataextractor is a CKAN extension for extracting, manipulating, and integrating data hosted on CKAN instances. It is configured through settings in the CKAN config file and comes with documentation for its API actions, so users can retrieve and refine datasets and feed them into their own data workflows.
This guide walks through installing the extension, configuring it, and running its tests within your CKAN environment.
Let’s start!
Requirements
For example, you might want to mention here which versions of CKAN this extension works with.
Installation
To install ckanext-dataextractor:
Activate your CKAN virtual environment, for example:
. /usr/lib/ckan/default/bin/activate
Install the ckanext-dataextractor Python package into your virtual environment:
pip install ckanext-dataextractor
Add dataextractor to the ckan.plugins setting in your CKAN config file (by default the config file is located at /etc/ckan/default/production.ini).
Restart CKAN. For example, if you’ve deployed CKAN with Apache on Ubuntu:
sudo service apache2 reload
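The plugin-activation step above amounts to appending one name to the space-separated `ckan.plugins` value. As a minimal sketch, assuming you manage the config value as a plain string, the helper below (`enable_plugin` is a hypothetical name, not part of CKAN or this extension) adds `dataextractor` only when it is not already listed:

```python
def enable_plugin(plugins_value, name="dataextractor"):
    """Return a ckan.plugins value with `name` appended if it is missing.

    `plugins_value` is the space-separated plugin list from the config file.
    """
    plugins = plugins_value.split()
    if name not in plugins:
        plugins.append(name)
    return " ".join(plugins)


# The plugin names below are only sample input.
print(enable_plugin("stats text_view datastore"))
```

Calling the helper on a value that already contains `dataextractor` returns the list unchanged, so it is safe to run more than once.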
Config Settings
Add Azure storage account settings:
ckanext.dataextractor.azure_storage_account_name = ...
ckanext.dataextractor.azure_storage_account_key = ...
ckanext.dataextractor.azure_storage_container_name = ...
Set the blob expiration period, in days:
ckanext.dataextractor.blob_expiration_days = ...
Set the limit on resource rows per page (default max is 10):
ckanext.dataextractor.resource_rows_limit = ...
Set the limit on the number of pagination pages shown (default max is 6):
ckanext.dataextractor.pagination_limit = ...
Limit the number of records shown when using the datastore_resource_search action (defaults to 10000):
ckanext.dataextractor.default_search_limit = ...
Set the query timeout (in milliseconds) for the datastore read-only account (defaults to 60000):
ckanext.dataextractor.query_timeout = ...
Override the default search limit and allow all data for a given resource to be retrieved or downloaded (defaults to False):
ckanext.dataextractor.enable_full_download = ...
Change the datastore root URL shown in the examples in the Data API window:
ckanext.dataextractor.datastore_root_url = ...
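To make the fallback behaviour of these settings concrete, here is a minimal Python sketch that parses a config fragment and applies the defaults stated above when a setting is absent. It is not the extension's own code: the `load_settings` helper and the sample values (7 days, 30000 ms) are hypothetical, the `[app:main]` section is where CKAN config files normally keep such settings, and only the defaults given explicitly as "defaults to" are encoded.

```python
import configparser

# Defaults taken from the settings list above.
INT_DEFAULTS = {
    "ckanext.dataextractor.default_search_limit": 10000,
    "ckanext.dataextractor.query_timeout": 60000,  # milliseconds
}
FULL_DOWNLOAD_KEY = "ckanext.dataextractor.enable_full_download"

# Hypothetical config fragment; the values are illustrative only.
SAMPLE_INI = """
[app:main]
ckan.plugins = datastore dataextractor
ckanext.dataextractor.blob_expiration_days = 7
ckanext.dataextractor.query_timeout = 30000
ckanext.dataextractor.enable_full_download = true
"""


def load_settings(ini_text):
    """Read the dataextractor settings, falling back to the defaults."""
    # Interpolation is disabled so '%' characters in a real config are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["app:main"]
    settings = {
        key: section.getint(key, fallback=default)
        for key, default in INT_DEFAULTS.items()
    }
    settings[FULL_DOWNLOAD_KEY] = section.getboolean(FULL_DOWNLOAD_KEY, fallback=False)
    return settings


for key, value in load_settings(SAMPLE_INI).items():
    print(key, "=", value)
```

In this sample the query timeout is overridden to 30000 ms, while the search limit is not set and therefore falls back to 10000.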
Development Installation
To install ckanext-dataextractor for development, activate your CKAN virtualenv and do:
git clone https://github.com/viderumglobal/ckanext-dataextractor.git
cd ckanext-dataextractor
python setup.py develop
pip install -r dev-requirements.txt
Documentation
To view the documentation for all API actions, open documentation/index.html.
To update or rebuild the documentation, see the guide for writing documentation.
Running the Tests
To run the tests, do:
nosetests --nologcapture --with-pylons=test.ini
To run the tests and produce a coverage report, first make sure you have coverage installed in your virtualenv (pip install coverage), then run:
nosetests --nologcapture --with-pylons=test.ini --with-coverage --cover-package=ckanext.dataextractor --cover-inclusive --cover-erase --cover-tests
Registering ckanext-dataextractor on PyPI
ckanext-dataextractor should be available on PyPI as https://pypi.python.org/pypi/ckanext-dataextractor. If that link doesn’t work, you can register the project on PyPI for the first time by following these steps:
- Create a source distribution of the project:
python setup.py sdist
- Register the project:
python setup.py register
- Upload the source distribution to PyPI:
python setup.py sdist upload
- Tag the first release of the project on GitHub with the version number from the setup.py file. For example, if the version number in setup.py is 0.0.1, then do:
git tag 0.0.1
git push --tags
Releasing a New Version of ckanext-dataextractor
ckanext-dataextractor is available on PyPI as https://pypi.python.org/pypi/ckanext-dataextractor. To publish a new version to PyPI, follow these steps:
- Update the version number in the setup.py file. See PEP 440 for how to choose version numbers.
- Create a source distribution of the new version:
python setup.py sdist
- Upload the source distribution to PyPI:
python setup.py sdist upload
- Tag the new release of the project on GitHub with the version number from the setup.py file. For example, if the version number in setup.py is 0.0.2, then do:
git tag 0.0.2
git push --tags
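Before tagging, it can help to confirm that the version string follows PEP 440. The sketch below is one way to do that in plain Python, using a pattern for the canonical version form described in PEP 440; `is_canonical_version` is a hypothetical helper name and is not part of this project or its release tooling.

```python
import re

# Pattern for the canonical public version form described in PEP 440:
# optional epoch, release segment, then optional pre/post/dev parts.
CANONICAL_VERSION = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical_version(version):
    """Return True if `version` is in PEP 440 canonical form."""
    return CANONICAL_VERSION.match(version) is not None


# Sample candidates: two release-style strings and one with a "v" prefix.
for candidate in ("0.0.2", "1.0rc1", "v0.0.2"):
    print(candidate, is_canonical_version(candidate))
```

Of the sample candidates, the first two are accepted and `v0.0.2` is rejected, because the canonical form has no `v` prefix.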
Conclusion:
Ckanext-dataextractor simplifies data extraction from CKAN, enabling users to efficiently retrieve and manipulate datasets. With its seamless integration, customizable configurations, and support resources, ckanext-dataextractor helps organizations unlock their CKAN-hosted data’s full potential.