Spatial Analytics

How to use QGIS spatial algorithms with python scripts?

Nikhil Hubballi

QGIS is one of the first tools you come across when you learn about GIS and spatial analysis. This open-source package can handle almost every aspect of spatial data and its processing. Although it has an extensive GUI, sometimes you need to work through scripts instead. Data scientists and data engineers building workflows, in particular, need automated scripts that scale. QGIS offers a Python API called PyQGIS for this very purpose. You can automate most QGIS actions and spatial algorithms through Python scripts. Let's explore this Python API and learn how to use the QGIS spatial algorithms from Python.

If you would like to read more about how location (geospatial) data is rising in importance and is the way forward in data analytics, check out my article on the topic here.

QGIS, as part of its Python API, offers a Python console inside the application. You can use it to access almost anything, from QGIS menus and layers to running algorithms on the data layers. The console does a decent job of handling small-scale execution of functions. But if the goal is to work with complex workflows and larger datasets, the console loses its lustre. It has basic functionality and lacks the sophistication required for complex workflows.

You can't import the QGIS library from your default Python environment. QGIS ships with its own version of Python, which holds all the modules required to run the software. So if you need to use the QGIS library from a Python console or Jupyter notebook, you need to make sure your Python can find the QGIS library paths. Alternatively, you can install the QGIS library in your Python environment. Let's look at the options in more detail.

Install QGIS libraries using Conda

If you use Anaconda to manage Python libraries and work on data science projects, you'll most likely be aware of conda. Similar to pip, conda is a package management system for Python and a few other languages. Using conda, you can install the QGIS package like any other Python library. You can install the package directly in your default (read 'global') Python environment. But QGIS has specific requirements for its dependency modules, so installing it might upgrade or downgrade critical packages and create dependency version conflicts for your other projects.

Ideally, if you use Python for different projects, set up an environment for each project or at least one for the data science workflow. By keeping these separate from the global python environment, you'll keep your system free of package dependency related errors. Therefore the best option to install QGIS libraries is to do it in a virtual environment. It helps in isolating your QGIS packages from the global Python environment.

To install QGIS from conda, activate your Python virtual environment and run the following command from the terminal. This command installs the necessary QGIS libraries.


conda install -c conda-forge qgis
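
A quick way to confirm that the active environment can actually see the QGIS bindings is to ask Python's import system, without importing anything heavy. The sketch below is only an illustration and not part of the original workflow; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported from the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for modules in an unusual or broken state
        return False


if __name__ == "__main__":
    status = "found" if module_available("qgis") else "NOT found"
    print(f"qgis: {status}")
```

If this reports that `qgis` is not found, the environment you are running from is probably not the one you installed the package into.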

Map the QGIS libraries from your Virtual Python Environment

The above method installs a few core QGIS libraries for use with your Python scripts. But this still won't give you access to all of the QGIS spatial algorithms you use in the Desktop package. For example, a GRASS or SAGA processing algorithm that's available in the Processing Toolbox of QGIS can't be used with just the QGIS core libraries installed. Also, if you already use the desktop software regularly, you'll know that the installation is heavy and storage consuming, and installing such a big package for every project environment takes up a lot of storage.

Instead, you can use a much simpler solution. You can use the existing installation of QGIS Desktop and its libraries (even GRASS, SAGA and other algorithms installed using QGIS plugins) by mapping its system environment paths from your default Virtual Python environment. You can use one existing QGIS installation for multiple Python environments without the problem of dependency packages, all while using more than just the core libraries of QGIS.

In this case, I'm demonstrating the process on macOS. But you can follow similar steps on Linux and Windows with slight modifications.

Step 1: Fetch System Paths & OS Environment Variables from QGIS Python Console

Open the QGIS Desktop app and open the Python console. Then run the following lines first to export system paths to a CSV file.


import sys
import pandas as pd
paths = sys.path
df = pd.DataFrame({'paths':paths})
df.to_csv('./qgis_sys_paths.csv', index=False)

Once you have exported the system paths, you need to export the environment variables from QGIS as well.


import os
import json
env = dict(os.environ)
# session-specific variables that should not be copied to other environments
rem = ['SECURITYSESSIONID', 'LaunchInstanceID', 'TMPDIR']
for r in rem:
    env.pop(r, None)

with open('./qgis_env.json', 'w') as f:
    json.dump(env, f, ensure_ascii=False, indent=4)
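
If pandas isn't available in your QGIS console, the same two exports can be done with the standard library alone. The following is a sketch of that variation, not the approach used in the rest of this post: it writes both the paths and the environment variables to a single JSON file. The function name `export_qgis_settings`, the file name in the usage comment, and the single-file layout are assumptions made for illustration.

```python
import json
import os
import sys

# Session-specific macOS variables that should not be copied (same list as above)
SKIP_VARS = ("SECURITYSESSIONID", "LaunchInstanceID", "TMPDIR")


def export_qgis_settings(out_file, skip=SKIP_VARS):
    """Write sys.path and os.environ (minus `skip`) to one JSON file."""
    env = {k: v for k, v in os.environ.items() if k not in skip}
    settings = {"paths": [str(p) for p in sys.path], "env": env}
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(settings, f, indent=4)
    return settings


# Run inside the QGIS Python console, for example:
# export_qgis_settings(os.path.expanduser("~/qgis_settings.json"))
```

Writing to an absolute path such as the home directory avoids guessing which working directory the QGIS console is using.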

Ideally, the Python version you use should be the same as, or earlier than, the one bundled with the QGIS installation. To find the Python version installed with QGIS, run the following code in the QGIS Python console, then look for the corresponding Python executable in the path '/Applications/Qgis.app/Contents/MacOS/bin/' on macOS.


from platform import python_version
print(python_version())
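
To make the version comparison concrete, the sketch below checks whether the interpreter of your virtual environment is the same as, or older than, the one bundled with QGIS. It compares only the major and minor numbers. The helper names `major_minor` and `is_compatible` and the example version string are illustrative assumptions.

```python
import sys


def major_minor(version: str) -> tuple:
    """Turn a version string such as '3.8.7' into (3, 8)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_compatible(env_version: str, qgis_version: str) -> bool:
    """True if the environment's Python is the same as or older than QGIS's,
    within the same major version."""
    env, qgis = major_minor(env_version), major_minor(qgis_version)
    return env[0] == qgis[0] and env <= qgis


# Version of the interpreter running this script
current = "{}.{}.{}".format(*sys.version_info[:3])
# "3.8.7" is an example value; replace it with the one printed in the QGIS console
print(current, "compatible with QGIS Python 3.8.7:", is_compatible(current, "3.8.7"))
```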

Step 2: Initialise QGIS libraries in Python Script before using its Algorithms

Before we can use the QGIS libraries and their spatial algorithms in a Python script, we need to set up the environment variables and paths we just exported. We also need to initialise the QGIS processing module.

First, we import a few necessary python libraries to deal with setting up the environment.


# necessary imports
import os
import sys
import json
import pandas as pd

Once we import these libraries, we need to set the environment variables and system paths.


# set up system paths
qspath = './qgis_sys_paths.csv' # provide the path where you saved this file.
paths = pd.read_csv(qspath).paths.tolist()
sys.path += paths

# set up environment variables
qepath = './qgis_env.json'
with open(qepath, 'r') as f:  # close the file once it has been read
    js = json.load(f)
for k, v in js.items():
    os.environ[k] = v

# In special cases, we might also need to map the PROJ_LIB to handle the projections
# for mac OS
os.environ['PROJ_LIB'] = '/Applications/Qgis.app/Contents/Resources/proj'
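
The set-up above can also be wrapped into a reusable function. The sketch below is one possible way to do it, assuming the two files were exported as in Step 1. It uses the standard library `csv` module instead of pandas and skips paths that are already on `sys.path`. The function name `load_qgis_settings` is hypothetical.

```python
import csv
import json
import os
import sys


def load_qgis_settings(paths_csv, env_json):
    """Append exported QGIS paths to sys.path and apply exported env variables.

    Returns the list of paths that were newly added.
    """
    # The CSV has a single 'paths' column, as written in Step 1
    with open(paths_csv, newline="", encoding="utf-8") as f:
        exported = [row["paths"] for row in csv.DictReader(f) if row["paths"]]

    added = []
    for p in exported:
        if p not in sys.path:  # avoid duplicates on repeated calls
            sys.path.append(p)
            added.append(p)

    with open(env_json, encoding="utf-8") as f:
        for key, value in json.load(f).items():
            os.environ[key] = str(value)

    return added
```

Calling it a second time with the same files adds nothing new, which is handy in notebooks where cells get re-run.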

Then, we can actually import the QGIS libraries from our python.


# qgis library imports
import PyQt5.QtCore
from osgeo import gdal  # a plain `import gdal` fails with recent GDAL releases
import qgis.PyQt.QtCore
from qgis.core import (QgsApplication,
                       QgsProcessingFeedback,
                       QgsProcessingRegistry)
from qgis.analysis import QgsNativeAlgorithms

In the next step, we initialise the processing module and its algorithms by adding the Native algorithms of QGIS to the processing registry.


feedback = QgsProcessingFeedback()

# initializing processing module
QgsApplication.setPrefixPath(js['HOME'], True)
qgs = QgsApplication([], False)
qgs.initQgis() # use qgs.exitQgis() to exit the processing module at the end of the script.

# initialize processing algorithms
from processing.core.Processing import Processing
Processing.initialize()
import processing

QgsApplication.processingRegistry().addProvider(QgsNativeAlgorithms())

By this step, you have access to all of the QGIS libraries and their spatial algorithms to use from python. You can check all the algorithms you have access to by running the following code.


algs = dict()
for alg in QgsApplication.processingRegistry().algorithms():
    algs[alg.displayName()] = alg.id()
print(algs)
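
The printed dictionary can get long. As a small illustration, the helper below groups algorithm ids by their provider prefix (the part before the colon), which makes it easier to see which providers were actually loaded. `group_by_provider` is a hypothetical helper, and the sample ids stand in for the real `algs` values.

```python
from collections import defaultdict


def group_by_provider(alg_ids):
    """Group ids such as 'native:centroids' by the prefix before the colon."""
    groups = defaultdict(list)
    for alg_id in alg_ids:
        provider, _, name = alg_id.partition(":")
        groups[provider].append(name)
    return {provider: sorted(names) for provider, names in groups.items()}


# Sample ids for illustration; in practice pass algs.values()
sample = ["native:centroids", "native:buffer", "gdal:hillshade"]
print(group_by_provider(sample))
```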

Currently, these steps solve the import of algorithms from providers like QGIS native algorithms & GRASS. I'm still working on enabling SAGA and other plugins like Orfeo Toolbox etc., for use with Python. Keep checking this blog for updates, or if you know how, let me know.

Running the Algorithm

There are a lot of algorithms you can access from the library. You can run the algorithm help to see a description of any of them. The help output also lists the parameters the algorithm expects. To see the help, run the following code with the algorithm id:


processing.algorithmHelp("native:centroids")

# which would print this on the console.
"""
Centroids (native:centroids)

This algorithm creates a new point layer, with points representing the centroid of the geometries in an input layer.

The attributes associated to each point in the output layer are the same ones associated to the original features.

----------------
Input parameters
----------------

INPUT: Input layer

	Parameter type:	QgsProcessingParameterFeatureSource

	Accepted data types:
		- str: layer ID
		- str: layer name
		- str: layer source
		- QgsProcessingFeatureSourceDefinition
		- QgsProperty
		- QgsVectorLayer

ALL_PARTS: Create centroid for each part

	Parameter type:	QgsProcessingParameterBoolean

	Accepted data types:
		- bool
		- int
		- str
		- QgsProperty

OUTPUT: Centroids

	Parameter type:	QgsProcessingParameterFeatureSink

	Accepted data types:
		- str: destination vector file, e.g. 'd:/test.shp'
		- str: 'memory:' to store result in temporary memory layer
		- str: using vector provider ID prefix and destination URI, e.g. 'postgres:…' to store result in PostGIS table
		- QgsProcessingOutputLayerDefinition
		- QgsProperty

----------------
Outputs
----------------

OUTPUT: <QgsProcessingOutputVectorLayer>
			Centroids
"""

So, to run the centroid algorithm on any vector layer, we know we have to supply three parameters, each with its accepted data types. Let's run a simple centroid algorithm on a vector file containing a few polygons.

A sample vector file containing grid polygons over the city of Los Angeles. Photo by Author.

For the algorithm, we need to create a parameter dictionary and once done we run the centroid algorithm.


parameters = {
    'ALL_PARTS': 1,
    'INPUT': './grid-polygons.geojson',
    'OUTPUT': './grid-centroids.geojson'
}
processing.run("native:centroids", parameters)

# this would print out the output, once the algorithm is run.
"""
{'OUTPUT': './grid-centroids.geojson'}
"""

And we can visualise and see the final output of grid centroids on the QGIS Desktop app.

A centroid was created for each of the grid polygons over the city of Los Angeles. Photo by Author.
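
When the same centroid run becomes part of a larger workflow, it helps to fail early on a missing input file. The sketch below wraps the call in a small function. The runner is passed in as an argument (in practice `processing.run`), so the function can be exercised without QGIS. The name `run_centroids` and its signature are assumptions for illustration.

```python
import os


def run_centroids(run, input_path, output_path, all_parts=True):
    """Build the parameter dictionary for native:centroids and call `run`.

    `run` is expected to behave like processing.run(algorithm_id, parameters).
    """
    if not os.path.exists(input_path):
        raise FileNotFoundError(f"Input layer not found: {input_path}")
    parameters = {
        "ALL_PARTS": bool(all_parts),
        "INPUT": input_path,
        "OUTPUT": output_path,
    }
    return run("native:centroids", parameters)


# With QGIS initialised as in Step 2:
# result = run_centroids(processing.run,
#                        "./grid-polygons.geojson",
#                        "./grid-centroids.geojson")
```

Passing the runner in keeps the parameter-building logic separate from QGIS itself, so it can be checked with a stand-in function before running on real data.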

Things to be aware of

  • Ideally, keep the Python version of your virtual environment the same as, or earlier than, the one installed with QGIS. Otherwise, you might run into issues with modules that were built for an earlier version of Python.
  • Don't forget to exit the QGIS module you initialised by running qgs.exitQgis() where qgs is the variable you used to initialise the module.
  • For Windows users, since there's an OSGeo4W shell, this entire process is handled slightly differently and is not covered here.
  • On M1 MacBooks, QGIS is installed for the Intel architecture and runs through Rosetta, while the globally installed Python is built for the arm64 architecture. Even if you use Anaconda Python built for Intel, there are still libraries (esp. data science & spatial data science) that can't be installed. It's essential to match the architecture of the installed modules so that you can use QGIS and other libraries together from Python.
  • If you find that your global Python installation doesn't match the one in QGIS, esp. on M1 MacBooks, you can use the QGIS Python itself for your data science workflow. Since it's built for spatial data science needs, there's not much you'd have to add to use it for your projects. You can get this Python path by following step 1 above.
    macOS: /Applications/Qgis.app/Contents/MacOS/bin/python[version]
    linux: /usr/share/qgis/python
  • You can use the QGIS Python mentioned above to install Spyder and Jupyter notebooks with pip for use in your daily workflows.
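
One way to make sure `exitQgis()` is always called, even when a script fails midway, is a context manager. The sketch below is an illustration of that idea: `qgis_session` is a hypothetical helper that takes a factory returning an application object (for example `lambda: QgsApplication([], False)`) and relies only on the `initQgis()` and `exitQgis()` calls used earlier.

```python
from contextlib import contextmanager


@contextmanager
def qgis_session(make_app):
    """Initialise the object returned by `make_app` and always shut it down."""
    app = make_app()
    app.initQgis()
    try:
        yield app
    finally:
        # Runs on normal exit and on exceptions alike
        app.exitQgis()


# Usage with QGIS available:
# from qgis.core import QgsApplication
# with qgis_session(lambda: QgsApplication([], False)) as qgs:
#     ...  # initialise processing and run algorithms here
```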

I hope this was useful in setting up and using QGIS libraries with your daily python workflows. I'll keep updating this part as I find more information, and I'd be happy to receive any suggestions to improve this further.

Nikhil Hubballi

Hi there. My name is Nikhil Hubballi, and I'm a Data Scientist with a background in Space Sciences. Currently, as a Senior Data Scientist on the Spatial Analytics team at PwC AC Kolkata, I work with geospatial data to derive actionable insights.