
This Way Up: A Bottom-Up Look At Python Packaging

Packaging Python software and libraries is one of those topics which at first glance looks like it ought to be simple but very quickly turns out to be anything but. To further complicate things, the Python packaging landscape is in the midst of significant change, with new tools and standards appearing at a dizzying rate. If you’ve found yourself feeling confused and frustrated with Python packaging or wondering where (if anywhere) this story is going, this article may be for you.

TL;DR: The safest thing to do right now is to continue using setuptools and pip, but with a little background understanding many new tools can be used safely too.

Setuptools and pip remain part of the core Python project and are not going anywhere. They’re also experiencing a period of accelerated development and you may find that issues you’ve encountered in the past have since been resolved. Furthermore, ongoing packaging standards development prioritises backward (and forward) compatibility with these tools, making them a safe choice for new and existing projects.

Meanwhile, as Python packaging standards work progresses, the risk of adopting other tools is shrinking rapidly, though for now it remains a significant factor. With a little background understanding of what’s going on behind the scenes, however, it is possible to understand and manage these early adopter risks. Continue reading to learn more…

In this article we’ll dig into what is really happening under the hood in Python’s packaging machinery. Armed with this low-level knowledge, it becomes easier to understand what the myriad of high-level tools are really up to, and what the future may hold. As a bonus, you’ll hopefully gain some intuition for where to look when things go wrong, how to approach complex situations and avoid common pitfalls.

As a rough outline, the remainder of this article is organised as follows: we begin with a brief history of how Python packaging got to where it is today, then look at the sdist, wheel and (legacy) egg package formats, how packages are built (including the PEP 517 mechanism), the pyproject.toml and setup.cfg configuration files, how packages are installed and, finally, how installation locations, sys.path and virtual environments fit together.

Warning: Some parts of this article will quickly show their age!

This article was published in October 2021. Given the feverish pace of packaging standards development, things will have no doubt advanced by the time you read this. Whilst there will be new developments which go unmentioned here, we have avoided discussing anything but established tools or accepted final standards which are likely to remain relevant for some time to come.

How did we get here?

Before delving into any specifics, it is helpful to have a broad picture of how Python’s packaging ecosystem has evolved to reach where we are today.

In 1998, a group of core Python developers, then called the Distutils-SIG, set about the challenge of standardising the distribution of Python modules through the introduction of distutils. As a part of the standard library, distutils grew in popularity but, with releases tied to new Python versions, development proved awkward with limited opportunities to fix bugs or introduce new features.

Around 2004, a number of Distutils-SIG members began work on setuptools: an enhanced, drop-in replacement for distutils, living outside the standard library. In the years which followed, setuptools introduced a raft of concepts that we now take for granted: declaring a package’s dependencies in its metadata, automatically downloading and installing those dependencies from the Python Package Index (PyPI), distributing pre-built binary packages (eggs) and generating command line ‘entry point’ scripts.

As setuptools (and its spin-off tool, pip) became firmly established as the de-facto standard, distutils began to be phased out in 2014 (it is deprecated as of Python 3.10 and will finally be removed in Python 3.12).

Over time, some Python developers grew dissatisfied with setuptools and pip and began to develop numerous extensions and alternatives such as Flit, Poetry, PBR and setuptools_scm to name just a few. Worse, some projects even included their own forks of, or monkey-patched, distutils and setuptools (see NumPy for a prominent example). Unfortunately, due to the lack of a standard for Python packaging (beyond ‘whatever setuptools and pip do’) a serious risk of fragmentation arose. Spurred on by this, the Distutils-SIG, now called the Python Packaging Authority (PyPA), have been working hard to standardise Python packaging formats and mechanisms.

PyPA’s standardisation efforts have made huge progress and received wide community support but much still remains to be done. In the rest of this article we’ll highlight significant standards and prominent gaps and the implications for Python developers. Where possible we’ll also provide links to the relevant standards documents. These largely come in the form of either Python Enhancement Proposals (PEPs) or PyPA specifications. Fortunately, these are often remarkably accessible and filled with useful background information. Be aware, however, that due to the pace of development, it is not uncommon to find accepted specifications which have not yet been fully implemented in practice.

With the stage set, it’s time to dive into some details…

Package formats

Broadly speaking, there are two types of Python package: ‘sdist’ (‘source distribution’) source packages and ‘wheel’ binary packages (another of Python’s Monty Python references). Source packages contain code and some means to build and install it whilst binary packages come in a form which can be installed by just extracting the files as-is. These formats are both well defined and we’ll give a brief introduction to them below.

There is also a third, now deprecated format: the Python ‘egg’. You’re very unlikely to encounter this format directly, however much of setuptools and pip’s internal behaviour was influenced by this format. As a result, it’ll be helpful to talk a little about eggs too.

sdist (source package format)

The sdist (‘source distribution’) package format is defined by the remarkably brief PyPA ‘Source Distribution Format’ specification.

An sdist is a tarball with a name of the form {name}-{version}.tar.gz with the files laid out inside as follows: everything lives under a single top-level {name}-{version}/ directory, which contains a PKG-INFO metadata file alongside the package’s source files.

The PKG-INFO file contains just enough metadata to support online package indices like PyPI. Unfortunately a lot of useful package metadata, such as dependency information, is not included.

Details: The PKG-INFO file format.

The PKG-INFO file uses a slightly quaint email-derived file format to define metadata fields according to the PyPA Core Metadata Specifications. As a sample, a snippet from the PKG-INFO shipped with NumPy is shown below:

Metadata-Version: 1.2
Name: numpy
Version: 1.20.1
Summary:  NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
...

The remaining source files in an sdist usually amount to a direct checkout of the package’s source repository, stripped of any extraneous files unrelated to the package.

The sdist specification requires that there be some standardised means for building the software contained in an sdist. Historically, this has meant the inclusion of a setuptools setup.py script but, more recently, a pyproject.toml file may be used instead. Using this file, build tools other than setuptools can be specified. We’ll talk much more about what ‘building’ entails, as well as this new mechanism (opaquely referred to as PEP 517), later on.

wheels (binary package format)

The ‘wheel’ binary package format is defined in the PyPA binary distribution format specification (though it was historically defined by PEP 427). A wheel is an ordinary zip file with a name of the form {name}-{version}-{python version}-{python abi}-{os}.whl. As well as the package name and version, this indicates several important details defining what platform this binary was built for.

Details: Wheel filename parts.

The contents of the non-obvious fields in a wheel’s filename follow the standard defined in PEP 425. As an example, one of NumPy’s wheels is named numpy-1.20.1-cp39-cp39-macosx_10_9_x86_64.whl meaning: it contains version 1.20.1 of the numpy package, built for the CPython 3.9 interpreter (the first cp39), against the CPython 3.9 ABI (the second cp39), for macOS 10.9 or later on x86_64 (macosx_10_9_x86_64).

For wheels containing only platform-independent Python code, many of the filename parts are replaced with more generic values. For example, the (pure-Python) ‘requests’ package is named requests-2.25.1-py2.py3-none-any.whl meaning: it runs on both Python 2 and Python 3 (py2.py3), requires no particular Python ABI (none) and works on any operating system and architecture (any).
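
For illustration, here is a minimal Python sketch (not a robust parser) which splits a wheel filename into these parts. Note that real filenames may also contain an optional ‘build tag’ between the version and the Python tag, which this ignores:

def parse_wheel_filename(filename):
    # Strip the '.whl' suffix then split on '-': dashes in package
    # names are always escaped to underscores, so a simple split works.
    stem = filename[:-len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return name, version, python_tag, abi_tag, platform_tag

print(parse_wheel_filename("numpy-1.20.1-cp39-cp39-macosx_10_9_x86_64.whl"))
# -> ('numpy', '1.20.1', 'cp39', 'cp39', 'macosx_10_9_x86_64')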

Conceptually, a wheel is installed by simply extracting it somewhere import will look: no extra build steps are required. Some non-build-related steps, like installing dependencies, might still be necessary but we’ll return to this when we talk about package installation later on.

As well as ready-to-extract Python modules, wheels also include a metadata directory named {name}-{version}.dist-info. At a minimum this contains three files: METADATA, WHEEL and RECORD.

The METADATA file contains metadata describing the package. Unlike the PKG-INFO file found in sdists, this metadata is relatively complete, for example including a list of the package’s dependencies.

Details: METADATA file format.

Like the PKG-INFO file, the METADATA also uses the same email-derived file format and field names defined by the PyPA Core Metadata Specifications.

The WHEEL file contains metadata about the wheel itself such as the version of the wheel format used and whether the wheel contains any platform-specific binary files (e.g. compiled C code).

Finally, the RECORD file contains the name and cryptographic hash of every file in the wheel and is used for two distinct purposes. This file is primarily used to facilitate clean uninstallation of a package (since it lists all files to be removed). Its other purpose, however, is facilitating file integrity checks for wheels which are cryptographically signed.
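
In fact, the standard library’s importlib.metadata module (Python 3.8+) reads exactly these dist-info files for installed packages. As a quick sketch (assuming the ‘requests’ package is installed):

from importlib.metadata import metadata, files

meta = metadata("requests")           # parsed from the METADATA file
print(meta["Name"], meta["Version"])
print(meta.get_all("Requires-Dist"))  # the package's dependencies
print(files("requests")[:3])          # file list, read from RECORD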

Details: Why wheels for pure-Python packages?

You may be wondering what the benefit of the wheel format is for pure-Python packages over just using sdists. Whilst it’s true that a pure-Python wheel will contain more-or-less the same files as an sdist, installing an sdist still requires the execution of arbitrary code (either a setup.py file or some other build system). This step might even require the installation of the build system itself! Wheels, however, only require a simple unzipping operation. Furthermore, unlike sdists, wheels include important metadata, such as a list of package dependencies, which is essential for proper installation.

Python Eggs (legacy package format)

Before the wheel format came the Python ‘egg’ format (or rather egg formats: there are several). Eggs were introduced by setuptools with the original strapline ‘eggs are to Python as JARs are to Java’. As such, the ambitions for the egg format went well beyond mere software distribution and formed a pillar of the now (informally) deprecated pkg_resources module of setuptools. For example, pkg_resources defined mechanisms for locating egg packages at runtime and dynamically loading them into sys.path. Amongst other things, this enabled multiple versions of a package to be installed simultaneously, something now more commonly achieved using virtual environments (more on those later).

Details: Egg format specifications.

The egg format(s) are implementation-defined by setuptools, though an informal description is available in the setuptools documentation.

The pkg_resources module documentation is also found on the setuptools documentation site.

Whilst eggs pioneered many of the principles now used by wheels and many other parts of the Python ecosystem, the format (and pkg_resources) have numerous shortcomings. For example, unlike wheels, eggs do not precisely identify the platform on which any embedded binary code is able to run. Eggs also included Python bytecode files (e.g. *.pyc and *.pyo files) which inconveniently tied even pure-Python eggs to a particular version of Python.

Though their use is now strongly discouraged, the egg format and pkg_resources live on in the internals of setuptools and pip. Beyond the production of occasional egg-related log messages this implementation detail usually has few implications, though in difficult situations it can lead to confusion. As such, it’s useful to be aware when debugging obscure packaging issues that eggs and pkg_resources may be involved.

Details: Use of eggs by setuptools for editable installs.

Setuptools’ most visible use of eggs is for creating ‘editable installs’ (produced by pip install -e ... or python setup.py develop) – a task for which a standard mechanism remains to be defined. Specifically, the package’s source directory is turned into an ‘egg-info’ egg (through the creation of a special *.egg-info metadata directory) and an ‘egg-link’ egg is installed pointing to the source directory. We won’t go into further details here but you can read more about these egg mechanisms in the setuptools documentation. We’ll also discuss editable installs in general later on.

Building packages

Having described the ‘sdist’ and ‘wheel’ package formats, we now move our attention to how they are created (‘built’) from a source code checkout.

For pure Python packages, the build process boils down to gathering the required files and metadata and stuffing them into the appropriate archive format. Building ‘impure’ packages (including C code, for example) is complicated by the need to invoke compilers to produce binaries when making wheel packages.

Though builds might be triggered directly (for example when you’re creating a software release) they can also be initiated indirectly by tools such as pip. For example, when using pip to install a package from source, pip will invoke that package’s build system to produce a wheel which it then installs. As a result, it is important for build systems to expose a well-defined API that tools like pip can call on in addition to any user-facing interfaces they might have.

Historically the de-facto standard Python build ‘API’ was setuptools’ setup.py command line interface. Unfortunately this interface is complicated and full of setuptools-specific options making it difficult for other build systems to implement correctly. To address this, between 2015 and 2017 the PyPA standardised a new, simplified, Python-based API for build systems to expose in PEP 517.

Details: PEP 517 Build system API

If you’re curious: the minimum API a build system needs to implement is remarkably simple and includes just two functions: build_sdist and build_wheel.

These two functions take the name of a directory into which a built sdist or wheel is to be written and are then expected to ‘get on with it’.
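
As a purely illustrative sketch, a hypothetical build backend module might therefore look like the following (a real backend must, of course, actually produce valid archives in the directories it is given):

def build_sdist(sdist_directory, config_settings=None):
    # ...gather sources and metadata, write an sdist tarball
    # into sdist_directory...
    return "mypkg-1.0.tar.gz"  # filename of the sdist created

def build_wheel(wheel_directory, config_settings=None,
                metadata_directory=None):
    # ...build the project and write a wheel into wheel_directory...
    return "mypkg-1.0-py3-none-any.whl"  # filename of the wheel created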

PEP 517 from a user’s perspective

PEP 517 conceptually divides the build process into two parts:

  1. Preparing a build environment (e.g. an isolated virtual environment with the build software installed)
  2. Building the sdist or wheel

PEP 517 refers to the software doing step 1 as the ‘build frontend’ and the software doing the actual building (step 2) as the ‘build backend’.

For example, when pip installs a package from source using the PEP 517 mechanism, pip acts as a build frontend. First pip will create a temporary virtual environment for the build. Next it installs the build backend specified by the package (e.g. setuptools, Flit, Poetry, etc.) and then invokes it to produce a wheel. Finally pip will install the wheel and destroy the temporary virtual environment.

PEP 517 envisages that developers might choose to use a generic build frontend instead of, for example, setup.py, or Poetry and Flit’s native command line tools. PyPA have created such a generic frontend for this purpose, ungoogleably named ‘build’. Build’s minimalistic command line interface essentially consists of:

$ python -m build --sdist  # Build an sdist package
$ python -m build --wheel  # Build a wheel package

Like pip, the build command carries out its duties by quietly creating a temporary build environment, installing and running a build backend to produce whichever kind of package was requested.

Of course, it is not mandatory to use PyPA’s build command and using the build systems’ own interfaces is not discouraged. However using PyPA’s build command can simplify CI processes since they needn’t know which build system a given project is using. It also potentially saves you having to learn to use new build tools when working on an unfamiliar codebase.

PEP 517 from a developer’s perspective

The primary benefit of PEP 517 is that it frees 3rd party build systems from error-prone mimicry of setuptools’ behaviour. As a developer, this guarantees that no matter which (PEP 517 compliant) build system you choose for your project, your users (and tools like pip) will be able to build and install your software.

To take advantage of the PEP 517 build process your project must identify which build system is being used in a pyproject.toml file. The exact details vary from build tool to tool (see their documentation for details) but, as an example, the Flit build system is configured by the following lines:

[build-system]
requires = ["flit_core >=2,<4"]
build-backend = "flit_core.buildapi"

This specifies that the flit_core package is required for the build and that the build backend exposes its API in the flit_core.buildapi Python module. Of course, other build backends will use different names.

Details: Backward compatibility.

For backward compatibility with pre-PEP 517 projects, when a PEP 517 build front end encounters a project with no suitable pyproject.toml, it will fall back on looking for and running setup.py.

From a developer’s perspective, the choice of which build backend you use boils down to taste and functionality. For example, setuptools uses its setup.py or setup.cfg formats for configuration whilst Poetry and Flit define their own somewhat simpler formats. Likewise, setuptools provides sophisticated mechanisms for compiling non-Python code whilst Poetry and Flit only support building pure-Python packages. Further, whilst setuptools and Flit only provide a build system, Poetry provides a build system as just one part of a slew of other development tools.

Configuration files

Before moving on to discussing how our newly built Python packages are actually installed, let’s take a brief diversion onto the topic of configuration files, specifically the two mentioned in passing in the previous section: pyproject.toml and setup.cfg.

pyproject.toml

The pyproject.toml file format is defined in PEP 518 and is intended as a central place for Python build and general development tool related configuration. In the previous section we saw that PEP 517 uses a pyproject.toml file to define which build backend a project uses, but this isn’t the only use for this file.

Details: What is TOML?

TOML (Tom’s Obvious Minimal Language) is (yet another) human-readable configuration file format. PEP 518 outlines the thoughtful and sound reasoning which led to TOML being used over other formats such as JSON, YAML and INI.

As well as the [build-system] section (or ‘table’ in TOML parlance) defined by PEP 517, PEP 518 also states that any Python tool may define its own [tool.<PROJECT-NAME>] section. For example, both Flit and Poetry use their own tool.* section in the pyproject.toml for configuration. Setuptools is also considering moving to using pyproject.toml for its configuration with a longer term view of deprecating setup.py and setup.cfg (though continuing to support both for backward compatibility).

Since build backends request many of the same pieces of information in practice (e.g. the package name, description and dependencies), a [project] section has been defined by PEP 621. This provides a standardised place where package information may be placed, avoiding every tool having to define its own configuration options. At the time of writing, this PEP has only recently been accepted but support is appearing in a number of build backends.
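
For illustration, a [project] table for a hypothetical package might look something like the following (the field names are those defined by PEP 621; the package itself is made up):

[project]
name = "example-package"
version = "1.0.0"
description = "A hypothetical example package"
requires-python = ">=3.7"
dependencies = [
    "requests >=2,<3",
]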

setup.cfg

A long standing criticism of setuptools (and distutils before it) was its use of a Python script (setup.py), instead of a configuration file, to define build options. Recognising this, in 2016 setuptools introduced a new configuration format: setup.cfg (using an INI-like syntax).

Noticing this apparently ‘blessed’ configuration file, several popular Python tools started exploiting this file for their own configurations too. Unfortunately setuptools had never intended this file to be shared with other tools and so no formal namespaces were defined to prevent tools’ configurations from clashing. In fact, setuptools was not the first tool to begin using the setup.cfg filename and so there are already conflicting interpretations of this file in the wild.

Whilst setuptools is unlikely to remove support for setup.cfg files, it is likely that they will be deprecated in favour of pyproject.toml in the not too distant future. Even so, at the time of writing setup.cfg remains the preferred approach to configuring setuptools.

Package installation

Now that we’ve covered package formats and builds, the next logical topic is package installation. Whilst many aspects of package installation are now well specified, some details remain defined only by ‘whatever pip or setuptools do’. This situation continues to improve but in this section we’ll describe how things work today whilst pointing out where changes are likely to occur.

Broadly speaking, there are two kinds of installation: regular installations and editable (a.k.a. development) installations. We’ll look at each of these in turn. Historically, the egg format had its own installation conventions but we’ll omit those here (except as they apply to editable installations).

Regular (wheel) installations

Of the two package formats, only the wheel specification defines an installation mechanism. (Sdists are installed by first building a wheel and then installing that.)

Wheel installation is broken into two phases: ‘unpack’ and ‘spread’.

In the unpack phase, the wheel is unzipped into the installation location and the extracted files and metadata are checked for consistency. (We’ll discuss installation locations in a later section.)

In the spread phase, any scripts and data files are moved into their final locations (e.g. into a location on the user’s PATH). The installer may also modify the shebang of scripts as it installs them to point to the correct Python interpreter. Finally, all .py files are preemptively compiled into bytecode (producing .pyc files) to make import quicker. During all of these steps the RECORD metadata file unpacked from the wheel is updated to reflect the installed locations of all files.

Package uninstallation consists of removing all files listed in the RECORD metadata file.
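
In practice all of this happens behind a single pip command, for example (the wheel filename and package name here are hypothetical):

$ pip install ./example_pkg-1.0-py3-none-any.whl
$ pip uninstall example-pkg  # removes the files listed in RECORD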

A notable omission from the formally specified wheel installation procedure is the process of downloading and installing any missing package dependencies. At present this step is not rigorously specified (“just do what pip does”) but the metadata critical to this step is now fully specified (see the metadata format specifications and PEP 508 for version number range format specifications).
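
To give a flavour of PEP 508, dependency requirements are written as strings like the following (the package and extra names here are hypothetical):

requests >=2.25,<3
numpy ==1.20.*; python_version >= "3.7"
example-package [extra-feature] >=1.0; sys_platform == "win32"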

Details: Dependency installation by pip.

Though pip’s behaviour largely boils down to “find packages with suitable versions on PyPI and install them”, pip also imposes several undocumented (though mostly sensible) rules (see this Python packaging question for an example).

In 2020, pip’s dependency resolver was overhauled in order to finally make it capable of correctly resolving complicated dependency requirements. Prior to this, in tricky cases pip could fail or even install invalid combinations of package versions. Whilst the new resolver is a major improvement, it can sometimes be slow, needing to download several versions of packages (to read their metadata) before finding a working combination.

Editable (development) installations

Setuptools provides a python setup.py develop (or pip install -e .) command which can be used to ‘install’ a source checkout in ‘editable’ mode. In this mode, changes made to a project are reflected immediately without the need to reinstall it afterwards.

Unfortunately, the editable install process is not yet standardised. An attempt was made in PEP 517 but ultimately postponed for a later standard. At present, most build tools which support editable installations fall back on using (or mimicking) setuptools. Since editable installs can often be a source of strife, we’ll briefly explain how setuptools implements them below.

Details: Pip and editable installs.

Pip’s ‘editable install’ mode (enabled by the -e flag) is just a wrapper around setuptools’ setup.py develop behaviour. Even for packages which use a PEP 517 build system, using -e will currently cause pip to ignore this and call setup.py develop directly. As such, editable installs via pip are only supported for setuptools-based packages at present.

Setuptools’ editable install procedure broadly consists of: adding the package’s source directory to sys.path (via a *.pth file in a site packages directory, described later), installing the package’s metadata (see below) and installing any scripts and dependencies the package defines.

Details: Metadata installation.

Setuptools uses the deprecated egg format to install package metadata. As a result after performing an editable install you’re likely to discover a {package name}.egg-info directory in your source tree and *.egg-link files in your Python installation. The former contains a copy of your package’s metadata and the latter tells pip where to find that metadata.

Note that the *.egg-link file is only used to locate package metadata: the mechanism used to add your project to sys.path is different (.pth files).

Installation locations, sys.path and virtual environments

So, as we’ve just discussed, installing Python packages largely amounts to unpacking them into a directory listed in sys.path. This list enumerates the places Python will search whenever an import statement is used. The key questions are: what exactly ends up in this list, and which directory should an installer pick? The answers are somewhat intricate but are often at the heart of many a confusing “broken Python” problem so we’ll dig into them here.

Details: Behaviour of import

In 99% of cases Python’s import mechanism consists of scanning the directories in sys.path for .py files or directories containing an __init__.py file. There are, however, a couple of cases where import might behave differently.

The import mechanism supports a special case for ‘namespace packages’ whereby certain directories without an __init__.py may also be scanned. See PEP 420 for a good explanation of namespace packages and what the import mechanism does to support them.

Python also provides hooks for overriding the default import behaviour with arbitrary logic. In practice it is unlikely you’ll ever have a reason to do this (nor encounter any software which does) but it is worth being aware of.

On startup the sys.path list is populated with four distinct kinds of path:

  1. Standard library paths
  2. Site packages paths
  3. Paths listed in the PYTHONPATH environment variable
  4. The directory containing the script being executed (or the current directory)

The locations of the latter two kinds of path are fairly explicit; the standard library and site packages paths, however, are more complicated. Initially we’ll see how these are discovered when virtual environments are not used. Once we understand this, we’ll return to what virtual environments do.
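
You can inspect the resulting list for your own interpreter with a one-liner like the following (the output varies by platform and installation):

$ python -c "import sys; print('\n'.join(sys.path))"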

Standard library paths

The Python standard library is shipped with, and tightly coupled to, the specific Python interpreter you’re using. When the interpreter first starts, one of its first tasks is to discover the location of the standard library. This involves scanning locations near the python binary in the file system for expected ‘landmark’ files (such as os.py – the os module).
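
You can see where your interpreter found its standard library by checking the location of a landmark module, for example:

$ python -c "import os, sys; print(os.__file__); print(sys.prefix)"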

You should never install packages into nor modify files in the standard library directories.

Details: Python standard library locations.

The Python standard library is sometimes split into two parts on disk: pure-Python parts and platform-dependent parts. For example, Fedora-like Linux systems place pure-Python parts in /usr/lib/python3.9 and platform-dependent parts in /usr/lib64/python3.9/lib-dynload.

The exact location and naming of the standard library paths, and the precise mechanism used to discover them, varies by platform, Python interpreter version and interpreter compilation options used.

In CPython, the precise standard library path discovery logic is defined (and explained) in the Modules/getpath.c source file (and PC/getpathp.c for Windows builds).

Site packages paths

‘Site packages’ in Python nomenclature are additional non-standard-library packages installed on a system. These paths are added to sys.path by the site module during Python startup.

There are typically two site packages directories: the system site packages and user site packages. The system site packages are intended to hold packages installed by the system administrator and are typically read-only to ordinary users. The user site packages directory is located within your home directory and is the intended place for user-installed packages. User site packages directories are always listed first in sys.path giving them priority over system site packages.

As a rule of thumb it is often best to install packages into your user site packages rather than system wide. If nothing else it saves the need for administrator privileges and, on some platforms (especially Linux), doing so avoids clashes with the operating system’s package manager.

Setuptools and pip default to installing packages into the system site packages location but accept a --user argument causing installation into user site packages instead. The unfortunate default behaviour results from user site packages being introduced some time later than the system site packages location.
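
You can ask the site module where these directories live on your system:

$ python -m site              # prints sys.path, USER_BASE and USER_SITE
$ python -m site --user-site  # prints just the user site packages path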

Details: Python Package installation best-practices under Linux.

On many Linux systems it is considered good practice to install system-wide site packages exclusively via the Linux distribution’s own package manager (and not pip). This is because pip and the system package manager can frequently end up treading on each other’s toes resulting in problems which can be difficult to untangle. Installing packages into the user site-packages directory using pip, however, is always safe (and strongly recommended).

Details: Determining the location of site packages directories.

The site module’s documentation provides a fairly precise description of how the site packages directories are defined and discovered. Alternatively this mechanism is also outlined by PEP 370 which introduced the user site packages directory.

The system site packages directory typically resides near the standard library directory. For example, on many (non-Debian) Linux systems this is /usr/lib/python3.9/site-packages. The user site packages directory resides within a user’s home directory. For example under Linux this will typically be something like ~/.local/lib/python3.9/site-packages. Like the standard library, site packages paths might also be split into platform-independent and platform-dependent parts.

Details: Quirks of Debian-based Linux distributions.

As a general warning, Debian (and derivatives such as Ubuntu) distribute modified versions of Python and associated tools such as pip. These differences can sometimes cause confusion (and occasionally introduce Debian-specific bugs).

More specifically in the context of Python packaging, Debian systems have two system site packages locations (which Debian calls dist-packages rather than site-packages): one resides in /usr/lib/python3/dist-packages and another in /usr/local/lib/python3/dist-packages. The former is intended for apt-installed packages whilst the latter is intended for pip installed packages in the hope of avoiding the two tools fighting. To support this, Debian’s Python adds special behaviour to the site module and modifies pip to cause it to install into the alternative installation location during system-wide installations.

Site packages and *.pth files

As well as discovering the site packages locations during startup, the site module also scans the site packages locations for .pth files. In the common case, these files contain a newline-delimited list of extra directories to be added to sys.path. This mechanism is widely used to facilitate editable installs by adding source directories to the Python path.
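
For example, a hypothetical my_project.pth file dropped into a site packages directory might contain nothing but a path, causing that directory to be added to sys.path at startup:

/home/user/src/my_project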

Details: Setuptools’ use of .pth files for editable installs.

Rather than creating one .pth file per package, setuptools uses a single easy_install.pth file instead. This implementation detail is a reference to the (deprecated) easy_install tool, a predecessor of pip.

Details: Use of .pth files outside of site packages directories.

Because support for .pth files is part of the site module and not Python’s import system generally, .pth files can only be used within site packages directories. For example, .pth files in directories mentioned in the PYTHONPATH environment variable or added to sys.path manually will be ignored.

Details: Other functions of the site module.

The site module also provides a number of other more advanced mechanisms, such as the ability to register arbitrary custom Python code to manipulate sys.path during startup. This is a rarely used feature, though you may spot setuptools using it in -nspkg.pth files to provide backward compatibility with older namespace packages. See the site module’s documentation for more details of this aspect of the .pth file format.

Virtual Environments

Python virtual environments (standardised in PEP 405) are a convenient mechanism for creating isolated Python installations in which packages may be installed and used without impacting the system Python installation. For example they are often used to provide a consistent environment for testing and development, isolated from other system or user installed packages.

As we learnt in the previous section, all non-standard-library Python packages live in the various system and user site packages directories. As such, all a Python virtual environment needs to do to provide isolation from these packages is to create its own (private) site packages directory and omit the system- and user-wide directories from sys.path. To restate this: all Python virtual environments do is change what ends up in sys.path: no other sandboxing, system virtualisation or other fancy techniques are involved.

A minimal Python virtual environment is defined by the following file structure (defined in PEP 405):

<virtual env dir>/
    pyvenv.cfg
    bin/
        python  (a symlink to, or copy of, a Python interpreter)
    lib/pythonX.Y/site-packages/

When the Python binary is executed from within the virtual environment directory, it will look for (and discover) the pyvenv.cfg file in its parent directory. This informs Python that it is running within a virtual environment and changes the way it populates sys.path. Specifically, it will cause Python to skip adding the system and user site packages locations to sys.path, adding the virtual environment’s <virtual env dir>/lib/pythonX.Y/site-packages directory instead.

The activate script (if provided) carries out only one essential purpose: it adds the bin directory of the virtual environment to the top of the user’s PATH.
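
Putting this together, creating and using a virtual environment looks like the following (Unix shell shown):

$ python -m venv myenv
$ source myenv/bin/activate
(myenv) $ python -c "import sys; print(sys.prefix)"  # now points into myenv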

Details: Additional steps during virtual environment creation.

Virtual environments created using the built-in python -m venv command (or the popular third party virtualenv command) will add more than just this bare minimum to the virtual environment. For example, they will also install a copy of pip and setuptools into the environment’s site-packages directory making it easier to install new packages.

Details: The pyvenv.cfg file.

The pyvenv.cfg file actually contains two important pieces of information. Firstly, it contains the original location of the Python interpreter (for example /usr/bin), enabling it to use its normal logic for locating the Python standard library and add this to sys.path as usual. Secondly, it contains a flag indicating whether the system and user site packages directories should be added to sys.path. Typically this flag is not set, ensuring proper isolation, but it caters for the less common case where system-wide packages should also be usable alongside any privately installed in the virtual environment.
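
For reference, a pyvenv.cfg created by python -m venv on a Linux system looks something like this (the exact contents vary by platform and Python version):

home = /usr/bin
include-system-site-packages = false
version = 3.9.7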

Details: Legacy virtual environments.

Prior to the standardisation of virtual environments (in PEP 405) Python virtual environments were constructed using various hacky means involving copying and patching parts of Python and its standard library. Virtual environments created in this way (e.g. using very old versions of the virtualenv command) can be spotted by the absence of a pyvenv.cfg file. If you spot one of these, it is a good idea to recreate the environment using a more up to date version of virtualenv or via the new python -m venv command.

Conclusions

For the average Python developer, the current explosion of tools and apparent divergence of opinions on Python packaging can be somewhat alarming. Fortunately, this is not an indication of a schism within the Python community. On the contrary, the sudden diversity of tooling and ideas is the result of the fundamentals of Python packaging finally crystallising into interoperable standards. This common core is what enables Python projects employing these tools and ideas to work seamlessly together as part of the wider Python ecosystem.

In the past, third party Python packaging tools have been hampered by the need to mimic the arcane internal behaviours of Python’s de-facto standard tooling: setuptools and pip. This meant that such tools were often fragile and constantly ran the risk of breakage whenever setuptools or pip changed their internal behaviour. Consequently adoption of new tools and ideas have historically been limited.

Recognising this, the core Python developers have been working hard for many years to standardise and substantially refactor Python packaging processes to enable tools other than setuptools or pip to coexist. This is no small task due to the diversity of systems on which Python runs and the wide range of applications Python packages must support. There’s also the significant burden of backward compatibility with ad-hoc Python packaging schemes going back several decades.

As we’ve highlighted in this article, many fundamental components such as package formats and build system interfaces have now been standardised, though much work still remains. Nevertheless, these have provided firm foundations upon which a vibrant new ecosystem of packaging tools is being built. From tightly-scoped packaging-focused tools like Flit to all-encompassing Python development environments such as Poetry, these new tools are no longer building on shifting sands. Meanwhile existing and battle-tested tools like setuptools and pip have benefited from improved robustness and usability.

We hope that this article has introduced you to some of the underlying principles of Python and its packaging system. Armed with this fundamental knowledge we hope you will be in a better position to approach difficult packaging problems and exploit new and emerging tools.