Packaging Python software and libraries is one of those topics which at first glance looks like it ought to be simple but very quickly turns out to be anything but. To further complicate things, the Python packaging landscape is in the midst of significant change, with new tools and standards appearing at a dizzying rate. If you’ve found yourself feeling confused and frustrated with Python packaging, or wondering where (if anywhere) this story is going, this article may be for you.
Setuptools and pip remain part of the core Python project and are not going anywhere. They’re also experiencing a period of accelerated development, and you may find that issues you ran into in the past have since been resolved. Furthermore, ongoing packaging standards development prioritises backward (and forward) compatibility with these tools, making them a safe choice for new and existing projects.
Nevertheless, as Python packaging standards work progresses, the risk of adopting other tools is shrinking rapidly, though it remains a significant factor for now. However, with a little background knowledge of what’s going on behind the scenes, it is possible to understand and manage these early-adopter risks. Continue reading to learn more…
In this article we’ll dig into what is really happening under the hood in Python’s packaging machinery. Armed with this low-level knowledge, it becomes easier to understand what the myriad of high-level tools are really up to, and what the future may hold. As a bonus, you’ll hopefully gain some intuition for where to look when things go wrong, how to approach complex situations and avoid common pitfalls.
As a rough outline, the remainder of this article is organised as follows:

- Python package formats – What goes into ‘sdist’ and ‘wheel’ packages (and what were ‘eggs’)?
- Building packages – How are sdists and wheels produced from a source checkout?
- Configuration files – What roles do pyproject.toml and setup.cfg play?
- Package installation – What does installing (or uninstalling) a package actually involve?
- sys.path and virtual environments – Where do installed packages end up? (Plus: demystifying the source of a great many ‘broken Python’ situations.)

This article was published in October 2021. Given the feverish pace of packaging standards development, things will have no doubt advanced by the time you read this. Whilst there will be new developments which go unmentioned here, we have avoided discussing anything but established tools or accepted final standards which are likely to remain relevant for some time to come.
Before delving into any specifics, it is helpful to have a broad picture of how Python’s packaging ecosystem has evolved to reach where we are today.
In 1998, a group of core Python developers, then called the Distutils-SIG, set about the challenge of standardising the distribution of Python modules through the introduction of distutils. As a part of the standard library, distutils grew in popularity but, with releases tied to new Python versions, development proved awkward with limited opportunities to fix bugs or introduce new features.
Around 2004, a number of Distutils-SIG members began work on setuptools: an enhanced, drop-in replacement for distutils, living outside the standard library. In the years which followed, setuptools introduced a raft of concepts that we now take for granted:
The ability to specify dependencies on other packages
Improved packaging formats with better metadata and support for including pre-built binaries
Mechanisms for allowing multiple versions of a module to be installed and used simultaneously
Tools for automatically downloading and installing Python packages from the Python Package Index (PyPI)
As setuptools (and its spin-off ‘pip’ tool) became firmly established as the de-facto standard, distutils began being phased out in 2014 (and is deprecated as of Python 3.10 and will finally be removed in Python 3.12).
Over time, some Python developers grew dissatisfied with setuptools and pip and began to develop numerous extensions and alternatives such as Flit, Poetry, PBR and setuptools_scm, to name just a few. Worse, some projects even included their own forks of, or monkey-patched, distutils and setuptools (see NumPy for a prominent example). Unfortunately, due to the lack of a standard for Python packaging (beyond ‘whatever setuptools and pip do’) there arose a serious risk of fragmentation. Spurred on by this, the Distutils-SIG, now called the Python Packaging Authority (PyPA), have been working hard to standardise Python packaging formats and mechanisms.
PyPA’s standardisation efforts have made huge progress and received wide community support but much still remains to be done. In the rest of this article we’ll highlight significant standards and prominent gaps and the implications for Python developers. Where possible we’ll also provide links to the relevant standards documents. These largely come in the form of either Python Enhancement Proposals (PEPs) or PyPA specifications. Fortunately, these are often remarkably accessible and filled with useful background information. Be aware, however, that due to the pace of development, it is not uncommon to find accepted specifications which have not yet been fully implemented in practice.
With the stage set, it’s time to dive into some details…
Broadly speaking, there are two types of Python package: ‘sdist’ (‘source distribution’) source packages and ‘wheel’ binary packages (another of Python’s Monty Python references). Source packages contain code and some means to build and install it whilst binary packages come in a form which can be installed by just extracting the files as-is. These formats are both well defined and we’ll give a brief introduction to them below.
There is also a third, now deprecated format: the Python ‘egg’. You’re very unlikely to encounter this format directly, however much of setuptools and pip’s internal behaviour was influenced by this format. As a result, it’ll be helpful to talk a little about eggs too.
The sdist (‘source distribution’) package format is defined by the remarkably brief PyPA ‘Source Distribution Format’ specification.
An sdist is a tarball with a name of the form {name}-{version}.tar.gz with the files laid out inside as follows:
{name}-{version}/
    PKG-INFO    (metadata file)

The PKG-INFO file contains just enough metadata to support online package indices like PyPI. Unfortunately a lot of useful package metadata, such as dependency information, is not included.

PKG-INFO file format.

The PKG-INFO file uses a slightly quaint email-derived file format to define metadata fields according to the PyPA Core Metadata Specifications.
As a sample, a snippet from the PKG-INFO shipped with NumPy is shown below:
Metadata-Version: 1.2
Name: numpy
Version: 1.20.1
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
...
The remaining source files in an sdist usually amount to a direct checkout of the package’s source repository, stripped of any extraneous files unrelated to the package.
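Since an sdist is just a gzipped tarball, you can peek inside one using standard tools. For example (with an invented package name and an abbreviated listing):

$ tar -tzf somepackage-1.0.tar.gz
somepackage-1.0/PKG-INFO
somepackage-1.0/setup.py
somepackage-1.0/somepackage/__init__.py
...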
The sdist specification requires that there be some standardised means for building the software contained in an sdist. Historically, this has meant the inclusion of a setuptools setup.py script but, more recently, a pyproject.toml file may be used instead. Using this file, build tools other than setuptools can be triggered. We’ll talk much more about what ‘building’ entails, as well as this new mechanism (opaquely referred to as PEP 517), later on.
The ‘wheel’ binary package format is defined in the PyPA binary distribution format specification (though it was historically defined by PEP 427). A wheel is an ordinary zip file with a name of the form {name}-{version}-{python version}-{python abi}-{os}.whl. As well as the package name and version, this indicates several important details defining what platform this binary was built for.
The contents of the non-obvious fields in a wheel’s filename follow the standard defined in PEP 425. As an example, one of NumPy’s wheels is named numpy-1.20.1-cp39-cp39-macosx_10_9_x86_64.whl, meaning:
- numpy – Package name is ‘numpy’
- 1.20.1 – Package version is 1.20.1
- cp39 – For use specifically with CPython 3.9 (e.g. not other CPython versions, nor other Python implementations such as PyPy)
- cp39 – Contains binary code compiled against the CPython 3.9 ABI
- macosx_10_9_x86_64 – Built for 64-bit Intel computers running MacOS 10.9

For wheels containing only platform-independent Python code, many of the filename parts are replaced with more generic values. For example, the (pure-Python) ‘requests’ package is named requests-2.25.1-py2.py3-none-any.whl, meaning:

- requests – Package name is ‘requests’
- 2.25.1 – Package version is 2.25.1
- py2.py3 – Compatible with both Python 2 and Python 3 (and any Python implementation)
- none – Does not depend on any particular Python ABI
- any – Runs on any operating system and hardware
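Should you ever need to pull a wheel filename apart programmatically, the third-party ‘packaging’ library (used internally by pip) provides a parser for this format. A quick sketch, assuming a reasonably recent version of packaging is installed:

>>> from packaging.utils import parse_wheel_filename
>>> name, version, build, tags = parse_wheel_filename(
...     "numpy-1.20.1-cp39-cp39-macosx_10_9_x86_64.whl")
>>> name, str(version)
('numpy', '1.20.1')
>>> [str(tag) for tag in tags]
['cp39-cp39-macosx_10_9_x86_64']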
Conceptually, a wheel is installed by simply extracting it somewhere import
will look: no extra build steps are required. Some non-build-related steps,
like installing dependencies, might still be necessary but we’ll return to this
when we talk about package installation later on.
As well as ready-to-extract Python modules, wheels also include a metadata directory named {name}-{version}.dist-info. At a minimum this has the structure:

{name}-{version}.dist-info/
    METADATA    – Package metadata
    WHEEL       – Wheel format metadata
    RECORD      – File list and checksums

The METADATA file contains metadata describing the package. Unlike the PKG-INFO file found in sdists, this metadata is relatively complete, for example including a list of the package’s dependencies.
METADATA file format.

Like the PKG-INFO file, the METADATA file also uses the same email-derived file format and field names defined by the PyPA Core Metadata Specifications.
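Once a package is installed, this metadata can be read back using the importlib.metadata module in the standard library (Python 3.8 and later). For example, with requests installed (the exact output will vary with the installed version):

>>> from importlib import metadata
>>> metadata.version("requests")
'2.25.1'
>>> metadata.requires("requests")[:2]
['chardet (<5,>=3.0.2)', 'idna (<3,>=2.5)']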
The WHEEL
file contains metadata about the wheel itself such as the version
of the wheel format used and whether the wheel contains any platform-specific
binary files (e.g. compiled C code).
Finally, the RECORD
file contains the name and cryptographic hash of every
file in the wheel and is used for two distinct purposes. This file is primarily
used to facilitate clean uninstallation of a package (since it lists all files
to be removed). Its other purpose, however, is facilitating file integrity
checks for wheels which are cryptographically signed.
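Each line of a RECORD file names one installed file along with its hash and size in bytes. As an illustration (with invented, abbreviated hashes), a RECORD file looks something like:

requests/__init__.py,sha256=WQ1...kZo,4138
requests/api.py,sha256=hXq...9t0,6253
requests-2.25.1.dist-info/RECORD,,

Note that the RECORD file lists itself with the hash and size fields left empty.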
You may be wondering what the benefit of the wheel format is for pure-Python packages over just using sdists. Whilst it’s true that a pure-Python wheel will contain more-or-less the same files as an sdist, installing an sdist still requires the execution of arbitrary code (either a setup.py file or some other build system). This step might even require the installation of the build system itself! Wheels, however, only require a simple unzipping operation. Furthermore, unlike sdists, wheels include important metadata, such as a list of package dependencies, which are essential for proper installation.
Before the wheel format came the Python ‘egg’ format (or rather egg formats:
there are several). Eggs were introduced by setuptools with the original strap
line ‘eggs are to Python as
JARs are to Java’. As such,
the ambitions for the egg format went well beyond mere software distribution
and formed a pillar of the now (informally)
deprecated pkg_resources
module of
setuptools.
For example, pkg_resources
defined mechanisms for locating egg packages at
runtime and dynamically loading them into sys.path
. Amongst other things,
this enabled multiple versions of a package to be installed simultaneously,
something now more commonly achieved using virtual environments (more on those
later).
The egg format(s) are implementation-defined by setuptools, though an informal description is available in the setuptools documentation.
The pkg_resources
module
documentation
is also found on the setuptools documentation site.
Whilst eggs pioneered many of the principles now used by wheels and many other
parts of the Python ecosystem, the format (and pkg_resources
) have numerous
shortcomings. For example, unlike wheels, eggs do not precisely identify the platform on which any embedded binary code is able to run. Eggs also include Python bytecode files (e.g. *.pyc and *.pyo files) which inconveniently tie even pure-Python eggs to a particular version of Python.
Though their use is now strongly discouraged, the egg format and
pkg_resources
live on in the internals of setuptools and pip. Beyond the
production of occasional egg-related log messages this implementation detail
usually has few implications, though in difficult situations it can lead to confusion. As such, it’s useful to be aware when debugging obscure packaging issues that eggs and pkg_resources may be involved.
Setuptools’ most visible use of eggs is for creating ‘editable installs’
(produced by pip install -e ...
or python setup.py develop
) – a task for
which a standard mechanism remains to be
defined.
Specifically, the package’s source directory is turned into an ‘egg-info’ egg
(through the creation of a special *.egg-info
metadata directory) and an
‘egg-link’ egg is installed pointing to the source directory. We won’t go
into further details here but you can read more about these egg mechanisms in
the setuptools
documentation.
We’ll also discuss editable installs in general later on.
Having described the ‘sdist’ and ‘wheel’ package formats, we now move our attention to how they are created (‘built’) from a source code checkout.
For pure Python packages, the build process boils down to gathering the required files and metadata and stuffing them into the appropriate archive format. Building ‘impure’ packages (including C code, for example) is complicated by the need to invoke compilers to produce binaries when making wheel packages.
Though builds might be triggered directly (for example when you’re creating a software release) they can also be initiated indirectly by tools such as pip. For example, when using pip to install a package from source, pip will invoke that package’s build system to produce a wheel which it then installs. As a result, it is important for build systems to expose a well-defined API that tools like pip can call on in addition to any user-facing interfaces they might have.
Historically the de-facto standard Python build ‘API’ was setuptools’
setup.py
command line interface. Unfortunately this interface is complicated
and full of setuptools-specific options making it difficult for other build
systems to implement correctly. To address this, between 2015 and 2017 the PyPA
standardised a new, simplified, Python-based API for build systems to expose in
PEP 517.
If you’re curious: the minimum API a build system needs to implement is remarkably simple and includes just two functions: build_sdist and build_wheel. These two functions take the name of a directory into which a built sdist or wheel is to be written and are then expected to ‘get on with it’.
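To make this more concrete, a toy build backend module might look something like the sketch below. All names are invented and all of the hard work (gathering the right files, generating valid metadata and so on) is glossed over, so treat this as purely illustrative:

# toy_backend.py – an illustrative, highly simplified PEP 517 build backend
import os
import tarfile
import zipfile

def build_sdist(sdist_directory, config_settings=None):
    # Pack the source tree into {name}-{version}.tar.gz and return the filename
    filename = "example-1.0.tar.gz"
    with tarfile.open(os.path.join(sdist_directory, filename), "w:gz") as sdist:
        sdist.add("src", arcname="example-1.0")
    return filename

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # Zip up the importable files (a real backend must also write a
    # .dist-info metadata directory, omitted here) and return the filename
    filename = "example-1.0-py3-none-any.whl"
    with zipfile.ZipFile(os.path.join(wheel_directory, filename), "w") as wheel:
        for root, _dirs, files in os.walk("src"):
            for name in files:
                path = os.path.join(root, name)
                wheel.write(path, arcname=os.path.relpath(path, "src"))
    return filename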
PEP 517 conceptually divides the build process into two parts: (1) setting up a suitable build environment with the package’s chosen build system installed and (2) actually building the requested package.
PEP 517 refers to the software doing step 1 as the ‘build frontend’ and the software doing the actual building (step 2) as the ‘build backend’.
For example, when pip installs a package from source using the PEP 517 mechanism, pip acts as a build frontend. First pip will create a temporary virtual environment for the build. Next it installs the build backend specified by the package (e.g. setuptools, Flit, Poetry, etc.) and then invokes it to produce a wheel. Finally pip will install the wheel and destroy the temporary virtual environment.
PEP 517 envisages that developers might choose to use a generic build frontend instead of, for example, setup.py, or Poetry and Flit’s native command line tools. PyPA have created such a generic frontend for this purpose, ungoogleably named ‘build’. Build’s minimalistic command line interface essentially consists of:
$ python -m build --sdist # Build an sdist package
$ python -m build --wheel # Build a wheel package
Like pip, the build command carries out its duties by quietly creating a temporary build environment, installing and running a build backend to produce whichever kind of package was requested.
Of course, it is not mandatory to use PyPA’s build command and using the build systems’ own interfaces is not discouraged. However using PyPA’s build command can simplify CI processes since they needn’t know which build system a given project is using. It also potentially saves you having to learn to use new build tools when working on an unfamiliar codebase.
The primary benefit of PEP 517 is that it frees 3rd party build systems from error-prone mimicry of setuptools’ behaviour. As a developer, this guarantees that no matter which (PEP 517 compliant) build system you choose for your project, your users (and tools like pip) will be able to build and install your software.
To take advantage of the PEP 517
build process your project must identify which build system is being used in a
pyproject.toml
file. The exact details vary from build tool to tool (see
their documentation for details) but, as an example, the
Flit build system is configured by
the following lines:
[build-system]
requires = ["flit_core >=2,<4"]
build-backend = "flit_core.buildapi"
This specifies that the flit_core
package is required for the build and that
the build backend exposes its API in the flit_core.buildapi
Python module.
Of course, other build backends will use different names.
For backward compatibility with pre-PEP
517 projects, when a PEP
517 build front end encounters a
project with no suitable pyproject.toml
, it will fall back on looking for and
running setup.py
.
From a developer’s perspective, the choice of which build backend you use
boils down to taste and functionality. For example, setuptools uses its
setup.py
or setup.cfg
formats for configuration whilst Poetry and Flit
define their own somewhat simpler formats. Likewise, setuptools provides
sophisticated mechanisms for compiling non-Python code whilst Poetry and Flit
only support building pure-Python packages. Further, whilst setuptools and Flit only provide a build system, Poetry provides a build system as just one part of a slew of other development tools.
Before moving on to discussing how our newly built Python packages are actually installed, let’s take a brief diversion onto the topic of configuration files, specifically the two mentioned in passing in the previous section: pyproject.toml and setup.cfg.
pyproject.toml
The pyproject.toml file format, defined in PEP 518, is intended as a central place for Python build and general development tool related configuration. In the previous section we saw that PEP 517 uses a pyproject.toml file to define which build backend a project uses, but this isn’t the only use for this file.
TOML (Tom’s Obvious Minimal Language) is (yet another) human-readable configuration file format. PEP 518 outlines the thoughtful and sound reasoning which led to TOML being used over other formats such as JSON, YAML and INI.
As well as the [build-system] section (or ‘table’ in TOML parlance) defined by PEP 517, PEP 518 also states that any Python tool may define its own [tool.<PROJECT-NAME>] section. For example, both Flit and Poetry use their own tool.* sections in the pyproject.toml file for configuration. Setuptools is also considering moving to using pyproject.toml for its configuration, with a longer term view of deprecating setup.py and setup.cfg (though continuing to support both for backward compatibility).
Since build backends request many of the same pieces of information in practice (e.g. the package name, description and dependencies), a [project] section has been defined by PEP 621. This provides a standardised place where package information may be placed to avoid tools having to define their own configuration options. At the time of writing, this PEP has only recently been accepted but support is appearing in a number of build backends.
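As a sketch, a PEP 621 [project] table (with invented package details) might look like:

[project]
name = "example-package"
version = "1.2.3"
description = "An example package"
requires-python = ">=3.6"
dependencies = [
    "requests >=2,<3",
]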
setup.cfg
A long-standing criticism of setuptools (and distutils before it) was its use of a Python script (setup.py), instead of a configuration file, to define build options. Recognising this, in 2016 setuptools introduced a new configuration format: setup.cfg (using an INI-like syntax).
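As an illustration (with invented package details), a simple setuptools setup.cfg looks something like:

[metadata]
name = example-package
version = 1.2.3
description = An example package

[options]
packages = find:
install_requires =
    requests >=2,<3

Depending on the versions of the tools involved, a stub setup.py containing little more than a call to setuptools.setup() may also be needed alongside.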
Noticing this apparently ‘blessed’ configuration file, several popular Python tools started exploiting the file for their own configurations too. Unfortunately setuptools had never intended the file to be shared with other tools and so no formal namespaces were defined to prevent tools’ configurations from clashing.
In fact, setuptools was not the first tool to begin using the setup.cfg
filename and so there are already conflicting interpretations of this file in
the wild.
Whilst setuptools is unlikely to remove support for setup.cfg files, it is likely that they will be deprecated in favour of pyproject.toml in the not too distant future. Until then, setup.cfg remains the preferred approach to configuring setuptools at the time of writing.
Now that we’ve covered package formats and builds the next logical topic is package installation. Whilst many aspects of package installation are now well specified, many details remain to be defined beyond “whatever pip or setuptools do”. This situation continues to improve but in this section we’ll describe how things work today whilst pointing out where changes are likely to occur.
Broadly speaking, there are two kinds of installation: regular installations and editable (a.k.a. development) installations. We’ll look at each of these in turn. Historically, the egg format had its own installation conventions but we’ll omit those here (except as they apply to editable installations).
Of the two package formats, only the wheel specification defines an installation mechanism. (Sdists are installed by first building a wheel and then installing that.)
Wheel installation is broken into two phases: ‘unpack’ and ‘spread’.
In the unpack phase, the wheel is unzipped into the installation location and the extracted files and metadata are checked for consistency. (We’ll discuss installation locations in a later section.)
In the spread phase, any scripts and data files are moved into their final
locations (e.g. into a location on the user’s PATH
). The installer may also
modify the shebang of scripts
as it installs them to point to the correct Python interpreter. Finally, all
.py
files are preemptively compiled into bytecode (producing .pyc
files) to
make import
quicker. During all of these steps the RECORD
metadata file
unpacked from the wheel is updated to reflect the installed locations of all
files.
Package uninstallation consists of removing all files listed in the RECORD
metadata file.
A notable omission from the formally specified wheel installation procedure is the process of downloading and installing any missing package dependencies. At present this step is not rigorously specified (“just do what pip does”) but the metadata critical to this step is now fully specified (see the metadata format specifications and PEP 508 for version number range format specifications).
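For reference, PEP 508 dependency specifications are strings along the following lines (these particular requirements are invented examples):

requests
requests >=2.25,<3
requests[security] >=2.25
requests ; python_version >= "3.6"

These allow version ranges, optional ‘extras’ (in square brackets) and environment markers (after the semicolon) to be expressed.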
Though pip’s behaviour largely boils down to “find packages with suitable versions on PyPI and install them”, pip also imposes several undocumented (though mostly sensible) rules (see this Python packaging question for an example).
In 2020, pip’s dependency resolver was overhauled in order to finally make it capable of correctly resolving complicated dependency requirements. Prior to this, in tricky cases pip could fail or even install invalid combinations of package versions. Whilst the new resolver is a major improvement, it can sometimes be slow, needing to download several versions of packages (to read their metadata) before finding a working combination.
Setuptools provides a python setup.py develop
(or pip install -e .
) command
which can be used to ‘install’ a source checkout in ‘editable’ mode. In this
mode, changes made to a project are reflected immediately without the need to
reinstall it afterwards.
Unfortunately, the editable install process is not yet standardised. An attempt was made in PEP 517 but ultimately postponed for a later standard. At present, most build tools which support editable installations fall back on using (or mimicking) setuptools. Since editable installs can often be a source of strife, we’ll briefly explain how setuptools implements them below.
Pip’s ‘editable install’ mode (enabled by the -e
flag) is just a wrapper
around setuptools’ setup.py develop
behaviour. Even for packages which use
a PEP 517 build system, using
-e
will currently cause pip to ignore this and call setup.py develop
directly. As such, editable installs via pip are only supported for
setuptools-based packages at present.
Setuptools’ editable install procedure broadly consists of:

- Adding the project’s source directory to sys.path using a .pth file (more on these later).
- Installing any of the package’s scripts into a location on the user’s PATH.
Setuptools uses the deprecated egg format to install package metadata. As a
result after performing an editable install you’re likely to discover a
{package name}.egg-info
directory in your source tree and *.egg-link
files
in your Python installation. The former contains a copy of your package’s
metadata and the latter tells pip where to find that metadata.
Note that the *.egg-link
file is only used to locate package metadata:
the mechanism used to add your project to sys.path
is different (.pth
files).
sys.path and virtual environments

So, as we’ve just discussed, installing Python packages largely amounts to
unpacking them into a directory listed in
sys.path
. This list
enumerates the places Python will search whenever an import
statement is
used. The key question is what exactly ends up in this list and which
directory should an installer pick? The answer to this question is somewhat
intricate but is often at the heart of many a confusing “broken Python” problem
so we’ll dig into it here.
import
In 99% of cases Python’s import
mechanism consists of scanning the
directories in sys.path
for .py
files or directories containing an
__init__.py
file. There are, however, a couple of cases where import
might behave differently.
The import
mechanism supports a special case for ‘namespace packages’ whereby
certain directories without an __init__.py
may also be scanned. See PEP 420
for a good explanation of namespace packages and what the import mechanism does
to support them.
Python also provides hooks for overriding the default import behaviour with arbitrary logic. In practice it is unlikely you’ll ever have a reason to do this (nor encounter any software which does) but it is worth being aware of.
On startup the sys.path list is populated with four distinct kinds of path:

- The standard library (home of modules such as os and re). There is usually just one of these in sys.path.
- Site packages (home of installed packages such as numpy and your own packages). There are typically several site packages paths. Package installers will pick one of these based on the kind of installation performed.
- The directory containing the script being executed or, when python is started with no script specified, the current working directory is added instead.
- Any directories listed in the PYTHONPATH environment variable. In general you should avoid using this mechanism – for example by using virtual environments instead (more on those later).

The locations of the latter two kinds of path are fairly explicit, however the standard library and site packages paths are more complicated. Initially we’ll see how these are discovered when virtual environments are not used. Once we understand this, we’ll return to what virtual environments do afterwards.
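You can see the result for yourself by inspecting sys.path. For example, on a hypothetical Linux system (the exact entries will differ on your machine):

>>> import sys
>>> sys.path
['', '/usr/lib/python39.zip', '/usr/lib/python3.9', '/usr/lib/python3.9/lib-dynload', '/home/alice/.local/lib/python3.9/site-packages', '/usr/lib/python3.9/site-packages']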
The Python standard library is shipped with, and tightly coupled to the
specific Python interpreter you’re using. When the interpreter first starts,
one of its first tasks is to discover the location of the standard library.
This involves scanning locations nearby the python
binary in the file system
for expected ‘landmark’ files (such as os.py
– the os
module).
You should never install packages into nor modify files in the standard library directories.
The Python standard library is sometimes
split
into two parts on disk: pure Python parts and platform-dependent parts. For
example, Fedora-like Linux systems place pure-python parts in
/usr/lib/python3.9
and platform-dependent parts in
/usr/lib64/python3.9/lib-dynload
.
The exact location and naming of the standard library paths, and the precise mechanism used to discover them, varies by platform, Python interpreter version and interpreter compilation options used.
In CPython, the precise standard library path discovery logic is defined (and explained) in:
- PC/getpathp.c for Windows
- Modules/getpath.c for all other platforms

‘Site packages’ in Python nomenclature are additional non-standard-library packages installed on a system. These paths are added to sys.path by the site module during Python startup.
There are typically two site packages
directories: the system site
packages and user site packages. The system site packages are intended to hold
packages installed by the system administrator and are typically read-only to
ordinary users. The user site packages directory is located within your home
directory and is the intended place for user-installed packages. User site
packages directories are always listed first in sys.path
giving them priority
over system site packages.
As a rule of thumb it is often best to install packages into your user site packages rather than system wide. If nothing else it saves the need for administrator privileges but, on some platforms (especially Linux), doing so also avoids clashes with the operating system’s package manager.
Setuptools and pip default to installing packages into the system site packages
location but accept a --user
argument causing installation into user site
packages instead. The unfortunate default behaviour results from user site
packages being introduced some time later than the system site packages
location.
On many Linux systems it is considered good practice to install system-wide site packages exclusively via the Linux distribution’s own package manager (and not pip). This is because pip and the system package manager can frequently end up treading on each other’s toes resulting in problems which can be difficult to untangle. Installing packages into the user site-packages directory using pip, however, is always safe (and strongly recommended).
The site
module’s
documentation provides a fairly
precise description of how the site packages directories are defined and
discovered. Alternatively this mechanism is also outlined by PEP
370 which introduced the user site
packages directory.
The system site packages directory typically resides nearby the standard
library directory. For example, on many (non-Debian) Linux systems this is
/usr/lib/python3.9/site-packages
. The user site packages directory resides
within a user’s home directory. For example under Linux this will typically be
something like ~/.local/lib/python3.9/site-packages
. Like the standard
library, site packages paths might also be split into platform-independent and platform-dependent parts.
As a general warning, Debian (and derivatives such as Ubuntu) distribute modified versions of Python and associated tools such as pip. These differences can sometimes cause confusion (and occasionally introduce Debian-specific bugs).
More specifically in the context of Python packaging, Debian systems have
two system site packages locations (which Debian calls dist-packages
rather than site-packages
): one resides in /usr/lib/python3/dist-packages
and another in /usr/local/lib/python3/dist-packages
. The former is intended
for apt-installed packages
whilst the latter is intended for pip installed packages in the hope of
avoiding the two tools fighting. To support this, Debian’s Python adds special
behaviour to the site
module and modifies pip to cause it to install into the
alternative installation location during system-wide installations.
*.pth files

As well as discovering the site packages locations during startup, the site module also scans the site packages locations for .pth files. In the common case, these files contain a newline-delimited list of extra directories to be added to sys.path. This mechanism is widely used to facilitate editable installs by adding source directories to the Python path.
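For example, a file called my_project.pth (the name is invented, though the .pth extension is required) placed in a site packages directory and containing the single line:

/home/alice/src/my_project

will cause that directory to be added to sys.path during startup.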
.pth files for editable installs.
Rather than creating one .pth
file per package, setuptools uses a single
easy_install.pth
file instead. This implementation detail is a reference to
the (deprecated)
easy_install
tool, a predecessor of pip.
.pth files outside of site packages directories.
Because support for .pth
files is part of the site
module and not
Python’s import system generally, .pth
files can only be used within site
packages directories. For example, .pth
files in directories mentioned in
the PYTHONPATH
environment variable or added to sys.path
manually will
be ignored.
site module.
The site
module also provides a number of other more advanced mechanisms,
such as the ability to register arbitrary custom Python code to manipulate
sys.path
during startup. This is a rarely used feature, though you may spot
setuptools using it in
-nspkg.pth
files to provide backward compatibility with older namespace
packages.
See the site
module’s
documentation for more details
of this aspect of the .pth
file format.
Python virtual environments (standardised in PEP 405) are a convenient mechanism for creating isolated Python installations in which packages may be installed and used without impacting the system Python installation. For example they are often used to provide a consistent environment for testing and development, isolated from other system or user installed packages.
As we learnt in the previous section, all non-standard-library Python packages
live in the various system and user site packages directories. As such, all a
Python virtual environment needs to do to provide isolation from these packages
is to create its own (private) site packages directory and omit the system- and
user-wide directories from sys.path
. To restate this: all Python virtual
environments do is change what ends up in sys.path
: no other
sandboxing,
system virtualisation or other
fancy techniques are involved.
A minimal Python virtual environment is defined by the following file structure (defined in PEP 405):

venv_directory/
    bin/
        python      – A symlink to, or copy of, the Python interpreter binary
        activate    – (Optional) A convenient shell script to activate the virtual environment
    lib/
        pythonX.Y/  – With ‘X.Y’ replaced with the Python version (e.g. 3.9)
            site-packages/  – The environment’s private site packages directory
    pyvenv.cfg      – A ‘magic’ file which Python looks for to detect that it’s in a virtual environment

When the Python binary is executed from within the virtual environment
directory, it will look for (and discover) the pyvenv.cfg
file in its parent
directory. This informs Python that it is running within a virtual environment
and changes the way it populates sys.path
. Specifically, it will cause Python
to skip adding the system and user site packages locations to sys.path
and instead add the virtual environment’s <virtual env dir>/lib/pythonX.Y/site-packages directory.
The activate
script (if provided) carries out only one essential purpose: it
adds the bin
directory of the virtual environment to the top of the user’s PATH.
Virtual environments created using the built-in python -m
venv
command (or
the popular third party virtualenv
command) will add more than just this bare minimum to the virtual environment.
For example, they will also install a copy of pip and setuptools into the
environment’s site-packages directory making it easier to install new packages.
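For example, creating, activating and using a virtual environment looks like:

$ python -m venv my_env              # Create a virtual environment in ./my_env
$ source my_env/bin/activate         # Put my_env/bin at the front of PATH
(my_env) $ pip install requests      # Installs into my_env's site-packages only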
pyvenv.cfg file.
The pyvenv.cfg
file actually contains two important pieces of information.
Firstly it contains the original location of the Python interpreter (for
example /usr/bin
), enabling it to use its normal logic for locating the
Python standard library and add this to sys.path
as usual. Secondly, it
contains a flag indicating whether the system and user site packages
directories should be added to sys.path
. Typically this flag is not set
ensuring proper isolation but is provided in the less common case where it is
desired to also use system wide packages alongside any privately installed
in the virtual environment.
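As an illustration, a pyvenv.cfg produced by python -m venv typically looks something like this (the exact contents vary by platform and Python version):

home = /usr/bin
include-system-site-packages = false
version = 3.9.7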
Prior to the standardisation of virtual environments (in PEP
405) Python virtual environments
were constructed using various hacky
means involving
copying and patching parts of Python and its standard library. Virtual
environments created in this way (e.g. using very old versions of the
virtualenv command) can be spotted
by the absence of a pyvenv.cfg
file. If you spot one of these, it is a good
idea to recreate the environment using a more up to date version of
virtualenv
or via the new python -m venv
command.
For the average Python developer, the current explosion of tools and apparent divergence of opinions on Python packaging can be somewhat alarming. Fortunately, this is not an indication of a schism within the Python community. On the contrary, the sudden diversity of tooling and ideas is the result of the fundamentals of Python packaging finally crystallising into interoperable standards. This common core is what enables Python projects employing these tools and ideas to work seamlessly together as part of the wider Python ecosystem.
In the past, third party Python packaging tools have been hampered by the need to mimic the arcane internal behaviours of Python’s de-facto standard tooling: setuptools and pip. This meant that such tools were often fragile and constantly ran the risk of breakage whenever setuptools or pip changed their internal behaviour. Consequently, adoption of new tools and ideas has historically been limited.
Recognising this, the core Python developers have been working hard for many years to standardise and substantially refactor Python packaging processes to enable tools other than setuptools or pip to coexist. This is no small task due to the diversity of systems on which Python runs and the wide range of applications Python packages must support. There’s also the significant burden of backward compatibility with ad-hoc Python packaging schemes going back several decades.
As we’ve highlighted in this article, many fundamental components such as package formats and build system interfaces have now been standardised, though much work still remains. Nevertheless, these have provided firm foundations upon which a vibrant new ecosystem of packaging tools is being built. From tightly-scoped packaging-focused tools like Flit to all-encompassing Python development environments such as Poetry, these new tools are no longer building on shifting sands. Meanwhile, existing and battle-tested tools like setuptools and pip have benefited from improved robustness and usability.
We hope that this article has introduced you to some of the underlying principles of Python and its packaging system. Armed with this fundamental knowledge we hope you will be in a better position to approach difficult packaging problems and exploit new and emerging tools.