Python Packaging
Write Python Project as a Package
A package is a collection of Python modules that can be imported into Python scripts. Examples include numpy and scipy. The basic usage includes import and from ... import ... statements.
The advantage of a Python package is that it gives the code a well-organized structure for achieving a specific functionality (e.g., implementing a project). The scripts in the package should be self-contained and focused on implementing the project requirements. They are not general scripts for arbitrary purposes.
A project should have a clear goal and requirements. For example, an ML project may aim to implement a new algorithm, while a web development project may aim to build a tool that facilitates web page design. Since a project is rarely a single-script task, it is recommended to organize the code into a package structure from the start, which facilitates future distribution and development.
It is good practice to write the necessary functions and classes in the package and then import and combine them to achieve different purposes. Deciding how to split functionality into small, independently implementable functions is an art, and it depends on the project.
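As a minimal sketch of this idea, the snippet below builds a throwaway package in a temporary directory and then imports from it in both styles. The package name my_proj, the module stats.py, and the function mean are hypothetical names chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a tiny package on disk: a directory with an __init__.py and one module.
root = Path(tempfile.mkdtemp())
pkg = root / "my_proj"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("from .stats import mean\n")
(pkg / "stats.py").write_text(
    "def mean(values):\n"
    "    return sum(values) / len(values)\n"
)

# Make the temporary directory importable for this session only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import my_proj                   # style 1: import the package
from my_proj.stats import mean   # style 2: import one name from a module

result_a = my_proj.mean([1, 2, 3])
result_b = mean([10, 20])
print(result_a, result_b)
```

In a real project the package directory would live in the repository and be made importable by installing it, rather than by editing sys.path.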
See the post Coding Practice for Python Projects for a reference structure for Python projects.
See also: src layout vs flat layout; Package Discovery and Namespace Packages.
Packaging Configuration
There are three files related to Python project packaging: setup.py, setup.cfg, and pyproject.toml. These files are used to package Python code into distributable libraries. Packaging is useful for editable installs during development, for final distribution, and so on. If we only write simple scripts for tests, packaging is unnecessary.
When we want to distribute Python code, we first need to package it into an agreed format and then ship it. The distributed package is also called a library. For Python packages, there are two types of distributions: source and binary; see Overview of Python Packaging. Python defines the wheel, a package format that can ship libraries together with compiled artifacts. Mature Python libraries can be uploaded to PyPI to be found and used by all Python users.
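The two distribution types can usually be told apart by file name. The sketch below assumes the standard wheel naming convention (name-version-python tag-abi tag-platform tag.whl, with an optional build tag) and the usual name-version.tar.gz naming for source distributions. The helper describe_distribution is a hypothetical name, and the parsing is deliberately simplified.

```python
def describe_distribution(filename):
    """Classify a distribution file by its name (simplified sketch)."""
    if filename.endswith(".whl"):
        parts = filename[: -len(".whl")].split("-")
        # The last three fields are always the python, ABI, and platform tags.
        python_tag, abi_tag, platform_tag = parts[-3:]
        return {
            "kind": "wheel",
            "name": parts[0],
            "version": parts[1],
            # A wheel without compiled artifacts is tagged none/any.
            "pure_python": abi_tag == "none" and platform_tag == "any",
        }
    if filename.endswith(".tar.gz"):
        name, _, version = filename[: -len(".tar.gz")].rpartition("-")
        return {"kind": "sdist", "name": name, "version": version}
    raise ValueError(f"unrecognized distribution file: {filename}")


pure = describe_distribution("my_proj-0.0.1-py3-none-any.whl")
compiled = describe_distribution(
    "numpy-1.26.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
source = describe_distribution("my_proj-0.0.1.tar.gz")
print(pure, compiled, source, sep="\n")
```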
A Brief History of Python Build Tools
This post is helpful. Extra: A History of Python Build Tools
As we have seen, especially for binary libraries, we need tools to compile (build) the code and then package it. People have developed many such tools, which are called Python build tools. At the very beginning, starting with Python 1.6, distutils was a module of Python’s standard library that allowed users to install and distribute their own packages. It was later superseded by setuptools and deprecated by PEP 632.
To use setuptools to build a Python project, we generally need to call the setup() function. We create a setup.py script in the project that calls functions from setuptools, and run the script to build our Python project. Up to this point, people needed to write a Python script to build a project. If we want to change some build parameters of the project, we need to read and understand the setup.py script and change it, which is not considered good style since there is too much logic involved in configuring a project. Therefore, to make the configuration clearer, people extracted the settings (or options) in setup.py (more specifically, the arguments of the setup() function) into a new configuration file, setup.cfg. Then, it is sufficient to change build options in the configuration file, and there is no need to write complex code in setup.py. See What’s the difference between setup.py and setup.cfg in python projects.
However, setuptools is not in Python’s standard library. This means that if we want to package a Python project, we first need to install the setuptools package and then use it to process the required packages. For example, suppose I create a Python package foo that uses numpy. In order to build the package, I first install setuptools and tell it that my required package is numpy. In fact, I need both numpy and setuptools to build my project. In the era of distutils, this was not a problem for Python developers, as distutils was shipped as part of Python’s standard library. Therefore, can we use a configuration file that lists setuptools as a required package? Besides, there are other Python build tools, for example, flit, hatch, pdm, poetry, trampolim, and whey. Can we also use a configuration file that specifies which build tool to use?
This consideration is reflected in PEP 517 and PEP 518, proposed around 2015, which introduced a standardized configuration file, pyproject.toml, to specify the build configuration. Since the majority of Python projects were built with setuptools, at first two configuration files, pyproject.toml and setup.cfg, were used together: the first declares that setuptools is the build tool, and the second specifies the setup options.
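To make the role of the build-system table concrete, here is a rough sketch of how a build front end might read it. The function resolve_build_system is a hypothetical helper, and the fallback values for projects without a build-system table are a simplification of the legacy setuptools behavior, not an exact copy of what any particular tool does.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older interpreters: pip install tomli
    import tomli as tomllib

# Assumed fallback for legacy projects that only ship a setup.py.
LEGACY_REQUIRES = ["setuptools", "wheel"]
LEGACY_BACKEND = "setuptools.build_meta:__legacy__"


def resolve_build_system(pyproject_text):
    """Return (requires, backend) as declared in a pyproject.toml string."""
    table = tomllib.loads(pyproject_text).get("build-system", {})
    requires = table.get("requires", LEGACY_REQUIRES)
    backend = table.get("build-backend", LEGACY_BACKEND)
    return requires, backend


setuptools_project = """
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
"""

flit_project = """
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"
"""

print(resolve_build_system(setuptools_project))
print(resolve_build_system(flit_project))
print(resolve_build_system(""))  # no build-system table: legacy fallback
```

The point of the sketch is that the front end learns both what to install before building (requires) and which tool to call (build-backend) from the same file, which answers the two questions raised earlier.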
Moving on to 2020, PEP 621 standardized how to write project metadata (build options) in pyproject.toml. In this way, there is no need for setup.cfg since everything can be written in a single pyproject.toml. With PEP 660, the Python community standardized a way to use wheel files to create editable installs, and therefore setup.py is no longer required. For a current Python project, we therefore only need to include one pyproject.toml; setup.py and setup.cfg are no longer needed for build configuration. PyPA also recommends that modern Python projects use pyproject.toml for build configuration.
Difference Between Three Files
From the history, we know that
- setup.py is a Python script for building a Python project using utilities from the package setuptools.
- setup.cfg is a straightforward configuration file for the setup() function in setup.py. It was created to reduce the complex logic needed to set build configurations. People can modify configurations directly in this file.
- pyproject.toml is a newer configuration file that unifies build backend selection and build configuration. It is recommended to use pyproject.toml for build configuration.
This post is helpful. Understanding setup.py, setup.cfg and pyproject.toml in Python
Usage
Some useful references:
- A Practical Guide to Using Setup.py
- A Practical Guide to Setuptools and Pyproject.toml
- Writing your pyproject.toml, tutorial by PyPA
- Configuring setuptools using pyproject.toml files, tutorial by setuptools
Use setup.py Only
from setuptools import setup, find_packages

setup(
    name='my_proj',
    version='0.0.1',
    description='pip install test with setup.py only',
    packages=find_packages(include=['my_proj', 'my_proj.*']),
    install_requires=[
        'numpy>=1.26.0',
        'scipy>=1.13.0',
    ],
    extras_require={
        'interactive': ['matplotlib>=3.6.0'],
    },
)
Use setup.py and setup.cfg
# setup.py
from setuptools import setup

setup()

# setup.cfg
[metadata]
name = my_proj
version = 0.0.1
description = pip install test with setup.cfg

[options]
packages = find:
install_requires =
    numpy >= 1.26.0
    scipy >= 1.13.0

[options.extras_require]
interactive = matplotlib>=3.6.0

[options.packages.find]
include = my_proj, my_proj*
We can find more specifications of setup.cfg in Configuring setuptools using setup.cfg files.
Note: We cannot use setup.cfg alone for building with this approach. A setup.py file containing a setup() function call is still required even if the configuration resides in setup.cfg. They need to be used together. See Configuring setuptools using setup.cfg files.
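Because setup.cfg is an INI-style file, its options are plain data that can be inspected without running any project code. The sketch below reads a setup.cfg like the one above with the standard configparser module. It only illustrates the declarative nature of the file and is not a description of how setuptools itself processes it; read_list is a hypothetical helper.

```python
import configparser

SETUP_CFG = """\
[metadata]
name = my_proj
version = 0.0.1

[options]
packages = find:
install_requires =
    numpy >= 1.26.0
    scipy >= 1.13.0

[options.extras_require]
interactive = matplotlib>=3.6.0
"""


def read_list(value):
    """Split a multi-line option value into one entry per non-empty line."""
    return [line.strip() for line in value.splitlines() if line.strip()]


parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

name = parser["metadata"]["name"]
install_requires = read_list(parser["options"]["install_requires"])
extras = {
    key: read_list(value)
    for key, value in parser["options.extras_require"].items()
}
print(name, install_requires, extras)
```

Note that the continuation lines under install_requires must be indented; otherwise they are not parsed as part of that option.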
Use pyproject.toml and setup.cfg
The setup.cfg remains the same as in the previous approach; pyproject.toml only declares the build system.
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
Use pyproject.toml Only
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "my_proj"
version = "0.0.1"
description = "pip install test with pyproject.toml only."
requires-python = ">=3.9"
dependencies = [
    "numpy>=1.26.0",
    "scipy>=1.13.0",
]
[project.optional-dependencies]
interactive = ["matplotlib>=3.6.0"]
[tool.setuptools.packages.find]
where = ["."]
include = ["my_proj", "my_proj.*"]
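As a rough illustration of how the project table is consumed, the sketch below parses a pyproject.toml like the one above and collects the requirements that would be requested with and without the interactive extra. The function requested_requirements is a hypothetical helper; real installers do considerably more (version resolution, environment markers, and so on).

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older interpreters: pip install tomli
    import tomli as tomllib

PYPROJECT = """
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my_proj"
version = "0.0.1"
dependencies = [
    "numpy>=1.26.0",
    "scipy>=1.13.0",
]

[project.optional-dependencies]
interactive = ["matplotlib>=3.6.0"]
"""


def requested_requirements(pyproject_text, extras=()):
    """List the base dependencies plus those of the selected extras."""
    project = tomllib.loads(pyproject_text)["project"]
    requirements = list(project.get("dependencies", []))
    optional = project.get("optional-dependencies", {})
    for extra in extras:
        requirements.extend(optional[extra])
    return requirements


base = requested_requirements(PYPROJECT)
with_plots = requested_requirements(PYPROJECT, extras=["interactive"])
print(base)
print(with_plots)
```

Here base corresponds to a plain pip install my_proj, and with_plots corresponds to pip install "my_proj[interactive]".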