Coding Practice for Python Projects

We summarize some coding practices when writing Python projects for better code readability and maintenance.

Common Practice

Draw functionality blocks first. Then, implement each block by module or class.
- Classes organize functions of a specific object. For example, a class can be defined to create a NN, train the NN, and get the parameter of NN. It implements necessary APIs to the NN.
- Modules are higher-level abstractions of classes. A module can have several classes with similar functionalities. For example, scipy has optimize module and linalg module. Inside each module, there are classes and functions
Scripts should be used to
- call functions
- testing small tasks
- define parameter files
Encapsulate scripts using functions, especially for testing scripts. Easy for debugging and parallel implementation.
Create separate virtual environments for different projects.
- Put the virtual environment in the project root directory.
- Recommended virtual environment naming format: .venv-3.x. Use a hidden directory; specify the Python version.
Use a tmp directory to store the unrelated or intermediate testing results.
Always write a README for a project. A README should contain at least the following:
- General project objective and necessary descriptions.
- Package requirements and code instructions.
- Code block diagram or code structure specifications.
In the project structure, the directory project_name or src stores all the source files but no scripts for debugging. Create new scripts in the scripts directory to call functions and classes in project_name or src.
Use json or yaml file to define project parameters, including environment parameters, etc.

Use parseArg to allow user-level parameter modification from bash scripts. Also, change the parameter in the parseArg function.

Reference Structure for Python Projects

project_root_directory
├── LICENSE
├── README.md
├── data
│   ├── raw/external		<- data from third party
│   ├── processed 		<- final data or other names
│   └── tmp			<- tmp data
│
├── docs			<- project documents, which can be a website 
│
├── project_name		<- project source code, flat-layout
│   ├── __init__.py		<- make project_name a module
│   │
│   ├── data			<- data related, constant, configurations, download or generate data scripts
│   │   ├── __init__.py
│   │   ├── locconfig.json
│   │   └── locconfig.py
│   │
│   ├── module1			<- module1 implementation
│   │
│   ├── visualization		<- visualization
│   │   ├── __init__.py
│   │   └── plot_utils.py
│   │
│   └── module2.py		<- module2 or other utilities
│
├── examples		        <- example scripts of using package utilities
│   └── example_demo.py
│
├── tests			<- test scripts
│   └── test_1.py
│
├── scripts			<- bash scripts
│   └── bash_script.sh
│
├── requirements.txt 	<- the requirements for reproducing the environment. `pip freeze > requirements.txt`
└── pyproject.toml      <- package configuration

Note:

Modern Python packaging prefers the use of pyproject.toml to setup.py and setup.cfg.
A project may contain data like configurations. For ML projects, data can be hyperparameters or scripts to download datasets.
examples can also be named by experiments in ML projects. Depending on the project requirements, project_name does not have to contain subdirectories. Subdirectories in the package should serve distinct purposes.
External data or interim data generated by examples or testing should be separated by the project code. Put them in the data directory for better management.

For a typical Python project structure, we refer to Image To Latex Project.

Coding Practice for Python Projects

Common Practice

Reference Structure for Python Projects

Useful References