Coding Practice for Python Projects
We summarize some coding practices when writing Python projects for better code readability and maintenance.
Common Practice
- Draw functionality blocks first. Then, implement each block by module or class.
- Classes organize functions of a specific object. For example, a class can be defined to create a NN, train the NN, and get the parameter of NN. It implements necessary APIs to the NN.
- Modules are higher-level abstractions of classes. A module can have several classes with similar functionalities. For example, scipy has
optimizemodule andlinalgmodule. Inside each module, there are classes and functions
- Scripts should be used to
- call functions
- testing small tasks
- define parameter files
- Encapsulate scripts using functions, especially for testing scripts. Easy for debugging and parallel implementation.
- Create separate virtual environments for different projects.
- Put the virtual environment in the project root directory.
- Recommended virtual environment naming format:
.venv-3.x. Use a hidden directory; specify the Python version.
- Use a
tmpdirectory to store the unrelated or intermediate testing results. - Always write a
READMEfor a project. A README should contain at least the following:- General project objective and necessary descriptions.
- Package requirements and code instructions.
- Code block diagram or code structure specifications.
- In the project structure, the directory
project_nameorsrcstores all the source files but no scripts for debugging. Create new scripts in thescriptsdirectory to call functions and classes inproject_nameorsrc. - Use
jsonoryamlfile to define project parameters, including environment parameters, etc.
Use parseArg to allow user-level parameter modification from bash scripts. Also, change the parameter in the parseArg function.
Reference Structure for Python Projects
project_root_directory
├── LICENSE
├── README.md
├── data
│ ├── raw/external <- data from third party
│ ├── processed <- final data or other names
│ └── tmp <- tmp data
│
├── docs <- project documents, which can be a website
│
├── project_name <- project source code, flat-layout
│ ├── __init__.py <- make project_name a module
│ │
│ ├── data <- data related, constant, configurations, download or generate data scripts
│ │ ├── __init__.py
│ │ ├── locconfig.json
│ │ └── locconfig.py
│ │
│ ├── module1 <- module1 implementation
│ │
│ ├── visualization <- visualization
│ │ ├── __init__.py
│ │ └── plot_utils.py
│ │
│ └── module2.py <- module2 or other utilities
│
├── examples <- example scripts of using package utilities
│ └── example_demo.py
│
├── tests <- test scripts
│ └── test_1.py
│
├── scripts <- bash scripts
│ └── bash_script.sh
│
├── requirements.txt <- the requirements for reproducing the environment. `pip freeze > requirements.txt`
└── pyproject.toml <- package configuration
Note:
- Modern Python packaging prefers the use of
pyproject.tomltosetup.pyandsetup.cfg. - A project may contain data like configurations. For ML projects, data can be hyperparameters or scripts to download datasets.
examplescan also be named byexperimentsin ML projects. Depending on the project requirements,project_namedoes not have to contain subdirectories. Subdirectories in the package should serve distinct purposes.- External data or interim data generated by examples or testing should be separated by the project code. Put them in the
datadirectory for better management.
For a typical Python project structure, we refer to Image To Latex Project.