Coding Practice for Python Projects
We summarize some coding practices for Python projects that improve code readability and maintainability.
Common Practice
- Draw functionality blocks first. Then, implement each block by module or class.
- Classes group the functions that operate on a specific object. For example, a class can be defined to create a neural network (NN), train the NN, and get the parameters of the NN. It implements the necessary APIs for the NN.
- Modules are higher-level abstractions of classes. A module can have several classes with similar functionalities. For example, scipy has the `optimize` module and the `linalg` module. Inside each module, there are classes and functions.
- Scripts should be used to
  - call functions
  - test small tasks
  - define parameter files
- Encapsulate scripts in functions, especially testing scripts. This makes debugging and parallel execution easier.
- Create separate virtual environments for different projects.
  - Put the virtual environment in the project root directory.
  - Recommended virtual environment naming format: `.venv-3.x`. Use a hidden directory and specify the Python version.
- Use a `tmp` directory to store unrelated or intermediate testing results.
- Always write a `README` for a project. A README should contain at least the following:
  - General project objective and necessary descriptions.
  - Package requirements and code instructions.
  - Code block diagram or code structure specifications.
- In the project structure, the directory `project_name` or `src` stores all the source files but no scripts for debugging. Create new scripts in the `scripts` directory to call functions and classes in `project_name` or `src`.
- Use a `json` or `yaml` file to define project parameters, including environment parameters.
- Use `parseArg` to allow user-level parameter modification from bash scripts. When a parameter changes, also update it in the `parseArg` function.
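Several of the practices above (a class that exposes the API of one object, a script body wrapped in a function, parameters kept in a JSON file, and command-line overrides) can be sketched together. This is only a minimal illustration under stated assumptions: `parseArg` is assumed here to be a function built on the standard-library `argparse` module, and the names `LinearModel`, `DEFAULTS`, `parse_args`, `load_params`, and `main` are hypothetical. A toy linear model stands in for the NN so the sketch needs no third-party packages.

```python
import argparse
import json
from pathlib import Path

# Hypothetical default parameters; a JSON file or the command line can override them.
DEFAULTS = {"lr": 0.1, "epochs": 200}


class LinearModel:
    """Hypothetical stand-in for the NN class: create, train, get parameters."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.w = 0.0
        self.b = 0.0

    def train(self, xs, ys, epochs):
        """Fit y = w * x + b by plain gradient descent on the mean squared error."""
        n = len(xs)
        for _ in range(epochs):
            errors = [self.w * x + self.b - y for x, y in zip(xs, ys)]
            grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
            grad_b = sum(2 * e for e in errors) / n
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b

    def get_params(self):
        return {"w": self.w, "b": self.b}


def parse_args(argv=None):
    """Command-line options; a value of None means 'keep the configured value'."""
    parser = argparse.ArgumentParser(description="Train a toy model.")
    parser.add_argument("--config", type=Path, default=None)
    parser.add_argument("--lr", type=float, default=None)
    parser.add_argument("--epochs", type=int, default=None)
    return parser.parse_args(argv)


def load_params(args):
    """Start from DEFAULTS, apply the JSON file, then apply command-line overrides."""
    params = dict(DEFAULTS)
    if args.config is not None:
        params.update(json.loads(args.config.read_text()))
    for key in ("lr", "epochs"):
        value = getattr(args, key)
        if value is not None:
            params[key] = value
    return params


def main(argv=None):
    """Script body wrapped in a function so tests and other code can call it."""
    params = load_params(parse_args(argv))
    model = LinearModel(lr=params["lr"])
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
    model.train(xs, ys, epochs=params["epochs"])
    return {"params": params, "model": model.get_params()}
```

In an actual script, a final `if __name__ == "__main__": main()` guard would run it, and a bash script could then pass options such as `--config params.json --epochs 300` (the file name is a placeholder). Because `main` accepts an explicit argument list, a test can call `main(["--epochs", "300"])` without touching the real command line.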
Reference Structure for Python Projects
project_root_directory
├── LICENSE
├── README.md
├── data
│ ├── raw/external <- data from third party
│ ├── processed <- final data or other names
│ └── tmp <- tmp data
│
├── docs <- project documents, which can be a website
│
├── project_name <- project source code, flat-layout
│ ├── __init__.py <- make project_name a package
│ │
│ ├── data <- data related, constant, configurations, download or generate data scripts
│ │ ├── __init__.py
│ │ ├── locconfig.json
│ │ └── locconfig.py
│ │
│ ├── module1 <- module1 implementation
│ │
│ ├── visualization <- visualization
│ │ ├── __init__.py
│ │ └── plot_utils.py
│ │
│ └── module2.py <- module2 or other utilities
│
├── examples <- example scripts of using package utilities
│ └── example_demo.py
│
├── tests <- test scripts
│ └── test_1.py
│
├── scripts <- bash scripts
│ └── bash_script.sh
│
├── requirements.txt <- the requirements for reproducing the environment. `pip freeze > requirements.txt`
└── pyproject.toml <- package configuration
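As a rough sketch, the skeleton of the tree above could be created with a short standard-library script. The `scaffold` function and the `LAYOUT` list are hypothetical names, `raw/external` is assumed to mean a single `data/raw` directory, and all files are created empty.

```python
from pathlib import Path

# Entries follow the reference tree; a trailing "/" marks a directory.
# "{pkg}" is replaced by the package name (project_name in the tree).
LAYOUT = [
    "LICENSE",
    "README.md",
    "data/raw/",
    "data/processed/",
    "data/tmp/",
    "docs/",
    "{pkg}/__init__.py",
    "{pkg}/data/__init__.py",
    "{pkg}/visualization/__init__.py",
    "examples/",
    "tests/",
    "scripts/",
    "requirements.txt",
    "pyproject.toml",
]


def scaffold(root, package_name):
    """Create the empty skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for entry in LAYOUT:
        path = root / entry.format(pkg=package_name)
        if entry.endswith("/"):
            path.mkdir(parents=True, exist_ok=True)
        else:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch(exist_ok=True)  # leaves an existing file unchanged
        created.append(path)
    return created
```

Calling `scaffold("my_project", "project_name")` would create the directories and empty files relative to the current working directory. Running it again is harmless, because existing directories and files are left in place.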
Note:
- Modern Python packaging prefers `pyproject.toml` to `setup.py` and `setup.cfg`.
- A project may contain data like configurations. For ML projects, data can be hyperparameters or scripts to download datasets.
- `examples` can also be named `experiments` in ML projects.
- Depending on the project requirements, `project_name` does not have to contain subdirectories. Subdirectories in the package should serve distinct purposes.
- External data or interim data generated by examples or testing should be kept separate from the project code. Put them in the `data` directory for better management.
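One way to keep interim results out of the source tree is to route every output path through a small helper. The sketch below assumes the `data/tmp` directory from the reference structure; the function name `interim_path` is hypothetical.

```python
from pathlib import Path


def interim_path(project_root, name):
    """Return a path for an interim file under <project_root>/data/tmp."""
    out_dir = Path(project_root) / "data" / "tmp"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory on first use
    return out_dir / name
```

An example or test script could then write to `interim_path(project_root, "run1.json")` (a placeholder file name) instead of writing next to the package code.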
For an example of a typical Python project structure, see the Image To Latex project.