Python Parallelization
In general, there are two ways to achieve parallelization:
- Python multiprocessing module
- bash parallel for loops
multiprocessing Module
The Python multiprocessing module is more suitable for parallelizing similar tasks. For example, in a machine learning project, we may want to obtain the mean and variance of an image classifier's training performance across different random seeds. The hyperparameters are fixed and only the seed changes between runs, so the parallel training is easy to implement with the multiprocessing module.
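The seed sweep described above can be sketched as follows. This is only a minimal illustration, not a definitive implementation: train_once is a hypothetical stand-in that returns a made-up score instead of training a real classifier, and the names run_seeds and summarize, the seed range, and the process count are all assumptions made for the example.

```python
import multiprocessing as mp
import random
import statistics


def train_once(seed):
    """Hypothetical stand-in for one training run with a fixed set of hyperparameters."""
    rng = random.Random(seed)
    # Assumption: a real project would train the classifier here and return its metric.
    return 0.90 + 0.05 * rng.random()


def summarize(scores):
    """Return the mean and sample variance of the per-seed scores."""
    return statistics.mean(scores), statistics.variance(scores)


def run_seeds(seeds, processes=4):
    """Run train_once for every seed in a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        # Each seed is an independent task, so a plain map is enough.
        return pool.map(train_once, list(seeds))


if __name__ == "__main__":
    scores = run_seeds(range(8))
    mean, variance = summarize(scores)
    print(f"mean={mean:.4f} variance={variance:.6f}")
```

The worker function is defined at module level and the pool is created under the __main__ guard, which multiprocessing needs on platforms that start workers by importing the main module.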
Bash for Loop
A bash parallel for loop is more suitable for parallelizing different tasks. For example, we want to train an image classifier with different hyperparameters. Generally, the configurations or parameters for different tasks are stored in separate configuration files, and a bash for loop can easily iterate over those files and launch one task per file.
Note that all data saving and loading operations should be coded in Python instead of bash scripts.
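One way to follow this note is to keep the bash loop as a thin launcher and let a Python entry point do all the reading and writing. The sketch below assumes a hypothetical script name train.py, JSON configuration files, and a placeholder run_task function; none of these come from a specific project.

```python
import json
import sys
from pathlib import Path

# Assumed launcher, kept in bash only to start one background process per config file:
#   for cfg in configs/*.json; do
#       python train.py "$cfg" "results/$(basename "$cfg")" &
#   done
#   wait


def load_config(path):
    """Read one task's configuration from a JSON file."""
    return json.loads(Path(path).read_text())


def run_task(config):
    """Hypothetical placeholder for the actual training run."""
    return {"config": config, "status": "finished"}


def save_result(result, path):
    """Write the result as JSON, creating the output directory if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(result, indent=2))


def main(config_path, output_path):
    save_result(run_task(load_config(config_path)), output_path)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python train.py CONFIG_JSON OUTPUT_JSON")
```

With this split, the bash side only passes file paths, so changing the storage format later touches the Python code alone.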
Some Notes
These two parallelization approaches can achieve similar functionalities. There are no strict rules to specify which one to use. The choice should depend on the specific project.