In general, there are two ways to achieve parallelization:

  • Python multiprocessing module
  • Bash parallel for loops

multiprocessing Module

The Python multiprocessing module is better suited to parallelizing similar tasks. For example, in a machine learning project, we may want the mean and variance of an image classifier's training performance across different random seeds. The hyperparameters are fixed and only the seed changes, so the same training function can be run once per seed, which makes parallel training easy to implement with the multiprocessing module.
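A minimal sketch of the random-seed scenario, assuming a worker pool from the standard multiprocessing module. The names train_once, summarize, and run_seeds are hypothetical, and train_once only simulates a score; a real project would call its own training routine there.

```python
import multiprocessing as mp
import random
import statistics


def train_once(seed):
    """Hypothetical stand-in for one training run with fixed hyperparameters.

    Returns a simulated accuracy that depends only on the seed.
    """
    rng = random.Random(seed)
    return 0.9 + rng.uniform(-0.02, 0.02)


def summarize(scores):
    """Return the mean and sample variance of the per-seed scores."""
    return statistics.mean(scores), statistics.variance(scores)


def run_seeds(seeds, num_workers=4):
    """Run train_once for every seed in parallel and summarize the results."""
    with mp.Pool(processes=num_workers) as pool:
        # pool.map applies the same function to each seed in a worker process.
        scores = pool.map(train_once, seeds)
    return summarize(scores)


if __name__ == "__main__":
    mean, variance = run_seeds([0, 1, 2, 3, 4])
    print(f"mean={mean:.4f} variance={variance:.6f}")
```

The `if __name__ == "__main__":` guard matters here: on platforms that start worker processes by re-importing the script, it keeps the workers from launching pools of their own.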

Bash for Loop

A Bash parallel for loop is better suited to parallelizing different tasks. For example, we may want to train an image classifier with different hyperparameters. The configurations or parameters for such tasks are generally stored in separate configuration files, which a Bash for loop can iterate over easily.

Note that all data saving and loading operations should be coded in Python instead of bash scripts.
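As a rough illustration of this split, the sketch below shows the Python side that a Bash loop could launch once per configuration file. It assumes JSON configuration files; the script name train.py, the flags --config and --output-dir, and the functions train and run_from_config are all hypothetical. The Bash loop appears only as a comment, and all loading and saving happens in Python.

```python
# Hypothetical Bash side (one background job per configuration file):
#   for cfg in configs/*.json; do python train.py --config "$cfg" & done; wait
import argparse
import json
from pathlib import Path


def train(config):
    """Hypothetical placeholder for real training; returns a dummy result."""
    return {"config": config, "score": 0.0}


def run_from_config(config_path, output_dir):
    """Load one configuration file, run training, and save the result as JSON."""
    config_path = Path(config_path)
    output_dir = Path(output_dir)
    with config_path.open() as f:
        config = json.load(f)
    result = train(config)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Name the result after the configuration file so parallel jobs do not collide.
    out_path = output_dir / f"{config_path.stem}_result.json"
    with out_path.open("w") as f:
        json.dump(result, f, indent=2)
    return out_path


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run one task from a config file.")
    parser.add_argument("--config", help="path to a JSON configuration file")
    parser.add_argument("--output-dir", default="results")
    args = parser.parse_args()
    if args.config is None:
        parser.print_usage()
    else:
        print(run_from_config(args.config, args.output_dir))
```

Because each job writes to a file named after its own configuration, the Bash script only needs to start the processes and wait for them to finish.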

Some Notes

These two parallelization approaches offer similar functionality. There is no strict rule for which one to use; the choice depends on the specific project.