Parallel computation in SIDEKIT

Multiprocessing

SIDEKIT makes extensive use of parallel computing to speed up the processing of massive quantities of data. Implementations of SIDEKIT methods rely on the multiprocessing module, which is part of the Python standard library and allows the use of multiple cores on a single machine.

All methods that make use of multiprocessing parallelisation take a num_thread parameter that defines the number of parallel processes to run.

Parallelisation using the multiprocessing module is done in two ways depending on the nature of the computation.
  • some methods use a multiprocessing.Pool of processes

  • other methods are parallelised via a decorator that allows the code to be written for a single process (more readable) and to be parallelised at run time. Reading the decorator itself might be tedious, but the main rule when using it is: name every argument passed to the method explicitly, as the decorator parallelises the code according to the named arguments. Using unnamed arguments might disable the parallel processing or, even worse, duplicate the work (for instance, process a given list of files on each process instead of sharing the list amongst processes), as sketched below.
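
Here is a minimal sketch of this calling convention; the object and method names (server, extract_list, show_list, feature_dir) are hypothetical, only the named-argument rule reflects the behaviour of SIDEKIT's decorator:

# Hypothetical decorated method: all arguments are passed by name so the
# decorator can split the file list across processes.
server.extract_list(show_list=show_list,        # this list is shared amongst processes
                    feature_dir='./features/',  # scalar arguments are passed to every process
                    num_thread=8)               # number of parallel processes

# Risky: positional arguments bypass the decorator's argument inspection,
# so the whole list might be processed by every process instead of being shared.
# server.extract_list(show_list, './features/', 8)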

MPI

Since version 1.2, SIDEKIT offers an MPI implementation of the most computationally demanding methods:
  • estimation of a UBM-GMM via EM

  • estimation of a total variability model via EM

  • extraction of i-vectors

In Python, most of the MPI functionalities are accessible via the mpi4py module. MPI will launch processes on one or many nodes according to your command.

MPI makes it possible to use multiple nodes or a full cluster, for instance through a job scheduler such as SLURM or TORQUE.
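
For example, on a SLURM cluster, a minimal batch script could look like the sketch below; the job name and resource values are placeholders to adapt to your cluster:

#!/bin/bash
#SBATCH --job-name=sidekit_mpi   # name shown in the queue
#SBATCH --ntasks=10              # total number of MPI processes
mpirun python my_script.py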

The remainder of this page describes how to run a Python script on a given list of machines using the mpi4py module.

To see an example of code using MPI, refer to the Train an i-vector system using MPI tutorial.

Writing code for MPI

When writing code for MPI, you first need to get the MPI.COMM_WORLD communicator that manages the communication between processes.

from mpi4py import MPI
comm = MPI.COMM_WORLD  # communicator gathering all processes launched by mpirun

From this point onward, each process has a unique rank starting from 0. Code written for MPI directly specifies within the code which instruction runs on which process. Every line of code is executed by every process unless explicitly specified otherwise.

print("This is Process: {} over {}".format(comm.rank, comm.size))

if comm.rank == 0:
    print("I'm process 0")

Launched with 10 processes, the code above will display something like:

This is Process: 0 over 10
This is Process: 1 over 10
This is Process: 2 over 10
This is Process: 3 over 10
This is Process: 4 over 10
This is Process: 5 over 10
This is Process: 6 over 10
This is Process: 7 over 10
This is Process: 8 over 10
This is Process: 9 over 10
I'm process 0

As you can see, only process 0 executes the last instruction. In SIDEKIT, this conditional statement is mostly used to separate the master process (rank 0) from the others in a map/reduce approach, where accumulators are summed on the master or information is spread from the master to all other processes.
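
The following sketch (not an actual SIDEKIT function) illustrates the pattern with mpi4py: every process fills a local accumulator, the master sums them and updates the values, then broadcasts the result back to all processes:

import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD

# each process fills a local accumulator from its own share of the data
local_acc = numpy.full(5, float(comm.rank))

# REDUCE: sum the accumulators of all processes on the master (rank 0)
global_acc = numpy.zeros(5)
comm.Reduce(local_acc, global_acc, op=MPI.SUM, root=0)

if comm.rank == 0:
    global_acc /= comm.size  # update performed on the master only

# BROADCAST: spread the updated values from the master to all processes
comm.Bcast(global_acc, root=0)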

When calling SIDEKIT's MPI functions, the MPI.COMM_WORLD instance is created within the function and should not be created outside.

To launch 10 processes on a single node, run:

mpirun -np 10 ./my_script.py

Warning

Make sure your my_script.py file starts with the proper header, #!/usr/bin/env python, as MPI needs to know which interpreter to call to execute your script. The file must also be executable (chmod +x my_script.py).

To launch 10 processes on multiple nodes, run:

mpirun --hostfile my_server_list ./my_script.py

Where my_server_list is a text file that looks like:

192.168.0.81:4
192.168.0.156:1
192.168.0.153:1
192.168.0.152:2
192.168.0.154:2

Each line of the hostfile consists of the IP address of a node and the number of processes to run on that node, separated by a colon. In this example, the script will run on 5 nodes with a total of 10 processes.

Note

each process launched by MPI is not able to fork other processes on its node unless you explicitly allow it (refer to the MPI documentation for more information).

At this point in time, SIDEKIT does not mix multiprocessing and MPI.