.. _MPI:

Parallel computation in SIDEKIT
===============================

Multiprocessing
---------------

**SIDEKIT** makes extensive use of parallel computing to speed up the processing of massive quantities of data.
**SIDEKIT** methods rely on the ``multiprocessing`` module, which is part of the Python standard library and allows the use of multiple cores on a single machine.
All methods that offer this parallelisation take a ``num_thread`` parameter that defines the number of parallel processes to run.

Parallelisation with the ``multiprocessing`` module is done in two ways, depending on the nature of the computation:

- some methods use a ``multiprocessing.Pool`` of processes;
- other methods are parallelised via a decorator that allows the code to be written for a single process (more readable) and to be parallelised at run time.

Reading the decorator might be tedious, but the main rule when using it is: **explicitly name all arguments passed to the method**, as the decorator parallelises the code according to the named arguments.
Using unnamed arguments might disable the parallel processing or, even worse, duplicate the work (for instance, processing a given list of files on every process instead of sharing the list amongst processes).

MPI
---

Since version 1.2, **SIDEKIT** offers an MPI implementation of the most computationally demanding methods:

- estimation of a UBM-GMM via EM
- estimation of a total variability model via EM
- extraction of i-vectors

In Python, most of the MPI functionalities are accessible via the `mpi4py <https://mpi4py.readthedocs.io/>`_ module.
MPI launches processes on one or several nodes according to your command, which makes it possible to use multiple nodes or a full cluster, for instance through SLURM or TORQUE.

The remainder of this page describes how to run a Python script on a given list of machines using the ``mpi4py`` module.
To see an example of code using MPI, refer to the :ref:`mpi_tutorial`.

Writing code for MPI
********************

When writing code for MPI, you first need to create an instance of ``MPI.COMM_WORLD`` that manages the communication between nodes.

.. code:: python

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

From this point onward, each process has a unique *rank* starting from 0.
Code written for MPI specifies *within the code* which instructions run on which process.
Every line of code is executed in every process unless explicitly restricted.

.. code:: python

    print("This is Process: {} over {}".format(comm.rank, comm.size))

    if comm.rank == 0:
        print("I'm process 0")

When run with 10 processes, the code above will display::

    This is Process: 0 over 10
    This is Process: 1 over 10
    This is Process: 2 over 10
    This is Process: 3 over 10
    This is Process: 4 over 10
    This is Process: 5 over 10
    This is Process: 6 over 10
    This is Process: 7 over 10
    This is Process: 8 over 10
    This is Process: 9 over 10
    I'm process 0

As you can see, only process 0 executes the last instruction.
In **SIDEKIT**, this conditional statement is mostly used to separate the master node from the others in a MAP/REDUCE approach, where accumulators are summed on the master node and information is then spread from the master node to all other nodes (a sketch of this pattern is given below).

When calling **SIDEKIT** MPI functions, the ``MPI.COMM_WORLD`` instance is created within the function and should not be created outside.
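Below is a minimal sketch of this MAP/REDUCE pattern written directly with ``mpi4py``. It is not taken from the **SIDEKIT** code base: the accumulator is a dummy vector standing in for whatever statistics each process would actually compute.

.. code:: python

    # Minimal MAP/REDUCE sketch with mpi4py (not SIDEKIT code):
    # every process fills a local accumulator, the master node (rank 0)
    # sums them, then broadcasts the result back to all processes.
    import numpy
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # MAP: each process computes its own partial accumulator
    # (here a dummy vector, in place of real sufficient statistics)
    local_accumulator = numpy.full(5, float(comm.rank))

    # REDUCE: partial accumulators are summed on the master node
    global_accumulator = comm.reduce(local_accumulator, op=MPI.SUM, root=0)

    if comm.rank == 0:
        print("Summed accumulator: {}".format(global_accumulator))

    # The master node then spreads the result to every process
    global_accumulator = comm.bcast(global_accumulator, root=0)

Such a script is then launched with ``mpirun``, as described in the next sections.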
To launch 10 processes on a single node, run
*********************************************

.. code:: bash

    mpirun -np 10 ./my_script.py

.. warning:: Make sure your ``my_script.py`` file starts with the proper header: ``#!/usr/bin/env python``, as MPI needs to know which interpreter to call to execute your script.

To launch 10 processes on multiple nodes, run
**********************************************

.. code:: bash

    mpirun --hostfile my_server_list ./my_script.py

where ``my_server_list`` is a text file that looks like::

    192.168.0.81:4
    192.168.0.156:1
    192.168.0.153:1
    192.168.0.152:2
    192.168.0.154:2

Each line of the hostfile consists of the IP address of a node and the number of processes to run on that node, separated by a colon.
In this example, the script will run on 5 nodes with a total of 10 processes.

.. note:: Each process launched by MPI is not able to fork other processes on its node unless you explicitly allow it (refer to the MPI documentation for more information). At this point in time, **SIDEKIT** does not mix multiprocessing and MPI.
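Finally, as a purely illustrative example of a complete script that could be launched with either of the ``mpirun`` commands above, here is a hypothetical ``my_script.py`` that shares a list of files amongst the MPI processes; the file names and the work done on them are placeholders and do not correspond to any **SIDEKIT** API.

.. code:: python

    #!/usr/bin/env python
    # Hypothetical my_script.py: distributes a list of files over all
    # MPI processes so that each file is processed exactly once.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    # Placeholder list of files; in practice it would be read from disk
    all_files = ["file_{:02d}.h5".format(i) for i in range(100)]

    # Each process takes every comm.size-th file starting at its own rank,
    # so the list is shared amongst processes instead of duplicated
    my_files = all_files[comm.rank::comm.size]

    for file_name in my_files:
        # Placeholder for the real work done on each file
        print("Process {} handles {}".format(comm.rank, file_name))

The same rule as for the multiprocessing decorator applies here: the list of files is shared amongst processes rather than processed in full by each of them.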