Train an i-vector extractor

Total Variability (TV) models are trained with the EM algorithm using the FactorAnalyser class from SIDEKIT.

TV models are trained on sufficient statistics that are accumulated using a StatServer object (or a neural network). Training also requires a UBM of type Mixture.

SIDEKIT provides four implementations of the Total Variability EM estimation. Three are methods of the FactorAnalyser class and are listed below; the fourth is available in the sidekit_mpi module and requires the installation of the MPI library.

  1. total_variability_raw is provided for didactic purposes. The code follows the plain (raw) mathematical formulas without any optimization.

  2. total_variability_single is an optimized implementation of the EM algorithm that runs in a single process on a single machine.

  3. total_variability is the parallelised and optimised implementation. This method uses the multiprocessing module to parallelise computation on a single machine.

1. Get to know the algorithm with total_variability_raw

We strongly encourage you to READ the code of this method to understand how the EM algorithm works for the total variability model.

We strongly discourage you from USING this method, as it is not optimized at all.

For a usable version of the same method, refer to section 3 (or section 2) below.
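To give a feel for what such an unoptimized EM looks like, here is a minimal NumPy sketch of one common formulation of total variability training. It is not SIDEKIT's code: the function name tv_em_sketch, the array layout, and the assumption that the first-order statistics are already centred on the UBM means and whitened by the UBM covariances are all choices made for this illustration. The minimum divergence step shown here only rescales the matrix and leaves the mean untouched.

```python
import numpy as np


def tv_em_sketch(N, F, tv_rank, nb_iter=5, min_div=True, seed=0):
    """Didactic EM for a total variability matrix T (illustrative only).

    N: (S, C) zero-order statistics per session and Gaussian component.
    F: (S, C, D) first-order statistics, ASSUMED centred on the UBM means
       and whitened by the UBM covariances (so each covariance is I).
    Returns T with shape (C * D, tv_rank).
    """
    S, C, D = F.shape
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((C, D, tv_rank))  # random initialisation

    for _ in range(nb_iter):
        A = np.zeros((C, tv_rank, tv_rank))  # sum_s N_sc * E[w w^T]
        B = np.zeros((C, D, tv_rank))        # sum_s F_sc * E[w]^T
        mean_w = np.zeros(tv_rank)
        second_w = np.zeros((tv_rank, tv_rank))
        TtT = np.einsum("cdr,cdq->crq", T, T)  # T_c^T T_c per component

        # E-step: posterior of the latent vector w for every session
        for s in range(S):
            precision = np.eye(tv_rank) + np.einsum("c,crq->rq", N[s], TtT)
            cov = np.linalg.inv(precision)
            w = cov @ np.einsum("cdr,cd->r", T, F[s])
            ww = cov + np.outer(w, w)
            A += N[s][:, None, None] * ww
            B += F[s][:, :, None] * w
            mean_w += w
            second_w += ww

        # M-step: T_c = B_c A_c^{-1} (A_c is symmetric)
        for c in range(C):
            T[c] = np.linalg.solve(A[c], B[c].T).T

        # Minimum divergence: rescale T so the latent covariance is I
        if min_div:
            mean_w /= S
            cov_w = second_w / S - np.outer(mean_w, mean_w)
            T = T @ np.linalg.cholesky(cov_w)

    return T.reshape(C * D, tv_rank)
```

The per-session Python loop and the explicit matrix inverse are what make this kind of implementation slow; the optimized methods in the following sections exist to avoid that cost.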

2. Using a single process on one machine

This section trains a TV model on a single machine with a single process. Before running:

  • train a GMM-UBM of type Mixture

  • accumulate sufficient statistics using a StatServer object

You can then train the TV model by running:

fa = sidekit.FactorAnalyser()

fa.total_variability_single(stat_server_filename,
                            ubm,
                            tv_rank,
                            nb_iter=20,
                            min_div=True,
                            tv_init=None,
                            batch_size=300,
                            save_init=False,
                            output_file_name=None)
In this example:
  • stat_server_filename is a list of names of StatServer files containing the sufficient statistics of all sessions used to train the TV model

  • ubm is the Mixture object for which the sufficient statistics have been computed

  • tv_rank is an integer: the rank of the resulting Total Variability matrix (the size of the i-vectors)

  • nb_iter is the number of iterations to run for the EM algorithm

  • min_div is a boolean; if True, every iteration includes a minimum divergence re-estimation step

  • tv_init is a matrix used to initialize the training; if None, the matrix is initialized randomly

  • batch_size is the number of sessions that are processed at once, which reduces the memory footprint

  • save_init is a boolean; if True, the initial model is saved

  • output_file_name is the name of the file the model will be saved to
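The role of batch_size can be pictured as cutting the list of sessions into consecutive chunks so that only one chunk is held in memory at a time. The sketch below shows that idea in plain Python; iter_batches is a hypothetical helper written for this illustration and does not describe how SIDEKIT implements batching internally.

```python
def iter_batches(n_sessions, batch_size=300):
    """Yield (start, stop) index pairs that cover n_sessions in chunks.

    Hypothetical helper: each pair delimits one batch of at most
    batch_size sessions; the last batch may be smaller.
    """
    for start in range(0, n_sessions, batch_size):
        yield start, min(start + batch_size, n_sessions)


# 700 sessions with the default batch size give three batches.
batches = list(iter_batches(700, batch_size=300))
```

A smaller batch_size lowers peak memory use at the price of more iterations of the batch loop.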

3. Using multiple processes on one machine with Python multiprocessing

This section trains a TV model on a single machine with multiple processes. Before running:

  • train a GMM-UBM of type Mixture

  • accumulate sufficient statistics using a StatServer object

You can then train the TV model by running:

fa = sidekit.FactorAnalyser()

fa.total_variability(stat_server_filename,
                     ubm,
                     tv_rank,
                     nb_iter=20,
                     min_div=True,
                     tv_init=None,
                     batch_size=300,
                     save_init=False,
                     output_file_name=None,
                     num_thread=1)
In this example:
  • stat_server_filename is a list of names of StatServer files containing the sufficient statistics of all sessions used to train the TV model

  • ubm is the Mixture object for which the sufficient statistics have been computed

  • tv_rank is an integer: the rank of the resulting Total Variability matrix (the size of the i-vectors)

  • nb_iter is the number of iterations to run for the EM algorithm

  • min_div is a boolean; if True, every iteration includes a minimum divergence re-estimation step

  • tv_init is a matrix used to initialize the training; if None, the matrix is initialized randomly

  • batch_size is the number of sessions that are processed at once, which reduces the memory footprint

  • save_init is a boolean; if True, the initial model is saved

  • output_file_name is the name of the file the model will be saved to

  • num_thread is the number of processes to run on the machine

Warning

The batch_size parameter might cause trouble because of a limitation of the pickle module. Objects and data are exchanged between processes via pickling, which does not accept objects that are “too big”.
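One way to check whether a given batch_size is likely to be a problem is to measure how large a batch of statistics becomes once pickled. The sketch below does this with placeholder arrays; pickled_batch_size is a hypothetical helper, and the assumed layout (one row per session, zero-order statistics of width n_components and first-order statistics of width n_components * feat_dim) is an assumption of this example rather than a statement about what SIDEKIT actually sends between processes.

```python
import pickle

import numpy as np


def pickled_batch_size(batch_size, n_components, feat_dim, dtype=np.float64):
    """Return the size in bytes of one pickled batch of statistics.

    Hypothetical helper: builds zero-filled arrays with the assumed
    shapes and pickles them exactly as a tuple would be pickled.
    """
    stat0 = np.zeros((batch_size, n_components), dtype=dtype)
    stat1 = np.zeros((batch_size, n_components * feat_dim), dtype=dtype)
    payload = pickle.dumps((stat0, stat1), protocol=pickle.HIGHEST_PROTOCOL)
    return len(payload)


# Example: 300 sessions, 512 Gaussians, 60-dimensional features.
size_in_mib = pickled_batch_size(300, 512, 60) / 2**20
```

If the figure is uncomfortably large for your Python version and platform, lowering batch_size is the simplest remedy.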

Note that NumPy and SciPy are linked to the low-level BLAS library, which might also parallelise computation on multiple cores. Therefore, do not set the number of processes too high.

We recommend setting the number of parallel processes between 5 and 10, depending on your machine.
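A common way to keep BLAS threads and worker processes from competing for the same cores is to cap the BLAS thread count through environment variables before NumPy is first imported. The sketch below shows that pattern together with a hypothetical helper, choose_num_thread, that bounds the process count by the number of cores. Which environment variable is honoured depends on the BLAS build that NumPy is linked against, so treat the list as an assumption to verify on your machine.

```python
import os

# These must be set BEFORE NumPy/SciPy are imported for the first time.
# setdefault keeps any value already chosen by the user or the scheduler.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, "1")


def choose_num_thread(max_procs=10):
    """Return a process count capped by max_procs and the core count.

    Hypothetical helper: os.cpu_count() may return None, in which case
    a single process is used.
    """
    cores = os.cpu_count() or 1
    return max(1, min(max_procs, cores))


num_thread = choose_num_thread()
```

The resulting value could then be passed as the num_thread argument of total_variability.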

4. Using multiple processes on multiple nodes with MPI

See Parallel computation in SIDEKIT for details about MPI installation and use.

This section trains a TV model on multiple nodes with multiple processes. Before running:

  • train a GMM-UBM of type Mixture

  • accumulate sufficient statistics using a StatServer object

You can then train the TV model by running:

fa = sidekit.FactorAnalyser()

fa = sidekit.sidekit_mpi.total_variability(stat_server_filename,
                                           ubm,
                                           tv_rank=10,
                                           nb_iter=10,
                                           min_div=True,
                                           tv_init=fa_init.F,
                                           save_init=False,
                                           output_file_name="tv_mpi")
In this example:
  • stat_server_filename is a list of names of StatServer files containing the sufficient statistics of all sessions used to train the TV model

  • ubm is the Mixture object for which the sufficient statistics have been computed

  • tv_rank is an integer: the rank of the resulting Total Variability matrix (the size of the i-vectors)

  • nb_iter is the number of iterations to run for the EM algorithm

  • min_div is a boolean; if True, every iteration includes a minimum divergence re-estimation step

  • tv_init is a matrix used to initialize the training; if None, the matrix is initialized randomly (the call above passes fa_init.F, where fa_init is an object that is not defined in this example)

  • save_init is a boolean; if True, the initial model is saved

  • output_file_name is the name of the file the model will be saved to