FactorAnalyser

class factor_analyser.FactorAnalyser(input_file_name=None, mean=None, F=None, G=None, H=None, Sigma=None)[source]

A class to train factor analysers such as total variability models and Probabilistic Linear Discriminant Analysis (PLDA).

Attr mean

mean vector

Attr F

between-class matrix

Attr G

within-class matrix

Attr H

MAP covariance matrix (for Joint Factor Analysis only)

Attr Sigma

residual covariance matrix
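
As a rough guide to how these attributes fit together, one common reading is the standard PLDA generative model written below. This is a sketch under assumptions: the symbols \phi_{ij} (observation j of class i), y_i (class factor) and x_{ij} (within-class factor) are introduced here for illustration and are not defined by this page. H is left out because it applies to Joint Factor Analysis only.

```latex
\phi_{ij} = \text{mean} + F\, y_i + G\, x_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \Sigma)
```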

extract_ivectors(ubm, stat_server_filename, prefix='', batch_size=300, uncertainty=False, num_thread=1)[source]

Parallel extraction of i-vectors using the multiprocessing module.

Parameters
  • ubm – Mixture object (the UBM)

  • stat_server_filename – name of the file from which the input StatServer is read

  • prefix – prefix used to store the StatServer in its file

  • batch_size – number of sessions to process in a batch

  • uncertainty – a boolean, if True, return the diagonal of the uncertainty matrices

  • num_thread – number of processes to run in parallel

Returns

a StatServer with i-vectors in the stat1 attribute and a matrix of uncertainty matrices (optional)
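
A minimal call sketch, using only the signature documented above. The helper name extract_with_uncertainty and its workers argument are hypothetical, and the analyser argument is assumed to be an already trained FactorAnalyser (any object exposing the same method would work).

```python
def extract_with_uncertainty(analyser, ubm, stat_server_filename, workers=4):
    """Run the documented parallel extraction with uncertainty enabled.

    analyser is assumed to be a trained FactorAnalyser; this helper is
    hypothetical and only forwards the documented keyword arguments.
    """
    return analyser.extract_ivectors(
        ubm,
        stat_server_filename,
        prefix="",          # documented default
        batch_size=300,     # documented default: sessions per batch
        uncertainty=True,   # also request the uncertainty diagonals
        num_thread=workers, # number of parallel processes
    )
```

Passing every option by keyword keeps the call readable and guards against changes in argument order.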

extract_ivectors_single(ubm, stat_server, uncertainty=False)[source]

Estimate i-vectors for a given StatServer using a single process on a single node.

Parameters
  • stat_server – sufficient statistics stored in a StatServer

  • ubm – Mixture object (the UBM)

  • uncertainty – boolean, if True, also return a matrix holding the diagonals of the uncertainty matrices

Returns

a StatServer with i-vectors in the stat1 attribute and a matrix of uncertainty matrices (optional)
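
To make the i-vector and its uncertainty concrete, here is a NumPy sketch of the textbook posterior for one session. It is not taken from this class's source. The function name, argument names, and the component-major supervector layout are assumptions made for illustration, and the residual covariance is assumed diagonal.

```python
import numpy as np


def ivector_posterior(tv_matrix, sigma_diag, zero_order, first_order_centered):
    """Posterior mean and covariance diagonal of one session's i-vector.

    tv_matrix:            (C*D, R) total variability matrix
    sigma_diag:           (C*D,)   diagonal of the residual covariance
    zero_order:           (C,)     occupation count per UBM component
    first_order_centered: (C*D,)   first-order statistics centred on the UBM means
    """
    supervector_dim, rank = tv_matrix.shape
    feature_dim = supervector_dim // zero_order.shape[0]
    # Repeat each component count over its D feature dimensions
    # (assumes a component-major supervector layout).
    counts = np.repeat(zero_order, feature_dim)
    # Sigma^{-1} T, exploiting the diagonal covariance.
    weighted_tv = tv_matrix / sigma_diag[:, None]
    # Posterior precision: I + T^T Sigma^{-1} N T.
    precision = np.eye(rank) + tv_matrix.T @ (counts[:, None] * weighted_tv)
    covariance = np.linalg.inv(precision)
    # Posterior mean: covariance @ T^T Sigma^{-1} F.
    mean = covariance @ (weighted_tv.T @ first_order_centered)
    return mean, np.diag(covariance)
```

In this formulation the returned diagonal plays the role of the per-session uncertainty: with no observed frames the precision reduces to the identity, so the uncertainty falls back to the prior variance of one.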

plda(stat_server, rank_f, nb_iter=10, scaling_factor=1.0, output_file_name=None, save_partial=False, save_final=True)[source]

Train a simplified Probabilistic Linear Discriminant Analysis model (no within-class covariance matrix, but a full residual covariance matrix).

Parameters
  • stat_server – StatServer object with training statistics

  • rank_f – rank of the between class covariance matrix

  • nb_iter – number of iterations to run

  • scaling_factor – scaling factor to downscale statistics (value between 0 and 1)

  • output_file_name – name of the output file in which to store the PLDA model

  • save_partial – boolean, if True, save PLDA model after each iteration

  • save_final – boolean, if True, save the final PLDA model
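
The simplified model described above (no within-class term, full residual covariance) can be written as follows. The symbols \phi_{ij} (observation j of class i) and y_i (class factor) are notation assumed here for illustration; F would then have rank_f columns.

```latex
\phi_{ij} = \text{mean} + F\, y_i + \varepsilon_{ij},
\qquad y_i \sim \mathcal{N}(0, I),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \Sigma)
```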

static read(input_filename)[source]

Read a generic FactorAnalyser model from an HDF5 file.

Parameters

input_filename – the name of the file to read from

Returns

a FactorAnalyser object

total_variability(stat_server_filename, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, batch_size=300, save_init=False, output_file_name=None, num_thread=1)[source]

Train a total variability model using multiple processes on a single node. This is the recommended method for training a Total Variability matrix.

Optimization:

  • Only half of each symmetric matrix is stored.

  • Sessions are processed in batches to control the memory footprint.

  • Batches are processed by a pool of workers running in different processes.

  • The implementation is based on a multiple producers / single consumer approach.

Parameters
  • stat_server_filename – a list of StatServer file names to process

  • ubm – a Mixture object

  • tv_rank – rank of the total variability model

  • nb_iter – number of EM iterations

  • min_div – boolean, if True, apply minimum divergence re-estimation

  • tv_init – initial matrix to start the EM iterations with

  • batch_size – size of batch to load in memory for each worker

  • save_init – boolean, if True, save the initial matrix

  • output_file_name – name of the file in which to save the matrix

  • num_thread – number of processes to run in parallel
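
The multiple producers / single consumer approach mentioned above can be illustrated with a small standard-library sketch. To stay short and portable it uses threads and queue.Queue in place of worker processes, and a sum of squares stands in for the per-batch statistics. The function names and the toy workload are assumptions, not this class's code.

```python
import queue
import threading


def producer(jobs, results):
    """Worker: turn each batch into a partial result until told to stop."""
    while True:
        batch = jobs.get()
        if batch is None:          # sentinel: no more batches
            results.put(None)      # tell the consumer this worker is done
            return
        results.put(sum(x * x for x in batch))


def accumulate(batches, num_workers=2):
    """Single consumer: sum the partial results produced by all workers."""
    jobs, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=producer, args=(jobs, results))
               for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for batch in batches:
        jobs.put(batch)
    for _ in workers:              # one sentinel per worker
        jobs.put(None)
    total, finished = 0, 0
    while finished < num_workers:
        item = results.get()
        if item is None:
            finished += 1
        else:
            total += item
    for worker in workers:
        worker.join()
    return total
```

Keeping a single consumer means the shared accumulator is only ever updated from one place, so no locking is needed around it.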

total_variability_raw(stat_server, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, save_init=False, output_file_name=None)[source]

Train a total variability model using a single process on a single node. This method is provided for didactic purposes only and should not be used in practice, as it uses too much memory and is too slow. To train with a single process, use total_variability_single instead.

Parameters
  • stat_server – the StatServer containing data to train the model

  • ubm – a Mixture object

  • tv_rank – rank of the total variability model

  • nb_iter – number of EM iterations

  • min_div – boolean, if True, apply minimum divergence re-estimation

  • tv_init – initial matrix to start the EM iterations with

  • save_init – boolean, if True, save the initial matrix

  • output_file_name – name of the file in which to save the matrix

total_variability_single(stat_server_filename, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, batch_size=300, save_init=False, output_file_name=None)[source]

Train a total variability model using a single process on a single node. Use this method to run a single process on a single node with optimized code.

Optimization:

  • Only half of each symmetric matrix is stored.

  • Sessions are processed in batches to control the memory footprint.

Parameters
  • stat_server_filename – the name of the file for StatServer, containing data to train the model

  • ubm – a Mixture object

  • tv_rank – rank of the total variability model

  • nb_iter – number of EM iterations

  • min_div – boolean, if True, apply minimum divergence re-estimation

  • tv_init – initial matrix to start the EM iterations with

  • batch_size – number of sessions to process at once to reduce memory footprint

  • save_init – boolean, if True, save the initial matrix

  • output_file_name – name of the file in which to save the matrix
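
The "only half of symmetric matrices" optimization noted above can be sketched with NumPy as follows. The helper names pack_upper and unpack_upper are hypothetical; the sketch only illustrates the idea of keeping the upper triangle and does not claim to mirror the storage layout used internally.

```python
import numpy as np


def pack_upper(sym):
    """Keep only the upper triangle (diagonal included) of a symmetric matrix."""
    rows, cols = np.triu_indices(sym.shape[0])
    return sym[rows, cols]


def unpack_upper(packed, size):
    """Rebuild the full symmetric matrix from its packed upper triangle."""
    full = np.zeros((size, size), dtype=packed.dtype)
    rows, cols = np.triu_indices(size)
    full[rows, cols] = packed
    full[cols, rows] = packed  # mirror the values below the diagonal
    return full
```

For an n x n matrix this stores n(n+1)/2 values instead of n squared, which is close to a factor-of-two saving for the matrices accumulated per session.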