FactorAnalyser

class factor_analyser.FactorAnalyser(input_file_name=None, mean=None, F=None, G=None, H=None, Sigma=None)

A class to train factor analysers such as total variability models and Probabilistic Linear Discriminant Analysis (PLDA) models.

Attr mean: mean vector
Attr F: between class matrix
Attr G: within class matrix
Attr H: MAP covariance matrix (for Joint Factor Analysis only)
Attr Sigma: residual covariance matrix

extract_ivectors(ubm, stat_server_filename, prefix='', batch_size=300, uncertainty=False, num_thread=1)

Extract i-vectors in parallel using the multiprocessing module.

Parameters

ubm – Mixture object (the UBM)
stat_server_filename – name of the file from which the input StatServer is read
prefix – prefix used to store the StatServer in its file
batch_size – number of sessions to process in a batch
uncertainty – a boolean, if True, return the diagonal of the uncertainty matrices
num_thread – number of processes to run in parallel

Returns

a StatServer with i-vectors in the stat1 attribute and, if uncertainty is True, a matrix holding the diagonals of the uncertainty matrices
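
To illustrate what batch_size controls, the sketch below splits a number of sessions into consecutive index ranges of at most batch_size sessions each. The helper name batch_bounds is hypothetical and is not part of the library; it only shows the idea of batching, not how extract_ivectors is actually implemented.

```python
def batch_bounds(num_sessions, batch_size=300):
    """Return (start, stop) index pairs covering num_sessions in order.

    Hypothetical helper for illustration only: every batch holds at most
    batch_size sessions, and the last one may be shorter.
    """
    return [(start, min(start + batch_size, num_sessions))
            for start in range(0, num_sessions, batch_size)]


# 700 sessions with the default batch size give two full batches and one partial batch.
print(batch_bounds(700))
```

A smaller batch_size lowers the peak memory used per batch at the cost of more batches to schedule.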

extract_ivectors_single(ubm, stat_server, uncertainty=False)

Estimate i-vectors for a given StatServer using a single process on a single node.

Parameters

stat_server – sufficient statistics stored in a StatServer
ubm – Mixture object (the UBM)
uncertainty – boolean, if True, return an additional matrix with uncertainty matrices (diagonal of the matrices)

Returns

a StatServer with i-vectors in the stat1 attribute and, if uncertainty is True, a matrix holding the diagonals of the uncertainty matrices
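
As a rough illustration of what the i-vector and its uncertainty represent, the sketch below computes the posterior mean and variance for the simplest possible case: a total variability matrix of rank 1 and diagonal residual covariances. It follows the standard i-vector posterior from the literature, not the library's code; the function name rank1_ivector and all argument names are assumptions made for this example.

```python
def rank1_ivector(zero_order, first_order_centered, t_matrix, sigma_diag):
    """Posterior mean and variance of a rank-1 i-vector (illustrative only).

    zero_order[c]              : occupation count of mixture component c
    first_order_centered[c][d] : first-order statistic centred on the UBM mean
    t_matrix[c][d]             : the single column of T, split per component
    sigma_diag[c][d]           : diagonal residual variance of component c
    """
    precision = 1.0   # precision of the standard normal prior on the i-vector
    projection = 0.0  # T' * inv(Sigma) * centred first-order statistics
    for n_c, f_c, t_c, s_c in zip(zero_order, first_order_centered,
                                  t_matrix, sigma_diag):
        precision += n_c * sum(t * t / s for t, s in zip(t_c, s_c))
        projection += sum(t * f / s for t, f, s in zip(t_c, f_c, s_c))
    variance = 1.0 / precision  # posterior uncertainty (a scalar for rank 1)
    return variance * projection, variance


mean, variance = rank1_ivector(
    zero_order=[2.0, 1.0],
    first_order_centered=[[3.0, 0.0], [0.0, 2.0]],
    t_matrix=[[1.0, 0.0], [0.0, 2.0]],
    sigma_diag=[[1.0, 1.0], [1.0, 4.0]],
)
print(mean, variance)
```

With no observed frames the posterior falls back to the prior (mean 0, variance 1); more frames shrink the variance, which is the quantity reported when uncertainty is True.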

plda(stat_server, rank_f, nb_iter=10, scaling_factor=1.0, output_file_name=None, save_partial=False, save_final=True)

Train a simplified Probabilistic Linear Discriminant Analysis model (no within-class covariance matrix, but a full residual covariance matrix).

Parameters

stat_server – StatServer object with training statistics
rank_f – rank of the between class covariance matrix
nb_iter – number of iterations to run
scaling_factor – scaling factor to downscale statistics (value between 0 and 1)
output_file_name – name of the output file in which to store the PLDA model
save_partial – boolean, if True, save the PLDA model after each iteration
save_final – boolean, if True, save the final PLDA model
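
For reference, the simplified PLDA model described above is usually written as follows in the literature. The notation is an assumption made here and is not taken from the library: \phi_{ij} is the j-th vector of class i, m corresponds to the mean attribute, F (of rank rank_f) to the between-class matrix, and \Sigma to the full residual covariance matrix. No G term appears because the within-class matrix is dropped.

```latex
\phi_{ij} = m + F\,y_i + \varepsilon_{ij},
\qquad y_i \sim \mathcal{N}(0, I),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \Sigma)
```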

static read(input_filename)

Read a generic FactorAnalyser model from an HDF5 file.

Parameters: input_filename – the name of the file to read from
Returns: a FactorAnalyser object

total_variability(stat_server_filename, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, batch_size=300, save_init=False, output_file_name=None, num_thread=1)

Train a total variability model using multiple processes on a single node. This is the recommended method for training a Total Variability matrix.

Optimization:

Only half of each symmetric matrix is stored.
Sessions are processed in batches to control the memory footprint.
Batches are processed by a pool of workers running in different processes.
The implementation is based on a multiple-producer / single-consumer approach.

Parameters

stat_server_filename – a list of StatServer file names to process
ubm – a Mixture object
tv_rank – rank of the total variability model
nb_iter – number of EM iterations
min_div – boolean, if True, apply minimum divergence re-estimation
tv_init – initial matrix to start the EM iterations with
batch_size – size of batch to load in memory for each worker
save_init – boolean, if True, save the initial matrix
output_file_name – name of the file where to save the matrix
num_thread – number of processes to run in parallel
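
The multiple-producer / single-consumer pattern mentioned in the description can be sketched as follows. This is only an illustration of the pattern under simplifying assumptions: it uses threads and standard-library queues instead of worker processes, and it sums plain numbers instead of accumulating EM statistics. The function name accumulate_in_batches is hypothetical.

```python
import queue
import threading


def accumulate_in_batches(session_values, batch_size, num_workers):
    """Sum session values batch by batch with several producers and one consumer."""
    batches = [session_values[i:i + batch_size]
               for i in range(0, len(session_values), batch_size)]
    tasks = queue.Queue()
    results = queue.Queue()
    for batch in batches:
        tasks.put(batch)

    def producer():
        # Each worker pulls batches until none are left and emits
        # one partial result per batch.
        while True:
            try:
                batch = tasks.get_nowait()
            except queue.Empty:
                return
            results.put(sum(batch))

    workers = [threading.Thread(target=producer) for _ in range(num_workers)]
    for worker in workers:
        worker.start()

    # Single consumer: the only place where the global accumulator is updated,
    # so no locking is needed around it.
    total = 0
    for _ in range(len(batches)):
        total += results.get()
    for worker in workers:
        worker.join()
    return total


print(accumulate_in_batches(list(range(1000)), batch_size=300, num_workers=4))
```

Keeping a single consumer means only one copy of the accumulators is ever modified, while each worker holds no more than one batch in memory at a time.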

total_variability_raw(stat_server, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, save_init=False, output_file_name=None)

Train a total variability model using a single process on a single node. This method is provided for didactic purposes only and should not be used in practice, as it uses too much memory and is too slow. To train with a single process, use total_variability_single instead.

Parameters

stat_server – the StatServer containing data to train the model
ubm – a Mixture object
tv_rank – rank of the total variability model
nb_iter – number of EM iterations
min_div – boolean, if True, apply minimum divergence re-estimation
tv_init – initial matrix to start the EM iterations with
save_init – boolean, if True, save the initial matrix
output_file_name – name of the file where to save the matrix

total_variability_single(stat_server_filename, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, batch_size=300, save_init=False, output_file_name=None)

Train a total variability model using a single process on a single node. Use this method to run a single process on a single node with optimized code.

Optimization:

Only half of each symmetric matrix is stored.
Sessions are processed in batches to control the memory footprint.

Parameters

stat_server_filename – the name of the file for StatServer, containing data to train the model
ubm – a Mixture object
tv_rank – rank of the total variability model
nb_iter – number of EM iterations
min_div – boolean, if True, apply minimum divergence re-estimation
tv_init – initial matrix to start the EM iterations with
batch_size – number of sessions to process at once to reduce memory footprint
save_init – boolean, if True, save the initial matrix
output_file_name – name of the file where to save the matrix
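
Storing only half of a symmetric matrix can be sketched as below: the upper triangle (diagonal included) is flattened row by row, which needs n(n+1)/2 values instead of n². The helper names pack_upper and unpack_upper are hypothetical, and the storage layout actually used by the library may differ.

```python
def pack_upper(matrix):
    """Flatten the upper triangle (diagonal included) of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def unpack_upper(packed, n):
    """Rebuild the full symmetric n x n matrix from its packed upper triangle."""
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            # Mirror each stored value across the diagonal.
            full[i][j] = full[j][i] = packed[k]
            k += 1
    return full


symmetric = [[1, 2, 3],
             [2, 4, 5],
             [3, 5, 6]]
packed = pack_upper(symmetric)
print(packed)
print(unpack_upper(packed, 3) == symmetric)
```

For a matrix of size tv_rank x tv_rank kept per mixture component, this roughly halves the memory needed for those matrices.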
