FactorAnalyser
- class factor_analyser.FactorAnalyser(input_file_name=None, mean=None, F=None, G=None, H=None, Sigma=None)
A class to train factor analysers such as total variability models and Probabilistic Linear Discriminant Analysis (PLDA).
- Attr mean
mean vector
- Attr F
between class matrix
- Attr G
within class matrix
- Attr H
MAP covariance matrix (for Joint Factor Analysis only)
- Attr Sigma
residual covariance matrix
- extract_ivectors(ubm, stat_server_filename, prefix='', batch_size=300, uncertainty=False, num_thread=1)
Parallel extraction of i-vectors using the multiprocessing module.
- Parameters
ubm – Mixture object (the UBM)
stat_server_filename – name of the file from which the input StatServer is read
prefix – prefix used to store the StatServer in its file
batch_size – number of sessions to process in a batch
uncertainty – a boolean, if True, return the diagonal of the uncertainty matrices
num_thread – number of processes to run in parallel
- Returns
a StatServer with i-vectors in the stat1 attribute and a matrix of uncertainty matrices (optional)
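As a rough illustration of how batch_size bounds the amount of work handled at once, the sketch below splits a number of sessions into consecutive batches. The helper name iter_batches is hypothetical and is not part of the library; how the library actually slices its StatServer is not described here.

```python
def iter_batches(n_sessions, batch_size=300):
    """Yield (start, stop) index pairs that cover range(n_sessions) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_sessions, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_sessions)


# 1000 sessions with the default batch size give three full batches and one short one.
batches = list(iter_batches(1000, batch_size=300))
```

A smaller batch_size lowers peak memory per process at the cost of more batches to schedule.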
- extract_ivectors_single(ubm, stat_server, uncertainty=False)
Estimate i-vectors for a given StatServer using a single process on a single node.
- Parameters
stat_server – sufficient statistics stored in a StatServer
ubm – Mixture object (the UBM)
uncertainty – boolean, if True, return an additional matrix with uncertainty matrices (diagonal of the matrices)
- Returns
a StatServer with i-vectors in the stat1 attribute and a matrix of uncertainty matrices (optional)
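For orientation, the standard i-vector point estimate can be written as below. This is the textbook formulation, given here as an assumption rather than a description of this implementation. The symbols are not defined on this page: T is the total variability matrix, Σ the UBM covariance, N_s the zero-order statistics of session s arranged as a block-diagonal matrix, and f̃_s the centered first-order statistics. Under this formulation the optional uncertainty output would correspond to the diagonal of the posterior covariance.

```latex
\begin{aligned}
L_s &= I + T^{\top} \Sigma^{-1} N_s T \\
w_s &= L_s^{-1} T^{\top} \Sigma^{-1} \tilde{f}_s \\
\operatorname{Cov}(w_s) &= L_s^{-1}
\end{aligned}
```

Any normalization of the statistics applied before extraction is not covered by this sketch.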
- plda(stat_server, rank_f, nb_iter=10, scaling_factor=1.0, output_file_name=None, save_partial=False, save_final=True)
Train a simplified Probabilistic Linear Discriminant Analysis model (no within-class covariance matrix, but a full residual covariance matrix).
- Parameters
stat_server – StatServer object with training statistics
rank_f – rank of the between class covariance matrix
nb_iter – number of iterations to run
scaling_factor – scaling factor to downscale statistics (value between 0 and 1)
output_file_name – name of the file in which to store the PLDA model
save_partial – boolean, if True, save PLDA model after each iteration
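The simplified model described above is commonly written as follows. The notation is an assumption, mapped onto the class attributes mean (μ), F and Sigma (Σ): φ_ij is the j-th vector of class i, h_i is the latent class factor, and F has rank_f columns. There is no G term because the within-class matrix is omitted, and Σ is a full matrix.

```latex
\phi_{ij} = \mu + F h_i + \varepsilon_{ij},
\qquad h_i \sim \mathcal{N}(0, I),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \Sigma)
```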
- static read(input_filename)
Read a generic FactorAnalyser model from an HDF5 file.
- Parameters
input_filename – the name of the file to read from
- Returns
a FactorAnalyser object
- total_variability(stat_server_filename, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, batch_size=300, save_init=False, output_file_name=None, num_thread=1)
Train a total variability model using multiple processes on a single node. This is the recommended method for training a total variability matrix.
- Optimization:
Only half of each symmetric matrix is stored.
Sessions are processed in batches to control the memory footprint.
Batches are processed by a pool of workers running in separate processes.
The implementation is based on a multiple producers / single consumer approach.
- Parameters
stat_server_filename – a list of StatServer file names to process
ubm – a Mixture object
tv_rank – rank of the total variability model
nb_iter – number of EM iterations
min_div – boolean, if True, apply minimum divergence re-estimation
tv_init – initial matrix to start the EM iterations with
batch_size – size of batch to load in memory for each worker
save_init – boolean, if True, save the initial matrix
output_file_name – name of the file where to save the matrix
num_thread – number of processes to run in parallel
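The multiple producers / single consumer approach mentioned in the optimization notes can be sketched as follows. This is a minimal illustration only: it uses threads and the standard queue module for portability, whereas the description above refers to separate processes, and the function name run_pool and the per-batch sum are hypothetical stand-ins for the real per-batch statistics.

```python
import queue
import threading


def run_pool(batches, num_workers=2):
    """Accumulate per-batch results using several producers and one consumer."""
    tasks = queue.Queue()
    results = queue.Queue()
    for batch in batches:
        tasks.put(batch)

    def producer():
        # Each worker pulls batches until the task queue is empty.
        while True:
            try:
                batch = tasks.get_nowait()
            except queue.Empty:
                break
            results.put(sum(batch))  # stand-in for per-batch statistics
        results.put(None)  # sentinel: this producer has finished

    workers = [threading.Thread(target=producer) for _ in range(num_workers)]
    for worker in workers:
        worker.start()

    # Single consumer: the only place where the accumulator is updated,
    # so no lock is needed around it.
    total, finished = 0, 0
    while finished < num_workers:
        item = results.get()
        if item is None:
            finished += 1
        else:
            total += item

    for worker in workers:
        worker.join()
    return total


total = run_pool([[1, 2, 3], [4, 5], [6]], num_workers=2)
```

Keeping a single consumer means the shared accumulator is written from one place only, which avoids locking around the large matrices being accumulated.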
- total_variability_raw(stat_server, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, save_init=False, output_file_name=None)
Train a total variability model using a single process on a single node. This method is provided for didactic purposes and should not be used in practice, as it uses too much memory and is too slow. To train with a single process, use total_variability_single instead.
- Parameters
stat_server – the StatServer containing data to train the model
ubm – a Mixture object
tv_rank – rank of the total variability model
nb_iter – number of EM iterations
min_div – boolean, if True, apply minimum divergence re-estimation
tv_init – initial matrix to start the EM iterations with
save_init – boolean, if True, save the initial matrix
output_file_name – name of the file where to save the matrix
- total_variability_single(stat_server_filename, ubm, tv_rank, nb_iter=20, min_div=True, tv_init=None, batch_size=300, save_init=False, output_file_name=None)
Train a total variability model using a single process on a single node. Use this method to run a single process on a single node with optimized code.
- Optimization:
Only half of each symmetric matrix is stored.
Sessions are processed in batches to control the memory footprint.
- Parameters
stat_server_filename – the name of the file for StatServer, containing data to train the model
ubm – a Mixture object
tv_rank – rank of the total variability model
nb_iter – number of EM iterations
min_div – boolean, if True, apply minimum divergence re-estimation
tv_init – initial matrix to start the EM iterations with
batch_size – number of sessions to process at once to reduce memory footprint
save_init – boolean, if True, save the initial matrix
output_file_name – name of the file where to save the matrix
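Storing only half of a symmetric matrix, as noted in the optimization notes, can be illustrated with a packed upper-triangle layout. The sketch below is a plain-Python assumption about what such storage might look like; the helper names pack_upper and unpack_upper are hypothetical, and the layout actually used by the library is not documented here.

```python
def pack_upper(matrix):
    """Return the upper triangle (diagonal included) of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def unpack_upper(packed, n):
    """Rebuild the full symmetric n x n matrix from its packed upper triangle."""
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            # Mirror each stored value across the diagonal.
            full[i][j] = full[j][i] = packed[k]
            k += 1
    return full


sym = [[2.0, 0.5, 0.1],
       [0.5, 3.0, 0.2],
       [0.1, 0.2, 4.0]]
packed = pack_upper(sym)            # n * (n + 1) / 2 values instead of n * n
restored = unpack_upper(packed, 3)
```

For an n x n matrix this keeps n(n + 1)/2 values, which approaches half of the full storage as n grows.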