Extract your I-Vectors

Once you have trained a Universal Background Model (GMM or DNN) and a Total Variability matrix, you are ready to extract i-vectors.

Starting from version 1.2 of SIDEKIT, the extraction process is managed with a FactorAnalyser.

This tutorial assumes that you have already created:

a Mixture, ubm, to use as a UBM

a FeaturesServer, features_server, to load acoustic features

a FactorAnalyser

and that you have computed sufficient statistics on one or more sets of segments, stored in one or more StatServer objects (stat_server below).

1. Extract i-vectors in a single process

The following code extracts i-vectors for the set of segments whose statistics are in stat_server.

fa = sidekit.FactorAnalyser()

iv, iv_uncertainty = fa.extract_ivectors_single(ubm,
                                                stat_server,
                                                uncertainty=True)

Where:

ubm is a Mixture
stat_server is an object of type StatServer
uncertainty is a boolean; if True, the method also returns a matrix in which each row is the diagonal of the uncertainty matrix of the corresponding i-vector.

Note

iv is a StatServer that contains i-vectors in stat1 and ones in stat0.
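
To make the two returned quantities concrete, here is a minimal NumPy sketch of the standard i-vector posterior for a single session. It is not SIDEKIT's implementation: the function name ivector_posterior and all variable names are hypothetical, the UBM is assumed to have diagonal covariances, and the "uncertainty matrix" is assumed to be the posterior covariance of the i-vector.

```python
import numpy as np

def ivector_posterior(n, f, means, variances, T):
    """Posterior mean (i-vector) and diagonal of the posterior covariance.

    n:         (C,)     zero-order statistics (occupation counts)
    f:         (C, D)   first-order statistics
    means:     (C, D)   UBM means
    variances: (C, D)   UBM diagonal covariances (assumption)
    T:         (C*D, R) total variability matrix
    """
    C, D = means.shape
    R = T.shape[1]
    # Centre the first-order statistics around the UBM means
    f_centered = (f - n[:, None] * means).reshape(C * D)
    inv_var = (1.0 / variances).reshape(C * D)
    # Repeat each component count once per feature dimension
    n_expanded = np.repeat(n, D)
    # Posterior precision: I + T' N Sigma^-1 T
    precision = np.eye(R) + T.T @ (T * (n_expanded * inv_var)[:, None])
    cov = np.linalg.inv(precision)
    # Posterior mean: cov . T' Sigma^-1 f_centered
    ivector = cov @ (T.T @ (inv_var * f_centered))
    return ivector, np.diag(cov)

# Toy dimensions and random statistics, for illustration only
rng = np.random.default_rng(0)
C, D, R = 4, 3, 2
means = rng.normal(size=(C, D))
variances = rng.uniform(0.5, 2.0, size=(C, D))
T = rng.normal(scale=0.1, size=(C * D, R))
n = rng.uniform(1.0, 50.0, size=C)
f = n[:, None] * means + rng.normal(size=(C, D))

iv, iv_unc = ivector_posterior(n, f, means, variances, T)
print(iv.shape, iv_unc.shape)
```

In this sketch, a session with no observed frames (all statistics equal to zero) yields a zero i-vector and an uncertainty diagonal of ones, that is, the prior.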

2. Extract i-vectors with multiple processes on a single node

The following code extracts i-vectors for the set of segments whose statistics are in stat_server, using multiple processes on a single machine.

Because of limitations of the multiprocessing module (related to the pickling of objects), we advise keeping batch_size to a few hundred sessions.

fa = sidekit.FactorAnalyser()

iv, iv_uncertainty = fa.extract_ivectors(ubm,
                                         stat_server_filename,
                                         prefix='',
                                         batch_size=300,
                                         uncertainty=False,
                                         num_thread=1)

Where:

ubm is a Mixture
stat_server_filename is the name of an HDF5 file containing a StatServer
prefix is the prefix of the statistics dataset within the HDF5 file
batch_size is the number of sessions processed at once by each process
uncertainty is a boolean; if True, the method also returns a matrix in which each row is the diagonal of the uncertainty matrix of the corresponding i-vector.
num_thread is the number of processes to run in parallel
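
As an illustration of what batch_size controls, the sketch below splits a list of sessions into consecutive index ranges of at most batch_size sessions. The helper make_batches is hypothetical and is not part of SIDEKIT; it only shows the kind of partitioning that keeps each pickled chunk small.

```python
def make_batches(n_sessions, batch_size=300):
    """Return (start, stop) index pairs covering n_sessions in chunks of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [(start, min(start + batch_size, n_sessions))
            for start in range(0, n_sessions, batch_size)]

# 700 sessions with batch_size=300 give two full batches and one partial batch
batches = make_batches(700, batch_size=300)
print(batches)
```

With a few hundred sessions per batch, each worker receives a modest amount of data per task, which is the trade-off the advice above is aiming for.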

3. Extract i-vectors on multiple nodes

SIDEKIT also provides a function to extract i-vectors on several nodes (machines), which is especially appropriate for large models (> 4000 distributions).

Refer to the Parallel computation in SIDEKIT page to see how to launch your computation on several nodes.

The code to execute should look like this:

fa = sidekit.FactorAnalyser()

sidekit.sidekit_mpi.extract_ivector(stat_server_file_name,
                                    ubm,
                                    output_file_name,
                                    uncertainty=False,
                                    prefix='')

Where:

stat_server_file_name is the name of the file holding a StatServer with the sufficient statistics used to generate i-vectors
ubm is a Mixture object
output_file_name is the name of the HDF5 file where i-vectors will be stored
uncertainty is a boolean; if True, the function also computes a matrix in which each row is the diagonal of the uncertainty matrix of the corresponding i-vector. This matrix is stored on disk in an HDF5 file.
prefix is the prefix of the sufficient statistics in the HDF5 file
