Extract your I-Vectors
Once you have trained a Universal Background Model (GMM or DNN) and a Total Variability matrix, you are ready to extract i-vectors.
Starting from version 1.2 of SIDEKIT, the extraction process is managed with a FactorAnalyser.
Considering that you have already created:
a Mixture, ubm, to use as a UBM
a FeaturesServer, features_server, to load acoustic features
and that you have computed sufficient statistics on one or more sets of segments, with those statistics stored in one or more StatServer objects, stat_server.
1. Extract i-vectors in a single process
The following code will extract i-vectors for the set of segments whose statistics are in stat_server.
fa = sidekit.FactorAnalyser()
iv, iv_uncertainty = fa.extract_ivectors_single(ubm,
                                                stat_server,
                                                uncertainty=True)
- Where:
ubm is a Mixture
stat_server is an object of type StatServer
uncertainty is a boolean; if True, the method also returns a matrix in which each row is the diagonal of the uncertainty matrix of the corresponding i-vector.
Note
iv is a StatServer that contains i-vectors in stat1 and ones in stat0.
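To make the two outputs more concrete, the sketch below computes an i-vector and its uncertainty for a single session with NumPy, using the standard total variability formulation with diagonal UBM covariances. It is an illustration only, not SIDEKIT's implementation: the function name ivector_posterior, its arguments, and the array layout are assumptions made for this example.

```python
import numpy as np


def ivector_posterior(n, f, means, variances, T):
    """Posterior mean and diagonal posterior covariance for one session.

    Hypothetical helper, not part of SIDEKIT.
    n:         (C,)      zero-order statistics, one per Gaussian
    f:         (C, D)    first-order statistics, one row per Gaussian
    means:     (C, D)    UBM means
    variances: (C, D)    UBM diagonal covariances
    T:         (C*D, R)  total variability matrix, R = i-vector dimension
    """
    C, D = f.shape
    R = T.shape[1]
    # Centre the first-order statistics on the UBM means.
    f_centred = (f - n[:, None] * means).reshape(C * D)
    inv_var = (1.0 / variances).reshape(C * D)
    # Each Gaussian's occupation count applies to its D supervector rows.
    n_expanded = np.repeat(n, D)
    # Posterior precision: I + T' Sigma^-1 N T
    precision = np.eye(R) + T.T @ (T * (n_expanded * inv_var)[:, None])
    covariance = np.linalg.inv(precision)
    # Posterior mean: precision^-1 T' Sigma^-1 f_centred
    ivector = covariance @ (T.T @ (inv_var * f_centred))
    return ivector, np.diag(covariance)
```

In this sketch the first return value plays the role of one row of stat1 in iv, and the second plays the role of one row of iv_uncertainty. With no observed frames (all statistics at zero) the result falls back to the prior: a zero i-vector with unit uncertainty.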
2. Extract i-vectors using multiple processes on a single node
The following code will extract i-vectors for the set of segments whose statistics are stored in the file stat_server_filename, using multiple processes on a single machine.
Due to the limitations of the multiprocessing module (related to the pickling of objects), we advise keeping batch_size to a few hundred sessions.
fa = sidekit.FactorAnalyser()
iv, iv_uncertainty = fa.extract_ivectors(ubm,
                                         stat_server_filename,
                                         prefix='',
                                         batch_size=300,
                                         uncertainty=False,
                                         num_thread=1)
- Where:
ubm is a Mixture
stat_server_filename is the name of an HDF5 file containing a StatServer
prefix is the prefix of the statistics data set within the HDF5 file
batch_size is the number of sessions handled by each process at a time
uncertainty is a boolean; if True, the method also returns a matrix in which each row is the diagonal of the uncertainty matrix of the corresponding i-vector.
num_thread is the number of processes to run in parallel
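As a rough illustration of how batch_size relates to the number of sessions, the following sketch splits session indices into consecutive batches. The helper make_batches is hypothetical and not part of SIDEKIT; it only shows the bookkeeping, under the assumption that each batch is handed to one worker process.

```python
def make_batches(session_count, batch_size=300):
    """Return (start, stop) index pairs that cover range(session_count).

    Hypothetical helper for illustration; the last batch may be shorter.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [(start, min(start + batch_size, session_count))
            for start in range(0, session_count, batch_size)]


# 1000 sessions with the default batch size give three full batches
# and one shorter batch.
batches = make_batches(1000, batch_size=300)
print(batches)  # [(0, 300), (300, 600), (600, 900), (900, 1000)]
```

Smaller batches mean smaller objects to pickle between processes, at the cost of more dispatching overhead.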
3. Extract i-vectors on multiple nodes
SIDEKIT also provides a function to extract i-vectors on several nodes (machines), which is especially appropriate for large models (more than 4000 distributions).
Refer to the Parallel computation in SIDEKIT page to see how to launch your computation on several nodes.
The code to execute should look like this:
fa = sidekit.FactorAnalyser()
sidekit.sidekit_mpi.extract_ivector(stat_server_file_name,
                                    ubm,
                                    output_file_name,
                                    uncertainty=False,
                                    prefix='')
- Where:
stat_server_file_name is the name of a file containing a StatServer with the sufficient statistics that will be used to generate i-vectors
ubm is a Mixture object
output_file_name is the name of the HDF5 file where i-vectors will be stored
uncertainty is a boolean; if True, the method also produces a matrix in which each row is the diagonal of the uncertainty matrix of the corresponding i-vector. This matrix is stored on disk in an HDF5 file.
prefix is the prefix of the sufficient statistics in the HDF5 file