Run an `i`-vector system
========================

This script runs an experiment on the male NIST Speaker Recognition
Evaluation 2010 extended core task. For more details about the protocol,
refer to the `NIST`_ website.

.. The complete Python script can be downloaded :download:`here `

In order to get this script running on your machine, you will need to
modify a limited number of options to indicate where your features are
located and how many threads you want to run in parallel.

Getting ready
-------------

Load the required modules before going further:

.. code-block:: python

    import copy

    import numpy as np

    import sidekit

Set the parameters of your system:

.. code-block:: python

    distrib_nb = 2048  # number of Gaussian distributions for each GMM
    rank_TV = 400  # rank of the total variability matrix
    tv_iteration = 10  # number of iterations to run
    plda_rk = 400  # rank of the PLDA eigenvoice matrix
    feature_dir = '/lium/spk1/larcher/mfcc_24/'  # directory where the features are stored
    feature_extension = 'h5'  # extension of the feature files
    nbThread = 10  # number of parallel processes to run

Load the lists of files to process. All the files needed to run this
tutorial are available at :ref:`datasets`.

.. code-block:: python

    with open("task/ubm_list.txt", "r") as fh:
        ubm_list = np.array([line.rstrip() for line in fh])
    tv_idmap = sidekit.IdMap("task/tv_idmap.h5")
    plda_male_idmap = sidekit.IdMap("task/plda_male_idmap.h5")
    enroll_idmap = sidekit.IdMap("task/core_male_sre10_trn.h5")
    test_idmap = sidekit.IdMap("task/test_sre10_idmap.h5")

    # NOTE: plda_all_idmap, the full (male and female) PLDA training list,
    # is used below to build the background list but was missing from the
    # original listing; the file name used here is an assumption.
    plda_all_idmap = sidekit.IdMap("task/plda_all_idmap.h5")

The lists needed are:

- the list of files to train the GMM-UBM
- an IdMap listing the files to train the total variability matrix
- an IdMap listing the files to train the PLDA, WCCN and Mahalanobis matrices
- the IdMap listing the enrolment segments and models
- the IdMap describing the test segments

Load the Key and Ndx:

.. code-block:: python

    test_ndx = sidekit.Ndx("task/core_core_all_sre10_ndx.h5")
    keys = sidekit.Key('task/core_core_all_sre10_cond5_key.h5')

Define the FeaturesServer used to load the acoustic features:

.. code-block:: python

    fs = sidekit.FeaturesServer(feature_filename_structure="{dir}/{{}}.{ext}".format(dir=feature_dir, ext=feature_extension),
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                keep_all_features=False,
                                delta=True,
                                double_delta=True,
                                rasta=True,
                                context=None)

Train your system
-----------------

Now train the GMM-UBM with the EM algorithm; after each iteration, the
current version of the mixture is written to disk.

.. code-block:: python

    ubm = sidekit.Mixture()
    llk = ubm.EM_split(fs, ubm_list, distrib_nb,
                       num_thread=nbThread,
                       save_partial='gmm/ubm')
    ubm.write('gmm/ubm_{}.h5'.format(distrib_nb))
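``EM_split`` returns the log-likelihood of the training data, so it is easy
to check that training behaved as expected before moving on. A minimal
sketch, assuming matplotlib is installed and that ``llk`` is a flat
sequence with one value per EM iteration:

.. code-block:: python

    import matplotlib.pyplot as plt

    # Sanity check: the log-likelihood should increase and flatten out
    # once the mixture has converged.
    plt.figure()
    plt.plot(llk, marker='o')
    plt.xlabel('EM iteration')
    plt.ylabel('log-likelihood')
    plt.title('UBM training convergence')
    plt.savefig('gmm/ubm_{}_llk.png'.format(distrib_nb))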
Create StatServers for the enrolment, test and background data, then
accumulate and save their statistics:

.. code-block:: python

    enroll_stat = sidekit.StatServer(enroll_idmap, ubm)
    enroll_stat.accumulate_stat(ubm=ubm,
                                feature_server=fs,
                                seg_indices=range(enroll_stat.segset.shape[0]),
                                num_thread=nbThread)
    enroll_stat.write('data/stat_sre10_core-core_enroll_{}.h5'.format(distrib_nb))

    test_stat = sidekit.StatServer(test_idmap, ubm)
    test_stat.accumulate_stat(ubm=ubm,
                              feature_server=fs,
                              seg_indices=range(test_stat.segset.shape[0]),
                              num_thread=nbThread)
    test_stat.write('data/stat_sre10_core-core_test_{}.h5'.format(distrib_nb))

    back_idmap = plda_all_idmap.merge(tv_idmap)
    back_stat = sidekit.StatServer(back_idmap, ubm)
    back_stat.accumulate_stat(ubm=ubm,
                              feature_server=fs,
                              seg_indices=range(back_stat.segset.shape[0]),
                              num_thread=nbThread)
    back_stat.write('data/stat_back_{}.h5'.format(distrib_nb))

Train the total variability matrix used for i-vector extraction. After
each iteration, the matrix is saved to disk.

.. code-block:: python

    tv_stat = sidekit.StatServer.read_subset('data/stat_back_{}.h5'.format(distrib_nb), tv_idmap)
    tv_mean, tv, _, __, tv_sigma = tv_stat.factor_analysis(rank_f=rank_TV,
                                                           rank_g=0,
                                                           rank_h=None,
                                                           re_estimate_residual=False,
                                                           it_nb=(tv_iteration, 0, 0),
                                                           min_div=True,
                                                           ubm=ubm,
                                                           batch_size=100,
                                                           num_thread=nbThread,
                                                           save_partial="data/TV_{}".format(distrib_nb))
    sidekit.sidekit_io.write_tv_hdf5((tv, tv_mean, tv_sigma), "data/TV_{}".format(distrib_nb))

Extract i-vectors for the target models, the training segments and the
test segments:

.. code-block:: python

    enroll_stat = sidekit.StatServer('data/stat_sre10_core-core_enroll_{}.h5'.format(distrib_nb))
    enroll_iv = enroll_stat.estimate_hidden(tv_mean, tv_sigma, V=tv, batch_size=100, num_thread=nbThread)[0]
    enroll_iv.write('data/iv_sre10_core-core_enroll_{}.h5'.format(distrib_nb))

    test_stat = sidekit.StatServer('data/stat_sre10_core-core_test_{}.h5'.format(distrib_nb))
    test_iv = test_stat.estimate_hidden(tv_mean, tv_sigma, V=tv, batch_size=100, num_thread=nbThread)[0]
    test_iv.write('data/iv_sre10_core-core_test_{}.h5'.format(distrib_nb))

    plda_stat = sidekit.StatServer.read_subset('data/stat_back_{}.h5'.format(distrib_nb), plda_all_idmap)
    plda_iv = plda_stat.estimate_hidden(tv_mean, tv_sigma, V=tv, batch_size=100, num_thread=nbThread)[0]
    plda_iv.write('data/iv_plda_{}.h5'.format(distrib_nb))

Run the tests
-------------

Load the keys of the nine evaluation conditions and reload the i-vectors
from disk:

.. code-block:: python

    keys = []
    for cond in range(9):
        keys.append(sidekit.Key('/lium/buster1/larcher/nist/sre10/core_core_{}_sre10_cond{}_key.h5'.format("all", cond + 1)))

    enroll_iv = sidekit.StatServer('data/iv_sre10_core-core_enroll_{}.h5'.format(distrib_nb))
    test_iv = sidekit.StatServer('data/iv_sre10_core-core_test_{}.h5'.format(distrib_nb))
    plda_iv = sidekit.StatServer.read_subset('data/iv_plda_{}.h5'.format(distrib_nb), plda_male_idmap)

Using Cosine similarity
~~~~~~~~~~~~~~~~~~~~~~~

A simple cosine scoring without any normalization of the i-vectors:

.. code-block:: python

    scores_cos = sidekit.iv_scoring.cosine_scoring(enroll_iv, test_iv, test_ndx, wccn=None)
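For reference, the score computed here is the textbook cosine similarity
between an enrolment i-vector :math:`w_1` and a test i-vector :math:`w_2`
(the notation below is ours, not SIDEKIT's):

.. math::

    s(w_1, w_2) = \frac{w_1^{t}\, w_2}{\lVert w_1 \rVert \, \lVert w_2 \rVert}

When a WCCN matrix is passed, as in the next variant, the name of
``get_wccn_choleski_stat1`` suggests that both i-vectors are first mapped
through the Cholesky factor of the inverse within-class covariance before
this formula is applied.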
A version where `i`-vectors are normalized using Within Class Covariance
Normalization (WCCN):

.. code-block:: python

    wccn = plda_iv.get_wccn_choleski_stat1()
    scores_cos_wccn = sidekit.iv_scoring.cosine_scoring(enroll_iv, test_iv, test_ndx, wccn=wccn)

The same, with a Linear Discriminant Analysis performed first to reduce
the `i`-vectors to 150 dimensions:

.. code-block:: python

    LDA = plda_iv.get_lda_matrix_stat1(150)

    plda_iv_lda = copy.deepcopy(plda_iv)
    enroll_iv_lda = copy.deepcopy(enroll_iv)
    test_iv_lda = copy.deepcopy(test_iv)

    plda_iv_lda.rotate_stat1(LDA)
    enroll_iv_lda.rotate_stat1(LDA)
    test_iv_lda.rotate_stat1(LDA)

    scores_cos_lda = sidekit.iv_scoring.cosine_scoring(enroll_iv_lda, test_iv_lda, test_ndx, wccn=None)

And now combine LDA and WCCN:

.. code-block:: python

    wccn = plda_iv_lda.get_wccn_choleski_stat1()
    scores_cos_wccn_lda = sidekit.iv_scoring.cosine_scoring(enroll_iv_lda, test_iv_lda, test_ndx, wccn=wccn)

Using Mahalanobis distance
~~~~~~~~~~~~~~~~~~~~~~~~~~

The `i`-vectors are first normalized with one iteration of the Eigen
Factor Radial (EFR) algorithm, equivalent to the so-called length
normalization; scores are then computed as a Mahalanobis distance:

.. code-block:: python

    meanEFR, CovEFR = plda_iv.estimate_spectral_norm_stat1(3)

    plda_iv_efr1 = copy.deepcopy(plda_iv)
    enroll_iv_efr1 = copy.deepcopy(enroll_iv)
    test_iv_efr1 = copy.deepcopy(test_iv)

    plda_iv_efr1.spectral_norm_stat1(meanEFR[:1], CovEFR[:1])
    enroll_iv_efr1.spectral_norm_stat1(meanEFR[:1], CovEFR[:1])
    test_iv_efr1.spectral_norm_stat1(meanEFR[:1], CovEFR[:1])
    M1 = plda_iv_efr1.get_mahalanobis_matrix_stat1()
    scores_mah_efr1 = sidekit.iv_scoring.mahalanobis_scoring(enroll_iv_efr1, test_iv_efr1, test_ndx, M1)

Using Two-covariance scoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Two two-covariance models are trained, one without and one with
`i`-vector normalization; the normalization applied consists of one
iteration of Spherical Normalization:

.. code-block:: python

    W = plda_iv.get_within_covariance_stat1()
    B = plda_iv.get_between_covariance_stat1()
    scores_2cov = sidekit.iv_scoring.two_covariance_scoring(enroll_iv, test_iv, test_ndx, W, B)

    meanSN, CovSN = plda_iv.estimate_spectral_norm_stat1(1, 'sphNorm')

    plda_iv_sn1 = copy.deepcopy(plda_iv)
    enroll_iv_sn1 = copy.deepcopy(enroll_iv)
    test_iv_sn1 = copy.deepcopy(test_iv)

    plda_iv_sn1.spectral_norm_stat1(meanSN[:1], CovSN[:1])
    enroll_iv_sn1.spectral_norm_stat1(meanSN[:1], CovSN[:1])
    test_iv_sn1.spectral_norm_stat1(meanSN[:1], CovSN[:1])
    W1 = plda_iv_sn1.get_within_covariance_stat1()
    B1 = plda_iv_sn1.get_between_covariance_stat1()
    scores_2cov_sn1 = sidekit.iv_scoring.two_covariance_scoring(enroll_iv_sn1, test_iv_sn1, test_ndx, W1, B1)

Using Probabilistic Linear Discriminant Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Normalize the i-vectors using Spherical Nuisance Normalization, then
compute the scores using Probabilistic Linear Discriminant Analysis
(PLDA):

.. code-block:: python

    meanSN, CovSN = plda_iv.estimate_spectral_norm_stat1(1, 'sphNorm')

    plda_iv.spectral_norm_stat1(meanSN[:1], CovSN[:1])
    enroll_iv.spectral_norm_stat1(meanSN[:1], CovSN[:1])
    test_iv.spectral_norm_stat1(meanSN[:1], CovSN[:1])

    plda_mean, plda_F, plda_G, plda_H, plda_Sigma = plda_iv.factor_analysis(rank_f=plda_rk,
                                                                            rank_g=0,
                                                                            rank_h=None,
                                                                            re_estimate_residual=True,
                                                                            it_nb=(10, 0, 0),
                                                                            min_div=True,
                                                                            ubm=None,
                                                                            batch_size=1000,
                                                                            num_thread=nbThread)
    sidekit.sidekit_io.write_plda_hdf5((plda_mean, plda_F, plda_G, plda_Sigma),
                                       "data/plda_model_tel_m_{}.h5".format(distrib_nb))

    scores_plda = sidekit.iv_scoring.PLDA_scoring(enroll_iv, test_iv, test_ndx,
                                                  plda_mean, plda_F, plda_G, plda_Sigma,
                                                  full_model=False)
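For reference, with ``rank_g=0`` and ``rank_h=None`` the generative model
trained above reduces to the standard simplified PLDA formulation (the
notation below is ours, not SIDEKIT's):

.. math::

    w = \mu + F h + \epsilon, \qquad
    h \sim \mathcal{N}(0, I), \quad
    \epsilon \sim \mathcal{N}(0, \Sigma)

where :math:`F` is the eigenvoice matrix of rank ``plda_rk`` (``plda_F``),
:math:`h` a speaker factor shared by all sessions of a speaker, and
:math:`\Sigma` the residual covariance (``plda_Sigma``). The verification
score is the log-likelihood ratio between the hypothesis that the
enrolment and test i-vectors share the same :math:`h` and the hypothesis
that they do not.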
Plot the DET curves
-------------------

In case you want to display the results of the experiments, first define
the target prior, the parameters of the graphic window and the title of
the plot:

.. code-block:: python

    # Set the prior following the NIST-SRE 2010 settings
    prior = sidekit.logit_effective_prior(0.001, 1, 1)

    # Initialize the DET plot to 2010 settings
    dp = sidekit.DetPlot(windowStyle='sre10', plotTitle='I-Vectors SRE 2010-ext male, cond 5')

For each of the performed experiments, load the target and non-target
scores for condition 5 according to the key file:

.. code-block:: python

    dp.set_system_from_scores(scores_cos, keys[4], sys_name='Cosine')
    dp.set_system_from_scores(scores_cos_wccn, keys[4], sys_name='Cosine WCCN')
    dp.set_system_from_scores(scores_cos_lda, keys[4], sys_name='Cosine LDA')
    dp.set_system_from_scores(scores_cos_wccn_lda, keys[4], sys_name='Cosine WCCN LDA')
    dp.set_system_from_scores(scores_mah_efr1, keys[4], sys_name='Mahalanobis EFR')
    dp.set_system_from_scores(scores_2cov, keys[4], sys_name='2 Covariance')
    dp.set_system_from_scores(scores_2cov_sn1, keys[4], sys_name='2 Covariance Spherical Norm')
    dp.set_system_from_scores(scores_plda, keys[4], sys_name='PLDA')

Create the window and plot::

    dp.create_figure()
    dp.plot_rocch_det(0)
    dp.plot_rocch_det(1)
    dp.plot_rocch_det(2)
    dp.plot_rocch_det(3)
    dp.plot_rocch_det(4)
    dp.plot_rocch_det(5)
    dp.plot_rocch_det(6)
    dp.plot_rocch_det(7)
    dp.plot_DR30_both(idx=0)
    dp.plot_mindcf_point(prior, idx=0)

Depending on the data available, the following plot could be obtained at
the end of this tutorial. For this example, the data used include
NIST-SRE 04, 05, 06 and 08, Switchboard Part 2 phases 2 and 3, and
Switchboard Cellular Part 2.

These results are far from optimal and do not generalize to the other
conditions of NIST-SRE 2010: the system was trained without any specific
data selection, and its only purpose is to give an idea of what you can
obtain.

.. figure:: I-Vector_sre10_cond5_male_coreX.png

.. _NIST: http://www.itl.nist.gov/iad/mig/tests/sre/2010/
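A closing note on the operating point plotted above: the effective prior
collapses the NIST cost parameters :math:`(P_{tar}, C_{miss}, C_{fa})`
into a single number through the standard identity below
(``sidekit.logit_effective_prior`` presumably returns its logit):

.. math::

    P_{eff} = \frac{P_{tar}\, C_{miss}}{P_{tar}\, C_{miss} + (1 - P_{tar})\, C_{fa}}

With :math:`P_{tar} = 0.001` and :math:`C_{miss} = C_{fa} = 1`, this gives
:math:`P_{eff} = 0.001`.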