Run an SVM-GMM system on the RSR2015 database
=============================================

This script runs an experiment on the male evaluation part of the
**RSR2015** database. The protocol used here is based on the one
described in [Larcher14]. In this version, we only consider the
non-target trials where impostors pronounce the correct text (Imp
Correct).

The number of trials performed is then:

- TAR correct: 10,244
- IMP correct: 573,664

[Larcher14] Anthony Larcher, Kong Aik Lee, Bin Ma and Haizhou Li,
"Text-dependent speaker verification: Classifiers, databases and
RSR2015," in Speech Communication 60 (2014) 56–77

Input/Output
------------

Enter:
~~~~~~

- the number of distributions for the Gaussian Mixture Models
- the root directory where the RSR2015 database is stored

Generates the following outputs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- a Mixture in HDF5 format (ubm)
- a StatServer of zero and first-order statistics (enroll\_stat)
- a StatServer of zero and first-order statistics (back\_stat)
- a StatServer of zero and first-order statistics (nap\_stat)
- a StatServer of zero and first-order statistics (test\_stat)
- a StatServer containing the supervectors of MAP adapted GMM models
  for each speaker (enroll\_sv)
- a StatServer containing the supervectors of MAP adapted GMM models
  for each speaker (back\_sv)
- a StatServer containing the supervectors of MAP adapted GMM models
  for each speaker (nap\_sv)
- a StatServer containing the supervectors of MAP adapted GMM models
  for each speaker (test\_sv)
- a score file
- a DET plot

.. code:: python

    import numpy as np
    import sidekit
    import multiprocessing
    import os
    import sys
    import matplotlib.pyplot as mpl
    import logging

    logging.basicConfig(filename='log/rsr2015_svm-gmm.log', level=logging.DEBUG)

Set your own parameters
-----------------------

.. code:: python

    distrib_nb = 512  # number of Gaussian distributions for each GMM
    NAP = True  # activate the Nuisance Attribute Projection
    nap_rank = 40
    rsr2015Path = '/lium/corpus/vrac/RSR2015_V1/'

    # Directory containing the audio files. The sub-directory used here is
    # an assumption about the corpus layout; adjust it to your own copy.
    audioDir = os.path.join(rsr2015Path, 'sph/male')

    # Set the number of parallel processes to run.
    nbThread = 10

Load IdMap, Ndx, Key from HDF5 files and ubm\_list
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These files define the task. Note that they are generated when running
``rsr2015_init.py``:

.. code:: python

    logging.info('Load task definition')
    enroll_idmap = sidekit.IdMap('task/3sesspwd_eval_m_trn.h5')
    nap_idmap = sidekit.IdMap('task/3sess-pwd_eval_m_nap.h5')
    back_idmap = sidekit.IdMap('task/3sess-pwd_eval_m_back.h5')
    test_ndx = sidekit.Ndx('task/3sess-pwd_eval_m_ndx.h5')
    test_idmap = sidekit.IdMap('task/3sess-pwd_eval_m_test.h5')
    key = sidekit.Key('task/3sess-pwd_eval_m_key.h5')

    with open('task/ubm_list.txt') as inputFile:
        ubmList = inputFile.read().split('\n')

Process the audio to save MFCC on disk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    logging.info("Initialize FeaturesExtractor")
    extractor = sidekit.FeaturesExtractor(audio_filename_structure=audioDir+"/{}.wav",
                                          feature_filename_structure="./features/{}.h5",
                                          sampling_frequency=16000,
                                          lower_frequency=133.3333,
                                          higher_frequency=6955.4976,
                                          filter_bank="log",
                                          filter_bank_size=40,
                                          window_size=0.025,
                                          shift=0.01,
                                          ceps_number=19,
                                          vad="snr",
                                          snr=40,
                                          pre_emphasis=0.97,
                                          save_param=["vad", "energy", "cep"],
                                          keep_all_features=False)

    # Get the complete list of features to extract
    show_list = np.unique(np.hstack([ubmList, enroll_idmap.rightids, np.unique(test_ndx.segset)]))
    channel_list = np.zeros_like(show_list, dtype=int)

    logging.info("Extract features and save to disk")
    extractor.save_list(show_list=show_list,
                        channel_list=channel_list,
                        num_thread=nbThread)

Create a FeaturesServer
~~~~~~~~~~~~~~~~~~~~~~~

From this point, all objects that need to process acoustic features
will do it through a :ref:`featuresserver`. This object is initialized
here. We define the type of parameters to load (log-energy + cepstral
coefficients) and the post-processing to apply on the fly (RASTA
filtering, CMVN, addition of the first and second derivatives, feature
selection).

.. code:: python

    # Create a FeaturesServer to load features and feed the other methods
    features_server = sidekit.FeaturesServer(features_extractor=None,
                                             feature_filename_structure="./features/{}.h5",
                                             sources=None,
                                             dataset_list=["energy", "cep", "vad"],
                                             mask=None,
                                             feat_norm="cmvn",
                                             global_cmvn=None,
                                             dct_pca=False,
                                             dct_pca_config=None,
                                             sdc=False,
                                             sdc_config=None,
                                             delta=True,
                                             double_delta=True,
                                             delta_filter=None,
                                             context=None,
                                             traps_dct_nb=None,
                                             rasta=True,
                                             keep_all_features=False)

Train the Universal Background Model (UBM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An empty Mixture is initialized and an EM algorithm is run to estimate
the UBM before saving it to disk. Covariance matrices are diagonal in
this example.

.. code:: python

    logging.info('Train the UBM by EM')
    # Start from an empty Mixture and estimate it on the UBM training list
    ubm = sidekit.Mixture()
    llk = ubm.EM_split(features_server, ubmList, distrib_nb, num_thread=nbThread)
    ubm.write('gmm/ubm.h5')

Compute the sufficient statistics on the UBM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the new UBM to compute the sufficient statistics of four sets of
segments: the enrollment sessions used to train the speaker GMMs, the
models of the SVM training blacklist, the segments used to train the
NAP matrix, and the test segments. For each set, an empty StatServer is
initialized, the statistics are accumulated in it, and it is then
stored to disk:

.. code:: python

    logging.info('Compute the sufficient statistics')

    enroll_stat = sidekit.StatServer(enroll_idmap, distrib_nb=distrib_nb, feature_size=60)
    enroll_stat.accumulate_stat(ubm=ubm,
                                feature_server=features_server,
                                seg_indices=range(enroll_stat.segset.shape[0]),
                                num_thread=nbThread)
    enroll_stat.write('data/stat_rsr2015_male_enroll.h5')

    back_stat = sidekit.StatServer(back_idmap, distrib_nb=distrib_nb, feature_size=60)
    back_stat.accumulate_stat(ubm=ubm,
                              feature_server=features_server,
                              seg_indices=range(back_stat.segset.shape[0]),
                              num_thread=nbThread)
    back_stat.write('data/stat_rsr2015_male_back.h5')

    nap_stat = sidekit.StatServer(nap_idmap, distrib_nb=distrib_nb, feature_size=60)
    nap_stat.accumulate_stat(ubm=ubm,
                             feature_server=features_server,
                             seg_indices=range(nap_stat.segset.shape[0]),
                             num_thread=nbThread)
    nap_stat.write('data/stat_rsr2015_male_nap.h5')

    test_stat = sidekit.StatServer(test_idmap, distrib_nb=distrib_nb, feature_size=60)
    test_stat.accumulate_stat(ubm=ubm,
                              feature_server=features_server,
                              seg_indices=range(test_stat.segset.shape[0]),
                              num_thread=nbThread)
    test_stat.write('data/stat_rsr2015_male_test.h5')

Train a GMM for each session
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Only the mean supervectors are adapted. They are collected in the
enroll\_sv StatServer (and likewise in back\_sv, nap\_sv and test\_sv),
which is then written to disk in HDF5 format:

.. code:: python

    logging.info('MAP adaptation of the speaker models')
    regulation_factor = 3  # MAP regulation factor

    enroll_sv = enroll_stat.adapt_mean_map(ubm, regulation_factor, norm=True)
    enroll_sv.write('data/sv_norm_rsr2015_male_enroll.h5')

    back_sv = back_stat.adapt_mean_map(ubm, regulation_factor, norm=True)
    back_sv.write('data/sv_rsr2015_male_back.h5')

    nap_sv = nap_stat.adapt_mean_map(ubm, regulation_factor, norm=True)
    nap_sv.write('data/sv_rsr2015_male_nap.h5')

    test_sv = test_stat.adapt_mean_map(ubm, regulation_factor, norm=True)
    test_sv.write('data/sv_rsr2015_male_test.h5')

Apply Nuisance Attribute Projection if required
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If ``NAP == True``, estimate the Nuisance Attribute Projection and
apply it to the supervectors:

.. code:: python

    if NAP:
        logging.info('Estimate and apply NAP')
        napMat = back_sv.get_nap_matrix_stat1(nap_rank)
        # Remove from each supervector its component in the nuisance subspace
        back_sv.stat1 = back_sv.stat1 - np.dot(np.dot(back_sv.stat1, napMat), napMat.transpose())
        enroll_sv.stat1 = enroll_sv.stat1 - np.dot(np.dot(enroll_sv.stat1, napMat), napMat.transpose())
        test_sv.stat1 = test_sv.stat1 - np.dot(np.dot(test_sv.stat1, napMat), napMat.transpose())

Train the Support Vector Machine models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Train a Support Vector Machine for each speaker by considering the
three sessions of this speaker:

.. code:: python

    logging.info('Train the SVMs')
    sidekit.svm_training('svm/', back_sv, enroll_sv, num_thread=nbThread)

Compute all trials and save scores in HDF5 format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compute the scores for all trials:

.. code:: python

    logging.info('Compute trial scores')
    scores_gmm_svm = sidekit.svm_scoring('svm/{}.svm', test_sv, test_ndx, num_thread=nbThread)
    if NAP:
        scores_gmm_svm.write('scores/scores_svm-gmm_NAP_rsr2015_male.h5')
    else:
        scores_gmm_svm.write('scores/scores_svm-gmm_rsr2015_male.h5')

Plot DET curve and compute minDCF and EER
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    logging.info('Plot the DET curve')
    prior = sidekit.logit_effective_prior(0.01, 10, 1)

    # Initialize the DET plot with the 'sre10' window style
    dp = sidekit.DetPlot(window_style='sre10', plot_title='SVM-GMM RSR2015 male')
    dp.set_system_from_scores(scores_gmm_svm, key, sys_name='SVM-GMM')
    dp.create_figure()
    dp.plot_rocch_det(0)
    dp.plot_DR30_both(idx=0)
    dp.plot_mindcf_point(prior, idx=0)

    minDCF, Pmiss, Pfa, prbep, eer = sidekit.bosaris.detplot.fast_minDCF(dp.__tar__[0], dp.__non__[0], prior, normalize=True)
    logging.info("minDCF = {}, eer = {}".format(minDCF, eer))

After running this script you should obtain the following curve
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: rsr2015_svm_nap.png
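
The EER written to the log is the operating point at which the miss
rate and the false-alarm rate are equal. As a rough illustration only,
the sketch below approximates it with a plain threshold sweep over toy
scores. It is not the computation performed by ``fast_minDCF``; the
function name ``approximate_eer`` and the score values are hypothetical
and chosen for this example.

.. code:: python

    def approximate_eer(target_scores, nontarget_scores):
        """Approximate the equal error rate by sweeping every observed score as a threshold."""
        best_gap, best_eer = None, None
        for threshold in sorted(set(target_scores) | set(nontarget_scores)):
            # A target trial scoring below the threshold is a miss
            p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
            # A non-target trial scoring at or above the threshold is a false alarm
            p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
            gap = abs(p_miss - p_fa)
            # Keep the threshold where both error rates are closest to each other
            if best_gap is None or gap < best_gap:
                best_gap, best_eer = gap, (p_miss + p_fa) / 2
        return best_eer

    # Toy scores, invented for illustration only
    target_scores = [2.1, 1.7, 0.9, 0.2, 1.3]
    nontarget_scores = [-1.5, -0.4, 0.5, -2.0, 0.1]
    eer = approximate_eer(target_scores, nontarget_scores)
    print("approximate EER = {}".format(eer))

On real score files this sweep and the value returned by SIDEKIT can
differ slightly, so the logged value remains the one to report.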