Run a GMM-UBM system
====================

This script run an experiment on the male evaluation part of the RSR2015
database. The protocols used here is based on the one described in
[Larcher14]. In this version, we only consider the non-target trials
where impostors pronounce the correct text (Imp Correct).

The number of Target trials performed is then - TAR correct: 10,244 -
IMP correct: 573,664

[Larcher14] Anthony Larcher, Kong Aik Lee, Bin Ma and Haizhou Li,
"Text-dependent speaker verification: Classifiers, databases and
RSR2015," in Speech Communication 60 (2014) 56–77

Input/Output
------------

Enter:
~~~~~~

the number of distribution for the Gaussian Mixture Models the root
directory where the RSR2015 database is stored

Generates the following outputs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  a Mixture in HDF5 format (ubm)
-  a StatServer of zero and first-order statistics (enroll\_stat)
-  a StatServer containing the super vectors of MAP adapted GMM models
   for each speaker (enroll\_sv)
-  a score file
-  a DET plot

First, loads the required PYTHON packages:

.. code:: python

   import sidekit
   import os
   import sys
   import multiprocessing
   import matplotlib.pyplot as mpl
   import logging
   import numpy as np

   logging.basicConfig(filename='log/rsr2015_ubm-gmm.log',level=logging.DEBUG)

Set your own parameters
-----------------------

.. code:: python

   distribNb = 512  # number of Gaussian distributions for each GMM
   rsr2015Path = '/lium/corpus/audio/tel/en/RSR2015_v1/'

   # Default for RSR2015
   audioDir = os.path.join(rsr2015Path , 'sph/male')

   # Automatically set the number of parallel process to run.
   # The number of threads to run is set equal to the number of cores available
   # on the machine minus one or to 1 if the machine has a single core.
   nbThread = max(multiprocessing.cpu_count()-1, 1)


Load IdMap, Ndx, Key from HDF5 files and ubm\_list
--------------------------------------------------

Note that these files are generated when running rsr2015\_init.py:

.. code:: python

   print('Load task definition')
   enroll_idmap = sidekit.IdMap('task/3sesspwd_eval_m_trn.h5')
   test_ndx = sidekit.Ndx('task/3sess-pwd_eval_m_ndx.h5')
   key = sidekit.Key('task/3sess-pwd_eval_m_key.h5')
   with open('task/ubm_list.txt') as inputFile:
       ubmList = inputFile.read().split('\n')

Process the audio to save MFCC on disk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

   logging.info("Initialize FeaturesExtractor")
   extractor = sidekit.FeaturesExtractor(audio_filename_structure=audioDir+"/{}.wav",
                                         feature_filename_structure="./features/{}.h5",
                                         sampling_frequency=16000,
                                         lower_frequency=133.3333,
                                         higher_frequency=6955.4976,
                                         filter_bank="log",
                                         filter_bank_size=40,
                                         window_size=0.025,
                                         shift=0.01,
                                         ceps_number=19,
                                         vad="snr",
                                         snr=40,
                                         pre_emphasis=0.97,
                                         save_param=["vad", "energy", "cep"],
                                         keep_all_features=False)

   # Get the complete list of features to extract
   show_list = np.unique(np.hstack([ubmList, enroll_idmap.rightids, np.unique(test_ndx.segset)]))
   channel_list = np.zeros_like(show_list, dtype = int)

   logging.info("Extract features and save to disk")
   extractor.save_list(show_list=show_list,
                       channel_list=channel_list,
                       num_thread=nbThread)

Create a FeaturesServer
~~~~~~~~~~~~~~~~~~~~~~~
From this point, all objects that need to process acoustic features will do it through a :ref:`featuresserver`.
This object is initialized here. We define the type of parameters to load (log-energy + cepstral coefficients)
and the post-process to apply on the fly (RASTA filtering, CMVN, addition iof the first and second derivatives,
feature selection).

.. code:: python

   # Create a FeaturesServer to load features and feed the other methods
   features_server = sidekit.FeaturesServer(features_extractor=None,
                                            feature_filename_structure="./features/{}.h5",
                                            sources=None,
                                            dataset_list=["energy", "cep", "vad"],
                                            mask=None,
                                            feat_norm="cmvn",
                                            global_cmvn=None,
                                            dct_pca=False,
                                            dct_pca_config=None,
                                            sdc=False,
                                            sdc_config=None,
                                            delta=True,
                                            double_delta=True,
                                            delta_filter=None,
                                            context=None,
                                            traps_dct_nb=None,
                                            rasta=True,
                                            keep_all_features=False)

Train the Universal background Model (UBM)
------------------------------------------

.. code:: python

   print('Train the UBM by EM')
   # Extract all features and train a GMM without writing to disk
   ubm = sidekit.Mixture()
   llk = ubm.EM_split(features_server, ubmList, distribNb, num_thread=nbThread, save_partial=True)
   ubm.write('gmm/ubm.h5')

Compute the sufficient statistics on the UBM
--------------------------------------------

Make use of the new UBM to compute the sufficient statistics of all
enrolement sessions that should be used to train the speaker GMM models.
An empty StatServer is initialized from the enroll\_idmap IdMap.
Statistics are then computed in the enroll\_stat StatServer which is
then stored in compressed pickle format:

.. code:: python

   print('Compute the sufficient statistics')
   # Create a StatServer for the enrollment data and compute the statistics
   enroll_stat = sidekit.StatServer(enroll_idmap,
                                    distrib_nb=512,
                                    feature_size=60)
   enroll_stat.accumulate_stat(ubm=ubm,
                               feature_server=features_server,
                               seg_indices=range(enroll_stat.segset.shape[0]),
                               num_thread=nbThread)
   enroll_stat.write('data/stat_rsr2015_male_enroll.h5')

Adapt the GMM speaker models from the UBM via a MAP adaptation
--------------------------------------------------------------

Train a GMM for each speaker. Only adapt the mean supervector and store
all of them in the enrol\_sv StatServer that is then stored to disk:

.. code:: python

   print('MAP adaptation of the speaker models')
   regulation_factor = 3  # MAP regulation factor
   enroll_sv = enroll_stat.adapt_mean_map_multisession(ubm, regulation_factor)
   enroll_sv.write('data/sv_rsr2015_male_enroll.h5')


Compute all trials and save scores in HDF5 format
-------------------------------------------------

.. code:: python

   print('Compute trial scores')
   scores_gmm_ubm = sidekit.gmm_scoring(ubm,
                                        enroll_sv,
                                        test_ndx,
                                        features_server,
                                        num_thread=nbThread)
   scores_gmm_ubm.write('scores/scores_gmm-ubm_rsr2015_male.h5')

Plot DET curve and compute minDCF and EER
-----------------------------------------

.. code:: python

   print('Plot the DET curve')
   # Set the prior following NIST-SRE 2008 settings
   prior = sidekit.logit_effective_prior(0.01, 10, 1)

   # Initialize the DET plot to 2008 settings
   dp = sidekit.DetPlot(window_style='sre10', plot_title='GMM-UBM_RSR2015_male')
   dp.set_system_from_scores(scores_gmm_ubm, key, sys_name='GMM-UBM')
   dp.create_figure()
   dp.plot_rocch_det(0)
   dp.plot_DR30_both(idx=0)
   dp.plot_mindcf_point(prior, idx=0)

Compute equal error rate and minDCF, plot the DET curve.

.. code:: python

   print('Plot DET curves')
   prior = sidekit.logit_effective_prior(0.001, 1, 1)
   minDCF, Pmiss, Pfa, prbep, eer = sidekit.bosaris.detplot.fast_minDCF(dp.__tar__[0], dp.__non__[0], prior, normalize=True)
   print("UBM-GMM 128g, minDCF = {}, eer = {}".format(minDCF, eer))

The following results should be obtained at the end of this tutorial:


.. image:: rsr2015_gmm-ubm.png