Programs

The main tools perform segmentation, silence detection, hierarchical clustering, and Viterbi decoding, using GMMs trained with EM or MAP.
The programs are located in the package fr.lium.spkDiarization.programs.

MClust

This program implements hierarchical agglomerative clustering. It works well with single Gaussians (full or diagonal covariance) compared with BIC, or with GMMs compared with a CLR-like distance.

There are two main algorithms:

  • the standard hierarchical agglomerative clustering, using Gaussians or GMMs;
  • the linear hierarchical agglomerative clustering, using Gaussians only. It merges consecutive segments of the same speaker, working from the start to the end of the recording or from the end to the start; i.e. it only looks at the similarity values along the diagonal of the similarity matrix.
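
The start-to-end linear variant can be sketched as follows. This is only an illustration under stated assumptions, not the LIUM implementation: the names linear_merge and mean_gap are hypothetical, features are 1-D, and a toy distance between segment means stands in for the Gaussian BIC score. The input is assumed to contain at least one segment.

```python
from statistics import mean

def mean_gap(a, b):
    """Toy distance between two segments: absolute difference of their means.
    (Hypothetical stand-in for a Gaussian BIC score.)"""
    return abs(mean(a) - mean(b))

def linear_merge(segments, distance, threshold):
    """Left-to-right pass: merge each segment into the cluster just before it
    when the distance between the two is below the threshold."""
    merged = [list(segments[0])]
    for seg in segments[1:]:
        if distance(merged[-1], seg) < threshold:
            merged[-1].extend(seg)      # same speaker: grow the current cluster
        else:
            merged.append(list(seg))    # speaker change: start a new cluster
    return merged

segments = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.2], [0.1, 0.0]]
clusters = linear_merge(segments, mean_gap, threshold=1.0)
```

Only neighbouring segments are ever compared, so the last segment is not merged with the first cluster even though both look alike; regrouping non-adjacent segments is the job of the standard hierarchical algorithm.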

The similarity is selected with the --cMethod parameter, which can be set to:

  • l stands for Gaussian BIC similarity with a start-to-end algorithm,
  • r stands for Gaussian BIC similarity with an end-to-start algorithm,
  • h stands for Gaussian BIC similarity (standard clustering),
  • c stands for GMM Cross Likelihood Ratio similarity (CLR)[1],
  • ce stands for GMM Cross Entropy (equivalent to Normalized CLR)[2][3],
  • kl2 stands for Gaussian symmetric Kullback–Leibler,
  • h2 stands for Gaussian symmetric Holister,
  • gd stands for Gaussian Divergence,
  • gdgmm stands for Gaussian Divergence using GMM (equivalent to KL2 using GMM)[4],
  • icr stands for Information Change Rate[5],
  • t is a GMM-based similarity[6],
  • glr stands for Generalized Likelihood Ratio.
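
As an example of one of these measures, the symmetric Kullback-Leibler divergence between two diagonal-covariance Gaussians has a simple closed form. The sketch below uses the textbook definition KL(p||q) + KL(q||p); the function name kl2_diag is hypothetical, and the value computed by the kl2 option may differ in scaling or normalization.

```python
def kl2_diag(mean_p, var_p, mean_q, var_q):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two Gaussians
    with diagonal covariance, given per-dimension means and variances.
    The log-determinant terms of the two directions cancel out."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += vp / vq + vq / vp - 2.0 + (mp - mq) ** 2 * (1.0 / vp + 1.0 / vq)
    return 0.5 * total

same = kl2_diag([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])
shifted = kl2_diag([0.0], [1.0], [1.0], [1.0])
```

Identical Gaussians give a divergence of zero, and the value grows as the means move apart or the variances differ.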

Other parameters to control the clustering are:

  • --cThr: the threshold used in the stop criterion (and the penalty factor in the similarity);
  • --cMaximumMerge: stop the process when the maximum number of merges is reached;
  • --cMinimumOfCluster: stop the process when the minimum number of speakers in the diarization is reached;
  • --cMinimumLength: stop the process when the minimum cluster length is reached (to be checked);
  • --saveAllStep: save the diarization at each iteration;
  • --tOutputMask: the output mask of the models;
  • --kind: the kind of Gaussians [FULL,DIAG];
  • --nbComp: number of Gaussians in the model (1 for Gaussian).
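
To illustrate how a penalty factor enters a BIC-based merge decision, here is a minimal sketch of the textbook ΔBIC criterion for two 1-D segments, each modelled by a single Gaussian. The function name delta_bic_1d and its penalty_factor argument are hypothetical, the sign convention (negative means "merge") is one common choice and is not confirmed by this page, and both segments are assumed to have non-zero variance.

```python
import math
from statistics import pvariance

def delta_bic_1d(x, y, penalty_factor=1.0):
    """Textbook delta-BIC between modelling x and y with one Gaussian
    (merged) versus two Gaussians (separate), for 1-D features.
    Negative values favour merging under this sign convention."""
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    # Likelihood gain obtained by keeping two separate Gaussians
    gain = 0.5 * (n * math.log(pvariance(x + y))
                  - n_x * math.log(pvariance(x))
                  - n_y * math.log(pvariance(y)))
    # A 1-D Gaussian has 2 free parameters (mean and variance)
    penalty = penalty_factor * 0.5 * 2 * math.log(n)
    return gain - penalty

same_speaker = delta_bic_1d([0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0])
two_speakers = delta_bic_1d([0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0])
```

Raising the penalty factor lowers the score, so more pairs fall below zero and more merges are accepted before the clustering stops.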

mSegInit

Performs two safety checks on a given feature file:

  • Checks that the sections of the file on which segmentation is to be done actually fit completely in the file (it sometimes happens in evaluation campaigns that some sound files are not as long as they are supposed to be); these sections are given to the program as a segmentation file;
  • Checks the feature vectors to ensure that there is no sequence of several identical vectors (usually resulting from a problem when recording the sound), as such sequences would disturb the segmentation process.
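
The two checks can be sketched as follows. This is an illustration only: the function names, the (start, length) segment representation in feature indices, and the tuple-based feature vectors are assumptions, not the program's actual data structures.

```python
def segments_fit(segments, n_features):
    """True if every (start, length) segment lies entirely inside a
    feature file containing n_features vectors."""
    return all(start >= 0 and start + length <= n_features
               for start, length in segments)

def longest_identical_run(features):
    """Length of the longest run of consecutive identical feature vectors."""
    if not features:
        return 0
    longest = run = 1
    for prev, cur in zip(features, features[1:]):
        run = run + 1 if cur == prev else 1   # extend or restart the run
        longest = max(longest, run)
    return longest

features = [(1.0, 2.0), (1.0, 2.0), (1.0, 2.0), (3.0, 4.0)]
ok = segments_fit([(0, 3), (3, 1)], len(features))
too_long = segments_fit([(2, 5)], len(features))
run_length = longest_identical_run(features)
```

A caller could reject or repair the input when segments_fit returns False, or when the longest run exceeds some tolerated length.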

mSeg

A similarity-based segmentation program that finds the instantaneous change points corresponding to segment boundaries. The default algorithm is based on the one described in [7]. It detects change points through a similarity computed between two Gaussians. The Gaussians are estimated over a window sliding along the whole signal. A change point, i.e. a segment boundary, is placed at the middle of the window when the similarity reaches a local maximum.
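
The sliding-window idea can be sketched as below. This is a simplified illustration, not the algorithm of [7]: features are 1-D, the two Gaussians are estimated on the half-windows to the left and right of each candidate point, and a symmetric Kullback-Leibler divergence with a variance floor is used as the score. The names gauss_distance, change_points, window, and threshold are all hypothetical.

```python
from statistics import mean, pvariance

def gauss_distance(left, right, floor=1e-3):
    """Symmetric KL divergence between 1-D Gaussians fitted on two windows."""
    m_l, m_r = mean(left), mean(right)
    v_l = max(pvariance(left), floor)    # floor avoids division by zero
    v_r = max(pvariance(right), floor)
    return 0.5 * (v_l / v_r + v_r / v_l - 2.0
                  + (m_l - m_r) ** 2 * (1.0 / v_l + 1.0 / v_r))

def change_points(features, window, threshold):
    """Return indices where the score between the two adjacent half-windows
    is a local maximum above the threshold."""
    scores = {}
    for t in range(window, len(features) - window + 1):
        scores[t] = gauss_distance(features[t - window:t],
                                   features[t:t + window])
    peaks = []
    for t, s in scores.items():
        left_ok = s >= scores.get(t - 1, 0.0)
        right_ok = s > scores.get(t + 1, 0.0)
        if s > threshold and left_ok and right_ok:
            peaks.append(t)
    return peaks

# Toy signal: 20 features around 0, then 20 features around 5
signal = [0.0, 0.1] * 10 + [5.0, 5.1] * 10
boundaries = change_points(signal, window=6, threshold=100.0)
```

The score rises as the right half-window starts to overlap the second speaker and peaks when the candidate point sits exactly on the change.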

The similarity is selected with --sMethod.

The other parameters are:

  • --sThr: the threshold for BIC;
  • --sModelWindowSize: the size of the window over which a Gaussian is computed, expressed as a number of features;
  • --sMinimumWindowSize: the minimum distance between two segment boundaries, expressed as a number of features;
  • --kind: the type of Gaussian (FULL or DIAG);
  • --sRecursion: use a recursive algorithm to find boundaries; the algorithm cuts a segment in two at the highest similarity, then repeats recursively on both resulting segments as long as they are longer than --sMinimumWindowSize.

mTrainInit

A program for initializing GMMs. The initialization method is selected with --emInitMethod, whose value is one of:

  • split_all: split every Gaussian into two new Gaussians at each iteration;
  • split: split the Gaussian with the largest variance into two new Gaussians;
  • uniform: take n-1 features to initialize the means of n-1 Gaussians; the last Gaussian takes the mean and variance of all the features;
  • copy: copy a model from the input model.
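
The split strategy can be illustrated with a small sketch. How the program perturbs the two new Gaussians is not stated here, so the scheme below (means shifted by plus or minus a fraction of the standard deviation, weight halved, variance kept) is an assumption borrowed from common practice; split_largest and epsilon are hypothetical names, and components are 1-D (weight, mean, variance) tuples.

```python
import math

def split_largest(gmm, epsilon=0.1):
    """Replace the component with the largest variance by two components
    whose means are shifted by -/+ epsilon standard deviations.
    gmm is a list of (weight, mean, variance) tuples for 1-D Gaussians."""
    idx = max(range(len(gmm)), key=lambda i: gmm[i][2])
    weight, mu, var = gmm[idx]
    shift = epsilon * math.sqrt(var)
    halves = [(weight / 2.0, mu - shift, var),
              (weight / 2.0, mu + shift, var)]
    return gmm[:idx] + halves + gmm[idx + 1:]

start = [(0.5, 0.0, 1.0), (0.5, 3.0, 4.0)]
grown = split_largest(start)
```

Under the same assumptions, split_all would apply this split to every component at each iteration rather than only to the one with the largest variance.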

mTrainEM

A program that trains a GMM (previously initialized with mTrainInit) using the EM algorithm.
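
For readers unfamiliar with EM, one iteration for a 1-D GMM can be sketched as follows. This is the textbook algorithm, not the program's code; the names gauss_pdf, em_step, and var_floor are hypothetical, and components are (weight, mean, variance) tuples.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(data, gmm, var_floor=1e-3):
    """One EM iteration: returns the re-estimated (weight, mean, variance) list."""
    # E-step: posterior probability of each component for each feature
    resp = []
    for x in data:
        likes = [w * gauss_pdf(x, mu, var) for w, mu, var in gmm]
        total = sum(likes)
        resp.append([l / total for l in likes])
    # M-step: re-estimate weights, means and variances from the posteriors
    new_gmm = []
    for k in range(len(gmm)):
        n_k = sum(r[k] for r in resp)
        mu = sum(r[k] * x for r, x in zip(resp, data)) / n_k
        var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, data)) / n_k
        new_gmm.append((n_k / len(data), mu, max(var, var_floor)))
    return new_gmm

data = [0.0, 0.2, -0.2, 5.0, 5.2, 4.8]
model = [(0.5, 1.0, 1.0), (0.5, 4.0, 1.0)]   # rough initial model
for _ in range(10):
    model = em_step(data, model)
```

The variance floor is a common safeguard against components collapsing onto a single feature.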

mTrainMAP

A program that trains a GMM (previously initialized with mTrainInit) using the MAP algorithm.
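
A common form of MAP training adapts only the means of a prior model, interpolating between the prior mean and the data mean according to how much data each component receives. The sketch below shows that form for 1-D GMMs; which parameters the program actually adapts and how it weights the prior are not stated here, so map_adapt_means and the relevance factor are assumptions.

```python
import math

def gauss_pdf(x, mu, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def map_adapt_means(data, prior_gmm, relevance=16.0):
    """Mean-only MAP adaptation of a list of (weight, mean, variance) tuples."""
    # Posterior probability of each component for each feature
    resp = []
    for x in data:
        likes = [w * gauss_pdf(x, mu, var) for w, mu, var in prior_gmm]
        total = sum(likes)
        resp.append([l / total for l in likes])
    adapted = []
    for k, (w, mu, var) in enumerate(prior_gmm):
        n_k = sum(r[k] for r in resp)                 # soft count of component k
        if n_k > 0.0:
            ml_mean = sum(r[k] * x for r, x in zip(resp, data)) / n_k
        else:
            ml_mean = mu                              # no data: keep the prior mean
        alpha = n_k / (n_k + relevance)               # data/prior interpolation weight
        adapted.append((w, alpha * ml_mean + (1.0 - alpha) * mu, var))
    return adapted

prior = [(1.0, 0.0, 1.0)]
adapted = map_adapt_means([2.0] * 16, prior, relevance=16.0)
```

With 16 features and a relevance factor of 16, the adapted mean lands halfway between the prior mean and the data mean; with less data it stays closer to the prior.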

mDecode

Basic Viterbi decoder using a set of GMMs.
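
A minimal sketch of Viterbi decoding over a set of GMMs is given below, with one state per model and a fixed cost for switching from one model to another. It is an illustration only: the names gmm_loglike, viterbi, and switch_penalty are hypothetical, features are 1-D, models are lists of (weight, mean, variance) tuples, and the feature sequence is assumed to be non-empty.

```python
import math

def gmm_loglike(x, gmm):
    """Log-likelihood of a 1-D feature under a GMM."""
    return math.log(sum(
        w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        for w, mu, var in gmm))

def viterbi(features, models, switch_penalty=5.0):
    """Return the best sequence of model names, one per feature."""
    names = list(models)
    score = {n: gmm_loglike(features[0], models[n]) for n in names}
    back = []                                   # back-pointers, one dict per step
    for x in features[1:]:
        new_score, pointers = {}, {}
        for n in names:
            # Best predecessor, paying the penalty when the model changes
            best_prev = max(names, key=lambda p: score[p]
                            - (0.0 if p == n else switch_penalty))
            cost = 0.0 if best_prev == n else switch_penalty
            new_score[n] = score[best_prev] - cost + gmm_loglike(x, models[n])
            pointers[n] = best_prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state
    state = max(names, key=lambda n: score[n])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

models = {"A": [(1.0, 0.0, 1.0)], "B": [(1.0, 5.0, 1.0)]}
path = viterbi([0.1, -0.2, 0.0, 5.1, 4.9, 5.0], models)
```

The switch penalty discourages rapid alternation between models, so an isolated ambiguous feature does not produce a spurious segment.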

A program that computes the likelihood scores given a set of GMMs.