EM Train GMMs

This example shows how to train a UBM (Universal Background Model).

#!/bin/bash
show="ubm"
# Input segmentation file, %s will be substituted with $show
# Location of the MFCC files, %s will be substituted with the name of the segment show
fMask="./mfcc/%s.mfcc"
# The MFCC vector description, here it corresponds to 12 MFCC + Energy
# spro4=the mfcc was computed by SPro4 tools 
# 1:1:0:0:0:0: 1 = present, 0 = not present
# order: static, E, delta, delta E, delta delta, delta delta E
# 13: total size of a feature vector in the mfcc file
# 1:0:0:1: CMS by cluster
fDesc="spro4,1:1:0:0:0:0,13,1:0:0:1"
# The GMM used to initialize EM, %s will be substituted with $show
gmmInit="./%s.init.gmms"
# The output GMM, %s will be substituted with $show
# Initialize the UBM, i.e. a GMM with 8 diagonal Gaussian components
/usr/bin/java -Xmx1024m -cp ./LIUM_SpkDiarization.jar fr.lium.spkDiarization.programs.MTrainInit \
--sInputMask=$seg --fInputMask=$fMask --fInputDesc=$fDesc --kind=DIAG \
--nbComp=8 --emInitMethod=split_all --emCtrl=1,5,0.05 --tOutputMask=$gmmInit $show
# Train the UBM via EM
/usr/bin/java -Xmx1024m -cp ./LIUM_SpkDiarization.jar fr.lium.spkDiarization.programs.MTrainEM \
--sInputMask=$seg --fInputMask=$fMask --emCtrl=1,20,0.01 --fInputDesc=$fDesc \
--tInputMask=$gmmInit --tOutputMask=$gmm $show

The first Java call (MTrainInit) initializes the model. The initial model contains a single Gaussian estimated over the training data. The Gaussians are then iteratively split and retrained (with up to 5 EM iterations at each step, see --emCtrl=1,5,0.05) until the requested number of components (--nbComp=8) is reached. The second call (MTrainEM) trains the GMM with the EM algorithm: it runs between 1 and 20 iterations and stops early when the likelihood gain between two consecutive iterations falls below the given threshold (0.01 here, see --emCtrl=1,20,0.01).
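
The stopping behavior described above can be sketched in shell. This is only an illustration, not the tool's code: the helper names em_should_stop and split_schedule are hypothetical, the reading of --emCtrl as "minimum iterations, maximum iterations, gain threshold" is inferred from the description, and the doubling of the component count at each split step is an assumption about how split_all proceeds.

```shell
# Hypothetical helper: decide whether EM should stop after "iter" completed
# iterations, given an emCtrl-style triple "min,max,threshold".
# The meaning of the three fields is an assumption based on the text above.
em_should_stop() {
  local ctrl="$1" iter="$2" gain="$3"
  local min rest max thr
  min=${ctrl%%,*}      # first field
  rest=${ctrl#*,}
  max=${rest%%,*}      # second field
  thr=${rest#*,}       # third field
  # Always stop once the maximum number of iterations is reached.
  if [ "$iter" -ge "$max" ]; then
    echo "stop"; return
  fi
  # Never stop before the minimum number of iterations.
  if [ "$iter" -lt "$min" ]; then
    echo "continue"; return
  fi
  # Otherwise stop only if the likelihood gain is below the threshold.
  # awk performs the floating-point comparison that the shell lacks.
  if awk -v g="$gain" -v t="$thr" 'BEGIN { exit !(g < t) }'; then
    echo "stop"
  else
    echo "continue"
  fi
}

# Hypothetical helper: component counts visited when every Gaussian is split
# in two at each step, starting from one (assumed split_all behavior).
split_schedule() {
  local target="$1" n=1 out="1"
  while [ "$n" -lt "$target" ]; do
    n=$(( n * 2 ))
    if [ "$n" -gt "$target" ]; then n="$target"; fi
    out="$out $n"
  done
  echo "$out"
}
```

For instance, `em_should_stop 1,20,0.01 3 0.005` reports that training would stop after the third iteration, and `split_schedule 8` lists the component counts that would be visited on the way to --nbComp=8 under the doubling assumption.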

The output file contains an ArrayList<GMM>; each GMM corresponds to a cluster of the input segmentation.

MAP Train GMMs

This example shows how to train speaker models using MAP adaptation. First, the UBM is copied once per speaker to provide each speaker's initial model. Then the MAP adaptation is performed; only the means are adapted.
The speaker models are stored in the file “speakers.gmm”.

# Copy the UBM for each speaker
java -Xmx2024m -cp LIUM_SpkDiarization.jar fr.lium.spkDiarization.programs.MTrainInit --help --sInputMask=%s.seg --fInputMask=%s.wav --fInputDesc="audio16kHz2sphinx,1:3:2:0:0:0,13,1:1:300:4"  --emInitMethod=copy --tInputMask=./ubm.gmm --tOutputMask=%s.init.gmm speakers
# Train each speaker model (MAP adaptation, means only); the diarization file describes the training data of each speaker.
java -Xmx2024m -cp LIUM_SpkDiarization.jar fr.lium.spkDiarization.programs.MTrainMAP --help --sInputMask=%s.seg --fInputMask=%s.wav --fInputDesc="audio16kHz2sphinx,1:3:2:0:0:0,13,1:1:300:4"  --tInputMask=%s.init.gmm --emCtrl=1,5,0.01 --varCtrl=0.01,10.0 --tOutputMask=%s.gmm speakers
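
For reference, mean-only MAP adaptation is often written as below. This is the common textbook formulation, given as a sketch: the relevance factor r and the other symbols are notation introduced here, and whether MTrainMAP applies exactly this rule is an assumption.

```latex
% x_t: feature vectors of one speaker, \gamma_k(x_t): posterior of UBM component k
n_k = \sum_{t} \gamma_k(x_t), \qquad
E_k(x) = \frac{1}{n_k} \sum_{t} \gamma_k(x_t)\, x_t

% Interpolate between the speaker statistics and the UBM mean
\alpha_k = \frac{n_k}{n_k + r}, \qquad
\hat{\mu}_k = \alpha_k\, E_k(x) + (1 - \alpha_k)\, \mu_k^{\mathrm{UBM}}
```

Components that see little speaker data (small n_k) stay close to the UBM mean, while well-observed components move toward the speaker's own statistics. The weights and covariances are left as in the UBM, which matches the "only means are adapted" setting above.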