Training

UBM

Train a UBM. See EM Train GMMs.

Total variability matrix

TV.sh
#!/bin/bash
LOCALCLASSPATH=./dist/LIUM_SpkDiarization-8.4.jar
segIn=./seg/ubm.seg
gmm=./ubm/ubm.gmm
fMask=./mfcc/%s.mfcc
mkdir -p mat

#train the total variability matrix and the i-vectors of the training data
java -Xmx32G  -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.TrainIVectorOrTV --help --tvTrainTotalVariabilityMatrix=true --fInputDesc=sphinx,1:3:2:0:0:0,13,1:1:0:0 --fInputMask=$fMask --sInputMask=$segIn --tInputMask=$gmm --tOutputMask=$gmm.iv --tOutputModelType=iv_txt --tvTotalVariabilityMatrixMask=mat/%s.tv.mat   --tvPartialTotalVariabilityMatrixMask=mat/%s_%i.tv.mat --tvNbIt=15 --tvSize=50  ubm
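For reference, the options above train the standard total variability model, in which the GMM mean supervector of a recording is written as the UBM mean supervector plus a low-rank offset. This is the textbook formulation rather than a description of LIUM internals; --tvSize is the number of columns of T, that is, the dimension of the i-vector w.

```latex
M = m + T\,w
```

Here M is the recording-dependent mean supervector, m the UBM mean supervector (the model given by --tInputMask), T the total variability matrix and w the i-vector.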

Parameters:

  • --tvTrainTotalVariabilityMatrix if true, both the total variability matrix and the i-vectors are trained; if false, only the i-vectors are computed.
  • --fInputDesc the feature description, see Common Parameters for details.
  • --fInputMask the feature input file mask, see Common Parameters for details.
  • --sInputMask the segmentation input file mask, see Common Parameters for details.
  • --tInputMask the UBM input file mask.
  • --tOutputMask the output i-vector file mask.
  • --tOutputModelType the type of the i-vector file; iv_txt is a raw text file.
  • --tvTotalVariabilityMatrixMask the total variability matrix mask.
  • --tvPartialTotalVariabilityMatrixMask the partial total variability matrix mask: the initial TV matrix and the matrices computed from the first to the last iteration are saved. %i is replaced by the iteration number.
  • --tvNbIt the number of training iterations.
  • --tvSize the size of an i-vector.
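The masks use printf-like placeholders: %s is replaced by the show name given as the last argument (ubm in TV.sh) and %i by the iteration number. The sketch below lists the matrix files one would expect after a run with --tvNbIt=15. The helper expand_mask is hypothetical (not part of LIUM), and numbering the initial matrix 0 is an assumption that this page does not confirm.

```shell
#!/bin/bash
# Sketch: expand LIUM-style file masks with printf, mimicking the
# substitution described above (%s -> show name, %i -> iteration number).
# expand_mask is a hypothetical helper, not a LIUM_SpkDiarization tool.
expand_mask() {
    local mask=$1
    shift
    # The mask is deliberately used as the printf format string.
    printf "$mask\n" "$@"
}

show=ubm
nbIt=15   # same value as --tvNbIt

expand_mask 'mat/%s.tv.mat' "$show"
# Assumption: the initial matrix is numbered 0, iterations are 1..nbIt.
i=0
while [ "$i" -le "$nbIt" ]; do
    expand_mask 'mat/%s_%i.tv.mat' "$show" "$i"
    i=$((i + 1))
done
```

Run as is, it prints mat/ubm.tv.mat followed by mat/ubm_0.tv.mat through mat/ubm_15.tv.mat.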

I-Vector

Normalization

EFR

The Eigen Factor Radial (EFR) normalization is described in [1].

trainEFR.sh
#train EFR normalization
java -Xmx5G  -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.TrainEigenFactorRadialNormalisation --help --tInputMask=$gmm.iv --tInputModelType=iv_txt --tOutputMask=$gmm.efr.iv --tOutputModelType=iv_txt --nEFRNbIt=5 --nEFRMask=mat/%s.efn.xml ubm
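As we understand [1], one EFR iteration standardizes each i-vector w with the mean μ and total covariance Σ of the training i-vectors, then projects it onto the unit sphere; μ and Σ are re-estimated on the normalized training i-vectors before the next iteration. --nEFRNbIt presumably sets the number of iterations (5 above). A sketch of one iteration, not taken from the LIUM code:

```latex
w \leftarrow \frac{\Sigma^{-1/2}\,(w - \mu)}{\left\lVert \Sigma^{-1/2}\,(w - \mu) \right\rVert}
```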

Scoring / Distance

Mahalanobis and cosine distances are available, but only the Mahalanobis distance has been tested.
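For two i-vectors w_1 and w_2, the two distances can be written as below. Following [1], we assume that W is the within-speaker covariance matrix estimated by the Mahalanobis covariance step; the cosine form is written as a distance (one minus the cosine similarity), which may not match the exact convention used in the code.

```latex
d_{\mathrm{Maha}}(w_1, w_2) = (w_1 - w_2)^{\top}\, W^{-1}\, (w_1 - w_2)
\qquad
d_{\cos}(w_1, w_2) = 1 - \frac{w_1^{\top} w_2}{\lVert w_1 \rVert\, \lVert w_2 \rVert}
```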

Mahalanobis Covariance Matrix

trainCovMaha.sh
#train Mahalanobis covariance matrix
java -Xmx5G  -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.ComputeMahanalobisCovariance --help --tInputMask=$gmm.efr.iv.norm --tInputModelType=iv_txt --nMahanalobisCovarianceMask=./mat/%s.mahanalobis.mat ubm

Clustering

The script below runs the full diarization process, from the audio file to the ILP clustering [2]. Only the last step differs from the CLR-based clustering.
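The clustering step is solved as an integer linear program. As we recall the formulation of [2] (check the paper for the exact form), with N clusters from the previous steps, d(k,j) the i-vector distance between clusters k and j, D a normalization factor and δ a distance threshold (presumably the value passed as --ilpThr), the binary variable x_{k,j} equals 1 when cluster j is assigned to the center cluster k:

```latex
\begin{aligned}
\min\quad & \sum_{k=1}^{N} x_{k,k} + \frac{1}{D} \sum_{k=1}^{N} \sum_{j=1}^{N} d(k,j)\, x_{k,j} \\
\text{s.t.}\quad & x_{k,j} \in \{0,1\} && \forall k, j \\
& \sum_{k=1}^{N} x_{k,j} = 1 && \forall j \\
& d(k,j)\, x_{k,j} < \delta && \forall k, j \\
& x_{k,j} - x_{k,k} \le 0 && \forall k, j
\end{aligned}
```

The first term counts the centers (the final speakers); the second favors assigning each cluster to a close center.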

Models and matrices need to be extracted from this archive.

Be careful: ILP clustering requires the GLPK solver (glpsol) or Gurobi. The Java ILP clustering makes a system call to the integer linear programming tool (glpsol or gurobi).
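Since the script below hard-codes /opt/local/bin/glpsol, it can help to locate the solver before running it. The function find_glpsol and the GLPSOL environment variable are hypothetical (not LIUM options); the sketch tries GLPSOL, then the PATH, then the path used in the script.

```shell
#!/bin/bash
# Sketch: locate glpsol before launching the ILP clustering.
# find_glpsol and GLPSOL are hypothetical, not part of LIUM_SpkDiarization.
find_glpsol() {
    if [ -n "${GLPSOL:-}" ] && [ -x "$GLPSOL" ]; then
        # 1. explicit override
        echo "$GLPSOL"
    elif command -v glpsol >/dev/null 2>&1; then
        # 2. glpsol found on the PATH
        command -v glpsol
    elif [ -x /opt/local/bin/glpsol ]; then
        # 3. location used in ilp_diarization.sh
        echo /opt/local/bin/glpsol
    else
        echo "glpsol not found: install GLPK or set GLPSOL" >&2
        return 1
    fi
}
```

In ilp_diarization.sh one could then write glpsol=$(find_glpsol) || exit 1 and pass --ilpGLPSolProgram=$glpsol.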

ilp_diarization.sh
#!/bin/bash
 
PATH=$PATH:..:.
 
if [ $# -ne 2 ]; then
    echo "usage: $0 audio_file ilp_threshold" >&2
    exit 1
fi
audio=$1
 
mem=1G
show=`basename $audio .sph`
show=`basename $show .wav`
 
echo $show
 
#need JVM 1.6
java=java
 
datadir=${show}
 
pmsgmm=./models/sms.gmms
sgmm=./models/s.gmms
ggmm=./models/gender.gmms
 
uem=./sph/$show.uem.seg
 
LOCALCLASSPATH=./dist/LIUM_SpkDiarization-8.4.jar
 
echo "#####################################################"
echo "#   $show"
echo "#####################################################"
 
mkdir -p ./$datadir
 
features=./$datadir/%s.mfcc
fDescStart="audio16kHz2sphinx,1:1:0:0:0:0,13,0:0:0"
fDesc="sphinx,1:1:0:0:0:0,13,0:0:0"
fDescD="sphinx,1:3:2:0:0:0,13,0:0:0:0"
fDescLast="sphinx,1:3:2:0:0:0,13,1:1:0:0"
fDescCLR="sphinx,1:3:2:0:0:0,13,1:1:300:4"
 
#compute the MFCC
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.tools.Wave2FeatureSet --help --fInputMask=$audio --fInputDesc=$fDescStart --fOutputMask=$features --fOutputDesc=$fDesc --sInputMask=$uem $show
 
#check the MFCC
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MSegInit   --help --fInputMask=$features --fInputDesc=$fDesc --sInputMask=$uem --sOutputMask=./$datadir/%s.i.seg  $show
 
#GLR based segmentation, make small segments
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MSeg   --kind=FULL --sMethod=GLR  --help --fInputMask=$features --fInputDesc=$fDesc --sInputMask=./$datadir/%s.i.seg --sOutputMask=./$datadir/%s.s.seg  $show
 
# Segmentation: linear clustering
l=2
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MClust    --help --fInputMask=$features --fInputDesc=$fDesc --sInputMask=./$datadir/%s.s.seg --sOutputMask=./$datadir/%s.l.seg --cMethod=l --cThr=$l $show
 
h=3
# hierarchical clustering
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MClust    --help --fInputMask=$features --fInputDesc=$fDesc --sInputMask=./$datadir/%s.l.seg --sOutputMask=./$datadir/%s.h.$h.seg --cMethod=h --cThr=$h $show
 
# initialize GMM
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MTrainInit   --help --nbComp=8 --kind=DIAG --fInputMask=$features --fInputDesc=$fDesc --sInputMask=./$datadir/%s.h.$h.seg --tOutputMask=./$datadir/%s.init.gmms $show
 
# EM computation
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MTrainEM   --help  --nbComp=8 --kind=DIAG --fInputMask=$features --fInputDesc=$fDesc --sInputMask=./$datadir/%s.h.$h.seg --tOutputMask=./$datadir/%s.gmms  --tInputMask=./$datadir/%s.init.gmms  $show 
 
#Viterbi decoding
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MDecode    --help --fInputMask=${features} --fInputDesc=$fDesc --sInputMask=./$datadir/%s.h.$h.seg --sOutputMask=./$datadir/%s.d.$h.seg --dPenality=250  --tInputMask=$datadir/%s.gmms $show
 
#----------------
#Speech/Music/Silence segmentation
pmsseg=./$datadir/$show.pms.seg
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MDecode  --help  --fInputDesc=$fDescD --fInputMask=$features --sInputMask=./$datadir/%s.i.seg --sOutputMask=$pmsseg --dPenality=10,10,50 --tInputMask=$pmsgmm $show
 
#filter the speaker segmentation according to the pms segmentation
fltseg=./$datadir/$show.flt.$h.seg
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.tools.SFilter --help  --fInputDesc=$fDescD --fInputMask=$features --fltSegMinLenSpeech=150 --fltSegMinLenSil=25 --sFilterClusterName=j --fltSegPadding=25 --sFilterMask=$pmsseg --sInputMask=./$datadir/%s.d.$h.seg --sOutputMask=$fltseg $show
 
#set gender and bandwidth
gseg=./$datadir/$show.g.$h.seg
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MScore --help  --sGender --sByCluster --fInputDesc=$fDescLast --fInputMask=$features --sInputMask=$fltseg --sOutputMask=$gseg --tInputMask=$ggmm $show
 
 
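#ILP clustering: $2 is the ILP threshold (--ilpThr)
#set --ilpGLPSolProgram to the location of glpsol on your system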
c=$2
java -Xmx$mem  -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.ILPClustering --cMethod=es_iv --ilpThr=$c --help --sInputMask=$gseg --sOutputMask=./$datadir/%s.ev_is.$c.seg --fInputMask=$features --fInputDesc=$fDescLast --tInputMask=./ubm/wld.gmm --nEFRMask=mat/wld.efn.xml --ilpGLPSolProgram=/opt/local/bin/glpsol --nMahanalobisCovarianceMask=./mat/wld.mahanalobis.mat --tvTotalVariabilityMatrixMask=./mat/wld.tv.mat --ilpOutputProblemMask=./$datadir/%s.ilp.problem.$c.txt --ilpOutputSolutionMask=./$datadir/%s.ilp.solution.$c.txt $show
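The script takes the audio file as first argument and the ILP threshold as second, and writes one segmentation per threshold. The helper below (hypothetical, not part of the distribution) reproduces the script's naming rules to print where the result for a given threshold ends up; the commented loop sketches a threshold sweep with placeholder values, not recommended settings.

```shell
#!/bin/bash
# ilp_output_seg AUDIO THRESHOLD: print the segmentation file that
# ilp_diarization.sh writes for AUDIO with ILP threshold THRESHOLD.
# Hypothetical helper mirroring the script: show = basename without
# .sph/.wav, datadir = show, output mask = ./$datadir/%s.ev_is.$c.seg
ilp_output_seg() {
    local show
    show=$(basename "$1" .sph)
    show=$(basename "$show" .wav)
    printf './%s/%s.ev_is.%s.seg\n' "$show" "$show" "$2"
}

# Sweep over placeholder thresholds (uncomment to run):
# for c in 100 150 200; do
#     ./ilp_diarization.sh show.wav "$c" && ilp_output_seg show.wav "$c"
# done
```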


1. Bousquet Pierre-Michel, Bonastre Jean-François and Matrouf Driss. Intersession compensation and scoring methods in the i-vectors space for speaker recognition. Interspeech 2011, Florence, 2011.
2. Rouvier Mickael and Meignier Sylvain. A Global Optimization Framework for Speaker Diarization. Speaker Odyssey 2012, 2012.