UBM
Train a UBM. See EM Train GMMs.
Total variability matrix
- TV.sh

segIn=./seg/ubm.seg
gmm=./ubm/ubm.gmm
fMask=./mfcc/%s.mfcc

#train Total variability matrix and i-vector of the train
java -Xmx32G -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.TrainIVectorOrTV --help \
  --tvTrainTotalVariabilityMatrix=true \
  --fInputDesc=sphinx,1:3:2:0:0:0,13,1:1:0:0 \
  --fInputMask=$fMask \
  --sInputMask=$segIn \
  --tInputMask=$gmm \
  --tOutputMask=$gmm.iv \
  --tOutputModelType=iv_txt \
  --tvTotalVariabilityMatrixMask=mat/%s.tv.mat \
  --tvPartialTotalVariabilityMatrixMask=mat/%s_%i.tv.mat \
  --tvNbIt=15 \
  --tvSize=50 \
  ubm

Parameters:
- --tvTrainTotalVariabilityMatrix: if true, both the total variability matrix and the i-vectors are trained; if false, only the i-vectors are trained.
- --fInputDesc: the feature description; see Common Parameters for details.
- --fInputMask: the feature input file mask; see Common Parameters for details.
- --sInputMask: the segmentation input file mask; see Common Parameters for details.
- --tInputMask: the UBM input file mask.
- --tOutputMask: the output file mask for the i-vectors.
- --tOutputModelType: the type of the i-vector file; iv_txt is a raw text file.
- --tvTotalVariabilityMatrixMask: the total variability matrix file mask.
- --tvPartialTotalVariabilityMatrixMask: the file mask for the partial total variability matrices, that is, the initial TV matrix and the matrices computed at each iteration, from the first to the last; %i is replaced by the iteration number.
- --tvNbIt: the number of training iterations.
- --tvSize: the size (dimension) of an i-vector.
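
To make the role of the masks concrete, here is a minimal sketch of how a mask is presumably turned into a file name. `expand_mask` is a hypothetical helper written for this page, not part of LIUM_SpkDiarization. It assumes that `%s` is replaced by the show name given as the last argument of the command (`ubm` in TV.sh) and that `%i` is replaced by the iteration number, as described above.

```shell
# Hypothetical helper (not part of the toolkit).
# $1: file mask, $2: show name, $3: optional iteration number.
# Assumes plain names without characters that are special to sed.
expand_mask() {
  printf '%s\n' "$1" | sed -e "s|%s|$2|g" -e "s|%i|${3:-}|g"
}

expand_mask 'mat/%s.tv.mat' ubm        # prints: mat/ubm.tv.mat
expand_mask 'mat/%s_%i.tv.mat' ubm 3   # prints: mat/ubm_3.tv.mat
```

Under these assumptions, the final matrix of the run above would be written to mat/ubm.tv.mat, with one partial matrix file per iteration next to it.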
I-Vector
Normalization
EFR
The EFR normalization is described in [1].
- trainEFR.sh

#train EFR normalization
java -Xmx5G -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.TrainEigenFactorRadialNormalisation --help \
  --tInputMask=$gmm.iv \
  --tInputModelType=iv_txt \
  --tOutputMask=$gmm.efr.iv \
  --tOutputModelType=iv_txt \
  --nEFRNbIt=5 \
  --nEFRMask=mat/%s.efn.xml \
  ubm

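
As a reminder of what this step computes, the EFR update is usually written as below. This is a sketch of the common formulation, with symbols chosen here rather than taken from the toolkit; [1] remains the authoritative definition. At each iteration (5 in the script above), the mean $\bar{w}$ and the covariance matrix $V$ of the training i-vectors are re-estimated, then every i-vector $w$ is whitened and scaled to unit length.

```latex
w \leftarrow \frac{V^{-1/2}\,(w - \bar{w})}{\left\lVert V^{-1/2}\,(w - \bar{w}) \right\rVert}
```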
Scoring / Distance
Mahalanobis and cosine distances are available, but only the Mahalanobis distance has been tested.
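
For reference, the two distances between i-vectors $w_1$ and $w_2$ are commonly defined as below. The notation is an assumption made here: $\Sigma$ stands for the covariance matrix trained in the next section, and this page does not say which exact variant the toolkit implements (for instance, whether the Mahalanobis distance is squared).

```latex
d_{\mathrm{cos}}(w_1, w_2) = 1 - \frac{w_1^{\top} w_2}{\lVert w_1 \rVert \, \lVert w_2 \rVert}
\qquad
d_{\mathrm{maha}}(w_1, w_2) = (w_1 - w_2)^{\top} \, \Sigma^{-1} \, (w_1 - w_2)
```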
Mahalanobis Covariance Matrix
- trainCovMaha.sh

#train Mahalanobis covariance matrix
java -Xmx5G -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.ComputeMahanalobisCovariance --help \
  --tInputMask=$gmm.efr.iv.norm \
  --tInputModelType=iv_txt \
  --nMahanalobisCovarianceMask=./mat/%s.mahanalobis.mat \
  ubm

Clustering
The script below runs a full diarization process, from the audio file to the ILP clustering [2]. Only the last step differs from the CLR-based clustering.
Models and matrices need to be extracted from this archive.
Be careful: ILP clustering needs the glpk program or the gurobi program. The Java ILP clustering makes a system call to the integer linear programming tool (glpsol or gurobi).
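
Before launching the script, it may be worth checking that a solver is actually installed. The lines below are a minimal sketch, assuming the GLPK solver binary is named `glpsol` and is expected on the `PATH`; `have_program` is a hypothetical helper, not part of the toolkit.

```shell
# Hypothetical helper: succeeds if the named program can be found on the PATH.
have_program() {
  command -v "$1" >/dev/null 2>&1
}

if have_program glpsol; then
  echo "glpsol found at: $(command -v glpsol)"
else
  echo "glpsol not found: install GLPK or set --ilpGLPSolProgram to the solver path" >&2
fi
```

If the solver lives outside the `PATH`, its full path can be passed with --ilpGLPSolProgram, as done in the script below.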
- ilp_diarization.sh

#!/bin/bash
PATH=$PATH:..:.

audio=$1
mem=1G
show=`basename $audio .sph`
show=`basename $show .wav`
echo $show

#need JVM 1.6
java=java

datadir=${show}

pmsgmm=./models/sms.gmms
sgmm=./models/s.gmms
ggmm=./models/gender.gmms
uem=./sph/$show.uem.seg

LOCALCLASSPATH=./dist/LIUM_SpkDiarization-8.4.jar

echo "#####################################################"
echo "# $show"
echo "#####################################################"

mkdir ./$datadir >& /dev/null

features=./$datadir/%s.mfcc
fDescStart="audio16kHz2sphinx,1:1:0:0:0:0,13,0:0:0"
fDesc="sphinx,1:1:0:0:0:0,13,0:0:0"
fDescD="sphinx,1:3:2:0:0:0,13,0:0:0:0"
fDescLast="sphinx,1:3:2:0:0:0,13,1:1:0:0"
fDescCLR="sphinx,1:3:2:0:0:0,13,1:1:300:4"

#compute the MFCC
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.tools.Wave2FeatureSet --help \
  --fInputMask=$audio --fInputDesc=$fDescStart \
  --fOutputMask=$features --fOutputDesc=$fDesc \
  --sInputMask=$uem $show

#check the MFCC
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MSegInit --help \
  --fInputMask=$features --fInputDesc=$fDesc \
  --sInputMask=$uem --sOutputMask=./$datadir/%s.i.seg $show

#GLR based segmentation, make small segments
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MSeg --kind=FULL --sMethod=GLR --help \
  --fInputMask=$features --fInputDesc=$fDesc \
  --sInputMask=./$datadir/%s.i.seg --sOutputMask=./$datadir/%s.s.seg $show

# Segmentation: linear clustering
l=2
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MClust --help \
  --fInputMask=$features --fInputDesc=$fDesc \
  --sInputMask=./$datadir/%s.s.seg --sOutputMask=./$datadir/%s.l.seg \
  --cMethod=l --cThr=$l $show

h=3
# hierarchical clustering
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MClust --help \
  --fInputMask=$features --fInputDesc=$fDesc \
  --sInputMask=./$datadir/%s.l.seg --sOutputMask=./$datadir/%s.h.$h.seg \
  --cMethod=h --cThr=$h $show

# initialize GMM
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MTrainInit --help \
  --nbComp=8 --kind=DIAG \
  --fInputMask=$features --fInputDesc=$fDesc \
  --sInputMask=./$datadir/%s.h.$h.seg \
  --tOutputMask=./$datadir/%s.init.gmms $show

# EM computation
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MTrainEM --help \
  --nbComp=8 --kind=DIAG \
  --fInputMask=$features --fInputDesc=$fDesc \
  --sInputMask=./$datadir/%s.h.$h.seg \
  --tOutputMask=./$datadir/%s.gmms --tInputMask=./$datadir/%s.init.gmms $show

#Viterbi decoding
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MDecode --help \
  --fInputMask=${features} --fInputDesc=$fDesc \
  --sInputMask=./$datadir/%s.h.$h.seg --sOutputMask=./$datadir/%s.d.$h.seg \
  --dPenality=250 --tInputMask=$datadir/%s.gmms $show

#----------------
#Speech/Music/Silence segmentation
pmsseg=./$datadir/$show.pms.seg
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MDecode --help \
  --fInputDesc=$fDescD --fInputMask=$features \
  --sInputMask=./$datadir/%s.i.seg --sOutputMask=$pmsseg \
  --dPenality=10,10,50 --tInputMask=$pmsgmm $show

#filter spk segmentation according to pms segmentation
fltseg=./$datadir/$show.flt.$h.seg
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.tools.SFilter --help \
  --fInputDesc=$fDescD --fInputMask=$features \
  --fltSegMinLenSpeech=150 --fltSegMinLenSil=25 \
  --sFilterClusterName=j --fltSegPadding=25 \
  --sFilterMask=$pmsseg \
  --sInputMask=./$datadir/%s.d.$h.seg --sOutputMask=$fltseg $show

#Set gender and bandwidth
gseg=./$datadir/$show.g.$h.seg
java -Xmx$mem -classpath "$LOCALCLASSPATH" fr.lium.spkDiarization.programs.MScore --help \
  --sGender --sByCluster \
  --fInputDesc=$fDescLast --fInputMask=$features \
  --sInputMask=$fltseg --sOutputMask=$gseg \
  --tInputMask=$ggmm $show

#ILP Clustering
c=$2
java -Xmx$mem -cp $LOCALCLASSPATH fr.lium.spkDiarization.programs.ivector.ILPClustering --cMethod=es_iv --ilpThr=$c --help \
  --sInputMask=$gseg --sOutputMask=./$datadir/%s.ev_is.$c.seg \
  --fInputMask=$features --fInputDesc=$fDescLast \
  --tInputMask=./ubm/wld.gmm \
  --nEFRMask=mat/wld.efn.xml \
  --ilpGLPSolProgram=/opt/local/bin/glpsol \
  --nMahanalobisCovarianceMask=./mat/wld.mahanalobis.mat \
  --tvTotalVariabilityMatrixMask=./mat/wld.tv.mat \
  --ilpOutputProblemMask=./$datadir/%s.ilp.problem.$c.txt \
  --ilpOutputSolutionMask=./$datadir/%s.ilp.solution.$c.txt $show
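
The script takes the audio file as its first argument and the ILP threshold as its second. The show name, which names the output directory and fills every `%s` mask, is derived from the audio file name. The sketch below isolates that step; `show_name` is a hypothetical helper that reproduces the two `basename` calls of the script, and the file names are made up for illustration.

```shell
# Hypothetical helper mirroring the start of ilp_diarization.sh:
# strip the directory, then a trailing .sph or .wav extension.
show_name() {
  name=$(basename "$1" .sph)
  basename "$name" .wav
}

show_name ./sph/news.sph     # prints: news
show_name ./audio/news.wav   # prints: news
```

With an input named ./sph/news.sph, the script would therefore expect the segmentation ./sph/news.uem.seg, write its intermediate files under ./news/, and store the final segmentation in ./news/news.ev_is.$c.seg, where $c is the threshold given as second argument.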