Cross-show speaker diarization, a recently introduced task, extends diarization to a broader context: speakers who appear in several recordings of the same show (the cross-show speakers) must be identified with the same label in every recording. Each show in a collection is first processed individually with the single-show speaker diarization system. The whole collection is then processed jointly using either CLR or ILP clustering. Experiments showed that ILP clustering provides a better speed/accuracy trade-off.
From single to cross-show
The single-show output diarizations need to be merged into one diarization file.
Each speaker ID must depend on the show, so that identical labels coming from different shows do not collide. For example, use the script below to make the speaker IDs unique.
single2cross.sh
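#!/bin/sh
# Make speaker IDs unique across shows: each (show, speaker) pair gets a new ID.
# Reads the segmentation files given as arguments, or standard input if none.
perl -e '
$i=0;
while(<>){
  chomp;
  @t=split(/ +/);
  # key: show name plus single-show speaker label
  $n=$t[0]." ".$t[7];
  if(! exists($d{$n})) {
    $d{$n}="S".$i;
    $i++;
  }
  print "$t[0] $t[1] $t[2] $t[3] $t[4] $t[5] $t[6] ".$d{$n}."\n";
}' "$@" > cross.seg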
Cross-show clustering
Two methods are available: CLR clustering or ILP clustering.
The parameters are similar to those of the single-show diarization, except that the input diarization is the merge of n single-show diarizations.
CLR Clustering
fDescCLR="sphinx,1:3:2:0:0:0,13,1:1:300:4"
c=1.7
java -Xmx$mem -classpath LIUM_SpkDiarization-8.4.jar fr.lium.spkDiarization.programs.MClust --help --fInputMask=$features --fInputDesc=$fDescCLR --sInputMask=cross.seg --sOutputMask=./%s.cross_clr.$c.seg --cMethod=ce --cThr=$c --tInputMask=models/ubm.gmm --emCtrl=1,5,0.01 --sTop=5,models/ubm.gmm cross
The parameters are similar to those of the MClust call in a single-show diarization. See Programmes for more details.
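Once a cross-show segmentation has been produced, the cross-show speakers are the labels that occur in more than one show. The following is a minimal sketch of how they could be listed, not part of the toolkit: the helper name `cross_show_speakers` and the sample segments are made up, the show name and speaker label are assumed to be the first and eighth fields, and lines starting with `;;` are skipped on the assumption that they are comments.

```shell
# Hypothetical helper: print each speaker label that appears in two or more
# shows, followed by the number of distinct shows it appears in.
cross_show_speakers() {
  awk '!/^;;/ && NF >= 8 {
         # count each (show, speaker) pair only once
         if (!(($1, $8) in seen)) { seen[$1, $8] = 1; shows[$8]++ }
       }
       END { for (s in shows) if (shows[s] > 1) print s, shows[s] }' | sort
}

# Made-up clustered output covering three shows.
result=$(printf '%s\n' \
  'showA 1 0 250 M S U S0' \
  'showA 1 250 300 F S U S1' \
  'showB 1 0 400 M S U S0' \
  'showB 1 400 120 F S U S3' \
  'showC 1 0 90 M S U S0' \
  'showC 1 90 60 F S U S3' | cross_show_speakers)
echo "$result"
```

For this sample, `S0` is reported with 3 shows and `S3` with 2, while `S1` is omitted because it occurs in `showA` only. On real data, the segmentation file written by the clustering step would be piped into the function instead.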
ILP Clustering
fDesc="sphinx,1:3:2:0:0:0,13,1:1:0:0"
java -Xmx$mem -cp LIUM_SpkDiarization-8.4.jar fr.lium.spkDiarization.programs.ivector.ILPClustering --cMethod=es_iv --ilpThr=$c --help --sInputMask=cross.seg --sOutputMask=./%s.ev_is.$c.seg --fInputMask=$features --fInputDesc=$fDesc --tInputMask=./ubm/wld.gmm --nEFRMask=mat/wld.efn.xml --ilpGLPSolProgram=/opt/local/bin/glpsol --nMahanalobisCovarianceMask=./mat/wld.mahanalobis.mat --tvTotalVariabilityMatrixMask=./mat/wld.tv.mat --ilpOutputProblemMask=./%s.ilp.problem.$c.txt --ilpOutputSolutionMask=./%s.ilp.solution.$c.txt cross
Models and matrices are available in this archive. Parameters correspond to:
- --fInputDesc: the feature description, see Commun Parameters for details.
- --fInputMask: the feature input file mask, see Commun Parameters for details.
- --sOutputMask: the segmentation output file mask, see Commun Parameters for details.
- --sInputMask: the segmentation input file mask, see Commun Parameters for details.
- --tInputMask: the UBM input file mask.
- --tvTotalVariabilityMatrixMask: the total variability matrix mask.
- --cMethod: the clustering method. Only es_iv is available.
- --ilpThr: the threshold, i.e. the radius of each cluster.
- --nEFRMask: the EFR normalization mask.
- --ilpGLPSolProgram: the path and name of the ILP solver.
- --nMahanalobisCovarianceMask: the path and name of the Mahalanobis covariance matrix.
- --ilpOutputProblemMask: the path and name of the ILP problem passed to the ILP solver.
- --ilpOutputSolutionMask: the path and name of the output of the ILP solver.
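Since several of these parameters are paths to models, matrices and an external solver, it can help to verify them before launching the clustering. The sketch below is one possible pre-flight check and is not part of the toolkit: the function name `check_ilp_inputs` is hypothetical, and the paths in the usage line are simply those of the ILP command above.

```shell
# Hypothetical pre-flight check: the first argument is the solver path, the
# remaining arguments are input files. Every problem is reported on stderr,
# and the function returns non-zero if anything is missing.
check_ilp_inputs() {
  solver=$1; shift
  status=0
  if [ ! -x "$solver" ]; then
    echo "solver not executable: $solver" >&2
    status=1
  fi
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing input: $f" >&2
      status=1
    fi
  done
  return $status
}

# Usage with the paths from the ILP command; adapt them to the local setup.
check_ilp_inputs /opt/local/bin/glpsol cross.seg ./ubm/wld.gmm mat/wld.efn.xml \
  ./mat/wld.mahanalobis.mat ./mat/wld.tv.mat \
  || echo "fix the paths listed above before running ILPClustering"
```

Reporting all missing files in one pass, rather than stopping at the first, avoids repeated failed launches when several paths need adjusting.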