The recently introduced cross-show speaker diarization extends the diarization task to a broader context: speakers appearing in different recordings of the same show (the cross-show speakers) must be identified by the same label in every recording. Each show of a collection is first processed individually with the single-show speaker diarization system. The whole collection is then processed jointly with CLR or ILP clustering. Experiments showed that ILP clustering provides a better speed/accuracy trade-off than CLR clustering.
From single to cross-show
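The single-show output diarizations need to be merged into one diarization file.
Each speaker ID must depend on the show, because the same label (for example S0) can appear in several single-show outputs while denoting different speakers. For example, use the script below to make the speaker IDs unique. It maps each (show, speaker) pair to a new ID (S0, S1, S2 and so on) in order of first appearance.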
single2cross.sh
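#!/bin/sh
# Reads the single-show .seg files given as arguments (or stdin) and writes cross.seg.
perl -e '
$i=0;
while(<>){
    chomp;
    next if /^;;/ || /^\s*$/;  # skip ;; comment lines and blank lines
    @t=split(" ");             # 8 fields: show ... speaker
    $n=$t[0]." ".$t[7];        # key = show name + single-show speaker ID
    if(! exists($d{$n})) {
        $d{$n}="S".$i;         # new collection-wide ID for this pair
        $i++;
    }
    print "$t[0] $t[1] $t[2] $t[3] $t[4] $t[5] $t[6] ".$d{$n}."\n";
}' "$@" > cross.seg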
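The renaming can be tried on toy data. The sketch below applies the same (show, speaker) rule in POSIX awk instead of Perl. The files showA.seg and showB.seg and their contents are hypothetical. Lines starting with ;; are assumed to be comments and are skipped.

```shell
#!/bin/sh
# Hypothetical toy example: same renaming rule as single2cross.sh, in POSIX awk.
set -e
cd "$(mktemp -d)"

# Two hypothetical single-show outputs (8 fields per line, speaker ID last).
cat > showA.seg <<'EOF'
;; cluster S0
showA 1 0 500 M S U S0
showA 1 500 300 F S U S1
showA 1 800 200 M S U S0
EOF
cat > showB.seg <<'EOF'
showB 1 0 400 F S U S0
EOF

awk 'BEGIN { n = 0 }
     /^;;/ || NF == 0 { next }   # skip comment and blank lines
     {
       key = $1 " " $8            # show name + single-show speaker ID
       if (!(key in id)) { id[key] = "S" n; n++ }
       print $1, $2, $3, $4, $5, $6, $7, id[key]
     }' showA.seg showB.seg > cross.seg

cat cross.seg
```

In the output, S0 and S1 of showA keep their labels. S0 of showB becomes S2 because the pair (showB, S0) has not been seen before.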
Cross-show clustering
Two methods are available: CLR clustering or ILP clustering.
The parameters are similar to those of the single-show diarization, except that the input diarization is the merge of the n single-show diarizations (cross.seg).
CLR Clustering
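For reference, CLR clustering scores a pair of clusters with the cross-likelihood ratio. Its standard definition is sketched below. Here x_i denotes the feature frames of cluster c_i and n_i their number. M_i is a GMM for c_i derived from the UBM B (models/ubm.gmm in the command below), and L(x | M) is a likelihood. Clusters are merged bottom-up, best-scoring pair first, until no pair passes the threshold. This is the textbook form of the measure. The exact scaling and sign convention that MClust uses when comparing against --cThr is an assumption and is not covered on this page.

```latex
\mathrm{CLR}(c_i, c_j) =
  \frac{1}{n_i} \log \frac{L(x_i \mid M_j)}{L(x_i \mid B)}
+ \frac{1}{n_j} \log \frac{L(x_j \mid M_i)}{L(x_j \mid B)}
```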
fDescCLR="sphinx,1:3:2:0:0:0,13,1:1:300:4"
c=1.7
java -Xmx$mem -classpath LIUM_SpkDiarization-8.4.jar fr.lium.spkDiarization.programs.MClust --help --fInputMask=$features --fInputDesc=$fDescCLR --sInputMask=cross.seg --sOutputMask=./%s.cross_clr.$c.seg --cMethod=ce --cThr=$c --tInputMask=models/ubm.gmm --emCtrl=1,5,0.01 --sTop=5,models/ubm.gmm cross
The parameters are similar to those of the MClust call in a single-show diarization; $c is the CLR threshold. See Programmes for more details.
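The final argument of the MClust call (cross) is the show name. The sketch below assumes that, as in other LIUM commands, %s in a mask is replaced by the show name, and shows where the CLR result is written under that assumption. printf only mimics the substitution here and is not part of LIUM.

```shell
#!/bin/sh
# Hypothetical sketch: mimics how --sOutputMask resolves, assuming "%s"
# is replaced by the show name (the last argument of the command).
show=cross                          # last argument of the MClust call
c=1.7                               # CLR threshold, as set above
sOutputMask="./%s.cross_clr.$c.seg" # value given to --sOutputMask

# printf stands in for the mask substitution.
out=$(printf "$sOutputMask" "$show")
echo "$out"                         # ./cross.cross_clr.1.7.seg
```

Because --sInputMask=cross.seg contains no %s, the input is read from cross.seg as written.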
ILP Clustering
fDesc="sphinx,1:3:2:0:0:0,13,1:1:0:0"
java -Xmx$mem -cp LIUM_SpkDiarization-8.4.jar fr.lium.spkDiarization.programs.ivector.ILPClustering --cMethod=es_iv --ilpThr=$c --help --sInputMask=cross.seg --sOutputMask=./%s.ev_is.$c.seg --fInputMask=$features --fInputDesc=$fDesc --tInputMask=./ubm/wld.gmm --nEFRMask=mat/wld.efn.xml --ilpGLPSolProgram=/opt/local/bin/glpsol --nMahanalobisCovarianceMask=./mat/wld.mahanalobis.mat --tvTotalVariabilityMatrixMask=./mat/wld.tv.mat --ilpOutputProblemMask=./%s.ilp.problem.$c.txt --ilpOutputSolutionMask=./%s.ilp.solution.$c.txt cross
Models and matrices are available in this archive. Parameters correspond to:
- --fInputDesc: the feature description; see Common Parameters for details.
- --fInputMask: the feature input file mask; see Common Parameters for details.
- --sOutputMask: the segmentation output file mask; see Common Parameters for details.
- --sInputMask: the segmentation input file mask; see Common Parameters for details.
- --tInputMask: the UBM input file mask.
- --tvTotalVariabilityMatrixMask: the total variability matrix mask.
- --cMethod: the clustering method; only es_iv is available.
- --ilpThr: the threshold, i.e. the radius of each cluster (given by $c in the command above).
- --nEFRMask: the EFR normalization mask.
- --ilpGLPSolProgram: the path and name of the ILP solver (glpsol here).
- --nMahanalobisCovarianceMask: the path and name of the Mahalanobis covariance matrix.
- --ilpOutputProblemMask: the path and name of the ILP problem passed to the ILP solver.
- --ilpOutputSolutionMask: the path and name of the output of the ILP solver.
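The ILP problem handed to the solver can be sketched as follows. This assumes ILPClustering follows the formulation published by Rouvier and Meignier (Odyssey 2012); that the program uses exactly this objective is an assumption. The N clusters in cross.seg are each represented by an i-vector, and d(k,j) is the Mahalanobis distance between i-vectors k and j, presumably after EFR normalization (--nEFRMask). y_k = 1 when cluster k is chosen as a center, and x_{k,j} = 1 when cluster j is assigned to center k. δ is the radius set by --ilpThr, and F is a normalization factor. The objective trades the number of centers against the total distance to them. The last constraint forbids assigning a cluster to a center farther away than δ.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{k=1}^{N} y_k + \frac{1}{F} \sum_{k=1}^{N} \sum_{j=1}^{N} d(k,j)\, x_{k,j} \\
\text{s.t.}\quad & x_{k,k} = y_k && \forall k \\
& \sum_{k=1}^{N} x_{k,j} = 1 && \forall j \\
& x_{k,j} \le y_k && \forall k, j \\
& d(k,j)\, x_{k,j} \le \delta && \forall k, j \\
& x_{k,j},\, y_k \in \{0, 1\}
\end{aligned}
```

Under this reading, a larger --ilpThr enlarges the feasible set and therefore tends to produce fewer speakers.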