Cross-show speaker diarization

Cross-show speaker diarization, a recently introduced task, extends diarization to a broader context: a speaker who appears in several recordings of the same show (a cross-show speaker) must be identified by the same label in every recording. Each show in a collection is first processed individually with the single-show speaker diarization system. The collection is then processed as a whole using CLR or ILP clustering. Experiments showed that ILP clustering provides a better speed/accuracy trade-off.

From single to cross-show

The single-show output diarizations need to be merged into one diarization file.
In the merged file, the ID of each speaker must depend on the show it comes from, so that no two shows share a label. For example, use the script below to make the speaker IDs unique.

single2cross.sh

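cat */*.ev_is.seg | grep "^[^;;]" | perl -e '
    # Map each (show, local speaker) pair to a collection-wide label S0, S1, ...
    $i=0;
    while(<>){
        chomp;
        @t=split(/ +/);
        # key: show name (field 0) followed by the single-show speaker label (field 7)
        $n=$t[0]." ".$t[7];
        if(! exists($d{$n})) {
            $d{$n}="S".$i;
            $i++;
        }
        print "$t[0] $t[1] $t[2] $t[3] $t[4] $t[5] $t[6] ".$d{$n}."\n";
    }' > cross.seg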
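To illustrate what the renaming does, here is a minimal sketch that applies the same logic to a few made-up segments. The show names (showA, showB) and the segment values are hypothetical, the helper name relabel is introduced only for this example, and awk is swapped in for perl so the function can be reused in a pipeline. The eight-column layout (show, channel, start, length, gender, band, environment, speaker) is assumed from the fields the script above reads.

```shell
# relabel: hypothetical helper with the same logic as single2cross.sh.
# Reads segments on stdin, writes relabelled segments on stdout.
relabel() {
    grep '^[^;]' | awk '{
        key = $1 " " $8                         # show name + local speaker label
        if (!(key in id)) id[key] = "S" (n++)   # first occurrence: next global label
        print $1, $2, $3, $4, $5, $6, $7, id[key]
    }'
}

# Toy input: both shows use the local label S0 for different speakers.
printf '%s\n' \
    ';; cluster S0' \
    'showA 1 0 300 M S U S0' \
    'showA 1 300 200 F S U S1' \
    'showB 1 0 250 F S U S0' | relabel
```

With this input, the S0 of showB should come out as S2, so the two shows no longer share a label. Whether two of these labels are in fact the same person is decided later by the cross-show clustering.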

Cross-show clustering

Two methods are available: CLR clustering and ILP clustering.
The parameters are similar to those of single-show diarization, except that the input diarization is the merge of the n single-show diarizations.

CLR Clustering
features=./mfcc/%s.mfcc
fDescCLR="sphinx,1:3:2:0:0:0,13,1:1:300:4"
 
 
c=1.7
java -Xmx$mem -classpath LIUM_SpkDiarization-8.4.jar fr.lium.spkDiarization.programs.MClust --help \
    --fInputMask=$features --fInputDesc=$fDescCLR \
    --sInputMask=cross.seg --sOutputMask=./%s.cross_clr.$c.seg \
    --cMethod=ce --cThr=$c --tInputMask=models/ubm.gmm \
    --emCtrl=1,5,0.01 --sTop=5,models/ubm.gmm cross

The parameters are similar to those of the MClust call in a single-show diarization. See Programmes for more details.
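Because the output file name embeds the threshold, it is natural to try several values of c and compare the results. The following is only a sketch of such a sweep: the threshold grid and the heap size given to mem are assumptions, and the helper name clr_cmd is hypothetical. It reuses only the options of the MClust call above and prints the command lines instead of running them.

```shell
# Assumed values, not taken from the documentation: adjust to your setup.
thresholds="1.5 1.7 1.9"      # hypothetical grid around c=1.7
mem=2048m                     # assumed JVM heap size
features=./mfcc/%s.mfcc
fDescCLR="sphinx,1:3:2:0:0:0,13,1:1:300:4"

# clr_cmd: print (not run) the MClust command line for threshold $1.
clr_cmd() {
    thr=$1
    echo java "-Xmx$mem" -classpath LIUM_SpkDiarization-8.4.jar \
        fr.lium.spkDiarization.programs.MClust --help \
        "--fInputMask=$features" "--fInputDesc=$fDescCLR" \
        --sInputMask=cross.seg "--sOutputMask=./%s.cross_clr.$thr.seg" \
        --cMethod=ce "--cThr=$thr" --tInputMask=models/ubm.gmm \
        --emCtrl=1,5,0.01 --sTop=5,models/ubm.gmm cross
}

# Dry run: one command line per threshold.
for c in $thresholds; do
    clr_cmd "$c"
done
```

Removing the echo inside clr_cmd turns the dry run into actual clustering runs, each writing its own output segmentation.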

ILP Clustering
features=./mfcc/%s.mfcc
fDesc="sphinx,1:3:2:0:0:0,13,1:1:0:0"
 
 
java -Xmx$mem -cp LIUM_SpkDiarization-8.4.jar fr.lium.spkDiarization.programs.ivector.ILPClustering \
    --cMethod=es_iv --ilpThr=$c --help \
    --sInputMask=cross.seg --sOutputMask=./%s.ev_is.$c.seg \
    --fInputMask=$features --fInputDesc=$fDesc \
    --tInputMask=./ubm/wld.gmm --nEFRMask=mat/wld.efn.xml \
    --ilpGLPSolProgram=/opt/local/bin/glpsol \
    --nMahanalobisCovarianceMask=./mat/wld.mahanalobis.mat \
    --tvTotalVariabilityMatrixMask=./mat/wld.tv.mat \
    --ilpOutputProblemMask=./%s.ilp.problem.$c.txt \
    --ilpOutputSolutionMask=./%s.ilp.solution.$c.txt cross

Models and matrices are available in this archive. Parameters correspond to:

  • --fInputDesc the feature description, see Commun Parameters for details.
  • --fInputMask the feature input file mask, see Commun Parameters for details.
  • --sOutputMask the segmentation output file mask, see Commun Parameters for details.
  • --sInputMask the segmentation input file mask, see Commun Parameters for details.
  • --tInputMask the UBM input file mask.
  • --tvTotalVariabilityMatrixMask the total variability matrix mask.
  • --cMethod the clustering method. Only es_iv is available.
  • --ilpThr the threshold, i.e. the radius of each cluster.
  • --nEFRMask the EFR normalization mask.
  • --ilpGLPSolProgram the path and the name of the ILP solver.
  • --nMahanalobisCovarianceMask the path and the name of the Mahalanobis covariance matrix.
  • --ilpOutputProblemMask the path and the name of the ILP problem passed to the ILP solver.
  • --ilpOutputSolutionMask the path and the name of the output of the ILP solver.
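The ILP call depends on several external files and on the glpsol program, so a quick check before launching it can save a failed run. The sketch below is one possible way to do that. The function name check_ilp_inputs and the GLPSOL environment variable are hypothetical; the file paths are the ones used in the ILPClustering command above, and the default solver location is the one given to --ilpGLPSolProgram.

```shell
# check_ilp_inputs: hypothetical pre-flight check for the ILP clustering call.
# Returns 0 if every input file exists and the solver is executable.
check_ilp_inputs() {
    missing=0
    for f in cross.seg ./ubm/wld.gmm mat/wld.efn.xml \
             ./mat/wld.mahanalobis.mat ./mat/wld.tv.mat; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            missing=1
        fi
    done
    # GLPSOL is an assumed override; the default is the path used in the command.
    solver=${GLPSOL:-/opt/local/bin/glpsol}
    if [ ! -x "$solver" ]; then
        echo "solver not executable: $solver" >&2
        missing=1
    fi
    [ "$missing" -eq 0 ]
}

check_ilp_inputs || echo "fix the paths listed above before running ILPClustering"
```

The check only tests that the files are present. It does not verify that the models and matrices are compatible with each other or with the feature description.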