Compute MFCC – LIUM SpkDiarization

Using internal tools

Compute MFCC of an audio file

The example given below uses the LIUM Speaker Diarization jar. The MFCCs are computed using the tools provided in Sphinx 4. The command line is:

wave2mfcc.sh

#!/bin/bash
 
wave=$1
dir=`dirname "$wave"`
base=`basename "$wave" .wav`
 
java -Xmx2048m -classpath ./LIUM_SpkDiarization.jar fr.lium.spkDiarization.tools.Wave2FeatureSet --help --fInputMask="$wave" --sInputMask="" --fInputDesc="audio16kHz2sphinx,1:1:0:0:0:0,13,0:0:0" --fOutputMask="$dir/$base.mfcc" --fOutputDesc="audio16kHz2sphinx,1:1:0:0:0:0,13,0:0:0" "$base"

Description:

%s in a mask is substituted by the name of the show. Here, the name of the show is the base name of the input file (for example, file for file.wav), passed as the last argument.
--sInputMask is a segmentation file. It can be empty. In that case the MFCCs are computed from the start to the end of the file. If you provide a segmentation file, the output MFCC file contains only the features described in the segmentation file.
--fInputDesc and --fOutputDesc are equal and describe the features: 13 MFCCs including C0 (corresponding to the energy parameter). If --fOutputDesc is different, the transformation is applied before the MFCCs are saved.
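
To make the show name and the output path concrete, here is a minimal sketch of how wave2mfcc.sh derives them from its argument. The function name mfcc_paths and the path /data/shows/file.wav are hypothetical and used only for illustration; the function does not call the jar.

```shell
#!/bin/bash
# Hypothetical helper (not part of LIUM SpkDiarization): print the show name
# and the MFCC output path that wave2mfcc.sh derives from a .wav path.
mfcc_paths() {
    local wave=$1
    local dir base
    dir=$(dirname "$wave")          # directory holding the audio file
    base=$(basename "$wave" .wav)   # show name: file name without .wav
    echo "show=$base output=$dir/$base.mfcc"
}

# Illustrative path only.
mfcc_paths /data/shows/file.wav
```

For /data/shows/file.wav this prints show=file output=/data/shows/file.mfcc, so the MFCC file is written next to the audio file.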

Compute MFCC of a set of audio files

Suppose you want to train a model using segments drawn from a set of audio files. Wave2FeatureSet can compute the MFCCs for each audio file and concatenate them into a single MFCC file before saving it. The segmentation corresponding to the new MFCC file can be saved using the --sOutputMask parameter.

Using Sphinx 3

The example given below uses Sphinx 3. The command line is:

#!/bin/bash
feat_sphinx.sh ./showName.sph ./showName.mfcc ./showName.uem.seg

The script feat_sphinx.sh consists of:

wave2mfcc_sphinx3.sh

#!/bin/bash
 
sph=$1
mfcc=$2
uem=$3
 
show=`basename $sph .sph`
 
echo "sphinx: $sph --> ($mfcc, $uem)"
 
# sphinxBase
sphinx_fe -nist yes -i $sph -o $mfcc 2> /dev/null
 
#or with the old version
#wave2feat -nist -i $sph -o $mfcc 2> /dev/null
 
#get the header in a temporary file
sphinx_cepview -d 0 -e 1 -header 1 -f $mfcc 2> tmp_$$.txt
 
#get the number of computed MFCC vectors
nbf=`cat tmp_$$.txt | grep frames | awk '{print $4;}'`
 
#make a uem composed of one segment starting at feature 0 with $nbf features
echo "$show 1 0 $nbf U U U 1" > $uem
 
#remove the temporary file
rm -f tmp_$$.txt

Now we have the MFCC and initial segmentation files: ./showName.mfcc and ./showName.uem.seg.
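
If the header cannot be parsed, $nbf is empty and the script above still writes a malformed segment line. The following is a minimal sketch of a guard, assuming you want the script to fail instead. The function name uem_line is hypothetical, and the frame count 1500 is an illustrative value, not real output.

```shell
#!/bin/bash
# Hypothetical helper (not part of LIUM SpkDiarization or Sphinx): print a
# one-segment UEM line for a show, and fail when the frame count is empty
# or is not a non-negative integer.
uem_line() {
    local show=$1 nbf=$2
    case $nbf in
        ''|*[!0-9]*)
            echo "uem_line: invalid frame count '$nbf'" >&2
            return 1
            ;;
    esac
    # Same layout as in the script above: segment starts at feature 0
    # and spans $nbf features.
    echo "$show 1 0 $nbf U U U 1"
}

# Illustrative values only.
uem_line showName 1500
```

In the script, the echo line could then become uem_line "$show" "$nbf" > "$uem" || exit 1.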

Using SPro

«SPro is a free speech signal processing toolkit which provides runtime commands implementing standard feature extraction algorithms for speech related applications and a C library to implement new algorithms and to use SPro files within your own programs.»

wave2mfcc_spro.sh

#!/bin/bash
 
sph=$1
mfcc=$2
 
show=`basename $sph .sph`
 
sfbcep -v -F sphere -p 12 -m -e -f 16000 $sph $mfcc.tmp
#swap order (on Intel CPU)
scopy -B $mfcc.tmp $mfcc
rm $mfcc.tmp