Using internal tools
Compute MFCC of an audio file
The example below uses the LIUM Speaker Diarization jar. The MFCCs are computed using the tools provided in Sphinx 4. The command line is:
- wave2mfcc.sh

    #!/bin/bash
    # Usage: wave2mfcc.sh path/to/show.wav
    wave=$1
    dir=`dirname "$wave"`
    base=`basename "$wave" .wav`
    # The MFCCs are written next to the input file, as $dir/$base.mfcc
    java -Xmx2048m -classpath ./LIUM_SpkDiarization.jar \
      fr.lium.spkDiarization.tools.Wave2FeatureSet --help \
      --fInputMask="$wave" --sInputMask="" \
      --fInputDesc="audio16kHz2sphinx,1:1:0:0:0:0,13,0:0:0" \
      --fOutputMask="$dir/$base.mfcc" \
      --fOutputDesc="audio16kHz2sphinx,1:1:0:0:0:0,13,0:0:0" "$base"
Description:
- In a mask, %s is replaced by the name of the show. Here, the show name is the last argument of the command, $base (the file name without its .wav extension).
- --sInputMask is a segmentation file. It may be empty, in which case the MFCCs are computed from the start to the end of the file. If you provide a segmentation file, the output MFCC file contains only the features of the segments it describes.
- --fInputDesc and --fOutputDesc are equal and describe the features: 13 MFCCs including C0 (corresponding to the energy parameter). If --fOutputDesc is different, the transformation is applied before the MFCCs are saved.
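The %s substitution described above can be illustrated with a small shell sketch. The helper expand_mask is hypothetical and is not part of LIUM_SpkDiarization; it only mimics, under that assumption, how a mask turns into a file name once the show name is known.

```shell
#!/bin/bash
# Illustration only: expand_mask is a hypothetical helper, not a LIUM tool.
# It replaces every %s in a mask with the show name.
expand_mask() {
    local mask=$1 show=$2
    local token='%s'
    # Quoting "$token" makes bash treat it as a literal string, not a pattern.
    local out=${mask//"$token"/$show}
    printf '%s\n' "$out"
}
```

For example, calling expand_mask './%s.mfcc' showName prints ./showName.mfcc.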
Compute MFCC of a set of audio files
Suppose you want to train a model using segments drawn from a set of audio files. Wave2FeatureSet can compute the MFCCs of each audio file and concatenate them into a single MFCC file before saving it. The segmentation corresponding to the new MFCC file can be saved using the --sOutputMask parameter.
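As a rough sketch, the command for a set of files might be assembled as follows. Only the option names come from this page. The rest is assumed and unverified: that the input segmentation lists segments from several shows, that %s in --fInputMask is replaced by each show name, and the file names themselves. The hypothetical function build_set_command only prints the arguments, one per line, so they can be checked before anything is run.

```shell
#!/bin/bash
# Sketch only: prints the command arguments, it does not run Java.
# Assumptions (not confirmed by this page): the segmentation file $1 refers
# to several shows, and %s in --fInputMask is replaced by each show name.
build_set_command() {
    local seg_in=$1 out_name=$2
    local desc="audio16kHz2sphinx,1:1:0:0:0:0,13,0:0:0"
    local cmd=(java -Xmx2048m -classpath ./LIUM_SpkDiarization.jar
        fr.lium.spkDiarization.tools.Wave2FeatureSet --help
        --fInputMask=./%s.wav
        --sInputMask="$seg_in"
        --fInputDesc="$desc"
        --fOutputMask="./$out_name.mfcc"
        --fOutputDesc="$desc"
        --sOutputMask="./$out_name.out.seg"
        "$out_name")
    # One argument per line makes quoting mistakes easy to spot.
    printf '%s\n' "${cmd[@]}"
}
```

Calling build_set_command ./train.seg train lists the arguments that would produce ./train.mfcc and ./train.out.seg under these assumptions.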
Using Sphinx 3
The example below uses Sphinx 3. The command line is:

    #!/bin/bash
    feat_sphinx.sh ./showName.sph ./showName.mfcc ./showName.uem.seg
The script feat_sphinx.sh consists of:
- wave2mfcc_sphinx3.sh

    #!/bin/bash
    sph=$1
    mfcc=$2
    uem=$3
    show=`basename $sph .sph`

    echo "sphinx: $sph --> ($mfcc, $uem)"

    # sphinxBase
    sphinx_fe -nist yes -i $sph -o $mfcc 2> /dev/null
    #or with the old version
    #wave2feat -nist -i $sph -o $mfcc 2> /dev/null

    #get the header in a temporary file
    sphinx_cepview -d 0 -e 1 -header 1 -f $mfcc 2> tmp_$$.txt
    #get the number of computed MFCC vectors
    nbf=`cat tmp_$$.txt | grep frames | awk '{print $4;}'`
    #make a uem composed of one segment starting at feature 0 with $nbf features
    echo "$show 1 0 $nbf U U U 1" > $uem
    #remove the temporary file
    rm -f tmp_$$.txt
Now we have the MFCC and initial segmentation files: ./showName.mfcc and ./showName.uem.seg.
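The single-segment line written by the script can be reproduced in isolation, which helps when checking a segmentation by hand. The two helpers below are hypothetical names introduced for illustration. frames_to_seconds assumes 100 feature vectors per second (a 10 ms frame step); adjust the divisor if the features were computed with a different frame rate.

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Print one segment covering a whole show: it starts at feature 0 and spans
# $2 features, with the other fields set as in the script above.
make_uem_line() {
    local show=$1 nbf=$2
    printf '%s 1 0 %s U U U 1\n' "$show" "$nbf"
}

# Convert a number of features to seconds.
# Assumption: 100 feature vectors per second.
frames_to_seconds() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 100 }'
}
```

For a show with 1500 feature vectors, make_uem_line showName 1500 prints "showName 1 0 1500 U U U 1", and frames_to_seconds 1500 prints 15.00.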
Using Spro
«SPro is a free speech signal processing toolkit which provides runtime commands implementing standard feature extraction algorithms for speech related applications and a C library to implement new algorithms and to use SPro files within your own programs.»
- wave2mfcc_spro.sh

    #!/bin/bash
    sph=$1
    mfcc=$2
    show=`basename $sph .sph`
    sfbcep -v -F sphere -p 12 -m -e -f 16000 $sph $mfcc.tmp
    #swap order (on intel CPU)
    scopy -B $mfcc.tmp $mfcc
    rm $mfcc.tmp
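The comment in the script says the byte swap is needed on Intel CPUs, which are little-endian. A script meant to run on several machines could first check the host byte order. The helper host_byte_order below is a hypothetical sketch and is not part of SPro.

```shell
#!/bin/bash
# Hypothetical helper: prints "little" or "big" for the machine running it.
host_byte_order() {
    local word
    # od -tx2 reads the two bytes 01 00 as one 16-bit word in host order:
    # 0001 on a little-endian machine, 0100 on a big-endian one.
    word=$(printf '\001\000' | od -An -tx2 | tr -d ' \n')
    if [ "$word" = "0001" ]; then
        echo little
    else
        echo big
    fi
}
```

A script could then run scopy -B only when host_byte_order prints little, and simply rename $mfcc.tmp to $mfcc otherwise. Whether the swap can really be skipped on a big-endian host is an assumption drawn from the comment in the script.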