Most of the tools take:
- A diarization as input and generate another diarization as output. The exceptions are model trainers, which generate a model instead of a diarization.
- A file containing acoustic vectors or an audio file.
Diarization parameters
Input
Diarization file formats are explained in the Data section.
--sInputMask=<path> sets the <path> of the input diarization file. It can be:
- an absolute path: --sInputMask=/home/myseg.seg
- a relative path from the current directory: --sInputMask=seg/myseg.seg
- a path in which %s is replaced by the show name parameter. For example, --sInputMask=seg/%s.seg with the show name equal to myseg is equivalent to --sInputMask=seg/myseg.seg.
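The %s substitution described above can be illustrated with a minimal Python sketch. The function name resolve_mask is hypothetical and is not part of the toolkit; the sketch only mirrors the documented behaviour, in which %s is replaced by the show name and masks without %s are left unchanged.

```python
def resolve_mask(mask: str, show: str) -> str:
    """Return the path obtained by replacing %s in the mask with the show name.

    Hypothetical helper for illustration only; it is not the toolkit's code.
    """
    return mask.replace("%s", show)


if __name__ == "__main__":
    # Mask with a placeholder: the show name fills in %s.
    print(resolve_mask("seg/%s.seg", "myseg"))
    # Absolute path without a placeholder: returned unchanged.
    print(resolve_mask("/home/myseg.seg", "myseg"))
```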
The parameter --sInputFormat describes the file. It takes two values separated by a comma: the first sets the file format and the second sets the charset. The supported file formats are:
- seg: the file format of the toolkit;
- bck: a LIUM file format close to a NIST CTM speech file;
- ctl: the sphinx control file format, with some extensions in the speaker name;
- saus.seg: an extension in which sausage graphs are stored (deprecated);
- seg.xml: an XML version of seg proposed in the ANR EPAC project (experimental use only);
- media.xml: an XML version of seg proposed in the ANR PORT-MEDIA project (experimental use only).
The charset is a charset name defined in the JVM; see the Charset javadoc page. The most commonly used in France are ISO-8859-1 and UTF8.
To read a ctl file encoded in UTF8, the parameter is --sInputFormat=ctl,UTF8.
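As an illustration of the format,charset pair, the following Python sketch splits a --sInputFormat value and checks the format against the list given above. The function name parse_seg_format is hypothetical, and the sketch assumes that both values are always given; it does not check the charset, since valid names are defined by the JVM.

```python
# File formats listed in this section.
SEG_FORMATS = {"seg", "bck", "ctl", "saus.seg", "seg.xml", "media.xml"}


def parse_seg_format(value: str) -> tuple[str, str]:
    """Split 'format,charset' into its two parts.

    Hypothetical helper for illustration only; the charset is returned
    as-is because valid names depend on the JVM, not on this sketch.
    """
    file_format, sep, charset = value.partition(",")
    if not sep or not charset:
        raise ValueError(f"expected 'format,charset', got {value!r}")
    if file_format not in SEG_FORMATS:
        raise ValueError(f"unknown diarization format: {file_format!r}")
    return file_format, charset


if __name__ == "__main__":
    print(parse_seg_format("ctl,UTF8"))
```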
Output
The output diarization parameters are similar to the input ones:
- --sInputMask becomes --sOutputMask=PATH
- --sInputFormat becomes --sOutputFormat
Feature parameters
Input
The parameter --fInputMask defines the file to load. It can be:
- an absolute path: --fInputMask=/home/myfile.mfcc
- a relative path from the current directory: --fInputMask=file/myfile.wav
- a path in which %s is replaced by the show name parameter. For example, --fInputMask=./file/%s.sph with the show name equal to myshow is equivalent to --fInputMask=./file/myshow.sph.
The parameter --fInputDesc=type[:deltatype][,s:e:ds:de:dds:dde,dim,c:r:wSize:method] contains 4 blocks separated by commas:
- the type of file: type;
- the description of the feature vector: s:e:ds:de:dds:dde;
- the number of static parameters and energy present on disk or computed on the fly: dim;
- the normalization to apply to the features: c:r:wSize:method.
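To make the four-block layout concrete, here is a minimal Python sketch that splits a descriptor string into named fields. It follows only the syntax shown above (one block, or all four) and is not the toolkit's own parser. The function name parse_feature_desc, the dictionary keys and the example string are assumptions made for illustration.

```python
VECTOR_FIELDS = ("s", "e", "ds", "de", "dds", "dde")
NORM_FIELDS = ("c", "r", "wSize", "method")


def _named_ints(block: str, names: tuple) -> dict:
    """Map colon-separated integers onto the given field names."""
    values = block.split(":")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields in {block!r}")
    return dict(zip(names, (int(v) for v in values)))


def parse_feature_desc(desc: str) -> dict:
    """Split 'type[:deltatype][,s:e:ds:de:dds:dde,dim,c:r:wSize:method]'.

    Hypothetical helper: it accepts either the type block alone or all
    four blocks, as in the syntax line of this section.
    """
    blocks = [b.strip() for b in desc.split(",")]
    if len(blocks) not in (1, 4):
        raise ValueError("expected 1 or 4 comma-separated blocks")
    feature_type, _, delta_type = blocks[0].partition(":")
    result = {"type": feature_type, "deltaType": delta_type or None}
    if len(blocks) == 4:
        result["vector"] = _named_ints(blocks[1], VECTOR_FIELDS)
        result["dim"] = int(blocks[2])
        result["normalization"] = _named_ints(blocks[3], NORM_FIELDS)
    return result


if __name__ == "__main__":
    # Made-up descriptor used only to exercise the sketch.
    print(parse_feature_desc("sphinx,1:1:0:0:0:0,13,1:1:300:4"))
```

Keeping each block as a small dictionary keyed by the field names used in this section makes it easy to check a descriptor by eye before passing it to a tool.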
The feature type can be:
- sphinx: a sphinx file (mfcc, plp, etc.);
- spro4: a spro4 file (mfcc, lfcc, filter bank, etc.);
- gztxt: a gzipped text file in which each line corresponds to a vector;
- htk: an htk file;
- featureSetTransformation: only used in programming to transform a feature set into another feature set, for example when you want to apply CMS to unnormalized features;
- audio8kHz2sphinx, audio16kHz2sphinx, audio22kHz2sphinx, audio44kHz2sphinx, or audio48kHz2sphinx: a sphere or wave audio file (recorded at 8, 16, 22, 44 or 48 kHz) is converted into mfcc using sphinx 4;
- audio2sphinx: no longer available.
The feature vector is described by s:e:ds:de:dds:dde where:
- s corresponds to the static values. If s equals:
  - 0, the static values are not present on disk,
  - 1, the static values are present,
  - 3, the static values are present on disk but are removed after loading;
- e corresponds to the energy; its value can also be one of [0, 1, 2, 3];
- ds corresponds to delta; its value can also be one of [0, 1, 2, 3];
- de corresponds to delta energy; its value can also be one of [0, 1, 2, 3];
- dds corresponds to delta delta; its value can also be one of [0, 1, 2, 3];
- dde corresponds to delta delta energy; its value can also be one of [0, 1, 2, 3].
The feature deltaType can be:
- sphinx: sphinx-style delta and delta delta;
- spro4: spro4-style delta and delta delta;
- htk: htk-style delta and delta delta.
The normalization is controlled by 4 parameters, c:r:wSize:method, where:
- c corresponds to cepstral mean subtraction (CMS): 0 means that CMS is not applied, whereas 1 means that CMS is applied;
- r corresponds to variance normalization; the allowed values are 0 and 1;
- wSize is linked to the normalization method; its value is the number of frames in the sliding window over which the normalization is computed (CMS, variance or warping);
- method indicates how to apply the normalization:
  - 0: mean and/or variance are computed on the segment,
  - 1: mean and/or variance are computed on the cluster,
  - 2: mean and/or variance are computed on a sliding window,
  - 3: feature warping,
  - 4: feature warping followed by CMS and/or variance normalization on the segment,
  - 5: feature mapping followed by CMS and/or variance normalization on the cluster,
  - 6: feature warping followed by CMS and/or variance normalization on the cluster.
Feature warping [1] and feature mapping [2] are classical normalization methods employed in speaker verification systems.
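As a rough illustration of what the c and r flags mean when method is 0 (statistics computed on a segment), the following pure-Python sketch applies mean subtraction and/or variance normalization to the frames of one segment. It is not the toolkit's implementation. The function name normalize_segment is hypothetical, and the behaviour when only variance normalization is requested (scale around the mean, then add the mean back) is an assumption of this sketch.

```python
from math import sqrt


def normalize_segment(frames, cms=True, variance=False):
    """Normalize the frames of one segment, dimension by dimension.

    frames: non-empty list of equal-length feature vectors (lists of floats).
    cms: subtract the per-dimension mean (the c flag).
    variance: divide by the per-dimension standard deviation (the r flag).
    Assumption: with variance only, the mean is added back after scaling.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # Guard against constant dimensions to avoid dividing by zero.
        stds.append(sqrt(var) if var > 0 else 1.0)

    normalized = []
    for f in frames:
        row = []
        for d in range(dim):
            centered = f[d] - means[d]
            if variance:
                centered /= stds[d]
            row.append(centered if cms else centered + means[d])
        normalized.append(row)
    return normalized


if __name__ == "__main__":
    segment = [[1.0, 2.0], [3.0, 6.0]]
    print(normalize_segment(segment, cms=True, variance=False))
    print(normalize_segment(segment, cms=True, variance=True))
```

Cluster-level normalization (method 1) would compute the same statistics over all frames of a cluster instead of a single segment, and the sliding-window variant (method 2) would recompute them over wSize frames around each frame.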
Output
The output feature parameters are similar to the input ones:
- --fInputMask becomes --fOutputMask,
- --fInputDesc becomes --fOutputDesc.