Most of the tools take:
- a diarization as input, and generate another diarization as output (the exceptions are model trainers, which generate a model instead of a diarization);
- a file containing acoustic vectors, or an audio file.
Diarization parameters
Input
Diarization file formats are explained in the Data section.
--sInputMask=<path>
sets the <path> of the input diarization file. It can be:
- an absolute path: --sInputMask=/home/myseg.seg
- a relative path from the current directory: --sInputMask=seg/myseg.seg
- a path where %s is substituted by the show name parameter. For example, --sInputMask=seg/%s.seg with the show name equal to myseg is equivalent to --sInputMask=seg/myseg.seg.
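As a rough illustration of the %s rule, the Java sketch below substitutes the show name into a mask. It assumes a plain literal replacement of every %s occurrence; resolveMask is a hypothetical name, not part of the toolkit's API. The same rule applies to --fInputMask.

```java
// Sketch only: resolveMask is a hypothetical helper, not the toolkit's API.
public class MaskExample {

    /** Replaces every literal "%s" in the mask with the show name. */
    static String resolveMask(String mask, String showName) {
        return mask.replace("%s", showName);
    }

    public static void main(String[] args) {
        // Relative mask with a show-name placeholder.
        System.out.println(resolveMask("seg/%s.seg", "myseg")); // prints seg/myseg.seg
        // A mask without %s is returned unchanged.
        System.out.println(resolveMask("/home/myseg.seg", "myseg")); // prints /home/myseg.seg
    }
}
```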
The parameter --sInputFormat sets information about the file. It takes two values separated by a comma: the first sets the file format and the second sets the charset. The supported file formats are:
- seg corresponds to the file format of the toolkit;
- bck corresponds to a LIUM file format close to a NIST CTM speech file;
- ctl corresponds to the Sphinx control file format, with some extensions in the speaker name;
- saus.seg is an extension in which sausage graphs are stored (deprecated);
- seg.xml corresponds to an XML version of seg proposed in the ANR EPAC project (experimental use only);
- media.xml corresponds to an XML version of seg proposed in the ANR PORT-MEDIA project (experimental use only).
The charset corresponds to a charset name defined in the JVM (see the Charset javadoc page). The most commonly used in France are ISO-8859-1 and UTF8.
To read a ctl file encoded in UTF8, the parameter is --sInputFormat=ctl,UTF8.
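The two-value syntax can be sketched in Java as a split on the first comma, a check against the formats listed above, and a charset lookup through the JVM's Charset.forName. FormatSpec and parse are hypothetical names chosen for this illustration; the toolkit's own parsing code may behave differently (for instance, when the charset is omitted).

```java
import java.nio.charset.Charset;
import java.util.Set;

// Sketch only: FormatSpec and parse are hypothetical names, not the toolkit's API.
public class FormatParamExample {

    // Diarization formats listed in this section.
    static final Set<String> FORMATS =
            Set.of("seg", "bck", "ctl", "saus.seg", "seg.xml", "media.xml");

    /** The two comma-separated values of --sInputFormat / --sOutputFormat. */
    static final class FormatSpec {
        final String format;
        final Charset charset;

        FormatSpec(String format, Charset charset) {
            this.format = format;
            this.charset = charset;
        }
    }

    static FormatSpec parse(String value) {
        String[] parts = value.split(",", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <format>,<charset>: " + value);
        }
        String format = parts[0].trim();
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("unsupported format: " + format);
        }
        // Charset.forName resolves JVM charset names and aliases (e.g. "UTF8", "ISO-8859-1")
        // and throws UnsupportedCharsetException for names the JVM does not know.
        return new FormatSpec(format, Charset.forName(parts[1].trim()));
    }

    public static void main(String[] args) {
        FormatSpec spec = parse("ctl,UTF8");
        System.out.println(spec.format + " " + spec.charset.name()); // prints ctl UTF-8
    }
}
```

Note that the JVM reports the canonical name UTF-8 for the alias UTF8, so both spellings select the same charset.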
Output
The output diarization parameters are similar to the input ones:
- --sInputMask becomes --sOutputMask=<path>;
- --sInputFormat becomes --sOutputFormat.
Feature parameters
Input
The parameter --fInputMask defines the file to load. It can be:
- an absolute path: --fInputMask=/home/myfile.mfcc
- a relative path from the current directory: --fInputMask=file/myfile.wav
- a path where %s is substituted by the show name parameter. For example, --fInputMask=./file/%s.sph with the show name equal to myshow is equivalent to --fInputMask=./file/myshow.sph.
The parameter --fInputDesc=type[:deltatype][,s:e:ds:de:dds:dde,dim,c:r:wSize:method] contains 4 blocks separated by commas:
- the type of file, optionally followed by the delta type: type[:deltatype];
- the description of the feature vector: s:e:ds:de:dds:dde;
- the number of static parameters and energy present on disk or computed on the fly: dim;
- the normalization to apply to the features: c:r:wSize:method.
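A minimal Java sketch of how a value of --fInputDesc splits into its blocks, assuming the bracketed part is either absent (type only) or present as a whole (all four blocks). FeatureDesc and parse are hypothetical names, and the example value in main is illustrative only, not a recommended configuration.

```java
// Sketch only: FeatureDesc and parse are hypothetical names, not the toolkit's API.
public class FeatureDescExample {

    /** The blocks of --fInputDesc; the last three are null/-1 when only the type is given. */
    static final class FeatureDesc {
        final String type;          // e.g. "sphinx", "spro4", "htk"
        final String deltaType;     // optional, null if absent
        final String vectorDesc;    // s:e:ds:de:dds:dde
        final int dim;              // static parameters and energy
        final String normalization; // c:r:wSize:method

        FeatureDesc(String type, String deltaType, String vectorDesc, int dim, String normalization) {
            this.type = type;
            this.deltaType = deltaType;
            this.vectorDesc = vectorDesc;
            this.dim = dim;
            this.normalization = normalization;
        }
    }

    static FeatureDesc parse(String value) {
        String[] blocks = value.split(",");
        if (blocks.length != 1 && blocks.length != 4) {
            throw new IllegalArgumentException("expected 1 or 4 comma-separated blocks: " + value);
        }
        // First block: type, optionally followed by ":deltatype".
        String[] typeParts = blocks[0].split(":", 2);
        String deltaType = typeParts.length == 2 ? typeParts[1] : null;
        if (blocks.length == 1) {
            return new FeatureDesc(typeParts[0], deltaType, null, -1, null);
        }
        return new FeatureDesc(typeParts[0], deltaType, blocks[1],
                Integer.parseInt(blocks[2].trim()), blocks[3]);
    }

    public static void main(String[] args) {
        FeatureDesc d = parse("sphinx:htk,1:1:0:0:0:0,13,0:0:0:0");
        System.out.println(d.type + " " + d.deltaType + " " + d.vectorDesc + " "
                + d.dim + " " + d.normalization); // prints sphinx htk 1:1:0:0:0:0 13 0:0:0:0
    }
}
```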
The feature type can be:
- sphinx: a Sphinx file (MFCC, PLP, etc.);
- spro4: a SPro4 file (MFCC, LFCC, filter bank, etc.);
- gztxt: a gzipped text file in which each line corresponds to a vector;
- htk: an HTK file;
- featureSetTransformation: only used in programming, to transform a feature set into another feature set; for example, when you want to apply CMS on unnormalized features;
- audio8kHz2sphinx, audio16kHz2sphinx, audio22kHz2sphinx, audio44kHz2sphinx or audio48kHz2sphinx: a SPHERE or WAVE audio file (recorded at 8, 16, 22, 44 or 48 kHz) is converted to MFCC using Sphinx-4;
- audio2sphinx is no longer available.
The description of the vector is given by s:e:ds:de:dds:dde, where:
- s corresponds to the static values; if s equals:
  - 0, the static is not present on disk,
  - 1, the static is present,
  - 3, the static is present on disk but the values are removed after loading;
- e corresponds to the energy; its value can also be in [0, 1, 2, 3];
- ds corresponds to the delta; its value can also be in [0, 1, 2, 3];
- de corresponds to the delta energy; its value can also be in [0, 1, 2, 3];
- dds corresponds to the delta delta; its value can also be in [0, 1, 2, 3];
- dde corresponds to the delta delta energy; its value can also be in [0, 1, 2, 3].
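The vector description can be checked with a short Java sketch that splits the six colon-separated values and enforces the [0, 3] range stated above. The meanings printed are those given for s, assumed here to carry over to the other fields; value 2 is not described in this section, so the sketch reports it without interpretation. parseFlags and meaning are hypothetical helpers.

```java
// Sketch only: parseFlags and meaning are hypothetical helpers, not the toolkit's API.
public class VectorDescExample {

    static final String[] NAMES = {"s", "e", "ds", "de", "dds", "dde"};

    /** Parses s:e:ds:de:dds:dde into six integers, each checked to lie in [0, 3]. */
    static int[] parseFlags(String desc) {
        String[] parts = desc.split(":");
        if (parts.length != NAMES.length) {
            throw new IllegalArgumentException("expected 6 values: " + desc);
        }
        int[] flags = new int[NAMES.length];
        for (int i = 0; i < NAMES.length; i++) {
            flags[i] = Integer.parseInt(parts[i].trim());
            if (flags[i] < 0 || flags[i] > 3) {
                throw new IllegalArgumentException(NAMES[i] + " must be in [0, 3]: " + flags[i]);
            }
        }
        return flags;
    }

    /** Meanings given for s in this section; assumed to apply to the other fields too. */
    static String meaning(int flag) {
        switch (flag) {
            case 0: return "not present on disk";
            case 1: return "present";
            case 3: return "present on disk, removed after loading";
            default: return "value " + flag + " (not described in this section)";
        }
    }

    public static void main(String[] args) {
        int[] flags = parseFlags("1:3:0:0:0:0");
        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(NAMES[i] + " = " + flags[i] + ": " + meaning(flags[i]));
        }
    }
}
```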
The deltatype can be:
- sphinx: Sphinx-style delta and delta delta;
- spro4: SPro4-style delta and delta delta;
- htk: HTK-style delta and delta delta.
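For reference, the HTK Book defines deltas with a regression formula over a window of ±Θ frames, d_t = Σ_{k=1..Θ} k (c_{t+k} − c_{t−k}) / (2 Σ_{k=1..Θ} k²), replicating the first or last frame at the edges. The Java sketch below implements that published formula; it is not taken from the toolkit, whose htk option may differ in details such as the window size (Θ = 2 in the example is an assumption). Delta deltas follow by applying the same function to the deltas.

```java
// Sketch of the HTK Book regression formula; not taken from the toolkit.
public class HtkDeltaExample {

    /**
     * d_t = sum_{k=1..theta} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..theta} k^2),
     * replicating the first or last frame when t+k or t-k falls outside the sequence.
     */
    static double[][] deltas(double[][] frames, int theta) {
        int n = frames.length;
        double denom = 0.0;
        for (int k = 1; k <= theta; k++) {
            denom += 2.0 * k * k;
        }
        double[][] out = new double[n][];
        for (int t = 0; t < n; t++) {
            int dim = frames[t].length;
            out[t] = new double[dim];
            for (int k = 1; k <= theta; k++) {
                double[] next = frames[Math.min(t + k, n - 1)]; // replicate last frame
                double[] prev = frames[Math.max(t - k, 0)];     // replicate first frame
                for (int i = 0; i < dim; i++) {
                    out[t][i] += k * (next[i] - prev[i]);
                }
            }
            for (int i = 0; i < dim; i++) {
                out[t][i] /= denom;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One-dimensional ramp c_t = t: the delta at the central frame equals the slope, 1.0;
        // frames near the edges are damped by the frame replication.
        double[][] ramp = {{0}, {1}, {2}, {3}, {4}};
        for (double[] v : deltas(ramp, 2)) {
            System.out.println(v[0]);
        }
    }
}
```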
The normalization is controlled by 4 parameters, c:r:wSize:method, where:
- c corresponds to the cepstral mean subtraction (CMS): 0 means that CMS is not applied, whereas 1 means that CMS is applied;
- r corresponds to the variance normalization; allowed values are 0 or 1;
- wSize is linked to the normalization method: it is the number of frames in the sliding window on which the normalization (CMS, variance or warping) is computed;
- method indicates how to apply the normalization:
  - 0: mean and/or variance are computed on the segment,
  - 1: mean and/or variance are computed on the cluster,
  - 2: mean and/or variance are computed on a sliding window,
  - 3: feature warping,
  - 4: feature warping followed by a CMS and/or a variance normalization on the segment,
  - 5: feature mapping followed by a CMS and/or a variance normalization on the cluster,
  - 6: feature warping followed by a CMS and/or a variance normalization on the cluster.
Feature warping [1] and feature mapping [2] are classical normalization methods employed in speaker verification systems.
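To make the sliding-window case (method 2) concrete, here is a minimal Java sketch of CMS and variance normalization over a window of wSize frames. Several details are assumptions, not taken from the toolkit: the window starts at t − wSize/2 and is clipped at the sequence boundaries, the variance uses the population formula, and the division is skipped when the window is constant. normalize is a hypothetical helper.

```java
// Sketch only: normalize is a hypothetical helper, not the toolkit's implementation.
public class SlidingNormExample {

    /**
     * Sliding-window normalization (method 2). For frame t the window starts at
     * t - wSize/2 and spans wSize frames, truncated at the sequence boundaries.
     * cms subtracts the window mean; var divides by the window standard deviation.
     */
    static double[][] normalize(double[][] frames, int wSize, boolean cms, boolean var) {
        int n = frames.length;
        double[][] out = new double[n][];
        for (int t = 0; t < n; t++) {
            int start = t - wSize / 2;
            int from = Math.max(0, start);
            int to = Math.min(n, start + wSize); // exclusive
            int count = to - from;
            out[t] = frames[t].clone();
            for (int i = 0; i < out[t].length; i++) {
                double sum = 0.0;
                double sumSq = 0.0;
                for (int u = from; u < to; u++) {
                    sum += frames[u][i];
                    sumSq += frames[u][i] * frames[u][i];
                }
                double mean = sum / count;
                double variance = sumSq / count - mean * mean;
                if (cms) {
                    out[t][i] -= mean;
                }
                // With cms == false this only rescales; skipped for (near-)constant windows.
                if (var && variance > 1e-12) {
                    out[t][i] /= Math.sqrt(variance);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] frames = {{1}, {2}, {3}, {4}, {5}};
        double[][] normed = normalize(frames, 5, true, true);
        for (double[] v : normed) {
            System.out.println(v[0]);
        }
    }
}
```

Methods 0 and 1 would follow the same pattern, with the window replaced by the frames of the current segment or cluster.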
Output
The output feature parameters are similar to the input ones:
- --fInputMask becomes --fOutputMask;
- --fInputDesc becomes --fOutputDesc.