3. The FeaturesServer object

The FeaturesServer loads one or several datasets from one or several HDF5 files and post-processes the features (normalization, addition of temporal context, RASTA filtering, feature selection…).

The FeaturesServer can also encapsulate one or several FeaturesExtractor in order to take audio files as inputs.

The FeaturesServer is used to feed all other objects in SIDEKIT.

3.1 Options of the FeaturesServer

Option	Value (default is bold)	Description
features_extractor	None, a FeaturesExtractor	FeaturesExtractor that is used to process audio files
feature_filename_structure	{}, a string	Structure of the output feature filename. When all output filenames share the same structure and differ only by an identifier, the filename is completed at run time by inserting the identifier into the filename structure (see examples below).
sources	None, tuple of tuples	Tuple of sources used to load features from different files (optional: for the case where datasets are loaded from several files and concatenated). Each tuple includes two values, a FeaturesServer and a boolean. If True, VAD labels are loaded from this source.
dataset_list	None, list of features to load	string of the form ‘[“cep”, “fb”, “vad”, “energy”, “bnf”]’ (only when loading datasets from a single file); list of datasets to load.
mask	None	string of the form ‘[1-3,10,15-20]’; mask to apply on the concatenated dataset to select specific components. In this example, coefficients 1, 2, 3, 10, 15, 16, 17, 18, 19 and 20 are kept.
feat_norm	None, “cmvn”, “cms”, “stg”	Type of normalization to apply as post-processing
global_cmvn	None, boolean	If True, use a global mean and std when normalizing the features
dct_pca	False, boolean	If True, add temporal context by using a PCA-DCT approach
dct_pca_config	(12, 12, None)	Configuration of the PCA-DCT
sdc	False, boolean	If True, compute shifted delta cepstra coefficients
sdc_config	(1,3,7)	Configuration to compute SDC coefficients
delta	False, boolean	If True, append the first order derivative
double_delta	False	If True, append the second order derivative
context	(0,0)	Add a left and right context
traps_dct_nb	0, integer	Number of DCT coefficients to keep when computing TRAP coefficients
rasta	False, boolean	If True, perform RASTA filtering
keep_all_features	True, boolean	If True, keep all features; if False, keep frames according to the VAD labels
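
To make the mask notation concrete, here is a minimal sketch of how a string such as ‘[1-3,10,15-20]’ could be expanded into a list of column indices. The helper name expand_mask is hypothetical and is not part of SIDEKIT; it only illustrates the notation described in the table, assuming ranges are inclusive at both ends as the example in the table suggests.

```python
def expand_mask(mask):
    """Expand a mask string such as '[1-3,10,15-20]' into a list of indices.

    Hypothetical helper for illustration only; not SIDEKIT's own parser.
    """
    indices = []
    # Drop the surrounding brackets, then handle each comma-separated item.
    for item in mask.strip().strip("[]").split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            # A range 'a-b' is inclusive on both ends.
            start, stop = (int(part) for part in item.split("-"))
            indices.extend(range(start, stop + 1))
        else:
            indices.append(int(item))
    return indices


print(expand_mask("[1-3,10,15-20]"))
# [1, 2, 3, 10, 15, 16, 17, 18, 19, 20]
```

With the same convention, the mask ‘[0-12]’ used in the examples of this section selects the 13 columns with indices 0 to 12.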

3.2 Get features from a single file

The simplest case is to use a FeaturesServer to load and process features from a single file. Natively, a FeaturesServer takes HDF5 files as input, but you can provide it with its own FeaturesExtractor in order to extract features from an audio file and apply some post-processing to them.

Get features from a single HDF5 file

The simplest case is to use a FeaturesServer in order to load and post-process acoustic features from an HDF5 file. Such a FeaturesServer is instantiated as follows:

server = sidekit.FeaturesServer(features_extractor=None,
                                feature_filename_structure="feat/sre04/{}.h5",
                                sources=None,
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)

In this example, the FeaturesServer is used to load and concatenate cepstral coefficients and log-energy from a single HDF5 file. The selected features (which can be energy, cep, fb and bnf) are concatenated in a predefined order, so the order of the list given as a parameter does not matter. This order is: energy, cep, fb, bnf. Once these features are loaded, only the first 13 coefficients are retained and post-processed (indices 0 to 12 inclusive, as given by the mask parameter).
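
The following sketch illustrates the two ideas in this paragraph on a single toy frame: the concatenation order is fixed regardless of the order of dataset_list, and the mask then selects columns of the concatenated vector. The function concatenate_frame, the constant FEATURE_ORDER and the toy values are hypothetical and only mimic the behaviour described here; they are not SIDEKIT code.

```python
# Predefined concatenation order described above; "vad" holds labels, not features.
FEATURE_ORDER = ["energy", "cep", "fb", "bnf"]


def concatenate_frame(datasets, dataset_list, mask_indices):
    """Concatenate one frame of each requested dataset, then apply the mask.

    `datasets` maps a dataset name to the coefficients of a single frame.
    Hypothetical helper for illustration; not SIDEKIT code.
    """
    frame = []
    for name in FEATURE_ORDER:
        # The order comes from FEATURE_ORDER, not from dataset_list.
        if name in dataset_list:
            frame.extend(datasets[name])
    # Keep only the columns selected by the mask.
    return [frame[i] for i in mask_indices]


# Toy frame: 1 log-energy value followed by 19 cepstral coefficients.
datasets = {"energy": [-3.2], "cep": [float(c) for c in range(1, 20)]}

a = concatenate_frame(datasets, ["energy", "cep", "vad"], range(13))
b = concatenate_frame(datasets, ["vad", "cep", "energy"], range(13))
print(a == b, len(a), a[0])  # True 13 -3.2
```

Note that, under this reading, the 13 retained columns are the log-energy followed by the first 12 cepstral coefficients, because the mask is applied after concatenation.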

The post-processing can include the following steps in this order:

RASTA filtering.
addition of the temporal context: first and second derivatives, DCT-PCA or Shifted Delta Cepstra.
normalization of the features using either Cepstral Mean Variance Normalization (cmvn), Cepstral Mean Subtraction (cms) or Short Term Gaussianization (stg).
frame selection according to the VAD labels that are loaded if “vad” is included in the dataset_list. If “vad” is not in the dataset_list, then all frames are kept.
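
As a rough illustration of the last two steps, the sketch below normalizes a toy feature matrix with CMVN and then drops the frames rejected by the VAD. It is a simplified stand-in, not SIDEKIT's implementation: the functions cmvn and select_frames are hypothetical, the statistics are computed over all frames (the library may compute them differently, for example on speech frames only), and RASTA filtering and temporal context are left out.

```python
import math


def cmvn(frames):
    """Normalize each coefficient to zero mean and unit variance over all frames."""
    n_frames = len(frames)
    n_coefs = len(frames[0])
    normalized = [[0.0] * n_coefs for _ in range(n_frames)]
    for j in range(n_coefs):
        column = [frame[j] for frame in frames]
        mean = sum(column) / n_frames
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n_frames)
        for i in range(n_frames):
            # Guard against constant coefficients (zero standard deviation).
            normalized[i][j] = (column[i] - mean) / std if std > 0 else 0.0
    return normalized


def select_frames(frames, vad_labels):
    """Keep only the frames whose VAD label is True."""
    return [frame for frame, keep in zip(frames, vad_labels) if keep]


frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
vad = [True, False, True, True]

normalized = cmvn(frames)                  # normalization comes first ...
selected = select_frames(normalized, vad)  # ... frame selection comes last
print(len(selected))  # 3
```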

This FeaturesServer is then used through its load method, which has the following signature:

load(self, show, channel=0, input_feature_filename=None, label=None, start=None, stop=None)

Get features from a single audio file

In case you don’t want to store your features on disk as an HDF5 file, it is possible to use a FeaturesServer including a FeaturesExtractor in order to compute the acoustic parameters from an audio file and to select and post-process the features on the fly.

In this case, the FeaturesServer is created as follows:

server = sidekit.FeaturesServer(features_extractor=extractor,
                                feature_filename_structure=None,
                                sources=None,
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)

Note

The FeaturesExtractor must be created beforehand.

This FeaturesServer can then be used as follows:

features, label = server.load(show, channel=0, input_feature_filename=featureFileName, label=None, start=None, stop=None)

3.3 Get features from several files

Sometimes you might want to combine features coming from different files.

Get features from several HDF5 files

Using a FeaturesServer, it is possible to combine features coming from different HDF5 files. In the following example, we have two sets of feature files which have been saved in HDF5 format. Files from the first set have names following the pattern filename.h5, while file names from the second set follow the pattern filename_2.h5.

We are going to load energy from the first set and a few cepstral coefficients from the second set to combine them. The VAD labels will also be taken from the second set. For this purpose, we create two feature servers (one for each set) as follows:

fs_1 = sidekit.FeaturesServer(feature_filename_structure="{}.h5",
                              dataset_list=["energy"],
                              context=None)

fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)

As you can see, no post-processing is applied to the log-energy from the first set, while derivatives are added to the first 13 cepstral coefficients from the second set after applying RASTA filtering.
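
For a given show identifier, the two filename structures above resolve to one file per set. The short sketch below shows this with Python's str.format, assuming the identifier simply replaces the {} placeholder; the identifier "taaa" is only an example value.

```python
# Filename structures taken from the two servers above.
structure_1 = "{}.h5"
structure_2 = "{}_2.h5"

show = "taaa"  # identifier of the recording to load

# Assumption: the identifier replaces the '{}' placeholder, as str.format does.
print(structure_1.format(show))  # taaa.h5
print(structure_2.format(show))  # taaa_2.h5
```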

The last step consists in creating a third FeaturesServer that calls fs_1 and fs_2 and then combines the two types of features before applying post-processing to the complete feature vectors:

fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                            feat_norm="cmvn",
                            keep_all_features=False)

Energy from the first set is concatenated with the cepstral coefficients from the second set together with their first and second derivatives. Finally, CMVN is applied to the entire feature vectors and only the frames selected by the VAD labels from the second set are kept. All this is done by calling:

feat, label = fs.load("taaa")

The resulting features are 40-dimensional frames (13 cepstral coefficients + 13 deltas + 13 delta-deltas + the log-energy).
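
The dimension count can be checked with a small sketch on toy data. The simple_delta function below is a simplified central difference used only to produce derivatives of the right shape; SIDEKIT's own delta filter may differ, and the ordering of the columns (energy first, following the order of the sources) is an assumption.

```python
def simple_delta(frames):
    """First-order difference between neighbouring frames (edges are repeated).

    Simplified stand-in for illustration; SIDEKIT's delta filter may differ.
    """
    last = len(frames) - 1
    return [
        [(nxt - prv) / 2.0
         for nxt, prv in zip(frames[min(t + 1, last)], frames[max(t - 1, 0)])]
        for t in range(len(frames))
    ]


# Toy data: 5 frames, 13 cepstral coefficients and 1 log-energy value per frame.
cep = [[float(t * c) for c in range(13)] for t in range(5)]
energy = [[float(t)] for t in range(5)]

delta = simple_delta(cep)
double_delta = simple_delta(delta)

# Energy from the first source, then cepstra and their derivatives from the second.
combined = [e + c + d + dd for e, c, d, dd in zip(energy, cep, delta, double_delta)]
print(len(combined), len(combined[0]))  # 5 40
```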

Get features from one audio file and one HDF5 file

You can use more complex combinations by concatenating features extracted on-line from one audio file and features from an HDF5 feature file. We’ll get the same features as in the previous example except that the energy from set 1 is directly computed from the audio file. Cepstral coefficients are still taken from the already extracted features.

We first create a FeaturesExtractor to process the audio file and the associated FeaturesServer that will manage the FeaturesExtractor:

extractor = sidekit.FeaturesExtractor(audio_filename_structure="{}.wav",
                                      sampling_frequency=8000,
                                      lower_frequency=0,
                                      higher_frequency=4000,
                                      filter_bank="log",
                                      filter_bank_size=40,
                                      window_size=0.025,
                                      shift=0.01,
                                      ceps_number=20,
                                      vad="snr",
                                      snr=40,
                                      pre_emphasis=0.97,
                                      save_param=["energy"],
                                      keep_all_features=True)

fs_1 = sidekit.FeaturesServer(features_extractor=extractor,
                              feature_filename_structure=None,
                              sources=None,
                              dataset_list=["energy"],
                              keep_all_features=True)

Then, we create a second FeaturesServer that will load cepstral coefficients from the second set of feature files and perform some post processing:

fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)

We now combine the two FeaturesServer objects in a third one and perform CMVN:

fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                            feat_norm="cmvn",
                            keep_all_features=False)

The resulting features are obtained by:

feat, label = fs.load("taab")
