3. The FeaturesServer object

The FeaturesServer loads one or several datasets from one or several HDF5 files and post-processes the features (normalization, addition of the temporal context, RASTA filtering, feature selection…).

The FeaturesServer can also encapsulate one or several FeaturesExtractor objects in order to take audio files as input.

The FeaturesServer is used to feed all other objects in SIDEKIT.

3.1 Options of the FeaturesServer

| Option | Value (default listed first) | Description |
| --- | --- | --- |
| features_extractor | None, a FeaturesExtractor | FeaturesExtractor that is used to process audio files. |
| feature_filename_structure | {}, a string | Structure of the output feature filename. When all output file names share the same structure and differ only by an identifier, the filename is completed at run time by inserting the identifier into the filename structure (see examples below). |
| sources | None, tuple of tuples | Tuple of sources from which to load features (optional; for the case where datasets are loaded from several files and concatenated). Each inner tuple holds two values: a FeaturesServer and a boolean. If the boolean is True, VAD labels are loaded from this source. |
| dataset_list | None, list of features to load | String of the form '["cep", "fb", "vad", "energy", "bnf"]' listing the datasets to load (only when loading datasets from a single file). |
| mask | None | String of the form '[1-3,10,15-20]': mask to apply on the concatenated dataset to select specific components. In this example, coefficients 1, 2, 3, 10, 15, 16, 17, 18, 19 and 20 are kept. |
| feat_norm | None, "cmvn", "cms", "stg" | Type of normalization to apply as post-processing. |
| global_cmvn | None, boolean | If True, use a global mean and std when normalizing the features. |
| dct_pca | False, boolean | If True, add temporal context by using a PCA-DCT approach. |
| dct_pca_config | (12, 12, None) | Configuration of the PCA-DCT. |
| sdc | False, boolean | If True, compute shifted delta cepstra coefficients. |
| sdc_config | (1,3,7) | Configuration used to compute the SDC coefficients. |
| delta | False, boolean | If True, append the first order derivative. |
| double_delta | False | If True, append the second order derivative. |
| context | (0,0) | Add a left and right context. |
| traps_dct_nb | 0, integer | Number of DCT coefficients to keep when computing TRAP coefficients. |
| rasta | False, boolean | If True, perform RASTA filtering. |
| keep_all_features | True, boolean | If True, keep all features; if False, keep frames according to the VAD labels. |
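
To make the mask notation concrete, here is a minimal sketch that expands a mask string into the list of component indices it selects. The helper name parse_mask is hypothetical and is not part of SIDEKIT; how the library parses the string internally is not shown in this section, so treat this only as an illustration of the notation, assuming ranges are inclusive on both ends as in the '[1-3,10,15-20]' example.

```python
def parse_mask(mask):
    """Expand a mask string such as "[1-3,10,15-20]" into a list of indices.

    Hypothetical helper for illustration only; not a SIDEKIT function.
    """
    indices = []
    # Drop the surrounding brackets, then handle each comma-separated item.
    for part in mask.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            # Ranges are inclusive on both ends.
            indices.extend(range(int(first), int(last) + 1))
        else:
            indices.append(int(part))
    return indices


print(parse_mask("[1-3,10,15-20]"))
# [1, 2, 3, 10, 15, 16, 17, 18, 19, 20]
```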

3.2 Get features from a single file

The simplest case is to use a FeaturesServer to load and process features from a single file. Natively, a FeaturesServer is designed to take HDF5 files as input, but you can provide it with its own FeaturesExtractor in order to extract features from an audio file and apply some post-processing to them.

Get features from a single HDF5 file

The simplest case is to use a FeaturesServer to load and post-process acoustic features from an HDF5 file. Such a FeaturesServer is instantiated as follows:

import sidekit

server = sidekit.FeaturesServer(features_extractor=None,
                                feature_filename_structure="feat/sre04/{}.h5",
                                sources=None,
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)

In this example, the FeaturesServer loads and concatenates cepstral coefficients and log-energy from a single HDF5 file. The selected features (which can be energy, cep, fb and bnf) are concatenated in a predefined order, so the order of the list given as a parameter does not matter. This order is: energy, cep, fb and bnf. Once these features are loaded, only the first 13 coefficients are retained and post-processed (indices 0 to 12 inclusive, as given by the mask parameter).
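
The fixed concatenation order can be illustrated with a short sketch. The function concatenation_order is a hypothetical helper written for this example, not a SIDEKIT call; it only mirrors the rule stated above, assuming that "vad" contributes labels rather than feature columns.

```python
# Order stated in the text above; "vad" is assumed to carry labels only.
CONCATENATION_ORDER = ("energy", "cep", "fb", "bnf")


def concatenation_order(dataset_list):
    """Return the requested feature datasets in the predefined order.

    Hypothetical helper for illustration only; not a SIDEKIT function.
    """
    requested = set(dataset_list)
    return [name for name in CONCATENATION_ORDER if name in requested]


# The order of the input list has no influence on the result.
print(concatenation_order(["cep", "vad", "energy"]))  # ['energy', 'cep']
print(concatenation_order(["energy", "cep", "vad"]))  # ['energy', 'cep']
```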

The post-processing can include the following steps, in this order:
  • RASTA filtering
  • addition of the temporal context: first and second derivatives, DCT-PCA or Shifted Delta Cepstra
  • normalization of the features using either Cepstral Mean Variance Normalization (cmvn), Cepstral Mean Subtraction (cms) or Short Term Gaussianization (stg)
  • frame selection according to the VAD labels, which are loaded if “vad” is included in the dataset_list. If “vad” is not in the dataset_list, then all frames are kept.
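
The last two steps of this list can be sketched in plain Python. This is only an illustration of the general techniques (CMVN followed by VAD-based frame selection), not SIDEKIT's implementation: the function names are hypothetical, the statistics are computed over all frames using the population standard deviation, and whether SIDEKIT estimates them on all frames or only on speech frames is not stated in this section.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """Scale each coefficient to zero mean and unit variance over the frames."""
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for constant coefficients to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]


def select_frames(frames, vad_labels):
    """Keep only the frames whose VAD label is True."""
    return [frame for frame, keep in zip(frames, vad_labels) if keep]


# Toy data: 4 frames of 2 coefficients, with one frame labelled as non-speech.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
vad = [True, False, True, True]

normalized = cmvn(frames)                # normalization first ...
kept = select_frames(normalized, vad)    # ... then frame selection
print(len(kept))  # 3
```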

This FeaturesServer is then used as follows:

features, label = server.load(show, channel=0, input_feature_filename=None, label=None, start=None, stop=None)
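
The show argument is the identifier that completes feature_filename_structure at run time. A minimal sketch of that substitution, assuming it behaves like Python's str.format on the {} placeholder (the helper name resolve_feature_filename and the identifier "taaa" are only illustrative):

```python
def resolve_feature_filename(structure, show):
    """Insert the show identifier into the filename structure.

    Hypothetical helper; assumes the "{}" placeholder is filled as by str.format.
    """
    return structure.format(show)


print(resolve_feature_filename("feat/sre04/{}.h5", "taaa"))
# feat/sre04/taaa.h5
```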

Get features from a single audio file

If you do not want to store your features on disk as HDF5 files, you can use a FeaturesServer that includes a FeaturesExtractor to compute the acoustic parameters from an audio file and to select and post-process the features on the fly.

In this case, the FeaturesServer is created as follows:

server = sidekit.FeaturesServer(features_extractor=extractor,
                                feature_filename_structure=None,
                                sources=None,
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)

Note

The FeaturesExtractor must be created beforehand.

This FeaturesServer can then be used as follows:

features, label = server.load(show, channel=0, input_feature_filename=featureFileName, label=None, start=None, stop=None)

3.3 Get features from several files

Sometimes you might want to combine features coming from different files.

Get features from several HDF5 files

Using a FeaturesServer, it is possible to combine features coming from different HDF5 files. In the following example, we have two sets of feature files that have been saved in HDF5 format. Files from the first set have names following the pattern filename.h5, while file names from the second set follow the pattern filename_2.h5.

We are going to load energy from the first set and a few cepstral coefficients from the second set and combine them. The VAD labels will also be taken from the second set. For this purpose, we create two feature servers (one for each set) as follows:

fs_1 = sidekit.FeaturesServer(feature_filename_structure="{}.h5",
                              dataset_list=["energy"],
                              context=None)

fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)

As you can see, no post-processing is applied to the log-energy from the first file, while derivatives are added to the first 13 cepstral coefficients from the second set after applying RASTA filtering.

The last step consists in creating a third FeaturesServer that calls fs_1 and fs_2 and then combines the two types of features before applying post-processing to the complete features:

fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                            feat_norm="cmvn",
                            keep_all_features=False)

Energy from the first set is concatenated with the cepstral coefficients from the second set, together with their first and second derivatives. Finally, CMVN is applied to the complete features and only the selected frames are kept, based on the VAD labels from the second set. All this is done by calling:

feat, label = fs.load("taaa")

The resulting features are 40-dimensional feature frames (13 cepstral coefficients + 13 deltas + 13 delta-deltas + the log-energy).
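
The dimension count can be checked with a small sketch that builds one placeholder frame. The values are dummies and the layout of the coefficients inside the frame is an assumption made for illustration; only the total size is taken from the text.

```python
# One placeholder frame; values are dummies, only the sizes matter.
energy = [0.0]               # log-energy from the first source
cep = [0.0] * 13             # 13 cepstral coefficients kept by mask "[0-12]"
delta = [0.0] * 13           # first order derivatives of the cepstra
double_delta = [0.0] * 13    # second order derivatives of the cepstra

# Assumed layout: features of the first source, then those of the second.
frame = energy + cep + delta + double_delta
print(len(frame))  # 40
```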

Get features from one audio file and one HDF5 file

You can use more complex combinations by concatenating features extracted on-line from one audio file and features from an HDF5 feature file. We’ll get the same features as in the previous example except that the energy from set 1 is directly computed from the audio file. Cepstral coefficients are still taken from the already extracted features.

We first create a FeaturesExtractor to process the audio file and the associated FeaturesServer that will manage the FeaturesExtractor:

extractor = sidekit.FeaturesExtractor(audio_filename_structure="{}.wav",
                                      sampling_frequency=8000,
                                      lower_frequency=0,
                                      higher_frequency=4000,
                                      filter_bank="log",
                                      filter_bank_size=40,
                                      window_size=0.025,
                                      shift=0.01,
                                      ceps_number=20,
                                      vad="snr",
                                      snr=40,
                                      pre_emphasis=0.97,
                                      save_param=["energy"],
                                      keep_all_features=True)

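# The VAD settings (vad="snr", snr=40) are carried by the FeaturesExtractor;
# they are not among the FeaturesServer options listed in section 3.1.
fs_1 = sidekit.FeaturesServer(features_extractor=extractor,
                              feature_filename_structure=None,
                              sources=None,
                              dataset_list=["energy"],
                              keep_all_features=True)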

Then, we create a second FeaturesServer that will load cepstral coefficients from the second set of feature files and perform some post processing:

fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)

We now combine the two FeaturesServer objects in a third one and perform CMVN:

fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                            feat_norm="cmvn",
                            keep_all_features=False)

The resulting features are obtained by:

feat, label = fs.load("taab")