3. The FeaturesServer object
============================

The `FeaturesServer` loads one or several datasets from one or several HDF5 files
and post-processes the features (normalization, addition of the temporal context,
RASTA filtering, feature selection...).
The `FeaturesServer` can also encapsulate one or several `FeaturesExtractor`
in order to take audio files as inputs.

The `FeaturesServer` is used to feed all other objects in **SIDEKIT**.

3.1 Options of the `FeaturesServer`
-----------------------------------

+----------------------------+------------------------------------+------------------------------------------------------------------------+
| Option                     | Value (default is bold)            | Description                                                            |
+============================+====================================+========================================================================+
| features_extractor         | **None**, a FeaturesExtractor      | `FeaturesExtractor` that is used to process audio files                |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| feature_filename_structure | **{}**, a string                   | Structure of the output feature filename. When all output file names   |
|                            |                                    | share the same structure and differ only by an identifier, the         |
|                            |                                    | filename is completed at run time by inserting the identifier into     |
|                            |                                    | the filename structure (see examples below).                           |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| sources                    | **None**, tuple of tuples          | Tuple of sources to load features from different files (optional:      |
|                            |                                    | for the case where datasets are loaded from several files and          |
|                            |                                    | concatenated). Each tuple includes two values, a `FeaturesServer`      |
|                            |                                    | and a boolean. If True, VAD labels are loaded from this source.        |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| dataset_list               | **None**, list of features to load | List of datasets to load, of the form                                  |
|                            |                                    | '["cep", "fb", "vad", "energy", "bnf"]' (only when loading             |
|                            |                                    | datasets from a single file).                                          |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| mask                       | **None**, a string                 | String of the form '[1-3,10,15-20]': mask to apply on the              |
|                            |                                    | concatenated dataset to select specific components. In this            |
|                            |                                    | example, coefficients 1, 2, 3, 10, 15, 16, 17, 18, 19 and 20 are       |
|                            |                                    | kept.                                                                  |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| feat_norm                  | **None**, "cmvn", "cms", "stg"     | Type of normalization to apply as post-processing                      |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| global_cmvn                | **None**, boolean                  | If True, use a global mean and std when normalizing the features       |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| dct_pca                    | **False**, boolean                 | If True, add temporal context by using a PCA-DCT approach              |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| dct_pca_config             | **(12, 12, None)**                 | Configuration of the PCA-DCT                                           |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| sdc                        | **False**, boolean                 | If True, compute shifted delta cepstra coefficients                    |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| sdc_config                 | **(1,3,7)**                        | Configuration to compute SDC coefficients                              |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| delta                      | **False**, boolean                 | If True, append the first order derivative                             |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| double_delta               | **False**, boolean                 | If True, append the second order derivative                            |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| context                    | **(0,0)**                          | Add a left and right context                                           |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| traps_dct_nb               | **0**, integer                     | Number of DCT coefficients to keep when computing TRAP coefficients    |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| rasta                      | **False**, boolean                 | If True, perform RASTA filtering                                       |
+----------------------------+------------------------------------+------------------------------------------------------------------------+
| keep_all_features          | **True**, boolean                  | If True, keep all features; if False, keep frames according to the     |
|                            |                                    | VAD labels                                                             |
+----------------------------+------------------------------------+------------------------------------------------------------------------+

3.2 Get features from a single file
-----------------------------------

The simplest case is to use a `FeaturesServer` to load and process features from a single file.
Natively, a `FeaturesServer` is designed to take HDF5 files as input, but you can provide it with
its own `FeaturesExtractor` in order to extract features from an audio file and apply some
post-processing to them.

Get features from a single HDF5 file
************************************

The simplest case is to use a `FeaturesServer` to load and post-process acoustic features
from an HDF5 file. Such a `FeaturesServer` is instantiated as follows::

    server = sidekit.FeaturesServer(features_extractor=None,
                                    feature_filename_structure="feat/sre04/{}.h5",
                                    sources=None,
                                    dataset_list=["energy", "cep", "vad"],
                                    mask="[0-12]",
                                    feat_norm="cmvn",
                                    global_cmvn=None,
                                    dct_pca=False,
                                    dct_pca_config=None,
                                    sdc=False,
                                    sdc_config=None,
                                    delta=True,
                                    double_delta=True,
                                    delta_filter=None,
                                    context=None,
                                    traps_dct_nb=None,
                                    rasta=True,
                                    keep_all_features=True)

In this example, the `FeaturesServer` is used to load and concatenate cepstral coefficients
and log-energy from a single HDF5 file. The selected features (which can be: energy, cep, fb and bnf)
are concatenated in a predefined order, so the order of the list given as a parameter does not matter.
This order is: energy, cep, fb and bnf.
Once these features are loaded, only the first 13 coefficients are retained and post-processed
(from index 0 to 12 included, as given by the mask parameter).
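
To make the mask notation concrete, the following minimal sketch expands a mask string such as
`"[0-12]"` into the list of selected component indices. The helper `expand_mask` is hypothetical
and written for illustration only; it is not part of SIDEKIT and may differ from the way the
`FeaturesServer` parses its mask internally.

```python
def expand_mask(mask):
    """Expand a mask string such as "[1-3,10,15-20]" into a sorted list of indices.

    Hypothetical helper for illustration; not SIDEKIT's own parser.
    """
    indices = set()
    # Drop the surrounding brackets, then handle each comma-separated item.
    for part in mask.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(bound) for bound in part.split("-"))
            indices.update(range(first, last + 1))  # both bounds are included
        else:
            indices.add(int(part))
    return sorted(indices)


print(expand_mask("[0-12]"))          # indices 0 to 12 included
print(expand_mask("[1-3,10,15-20]"))  # the example given in the options table
```

With the mask `"[0-12]"` used above, the expansion contains 13 indices, which matches the
13 coefficients retained by the example server.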

The post-processing can include the following steps, in this order:

   - RASTA filtering
   - addition of the temporal context: first and second derivatives, DCT-PCA or Shifted Delta Cepstra
   - normalization of the features using either Cepstral Mean Variance Normalization (cmvn),
     Cepstral Mean Subtraction (cms) or Short Term Gaussianization (stg)
   - frame selection according to the VAD labels, which are loaded if "vad" is included in
     the dataset_list. If "vad" is not in the dataset_list, then all frames are kept.

This `FeaturesServer` is then used by calling its `load` method, whose signature is::

    load(self, show, channel=0, input_feature_filename=None, label=None, start=None, stop=None)

Get features from a single audio file
*************************************

If you don't want to store your features on disk as an HDF5 file, you can use a `FeaturesServer`
that includes a `FeaturesExtractor` in order to compute the acoustic parameters from an audio file
and to select and post-process the features on the fly.
In this case, the `FeaturesServer` is created as follows::

    server = sidekit.FeaturesServer(features_extractor=extractor,
                                    feature_filename_structure=None,
                                    sources=None,
                                    dataset_list=["energy", "cep", "vad"],
                                    mask="[0-12]",
                                    feat_norm="cmvn",
                                    global_cmvn=None,
                                    dct_pca=False,
                                    dct_pca_config=None,
                                    sdc=False,
                                    sdc_config=None,
                                    delta=True,
                                    double_delta=True,
                                    delta_filter=None,
                                    context=None,
                                    traps_dct_nb=None,
                                    rasta=True,
                                    keep_all_features=True)

.. note:: The `FeaturesExtractor` has to be created beforehand.

This `FeaturesServer` can then be used as follows::

    features, label = server.load(show,
                                  channel=0,
                                  input_feature_filename=featureFileName,
                                  label=None,
                                  start=None,
                                  stop=None)

3.3 Get features from several files
-----------------------------------

Sometimes you might want to combine features coming from different files.

Get features from several HDF5 files
************************************

Using a `FeaturesServer`, it is possible to combine features coming from different HDF5 files.

In the following example, we have two sets of feature files which have been saved in HDF5 format.
Files from the first set have a name with the pattern `filename.h5`, while file names
from the second set follow the pattern `filename_2.h5`.

We are going to load energy from the first set and a few cepstral coefficients from the second set
and combine them. The VAD labels will also be taken from the second set.

For this purpose, we create two feature servers (one for each set) as follows::

    fs_1 = sidekit.FeaturesServer(feature_filename_structure="{}.h5",
                                  dataset_list=["energy"],
                                  context=None)

    fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                                  dataset_list=["cep", "vad"],
                                  mask="[0-12]",
                                  delta=True,
                                  double_delta=True,
                                  rasta=True)

As you can see, no post-processing is applied to the log-energy from the first file, while
derivatives are added to the first 13 cepstral coefficients from the second set after applying RASTA filtering.

The last step consists in creating a third `FeaturesServer` that will call `fs_1` and `fs_2`
and then combine the two types of features before applying post-processing to the complete features::

    fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                                feat_norm="cmvn",
                                keep_all_features=False)

Energy from the first set is concatenated with the cepstral coefficients from the second set,
together with their first and second derivatives. Finally, CMVN is applied to the entire
features and only the frames selected by the VAD labels from the second set are kept.

All this is done by calling::

    feat, label = fs.load("taaa")

The resulting features are 40-dimensional feature frames (13 cepstral coefficients + 13 deltas + 13 delta-deltas
and the log-energy).

.. note:: The label vector only contains True values, as all remaining frames are the speech frames
   selected according to the VAD labels.

.. note:: It is possible to combine as many features as we want from as many files as we like.

Get features from one audio file and one HDF5 file
**************************************************

You can use more complex combinations by concatenating features extracted on-line from one audio file
with features from an HDF5 feature file.

We'll get the same features as in the previous example, except that the energy from set 1
is computed directly from the audio file. Cepstral coefficients are still taken from the
already extracted features.

We first create a `FeaturesExtractor` to process the audio file and the associated `FeaturesServer`
that will manage the `FeaturesExtractor`::

    extractor = sidekit.FeaturesExtractor(audio_filename_structure="{}.wav",
                                          sampling_frequency=8000,
                                          lower_frequency=0,
                                          higher_frequency=4000,
                                          filter_bank="log",
                                          filter_bank_size=40,
                                          window_size=0.025,
                                          shift=0.01,
                                          ceps_number=20,
                                          vad="snr",
                                          snr=40,
                                          pre_emphasis=0.97,
                                          save_param=["energy"],
                                          keep_all_features=True)

    # VAD settings (vad, snr) are carried by the extractor defined above
    fs_1 = sidekit.FeaturesServer(features_extractor=extractor,
                                  feature_filename_structure=None,
                                  sources=None,
                                  dataset_list=["energy"],
                                  keep_all_features=True)

Then, we create a second `FeaturesServer` that will load cepstral coefficients from the second set
of feature files and perform some post-processing::

    fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                                  dataset_list=["cep", "vad"],
                                  mask="[0-12]",
                                  delta=True,
                                  double_delta=True,
                                  rasta=True)

We now combine the two `FeaturesServer` objects in a third one and perform CMVN::

    fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                                feat_norm="cmvn",
                                keep_all_features=False)

The resulting features are obtained by::

    feat, label = fs.load("taab")
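
As a rough picture of what the combining `FeaturesServer` does in both examples of this section,
the sketch below concatenates two per-frame feature streams, applies mean and variance
normalization to each coefficient, and keeps only the frames flagged by the VAD labels.
It uses plain Python lists and toy values. The function `combine_streams` and its arguments are
hypothetical and do not reproduce SIDEKIT's implementation; in particular, the normalization
statistics are computed here over all frames for simplicity, which is an assumption.

```python
import math


def combine_streams(energy, cepstra, vad):
    """Concatenate two feature streams frame by frame, apply CMVN, keep VAD frames.

    Hypothetical illustration, not SIDEKIT code.
    energy  : list of frames, each a list with 1 value (log-energy)
    cepstra : list of frames, each a list of 39 values (13 cep + deltas + delta-deltas)
    vad     : list of booleans, one per frame
    """
    # 1. Frame-wise concatenation of the two streams (energy first).
    frames = [e + c for e, c in zip(energy, cepstra)]
    n_frames, dim = len(frames), len(frames[0])

    # 2. CMVN: zero mean and unit variance for each coefficient
    #    (assumption: statistics computed over all frames).
    means = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames) or 1.0
        for d in range(dim)
    ]
    normalized = [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]

    # 3. Frame selection: keep only the frames labelled as speech.
    kept = [f for f, is_speech in zip(normalized, vad) if is_speech]
    labels = [True] * len(kept)
    return kept, labels


# Toy input: 4 frames, 1 energy value and 39 cepstral values per frame.
energy = [[float(t)] for t in range(4)]
cepstra = [[float(t * (d + 1)) for d in range(39)] for t in range(4)]
vad = [True, False, True, True]

feat, label = combine_streams(energy, cepstra, vad)
print(len(feat), len(feat[0]))  # number of kept frames, feature dimension
```

Each kept frame has 1 + 39 = 40 values, the same dimension as the features returned by `fs.load`,
and the returned labels are all True because the non-speech frames have been discarded.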