3. The FeaturesServer object
The FeaturesServer loads one or several datasets from one or several HDF5 files and post-processes the features (normalization, addition of the temporal context, RASTA filtering, feature selection…).
The FeaturesServer can also encapsulate one or several FeaturesExtractor objects in order to take audio files as inputs.
The FeaturesServer is used to feed all other objects in SIDEKIT.
3.1 Options of the FeaturesServer
| Option | Value (default is bold) | Description |
|---|---|---|
| features_extractor | **None**, a FeaturesExtractor | FeaturesExtractor that is used to process audio files |
| feature_filename_structure | **{}**, a string | Structure of the output feature filename. When all output file names share the same structure and differ only by an identifier, the filename is completed at run time by inserting the identifier into the filename structure (see examples below). |
| sources | **None**, tuple of tuples | Tuple of sources used to load features from different files (optional: for the case where datasets are loaded from several files and concatenated). Each tuple includes two values, a FeaturesServer and a boolean. If True, VAD labels are loaded from this source. |
| dataset_list | **None**, list of features to load | List of datasets to load: string of the form '["cep", "fb", "vad", "energy", "bnf"]' (only when loading datasets from a single file). |
| mask | **None** | String of the form '[1-3,10,15-20]': mask to apply on the concatenated dataset to select specific components. In this example, coefficients 1, 2, 3, 10, 15, 16, 17, 18, 19 and 20 are kept. |
| feat_norm | **None**, "cmvn", "cms", "stg" | Type of normalization to apply as post-processing |
| global_cmvn | **None**, boolean | If True, use a global mean and std when normalizing the features |
| dct_pca | **False**, boolean | If True, add temporal context by using a PCA-DCT approach |
| dct_pca_config | **(12, 12, None)** | Configuration of the PCA-DCT |
| sdc | **False**, boolean | If True, compute shifted delta cepstra (SDC) coefficients |
| sdc_config | **(1, 3, 7)** | Configuration to compute SDC coefficients |
| delta | **False**, boolean | If True, append the first order derivative |
| double_delta | **False**, boolean | If True, append the second order derivative |
| context | **(0, 0)** | Add a left and right context |
| traps_dct_nb | **0**, integer | Number of DCT coefficients to keep when computing TRAP coefficients |
| rasta | **False**, boolean | If True, perform RASTA filtering |
| keep_all_features | **True**, boolean | If True, keep all features; if False, keep frames according to the VAD labels |
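The way feature_filename_structure is completed at run time can be pictured with plain Python string formatting. The sketch below is only an illustration, assuming the `{}` placeholder is filled with the show identifier in the same way as `str.format`; the list `shows` and the variable names are hypothetical, not part of the SIDEKIT API.

```python
# Illustration only: the "{}" placeholder of the filename structure is
# replaced by the identifier (show) of each file to load.
feature_filename_structure = "feat/sre04/{}.h5"

# Hypothetical list of show identifiers
shows = ["taaa", "taab"]

feature_filenames = [feature_filename_structure.format(show) for show in shows]
print(feature_filenames)
```

A single FeaturesServer can therefore serve any number of files, as long as their names differ only by the identifier.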
3.2 Get features from a single file
The simplest case is to use a FeaturesServer to load and process features from a single file. Natively, a FeaturesServer is designed to take HDF5 files as input, but you can provide it with its own FeaturesExtractor in order to extract features from an audio file and apply some post-processing to them.
Get features from a single HDF5 file
Here, a FeaturesServer loads and post-processes acoustic features from an HDF5 file. Such a FeaturesServer is instantiated as follows:
import sidekit
# FeaturesServer reading features from HDF5 files named feat/sre04/<show>.h5
server = sidekit.FeaturesServer(features_extractor=None,
                                feature_filename_structure="feat/sre04/{}.h5",
                                sources=None,
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)
In this example, the FeaturesServer is used to load and concatenate cepstral coefficients and log-energy from a single HDF5 file. The selected features (which can be energy, cep, fb and bnf) are concatenated in a predefined order, so the order of the list given as a parameter does not matter. This order is: energy, cep, fb, bnf. Once these features are loaded, only the first 13 coefficients are retained and post-processed (indices 0 to 12 inclusive, as given by the mask parameter).
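To make the mask syntax concrete, here is a minimal sketch of how a mask string could be expanded into the list of retained indices. The helpers `parse_mask` and `apply_mask` are hypothetical and written for illustration; they are not SIDEKIT functions, and the actual implementation may differ. Ranges are treated as inclusive, as in the description of the mask option.

```python
def parse_mask(mask):
    """Expand a mask string such as "[1-3,10,15-20]" into a sorted list of indices.

    Hypothetical helper (not part of SIDEKIT); ranges are inclusive.
    """
    indices = set()
    for part in mask.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(bound) for bound in part.split("-"))
            indices.update(range(first, last + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


def apply_mask(frame, mask):
    """Keep only the coefficients of one feature frame selected by the mask."""
    return [frame[i] for i in parse_mask(mask)]


print(parse_mask("[0-12]"))
print(parse_mask("[1-3,10,15-20]"))
```

With the mask "[0-12]" used above, `apply_mask` keeps the first 13 components of each concatenated frame.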
The post-processing can include the following steps, in this order:
1. RASTA filtering
2. addition of the temporal context: first and second derivatives, DCT-PCA or Shifted Delta Cepstra
3. normalization of the features using either Cepstral Mean Variance Normalization (cmvn), Cepstral Mean Subtraction (cms) or Short Term Gaussianization (stg)
4. frame selection according to the VAD labels, which are loaded if “vad” is included in the dataset_list. If “vad” is not in the dataset_list, then all frames are kept
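The last two steps can be sketched in plain Python on a toy example. This is a simplified illustration and not SIDEKIT's implementation: the functions `cmvn` and `select_frames` are hypothetical, the data are made up, and the statistics are computed here over all frames, which is an assumption that may not match what the library does.

```python
from statistics import fmean, pstdev


def cmvn(frames):
    """Normalize each coefficient to zero mean and unit variance over all frames."""
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    # Guard against constant coefficients to avoid dividing by zero
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def select_frames(frames, vad_labels):
    """Keep only the frames whose VAD label is True."""
    return [frame for frame, keep in zip(frames, vad_labels) if keep]


# Hypothetical toy data: 4 frames of 2 coefficients and their VAD labels
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
vad_labels = [True, False, True, True]

normalized = cmvn(frames)                              # step 3: normalization
speech_frames = select_frames(normalized, vad_labels)  # step 4: frame selection
print(len(speech_frames))
```

Normalizing before selecting frames follows the order of the list above; with keep_all_features=True the selection step would simply be skipped.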
This FeaturesServer is then used as follows, where show is the identifier that completes feature_filename_structure:
features, label = server.load(show, channel=0, input_feature_filename=None, label=None, start=None, stop=None)
Get features from a single audio file
If you don’t want to store your features on disk as an HDF5 file, you can use a FeaturesServer that includes a FeaturesExtractor in order to compute the acoustic parameters from an audio file and to select and post-process the features on the fly.
In this case, the FeaturesServer is created as follows:
server = sidekit.FeaturesServer(features_extractor=extractor,
                                feature_filename_structure=None,
                                sources=None,
                                dataset_list=["energy", "cep", "vad"],
                                mask="[0-12]",
                                feat_norm="cmvn",
                                global_cmvn=None,
                                dct_pca=False,
                                dct_pca_config=None,
                                sdc=False,
                                sdc_config=None,
                                delta=True,
                                double_delta=True,
                                delta_filter=None,
                                context=None,
                                traps_dct_nb=None,
                                rasta=True,
                                keep_all_features=True)
Note
The FeaturesExtractor must be created beforehand.
This FeaturesServer can then be used as follows:
features, label = server.load(show, channel=0, input_feature_filename=featureFileName, label=None, start=None, stop=None)
3.3 Get features from several files
Sometimes you might want to combine features coming from different files.
Get features from several HDF5 files
Using a FeaturesServer, it is possible to combine features coming from different HDF5 files. In the following example, we have two sets of feature files which have been saved in HDF5 format. Files from the first set have a name with the pattern filename.h5, while file names from the second set follow the pattern filename_2.h5.
We are going to load energy from the first set and a few cepstral coefficients from the second set to combine them. The VAD labels will also be taken from the second set. For this purpose, we create two feature servers (one for each set) as follows:
fs_1 = sidekit.FeaturesServer(feature_filename_structure="{}.h5",
                              dataset_list=["energy"],
                              context=None)
fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)
As you can see, no post-processing is applied to the log-energy from the first file, while derivatives are added to the first 13 cepstral coefficients from the second set after applying RASTA filtering.
The last step consists in creating a third FeaturesServer that calls fs_1 and fs_2 and combines the two types of features before applying post-processing to the complete features:
fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                            feat_norm="cmvn",
                            keep_all_features=False)
Energy from the first set is concatenated to the cepstral coefficients from the second set together with their first and second derivatives. Finally, CMVN is applied to the entire feature vectors and only the selected frames are kept, based on the VAD labels from the second set. All this is done by calling:
feat, label = fs.load("taaa")
The resulting features are 40-dimensional feature frames (the log-energy + 13 cepstral coefficients + 13 deltas + 13 delta-deltas).
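The dimension count can be checked with a small stand-alone sketch that mimics the concatenation on toy data. Everything here is hypothetical and for illustration only: `simple_delta` is a basic two-point difference used as a stand-in (it is not SIDEKIT's delta filter), `concatenate` is a made-up helper, and the feature values are arbitrary.

```python
def simple_delta(frames):
    """First-order difference between neighbouring frames (edges are replicated).

    Simplified stand-in for illustration only; not SIDEKIT's delta filter.
    """
    padded = [frames[0]] + frames + [frames[-1]]
    return [
        [(nxt - prv) / 2.0 for prv, nxt in zip(padded[t], padded[t + 2])]
        for t in range(len(frames))
    ]


def concatenate(*streams):
    """Concatenate several feature streams frame by frame."""
    return [sum(parts, []) for parts in zip(*streams)]


# Hypothetical toy data: 5 frames with 1 log-energy value and 13 cepstral coefficients
n_frames = 5
energy = [[float(t)] for t in range(n_frames)]
cep = [[float(t * c) for c in range(13)] for t in range(n_frames)]

delta = simple_delta(cep)           # 13 first-order derivatives
double_delta = simple_delta(delta)  # 13 second-order derivatives

# Energy comes first, following the predefined order energy, cep, fb, bnf
features = concatenate(energy, cep, delta, double_delta)
print(len(features), len(features[0]))  # 5 40
```

Each frame ends up with 1 + 13 + 13 + 13 = 40 components, matching the dimension stated above.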
Get features from one audio file and one HDF5 file
You can use more complex combinations by concatenating features extracted on-line from one audio file and features from an HDF5 feature file. We’ll get the same features as in the previous example except that the energy from set 1 is directly computed from the audio file. Cepstral coefficients are still taken from the already extracted features.
We first create a FeaturesExtractor to process the audio file and the associated FeaturesServer that will manage the FeaturesExtractor:
extractor = sidekit.FeaturesExtractor(audio_filename_structure="{}.wav",
                                      sampling_frequency=8000,
                                      lower_frequency=0,
                                      higher_frequency=4000,
                                      filter_bank="log",
                                      filter_bank_size=40,
                                      window_size=0.025,
                                      shift=0.01,
                                      ceps_number=20,
                                      vad="snr",
                                      snr=40,
                                      pre_emphasis=0.97,
                                      save_param=["energy"],
                                      keep_all_features=True)
fs_1 = sidekit.FeaturesServer(features_extractor=extractor,
                              feature_filename_structure=None,
                              sources=None,
                              dataset_list=["energy"],
                              keep_all_features=True)
Then, we create a second FeaturesServer that will load cepstral coefficients from the second set of feature files and perform some post processing:
fs_2 = sidekit.FeaturesServer(feature_filename_structure="{}_2.h5",
                              dataset_list=["cep", "vad"],
                              mask="[0-12]",
                              delta=True,
                              double_delta=True,
                              rasta=True)
We now combine the two FeaturesServer objects in a third one and perform CMVN:
fs = sidekit.FeaturesServer(sources=((fs_1, False), (fs_2, True)),
                            feat_norm="cmvn",
                            keep_all_features=False)
The resulting features are obtained by:
feat, label = fs.load("taab")