xsets
Copyright 2014-2020 Anthony Larcher
The authors would like to thank the BUT Speech@FIT group (http://speech.fit.vutbr.cz) and Lukas BURGET for sharing the source code that strongly inspired this module. Thank you for your valuable contribution.
-
class nnet.xsets.CMVN
Apply cepstral mean and variance normalization (CMVN) to the features in a sample.
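CMVN conventionally stands for cepstral mean and variance normalization: each feature coefficient is shifted to zero mean and scaled to unit variance over the frames of a segment. The following is a minimal NumPy sketch of that idea, not the class's actual code. The function name `cmvn_sketch`, the `(n_frames, n_coeffs)` layout and the `eps` guard are assumptions made for illustration.

```python
import numpy as np


def cmvn_sketch(features, eps=1e-8):
    """Normalize every coefficient to zero mean and unit variance over time.

    features: array of shape (n_frames, n_coeffs); this layout is an assumption.
    eps: small constant (hypothetical) that avoids division by zero.
    """
    features = np.asarray(features, dtype=np.float64)
    mean = features.mean(axis=0, keepdims=True)  # per-coefficient mean
    std = features.std(axis=0, keepdims=True)    # per-coefficient standard deviation
    return (features - mean) / (std + eps)
```

Statistics are computed per segment here; whether the library normalizes per segment or over a sliding window is not stated in this page.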
-
class nnet.xsets.FrequencyMask(max_size, feature_size)
Randomly mask a band of frequency bins in the features of a sample.
- Args:
- max_size: Maximum width of the masked band, in frequency bins.
- feature_size: Size of the frequency (feature) dimension.
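Frequency masking, in the sense used for spectrogram augmentation, overwrites a randomly placed band of adjacent frequency bins with a constant. The sketch below illustrates the technique under stated assumptions and is not the class's implementation: the function name `frequency_mask_sketch`, the `(n_bins, n_frames)` layout, the `rng` argument and the fill value of `0.0` are all hypothetical.

```python
import numpy as np


def frequency_mask_sketch(features, max_size, rng=None, fill_value=0.0):
    """Return a copy of `features` with one random band of frequency bins masked.

    features: array of shape (n_bins, n_frames); this layout is an assumption.
    max_size: largest allowed band width, in bins.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_bins = features.shape[0]
    if not 1 <= max_size <= n_bins:
        raise ValueError("max_size must lie between 1 and the number of bins")
    width = int(rng.integers(1, max_size + 1))        # band width in 1..max_size
    start = int(rng.integers(0, n_bins - width + 1))  # first masked bin
    masked = features.copy()                          # leave the input untouched
    masked[start:start + width, :] = fill_value
    return masked
```

Returning a copy keeps the original features available, which is convenient when the same segment is augmented several times.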
-
class nnet.xsets.IdMapSet(idmap_name, data_root_path, file_extension)
Dataset that provides data according to a sidekit.IdMap object.
-
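class nnet.xsets.MFCC(lowfreq=133.333, maxfreq=6855.4976, nlinfilt=0, nlogfilt=40, win_time=0.025, fs=16000, nceps=30, shift=0.01, prefac=0.97)
Compute MFCC on the segment.
- Args:
- lowfreq (float): Lower frequency limit of the filter bank, in Hz.
- maxfreq (float): Upper frequency limit of the filter bank, in Hz.
- nlinfilt (int): Number of linearly spaced filters.
- nlogfilt (int): Number of logarithmically spaced filters.
- win_time (float): Length of the analysis window, in seconds.
- fs (int): Sampling frequency, in Hz.
- nceps (int): Number of cepstral coefficients to compute.
- shift (float): Shift between successive windows, in seconds.
- prefac (float): Pre-emphasis coefficient.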
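The defaults `win_time=0.025`, `shift=0.01` and `fs=16000` in the signature above correspond to a 400-sample window advanced by 160 samples, assuming both durations are expressed in seconds. The helper below is a hypothetical sketch of that arithmetic (the name `frame_layout` is not part of the library), and it counts only complete windows; the library's own padding and edge handling may differ.

```python
def frame_layout(n_samples, fs=16000, win_time=0.025, shift=0.01):
    """Return (window length, hop length, number of complete frames).

    Assumes win_time and shift are durations in seconds and that
    incomplete trailing windows are dropped.
    """
    win_len = int(round(win_time * fs))  # samples per analysis window
    hop = int(round(shift * fs))         # samples between window starts
    if n_samples < win_len:
        return win_len, hop, 0
    n_frames = 1 + (n_samples - win_len) // hop
    return win_len, hop, n_frames
```

Under these assumptions, one second of 16 kHz audio gives `frame_layout(16000) == (400, 160, 98)`.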
-
class nnet.xsets.PreEmphasis(pre_emp_value=0.97)
Perform pre-emphasis filtering on an audio segment.
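Pre-emphasis is usually the first-order high-pass filter y[n] = x[n] - a * x[n-1], with `a` playing the role of `pre_emp_value`. The following is a minimal NumPy sketch of that filter, not the class's actual code. The function name `pre_emphasis_sketch` is hypothetical, and leaving the first sample unchanged is an assumption about edge handling.

```python
import numpy as np


def pre_emphasis_sketch(signal, pre_emp_value=0.97):
    """Apply y[n] = x[n] - pre_emp_value * x[n-1] to a 1-D signal.

    The first sample has no predecessor and is copied as is (an assumption).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return signal.copy()
    out = np.empty_like(signal)
    out[0] = signal[0]
    out[1:] = signal[1:] - pre_emp_value * signal[:-1]
    return out
```

A constant signal is almost entirely suppressed after the first sample, which reflects the high-pass nature of the filter.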
-
class nnet.xsets.SideSet(data_set_yaml, set_type='train', chunk_per_segment=1, overlap=0.0, dataset_df=None)
-
class nnet.xsets.StatDataset(idmap, fs_param)
Object that initializes a Dataset from a sidekit.IdMap.
-
class nnet.xsets.TemporalMask(max_size)
Randomly mask a span of consecutive frames in the features of a sample.
- Args:
- max_size: Maximum length of the masked span, in frames.
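Temporal masking is the time-axis counterpart of frequency masking: a randomly placed run of consecutive frames is overwritten with a constant. The sketch below assumes features laid out as `(n_bins, n_frames)` and a fill value of `0.0`; the name `temporal_mask_sketch` and the `rng` argument are hypothetical, and the class may behave differently.

```python
import numpy as np


def temporal_mask_sketch(features, max_size, rng=None, fill_value=0.0):
    """Return a copy of `features` with one random span of frames masked.

    features: array of shape (n_bins, n_frames); this layout is an assumption.
    max_size: largest allowed span length, in frames.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_frames = features.shape[1]
    if not 1 <= max_size <= n_frames:
        raise ValueError("max_size must lie between 1 and the number of frames")
    length = int(rng.integers(1, max_size + 1))          # span length in 1..max_size
    start = int(rng.integers(0, n_frames - length + 1))  # first masked frame
    masked = features.copy()
    masked[:, start:start + length] = fill_value
    return masked
```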
-
class nnet.xsets.VoxDataset(segment_df, speaker_dict, duration=500, transform=None, spec_aug_ratio=0.5, temp_aug_ratio=0.5)
-
class nnet.xsets.XvectorDataset(batch_list, batch_path)
Object that reads a list of files from a file and initializes a Dataset.