xsets
Copyright 2014-2020 Anthony Larcher
The authors would like to thank the BUT Speech@FIT group (http://speech.fit.vutbr.cz) and Lukas BURGET for sharing the source code that strongly inspired this module. Thank you for your valuable contribution.
- class nnet.xsets.CMVN
  Apply cepstral mean and variance normalization (CMVN) to the features in a sample. The constructor takes no arguments.
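The name CMVN conventionally stands for cepstral mean and variance normalization. A minimal NumPy sketch of that technique is given here for orientation; it is not the code behind this class. The function name `cmvn_sketch`, the (frames, coefficients) layout, and the `eps` guard are assumptions made for the example.

```python
import numpy as np


def cmvn_sketch(features, eps=1e-8):
    """Normalize each coefficient to zero mean and unit variance over time.

    Assumption: `features` has shape (n_frames, n_coefficients).
    """
    features = np.asarray(features, dtype=np.float64)
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # eps avoids a division by zero when a coefficient is constant.
    return (features - mean) / (std + eps)


# Synthetic features standing in for 200 frames of 30 cepstral coefficients.
rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, scale=2.0, size=(200, 30))
normalized = cmvn_sketch(feats)
```

After normalization, each column of `normalized` has a mean close to 0 and a standard deviation close to 1.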
- class nnet.xsets.FrequencyMask(max_size, feature_size)
  Mask a randomly chosen band of frequency bins in a sample.
  Args:
    max_size (int): Maximum width of the masked band.
    feature_size (int): Size of the feature (frequency) dimension.
- class nnet.xsets.IdMapSet(idmap_name, data_root_path, file_extension)
  Dataset that provides data according to a sidekit.IdMap object.
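- class nnet.xsets.MFCC(lowfreq=133.333, maxfreq=6855.4976, nlinfilt=0, nlogfilt=40, win_time=0.025, fs=16000, nceps=30, shift=0.01, prefac=0.97)
  Compute MFCCs on the segment.
  Args:
    lowfreq (float): Lower edge of the filter bank, in Hz.
    maxfreq (float): Upper edge of the filter bank, in Hz.
    nlinfilt (int): Number of linearly spaced filters.
    nlogfilt (int): Number of logarithmically spaced filters.
    win_time (float): Analysis window length, in seconds.
    fs (int): Sampling frequency, in Hz.
    nceps (int): Number of cepstral coefficients to compute.
    shift (float): Shift between consecutive windows, in seconds.
    prefac (float): Pre-emphasis coefficient.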
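The default `win_time`, `shift`, and `fs` values in the MFCC signature can be converted to sample counts as sketched below. The helper `frame_layout` is hypothetical, and the frame count assumes no padding at the segment edges; the library may frame the signal differently.

```python
def frame_layout(n_samples, fs=16000, win_time=0.025, shift=0.01):
    """Return (window length, hop length, frame count) in samples.

    Assumption: frames are taken without padding, so only windows that
    fit entirely inside the segment are counted.
    """
    win_length = int(round(win_time * fs))
    hop_length = int(round(shift * fs))
    if n_samples < win_length:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - win_length) // hop_length
    return win_length, hop_length, n_frames


# One second of audio at 16 kHz with the default window and shift.
win_length, hop_length, n_frames = frame_layout(16000)
print(win_length, hop_length, n_frames)  # 400 160 98
```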
- class nnet.xsets.PreEmphasis(pre_emp_value=0.97)
  Perform pre-emphasis filtering on an audio segment.
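Pre-emphasis is commonly implemented as the first-order filter y[n] = x[n] - a * x[n-1], where a plays the role of `pre_emp_value`. The sketch below illustrates that filter under stated assumptions: the function name `pre_emphasis_sketch` is hypothetical, and leaving the first sample unchanged is one convention among several, not necessarily the one this class uses.

```python
import numpy as np


def pre_emphasis_sketch(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1], keeping y[0] = x[0]."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size == 0:
        return signal
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


# A constant signal is strongly attenuated after the first sample,
# which shows the high-pass character of the filter.
constant = np.ones(4)
emphasized = pre_emphasis_sketch(constant)
```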
- class nnet.xsets.SideSet(data_set_yaml, set_type='train', chunk_per_segment=1, overlap=0.0, dataset_df=None)
- class nnet.xsets.StatDataset(idmap, fs_param)
  Object that initializes a Dataset from a sidekit.IdMap.
- class nnet.xsets.TemporalMask(max_size)
  Mask a randomly chosen span of consecutive frames in a sample.
  Args:
    max_size (int): Maximum length of the masked span, in frames.
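FrequencyMask and TemporalMask correspond to the two masking operations used in spectrogram augmentation: one blanks a band along the feature axis, the other a span along the time axis. The sketch below illustrates the general idea only. The function `mask_band_sketch`, the (features, frames) layout, the fill value of 0, and the way the band position and width are sampled are all assumptions, not the behaviour of these classes.

```python
import numpy as np


def mask_band_sketch(features, max_size, axis, rng, fill_value=0.0):
    """Overwrite a random contiguous band along `axis` with `fill_value`.

    Returns the masked copy together with the band start and width.
    """
    masked = np.array(features, dtype=np.float64, copy=True)
    length = masked.shape[axis]
    # Band width drawn uniformly from 1 to max_size (capped by the axis length).
    size = int(rng.integers(1, min(max_size, length) + 1))
    start = int(rng.integers(0, length - size + 1))
    index = [slice(None)] * masked.ndim
    index[axis] = slice(start, start + size)
    masked[tuple(index)] = fill_value
    return masked, start, size


rng = np.random.default_rng(0)
spec = np.ones((30, 100))  # assumed layout: (features, frames)

# Frequency masking acts on axis 0, temporal masking on axis 1.
freq_masked, f0, f_size = mask_band_sketch(spec, max_size=5, axis=0, rng=rng)
time_masked, t0, t_size = mask_band_sketch(spec, max_size=10, axis=1, rng=rng)
```

Working on a copy keeps the input array intact, which matters when the same segment is reused across epochs with different random masks.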
- class nnet.xsets.VoxDataset(segment_df, speaker_dict, duration=500, transform=None, spec_aug_ratio=0.5, temp_aug_ratio=0.5)
- class nnet.xsets.XvectorDataset(batch_list, batch_path)
  Object that takes a list of files from a file and initializes a Dataset.