xsets

Copyright 2014-2020 Anthony Larcher

The authors would like to thank the BUT Speech@FIT group (http://speech.fit.vutbr.cz) and Lukas BURGET for sharing the source code that strongly inspired this module. Thank you for your valuable contribution.

class nnet.xsets.CMVN

Apply cepstral mean and variance normalization (CMVN) to the features of a sample.

No constructor arguments are documented.
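As an illustration, and assuming CMVN here stands for the usual cepstral mean and variance normalization, the operation can be sketched with NumPy as follows. The function name `cmvn`, the `eps` guard and the `(n_frames, n_coeffs)` layout are assumptions made for this example, not the class's actual interface.

```python
import numpy as np


def cmvn(features, eps=1e-8):
    """Normalize each coefficient (column) to zero mean and unit variance.

    `features` is assumed to be a 2-D array of shape (n_frames, n_coeffs).
    `eps` is a hypothetical guard against division by zero.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)


# Synthetic features: 200 frames of 30 coefficients
rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, scale=2.0, size=(200, 30))
normed = cmvn(feats)
```

After normalization, every column of `normed` has a mean close to 0 and a standard deviation close to 1.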

class nnet.xsets.FrequencyMask(max_size, feature_size)

Mask a randomly chosen band along the feature (frequency) axis of a sample.

Args:
max_size (int): Presumably the maximum width of the masked band.
feature_size (int): Presumably the size of the feature axis.
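A minimal sketch of frequency masking, assuming the class masks a random band of feature rows in the style of SpecAugment. The function name, the `(feature_size, n_frames)` layout, the `fill_value` parameter and the exact sampling ranges are assumptions for illustration and may differ from the library's behavior.

```python
import numpy as np


def frequency_mask(spec, max_size, rng, fill_value=0.0):
    """Mask a random band of at most `max_size` consecutive feature rows.

    `spec` is assumed to have shape (feature_size, n_frames), with
    max_size <= feature_size. Returns a masked copy plus the band
    start and width so the caller can inspect what was masked.
    """
    feature_size = spec.shape[0]
    size = int(rng.integers(1, max_size + 1))            # width in [1, max_size]
    f0 = int(rng.integers(0, feature_size - size + 1))   # band start index
    masked = spec.copy()
    masked[f0:f0 + size, :] = fill_value
    return masked, f0, size


rng = np.random.default_rng(0)
spec = np.ones((40, 100))
masked, f0, size = frequency_mask(spec, max_size=8, rng=rng)
```

The sketch returns a copy so the input array is left untouched; an in-place variant would avoid the extra allocation.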

class nnet.xsets.IdMapSet(idmap_name, data_root_path, file_extension)

Dataset that provides data according to a sidekit.IdMap object.

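class nnet.xsets.MFCC(lowfreq=133.333, maxfreq=6855.4976, nlinfilt=0, nlogfilt=40, win_time=0.025, fs=16000, nceps=30, shift=0.01, prefac=0.97)

Compute MFCC features on the audio segment.

Args (descriptions inferred from the parameter names and defaults):
lowfreq (float): Lower limit of the filtered frequency band, in Hz.
maxfreq (float): Upper limit of the filtered frequency band, in Hz.
nlinfilt (int): Number of linearly spaced filters.
nlogfilt (int): Number of log-spaced filters.
win_time (float): Length of the analysis window, in seconds.
fs (int): Sampling frequency of the signal, in Hz.
nceps (int): Number of cepstral coefficients to extract.
shift (float): Shift between two consecutive windows, in seconds.
prefac (float): Pre-emphasis coefficient.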
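To make the default window settings concrete, the sketch below converts `win_time` and `shift` into sample counts and estimates how many frames a segment would yield. It assumes plain framing without padding, which may not match what the class does internally; `n_frames` is a hypothetical helper.

```python
fs = 16000        # sampling frequency in Hz (default from the signature)
win_time = 0.025  # window length in seconds
shift = 0.01      # window shift in seconds

win_len = int(round(win_time * fs))  # 400 samples per window
hop = int(round(shift * fs))         # 160 samples between window starts


def n_frames(n_samples):
    """Number of full windows that fit in `n_samples`, assuming no padding."""
    if n_samples < win_len:
        return 0
    return 1 + (n_samples - win_len) // hop


frames_in_one_second = n_frames(fs)  # 98 full frames in one second of audio
```

Under these assumptions, one second of 16 kHz audio with `nceps=30` would give a feature matrix of about 98 frames by 30 coefficients.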

class nnet.xsets.PreEmphasis(pre_emp_value=0.97)

Perform pre-emphasis filtering on an audio segment.
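Pre-emphasis is commonly implemented as the first-order filter y[n] = x[n] - a * x[n-1], where a corresponds to `pre_emp_value`. The sketch below shows that filter with NumPy; the function name and the handling of the first sample are assumptions and may differ from the class's implementation.

```python
import numpy as np


def pre_emphasis(signal, pre_emp_value=0.97):
    """Apply y[n] = x[n] - pre_emp_value * x[n-1] to a non-empty 1-D signal.

    The first sample is passed through unchanged (an assumption of this sketch).
    """
    signal = np.asarray(signal, dtype=np.float64)
    out = np.empty_like(signal)
    out[0] = signal[0]
    out[1:] = signal[1:] - pre_emp_value * signal[:-1]
    return out


x = np.array([1.0, 1.0, 1.0, 1.0])
y = pre_emphasis(x)  # a constant signal is strongly attenuated after the first sample
```

On a constant input, every sample after the first shrinks to 1 - 0.97 = 0.03, which illustrates how the filter suppresses low-frequency content.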

class nnet.xsets.SideSet(data_set_yaml, set_type='train', chunk_per_segment=1, overlap=0.0, dataset_df=None)

class nnet.xsets.StatDataset(idmap, fs_param)

Object that initializes a Dataset from a sidekit.IdMap.

class nnet.xsets.TemporalMask(max_size)

Mask a randomly chosen span of consecutive frames along the time axis of a sample.

Args:
max_size (int): Presumably the maximum length of the masked span, in frames.
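A minimal sketch of temporal masking, assuming the class masks a random run of consecutive frames. The function name, the `(feature_size, n_frames)` layout and `fill_value` are hypothetical choices for this example.

```python
import numpy as np


def temporal_mask(spec, max_size, rng, fill_value=0.0):
    """Mask a random run of at most `max_size` consecutive frames (columns).

    `spec` is assumed to have shape (feature_size, n_frames), with
    max_size <= n_frames.
    """
    n_frames = spec.shape[1]
    size = int(rng.integers(1, max_size + 1))        # span length in [1, max_size]
    t0 = int(rng.integers(0, n_frames - size + 1))   # first masked frame
    masked = spec.copy()
    masked[:, t0:t0 + size] = fill_value
    return masked, t0, size


rng = np.random.default_rng(1)
spec = np.ones((40, 100))
masked, t0, size = temporal_mask(spec, max_size=10, rng=rng)
```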

class nnet.xsets.VoxDataset(segment_df, speaker_dict, duration=500, transform=None, spec_aug_ratio=0.5, temp_aug_ratio=0.5)

class nnet.xsets.XvectorDataset(batch_list, batch_path)

Object that reads a list of files from a file and initializes a Dataset.

class nnet.xsets.XvectorMultiDataset(batch_list, batch_path)

Object that takes a list of files as a Python list and initializes a Dataset.

nnet.xsets.read_batch(batch_file)
Parameters

batch_file

Returns