vad

frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.vad.label_fusion(label, win=3)

Apply morphological filtering to the labels to remove isolated labels. If the input is a two-channel label (a 2D ndarray of booleans, both channels of the same length), the labels of the two channels are fused to remove overlapping segments of speech.

Parameters

label – input labels given in a 1D or 2D ndarray
win – parameter of the morphological filters
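
As an illustration of the kind of filtering described above, here is a minimal NumPy sketch. It is not the library's implementation: the helper names (`dilate`, `erode`, `smooth_labels`) are hypothetical, the choice of a morphological opening is an assumption, and the two-channel fusion rule (dropping frames that are active in both channels) is only one plausible reading of "remove overlapping segments".

```python
import numpy as np

def dilate(mask, win):
    # A frame becomes True if any frame in the window centred on it is True.
    return np.convolve(mask.astype(int), np.ones(win, dtype=int), mode="same") > 0

def erode(mask, win):
    # A frame stays True only if every frame in the window centred on it is True.
    return np.convolve(mask.astype(int), np.ones(win, dtype=int), mode="same") == win

def smooth_labels(mask, win=3):
    # Opening (erode, then dilate) drops runs of True shorter than win frames.
    return dilate(erode(mask, win), win)

# One channel: the isolated speech frame at index 1 is removed.
labels = np.array([0, 1, 0, 0, 1, 1, 1, 1, 0, 0], dtype=bool)
smoothed = smooth_labels(labels, win=3)

# Two channels (assumed rule): frames active in both channels are dropped.
channels = np.array([[1, 1, 1, 0],
                     [0, 1, 1, 1]], dtype=bool)
overlap = channels[0] & channels[1]
fused = channels & ~overlap
```

With `win=3`, any run of fewer than three consecutive `True` frames disappears, while longer runs are left unchanged.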

frontend.vad.pre_emphasis(input_sig, pre)

Pre-emphasis of an audio signal.

Parameters

input_sig – the input signal vector to pre-emphasize
pre – value that defines the pre-emphasis filter
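
For readers unfamiliar with pre-emphasis, the sketch below shows the common first-order form y[n] = x[n] - pre * x[n-1]. Whether the library uses exactly this filter, and how it treats the first sample, is an assumption; `pre_emphasis_sketch` is a hypothetical name and 0.97 is only a typical value, not a documented default.

```python
import numpy as np

def pre_emphasis_sketch(signal, pre=0.97):
    # First-order high-pass: subtract a scaled copy of the previous sample.
    signal = np.asarray(signal, dtype=float)
    out = signal.copy()
    out[1:] -= pre * signal[:-1]  # the first sample is left unchanged here
    return out

emphasized = pre_emphasis_sketch([1.0, 1.0, 1.0, 1.0], pre=0.5)
```

A constant input is attenuated after the first sample, which shows how the filter reduces low-frequency content relative to high-frequency content.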

frontend.vad.segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0)

Generate a new array that chops the given array along the given axis into overlapping frames.

This method was implemented by Anne Archibald as part of the talkbox toolkit. Example:

>>> segment_axis(arange(10), 4, 2)
array([[0, 1, 2, 3],
       [2, 3, 4, 5],
       [4, 5, 6, 7],
       [6, 7, 8, 9]])

Parameters

a – the array to segment
length – the length of each frame
overlap – the number of array elements by which the frames should overlap
axis – the axis to operate on; if None, act on the flattened array
end – what to do with the last frame, if the array is not evenly divisible into pieces. Options are:
- ‘cut’: simply discard the extra values
- ‘wrap’: copy values from the beginning of the array
- ‘pad’: pad with a constant value
endvalue – the value to use for end=‘pad’

Returns

an ndarray

The array is not copied unless necessary (either because it is unevenly strided and being flattened or because end is set to ‘pad’ or ‘wrap’).
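
The framing behaviour can be sketched for the simplest case only: a 1D input with end='cut'. The helper `frame_1d` below is hypothetical and uses index arithmetic, so unlike the function documented above it always returns a copy rather than a view.

```python
import numpy as np

def frame_1d(a, length, overlap=0):
    # Consecutive frames start `step` elements apart.
    a = np.asarray(a)
    step = length - overlap
    # Number of complete frames; leftover elements are discarded ('cut').
    n_frames = 1 + (len(a) - length) // step
    # Row i holds the indices step*i, step*i + 1, ..., step*i + length - 1.
    idx = np.arange(length)[None, :] + step * np.arange(n_frames)[:, None]
    return a[idx]

frames = frame_1d(np.arange(10), 4, 2)
```

For `np.arange(10)` with `length=4` and `overlap=2` this gives four frames starting at 0, 2, 4 and 6. With an 11-element input the last element would be dropped.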

frontend.vad.speech_enhancement(X, Gain, NN)

This program only processes a single file separated by silence sections. If a silence section is detected, a counter of the number of buffers is set and pre-processing is required.

Usage: SpeechENhance(wavefilename, Gain, Noise_floor)

Parameters

X – input audio signal
Gain – default value is 0.9; suggested range is 0.6 to 1.4. A higher value means more subtraction, i.e. more noise reduction
NN –

Returns

a 1-dimensional array of boolean that is True for high energy frames.

frontend.vad.vad_energy(log_energy, distrib_nb=3, nb_train_it=8, flooring=0.0001, ceiling=1.0, alpha=2)

Parameters

log_energy –
distrib_nb –
nb_train_it –
flooring –
ceiling –
alpha –

Returns

frontend.vad.vad_percentil(log_energy, percent)

Parameters

log_energy –
percent –

Returns
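
The behaviour of this function is not described in this page. Judging only from its name and parameters, one plausible reading is a threshold set at a percentile of the log-energy values. The sketch below illustrates that assumption; `vad_percentile_sketch` is a hypothetical helper, not the library's code.

```python
import numpy as np

def vad_percentile_sketch(log_energy, percent):
    # Assumed rule: keep frames whose log-energy exceeds the given percentile.
    log_energy = np.asarray(log_energy, dtype=float)
    threshold = np.percentile(log_energy, percent)
    return log_energy > threshold

energies = np.array([-9.0, -8.5, -2.0, -1.5, -9.2, -1.0])
percentile_labels = vad_percentile_sketch(energies, 50)
```

With `percent=50` the threshold is the median, so the three highest-energy frames are labelled True.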

frontend.vad.vad_snr(sig, snr, fs=16000, shift=0.01, nwin=256)

Select high-energy frames based on the signal-to-noise ratio of the signal. The input signal is expected to be encoded on 16 bits.

Parameters

sig – the input audio signal
snr – Signal to noise ratio to consider
fs – sampling frequency of the input signal in Hz. Default is 16000.
shift – shift between two frames in seconds. Default is 0.01
nwin – number of samples of the sliding window. Default is 256.
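
To make the roles of these parameters concrete, here is a rough sketch of an SNR-style frame selection. The decision rule (keep frames whose level is within `snr` dB of the loudest frame) is an assumption, as is the small constant added to avoid log(0); `vad_snr_sketch` is a hypothetical helper and omits any enhancement or dithering the library may apply.

```python
import numpy as np

def vad_snr_sketch(sig, snr, fs=16000, shift=0.01, nwin=256):
    # Scale 16-bit samples to roughly [-1, 1).
    sig = np.asarray(sig, dtype=float) / 32768.0
    step = int(round(shift * fs))  # frame shift in samples
    n_frames = 1 + (len(sig) - nwin) // step
    idx = np.arange(nwin)[None, :] + step * np.arange(n_frames)[:, None]
    frames = sig[idx]
    # Frame level in dB, from the standard deviation of each frame.
    level_db = 20 * np.log10(np.std(frames, axis=1) + 1e-10)
    # Assumed rule: keep frames within `snr` dB of the loudest frame.
    return level_db > level_db.max() - snr

# Half a second of a quiet tone followed by half a second of a loud tone.
n = np.arange(8000)
test_sig = np.concatenate([10 * np.sin(0.1 * n), 10000 * np.sin(0.1 * n)])
snr_labels = vad_snr_sketch(test_sig, snr=30)
```

With the defaults, a one-second signal yields 99 frames of 256 samples spaced 160 samples apart. In this example the quiet half is about 60 dB below the loud half, so only frames from the loud half are selected.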
