vad
Copyright 2014-2020 Anthony Larcher and Sylvain Meignier
frontend
provides methods to process an audio signal in order to extract
useful parameters for speaker verification.
-
frontend.vad.
label_fusion
(label, win=3) Apply morphological filtering to the labels to remove isolated labels. If the input is a two-channel label (a 2D boolean ndarray whose channels have the same length), the labels of the two channels are fused to remove overlapping segments of speech.
- Parameters
label – input labels given in a 1D or 2D ndarray
win – parameter of the morphological filters
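As an illustration of what morphological filtering of a label vector can look like, here is a minimal sketch for the 1D case only. It assumes that win is an odd window length in frames and that the filter is a binary closing followed by a binary opening; the function name smooth_labels_sketch is hypothetical, and the actual label_fusion implementation may differ.

```python
import numpy as np


def smooth_labels_sketch(label, win=3):
    """Illustrative 1D morphological filter (closing, then opening).

    Hypothetical helper, not the library's label_fusion. Assumes win is odd.
    """
    label = np.asarray(label, dtype=bool)
    half = win // 2

    def dilate(x):
        # A frame becomes True if any frame in the centred window is True.
        padded = np.pad(x, half, mode="constant", constant_values=False)
        return np.array([padded[i:i + win].any() for i in range(len(x))], dtype=bool)

    def erode(x):
        # A frame stays True only if every frame in the centred window is True.
        # Padding with True avoids shrinking segments at the borders.
        padded = np.pad(x, half, mode="constant", constant_values=True)
        return np.array([padded[i:i + win].all() for i in range(len(x))], dtype=bool)

    closed = erode(dilate(label))   # closing: fills short gaps inside speech
    return dilate(erode(closed))    # opening: removes short isolated labels


labels = np.array([0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0], dtype=bool)
smoothed = smooth_labels_sketch(labels, win=3)
# The isolated True at index 2 is removed and the one-frame gap at index 9 is filled.
```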
-
frontend.vad.
pre_emphasis
(input_sig, pre) Pre-emphasis of an audio signal.
- Parameters
input_sig – the input signal vector to pre-emphasize
pre – value that defines the pre-emphasis filter
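For reference, a common form of pre-emphasis is the first-order filter y[n] = x[n] - pre * x[n-1]. The sketch below implements that textbook form for a 1D signal; the name pre_emphasis_sketch, the default pre=0.97, and the handling of the first sample are assumptions, and the library function may treat boundaries or multi-channel input differently.

```python
import numpy as np


def pre_emphasis_sketch(input_sig, pre=0.97):
    """Textbook first-order pre-emphasis: y[n] = x[n] - pre * x[n-1].

    Hypothetical helper for a non-empty 1D signal; pre=0.97 is an assumed default.
    """
    x = np.asarray(input_sig, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]                    # assumption: first sample is left unchanged
    out[1:] = x[1:] - pre * x[:-1]   # subtract the scaled previous sample
    return out


y = pre_emphasis_sketch([1.0, 1.0, 1.0, 1.0], pre=0.5)
# y is [1.0, 0.5, 0.5, 0.5]: a constant signal is attenuated after the first sample.
```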
-
frontend.vad.
segment_axis
(a, length, overlap=0, axis=None, end='cut', endvalue=0) Generate a new array that chops the given array along the given axis into overlapping frames.
This method was implemented by Anne Archibald as part of the talk box toolkit. Example:
>>> segment_axis(arange(10), 4, 2)
array([[0, 1, 2, 3],
       [2, 3, 4, 5],
       [4, 5, 6, 7],
       [6, 7, 8, 9]])
- Parameters
a – the array to segment
length – the length of each frame
overlap – the number of array elements by which the frames should overlap
axis – the axis to operate on; if None, act on the flattened array
end – what to do with the last frame if the array is not evenly divisible into pieces. Options are:
- 'cut': simply discard the extra values
- 'wrap': copy values from the beginning of the array
- 'pad': pad with a constant value
endvalue – the value to use for end='pad'
- Returns
an ndarray
The array is not copied unless necessary (either because it is unevenly strided and being flattened or because end is set to ‘pad’ or ‘wrap’).
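The framing logic can be sketched for the simplest case: a flattened array with end='cut'. The helper below, frame_1d_sketch, is a hypothetical name and not the library function. It builds an index matrix and so always copies the data, unlike segment_axis, which avoids copying where possible.

```python
import numpy as np


def frame_1d_sketch(a, length, overlap=0):
    """Chop a flattened array into overlapping frames, discarding leftovers.

    Hypothetical helper mirroring segment_axis(a, length, overlap, end='cut')
    for the flattened case only. Always returns a copy.
    """
    a = np.asarray(a).ravel()
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    step = length - overlap                      # hop between frame starts
    if len(a) < length:
        return np.empty((0, length), dtype=a.dtype)
    n_frames = 1 + (len(a) - length) // step     # trailing samples are cut
    # Row i holds the indices step*i, step*i + 1, ..., step*i + length - 1.
    idx = step * np.arange(n_frames)[:, None] + np.arange(length)[None, :]
    return a[idx]


frames = frame_1d_sketch(np.arange(10), 4, 2)
# frames is [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```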
-
frontend.vad.
speech_enhancement
(X, Gain, NN) This program only processes a single file separated by silence sections. If a silence section is detected, a counter of the number of buffers is set and pre-processing is required.
Usage: SpeechENhance(wavefilename, Gain, Noise_floor)
- Parameters
X – input audio signal
Gain – default value is 0.9; the suggested range is 0.6 to 1.4. A higher value means more subtraction or noise reduction
NN –
- Returns
a 1-dimensional array of boolean that is True for high energy frames.
Copyright 2014 Sun Han Wu and Anthony Larcher
-
frontend.vad.
vad_energy
(log_energy, distrib_nb=3, nb_train_it=8, flooring=0.0001, ceiling=1.0, alpha=2)
- Parameters
log_energy –
distrib_nb –
nb_train_it –
flooring –
ceiling –
alpha –
- Returns
-
frontend.vad.
vad_snr
(sig, snr, fs=16000, shift=0.01, nwin=256) Select high-energy frames based on the signal-to-noise ratio of the signal. The input signal is expected to be encoded on 16 bits.
- Parameters
sig – the input audio signal
snr – Signal to noise ratio to consider
fs – sampling frequency of the input signal in Hz. Default is 16000.
shift – shift between two frames in seconds. Default is 0.01
nwin – number of samples of the sliding window. Default is 256.
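To show how fs, shift and nwin interact, here is a rough sketch of energy-based frame selection. It is not the vad_snr algorithm: it assumes the noise floor can be estimated as the 10th percentile of the per-frame energies in dB, and it keeps frames whose energy exceeds that floor by snr_db. The function name and the percentile choice are hypothetical.

```python
import numpy as np


def select_high_energy_frames_sketch(sig, snr_db, fs=16000, shift=0.01, nwin=256):
    """Flag frames whose energy exceeds an estimated noise floor by snr_db.

    Hypothetical helper, not the library's vad_snr. The noise floor is
    assumed to be the 10th percentile of the frame energies (in dB).
    """
    sig = np.asarray(sig, dtype=float)
    hop = int(round(shift * fs))                 # frame shift in samples
    if len(sig) < nwin:
        return np.zeros(0, dtype=bool)
    n_frames = 1 + (len(sig) - nwin) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(nwin)[None, :]
    frames = sig[idx]                            # shape (n_frames, nwin)
    # Mean power per frame in dB; the small constant avoids log10(0).
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_floor_db = np.percentile(energy_db, 10)
    return energy_db > noise_floor_db + snr_db


# One second of low-level noise followed by one second of a loud 200 Hz tone.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
demo_sig = np.concatenate([10.0 * rng.standard_normal(16000),
                           5000.0 * np.sin(2 * np.pi * 200.0 * t)])
demo_labels = select_high_energy_frames_sketch(demo_sig, snr_db=20)
```

With the defaults, each frame spans 256 samples and consecutive frames start 160 samples apart, so the returned array has one boolean per 10 ms of signal.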