Features
Copyright 2014-2020 Anthony Larcher and Sylvain Meignier
frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.features.audspec(power_spectrum, fs=16000, nfilts=None, fbtype='bark', minfreq=0, maxfreq=8000, sumpower=True, bwidth=1.0)
 Parameters
power_spectrum –
fs –
nfilts –
fbtype –
minfreq –
maxfreq –
sumpower –
bwidth –
 Returns

frontend.features.bark2hz(z)
Convert frequencies from Bark to Hertz (Hz).
 Parameters
z –
 Returns

frontend.features.compute_delta(features, win=3, method='filter', filt=array([0.25, 0.5, 0.25, 0.0, -0.25, -0.5, -0.25]))
Compute delta coefficients. features is a 2D ndarray in which each row is a frame.
 Parameters
features – the feature frames to compute the delta coefficients
win – parameter that sets the length of the computation window. The size of the window is (win x 2) + 1
method – method used to compute the delta coefficients, can be diff or filter
filt – definition of the filter to use in “filter” mode; the default one is similar to SPRO4: filt=numpy.array([.2, .1, 0, -.1, -.2])
 Returns
the delta coefficients computed on the original features.
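As an illustration of the “filter” mode, the sketch below applies an FIR filter along the time axis of each feature dimension. It is a minimal NumPy sketch and not the code used by compute_delta: the helper name delta_by_filter, the edge-replication padding and the 5-tap antisymmetric filter are assumptions chosen for the example.

```python
import numpy as np


def delta_by_filter(features, filt):
    """Hypothetical helper: FIR-filter every feature dimension along time."""
    features = np.asarray(features, dtype=float)
    half = len(filt) // 2
    # Replicate the first and last frames so the output keeps the input length.
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    out = np.empty_like(features)
    for j in range(features.shape[1]):
        # "valid" keeps only the positions where the filter fully overlaps.
        out[:, j] = np.convolve(padded[:, j], filt, mode="valid")
    return out


# Two feature dimensions that grow linearly with slopes 1 and 2 per frame.
frames = np.arange(10, dtype=float).reshape(-1, 1) * np.array([1.0, 2.0])
filt = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # assumed regression-style filter
deltas = delta_by_filter(frames, filt)
```

Away from the padded edges, this filter recovers the slope of each dimension, which is a quick sanity check for any delta filter.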

frontend.features.dct_basis(nbasis, length)
 Parameters
nbasis – number of DCT coefficients to keep
length – length of the matrix to process
 Returns
a basis of DCT coefficients
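For reference, a DCT basis of this shape can be sketched as follows. This assumes an orthonormal type-II DCT; the normalization actually used by dct_basis is not documented here, and the helper name dct2_basis is hypothetical.

```python
import numpy as np


def dct2_basis(nbasis, length):
    """Hypothetical helper: first `nbasis` rows of an orthonormal DCT-II matrix."""
    n = np.arange(length)
    k = np.arange(nbasis)[:, None]
    basis = np.sqrt(2.0 / length) * np.cos(np.pi * k * (n + 0.5) / length)
    # The constant (k = 0) row needs an extra 1/sqrt(2) to be unit-norm.
    basis[0] /= np.sqrt(2.0)
    return basis


basis = dct2_basis(4, 8)          # shape (nbasis, length)
gram = basis @ basis.T            # close to the identity for orthonormal rows
```

Projecting a vector of size length onto the basis is then a single matrix product, basis @ vector.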

frontend.features.dolpc(x, model_order=8)
Compute an autoregressive model from spectral magnitude samples.
 Parameters
x –
model_order –
 Returns

frontend.features.fft2barkmx(n_fft, fs, nfilts=0, width=1.0, minfreq=0.0, maxfreq=8000)
Generate a matrix of weights to combine FFT bins into Bark bins. n_fft defines the source FFT size at sampling rate fs. Optional nfilts specifies the number of output bands required (else one per Bark), and width is the constant width of each band in Bark (default 1). While the weight matrix (wts) has n_fft columns, the second half are all zero. Hence, the Bark spectrum is fft2barkmx(n_fft, fs) * abs(fft(xincols, n_fft)).
2004-09-05 dpwe@ee.columbia.edu, based on rastamat/audspec.m
 Parameters
n_fft – the source FFT size at sampling rate fs
fs – sampling rate
nfilts – number of output bands required
width – constant width of each band in Bark (default 1)
minfreq –
maxfreq –
 Returns
a matrix of weights to combine FFT bins into Bark bins

frontend.features.fft2melmx(n_fft, fs=8000, nfilts=0, width=1.0, minfreq=0, maxfreq=4000, htkmel=False, constamp=False)
Generate a matrix of weights to combine FFT bins into Mel bins. n_fft defines the source FFT size at sampling rate fs. Optional nfilts specifies the number of output bands required (else one per “mel/width”), and width is the constant width of each band relative to standard Mel (default 1). While the weight matrix (wts) has n_fft columns, the second half are all zero. Hence, the Mel spectrum is fft2melmx(n_fft, fs) * abs(fft(xincols, n_fft)).
minfreq is the frequency (in Hz) of the lowest band edge; the default is 0, but 133.33 is a common standard (to skip LF). maxfreq is the frequency in Hz of the upper edge; the default is fs/2.
You can exactly duplicate the mel matrix in Slaney’s mfcc.m as fft2melmx(512, 8000, 40, 1, 133.33, 6855.5, 0).
htkmel=1 means use HTK’s version of the mel curve, not Slaney’s. constamp=1 means make integration windows peak at 1, not sum to 1. frqs returns the bin center frequencies.
2004-09-05 dpwe@ee.columbia.edu, based on fft2barkmx
 Parameters
n_fft –
fs –
nfilts –
width –
minfreq –
maxfreq –
htkmel –
constamp –
 Returns

frontend.features.framing(sig, win_size, win_shift=1, context=(0, 0), pad='zeros')
 Parameters
sig – input signal, can be mono or multi dimensional
win_size – size of the window in terms of samples
win_shift – shift of the sliding window in terms of samples
context – tuple of left and right context
pad – can be zeros or edge
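The sliding-window idea behind these parameters can be sketched with plain index arithmetic. This is a simplified illustration for a mono signal only: the helper name frame_signal is hypothetical, and the context and pad options of framing are deliberately left out.

```python
import numpy as np


def frame_signal(sig, win_size, win_shift=1):
    """Hypothetical helper: cut a 1-D signal into overlapping frames."""
    sig = np.asarray(sig)
    # Only full windows are kept; trailing samples that do not fill one are dropped.
    n_frames = 1 + (len(sig) - win_size) // win_shift
    starts = np.arange(n_frames) * win_shift
    # Row i holds the sample indices of frame i.
    idx = starts[:, None] + np.arange(win_size)[None, :]
    return sig[idx]


frames = frame_signal(np.arange(10), win_size=4, win_shift=2)
# frames has one row per window; consecutive rows overlap by win_size - win_shift samples.
```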

frontend.features.hz2bark(f)
Convert frequencies in Hertz (Hz) to Bark.
 Parameters
f – the input frequency
 Returns
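The Hz/Bark mapping is not spelled out in this page. The sketch below assumes the inverse hyperbolic sine approximation used in rastamat, from which several functions of this module are derived; the function names are hypothetical and the formula actually used by hz2bark and bark2hz may differ.

```python
import numpy as np


def hz_to_bark(f):
    """Assumed rastamat-style approximation: z = 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)


def bark_to_hz(z):
    """Inverse of the approximation above: f = 600 * sinh(z / 6)."""
    return 600.0 * np.sinh(np.asarray(z, dtype=float) / 6.0)


freqs = np.array([0.0, 100.0, 1000.0, 4000.0, 8000.0])
barks = hz_to_bark(freqs)
recovered = bark_to_hz(barks)  # round trip back to Hz
```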

frontend.features.hz2mel(f, htk=True)
Convert an array of frequencies in Hz into mel.
 Parameters
f – frequency to convert
 Returns
the equivalence on the mel scale.

frontend.features.levinson(r, order=None, allow_singularity=False)
Levinson-Durbin recursion.
Find the coefficients of a length(r)-1 order autoregressive linear process.
 Parameters
r – autocorrelation sequence of length N + 1 (first element being the zero-lag autocorrelation)
order – requested order of the autoregressive coefficients. Default is N.
allow_singularity – false by default. Other implementations may be True (e.g., octave)
 Returns
the N+1 autoregressive coefficients
the prediction errors
the N reflection coefficient values
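This algorithm solves the set of complex linear simultaneous equations using the Levinson algorithm:
\[ \mathbf{T}_M \begin{pmatrix} 1 \\ \mathbf{a}_M \end{pmatrix} = \begin{pmatrix} \rho_M \\ \mathbf{0}_M \end{pmatrix} \]
where \(\mathbf{T}_M\) is a Hermitian Toeplitz matrix with elements \(T_0, T_1, \dots, T_M\).
Note
Solving these equations by Gaussian elimination would require on the order of \(M^3\) operations, whereas the Levinson algorithm requires on the order of \(M^2\) additions and multiplications.
This is equivalent to solving the following symmetric Toeplitz system of linear equations:
\[ \begin{pmatrix} r_1 & r_2^* & \dots & r_N^* \\ r_2 & r_1 & \dots & r_{N-1}^* \\ \vdots & \vdots & \ddots & \vdots \\ r_N & \dots & r_2 & r_1 \end{pmatrix} \begin{pmatrix} a_2 \\ a_3 \\ \vdots \\ a_{N+1} \end{pmatrix} = \begin{pmatrix} -r_2 \\ -r_3 \\ \vdots \\ -r_{N+1} \end{pmatrix} \]
where \(r = (r_1, \dots, r_{N+1})\) is the input autocorrelation vector, and \(r_i^*\) denotes the complex conjugate of \(r_i\). The input r is typically a vector of autocorrelation coefficients where lag 0 is the first element \(r_1\).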
>>> import numpy; from spectrum import LEVINSON
>>> T = numpy.array([3., -2+0.5j, .7-1j])
>>> a, e, k = LEVINSON(T)
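To make the recursion concrete, here is a minimal NumPy sketch of a Levinson-Durbin solver. It is not the implementation behind levinson: the function name levinson_durbin is hypothetical, the allow_singularity option is ignored, and the sign convention assumed is r[m] + sum_j a[j] r[m - j] = 0 with a[0] = 1.

```python
import numpy as np


def levinson_durbin(r, order=None):
    """Hypothetical helper: return (a, e, k) for autocorrelation r, lag 0 first."""
    r = np.asarray(r, dtype=complex)
    n = len(r) - 1 if order is None else order
    a = np.zeros(n + 1, dtype=complex)
    a[0] = 1.0
    k = np.zeros(n, dtype=complex)
    e = r[0].real  # the zero-lag autocorrelation is real
    for m in range(1, n + 1):
        # Mismatch of the order m-1 predictor at lag m.
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k_m = -acc / e
        # Order update: a_new[j] = a[j] + k_m * conj(a[m - j]) for j = 1..m-1.
        a[1:m] = a[1:m] + k_m * np.conj(a[m - 1:0:-1])
        a[m] = k_m
        k[m - 1] = k_m
        # The prediction error shrinks by (1 - |k_m|^2) at every order.
        e *= 1.0 - abs(k_m) ** 2
    return a, e, k


# Autocorrelation of a first-order process with coefficient 0.5.
a, e, k = levinson_durbin([1.0, 0.5, 0.25])
# a is approximately [1, -0.5, 0] and e is approximately 0.75
```

Fitting an order-2 model to a first-order autocorrelation gives a zero second coefficient, which is a convenient check of the recursion.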

frontend.features.lifter(x, lift=0.6, invs=False)
Apply a lifter to a matrix of cepstra (one per column). lift is the exponent of the x i^n liftering or, as a negative integer, the length of HTK-style sin-curve liftering. If invs is True (default False), undo the liftering.
 Parameters
x –
lift –
invs –
 Returns
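The exponent form of liftering described above can be sketched as follows. This follows the rastamat convention (cepstral index along rows, one frame per column, coefficient 0 left unscaled) as an assumption; the helper name lifter_exponent is hypothetical and the HTK-style sin-curve variant is not covered.

```python
import numpy as np


def lifter_exponent(x, lift=0.6, invs=False):
    """Hypothetical helper: scale cepstral coefficient i by i**lift (i >= 1)."""
    x = np.asarray(x, dtype=float)
    ncep = x.shape[0]
    weights = np.ones(ncep)
    weights[1:] = np.arange(1, ncep) ** lift  # coefficient 0 keeps weight 1
    if invs:
        weights = 1.0 / weights  # undo a previous liftering
    return weights[:, None] * x


cepstra = np.random.default_rng(0).standard_normal((13, 5))  # 13 coefficients, 5 frames
liftered = lifter_exponent(cepstra, lift=0.6)
restored = lifter_exponent(liftered, lift=0.6, invs=True)
```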

frontend.features.lpc2cep(a, nout)
Convert the LPC ‘a’ coefficients in each column of lpcas into frames of cepstra. nout is the number of cepstra to produce, defaults to size(lpcas,1).
2003-04-11 dpwe@ee.columbia.edu
 Parameters
a –
nout –
 Returns

frontend.features.lpc2spec(lpcas, nout=17)
Convert LPC coefficients back into spectra. nout is the number of frequency channels, default 17 (i.e. for 8 kHz).
 Parameters
lpcas –
nout –
 Returns

frontend.features.mel2hz(z, htk=True)
Convert an array of mel values into Hz.
 Parameters
z – ndarray of mel values to convert to Hz.
 Returns
the equivalent values in Hertz.
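The mel conversion and its inverse can be illustrated with the widely used HTK formula. Whether hz2mel and mel2hz use exactly these constants when htk=True is an assumption, the Slaney-style variant is not shown, and the function names below are hypothetical.

```python
import numpy as np


def hz_to_mel_htk(f):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz_htk(m):
    """Inverse of the formula above: f = 700 * (10**(m / 2595) - 1)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


freqs = np.array([0.0, 100.0, 1000.0, 4000.0, 8000.0])
mels = hz_to_mel_htk(freqs)
recovered = mel_to_hz_htk(mels)  # round trip back to Hz
```

With this formula, 1000 Hz maps to roughly 1000 mel, which is the anchor point of the scale.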

frontend.features.mel_filter_bank(fs, nfft, lowfreq, maxfreq, widest_nlogfilt, widest_lowfreq, widest_maxfreq)
Compute triangular filterbank for cepstral coefficient computation.
 Parameters
fs – sampling frequency of the original signal.
nfft – number of points for the Fourier Transform
lowfreq – lower limit of the frequency band filtered
maxfreq – higher limit of the frequency band filtered
widest_nlogfilt – number of log filters
widest_lowfreq – lower frequency of the filter bank
widest_maxfreq – higher frequency of the filter bank
 Returns
the filter bank and the central frequencies of each filter

frontend.features.mfcc(input_sig, lowfreq=100, maxfreq=8000, nlinfilt=0, nlogfilt=24, nwin=0.025, fs=16000, nceps=13, shift=0.01, get_spec=False, get_mspec=False, prefac=0.97)
Compute Mel Frequency Cepstral Coefficients.
 Parameters
input_sig – input signal from which the coefficients are computed. Input audio is expected to be raw 16-bit PCM.
lowfreq – lower limit of the frequency band filtered. Default is 100Hz.
maxfreq – higher limit of the frequency band filtered. Default is 8000Hz.
nlinfilt – number of linear filters to use in low frequencies. Default is 0.
nlogfilt – number of log-linear filters to use in high frequencies. Default is 24.
nwin – length of the sliding window in seconds. Default is 0.025.
fs – sampling frequency of the original signal. Default is 16000Hz.
nceps – number of cepstral coefficients to extract. Default is 13.
shift – shift between two analyses. Default is 0.01 (10ms).
get_spec – boolean, if true returns the spectrogram
get_mspec – boolean, if true returns the output of the filter banks
prefac – pre-emphasis filter coefficient
 Returns
the cepstral coefficients in an ndarray as well as the log-spectrum in the mel domain in an ndarray.
Note
MFCC are computed as follows:
Pre-processing in the time domain (pre-emphasis)
Compute the spectrum amplitude by windowing with a Hamming window
Filter the signal in the spectral domain with a triangular filter bank, whose filters are approximately linearly spaced on the mel scale and have equal bandwidth on the mel scale
Compute the DCT of the log-spectrum
The log-energy is returned as the first coefficient of the feature vector.
For more details, refer to [Davis80].
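The steps listed in the note can be sketched end to end in NumPy. This is a simplified illustration under stated assumptions, not the code behind mfcc: the function name mfcc_sketch is hypothetical, the filter bank is spaced with the HTK mel formula, the DCT is unnormalized, and the first coefficient is not replaced by the log-energy.

```python
import numpy as np


def mfcc_sketch(sig, fs=16000, nwin=0.025, shift=0.01, lowfreq=100.0,
                maxfreq=8000.0, nfilt=24, nceps=13, prefac=0.97):
    """Hypothetical helper following the four steps of the note."""
    sig = np.asarray(sig, dtype=float)
    # 1. Pre-emphasis in the time domain.
    emphasized = np.append(sig[0], sig[1:] - prefac * sig[:-1])
    # 2. Framing, Hamming window, amplitude spectrum.
    win_len, hop = int(round(nwin * fs)), int(round(shift * fs))
    n_frames = 1 + (len(emphasized) - win_len) // hop
    idx = np.arange(n_frames)[:, None] * hop + np.arange(win_len)[None, :]
    frames = emphasized[idx] * np.hamming(win_len)
    nfft = 1 << (win_len - 1).bit_length()  # next power of two
    spec = np.abs(np.fft.rfft(frames, nfft))
    # 3. Triangular filters equally spaced on an (assumed HTK) mel scale.
    mel_lo = 2595.0 * np.log10(1.0 + lowfreq / 700.0)
    mel_hi = 2595.0 * np.log10(1.0 + maxfreq / 700.0)
    mel_edges = np.linspace(mel_lo, mel_hi, nfilt + 2)
    edges_hz = 700.0 * (10.0 ** (mel_edges / 2595.0) - 1.0)
    freqs = np.fft.rfftfreq(nfft, 1.0 / fs)
    fbank = np.zeros((nfilt, len(freqs)))
    for i in range(nfilt):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    # Small floor avoids log(0) for silent frames.
    log_mel = np.log(spec @ fbank.T + 1e-10)
    # 4. DCT-II of the log mel spectrum, keeping the first nceps coefficients.
    n = np.arange(nfilt)
    dct = np.cos(np.pi * np.arange(nceps)[:, None] * (n + 0.5) / nfilt)
    return log_mel @ dct.T, log_mel


signal = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
ceps, log_mel = mfcc_sketch(signal)
```

Both outputs have one row per analysis frame: ceps has nceps columns and log_mel has one column per filter.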

frontend.features.pca_dct(cep, left_ctx=12, right_ctx=12, p=None)
Apply DCT PCA as in the [McLaren 2015] paper: Mitchell McLaren and Yun Lei, ‘Improved Speaker Recognition Using DCT coefficients as features’ in ICASSP, 2015
A 1D DCT is applied to the cepstral coefficients on a temporal sliding window. The resulting matrix is then flattened and reduced by using a Principal Component Analysis.
 Parameters
cep – a matrix of cepstral coefficients, 1 line per feature vector
left_ctx – number of frames to consider for left context
right_ctx – number of frames to consider for right context
p – a PCA matrix trained on a development set to reduce the dimension of the features. P is a portrait matrix
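The processing described above (temporal window, 1D DCT along time, flattening, optional projection) can be sketched as follows. The helper name dct_context_features, the edge-replication padding, the orthonormal DCT-II and the flattening order are assumptions made for the example and may differ from pca_dct.

```python
import numpy as np


def dct_context_features(cep, left_ctx=2, right_ctx=2, p=None):
    """Hypothetical helper: DCT over a temporal window, then optional projection."""
    cep = np.asarray(cep, dtype=float)
    n_frames, n_ceps = cep.shape
    win = left_ctx + 1 + right_ctx
    # Replicate the first and last frames so every frame has a full context.
    padded = np.pad(cep, ((left_ctx, right_ctx), (0, 0)), mode="edge")
    idx = np.arange(n_frames)[:, None] + np.arange(win)[None, :]
    windows = padded[idx]  # shape (n_frames, win, n_ceps)
    # Orthonormal DCT-II basis applied along the time axis of each window.
    n = np.arange(win)
    basis = np.sqrt(2.0 / win) * np.cos(np.pi * np.arange(win)[:, None] * (n + 0.5) / win)
    basis[0] /= np.sqrt(2.0)
    dct = np.einsum("kw,twc->tkc", basis, windows)
    flat = dct.reshape(n_frames, -1)  # one flattened vector per frame
    # Without a projection matrix the full-dimensional features are returned.
    return flat if p is None else flat @ p


cep = np.ones((6, 3))
full = dct_context_features(cep, left_ctx=2, right_ctx=2)
projection = np.random.default_rng(0).standard_normal((15, 4))  # stand-in for a trained PCA matrix
reduced = dct_context_features(cep, left_ctx=2, right_ctx=2, p=projection)
```

For a constant input only the zero-order DCT coefficients are non-zero, which makes the layout of the flattened vector easy to inspect.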

frontend.features.plp(input_sig, nwin=0.025, fs=16000, plp_order=13, shift=0.01, get_spec=False, get_mspec=False, prefac=0.97, rasta=True)
Compute PLP features, optionally with RASTA filtering. The output is a matrix of features, row = feature, col = frame.
 Parameters
input_sig –
fs – sampling rate of the samples, default is 16000
rasta – default is True; if False, just compute PLP
plp_order – order of the PLP model, default is 13; 0 means no PLP
 Returns
matrix of features, row = features, column are frames

frontend.features.postaud(x, fmax, fbtype='bark', broaden=0)
Do loudness equalization and cube-root compression.
 Parameters
x –
fmax –
fbtype –
broaden –
 Returns

frontend.features.power_spectrum(input_sig, fs=8000, win_time=0.025, shift=0.01, prefac=0.97)
Compute the power spectrum of the signal.
 Parameters
input_sig –
fs –
win_time –
shift –
prefac –
 Returns

frontend.features.shifted_delta_cepstral(cep, d=1, p=3, k=7)
Compute the Shifted-Delta-Cepstral features for language identification.
 Parameters
cep – matrix of features, 1 vector per line
d – represents the time advance and delay for the delta computation
k – number of delta-cepstral blocks whose delta-cepstral coefficients are stacked to form the final feature vector
p – time shift between consecutive blocks.
 Returns
cepstral coefficients concatenated with shifted deltas
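The role of d, p and k can be illustrated with a short sketch of the usual shifted-delta-cepstra definition, in which block i at frame t is cep[t + i*p + d] - cep[t + i*p - d]. The helper name sdc_sketch and the handling of the signal edges (indices clipped to the valid range) are assumptions and may differ from shifted_delta_cepstral.

```python
import numpy as np


def sdc_sketch(cep, d=1, p=3, k=7):
    """Hypothetical helper: stack k shifted delta blocks next to the cepstra."""
    cep = np.asarray(cep, dtype=float)
    n_frames = cep.shape[0]
    t = np.arange(n_frames)
    blocks = [cep]
    for i in range(k):
        # Clipping the indices is equivalent to replicating the edge frames.
        ahead = np.clip(t + i * p + d, 0, n_frames - 1)
        behind = np.clip(t + i * p - d, 0, n_frames - 1)
        blocks.append(cep[ahead] - cep[behind])
    return np.hstack(blocks)


# Two coefficients that increase by 1 per frame.
cep = np.arange(50, dtype=float)[:, None] * np.ones((1, 2))
features = sdc_sketch(cep, d=1, p=3, k=7)
```

The output keeps one row per input frame and has (k + 1) times as many columns as cep.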

frontend.features.spec2cep(spec, ncep=13, type=2)
Calculate cepstra from spectral samples (in columns of spec). Return ncep cepstral rows (default 13). This function applies a type II DCT, or a type I DCT if type is specified as 1. dctm returns the DCT matrix that spec was multiplied by to give cep.
 Parameters
spec –
ncep –
type –
 Returns

frontend.features.trfbank(fs, nfft, lowfreq, maxfreq, nlinfilt, nlogfilt, midfreq=1000)
Compute triangular filterbank for cepstral coefficient computation.
 Parameters
fs – sampling frequency of the original signal.
nfft – number of points for the Fourier Transform
lowfreq – lower limit of the frequency band filtered
maxfreq – higher limit of the frequency band filtered
nlinfilt – number of linear filters to use in low frequencies
nlogfilt – number of log-linear filters to use in high frequencies
midfreq – frequency boundary between linear and log-linear filters
 Returns
the filter bank and the central frequencies of each filter