Features¶

frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.features.audspec(power_spectrum, fs=16000, nfilts=None, fbtype='bark', minfreq=0, maxfreq=8000, sumpower=True, bwidth=1.0)[source]¶

Parameters

power_spectrum –
fs –
nfilts –
fbtype –
minfreq –
maxfreq –
sumpower –
bwidth –

Returns

frontend.features.bark2hz(z)[source]¶

Convert frequencies from Bark to Hertz (Hz).

Parameters: z –
Returns
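As an illustration of the Bark/Hertz mapping, here is a minimal sketch. This page does not state which Bark approximation the library uses; the sketch assumes the closed form z = 6 * asinh(f / 600) found in rastamat-style code. The names hz_to_bark and bark_to_hz are hypothetical and not part of frontend.features.

```python
import numpy as np

def hz_to_bark(f):
    """Hz -> Bark, assuming z = 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

def bark_to_hz(z):
    """Bark -> Hz, the inverse of the mapping above: f = 600 * sinh(z / 6)."""
    return 600.0 * np.sinh(np.asarray(z, dtype=float) / 6.0)

# Converting to Bark and back should recover the original frequencies.
freqs = np.array([0.0, 100.0, 1000.0, 8000.0])
round_trip = bark_to_hz(hz_to_bark(freqs))
```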

frontend.features.compute_delta(features, win=3, method='filter', filt=array([0.25, 0.5, 0.25, 0.0, -0.25, -0.5, -0.25]))[source]¶

features is a 2D ndarray; each row of features is a frame.

Parameters

features – the feature frames to compute the delta coefficients
win – parameter that sets the length of the computation window. The size of the window is (win x 2) + 1
method – method used to compute the delta coefficients; can be diff or filter
filt – definition of the filter to use in “filter” mode; the default one is similar to SPRO4: filt=numpy.array([.2, .1, 0, -.1, -.2])

Returns

the delta coefficients computed on the original features.
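The “filter” mode can be pictured as running a short antisymmetric FIR filter along the time axis of every feature dimension. The sketch below is an assumption about how such a filter could be applied (edge frames are repeated so that the output keeps one row per input frame); delta_by_filter is a hypothetical name and the library's own padding and scaling may differ.

```python
import numpy as np

def delta_by_filter(features, filt):
    """Filter each feature dimension along time with an odd-length FIR filter."""
    features = np.asarray(features, dtype=float)
    half = len(filt) // 2
    # Repeat the first and last frames so the output has one row per input frame.
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    out = np.empty_like(features)
    for j in range(features.shape[1]):
        # 'valid' convolution of the padded column returns exactly n_frames values.
        out[:, j] = np.convolve(padded[:, j], filt, mode="valid")
    return out

filt = np.array([0.25, 0.5, 0.25, 0.0, -0.25, -0.5, -0.25])
ramp = np.arange(20, dtype=float).reshape(-1, 1)   # one feature, rising by 1 per frame
deltas = delta_by_filter(ramp, filt)
flat = delta_by_filter(np.ones((20, 3)), filt)     # constant features
```

With this filter the coefficients sum to zero, so constant features give zero deltas, while a steadily rising feature gives a constant positive value away from the edges.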

frontend.features.dct_basis(nbasis, length)[source]¶

Parameters

nbasis – number of DCT coefficients to keep
length – length of the matrix to process

Returns

a basis of DCT coefficients
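For readers who want to see what such a basis looks like, the following sketch builds the first nbasis rows of an orthonormal DCT-II matrix with NumPy. The DCT type and normalisation are assumptions, since this page does not specify them, and dct2_basis is a hypothetical name.

```python
import numpy as np

def dct2_basis(nbasis, length):
    """Return the first `nbasis` rows of the orthonormal DCT-II matrix."""
    n = np.arange(length)
    k = np.arange(nbasis).reshape(-1, 1)
    basis = np.cos(np.pi * k * (n + 0.5) / length) * np.sqrt(2.0 / length)
    basis[0, :] /= np.sqrt(2.0)  # rescale the constant row so all rows have unit norm
    return basis

basis = dct2_basis(4, 8)  # 4 basis vectors, each of length 8
```

Multiplying a length-8 vector by this matrix keeps its 4 lowest-order DCT coefficients.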

frontend.features.dolpc(x, model_order=8)[source]¶

Compute an autoregressive model from spectral magnitude samples.

Parameters

x –
model_order –

Returns

frontend.features.fft2barkmx(n_fft, fs, nfilts=0, width=1.0, minfreq=0.0, maxfreq=8000)[source]¶

Generate a matrix of weights to combine FFT bins into Bark bins. n_fft defines the source FFT size at sampling rate fs. Optional nfilts specifies the number of output bands required (else one per bark), and width is the constant width of each band in Bark (default 1). While wts has n_fft columns, the second half are all zero. Hence, Bark spectrum is fft2barkmx(n_fft,fs) * abs(fft(xincols, n_fft)); 2004-09-05 dpwe@ee.columbia.edu based on rastamat/audspec.m

Parameters

n_fft – the source FFT size at sampling rate fs
fs – sampling rate
nfilts – number of output bands required
width – constant width of each band in Bark (default 1)
minfreq –
maxfreq –

Returns

a matrix of weights to combine FFT bins into Bark bins

frontend.features.fft2melmx(n_fft, fs=8000, nfilts=0, width=1.0, minfreq=0, maxfreq=4000, htkmel=False, constamp=False)[source]¶

Generate a matrix of weights to combine FFT bins into Mel bins. n_fft defines the source FFT size at sampling rate fs. Optional nfilts specifies the number of output bands required (else one per “mel/width”), and width is the constant width of each band relative to standard Mel (default 1). While wts has n_fft columns, the second half are all zero. Hence, Mel spectrum is fft2melmx(n_fft,fs)*abs(fft(xincols,n_fft)); minfreq is the frequency (in Hz) of the lowest band edge; default is 0, but 133.33 is a common standard (to skip LF). maxfreq is frequency in Hz of upper edge; default fs/2. You can exactly duplicate the mel matrix in Slaney’s mfcc.m as fft2melmx(512, 8000, 40, 1, 133.33, 6855.5, 0); htkmel=1 means use HTK’s version of the mel curve, not Slaney’s. constamp=1 means make integration windows peak at 1, not sum to 1. frqs returns bin center frqs.

2004-09-05 dpwe@ee.columbia.edu based on fft2barkmx

Parameters

n_fft –
fs –
nfilts –
width –
minfreq –
maxfreq –
htkmel –
constamp –

Returns

frontend.features.framing(sig, win_size, win_shift=1, context=(0, 0), pad='zeros')[source]¶

Parameters

sig – input signal, can be mono or multi dimensional
win_size – size of the window in terms of samples
win_shift – shift of the sliding window in terms of samples
context – tuple of left and right context
pad – can be zeros or edge
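The core of framing is a sliding window over the signal. The sketch below shows only that core for a 1-D signal, using an index matrix; it leaves out the context and pad options described above, and frame_signal is a hypothetical name rather than the library function.

```python
import numpy as np

def frame_signal(sig, win_size, win_shift=1):
    """Cut a 1-D signal into overlapping frames, one frame per row."""
    sig = np.asarray(sig)
    n_frames = 1 + (len(sig) - win_size) // win_shift
    # Index matrix: row i holds the sample indices of frame i.
    idx = np.arange(win_size)[None, :] + win_shift * np.arange(n_frames)[:, None]
    return sig[idx]

# Ten samples, windows of 4 samples shifted by 2 samples.
frames = frame_signal(np.arange(10), win_size=4, win_shift=2)
```

Here the frames start at samples 0, 2, 4 and 6, so consecutive frames overlap by two samples.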

frontend.features.hz2bark(f)[source]¶

Convert frequencies (Hertz) to Bark frequencies

Parameters: f – the input frequency
Returns

frontend.features.hz2mel(f, htk=True)[source]¶

Convert an array of frequencies in Hz into mel.

Parameters: f – frequency to convert
Returns: the equivalent values on the mel scale.
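As a point of reference, the HTK-style mel curve is commonly written as mel = 2595 * log10(1 + f / 700). The sketch below implements that formula and its inverse with NumPy; it assumes this is what htk=True selects, which this page does not spell out, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel_htk(f):
    """Hz -> mel with the HTK formula 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz_htk(m):
    """mel -> Hz, the inverse of the HTK formula."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

freqs = np.array([0.0, 133.33, 1000.0, 4000.0, 8000.0])
mels = hz_to_mel_htk(freqs)
back = mel_to_hz_htk(mels)  # should reproduce freqs
```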

frontend.features.levinson(r, order=None, allow_singularity=False)[source]¶

Levinson-Durbin recursion.

Find the coefficients of a length(r)-1 order autoregressive linear process

Parameters

r – autocorrelation sequence of length N + 1 (first element being the zero-lag autocorrelation)
order – requested order of the autoregressive coefficients. default is N.
allow_singularity – false by default. Other implementations may be True (e.g., octave)

Returns

the N+1 autoregressive coefficients $A=(1, a_1...a_N)$
the prediction errors
the N reflection coefficient values

This algorithm solves the set of complex linear simultaneous equations using the Levinson algorithm.

$\mathbf{T}_M \left( \begin{array}{c} 1 \\ \mathbf{a}_M \end{array} \right) = \left( \begin{array}{c} \rho_M \\ \mathbf{0}_M \end{array} \right)$

where $\mathbf{T}_M$ is a Hermitian Toeplitz matrix with elements $T_0, T_1, \dots ,T_M$ .

Note

Solving these equations by Gaussian elimination would require $M^3$ operations, whereas the Levinson algorithm requires $M^2+M$ additions and $M^2+M$ multiplications.

This is equivalent to solving the following symmetric Toeplitz system of linear equations:

$\left( \begin{array}{cccc} r_1 & r_2^* & \dots & r_{n}^*\\ r_2 & r_1^* & \dots & r_{n-1}^*\\ \dots & \dots & \dots & \dots\\ r_n & \dots & r_2 & r_1 \end{array} \right) \left( \begin{array}{cccc} a_2\\ a_3 \\ \dots \\ a_{N+1} \end{array} \right) = \left( \begin{array}{cccc} -r_2\\ -r_3 \\ \dots \\ -r_{N+1} \end{array} \right)$

where $r = (r_1 ... r_{N+1})$ is the input autocorrelation vector, and $r_i^*$ denotes the complex conjugate of $r_i$ . The input r is typically a vector of autocorrelation coefficients where lag 0 is the first element $r_1$ .

>>> import numpy; from spectrum import LEVINSON
>>> T = numpy.array([3., -2+0.5j, .7-1j])
>>> a, e, k = LEVINSON(T)
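To make the recursion concrete, here is a minimal sketch of Levinson-Durbin for a real-valued autocorrelation sequence, checked against a direct solve of the Toeplitz system. It is not the library's implementation: it ignores the complex case and the allow_singularity option, and levinson_durbin is a hypothetical name.

```python
import numpy as np

def levinson_durbin(r, order=None):
    """Real-valued Levinson-Durbin recursion.

    Returns (a, err, k): a = [1, a_1, ..., a_N], the final prediction
    error, and the N reflection coefficients.
    """
    r = np.asarray(r, dtype=float)
    if order is None:
        order = len(r) - 1
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current predictor and lag m.
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k[m - 1] = -acc / err
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k[m - 1] * prev[m - i]
        a[m] = k[m - 1]
        err *= 1.0 - k[m - 1] ** 2
    return a, err, k

r = np.array([1.0, 0.5, 0.2])
a, err, k = levinson_durbin(r)

# Direct solution of the Toeplitz system R a[1:] = -r[1:] for comparison.
R = np.array([[r[abs(i - j)] for j in range(2)] for i in range(2)])
direct = np.linalg.solve(R, -r[1:])
```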

frontend.features.lifter(x, lift=0.6, invs=False)[source]¶

Apply lifter to a matrix of cepstra (one per column). lift is the exponent of the x i^n liftering or, as a negative integer, the length of the HTK-style sin-curve liftering. If invs is True (default False), undo the liftering.

Parameters

x –
lift –
invs –

Returns
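As an illustration of exponent liftering, the sketch below multiplies cepstral row i by i^lift and leaves row 0 untouched. This weighting is an assumption modelled on rastamat-style code, only the positive-exponent case is covered (HTK-style sin-curve liftering is omitted), and lifter_sketch is a hypothetical name.

```python
import numpy as np

def lifter_sketch(x, lift=0.6, invs=False):
    """Scale cepstral row i by i**lift (row 0 unchanged); invs=True undoes it."""
    x = np.asarray(x, dtype=float)
    ncep = x.shape[0]
    weights = np.concatenate(([1.0], np.arange(1, ncep) ** lift))
    if invs:
        weights = 1.0 / weights
    return x * weights[:, None]

cep = np.random.default_rng(0).standard_normal((13, 5))  # 13 cepstra, 5 frames
liftered = lifter_sketch(cep, lift=0.6)
restored = lifter_sketch(liftered, lift=0.6, invs=True)
```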

frontend.features.lpc2cep(a, nout)[source]¶

Convert the LPC ‘a’ coefficients in each column of lpcas into frames of cepstra. nout is the number of cepstra to produce; it defaults to size(lpcas,1). 2003-04-11 dpwe@ee.columbia.edu

Parameters

a –
nout –

Returns
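As an illustration, the standard recursion from LPC coefficients to cepstra is $c_n = -a_n - \frac{1}{n}\sum_{k=1}^{n-1} k\, c_k\, a_{n-k}$ for the all-pole model $1/A(z)$ with $A(z) = 1 + a_1 z^{-1} + \dots + a_p z^{-p}$. The sketch below applies it to a single coefficient vector. It leaves out the zeroth (gain) term, which the library function may include, and lpc_to_cepstrum is a hypothetical name.

```python
import numpy as np

def lpc_to_cepstrum(a, nout):
    """Cepstra c_1..c_nout of the all-pole model 1/A(z), a = [1, a_1, ..., a_p]."""
    a = np.asarray(a, dtype=float)
    a = a / a[0]              # normalise so that a[0] == 1
    p = len(a) - 1
    c = np.zeros(nout + 1)    # c[0] is unused in this sketch
    for n in range(1, nout + 1):
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:    # a_j is zero beyond the model order
                acc += k * c[k] * a[n - k]
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]

# First-order model 1 / (1 - 0.9 z^-1): its cepstrum is 0.9**n / n.
cep = lpc_to_cepstrum([1.0, -0.9], nout=4)
expected = np.array([0.9 ** n / n for n in range(1, 5)])
```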

frontend.features.lpc2spec(lpcas, nout=17)[source]¶

Convert LPC coefficients back into spectra. nout is the number of frequency channels; the default is 17 (i.e. for 8 kHz).

Parameters

lpcas –
nout –

Returns

frontend.features.mel2hz(z, htk=True)[source]¶

Convert an array of mel values into Hz.

Parameters: z – ndarray of mel values to convert to Hz.
Returns: the equivalent values in Hertz.

frontend.features.mel_filter_bank(fs, nfft, lowfreq, maxfreq, widest_nlogfilt, widest_lowfreq, widest_maxfreq)[source]¶

Compute triangular filterbank for cepstral coefficient computation.

Parameters

fs – sampling frequency of the original signal.
nfft – number of points for the Fourier Transform
lowfreq – lower limit of the frequency band filtered
maxfreq – higher limit of the frequency band filtered
widest_nlogfilt – number of log filters
widest_lowfreq – lower frequency of the filter bank
widest_maxfreq – higher frequency of the filter bank

Returns

the filter bank and the central frequencies of each filter

frontend.features.mfcc(input_sig, lowfreq=100, maxfreq=8000, nlinfilt=0, nlogfilt=24, nwin=0.025, fs=16000, nceps=13, shift=0.01, get_spec=False, get_mspec=False, prefac=0.97)[source]¶

Compute Mel Frequency Cepstral Coefficients.

Parameters

input_sig – input signal from which the coefficients are computed. Input audio is expected to be raw 16-bit PCM.
lowfreq – lower limit of the frequency band filtered. Default is 100Hz.
maxfreq – higher limit of the frequency band filtered. Default is 8000Hz.
nlinfilt – number of linear filters to use in low frequencies. Default is 0.
nlogfilt – number of log-linear filters to use in high frequencies. Default is 24.
nwin – length of the sliding window in seconds. Default is 0.025.
fs – sampling frequency of the original signal. Default is 16000Hz.
nceps – number of cepstral coefficients to extract. Default is 13.
shift – shift between two analyses. Default is 0.01 (10ms).
get_spec – boolean, if true returns the spectrogram
get_mspec – boolean, if true returns the output of the filter banks
prefac – pre-emphasis filter value

Returns

the cepstral coefficients in an ndarray, as well as the log-spectrum in the mel domain in an ndarray.

Note

MFCC are computed as follows:

Pre-processing in the time domain (pre-emphasis)
Compute the amplitude spectrum after windowing with a Hamming window
Filter the signal in the spectral domain with a triangular filter bank whose filters are approximately linearly spaced on the mel scale and have equal bandwidth on the mel scale
Compute the DCT of the log-spectrum
Log-energy is returned as the first coefficient of the feature vector.

For more details, refer to [Davis80].
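The steps listed in the note above can be sketched end to end with NumPy alone. This is a simplified illustration under stated assumptions, not the library's code: it uses only mel-spaced filters (no linear filters), the HTK mel formula, an unnormalised DCT-II, and it prepends the log-energy to nceps cepstra, so the output has nceps + 1 columns. The names mel_filterbank and mfcc_sketch are hypothetical.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, fs, low, high):
    """Triangular filters whose edges are equally spaced on the (HTK) mel scale."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = to_hz(np.linspace(to_mel(low), to_mel(high), n_filters + 2))
    bins = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, bins.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_sketch(sig, fs=16000, nwin=0.025, shift=0.01, nlogfilt=24,
                nceps=13, lowfreq=100.0, maxfreq=8000.0, prefac=0.97):
    sig = np.asarray(sig, dtype=float)
    # 1. Pre-emphasis in the time domain.
    sig = np.append(sig[0], sig[1:] - prefac * sig[:-1])
    # 2. Framing, Hamming window, amplitude spectrum.
    win, hop = int(round(nwin * fs)), int(round(shift * fs))
    n_frames = 1 + (len(sig) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(win)
    n_fft = 1 << (win - 1).bit_length()          # next power of two >= win
    spec = np.abs(np.fft.rfft(frames, n_fft, axis=1))
    log_energy = np.log(np.maximum((frames ** 2).sum(axis=1), 1e-10))
    # 3. Triangular mel filter bank, then log.
    bank = mel_filterbank(nlogfilt, n_fft, fs, lowfreq, maxfreq)
    mspec = np.log(np.maximum(spec @ bank.T, 1e-10))
    # 4. DCT-II of the log mel spectrum (coefficients 1..nceps).
    n = np.arange(nlogfilt)
    k = np.arange(1, nceps + 1)[:, None]
    ceps = mspec @ np.cos(np.pi * k * (n + 0.5) / nlogfilt).T
    # 5. Log-energy as the first coefficient of each feature vector.
    return np.hstack([log_energy[:, None], ceps])

rng = np.random.default_rng(0)
feats = mfcc_sketch(rng.standard_normal(16000))  # one second of noise at 16 kHz
```

Each row of feats is one frame; with a 25 ms window and a 10 ms shift, one second of audio yields 98 frames.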

frontend.features.pca_dct(cep, left_ctx=12, right_ctx=12, p=None)[source]¶

Apply DCT PCA as in [McLaren 2015] paper: Mitchell McLaren and Yun Lei, ‘Improved Speaker Recognition Using DCT coefficients as features’ in ICASSP, 2015

A 1D DCT is applied to the cepstral coefficients on a temporal sliding window. The resulting matrix is then flattened and reduced by using a Principal Component Analysis.

Parameters

cep – a matrix of cepstral coefficients, 1 line per feature vector
left_ctx – number of frames to consider for left context
right_ctx – number of frames to consider for right context
p – a PCA matrix trained on a development set to reduce the dimension of the features. p is a portrait matrix

frontend.features.plp(input_sig, nwin=0.025, fs=16000, plp_order=13, shift=0.01, get_spec=False, get_mspec=False, prefac=0.97, rasta=True)[source]¶

The output is a matrix of features: row = feature, column = frame.

Parameters

input_sig –
fs – sampling rate of the samples; default is 16000
rasta – default is True; if False, just compute PLP
plp_order – order of the PLP model; default is 13, 0 means no PLP

Returns

matrix of features, row = features, column are frames

frontend.features.postaud(x, fmax, fbtype='bark', broaden=0)[source]¶

Do loudness equalization and cube root compression.

Parameters

x –
fmax –
fbtype –
broaden –

Returns

frontend.features.power_spectrum(input_sig, fs=8000, win_time=0.025, shift=0.01, prefac=0.97)[source]¶

Compute the power spectrum of the signal.

Parameters

input_sig –
fs –
win_time –
shift –
prefac –

Returns

frontend.features.shifted_delta_cepstral(cep, d=1, p=3, k=7)[source]¶

Compute the Shifted-Delta-Cepstral features for language identification

Parameters

cep – matrix of features, 1 vector per line
d – represents the time advance and delay for the delta computation
k – number of delta-cepstral blocks whose delta-cepstral coefficients are stacked to form the final feature vector
p – time shift between consecutive blocks.

Returns

cepstral coefficients concatenated with the shifted deltas
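To show how d, p and k interact, here is a minimal sketch of shifted delta cepstra: for each frame t and block i = 0..k-1 it computes cep[t + i*p + d] - cep[t + i*p - d] and stacks the k blocks after the original coefficients. Repeating the edge frames as padding is an assumption of this sketch, and sdc_sketch is a hypothetical name; the library may handle boundaries differently.

```python
import numpy as np

def sdc_sketch(cep, d=1, p=3, k=7):
    """Stack cep with k delta blocks, block i taken i * p frames ahead."""
    cep = np.asarray(cep, dtype=float)
    n_frames = cep.shape[0]
    # Pad with repeated edge frames so every shifted index is valid.
    left, right = d, (k - 1) * p + d
    padded = np.pad(cep, ((left, right), (0, 0)), mode="edge")
    blocks = [cep]
    for i in range(k):
        start = left + i * p
        ahead = padded[start + d:start + d + n_frames]
        behind = padded[start - d:start - d + n_frames]
        blocks.append(ahead - behind)
    return np.hstack(blocks)

# Two cepstral dimensions that both rise by 1 per frame.
ramp = np.arange(50, dtype=float).reshape(-1, 1) * np.ones((1, 2))
sdc = sdc_sketch(ramp, d=1, p=3, k=7)
```

With the default 1-3-7 configuration the output dimension is the input dimension times k + 1, here 2 x 8 = 16.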

frontend.features.spec2cep(spec, ncep=13, type=2)[source]¶

Calculate cepstra from spectral samples (in columns of spec). Return ncep cepstral rows (defaults to 13). This one does a type II DCT, or type I if type is specified as 1. dctm returns the DCT matrix that spec was multiplied by to give cep.

Parameters

spec –
ncep –
type –

Returns

frontend.features.trfbank(fs, nfft, lowfreq, maxfreq, nlinfilt, nlogfilt, midfreq=1000)[source]¶

Compute triangular filterbank for cepstral coefficient computation.

Parameters

fs – sampling frequency of the original signal.
nfft – number of points for the Fourier Transform
lowfreq – lower limit of the frequency band filtered
maxfreq – higher limit of the frequency band filtered
nlinfilt – number of linear filters to use in low frequencies
nlogfilt – number of log-linear filters to use in high frequencies
midfreq – frequency boundary between linear and log-linear filters

Returns

the filter bank and the central frequencies of each filter
