io

frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.io.asl_meter(x, fs, nbits=16)

Measure the Active Speech Level (ASL) of x following ITU-T P.56. If x is of integer type, it is scaled to (-1, 1) according to nbits.
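The integer scaling mentioned above can be sketched as follows. This is only an illustration, not the library's code: the helper name `scale_to_unit_range` is hypothetical, and dividing by 2**(nbits - 1) is an assumed convention for signed PCM samples.

```python
import numpy as np

def scale_to_unit_range(x, nbits=16):
    """Hypothetical helper: map signed integer samples to roughly (-1, 1)."""
    x = np.asarray(x)
    if np.issubdtype(x.dtype, np.integer):
        # Assumed convention: full scale of signed nbits samples is 2**(nbits - 1).
        return x / float(2 ** (nbits - 1))
    # Floating-point input is assumed to be scaled already.
    return x.astype(float)
```

With 16-bit input, a sample value of 16384 maps to 0.5 under this convention.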

frontend.io.degrade_audio(input_path, input_extension, output_path, output_extension, input_filename, output_filename, sampling_frequency=16000, noise_file_name=None, snr=-10, reverb_file_name=None, reverb_level=-26.0)

Parameters

input_filename –
output_filename –

Returns

frontend.io.pcmu2lin(p, s=4004.189931)

Convert mu-law PCM to linear: lin = pcmu2lin(pcmu), where pcmu contains a vector of mu-law values in the range 0 to 255. No checking is performed to see that numbers are in this range.

Output values are divided by the scale factor s:

s          Output range
1          +-8031 (integer values)
4004.2     +-2.005649 (default)
8031       +-1
8159       +-0.9843118 (+-1 nominal full scale)

The default scaling factor 4004.189931 is equal to sqrt((2207^2 + 5215^2)/2), which follows ITU standard G.711. The sine wave with PCM mu-law values [158 139 139 158 30 11 11 30] has a mean square value of unity, corresponding to 0 dBm0.

Parameters

p – input signal encoded in PCM mu-law to convert
s – conversion value from mu-law scale to linear scale
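The decoding and the scale factors quoted above can be checked with a short sketch. The function name `pcmu2lin_sketch` is hypothetical and the formula is a re-derivation in the style of the VOICEBOX routine, so treat it as an illustration under those assumptions rather than the library's implementation.

```python
import numpy as np

def pcmu2lin_sketch(p, s=4004.189931):
    """Hypothetical re-implementation of mu-law to linear decoding."""
    p = np.asarray(p, dtype=float)
    m = 15 - np.mod(p, 16)            # inverted 4-bit mantissa
    q = np.floor(p / 128)             # sign bit: 1 gives positive output, 0 negative
    e = (127 - p - m + 128 * q) / 16  # segment (exponent), an integer from 0 to 7
    magnitude = (m + 16.5) * np.power(2.0, e) - 16.5
    # (q - 0.5) is +0.5 or -0.5, so the factor 4 restores the full-scale value.
    return (q - 0.5) * magnitude * 4.0 / s

# The reference sine wave quoted above, decoded with the default scale factor.
sine = pcmu2lin_sketch([158, 139, 139, 158, 30, 11, 11, 30])
mean_square = np.mean(sine ** 2)
```

With s=1 the extreme codes 0 and 128 decode to -8031 and +8031, and with the default s the mean square of the reference sine comes out at unity, in line with the values given above.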

frontend.io.read_audio(input_file_name, framerate=None)

Read a 1 or 2-channel audio file in SPHERE, WAVE or RAW PCM format. The format is determined from the file extension. If the sample rate read from the file is a multiple of the one given as a parameter, the signal is decimated to that lower rate.

Parameters

input_file_name – name of the file to read from
framerate – frame rate, optional, if lower than the one read from the file, subsampling is applied

Returns

the signal as a numpy array and the sampling frequency
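The subsampling rule described above can be illustrated with a small NumPy-only sketch. The helper `subsample_sketch` is hypothetical, it handles a single channel only, and its moving-average filter is a deliberately crude stand-in for a proper decimation filter, so it should not be read as what read_audio does internally.

```python
import numpy as np

def subsample_sketch(signal, file_rate, target_rate=None):
    """Hypothetical helper: reduce the sampling rate by an integer factor."""
    signal = np.asarray(signal, dtype=float)
    if target_rate is None or target_rate == file_rate:
        # Nothing requested: return the signal at its native rate.
        return signal, file_rate
    if target_rate > file_rate or file_rate % target_rate != 0:
        raise ValueError("file_rate must be an integer multiple of target_rate")
    factor = file_rate // target_rate
    # Crude anti-aliasing: moving average over `factor` samples.
    kernel = np.ones(factor) / factor
    smoothed = np.convolve(signal, kernel, mode="same")
    # Keep every `factor`-th sample.
    return smoothed[::factor], target_rate
```

For instance, a 32000 Hz signal requested at 16000 Hz comes back with half as many samples.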

frontend.io.read_hdf5(h5f, show, dataset_list=('cep', 'fb', 'energy', 'vad', 'bnf'))

Parameters

h5f – HDF5 file handler to read from
show – identifier of the show to read
dataset_list – list of datasets to read and concatenate

Returns

frontend.io.read_hdf5_segment(file_handler, show, dataset_list, label, start=None, stop=None, global_cmvn=False)

Read a segment from a stream in HDF5 format. Return the features in the range start:stop. If the start or the stop cannot be reached, the first or last feature is copied so that the length of the returned segment is always stop-start.

Parameters

file_name – name of the file to open
dataset – identifier of the dataset in the HDF5 file
mask –
start –
end –

Returns

the features in the requested range

frontend.io.read_htk(input_file_name, label_file_name='', selected_label='', frame_per_second=100)

Read a sequence of features in HTK format

Parameters

input_file_name – name of the file to read from
label_file_name – name of the label file to read from
selected_label – label to select
frame_per_second – number of frames per second

Returns

a tuple (d, fp, dt, tc, t) described below

Note

d = data: column vector for waveforms, 1 row per frame for other types
fp = frame period in seconds
dt = data type (also includes Voicebox code for generating data)
0. WAVEFORM Acoustic waveform
1. LPC Linear prediction coefficients
2. LPREFC LPC Reflection coefficients: -lpcar2rf([1 LPC]);LPREFC(1)=[];
3. LPCEPSTRA LPC Cepstral coefficients
4. LPDELCEP LPC cepstral+delta coefficients (obsolete)
5. IREFC LPC Reflection coefficients (16 bit fixed point)
6. MFCC Mel frequency cepstral coefficients
7. FBANK Log filter bank energies
8. MELSPEC linear Mel-scaled spectrum
9. USER User defined features
10. DISCRETE Vector quantised codebook
11. PLP Perceptual Linear prediction
12. ANON
tc = full type code = dt plus (optionally) one or more of the following modifiers
- 64 _E Includes energy terms
- 128 _N Suppress absolute energy
- 256 _D Include delta coefs
- 512 _A Include acceleration coefs
- 1024 _C Compressed
- 2048 _Z Zero mean static coefs
- 4096 _K CRC checksum (not implemented yet)
- 8192 _0 Include 0’th cepstral coef
- 16384 _V Attach VQ index
- 32768 _T Attach delta-delta-delta index
t = text version of type code e.g. LPC_C_K

This function is a translation of the MATLAB code from VOICEBOX, a MATLAB toolbox for speech processing by Mike Brookes. Home page: VOICEBOX <http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html>
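The relation between tc, dt and t described in the note can be sketched as a small decoder. The names `HTK_KINDS`, `HTK_MODIFIERS` and `decode_type_code` are hypothetical, and the sketch assumes the standard HTK numbering in which the base kinds run from 0 (WAVEFORM) to 12 (ANON) and occupy the six low bits of the type code.

```python
# Base parameter kinds, indexed by their assumed HTK code (0 to 12).
HTK_KINDS = [
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC", "MFCC",
    "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP", "ANON",
]

# Modifier bits and their text suffixes, as listed in the note above.
HTK_MODIFIERS = [
    (64, "_E"), (128, "_N"), (256, "_D"), (512, "_A"), (1024, "_C"),
    (2048, "_Z"), (4096, "_K"), (8192, "_0"), (16384, "_V"), (32768, "_T"),
]

def decode_type_code(tc):
    """Hypothetical helper: split a full type code into (dt, text version)."""
    dt = tc & 63  # the six low bits hold the base kind
    name = HTK_KINDS[dt] if dt < len(HTK_KINDS) else "UNKNOWN"
    suffixes = "".join(suffix for bit, suffix in HTK_MODIFIERS if tc & bit)
    return dt, name + suffixes
```

Under these assumptions, the example from the note, LPC_C_K, corresponds to the code 1 + 1024 + 4096.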

frontend.io.read_htk_segment(input_file_name, start=0, stop=None)

Read a segment from a stream in HTK format. Return the features in the range start:stop. If the start or the stop cannot be reached, the first or last feature is copied so that the length of the returned segment is always stop-start.

Parameters

input_file_name – name of the feature file to read from, or a file-like object allowing to seek in the file
start – index of the first frame to read (starts at zero)
stop – index of the frame following the last frame of the segment to read. stop < 0 means that stop is the value of the right_context to add at the end of the file

Returns

a sequence of features in an ndarray of length stop-start

frontend.io.read_label(input_file_name, selected_label='speech', frame_per_second=100)

Read label file in ALIZE format

Parameters

input_file_name – the label file name
selected_label – the label to return. Default is ‘speech’.
frame_per_second – number of frames per second. Used to convert times into frame numbers. Default is 100.

Returns

a logical array
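How a label file turns into a logical array can be sketched as below. The sketch assumes that each line of the label file holds a start time, a stop time (both in seconds) and a label separated by whitespace; that layout, and the helper name `labels_to_mask`, are assumptions made for illustration.

```python
import numpy as np

def labels_to_mask(lines, selected_label="speech", frame_per_second=100):
    """Hypothetical helper: build a boolean frame mask from label lines."""
    # Assumed line layout: "<start seconds> <stop seconds> <label>".
    segments = [line.split() for line in lines if line.strip()]
    if not segments:
        return np.zeros(0, dtype=bool)
    # The mask covers all frames up to the latest stop time.
    last_stop = max(float(stop) for _, stop, _ in segments)
    mask = np.zeros(int(round(last_stop * frame_per_second)), dtype=bool)
    for start, stop, label in segments:
        if label == selected_label:
            begin = int(round(float(start) * frame_per_second))
            end = int(round(float(stop) * frame_per_second))
            mask[begin:end] = True
    return mask
```

Frames inside a segment carrying the selected label are True; all other frames are False.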

frontend.io.read_pcm(input_file_name)

Read a signal from a single-channel, 16-bit PCM file

Parameters: input_file_name – name of the PCM file to read.
Returns: a tuple containing the audio signal read from the file as an ndarray of 16-bit samples, None, and 2 (the depth of the encoding in bytes)
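Decoding headerless 16-bit PCM can be sketched with NumPy as below. The helper `pcm16_from_bytes` is hypothetical and works on bytes already read from a file; little-endian byte order is an assumption, since the text above does not state it.

```python
import numpy as np

def pcm16_from_bytes(raw):
    """Hypothetical helper: decode headerless 16-bit PCM bytes."""
    # Assumption: samples are signed, little-endian, single channel.
    samples = np.frombuffer(raw, dtype="<i2")
    # Mirror the documented return shape: signal, None, sample depth in bytes.
    return samples, None, 2
```

Raw PCM carries no header, so the sampling rate cannot be recovered from the data, which is consistent with the None in the returned tuple.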

frontend.io.read_sph(input_file_name, mode='p')

Read a SPHERE audio file

Parameters

input_file_name – name of the file to read
mode – string of option characters, described below (the default is 'p')

Note

Scaling:

‘s’ Auto scale to make data peak = +-1 (use with caution if reading in chunks)

‘r’ Raw unscaled data (integer values)

‘p’ Scaled to make +-1 equal full scale

‘o’ Scale to bin centre rather than bin edge (e.g. 127 rather than 127.5 for 8 bit values; can be combined with n+p,r,s modes)

‘n’ Scale to negative peak rather than positive peak (e.g. 128.5 rather than 127.5 for 8 bit values; can be combined with o+p,r,s modes)

Format

‘l’ Little endian data (Intel,DEC) (overrides indication in file)

‘b’ Big endian data (non Intel/DEC) (overrides indication in file)

File I/O
- ‘f’ Do not close file on exit
- ‘d’ Look in data directory: voicebox(‘dir_data’)
- ‘w’ Also read the annotation file *.wrd if present (as in TIMIT)
- ‘t’ Also read the phonetic transcription file *.phn if present (as in TIMIT)

NMAX maximum number of samples to read (or -1 for unlimited [default])

NSKIP number of samples to skip from start of file (or -1 to continue from previous read when FFX is given instead of FILENAME [default])

Returns: a tuple (Y, FS)

Note

Y data matrix of dimension (samples,channels)
FS sample frequency in Hz
WRD{*,2} cell array with word annotations: WRD{*,:}={[t_start t_end],’text’} where times are in seconds; only present if the ‘w’ option is given
PHN{*,2} cell array with phoneme annotations: PHN{*,:}={[t_start t_end],’phoneme’} where times are in seconds; only present if the ‘t’ option is given
FFX Cell array containing
1. filename
2. header information
  1. first header field name
  2. first header field value
3. format string (e.g. NIST_1A)
4. array containing
  1. file id
  2. current position in file
  3. dataoff byte offset in file to start of data
  4. order byte order (l or b)
  5. nsamp number of samples
  6. number of channels
  7. nbytes bytes per data value
  8. bits number of bits of precision
  9. fs sample frequency
  10. min value
  11. max value
  12. coding 0=PCM,1=uLAW + 0=no compression,10=shorten,20=wavpack,30=shortpack
  13. file not yet decompressed
5. temporary filename

If no output parameters are specified, header information will be printed. The code to decode shorten-encoded files is not yet released with this toolkit.

frontend.io.read_spro4(input_file_name, label_file_name='', selected_label='', frame_per_second=100)

Read a feature stream in SPRO4 format

Parameters

input_file_name – name of the feature file to read from
label_file_name – name of the label file to read if required. By default, the method assumes there is no label file to read.
selected_label – label to select in the label file. Default is none.
frame_per_second – number of frames per second. Used to convert times into frame numbers. Default is 100.

Returns

a sequence of features in a numpy array

frontend.io.read_spro4_segment(input_file_name, start=0, end=None)

Read a segment from a stream in SPRO4 format. Return the features in the range start:end. If the start or the end cannot be reached, the first or last feature is copied so that the length of the returned segment is always end-start.

Parameters

input_file_name – name of the feature file to read from
start – index of the first frame to read (starts at zero)
end – index of the frame following the last frame of the segment to read. end < 0 means that end is the value of the right_context to add at the end of the file

Returns

a sequence of features in an ndarray of length end-start
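The padding behaviour described above, where the first or last frame is repeated when the requested range runs past the data, can be sketched on an in-memory feature matrix. The helper `read_segment_sketch` is hypothetical, it assumes the requested range overlaps the available frames, and it does not model the special handling of negative end values.

```python
import numpy as np

def read_segment_sketch(features, start, end):
    """Hypothetical helper: slice frames start:end with edge padding."""
    n_frames = features.shape[0]
    # Part of the requested range that actually exists in the data.
    core = features[max(start, 0):min(end, n_frames)]
    # Number of frames missing before the first and after the last frame.
    pad_before = max(0, -start)
    pad_after = max(0, end - n_frames)
    # mode="edge" repeats the first or last frame as often as needed.
    return np.pad(core, ((pad_before, pad_after), (0, 0)), mode="edge")
```

Requesting frames -2 to 7 from a 5-frame matrix returns 9 frames: two copies of the first frame, the 5 original frames, and two copies of the last frame.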

frontend.io.read_wav(input_file_name)

Parameters: input_file_name –
Returns

frontend.io.write_hdf5(show, fh, cep, cep_mean, cep_std, energy, energy_mean, energy_std, fb, fb_mean, fb_std, bnf, bnf_mean, bnf_std, label, compression='percentile')

Parameters

show – identifier of the show to write
fh – HDF5 file handler
cep – cepstral coefficients to store
cep_mean – pre-computed mean of the cepstral coefficients
cep_std – pre-computed standard deviation of the cepstral coefficients
energy – energy coefficients to store
energy_mean – pre-computed mean of the energy
energy_std – pre-computed standard deviation of the energy
fb – filter-bank coefficients to store
fb_mean – pre-computed mean of the filter-bank coefficients
fb_std – pre-computed standard deviation of the filter-bank coefficients
bnf – bottleneck features to store
bnf_mean – pre-computed mean of the bottleneck features
bnf_std – pre-computed standard deviation of the bottleneck features
label – VAD labels to store
compression – compression method to apply; default is 'percentile'

Returns
