io
Copyright 2014-2020 Anthony Larcher
frontend
provides methods to process an audio signal in order to extract
useful parameters for speaker verification.
-
frontend.io.
asl_meter
(x, fs, nbits=16) Measure the Active Speech Level (ASL) of x following ITU-T P.56. If x is an integer array, it is scaled to (-1, 1) according to nbits.
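The integer scaling mentioned above can be pictured with a minimal sketch. It assumes a division by 2^(nbits-1); `to_unit_range` is a hypothetical helper, not part of frontend.io, and the scaling used inside asl_meter may differ in detail.

```python
import numpy as np


def to_unit_range(x, nbits=16):
    """Scale integer PCM samples to roughly (-1, 1); leave floats untouched.

    Hypothetical helper: the divisor 2**(nbits - 1) is an assumption.
    """
    x = np.asarray(x)
    if np.issubdtype(x.dtype, np.integer):
        # Full-scale magnitude of a signed nbits-bit integer is 2**(nbits - 1).
        return x.astype(np.float64) / 2.0 ** (nbits - 1)
    return x.astype(np.float64)
```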
-
frontend.io.
degrade_audio
(input_path, input_extension, output_path, output_extension, input_filename, output_filename, sampling_frequency=16000, noise_file_name=None, snr=-10, reverb_file_name=None, reverb_level=-26.0)
- Parameters
input_filename –
output_filename –
- Returns
-
frontend.io.
pcmu2lin
(p, s=4004.189931) Convert mu-law PCM to linear: lin = pcmu2lin(pcmu), where pcmu contains a vector of mu-law values in the range 0 to 255. No checking is performed to see that the numbers are in this range.
Output values are divided by the scale factor s:
s        Output range
1        +-8031 (integer values)
4004.2   +-2.005649 (default)
8031     +-1
8159     +-0.9843118 (+-1 nominal full scale)
The default scaling factor 4004.189931 is equal to sqrt((2207^2 + 5215^2)/2); this follows ITU standard G.711. The sine wave with PCM mu-law values [158 139 139 158 30 11 11 30] has a mean square value of unity, corresponding to 0 dBm0.
- Parameters
p – input signal encoded in PCM mu-law to convert
s – conversion value from mu-law scale to linear scale
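The conversion and the role of the scale factor can be checked numerically. The sketch below follows the usual G.711 mu-law expansion as formulated in VOICEBOX; `mu_law_to_linear` is a hypothetical stand-in written for illustration, not necessarily the code behind pcmu2lin.

```python
import numpy as np


def mu_law_to_linear(p, s=4004.189931):
    """Expand mu-law codes (0..255) to linear values divided by s.

    Illustrative sketch only; like the documented function, no range
    checking is performed on p.
    """
    p = np.asarray(p, dtype=np.float64)
    m = 15.0 - np.mod(p, 16)          # inverted 4-bit mantissa
    q = np.floor(p / 128.0)           # 1 for positive codes, 0 for negative
    e = (127.0 - p - m + 128.0 * q) / 16.0   # segment (exponent) number
    x = (m + 16.5) * np.power(2.0, e) - 16.5
    return (q - 0.5) * x * (4.0 / s)


# The reference sine wave from the description has unit mean square
# with the default scale factor.
sine = mu_law_to_linear([158, 139, 139, 158, 30, 11, 11, 30])
mean_square = float(np.mean(sine ** 2))
```

With s=1 the same function returns integer-valued outputs, for example 2207 for code 158 and 5215 for code 139, which is where the default scale factor comes from.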
-
frontend.io.
read_audio
(input_file_name, framerate=None) Read a 1- or 2-channel audio file in SPHERE, WAVE or RAW PCM format. The format is determined from the file extension. If the sample rate read from the file is a multiple of the one given as parameter, a decimation function is applied to subsample the signal.
- Parameters
input_file_name – name of the file to read from
framerate – optional target sampling rate; if it is lower than the one read from the file, subsampling is applied
- Returns
the signal as a numpy array and the sampling frequency
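The decimation rule described for read_audio can be illustrated as follows. This is a sketch under stated assumptions: it handles a mono signal only, uses a Hamming-windowed sinc low-pass filter chosen for the example, and `decimate_if_multiple` is a hypothetical name; the library may rely on a different decimation routine.

```python
import numpy as np


def decimate_if_multiple(signal, file_rate, target_rate, taps_per_factor=16):
    """Subsample a mono signal when file_rate is an integer multiple of target_rate.

    Returns (signal, rate). The input is returned unchanged when no target
    rate is given, when it is not lower than the file rate, or when the
    file rate is not a multiple of it.
    """
    if target_rate is None or target_rate >= file_rate or file_rate % target_rate != 0:
        return signal, file_rate
    q = file_rate // target_rate
    # Anti-aliasing low-pass filter with cutoff at the new Nyquist frequency.
    n_taps = 2 * taps_per_factor * q + 1
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(n / q) * np.hamming(n_taps)
    h /= h.sum()  # unit gain at DC
    filtered = np.convolve(np.asarray(signal, dtype=np.float64), h, mode="same")
    # Keep one sample out of q.
    return filtered[::q], target_rate
```

For example, a 16000 Hz signal with a requested rate of 8000 Hz is filtered and halved in length, while a request for 11025 Hz leaves it untouched because 16000 is not a multiple of 11025.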
-
frontend.io.
read_hdf5
(h5f, show, dataset_list=('cep', 'fb', 'energy', 'vad', 'bnf'))
- Parameters
h5f – HDF5 file handler to read from
show – identifier of the show to read
dataset_list – list of datasets to read and concatenate
- Returns
-
frontend.io.
read_hdf5_segment
(file_handler, show, dataset_list, label, start=None, stop=None, global_cmvn=False) Read a segment from a stream in HDF5 format. Return the features in the range start:end. If start or end cannot be reached, the first or last feature is copied so that the length of the returned segment is always end-start.
- Parameters
file_name – name of the file to open
dataset – identifier of the dataset in the HDF5 file
mask –
start –
end –
- Returns
-
frontend.io.
read_htk
(input_file_name, label_file_name='', selected_label='', frame_per_second=100) Read a sequence of features in HTK format
- Parameters
input_file_name – name of the file to read from
label_file_name – name of the label file to read from
selected_label – label to select
frame_per_second – number of frames per second
- Returns
a tuple (d, fp, dt, tc, t) described below
Note
d = data: column vector for waveforms, 1 row per frame for other types
fp = frame period in seconds
dt = data type (also includes Voicebox code for generating data)
WAVEFORM Acoustic waveform
LPC Linear prediction coefficients
LPREFC LPC Reflection coefficients: -lpcar2rf([1 LPC]);LPREFC(1)=[];
LPCEPSTRA LPC Cepstral coefficients
LPDELCEP LPC cepstral+delta coefficients (obsolete)
IREFC LPC Reflection coefficients (16 bit fixed point)
MFCC Mel frequency cepstral coefficients
FBANK Log Filter bank energies
MELSPEC linear Mel-scaled spectrum
USER User defined features
DISCRETE Vector quantised codebook
PLP Perceptual Linear prediction
ANON
- tc = full type code = dt plus (optionally)
one or more of the following modifiers
64 _E Includes energy terms
128 _N Suppress absolute energy
256 _D Include delta coefs
512 _A Include acceleration coefs
1024 _C Compressed
2048 _Z Zero mean static coefs
4096 _K CRC checksum (not implemented yet)
8192 _0 Include 0’th cepstral coef
16384 _V Attach VQ index
32768 _T Attach delta-delta-delta index
t = text version of type code e.g. LPC_C_K
This function is a translation of the MATLAB code from VOICEBOX, a MATLAB toolbox for speech processing by Mike Brookes. Home page: VOICEBOX <http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html>
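The relation between tc, dt and t can be made concrete with a small decoder. This sketch assumes that the base types are numbered 0 to 12 in the order listed above and occupy the low six bits of the type code, as in the HTK parameter-kind convention; `decode_type_code` is a hypothetical helper, not a function of frontend.io.

```python
# Base data types, assumed to be numbered 0..12 in the order listed above.
BASE_KINDS = [
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC", "MFCC",
    "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP", "ANON",
]

# Modifier bits and their text suffixes, as listed above.
MODIFIERS = [
    (64, "_E"), (128, "_N"), (256, "_D"), (512, "_A"), (1024, "_C"),
    (2048, "_Z"), (4096, "_K"), (8192, "_0"), (16384, "_V"), (32768, "_T"),
]


def decode_type_code(tc):
    """Split a full type code into (dt, t): base type and text version."""
    dt = tc & 63  # low six bits hold the base type (assumption)
    name = BASE_KINDS[dt] if dt < len(BASE_KINDS) else "UNKNOWN"
    text = name + "".join(suffix for bit, suffix in MODIFIERS if tc & bit)
    return dt, text
```

Under these assumptions, the code 5121 (1 + 1024 + 4096) decodes to the text version LPC_C_K used as the example above.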
-
frontend.io.
read_htk_segment
(input_file_name, start=0, stop=None) Read a segment from a stream in HTK format. Return the features in the range start:stop. If start or stop cannot be reached, the first or last feature is copied so that the length of the returned segment is always stop-start.
- Parameters
input_file_name – name of the feature file to read from, or a file-like object allowing seeking in the file
start – index of the first frame to read (starting at zero)
stop – index of the last frame following the segment to read. stop < 0 means that stop is the value of the right_context to add at the end of the file
- Returns
a sequence of features in an ndarray of length stop-start
-
frontend.io.
read_label
(input_file_name, selected_label='speech', frame_per_second=100) Read a label file in ALIZE format
- Parameters
input_file_name – the label file name
selected_label – the label to return. Default is ‘speech’.
frame_per_second – number of frames per second. Used to convert the frame number into time. Default is 100.
- Returns
a logical array
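How labelled time segments become a logical array can be sketched as below. The file layout is an assumption: each line is taken to hold a start time and an end time in seconds followed by a label. `labels_to_mask` is a hypothetical helper that works on already-read lines, and the rounding of times to frame indices is also an assumption.

```python
import numpy as np


def labels_to_mask(lines, selected_label="speech", frame_per_second=100, n_frames=None):
    """Build a boolean frame mask from 'start end label' lines (times in seconds)."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[2] != selected_label:
            continue  # skip malformed lines and other labels
        start = int(round(float(fields[0]) * frame_per_second))
        stop = int(round(float(fields[1]) * frame_per_second))
        segments.append((start, stop))
    if n_frames is None:
        # Without a known length, the mask ends at the last selected segment.
        n_frames = max((stop for _, stop in segments), default=0)
    mask = np.zeros(n_frames, dtype=bool)
    for start, stop in segments:
        mask[start:stop] = True
    return mask
```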
-
frontend.io.
read_pcm
(input_file_name) Read a signal from a single-channel 16-bit PCM file
- Parameters
input_file_name – name of the PCM file to read.
- Returns
a tuple: the audio signal read from the file as an ndarray of 16-bit samples, None, and 2 (depth of the encoding in bytes)
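Reading headerless 16-bit PCM amounts to reinterpreting the file bytes as integers. The sketch below mirrors the documented return shape; `read_raw_pcm16` is a hypothetical name, and the little-endian byte order is an assumption that the documentation does not state.

```python
import numpy as np


def read_raw_pcm16(path):
    """Read a headerless, single-channel, 16-bit PCM file.

    Returns (signal, None, 2): the samples, no sampling frequency
    (raw PCM carries none), and the sample width in bytes.
    """
    with open(path, "rb") as f:
        data = f.read()
    # "<i2" means little-endian signed 16-bit integers (assumed byte order).
    signal = np.frombuffer(data, dtype="<i2")
    return signal, None, 2
```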
-
frontend.io.
read_sph
(input_file_name, mode='p') Read a SPHERE audio file
- Parameters
input_file_name – name of the file to read
mode – specifies the following (* =default)
Note
Scaling:
‘s’ Auto scale to make data peak = +-1 (use with caution if reading in chunks)
‘r’ Raw unscaled data (integer values)
‘p’ Scaled to make +-1 equal full scale
- ‘o’ Scale to bin centre rather than bin edge (e.g. 127 rather than 127.5 for 8 bit values,
can be combined with n+p,r,s modes)
- ‘n’ Scale to negative peak rather than positive peak (e.g. 128.5 rather than 127.5 for 8 bit values,
can be combined with o+p,r,s modes)
Format
‘l’ Little endian data (Intel,DEC) (overrides indication in file)
‘b’ Big endian data (non Intel/DEC) (overrides indication in file)
File I/O
‘f’ Do not close file on exit
‘d’ Look in data directory: voicebox(‘dir_data’)
‘w’ Also read the annotation file *.wrd if present (as in TIMIT)
‘t’ Also read the phonetic transcription file *.phn if present (as in TIMIT)
NMAX maximum number of samples to read (or -1 for unlimited [default])
- NSKIP number of samples to skip from start of file (or -1 to continue from previous read when FFX
is given instead of FILENAME [default])
- Returns
a tuple (Y, FS)
Note
Y data matrix of dimension (samples,channels)
FS sample frequency in Hz
- WRD{*,2} cell array with word annotations: WRD{*,:}={[t_start t_end],’text’} where times are in seconds;
only present if ‘w’ option is given
- PHN{*,2} cell array with phoneme annotations: PHN{*,:}={[t_start t_end],’phoneme’} where times
are in seconds; only present if ‘t’ option is given
FFX Cell array containing
filename
header information
first header field name
first header field value
format string (e.g. NIST_1A)
file id
current position in file
dataoff byte offset in file to start of data
order byte order (l or b)
nsamp number of samples
number of channels
nbytes bytes per data value
bits number of bits of precision
fs sample frequency
min value
max value
coding 0=PCM,1=uLAW + 0=no compression,10=shorten,20=wavpack,30=shortpack
file not yet decompressed
temporary filename
If no output parameters are specified, header information will be printed. The code to decode shorten-encoded files is not yet released with this toolkit.
-
frontend.io.
read_spro4
(input_file_name, label_file_name='', selected_label='', frame_per_second=100) Read a feature stream in SPRO4 format
- Parameters
input_file_name – name of the feature file to read from
label_file_name – name of the label file to read if required. By default, the method assumes there is no label file to read.
selected_label – label to select in the label file. Default is none.
frame_per_second – number of frames per second. Used to convert the frame number into time. Default is 100.
- Returns
a sequence of features in a numpy array
-
frontend.io.
read_spro4_segment
(input_file_name, start=0, end=None) Read a segment from a stream in SPRO4 format. Return the features in the range start:end. If start or end cannot be reached, the first or last feature is copied so that the length of the returned segment is always end-start.
- Parameters
input_file_name – name of the feature file to read from
start – index of the first frame to read (start at zero)
end – index of the last frame following the segment to read. end < 0 means that end is the value of the right_context to add at the end of the file
- Returns
a sequence of features in an ndarray of length end-start
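The padding behaviour shared by the segment readers (the first or last feature is copied when start or end falls outside the stream) can be sketched on an in-memory array. `padded_segment` is a hypothetical helper, not the library's implementation, and it does not cover the end < 0 right-context case.

```python
import numpy as np


def padded_segment(features, start, end):
    """Return features[start:end], repeating the first or last frame
    for indices that fall outside the stream.

    features is a (frames, dimensions) array; the result always has
    end - start rows.
    """
    n_frames = features.shape[0]
    # Out-of-range indices are clamped to the first or last valid frame.
    idx = np.clip(np.arange(start, end), 0, n_frames - 1)
    return features[idx]
```

For a stream of 5 frames, requesting the range -2:7 yields 9 frames: two copies of frame 0, the 5 original frames, and two copies of frame 4.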
-
frontend.io.
write_hdf5
(show, fh, cep, cep_mean, cep_std, energy, energy_mean, energy_std, fb, fb_mean, fb_std, bnf, bnf_mean, bnf_std, label, compression='percentile')
- Parameters
show – identifier of the show to write
fh – HDF5 file handler
cep – cepstral coefficients to store
cep_mean – pre-computed mean of the cepstral coefficients
cep_std – pre-computed standard deviation of the cepstral coefficients
energy – energy coefficients to store
energy_mean – pre-computed mean of the energy
energy_std – pre-computed standard deviation of the energy
fb – filter-bank coefficients to store
fb_mean – pre-computed mean of the filter-bank coefficients
fb_std – pre-computed standard deviation of the filter-bank coefficients
bnf – bottle-neck features to store
bnf_mean – pre-computed mean of the bottleneck features
bnf_std – pre-computed standard deviation of the bottleneck features
label – vad labels to store
compression – compression mode, default is 'percentile'
- Returns