io

Copyright 2014-2020 Anthony Larcher

frontend provides methods to process an audio signal in order to extract useful parameters for speaker verification.

frontend.io.asl_meter(x, fs, nbits=16)[source]

Measure the Active Speech Level (ASL) of x following ITU-T P.56. If x is of integer type, it is scaled to (-1, 1) according to nbits.
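The integer scaling mentioned above can be sketched as follows. This is a minimal illustration that assumes scaling means dividing by 2^(nbits-1); the helper name scale_to_unit is hypothetical and is not part of frontend.io, and the P.56 level measurement itself is not shown.

```python
def scale_to_unit(samples, nbits=16):
    """Map signed integer samples to floats in roughly [-1, 1).

    Assumption: full scale is 2**(nbits - 1), e.g. 32768 for 16-bit audio.
    """
    full_scale = float(2 ** (nbits - 1))
    return [s / full_scale for s in samples]


# Half of full scale maps to 0.5, the most negative 16-bit value to -1.0.
scaled = scale_to_unit([16384, -32768, 0], nbits=16)
```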

frontend.io.degrade_audio(input_path, input_extension, output_path, output_extension, input_filename, output_filename, sampling_frequency=16000, noise_file_name=None, snr=-10, reverb_file_name=None, reverb_level=-26.0)[source]
Parameters
  • input_filename

  • output_filename

Returns

frontend.io.pcmu2lin(p, s=4004.189931)[source]

Convert mu-law PCM to linear: lin = pcmu2lin(pcmu), where pcmu contains a vector of mu-law values in the range 0 to 255. No check is performed that the values are in this range.

Output values are divided by the scale factor s:

  s        Output range
  1        +-8031 (integer values)
  4004.2   +-2.005649 (default)
  8031     +-1
  8159     +-0.9843118 (+-1 nominal full scale)

The default scaling factor 4004.189931 is equal to sqrt((2207^2 + 5215^2)/2), which follows ITU standard G.711. The sine wave with PCM-Mu values [158 139 139 158 30 11 11 30] has a mean square value of unity, corresponding to 0 dBm0.

Parameters
  • p – input signal encoded in PCM mu-law to convert

  • s – conversion value from mu-law scale to linear scale
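The expansion and the scale factors above can be checked with a short sketch. It follows the standard G.711 mu-law expansion as formulated in VOICEBOX, written from memory rather than copied from frontend.io; the name pcmu2lin_sketch is hypothetical and the code works on plain Python lists.

```python
import math

DEFAULT_SCALE = 4004.189931  # default scale factor quoted in the text


def pcmu2lin_sketch(p, s=DEFAULT_SCALE):
    """Expand mu-law codes (0..255) to linear values divided by s."""
    out = []
    for code in p:
        m = 15 - (code % 16)                   # inverted 4-bit mantissa
        q = code // 128                        # sign bit: 1 positive, 0 negative
        e = (127 - code - m + 128 * q) // 16   # segment number, 0..7
        magnitude = (m + 16.5) * 2 ** e - 16.5
        out.append((q - 0.5) * magnitude * 4.0 / s)
    return out


# The reference tone from the text should have a mean square value of one.
tone = pcmu2lin_sketch([158, 139, 139, 158, 30, 11, 11, 30])
mean_square = sum(v * v for v in tone) / len(tone)
```

With s=1 the codes 0 and 128 expand to the extreme values -8031 and +8031, which is where the first row of the scale table comes from.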

frontend.io.read_audio(input_file_name, framerate=None)[source]

Read a 1 or 2-channel audio file in SPHERE, WAVE or RAW PCM format. The format is determined from the file extension. If the sample rate read from the file is a multiple of the one given as parameter, we apply a decimation function to subsample the signal.

Parameters
  • input_file_name – name of the file to read from

  • framerate – frame rate, optional, if lower than the one read from the file, subsampling is applied

Returns

the signal as a numpy array and the sampling frequency
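The subsampling rule described for read_audio can be illustrated with a small sketch. Both helper names are hypothetical, and the block averaging below is only a crude stand-in for a real decimation filter; it is not the function used by frontend.io.

```python
def decimation_factor(file_rate, target_rate=None):
    """Return the integer subsampling factor, or 1 if no subsampling applies."""
    if target_rate is None or target_rate >= file_rate:
        return 1
    if file_rate % target_rate != 0:
        return 1  # not a multiple: this sketch leaves the signal untouched
    return file_rate // target_rate


def block_average_decimate(signal, factor):
    """Crude decimation: average each block of `factor` samples.

    The averaging acts as a rough low-pass filter; a production decimator
    would use a proper anti-aliasing filter instead.
    """
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


factor = decimation_factor(16000, 8000)
subsampled = block_average_decimate([1, 3, 5, 7, 9], factor)
```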

frontend.io.read_hdf5(h5f, show, dataset_list=('cep', 'fb', 'energy', 'vad', 'bnf'))[source]
Parameters
  • h5f – HDF5 file handler to read from

  • show – identifier of the show to read

  • dataset_list – list of datasets to read and concatenate

Returns

frontend.io.read_hdf5_segment(file_handler, show, dataset_list, label, start=None, stop=None, global_cmvn=False)[source]

Read a segment from a stream in HDF5 format. Return the features in the range start:end. If start or end cannot be reached, the first or last feature is copied so that the length of the returned segment is always end-start.

Parameters
  • file_name – name of the file to open

  • dataset – identifier of the dataset in the HDF5 file

  • mask

  • start

  • end

Returns

the features in the range start:end
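The edge-copying behaviour described for segment reads can be sketched with plain lists. The function name is hypothetical and the code only illustrates the padding rule: indices outside the stream are clamped to the first or last frame, so the result always has end-start frames.

```python
def segment_with_edge_padding(features, start, end):
    """Return features[start:end], repeating the first or last frame
    when the requested range extends beyond the available frames."""
    n_frames = len(features)
    if n_frames == 0:
        raise ValueError("cannot pad an empty feature stream")
    segment = []
    for index in range(start, end):
        clamped = min(max(index, 0), n_frames - 1)  # stay inside the stream
        segment.append(features[clamped])
    return segment


frames = [[0.0], [1.0], [2.0]]
padded = segment_with_edge_padding(frames, -2, 5)
```

Here the request covers 7 frames while only 3 exist, so the first frame is repeated twice at the start and the last frame twice at the end.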

frontend.io.read_htk(input_file_name, label_file_name='', selected_label='', frame_per_second=100)[source]

Read a sequence of features in HTK format

Parameters
  • input_file_name – name of the file to read from

  • label_file_name – name of the label file to read from

  • selected_label – label to select

  • frame_per_second – number of frames per second

Returns

a tuple (d, fp, dt, tc, t) described below

Note

  • d = data: column vector for waveforms, 1 row per frame for other types

  • fp = frame period in seconds

  • dt = data type (also includes Voicebox code for generating data)

    0. WAVEFORM Acoustic waveform

    1. LPC Linear prediction coefficients

    2. LPREFC LPC Reflection coefficients: -lpcar2rf([1 LPC]);LPREFC(1)=[];

    3. LPCEPSTRA LPC Cepstral coefficients

    4. LPDELCEP LPC cepstral+delta coefficients (obsolete)

    5. IREFC LPC Reflection coefficients (16 bit fixed point)

    6. MFCC Mel frequency cepstral coefficients

    7. FBANK Log Filter bank energies

    8. MELSPEC linear Mel-scaled spectrum

    9. USER User defined features

    10. DISCRETE Vector quantised codebook

    11. PLP Perceptual Linear prediction

    12. ANON

  • tc = full type code = dt plus (optionally)

    one or more of the following modifiers

    • 64 _E Includes energy terms

    • 128 _N Suppress absolute energy

    • 256 _D Include delta coefs

    • 512 _A Include acceleration coefs

    • 1024 _C Compressed

    • 2048 _Z Zero mean static coefs

    • 4096 _K CRC checksum (not implemented yet)

    • 8192 _0 Include 0’th cepstral coef

    • 16384 _V Attach VQ index

    • 32768 _T Attach delta-delta-delta index

  • t = text version of type code e.g. LPC_C_K

This function is a translation of the Matlab code from VOICEBOX, a MATLAB toolbox for speech processing by Mike Brookes. Home page: VOICEBOX <http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html>
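The relation between tc, dt and t can be sketched as a small decoder. The function name is hypothetical and not part of frontend.io. It assumes the HTK convention in which base data types are numbered from 0 (WAVEFORM = 0, MFCC = 6) and occupy the low 6 bits of the type code, with the modifiers listed above as higher bits.

```python
BASE_KINDS = [
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC", "MFCC",
    "FBANK", "MELSPEC", "USER", "DISCRETE", "PLP", "ANON",
]

# Modifier bits and their text suffixes, as listed in the documentation.
MODIFIERS = [
    (64, "_E"), (128, "_N"), (256, "_D"), (512, "_A"), (1024, "_C"),
    (2048, "_Z"), (4096, "_K"), (8192, "_0"), (16384, "_V"), (32768, "_T"),
]


def decode_type_code(tc):
    """Split a full type code into (dt, t): base type and text version."""
    dt = tc & 0x3F  # assumption: base type is stored in the low 6 bits
    name = BASE_KINDS[dt] if dt < len(BASE_KINDS) else "UNKNOWN"
    suffix = "".join(text for bit, text in MODIFIERS if tc & bit)
    return dt, name + suffix


# LPC (1) + compressed (1024) + CRC checksum (4096)
dt, text = decode_type_code(1 + 1024 + 4096)
```

For the code 5121 this yields the text version LPC_C_K, the example given for t above.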

frontend.io.read_htk_segment(input_file_name, start=0, stop=None)[source]

Read a segment from a stream in HTK format. Return the features in the range start:stop. If start or stop cannot be reached, the first or last feature is copied so that the length of the returned segment is always stop-start.

Parameters
  • input_file_name – name of the feature file to read from, or file-like object allowing to seek in the file

  • start – index of the first frame to read (start at zero)

  • stop – index of the last frame following the segment to read. stop < 0 means that stop is the value of the right_context to add at the end of the file

Returns

a sequence of features in a ndarray of length stop-start

frontend.io.read_label(input_file_name, selected_label='speech', frame_per_second=100)[source]

Read label file in ALIZE format

Parameters
  • input_file_name – the label file name

  • selected_label – the label to return. Default is ‘speech’.

  • frame_per_second – number of frames per second. Used to convert the frame number into time. Default is 100.

Returns

a logical array
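How a label file turns into a logical array can be sketched as follows. The sketch assumes that each label line has the form "start end label" with times in seconds, and that the array extends to the last labelled frame; the function name is hypothetical and the actual read_label may size or fill the array differently.

```python
def labels_to_mask(lines, selected_label="speech", frame_per_second=100):
    """Build a per-frame boolean list that is True inside selected segments."""
    segments = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        first = int(round(float(fields[0]) * frame_per_second))
        last = int(round(float(fields[1]) * frame_per_second))
        segments.append((first, last, fields[2]))

    n_frames = max((last for _, last, _ in segments), default=0)
    mask = [False] * n_frames
    for first, last, label in segments:
        if label == selected_label:
            for index in range(first, last):
                mask[index] = True
    return mask


example = ["0.00 0.05 speech", "0.05 0.10 noise", "0.10 0.12 speech"]
mask = labels_to_mask(example)
```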

frontend.io.read_pcm(input_file_name)[source]

Read signal from single channel PCM 16 bits

Parameters

input_file_name – name of the PCM file to read.

Returns

a tuple containing the audio signal read from the file as a ndarray encoded on 16 bits, None, and 2 (the depth of the encoding in bytes)
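Reading headerless 16-bit PCM amounts to unpacking consecutive 2-byte signed integers. The sketch below uses only the standard library and assumes little-endian samples; the function names are hypothetical, and the returned tuple only mirrors the shape described above (signal, None, 2).

```python
import struct


def decode_pcm16(raw):
    """Decode little-endian signed 16-bit samples from a bytes object."""
    n_samples = len(raw) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % n_samples, raw[:2 * n_samples]))


def read_pcm16(path):
    """Read a single-channel raw PCM file and return (samples, None, 2)."""
    with open(path, "rb") as handle:
        samples = decode_pcm16(handle.read())
    return samples, None, 2


decoded = decode_pcm16(struct.pack("<4h", 0, 1, -1, 32767))
```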

frontend.io.read_sph(input_file_name, mode='p')[source]

Read a SPHERE audio file

Parameters
  • input_file_name – name of the file to read

  • mode – specifies the following (* =default)

Note

  • Scaling:

    • ‘s’ Auto scale to make data peak = +-1 (use with caution if reading in chunks)

    • ‘r’ Raw unscaled data (integer values)

    • ‘p’ Scaled to make +-1 equal full scale

    • ‘o’ Scale to bin centre rather than bin edge (e.g. 127 rather than 127.5 for 8 bit values, can be combined with n+p,r,s modes)

    • ‘n’ Scale to negative peak rather than positive peak (e.g. 128.5 rather than 127.5 for 8 bit values, can be combined with o+p,r,s modes)

  • Format

    • ‘l’ Little endian data (Intel,DEC) (overrides indication in file)

    • ‘b’ Big endian data (non Intel/DEC) (overrides indication in file)

  • File I/O

    • ‘f’ Do not close file on exit

    • ‘d’ Look in data directory: voicebox(‘dir_data’)

    • ‘w’ Also read the annotation file *.wrd if present (as in TIMIT)

    • ‘t’ Also read the phonetic transcription file *.phn if present (as in TIMIT)

  • NMAX maximum number of samples to read (or -1 for unlimited [default])

  • NSKIP number of samples to skip from start of file (or -1 to continue from previous read when FFX is given instead of FILENAME [default])

Returns

a tuple (Y, FS)

Note

  • Y data matrix of dimension (samples,channels)

  • FS sample frequency in Hz

  • WRD{*,2} cell array with word annotations: WRD{*,:)={[t_start t_end],’text’} where times are in seconds; only present if ‘w’ option is given

  • PHN{*,2} cell array with phoneme annotations: PHN{*,:)={[t_start t_end],’phoneme’} where times are in seconds; only present if ‘t’ option is given

  • FFX Cell array containing

    1. filename

    2. header information

    1. first header field name

    2. first header field value

    3. format string (e.g. NIST_1A)

      1. file id

      2. current position in file

      3. dataoff byte offset in file to start of data

      4. order byte order (l or b)

      5. nsamp number of samples

      6. number of channels

      7. nbytes bytes per data value

      8. bits number of bits of precision

      9. fs sample frequency

      10. min value

      11. max value

      12. coding 0=PCM,1=uLAW + 0=no compression,10=shorten,20=wavpack,30=shortpack

      13. file not yet decompressed

    4. temporary filename

If no output parameters are specified, header information will be printed. The code to decode shorten-encoded files is not yet released with this toolkit.
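The header information listed above comes from the text header at the start of a SPHERE file. The sketch below parses such a header from bytes, assuming the usual NIST layout: a format line (e.g. NIST_1A), a header size line, then "name -type value" lines terminated by end_head. The function name is hypothetical and the code is not the parser used by read_sph.

```python
def parse_sphere_header(raw):
    """Return (format string, header size in bytes, dict of header fields)."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    format_string = lines[0].strip()
    header_size = int(lines[1].strip())

    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) < 3 or line.startswith(";"):
            continue  # skip blank lines, comments and malformed entries
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)      # integer field
        elif kind == "-r":
            fields[name] = float(value)    # real-valued field
        else:
            fields[name] = value           # string field, e.g. -s3
    return format_string, header_size, fields


sample = (b"NIST_1A\n   1024\nchannel_count -i 1\nsample_rate -i 16000\n"
          b"sample_byte_format -s2 01\nsample_coding -s3 pcm\nend_head\n")
fmt, size, header = parse_sphere_header(sample)
```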

frontend.io.read_spro4(input_file_name, label_file_name='', selected_label='', frame_per_second=100)[source]

Read a feature stream in SPRO4 format

Parameters
  • input_file_name – name of the feature file to read from

  • label_file_name – name of the label file to read if required. By default, the method assumes there is no label file to read.

  • selected_label – label to select in the label file. Default is none.

  • frame_per_second – number of frames per second. Used to convert the frame number into time. Default is 100.

Returns

a sequence of features in a numpy array

frontend.io.read_spro4_segment(input_file_name, start=0, end=None)[source]

Read a segment from a stream in SPRO4 format. Return the features in the range start:end. If start or end cannot be reached, the first or last feature is copied so that the length of the returned segment is always end-start.

Parameters
  • input_file_name – name of the feature file to read from

  • start – index of the first frame to read (start at zero)

  • end – index of the last frame following the segment to read. end < 0 means that end is the value of the right_context to add at the end of the file

Returns

a sequence of features in a ndarray of length end-start

frontend.io.read_wav(input_file_name)[source]
Parameters

input_file_name

Returns

frontend.io.write_hdf5(show, fh, cep, cep_mean, cep_std, energy, energy_mean, energy_std, fb, fb_mean, fb_std, bnf, bnf_mean, bnf_std, label, compression='percentile')[source]
Parameters
  • show – identifier of the show to write

  • fh – HDF5 file handler

  • cep – cepstral coefficients to store

  • cep_mean – pre-computed mean of the cepstral coefficient

  • cep_std – pre-computed standard deviation of the cepstral coefficient

  • energy – energy coefficients to store

  • energy_mean – pre-computed mean of the energy

  • energy_std – pre-computed standard deviation of the energy

  • fb – filter-banks coefficients to store

  • fb_mean – pre-computed mean of the filter bank coefficient

  • fb_std – pre-computed standard deviation of the filter bank coefficient

  • bnf – bottle-neck features to store

  • bnf_mean – pre-computed mean of the bottleneck features

  • bnf_std – pre-computed standard deviation of the bottleneck features

  • label – vad labels to store

  • compression – compression mode to apply, default is ‘percentile’

Returns