StatServer

class statserver.StatServer(statserver_file_name=None, distrib_nb=0, feature_size=0, index=None, ubm=None)[source]

A class for storing and processing statistics

Attr modelset

list of model IDs for each session as an array of strings

Attr segset

the list of session IDs as an array of strings

Attr start

index of the first frame of the segment

Attr stop

index of the last frame of the segment

Attr stat0

an ndarray of float64. Each row contains the zero-order statistics of the corresponding session

Attr stat1

an ndarray of float64. Each row contains the first-order statistics of the corresponding session
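As a rough illustration of how these attributes fit together, the sketch below builds plain NumPy arrays with one entry (or one row) per session. The toy IDs and dimensions are hypothetical, and the column layout (distrib_nb values in stat0, distrib_nb * feature_size values in stat1) is an assumption suggested by the constructor arguments, not something stated above.

```python
import numpy as np

# Hypothetical toy dimensions, chosen only for illustration.
n_sessions, distrib_nb, feature_size = 3, 2, 4

modelset = np.array(["spk1", "spk1", "spk2"])    # model ID of each session
segset = np.array(["seg_a", "seg_b", "seg_c"])   # ID of each session
start = np.array([0, 0, 0])                      # first frame of each segment
stop = np.array([199, 149, 299])                 # last frame of each segment

# One row per session. Assumed layout: one zero-order value per distribution
# and distrib_nb * feature_size first-order values.
stat0 = np.zeros((n_sessions, distrib_nb), dtype=np.float64)
stat1 = np.zeros((n_sessions, distrib_nb * feature_size), dtype=np.float64)
```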

accumulate_stat(**kwargs)
Parameters
  • args

  • kwargs

Returns

adapt_mean_map(ubm, r=16, norm=False)[source]
Maximum A Posteriori adaptation of the mean super-vector of ubm; train one model per segment.

Parameters
  • ubm – a Mixture object to adapt

  • r – float, the relevance factor for MAP adaptation

  • norm – boolean, normalize by using the UBM co-variance. Default is False

Returns

a StatServer with 1 as stat0 and the MAP adapted super-vectors as stat1
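The following is a minimal sketch of classical relevance-MAP adaptation of the means for a single segment, written with plain NumPy. The function name map_adapt_means and the array shapes are hypothetical, the norm option is not covered, and the library's own implementation may differ in its details.

```python
import numpy as np

def map_adapt_means(ubm_means, stat0, stat1, r=16.0):
    """Relevance-MAP adaptation of the means for one segment (sketch).

    ubm_means: (C, D) UBM means, stat0: (C,) zero-order statistics,
    stat1: (C, D) first-order statistics, r: relevance factor.
    """
    alpha = stat0 / (stat0 + r)                       # adaptation weight per distribution
    safe_n = np.maximum(stat0, np.finfo(float).eps)   # avoid division by zero
    data_means = stat1 / safe_n[:, None]              # mean of the data per distribution
    return alpha[:, None] * data_means + (1.0 - alpha[:, None]) * ubm_means

ubm_means = np.array([[0.0, 0.0], [10.0, 10.0]])
stat0 = np.array([16.0, 0.0])
stat1 = np.array([[32.0, 16.0], [0.0, 0.0]])
adapted = map_adapt_means(ubm_means, stat0, stat1, r=16.0)
supervector = adapted.reshape(-1)   # concatenated adapted means
```

With r equal to the occupation count of the first distribution, its adapted mean lies halfway between the data mean and the UBM mean; the second distribution received no data and keeps its UBM mean.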

adapt_mean_map_multisession(ubm, r=16, norm=False)[source]
Maximum A Posteriori adaptation of the mean super-vector of ubm; train one model per model in the modelset by summing the statistics of the multiple segments.

Parameters
  • ubm – a Mixture object to adapt

  • r – float, the relevance factor for MAP adaptation

  • norm – boolean, normalize by using the UBM co-variance. Default is False

Returns

a StatServer with 1 as stat0 and the MAP adapted super-vectors as stat1

align_models(model_list)[source]
Align models of the current StatServer to match a list of models provided as input parameter. The size of the StatServer might be reduced to match the input list of models.

Parameters

model_list – ndarray of strings, list of models to match

align_segments(segment_list)[source]
Align segments of the current StatServer to match a list of segments provided as input parameter. The size of the StatServer might be reduced to match the input list of segments.

Parameters

segment_list – ndarray of strings, list of segments to match

center_stat1(mu)[source]

Center first order statistics.

Parameters

mu – array to center on.

estimate_between_class(itNb, V, mean, sigma_obs, batch_size=100, Ux=None, Dz=None, minDiv=True, num_thread=1, re_estimate_residual=False, save_partial=False)[source]

Estimate the factor loading matrix for the between class covariance

Parameters
  • itNb – number of iterations to estimate the between class covariance matrix

  • V – initial between class covariance matrix

  • mean – global mean vector

  • sigma_obs – covariance matrix of the input data

  • batch_size – size of the batches to process one by one to reduce the memory usage

  • Ux – statserver of supervectors

  • Dz – statserver of supervectors

  • minDiv – boolean, if True run the minimum divergence step after maximization

  • num_thread – number of parallel processes to run

  • re_estimate_residual – boolean, if True the residual covariance matrix is re-estimated (for PLDA)

  • save_partial – boolean, if True, save FA model for each iteration

Returns

the between class factor loading matrix

estimate_hidden(mean, sigma, V=None, U=None, D=None, batch_size=100, num_thread=1)[source]

Assume that the statistics have not been whitened.

Parameters
  • mean – global mean of the data to subtract

  • sigma – residual covariance matrix of the Factor Analysis model

  • V – between class covariance matrix

  • U – within class covariance matrix

  • D – MAP covariance matrix

  • batch_size – size of the batches used to reduce memory footprint

  • num_thread – number of parallel processes to run

estimate_map(itNb, D, mean, Sigma, Vy=None, Ux=None, num_thread=1, save_partial=False)[source]
Parameters
  • itNb – number of iterations to estimate the MAP covariance matrix

  • D – Maximum a Posteriori matrix to estimate

  • mean – mean of the input parameters

  • Sigma – residual covariance matrix

  • Vy – statserver of supervectors

  • Ux – statserver of supervectors

  • num_thread – number of parallel processes to run

  • save_partial – boolean, if True save MAP matrix after each iteration

Returns

the MAP covariance matrix into a vector as it is diagonal

estimate_spectral_norm_stat1(it=1, mode='efr')[source]
Compute meta-parameters for Spectral Normalization as described in [Bousquet11].

Can be used to perform Eigen Factor Radial or Spherical Nuisance Normalization. Default behavior is equivalent to Length Norm as described in [Garcia-Romero11]

Statistics are transformed while the meta-parameters are estimated.

Parameters
  • it – integer, number of iterations to perform

  • mode – string, can be:
      - efr, for Eigen Factor Radial
      - sphNorm, for Spherical Nuisance Normalization

Returns

a tuple of two lists:
  - a list of mean vectors
  - a list of co-variance matrices as ndarrays
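To make the iteration concrete, here is a hedged NumPy sketch of the efr mode: at each iteration the data are centered, whitened with the inverse square root of their covariance, and scaled to unit length. The function name estimate_spectral_norm is hypothetical, the use of the total covariance for efr is an assumption based on [Bousquet11], and the sphNorm mode is not covered.

```python
import numpy as np

def estimate_spectral_norm(x, it=1):
    """Estimate per-iteration means and covariances (efr-style sketch)."""
    x = np.array(x, dtype=np.float64)   # work on a copy of the vectors
    means, covs = [], []
    for _ in range(it):
        mu = x.mean(axis=0)
        sigma = np.cov(x, rowvar=False)  # covariance of all vectors
        means.append(mu)
        covs.append(sigma)
        # Whiten with the inverse square root of sigma, then length-normalize.
        eigval, eigvec = np.linalg.eigh(sigma)
        inv_sqrt_sigma = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
        x = (x - mu) @ inv_sqrt_sigma
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return means, covs, x

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.3, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 0.5]])
vectors = rng.normal(size=(50, 3)) @ mixing
means, covs, normalized = estimate_spectral_norm(vectors, it=2)
```

The two returned lists have one entry per iteration, which matches the way spectral_norm_stat1 takes a list of means and a list of covariance matrices.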

estimate_within_class(it_nb, U, mean, sigma_obs, batch_size=100, Vy=None, Dz=None, min_div=True, num_thread=1, save_partial=False)[source]

Estimate the factor loading matrix for the within class covariance

Parameters
  • it_nb – number of iterations to estimate the within class covariance matrix

  • U – initial within class covariance matrix

  • mean – mean of the input data

  • sigma_obs – co-variance matrix of the input data

  • batch_size – number of sessions to process per batch to optimize memory usage

  • Vy – statserver of supervectors

  • Dz – statserver of supervectors

  • min_div – boolean, if True run the minimum divergence step after maximization

  • num_thread – number of parallel processes to run

  • save_partial – boolean, if True, save FA model for each iteration

Returns

the within class factor loading matrix

factor_analysis(rank_f, rank_g=0, rank_h=None, re_estimate_residual=False, it_nb=(10, 10, 10), min_div=True, ubm=None, batch_size=100, num_thread=1, save_partial=False, init_matrices=(None, None, None))[source]
Parameters
  • rank_f – rank of the between class variability matrix

  • rank_g – rank of the within class variability matrix

  • rank_h – boolean, if True, estimate the residual covariance matrix. Default is False

  • re_estimate_residual – boolean, if True, the residual covariance matrix is re-estimated (use for PLDA)

  • it_nb – tuple of three integers; number of iterations to run for F, G, H estimation

  • min_div – boolean, if True, re-estimate the covariance matrices according to the minimum divergence criteria

  • batch_size – number of sessions to process in one batch for memory optimization

  • num_thread – number of threads to run in parallel

  • ubm – origin of the space; should be None for PLDA and be a Mixture object for JFA or TV

  • save_partial – name of the file to save intermediate models, if True, save before each split of the distributions

  • init_matrices – tuple of three optional matrices to initialize the model, default is (None, None, None)

Returns

the between class factor loading matrix, the within class factor loading matrix, the diagonal MAP matrix (as a vector), and the residual covariance matrix

generator()[source]

Create a generator which yields stat0 and stat1 of one session at a time
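A minimal sketch of such a generator over two NumPy arrays is shown here; the standalone function session_generator is hypothetical and only mirrors the described behaviour.

```python
import numpy as np

def session_generator(stat0, stat1):
    """Yield the (stat0, stat1) rows of one session at a time."""
    for i in range(stat0.shape[0]):
        yield stat0[i], stat1[i]

stat0 = np.array([[1.0, 2.0], [3.0, 4.0]])
stat1 = np.array([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])
sessions = list(session_generator(stat0, stat1))
```

Iterating this way avoids copying the whole matrices when sessions are processed one by one.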

get_between_covariance_stat1()[source]
Compute and return the between-class covariance matrix of the first-order statistics.

Returns

the between-class co-variance matrix of the first-order statistics as an ndarray.

get_lda_matrix_stat1(rank)[source]
Compute and return the Linear Discriminant Analysis matrix on the first-order statistics. Columns of the LDA matrix are ordered according to the corresponding eigenvalues in descending order.

Parameters

rank – integer, rank of the LDA matrix to return

Returns

the LDA matrix of rank “rank” as an ndarray

get_mahalanobis_matrix_stat1()[source]

Compute and return Mahalanobis matrix of first-order statistics.

Returns

the Mahalanobis matrix computed on the first-order statistics as an ndarray

get_mean_stat1()[source]

Return the mean of first order statistics

Returns

the mean array of the first-order statistics.

get_model_segments(mod_id)[source]

Return the list of segments belonging to model mod_id

Parameters

mod_id – string, ID of the model whose segments will be returned

Returns

a list of segments belonging to the model

get_model_segments_by_index(mod_idx)[source]

Return the list of segments belonging to model number mod_idx

Parameters

mod_idx – index of the model whose list of segments will be returned

Returns

a list of segments belonging to the model

get_model_stat0(mod_id)[source]

Return zero-order statistics of a given model

Parameters

mod_id – ID of the model whose stat0 will be returned

Returns

a matrix of zero-order statistics as an ndarray

get_model_stat0_by_index(mod_idx)[source]

Return zero-order statistics of model number mod_idx

Parameters

mod_idx – integer, index of the unique model whose stat0 will be returned

Returns

a matrix of zero-order statistics as an ndarray

get_model_stat1(mod_id)[source]

Return first-order statistics of a given model

Parameters

mod_id – string, ID of the model whose stat1 will be returned

Returns

a matrix of first-order statistics as an ndarray

get_model_stat1_by_index(mod_idx)[source]

Return first-order statistics of model number mod_idx

Parameters

mod_idx – integer, index of the unique model whose stat1 will be returned

Returns

a matrix of first-order statistics as an ndarray

get_nap_matrix_stat1(co_rank)[source]
Compute and return the Nuisance Attribute Projection matrix from first-order statistics.

Parameters

co_rank – co-rank of the Nuisance Attribute Projection matrix

Returns

the NAP matrix of rank “co_rank”

get_segment_stat0(seg_id)[source]

Return zero-order statistics of the segment whose ID is seg_id

Parameters

seg_id – string, ID of the segment whose stat0 will be returned

Returns

a matrix of zero-order statistics as an ndarray

get_segment_stat0_by_index(seg_idx)[source]

Return zero-order statistics of segment number seg_idx

Parameters

seg_idx – integer, index of the unique segment whose stat0 will be returned

Returns

a matrix of zero-order statistics as an ndarray

get_segment_stat1(seg_id)[source]

Return first-order statistics of the segment whose ID is seg_id

Parameters

seg_id – string, ID of the segment whose stat1 will be returned

Returns

a matrix of first-order statistics as an ndarray

get_segment_stat1_by_index(seg_idx)[source]

Return first-order statistics of segment number seg_idx

Parameters

seg_idx – integer, index of the unique segment whose stat1 will be returned

Returns

a matrix of first-order statistics as an ndarray

get_total_covariance_stat1()[source]
Compute and return the total covariance matrix of the first-order statistics.

Returns

the total co-variance matrix of the first-order statistics as an ndarray.

get_wccn_choleski_stat1()[source]
Compute and return the lower Cholesky decomposition matrix of the Within Class Co-variance Normalization matrix on the first-order statistics.

Returns

the lower Cholesky decomposition of the WCCN matrix as an ndarray
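As an illustration, the sketch below assumes the WCCN matrix is the inverse of the within-class covariance matrix W and takes its lower Cholesky factor with NumPy. The 2 x 2 matrix is a hypothetical example, and the library may normalize or estimate W differently.

```python
import numpy as np

# Hypothetical within-class covariance matrix (symmetric positive definite).
W = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Lower-triangular factor L such that L @ L.T equals inv(W).
L = np.linalg.cholesky(np.linalg.inv(W))

# Right-multiplying row vectors by L maps the within-class covariance
# to the identity: L.T @ W @ L == I.
whitened_cov = L.T @ W @ L
```

A matrix of this kind can then be applied to the first-order statistics with a right product, as rotate_stat1 does.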

get_within_covariance_stat1()[source]
Compute and return the within-class covariance matrix of the first-order statistics.

Returns

the within-class co-variance matrix of the first-order statistics as an ndarray.
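The between-class, within-class and total covariance matrices documented in this class are linked by a simple decomposition. The sketch below computes all three with NumPy and a hypothetical helper covariance_decomposition; dividing every matrix by the number of sessions is an assumption, and the library may use a different normalization.

```python
import numpy as np

def covariance_decomposition(stat1, modelset):
    """Return (between, within, total) covariance matrices of the rows of stat1."""
    n, dim = stat1.shape
    mu = stat1.mean(axis=0)
    between = np.zeros((dim, dim))
    within = np.zeros((dim, dim))
    for model in np.unique(modelset):
        x = stat1[modelset == model]          # sessions of this model
        mu_c = x.mean(axis=0)
        d = (mu_c - mu)[:, None]
        between += x.shape[0] * (d @ d.T)     # spread of the class means
        xc = x - mu_c
        within += xc.T @ xc                   # spread inside the class
    centered = stat1 - mu
    total = centered.T @ centered / n
    return between / n, within / n, total

rng = np.random.default_rng(1)
modelset = np.array(["a", "a", "b", "b", "b", "c"])
stat1 = rng.normal(size=(6, 2))
between, within, total = covariance_decomposition(stat1, modelset)
```

Under this normalization the total covariance equals the sum of the between-class and within-class matrices.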

ivector_extraction_eigen_decomposition(ubm, Q, D_bar_c, Tnorm, delta=array([], dtype=float64))[source]
Compute i-vectors using the eigen decomposition approximation.

For more information, refer to [Glembeck11]

Parameters
  • ubm – a Mixture used as UBM for i-vector estimation

  • Q – Q matrix as described in [Glembeck11]

  • D_bar_c – matrices as described in [Glembeck11]

  • Tnorm – total variability matrix pre-normalized using the co-variance of the UBM

  • delta – mean vector if re-estimated using minimum divergence criteria

Returns

a StatServer whose zero-order statistics are 1 and whose first-order statistics are approximated i-vectors.

ivector_extraction_weight(ubm, W, Tnorm, delta=array([], dtype=float64))[source]
Compute i-vectors using the ubm weight approximation.

For more information, refer to:

Glembek, O.; Burget, L.; Matejka, P.; Karafiat, M. & Kenny, P. “Simplification and optimization of I-Vector extraction,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2011, 4516-4519

Parameters
  • ubm – a Mixture used as UBM for i-vector estimation

  • W – fixed matrix pre-computed using the weights from the UBM and the total variability matrix

  • Tnorm – total variability matrix pre-normalized using the co-variance of the UBM

  • delta – mean vector if re-estimated using minimum divergence criteria

Returns

a StatServer whose zero-order statistics are 1 and whose first-order statistics are approximated i-vectors.

mean_stat_per_model()[source]

Average the zero- and first-order statistics per model and store them in a new StatServer.

Returns

a StatServer with the statistics averaged per model

merge()[source]

Merge a variable number of StatServers into one. If a segment ID is duplicated, keep only one of them and raise a warning

norm_stat1()[source]

Divide all first-order statistics by their Euclidean norm.
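In NumPy terms, and assuming one vector per row, this amounts to the following sketch (the toy values are hypothetical):

```python
import numpy as np

stat1 = np.array([[3.0, 4.0],
                  [0.0, 2.0]])

# Euclidean norm of each row, kept as a column so it broadcasts over the row.
norms = np.linalg.norm(stat1, axis=1, keepdims=True)
normalized = stat1 / norms
```

Every row of the result has unit length; a row of zeros would need special handling, which this sketch does not include.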

precompute_svm_kernel_stat1()[source]
Pre-compute the kernel for SVM training and testing. The output parameter is a matrix that only contains the impostor part of the kernel, which has to be completed by the target-dependent part during training and testing.

Returns

the impostor part of the SVM Gram matrix as an ndarray

static read(statserver_file_name, prefix='')[source]

Read StatServer in hdf5 format

Parameters
  • statserver_file_name – name of the file to read from

  • prefix – prefix of the dataset to read from in HDF5 file

static read_subset(statserver_filename, index, prefix='')[source]

Given a statserver in HDF5 format stored on disk and an IdMap, create a StatServer object filled with sessions corresponding to the IdMap.

Parameters
  • statserver_filename – name of the statserver in hdf5 format to read from

  • index – the IdMap of sessions to load or an array of indices to load

  • prefix – prefix of the group in HDF5 file

Returns

a StatServer

rotate_stat1(R)[source]

Rotate first-order statistics by a right-product.

Parameters

R – ndarray, matrix to use for right product on the first order statistics.
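A right product means each row vector is multiplied by R on its right. The NumPy sketch below uses a hypothetical 90-degree rotation of two-dimensional vectors; a rectangular R, such as a rank-reduced LDA matrix, would work in the same way and change the dimension.

```python
import numpy as np

stat1 = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

# Hypothetical rotation matrix (90 degrees in the plane).
R = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

rotated = stat1 @ R   # right product applied to every row at once
```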

spectral_norm_stat1(spectral_norm_mean, spectral_norm_cov, is_sqr_inv_sigma=False)[source]
Apply Spectral Normalization to all first order statistics.

See more details in [Bousquet11]

The number of iterations performed is equal to the length of the input lists.

Parameters
  • spectral_norm_mean – a list of mean vectors

  • spectral_norm_cov – a list of co-variance matrices as ndarrays

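  • is_sqr_inv_sigma – boolean, True if the input co-variance matrices are already the inverse of the square root of the covariance matrices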

subtract_weighted_stat1(sts)[source]

Subtract the stat1 of the sts StatServer from the stat1 of the current StatServer, after multiplying the former by the zero-order statistics of the current StatServer

Parameters

sts – a StatServer

Returns

a new StatServer
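The operation can be sketched with NumPy as follows. The toy arrays are hypothetical, and the way each zero-order value is repeated over the feature_size coefficients of its distribution is an assumption about the super-vector layout.

```python
import numpy as np

stat0 = np.array([[2.0, 1.0]])                 # one session, two distributions
stat1 = np.array([[5.0, 5.0, 3.0, 3.0]])       # current first-order statistics
other_stat1 = np.array([[1.0, 1.0, 1.0, 1.0]]) # stat1 of the sts StatServer

feature_size = stat1.shape[1] // stat0.shape[1]
# Map every first-order coefficient to the index of its distribution.
index_map = np.repeat(np.arange(stat0.shape[1]), feature_size)

result = stat1 - other_stat1 * stat0[:, index_map]
```

Here the first distribution has an occupation count of 2, so twice the corresponding part of other_stat1 is subtracted.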

sum_stat_per_model()[source]

Sum the zero- and first-order statistics per model and store them in a new StatServer.

Returns

a StatServer with the statistics summed per model
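Summing and averaging per model are both group-by operations on the modelset. The NumPy sketch below shows the idea on first-order statistics only; the toy data are hypothetical and the library's own bookkeeping of segset, start and stop is not reproduced.

```python
import numpy as np

modelset = np.array(["spk1", "spk2", "spk1"])
stat1 = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# Unique model IDs and, for each session, the index of its model.
unique_models, inverse = np.unique(modelset, return_inverse=True)

summed = np.zeros((len(unique_models), stat1.shape[1]))
np.add.at(summed, inverse, stat1)      # accumulate the rows of each model

counts = np.bincount(inverse)          # number of sessions per model
averaged = summed / counts[:, None]
```

The same grouping applied to stat0 gives the zero-order counterpart.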

to_hdf5(h5f, prefix='', mode='w')[source]

Write the StatServer to disk in hdf5 format.

Parameters
  • output_file_name – name of the file to write in.

  • prefix

validate(warn=False)[source]

Validate the structure and content of the StatServer. Check consistency between the different attributes of the StatServer:

  • dimension of the modelset

  • dimension of the segset

  • length of the modelset and segset

  • consistency of stat0 and stat1

Parameters

warn – boolean, optional, if True, display possible warnings

whiten_cholesky_stat1(mu, sigma)[source]

Whiten first-order statistics by using Cholesky decomposition of Sigma

Parameters
  • mu – array, mean vector to be subtracted from the statistics

  • sigma – ndarray, co-variance matrix or covariance super-vector

whiten_stat1(mu, sigma, isSqrInvSigma=False)[source]

Whiten first-order statistics. The number of dimensions of sigma selects the case:

  • sigma.ndim == 1: diagonal covariance

  • sigma.ndim == 2: single Gaussian with full covariance

  • sigma.ndim == 3: full covariance UBM

Parameters
  • mu – array, mean vector to be subtracted from the statistics

  • sigma – ndarray, co-variance matrix or covariance super-vector

  • isSqrInvSigma – boolean, True if the input Sigma matrix is the inverse of the square root of a covariance matrix
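The first two cases can be sketched with NumPy as below. The function name whiten is hypothetical, the inverse square root is computed by eigen-decomposition as one possible choice, and the three-dimensional (full covariance UBM) case and the isSqrInvSigma shortcut are left out.

```python
import numpy as np

def whiten(stat1, mu, sigma):
    """Center the rows of stat1 on mu and whiten them with sigma (sketch)."""
    centered = stat1 - mu
    if sigma.ndim == 1:
        # Diagonal covariance: scale each coefficient independently.
        return centered / np.sqrt(sigma)
    if sigma.ndim == 2:
        # Full covariance: multiply by the inverse square root of sigma.
        eigval, eigvec = np.linalg.eigh(sigma)
        inv_sqrt_sigma = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
        return centered @ inv_sqrt_sigma
    raise ValueError("the 3-D case is not covered by this sketch")

rng = np.random.default_rng(2)
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 2.0, 0.3], [0.0, 0.0, 0.7]])
x = rng.normal(size=(200, 3)) @ mixing
mu = x.mean(axis=0)

white_full = whiten(x, mu, np.cov(x, rowvar=False))
white_diag = whiten(x, mu, x.var(axis=0))
```

After full-covariance whiten­ing with the sample covariance, the covariance of the result is the identity; in the diagonal case only the variances are normalized to one.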