StatServer¶

class statserver.StatServer(statserver_file_name=None, distrib_nb=0, feature_size=0, index=None, ubm=None)[source]¶

A class for statistic storage and processing

Attr modelset: list of model IDs for each session as an array of strings
Attr segset: the list of session IDs as an array of strings
Attr start: index of the first frame of the segment
Attr stop: index of the last frame of the segment
Attr stat0: a ndarray of float64. Each line contains 0-order statistics from the corresponding session
Attr stat1: a ndarray of float64. Each line contains 1-order statistics from the corresponding session

accumulate_stat(**kwargs)¶

Parameters

args –
kwargs –

Returns

adapt_mean_map(ubm, r=16, norm=False)[source]¶

Maximum A Posteriori adaptation of the mean super-vector of ubm: train one model per segment.

Parameters

ubm – a Mixture object to adapt
r – float, the relevance factor for MAP adaptation
norm – boolean, normalize by using the UBM co-variance. Default is False

Returns

a StatServer with 1 as stat0 and the MAP adapted super-vectors as stat1
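As a rough illustration of relevance MAP adaptation of the means, the sketch below adapts UBM means from the zero- and first-order statistics of one segment. It is not taken from the StatServer source: the function name map_adapt_means, the array layout, and the assumption that stat1 holds the uncentered sum of frames per distribution are all hypothetical.

```python
import numpy as np

def map_adapt_means(ubm_means, stat0, stat1, r=16.0):
    """Relevance-MAP adaptation of Gaussian means for one segment.

    ubm_means: (C, D) UBM mean vectors
    stat0:     (C,)   zero-order statistics (soft frame counts)
    stat1:     (C, D) first-order statistics (assumed: uncentered sum of frames)
    r:         relevance factor
    """
    # Adaptation weight per distribution: more data means more trust in the data.
    alpha = stat0 / (stat0 + r)
    # Data mean per distribution; the floor avoids dividing by zero for empty ones.
    data_mean = stat1 / np.maximum(stat0, 1e-10)[:, None]
    return alpha[:, None] * data_mean + (1.0 - alpha)[:, None] * ubm_means

ubm_means = np.zeros((2, 3))
stat0 = np.array([16.0, 0.0])
stat1 = np.array([[32.0, 32.0, 32.0], [0.0, 0.0, 0.0]])
adapted = map_adapt_means(ubm_means, stat0, stat1, r=16.0)
```

With r equal to the frame count of the first distribution, its adapted mean lies halfway between the UBM mean and the data mean; the second distribution saw no data and keeps the UBM mean.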

adapt_mean_map_multisession(ubm, r=16, norm=False)[source]¶

Maximum A Posteriori adaptation of the mean super-vector of ubm: train one model per model in the modelset by summing the statistics of the multiple segments.

Parameters

ubm – a Mixture object to adapt
r – float, the relevance factor for MAP adaptation
norm – boolean, normalize by using the UBM co-variance. Default is False

Returns

a StatServer with 1 as stat0 and the MAP adapted super-vectors as stat1

align_models(model_list)[source]¶

Align models of the current StatServer to match a list of models provided as input parameter. The size of the StatServer might be reduced to match the input list of models.

Parameters: model_list – ndarray of strings, list of models to match

align_segments(segment_list)[source]¶

Align segments of the current StatServer to match a list of segments provided as input parameter. The size of the StatServer might be reduced to match the input list of segments.

Parameters: segment_list – ndarray of strings, list of segments to match
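The alignment described above can be pictured as a row selection on parallel arrays. The following is a minimal sketch under that assumption; align_rows is a hypothetical helper, not a StatServer method, and it silently drops requested segments that are absent.

```python
import numpy as np

def align_rows(segset, stat1, segment_list):
    """Keep only the rows whose segment ID is in segment_list, in that order."""
    # Map each known segment ID to its row index.
    position = {seg: i for i, seg in enumerate(segset)}
    # Requested segments that are not present are skipped.
    keep = [position[seg] for seg in segment_list if seg in position]
    return segset[keep], stat1[keep]

segset = np.array(["s1", "s2", "s3"])
stat1 = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
new_segset, new_stat1 = align_rows(segset, stat1, ["s3", "s1", "s9"])
```

Here the result has two rows, ordered as in the requested list, which is why the size may shrink.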

center_stat1(mu)[source]¶

Center first order statistics.

Parameters: mu – array to center on.

estimate_between_class(itNb, V, mean, sigma_obs, batch_size=100, Ux=None, Dz=None, minDiv=True, num_thread=1, re_estimate_residual=False, save_partial=False)[source]¶

Estimate the factor loading matrix for the between class covariance

Parameters

itNb – number of iterations to estimate the between class covariance matrix
V – initial between class covariance matrix
mean – global mean vector
sigma_obs – covariance matrix of the input data
batch_size – size of the batches to process one by one to reduce the memory usage
Ux – statserver of supervectors
Dz – statserver of supervectors
minDiv – boolean, if True run the minimum divergence step after maximization
num_thread – number of parallel processes to run
re_estimate_residual – boolean, if True the residual covariance matrix is re-estimated (for PLDA)
save_partial – boolean, if True, save FA model for each iteration

Returns

the between class factor loading matrix

estimate_hidden(mean, sigma, V=None, U=None, D=None, batch_size=100, num_thread=1)[source]¶

Assume that the statistics have not been whitened.

Parameters

mean – global mean of the data to subtract
sigma – residual covariance matrix of the Factor Analysis model
V – between class covariance matrix
U – within class covariance matrix
D – MAP covariance matrix
batch_size – size of the batches used to reduce memory footprint
num_thread – number of parallel processes to run

estimate_map(itNb, D, mean, Sigma, Vy=None, Ux=None, num_thread=1, save_partial=False)[source]¶

Parameters

itNb – number of iterations to estimate the MAP covariance matrix
D – Maximum a Posteriori matrix to estimate
mean – mean of the input parameters
Sigma – residual covariance matrix
Vy – statserver of supervectors
Ux – statserver of supervectors
num_thread – number of parallel processes to run
save_partial – boolean, if True save MAP matrix after each iteration

Returns

the MAP covariance matrix into a vector as it is diagonal

estimate_spectral_norm_stat1(it=1, mode='efr')[source]¶

Compute meta-parameters for Spectral Normalization as described in [Bousquet11].

Can be used to perform Eigen Factor Radial or Spherical Nuisance Normalization. Default behavior is equivalent to Length Norm as described in [Garcia-Romero11]

Statistics are transformed while the meta-parameters are estimated.

Parameters

it – integer, number of iterations to perform
mode – string, either efr for Eigen Factor Radial or sphNorm for Spherical Nuisance Normalization

Returns

a tuple of two lists: a list of mean vectors and a list of co-variance matrices as ndarrays
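To make the iteration concrete, here is a minimal NumPy sketch of the Eigen Factor Radial idea: at each iteration the current mean and covariance are recorded, the vectors are whitened with them, then length-normalized. It is an illustration written for this page, not the library code; estimate_efr is a hypothetical name and the sphNorm mode is not covered.

```python
import numpy as np

def estimate_efr(stat1, it=1):
    """Alternate whitening and length normalization, recording the meta-parameters."""
    x = stat1.copy()
    means, covs = [], []
    for _ in range(it):
        mu = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)
        means.append(mu)
        covs.append(cov)
        # Whiten with the inverse square root of the covariance matrix.
        eigval, eigvec = np.linalg.eigh(cov)
        inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
        x = (x - mu) @ inv_sqrt
        # Project every vector onto the unit sphere.
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x, means, covs

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 4)) * np.array([1.0, 2.0, 3.0, 4.0])
normalized, means, covs = estimate_efr(vectors, it=2)
```

The two returned lists have one entry per iteration, which matches the shape of the meta-parameters described above; the transformed vectors all have unit norm.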

estimate_within_class(it_nb, U, mean, sigma_obs, batch_size=100, Vy=None, Dz=None, min_div=True, num_thread=1, save_partial=False)[source]¶

Estimate the factor loading matrix for the within class covariance

Parameters

it_nb – number of iterations to estimate the within class covariance matrix
U – initial within class covariance matrix
mean – mean of the input data
sigma_obs – co-variance matrix of the input data
batch_size – number of sessions to process per batch to optimize memory usage
Vy – statserver of supervectors
Dz – statserver of supervectors
min_div – boolean, if True run the minimum divergence step after maximization
num_thread – number of parallel processes to run
save_partial – boolean, if True, save FA model for each iteration

Returns

the within class factor loading matrix

factor_analysis(rank_f, rank_g=0, rank_h=None, re_estimate_residual=False, it_nb=(10, 10, 10), min_div=True, ubm=None, batch_size=100, num_thread=1, save_partial=False, init_matrices=(None, None, None))[source]¶

Parameters

rank_f – rank of the between class variability matrix
rank_g – rank of the within class variability matrix
rank_h – boolean, if True, estimate the residual covariance matrix. Default is False
re_estimate_residual – boolean, if True, the residual covariance matrix is re-estimated (use for PLDA)
it_nb – tuple of three integers; number of iterations to run for F, G, H estimation
min_div – boolean, if True, re-estimate the covariance matrices according to the minimum divergence criteria
batch_size – number of sessions to process in one batch for memory optimization
num_thread – number of threads to run in parallel
ubm – origin of the space; should be None for PLDA and be a Mixture object for JFA or TV
save_partial – name of the file to save intermediate models, if True, save before each split of the distributions
init_matrices – tuple of three optional matrices to initialize the model, default is (None, None, None)

Returns

three matrices (the between class factor loading matrix, the within class factor loading matrix and the diagonal MAP matrix, as a vector) and the residual covariance matrix

generator()[source]¶

Create a generator which yields the stat0 and stat1 of one session at a time.
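The per-session iteration can be sketched with a plain Python generator over two parallel arrays. The helper iterate_sessions below is hypothetical and only illustrates the pattern; it assumes one row per session in both arrays.

```python
import numpy as np

def iterate_sessions(stat0, stat1):
    """Yield the (stat0, stat1) pair of one session at a time."""
    for session_stat0, session_stat1 in zip(stat0, stat1):
        yield session_stat0, session_stat1

stat0 = np.ones((3, 1))
stat1 = np.arange(6.0).reshape(3, 2)
sessions = list(iterate_sessions(stat0, stat1))
```

Consuming the generator lazily, rather than with list as done here for brevity, avoids holding more than one session's view at a time.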

get_between_covariance_stat1()[source]¶

Compute and return the between-class covariance matrix of the first-order statistics.

Returns: the between-class co-variance matrix of the first-order statistics as a ndarray.

get_lda_matrix_stat1(rank)[source]¶

Compute and return the Linear Discriminant Analysis matrix on the first-order statistics. Columns of the LDA matrix are ordered according to the corresponding eigenvalues in descending order.

Parameters: rank – integer, rank of the LDA matrix to return
Returns: the LDA matrix of rank “rank” as a ndarray
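For readers who want to see the eigenvalue ordering in action, the following sketch computes LDA directions with NumPy from labelled vectors. It follows the textbook formulation (eigenvectors of inv(S_w) S_b) and is only an assumption about what the method does internally; lda_matrix and its arguments are hypothetical names.

```python
import numpy as np

def lda_matrix(x, labels, rank):
    """Return the `rank` leading LDA directions as columns, with their eigenvalues."""
    mu = x.mean(axis=0)
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for label in np.unique(labels):
        x_c = x[labels == label]
        mu_c = x_c.mean(axis=0)
        s_w += (x_c - mu_c).T @ (x_c - mu_c)
        s_b += len(x_c) * np.outer(mu_c - mu, mu_c - mu)
    # Eigenvectors of inv(S_w) S_b, sorted by decreasing eigenvalue.
    eigval, eigvec = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(eigval.real)[::-1][:rank]
    return eigvec.real[:, order], eigval.real[order]

rng = np.random.default_rng(0)
labels = np.repeat(np.array([0, 1, 2]), 50)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
x = centers[labels] + rng.normal(size=(150, 3))
lda, eigenvalues = lda_matrix(x, labels, rank=2)
```

With three classes the between-class scatter has rank two, so at most two directions carry discriminative information.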

get_mahalanobis_matrix_stat1()[source]¶

Compute and return Mahalanobis matrix of first-order statistics.

Returns: the mahalanobis matrix computed on the first-order statistics as a ndarray

get_mean_stat1()[source]¶

Return the mean of first order statistics

Returns: the mean array of the first order statistics.

get_model_segments(mod_id)[source]¶

Return the list of segments belonging to model mod_id

Parameters: mod_id – string, ID of the model whose segments will be returned
Returns: a list of segments belonging to the model

get_model_segments_by_index(mod_idx)[source]¶

Return the list of segments belonging to model number mod_idx

Parameters: mod_idx – index of the model whose list of segments will be returned
Returns: a list of segments belonging to the model

get_model_stat0(mod_id)[source]¶

Return zero-order statistics of a given model

Parameters: mod_id – ID of the model whose stat0 will be returned
Returns: a matrix of zero-order statistics as a ndarray

get_model_stat0_by_index(mod_idx)[source]¶

Return zero-order statistics of model number mod_idx

Parameters: mod_idx – integer, index of the unique model whose stat0 will be returned
Returns: a matrix of zero-order statistics as a ndarray

get_model_stat1(mod_id)[source]¶

Return first-order statistics of a given model

Parameters: mod_id – string, ID of the model whose stat1 will be returned
Returns: a matrix of first-order statistics as a ndarray

get_model_stat1_by_index(mod_idx)[source]¶

Return first-order statistics of model number mod_idx

Parameters: mod_idx – integer, index of the unique model whose stat1 will be returned
Returns: a matrix of first-order statistics as a ndarray

get_nap_matrix_stat1(co_rank)[source]¶

Compute and return the Nuisance Attribute Projection matrix from first-order statistics.

Parameters: co_rank – co-rank of the Nuisance Attribute Projection matrix
Returns: the NAP matrix of rank “co_rank”
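As background, Nuisance Attribute Projection is commonly described as removing the leading eigen-directions of the within-class covariance. The sketch below builds such a projector with NumPy; whether the method returns the projector or only the nuisance directions is not stated here, so nap_projection and its return value are assumptions made for illustration.

```python
import numpy as np

def nap_projection(within_cov, co_rank):
    """Projector that removes the `co_rank` leading directions of within_cov."""
    eigval, eigvec = np.linalg.eigh(within_cov)
    # eigh returns eigenvalues in ascending order; take the largest ones.
    leading = np.argsort(eigval)[::-1][:co_rank]
    nuisance = eigvec[:, leading]
    return np.eye(within_cov.shape[0]) - nuisance @ nuisance.T

within_cov = np.diag([5.0, 1.0, 0.5])
projector = nap_projection(within_cov, co_rank=1)
```

Applying the projector twice changes nothing further, and its trace equals the number of dimensions that are kept.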

get_segment_stat0(seg_id)[source]¶

Return zero-order statistics of the segment whose ID is seg_id

Parameters: seg_id – string, ID of the segment whose stat0 will be returned
Returns: a matrix of zero-order statistics as a ndarray

get_segment_stat0_by_index(seg_idx)[source]¶

Return zero-order statistics of segment number seg_idx

Parameters: seg_idx – integer, index of the unique segment whose stat0 will be returned
Returns: a matrix of zero-order statistics as a ndarray

get_segment_stat1(seg_id)[source]¶

Return first-order statistics of the segment whose ID is seg_id

Parameters: seg_id – string, ID of the segment whose stat1 will be returned
Returns: a matrix of first-order statistics as a ndarray

get_segment_stat1_by_index(seg_idx)[source]¶

Return first-order statistics of segment number seg_idx

Parameters: seg_idx – integer, index of the unique segment whose stat1 will be returned
Returns: a matrix of first-order statistics as a ndarray

get_total_covariance_stat1()[source]¶

Compute and return the total covariance matrix of the first-order statistics.

Returns: the total co-variance matrix of the first-order statistics as a ndarray.

get_wccn_choleski_stat1()[source]¶

Compute and return the lower Cholesky decomposition matrix of the Within Class Co-variance Normalization matrix on the first-order statistics.

Returns: the lower Cholesky decomposition of the WCCN matrix as a ndarray
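A small numerical check helps show what the Cholesky factor is for. In the usual WCCN formulation, assumed here, the factor B of the inverse within-class covariance W satisfies B.T @ W @ B = I, so right-multiplying the vectors by B makes their within-class covariance the identity. The helper wccn_cholesky is hypothetical.

```python
import numpy as np

def wccn_cholesky(within_cov):
    """Lower Cholesky factor B of inv(W), so that B.T @ W @ B is the identity."""
    return np.linalg.cholesky(np.linalg.inv(within_cov))

within_cov = np.array([[4.0, 1.0], [1.0, 3.0]])
B = wccn_cholesky(within_cov)
```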

get_within_covariance_stat1()[source]¶

Compute and return the within-class covariance matrix of the first-order statistics.

Returns: the within-class co-variance matrix of the first-order statistics as a ndarray.
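The total, between-class and within-class covariance matrices are linked: with class weights proportional to the number of sessions and biased (divide by N) estimates, the total covariance is the sum of the other two. The sketch below checks this with NumPy; the weighting and normalization are assumptions and may differ from the ones the methods use, and class_covariances is a hypothetical helper.

```python
import numpy as np

def class_covariances(x, labels):
    """Return (total, within, between) covariance matrices of labelled rows."""
    mu = x.mean(axis=0)
    total = (x - mu).T @ (x - mu) / len(x)
    within = np.zeros_like(total)
    between = np.zeros_like(total)
    for label in np.unique(labels):
        x_c = x[labels == label]
        mu_c = x_c.mean(axis=0)
        weight = len(x_c) / len(x)  # share of sessions in this class
        within += weight * ((x_c - mu_c).T @ (x_c - mu_c) / len(x_c))
        between += weight * np.outer(mu_c - mu, mu_c - mu)
    return total, within, between

rng = np.random.default_rng(1)
labels = np.repeat(np.array([0, 1, 2]), [10, 20, 30])
x = rng.normal(size=(60, 3)) + labels[:, None]
total, within, between = class_covariances(x, labels)
```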

ivector_extraction_eigen_decomposition(ubm, Q, D_bar_c, Tnorm, delta=array([], dtype=float64))[source]¶

Compute i-vectors using the eigen decomposition approximation. For more information, refer to [Glembeck09].

Parameters

ubm – a Mixture used as UBM for i-vector estimation
Q – Q matrix as described in [Glembeck11]
D_bar_c – matrices as described in [Glembeck11]
Tnorm – total variability matrix pre-normalized using the co-variance of the UBM
delta – mean vector if re-estimated using minimum divergence criteria

Returns

a StatServer whose zero-order statistics are 1 and whose first-order statistics are approximated i-vectors.

ivector_extraction_weight(ubm, W, Tnorm, delta=array([], dtype=float64))[source]¶

Compute i-vectors using the ubm weight approximation.

For more information, refer to:

Glembeck, O.; Burget, L.; Matejka, P.; Karafiat, M. & Kenny, P. “Simplification and optimization of I-Vector extraction,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2011, 4516-4519

Parameters

ubm – a Mixture used as UBM for i-vector estimation
W – fixed matrix pre-computed using the weights from the UBM and the total variability matrix
Tnorm – total variability matrix pre-normalized using the co-variance of the UBM
delta – mean vector if re-estimated using minimum divergence criteria

Returns

a StatServer whose zero-order statistics are 1 and whose first-order statistics are approximated i-vectors.

mean_stat_per_model()[source]¶

Average the zero- and first-order statistics per model and store them in a new StatServer.

Returns: a StatServer with the statistics averaged per model

merge()[source]¶

Merge a variable number of StatServers into one. If a segmentID is duplicated, keep only one of them and raise a WARNING.

norm_stat1()[source]¶

Divide all first-order statistics by their Euclidean norm.
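Dividing each vector by its Euclidean norm can be written in one line of NumPy. The sketch below assumes one vector per row and no all-zero rows; length_normalize is a hypothetical name.

```python
import numpy as np

def length_normalize(stat1):
    """Scale every row of stat1 to unit Euclidean norm."""
    norms = np.linalg.norm(stat1, axis=1, keepdims=True)
    return stat1 / norms

stat1 = np.array([[3.0, 4.0], [0.0, 2.0]])
normalized = length_normalize(stat1)
```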

precompute_svm_kernel_stat1()[source]¶

Pre-compute the Kernel for SVM training and testing. The output parameter is a matrix that only contains the impostor part of the Kernel. This one has to be completed by the target-dependent part during training and testing.

Returns: the impostor part of the SVM Gram matrix as a ndarray

static read(statserver_file_name, prefix='')[source]¶

Read StatServer in hdf5 format

Parameters

statserver_file_name – name of the file to read from
prefix – prefix of the dataset to read from in HDF5 file

static read_subset(statserver_filename, index, prefix='')[source]¶

Given a statserver in HDF5 format stored on disk and an IdMap, create a StatServer object filled with sessions corresponding to the IdMap.

Parameters

statserver_filename – name of the statserver in hdf5 format to read from
index – the IdMap of sessions to load or an array of index to load
prefix – prefix of the group in HDF5 file

Returns

a StatServer

rotate_stat1(R)[source]¶

Rotate first-order statistics by a right-product.

Parameters: R – ndarray, matrix to use for right product on the first order statistics.

spectral_norm_stat1(spectral_norm_mean, spectral_norm_cov, is_sqr_inv_sigma=False)[source]¶

Apply Spectral Normalization to all first order statistics.

See more details in [Bousquet11]

The number of iterations performed is equal to the length of the input lists.

Parameters

spectral_norm_mean – a list of mean vectors
spectral_norm_cov – a list of co-variance matrices as ndarrays
is_sqr_inv_sigma – boolean, True if

subtract_weighted_stat1(sts)[source]¶

Subtract the stat1 of the sts StatServer from the stat1 of the current StatServer, after multiplying it by the zero-order statistics of the current StatServer

Parameters: sts – a StatServer
Returns: a new StatServer

sum_stat_per_model()[source]¶

Sum the zero- and first-order statistics per model and store them in a new StatServer.

Returns: a StatServer with the statistics summed per model
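Summing statistics per model amounts to a group-by on the model IDs. The sketch below shows one way to do it with NumPy on parallel arrays; sum_per_model is a hypothetical helper and the row layout (one session per row) is an assumption.

```python
import numpy as np

def sum_per_model(modelset, stat0, stat1):
    """Sum the rows of stat0 and stat1 that share the same model ID."""
    models, inverse = np.unique(modelset, return_inverse=True)
    sum0 = np.zeros((len(models), stat0.shape[1]))
    sum1 = np.zeros((len(models), stat1.shape[1]))
    # Unbuffered accumulation: rows with the same model index add up.
    np.add.at(sum0, inverse, stat0)
    np.add.at(sum1, inverse, stat1)
    return models, sum0, sum1

modelset = np.array(["a", "b", "a"])
stat0 = np.ones((3, 1))
stat1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
models, sum0, sum1 = sum_per_model(modelset, stat0, stat1)
```

Dividing the sums by the number of sessions per model would give the per-model average instead.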

to_hdf5(h5f, prefix='', mode='w')[source]¶

Write the StatServer to disk in hdf5 format.

Parameters

output_file_name – name of the file to write in.
prefix –

validate(warn=False)[source]¶

Validate the structure and content of the StatServer. Check consistency between the different attributes of the StatServer:

- dimension of the modelset
- dimension of the segset
- length of the modelset and segset
- consistency of stat0 and stat1

Parameters: warn – boolean, optional, if True, display possible warning
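The checks listed above can be approximated with a few shape comparisons. The sketch below is a guess at what such a validation could look like, not the method's actual logic; check_consistency is a hypothetical name, and the last rule (stat1 width is a multiple of stat0 width) is an assumption about how the statistics are stacked.

```python
import numpy as np

def check_consistency(modelset, segset, stat0, stat1):
    """Return True if the four arrays describe the same set of sessions."""
    ok = modelset.ndim == 1 and segset.ndim == 1
    ok = ok and modelset.shape == segset.shape
    ok = ok and stat0.ndim == 2 and stat1.ndim == 2
    ok = ok and stat0.shape[0] == stat1.shape[0] == segset.shape[0]
    # Assumed layout: stat1 stacks one feature vector per distribution.
    ok = ok and stat0.shape[1] > 0 and stat1.shape[1] % stat0.shape[1] == 0
    return bool(ok)

modelset = np.array(["m1", "m1", "m2"])
segset = np.array(["s1", "s2", "s3"])
stat0 = np.ones((3, 2))
stat1 = np.zeros((3, 6))
is_valid = check_consistency(modelset, segset, stat0, stat1)
is_broken = check_consistency(modelset, segset[:2], stat0, stat1)
```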

whiten_cholesky_stat1(mu, sigma)[source]¶

Whiten first-order statistics by using Cholesky decomposition of Sigma

Parameters

mu – array, mean vector to be subtracted from the statistics
sigma – ndarray, co-variance matrix or covariance super-vector

whiten_stat1(mu, sigma, isSqrInvSigma=False)[source]¶

Whiten first-order statistics. The shape of sigma selects the case:

- if sigma.ndim == 1, case of a diagonal covariance
- if sigma.ndim == 2, case of a single Gaussian with full covariance
- if sigma.ndim == 3, case of a full covariance UBM

Parameters

mu – array, mean vector to be subtracted from the statistics
sigma – ndarray, co-variance matrix or covariance super-vector
isSqrInvSigma – boolean, True if the input Sigma matrix is the inverse of the square root of a covariance matrix
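The first two cases can be illustrated with NumPy. The sketch below is an assumption about the underlying arithmetic, not the method's code: it handles a 1-D variance vector and a 2-D full covariance matrix, leaves out the 3-D case and the isSqrInvSigma option, and uses the hypothetical name whiten.

```python
import numpy as np

def whiten(stat1, mu, sigma):
    """Center stat1 on mu and whiten it with sigma (1-D or 2-D)."""
    centered = stat1 - mu
    if sigma.ndim == 1:
        # Diagonal covariance: scale each dimension independently.
        return centered / np.sqrt(sigma)
    if sigma.ndim == 2:
        # Full covariance: multiply by the inverse square root of sigma.
        eigval, eigvec = np.linalg.eigh(sigma)
        sqr_inv_sigma = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
        return centered @ sqr_inv_sigma
    raise ValueError("only the 1-D and 2-D cases are sketched here")

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
x = rng.normal(size=(500, 3)) @ mixing
mu = x.mean(axis=0)
sigma = np.cov(x, rowvar=False)
white_full = whiten(x, mu, sigma)
white_diag = whiten(x, mu, np.diag(sigma))
```

After full whitening the sample covariance is the identity; after diagonal whitening only the per-dimension variances are one, and correlations between dimensions remain.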
