StatServer¶

class statserver.StatServer
(statserver_file_name=None, distrib_nb=0, feature_size=0, index=None, ubm=None)[source]¶ A class for statistic storage and processing
 Attr modelset
list of model IDs for each session as an array of strings
 Attr segset
the list of session IDs as an array of strings
 Attr start
index of the first frame of the segment
 Attr stop
index of the last frame of the segment
 Attr stat0
an ndarray of float64. Each row contains the zero-order statistics of the corresponding session
 Attr stat1
an ndarray of float64. Each row contains the first-order statistics of the corresponding session
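
To make the content of stat0 and stat1 concrete, the sketch below accumulates both for a single session from frame-level posteriors using plain NumPy. The helper name `accumulate_session_stats`, the posterior matrix, and the flattened layout of the first-order row (one block of `feature_size` values per distribution) are assumptions made for illustration; this is not the library's implementation.

```python
import numpy as np

def accumulate_session_stats(features, posteriors):
    """Hypothetical helper: features is (n_frames, feature_size),
    posteriors is (n_frames, distrib_nb) with rows summing to 1."""
    # Zero-order statistics: soft count of frames per distribution.
    stat0 = posteriors.sum(axis=0)
    # First-order statistics: posterior-weighted sum of the frames,
    # flattened to one row of length distrib_nb * feature_size (assumed layout).
    stat1 = (posteriors.T @ features).reshape(-1)
    return stat0, stat1

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
posteriors = rng.dirichlet(np.ones(4), size=50)
stat0, stat1 = accumulate_session_stats(features, posteriors)
```

With 4 distributions and 3-dimensional features, the session contributes a stat0 row of length 4 and a stat1 row of length 12.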

accumulate_stat
(**kwargs)¶  Parameters
args –
kwargs –
 Returns

adapt_mean_map
(ubm, r=16, norm=False)[source]¶  Maximum A Posteriori adaptation of the mean supervector of ubm,
train one model per segment.
 Parameters
ubm – a Mixture object to adapt
r – float, the relevance factor for MAP adaptation
norm – boolean, normalize by using the UBM covariance. Default is False
 Returns
a StatServer with 1 as stat0 and the MAP adapted supervectors as stat1
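
The role of the relevance factor r can be illustrated with the classical relevance-MAP update of Gaussian means. The sketch below is a minimal NumPy version under that assumption; the helper `map_adapt_means` and its array shapes are hypothetical, and the `norm` option is not covered.

```python
import numpy as np

def map_adapt_means(ubm_means, stat0, stat1, r=16.0):
    """Hypothetical helper. ubm_means: (C, D); stat0: (C,); stat1: (C, D)."""
    # Adaptation coefficient per distribution: more data moves the mean
    # further away from the UBM and closer to the observed data mean.
    alpha = stat0 / (stat0 + r)
    # Data mean per distribution, guarding against empty distributions.
    data_means = stat1 / np.maximum(stat0, 1e-10)[:, None]
    return alpha[:, None] * data_means + (1.0 - alpha)[:, None] * ubm_means

ubm_means = np.zeros((2, 3))
stat0 = np.array([16.0, 0.0])
stat1 = np.array([[16.0, 32.0, 48.0], [0.0, 0.0, 0.0]])
adapted = map_adapt_means(ubm_means, stat0, stat1, r=16.0)
```

A distribution that received as many frames as r ends up halfway between the UBM mean and the data mean, while a distribution with no data keeps the UBM mean.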

adapt_mean_map_multisession
(ubm, r=16, norm=False)[source]¶  Maximum A Posteriori adaptation of the mean supervector of ubm,
train one model per model in the modelset by summing the statistics of the multiple segments.
 Parameters
ubm – a Mixture object to adapt
r – float, the relevance factor for MAP adaptation
norm – boolean, normalize by using the UBM covariance. Default is False
 Returns
a StatServer with 1 as stat0 and the MAP adapted supervectors as stat1

align_models
(model_list)[source]¶  Align models of the current StatServer to match a list of models
provided as input parameter. The size of the StatServer might be reduced to match the input list of models.
 Parameters
model_list – ndarray of strings, list of models to match

align_segments
(segment_list)[source]¶  Align segments of the current StatServer to match a list of segments
provided as input parameter. The size of the StatServer might be reduced to match the input list of segments.
 Parameters
segment_list – ndarray of strings, list of segments to match
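
The idea of aligning to a list can be sketched on plain arrays: entries absent from the list are dropped, and the remaining rows follow the order of the list. The helper `align_to_list` is hypothetical, and whether the library reorders entries in exactly this way is an assumption.

```python
import numpy as np

def align_to_list(ids, rows, wanted):
    """Hypothetical helper: keep only the entries of `ids` found in `wanted`,
    ordered as in `wanted`, and select the matching rows of `rows`."""
    position = {name: i for i, name in enumerate(ids)}
    keep = [position[name] for name in wanted if name in position]
    return ids[keep], rows[keep]

segset = np.array(["s1", "s2", "s3", "s4"])
stat1 = np.arange(8.0).reshape(4, 2)
new_segset, new_stat1 = align_to_list(segset, stat1, np.array(["s3", "s9", "s1"]))
```

Here the unknown ID "s9" is ignored and the result holds two sessions instead of four, which is the size reduction mentioned above.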

estimate_between_class
(itNb, V, mean, sigma_obs, batch_size=100, Ux=None, Dz=None, minDiv=True, num_thread=1, re_estimate_residual=False, save_partial=False)[source]¶ Estimate the factor loading matrix for the between class covariance
 Parameters
itNb –
V – initial between class covariance matrix
mean – global mean vector
sigma_obs – covariance matrix of the input data
batch_size – size of the batches to process one by one to reduce the memory usage
Ux – statserver of supervectors
Dz – statserver of supervectors
minDiv – boolean, if True run the minimum divergence step after maximization
num_thread – number of parallel processes to run
re_estimate_residual – boolean, if True the residual covariance matrix is reestimated (for PLDA)
save_partial – boolean, if True, save FA model for each iteration
 Returns
the between class factor loading matrix
Assume that the statistics have not been whitened.
mean – global mean of the data to subtract
sigma – residual covariance matrix of the Factor Analysis model
V – between class covariance matrix
U – within class covariance matrix
D – MAP covariance matrix
batch_size – size of the batches used to reduce memory footprint
num_thread – number of parallel processes to run

estimate_map
(itNb, D, mean, Sigma, Vy=None, Ux=None, num_thread=1, save_partial=False)[source]¶  Parameters
itNb – number of iterations to estimate the MAP covariance matrix
D – Maximum a Posteriori matrix to estimate
mean – mean of the input parameters
Sigma – residual covariance matrix
Vy – statserver of supervectors
Ux – statserver of supervectors
num_thread – number of parallel processes to run
save_partial – boolean, if True save MAP matrix after each iteration
 Returns
the MAP covariance matrix, returned as a vector because it is diagonal

estimate_spectral_norm_stat1
(it=1, mode='efr')[source]¶  Compute metaparameters for Spectral Normalization as described
in [Bousquet11]
Can be used to perform Eigen Factor Radial or Spherical Nuisance Normalization. Default behavior is equivalent to Length Norm as described in [GarciaRomero11]
Statistics are transformed while the metaparameters are estimated.
 Parameters
it – integer, number of iterations to perform
mode – string, either efr for Eigen Factor Radial or sphNorm for Spherical Nuisance Normalization
 Returns
a tuple of two lists: a list of mean vectors and a list of covariance matrices as ndarrays
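
As a rough illustration of the efr mode, the sketch below assumes that each iteration centers the vectors, whitens them with their total covariance, and scales them to unit length, while recording the mean and covariance it used. The helper `efr_iteration` is hypothetical and works on a plain array of row vectors rather than on a StatServer.

```python
import numpy as np

def efr_iteration(x):
    """One assumed Eigen Factor Radial step on row vectors x of shape (n, d)."""
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Inverse square root of the covariance through its eigendecomposition.
    eigval, eigvec = np.linalg.eigh(cov)
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    # Center, whiten, then project every vector onto the unit sphere.
    y = (x - mean) @ inv_sqrt
    y /= np.linalg.norm(y, axis=1, keepdims=True)
    return y, mean, cov

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])
means, covs = [], []
for _ in range(2):  # plays the role of it=2
    vectors, mean, cov = efr_iteration(vectors)
    means.append(mean)
    covs.append(cov)
```

The two lists collected in the loop have one entry per iteration, matching the shape of the returned tuple described above.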

estimate_within_class
(it_nb, U, mean, sigma_obs, batch_size=100, Vy=None, Dz=None, min_div=True, num_thread=1, save_partial=False)[source]¶ Estimate the factor loading matrix for the within class covariance
 Parameters
it_nb – number of iterations to estimate the within class covariance matrix
U – initial within class covariance matrix
mean – mean of the input data
sigma_obs – covariance matrix of the input data
batch_size – number of sessions to process per batch to optimize memory usage
Vy – statserver of supervectors
Dz – statserver of supervectors
min_div – boolean, if True run the minimum divergence step after maximization
num_thread – number of parallel processes to run
save_partial – boolean, if True, save FA model for each iteration
 Returns
the within class factor loading matrix

factor_analysis
(rank_f, rank_g=0, rank_h=None, re_estimate_residual=False, it_nb=(10, 10, 10), min_div=True, ubm=None, batch_size=100, num_thread=1, save_partial=False, init_matrices=(None, None, None))[source]¶  Parameters
rank_f – rank of the between class variability matrix
rank_g – rank of the within class variability matrix
rank_h – boolean, if True, estimate the residual covariance matrix. Default is False
re_estimate_residual – boolean, if True, the residual covariance matrix is reestimated (use for PLDA)
it_nb – tuple of three integers; number of iterations to run for F, G, H estimation
min_div – boolean, if True, reestimate the covariance matrices according to the minimum divergence criterion
batch_size – number of sessions to process in one batch for memory optimization
num_thread – number of threads to run in parallel
ubm – origin of the space; should be None for PLDA and be a Mixture object for JFA or TV
save_partial – name of the file to save intermediate models, if True, save before each split of the distributions
init_matrices – tuple of three optional matrices to initialize the model, default is (None, None, None)
 Returns
the between class factor loading matrix, the within class factor loading matrix, the diagonal MAP matrix (as a vector), and the residual covariance matrix

get_between_covariance_stat1
()[source]¶  Compute and return the between-class covariance matrix of the
first-order statistics.
 Returns
the between-class covariance matrix of the first-order statistics as an ndarray.

get_lda_matrix_stat1
(rank)[source]¶  Compute and return the Linear Discriminant Analysis matrix
on the first-order statistics. Columns of the LDA matrix are ordered according to the corresponding eigenvalues in descending order.
 Parameters
rank – integer, rank of the LDA matrix to return
 Returns
the LDA matrix of rank “rank” as an ndarray
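
The column ordering can be illustrated with a textbook LDA computed from within-class and between-class scatter matrices. The sketch below is a generic NumPy version, not the library's code; the helper `lda_matrix` and the use of integer class labels in place of model IDs are assumptions.

```python
import numpy as np

def lda_matrix(x, labels, rank):
    """Hypothetical helper: x is (n, d), labels is (n,), returns a (d, rank) matrix."""
    mean = x.mean(axis=0)
    d = x.shape[1]
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        s_w += (xc - mc).T @ (xc - mc)
        s_b += len(xc) * np.outer(mc - mean, mc - mean)
    # Whiten the within-class scatter so that the generalized eigenproblem
    # S_b w = lambda S_w w becomes an ordinary symmetric one.
    chol_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    eigval, eigvec = np.linalg.eigh(chol_inv @ s_b @ chol_inv.T)
    order = np.argsort(eigval)[::-1][:rank]  # descending eigenvalues
    return chol_inv.T @ eigvec[:, order], eigval[order]

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 30)
x = rng.normal(size=(120, 6)) + 3.0 * rng.normal(size=(4, 6))[labels]
lda, eigval = lda_matrix(x, labels, rank=2)
```

Projecting the data with `x @ lda` keeps the two most discriminative directions, the first column being the one with the largest eigenvalue.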

get_mahalanobis_matrix_stat1
()[source]¶ Compute and return the Mahalanobis matrix of the first-order statistics.
 Returns
the Mahalanobis matrix computed on the first-order statistics as an ndarray

get_mean_stat1
()[source]¶ Return the mean of the first-order statistics
 Returns
the mean array of the first-order statistics.

get_model_segments
(mod_id)[source]¶ Return the list of segments belonging to model mod_id
 Parameters
mod_id – string, ID of the model whose segments will be returned
 Returns
a list of segments belonging to the model
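
Because modelset and segset are parallel arrays with one entry per session, this kind of lookup can be pictured as a boolean mask. The arrays below are toy data, and the snippet is only a sketch of the idea, not the method's actual code.

```python
import numpy as np

modelset = np.array(["spk1", "spk2", "spk1", "spk3"])
segset = np.array(["seg_a", "seg_b", "seg_c", "seg_d"])

# Boolean mask over sessions, then selection of the matching segment IDs.
segments_of_spk1 = segset[modelset == "spk1"]
```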

get_model_segments_by_index
(mod_idx)[source]¶ Return the list of segments belonging to model number mod_idx
 Parameters
mod_idx – index of the model whose list of segments will be returned
 Returns
a list of segments belonging to the model

get_model_stat0
(mod_id)[source]¶ Return the zero-order statistics of a given model
 Parameters
mod_id – ID of the model whose stat0 will be returned
 Returns
a matrix of zero-order statistics as an ndarray

get_model_stat0_by_index
(mod_idx)[source]¶ Return the zero-order statistics of model number mod_idx
 Parameters
mod_idx – integer, index of the unique model whose stat0 will be returned
 Returns
a matrix of zero-order statistics as an ndarray

get_model_stat1
(mod_id)[source]¶ Return the first-order statistics of a given model
 Parameters
mod_id – string, ID of the model whose stat1 will be returned
 Returns
a matrix of first-order statistics as an ndarray

get_model_stat1_by_index
(mod_idx)[source]¶ Return the first-order statistics of model number mod_idx
 Parameters
mod_idx – integer, index of the unique model whose stat1 will be returned
 Returns
a matrix of first-order statistics as an ndarray

get_nap_matrix_stat1
(co_rank)[source]¶  Compute and return the Nuisance Attribute Projection matrix
from the first-order statistics.
 Parameters
co_rank – co-rank of the Nuisance Attribute Projection matrix
 Returns
the NAP matrix of rank “co_rank”
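
A common way to build such a projection is to remove the leading eigenvectors of a nuisance (for example within-class) covariance matrix. The sketch below follows that textbook construction with NumPy; the helper `nap_projection` and the choice of covariance matrix are assumptions, and the library may compute or return the matrix differently.

```python
import numpy as np

def nap_projection(within_cov, co_rank):
    """Hypothetical helper: project out the co_rank strongest nuisance directions."""
    eigval, eigvec = np.linalg.eigh(within_cov)
    # eigh sorts eigenvalues in ascending order, so the last columns dominate.
    nuisance = eigvec[:, -co_rank:]
    return np.eye(within_cov.shape[0]) - nuisance @ nuisance.T

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
within_cov = a @ a.T  # toy symmetric covariance matrix
projection = nap_projection(within_cov, co_rank=2)
```

The resulting matrix is an orthogonal projector: applying it twice changes nothing, and it leaves a subspace of dimension 6 - 2 = 4.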

get_segment_stat0
(seg_id)[source]¶ Return the zero-order statistics of the segment whose ID is seg_id
 Parameters
seg_id – string, ID of the segment whose stat0 will be returned
 Returns
a matrix of zero-order statistics as an ndarray

get_segment_stat0_by_index
(seg_idx)[source]¶ Return the zero-order statistics of segment number seg_idx
 Parameters
seg_idx – integer, index of the unique segment whose stat0 will be returned
 Returns
a matrix of zero-order statistics as an ndarray

get_segment_stat1
(seg_id)[source]¶ Return the first-order statistics of the segment whose ID is seg_id
 Parameters
seg_id – string, ID of the segment whose stat1 will be returned
 Returns
a matrix of first-order statistics as an ndarray

get_segment_stat1_by_index
(seg_idx)[source]¶ Return the first-order statistics of segment number seg_idx
 Parameters
seg_idx – integer, index of the unique segment whose stat1 will be returned
 Returns
a matrix of first-order statistics as an ndarray

get_total_covariance_stat1
()[source]¶  Compute and return the total covariance matrix of the first-order
statistics.
 Returns
the total covariance matrix of the first-order statistics as an ndarray.

get_wccn_choleski_stat1
()[source]¶  Compute and return the lower Cholesky decomposition matrix of the
Within Class Covariance Normalization matrix on the first-order statistics.
 Returns
the lower Cholesky decomposition of the WCCN matrix as an ndarray
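
In the usual formulation of WCCN, the normalization matrix is the lower Cholesky factor of the inverse within-class covariance. The sketch below checks that property on a toy matrix with NumPy; it assumes this standard definition and does not reproduce the library's code.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
within_cov = a @ a.T + 5.0 * np.eye(5)  # toy symmetric positive-definite matrix

# Lower-triangular factor B such that B @ B.T equals the inverse covariance.
wccn = np.linalg.cholesky(np.linalg.inv(within_cov))

# Row vectors mapped with x @ wccn then have an identity within-class covariance.
normalized_cov = wccn.T @ within_cov @ wccn
```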

get_within_covariance_stat1
()[source]¶  Compute and return the within-class covariance matrix of the
first-order statistics.
 Returns
the within-class covariance matrix of the first-order statistics as an ndarray.
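
The within-class, between-class, and total covariance matrices are linked by the classical decomposition total = within + between when all three are normalized by the number of sessions. The sketch below verifies this on toy data; the helper `class_covariances` and its normalization are assumptions and may differ from the library's conventions.

```python
import numpy as np

def class_covariances(x, labels):
    """Hypothetical helper returning (within, between, total) covariances of rows of x."""
    n = x.shape[0]
    mean = x.mean(axis=0)
    within = np.zeros((x.shape[1], x.shape[1]))
    between = np.zeros_like(within)
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        # Spread of the sessions around their own class mean.
        within += (xc - mc).T @ (xc - mc) / n
        # Spread of the class means around the global mean, weighted by class size.
        between += len(xc) * np.outer(mc - mean, mc - mean) / n
    total = (x - mean).T @ (x - mean) / n
    return within, between, total

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
x = rng.normal(size=(60, 4)) + rng.normal(size=(3, 4))[labels]
within, between, total = class_covariances(x, labels)
```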

ivector_extraction_eigen_decomposition
(ubm, Q, D_bar_c, Tnorm, delta=array([], dtype=float64))[source]¶  Compute i-vectors using the eigen decomposition approximation.
For more information, refer to [Glembeck09]
 Parameters
ubm – a Mixture used as UBM for i-vector estimation
Q – Q matrix as described in [Glembeck11]
D_bar_c – matrices as described in [Glembeck11]
Tnorm – total variability matrix pre-normalized using the covariance of the UBM
delta – mean vector if re-estimated using the minimum divergence criterion
 Returns
a StatServer whose zero-order statistics are 1 and whose first-order statistics are the approximated i-vectors.

ivector_extraction_weight
(ubm, W, Tnorm, delta=array([], dtype=float64))[source]¶  Compute i-vectors using the UBM weight approximation.
For more information, refer to:
Glembek, O.; Burget, L.; Matejka, P.; Karafiat, M. & Kenny, P. “Simplification and optimization of I-Vector extraction,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2011, 4516-4519
 Parameters
ubm – a Mixture used as UBM for i-vector estimation
W – fixed matrix precomputed using the weights from the UBM and the total variability matrix
Tnorm – total variability matrix pre-normalized using the covariance of the UBM
delta – mean vector if re-estimated using the minimum divergence criterion
 Returns
a StatServer whose zero-order statistics are 1 and whose first-order statistics are the approximated i-vectors.

mean_stat_per_model
()[source]¶ Average the zero- and first-order statistics per model and store them in a new StatServer.
 Returns
a StatServer with the statistics averaged per model

merge
()[source]¶ Merge a variable number of StatServers into one. If a segment ID is duplicated, keep only one of them and raise a warning

precompute_svm_kernel_stat1
()[source]¶  Precompute the kernel for SVM training and testing.
The output is a matrix that contains only the impostor part of the kernel; it has to be completed by the target-dependent part during training and testing.
 Returns
the impostor part of the SVM Gram matrix as an ndarray

static read
(statserver_file_name, prefix='')[source]¶ Read StatServer in hdf5 format
 Parameters
statserver_file_name – name of the file to read from
prefix – prefix of the dataset to read from in the HDF5 file

static read_subset
(statserver_filename, index, prefix='')[source]¶ Given a statserver in HDF5 format stored on disk and an IdMap, create a StatServer object filled with sessions corresponding to the IdMap.
 Parameters
statserver_filename – name of the StatServer file in HDF5 format to read from
index – the IdMap of sessions to load, or an array of indices to load
prefix – prefix of the group in HDF5 file
 Returns
a StatServer

rotate_stat1
(R)[source]¶ Rotate the first-order statistics by a right-product.
 Parameters
R – ndarray, matrix to use for the right product on the first-order statistics.

spectral_norm_stat1
(spectral_norm_mean, spectral_norm_cov, is_sqr_inv_sigma=False)[source]¶  Apply Spectral Normalization to all first-order statistics.
See more details in [Bousquet11]
The number of iterations performed is equal to the length of the input lists.
 Parameters
spectral_norm_mean – a list of mean vectors
spectral_norm_cov – a list of covariance matrices as ndarrays
is_sqr_inv_sigma – boolean, True if

subtract_weighted_stat1
(sts)[source]¶ Subtract the stat1 of the sts StatServer from the stat1 of the current StatServer, after multiplying the former by the zero-order statistics of the current StatServer
 Parameters
sts – a StatServer
 Returns
a new StatServer
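
The weighted subtraction can be pictured on a single session. The sketch below assumes that each stat1 row stores one block of `feature_size` values per distribution, so every zero-order count is repeated over its block before the multiplication; this layout and the toy values are assumptions, not taken from the library.

```python
import numpy as np

distrib_nb, feature_size = 2, 3
stat0 = np.array([[2.0, 4.0]])                          # one session, two distributions
stat1 = np.arange(6.0).reshape(1, 6)                    # same session, 2 * 3 values
other_stat1 = np.ones((1, distrib_nb * feature_size))   # stat1 of the sts argument

# Expand each zero-order count over the feature dimensions of its distribution.
weights = np.repeat(stat0, feature_size, axis=1)
result = stat1 - weights * other_stat1
```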

sum_stat_per_model
()[source]¶ Sum the zero- and first-order statistics per model and store them in a new StatServer.
 Returns
a StatServer with the statistics summed per model
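
Per-model pooling amounts to grouping the session rows by model ID. The sketch below shows both the sum and the mean on a toy stat0 array with NumPy; it illustrates the grouping only and is not the library's implementation.

```python
import numpy as np

modelset = np.array(["spk1", "spk2", "spk1"])
stat0 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# `inverse` maps every session to the row of its model in `models`.
models, inverse = np.unique(modelset, return_inverse=True)
summed = np.zeros((len(models), stat0.shape[1]))
np.add.at(summed, inverse, stat0)                       # sum of the rows per model
counts = np.bincount(inverse, minlength=len(models))
averaged = summed / counts[:, None]                     # mean of the rows per model
```

The same grouping applies unchanged to the first-order statistics.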

to_hdf5
(h5f, prefix='', mode='w')[source]¶ Write the StatServer to disk in hdf5 format.
 Parameters
output_file_name – name of the file to write in.
prefix –

validate
(warn=False)[source]¶ Validate the structure and content of the StatServer. Check consistency between the different attributes of the StatServer: dimension of the modelset, dimension of the segset, length of the modelset and segset, and consistency of stat0 and stat1.
 Parameters
warn – boolean, optional; if True, display possible warnings

whiten_cholesky_stat1
(mu, sigma)[source]¶ Whiten the first-order statistics using the Cholesky decomposition of sigma
 Parameters
mu – array, mean vector to be subtracted from the statistics
sigma – ndarray, covariance matrix or covariance supervector

whiten_stat1
(mu, sigma, isSqrInvSigma=False)[source]¶ Whiten the first-order statistics. If sigma.ndim == 1, case of a diagonal covariance. If sigma.ndim == 2, case of a single Gaussian with full covariance. If sigma.ndim == 3, case of a full covariance UBM.
 Parameters
mu – array, mean vector to be subtracted from the statistics
sigma – ndarray, covariance matrix or covariance supervector
isSqrInvSigma – boolean, True if the input Sigma matrix is the inverse of the square root of a covariance matrix
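
The first two cases of sigma can be sketched with NumPy as follows. The helper `whiten` is hypothetical, works on a plain array of row vectors, and uses a symmetric inverse square root for the full-covariance case; the 3-dimensional UBM case and the isSqrInvSigma option are left out.

```python
import numpy as np

def whiten(x, mu, sigma):
    """Hypothetical helper covering the 1-D (diagonal) and 2-D (full) cases."""
    centered = x - mu
    if sigma.ndim == 1:
        # Diagonal covariance: scale each dimension independently.
        return centered / np.sqrt(sigma)
    # Full covariance: multiply by the inverse square root of sigma.
    eigval, eigvec = np.linalg.eigh(sigma)
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    return centered @ inv_sqrt

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 3.0]])
x = rng.normal(size=(500, 3)) @ mixing
mu = x.mean(axis=0)
white_full = whiten(x, mu, np.cov(x, rowvar=False))
white_diag = whiten(x, mu, x.var(axis=0))
```

With the full covariance the whitened vectors have an identity covariance matrix, whereas the diagonal case only guarantees unit variance in each dimension.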