StatServer¶
-
class
statserver.
StatServer
(statserver_file_name=None, distrib_nb=0, feature_size=0, index=None, ubm=None)[source]¶ A class for statistic storage and processing
- Attr modelset
list of model IDs for each session as an array of strings
- Attr segset
the list of session IDs as an array of strings
- Attr start
index of the first frame of the segment
- Attr stop
index of the last frame of the segment
- Attr stat0
a ndarray of float64. Each row contains the zero-order statistics of the corresponding session
- Attr stat1
a ndarray of float64. Each row contains the first-order statistics of the corresponding session
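As a rough illustration of how these attributes fit together, the sketch below builds plain NumPy arrays with the shapes the descriptions above suggest. The toy dimensions, the IDs, and the column layout of stat0 and stat1 (one column per distribution, and distrib_nb * feature_size columns respectively) are assumptions made for the example, not something this page states.

```python
import numpy as np

# Hypothetical toy dimensions, chosen only for illustration.
distrib_nb, feature_size = 2, 3

# One entry per session: the model it belongs to and its session ID.
modelset = np.array(["spk1", "spk1", "spk2"])
segset = np.array(["seg_a", "seg_b", "seg_c"])

# First and last frame index of each segment.
start = np.array([0, 0, 0])
stop = np.array([300, 250, 410])

# Assumed layout: one row per session; one column per distribution for stat0,
# and distrib_nb * feature_size columns for stat1.
stat0 = np.ones((segset.shape[0], distrib_nb), dtype=np.float64)
stat1 = np.zeros((segset.shape[0], distrib_nb * feature_size), dtype=np.float64)
```

All arrays share the same first dimension, so row i of stat0 and stat1 always describes the session modelset[i] / segset[i].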
-
accumulate_stat
(**kwargs)¶ - Parameters
args –
kwargs –
- Returns
-
adapt_mean_map
(ubm, r=16, norm=False)[source]¶ - Maximum A Posteriori adaptation of the mean super-vector of ubm,
train one model per segment.
- Parameters
ubm – a Mixture object to adapt
r – float, the relevance factor for MAP adaptation
norm – boolean, normalize by using the UBM co-variance. Default is False
- Returns
a StatServer with 1 as stat0 and the MAP adapted super-vectors as stat1
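The role of the relevance factor r can be sketched with the classical relevance-MAP update of Gaussian means. This is a minimal NumPy sketch under the assumption that the textbook formula applies; the function name map_adapt_means and the per-Gaussian array shapes are illustrative and are not taken from the library.

```python
import numpy as np

def map_adapt_means(ubm_means, stat0, stat1, r=16.0):
    """Relevance-MAP adaptation of Gaussian means for one segment.

    ubm_means: (C, D) UBM means; stat0: (C,) occupation counts;
    stat1: (C, D) first-order sums. All names are illustrative.
    """
    alpha = stat0 / (stat0 + r)                       # per-Gaussian adaptation weight
    safe_n = np.maximum(stat0, np.finfo(float).tiny)  # avoid division by zero
    data_means = stat1 / safe_n[:, None]              # mean of the data per Gaussian
    return alpha[:, None] * data_means + (1.0 - alpha[:, None]) * ubm_means

ubm_means = np.zeros((2, 3))
stat0 = np.array([16.0, 0.0])
stat1 = np.array([[32.0, 32.0, 32.0], [0.0, 0.0, 0.0]])

adapted = map_adapt_means(ubm_means, stat0, stat1)
supervector = adapted.reshape(-1)  # concatenated means, one block per Gaussian
```

A Gaussian that saw no data keeps its UBM mean, while a Gaussian with exactly r frames moves halfway towards the data mean.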
-
adapt_mean_map_multisession
(ubm, r=16, norm=False)[source]¶ - Maximum A Posteriori adaptation of the mean super-vector of ubm,
train one model per model in the modelset by summing the statistics of the multiple segments.
- Parameters
ubm – a Mixture object to adapt
r – float, the relevance factor for MAP adaptation
norm – boolean, normalize by using the UBM co-variance. Default is False
- Returns
a StatServer with 1 as stat0 and the MAP adapted super-vectors as stat1
-
align_models
(model_list)[source]¶ - Align models of the current StatServer to match a list of models
provided as input parameter. The size of the StatServer might be reduced to match the input list of models.
- Parameters
model_list – ndarray of strings, list of models to match
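The alignment described above can be pictured as a filter-and-reorder on the session arrays. The sketch below works on plain NumPy arrays; whether the real method reorders sessions to follow model_list, as done here, is an assumption.

```python
import numpy as np

modelset = np.array(["spk1", "spk2", "spk1", "spk3"])
segset = np.array(["s1", "s2", "s3", "s4"])
stat1 = np.arange(8, dtype=np.float64).reshape(4, 2)

model_list = np.array(["spk3", "spk1"])

# Keep the sessions whose model appears in model_list, in model_list order.
order = np.array(
    [i for m in model_list for i in np.flatnonzero(modelset == m)], dtype=int
)
aligned_modelset = modelset[order]
aligned_segset = segset[order]
aligned_stat1 = stat1[order]
```

The session belonging to spk2 is dropped, which is why the result can be smaller than the original.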
-
align_segments
(segment_list)[source]¶ - Align segments of the current StatServer to match a list of segments
provided as input parameter. The size of the StatServer might be reduced to match the input list of segments.
- Parameters
segment_list – ndarray of strings, list of segments to match
-
estimate_between_class
(itNb, V, mean, sigma_obs, batch_size=100, Ux=None, Dz=None, minDiv=True, num_thread=1, re_estimate_residual=False, save_partial=False)[source]¶ Estimate the factor loading matrix for the between class covariance
- Parameters
itNb –
V – initial between class covariance matrix
mean – global mean vector
sigma_obs – covariance matrix of the input data
batch_size – size of the batches to process one by one to reduce the memory usage
Ux – statserver of supervectors
Dz – statserver of supervectors
minDiv – boolean, if True run the minimum divergence step after maximization
num_thread – number of parallel processes to run
re_estimate_residual – boolean, if True the residual covariance matrix is re-estimated (for PLDA)
save_partial – boolean, if True, save FA model for each iteration
- Returns
the between class factor loading matrix
Assume that the statistics have not been whitened.
- Parameters
mean – global mean of the data to subtract
sigma – residual covariance matrix of the Factor Analysis model
V – between class covariance matrix
U – within class covariance matrix
D – MAP covariance matrix
batch_size – size of the batches used to reduce memory footprint
num_thread – number of parallel processes to run
-
estimate_map
(itNb, D, mean, Sigma, Vy=None, Ux=None, num_thread=1, save_partial=False)[source]¶ - Parameters
itNb – number of iterations to estimate the MAP covariance matrix
D – Maximum a Posteriori matrix to estimate
mean – mean of the input parameters
Sigma – residual covariance matrix
Vy – statserver of supervectors
Ux – statserver of supervectors
num_thread – number of parallel processes to run
save_partial – boolean, if True save MAP matrix after each iteration
- Returns
the MAP covariance matrix, returned as a vector since it is diagonal
-
estimate_spectral_norm_stat1
(it=1, mode='efr')[source]¶ - Compute meta-parameters for Spectral Normalization as described
in [Bousquet11]
Can be used to perform Eigen Factor Radial or Spherical Nuisance Normalization. Default behavior is equivalent to Length Norm as described in [Garcia-Romero11]
Statistics are transformed while the meta-parameters are estimated.
- Parameters
it – integer, number of iterations to perform
mode – string, either efr for Eigen Factor Radial or sphNorm for Spherical Nuisance Normalization
- Returns
a tuple of two lists: a list of mean vectors and a list of co-variance matrices as ndarrays
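The iteration behind the efr mode can be sketched as follows: at each pass, the current mean and total covariance are stored, the data are centred and whitened, and every vector is scaled to unit length. This is a simplified NumPy sketch of that idea; the function estimate_efr is hypothetical and the library's exact whitening transform may differ.

```python
import numpy as np

def estimate_efr(x, it=1):
    """Estimate EFR meta-parameters; returns (means, covs, normalized data)."""
    x = np.array(x, dtype=np.float64)
    means, covs = [], []
    for _ in range(it):
        mu = x.mean(axis=0)
        sigma = np.cov(x, rowvar=False)
        means.append(mu)
        covs.append(sigma)
        # Whiten with the inverse square root of the total covariance.
        eigval, eigvec = np.linalg.eigh(sigma)
        inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
        x = (x - mu) @ inv_sqrt
        # Length normalization: project every vector onto the unit sphere.
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return means, covs, x

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4)) * np.array([1.0, 2.0, 0.5, 3.0])
means, covs, normalized = estimate_efr(data, it=2)
```

The two returned lists have one entry per iteration, which matches the note above that the statistics are transformed while the meta-parameters are estimated.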
-
estimate_within_class
(it_nb, U, mean, sigma_obs, batch_size=100, Vy=None, Dz=None, min_div=True, num_thread=1, save_partial=False)[source]¶ Estimate the factor loading matrix for the within class covariance
- Parameters
it_nb – number of iterations to estimate the within class covariance matrix
U – initial within class covariance matrix
mean – mean of the input data
sigma_obs – co-variance matrix of the input data
batch_size – number of sessions to process per batch to optimize memory usage
Vy – statserver of supervectors
Dz – statserver of supervectors
min_div – boolean, if True run the minimum divergence step after maximization
num_thread – number of parallel processes to run
save_partial – boolean, if True, save FA model for each iteration
- Returns
the within class factor loading matrix
-
factor_analysis
(rank_f, rank_g=0, rank_h=None, re_estimate_residual=False, it_nb=(10, 10, 10), min_div=True, ubm=None, batch_size=100, num_thread=1, save_partial=False, init_matrices=(None, None, None))[source]¶ - Parameters
rank_f – rank of the between class variability matrix
rank_g – rank of the within class variability matrix
rank_h – boolean, if True, estimate the residual covariance matrix. Default is False
re_estimate_residual – boolean, if True, the residual covariance matrix is re-estimated (use for PLDA)
it_nb – tuple of three integers; number of iterations to run for F, G, H estimation
min_div – boolean, if True, re-estimate the covariance matrices according to the minimum divergence criterion
batch_size – number of sessions to process in one batch for memory optimization
num_thread – number of threads to run in parallel
ubm – origin of the space; should be None for PLDA and be a Mixture object for JFA or TV
save_partial – name of the file to save intermediate models, if True, save before each split of the distributions
init_matrices – tuple of three optional matrices to initialize the model, default is (None, None, None)
- Returns
three matrices (the between class factor loading matrix, the within class factor loading matrix, and the diagonal MAP matrix as a vector) and the residual covariance matrix
-
get_between_covariance_stat1
()[source]¶ - Compute and return the between-class covariance matrix of the
first-order statistics.
- Returns
the between-class co-variance matrix of the first-order statistics as a ndarray.
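To make the relation between the between-class, within-class and total covariances concrete, here is a small NumPy sketch that computes all three from labelled vectors. The helper class_covariances is hypothetical, and dividing every matrix by the total number of vectors is an assumption; the library may weight the classes differently.

```python
import numpy as np

def class_covariances(x, labels):
    """Return (within, between, total) covariances, each normalized by N."""
    x = np.asarray(x, dtype=np.float64)
    labels = np.asarray(labels)
    n, dim = x.shape
    mu = x.mean(axis=0)
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for label in np.unique(labels):
        xc = x[labels == label]
        mu_c = xc.mean(axis=0)
        centered = xc - mu_c
        within += centered.T @ centered                  # scatter around the class mean
        between += xc.shape[0] * np.outer(mu_c - mu, mu_c - mu)
    total = (x - mu).T @ (x - mu)
    return within / n, between / n, total / n

rng = np.random.default_rng(0)
labels = np.repeat(np.array(["a", "b", "c"]), 20)
x = rng.normal(size=(60, 3)) + np.repeat(np.arange(3.0), 20)[:, None]
within, between, total = class_covariances(x, labels)
```

With this normalization the total covariance is the sum of the within-class and between-class parts.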
-
get_lda_matrix_stat1
(rank)[source]¶ - Compute and return the Linear Discriminant Analysis matrix
on the first-order statistics. Columns of the LDA matrix are ordered according to the corresponding eigenvalues in descending order.
- Parameters
rank – integer, rank of the LDA matrix to return
- Returns
the LDA matrix of rank “rank” as a ndarray
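One common way to obtain such a matrix is to solve the generalized eigenproblem between the between-class and within-class scatter matrices and keep the leading eigenvectors. The sketch below does this through a Cholesky change of variables; lda_matrix is a hypothetical helper and is not claimed to reproduce the library's implementation.

```python
import numpy as np

def lda_matrix(x, labels, rank):
    """Return (LDA matrix, eigenvalues), columns sorted by descending eigenvalue."""
    x = np.asarray(x, dtype=np.float64)
    labels = np.asarray(labels)
    mu = x.mean(axis=0)
    dim = x.shape[1]
    sw = np.zeros((dim, dim))
    sb = np.zeros((dim, dim))
    for label in np.unique(labels):
        xc = x[labels == label]
        mu_c = xc.mean(axis=0)
        sw += (xc - mu_c).T @ (xc - mu_c)
        sb += xc.shape[0] * np.outer(mu_c - mu, mu_c - mu)
    # Solve sb v = lambda sw v through a Cholesky change of variables.
    chol = np.linalg.cholesky(sw)
    inv_chol = np.linalg.inv(chol)
    eigval, eigvec = np.linalg.eigh(inv_chol @ sb @ inv_chol.T)
    order = np.argsort(eigval)[::-1][:rank]
    return inv_chol.T @ eigvec[:, order], eigval[order]

rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 30)
x = rng.normal(size=(90, 4)) + 2.0 * labels[:, None]
lda, eigenvalues = lda_matrix(x, labels, rank=2)
projected = x @ lda  # vectors projected onto the two leading LDA directions
```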
-
get_mahalanobis_matrix_stat1
()[source]¶ Compute and return Mahalanobis matrix of first-order statistics.
- Returns
the mahalanobis matrix computed on the first-order statistics as a ndarray
-
get_mean_stat1
()[source]¶ Return the mean of first order statistics
- Returns
the mean array of the first order statistics.
-
get_model_segments
(mod_id)[source]¶ Return the list of segments belonging to model mod_id
- Parameters
mod_id – string, ID of the model whose segments will be returned
- Returns
a list of segments belonging to the model
-
get_model_segments_by_index
(mod_idx)[source]¶ Return the list of segments belonging to model number mod_idx
- Parameters
mod_idx – index of the model whose list of segments will be returned
- Returns
a list of segments belonging to the model
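Both lookups amount to a boolean mask over the modelset. The sketch below uses plain NumPy arrays and two hypothetical helpers; reading mod_idx as a position in the sorted list of unique model IDs is an assumption.

```python
import numpy as np

modelset = np.array(["spk1", "spk2", "spk1"])
segset = np.array(["s1", "s2", "s3"])

def model_segments(modelset, segset, mod_id):
    """Return the segment IDs whose model ID equals mod_id."""
    return segset[modelset == mod_id]

def model_segments_by_index(modelset, segset, mod_idx):
    """Same lookup, addressing the model by its rank among the unique model IDs."""
    return model_segments(modelset, segset, np.unique(modelset)[mod_idx])

segments_spk1 = model_segments(modelset, segset, "spk1")
segments_second = model_segments_by_index(modelset, segset, 1)
```

The same mask, applied to stat0 or stat1 instead of segset, gives the per-model statistics returned by the get_model_stat0 and get_model_stat1 families.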
-
get_model_stat0
(mod_id)[source]¶ Return zero-order statistics of a given model
- Parameters
mod_id – ID of the model whose stat0 will be returned
- Returns
a matrix of zero-order statistics as a ndarray
-
get_model_stat0_by_index
(mod_idx)[source]¶ Return zero-order statistics of model number mod_idx
- Parameters
mod_idx – integer, index of the unique model whose stat0 will be returned
- Returns
a matrix of zero-order statistics as a ndarray
-
get_model_stat1
(mod_id)[source]¶ Return first-order statistics of a given model
- Parameters
mod_id – string, ID of the model whose stat1 will be returned
- Returns
a matrix of first-order statistics as a ndarray
-
get_model_stat1_by_index
(mod_idx)[source]¶ Return first-order statistics of model number mod_idx
- Parameters
mod_idx – integer, index of the unique model whose stat1 will be returned
- Returns
a matrix of first-order statistics as a ndarray
-
get_nap_matrix_stat1
(co_rank)[source]¶ - Compute and return the Nuisance Attribute Projection matrix
from first-order statistics.
- Parameters
co_rank – co-rank of the Nuisance Attribute Projection matrix
- Returns
the NAP matrix of rank “co_rank”
-
get_segment_stat0
(seg_id)[source]¶ Return zero-order statistics of the segment whose ID is seg_id
- Parameters
seg_id – string, ID of the segment whose stat0 will be returned
- Returns
a matrix of zero-order statistics as a ndarray
-
get_segment_stat0_by_index
(seg_idx)[source]¶ Return zero-order statistics of segment number seg_idx
- Parameters
seg_idx – integer, index of the unique segment whose stat0 will be returned
- Returns
a matrix of zero-order statistics as a ndarray
-
get_segment_stat1
(seg_id)[source]¶ Return first-order statistics of the segment whose ID is seg_id
- Parameters
seg_id – string, ID of the segment whose stat1 will be returned
- Returns
a matrix of first-order statistics as a ndarray
-
get_segment_stat1_by_index
(seg_idx)[source]¶ Return first-order statistics of segment number seg_idx
- Parameters
seg_idx – integer, index of the unique segment whose stat1 will be returned
- Returns
a matrix of first-order statistics as a ndarray
-
get_total_covariance_stat1
()[source]¶ - Compute and return the total covariance matrix of the first-order
statistics.
- Returns
the total co-variance matrix of the first-order statistics as a ndarray.
-
get_wccn_choleski_stat1
()[source]¶ - Compute and return the lower Cholesky decomposition matrix of the
Within Class Co-variance Normalization matrix on the first-order statistics.
- Returns
the lower Cholesky decomposition of the WCCN matrix as a ndarray
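A minimal sketch of the idea, assuming the WCCN matrix is the inverse of the within-class covariance: its lower Cholesky factor is a projection under which the within-class covariance becomes the identity. The helper wccn_cholesky and the normalization by the number of vectors are illustrative choices.

```python
import numpy as np

def wccn_cholesky(x, labels):
    """Return (lower Cholesky factor of inv(W), W) for within-class covariance W."""
    x = np.asarray(x, dtype=np.float64)
    labels = np.asarray(labels)
    dim = x.shape[1]
    within = np.zeros((dim, dim))
    for label in np.unique(labels):
        xc = x[labels == label]
        centered = xc - xc.mean(axis=0)
        within += centered.T @ centered
    within /= x.shape[0]
    # Lower-triangular factor of the inverse within-class covariance.
    return np.linalg.cholesky(np.linalg.inv(within)), within

rng = np.random.default_rng(2)
labels = np.repeat(np.arange(3), 30)
x = rng.normal(size=(90, 3)) * np.array([1.0, 2.0, 0.5])
chol, within = wccn_cholesky(x, labels)

# After projecting with chol, the within-class covariance is the identity.
projected_cov = chol.T @ within @ chol
```

Such a factor is typically applied as a right product on the first-order statistics, for instance through rotate_stat1.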
-
get_within_covariance_stat1
()[source]¶ - Compute and return the within-class covariance matrix of the
first-order statistics.
- Returns
the within-class co-variance matrix of the first-order statistics as a ndarray.
-
ivector_extraction_eigen_decomposition
(ubm, Q, D_bar_c, Tnorm, delta=array([], dtype=float64))[source]¶ - Compute i-vectors using the eigen decomposition approximation.
For more information, refer to [Glembeck09].
- Parameters
ubm – a Mixture used as UBM for i-vector estimation
Q – Q matrix as described in [Glembeck11]
D_bar_c – matrices as described in [Glembeck11]
Tnorm – total variability matrix pre-normalized using the co-variance of the UBM
delta – mean vector if re-estimated using the minimum divergence criterion
- Returns
a StatServer whose zero-order statistics are 1 and whose first-order statistics are the approximated i-vectors.
-
ivector_extraction_weight
(ubm, W, Tnorm, delta=array([], dtype=float64))[source]¶ - Compute i-vectors using the ubm weight approximation.
For more information, refer to:
Glembeck, O.; Burget, L.; Matejka, P.; Karafiat, M. & Kenny, P. “Simplification and optimization of I-Vector extraction,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2011, 4516-4519
- Parameters
ubm – a Mixture used as UBM for i-vector estimation
W – fixed matrix pre-computed using the weights from the UBM and the total variability matrix
Tnorm – total variability matrix pre-normalized using the co-variance of the UBM
delta – mean vector if re-estimated using the minimum divergence criterion
- Returns
a StatServer whose zero-order statistics are 1 and whose first-order statistics are the approximated i-vectors.
-
mean_stat_per_model
()[source]¶ Average the zero- and first-order statistics per model and store them in a new StatServer.
- Returns
a StatServer with the statistics averaged per model
-
merge
()[source]¶ Merge a variable number of StatServers into one. If a segment ID is duplicated, only one copy is kept and a warning is raised
-
precompute_svm_kernel_stat1
()[source]¶ - Pre-compute the Kernel for SVM training and testing,
the output parameter is a matrix that only contains the impostor part of the Kernel. This one has to be completed by the target-dependent part during training and testing.
- Returns
the impostor part of the SVM Gram matrix as a ndarray
-
static
read
(statserver_file_name, prefix='')[source]¶ Read StatServer in hdf5 format
- Parameters
statserver_file_name – name of the file to read from
prefix – prefix of the dataset to read from in HDF5 file
-
static
read_subset
(statserver_filename, index, prefix='')[source]¶ Given a statserver in HDF5 format stored on disk and an IdMap, create a StatServer object filled with sessions corresponding to the IdMap.
- Parameters
statserver_filename – name of the statserver in hdf5 format to read from
index – the IdMap of sessions to load or an array of indices to load
prefix – prefix of the group in HDF5 file
- Returns
a StatServer
-
rotate_stat1
(R)[source]¶ Rotate first-order statistics by a right-product.
- Parameters
R – ndarray, matrix to use for right product on the first order statistics.
-
spectral_norm_stat1
(spectral_norm_mean, spectral_norm_cov, is_sqr_inv_sigma=False)[source]¶ - Apply Spectral Normalization to all first order statistics.
See more details in [Bousquet11]
The number of iterations performed is equal to the length of the input lists.
- Parameters
spectral_norm_mean – a list of mean vectors
spectral_norm_cov – a list of co-variance matrices as ndarrays
is_sqr_inv_sigma – boolean, True if
-
subtract_weighted_stat1
(sts)[source]¶ Subtract the stat1 of the sts StatServer from the stat1 of the current StatServer, after multiplying by the zero-order statistics of the current StatServer
- Parameters
sts – a StatServer
- Returns
a new StatServer
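A possible reading of this operation, sketched on plain arrays: each block of the other StatServer's stat1 is scaled by the matching zero-order statistic of the current one, then subtracted. The helper subtract_weighted is hypothetical, and both the choice of which term is weighted and the supervector layout (feature_size consecutive columns per distribution) are assumptions.

```python
import numpy as np

def subtract_weighted(stat0, stat1, other_stat1):
    """Return stat1 - N * other_stat1, with N expanded to the stat1 layout."""
    feature_size = stat1.shape[1] // stat0.shape[1]
    # Repeat each occupation count once per feature dimension of its distribution.
    weights = np.repeat(stat0, feature_size, axis=1)
    return stat1 - weights * other_stat1

stat0 = np.array([[2.0, 3.0]])                  # one session, two distributions
stat1 = np.array([[10.0, 10.0, 10.0, 10.0]])    # feature_size = 2
other_stat1 = np.array([[1.0, 1.0, 1.0, 1.0]])

result = subtract_weighted(stat0, stat1, other_stat1)
```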
-
sum_stat_per_model
()[source]¶ Sum the zero- and first-order statistics per model and store them in a new StatServer.
- Returns
a StatServer with the statistics summed per model
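Summing per model is a group-by on the modelset. The sketch below shows one way to do it with NumPy on plain arrays; sum_per_model is a hypothetical helper, not the library's code.

```python
import numpy as np

def sum_per_model(modelset, stat0, stat1):
    """Sum the rows of stat0 and stat1 that share the same model ID."""
    models, inverse = np.unique(modelset, return_inverse=True)
    sum0 = np.zeros((models.shape[0], stat0.shape[1]))
    sum1 = np.zeros((models.shape[0], stat1.shape[1]))
    # Unbuffered accumulation: rows with the same model index add up.
    np.add.at(sum0, inverse, stat0)
    np.add.at(sum1, inverse, stat1)
    return models, sum0, sum1

modelset = np.array(["a", "b", "a"])
stat0 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
stat1 = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

models, sum0, sum1 = sum_per_model(modelset, stat0, stat1)
```

Dividing each summed row by the number of sessions of that model would give the per-model average instead.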
-
to_hdf5
(h5f, prefix='', mode='w')[source]¶ Write the StatServer to disk in hdf5 format.
- Parameters
output_file_name – name of the file to write in.
prefix –
-
validate
(warn=False)[source]¶ Validate the structure and content of the StatServer. Check consistency between the different attributes of the StatServer: dimension of the modelset, dimension of the segset, length of the modelset and segset, and consistency of stat0 and stat1.
- Parameters
warn – boolean, optional; if True, display possible warnings
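The checks listed above can be sketched as a handful of shape comparisons. This is an illustrative stand-alone function on plain arrays; the exact conditions tested by the real method, and the fact that it returns a boolean, are assumptions.

```python
import warnings
import numpy as np

def validate(modelset, segset, stat0, stat1, warn=False):
    """Return True when the arrays are mutually consistent."""
    ok = modelset.ndim == 1 and segset.ndim == 1          # dimension of both ID arrays
    ok = ok and modelset.shape == segset.shape            # same number of sessions
    ok = ok and stat0.ndim == 2 and stat1.ndim == 2       # statistics are matrices
    ok = ok and stat0.shape[0] == stat1.shape[0] == modelset.shape[0]
    if warn and not ok:
        warnings.warn("inconsistent StatServer attributes")
    return bool(ok)

modelset = np.array(["spk1", "spk2"])
segset = np.array(["s1", "s2"])
good = validate(modelset, segset, np.ones((2, 4)), np.zeros((2, 12)))
bad = validate(modelset, segset, np.ones((3, 4)), np.zeros((2, 12)))
```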
-
whiten_cholesky_stat1
(mu, sigma)[source]¶ Whiten first-order statistics by using Cholesky decomposition of Sigma
- Parameters
mu – array, mean vector to be subtracted from the statistics
sigma – ndarray, co-variance matrix or covariance super-vector
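For the full covariance matrix case, Cholesky whitening can be sketched as below: with sigma = L Lᵀ, each centred vector is multiplied by the inverse of L. The helper whiten_cholesky is hypothetical and the covariance super-vector case is not covered.

```python
import numpy as np

def whiten_cholesky(stat1, mu, sigma):
    """Whiten the rows of stat1 using the Cholesky factor of sigma."""
    chol = np.linalg.cholesky(sigma)  # sigma = chol @ chol.T
    # Solve chol @ y = (x - mu) for every row x, i.e. y = inv(chol) @ (x - mu).
    return np.linalg.solve(chol, (stat1 - mu).T).T

rng = np.random.default_rng(3)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 3.0]])
x = rng.normal(size=(100, 3)) @ mixing.T + np.array([1.0, -2.0, 0.5])

mu = x.mean(axis=0)
sigma = np.cov(x, rowvar=False)
whitened = whiten_cholesky(x, mu, sigma)
```

When mu and sigma are estimated on the same data, the whitened vectors have zero mean and identity covariance.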
-
whiten_stat1
(mu, sigma, isSqrInvSigma=False)[source]¶ Whiten first-order statistics. If sigma.ndim == 1, case of a diagonal covariance; if sigma.ndim == 2, case of a single Gaussian with full covariance; if sigma.ndim == 3, case of a full covariance UBM.
- Parameters
mu – array, mean vector to be subtracted from the statistics
sigma – ndarray, co-variance matrix or covariance super-vector
isSqrInvSigma – boolean, True if the input Sigma matrix is the inverse of the square root of a covariance matrix
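The first two cases can be sketched as follows, with a symmetric inverse square root used for the full covariance. The function whiten is a hypothetical stand-in; the library may use a different (equally valid) whitening transform, and the sigma.ndim == 3 case is deliberately left out.

```python
import numpy as np

def whiten(stat1, mu, sigma, is_sqr_inv_sigma=False):
    """Whiten rows of stat1 for a diagonal (1-D) or full (2-D) sigma."""
    centered = stat1 - mu
    if sigma.ndim == 1:
        # Diagonal covariance: scale each dimension independently.
        scale = sigma if is_sqr_inv_sigma else 1.0 / np.sqrt(sigma)
        return centered * scale
    if sigma.ndim == 2:
        if is_sqr_inv_sigma:
            return centered @ sigma  # sigma already holds the inverse square root
        eigval, eigvec = np.linalg.eigh(sigma)
        inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
        return centered @ inv_sqrt
    raise NotImplementedError("the sigma.ndim == 3 case is not sketched here")

diag_result = whiten(np.array([[3.0, 5.0]]), np.array([1.0, 1.0]), np.array([4.0, 16.0]))

rng = np.random.default_rng(4)
x = rng.normal(size=(100, 3)) * np.array([1.0, 2.0, 0.5])
mu, sigma = x.mean(axis=0), np.cov(x, rowvar=False)
full_result = whiten(x, mu, sigma)
```

Passing a pre-computed inverse square root with the flag set to True avoids repeating the eigendecomposition when many batches share the same sigma.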