Mixture

class mixture.Mixture(mixture_file_name='', name='empty')[source]

A class for Gaussian Mixture Model storage. For more details about Gaussian Mixture Models (GMM) you can refer to [Bimbot04].

Attr w

array of weight parameters

Attr mu

ndarray of mean parameters, one distribution per row

Attr invcov

ndarray of inverse co-variance parameters: 2-dimensional for diagonal co-variance distributions, 3-dimensional for full co-variance distributions

Attr invchol

3-dimensional ndarray containing the upper Cholesky decomposition of the inverse co-variance matrices

Attr cst

array of constants, one computed for each distribution

Attr det

array of determinants, one for each distribution
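The attribute layout above can be illustrated with a small NumPy sketch. The toy values, the formulas used for det and cst, and the use of numpy.linalg.cholesky are assumptions made for illustration; the text above does not state how the class computes them.

```python
import numpy as np

# Hypothetical toy mixture: 2 diagonal-covariance distributions in 3 dimensions.
w = np.array([0.4, 0.6])                      # weights, shape (distrib_nb,)
mu = np.array([[0.0, 1.0, 2.0],
               [3.0, 4.0, 5.0]])              # means, one distribution per row
invcov = np.array([[1.0, 2.0, 4.0],
                   [0.5, 0.5, 2.0]])          # diagonal case: 2-dimensional

# Assumed definitions (not stated in the reference text):
# determinant of each covariance matrix and Gaussian normalization constant.
dim = mu.shape[1]
det = 1.0 / np.prod(invcov, axis=1)
cst = 1.0 / (np.sqrt(det) * (2.0 * np.pi) ** (dim / 2.0))

# Full-covariance case: invcov is 3-dimensional, and an upper Cholesky
# factor U satisfies U.T @ U == invcov[k] for each distribution k.
invcov_full = np.stack([np.diag(row) for row in invcov])            # (2, 3, 3)
invchol = np.stack([np.linalg.cholesky(m).T for m in invcov_full])  # (2, 3, 3)
```

numpy.linalg.cholesky returns a lower-triangular factor, so the sketch transposes it to obtain the upper factor.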

EM_diag2full(diagonal_mixture, features_server, featureList, iterations=2, num_thread=1)[source]

Expectation-Maximization estimation of the Mixture parameters.

Parameters
  • features_server – sidekit.FeaturesServer used to load data

  • featureList – list of feature files to train the GMM

  • iterations – number of EM iterations to run

  • num_thread – number of threads to launch for parallel computing

Return llk

a list of log-likelihoods obtained after each iteration

EM_split(features_server, feature_list, distrib_nb, iterations=(1, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8), num_thread=1, llk_gain=0.01, save_partial=False, output_file_name='ubm', ceil_cov=10, floor_cov=0.01)[source]

Expectation-Maximization estimation of the Mixture parameters.

Parameters
  • features_server – sidekit.FeaturesServer used to load data

  • feature_list – list of feature files to train the GMM

  • distrib_nb – final number of distributions

  • iterations – list giving the number of EM iterations to run at each step of the learning process

  • num_thread – number of threads to launch for parallel computing

  • llk_gain – minimum log-likelihood gain; training stops when the gain between two iterations falls below this value

  • save_partial – boolean, if True, save the intermediate mixture before each split of the distributions

  • ceil_cov

  • floor_cov

Return llk

a list of log-likelihoods obtained after each iteration
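As a rough illustration of how the iterations list might pair with the growth of the mixture, the sketch below assumes that training starts from a single distribution and that each entry of iterations follows one split that doubles the number of distributions. This pairing is an assumption, not something the text above states, and split_schedule is a hypothetical helper.

```python
import math

def split_schedule(distrib_nb,
                   iterations=(1, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8)):
    """Pair each assumed doubling step with its number of EM iterations."""
    nb_splits = int(math.log2(distrib_nb))
    schedule = []
    size = 1  # assumed starting point: a single distribution
    for nb_it in iterations[:nb_splits]:
        size *= 2  # assumed: every split doubles the mixture size
        schedule.append((size, nb_it))
    return schedule

print(split_schedule(8))  # [(2, 1), (4, 2), (8, 2)]
```

Under this reading, the first twelve default entries would be enough to reach 4096 distributions.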

EM_uniform(cep, distrib_nb, iteration_min=3, iteration_max=10, llk_gain=0.01, do_init=True)[source]

Expectation-Maximization estimation of the Mixture parameters.

Parameters
  • cep – set of feature frames to consider

  • distrib_nb – number of distributions

  • iteration_min – minimum number of iterations to perform

  • iteration_max – maximum number of iterations to perform

  • llk_gain – gain in terms of log-likelihood; training stops when the gain is less than this value

  • do_init – boolean, if True initialize the GMM from the training data

Return llk

a list of log-likelihoods obtained after each iteration
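The interplay of iteration_min, iteration_max and llk_gain can be sketched with a toy EM loop for a diagonal-covariance GMM. This is a minimal illustration, not the library's implementation: em_diag is a hypothetical function, the log-likelihood is averaged per frame, and the variance floor of 1e-6 is an arbitrary choice.

```python
import numpy as np

def em_diag(cep, w, mu, var, iteration_min=3, iteration_max=10, llk_gain=0.01):
    """Toy EM for a diagonal GMM; returns the parameters and the llk list."""
    llk = []
    for it in range(iteration_max):
        # E-step: log(weight * Gaussian density) per frame and distribution.
        diff = cep[:, None, :] - mu[None, :, :]
        log_p = (np.log(w)
                 - 0.5 * np.sum(np.log(2.0 * np.pi * var), axis=1)
                 - 0.5 * np.sum(diff ** 2 / var[None, :, :], axis=2))
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        post = np.exp(log_p - log_norm[:, None])
        llk.append(float(np.mean(log_norm)))  # mean log-likelihood per frame

        # Stop once the minimum count is reached and the gain is too small.
        if it + 1 >= iteration_min and llk[-1] - llk[-2] < llk_gain:
            break

        # M-step: re-estimate weights, means and variances.
        n = post.sum(axis=0)
        w = n / n.sum()
        mu = (post.T @ cep) / n[:, None]
        var = np.maximum((post.T @ cep ** 2) / n[:, None] - mu ** 2, 1e-6)
    return w, mu, var, llk

# Usage on synthetic two-cluster data (all values are illustrative).
rng = np.random.default_rng(0)
cep = np.vstack([rng.normal(-2.0, 1.0, (200, 2)),
                 rng.normal(2.0, 1.0, (200, 2))])
w, mu, var, llk = em_diag(cep, np.array([0.5, 0.5]),
                          cep[[0, -1]].copy(), np.ones((2, 2)))
```

Each value in llk is measured before the corresponding M-step, so the list should be non-decreasing, as expected for EM.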

compute_log_posterior_probabilities(cep, mu=None)[source]

Compute log posterior probabilities for a set of feature frames.

Parameters
  • cep – a set of feature frames in an ndarray, one frame per row

  • mu – a mean super-vector to use in place of the UBM's. If None (the default) or an empty vector, the UBM means are used

Returns

An ndarray of log-posterior probabilities corresponding to the input feature set.
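For a diagonal-covariance mixture, the quantity returned can be sketched in plain NumPy as follows. This is an illustrative re-derivation under the usual GMM definitions, not the library's code; log_posteriors_diag and its explicit parameter list are hypothetical.

```python
import numpy as np

def log_posteriors_diag(cep, w, mu, invcov):
    """Log posterior of each distribution for each frame (diagonal GMM)."""
    dim = mu.shape[1]
    diff = cep[:, None, :] - mu[None, :, :]            # (frames, distribs, dim)
    # log(w_k) + log N(x | mu_k, diag(1 / invcov_k))
    log_joint = (np.log(w)
                 + 0.5 * np.sum(np.log(invcov), axis=1)
                 - 0.5 * dim * np.log(2.0 * np.pi)
                 - 0.5 * np.sum(diff ** 2 * invcov[None, :, :], axis=2))
    # Normalize over distributions with a numerically stable log-sum-exp.
    return log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True)

# Two frames, two unit-variance distributions (illustrative values).
cep = np.array([[0.0, 0.0], [3.0, 3.0]])
w = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
invcov = np.ones((2, 2))
log_post = log_posteriors_diag(cep, w, mu, invcov)
```

Each row of np.exp(log_post) sums to one, with one row per input frame and one column per distribution.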

compute_log_posterior_probabilities_full(cep, mu=None)[source]

Compute log posterior probabilities for a set of feature frames.

Parameters
  • cep – a set of feature frames in an ndarray, one frame per row

  • mu – a mean super-vector to use in place of the UBM's. If None (the default) or an empty vector, the UBM means are used

Returns

An ndarray of log-posterior probabilities corresponding to the input feature set.
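For full co-variance distributions, the same computation can be sketched with upper Cholesky factors of the inverse co-variance matrices, in the spirit of the invchol attribute. The function log_posteriors_full and the way the factors are built here are assumptions for illustration only.

```python
import numpy as np

def log_posteriors_full(cep, w, mu, invchol):
    """invchol[k] is an upper factor U with U.T @ U == inverse covariance k."""
    dim = mu.shape[1]
    diff = cep[:, None, :] - mu[None, :, :]             # (frames, distribs, dim)
    # z[n, k] = U_k @ (x_n - mu_k); its squared norm is the Mahalanobis term.
    z = np.einsum('kij,nkj->nki', invchol, diff)
    # 0.5 * log det(invcov_k) equals the sum of the log-diagonal of U_k.
    half_log_det = np.sum(np.log(np.diagonal(invchol, axis1=1, axis2=2)), axis=1)
    log_joint = (np.log(w) + half_log_det
                 - 0.5 * dim * np.log(2.0 * np.pi)
                 - 0.5 * np.sum(z ** 2, axis=2))
    return log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True)

# Illustrative data: two full-covariance distributions in 2 dimensions.
cov = np.array([[[2.0, 0.5], [0.5, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]]])
invchol = np.stack([np.linalg.cholesky(np.linalg.inv(c)).T for c in cov])
w = np.array([0.3, 0.7])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])
cep = np.array([[0.1, -0.2], [2.9, 3.1], [1.5, 1.5]])
log_post = log_posteriors_full(cep, w, mu, invchol)
```

Working with the Cholesky factor avoids forming the quadratic form with the full inverse matrix and gives the log-determinant from the factor's diagonal.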

dim()[source]

Return the dimension of the distributions of the Mixture

Returns

an integer, size of the acoustic vectors

distrib_nb()[source]

Return the number of distributions of the Mixture

Returns

the number of distributions in the Mixture

get_distrib_nb()[source]

Return the number of Gaussian distributions in the mixture

Returns

the number of distributions

get_invcov_super_vector()[source]

Return the inverse co-variance super-vector

Returns

an array, super-vector of the inverse co-variance coefficients

get_mean_super_vector()[source]

Return the mean super-vector

Returns

an array, super-vector of the mean coefficients
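A super-vector is commonly obtained by concatenating the per-distribution vectors into a single 1-dimensional array. The sketch below assumes a row-major concatenation, distribution by distribution; the ordering actually used by the class is not stated above.

```python
import numpy as np

mu = np.array([[0.0, 1.0, 2.0],
               [3.0, 4.0, 5.0]])     # 2 distributions, dimension 3

# Assumed layout: means of distribution 0, then distribution 1, and so on.
mean_sv = mu.flatten()

# The super-vector size is then distrib_nb * dim.
sv_size = mu.shape[0] * mu.shape[1]

# Reshaping recovers the per-distribution means.
mu_back = mean_sv.reshape(mu.shape)
```

The same flattening applies to the inverse co-variance coefficients in the diagonal case.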

init_from_diag(diag_mixture)[source]
Parameters

diag_mixture

merge(model_list)[source]

Merge a list of Mixtures into a new one. Weights are normalized uniformly.

Parameters

model_list – a list of Mixture objects to merge
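One way to picture the merge is to stack the parameters of every model and rescale the weights so that they sum to one. The sketch below uses plain (w, mu, invcov) tuples as hypothetical stand-ins for Mixture objects and assumes diagonal co-variances; merge_params is not part of the library.

```python
import numpy as np

def merge_params(models):
    """models: list of (w, mu, invcov) tuples standing in for Mixture objects."""
    w = np.concatenate([m[0] for m in models])
    mu = np.vstack([m[1] for m in models])
    invcov = np.vstack([m[2] for m in models])
    # Rescale so the merged weights sum to one. When every input model's
    # weights already sum to one, this divides them by the number of models.
    w = w / w.sum()
    return w, mu, invcov

# Two toy models with 2 and 3 distributions in 2 dimensions.
model_a = (np.array([0.5, 0.5]), np.zeros((2, 2)), np.ones((2, 2)))
model_b = (np.array([0.2, 0.3, 0.5]), np.ones((3, 2)), np.ones((3, 2)))
w, mu, invcov = merge_params([model_a, model_b])
```

With this choice each input model contributes the same total weight to the merged mixture.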

read(mixture_file_name, prefix='')[source]

Read a Mixture in hdf5 format

Parameters
  • mixture_file_name – name of the file to read from

  • prefix

static read_alize(file_name)[source]
Parameters

file_name

Returns

static read_htk(filename, begin_hmm=False, state2=False)[source]

Read a Mixture in HTK format

Parameters
  • filename – name of the file to read from

  • begin_hmm – boolean

  • state2 – boolean

sv_size()[source]

Return the dimension of the super-vector

Returns

an integer, size of the mean super-vector

validate()[source]

Verify the format of the Mixture

Returns

a boolean giving the status of the Mixture

static variance_control(cov, flooring, ceiling, cov_ctl)[source]

Variance control for the Mixture (flooring and ceiling)

Parameters
  • cov – covariance to control

  • flooring – float, flooring value

  • ceiling – float, ceiling value

  • cov_ctl – co-variance to consider for flooring and ceiling
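Flooring and ceiling can be sketched as an element-wise clip. The sketch assumes that both bounds are expressed relative to cov_ctl, that is, flooring * cov_ctl and ceiling * cov_ctl; this interpretation and the helper name variance_control_sketch are assumptions, not taken from the text above.

```python
import numpy as np

def variance_control_sketch(cov, flooring, ceiling, cov_ctl):
    """Clip cov between bounds scaled by cov_ctl (assumed interpretation)."""
    floor = flooring * cov_ctl
    ceil = ceiling * cov_ctl
    return np.clip(cov, floor, ceil)

# One distribution in 3 dimensions (illustrative values).
cov = np.array([[0.001, 1.0, 50.0]])
cov_ctl = np.array([1.0, 1.0, 2.0])
controlled = variance_control_sketch(cov, flooring=0.01, ceiling=10.0,
                                     cov_ctl=cov_ctl)
```

Values already inside the bounds are left unchanged; only the too-small and too-large entries are replaced by the corresponding bound.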