Clustering

HAC BIC

class clustering.hac_bic.HAC_BIC(cep, table, alpha=1.0, sr=False)

BIC Hierarchical Agglomerative Clustering (HAC) with Gaussian models

The algorithm is a hierarchical agglomerative clustering. Initially, each segment forms its own cluster. Each cluster is modeled by a Gaussian with a full covariance matrix (see gauss.GaussFull). The \Delta BIC measure is used both to select which clusters to merge and to decide when to stop merging. At each iteration, the two closest clusters i and j (the pair with the lowest \Delta BIC_{i,j}) are merged; merging stops once that lowest \Delta BIC_{i,j} > 0.

\Delta BIC_{i,j} = PBIC_{i+j} - PBIC_{i} - PBIC_{j} - P

PBIC_{x}  = \frac{n_x}{2} \log|\Sigma_x|

cst  = \frac{1}{2} \alpha \left(d + \frac{d(d+1)}{2}\right)

P  = cst \times \log(n_i+n_j)

where |\Sigma_i|, |\Sigma_j| and |\Sigma_{i+j}| are the determinants of the covariance matrices of the Gaussians associated with clusters i, j and i+j. \alpha is a tunable parameter that weights the penalty. The penalty P depends on d, the dimension of the features, and on n_i and n_j, the total lengths of cluster i and cluster j respectively.
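The formulas above can be sketched with NumPy as follows. This is a minimal illustration, not the code of HAC_BIC or gauss.GaussFull: the function names `partial_bic` and `delta_bic` are hypothetical, clusters are assumed to be given as raw frame arrays of shape (n, d), and the penalty is assumed to follow the usual BIC form in which cst multiplies \log(n_i+n_j).

```python
import numpy as np


def partial_bic(frames):
    """PBIC_x = (n_x / 2) * log|Sigma_x| for one cluster of frames, shape (n_x, d)."""
    n = frames.shape[0]
    # Maximum-likelihood (biased) full covariance of the cluster.
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    # slogdet avoids overflow/underflow of det(); the covariance is
    # assumed positive definite, so the sign is ignored.
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * n * logdet


def delta_bic(frames_i, frames_j, alpha=1.0):
    """Delta BIC between clusters i and j; a value <= 0 favours merging."""
    n_i, d = frames_i.shape
    n_j = frames_j.shape[0]
    merged = np.vstack([frames_i, frames_j])
    cst = 0.5 * alpha * (d + 0.5 * d * (d + 1))
    penalty = cst * np.log(n_i + n_j)  # assumed product form of P
    return (partial_bic(merged) - partial_bic(frames_i)
            - partial_bic(frames_j) - penalty)


# Usage: two clusters drawn from the same distribution versus a shifted one.
rng = np.random.default_rng(0)
a = rng.normal(size=(500, 3))
b = rng.normal(size=(500, 3))
c = rng.normal(loc=5.0, size=(500, 3))
print(delta_bic(a, b), delta_bic(a, c))
```

A larger \alpha increases the penalty, so more pairs fall below zero and more merges are accepted.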

_dist(mi, mj)

Compute the BIC distance d(i,j).

Parameters:
  • mi – a GaussFull object
  • mj – a GaussFull object
Returns:

a float

_init_distance()

Compute distance matrix

_init_train()

Train initial models

_merge_model(mi, mj)

Merge two GaussFull objects.

Parameters:
  • mi – a GaussFull object
  • mj – a GaussFull object
Returns:

a GaussFull object
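Two full-covariance Gaussians can be merged from their sufficient statistics alone, without revisiting the frames. The sketch below shows the standard count-weighted pooling; it is an illustration and not the GaussFull implementation, and the function `merge_gaussians` and its argument layout (count, mean, biased covariance) are assumptions.

```python
import numpy as np


def merge_gaussians(n_i, mu_i, cov_i, n_j, mu_j, cov_j):
    """Pool two Gaussians given as (count, mean, biased covariance)."""
    n = n_i + n_j
    # Count-weighted mean of the merged cluster.
    mu = (n_i * mu_i + n_j * mu_j) / n
    # Pool the second-order moments E[x x^T], then re-centre on the new mean.
    second = (n_i * (cov_i + np.outer(mu_i, mu_i))
              + n_j * (cov_j + np.outer(mu_j, mu_j))) / n
    cov = second - np.outer(mu, mu)
    return n, mu, cov


# Usage: the pooled statistics match those computed on the stacked frames.
rng = np.random.default_rng(1)
a = rng.normal(size=(40, 2))
b = rng.normal(loc=3.0, size=(60, 2))
n, mu, cov = merge_gaussians(
    len(a), a.mean(axis=0), np.cov(a, rowvar=False, bias=True),
    len(b), b.mean(axis=0), np.cov(b, rowvar=False, bias=True),
)
print(n, mu)
```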

_update_dist(i)

Update row and column i of the distance matrix.

Parameters:
  • i – int

perform(to_the_end=False, min_spk=1)

Perform the HAC algorithm.

Returns:

a Diar object and a dictionary mapping the old cluster_list to the new labels
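The overall merging loop can be illustrated with a generic greedy sketch. It is not the body of perform(): the function `hac` and its `dist` and `merge` callbacks are hypothetical, all pairwise distances are recomputed at every iteration rather than kept in a distance matrix, and the to_the_end and min_spk options are not modeled.

```python
def hac(clusters, dist, merge):
    """Greedy agglomerative loop: merge the closest pair until its distance > 0."""
    clusters = list(clusters)
    while len(clusters) > 1:
        # Find the closest pair (a, b) with a < b.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > 0:  # stop criterion, as with Delta BIC > 0
            break
        # Keep the merged cluster at index a and drop index b.
        clusters[a] = merge(clusters[a], clusters[b])
        del clusters[b]
    return clusters


# Usage with a toy 1-D "distance": gap between means minus a threshold of 1.
def toy_dist(x, y):
    return abs(sum(x) / len(x) - sum(y) / len(y)) - 1.0


result = hac([[0.0], [0.2], [5.0], [5.1]], toy_dist, lambda x, y: x + y)
print(result)
```

Keeping a distance matrix and refreshing only the row and column of the merged cluster, as _init_distance and _update_dist suggest, avoids recomputing every pair at each step.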

HAC CLR

class clustering.hac_clr.HAC_CLR(features_server, diar, ubm, ce=False, ntop=5)

CLR Hierarchical Agglomerative Clustering (HAC) with GMM trained by MAP

Tools

clustering.hac_utils.argmax(distances, nb)

Get the indexes and the value of the maximum between 0 and nb of a distance matrix.

Parameters:
  • distances – a numpy.ndarray
  • nb – int
Returns:

row and column indexes, the value

clustering.hac_utils.argmin(distances, nb)

Get the indexes and the value of the minimum between 0 and nb of a distance matrix.

Parameters:
  • distances – a numpy.ndarray
  • nb – int
Returns:

row and column indexes, the value
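A search of this kind can be sketched with NumPy as below. The function `argmin_block` is hypothetical and is not the library's argmin; it assumes that only the leading nb x nb block of the matrix is in use and that entries to be ignored (such as the diagonal) are set to np.inf.

```python
import numpy as np


def argmin_block(distances, nb):
    """Return (row, column, value) of the minimum in the leading nb x nb block."""
    block = distances[:nb, :nb]
    # argmin works on the flattened block; unravel_index maps it back to 2-D.
    i, j = np.unravel_index(np.argmin(block), block.shape)
    return int(i), int(j), float(block[i, j])


# Usage: symmetric distances with an infinite diagonal.
inf = np.inf
d = np.array([[inf, 3.0, 1.0],
              [3.0, inf, 2.0],
              [1.0, 2.0, inf]])
print(argmin_block(d, 3))
```

The maximum can be found the same way with np.argmax, provided ignored entries are set to -np.inf instead.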

clustering.hac_utils.bic_square_root(ni, nj, alpha, dim)

Compute a BIC square root distance described in [Stafylakis2010].

[Stafylakis2010]
  T. Stafylakis, V. Katsouros, and G. Carayannis. The segmental Bayesian information criterion and its applications to speaker diarization. IEEE Journal of Selected Topics in Signal Processing, 4(5):857–866, 2010.
Parameters:
  • ni – covariance matrix of speaker i
  • nj – covariance matrix of speaker j
  • alpha – a threshold
  • dim – the dimension of the features
Returns:

a float

clustering.hac_utils.idmap_remove(idmap, index)

Remove data at position index.

Parameters:
  • index – the index to remove

clustering.hac_utils.roll(mat, j)

Delete row j and column j of the matrix.

Parameters:
  • mat – numpy.ndarray
  • j – int
Returns:

numpy.ndarray
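Removing one cluster from a square distance matrix amounts to dropping the same index along both axes. The sketch below uses numpy.delete; the function `drop_row_col` is hypothetical and may differ from how roll is actually implemented (for instance, roll may work in place on a preallocated matrix).

```python
import numpy as np


def drop_row_col(mat, j):
    """Return a copy of mat without row j and column j."""
    # np.delete returns a new array, so the input matrix is left untouched.
    return np.delete(np.delete(mat, j, axis=0), j, axis=1)


# Usage: a 3 x 3 matrix becomes 2 x 2 after removing index 1.
m = np.arange(9).reshape(3, 3)
print(drop_row_col(m, 1))
```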

clustering.hac_utils.scores_remove(scores, index_model=None, index_seg=None)

Remove data at position index_model and/or index_seg.

Parameters:
  • index_model – the index in the model set to remove
  • index_seg – the index in the segment set to remove

clustering.hac_utils.stat_server_merge(stat_server, i, j, wi=1.0, wj=1.0)

Merge the ith and jth stat0 and stat1 into the ith entry, then remove the jth entry.

Parameters:
  • i – destination index
  • j – index of the entry to remove

clustering.hac_utils.stat_server_remove(stat_server, index)

Remove data at position index.

Parameters:
  • index – the index to remove